Interfacial informatics

Using machine learning methods to analyse and predict events occurring at interfaces is more complicated than applying machine learning to the participating entities, such as adsorbates and adsorbents, separately. Whether combining molecular or materials descriptors, or explicitly defining the topology of the space in between, the choice of features is critical to successfully understanding the potential energy surface that determines the probability of events, or the importance of unique characteristics that can guide decision making. If reliably defined, these descriptors can be used in advanced machine learning methods to model dynamics, co-adsorption and interfacial evolution based on complex data; an area traditionally reserved for molecular dynamics or kinetic Monte Carlo. In this perspective, we provide some insights into how interactions and interfaces can be more effectively described, and introduce some relevant machine learning methods that go beyond the unsupervised pattern recognition or supervised classification and regression currently preferred by the community.


Introduction
Many important properties of molecules and materials only arise when they interact; some desirable, such as catalysis, and some undesirable, such as corrosion. The interface is where energy is harvested, self-assembly occurs, and materials grow. It is also where catalysts are poisoned and toxic reactive species are expelled. One could argue that more science happens at the interface than at any other location. The increasing availability of data from both computational and experimental sources has led to data-driven science being the current dominant paradigm for developments in materials science [1]. However, care must be taken given that none of the well-established machine learning (ML) methods were developed with this application in mind [2].
The interface is a very complex place, and it is easy to see why many researchers are turning to ML methods to predict adsorption energies or configurations. To date this has involved using or combining simple and established methods from cheminformatics and materials informatics, which draw on different types of descriptors [3][4][5][6][7]. These methods can be reliably applied but are ultimately limited in their utility. Describing the system in terms of molecular or materials descriptors requires specificity (which molecule, and what material), means the results inherently coarse-grain over the atomistic degrees of freedom, and risks missing details about the interaction between the two. It also fails to capture transcendent patterns in the data that are not physically intuitive.
An interfacial descriptor that does not require specific chemical or crystallographic information can overcome this restriction, but studying materials with even the simplest one-dimensional surface is more difficult than studying bulk properties, as a degree of periodicity is lost. This effect is of particular relevance to computational modeling, given that periodic boundary conditions (PBCs) can no longer be applied and a choice must be made: either have a large boundary box (to avoid the interaction of periodic images) or deal with the edge cases that result from any non-periodic boundaries. Either choice increases complexity and the associated computational expense. To create a realistic representation of the interface, any irregularity or non-periodicity must be included, and a significantly longer length-scale used than the unit cell length needed to define the bulk. For example, the systematic study by Oda et al [8] of symmetric tilt grain boundaries in body-centered-cubic iron required interfaces with around 1000 atoms in a supercell (compared to the usual one atom in a bulk cell). Their study applied ML with grain-boundary-specific features and virtual screening to develop a predictor of energy and structure from only a small number of training boundaries. However, those descriptors and that model are specific to that material and type of grain boundary.
In addition to length scales, varied timescales are also a challenge for modeling or measuring interfaces [9], as emergent behaviours (e.g. growth, reconstruction or surface diffusion) are long-term events in comparison to describing electronic structure (e.g. binding or charge transfer) [10,11]. Brunton and Kutz give a perspective on ML approaches linking multiscale dynamics to the macroscale properties of materials [12]. They review promising ML and sparse optimisation approaches that avoid the clash of timescales for materials discovery and characterisation of molecules and bulk materials. Interfaces add yet another layer of complexity, since the differences in timescale span orders of magnitude and depend on the type of interface under consideration. Continuum methods for describing gases or liquids can be effectively used to simplify the description of solid/liquid or solid/gas interfaces [13], but atomistic details are essential for an effective description of solid/solid interfaces and any reaction at the interface [14,15]. These sorts of considerations impact the choice of method for gathering the data as well as the type of information that can realistically be extracted from it, regardless of the feature selection and ML model being developed. The popular methods typically require single adsorption events or pre-determined combinations that are chosen by researchers and subject to numerous biases. There are a number of ML methods that are ideal for studying multi-site, multi-molecule, multi-species and multi-event interactions. These methods are well tested in other domains (such as finance, or sentiment analysis), but have so far failed to gain significant attention in the physical sciences.
In this perspective, we focus on informatics pertaining to adsorption events, contrasting approaches to describing an interface and advocating the use of molecule/material-agnostic feature spaces. We first explore the ways that interactions and interfaces are currently described, introduce new ways that they could be described more generally, and briefly suggest some novel ML approaches that offer advantages for studying interfaces but are not yet widely used in the domain.

Describing the interaction
The most elementary approach to interfacial informatics is to begin by combining descriptions of the components and the way they interact.
Adsorption is the process of an adsorbate molecule or atom (from liquid or gas) binding to an adsorbent surface (solid or liquid), and there are two types of adsorption: physisorption and chemisorption. For physisorption, the distance of the adsorbate from the surface is larger, and the interactions and binding energies are usually weaker, but this does not necessarily simplify the interaction descriptor. The weak interactions typically involve van der Waals forces, but the adsorbate can experience significant mobility (more on modeling dynamics later on). It is also possible to adsorb multiple layers via physisorption.
Chemisorption involves the formation of covalent or ionic bonds between the adsorbent and the adsorbate, and is strongly dependent on the configuration of the surface site. Different adsorption sites have different adsorption energies. For theoretical surface science, the strongest binding energy is indicative of the most likely adsorption site, but chemisorbed species still exhibit mobility (e.g. hydrogen on different metal surfaces) in search of a more stable position, e.g. a kink or defect with a stronger binding energy. This means a very general description of the interaction is needed, to ensure it is flexible enough to represent all possible combinations and configurations.

Real-space descriptors
In computational materials chemistry, the overall system of adsorbate and adsorbent is easiest to describe using a list of Cartesian coordinates of atoms in real space [16]. This can be either a large system defining all atoms (such as a protein) or a small box with PBCs (as described above). A box with PBCs is very versatile and can be used to describe the surfaces of metals [17], ceramics [18,19], nanoparticles [20,21], 2D materials [22] or porous materials [23,24] such as metal-organic frameworks (MOFs) [25]. This representation has three-dimensional coordinates and the corresponding atom type, e.g. as the nuclear charge. This is the minimum information necessary for simulations, and for ML it should be encoded into a vector of features (a descriptor), which can be further enriched with other physical measurements, topology, electronic or atomic properties, or anything else relevant to the interaction.
For the real-space description, it is important that the molecular representations are invariant to geometric operations on the whole system, and so a graph-based representation is intuitive. The atoms are nodes, referred to as vertices, while the bonds or molecular interactions are edges between the nodes. For ML, both nodes and edges can have associated properties, e.g. nuclear charge on nodes and bond order on edges (figure 1), fulfilling the requirements of both generality and flexibility.

Figure 1. Molecular graph example for formic acid, visualising the properties on nodes only. For simplicity, the double bond is disregarded. The dashed ovals show graph fragmentation into two-body (blue), three-body (orange) and four-body (green) fragments. These can then be encoded into something such as a frequency-of-occurrence vector. The yellow part of the vector indicates that not just one but multiple features per fragment, and from the molecular representation, are possible.

An advantage of graph-based representations is that the number of atoms in the system is not relevant for the feature creation [26]. Another advantage is that graph-based fragments can be used as proxies for functional groups (see figure 1). However, depending on how many nodes are included in fragments, how rings and periodic boundaries are treated, and the nature of the data set itself, this can lead to thousands of features. Care should be taken to avoid the 'curse of dimensionality', where the number of features is too high for the number of data instances.
An alternative representation uses topological features such as distances and angles. The distance between two atoms is the Euclidean norm of the difference of their coordinates, and can be easily extracted from computational outputs or, with some effort, from microanalysis. This approach introduces some ambiguity, as deciding which of these distances constitutes a bond requires specific cut-off distances between every pair of atom types [27]. A bond is often defined from the sum of the two atomic covalent radii, offset or scaled by a certain variance referred to as a tolerance factor [28,29]. It is then straightforward to define angles between all existing adjacent bonds. These general structural features have the advantage of being able to account for different representations of molecules and surfaces.
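This bond-detection rule can be sketched in a few lines; the covalent radii, tolerance factor and test distances below are illustrative assumptions, not values from any particular study:

```python
import math

# Covalent radii in angstroms (a small illustrative subset).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}

def distance(a, b):
    """Euclidean distance between two 3D coordinates."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def is_bonded(elem_i, elem_j, d, tolerance=1.2):
    """Treat two atoms as bonded if their separation is below the sum
    of their covalent radii scaled by a tolerance factor."""
    cutoff = (COVALENT_RADII[elem_i] + COVALENT_RADII[elem_j]) * tolerance
    return d <= cutoff

# A C-H pair at a typical bond length (~1.09 A) is classified as bonded,
# while the same pair at 2.0 A is not.
print(is_bonded("C", "H", 1.09))  # True
print(is_bonded("C", "H", 2.0))   # False
```

Angles can then be built from any pair of bonds sharing an atom, using the same distance function.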
For molecules, it is possible to define a countable list of bonds. Hence, the features can be a sum over all bonds, or the bonds can be sorted into bond types (e.g. C-H, C-C), known as bag of bonds (BoB) [30][31][32]. Another possibility is to include all two-body interactions in the form of interaction matrices. The size of such a matrix is proportional to the square of the number of atoms in the molecule. An often used example is the Coulomb matrix, which only uses the atomic positions r_i and nuclear charges Z_i [16,33]: the diagonal elements are the energies of the individual atoms, M_ii = 0.5 Z_i^2.4, and the off-diagonal elements are the Coulomb repulsion between two atoms, M_ij = Z_i Z_j / |r_i − r_j|. Further, for the adsorption of hydrocarbons, sub-graphs with a frequency of occurrence are another possibility for representations [29]. A disadvantage of matrix representations and BoB is that the feature vectors must be extended (padded) with zeros for smaller molecules with a lower number of atoms (see figure 2) [31,33]. Sparse matrices can result in low variance that triggers accidental removal during automated data cleaning. Descriptors such as atom-centered symmetry functions (ACSF) [34], the bispectrum [35] and the smooth overlap of atomic positions (SOAP) can help to avoid some of these issues, and have also been applied to adsorption studies [36][37][38]. A wider review of molecular fingerprints is given by Reveil and Clancy [39].
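The Coulomb matrix defined above can be constructed directly from charges and coordinates; the water-like geometry used here is a hypothetical example:

```python
import math

def coulomb_matrix(charges, coords):
    """Coulomb matrix: diagonal elements 0.5 * Z_i**2.4,
    off-diagonal elements Z_i * Z_j / |r_i - r_j|."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * charges[i] ** 2.4
            else:
                r = math.dist(coords[i], coords[j])
                m[i][j] = charges[i] * charges[j] / r
    return m

# Water-like toy geometry (coordinates in angstroms, illustrative only).
Z = [8, 1, 1]  # O, H, H
R = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
M = coulomb_matrix(Z, R)
print(round(M[0][0], 2))  # diagonal: 0.5 * 8**2.4
```

For a fixed-length descriptor across a data set, each matrix would then be padded with zeros to the size of the largest molecule, which is exactly the sparsity issue noted above.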
To describe an adsorption site, the number of surface atoms coordinated to the adsorbate is often important; this is known as the coordination number. On a flat metal surface these are usually one-, two-, three- or four-fold, corresponding to the top, bridge, three-fold hollow and four-fold (square) hollow positions. Higher coordinations are possible but unlikely. On metal clusters, when predicting the binding energies of small molecules, it has been shown that only the first layer and the direct adsorption site are important [40]. Also on metals, a generalised coordination number, which also considers the number of neighbours of the adsorption site, is more significant [41]. In the case of metals, another representation is the d-band model [42], in which the filled and unfilled quantum d-states are shown to correlate with adsorption activity. Electronic d-band features (e.g. center or kurtosis) and local electronegativity are important non-structural descriptors for metal surfaces [40,41,43,44]. Other non-structural descriptors for surfaces include the group of the metal in the periodic table, the surface energy, and the melting temperature [45]. For surfaces that are more complex than metals, such as 2D materials, a simple alternative is to use topological autocorrelation scores (ATS) [46] or radial distribution function (RDF) scores [47]. The ATS uses the topological distances in a molecular graph and correlates them with the bond order, while the RDF score uses the distance from a reference point and has several bins with specific energy ranges. A further representation is the partial RDF [48], where the distances to the surrounding atoms are captured either from an atom species or from a specific point in the system. Here the number of descriptors can easily be limited by the researcher using threshold criteria, so that the number of features remains constant across a diverse range of systems; without using bins for specific distances, they are directly comparable. As for molecules, graph representations with labelled adsorption sites can distinguish different surfaces from their bulk structure [49], with the added advantage that different surfaces do not have to be geometrically relaxed and specific adsorption sites defined, making it ideal for experimental characterisation. In the case of porous materials, the BET surface area, mesopore volume, micropore volume, helium void fraction, gravimetric surface area, largest pore diameter, pore limiting diameter, framework density, and pore size standard deviation are structural features available from experiments rather than simulations [50,51]. It is remarkable that, with only a few features, reliable predictions are still possible for quite complex structures.
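A cutoff-based coordination-number count of the kind described above can be sketched as follows; the lattice spacing, adsorbate position and 2.6 Å cutoff are illustrative assumptions:

```python
import math

def coordination_number(adsorbate_xyz, surface_xyz, cutoff=2.6):
    """Count surface atoms within `cutoff` (angstroms) of the adsorbate atom."""
    return sum(1 for s in surface_xyz
               if math.dist(adsorbate_xyz, s) <= cutoff)

SITE_NAMES = {1: "top", 2: "bridge", 3: "three-fold hollow", 4: "four-fold hollow"}

# Toy square patch of surface atoms (spacing 2.5 A) with an adsorbate
# placed above the centre of four atoms: a four-fold hollow site.
surface = [(x * 2.5, y * 2.5, 0.0) for x in range(2) for y in range(2)]
adsorbate = (1.25, 1.25, 1.0)
cn = coordination_number(adsorbate, surface, cutoff=2.6)
print(cn, SITE_NAMES.get(cn, "other"))  # 4 four-fold hollow
```

A generalised coordination number would additionally weight each counted surface atom by its own coordination, which is a small extension of the same loop.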
The main advantage of real-space descriptors is that they are interpretable [41,45,51]; if a feature is found to be important, it can easily be correlated to a specific part of the system or a specific property that can potentially be controlled. With this interpretability comes some specificity and a potential loss of generality. The sum of bonds can be easily produced, but the number of features varies with the number of bonds in each system. In larger molecules the order of bonds can be important, but in ML the order of features does not (or should not) matter. This can lead to a loss of information when only considering two- and three-body interactions (e.g. bonds and angles). BoB has been shown to produce higher accuracy than only using bonds, but there is a trade-off. For small-molecule adsorption, the accuracy improvement is insignificant, but the number of features created by BoB lies in the hundreds, instead of using just one feature, such as the largest molecular distance [51]. Deciding whether the increased accuracy justifies the increased complexity should be done on a case-by-case basis.

Describing the interface
Moving from describing an interaction to describing an interface presents a range of challenges, because defining what part of the system is the interface can be non-trivial. As we have seen above, robust methods can separate surface and bulk atoms relative to nearest-neighbour bond lengths or a defined probe length-scale; however, this implies an interface rather than characterising it. Taking this information and simply characterising the topography of the complete interface places equal emphasis on all sites, irrespective of their proximity or relevance to an adsorption event. Features that are vital for strong models of molecules or materials may be redundant or have a low impact in describing the interface between the two. At interfaces, molecules can deform the material surface and vice versa, for example during the relaxation and reconstruction of the surface, or the dissociation of a molecule in the adsorption process [52]. Even if molecules and materials are perfectly described in isolation, we need to ensure some features account for this interfacial feedback.
One approach is taking the raw data, the Cartesian coordinates of adsorbate and adsorbent, as features [53]. This has several drawbacks. There is no generality, since any rotation or translation completely changes the features generated. Any surface feature, no matter how ubiquitous (e.g. a step edge, or equally spaced atoms), has arbitrarily different feature values if instances are separated in real space. To improve on this, molecule/material-agnostic features need to be generated consistently. If we take a ML-driven approach to describing the interface and define a feature space where proximity indicates similar topology, molecular orientation and separation, rather than simple proximity in Cartesian space, the feature space is less scientifically intuitive, but more data-driven.

An effective interfacial descriptor
Here we describe a versatile and general interfacial descriptor to meet these challenges. The definitions below assume that we wish to describe the interface of a small molecule at a surface; generalisations will be discussed afterwards. Firstly, the surface of the adsorbent must be defined. There are many ways to do this, but an efficient approach is to use an alpha-shape algorithm, which has the advantage of detecting concave surfaces and is often referred to as a concave hull. This can be generated using the Delaunay triangulation, as described by Edelsbrunner and Mücke [54].
• Take the set X of all atoms in the molecule, where N is the total number of atoms in that molecule.
• Define M as an integer number of surface atoms used for defining a cutoff.
• Consider the set Y_i of the closest M surface atoms to each atom x_i in the molecule; note that, based on this definition, a single surface atom may appear in more than one Y_i. The set of unique features for an interface will be defined by distances.
• The features which define the interface are the union of the following sets of Euclidean distances: (a) the unique distances between all atoms in the molecule, giving N(N − 1)/2 distances; (b) the union of all sets of unique distances calculated within each Y_i, giving NM(M − 1)/2 distances that describe the local surface (within cutoff M) for each molecule atom; (c) the union of all sets of unique distances calculated between each atom of the adsorbate and its respective local surface atoms, giving MN distances; for a total of N(N − 1)/2 + NM(M − 1)/2 + MN features. These features are calculated easily because only the distance matrix between all molecule and surface atoms is required. Figure 3 gives a minimum-complexity example of the calculation of the features.
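The extraction steps above can be sketched directly; this is a minimal, hedged implementation in which the ordering of features, tie-breaking among equidistant surface atoms and any species encoding are deliberately left out:

```python
import math
from itertools import combinations

def interfacial_features(mol_xyz, surf_xyz, M):
    """Distance-based interfacial descriptor, as outlined above:
    (a) unique intra-molecule distances,
    (b) unique distances within each atom's M nearest surface atoms,
    (c) distances from each molecule atom to its M nearest surface atoms."""
    feats = []
    # (a) N(N-1)/2 intra-molecular distances
    for a, b in combinations(mol_xyz, 2):
        feats.append(math.dist(a, b))
    for x in mol_xyz:
        # The M closest surface atoms to this molecule atom (the set Y_i).
        local = sorted(surf_xyz, key=lambda s: math.dist(x, s))[:M]
        # (b) M(M-1)/2 local surface distances per molecule atom
        for a, b in combinations(local, 2):
            feats.append(math.dist(a, b))
        # (c) M adsorbate-surface distances per molecule atom
        feats.extend(math.dist(x, s) for s in local)
    return feats

# Minimal example: a diatomic (N = 2) above a small slab, with M = 3.
mol = [(0.0, 0.0, 2.0), (1.1, 0.0, 2.0)]
surf = [(x * 2.0, y * 2.0, 0.0) for x in range(3) for y in range(3)]
f = interfacial_features(mol, surf, M=3)
N, M = len(mol), 3
print(len(f) == N * (N - 1) // 2 + N * M * (M - 1) // 2 + N * M)  # True
```

Only a molecule-to-surface distance matrix and sorting are required, and the feature count is fixed for given N and M regardless of the surface size.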
The advantage is that the complete interaction matrix for the whole system is not required, and the number of features does not change during a dynamic process, though it may change when dealing with different concentrations or bigger molecules; this could be handled by the usual feature imputation. Smaller M indicates that shorter-ranged surface interactions are considered, though they are scaled to the relevant surface, and therefore coupled to the interface rather than a radial cutoff distance. Choices for M could be based on average bond lengths or potential cutoffs to capture more of the surface chemistry, but we would suggest erring towards larger M and using feature selection to optimise the feature space following extraction. This can be done with standard data science practices, or with ML methods such as recursive feature elimination or decision trees [55]. This not only avoids the drawbacks of the 'curse of dimensionality', but the features selected provide insight into the scale of interactions most impacting the interface, while reducing evaluation bias [2].
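As a hedged illustration of recursive elimination, the loop below ranks features by absolute Pearson correlation with the target and repeatedly drops the weakest; model-based implementations (e.g. scikit-learn's RFE) instead refit an estimator at each step, but the recursive structure is the same. The toy data are invented for the example:

```python
def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def eliminate_features(X, y, keep):
    """Recursively drop the feature least correlated with the target
    until `keep` feature indices remain."""
    cols = list(range(len(X[0])))
    while len(cols) > keep:
        scores = {c: abs(pearson([row[c] for row in X], y)) for c in cols}
        cols.remove(min(scores, key=scores.get))  # drop the weakest feature
    return cols

# Toy data: feature 0 tracks the target closely, feature 1 does not.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = [1.1, 2.0, 2.9, 4.2]
print(eliminate_features(X, y, keep=1))  # [0]
```

The surviving indices map straight back to specific interfacial distances, which is what makes the selected feature set physically informative.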
Although this approach is adsorbate and adsorbent agnostic, we can encode the species of each atom as needed. Ordering the feature distances will take care of this without the need for additional features if a consistent molecule or monotonic surface is studied. We assumed a small molecule when defining these features, but larger molecules are possible, if rather cumbersome. In such cases, it is simple to update X to consider only adsorbate atoms close to the adsorbent, if we select N the same way M is chosen.
When measuring distances it is natural to think of bonds, but rather than characterising bonds, this approach describes the topography of the surface in relative coordinates; the interface becomes its own unique frame of reference. It is not intuitive to take, for example, the ten distances between five points and visualise the shape. Although we tend towards features that echo human interpretation, these features are more effective at describing the interface and will lead to more insightful understanding in ML studies. The defined coordinate system is still interpretable, which makes it more desirable than other transformation methods used for dimension reduction, such as principal components analysis (PCA), where the interpretability of separate features is lost.
When implementing this approach, we highlight that care should be taken with normalisation (changing the numerical scale while keeping the relative differences between values) of the topographical features in this interfacial descriptor. While it is appropriate to scale all features individually for optimised ML, information is lost; this is a transform that is often discounted. If features maintain consistent and meaningful units (such as the distances here), more information can be obtained from unsupervised learning techniques. An aim of this perspective is to share this approach with the community so that its utility can be assessed by a wide audience. The approach is not restricted in terms of chemical accuracy, as it is not molecule or material specific; the accuracy will depend on the quality of the data used for feature extraction.
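The information loss from per-feature scaling can be seen in a toy example with invented values: after independent standardisation, a short bond-length column and a much longer surface-separation column become numerically indistinguishable in scale:

```python
from statistics import mean, pstdev

def zscore_per_feature(X):
    """Standardise each feature (column) independently, as is common
    practice before fitting an ML model."""
    cols = list(zip(*X))
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]

# Two distance features in angstroms (hypothetical values): a short
# bond length and a much longer surface-surface separation.
X = [[1.0, 8.0], [1.1, 9.0], [1.2, 10.0]]
Z = zscore_per_feature(X)
# After independent scaling both columns occupy the same numeric range,
# so the fact that column 1 described far longer distances is no longer
# recoverable from the features themselves.
print(abs(Z[0][0] - Z[0][1]) < 1e-9)  # True
```

Dividing every feature by a single common scale factor instead would preserve these relative magnitudes, which is the point being made above.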

Describing multiplicity and dynamics
Once the interactions or interfaces have been described effectively, subject to the aims of the study and the required interpretability of the features, this yields valuable information about the potential energy surface and provides the basis for interventions to increase the probability of desirable interactions, such as catalytic reactions, or decrease the probability of undesirable interactions, such as corrosion. Being able to identify which interfacial features are correlated with different outcomes points to changes that can be made to the material or chemical environment in the lab.
However, much of the work reported to date omits the fact that events occurring at the interface are rarely singular, do not happen in isolation, and always perturb the system. For example, multiple molecules adsorbing on a surface often compete for the same (low energy) active sites, both with each other and with other molecules that may be present. Co-adsorption of different species is complicated, as the potential energy surface for each will differ; sites with a high affinity for one species may have a low affinity for another, or their preferred sites may be comparable, leading to competition. If the non-functional species has the higher binding energy and wins, this leads to poisoning of the surface.
When all of the most active sites are saturated, the desirability of alternative sites (previously deemed inactive) must be considered, as well as other impacts that saturating the active sites has upon the remainder, such as initiating reconstructions. Adsorption on one surface site perturbs the physical structure of the adjacent sites, and this alters the feature space as well. The conditional probability of adsorption could, for example, manifest as a cascade of inactive sites becoming activated, or it could result in steric hindrances and exclusion zones obstructing subsequent adsorption.
Finally, a lot of past work is premised on an understanding of the potential energy surface: knowing the adsorption energies of each species at each site. This casts the problem as a supervised learning task, which is relatively easy. In reality, the potential energy surface is often poorly understood, and labels (such as specific energies) may be difficult or expensive to obtain. Labeling requires human interpretation, whether through a simulation or an experiment, and is therefore time consuming and vulnerable to researcher bias [2]. Interfacial informatics is particularly challenging when the training set is unlabelled, which will likely be the case following any perturbation. Once an event has occurred, the interface (and therefore the feature space) has changed, and everything we thought we knew about it (the labels) is lost.

Multi-agent reinforcement learning (MARL)
The co-adsorption of multiple molecules of similar or diverse species that compete for active sites may be modelled with MARL.
Reinforcement learning (RL) is an area of ML that involves sequential decision making under uncertainty [56][57][58]. A distinctive difference between RL and the other two ML paradigms, unsupervised and supervised ML, is that the algorithm has information about the environment in which the decision is taking place. This is a very important method in the field of robotics and automation [59], and has previously been used in materials science [60,61]. In RL, a software agent decides what action to take in the environment to maximise a cumulative reward; it operates on a 'trial and error' procedure (see figure 5(a)). The agent is controlled by a 'policy', which is a mapping from perceived states of the environment to actions to be taken in those states. The policy defines the learning agent's behaviour at a given time; it is the agent's strategy. An example in interfacial informatics might be selecting the location for an adsorption event (in an environment of many different active surface sites), with a reward dependent on lowering the free energy of the system. In RL the algorithm operates with only limited knowledge of the environment and with limited feedback on the quality of its decisions, so the quality of the decisions increases as subsequent events take place and the agent learns from its past experience.
For example, the action at an interface could be either to stay or to move to a different adsorption site (kink, top, bridge or hollow). The policy gives each of these sites (states) a certain probability, prescribing the actions to be taken. After homogeneous initiation, adsorbates could randomly move on the surface, and the policies can be updated over time, making some adsorption sites more attractive as the rewards (stronger binding) are higher. This example might manifest as a rapid convergence on the most active sites with the highest binding energies, as the agent learns the structure of the surface. Policy strengths could be used to discriminate between chemisorption and physisorption with regard to the available activation energy. This is a very powerful method, and when combined with deep learning it enables RL to scale to previously intractable decision-making problems, such as settings with high-dimensional states (e.g. molecules with many degrees of freedom) and possible reaction spaces where the number of calculations increases rapidly with the number of inputs [62][63][64]. Such problems exceed the computational constraints of conventional computers, but this can be partially overcome by deep RL [65].
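A toy, single-agent version of this example can be written as tabular Q-learning; the sites, rewards and hyperparameters below are illustrative assumptions, with reward standing in for binding strength:

```python
import random

random.seed(0)

# Hypothetical rewards: stronger binding (a more negative adsorption
# energy) gives a larger reward. Values are illustrative only.
SITES = ["top", "bridge", "hollow", "kink"]
REWARD = {"top": 0.2, "bridge": 0.5, "hollow": 0.9, "kink": 1.5}

def q_learning(steps=20000, alpha=0.1, gamma=0.9, eps=0.3):
    """Tabular Q-learning: from any site the agent may stay or hop to
    another site; the policy converges on the strongest-binding site."""
    q = {s: {a: 0.0 for a in SITES} for s in SITES}
    state = random.choice(SITES)
    for _ in range(steps):
        if random.random() < eps:               # explore a random site
            action = random.choice(SITES)
        else:                                    # exploit current policy
            action = max(q[state], key=q[state].get)
        reward = REWARD[action]
        best_next = max(q[action].values())      # bootstrap from next state
        q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
        state = action                           # the agent now sits at `action`
    return q

q = q_learning()
print(max(q["kink"], key=q["kink"].get))  # kink: the strongest-binding site
```

A MARL version would run many such agents with shared or competing rewards, with each agent's environment including the positions of the others.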
MARL expands on this concept and involves many of these artificially intelligent agents sharing the environment and collaborating (or competing) towards a common goal [66,67]. Social cooperation and hierarchies are extremely challenging, so MARL draws heavily on game theory, allowing for the reinforcement of individually sub-optimal events that contribute to the collective goal of the entire system [68]. Agents in MARL learn not only the environment, but also the actions of other agents as collaborators (see figure 5(b)), and are capable of learning complicated dynamics [69]. MARL has become an important research method that is widely used to study complex problems such as distributed sensing [70], energy distribution [71], multi-robot coordination [72] and urban and air traffic control [73]. Immediately from these applications we see parallels with physicochemical interfaces, where sensing, energy surfaces and the 'air traffic' of adsorbates are critical. In the present context, the multiple agents could be used to model co-adsorption on a surface, where the collaboration could involve the common goal of minimising the free energy of the interface, but it could also be to maximise the entropy or optimise the spatial packing (the rewards). When multiple agents are present (two or more different adsorbates), the behaviour could, for example, be unique for each agent and also give priority to one co-adsorbate over another (the policy). The outcome is a model capable of predicting the interfacial configuration based entirely on the input features at a given point in time. The policies in MARL are generally adaptive [74], which is ideal for cases where preferential interfacial sites have been saturated, or one type of adsorbate has been depleted, and the policy needs to be revised to remain relevant.
Recent developments in MARL include a range of environmental influences that parallel issues at interfaces, since RL still requires a description of the feature space. These include environmental obstacles that agents have to avoid [75,76], ideal for representing dopants or adventitious non-functional molecules that poison the surfaces of catalysts. There are also new advances enabling conditions to be set on the policies that drive collaboration, making collective behaviour more or less of an imperative. This could be used to drive the system more aggressively to the ground state.

Hawkes point process
Conditional processes, where events at one site change the interface and promote or suppress events at adjacent sites, may be modelled as a Hawkes point process.
A point field is defined mathematically in probability theory as a collection of points randomly positioned on an underlying mathematical space. This space can be a Cartesian plane, making it highly relevant to an interface. A point process is a stochastic model of objects or events that can be represented as points on the plane, and provides a powerful way of analyzing spatial data [77,78]. A point process considers the sequence of times {t_1, t_2, ... t_n} in the interval [0, T] at which an event is recorded. Point processes are characterised by an intensity function, λ(t, X_t), which predicts the infinitesimal probability P that an event will occur at t given the historical data, X_t, such that:

λ(t, X_t) = lim_{Δt→0} P{event in [t, t + Δt) | X_t}/Δt.

This is distinct from a counting process, which models the cumulative count of the number of events up to the current time. Given a distribution function, a particular point process can predict the timing of future events conditional on the pattern of past events. A simple schematic representation of a point process is shown in figure 6(a), where we can see the accumulation of counts, n(t), with respect to time step, t. Point processes have been used in epidemiology [79], computational neuroscience [80], astronomy [81], seismology [82], and materials science [83].
A multivariate Hawkes point process is a special case that has a self-exciting property, a clustering effect, and a long memory; the intensity depends on the history of the process and what is causing it [84, 85]. It is a non-Markovian extension of the Poisson process, and contains a conditional intensity, λ*(t, X_t).
The conditional intensity is defined with respect to its natural filtration {X_t, t ≥ 0}, which is easier to define than a conditional distribution function:

λ*(t, X_t) = λ + ∫_0^t µ(t − s) dN(s),

where λ > 0 is the background intensity and the response function µ(t) is a positive function that satisfies ∫_0^∞ µ(s) ds < 1. If the conditional intensity function exists it uniquely characterises the distribution of the point process, whereas distribution functions may not be unique. If an event causes λ*(t) to increase the process is 'self-exciting', which in turn causes the clustering of event times. If an event causes λ*(t) to decrease the process is called 'self-regulating', and this causes temporal interludes between clusters. More details and derivations can be obtained from [85, 86].
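The conditional intensity can be evaluated in a few lines for the common choice of an exponential response, µ(s) = αβ e^(−βs), for which ∫_0^∞ µ(s) ds = α, so the condition on the response function becomes α < 1. The parameter values below are illustrative, not fitted to any data.

```python
import numpy as np

def hawkes_intensity(t, history, lam=0.5, alpha=0.8, beta=1.2):
    """Conditional intensity lam + sum over past events t_i < t of mu(t - t_i),
    with an exponential response mu(s) = alpha * beta * exp(-beta * s).

    The branching ratio integral of mu equals alpha, which must be < 1
    for a stationary (non-explosive) process.
    """
    history = np.asarray(history)
    past = history[history < t]
    return lam + np.sum(alpha * beta * np.exp(-beta * (t - past)))

# Each event raises the intensity, which then decays back toward the
# background rate lam: the self-exciting property.
events = [1.0, 1.1, 1.3]               # a small cluster of event times
print(hawkes_intensity(1.31, events))  # just after the cluster: elevated
print(hawkes_intensity(10.0, events))  # long after: close to lam
```

Fitting λ, α and β to observed event times (e.g. by maximum likelihood) is then a standard supervised task.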
A simple schematic representation of a conditional intensity function in a Hawkes process is shown in figure 6(b), where we can see one self-exciting cluster (as a series of Poisson processes) and a self-regulating interlude (with a background intensity of λ). A Hawkes point process can take many functional forms, depending on the system under consideration, with coefficients obtained by fitting to a data set (X) using either supervised or reinforcement ML.
Hawkes processes have proven particularly useful in financial markets [87], but all of the properties mentioned above are relevant to interfaces. The self-exciting property means that each event increases the rate of future events for some period of time; a cascade effect of the type observed when adsorbates initiate perturbations that activate more surface sites. The self-regulating property can be used to describe an equilibration. The clustering effect captures the temporal heterogeneity observed in these events, akin to an earthquake being 'clustered' with a series of aftershocks; a property that is useful in situations where certain classes of surface active sites (such as atoms on kinks or steps) become saturated before secondary sites (such as terraces) become attractive. The long memory enables the modeling of interfacial evolution, as opposed to the state of the interface at a single point in time. This approach can handle the complexity of nanomaterial and molecular systems where the number density is very high, as it can be extended to infinite components [88].
In the case of interfaces, events could be adsorption, desorption, movement on the surface, or reactions. Examples of these processes include multi-component reactions, where one or more molecules must be adsorbed before the reaction occurs; multi-step reactions with intermediate species; and the dynamic modeling of material growth on an interface. A relevant example of a multi-step reaction on a catalyst is the binding of an adsorbate to the surface (intermediate), with a second adsorbate, or a molecule from the gas or liquid phase, reacting with it to form a product (second intermediate). This process can repeat until the last step of desorption of the final product, but each step will be affected by its predecessor.
To illustrate this we can consider a three-step catalytic reaction where a species A adsorbs on the surface (SF) before it reacts with species B to produce C: A + SF → A_ad, A_ad + B → C_ad, C_ad → C, where the subscript ad denotes the adsorbed state on the surface. The filtration models the information available at different points in time during a random process; here, the three-step catalytic surface reaction. If only the adsorbate A is adsorbed to the surface (A_ad) there is a conditional intensity for the subsequent reaction to the product C. If, in the last step, the desorption of the product C has a very small activation barrier, the product consequently desorbs quickly and freely. This would mean that the second reaction, forming product C, is the rate-determining step and the intermediate A_ad accumulates. In this catalytic example some reaction steps could have different kinetics, some intermediates could become trapped in a local minimum, or unfavourable side products could occur. In a simulation this can be dealt with by running the calculation long enough to sample over all possible states, or by driving the simulation to specific configurations via acceleration approaches. However, with a Hawkes process each reaction step (event) has a conditional intensity, and different probabilities can be used to model the evolution of the system. The need for extremely long simulations, or for artificially driving the system toward a preferred state, is alleviated.
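The three-step scheme can be sketched with a Gillespie-style stochastic simulation, in which each step's propensity plays the role of a conditional intensity that depends on the accumulated history (here, through the species counts). The rate constants are illustrative assumptions, and B is treated as present in excess so its concentration is folded into the rate constant.

```python
import numpy as np

def simulate_reaction(k_ads=1.0, k_rxn=0.2, k_des=5.0, n_A=50, t_max=200.0, seed=1):
    """Stochastic simulation of A + SF -> A_ad, A_ad + B -> C_ad, C_ad -> C.

    Rate constants are illustrative. The propensity of the surface reaction
    depends on how much A_ad has accumulated, i.e. on the history of the
    adsorption step, mirroring the conditional-intensity picture.
    """
    rng = np.random.default_rng(seed)
    A, A_ad, C_ad, C = n_A, 0, 0, 0
    t, events = 0.0, []
    while True:
        props = np.array([k_ads * A, k_rxn * A_ad, k_des * C_ad])
        total = props.sum()
        if total == 0:            # all A has been converted to product C
            break
        t += rng.exponential(1.0 / total)
        if t > t_max:
            break
        step = rng.choice(3, p=props / total)
        if step == 0:             # adsorption: A + SF -> A_ad
            A, A_ad = A - 1, A_ad + 1
        elif step == 1:           # surface reaction: A_ad + B -> C_ad
            A_ad, C_ad = A_ad - 1, C_ad + 1
        else:                     # desorption: C_ad -> C
            C_ad, C = C_ad - 1, C + 1
        events.append((t, step))
    return events, C

events, n_C = simulate_reaction()
# With k_rxn << k_ads and k_des >> k_rxn, A_ad accumulates and the surface
# reaction is rate-determining, as described in the text.
```

A fitted Hawkes process would replace these fixed rate constants with history-dependent intensities learned from the event sequence itself.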

Self-supervised learning
Interfaces where the properties are unknown, or evolve over time, may be modelled with self-supervised learning. For example, properties could include wettability, pH of the surface, strain resistance, or a chemical affinity for sensing or catalysis.
The vast majority of data available to ML is unsupervised (unlabelled). In molecular or materials informatics this includes systems where the structures are known, or can be easily characterised, but the properties are difficult to measure or remain elusive. This can be understood when we think about the complexity of an interface and how challenging it is to measure one particular molecular affinity in a sea of adsorbates or disordered lattices. The claim that most data is unsupervised may seem counterintuitive when we consider that the majority of published literature using ML in materials or chemical science uses supervised learning techniques. Supervised learning is a powerful way of inferring insightful structure/property relationships, and the ability to rapidly predict something like a binding energy from prior knowledge of an adsorbate and an adsorbent is very compelling. Self-supervised learning (SSL) can be considered a link between the two: it effectively allows researchers to use powerful supervised methods on unsupervised data, generating the labels (such as physical and chemical properties) automatically [89, 90].
Developed with the intention of reducing the dependence of deep learning on massive (and expensive) data sets [91], SSL (also known as pretext learning) is a type of representational learning that does not require human-annotated data sets [92]. SSL first learns the features (the pretext task), and then learns the modeling task of interest; it is largely responsible for the success of natural language processing (NLP) [93]. The concept is simple and involves withholding part of the data and training the SSL model to 'fill in the blanks' by creating surrogate labels from the unlabeled data set, effectively learning the new state or true state from past or incomplete information [94]. The pretext task defines a proxy loss, and the model learns the semantic representation [95]. The labels can be part of the 'missing information' and can be generated along with the true unsupervised patterns in the data. SSL is particularly useful for feature extraction, feature imputation, anomaly detection and exploratory analysis such as category embeddings, and potentially the repair of corrupted data sets [96].
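A minimal sketch of the 'fill in the blanks' idea on tabular data: withhold one column, treat it as a surrogate label, and fit a supervised model to predict it from the remaining columns. No human annotation is involved; the data, column choice and linear model below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unlabeled tabular data: rows are configurations, columns are descriptors.
# Column 4 is constructed to depend on columns 0 and 2 (synthetic example).
X = rng.normal(size=(500, 5))
X[:, 4] = 0.7 * X[:, 0] - 0.3 * X[:, 2] + 0.05 * rng.normal(size=500)

# Pretext task: withhold column 4 and learn to fill in the blank.
# The surrogate label comes from the data itself, not from a human.
target_col = 4
inputs = np.delete(X, target_col, axis=1)
surrogate_labels = X[:, target_col]

# A linear model fitted by least squares stands in for any supervised learner.
A = np.hstack([inputs, np.ones((len(inputs), 1))])   # add a bias column
coef, *_ = np.linalg.lstsq(A, surrogate_labels, rcond=None)

predictions = A @ coef
r2 = 1 - np.sum((surrogate_labels - predictions) ** 2) / np.sum(
    (surrogate_labels - surrogate_labels.mean()) ** 2)
print(f"pretext-task R^2: {r2:.3f}")  # high R^2: the masked column is recoverable
```

The same masking scheme supports feature imputation and the repair of corrupted data sets mentioned above: any cell the model can reconstruct well is a candidate for automatic in-filling.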
This technique is used in a variety of research contexts [97], but may be less well known in areas that use a lot of tabular data (such as the physical sciences), because each column represents its own unique distribution. In the case of tabular data, as opposed to the text or image data that are the usual candidates for SSL, a practical example is next-step prediction [98]. Given a sequence, such as states in a molecular dynamics or kinetic Monte Carlo simulation, or a time-sequence of measurements as an interface equilibrates or degrades, an SSL model can predict the next state. This can be achieved by truncating the sequence at step t − 1 and using step t as a target label to optimize a given supervised learning algorithm. The utility of this approach in predicting the evolution of an interface, which is equivalent to sampling unknown points on the potential energy surface, is readily apparent.
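The next-step prediction described above can be sketched as follows, using a synthetic relaxing trajectory in place of real MD or kMC states, and a linear least-squares model as the supervised learner; the trajectory, window length and model are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# A synthetic "trajectory": a slowly relaxing observable with noise,
# standing in for a sequence of simulation states or measurements.
n_steps = 400
traj = np.empty(n_steps)
traj[0] = 1.0
for t in range(1, n_steps):
    traj[t] = 0.95 * traj[t - 1] + 0.01 * rng.normal()

# Self-supervised labels: each window of the past w states is an input,
# and the state at step t is its target; no human annotation is needed.
w = 3
X = np.stack([traj[t - w:t] for t in range(w, n_steps)])
y = traj[w:n_steps]

# Fit a linear next-step predictor and forecast one step past the data.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
forecast = traj[-w:] @ coef
print(f"forecast for unobserved step {n_steps}: {forecast:.4f}")
```

Any supervised learner (kernel regression, a neural network) can be slotted in for the least-squares fit without changing the self-supervised labelling scheme.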
There are still challenges to be overcome in SSL, which is an active area of research in the machine learning community, such as how to make reasoning compatible with basic gradient-based learning, or how to learn to plan complex event sequences (decomposing a complex task into sub-tasks) [91]. Broadly speaking, all generative models can be considered self-supervised, but SSL focusses on producing good quality features, which are a necessity in the data-driven design of physical systems. In the development of SSL for tabular data the chemical sciences community has a lot to contribute, and provides an ideal test-bed for the development of this method in the field of computer science. Opportunities for collaboration in this space are plentiful.

Outlook
Interfaces may be the biggest challenge for data-driven discovery and design, and for the use of statistical and ML methods in chemistry and materials science. Currently, interfacial data sets, and methods developed explicitly for studying them, are underrepresented in repositories and materials informatics platforms [99, 100]; this represents a big opportunity.
Although we (collectively) have gathered a lot of information about interfaces over decades of research, casting that knowledge into a form suitable for ML is not straightforward. We need to think differently about how we define interfaces, how we describe them, and how we interpret new types of (data-driven) insights. There are a variety of methods available to this domain that go beyond supervised regression matching energies with configurations (many of which are short-lived or difficult to prepare in the lab). We must remember that, in reality, the lowest-energy ground-state configuration is an outlier, no matter how desirable it may be.
While not discussed explicitly, the methods described here are equally applicable to desorption; the separation of an adsorbate from the surface. All adsorption is in equilibrium with desorption, and the specific surface area that participates in interfacial activity is mostly determined by the Brunauer, Emmett and Teller (BET) method, which uses physisorption and desorption measurements at different pressures to obtain adsorption-desorption isotherms.
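The BET analysis itself reduces to a linear fit. A minimal sketch, using a synthetic isotherm with an assumed (illustrative) monolayer capacity v_m and BET constant c, recovers both from the slope and intercept of the standard linearised form.

```python
import numpy as np

# Synthetic BET isotherm with assumed monolayer capacity v_m and constant c
# (values are illustrative, not from any measured material).
v_m_true, c_true = 10.0, 100.0
x = np.linspace(0.05, 0.30, 10)          # relative pressures P/P0 in the BET range
v = v_m_true * c_true * x / ((1 - x) * (1 + (c_true - 1) * x))

# Linearised BET equation:
#   1 / (v * (P0/P - 1)) = (c - 1)/(v_m * c) * (P/P0) + 1/(v_m * c)
bet_y = 1.0 / (v * (1.0 / x - 1.0))
slope, intercept = np.polyfit(x, bet_y, 1)

v_m = 1.0 / (slope + intercept)          # monolayer capacity
c = 1.0 + slope / intercept              # BET constant
print(f"recovered v_m = {v_m:.2f}, c = {c:.1f}")
```

The specific surface area then follows from v_m, the adsorbate cross-sectional area and the molar volume, as in the standard BET workflow.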
In materials chemistry, the number of atoms in molecules or systems can vary enormously, and can change over time, particularly during reactions. In most commonly used ML methods the model is trained with a fixed number of features, which means the maximum number of atoms must be represented in the training set. Recent work in the field of ML has explored situations where features are added or change interpretation over time. Dhurandhar et al propose a high-performing approach to provably determine the point at which new or changed features become relevant with respect to a target label in an agnostic (supervised) learning setting [101]. Closely related is change point detection, which has been heavily studied in statistics [102].
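Change point detection can be illustrated with a deliberately minimal estimator that scans a series for the largest standardised mean shift; the synthetic data and the single-change assumption below are illustrative only, and practical work would use the established methods cited above.

```python
import numpy as np

def mean_shift_changepoint(x):
    """Locate a single mean shift by scanning all candidate split points.

    For each split k, compare the means of x[:k] and x[k:]; the split
    with the largest standardised difference is the estimated change point.
    A deliberately minimal sketch of the change point idea.
    """
    n = len(x)
    best_k, best_stat = None, -np.inf
    for k in range(2, n - 1):
        a, b = x[:k], x[k:]
        pooled = np.sqrt(x.var() * (1.0 / k + 1.0 / (n - k)))
        stat = abs(a.mean() - b.mean()) / pooled
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k

rng = np.random.default_rng(2)
# A feature whose distribution shifts at step 120, e.g. when a new
# descriptor becomes relevant to the target label.
signal = np.concatenate([rng.normal(0.0, 1.0, 120), rng.normal(2.0, 1.0, 80)])
print(mean_shift_changepoint(signal))  # close to 120
```

The same scan applied to a feature column flags when that feature's relationship to the data changed, which is the moment of interest in the feature-relevance setting above.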
Although not discussed here, the methods included are also applicable to other types of interfaces, such as grain boundaries and membranes. A grain boundary is a challenging solid/solid interfacial system, where the interfacial energy is strongly related to the tilt angle, twist angle, lattice mismatch, density of defects, impurities, dopants and more. Defects at grain boundaries can be highly mobile, changing their location, configuration and number.
Interfacial informatics is a unique combination of cheminformatics and materials informatics, and has the potential to emerge as an exciting field of research in its own right.

Figure 2 .
Figure 2. Interaction matrices of formic acid (a) and formaldehyde (b), (c), encoded into a feature vector with a length of 15 features. Atom interactions are in light blue. The diagonal elements in blue are not interactions, but can be either dropped or replaced by another property of that atom. In dark blue, the matrix extensions with zeros are shown to fit the vector size of (a). Depending on the ordering of the interactions, different features can be set to zero, as shown in (b) and (c).

Figure 4 .
Figure 4. Simple feature variation for a single atom: (a) horizontal translation, (b) vertical translation, (c) rotation around a point. The blue arrow showing the movement of the atom in the left-hand insets becomes the x-axis in the right-hand plots. The six features plotted are: S1 = {} for the single atom, S2 top row, S3 bottom row.

Figure 5 .
Figure 5. (a) Simplified schematic for reinforcement learning, and (b) a simplified schematic for MARL.

Figure 6 .
Figure 6. (a) Simplified schematic for a point process for counting events, N, and (b) a simplified schematic for the conditional intensity function in a Hawkes point process.