Graph representation of multi-dimensional materials

The integration of graph-based representations with machine learning methodologies is transforming the landscape of material discovery, offering a flexible approach for modelling a variety of materials, from molecules and nanomaterials to expansive three-dimensional bulk materials. Nonetheless, the literature often lacks a systematic exploration from the perspective of material dimensionality. While it is important to design representations and algorithms that are universally applicable across species, it is intuitive for material scientists to align the underlying patterns between dimensionality and the characteristics of the employed graph descriptors. In this review, we provide an overview of the graph representations as inputs to machine learning models and navigate the recent applications, spanning the diverse range of material dimensions. This review highlights both persistent gaps and innovative solutions to these challenges, emphasising the pressing need for larger benchmark datasets and leveraging graphical patterns. As graph-based machine learning techniques evolve, they present a promising frontier for accurate, scalable, and interpretable material applications.


Introduction
In recent decades, the application of data science and machine learning (ML) to material science has evolved from a relatively niche field to an integral part of the natural sciences [1]. The growing body of research in this domain covers all stages of the materials development cycle, including discovery, characterisation, property prediction [2][3][4], screening [5,6], retrosynthesis [7,8], analysis of simulation trajectories [9], and optimisation of synthesis conditions [10,11]. In all cases, appropriately representing the relevant material (whether in terms of structure, chemistry, properties, or functionalities) is a crucial consideration when choosing or designing ML methods to assist materials development. The information gained through theory, experiments, and simulations needs to be translated into appropriate mathematical representations so that it is accessible to ML algorithms. A more recent addition to the toolbox of ML models for chemistry and material science is graph machine learning (GML) [12][13][14], which is specifically designed to process a particularly expressive input format, namely graph-based representations.
GML, especially its deep learning (DL) [15,16] branch of graph neural networks (GNNs), is being increasingly applied in material informatics and computational chemistry [17][18][19]. The architecture of GNNs allows them to work directly on natural input representations of molecules and materials and to perform downstream predictions in an end-to-end manner. It also offers a high level of flexibility to incorporate physical laws and arbitrary atomic features throughout the learning process, and GNNs are typically superior or comparable to descriptor-based or conventional ML models in many applications [20][21][22][23][24][25][26].
Despite promising GML developments toward greater material dataset coverage, higher sample efficiency, better model performance, and a greater variety of downstream tasks, little work has addressed or summarised a vertical comparison between materials of multiple dimensions. Because the definitions of the representations are not rigorously standardised, different representations can be conceived separately for the same system, leading to possible ambiguities and issues with reproducibility, as well as with the general soundness of the model. Importantly, the choice of representation can be critical to the transferability and predictive capabilities of a model. The main objective of this review article is to bridge this gap by providing an overview of graph-based representations for different dimensions of materials and, briefly, their model design, offering evidence-based guidance for material scientists to choose appropriate graph representations based on material dimensions or characteristics. The following sections cover the background in ML and inputs to ML models, featuring GML algorithms, and existing applications. In addition to providing a snapshot of the current progress of graph-based representations across dimensions, we also discuss some open challenges and promising directions.

Machine learning in material science: principles and descriptors
Machine learning is a computational technique that aims to identify complex patterns within data to make informed decisions or predictions. Its algorithms thrive on large amounts of data, deriving insights by training on examples and experience [15]. Within the realm of material science, ML techniques have frequently been used for structure-property predictions, data-driven material design, and optimisation.
A critical step in applying ML to material science is the transformation of raw, often unstructured material data (such as atomistic images, coordinates, or chemical formulas) into structured, machine-readable formats, termed 'descriptors'. These descriptors essentially serve as the language that ML algorithms understand, bridging the gap between abstract material properties and quantifiable data.
Chemoinformatics descriptors play a critical role in both qualitatively and quantitatively characterising a molecule. Traditionally, they have been derived from data as basic as the Cartesian coordinates of atoms or their elemental types [27]. The goal of descriptor design is to encapsulate as much structural information as possible while ensuring a balance between computational complexity and expressiveness.
In the domain of theoretical molecular descriptors, the descriptors can capture varying degrees of structural information about molecules, and the time needed to generate them grows with the richness of the encoded information [28,29]. Descriptors that encode connectivity information include Wiswesser line notation (WLN) [30], the International Chemical Identifier (InChI) [31], the simplified molecular input line entry system (SMILES) [32], and BigSMILES for macromolecules [33]. Descriptors that encode structural specificity, such as the Coulomb matrix (CM) [34], smooth overlap of atomic positions (SOAP) [35], and atom-centred symmetry functions (ACSFs) [36], rely heavily on the precise spatial distribution of atoms.

Graphs, graph embedding, and GML
Graphs act as a versatile representation for ML when dealing with non-Euclidean data, such as geometric shapes or sounds. They are also often applied to material problems due to their natural alignment with the structure of the molecule, which is defined by atoms and their interactions [37]. Unlike other encodings, such as SMILES, graphs show resilience against perturbations [38] and maintain interpretability even in partial states [39]. Inherent to their design, graphs are invariant to permutations and align well with fundamental physics symmetries [12]. They can integrate additional physical or chemical knowledge through node or edge attributes, which are often crucial to determining material properties. Moreover, many materials exhibit hierarchical structures and can be represented by graphs at various levels of granularity, from atomic to macroscopic.
With ML, graphs can handle classification or regression at the node, edge, or graph level. They have found applications in areas such as social networks, where link prediction can be used for friend recommendation [40,41], or e-commerce systems, where node classification is used for product categorisation or user segmentation [42,43]. In materials science and chemistry, the primary focus has been on graph-level predictions [18].
A graph is defined by its nodes (vertices) and edges (links), $G = \{V, E\}$. Connectivity is typically represented through a binary adjacency matrix $A$, whose $(i, j)$-th entry is 1 if $v_i$ and $v_j$ are connected and 0 otherwise. Nodes can carry attributes, yielding an attribute matrix $X \in \mathbb{R}^{|V| \times d}$, where $d$ is the dimension of the node attributes. Edges can be attributed similarly.
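The definitions above can be made concrete with a minimal sketch. The toy molecule (a hydrogen-depleted C-C-O chain), the element vocabulary, and the function names are assumptions chosen for illustration only; they are not taken from any of the cited works.

```python
# Toy example (hypothetical): hydrogen-depleted ethanol, C-C-O.
atoms = ["C", "C", "O"]        # node set V, indexed 0..2
bonds = [(0, 1), (1, 2)]       # edge set E, undirected
ELEMENTS = ["C", "N", "O"]     # assumed vocabulary for one-hot node attributes


def adjacency_matrix(num_nodes, edges):
    """Binary adjacency matrix A with A[i][j] = 1 if nodes i and j are connected."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # undirected graph, so A is symmetric
    return A


def attribute_matrix(symbols, vocabulary):
    """Node attribute matrix X of shape |V| x d (here d = len(vocabulary))."""
    return [[1 if s == v else 0 for v in vocabulary] for s in symbols]


A = adjacency_matrix(len(atoms), bonds)
X = attribute_matrix(atoms, ELEMENTS)
```

Here each row of `X` is a one-hot element encoding, so $d = 3$; richer attributes (formal charge, hybridisation, and so on) would simply add columns.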
Regardless of their exact architectures, many GNNs in the domain of chemistry and materials science can be categorised under the umbrella of message-passing neural networks (MPNN/MPN) [58]. As demonstrated in figure 1, MPNNs learn node representations by aggregating information from neighbouring nodes and the attributes of the edges that connect them. After iterative aggregation of the information flow, a complete graph representation is derived by a permutation-invariant pooling/readout operation on all node representations [25].
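The aggregation and readout steps can be illustrated with a deliberately simplified sketch. It assumes sum aggregation, no learnable weights, and no edge attributes, so it is not the MPNN of [58]; it only shows why a sum readout gives the same graph-level vector under any node ordering. All function names are hypothetical.

```python
def message_passing_step(A, H):
    """One round: every node adds the summed features of its neighbours to its own."""
    n, d = len(H), len(H[0])
    H_new = []
    for i in range(n):
        aggregated = [sum(A[i][j] * H[j][k] for j in range(n)) for k in range(d)]
        H_new.append([H[i][k] + aggregated[k] for k in range(d)])
    return H_new


def readout(H):
    """Permutation-invariant sum pooling over all node representations."""
    return [sum(row[k] for row in H) for k in range(len(H[0]))]


def graph_embedding(A, H, num_steps=2):
    """Run a fixed number of message-passing rounds, then pool to one vector."""
    for _ in range(num_steps):
        H = message_passing_step(A, H)
    return readout(H)


def permute_graph(A, H, perm):
    """Relabel nodes: new node i is old node perm[i]."""
    n = len(perm)
    A_p = [[A[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
    H_p = [H[perm[i]] for i in range(n)]
    return A_p, H_p


A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # path graph 0-1-2
H = [[1, 0], [1, 0], [0, 1]]            # initial node features
embedding = graph_embedding(A, H)
A_p, H_p = permute_graph(A, H, [2, 0, 1])
embedding_permuted = graph_embedding(A_p, H_p)
```

In a trained model the aggregation and update would be parameterised functions; the invariance argument is unchanged as long as the readout is a symmetric function such as a sum or mean.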

Graph representations for materials: past and present
Historically, molecular graphs or chemical graphs, first considered as early as 1874 [81], have represented compounds using undirected and labelled graphs to identify and arrange atoms based on their bonding [82]. These concepts were later expanded to solid-state materials, considering their three-dimensional (3D) atom arrangements [18]. Popular benchmark datasets for materials include the Materials Project [83] and the Open Quantum Materials Database [84]. Other datasets have applications that overlap with drug discovery, such as ZINC [85,86] and ChEMBL [87]. Organometallic compounds, however, defy this standard representation, given their complex bonding schemes, which cannot be explained by valence bond theory [88]. Contemporary approaches often convert string-based molecular representations, such as SMILES, into canonical ball-and-stick chemical graphs using software like RDKit [89]. Typical representations may omit hydrogen atoms (hydrogen-depleted graphs) [90]. Graph representations differ with respect to (i) the definition of basic components, specifically nodes and edges; (ii) the contextual information these components encapsulate; and (iii) the dynamics of interactions fostered by these components. Based on the above criteria, we categorise the graph-based descriptors into four primary variations: atom-centred, bond-centred, fragment-based, and higher-order. Figure 2 shows a rough distribution of the categories under each dimension.
• Atom-centred: Depict atoms as nodes and use varying heuristics to define connectivity, such as interatomic distances, categorical chemical bond types (e.g. single, double, triple, or aromatic), or proximity-based criteria such as K-nearest neighbours or a cutoff radius r.
• Bond-centred: Bonds serve as nodes, often embedding 3D geometric information in the edge definition.
• Fragment-based: Break down molecules into commonly observed substructures, such as functional groups, rings, residues, monomers, or other chemically significant moieties, employing heterogeneous graphs or hypergraphs for representation.
• Higher-order: Encompass multiple scales or visual perspectives, capturing angular data and intricate interactions.
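Two of the proximity-based heuristics listed for atom-centred graphs, the cutoff radius and K-nearest neighbours, can be sketched as follows. The coordinates and function names are illustrative assumptions, and periodic boundaries are ignored.

```python
import math


def radius_graph(coords, r):
    """Undirected edges (i, j), i < j, between atoms no further apart than r."""
    edges = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= r:
                edges.add((i, j))
    return edges


def knn_graph(coords, k):
    """Directed edges (i, j): atom j is among the k nearest neighbours of atom i."""
    edges = set()
    for i, ci in enumerate(coords):
        ranked = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i
        )
        for _, j in ranked[:k]:
            edges.add((i, j))
    return edges


# Three atoms on a line with unequal spacing (hypothetical coordinates).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
edges_radius = radius_graph(coords, r=1.2)
edges_knn = knn_graph(coords, k=1)
```

The two criteria behave differently: a cutoff radius always yields a symmetric relation but may leave atoms isolated, whereas K-nearest neighbours guarantees a fixed out-degree at the cost of asymmetry (atom 2 points to atom 1 above, but not the reverse).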
Additionally, we recognise the existence of reciprocal space representations, particularly relevant in crystalline material studies for capturing essential electronic and vibrational properties [91]. Phonon graphs, for instance, elucidate phonon dispersion relations [92] and diffraction patterns associated with lattice vectors [93]. Although these representations do not strictly adhere to conventional chemical graph norms, we note works that perform transformations to reveal patterns in such a space [94,95]. Furthermore, we explore advanced descriptors that transcend conventional chemical graph norms by expanding angles or distances into mathematical constructs such as Gaussian-like or spherical Fourier-Bessel functions to represent complex physical phenomena, especially in the context of wave functions, electron density distributions, and potential energy surfaces. These advanced modelling techniques require consideration of symmetry and invariance principles, bringing us into the broader domain of geometric DL, which we find to be not only relevant but essential to our discourse. This class of approaches allows traditional graph-based models to be extended to a wider spectrum of physical phenomena and material characteristics, thereby enriching our analysis and understanding of material science through the lens of machine learning [96].
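As a small illustration of expanding a distance into Gaussian-like functions, the sketch below maps a scalar interatomic distance onto a vector of Gaussian basis responses. The grid of centres, the width, and the function name are assumptions made for this example; individual models choose their own basis and parameters.

```python
import math


def gaussian_expansion(distance, centres, width):
    """Expand a scalar distance into a vector of Gaussian basis responses."""
    return [
        math.exp(-((distance - mu) ** 2) / (2.0 * width ** 2)) for mu in centres
    ]


# Assumed grid: 11 centres from 0.0 to 5.0 (in Å) with a width of 0.5 Å.
centres = [0.5 * k for k in range(11)]
features = gaussian_expansion(1.5, centres, width=0.5)
```

The resulting vector is a smooth, fixed-length edge feature: the entry whose centre coincides with the distance equals 1, and the response decays symmetrically on either side.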
While integrating ML with material science has led to diverse applications, these applications have rarely been organised by dimensionality. Material dimensionality inherently affects atom or molecule arrangements, influencing material properties. Recognising the unique features across dimensions helps to tailor representation techniques. This review classifies graph-based representations by dimensionality, highlighting prevalent methodologies and potential future directions. An overview of prominent research in this domain is presented in table 1. Note that some works utilise multiple categories of representations, and while the table selects influential works discussed in the text, it is not exhaustive.

Representing zero-dimensional (0D) materials
0D materials, commonly known as spherical nanomaterials, are nanoscale entities confined in all three spatial dimensions, with sizes typically no larger than 100 nm. Examples of 0D materials include fullerenes, organic molecules, quantum dots, and smaller nanoparticles [97]. Owing to their unique quantum confinement effects, these materials have attracted significant research attention. Given their limited size, these molecular systems often lack higher-order groups or complicated interaction dynamics. The conventional molecular graph has therefore become a dominant representation for these systems [39,98-103].
Molecular generation is a challenging topic for both material and drug discovery, aiming to produce chemically valid structures with optimised properties. Methods often exploit chemical graphs with defined bond categories, such as single, double, and triple bonds, to conveniently represent and examine the generated graphs under a set of chemical constraints. For instance, the graph convolutional policy network (GCPN) takes the intermediate graph and the collection of scaffold subgraphs as inputs and outputs an action that predicts a new link to be added iteratively, using a graph convolutional network (GCN). The reinforcement learning (RL) agent maximises a given property function [102]. In a closely related work called MolecularRNN (MRNN) [39], NodeRNN, supported by valency-based rejection sampling, decides on the atom type of the next node. EdgeRNN then links the newly generated node to the set of previously generated nodes. In contrast, Kwon et al represented heavy atoms as nodes with vectors indicating atom type, formal charge, and the number of explicit hydrogens as features [104]. This work advocates a non-autoregressive approach (graph variational autoencoder (VAE) [105]) for efficiency, as it avoids iterative procedures during generation.
Atom-centric representations with varying definitions of edge features are more commonly used in numerous tasks. The MPNN [58] framework, for instance, subsumes models that learn a message-passing algorithm and aggregation procedure. The reformulated generic framework is demonstrated on the prediction of 13 chemical properties of the QM9 dataset [107,108] with three edge representations depending on the variant of the model: discrete bond types, binned bond distances, and the concatenation of the Euclidean distance and the type of bond. The graph neural network force field model (GNNFF) [109] extracts features of the local atomic environment that are translationally invariant but rotationally covariant with respect to the coordinates of the atoms to predict the atomic forces. Directed edges are used to represent the influence of an atom on its K nearest neighbouring atoms.
In addition to atom-centred representations, bond-centred graphs also offer valuable insights, including aspects of molecular geometry such as bond angles [106,110,111]. Many important material properties, including electronic properties such as band gaps, are highly sensitive to bonding. The work by Fang et al [106] exemplifies this with a dual-graph model. Their atom-bond graph is paired with a bond-angle graph where bonds act as nodes and angles form edges (figure 3). Bond representations bridge these two graphs, first learning from neighbouring bonds and angles, and then updating atom vectors using neighbouring atoms and bonds. Gasteiger et al introduced a method that leverages empirical bounds to incorporate the conformational ambiguity of molecular configurations and employs a symmetric variant of personalized PageRank [112] for graph-based distance calculation, thereby circumventing dependency on exact atom positions for distance and angle measurements [111]. They further compute these distances and angles from synthesised coordinates, transforming the molecular graph into a directed line graph where nodes evolve into directed edges with embedded distances and angles as features, enhancing the graph's descriptive power without the need for precise atomistic details.
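A bond-angle graph of the kind described above, with bonds as nodes and angles as edges, can be sketched in a few lines. This is only an illustration of the data structure under assumed inputs (Cartesian coordinates and a bond list); it is not the implementation of [106] or [111], and the function name is hypothetical.

```python
import math
from itertools import combinations


def bond_angle_graph(coords, bonds):
    """Map each pair of bonds sharing one atom to the angle (degrees) between them.

    Keys are (bond_index_a, bond_index_b); bonds play the role of graph nodes.
    """
    angle_edges = {}
    for a, b in combinations(range(len(bonds)), 2):
        shared = set(bonds[a]) & set(bonds[b])
        if len(shared) != 1:
            continue  # the two bonds do not meet at a single atom
        centre = shared.pop()
        end_a = bonds[a][0] if bonds[a][1] == centre else bonds[a][1]
        end_b = bonds[b][0] if bonds[b][1] == centre else bonds[b][1]
        u = [p - q for p, q in zip(coords[end_a], coords[centre])]
        v = [p - q for p, q in zip(coords[end_b], coords[centre])]
        cos_angle = sum(x * y for x, y in zip(u, v)) / (
            math.hypot(*u) * math.hypot(*v)
        )
        cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
        angle_edges[(a, b)] = math.degrees(math.acos(cos_angle))
    return angle_edges


# Hypothetical four-atom chain with right-angle kinks.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
bonds = [(0, 1), (1, 2), (2, 3)]
angles = bond_angle_graph(coords, bonds)
```

Bonds 0 and 2 share no atom, so no angle edge connects them; this is the line-graph structure on which bond-level message passing operates.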
Fragment-based representations offer a distilled perspective by bundling several entities into unified interaction sites. Although this abstraction streamlines the representation for scalability and longer timescales, it may omit some details [113]. These representations are frequently derived from higher-resolution inputs and undergo an iterative process that gradually collapses components together, or they participate in a higher-order representation that spans multiple levels of granularity, which ensures that sufficient detail is learned.
In the literature, molecular dynamics and property prediction utilise methods that segment molecular graphs into smaller subgraphs [114][115][116]. For example, Armitage et al proposed a molecular fragment-based graphical autoencoder to produce efficient fingerprints in the small data regime [38]. The candidate molecules are divided into circular groups of a fixed radius centred on a bond or an atom, or non-circular groups that cannot be uniquely identified from any bond/atom within a predefined radius. Another notable technique is the N-gram graph model by Liu et al, which segments molecular graphs into walk sets of length n [117]. The embedding of each walk is the element-wise product of the embeddings of the vertices it passes through. Walks of different lengths are then concatenated to get a final graph embedding. This process resembles a simple GNN with no learnable parameters.
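The walk-based embedding just described can be sketched by explicit enumeration. The sketch assumes that the embeddings of all walks of a given length are summed before concatenation and that walks may revisit vertices; both are assumptions made here for illustration, and the vertex embeddings are arbitrary toy values rather than learned ones.

```python
def walks(adj, n):
    """All walks visiting n vertices, where consecutive vertices are adjacent."""
    paths = [[v] for v in range(len(adj))]
    for _ in range(n - 1):
        paths = [p + [w] for p in paths for w in adj[p[-1]]]
    return paths


def ngram_graph_embedding(adj, vertex_emb, max_n):
    """Concatenate, for n = 1..max_n, the summed embeddings of all length-n walks."""
    dim = len(vertex_emb[0])
    embedding = []
    for n in range(1, max_n + 1):
        total = [0.0] * dim
        for walk in walks(adj, n):
            product = [1.0] * dim
            for v in walk:
                # Element-wise product of the vertex embeddings along the walk.
                product = [p * e for p, e in zip(product, vertex_emb[v])]
            total = [t + p for t, p in zip(total, product)]
        embedding.extend(total)
    return embedding


adj = [[1], [0, 2], [1]]                              # path graph 0-1-2
vertex_emb = [[1.0, 2.0], [3.0, 1.0], [2.0, 2.0]]     # toy vertex embeddings
embedding = ngram_graph_embedding(adj, vertex_emb, max_n=2)
```

Explicit enumeration grows quickly with n; it is used here only because it mirrors the verbal description directly.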
Despite these innovative methods, it remains challenging to preserve the chemical context of the fragmented parts. Several solutions, such as the FraGAT model by Zhang et al, focus on obtaining fragments with potential functional groups by breaking every single acyclic bond in a graph of hydrogen-depleted molecules [118]. Three branches in the model are used to extract and encode multiscale structural features. The original graph, the fragment pair obtained by the bond cut, and two abstract super nodes corresponding to the pair are encoded, respectively, in each channel. Zhang et al further refined this process, fragmenting based on BRICS [119] and additional rules to control the size of the motif vocabulary [120]. This reduces the number of ring variants and breaks the side chain, emphasising the relevance of substructures in molecule representation. In the motif layer, motif trees are constructed based on their connections. This work showed that considering information from substructures with functional groups can be more informative. For larger molecules, the approach of Kwon et al involves compressing molecular graphs by substituting recurring substructural patterns between two heavy atoms with new edges [121]. The occurrences of these patterns are recorded as additional edge features. This is an improvement upon their previous work [104], increasing scalability while maintaining the validity of the diverse molecular graphs generated.
While some higher-order graph representations mentioned above [118,120] encapsulate different molecular scales, they neglect varying interaction patterns among different layers. Notable models, such as the multiplex molecular graph neural network (MXMNet) and PaxNet, focus on dual interaction types [122,123]. The local covalent layer primarily captures geometric information including pairwise distances and angles, while the global layer considers pairwise distances only. Information between layers is transferred through a mapping between the same nodes. Zang et al took a step further with a hierarchical pretraining framework [124]. The framework first decomposes the input molecular graph into motif nodes, mining implicit semantic information (see figure 4). The training proceeds alongside the normal and motif nodes. A graph-level node is added at the top level to assimilate information from all nodes and redistribute global insights back to normal nodes via motif nodes, fostering interactions between varying orders. Higher-order approaches show that graph representations and GNNs can be used on different scales with a myriad of interaction mechanisms.
While research into ML-ready molecular representations thrives, gaps remain, especially for 0D materials such as fullerenes and nanoparticles. Traditional chemical topology, despite its rich history of evaluating graph-theoretical algorithms for molecular properties or unresolved mathematical problems [125,126], awaits fuller integration into modern ML representations.

Representing one-dimensional (1D) materials
1D materials are defined by their unique geometry: they extend through one dimension and are confined in the other two. Notable examples include carbon nanotubes [127], inorganic nanowires [128], supramolecular assemblies [129], and biopolymer nanocrystals [130]. Among these, polymeric systems stand out for their widespread use and diverse properties such as resilience, low density, and elasticity, which attract a significant amount of attention for ML applications.
Modelling 1D polymers, much like molecules, often utilises atom-based representations derived from SMILES [26,82,131,132]. However, these large molecular structures, composed of anywhere from 10 to $10^6$ repeating chemical subunits (monomers) [133], could generate a prohibitively large number of nodes and edges that present computational challenges if represented naively. A prevalent method to address this is decomposing polymers into their monomeric or repeating units. While effective, some models, such as the atom-centred representation by Park et al [26], emphasise learning monomer-level information but are limited in describing the morphological characteristics needed for accurate polymer property prediction. The multitasking architecture POLYMERGNN [132] faces a similar problem by representing individual polyesters as a combination of monomer units.
To address the scalability issue, Wang et al [134] proposed a bond-centred approach that breaks polymers into highly localised subgraphs, which are flexibly extendable to force field computations of large organic molecules. These subgraphs emphasise interactions primarily within two bond radii, allowing energy predictions to be attributed to specific bonds.
In the area of polymer generation, hypergraph-based methods have emerged as a promising representation. Guo et al devised a learnable graph grammar approach to represent polymer chain structures, their inherent scale, and their structural complexity [135,136]. A hyperedge can join all nodes in a ring structure or only two nodes in a polymer chain [135]. A bottom-up search builds up production rules from the finest-grained level. The grammar is constructed iteratively by sampling a set of hyperedges and collapsing them into individual nodes. In another work by Guo et al [136], a hyperedge is constructed by selecting the subset of nodes according to the monomer types, as demonstrated in figure 5, which further reduces the representation cost for large polyurethane chains.
However, these strategies occasionally overlook the intricate linkage architectures between subunits. Mohapatra et al capture complete monomers as nodes in a hierarchical graph, embedding atomic data into the broader macromolecule's structure via stereochemical extended connectivity fingerprints (ECFP) [137,138]. Further innovative strategies, such as the introduction of stochastic edges in [139], have also been explored to address the recurrent nature and stoichiometric intricacies of polymers. Each edge is associated with a weight measuring its probability of occurrence in the polymer chain. An edge direction accounts for asymmetry in the neighbours of atoms when two atoms are adjacent to each other with different frequencies.
Periodicity in polymers further complicates representations. Monomer graphs might produce non-unique representations by merely translating atoms along the polymer chain. Using naive monomer, dimer, and trimer graphs might lead to inaccuracies due to the incomplete bonding environment at their terminals [140]. Solutions like Antoniuk et al's technique, which involves linking atoms at the terminals of a trimer of a polymer chain based on a standard trimer graph, ensure a uniform bonding environment throughout [140]. Another approach, proposed by Gurnani et al [141], is a graph augmentation technique. By using transformed repeat units (units that undergo a translational modification and are treated as equivalent), their approach enhances the GNN's capability to achieve an approximate invariance to addition and subtraction after training.
Drawing from the example of periodic graphene nanoribbons, Wang et al devised a representation wherein the atoms within a unit cell serve as nodes. For edge determination, both the atoms themselves and their periodic images are taken into account [142]. Edges connect atom pairs within a predefined cutoff radius, which results in a graph that can support multiple edges between node pairs. Whether an edge connects nodes that belong to the same cell or to adjacent ones can be distinguished by the edge length.
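The construction of such a periodic multigraph can be sketched for a system that repeats along one axis. The sketch assumes two-dimensional Cartesian coordinates with all atoms placed inside the cell $[0, L)$ along x, and it stores each undirected edge once; the function name and the tuple layout are hypothetical and not taken from [142].

```python
import math


def periodic_multigraph(positions, cell_length, cutoff):
    """Edges (i, j, shift, length) between atom i and image `shift` of atom j.

    Assumes every atom lies inside the cell [0, cell_length) along x.
    """
    max_shift = math.ceil(cutoff / cell_length)
    edges = []
    n = len(positions)
    for i in range(n):
        for j in range(i, n):
            for shift in range(-max_shift, max_shift + 1):
                if i == j and shift <= 0:
                    continue  # skip the zero self-distance and mirrored self-images
                dx = positions[j][0] + shift * cell_length - positions[i][0]
                dy = positions[j][1] - positions[i][1]
                length = math.hypot(dx, dy)
                if length <= cutoff:
                    edges.append((i, j, shift, length))
    return edges


# Two atoms in a cell of length 3.0 (hypothetical values).
positions = [(0.5, 0.0), (2.5, 0.0)]
edges = periodic_multigraph(positions, cell_length=3.0, cutoff=2.2)
```

With this cutoff the same pair of nodes is joined twice: once inside the cell (length 2.0) and once across the cell boundary (length 1.0), so the edge length distinguishes the two connections, as noted above.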
In conclusion, the study of 1D materials, especially polymers, has benefited immensely from data-driven approaches. Challenges such as compactly representing chain structures, periodicity, and stoichiometry are constantly being addressed. However, the lack of standardised benchmark datasets for 1D materials remains a hindrance, prompting many researchers to rely on diverse sources for data collection. This lack of consistency complicates efforts to benchmark and align various methodologies effectively.

Representing two-dimensional (2D) materials
2D materials, recognisable by their plate-like forms such as nanoflakes, membranes, and nanosheets, have revolutionised the material science domain. In particular, graphene, a single layer of carbon atoms that form a 2D honeycomb lattice, is characterised by its superior properties, including extreme mechanical strength, high thermal and electronic conductivity, and unique gas impermeability [143,144]. An atom-centric representation proposed by Hu and Parker [145] defines connections using a cut-off distance, which is computed based on the number density of atoms and nth nearest neighbour peaks. Beyond graphene, other 2D materials, such as transition metal carbides (MXene) [146], transition metal chalcogenides [147], and monolayer phosphorene [148], have been used in various applications, from electronics to catalysts.
One major hurdle in the study of 2D materials is the combinatorial explosion of the defect search space. ML offers powerful tools for investigating how imperfections impact material behaviour. Specifically, Zhang et al [149] adopted regular molecular graphs to categorise 2D blue phosphorene (BP) structures and forecast grain boundary (GB) energies. Their approach integrated the graph isomorphism network (GIN) [150], known for its discriminative power, with a multiobjective genetic algorithm search. In an innovative approach to characterise point defects, Kazeev et al introduced a sparse representation, visualising 2D crystals as point clouds of defects instead of atoms [151]. This representation removes unaffected atoms, introduces virtual atoms at vacancy sites, and offers a distinctive perspective, as depicted in figure 6.
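The idea behind the sparse defect representation, keeping only the sites that differ from the pristine lattice and placing a virtual atom at each vacancy, can be sketched as a comparison of two site dictionaries. The sketch assumes that defective sites can be matched to ideal lattice positions (no structural relaxation); the vacancy label `"X"`, the toy lattice, and the function name are hypothetical and not drawn from [151].

```python
VACANCY = "X"  # hypothetical label for a virtual atom placed at a vacancy site


def defect_point_cloud(pristine, defective):
    """Return (site, original_species, new_species) for every altered site.

    Both arguments map lattice sites to species; a site missing from
    `defective` is treated as a vacancy. Unaffected atoms are dropped.
    """
    cloud = []
    for site, species in sorted(pristine.items()):
        new_species = defective.get(site, VACANCY)
        if new_species != species:
            cloud.append((site, species, new_species))
    return cloud


# Toy lattice with one vacancy and one substitution (hypothetical values).
pristine = {(0, 0): "Mo", (1, 0): "S", (2, 0): "S", (3, 0): "Mo"}
defective = {(0, 0): "Mo", (2, 0): "S", (3, 0): "W"}
cloud = defect_point_cloud(pristine, defective)
```

The point cloud contains two entries for a four-site lattice; for realistic supercells with dilute defects the reduction in graph size is far larger.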
Recent works have also supplemented graph representations of structure with density of states information. Sa et al [152] employed a graph-based approach to identify promising candidates in Janus III-VI van der Waals (vdW) heterostructures. They utilised the crystal graph convolutional neural network (CGCNN) [153] for predicting the formation energy, lattice constants, and Perdew-Burke-Ernzerhof (PBE) band gap. Then, to enhance the accuracy of HSE06 band gap predictions, they introduced two transfer learning strategies: first, enriching the graph embedding vector with tensor features derived from the PBE band gap scalar; and second, connecting the HSE06 band gap predictions to PBE band gap predictions using two separate fully connected layers while sharing the weights of the crystal graph embedding vector following the pooling layer in CGCNN. Venturi et al have also harnessed CGCNN for 2D materials such as perovskites and MXenes [154].
Unlike conventional methods, Lu et al [155] introduced the crystal graph multilayer descriptor (CGMD). This technique melds a crystal graph with elemental properties. As detailed in figure 7, they first transform crystal structures into atomic adjacency matrices and then enrich them with seven layers of elemental properties. Each layer (named the feature layer) in the descriptor represents one elemental property of materials, such as atomic local environments and unpaired electrons. The coupled multilayer descriptor is then learned by gradient boosting classification for complicated electronic and magnetic property prediction.
Although 2D materials are pivotal in material applications and align naturally with graph representations, the advent of GML in this domain is relatively recent. Current models and representations, primarily centred on atoms, often take inspiration from those tailored for 3D materials and are adapted for 2D scenarios.

Representing 3D materials
3D materials exhibit unique electrical, thermal, and optical properties, distinguishing them from their 1D or 2D counterparts. A typical example is graphite. This 3D material, composed of multiple stacked layers of graphene, displays properties different from those of isolated graphene layers. The inherent complexity of 3D materials can arise from periodic boundary conditions, perturbations due to multiple types of disorder, or the absence of long-range atomic ordering [18].
For single crystals where the lattice extends without interruption throughout the material, the descriptor for a unit cell is sufficient to account for lattice periodicity and space group symmetries [156]. Among the pioneering models in this domain is the crystal graph convolutional neural network (CGCNN) [153]. Taking perovskites as an example, the crystal graph in this work is constructed to capture strong bonding interactions. Atomic neighbours within a 6 Å radius are considered connected if they share a Voronoi face with the centre atom and have an interatomic distance shorter than the sum of the Cordero covalent bond lengths, with a margin of 0.25 Å. Given periodic boundary conditions, atoms can share multiple facets, leading to the formulation of an undirected multigraph, as shown in figure 8(a). The resulting representation, known as the 12-nearest-neighbour graph, connects each atom to precisely twelve of its closest counterparts [153,157,158]. Another commonly applied model by Chen et al, the materials graph network (MEGNet), offers a holistic framework that caters to both molecules and crystals. This model introduces global state attributes, such as the temperature, enabling accurate state-dependent property prediction. This attribute plays an important role in the sequential update of bond and atom attributes [156].
Yet, while the 12-nearest-neighbour graph remains a popular representation, it has limitations. For instance, in non-close-packed crystals, this method could inadvertently include atoms from beyond the immediate vicinity. Addressing this, Park and Wolverton leveraged Voronoi tessellation, linking each node to its Voronoi neighbours for enhanced precision [160]. This refinement, coupled with a novel edge update mechanism, enabled better predictions of atomic force magnitudes by capturing three-body interactions. Karamad et al [161] also employed a similar representation and incorporated orbital-orbital interactions as additional atomic features, where the atomic orbital interactions are counted based on the distribution of valence shell electrons. This is obtained from the orbital field matrix (OFM) model. Some special representations have also been adopted to capture the periodicity. For example, directed multigraph representations are used to generate new periodic graphs [162]. Edges in these multigraphs are labelled with directed 3D vectors, representing the spatial translation between atomic nodes. Message-passing neural networks combined with SE(3)-equivariant networks can be trained to move atoms to their equilibrium positions and eventually be used to generate stable structures. Kosmala et al introduced an Ewald message-passing block that can be layered on top of any MPNN. This addition aims to capture the high-frequency, long-range energy contributions resulting from the periodicity of the system. They achieve this by imposing a frequency truncation within the reciprocal space, which is accessed through a Fourier transform [94].
Since the rigorous classification of a crystalline material is deeply anchored in its symmetry properties, Jorgensen et al sought to determine the local symmetry group without the need for detailed atomic positions [159]. They first built a quotient graph in which the edges are defined from the Voronoi diagram, as shown in figure 8(b). Because small perturbations in the atomic positions may introduce small Voronoi facets, they set a cutoff on the solid angle to remove these facets and thus increase stability.
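A solid-angle cutoff of this kind can be sketched in a few lines, assuming each Voronoi facet is available as an ordered list of vertices of a convex planar polygon. The solid angle of each facet, seen from the central atom, is accumulated over a triangle fan using the Van Oosterom and Strackee formula. The cutoff value and the function names are hypothetical and are not taken from [159].

```python
import math


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def triangle_solid_angle(a, b, c):
    """Solid angle subtended at the origin by the triangle (a, b, c)."""
    na, nb, nc = (math.sqrt(_dot(v, v)) for v in (a, b, c))
    numerator = abs(_dot(a, _cross(b, c)))
    denominator = na * nb * nc + _dot(a, b) * nc + _dot(a, c) * nb + _dot(b, c) * na
    return 2 * math.atan2(numerator, denominator)


def facet_solid_angle(vertices, centre):
    """Solid angle of a convex planar facet seen from `centre` (fan triangulation)."""
    rel = [tuple(v[i] - centre[i] for i in range(3)) for v in vertices]
    return sum(triangle_solid_angle(rel[0], rel[i], rel[i + 1])
               for i in range(1, len(rel) - 1))


def stable_neighbours(facets, centre, min_solid_angle):
    """Keep neighbours whose shared facet subtends at least `min_solid_angle` steradians."""
    return {label for label, verts in facets.items()
            if facet_solid_angle(verts, centre) >= min_solid_angle}


facets = {
    "+z": [(1, 1, 1), (-1, 1, 1), (-1, -1, 1), (1, -1, 1)],   # full cube face
    "sliver": [(1, 1, 0.9), (1, 0.9, 1), (0.9, 1, 1)],        # tiny facet from a perturbation
}
square_angle = facet_solid_angle(facets["+z"], (0, 0, 0))
kept = stable_neighbours(facets, (0, 0, 0), min_solid_angle=0.1)
```

The cube face subtends one sixth of the full sphere, whereas the sliver subtends only a few thousandths of a steradian and is discarded, so the edge set no longer changes under small displacements of the atoms.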
Several studies have also tackled disordered materials, ranging from weak disorder, such as defects and grain boundaries (GBs), to the strong disorder present in glassy and porous materials. For polycrystalline materials, which are composed of multiple small crystals separated by GBs, fragment-based descriptors have gained traction. These descriptors divide materials into discrete grains and analyse the interactions between them. Examples include the work by Dai et al, who studied the magnetostriction of polycrystalline alloys [163]. In this model, each grain in the polycrystalline microstructure is represented as a node and carries a five-component feature vector detailing its orientation, size, and the number of neighbouring grains (figure 9). Yang and Buehler [164], in contrast, predicted the potential energy distribution in bulk polycrystalline aluminium, using aluminium atoms as nodes, the atomic coordinates at 50 K as input node features, and the corresponding potential energies at 100 K as output node labels. The motivation is that the atomic potential energy distribution reflects not only the positions of dislocations but also possible entanglements and interactions related to them.
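A grain-level graph of this kind can be sketched as below. The split of the five components into three orientation angles, one size value, and one neighbour count is an assumption made for illustration, as are the `Grain` class and the `grain_graph` function. The text only states that orientation, size, and number of neighbours are encoded.

```python
from dataclasses import dataclass


@dataclass
class Grain:
    euler_angles: tuple  # three orientation angles in radians (assumed encoding)
    size: float          # grain size in arbitrary units (assumed)


def grain_graph(grains, contacts):
    """Build node features and undirected edges from grains and their contacts.

    contacts: iterable of (i, j) index pairs for grains sharing a boundary.
    Returns (features, edges), where features[i] has five components.
    """
    neighbours = [set() for _ in grains]
    for i, j in contacts:
        neighbours[i].add(j)
        neighbours[j].add(i)
    features = [
        [*g.euler_angles, g.size, float(len(neighbours[i]))]
        for i, g in enumerate(grains)
    ]
    # Store each grain boundary once, regardless of the order it was listed in.
    edges = sorted({(min(i, j), max(i, j)) for i, j in contacts})
    return features, edges


grains = [
    Grain((0.0, 0.1, 0.2), 12.0),
    Grain((1.0, 0.5, 0.3), 8.5),
    Grain((0.4, 0.4, 0.9), 20.0),
]
features, edges = grain_graph(grains, contacts=[(0, 1), (1, 2), (2, 1)])
```

The neighbour count is derived from the contact list rather than stored on the grain, so the node features and the edge set cannot drift out of agreement.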
For strongly disordered materials, such as nanoporous metal-organic frameworks (MOFs), traditional CGCNN approaches have been adapted [166]. Such adaptations categorise atomic neighbours by distance (short and long) and focus on specific atoms for better representation: only atoms in inorganic poly-nuclear clusters (secondary building units, SBUs) and their nearest neighbours (SBU-NN) are considered. Bapst et al [167] determined the dynamical transition (relaxation time) of glassy systems solely from the initial static particle positions, over a wide range of temperatures, stresses, and timescales. Particles at a distance of less than 2 hops are connected, and each edge is labelled with the directed three-dimensional relative position. A similar N-hop neighbouring glass graph is also used for inverse design to find glasses with new properties [165] (figure 10).
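Edges labelled with directed relative positions can be sketched as follows. The sketch assumes a cubic periodic box and decides connectivity with a plain distance threshold (the hypothetical `cutoff` parameter). A hop-based criterion, as described in the text, would replace that test. It is not the implementation of [167].

```python
import math


def particle_graph(positions, box_length, cutoff):
    """Directed edges (i, j, r_ij) between particles closer than `cutoff`.

    r_ij is the minimum-image displacement from particle i to particle j,
    so the edge j -> i carries the opposite vector.
    """
    half = box_length / 2
    edges = []
    for i, ri in enumerate(positions):
        for j, rj in enumerate(positions):
            if i == j:
                continue
            # Wrap each component into [-L/2, L/2) (minimum-image convention).
            delta = tuple((b - a + half) % box_length - half for a, b in zip(ri, rj))
            if math.hypot(*delta) < cutoff:
                edges.append((i, j, delta))
    return edges


positions = [(0.5, 0.0, 0.0), (9.5, 0.0, 0.0), (5.0, 5.0, 5.0)]
edges = particle_graph(positions, box_length=10.0, cutoff=2.0)
```

Particles 0 and 1 sit on opposite sides of the box but are neighbours through the periodic boundary, and the two directed edges between them carry opposite displacement vectors.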
Other noteworthy graph representations, such as the one presented by Goodall and Lee [168], use dense weighted graphs in which nodes represent the elements of a composition, weighted by their fractional abundances.
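A composition-only graph of this type needs no structural input, as the following sketch suggests. The function `composition_graph` is hypothetical, and attaching the fractional abundance of the neighbouring element to each directed edge is an assumed weighting chosen for illustration, not necessarily the scheme of [168].

```python
def composition_graph(composition):
    """Build a dense weighted graph from a composition such as {"Ba": 1, "Sn": 1, "O": 3}.

    Returns (nodes, weights, edges): one node per element, its fractional
    abundance, and a directed edge between every ordered pair of distinct
    elements carrying the abundance of the neighbour (assumed weighting).
    """
    total = sum(composition.values())
    nodes = sorted(composition)
    weights = {element: composition[element] / total for element in nodes}
    edges = [(u, v, weights[v]) for u in nodes for v in nodes if u != v]
    return nodes, weights, edges


nodes, weights, edges = composition_graph({"Ba": 1, "Sn": 1, "O": 3})
```

For BaSnO3 this gives three nodes and six directed edges, with oxygen carrying a weight of 0.6, so stoichiometry enters the model through the weights rather than through repeated nodes.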
As is evident, graph representations for 3D materials of all kinds have made significant strides in recent years. From the foundational 12-nearest-neighbour graph to the more sophisticated Voronoi tessellation-based representations, researchers now have a wider arsenal of tools. However, challenges remain. The transferability of these graphs and models to diverse material types, particularly amorphous systems, needs further exploration. Moreover, comprehensive benchmark datasets for these systems are scarce, and addressing this scarcity could be central to future breakthroughs. Few higher-order representations have been reported either, which may be due to the purely repetitive stacking of unit cells in ordered crystals and the difficulty of dividing disordered crystals into layers.

Summary and future opportunities
Graph-based representations, combined with ML, have been transforming materials science by offering nuanced insights into atomic or molecular arrangements across dimensions. Despite these advances, attention remains focused largely on isolated molecules and crystals, leaving a vast territory unexplored.
While graphs adeptly encode atomic or molecular arrangements, GML models have an inherent difficulty in fully capturing 3D spatial interactions, especially long-range interactions in geometrically sophisticated systems. This limitation invariably necessitates the incorporation of traditional chemical methodologies. Such synergy bridges gaps but demands domain knowledge, potentially introducing subjective biases and undermining the universality of graph representations. The computational cost of GNNs also increases significantly with the inclusion of auxiliary data, not to mention the complexity of encoding the intricate real-world interactions intrinsic to materials.
A pivotal impediment to leveraging GML in materials science is the limited availability of large, high-fidelity datasets [169]. Current repositories are skewed towards crystalline structures or quantum-mechanical data, often generated synthetically by computation [18]. For other material classes, researchers often have to collect data from many diverse sources. High-quality labelled datasets are also hard to find, which constrains supervised learning approaches. In particular, many current representations originate from string-based formats that lack vital stereochemical information. This can compromise the fidelity of predictions tied to atomic spatial configurations, such as specific electronic properties.
Another gap emerges when examining individual material classes: many sophisticated graphical patterns specific to certain materials have never been explored. Instead, most work remains confined to the classical ball-and-stick, atom-centred paradigm and focuses on tuning functions in the prediction space. Crafting customised, materials-specific representations could not only improve model performance but also reveal novel scientific patterns. A considerable gap remains between modern molecular graph theory and graph representations for ML applications.

Figure 2. Estimated distribution of representation categories across dimensions, highlighting representative graph types. This analysis is derived from a review of 62 papers. For a comprehensive summary of key articles, particularly those examined in the text, see table 1.

Figure 3. The atom-bond graph G views bonds as edges connecting atoms, while the bond-angle graph H views bond angles as edges linking bonds. The double-dash arcs highlight their correspondence. Reproduced from [106]. CC BY 4.0.

Figure 4. Node-motif and motif-graph edges are added to construct the augmented graph. A GNN is then used to learn hierarchical node representations on the augmented graph. Reproduced from [124]. CC BY 4.0.

Figure 6. A sparse representation in which nodes are defects, edges connect defects within a cutoff radius, and dashed lines connect nodes across a periodic boundary. Reproduced from [151]. CC BY 4.0.

Figure 7. The crystal graph multilayer descriptor (CGMD) process. Each crystal structure is mapped to a binary atomic adjacency matrix. Element feature matrices, derived from elemental properties, are then combined with the atomic adjacency matrix. Diagonal elements in the matrix represent atomic properties, while off-diagonal ones signify the ratios of elemental property values between the row and column atoms. [155] John Wiley & Sons. [© 2020 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim].

Figure 8. Crystal graph representations addressing periodicity and symmetry. (a) The construction of a crystal undirected multigraph due to the periodicity [153]. Reproduced with permission from American Physical Society (2018). (b) Quotient graph of BaSnO3 where the edges are labelled by the point groups of the corresponding facets of the Voronoi diagram [159]. Reproduced with permission from American Physical Society (2019). (a) Reprinted figure with permission from [153], Copyright (2023) by the American Physical Society. (b) Reprinted figure with permission from [159], Copyright (2019) by the American Physical Society.

Table 1. Applications discussed, sorted by material dimension (D) and by the group of their representations. For each work, both topology and technique information are given. The topology aspects include the definition of the node, the node attributes, the definition of the edge, and the edge attributes. The column Techniques refers only to the methodologies directly applied to process graph-based representations.

a The broad class of techniques that are directly used to process graph-based representations. For VAEs, the architecture class for the encoder is reported.
b For fragment-based methods, this column refers to the definition of fragments.
c The initial node features, similar to edge features. Only encoded information is reported, while the format can be one-hot encoding, binary, or others.
d For fragment-based methods, the connectivity definition between fragments is reported if it exists; otherwise the connectivity within fragments is reported.