Graph machine learning framework for depicting wavefunction on interface

The wavefunction, as a basic postulate of quantum mechanics, describes the motion of particles and plays a pivotal role in determining physical properties at the atomic scale. However, conventional acquisition methods, such as density functional theory, require considerable computation, which hinders wide application. Here, we propose an algorithmic framework based on graph neural networks to machine-learn the wavefunction of electrons. The framework first generates atomic features containing information about the chemical environment and geometric structure, and subsequently constructs a scalable distribution map. For the first time, the visualization of the wavefunction at an interface is realized by machine learning, bypassing complex calculation and obscure interpretation. In this way, we vividly illustrate quantum mechanics, which can inspire theoretical exploration. As an intriguing case to verify the ability of our method, a novel quantum confinement phenomenon at interfaces based on graphene nanoribbons is uncovered. We believe that the versatility of this framework paves the way for swiftly linking quantum physics to atom-level structures.


Introduction
The wavefunction ψ(r) is a fundamental and ubiquitous quantity in quantum physics and helps to bridge physical properties to actual materials in electronics and optoelectronics. However, this mathematical object contains a wealth of information that is difficult to master. The wavefunction of a quantum many-body system can be expanded in a set of orthogonal basis states, and the number of basis states, namely the dimension of the Hilbert space, increases exponentially with the size of the system. Expensive calculations bring essential difficulties to the representation and utilization of this large-scale mathematical entity.
In recent years, machine learning (ML) methods have been exploited to solve a variety of classification and regression tasks [1] and are extensively used in computational physics and materials science [2]. Common applications of physics-informed neural networks [3] include prediction of material properties based on large public datasets such as the Open Quantum Materials Database and the Materials Project [4,5], acceleration of traditional molecular dynamics simulations [6,7], inverse design of crystal structures with target properties [8], depiction of energy band structures and charge densities, and more [9-12].
In such a vibrant field, researchers also pay attention to the intrinsic probabilistic nature of quantum mechanics generated by the wavefunction. When a neural network bridges structures and physical properties, the wavefunction is hidden as complex-valued weight parameters in the calculation process. Neural network quantum states [13] have therefore been presented as a many-body wavefunction ansatz: a set of internal weights specifies the value of the wavefunction, and the derived weights can be fed into variational Monte Carlo [14] calculations to solve for Hamiltonian ground states. A more distinctive development is the deep Boltzmann machine [15], which flexibly represents any state generated by quantum circuits. When computational layers are stacked in a deep network, the calculation is equivalent to a polynomial expansion and can approximate any smooth function. Moreover, the network architecture should be designed to respect the exchange symmetry of the particles, especially the antisymmetry of fermions [16]. However, an accurate description of the wavefunction has so far only been realized in simple systems, whether the transverse-field Ising model, the antiferromagnetic Heisenberg model, the deuteron, or free bosons [13,17,18]. As the size of a quantum many-body system increases, no known approach can solve the resulting Schrödinger equation, let alone represent the wavefunction.
In this paper, we propose a ML framework based on crystal graphs to depict the wavefunction at atom-level interfaces. We bypass the hardship of obtaining the target functions of hundreds of atoms accurately and seek to draw them directly. Specifically, physics-informed features are input into carefully selected message passing paradigms to produce a wavefunction map belonging to each atom, and all atomic maps are accumulated to generate the whole wavefunction distribution. A promising example, the graphene nanoribbon (GNR), is chosen for verification since its electronic properties change significantly with its spatial structure [19,20]. GNRs with different widths and terminations can be accurately synthesized from molecular precursors [21], and localized and delocalized wavefunction states can be explicitly observed in these bottom-up GNRs [22].

Workflow of the framework
As shown in figure 1, our framework is constructed in four parts: panels (a), (b), (c), and (d) respectively represent the processes of data preprocessing, graph convolution, wavefunction depiction, and loss back propagation. To build the dataset, we use the QuantumATK software, a materials science simulation platform, to carry out density functional theory (DFT) calculations. First, in figure 1(a), the data preprocessing encompasses data downscaling and dimension arrangement. Data downscaling is based on Born's statistical interpretation of the wavefunction, P(r) = |ψ(r)|², which reduces the three-dimensional wavefunction map to a two-dimensional (2D) scale: we take the squared modulus of the wavefunction and accumulate it in the direction perpendicular to the interface. As shown in the middle diagram of figure 1(a), the distribution of the wavefunction on the low-dimensional interface is thus obtained. Dimension arrangement is based on the region-of-interest layer [23,24], which standardizes the shape of each 2D wavefunction map; the sizes of the obtained distribution maps vary greatly, while the input of a neural network is preferably of fixed size. Through the above processes, a novel dataset is established, and the training target is the intensity distribution heatmap of the wavefunction.
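The downscaling and shape-standardization steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and simple zero-padding stands in for the region-of-interest layer the authors actually use.

```python
import numpy as np

def downscale_wavefunction(psi, axis=2):
    """Reduce a 3D complex wavefunction grid to a 2D intensity map.

    Born's rule gives P(r) = |psi(r)|^2; summing along the axis
    perpendicular to the interface yields a 2D distribution.
    """
    intensity = np.abs(psi) ** 2          # probability density on the 3D grid
    return intensity.sum(axis=axis)       # accumulate perpendicular to the interface

def pad_to_shape(heatmap, target=(100, 100)):
    """Zero-pad (or crop) a 2D map to a fixed shape so all samples share one input size.

    Stand-in for the region-of-interest layer of the paper (an assumption here).
    """
    out = np.zeros(target, dtype=heatmap.dtype)
    h = min(heatmap.shape[0], target[0])
    w = min(heatmap.shape[1], target[1])
    out[:h, :w] = heatmap[:h, :w]
    return out
```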
The next part, figure 1(b), shows the graph convolution. Converting a crystal into a graph is a powerful representation [25]. Inspired by the crystal graph convolutional neural network (CGCNN) [4], we create the crystal graph according to the distances between neighbors: given a chosen cutoff, the connectivity between atoms is determined by distance. The distance cutoff in this paper is 8.0 angstroms, consistent with much of the literature [4,6]. We use a K-nearest-neighbor (KNN) qualification to construct the graphs; with K = 12, the 12 nearest neighbor atoms are taken into consideration. Regarding the effect of the KNN atoms on the derived graph, we provide some visualizations (supplementary material figure S3). Nodes and edges of the graph represent atoms and the relationships between atoms, respectively. The initial feature vectors of the nodes are constructed from nine physical properties (supplementary material table S1), where discrete quantities are one-hot encoded over the number of categories, and continuous quantities are processed similarly after partitioning their intervals. The edge features are expanded to a fixed-length tensor by radial basis functions (RBFs) applied to the bond length scalar. Eventually, the initialized feature vectors of nodes and edges are processed into tensors of lengths 92 and 80, respectively.
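A sketch of the graph construction and RBF edge expansion described above, using the paper's K = 12, 8.0 angstrom cutoff, and length-80 edge features; the helper names and the RBF width `gamma` are assumptions, and the nine-property node encoding is omitted.

```python
import numpy as np

def build_graph(positions, k=12, cutoff=8.0):
    """K-nearest-neighbour graph restricted to a distance cutoff (angstroms)."""
    n = len(positions)
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    src, dst, lengths = [], [], []
    for i in range(n):
        order = np.argsort(dist[i])                       # nearest first
        neighbours = [j for j in order
                      if j != i and dist[i, j] <= cutoff][:k]
        for j in neighbours:
            src.append(i); dst.append(j); lengths.append(dist[i, j])
    return np.array(src), np.array(dst), np.array(lengths)

def rbf_expand(lengths, n_centers=80, cutoff=8.0, gamma=10.0):
    """Expand each bond-length scalar into a fixed-length RBF feature vector."""
    centers = np.linspace(0.0, cutoff, n_centers)
    return np.exp(-gamma * (lengths[:, None] - centers[None, :]) ** 2)
```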
Graph neural networks (GNNs) compute over the features of nodes, neighbors, and edges through what is generally called the message passing paradigm. This paradigm is divided into three steps: first, the message function generates messages by combining the edge features with the features of the edge's two endpoint nodes; second, the aggregation function aggregates the messages received by each node; third, the update function updates the node features by combining the aggregated messages with the node's own features. It is worth mentioning that different kinds of message passing methods are compatible with our framework and can be chosen to suit specific applications. Here, four widely used methods, namely the graph convolutional network (GCN) [26], the graph sample and aggregate network (GraphSAGE) [27], the CGCNN, and the atomistic line graph neural network (ALIGNN) [28], are employed for demonstration. The CGCNN and the ALIGNN, which are oriented to crystallography, consider more edge features, while the GCN and GraphSAGE merely use atomic feature vectors, as in citation classification tasks. To ensure fairness, we make slight modifications and consider the four convolution formulas as a whole in table 1.
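The three-step paradigm can be written generically; this is a schematic sketch of one message/aggregate/update step, not any of the four published convolutions — the weight shapes and `tanh` nonlinearity are illustrative assumptions.

```python
import numpy as np

def message_passing_step(node_feat, edge_feat, src, dst, W_msg, W_upd):
    """One generic message-passing step: message -> aggregate -> update.

    node_feat: (N, d) node features; edge_feat: (E, d) edge features;
    src, dst: (E,) endpoint indices of each directed edge;
    W_msg: (3d, d) message weights; W_upd: (2d, d) update weights.
    """
    # 1. Message: combine edge features with the features of both endpoints.
    msg_in = np.concatenate([node_feat[src], node_feat[dst], edge_feat], axis=1)
    messages = np.tanh(msg_in @ W_msg)                  # (E, d)
    # 2. Aggregate: sum the messages arriving at each destination node.
    agg = np.zeros_like(node_feat)
    np.add.at(agg, dst, messages)
    # 3. Update: combine aggregated messages with each node's own features.
    upd_in = np.concatenate([node_feat, agg], axis=1)
    return np.tanh(upd_in @ W_upd)                      # (N, d)
```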
The third part, figure 1(c), produces the wavefunction heatmap. Node features are extracted and input into fully connected layers to transform their dimensions. Then, weight parameters obtained from the Gaussian RBF are used to calculate the output map. The overall formula is

I_j = Σ_i Z_i · C_{i,j} · H_{i,j},   C_{i,j} = exp(−|P_{i,j} − R_i|² / (2σ²)),   (1)

where I is the intensity of the wavefunction heatmap, the index i runs over all atoms, Z_i is the number of valence electrons of each atom, R_i is the position vector of atom i, P_{i,j} and H_{i,j} are the position vector and feature value of pixel j of atom i, σ is the variance of the coefficients generated by the RBF, and the coefficient C_{i,j} is normalized by the maximum value over the full heatmap.

Table 1. Formulas for passing atomic features from layer l to layer l + 1 in the four models (columns: model; convolution function).

As shown in figure 2, we generate the final wavefunction heatmap tensor in three steps. First, in panel (a), the output of the network belonging to atom i is extracted with size (100 × 100). Then, in panel (b), a distance-dependent coefficient is generated from the relationship between the atom's position vector R_i and the position vector of pixel j of atom i. This coefficient is normalized by the maximum over the full heatmap, multiplied by the pixel value H_{i,j} and by the number of valence electrons Z_i, and the result is taken as the contribution of atom i to the overall electronic wavefunction. Finally, in panel (c), the contributions of all atoms are summed to obtain the final wavefunction distribution heatmap. Table 2 lists the variation of the data structure during the computation, namely figures 1(b) and (c). First, based on the aforementioned graph generation process, the derived graphs are the fundamental data structures used in the algorithm; the graph generation row in table 2 clarifies this flow. Then, during the dimensional transformation process, the initial node and edge features on the graph are fed into the network layers in table 2 to obtain features of sizes (N_n × 256) and (N_e × 256), respectively. Third comes the graph convolution, where the (N_n × 256) node features and (N_e × 256) edge features are updated by an optional graph convolution method; the updated features contain topological information about the structure, and the dimensions remain invariant. Finally, the postprocessing and wavefunction depiction process corresponds to figure 1(c). We extract the updated node features and use multilayer perceptron (MLP) layers to transform their size. The number of neurons in the final fully connected layer matches the size of the output heatmap. Taking the output size (N_n × 10 000) in table 2 as an example, the wavefunction map based on this tensor is (100 × 100), meaning that each neuron serves as a pixel of the final heatmap.
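The per-atom accumulation of equation (1) can be sketched as below. One simplification is assumed and flagged: the Gaussian coefficient here is normalized by its own per-atom maximum rather than by the maximum of the full heatmap as in the paper, and the function name and pixel-coordinate convention are illustrative.

```python
import numpy as np

def depict_wavefunction(H, R, Z, size=100, sigma=2.0):
    """Accumulate per-atom heatmaps into the total wavefunction map.

    H: (N, size, size) per-atom network outputs (pixel feature values H_{i,j});
    R: (N, 2) atom positions in pixel coordinates; Z: (N,) valence-electron counts.
    A Gaussian coefficient centred on each atom weights its pixels; the
    Z-scaled, weighted maps are then summed over all atoms.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    total = np.zeros((size, size))
    for i in range(len(Z)):
        d2 = (xs - R[i, 0]) ** 2 + (ys - R[i, 1]) ** 2   # |P_{i,j} - R_i|^2
        coeff = np.exp(-d2 / (2.0 * sigma ** 2))
        coeff /= coeff.max()              # per-atom normalisation (simplification)
        total += Z[i] * coeff * H[i]
    return total
```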
The final part, figure 1(d), completes the loss calculation and gradient back propagation. In order to evaluate framework performance scientifically, we introduce the structural similarity (SSIM) [29] index as the loss function:

SSIM(x, y) = [(2μ_x μ_y + C_1)(2σ_xy + C_2)] / [(μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2)],

where x and y refer to the pixel values of the two images being compared, μ_x and μ_y are the mean values of images x and y, and σ_x², σ_y², and σ_xy are the variances and the covariance of the two images. The constants C_1 and C_2 are taken to be small positive numbers, such as 0.1, to avoid instability in the division. After calculating the gradient of the loss function, back propagation is performed. All of the above calculations are implemented with PyTorch [30] and the Deep Graph Library [31].
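As a minimal sketch of this loss, the single-window (global-statistics) form of SSIM is computed below; reference [29] defines SSIM over sliding local windows, so this global variant and the function names are simplifying assumptions.

```python
import numpy as np

def ssim(x, y, C1=0.1, C2=0.1):
    """Global structural similarity between two heatmaps (single-window form)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + C1) * (2 * cov + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)
    return num / den

def ssim_loss(pred, target):
    """Training loss: 1 - SSIM, so identical maps give zero loss."""
    return 1.0 - ssim(pred, target)
```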

Quantum confined wavefunctions on interface
GNRs have many novel characteristics. According to first-principles calculations, the armchair GNR (AGNR) can be classified into three families because of its periodically varying band gap [19]. Several works report that quantum wells (QWs) [32] can be constructed by connecting nanoribbons with different band gaps [33] or by embedding boron atoms as quantum dots [34]. Among these QWs, the wavefunctions show various intricate patterns, which are associated with physical properties. On this basis, we construct two kinds of interfaces as our dataset: one is an interface connecting wide and narrow segments, the other is porous GNR [35]. Both materials show extraordinary potential for practical applications: with the first type, negative differential resistance [36] devices can be manufactured; as for the latter type, porous graphene materials and their derivatives serve as excellent components in supercapacitors and lithium-ion batteries [37,38]. Figure 3 shows emblematic interfaces and wavefunctions. The interface naming convention takes the form N-A-B-ID-IF, where N denotes the width of the two AGNR parts, A and B represent the numbers of carbon atoms of the two parts in the B direction, ID is the serial number within this type, and IF marks the interface.
Our model is trained on the electron wavefunctions obtained from the self-consistent electron density. For each k-point, a Kohn-Sham equation,

H_KS ψ_{i,k} = E_{i,k} ψ_{i,k},

is solved for a set of wavefunctions ψ_{i,k} (eigenvectors) and energy levels E_{i,k} (eigenvalues), where ψ_{i,k} is the wavefunction of wave vector k belonging to band i. Wavefunctions belonging to different energy bands present utterly different states. Comparing figures 3(a) and (b), or (c) and (d), we find that a slight morphological change causes a drastic adjustment of the wavefunction distribution. In figures 3(a) and (c), the wavefunction spreads over the whole structure; in figures 3(b) and (d), however, the wavefunction displays a confined state, namely a compact localized state (CLS) [39]. Examining the corresponding band structures, the valence and conduction bands of the confined states near the Fermi level are flatter than those of the unconfined states. A flat band means that the energy of the electrons is independent of momentum and the effective mass tends to infinity, which corresponds to a localized distribution of electrons in real space. Many exotic and closely related phenomena, such as electron kinetic energy quenching, flat bands, CLSs, and the fractional quantum Hall effect, have been regarded as a novel playground [40-42]. This is therefore a good setting to demonstrate the ability and advantages of our framework.
Simulations are carried out within the generalized gradient approximation of DFT with the Perdew-Burke-Ernzerhof exchange-correlation functional. The Monkhorst-Pack k-mesh of each structure is set to 1 × 1 × 32 and the density mesh cutoff to 80 Hartree. A vacuum layer of 20 angstroms is added in the aperiodic directions. Structural relaxation of each system is performed in advance until the absolute value of the force acting on each atom is less than 0.05 eV/angstrom. The self-consistent calculation is converged to a tolerance of 10⁻⁴ eV.

Generality of the framework
The dataset contains a total of 333 sample structures, which are divided into training, validation, and test sets in the ratio 8:1:1. Because the underlying DFT calculations are computationally expensive, the dataset must be accumulated gradually, so data volume is a bottleneck. To take full advantage of the dataset, we carry out five-fold cross-validation to test the performance of the framework. A fixed random seed of 100 is used to shuffle the dataset. The Adam [43] optimizer is employed with an initial learning rate of 0.001. The decline of the loss function on the training set is shown in figure 4; the ALIGNN achieves the fastest decline. For fairness, all models adopt the same hyperparameters and training epochs. The training loss reaches a very low level near zero. Based on this excellent performance, we select the optimal model for the subsequent wavefunction heatmap depiction.
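The fold generation can be sketched as follows, using the fixed seed 100 and the 333-sample count from the paper; the function name, the plain five-way partition (rather than the paper's exact 8:1:1 arrangement within each fold), and the use of Python's `random` module are assumptions.

```python
import random

def five_fold_splits(n_samples, seed=100):
    """Shuffle sample indices with a fixed seed and yield five (train, val) folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)      # fixed seed -> reproducible shuffle
    fold_size = n_samples // 5
    for f in range(5):
        val = idx[f * fold_size:(f + 1) * fold_size]
        held = set(val)
        train = [i for i in idx if i not in held]
        yield train, val
```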
To demonstrate our framework's generality and find the best deployment for the GNR cases, the graph convolution part of the framework is replaced by several alternative message passing methods. Table 3 lists all five-fold SSIM loss results on the test set. We find that the ALIGNN achieves the best performance, with an SSIM index of 0.843. Achieving such high verisimilitude between results and targets, the framework possesses favorable depiction ability and can facilitate analysis of wavefunction patterns.

Parameter optimization and wavefunction depiction
Here, the hyperparameters of the framework with the ALIGNN are optimized for best performance. The batch size and the number of convolution layers are considered, and all SSIM loss results are listed in table 4. First, as the batch size grows, accuracy changes inconspicuously: for the same number of layers, loss results with batch sizes of 2, 4, 8, and 16 differ little. On the other hand, the influence of the number of convolution layers is also probed. We find that the framework performs best at approximately seven layers: the average five-fold loss is about 0.157, which frameworks with 7 or 8 layers both reach in table 4. This differs markedly from typical networks with about three layers. We consider that the selected 12 nearest neighbors within eight angstroms are insufficient to perceive the whole structure because the interface is large; more layers therefore achieve preferable results through a broader receptive field. Furthermore, residual connections are utilized to tackle the over-smoothing [44] problem, so continuing to increase the number of layers does not trigger obvious performance degradation.
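The residual stacking mentioned above can be sketched generically; this is an illustrative pattern, not the ALIGNN implementation — the function name and the abstract `layers` callables are assumptions.

```python
import numpy as np

def deep_gnn_forward(h, layers):
    """Apply a stack of node-update layers with residual (skip) connections.

    `layers` is a list of callables mapping node features (N, d) to updated
    features of the same shape; h + layer(h) keeps node features from
    collapsing toward each other as depth grows, mitigating over-smoothing.
    """
    for layer in layers:
        h = h + layer(h)      # residual connection around each layer
    return h
```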
In figure 5, we select eight distinctive structures with emblematic wavefunctions as illustrations. The consistency between ab initio theory and network outputs confirms the superior performance of the framework. For panels (a)-(d), the topology theory of GNRs can be introduced to fathom the cause of the utterly different wavefunction distributions [45]. Rhim and Yang [46] indicate that the eigenfunction of a Q × Q Hamiltonian matrix can be written as

ψ_{i,k} = (1/√N) Σ_R Σ_q e^{ik·R} v_{i,k,q} |R, q⟩,

where ψ_{i,k} is the wavefunction belonging to band index i and momentum k, R runs over all the lattice vectors of the N unit cells, v_{i,k,q} is the qth component of the corresponding eigenvector, and |R, q⟩ represents an electron wavefunction in the qth orbital at R. For a quasi-1D GNR system, the topology can be characterized by the Zak [47] phase,

γ_i = i ∫_{−π/d}^{π/d} ⟨u_{i,k} | ∂_k u_{i,k}⟩ dk,

where γ_i is the Zak phase, d is the size of the unit cell, and u_{i,k} is the cell-periodic part of the Bloch wavefunction. Due to the spatial symmetries of GNRs, γ_i for band i is quantized at 0 or π, representing a topologically trivial or nontrivial band, respectively. For simplicity of calculation, the symmetry-protected topological phases in GNRs can be characterized by a global invariant Z_2, which is associated with the Zak phase.
Z_2 is also quantized, at 0 or 1, indicating a trivial or nontrivial topological insulator, respectively. When two segments with different Z_2 are joined to form an interface, localized states emerge at the contact, according to the bulk-boundary correspondence [48]. For structures with symmetry, the formulas shown in figure 6 quickly compute the topological invariants [47].
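The quantized Zak phase underlying these invariants can be illustrated numerically on the textbook Su-Schrieffer-Heeger (SSH) chain rather than a full GNR (an assumption: this toy model stands in for the quasi-1D GNR bands), using a discrete Wilson-loop product of band eigenvectors over the Brillouin zone.

```python
import numpy as np

def zak_phase_ssh(t1, t2, n_k=400):
    """Zak phase of the lower band of the SSH chain via a discrete Wilson loop.

    Bloch Hamiltonian h(k) = [[0, t1 + t2*exp(-ik)], [conj(...), 0]];
    the phase is quantized at pi (nontrivial) for t2 > t1 and 0 (trivial)
    for t1 > t2, mirroring the 0/pi quantization discussed in the text.
    """
    ks = np.linspace(0.0, 2.0 * np.pi, n_k, endpoint=False)
    states = []
    for k in ks:
        f = t1 + t2 * np.exp(-1j * k)
        h = np.array([[0.0, f], [np.conj(f), 0.0]])
        _, v = np.linalg.eigh(h)        # eigenvalues in ascending order
        states.append(v[:, 0])          # lower-band eigenvector
    # Gauge-invariant product of overlaps around the closed Brillouin zone
    w = 1.0 + 0.0j
    for m in range(n_k):
        w *= np.vdot(states[m], states[(m + 1) % n_k])
    return float(-np.angle(w) % (2.0 * np.pi))
```

Because the loop of overlaps closes on itself, the arbitrary eigenvector phases returned at each k cancel, so no gauge fixing is needed.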
The topological phase changes across the GNR interface, which introduces localized states in the band gap [49]. Several cases can be well explained by this conclusion. In figures 5(a) and (b), the 4AGNR and 7AGNR segments are both topologically nontrivial (Z_2 = 1), according to the explicit formulas proposed in [47]. On the contrary, their counterpart, the 15AGNR segment, is topologically trivial (Z_2 = 0). Therefore, this coupling mode produces a topological phase, in accord with topologically induced bands, and the Bloch wavefunctions of these bands at the gamma point show a CLS in the nontrivial structures. Tuned by width, edge, and the type of termination, the other coupling mode is shown in figures 5(c) and (d): AGNRs of the same topological type are connected, including the nontrivial type (6AGNR and 17AGNR, Z_2 = 1) and the trivial type (8AGNR and 15AGNR, Z_2 = 0). No localized interface state is expected in the band gap under these circumstances, and the wavefunctions extracted from energy bands near the Fermi level can spread across the whole structure. The good consistency with topological theory in these cases verifies our method's validity. In other cases, topological theory cannot give conclusions directly, since those GNR structures are complex, perhaps due to the coupling of multiple effects, as for the porous GNRs in figures 5(e)-(h). Here, the value of our method becomes more obvious: it vividly depicts features that may stimulate new theoretical exploration. Because of periodic boundary conditions, the left part acts as a potential barrier region, and the right part is regarded as a well region. As presented in panels (e) and (f), the two structures are extremely similar but their wavefunctions are quite different: the former wavefunction is localized in the well region, while the latter distributes uniformly over the interface. Among panels (f), (g), and (h), wavefunctions propagate unperturbed along the ribbon, yet with the change of shape, the wavefunctions are poles apart. An interesting phenomenon is that the diffused wavefunction exists almost only on one side of the hole, which may be related to the widths of the two parts; its origin leaves room for new theoretical discussion. The wavefunction symmetry mechanism [50], as a key factor, affects the degree of spatial localization. Furthermore, for all samples, the curvature of the energy band is highly correlated with the distribution state of the wavefunction: the stronger the diffusion, the more curved the band, and the stronger the localization, the flatter the band.
So far, we have successfully depicted multiple intricate wavefunction patterns caused by topological phases and other interesting physical mechanisms. All of the studied structures are physically reasonable and significant. This work paves a practical way for studying perplexing interfaces from the perspective of the wavefunction.

Conclusion
In summary, based on graph ML, we develop an efficient framework to depict the wavefunction. After training, the model explicitly generates high-fidelity wavefunction maps, which can demonstrate distinct interface states and connect with physical properties such as flat bands and topological phases. Besides, our framework is flexible and compatible with different message passing methods, which brings the merit of being employable in various fields.

Figure 1.
Figure 1. Schematic illustration of the whole framework. (a) The data preprocessing procedure for the data exported from ab initio calculations; the ultimate training target is obtained. (b) The establishment of crystal graphs and the implementation of the message passing paradigm on them. (c) The execution of wavefunction depiction based on the derived features. (d) The process of loss calculation and gradient back propagation.

Figure 2.
Figure 2. Graphical illustration of equation (1). The formula plots the distribution map of the wavefunction based on the extracted, updated node features.

Figure 3.
Figure 3. Four interfaces belonging to the two types mentioned above. Panels (a), (b), (c), and (d) represent 3-5-15-4-IF, 3-5-15-5-IF, 3-6-14-6-IF, and 3-6-14-7-IF, respectively. The upper and lower wavefunction diagrams belong to the valence band (VB) and the conduction band (CB) at the gamma point within the Brillouin zone. Next to them are the energy band structures. The isosurface value is QuantumATK's default of 50% of the maximum value. The color bar shows the phase of the wavefunction.

Figure 4.
Figure 4. SSIM loss during the training process. After about 300 epochs, the oscillation of the training loss is alleviated because the learning rate is reduced by a fixed step schedule.

Figure 6.
Figure 6. Topological classes of AGNRs with different terminations, using the Z_2 invariant as the characterization, where the floor function ⌊X⌋ gives the largest integer not exceeding X.

Table 3.
Five-fold SSIM loss results of different message passing methods on the test set. Optimal performances are marked in bold.

Table 4.
SSIM loss results for different framework hyperparameters with the ALIGNN. Optimal performances are marked in bold.