Efficient determination of the Hamiltonian and electronic properties using graph neural network with complete local coordinates

Despite the successes of machine learning methods in the physical sciences, the prediction of the Hamiltonian, and thus of the electronic properties, is still unsatisfactory. Based on a graph neural network (NN) architecture, we present an extendable NN model to determine the Hamiltonian from ab initio data, with only local atomic structures as inputs. The rotational equivariance of the Hamiltonian is achieved by our complete local coordinates (LCs). The LC information, encoded using a convolutional NN and designed to preserve Hermitian symmetry, is used to map hopping parameters onto local structures. We demonstrate the performance of our model using graphene and SiGe random alloys as examples. We show that our NN model, although trained on small systems, can predict the Hamiltonian, as well as electronic properties such as band structures and densities of states, for large systems within ab initio accuracy, justifying its extensibility. Combined with the high efficiency of our model, which takes only seconds to obtain the Hamiltonian of a 1728-atom system, the present work provides a general framework for predicting electronic properties efficiently and accurately, offering new possibilities for computational physics and accelerating research on large-scale materials.


INTRODUCTION
The Hamiltonian lies at the heart of solid-state physics, as it determines the electronic properties; the eigenstates of a Hamiltonian can then be used to obtain other physical properties. For periodic systems, the Hamiltonian is solved in reciprocal space, which yields the band structure. The past half century has witnessed the development of methods for solving the Hamiltonian based on density functional theory (DFT) [1], which has achieved great success in understanding electronic properties.
However, as the system size increases, the cost of solving the Hamiltonian with DFT grows dramatically, and this has become one of the greatest challenges in solid-state physics and materials science.
Machine learning provides a possible route to this problem. However, although the past decades have witnessed many successes of machine learning methods in predicting physical properties [2,3] and interatomic potentials [4-6], the prediction of the Hamiltonian, and thus of electronic properties, remains unsatisfactory. Current models generally yield low accuracy and poor extensibility when predicting electronic properties such as band gaps, defect levels, and optical properties. Mapping the Hamiltonian onto some locality is crucial for designing an extendable model [7,8]. Inspired by the tight-binding method, the Hamiltonian can be represented by hopping parameters connecting localized basis functions. Using numerical pseudoatomic orbitals (PAOs) $\phi_\alpha(\mathbf{r})$ as the basis [9], the hopping parameter between atoms i and j is defined by
$$H_{i\alpha,j\beta}(\mathbf{R}_n) = \langle \phi_{i\alpha}(\mathbf{r}-\boldsymbol{\tau}_i) \,|\, \hat{H} \,|\, \phi_{j\beta}(\mathbf{r}-\boldsymbol{\tau}_j-\mathbf{R}_n) \rangle,$$
where $\alpha$ and $\beta$ denote the orbitals, $\boldsymbol{\tau}$ is the atomic position, and $\mathbf{R}_n$ is the lattice vector. Note that the hopping parameters are local and can be determined by local structures, because hopping is negligible when the atomic distance exceeds a certain cutoff radius [7]. Therefore the Hamiltonian matrix, which is constructed from hopping parameters, can be determined from local structures, and an extendable model for predicting the Hamiltonian is theoretically feasible. In previous studies, Hegde and Bowen proposed a model for predicting Hamiltonians in a small basis set and studied s-orbital Cu and sp3-orbital diamond systems [10]. Schütt et al. developed the SchNOrb model for small, individual molecules with data augmentation [11]. Gu et al. introduced a neural-network (NN) representation of tight-binding Hamiltonians for one-dimensional carbon chains [12]. Compared with these studies, we treat the equivariance appropriately and thus significantly increase the prediction accuracy.
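To make the locality argument above concrete, the following minimal sketch (not the paper's code) stores a hopping block only for atom pairs within a cutoff radius; the per-atom orbital count and the random "hopping" values are hypothetical placeholders.

```python
import numpy as np

def local_hopping_blocks(positions, cutoff, n_orb=13):
    """Keep hopping blocks only for atom pairs within `cutoff`.

    Illustrates the locality that makes an extendable Hamiltonian
    model possible: distant pairs contribute no block at all.
    """
    rng = np.random.default_rng(0)
    blocks = {}
    n = len(positions)
    for i in range(n):
        for j in range(n):
            d = np.linalg.norm(positions[i] - positions[j])
            if d <= cutoff:  # hopping is negligible beyond the cutoff
                blocks[(i, j)] = rng.normal(size=(n_orb, n_orb))
    return blocks

positions = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [10.0, 0.0, 0.0]])
blocks = local_hopping_blocks(positions, cutoff=6.0)
```

With a 6 Bohr-scale cutoff, atoms 0 and 1 share a block while the distant pair (0, 2) does not, so the number of stored blocks grows only linearly with system size.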
Because the Hamiltonian matrix is equivariant under rotations of the coordinate system, predicting hopping parameters is more complicated than predicting measurable properties such as band gaps, which are independent of the choice of coordinates. Although the data augmentation technique can capture rotations [11], it is clearly inefficient. To fix the rotational freedom, Li et al. recently introduced the DeepH model [13], in which a local coordinate (LC) for each atom pair is defined based on the local structure.
The hopping parameters can be trained in the LC and then transformed back to the global coordinate system to construct the Hamiltonian matrix. With atomic-like basis functions, the rotational transformations are well described by the Wigner D-matrix. Nevertheless, more efficient methods for predicting the Hamiltonian are still being explored. For example, in the work of Li et al., the hermiticity of the Hamiltonian is not properly exploited. Moreover, the commonly defined LC is an incomplete descriptor, as we clarify in the following.
In this work, based on the graph neural network architecture and the LC transformation, we develop an extendable model, called graph LC-Net, to predict the hopping parameters and hence obtain the DFT Hamiltonian. In addition to the rotational equivariance, the key advantages of our model over similar works are as follows: the hermiticity of the Hamiltonian is exploited; a complete LC for each atom pair is defined by our new algorithm to avoid the degeneration problem for highly symmetric structures; and a convolutional neural network (CNN) is used to handle the LC information. The node and edge features are updated by message passing with an attention mechanism. The output of the model is the triangular portion of the hopping parameter matrix, and all the hopping parameters are then assembled into the Hamiltonian matrix. Using graphene and SiGe random alloy systems as examples, we demonstrate that our graph LC-Net, trained on small systems, can predict the Hamiltonian, as well as electronic properties such as band structures and densities of states (DOS), for large systems within DFT accuracy but at very little cost: it takes only about 10 seconds to obtain the Hamiltonian of a 1728-atom system on a single GPU card. In addition, the linear scaling of graph LC-Net with the number of atoms makes it very promising for studying the electronic properties of large-scale systems.

Embedding layer
The atomic structure is represented by a graph, in which the nodes represent atoms and the edges represent bonds between atoms. As shown in Fig. 1, the graph LC-Net model consists of an embedding layer, an LC layer, several interaction layers, and an output layer. The inputs include the atomic numbers, the edge distances, and the LC information. In the embedding layer, the atomic numbers are embedded into initial node features, and each edge distance is expanded in a basis of Gaussian functions, where a fixed variance controls the width of each Gaussian.
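The Gaussian expansion of edge distances described above can be sketched as follows; the grid of centers and the variance are hypothetical values chosen for illustration, not the paper's settings.

```python
import numpy as np

def gaussian_expansion(d, centers, var):
    """Expand a scalar distance d into Gaussians exp(-(d - mu)^2 / (2 var)),
    one per center mu, producing a smooth fixed-length edge feature."""
    return np.exp(-(d - centers) ** 2 / (2.0 * var))

centers = np.linspace(0.0, 6.0, 61)        # hypothetical grid of centers
feat = gaussian_expansion(1.42, centers, var=0.01)
```

The feature vector peaks at the center closest to the actual bond length (here 1.42, near the C-C distance in graphene), giving the network a well-conditioned encoding of the distance.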

Local coordinate (LC) layer
To extract the LC feature from the orientational vectors, Li et al. introduced the local coordinate message passing (LCMP) layer [13]. However, the order of the neighbors is not considered in the LCMP layer. To preserve the neighbor-order information, we utilize the CNN architecture used in natural language processing (text-CNN) [15,16], as shown in Fig. 1(c). In analogy with the text-CNN framework, each neighbor is treated as a word in our model. Since the neighbors are sorted, we expect the neural network to learn the order information, which is not considered by the LCMP layer [13]; the text-CNN layer handles this serialized information appropriately. When a different LC is chosen, the input spherical harmonics change, and thus the LC information is learned. We use three windows of sizes 2, 3, and 4, each with 64 filters; for each LC, 192 new features are therefore extracted by the convolution operator. The output is a vector of N_LC elements representing the LC information from the neighbors of atom i. The neighbors of atom j are processed similarly. Finally, the outputs for atoms i and j are concatenated to give the LC feature.
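A minimal numpy sketch of this text-CNN step, with the window sizes (2, 3, 4) and filter count (64) from the text: convolutions slide over the sorted neighbor sequence and max pooling extracts one scalar per filter. The random weights and the 16-dimensional neighbor features are placeholders, not trained parameters.

```python
import numpy as np

def text_cnn(neighbor_feats, filters):
    """1D convolutions over the sorted neighbor sequence with several
    window sizes, each followed by max pooling over positions."""
    outs = []
    for W in filters:                       # W: (window, in_dim, n_filt)
        h, din, nf = W.shape
        n = neighbor_feats.shape[0]
        maps = np.stack([
            np.tensordot(neighbor_feats[s:s + h], W, axes=([0, 1], [0, 1]))
            for s in range(n - h + 1)
        ])                                  # (n - h + 1, n_filt)
        outs.append(maps.max(axis=0))       # max pooling -> (n_filt,)
    return np.concatenate(outs)

rng = np.random.default_rng(1)
neigh = rng.normal(size=(8, 16))            # 8 sorted neighbors, 16-dim input
filters = [rng.normal(size=(h, 16, 64)) for h in (2, 3, 4)]
feat = text_cnn(neigh, filters)
```

Because the windows span consecutive neighbors in the sorted order, permuting the neighbor sequence changes the output, which is exactly how the order information enters the LC feature.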

Interaction layer
After the processing of the embedding layer and the LC layer, the information is passed to multiple stacked interaction layers. In the interaction layer, an attention mechanism [17,18] and skip connections are used to update the node features. Note that our model is a universal framework; the interaction layers could be replaced by other advanced graph-based models for potentially better performance. To utilize the LC features, we simply concatenate the LC feature $s_{ij}$ and the edge feature $e_{ij}$, since they are in one-to-one correspondence. As shown in Fig. 1(d), the node features are updated as
$$v_i^{(l+1)} = v_i^{(l)} + \sum_{k} \alpha_{ik}^{(l)}\, \sigma\!\big(v_k^{(l)} \oplus e_{ik}^{(l)}\big),$$
where k denotes a neighbor of i, l refers to the l-th interaction layer, and $\sigma$ denotes a dense layer.
The attention coefficients are computed as
$$\alpha_{ik} = \mathrm{softmax}_k\Big(\mathrm{LeakyReLU}\big(a^{T}\,[\,v_i \oplus v_k \oplus e_{ik}\,]\big)\Big),$$
where $a^{T}$ is a learnable weight vector, LeakyReLU [19] is an activation function, and $\oplus$ denotes concatenation. After the node features are updated, the edge features are updated from the features of the connected nodes,
$$e_{ij}^{(l+1)} = \text{h-swish}\Big(\sigma\big(v_i^{(l+1)} \oplus v_j^{(l+1)} \oplus e_{ij}^{(l)}\big)\Big),$$
where h-swish is the hard version of the swish activation function [20].
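The attention update above can be sketched with plain numpy; this is a GAT-style toy consistent with the description, in which the dense layers are replaced by bare weight matrices and all weights are random placeholders rather than the paper's trained parameters.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def update_node(v_i, v_neigh, e_neigh, a, W):
    """One attention-weighted message-passing step with a skip connection."""
    scores = np.array([
        leaky_relu(a @ np.concatenate([v_i, v_k, e_ik]))
        for v_k, e_ik in zip(v_neigh, e_neigh)
    ])
    alpha = softmax(scores)                        # attention coefficients
    msg = sum(al * (W @ np.concatenate([v_k, e_ik]))
              for al, v_k, e_ik in zip(alpha, v_neigh, e_neigh))
    return v_i + msg                               # skip connection

rng = np.random.default_rng(2)
dim = 8
v_i = rng.normal(size=dim)
v_neigh = rng.normal(size=(4, dim))                # 4 neighbor node features
e_neigh = rng.normal(size=(4, dim))                # matching edge features
a = rng.normal(size=3 * dim)                       # attention weight vector
W = rng.normal(size=(dim, 2 * dim))                # stand-in dense layer
v_new = update_node(v_i, v_neigh, e_neigh, a, W)
```

Stacking several such layers lets each node feature absorb information from progressively larger neighborhoods, which is what the interaction layers do in graph LC-Net.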

Output layer
The output layer consists of two stacked dense layers with h-swish, whose output is the upper-triangular portion of the hopping matrix of the pair ij. Since the Hamiltonian is Hermitian, the model does not need to output all matrix elements. However, an individual hopping matrix is in general not Hermitian, i.e., $h_{ij} \neq h_{ij}^{\dagger}$, so training in this naive way will not converge. To overcome this problem, the hopping matrices are divided into two groups according to the distances of the neighbors, as described in the Methods section. The two groups are trained separately, yielding two models for the hopping parameters, called the forward model and the backward model, respectively. Finally, the Hamiltonian matrix is constructed from all the hopping matrices, as shown in Fig. 3.
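The assembly step can be illustrated with a minimal sketch, assuming a real basis so that hermiticity reduces to $H_{ji} = H_{ij}^{T}$: only one block per unordered atom pair is stored, and the reverse block is filled in by transposition. The block sizes and random values are placeholders.

```python
import numpy as np

def assemble_hamiltonian(blocks, n_atoms, n_orb):
    """Assemble a dense Hermitian Hamiltonian from per-pair hopping blocks,
    filling the (j, i) block from the (i, j) block by transposition."""
    H = np.zeros((n_atoms * n_orb, n_atoms * n_orb))
    for (i, j), h in blocks.items():
        H[i*n_orb:(i+1)*n_orb, j*n_orb:(j+1)*n_orb] = h
        H[j*n_orb:(j+1)*n_orb, i*n_orb:(i+1)*n_orb] = h.T
    return H

rng = np.random.default_rng(3)
n_orb = 3
sym = lambda m: (m + m.T) / 2          # onsite blocks are symmetric
blocks = {
    (0, 0): sym(rng.normal(size=(n_orb, n_orb))),
    (1, 1): sym(rng.normal(size=(n_orb, n_orb))),
    (0, 1): rng.normal(size=(n_orb, n_orb)),   # offsite block, one direction
}
H = assemble_hamiltonian(blocks, n_atoms=2, n_orb=n_orb)
```

By construction the assembled matrix is exactly symmetric, which is why the model only needs to predict one direction of each hopping pair.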

Loss function
The graph LC-Net is trained by minimizing the difference between the predicted hopping parameters $h_{ij}$ and the DFT-calculated values $\hat{h}_{ij}$. We note that the hopping parameters are highly imbalanced: most targets are close to zero, while a few are larger than 10 eV. To better handle the imbalanced data, a regression version of the focal loss [21,22] is used, in which the error of each output hopping parameter y is re-weighted by a factor controlled by two adjustable hyper-parameters $\beta$ and $\gamma$, so that the abundant near-zero targets are down-weighted.
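A hedged sketch of such a focal-style regression loss: the absolute error is re-weighted so that the many near-zero, easy targets contribute less than the few large hopping parameters. The tanh re-weighting is a stand-in; the exact functional form used in the paper may differ, and beta, gamma correspond to the adjustable hyper-parameters mentioned above.

```python
import numpy as np

def focal_regression_loss(pred, target, beta=1.0, gamma=2.0):
    """Re-weighted mean absolute error: small errors get small weights,
    so abundant easy (near-zero) targets are down-weighted."""
    err = np.abs(pred - target)
    weight = np.tanh(beta * err) ** gamma   # -> 0 for easy targets
    return np.mean(weight * err)

pred = np.array([0.001, 0.002, 9.5])
target = np.array([0.0, 0.0, 10.0])        # two easy targets, one hard
plain_mae = np.abs(pred - target).mean()
focal = focal_regression_loss(pred, target)
```

Compared with the plain MAE, the focal-style loss concentrates the gradient signal on the hard, large-magnitude hopping parameters.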

Model evaluation
With the above architecture for learning the Hamiltonian in hand, we turn to real systems to demonstrate the feasibility of learning electronic properties using NN methods. The goal is to map the local atomic structures to the hopping parameters.

Degeneration problem with local symmetry
The first step in training a Hamiltonian model is to define the LC for each atom pair. In the previous study [13], the LC is determined by the nearest neighbor n1 and the second-nearest neighbor n2. However, this method only works for perturbed structures, in which the nearest and second-nearest neighbors are unique. For highly symmetric local structures, for instance defect-free crystals, the above method fails to determine a unique LC, since the nearest neighbor cannot be determined uniquely by distances alone, and this results in data conflicts. Specifically, when two neighbors of an atom are equidistant, the atoms n1 and n2 are indistinguishable. This is also known as the degeneration problem [23]. Similar problems can occur for the off-site hopping parameters. As a result, due to data conflicts, the errors for highly symmetric structures are significantly larger, even in the training set, than those for randomly perturbed structures. We find that this problem can be solved by utilizing information from the global coordinate system to define a complete LC for each edge; the algorithm is provided in the Methods section.
To illustrate the feasibility of the new algorithm, we test the performance of the different methods on a perfect graphene structure and a zinc-blende SiGe alloy. We first calculate the LC for each edge using the original method of DeepH [13], and then perform the calculations with our new algorithm. The learning curves for graphene and SiGe are shown in Figs. 4(a) and 4(b), respectively.

Graphene dataset
We create a graphene dataset to compare the performance of our graph LC-Net with the DeepH model proposed in Ref. 13. We perform a molecular dynamics (MD) simulation of a 6×6×1 graphene structure for 5 ps to generate the dataset; the computational details are consistent with those of DeepH.
We then sample 500 structures as the training set and another 500 structures as the validation set. We find that, on the validation set, the MAEs of the forward and backward models are 3.5 meV and 1.9 meV, respectively, as shown in Fig. 5, and the predicted band structures and DOS agree well with the DFT results.

SiGe random alloy dataset
We demonstrate the feasibility of our method on the three-dimensional SiGe random alloy. The SiGe random alloy dataset is generated by randomly occupying the zinc-blende lattice sites with Si or Ge atoms. The number of possible combinations in a supercell with N sites is given by the binomial coefficient C(N, N/2), which becomes incredibly large as the total atom number increases.
Note that most of these structures are identical under certain symmetry operations. We therefore develop a scheme to filter out the identical structures when constructing the SiGe random alloy training set.
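The filtering idea can be sketched with a toy model: for a 1-D ring of N sites, two Si/Ge occupations are considered identical when related by a translation or reflection, and each class is represented by its lexicographically smallest variant. This canonical-form trick is only an illustration; the paper's scheme operates on the zinc-blende lattice with its full set of symmetry operations.

```python
import itertools

def canonical(config):
    """Smallest tuple among all translations and reflections of config."""
    n = len(config)
    variants = []
    for c in (config, config[::-1]):                   # reflection
        variants += [c[i:] + c[:i] for i in range(n)]  # translations
    return min(variants)

def unique_configs(n_sites):
    """Distinct half-half occupations of a ring, up to symmetry."""
    seen = set()
    for ones in itertools.combinations(range(n_sites), n_sites // 2):
        config = tuple(1 if i in ones else 0 for i in range(n_sites))
        seen.add(canonical(config))
    return seen

# C(6, 3) = 20 raw half-half occupations collapse to far fewer classes
classes = unique_configs(6)
```

For a 6-site ring the 20 raw occupations collapse to just 3 symmetry classes, showing how drastically such filtering can shrink the training set.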

DISCUSSION
The major advantage of graph LC-Net over DFT calculations is computational efficiency. The computational cost of graph LC-Net scales linearly with the number of atoms, compared with the cubic scaling of DFT. For the 6×6×6 supercell with 1728 atoms, graph LC-Net takes only 10.5 seconds to predict the Hamiltonian. For comparison, the DFT self-consistent calculation takes about 4800 seconds on 96 CPU cores, depending on the number of iteration steps. In this work, graph LC-Net is trained on one GeForce RTX 3090 card, and the DFT calculations are performed on Intel Xeon Platinum 9242 CPUs.
With DFT accuracy and very high computational efficiency, graph LC-Net can be applied to study the electronic properties of very large systems, such as high-entropy alloys [24] or defect structures [25]. Graph LC-Net is also expected to accelerate the inverse design of materials with targeted electronic properties [26,27], such as band gaps, where DFT calculations are performed thousands of times for structure relaxation and property evaluation. The structure relaxations can be efficiently performed using NN potential models [28], and our graph LC-Net for the Hamiltonian can be used to evaluate the electronic properties, avoiding the computationally expensive DFT calculations.
In conclusion, based on the graph neural network architecture and using local atomic structure information, we have developed graph LC-Net to predict the hopping parameters and hence obtain the DFT Hamiltonian. We have designed the complete LC algorithm for each edge to fix the rotational freedom of the Hamiltonian matrix, and we exploit the hermiticity of the Hamiltonian to improve the efficiency.
By employing graphene and SiGe random alloys as examples, we have demonstrated the high accuracy, extensibility, and efficiency of graph LC-Net in predicting the Hamiltonian and electronic properties. This work thus opens doors for studying the electronic properties of large-scale systems using machine learning methods.

DFT calculation
Before calculating the hopping parameters, all structures are relaxed in fixed cells using the Vienna ab initio Simulation Package [29]. Projector augmented wave (PAW) [30] pseudopotentials are used, and the plane-wave energy cutoff is set to 450 eV. The generalized gradient approximation (GGA) parameterized by Perdew, Burke, and Ernzerhof (PBE) [31] and the local spin density approximation (LSDA) of Ceperley-Alder (CA) [32] are used as the exchange-correlation functional for graphene and the SiGe alloy, respectively. The relaxation is stopped when the change of the total energy between two ionic steps is smaller than 0.001 eV.
We then employ numerical pseudo-atomic orbitals (PAOs), as implemented in the OpenMX software [33], to perform the DFT calculations for the hopping parameters. A PAO is given by the product of a radial function R and a real spherical harmonic Y,
$$\phi_{\alpha}(\mathbf{r}) = R_{l}(r)\, Y_{lm}(\hat{\mathbf{r}}),$$
where R is defined numerically and vanishes beyond a cutoff radius. The Hamiltonian and overlap matrices are given by
$$H_{i\alpha,j\beta}(\mathbf{R}_n) = \langle \phi_{i\alpha}(\mathbf{r}-\boldsymbol{\tau}_i) \,|\, \hat{H} \,|\, \phi_{j\beta}(\mathbf{r}-\boldsymbol{\tau}_j-\mathbf{R}_n) \rangle, \qquad S_{i\alpha,j\beta}(\mathbf{R}_n) = \langle \phi_{i\alpha}(\mathbf{r}-\boldsymbol{\tau}_i) \,|\, \phi_{j\beta}(\mathbf{r}-\boldsymbol{\tau}_j-\mathbf{R}_n) \rangle,$$
where i and $\alpha$ denote the atom and orbital, respectively, $\boldsymbol{\tau}$ is the atomic position, and $\mathbf{R}_n$ is the lattice vector. According to the principle of locality, the hopping parameters are determined by the local structures of atoms i and j within a certain cutoff radius. For graphene, C6.0-s2p2d1 PAOs with a cutoff radius of 6.0 Bohr are used as the basis functions, and Monkhorst-Pack k-meshes of 5×5×1 are used. For the SiGe random alloy, Si7.0-s2p2d1 and Ge7.0-s2p2d1 PAOs are used as the basis functions, with the cutoff radius set to 7.0 Bohr. For supercells up to 4×4×4, the k-points are generated using the automatic scheme to ensure that the spacing between k-points is smaller than 0.16 Å⁻¹. For the 5×5×5 and larger supercells, a single Γ point is used.
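Once H(k) and S(k) are assembled, the band energies follow from the generalized eigenvalue problem $H v = E S v$. A numpy-only sketch using a Cholesky reduction of S, with toy 2×2 matrices (the hopping value is a placeholder, loosely inspired by the graphene nearest-neighbor scale):

```python
import numpy as np

def solve_bands(H, S):
    """Solve H v = E S v by reducing to a standard Hermitian problem
    via the Cholesky factorization S = L L^T."""
    L = np.linalg.cholesky(S)
    Linv = np.linalg.inv(L)
    Hs = Linv @ H @ Linv.conj().T      # standard Hermitian problem
    E, W = np.linalg.eigh(Hs)          # ascending eigenvalues
    V = Linv.conj().T @ W              # back-transform eigenvectors
    return E, V

H = np.array([[0.0, -2.7], [-2.7, 0.0]])   # toy two-orbital hopping block
S = np.array([[1.0, 0.1], [0.1, 1.0]])     # slight orbital overlap
E, V = solve_bands(H, S)
```

Each column of V satisfies the generalized eigenproblem with its paired eigenvalue, so a full band structure is obtained by repeating this at every k-point.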

Local coordinate (LC)
First, consider the case in which the nearest neighbor and the second-nearest neighbor are distinct atoms.
We use two noncollinear vectors, v1 and v2, to define the LC. For onsite hopping (i = j), the two vectors are determined by the nearest neighbor n1 and the second-nearest neighbor n2 of atom i, i.e., $\mathbf{v}_1 = \mathbf{r}_{n_1} - \mathbf{r}_i$ and $\mathbf{v}_2 = \mathbf{r}_{n_2} - \mathbf{r}_i$.
For offsite hopping (i ≠ j), we first calculate d_min(i), the distance from atom i to its nearest neighbor. To utilize the hermiticity, the LCs are defined in two different ways according to the distances, so that the edges ij and ji share the same LC. When several candidate vectors are equidistant, the degeneracy is resolved as follows: calculate the rotation matrix R that transforms v1 to (1, 0, 0); then, for each candidate v in the v2 list, compute v in the rotated coordinates and the angle θ between v and an auxiliary vector v_a, and select the candidate with the smallest θ. The auxiliary vector should be chosen to avoid ambiguities, i.e., no two vectors in the v2 list may share the same θ. In this work, v_a = (3.4, 4.5, 5.6) is used.
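The tie-breaking step can be sketched as follows. The auxiliary vector v_a = (3.4, 4.5, 5.6) follows the text; the Rodrigues-formula rotation helper and the candidate vectors are simplified stand-ins for the paper's implementation.

```python
import numpy as np

VA = np.array([3.4, 4.5, 5.6])            # auxiliary vector from the text

def rotation_to_x(v1):
    """Rotation matrix taking unit(v1) to (1, 0, 0) (Rodrigues form)."""
    a = v1 / np.linalg.norm(v1)
    b = np.array([1.0, 0.0, 0.0])
    v = np.cross(a, b)
    c = a @ b
    if np.isclose(c, -1.0):               # antiparallel special case
        return np.diag([-1.0, -1.0, 1.0])
    K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + K + K @ K / (1.0 + c)

def angle(u, w):
    cosang = u @ w / (np.linalg.norm(u) * np.linalg.norm(w))
    return np.arccos(np.clip(cosang, -1.0, 1.0))

def pick_v2(v1, candidates):
    """Among equidistant candidates, pick the one whose rotated image
    makes the smallest angle with the auxiliary vector VA."""
    R = rotation_to_x(v1)
    return min(candidates, key=lambda v: angle(R @ v, VA))

v1 = np.array([0.0, 0.0, 2.0])
cands = [np.array([1.0, 0.0, 0.0]), np.array([-1.0, 0.0, 0.0]),
         np.array([0.0, 1.0, 0.0])]       # three equidistant candidates
v2 = pick_v2(v1, cands)
```

Because v_a has three unequal, irrationally related components, distinct candidates almost never share the same angle, so the selected v2 (and hence the LC) is unique even in highly symmetric structures.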

Rotation of Hamiltonian
The rotation transformation in the real spherical harmonics basis is rather complicated [34]. For convenience, we first convert the real spherical harmonics basis into the complex spherical harmonics basis, in which the rotation transformation R can be carried out with the Wigner D-matrix,
$$H'_{i\alpha,j\beta} = \sum_{a,b} D_{\alpha a}(R)\, H_{ia,jb}\, D^{*}_{\beta b}(R),$$
where i, j denote atoms and $\alpha$, $\beta$, a, b denote orbitals. Finally, the hopping parameters are converted back to the real spherical harmonics basis.
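A minimal illustration of the equivariance being enforced: under a rotation whose orbital representation matrix is D, a hopping block transforms as $H' = D H D^{\dagger}$, which leaves the spectrum unchanged. Here a random orthogonal matrix stands in for the (real-basis) representation matrix of a single l-block; it is not a Wigner-D construction.

```python
import numpy as np

rng = np.random.default_rng(4)
D, _ = np.linalg.qr(rng.normal(size=(3, 3)))    # random orthogonal stand-in
H = rng.normal(size=(3, 3))
H = (H + H.T) / 2                               # symmetric onsite block
H_rot = D @ H @ D.T                             # rotated block H' = D H D^T
```

Since D is orthogonal, the eigenvalues of H and H_rot coincide, which is the invariance that the LC transformation exploits: the network learns blocks in a canonical frame and rotation matrices restore the global-frame Hamiltonian.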

Figure 1. The architecture of graph LC-Net. (a) Graph LC-Net consists of an embedding layer, an LC layer, five interaction layers, and an output layer; the inputs are the atomic numbers, the edge distances r_ij, and the LC information y_ij. In the embedding layer, the initial node features are generated from the atomic numbers.

A window of h neighbors produces a feature: applying the convolution operator to the input orientational vectors generates a new feature map, and max pooling then extracts a scalar from each feature map. These features are passed to a dense layer to generate the output vector s_i of N_LC elements. The neighbors of atom j are processed similarly, and the outputs are finally concatenated to give the LC feature vector. We use an example to further illustrate how the text-CNN extracts LC information. Consider the local structure around atom i in Fig. 2(a): the LC is defined on the edge ij, atoms 1-4 are neighbors of atom i, and the input of the LC layer is their spherical harmonics. In analogy with the text-CNN, each neighbor is treated as a word.

Figure 2. Illustration of using the CNN to extract LC information. (a) The local structure of atom i. (b) Schematic of the text-CNN.

Figure 3. The hopping parameters are divided into two groups according to the definition of the LC; a forward model and a backward model are trained to construct graph LC-Net. In this work, s2p2d1 PAOs are used as basis functions, and the hopping parameters for each atom pair ij are represented by a 13×13 matrix. The mean absolute error (MAE) between the DFT-calculated and neural-network hopping parameters is used to evaluate our method. The electronic properties, including band structures and DOS, are then calculated for comparison.
The training loss of the model with the original LC is significantly larger than that with our complete LC.

Figure 4. The learning curves of predicting the hopping parameters with different methods for the LC, shown in panels (a) and (b). Our result is comparable with that of DeepH (0.4 meV ~ 8.5 meV). The element-wise error distributions are shown in Fig. 5(c), and the band structures and DOS in Fig. 5(d). Note that the authors of DeepH train 169 models for the hopping parameters to reconstruct the Hamiltonian of the graphene dataset, whereas we train only 2 models to obtain all the hopping parameters, demonstrating the efficiency of our method.

Figure 5. Model evaluation with the 6×6×1 graphene MD dataset. (a) The learning curve of the forward model.

Figure 6. (a) Illustration of the two local structures of atom 0. The empty and full circles denote elemental types A and B, respectively.

Figure 7. Model evaluation with the SiGe random alloy validation set. (a) The MAE results of the hopping parameters.

The overlap parameters can be computed directly from the atomic structure, without DFT calculations. The eigenvalues $E_{\mathbf{k}}$ and eigenstates $v_{\mathbf{k}}$ of a system are then obtained by solving the generalized eigenvalue problem $H(\mathbf{k})\, v_{\mathbf{k}} = E_{\mathbf{k}}\, S(\mathbf{k})\, v_{\mathbf{k}}$.

n1 denotes the nearest neighbor of atom i.

The function find_v2() calculates the angle between each candidate v2 and the auxiliary vector v_a, and chooses the candidate with the smallest angle. Here l is the angular quantum number and m is the magnetic quantum number. The LC layer is used to obtain the LC information for each edge ij, which is uniquely determined by the local structures of atoms i and j; to utilize the hermiticity of the Hamiltonian, the edges ij and ji must be in the same LC. The LC information is represented by real spherical harmonics and distances: for the edge ij, the neighbors of atom i, denoted by k, give the orientational vectors used as input, and the neighbors of atom j are treated likewise. In the LC algorithm, when the nearest and second-nearest neighbors are unique, v1 and v2 are determined by the first and second neighbors of atom i, respectively; otherwise, candidate vectors for v1 are collected and disambiguated as described above.