Equivariant Neural Networks for Spin Dynamics Simulations of Itinerant Magnets

I present a novel equivariant neural network architecture for large-scale spin dynamics simulations of the Kondo lattice model. The network consists mainly of tensor-product-based convolution layers and ensures two equivariances: under translations of the lattice and under rotations of the spins. I implement equivariant neural networks for Kondo lattice models on two-dimensional square and triangular lattices, and perform training and validation. In the equivariant model for the square lattice, the validation error (based on the root mean squared error) is less than one-third of that of a model using invariant descriptors as inputs. Furthermore, I demonstrate that dynamics simulations with the trained model reproduce the phase transitions of skyrmion crystals on the triangular lattice.

Recently, surrogate models that speed up and scale up simulations by replacing the exact-diagonalization (ED) calculation of the energy with neural networks (NNs) have appeared [17]. This framework is similar to machine-learning interatomic potentials, which allow large-scale molecular dynamics simulations with the accuracy of density-functional theory [18][19][20][21][22]. The previous study [17] uses SO(3)-invariant descriptors (dot products b_jk = S_{r_j} · S_{r_k} and scalar triple products χ_jmn = S_{r_j} · (S_{r_m} × S_{r_n})) as inputs, which ensures that the energy of the itinerant electrons, E_el, is invariant under rotations of the spins. This corresponds to traditional Behler-Parrinello-type machine-learning interatomic potentials [18], which use invariant descriptors as input.
In recent years, the concept of equivariance has gained attention in deep learning, and its practical applications have advanced in fields such as materials science and computer vision [23][24][25][26][27][28][29]. Equivariance is the property of a function or model that its output transforms consistently with transformations applied to its input: when a transformation such as a rotation or translation is applied to the input, the output undergoes the corresponding transformation. This property is particularly useful in fields such as physics and computer vision, as it allows models to maintain a coherent relationship between input and output despite changes in the input's structure or orientation.
In this paper, I present an equivariant convolutional neural network (ECNN) architecture as a surrogate model for the itinerant-electron calculation in the Kondo lattice model. I focus on two operations: translations of the lattice and rotations of the spins. Convolutional neural networks (CNNs) are equivariant with respect to translations [23], and the tensor-product expansion using Clebsch-Gordan coefficients is equivariant with respect to rotations [24,28,30]. By integrating these architectures, I define a convolutional layer that is equivariant to both operations. For the square lattice, this ECNN demonstrates superior predictive performance, with a validation error (root mean square error) approximately 1/3 that of a fully connected neural network using invariant descriptors. Moreover, for the triangular lattice, I conduct dynamics simulations and confirm that the model reproduces the phase transitions of the skyrmion lattices.
Here, S_r are (classical) localized spins with a fixed length |S_r| = 1 and S = {S_r} (for lattice size N) is the set of all localized spins, c†_{rα} (c_{rα}) are creation (annihilation) operators of an itinerant electron at site r with spin α, s_r = (1/2) Σ_{αβ} c†_{rα} σ_{αβ} c_{rβ} are the itinerant-electron spins, and σ = (σ^x, σ^y, σ^z) is the vector of Pauli matrices. The first term of eq. (2) represents the kinetic energy of the itinerant electrons, and the second term represents the Hund coupling between the itinerant-electron spins s_r and the localized spins S_r. H_s in eq. (3) includes only classical spin terms. In this paper, I consider only the Zeeman coupling to an external magnetic field H_z along the z direction. However, I note that this term does not affect the input/output relations of the neural networks, and H_s can include any localized-spin-only term.
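Because H_el is quadratic in the electron operators, E_el for a fixed spin configuration follows from diagonalizing a 2N × 2N single-particle matrix. The following is a minimal sketch for a one-dimensional periodic chain; the sign conventions of t and J and the helper names (`kondo_single_particle_h`, `electron_energy`) are illustrative choices of mine, not taken from the paper.

```python
import numpy as np

# Pauli matrices entering the Hund term via s_r = (1/2) c^dag sigma c
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def kondo_single_particle_h(S, t=1.0, J=-1.0):
    """Single-particle matrix of H_el for a periodic 1D chain of N sites:
    nearest-neighbor hopping t plus an on-site Hund block (J/2) S_r . sigma.
    Returns a Hermitian (2N, 2N) array in the (site, spin) basis."""
    N = len(S)
    hop = np.zeros((N, N))
    for r in range(N):
        hop[r, (r + 1) % N] = t
        hop[(r + 1) % N, r] = t
    h = np.kron(hop, np.eye(2, dtype=complex))
    for r, s in enumerate(S):
        h[2 * r:2 * r + 2, 2 * r:2 * r + 2] += 0.5 * J * (s[0] * sx + s[1] * sy + s[2] * sz)
    return h

def electron_energy(S, mu=0.0, t=1.0, J=-1.0):
    """Zero-temperature E_el: sum of single-particle levels below mu."""
    eps = np.linalg.eigvalsh(kondo_single_particle_h(S, t, J))
    return float(eps[eps < mu].sum())
```

For ferromagnetic spins along z, the Hund block reduces to (J/2) σ_z, so the band 2t cos k simply splits by ±J/2, which gives a quick correctness check.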

Equivariance
In condensed matter physics, many physical properties are described by tensors, and their tensor properties are governed by the symmetries of the physical system. Equivariance gives a general way to represent such tensor properties and tensor operations (e.g. vector addition, dot products, and cross products) [24,25]. A function f : X → Y (for vector spaces X and Y) is equivariant with respect to a group G and group representations D_X(g) and D_Y(g) if f(D_X(g) x) = D_Y(g) f(x) holds for all g ∈ G and x ∈ X. In dealing with classical spins, it is necessary to discuss equivariance with respect to SO(3) [24]. Spherical harmonics Y_{l,m}(r̂) (where l = 0, 1, ... is the degree, m = −l, −l+1, ..., l is the order, and r̂ is a unit vector) play an important role, because Y_{l,m}(r̂) is equivariant to SO(3). In other words, for any g ∈ SO(3), Y_{l,m}(R(g) r̂) = Σ_{m′} D_{l,mm′}(g) Y_{l,m′}(r̂) holds. Here, R are the rotation matrices and D_{l,mm′} are the Wigner D-matrices [31].
What is even more important is that the tensor product of irreducible representations of SO(3) can be decomposed into new irreducible representations using Clebsch-Gordan coefficients. Specifically, the tensor product of two irreducible representations u_{l1} of degree l1 and v_{l2} of degree l2 decomposes into a direct sum of irreducible representations of degrees |l1 − l2| through l1 + l2. This decomposition is expressed as

(u_{l1} ⊗ v_{l2})_{l,m} = Σ_{m1,m2} C^{l,m}_{l1,m1,l2,m2} u_{l1,m1} v_{l2,m2}.

Here, C^{l,m}_{l1,m1,l2,m2} = ⟨l, m | l1, m1; l2, m2⟩ are the Clebsch-Gordan coefficients. To simplify notation, this operation is sometimes written as l1 ⊗ l2 → l. The tensor product is also equivariant. Interestingly, the tensor products include fundamental vector operations.
For example, 1 ⊗ 1 → 0 and 1 ⊗ 1 → 1 correspond to the dot and cross products of vectors, respectively [24].This fact suggests that the tensor product is well-suited for describing classical spin models.
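This correspondence can be made concrete in the Cartesian basis, where the Clebsch-Gordan decomposition 1 ⊗ 1 → 0 ⊕ 1 ⊕ 2 is the familiar splitting of a rank-2 tensor into its trace, antisymmetric, and symmetric-traceless parts. The sketch below (an illustration of mine, not code from the paper) checks that the l = 0 part carries the dot product and the l = 1 part the cross product.

```python
import numpy as np

def tensor_product_decompose(u, v):
    """Cartesian form of the SO(3) decomposition 1 x 1 -> 0 + 1 + 2:
    split the rank-2 tensor u v^T into trace (l=0), antisymmetric (l=1),
    and symmetric-traceless (l=2) parts."""
    T = np.outer(u, v)
    l0 = np.trace(T) / 3.0 * np.eye(3)   # l = 0: carries u . v
    l1 = 0.5 * (T - T.T)                 # l = 1: encodes u x v
    l2 = 0.5 * (T + T.T) - l0            # l = 2: symmetric traceless
    return l0, l1, l2
```

The three parts transform independently under rotations, which is exactly why the tensor product layers can build dot- and cross-product-like features without hand-crafted descriptors.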

Equivariant Neural Networks Potential for Spin Configurations
To simulate localized spin configurations (e.g. Monte Carlo sampling, simulated annealing, and Landau-Lifshitz-Gilbert (LLG) simulation), the energy E(S) or the effective magnetic fields B_r(S) = −∂E/∂S_r are usually needed. For concreteness, I focus on the (stochastic) LLG simulation. In the adiabatic limit, the time evolution of the spins at finite temperature is governed by the stochastic LLG equation

∂S_r/∂t = −S_r × (B_r + ζ_r) + α S_r × ∂S_r/∂t.

Here, ζ_r are Gaussian stochastic fields representing thermal fluctuations and α is the Gilbert damping constant. The energy E(S) can be divided into E = E_el + E_s as in eq. (1). The classical spin energy E_s = ⟨H_s⟩ is easily computed; however, the itinerant-electron energy E_el = ⟨H_el⟩ requires exact diagonalization of H_el, whose computational cost is ordinarily O(N³). Here, ⟨·⟩ is the expectation value for the itinerant electrons.
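A single time step of such a simulation can be sketched as follows. This is a minimal explicit integrator with my own sign conventions and function name (`llg_step`), not the paper's integrator; the damping term rotates each spin toward its local field and renormalization keeps |S_r| = 1.

```python
import numpy as np

def llg_step(S, B, dt, alpha, rng=None, T=0.0):
    """One explicit Landau-Lifshitz step for unit spins S (shape (N, 3))
    in effective fields B (same shape). Thermal noise is added to B when
    T > 0; the damping term -alpha S x (S x B) drives alignment with B."""
    if T > 0.0 and rng is not None:
        # Gaussian stochastic fields zeta_r, variance ~ 2 alpha T / dt
        B = B + rng.normal(scale=np.sqrt(2.0 * alpha * T / dt), size=B.shape)
    torque = np.cross(S, B)
    S_new = S + dt * (-torque - alpha * np.cross(S, torque))
    # project back onto the unit sphere (|S_r| = 1 constraint)
    return S_new / np.linalg.norm(S_new, axis=-1, keepdims=True)
```

At T = 0 the damping monotonically increases S_r · B_r, so a spin in a static field relaxes toward the field direction while precessing around it.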
In the NN model, I assume locality for the total energy of the itinerant electrons, E_el(S), and decompose it into local contributions [17] as E_el(S) = Σ_r ε(S_r). The stacked layers map each local spin environment through equivariant convolutions with degree-l feature outputs, nonlinear layers, and self-interaction layers, finally producing an energy output that is invariant to translation and rotation operations.

equivariant convolution layer
The equivariant convolution layers L are given by

V′_{r,c_o,l_o,m_o} = Σ_{r′∈N(r)} Σ_{c_i,l_i,l_f} W^{c_o,l_o}_{r′−r,c_i,l_i,l_f} Σ_{m_i,m_f} C^{l_o,m_o}_{l_i,m_i,l_f,m_f} V_{r′,c_i,l_i,m_i} Y_{l_f,m_f}(S_r).

Here, V_{r,c,l,m} and V′_{r,c,l,m} are the input and output feature vectors of the layer L at position r, channel c, degree l, and order m, and V = {V_{r,c,l,m}} is the set of feature vectors. The subscripts i, f, and o denote "input", "filter", and "output", respectively. W^{c_o,l_o}_{r′−r,c_i,l_i,l_f} are the neural network parameters of the equivariant convolutions, Y_{l,m} are the spherical harmonics, and N(r) is the set of positions around position r. In usual CNNs, the main layers consist of filters (also called kernels) that slide over the image (or feature map), performing element-wise multiplication and summing the results; this operation guarantees translational equivariance [23]. The equivariant convolution in this paper guarantees rotational equivariance for the spins and translational equivariance for the lattice by replacing the element-wise multiplication with the tensor product involving the spherical harmonics of the central spin. In this paper, N(r) is defined as the set consisting of the center r itself and the positions connected to r through direct hopping terms, i.e., N(r) = {r′ | r′ = r or t_{rr′} ≠ 0}. Figure 2 shows LSEs for the case of a square lattice with nearest-neighbor hopping and a triangular lattice with both nearest-neighbor and third-nearest-neighbor hopping. I use the e3nn library [33,34], which is based on PyTorch [35], to implement the tensor product.
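The translational part of this construction can be checked in isolation. Below is a toy scalar (l = 0, single-channel) lattice convolution with periodic boundaries; `lattice_conv` and its kernel are illustrative stand-ins of mine, not the paper's implementation.

```python
import numpy as np

def lattice_conv(V, W, neighbors):
    """Scalar lattice convolution out[r] = sum_d W[d] * V[r + d] over the
    kernel offsets `neighbors`, with periodic boundaries via np.roll.
    Because the weights depend only on the offset r' - r, the operation
    commutes with lattice translations."""
    out = np.zeros_like(V)
    for d, w in zip(neighbors, W):
        out += w * np.roll(V, shift=(-d[0], -d[1]), axis=(0, 1))
    return out
```

Shifting the input and then convolving gives the same result as convolving and then shifting, which is the translational equivariance claimed above.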

activation function
In usual neural networks, nonlinear activation functions η(x) are applied to each component of the feature vector x. However, to maintain equivariance, our model requires the transformation

V_{r,c,l,m} → η(V̄_{r,c,l}) V_{r,c,l,m},  with  V̄_{r,c,l} := Σ_m |V_{r,c,l,m}|²,

so that the nonlinearity acts only on the rotation-invariant norm of each degree-l block [24]. I use the swish activation function [36,37] for differentiability in this paper.

self-interaction

Self-interaction layers scale the feature vectors elementwise and mix the components of the feature vectors at each point [24,38]. To maintain equivariance, the same weights are used for every m.
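The equivariance of a norm-gated activation can be verified directly: because the gate depends only on a rotation-invariant norm, the activation commutes with rotations. A minimal numpy sketch, gating with the norm of each block (`gated_activation` is my own name; the paper's implementation uses e3nn):

```python
import numpy as np

def swish(x):
    """swish(x) = x * sigmoid(x), smooth and differentiable everywhere."""
    return x / (1.0 + np.exp(-x))

def gated_activation(V):
    """Norm-gated nonlinearity for an equivariant feature of fixed degree l
    (shape (..., 2l+1)): scale the whole block by swish of its norm, so
    only the rotation-invariant magnitude passes through the nonlinearity."""
    norm = np.linalg.norm(V, axis=-1, keepdims=True)
    return swish(norm) * V
```

For an l = 1 feature (an ordinary 3-vector), applying a rotation before or after the activation gives identical results, since the norm is unchanged by the rotation.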

Performance in Square Lattice
To evaluate the performance on the square lattice, I train the ECNN and compare it with the invariant-descriptor-based fully connected neural network (IFNN); the validation results are shown in Fig. 3 and discussed below. The much smaller validation error of the ECNN suggests that its inductive bias is much stronger than that of the IFNN model.
To investigate the scalability of the ECNN model, I examine the computation time when varying the lattice size of the input spin configurations. Both the ECNN predictions and the ED calculations are performed on a single NVIDIA Tesla A100 (40 GB) GPU. Figure S1 shows the average computation time over 10 runs for ECNN predictions and ED calculations. For small lattice sizes, the ECNN prediction time shows almost no size dependence, while for larger lattice sizes it exhibits roughly linear scaling. Even for a very large lattice size of 576 × 576, the energy and effective magnetic field can be calculated in about one second. Comparing the ECNN prediction time to the ED calculation time, there is approximately a 700-fold speed-up at 128 × 128, the largest lattice size that can be calculated using ED. Although lattice sizes larger than 576 × 576 could not be handled in this study due to GPU memory limitations, larger sizes should be accessible with multiple GPUs by dividing the problem based on the assumption of locality. It should be noted that when the ECNN model trained on 32 × 32 data is evaluated on all sizes that are computationally feasible with ED, the prediction error does not worsen with lattice size.

Dynamics Simulation in Triangular Lattice
To investigate whether more complex magnetic structures can be reproduced, I examine the dynamics on the triangular lattice, where previous work [12][13][14][15] has found skyrmion crystal phases. I consider two reasons why the equivariant model outperforms the invariant-descriptor approach. The first reason is that equivariant operations with respect to symmetry transformations tend to preserve information more easily. This point has also been noted in Ref. 23, which first drew attention to equivariant neural networks. On the other hand, when using invariant descriptors, a significant loss of information occurs at the transformation to the descriptors. From the point of view of physics, for example, the energy of a molecule is expressed as the simple operation of taking the expectation value of the Hamiltonian with its eigenfunction. This eigenfunction, as a function of the coordinates, is equivariant with respect to the symmetry operations of the Hamiltonian, but it is not invariant. Although the Hohenberg-Kohn theorems do not forbid deriving the energy from the invariant electron density, the fact that the specific form of the functional remains unknown to this day suggests that it is not a simple functional. By this analogy, equivariant layers may offer the possibility of reaching physically correct solutions "more simply" than invariant descriptors.
The second reason is that the convolution operation reflects the graph topology of the hopping in the Kondo lattice model. In a fully connected neural network, all spin combinations within a given cutoff are treated on an equal footing. In convolutional neural networks, by contrast, the LSE is expanded by stacking kernels that reflect the hopping connections (Fig. 2). The graph topology is thus reflected: the spins of sites directly connected to the center are the most important, and the spins of sites separated by more intervening connections are less emphasized. Figure S3 shows the correlation between the spin and the effective magnetic field in the skyrmion crystal phase for both the ED and ECNN cases. Indeed, the correlation in the ECNN model reproduces that of the ED well, indicating that the graph topology represented by the ECNN model provides a physically valid description.
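The growth of the LSE with network depth can be sketched as reachability under repeated application of the hopping kernel. The helper below is a hypothetical illustration of mine, not code from the paper:

```python
import numpy as np

def lse_mask(adj, layers):
    """Boolean mask of the local spin environment after `layers` convolution
    layers: sites reachable from each center in at most `layers` hops, where
    each kernel includes the center itself (support of (I + A)^layers)."""
    n = adj.shape[0]
    step = ((adj != 0) | np.eye(n, dtype=bool)).astype(int)
    reach = np.eye(n, dtype=int)
    for _ in range(layers):
        reach = (reach @ step > 0).astype(int)  # one more kernel application
    return reach.astype(bool)
```

Sites reached only at the last layer enter the prediction through fewer weight paths than directly connected sites, which is the de-emphasis with graph distance described above.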
In the application of deep learning in materials science, one of the challenges is the preparation of a large amount of high-quality data. If the cost of collecting the data needed for training is high, deep learning will not be practical, even if its performance is superior. Similar to other equivariant neural networks [25], the ECNN model can make highly accurate predictions from a small amount of data and small lattice sizes. In tasks that require broad exploration of a parameter space, such as creating phase diagrams or optimizing material properties, this efficiency is clearly advantageous.
Furthermore, the intermediate layers of the ECNN model presumably generate feature vectors suitable for representing the quantum states of the itinerant electrons, since the tensor products can represent general vector and tensor operations in physics and the model evaluates energies accurately. Therefore, through transfer learning or fine-tuning, the ECNN model could be used to predict not only the energy and effective magnetic fields but also other quantum properties such as optical and transport properties.

CONCLUSION
In this paper, I develop an equivariant convolutional neural network architecture that accelerates spin dynamics simulations in systems where itinerant electrons and localized spins interact. The tensor-product-based convolution ensures equivariance with respect to spin rotation and lattice translation. I implement and verify this approach for both square and triangular lattices. For the square lattice, the developed method exhibits higher accuracy than invariant-descriptor-based neural networks. Furthermore, it can perform large-scale calculations with 576 × 576 sites in about one second. For the triangular lattice, it is found to have sufficient accuracy for evaluating phase transitions of the skyrmion crystal phases. The model in this study has two hopping terms, the first and third nearest neighbors, resulting in a complex distance dependence. Nevertheless, the ECNN generally reproduces the peak positions. This demonstrates that the convolution in the ECNN can represent the graph structure of the Kondo lattice model. Furthermore, for r ≳ 7, there is almost no correlation between the spins and the effective magnetic field, indicating that the assumption of locality is valid.
I consider the Kondo lattice model (double exchange model) consisting of itinerant electrons and localized spins, whose Hamiltonian is given by

H = H_el + H_s,  (1)
H_el = Σ_{r,r′} Σ_α t_{rr′} c†_{rα} c_{r′α} + J Σ_r s_r · S_r,  (2)
H_s = −H_z Σ_r S_r^z.  (3)

I note that invariance is the special case D_Y(g) = id_Y. In other words, equivariance is an extension of the concept of invariance.

FIG. 2. Local spin environment in an ECNN with L convolution layers. (a) Square lattice with only nearest-neighbor hopping. (b) Triangular lattice with both nearest-neighbor and third-nearest-neighbor hopping.
Here, S_r is the set of spins considered as the local environment at position r (called a local spin environment (LSE) below); in the NN model, it is determined by the kernel N(r) and the depth of the layers. The NN model predicts the local energy ε(S_r), and this locality assumption allows the NN model to use different lattice sizes during training and simulation. The effective magnetic field can be computed efficiently using automatic differentiation.
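A differentiated effective field can be sanity-checked against finite differences. The toy energy below (nearest-neighbor exchange plus Zeeman, names `toy_energy` and `effective_field_fd` of my own choosing) uses central finite differences as a stand-in for automatic differentiation:

```python
import numpy as np

def toy_energy(S, Hz=0.01):
    """Illustrative classical energy for a periodic 1D chain:
    E = -sum_r S_r . S_{r+1} - Hz * sum_r S_r^z."""
    return -np.sum(S * np.roll(S, -1, axis=0)) - Hz * np.sum(S[:, 2])

def effective_field_fd(S, Hz=0.01, h=1e-6):
    """B_r = -dE/dS_r via central finite differences on every component."""
    B = np.zeros_like(S)
    for idx in np.ndindex(*S.shape):
        Sp, Sm = S.copy(), S.copy()
        Sp[idx] += h
        Sm[idx] -= h
        B[idx] = -(toy_energy(Sp, Hz) - toy_energy(Sm, Hz)) / (2.0 * h)
    return B
```

For this energy the field is known in closed form, B_r = S_{r−1} + S_{r+1} + H_z ẑ, so the numerical gradient can be compared directly against it.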

Figure 1(a) illustrates the configuration of the neural network used in this study, which consists of stacked interaction blocks and a final output block. The interaction blocks (Fig. 1(b)), which extract equivariant features, are composed of equivariant convolution layers and nonlinear layers. For stability of the training process, ResNet-type skip connections [32] are adopted. The output block (Fig. 1(c)) is made up of a convolution layer restricted to degree l = 0 output.

FIG. 3. Effective magnetic field calculated by ED versus effective magnetic field predicted by (a) the equivariant convolutional neural network (ECNN) and (b) the invariant-descriptor-based fully connected neural network (IFNN). The colors represent the data distribution obtained by kernel density estimation.

Figure 3(a) shows the components of the effective magnetic field predicted by the trained ECNN model against the exact results obtained through exact diagonalization (ED). Based on the RMSE criterion, the training and validation errors are 0.0466 and 0.0481, respectively, indicating surprisingly high accuracy despite using only 100 data points. For comparison, Fig. 3(b) presents predictions on the same data by the invariant-descriptor-based fully connected neural network (IFNN) of Ref. 17; the code and training conditions are taken from the corresponding GitHub repository [17]. Based on the RMSE criterion, the training and validation errors are 0.124 and 0.148, respectively, the validation error being more than three times that of the ECNN. It is worth noting that in the IFNN, the validation error slightly increases after reaching its minimum value.

Skyrmion crystal (SkX) phases have been extensively studied in this system with first-nearest-neighbor hopping t_1 = 1.0, third-nearest-neighbor hopping t_3 = −0.85, Hund coupling J = −1.0, and chemical potential μ = −3.5. Training data are generated for the 36 × 36 site triangular-lattice Kondo lattice model with temperature T = 0.005 and Gilbert damping constant α = 1.0 by performing stochastic ED-LLG simulations with three different magnetic fields, H_z = 0.002, 0.005, 0.008. I sample 100 data points each from non-equilibrium and equilibrium states for each magnetic field and add 300 random spin-configuration data points.
arXiv:2305.03804v1 [cond-mat.str-el] 5 May 2023

Supplemental Materials for: Equivariant Neural Networks for Spin Dynamics Simulations of Itinerant Magnets

Yu Miyazaki*
Department of Applied Physics, The University of Tokyo, Hongo, Bunkyo, Tokyo 113-8656, Japan
(Dated: May 9, 2023)

CORRELATION TEST

In periodic structures such as skyrmion crystal phases, the distance dependence of the correlation between the spins and the effective magnetic fields is crucial. Therefore, I investigate the correlation of the effective magnetic field in the triangular lattice model following reference [?]. First, I randomly select a spin S_r from the spin configuration S and rotate it randomly. I denote the new spin configuration as S′ and calculate the change in the effective magnetic field ΔB = B(S′) − B(S). I then plot the magnitude of the change normalized by the original effective magnetic field, |ΔB/B| = |B(S′) − B(S)|/|B(S)|, as a function of the distance r from the center.

Figure S3(a) shows the results of this operation using exact diagonalization (ED), and Figure S3(b) shows the results using the equivariant convolutional neural network (ECNN).