Machine learning inspired models for Hall effects in non-collinear magnets

The anomalous Hall effect has been front and center in solid state research and material science for over a century now, and the complex transport phenomena in nontrivial magnetic textures have gained an increasing amount of attention, both in theoretical and experimental studies. However, a clear path forward to capturing the influence of magnetization dynamics on anomalous Hall effect even in smallest frustrated magnets or spatially extended magnetic textures is still intensively sought after. In this work, we present an expansion of the anomalous Hall tensor into symmetrically invariant objects, encoding the magnetic configuration up to arbitrary power of spin. We show that these symmetric invariants can be utilized in conjunction with advanced regularization techniques in order to build models for the electric transport in magnetic textures which are, on one hand, complete with respect to the point group symmetry of the underlying lattice, and on the other hand, depend on a minimal number of order parameters only. Here, using a four-band tight-binding model on a honeycomb lattice, we demonstrate that the developed method can be used to address the importance and properties of higher-order contributions to transverse transport. The efficiency and breadth enabled by this method provides an ideal systematic approach to tackle the inherent complexity of response properties of noncollinear magnets, paving the way to the exploration of electric transport in intrinsically frustrated magnets as well as large-scale magnetic textures.


I. INTRODUCTION
The anomalous Hall effect (AHE) has been an essential measurement tool to study magnetic matter since its discovery over 100 years ago, exhibiting a plethora of mechanisms responsible for transverse electric transport [1].The theoretical description of this ubiquitous phenomenon has an equally long history of new revelations, offering an ever modernizing view of electric transport in magnetic materials, with deeper insight into material characteristics gradually accumulating.
While extrinsic, scattering driven contributions to transverse resistivity can dominate in certain regimes, it is the contribution rooted in the intrinsic electronic structure that provides a large part of the signal in many cases, especially in strongly spin-orbit coupled materials [2,3].Investigations of intrinsic AHE are inseparably linked to the study of the geometry and topology of electronic states, most prominently condensing in Berry-phase effects emerging as a result of various flavors of emergent fields.Although AHE is traditionally associated with Berry phase in reciprocal space, the celebrated topological Hall effect (THE) is related to the Berry phase electrons acquire in magnetic textures with nonzero scalar spin chirality ξ = s A • (s B × s C ) in real space [4].While the emergent field picture has had great success particularly due to its conceptual elegance, experimental evidence suggesting the necessity to expand this approach, especially in materials with strong spin-orbit interaction, is accumulating rapidly [5][6][7][8][9].
In the past years, the field of AHE in complex magnetic materials has been experiencing a true uprising [10].The key observation is that a combination of crystal symmetries with time-reversal symmetry breaking by non-collinear magnetic order can result in contributions to the AHE which go beyond those traditionally associated with ferromagnetic magnetization or simplified real-space emergent field pictures.One of suggested guiding principles in keeping a systematic overarching counting of possible contributions to the AHE in a system which undergoes drastic changes in its magnetic configuration lies in a symmetry expansion of the anomalous Hall conductivity in terms of products of spins on different atomic sites of arbitrary power [11].This approach has been applied in the past to the case of a bipartite lattice, demonstrating that it is possible to conceptually separate the AHE in systems with two spins into chiral and crystal contributions [12,13], which collect different powers of the spin expansion.Demonstrating the intricate physics that lies behind this decomposition, it was in particular shown that the corresponding chiral Hall effect (CHE) intertwines together the Berry phases of Bloch electrons in real and reciprocal spaces [6].While being very rigorous and potentially very promising in uncovering the physics of higher-order contributions to the AHE, applying this method to systems containing more spins ideally requires an automated approach.
Generally, the research in the area of AHE can benefit greatly from a deep understanding of the implications that crystal symmetry has on the geometry and topology of Bloch electrons, which can be modified excessively by tuning parameters in the enormous phase space of the magnetic configurations.While this phase space offers intriguing opportunities for exploration [14,15], the numerical complexity grows exponentially with the number of magnetic moments in the system.Increased computational cost for large system sizes or complex combinatorics are problems many disciplines suffer from as a whole.Consequently, natural sciences, and especially computational solid state physics, are witnessing a paradigm shift to find a way around increasingly costly simu-lations.Machine learning techniques have been gaining more and more attention over the past 10 years and have found application particularly in the search for pretrained electron potentials for accelerated density functional theory (DFT) calculations [16], computational materials design [17][18][19][20], rapid exploration of large parameter spaces [21], or interpretation of extensive experimental databases [22].However, comparable studies focusing on magnetic degrees of freedom are few and far between [23].
The machine learning techniques have matured so far that feature selection, feature extraction and model regularization can distill relevant features from high-dimensional input both effectively and reliably, and therefore aid algorithms which are in principle unaware of any physical intuition in finding models which we would accept as physical.In essence, using the right tools out of a huge, readily available toolbox, it is possible to fit models depending on just the relevant fraction of a large number of features, thereby reducing a phase space which might be too large to conquer to manageable size.Consequently, the main objective of this work is to train minimal, regularized models predicting the electric transport properties of complex magnets in terms of suitable descriptors with machine learning techniques.
At the heart of this modelling framework an expansion in symmetric invariants is employed as a basis for the descriptor space.This kind of expansion is a powerful ingredient in the sense that it confines the shape of the terms being used, but at the same time provides us with all possible spin interaction terms respecting the crystal symmetry.It is therefore a valuable alternative to spin modelling techniques, where product terms are constructed "by hand" [23], because it is complete with respect to the point group symmetry.In magnetism, such atomistic spin models traditionally offer great utility for the study of magnetic properties in realistic materials, and recently, efforts have been made to optimize these spin Hamiltonians with the help of machine learning techniques [24].However, the issues surrounding the role of the magnetic reference state used for the calculations, performed in order to extract the interactions parametrizing the spin Hamiltonian have been subject to debate [25,26].Accordingly, expansion techniques which do not refer to a fixed reference state, like the one presented in this work, might offer a new perspective on one of the staple methods in magnetism research.
Furthermore, the expansion method is not bound to a specific limit, neither microscopically canted spins nor mesoscopically extended magnetic textures, and might therefore be employed to derive a description which transitions smoothly between the theories for the two limits.Ultimately, the main goal of this project is to obtain linear models with reasonable generalization statistics on unseen data for calculations of the AHE in complex canted magnets with strong spin-orbit interaction, using a minimal number of descriptors constructed from the symmetric invariant expansion.From these models we aim to extract, first, the already familiar contributions to the AHE, as well as further, previously unknown terms in higher orders of the expansion.Second, we will test whether these models can be reliably and efficiently employed for extrapolation of the AHE in the magnetic phase space.Third, by varying the Fermi energy, we will examine whether the model construction is consistent enough to enable the exploration of the AHE in the combined phase space of spin moments and that of Bloch states with intricate reciprocal space geometry.
Mapping the fitted model coefficients as functions of the tight-binding model parameters as well as the system size will give us insight into the behavior of the different contributions to the AHE in different regimes.We classify the many higher order terms in low dimensional, microscopically canted magnets with respect to vector chirality ξ = s A × s B in order to derive a recipe for predicting the magnetotransport properties in large scale magnetic textures, such as spin spirals, skyrmions, or multi-q states.While we focus here on the chirality-even contributions to the AHE, which correspond e.g. to the so-called crystal Hall effect in case of antiferromagnets [11,27], one of our goals lies in providing an ability to identify the differences in transport signatures between different textures and associate these differences with specific ξodd or -even terms in the model.This will contribute to identification and distinction of magnetic textures by measurement of electric transport, which might play a key role in development of novel computing architectures.
In our work we bring together aspects form multiple fields, namely the calculation of the AHE from an electronic model, the expansion of a tensor in symmetric invariants, in this specific case the anomalous Hall conductivity tensor, and the modelling of the tensor based on invariant expansion utilizing machine learning methods.Therefore, we begin by introducing the electronic model and its symmetries in Section II A as well as the linear response method used in the calculation of the anomalous Hall effect in Section II E. We continue with illustrating the representation-based expansion machinery in Section II B, before explaining the details of the model selection pipeline in Section II F. Finally, we present and analyze fitting results in Section III D. We comment on the feature scaling with respect to parameters of the tight-binding model and offer a concluding discussion in Section IV.

II. METHOD
In our work we draw inspiration from three different perspectives, which define the three levels of our approach: on the surface, this is a machine learning approach to fitting sparse, linear models.The second level reveals, that input features to the learning algorithm are supplied by an expansion of the target in orders of spin, firmly tied to the underlying lattice symmetry.The base level of this framework provides the target to the algorithm, by electronic structure calculations on a discrete, magnetic lattice.Defining a working process to successfully combine these three perspectives in a modelling pipeline is at the core of our manuscript.In order to clarify the multifaceted approach presented here, we define the working parts of the pipeline first, before illustrating the methods in obtaining training data later in Section III.

A. Electronic model and its symmetries
In this work, we aim at predicting the electric transport properties of electrons on a bipartite honeycomb lattice of magnetic spins.To model the electronic structure, we employ an effective two-dimensional lattice tight-binding (TB) Hamiltonian (in the xy-plane) which reads: where c † iα (c iα ) denotes the creation (annihilation) of an electron with spin α at site i, ⟨...⟩ restricts the sums to nearest neighbors, the unit vector d ij points from j to i, and σ stands for the vector of Pauli matrices.Besides the hopping with amplitude t, Eq. ( 1) contains the Rashba spin-orbit coupling of strength α R originating for example in the surface potential gradient perpendicular to the plane (i.e.along ẑ).The remaining term in Eq. ( 1) is the local exchange term with λ ex characterizing the strength of exchange splitting and ŝi stands for the direction of spin on site i.Here, we work with the following parameters of the model: t = 1.0 eV, α R = 0.4 eV, and λ ex = 1.4 eV.
The analysis presented in this work also heavily relies on the crystallographic point group symmetry of the bipartite, nonmagnetic, honeycomb lattice, which is illustrated in Fig. 1.We can identify five nontrivial families of symmetry, or conjugacy classes, in the lattice, comprising the point group of C 6v , see Table I: on one hand, the lattice shows three classes of rotational symmetry, namely the six-, three-, and twofold rotation around an axis perpendicular to the lattice plane, which is commonly chosen to be the z-axis, with the lattice lying in the xy-plane.On the other hand, the nonmagnetic lattice is invariant under two classes of mirror operations, with the mirror plane being aligned with either the corners of the hexagon or the midpoints of the hexagons edges.

B. Expansion in terms of symmetric invariants
Given an observable of interest, which can be of tensorial nature, we have to invoke the representation of corresponding tensor in terms of the crystallographic point group and then expand it in the symmetric invariants of this representation, which correspond to the eigenvalues of the invariant subspaces in the representation.Below, we briefly present the basics of representation theory that we make use of in our study [28].
We call a representation a map ρ : G → GL(n, C), where GL(n, C) is the general linear group of n × n matrices with complex elements.n is in this case called the dimension of the representation of ρ.Every representation has to be a group homomorphism, meaning that ρ(g 1 g 2 ) = ρ(g 1 )ρ(g 2 ) for all g 1 , g 2 ∈ G.We speak of two representations ρ and ρ ′ as FIG. 1. Illustration of symmetry operations of the honeycomb lattice.The symmetry operations leaving the bipartite honeycomb lattice (atom types in red and blue) invariant can be illustrated on the plane containing the lattice, here the xy-plane.The point group of the honeycomb lattice, C6v, shows three classes of rotation around the z-axis, namely two-, three-and sixfold rotations C2(z), C3(z) and C6(z).Furthermore, there are two classes of mirror operations, σv and σ d , indicated here by the mirror plane going through the corners of the hexagon (σv) or through the midpoint of the hexagons edges (σ d ).

TABLE I.
Character table of C6v.Shown are the characters of each irreducible representation for each conjugacy class, alongside linear and quadratic basis functions which generate the respective representations.A tensor inducing a particular representation of the point group transforms accordingly to the characters of this representation.This allows us to easily augment the dataset by applying the symmetry operations to the spin configurations and multiply the target value by the according character value of the representation.In the case of the anomalous Hall conductivity (AHC) tensor, which belongs to the representation A2, the tensor changes sign under the mirror operations with the mirror planes cutting through the edges and the middle of the sides of the hexagon, σv and σ d .
equivalent, if there exists a matrix A such that Aρ(g)A −1 = ρ ′ (g), for all g ∈ G.All representations of the same crystallographic point group have an equivalent unitary representation, fulfilling ρ(g −1 ) = ρ(g) † , where † denotes complex conjugate transposition.The character of a representation ρ is defined as the matrix trace χ ρ (g) = trρ(g).The character map is constant under all conjugacy classes, because the trace is cyclic, χ(h −1 gh) = χ(g), for all g, h ∈ G.For any two representations ρ 1 and ρ 2 we find where ⊗ denotes the Kronecker or tensor product and ⊕ is the direct sum of matrices.We call a representation reducible if it is equivalent to a direct sum of representations, and irreducible if it is not.The representations ρ of a crystallographic point group can be decomposed into a direct sum of irreducible representations Γ i as follows: where n c is the number of conjugacy classes of the group.The multiplicity coefficients can be extracted as λ i = ⟨Γ i |ρ⟩, since the Schur orthogonality theorem states that with the Kronecker delta δ i,j .One specific representation we are interested in is the magnetic representation, i.e. the product of site and spin representation: This representation has the dimension d = 3 • n s , with n s the number of three-dimensional pseudovectors describing classical spins in the system, acting on vector of dimension d obtained from stacking the spins into one column as (s x 1 , s y 1 , s z 1 , s x 2 , s y 2 , s z 2 , • • • , s x ns , s y ns , s z ns ) T , commonly referred to as the image.The matrix Γ mag (g), with g being any element of the crystallographic point group, describes the combined action of the operation on the lattice sites and the magnetic moments via a reshuffling of the sites and the local rotation of magnetic moments in spin space under g.For example, in the honeycomb lattice with pointgroup C 6v and two atoms in the unit cell, the rotation by π around the z-axis, denoted by C 2 , rotates both pseudovectors locally by 180 • and interchanges the atomic sites A and B, resulting in the matrix: The blockdiagonal form that all matrices in this representation have is obvious here.Turning our attention now to a generic tensor we wish to expand in spin interaction terms, we require a representation of the group in higher dimensional spin space, to describe the action on terms comprising the product of any number of spins.Specifically, we calculate the invariant subspaces and corresponding eigenvalues of a projector P , which is the sum of all symmetry operations in the group calculated in the representation: where by ρ we denote the representation corresponding to the order of spin o.We call the eigenvalues equal to one symmetric invariants, since they encode the spin product terms we construct in a certain order of spin in scalar values, which are invariant under operations of the crystallographic point group.
For a given tensor expressed as a product of o spins, the magnetic representation induced by this tensor on the point group, Γ o = (Γ mag ) ⊗o , will have the dimension d o .Accordingly, the exponential growth of the representation's dimensionality renders the calculation of the necessary matrix operations impractical for larger system or high orders of spin.
Therefore, we make use of the fact that the product of scalars is symmetric under permutation of the scalars, and carry out the calculations in the symmetric space, which holds only the symmetric components of the product space.This reduces the dimensionality of the resulting representation from d o to (n+d−1)! n!(d−1)! .For example, this space does not hold all 36 components of the representation Γ mag ⊗ Γ mag for the two spin system, but only the 21 components not related by changing the order of vector components in the product.Accordingly, instead of calculating both s x 1 • s y 2 and s y 2 • s x 1 , we only calculate one term, and project back from this symmetric space after the calculation.
We thus compute the projector in the symmetric space as where by ρ s we denote the representation in the symmetric space corresponding to product space of order o.The symmetric invariants are now given by the eigenvalues of the invariant subspaces of this projector equal to one, hence we diagonalize the projector to find a total of m such spaces, where m is the multiplicity.Consequently, we find a number of m such invariants for each order of spin o, which we denote by the symbol I i,j , where i ∈ 0, C. Distribution of spin samples for two atom system We sample the magnetic moments uniformly on a sphere to cover the whole phase space offered by the magnetic configuration.The method used for this uniform sampling on a curved surface is called sphere-point picking [29], which avoids bunching of samples near the poles when simply drawing the spherical coordinates φ, θ from uniform distributions.For N moments, let u i and v i , i = 1, ..., N be random variables on (0, 1).Then .These two variables can be a good choice for a two atom system, since they correlate with the two order parameters of the magnetic configuration, which are the ferromagnetic and staggered antiferromagnetic moment sFM/AFM = s A ±s B
give the spherical coordinates with the azimuth angle φ in the xy-plane, 0 ≤ φ < 2π, and the polar angle θ from the positive z-axis, 0 ≤ θ ≤ π.
We can further augment the data computed with the TB model by applying the operations of the point group, see Section II A, on the spins and the target.By taking a look at the character table of the group, the behaviour of the target, belonging to a one-dimensional representation, can simply be deduced from the character: for the case of A 2 , the target will pick up a sign corresponding to the character of the operation in this representation.We therefore calculate the spin configurations resulting from applying the group operations and apply the sign to the target accordingly.Additionally utilizing the known behavior of the tensor component under time reversal symmetry, we can effectively scale the size of the dataset by a factor of thirteen without the need for further TB calculations, with eleven nontrivial group operations in C 6v and the time reversal symmetry.

D. Chirality and lattice site exchange
In our following analysis, the notion of chirality − specifically the vector chirality ξ defined as a vector product between spins on two inequivalent sites of the honeycomb lat- tice, ξ = s A × s B , see Fig. 2 − and corresponding symmetric and antisymmetric in ξ contributions will play a central role.
In order to disentangle chiral and nonchiral contributions to the AHE, we (anti)symmetrize the data obtained from tightbinding calculations with respect to the rotational sense, or vector chirality ξ, of the canted magnetic configuration.This symmetrization is achieved by exchange of the atomic sites A and B, as can be easily seen from the definition of the vector chirality ξ, which is antisymmetric under this operation: We therefore perform the following decomposition of the anomalous Hall conductivity tensor σ αβ − or any given (scalar or tensor) quantity of interest for that matter − into even (non-chiral, σ nc ) and odd (chiral, σ c ) parts with respect to the vector chirality: While in this work, we focus predominantly on the non-chiral part of the AHC, by seperating chiral and non-chiral parts, we aim at distinguishing chiral and non-chiral terms in the symmetrically invariant expansion as well, since this construction allows us to fit models which use ξ-even or ξ-odd features respectively.Consequently, we can write the equation for the fitted model as follows: where we construct the ξ-even or ξ-odd invariants in the exact same way as above: Accordingly, we make use of the symmetric and antisymmetric angles θ + and θ − , defined as , to visualize the symmetric invariants and anomalous Hall effect and their dependence on the magnetic configuration.These two variables can be an especially good choice of latent variables for the two atom system, since they correlate with the two order parameters of the magnetic configuration, which are the ferromagnetic and staggered antiferromagnetic moment s FM/AFM = s A ±s B 2 .

E. Calculation of the anomalous Hall conductivity
Given specific electronic structure, we calculate the transverse anomalous Hall conductivity at zero temperature using the Kubo formalism, which allows us to take into account the effect of disorder in the system on the conductivity tensor.In order to do so, we replace the retarded and advanced Green functions G 0 of the perfect crystal by the full Green function is the self-energy representing the effect of disorder.Here, we use a constant broadening model such that Σ(E, k) = −iΓ.With the constant broadening Γ we obtain a Green function diagonal in the eigenspace of the Hamiltonian: where ϵ nk are the single-electron eigenenergies.The antisymmetric part of the conductivity tensor, which can be expressed in terms of G R/A [3], splits into two contributions: where α and β are the Cartesian indices and E f is the Fermi energy.We refer to σ I αβ as the Fermi-surface term in the anomalous Hall conductivity (AHC), since it only picks up contributions from the Fermi surface.The term σ II αβ collects terms from all occupied states up to the Fermi level and is therefore referred to as the Fermi-sea term in the AHC.In evaluating the Kubo expressions for the conductivity of the two-atom system we have used roughly 65 • 10 4 k-points to perform the Brillouin zone integrals and a value of 25 meV for the broadening parameter Γ.

F. Algorithm layout
In this work, we aim at fitting a linear model of the symmetric invariants I o,j to the target y, where I o,j indicates the j-th invariant out of a total of n o invariants in the order o, up to a maximal order o max , and c o,j is the corresponding fitted model coefficient.Stacking all invariants and coefficients into vectors in increasing order, x = [x 0 0 , ..., x 0 n0 , x 1 0 , ..., x 1 n1 , ..., x omax 0 , ..., x omax no max ], we can write: The symmetric invariants are possibly correlated, as are the features in many model selection tasks.Further, most of the common algorithms are optimized towards working with normal distributions with unit variance and zero mean as inputs.Therefore, we need to perform a number of tasks in order to obtain a robust feature space as a solid basis for selecting the model.First and foremost, the dataset is split into training and test set to avoid training a model which just repeats the labels (overfitting).The training pipeline, see black dashed box in Fig. 4, consists of a number of preprocessing steps (light blue dashed box) and the model selection step.The preprocessing involves scaling features to unit variance and zero mean (standardization), see Section II F, and decorrelation and feature selection by principal component analysis (PCA), see Section II F, as well as a variance threshold removing all features with variance below a given threshold and standardization after PCA.We then perform the model selection step by training regularized models on the features selected by the PCA method, see Section II F. We take care to avoid data leakage along the pipeline: if we perform e.g. the PCA on the whole dataset before splitting into train and test set, the prediction of the model will be overly optimistic.Therefore, the preprocessing transformations are trained on the training set only.

Standardization
The model selected by cross validation maps the input features x m onto the single input label y m via the model coefficients c m , where the M true features are obtained from N observations: Input features and labels have been obtained from the true features and labels by standardization, subtracting the mean µ and dividing by the standard deviation, or scale, s, before training the model: where D [sx] is the matrix containing the standard deviation of each true feature on the diagonal, D ii [sx] = s i x , with all other entries equal to zero.The vector µ x , contains the mean values of each true feature, and the summation index j runs over the samples.

Principal component analysis
Further, the standardized features are decorrelated by using principal component analysis (PCA) [30].This method finds the singular value decomposition of the feature matrix, which is in essence a generalization of finding the diagonal of the feature matrix.The method of singular value decomposition, on which PCA is based, relies on a following geometric consideration [31].Considering a unit sphere in n dimensions, the image of this unit sphere under any m × n matrix is a hyperellipse.A hyperellipse is the m dimensional equivalent of the twodimensional ellipse, obtained by stretching the twodimensional unit circle by some factors σ 1 , . . ., σ m along some orthogonal directions u 1 , . . ., u 2 ∈ R, which are, for convenience, normalized to unit length, ||u i || 2 = 1.We refer to the vectors σ i u i as the principal semiaxes of the hyperellipse, with length σ i , and for a matrix A of rank r, exactly r of these lengths turn out to be nonzero.In particular, if m ≥ n, at most n of the will be nonzero.
With this geometric picture in mind, let us define the singular value decomposition (SVD) of A ∈ C m×n : let m and n be arbitrary natural numbers ≥ 0, then the SVD of A is defined as: where Additionally, we assume the diagonal entries σ j of Σ to be non-negative and ordered in nonincreasing order, σ 1 , ≥ . . ., σ p ≥ 0, where p = min(m, n).Looking at this definition, above statement is made quite clear: the unit sphere S ∈ R n is transformed to the hyperellipse SA ∈ R m , since the unitary map V * preserves the sphere, the diagonal matrix Σ stretches it and the unitary matrix U rotates and reflects the ellipse.For a proof, that every matrix A has such an SVD, see also Ref. [31].
The usefulness of the SVD for machine learning problems can now be illustrated by investigating the diagonal matrix Σ. Considering the SVD of a feature matrix of shape m × n, constructed from stacking the vectors of n features for all the m samples, the matrix Σ is potentially very large, especially for problems with many samples, many features, or both.However, two characteristics of this matrix make the SVD worth calculating.On one hand, the fact that Σ is diagonal implies that the corresponding singular vectors are orthogonal, or in statistical terms, uncorrelated.On the other hand, the number of non-trivial singular values is not given by the dimensions m or n, but by the rank r of the matrix, which can be significantly smaller than m and n for features that are correlated.Accordingly, for matrices with rank r < n, m, the resulting feature space is potentially much smaller than the original feature space.
In the context of machine learning, methods involving singular value decomposition are referred to as principal component analysis (PCA) methods, where the principal components are equivalent to the coordinates of the principal axes, and the measure of explained variance corresponds to the singular value, or length, of the principal axes.This correspondence lends itself to the analysis of variance in the feature space, as the singular values, as the norm of the singular axis they correspond to, determine how much overall feature variance a change along a certain axis induces.

Variance Threshold
Features with low variance will very likely not have a large influence on the model target.To compare feature variances in a meaningful way, we first scale all the features to the range [0, 1] individually by using a MinMaxScaler.We then discard features with a variance below a certain threshold.This allows us to reduce the dimension of the feature space with very small effort.The threshold for the data shown in this work is v t = 10 −5 .

Feature selection with statistical methods
Since the modelling technique introduced in this work relies on an expansion agnostic of any physical intuition regarding the presence of certain interactions like exchange or spinorbit coupling in the model Hamiltonian Eq. ( 1), condensation of the input feature space to the relevant components is a key ingredient for meaningful modelling of the electric transport.
To this end, several techniques can be utilized in conjunction with the regularized regression algorithm, which prefers a sparse description of the target by penalizing models with many nonzero coefficients.While the PCA-transformation can be used to extract portions of the latent space explaining overall variance of the feature space, we suggest to use a statistical measure which incorporates also the correlation with the target variable.
Specifically, we focus here on a technique estimating the linear correlation between single input features and the target variable, the f-regression test, since the working hypothesis is that the electric transport can be described by a linear model of the input features.
This quantity measures the cross-correlation between two variables x and y over N observations, given by the Pearson correlation coefficient: Internally, the Pearson correlation coefficient is converted to an F-score (a number between 0 and 1).Computing the fregression score for each component and ranking the components according to this metric, we can select a top scoring percentile to fit the model and disregard features scoring below a threshold.This approach nicely complements the PCA transformation step, which ranks the constructed components by explained variance ratio in the feature space.The f-regression score presents a systematic way to assess the importance of input features for the model based on their correlation with the target, which the PCA transformation is not able to do.

Model selection by regularized regression
We regularize this model by the elastic net penalty [32].The regularized loss-function F (w) to minimize is therefore the sum of loss-function L(c T • x, y) and regularization term R(c): where N is the number of samples, and α is the parameter controlling the strength of regularization, with α = 0 corresponding to an ordinary least squares (OLS) fit.The OLS fit can be adapted to an squared-epsilon-insensitive loss, by allowing the model to ignore errors below a certain threshold of ϵ, and applying the standard squared loss above the threshold.This adapted error function allows for slightly improved description of outliers in some cases.We use a value of ϵ = 10 * * −2 here.The parameter l 1 is the ratio interpolating between two kinds of regularization related to a norm on the weights or coefficients of the model, LASSO and RIDGE, by L 1 -norm (l 1 = 1, LASSO) or by L 2 -norm (l 1 = 0, RIDGE).While the RIDGE regression penalizes the absolute size of coefficients, LASSO penalizes the number of nonzero coefficients.In general, RIDGE regression gravitates towards models where the size of all coefficients is optimal.In contrast, LASSO prefers sparse models, where the number of nonzero coefficients is optimal.
Regularization as a mix of the two methods helps us to find a sparse model, with the optimal number of nonzero coefficients, while keeping the size of coefficients in check, and is also commonly referred to as compressive sensing.The models are ranked by a scoring method of choice, in this case the coefficient of determination, R 2 , calculated from the residual sum and total sum of squares: Here, for a sample numbered by the index i, we denote the true value of the target by y i true and the model prediction for the sample by y i predict .The mean of all true target values is calculated as ȳtrue = 1/N N j y j true , where N is the total number of samples.
The best possible score is 1.0, and the score can be negative, since the model can be arbitrarily bad.The score on either train or test set is not our only measure of the models fidelity: we can additionally probe the functional dependence along specific paths in the feature space.This allows us to quantify not only statistical quantities like the MSE, but also to probe how faithfully the model reproduces the functional dependence of the target along certain paths or in certain regions, for example varying only the polar angle θ on an arc starting from the north pole, θ ∈ {0, π}, keeping the rest the same.Especially when the model extrapolates from one region, overfitting in this regard must be closely monitored.

Stochastic Gradient Descent regression
The regressor is the tool that minimizes the loss function defined by the regularization, elastic net defined in Eq. (28) in this case, on the given training data.We use the Stochastic Gradient Descent (SGD) regressor here, which is an especially effective optimization tool for large quantities of training data, usually > 10 4 samples.SGD estimates the true gradient of the loss function on one training sample, as opposed to other methods like batch gradient descent, and updates the weights along the gradient.One pass over the whole training set is referred to as one iteration here.After an estimate of the gradient ∂F (c)/∂c has been computed on one sample, the model coefficients, or weights, are updated as follows: where by R(c) we refer to the regularization term of the elastic net and by L(c T • x, y) we refer to the loss-function, as defined in Eq. ( 28), and µ is the learning rate.

Hyperparameter optimization via cross validation
SGD is influenced by a number of hyperparameters, which we choose based on a cross-validation search.This method is used to optimize the values of hyperparameters for a particular algorithm by repetitively running the algorithm for different parameter values and thereby determining the highest scoring point in the parameter space.Note that the score here does not necessarily need to be the test score of the resulting model found by the algorithm, but can be any metric supplied by the user.Some obvious examples would be to optimize for minimal runtime, optimal model score, or even a trade off between the two.We make use of this tool to finetune the regularization parameter α reg , and to find the optimal l 1 -ratio between Lasso and Ridge in the elastic net, as well as the optimal learning rate µ.In general, different loss functions can be probed in order to better handle outliers or zero inflation, like the squared epsilon insensitive loss function, which ignores errors below a given epsilon.
In order to achieve the highest efficiency for the cross validation method, we employ the Halving Grid Search algorithm.A given set, or grid, of parameters, serves as candidates.All candidates are evaluated in the first pass with a limited amount of resources, and a top fraction of the candidates with respect to a scoring method of the users choice is kept while the rest is discarded.The surviving candidates are then evaluated again with an increased amount of resources and the process is repeated.While the number of candidates carried over to the following iterations is divided by factor s, the amount of resources is multiplied by the same factor s.
A hyperparameter search using this method is especially efficient, since not all candidates are evaluated on equally large amounts of resources, in this case training samples.In order to avoid overfitting, the algorithm employs cross validation as well by using the k-fold strategy, where the available resources are split into k consecutive folds, each of the used as a validation set once while all other folds form the training set.

Backtransformation of model coefficients
The model coefficients are not the true physical coefficients connecting true physical features and labels.However, to interpret the trained model in physical terms, we require the coefficients in the physical feature space: A mapping between model coefficients and true coefficients is found by applying the inverse transformation along the pipeline to the coefficients.This means inserting Eq. ( 20) and Eq. ( 21) from Section II F into Eq.( 31), and multiplying by , to arrive at the coefficients for the PCAtransformed features x p : Then, applying the inverse transformation of the PCA, we arrive at the true coefficients of the original symmetric invariants: We can therefore extract the physical parameters from the trained model by multiplying with the standard deviation of the target and dividing by the standard deviation of the feature to reverse a standardization step or by applying the inverse transform of the exactly invertible transformations along the pipeline, such as PCA.

A. Input data
By making use of the method presented in Section II C, we generate a total number of 10 4 samples, each describing the two magnetic moments on sites A and B of the bipartite honeycomb lattice.The moments point in randomly chosen directions, such that the overall distribution of directions is uniform over the unit-sphere, see Fig. 3.The anomalous Hall conductivity, which represents the target quantity, is numerically calculated by using the Kubo approach detailed in Section II E. The calculation provides the target value for different values of the Fermi energy, which allows the analysis of the fitted models with respect to changes when probing the transport at different points in the electronic structure, characterized by different regimes of reciprocal geometry.
An explicit implementation of the symmetry expansion, outlined in Section II B, provides input features to the modelling pipeline.Since the anomalous Hall conductivity tensor contains an odd number of spins, the even expansion terms for the anomalous Hall conductivity tensor vanish.In principle, objects of arbitrary order in spin can appear in the model.For the sake of practicability, we calculate the first six non-vanishing orders of the expansion, namely orders {1, 3, 5, 7, 9, 11} with {1, 7, 26, 76, 185, 392} terms respectively, representing a complete description of possible interaction terms up to order eleven in spin, containing 687 different terms.We clarify specifically, that higher orders in spin do not incorporate a higher number of distinct magnetic moments.The order in spin corresponds plainly to the power of the image, which is the pseudovector of dimension d introduced in Section II B, representing the magnetic state of the system.
In Figs. 5 and 6 we present an overview of the symmetric invariants calculated for the two-atom system.They show the original features symmetrized and antisymmetrized with respect to vector chirality ξ respectively, the panels show one invariant each as a function of azimuth angles θ + and θ − , sorted by increasing expansion order from top left to bottom right.Inspecting the data shown in Fig. 5, the invariants reveal their structure as functions of the spherical coordinates.All ξ-even invariants share the more or less sharp nodal lines in both horizontal and vertical direction, at coordinates Between these nodal lines, blocks with alternating sign are present, which grow in complexity for higher order, displaying multiple sign changes (panel 6), polar features (panel 16) and fine modulations (panel 11).Investigating the data presented in the first panel, displaying the most simple structure, the correlation with the overall magnetization, or sum of the two magnetic moments of the system, is evident.This in turn promotes the idea, that this first invariant should be assigned large weight in the description of the anomalous Hall effect, since this phenomenon is conventionally associated with the overall magnetization.Invariants beyond the 36 shown here display quite similar morphology, with the same building blocks repeating with slight variations.
The interpretation of these patterns becomes more intuitive when associating the values of θ ± with specific magnetic configurations.In the center of the panels, at coordinates (0, 0), we find the standard ferromagnetic alignment of moments along the z-axis in positive direction.Traversing the panel in the horizontal direction corresponds to a collective rotation of both spins around an axis, with the nodal lines indicating the moments passing through the xy-plane and the AHE consequently vanishing.A similar explanation holds for the vertical direction, and traversing along this axis corresponds to a rotation by equal amounts in different directions per site, which we refer to as spin canting.However, the nodal lines along the θ − -axis are not associated with the moments passing through the xy-plane, but rather with the moments falling into an antiferromagnetic alignment.When either rotation or canting angle has reached the value of π, the magnetic config-uration has been reversed to the ferromagnet aligned with the z-axis in negative direction.
If we imagine now, that we only rotate one of the spins around an axis in the xy-plane, then both θ + and θ − will increase or decrease with the same rate.In this way, depending on the rotation direction, we reach the four nodes placed at the corners of the tiles.These correspond to the antiferromagnetic alignment along the z-axis, with one spin aligned in positive and negative z-direction each.As can be easily imagined, traversing from here in the θ + direction mimics the collective rotation of the antiferromagnet around an axis in the xy-plane, with the nodal lines again representing the position where the moments pass through this plane.
Focussing now on the ξ-odd invariants shown in Fig. 6, the most obvious feature of the data presented here is that the first invariant is exactly zero everywhere.Considering the fact that this invariant is the only one of first order in spin, it is conceivable that this invariant is related to the overall magnetization.As the magnetization, or ferromagnetic moment of the configuration, is insensitive to exchange of the lattice sites, the antisymmetrization leads to a cancellation.As is the case of ξ-symmetric invariants, there are dominant nodal lines, however at different coordinates given by These nodal lines transform into quite different morphologies in higher order, namely the kind similar to the data shown in panel 11, with quadrupoles surrounding the critical points of (θ Additionally, only the isotropic characteristics are present, e.g. in panels 11, 13, 14, 15, 16 at points shifted by π/2 from the nodal points mentioned before.
Referring to explicit magnetic arrangements, we can relate the characteristics of the invariants to specific properties of the magnetic configuration.The horizontal nodal line, visible in all of the antisymmetric invariants, marks the ferromagnetic configurations with θ − = 0, connected by collective rotation of spins.The two vertical nodal lines then correspond to the arrangements with ferromagnetic moment rotated into the xy-plane.In contrast to the symmetrized data, the antiymmetrized components are all exactly vanishing for perfect ferromagnetic alignment of spins, but not for the antiferromagnetic arrangement, with the exception of coordinates (±π/2, ±π/2).These coordinates mark an antiferromagnetic arrangement in the xy-plane.The isotropic features, appearing at coordinates (θ correspond to the perfect antiferromagnetic configurations along the z-axis, perpendicular to the lattice plane.
Finally, the dataset is augmented by applying the lattice symmetries, as detailed in Section II A. From each of the spin configurations, 12 additional configurations can be generated, one for each nontrivial group operation and one for the time reversal operation.Multiplication with the operations character in the representation yields the corresponding values of the anomalous Hall conductivity tensor and the symmetric invariants from the original values.
In conclusion, the modelling pipeline works on a set of 13 • 10 4 samples − obtained from augmenting the 10 4 ini-FIG.7. Cross-correlation matrix of the input features.The correlation matrix of symmetric invariants symmetrized with respect to vector chirality ξ.The cross-correlation is calculated over the total number of samples, which is 65000.Obviously, the symmetric invariants are, as already apparent from the similar structure in the space of (θ+, θ−), heavily correlated amongst each other.Perfectly uncorrelated features would result in a matrix with ones along the diagonal and zeros everywhere else.

FIG. 8. Result of principal component analysis (PCA).
The correlation matrix of symmetric invariants symmetrized with respect to vector chirality ξ, after principal component analysis (PCA).The total number of samples used here is 65000, the PCA transformation is trained on 44750 samples and 16250 samples are held out as a test set, shown in the top right corner.The correlation matrix assumes a block diagonal form after performing the transformation found by PCA.
tial TB calculations by exploiting the cyrstal symmetries and time reversal symmetry − where each sample is characterized by a magnetic configuration of s A and s B , a corresponding value of the anomalous Hall conductivity, and a collection of 687 symmetric invariants.Before continuing with any further steps, the feature data are standardized by substracting the mean and scaling by the variance, as illustrated in Section II F.

B. PCA transformation
After standardizing the features according to Section II F, a PCA transformation constructs a decorrelated and greatly condensed feature space, as clarified in Section II F. This transformation is especially useful for the kind of data considered here, as the features show strong correlation, see offdiagonal terms in Fig. 7. Turning our attention to the data shown in Fig. 8, we realize that the correlation matrix assumes a block diagonal form after performing the transformation found by PCA.The matrix displays a square block in the bottom left corner, indicating that the PCA transformation finds a number of ∼ 110 uncorrelated components in the feature space.While the amount of off-diagonal weight in this matrix is visibly less than before, the off-diagonal terms are quite large, especially in higher orders, for feature indexes above 110.Closer inspection of the components in this region reveals that the explained variance of the components is orders of magnitude smaller than for the first components, and therefore the cross-correlation is inflated by the inverse dependence on the variance scale.Accordingly, we discard features beyond ∼ 110 by applying a variance threshold after the PCA transformation.
When we focus now on the data shown in Figs. 9 and 10, we observe that the PCA transformation succesfully reduces the dimensionality of the feature space, while keeping the characteristics of the different types of invariants intact and separated.Figs. 9 and 10 display the features obtained by PCAtransformation from the ξ-symmetric and ξ-antisymmetric features as functions of the spherical coordinates θ ± .The panels show one invariant each, sorted by decreasing explained variance from top left to bottom right.The principal components of the feature space scoring high in explained variance represent the mixtures of types that were identified in the original feature space.Most strikingly however, the polar structures, more associated with higher order invariants, score highest in explained variance.From the data presented in this section, it is apparent that efficient dimensionality reduction can be achieved by the PCA-transformation, which additionally provides us with invertability (so we can easily transform back to the original space) and conceptual simplicity (since it is a linear transformation).

C. Feature selection with statistical methods
Investigating the f-regression scores presented in Fig. 11, indicating the correlation between single feature and target, we can form a clear picture of which input features are relevant for describing the electric transport.The entries are scattered on the y-axis against the values of explained variance on the x-axis, in order to relate the component's importance regarding feature space variance and target correlation respectively.On both axes, the data is shown on a logarithmic scale.FIG. 9. PCA-transformed invariants symmetrized with respect to exchange of atomic sites.The components are normalized to one in each panel, otherwise higher order features would not be visible due to their small scale.The transformation effectively merges by linear combination the multiple, quite similar symmetric invariants shown in Fig. 5, into a few relevant morphologies as the dominant directions in the resulting latent space with respect to explained variance.This allows to reduce the input feature space to a handful of relevant features.FIG. 10.PCA-transformed invariants antisymmetrized with respect to exchange of atomic sites.Overall, the effect of the PCA is similarly drastic as in the case for the ξ-symmetric features, leaving us with a handful of relevant features as input.
Overall, the data is split into two clusters with respect to the explained variance: on the left hand side, redundant components accumulate at the limit of vanishing explained variance.
As we are forcing the PCA transformation to construct as many components as there are features, components beyond the true rank of the feature matrix are redundant, having zero explained variance and being constant, as we already established in Section II F. Due to the vanishing variance of these components, the f-regression scores in this cluster are highly inflated.
On the right hand side, a cluster of possibly relevant components forms an almost linear trend on the double logarithmic scale.Components below the true rank of the feature matrix are assigned non-vanishing values of explained variance, which show some correlation with the size of the fregression score.This correlation is a confirmation of the intuitive reasoning, that components with larger portions of explained feature space variance should be more relevant when determining the value of the target on the basis of the features.In contrast to this intuition, the data shows a sizable spread around the trend-line.Accordingly, components associated with higher-order characteristics of the feature space with lower overall variance, still show significant correlation with the target.Measuring the linear correlation of single features and the target is therefore yielding valuable information when performing modelling, which can not be retrieved from feature analysis alone.

D. Model selection: Anomalous Hall effect
Below, we present the results obtained from fitting linear models of the symmetric invariant expansion to the intrinsic part of the anomalous Hall effect (AHE) at a fixed value of Fermi energy E F = −1.5 eV.To this end, for compactness, we focus only on the ξ-symmetric non-chiral part of the AHE, noting that while the conclusions formulated below generally hold true also for the ξ-odd part of the AHC, the much smaller average magnitude of the chiral AHC in our system makes the numerical analysis in the chiral case generally more cumbersome.In Fig. 12 we present the data fitted by two ξ-even models with different number of PCA-transformed components, selected on the basis of f-regression score, in order to illustrate the importance of using all components up to the true rank of the feature matrix.While the larger model has access to the top 90% of components with respect to the f-regression test (blue crosses), the smaller model only uses the top 30% of components (red dots).This amounts to 99 and 33 components out of 110, respectively.
The first question to be addressed is whether the method used here is able to reproduce the training data with sufficient confidence.In order to rate the model performance we employ the well-known coefficient of determination R 2 , defined in Eq. 29.The best score possible is 1, and a score of 0 would correspond to a model simply predicting the mean of the training data for every data point.From the correlation plots presented in Fig. 12 (a,b) we can immediately see that both models perform reasonably well, with the large model scoring 0.99 (blue crosses) and the smaller model 0.97 (red dots) respectively.Obvious deviations from perfect linear correlation (black line) occur in the pockets above and below the FIG.11.Explained variance ratio and f-regression score for the ξ-symmetrized PCA components.Explained variance ratio of the PCA components on a logarithmic scale on the x-axis and fregression score on a logarithmic scale on the y-axis.The PCA transformation finds a number of nontrivial components with nonzero explained variance ratio, corresponding to the true rank of the feature matrix.The sharp step in explained variance around index 110 indicates the true rank, as the values for the explained variance ratio drop to zero for components beyond this index.Correspondingly, the data fall into two clusters, one with relevant influence on the feature space in the right cluster, and one with negligible influence in the left cluster, correspondingly expressed in the values of explained variance.Notably, the relationship between f-regression score and explained variance shows a linear trend in the right cluster, however the data points show sizable spread.The overall negligible variance and magnitude of components beyond the true rank of the feature matrix, i.e. data points in the left cluster, lead to inflated f-regression scores.We therefor disregard these components scores as uninformative.
point (0, 0) in the center of the plot for both models.Such deviations, ranging up to 0.6 for the model with less coefficients, with the target value being normalized to 1.0, would be a reason for concern, if they occurred for relevant portions of the test data.However, inspecting the distribution of residuals, shown in Fig. 12 (c), proves that only a small number of samples deviate significantly from the center.Inspecting the variance of the residuals for the larger model reveals, that the deviations from the model approximately follow a normal distribution (blue line).From the indicated variance ranges (black dotted lines) it is apparent, that deviations greater than 3 • σ ≈ 0.18 are quite rare.By comparing the blue and red distribution, the effect of restricting the model to less components is illustrated by the larger spread of the red distribution.More weight is allocated to the tails of the curve, corresponding to samples with larger difference between target and prediction value.models are sparse.Utilizing the combination of PCA, statistical feature selection and regularization, the method is able to condense the input feature space, containing a total number of 687 features, to 33 and 99 features used in predicting the target value respectively.Allowing the model to use a larger fraction of the input features, as we have seen in discussing the distribution of residuals above, leads to a significant reduction in deviations of predictions from the target.However, the values of the coefficients used in both models do not change drastically, indicating stability of the models.Similarly, when removing small valued coefficients from the fitted models by hand (not shown here), which we call feature annealing, has a modest effect on the prediction.Depending on the threshold below which coefficients are annealed, the predictions are relatively stable.This indicates that the annealed coefficients are not relevant to the model and could be suppressed by adjusting the regularization parameter when fitting the model, see cross-validation approach in Section II F.
A special remark can be made for the coefficients of components 108 and 109, appearing in Fig. 12 d) in 5 th and 7 th place.Keeping in mind that the coefficients are ordered from left to right by f-regression score, these two coefficients are extreme cases of a small explained variance ratio, but comparatively large correlation with the target.Investigating the influence of these two coefficients on the model by hand (not shown here) reveals their strong influence on the model fidelity, in spite of their small relative size, below 25%, compared to the largest model coefficient.Annealing these two coefficients from the model increases the maximal residuals by a factor of 1.5, and − most importantly − the fraction of samples with residuals larger than the 3σ-range is more than doubled.In contrast, removing two coefficients of similar size, but with lower f-regression score, has much less drastic effects on the residuals.In conclusion, f-regression score is a reliable and effective metric for assessing the importance of single components for the model, controlling the overall prediction quality and number of outliers.
Considering that the input features do not contain any information about the electronic properties of the system explicitly, the discrepancy in prediction of the true value might be most probably explained by referring to the band structure, consisting of four bands only [11].Tuning the direction of magnetic moments can result in drastic and abrupt changes in the position of the bands with respect to the Fermi energy.By probing the electric transport at constant Fermi energy, the analysis is sensitive to sudden changes in occupation, for example when single bands are pushed above or below the Fermi energy by altering the magnetic configuration.Correspondingly, analysis of the prediction for different energetic positions in the band structure shows, that the amount of deviation in the model prediction depends on the Fermi energy.In order to avoid this nonuniform behavior, the target can be calculated at constant filling.This approach, which we discuss in detail below, requires much more numerical effort, since the calculations of the AHC have to be complemented by precise calculations of the density of states.In contrast, the expansion could be adapted to incorporate certain band structure effects, sacrificing conceptual simplicity for a more accurate description of the transport.On the other hand, considering the fact that in larger systems the electronic structure contains many more bands, the changes in occupation due to the rearrangement of bands might be less drastic, and the general behavior of the coefficients and fitting much smoother.Correspondingly, deviations in model prediction could be dramatically reduced.

Model coefficients as function of the Fermi energy
The AHE is a quantity driven by the occupied states of a system.It is therefore instructive to investigate the influence of the Fermi energy E F on the selected model − we quantify this relationship by fitting models at several values of the Fermi energy and tracking the values of the model coefficients.Only states below E F are occupied and therefore contribute to the AHC, which allows us to associate model coefficients and their size with regimes in the electronic structure contributing to the electric transport.As the data presented in Fig. 13(a) clearly shows, the model coefficients in the ξ-even case are smooth functions of the energy, which provides a basic sanity check for the model: the selected model transfers from one regime in the electronic structure to another, without selecting a different set of features.We can further identify a few dominating features in the model, with the corresponding model coefficients being as much as five times larger than the other coefficients in size.Perhaps the most significant characteristic of the presented data is the sign change that all coefficients undergo at a Fermi energy of about −0.65 eV, which corresponds to a sign change in the target, or AHE, at that specific energy.Overall, we also observe how the interplay between dominant, lower, and higher-order components of the chiral AHE can sensitively depend on the electronic structure.
Inspecting the behavior of the predictions and target values for four representative spin configurations (blue, orange, cyan, red) as a function of the Fermi energy, we can immediately recognise that in the even case, Fig. 13

Transport at constant electronic filling
Coming back to the systematic deviations of model predictions from the true target, shown in Fig. 12(b) as pockets below and above zero at the center of the plot, we suggested to identify varying electronic filling as one of the underlying difficulties for a linear model.Indeed, tuning the magnetic configuration results, as a direct consequence of the presence of spin-orbit interaction in the electronic model Eq. ( 1), in changes to the energetic position and shape of electronic bands in reciprocal space.As a direct consequence, the electronic occupation at a given value of the Fermi energy is not constant under changes in the magnetic configuration.In order to judge whether differences in electronic filling, driven by these changes, introduce systematic deviations in the fitted model, we present the residuals of the fitted data as a function of electronic filling, with a color code indicating the absolute value of magnetization, in Fig. 14.
As basic physical intuition concerning magnetic systems with strong exchange splitting and weak spin-orbit interaction would suggest, the electronic filling of the system, shown on the x-axis in the figure, correlates strongly with overall magnetization in the system.For the region of [0.56, 0.6] electrons per atom, the residuals are constrained to the range [−0.1, 0.1].Most interestingly, the deviations of model prediction from true target value increase suddenly around a value of about 0.61 for the electronic filling, with a magnitude of up to 0.4.Furthermore, the strict correlation between occupation and magnetization is lifted at this value, with strongly compensated configuration showing small residuals, and samples with moderate magnetization inhibiting large deviations.We can thus conclude that the residuals of the fitted model show systematic dependence on electronic filling, which is modulated by the magnetic configuration, in a strongly nonlinear fashion.The data presented in Fig. 14 is strong evidence, supporting the idea that systematic deviations in the linear model are caused by nonlinear characteristics of the electronic structure, induced by changes to the magnetic configuration and mediated by the coupling of spin to electronic degrees of freedom through spin-orbit interaction.
In order to rectify the approach, we present a model fitted to training data calculated at constant electronic filling in Fig. 15.As is the case for Fig. 12, the model is fitted to ξsymmetrized data, with a score larger than 0.99, and visibly smaller deviations in prediction when comparing the data presented in panels (a), showing model predictions on the y-axis and true target value on the x-axis, in Fig. 15 and Fig. 12.Under closer inspection of the data shown in panels (b), which present residuals y true − y pred on the y-axis and true values on the x-axis, we conclude that the overall scale of deviations relative to the absolute value of the target is reduced by factor of about four when considering the effect of constant filling.The comparison of the model coefficients, presented in panel (d) in each figure, reveals, that very similar components are active in the model, while the precise ordering and assigned weight is slightly different.We can thus conclude that, overall, while accounting for the effect of constant electron filling may somewhat improve the accuracy of the modelling, the qualitative predictions and understanding of the model can be already achieved at the level of a much more computationally efficient approach which relies on fixing the value of the Fermi energy.The two approaches are expected to converge in the limit of many atoms in the unit cell, as it is the case for example for complex spin textures or fluctuating magnets.

IV. DISCUSSION
The main goal of this study was to assess whether the electric transport, generated by the intricate changes in electronic structure introduced by complex non-collinear magnetism, can be predicted by using only descriptors of the magnetic pattern realized on a two-dimensional lattice.First, in order to obtain a complete, but non-redundant, description of the general conductivity tensor, an expansion in symmetric invariants of the underlying lattice point group was utilized to find a suitable set of descriptors.Second, a machine-learning pipeline was designed, tested and trained for the purpose of finding a sparse, linear model of the conductivity as a function of the symmetric invariants, which displays reasonable generalization statistics on unseen data.Relevant features of the input data were extracted by utilizing two complementary methods: on the one hand, feature selection was performed by ranking the features with a statistical metric measuring the correlation between feature and target, the f-regression test.On the other hand, regularization of the model search with the elastic net penalty was used to assign lower scores to models using a larger number of features, therefore pushing the search towards models which minimize the number of nonzero coefficients.Combining these two steps has led to significant condensation of the input feature space in conjunction with good overall generalization ability, verified by reasonable scoring of the final model on the test set.
The results of this study illustrate, that an excellent description of electric transport in magnetic materials can be achieved on the basis of the magnetic structure only, when the electronic filling of the system is considered to be unchanged by tuning the magnetic configuration, while the analysis performed for a fixed Fermi energy can already give a very good insight into the qualitative constituents of the model and the relative importance of different terms in the conducticity expansion.Given suitable modelling techniques, explicit calculations of the electronic structure, especially challenging in materials exhibiting strong spin-orbit interaction and hosting complex magnetic textures, can be augmented and enhanced by the symmetric invariants method, encoding the magnetic FIG. 15.Fitting results for the anomalous Hall effect at constant electronic filling.The ξ-even part of the anomalous Hall effect (AHE) of the two atom system can be well described by a linear model of a few variables also in the case of constant electronic filling.The structure of the plot and corresponding variables are the same as in Fig. 12. order parameters of the system.
Commenting especially on the statistical feature analysis, the results suggest the importance of magnetic texture characteristics beyond the established order parameters of ferromagnetic and staggered antiferromagnetic moment.Overall, while the feature space can be condensed considerably, the resulting linear models still show remarkable entanglement of magnetization properties beyond the established order parameters.While the resulting model is linear, the PCA components used as building blocks for the model, are sensitive beyond linear dependence to intricate changes in the orientation of magnetic moments.Our findings thus underline the importance of higher-order in spin contributions for a consistent description of the AHE already in the simplest case of bipartite canted magnets described by elementary electronic models with spin-orbit interactions.This is the main finding of our work concerning the physical properties of quantum spin systems.We can speculate with a high degree of certainly that the situation is even more complex for example in frustrated spin systems of kagome type, where various contributions to the AHE are still under debate (see e.g.[33]).Our approach is thus unique in identifying the magnitude and symmetry of leading AHE contributions without resorting to an excruciating process of guessing complemented by scarce calculations.
A special word should be said about the potential of suggested methodology in exploring the influence of various further parameters and corresponding more complex phasespaces on the transport properties of dynamically evolving or fluctuating magnets.Indeed, we expect that the effect of various characteristics of the quantum system as reflected in its Hamiltonian − such as the magnitude and symmetry of the hoppings, strength of correlations, exchange splitting or spin-orbit interaction strength, which can be tuned dynamically e.g. by temperature, laser pulse, or other external means − can be consistently included into our analysis.Going beyond, we dare to suggest a possibility of including the time evolution of the spin system according to some dynamical equation into consideration explicitly, which may result in finding a clear path towards consistent modelling of the memory kernel of the system [34].
Overall, we can conclude that, in spite of our work focussing on a simple four-band electronic structure component, the principal possibility of modelling that we have demonstrated here marks an important step in integrating machine learning methods into the field of spintronics research and in particular magneto-transport phenomena.Symmetryenhanced compressive sensing in conjunction with principal component analysis turns out to be a promising candidate to conquer the complexity of magnetic phase-spaces, and the study of spin and electronic transport phenomena offers plenty of avenues for extending the machine learning technique.The symmetric invariants, although already showing to be a feature space with impressive potential for generalizability, can be compared and possibly augmented with features directly extracted from the magnetic texture in real space, or the electronic structure in reciprocal space.Specifically, direct feature extraction, e.g. through variational auto encoders (VAE), lends itself to the case of large textures with many atomic sites, when the construction of local chiral and non-chiral contributions is not trivial, as opposed to the two-atom system studied here.Acquiring a firm ability to predict the transport properties of large spin textures is pertinent for our ability to model and understand the response characteristics of matter in an automated manner, as such presenting a challenge for research at the junction of solid states physics and machine learning.

FIG. 2 .
FIG.2.Illustration of a two-spin invariant.The many possible interactions between two spins on a lattice contributing to a tensor can be decoded into the symmetric invariants that the representation of the tensor induces on the point group of the lattice.It is instructive to visualize the invariants in the space of two suitable variables v1 and v2, for example the sum and difference of the azimuthal angles of the spins on two inequivalent sites of a honeycomb lattice θ+ = θ A +θ B 2

FIG. 3 .
FIG.3.Distribution of the directions for the two magnetic moments.The directions of the magnetic moments are illustrated by their intersections with the unit sphere, with a view on the xy-plane shown in red, and a view on the xz-plane shown in blue.In contrast to a naive sampling of the direction, the directions cover the surface area of the unit sphere uniformly, without strong agglomeration at the poles.

FIG. 4 .
FIG. 4. Flowchart representation of the method.The dataset is split into training and test sets to avoid training a model which just repeats the labels (overfitting).The training pipeline (black dashed box) consists of a number of preprocessing steps (yellow dashed box) and the model selection step.The preprocessing involves scaling the input features to unit variance and zero mean (standardization), feature decorrelation and selection by principal component analysis (PCA), see Section II F, as well as a variance threshhold and a second standardization after PCA.The hyperparameters of the pipeline are optimized via a GridSearch approach, see left column.The model selected by the pipeline is tested on a hold-out data set.

FIG. 5 .
FIG. 5. Invariants symmetrized with respect to exchange of atomic sites.Invariants are shown one per panel, with order increasing from top left to bottom right, as functions of θ+ and θ−, with both variables spanning the range [0, π].The center of the plot is located at the coordinates (0, 0).In this parameter space the invariants show distinct features, which allows to categorize them into a handful of classes.

FIG. 6 .
FIG. 6. Invariants antisymmetrized with respect to exchange of atomic sites.Invariants are shown one per panel, with order increasing from top left to bottom right, as functions of θ+ and θ−, with both variables spanning the range [0, π].The center of the plot is located at the coordinates (0, 0).For the discussion of morphologies, see text.
FIG. 12. Fitting results for the anomalous Hall effect.The ξ-even part of the anomalous Hall conductivity can be well described by a linear model of a few variables.The figure shows correlation plots between true value ytrue and fitted target yfit (a) as well as residual yres = ytrue − yfit (b), where the red line indicates perfect linear correlation.The distribution of residuals is given in panel (c).Panel (d) presents the coefficients of the model sorted by feature selection score, indicating the linear correlation of one feature and the target estimated by the f-regression test.The model score, shown in the top left corner of panel (a), indicates a strong performance of the model, as the top score that can be achieved is 1.Investigating the coefficients shown in panel (d), it is evident that effective condensation of the input feature space can be achieved by combining PCA, statistical feature selection and model regularization.

FIG. 13 .
FIG. 13.Fermi energy dependence.(a) Model coefficients as a function of the Fermi energy.One model is fitted at each of the sampled Fermi energies E f .The model coefficients vary depending on the energy E f at which the model is fitted.Notably, at E f ≈ 0.7 eV, all the coefficients change sign, as does the target.(b) Predictions of the ξ-even part of the anomalous Hall conductivity (AHC) as a function of the Fermi energy.Here, the prediction quality of the model can be inspected for four randomly chosen spin samples (blue, orange, petrol, red), where the true values are shown in dashed lines with square symbols, while the the predictions are shown with solid lines and circles.The data presented here are examples, for which the residuals are below 0.1.
(b), the model prediction (circular markers, solid line) resembles the target values (square markers, dashed line) closely across different values of the energy.At the Fermi energy of about 0.7 eV, the target and prediction for all samples change sign.The samples we are referring to here are randomly selected from the samples with residuals below 10% at all values of the Fermi energy, which constitutes about 82% of the training set.The examples presented in Fig. 13(b) are a strong indicator that the model closely represents the true target value on a significant amount of previously unseen test data.

FIG. 14 .
FIG. 14. Residuals of the fitted model as a function of electronic filling.Deviations of predicted value from true value of the target, σxy,true − σ xy,pred show distinguished pockets below and above zero.Plotting the deviations as a function of electronic filling of the model, which in turn is altered by the magnetic configuration, reveals strong correlation between residuals and filling.At filling value of ∼ 0.62, the model predictions deviate strongly, and the overall linear correlation between filling and absolute value of the ferromagnetic moment breaks.