Circumventing data imbalance in magnetic ground state data for magnetic moment predictions

Magnetic materials play a crucial role in the transition to more sustainable forms of energy and electric vehicles. There is an anticipated shortage of magnetic materials in the future, and as a result there is an urgent need to discover and design new magnetic materials. Computational magnetic material design using density functional theory is daunting because of the challenge of identifying magnetic ground states from a combinatorially large set of possibilities. Machine learning offers a path forward by enabling efficient surrogate models that can more readily enumerate these states, but there is a dearth of training data available, and what is available tends to be imbalanced, with too much non-magnetic data. In this work we show that the discrete and previously tackled data imbalance that exists at the level of the magnetic ordering leads to an imbalanced continuous distribution with many zeros when the data is unraveled at the atomic magnetic moment level, which subsequently leads to models with low accuracy for magnetic properties. We mitigate this by using a two-part model framework. Our scheme is able to classify atoms into magnetic and non-magnetic with an F1 score and Matthews correlation coefficient (MCC) of ~91%, and then to provide an implicit embedding representation that maps directly onto the magnitude of the magnetic moment with a mean absolute error of 0.1 μB. Beyond screening for new magnetic materials, we demonstrate an additional practical use case of our scheme: the provision of good initial guesses for magnetic moments in first-principles electronic relaxations. Such initialization is shown to lead to faster convergence to configurations that lie closer to the ground state.


Introduction
Magnetic materials have a wealth of applications ranging from data storage [1] and quantum computing [2] to high power density applications like vehicle traction [3], which has motivated the search for efficient and complete magnetic structure characterization. This search has been carried out at both the experimental and computational level. Experimentally, this involves methods such as neutron or x-ray scattering that can provide atomic-scale resolution. The drawback with current experimental approaches is that such methods require large-scale neutron sources, with limited capacity and beamtime availability, making a purely experimental exploration of magnetic materials infeasible. On the computational front, density functional theory (DFT) simulations have allowed for high-throughput studies [4] that can unearth the most stable electronic and magnetic ground state configurations. However, such computational methods suffer from the computational complexity of DFT and from the combinatorial complexity of the magnetic ordering landscape, and are often constrained to collinear magnetic orderings. The interaction between these two layers of complexity has limited the large-scale computational screening of materials for magnetic properties.
The projected dearth of rare-earth magnetic materials [5,6], stemming from the widespread use of electric vehicles and offshore wind farms, has contributed to a recent surge in research aimed at fully characterizing the magnetic state of materials [7][8][9][10][11][12]. Complete magnetic characterization [13] is a gargantuan task. Consequently, these studies typically address magnetic structure characterization in pieces, although they steadily contribute to a complete magnetic structure characterization when viewed in conjunction.
Concurrently, the rapid advancement of graph neural networks (GNNs) [14][15][16][17] as a way to establish structure-property relationships has made the task of magnetic structure characterization more feasible. These networks typically have a featurization block, which performs convolution on graphs to create an atomic representation, followed by an output block, which maps that atomic representation, also called an embedding, onto a property like the energy, force, or magnetic moment. While GNNs can automate the process of feature selection, they are still subject to the same pitfalls as any other regression technique, with data imbalance and lack of interpretability [18] still posing challenges. Severe imbalance in magnetic data has been documented [9] at the collinear magnetic ordering level, with most materials collapsing into the ferromagnetic (FM) and ferrimagnetic (FiM) orderings and a minority into the anti-ferromagnetic (AFM) and non-magnetic (NM) classes. Irrespective of whether the data is categorical or continuous [19], strategies for dealing with data imbalance ultimately manifest as a re-weighting of the samples.
In this work, we re-cast the problem of magnetic structure characterization from a classification on the magnetic ordering at the molecular level [20] to a direct regression on the atomic-level magnetization. We show that data imbalance still exists at the level of the continuous magnetic moment label, with the majority of the atoms collapsing to a local magnetization of ~0 µB. We show that blindly regressing on this target label leads to predictions biased toward the majority label. As a scheme to circumvent this problem, we apply a two-part model [21], an approach known for handling continuous distributions with many zeros [22], to produce a binary mask together with a set of atomic-level embeddings. The binary mask allows us to partition the atoms into those that have appreciable magnetic states and those that are non-magnetic. Additionally, it provides a representation for the atom in the context of its local atomic environment, allowing us to easily triage the dataset in a way that circumvents an imbalanced regression. With this scheme, we show that GNNs are successful at mapping atom-level embeddings onto the magnitude of magnetic moments: our binary classifier exhibits an F1 score and Matthews correlation coefficient (MCC) of ~91%, and our subsequent regressor exhibits a mean absolute error (MAE) of 0.1 µB on test set data.

Dataset
We queried the Materials Project [4] for structures whose relaxed ground state is in an AFM ordering. Part of the rationale behind such data curation is to balance the number of samples with positive and negative magnetic moments. The other reason is to investigate whether a GNN's message passing interaction block has enough resolution and discrimination to factor in the sign of the magnetic moment, potentially unlocking the ability to predict ordering. As shown in figure 1, this query comes with ~90% of the data collapsing to a local magnetization in the vicinity of zero, representing a standard skewed continuous distribution with many zeros.
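As a minimal sketch of this skew, the snippet below builds a binary mask label from a hypothetical array of per-atom moments (illustrative values, not the actual query) using the 0.1 µB cutoff employed in the methodology section:

```python
import numpy as np

# Hypothetical per-atom magnetic moments (in mu_B) standing in for the
# Materials Project query; the real dataset has ~90% near-zero atoms.
moments = np.array([0.02, -0.01, 3.1, -3.1, 0.05, 0.0, -0.03, 1.2, -1.2, 0.04])

# Binary mask label used by the classifier: |m| >= 0.1 mu_B counts as magnetic.
CUTOFF = 0.1
is_magnetic = np.abs(moments) >= CUTOFF

frac_nonmagnetic = 1.0 - is_magnetic.mean()
print(f"magnetic atoms: {is_magnetic.sum()} / {moments.size}")
print(f"non-magnetic fraction: {frac_nonmagnetic:.0%}")
```

The same mask later selects which atoms are passed to the regressor.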

Models & methodology
To perform the classification part of our scheme, we employ GemNet-OC [23] as the base model. We defined two output targets, which are interpreted as probabilities within a binary cross-entropy loss function, shown in equation (1). The output targets are illustrated as red (non-magnetic) and blue (magnetic) logit nodes in figure 2. The loss is weighted as per the weighting strategy [24] implemented in SCIKIT-LEARN [25] to circumvent class imbalance [26][27][28][29][30][31]. The structures are represented as graph data objects and the model is made to return the logits together with the corresponding 256-dimensional node embeddings from the interaction blocks. The choice of the magnetic moment cutoff (0.1 µB), as a way to generate a binary mask label, is a trade-off between eliminating the point-mass that is very close to zero and keeping enough samples. As part of the supplementary information (SI), we show how the choice of the cutoff affects the regression in the case of a more varied dataset that contains FM and FiM structures in addition to AFM structures (see figure S3). This expanded dataset has also been made accessible as part of the public repository.
L = −[y log p + (1 − y) log(1 − p)]    (1)

where:
L: binary cross-entropy loss
p: probability of an observation belonging to class y = 1, as derived from the logit values and a softmax [32] layer
y: target indicator variable ∈ {0, 1}.
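As a sketch of this criterion, the snippet below evaluates equation (1) with "balanced" class weights in the style of scikit-learn (w_c = n / (2 · count_c)); the logits and labels are toy values, not model outputs:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis
    # (two logits per atom: non-magnetic, magnetic).
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def weighted_bce(logits, y):
    """Class-weighted binary cross-entropy, equation (1), with 'balanced'
    weights in the style of scikit-learn: w_c = n / (2 * count_c)."""
    p = softmax(logits)[:, 1]          # probability of the magnetic class (y = 1)
    counts = np.bincount(y, minlength=2)
    w = y.size / (2.0 * counts)        # balanced class weights
    sample_w = w[y]                    # per-sample weight from its label
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float((sample_w * loss).mean())

# Toy batch: 4 non-magnetic atoms, 2 magnetic atoms.
y = np.array([0, 0, 0, 0, 1, 1])
logits = np.array([[2.0, -2.0]] * 4 + [[-2.0, 2.0]] * 2)
print(f"{weighted_bce(logits, y):.4f}")
```

The re-weighting makes the under-represented magnetic class contribute as much to the loss as the non-magnetic majority.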
In the regressor part of our two-part model scheme, the node embeddings from the classification task are directly mapped onto the magnetic moment property via a set of dense layers, filtering out the samples that have minimal local magnetization to prevent bias. The dense layers are made up of feed-forward blocks, themselves comprised of a linear layer followed by a BatchNorm1d [33] layer and a LeakyReLU activation function. Dropout [34] layers and residual connections are employed. We refer the reader to the data and code availability statement for a complete model and parameter specification.
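The feed-forward block described above can be sketched at inference time in a few lines of NumPy; this is a simplified stand-in for the actual PyTorch layers, and the weights and statistics below are toy values:

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def ff_block(x, W, b, bn_mean, bn_var, bn_gamma, bn_beta, eps=1e-5):
    """One feed-forward block of the regressor head, evaluated at inference:
    Linear -> BatchNorm1d (running statistics) -> LeakyReLU -> residual add.
    Dropout is the identity at evaluation time, so it is omitted here."""
    h = x @ W + b
    h = bn_gamma * (h - bn_mean) / np.sqrt(bn_var + eps) + bn_beta
    h = leaky_relu(h)
    return x + h   # residual connection

rng = np.random.default_rng(0)
d = 8                                  # toy embedding width (the paper uses 256)
x = rng.normal(size=(4, d))            # 4 atom embeddings
W = rng.normal(scale=0.1, size=(d, d))
out = ff_block(x, W, np.zeros(d), np.zeros(d), np.ones(d), np.ones(d), np.zeros(d))
print(out.shape)
```

Stacking such blocks, followed by a final linear projection to a scalar, yields the moment-magnitude regressor.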

Results & discussion
Our classifier is able to partition the test set data into magnetic and non-magnetic atoms with an F1 score and MCC score of ~91%. While no one metric can summarize the effectiveness of a model, we have chosen the F1 score and MCC since they have beneficial properties in the face of data imbalance [35] in the context of binary classification [36]. An important product of this classification is the automatic featurization from the interaction blocks of the GNN, which allows us to extract atom representations in their local environments as learned from the classification task and to subsequently smooth out the dataset that will undergo regression.
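For reference, both metrics can be computed directly from the binary confusion matrix; the counts below are illustrative, not our actual test set results:

```python
import numpy as np

def f1_and_mcc(tp, fp, fn, tn):
    """F1 score and Matthews correlation coefficient from a binary
    confusion matrix; both behave sensibly under class imbalance."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    )
    return f1, mcc

# Toy imbalanced split: 90 true non-magnetic atoms, 10 true magnetic atoms.
f1, mcc = f1_and_mcc(tp=9, fp=2, fn=1, tn=88)
print(f"F1 = {f1:.2f}, MCC = {mcc:.2f}")
```

Unlike plain accuracy, neither metric is inflated by simply predicting the majority (non-magnetic) class everywhere.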
Naive application of a GNN model to the dataset in figure 1(a) leads to a model that is biased toward the mode, as shown in figure 3(a), which underscores the need for a two-part model. One might be tempted to extend the binary classification to a ternary one that captures the sign of the magnetic moment, since that would enable the prediction of the collinear ordering on the structure. However, as shown in figure 4, the interaction blocks fall victim to the local symmetry in the crystal, precluding us from discriminating between spin-up and spin-down states as part of our atom embeddings.
Another example of the inability of the interaction blocks to partition between the spin-up and spin-down states is the poor performance of the regression model when one does not take the absolute value of the magnetic moments. One possible explanation for this is that we are essentially mapping similar embeddings to different target values, which introduces errors in trying to compromise on the fit. This issue is rectified by performing the regression on the absolute value of the magnetic moments, leading to a 0.1 µB MAE on the test set data, as illustrated in figure 3(b).
At this point, one could ask how our scheme would perform if applied to ground state structures that collapse to orderings other than AFM, since the latter are predominantly metal oxides. To answer this question, we applied our classifier, which was trained solely on AFM structures, to embed the atoms from structures that collapse into all orderings other than NM, as queried from the Materials Project. It is not surprising that our MCC score drops to ~70% when trying to partition atoms into magnetic and non-magnetic: many of the atoms now belong to structures that would be considered out-of-domain (OOD). Most FM structures belong to the metal class of materials, while FiM structures expose us to a spectrum of spinel types. What is encouraging is that, despite the drop in the classification accuracy metric, the atom representations are still conducive to performing the regression, with parity plots and MAEs reminiscent of the AFM-only metrics (see SI for more details), underscoring the generality of the scheme.

Case study
One application of the trained classifier and regressor could be to screen for new magnetic materials. Another application lies in the provision of good initial guesses for the magnetic moments to first-principles quantum mechanical codes like the Vienna Ab-initio simulation package (VASP) [38,39]. Given the local nature of such optimizations, initial magnetic moments are known to have significant effects on the evolution of the electronic minimization [40] and ultimately energetic results that make their way into applications [41].
The study of the application of materials starts with the bulk, where one has to solve for the degrees of freedom that will lead to the most stable configuration, since it is the most likely to be expressed. These degrees of freedom are geometric and electronic in nature. Computationally, stress is applied to the material in a series of deformations, which yield a relationship between the energy and the volume and from which the equilibrium volume can be extracted through an equation of state (EOS) [42] fit. The EOS fit will change depending on how one initializes the electronic configuration in the quantum mechanical codes. As a result, it is customary for a computational chemist to run all three of the most common magnetic configurations (AFM, NM, FM) and pick the most stable ordering. While one can enumerate all three orderings, the magnitude of the magnetic moment to apply to each atom is still a parameter that one cannot screen greedily and, as a result, it is often guided by chemical intuition. While such an optimization involves a lot of steps, it is now automated as part of a workflow, which we have coined WhereWulff and previously released [43]. Inside that workflow, the logic for applying initial magnetic moments within VASP is to decorate the bulk structure with the most probable oxidation states following crystal field theory [44], which describes the splitting of degenerate orbitals in cations surrounded by anion charges. This scheme is implemented using Pymatgen's [45] get_crystal_field_spin method, which requires the coordination environment as well as whether the high spin or low spin state is desired.
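The energy-volume step described above can be sketched as follows; a quadratic fit near the minimum stands in for the full Birch-Murnaghan EOS fit, and all numbers are synthetic:

```python
import numpy as np

# Synthetic E(V) data from a series of cell deformations; in practice these
# energies come from DFT single points at each deformed volume.
volumes = np.linspace(60.0, 80.0, 9)                # toy cell volumes (A^3)
v0_true, e0_true, k = 70.0, -100.0, 0.05
energies = e0_true + k * (volumes - v0_true) ** 2   # toy energies (eV)

# Quadratic fit: the vertex gives the equilibrium volume and energy.
c2, c1, c0 = np.polyfit(volumes, energies, deg=2)
v0 = -c1 / (2 * c2)                                 # equilibrium volume
e0 = np.polyval([c2, c1, c0], v0)                   # equilibrium energy
print(f"V0 = {v0:.1f} A^3, E0 = {e0:.1f} eV")
```

Because the fitted curve depends on the electronic (and magnetic) initialization, this fit is repeated for each candidate ordering before the most stable one is selected.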
One can imagine that it is not always possible to assign an oxidation state to all atoms in a structure, in which case the method we rely on in WhereWulff will fail and the need for a surrogate model becomes apparent. A concrete example of a bulk structure for which this is a problem is Mo2Ru13PtO32, with Pt being the culprit.
To circumvent this issue, we employ our two-part model in order to provide good initial guesses to VASP: the binary mask from our classifier, the results of which are illustrated in figure 6, indicates non-negligible local magnetization on all atoms except for Pt and O. The regressor suggests that the molybdenum atoms will collapse to local magnetic moments of 1.30 and 1.20 µB, while the ruthenium atoms are predicted to collapse to local magnetic moments ranging from a low of 0.63 to a high of 1.08 µB. To demonstrate the value of initializing the VASP relaxations with the results from our two-part model, we set up three single point calculations for the geometrically relaxed bulk structure, Mo2Ru13PtO32. The setups only differ in their MAGMOM keyword parameter values, the mandatory vector parameter that VASP requests as an initial guess to its electronic minimization routine when spin polarization is turned on: (1) the two-part model case initializes the magnetic moments with the vector of magnetic moments predicted by the model; (2) the FM case initializes the magnetic moments with large magnitudes (5 µB), since this heuristic strategy of initializing the system with moments that have significant buffer has been shown to allow systems to collapse into their desired ordering when one does not have prior intuition about the magnetic properties of the system [46]; and finally (3) the NM case initializes every single atom with 0.6 µB. The results from the electronic minimization show that not only can the two-part model case reach the electronic ground state 30% faster, but it leads to an energy that is ~0.6 eV lower than in the other two cases, with the FM case being lower in energy than the NM case.
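A sketch of how such a MAGMOM line could be assembled is shown below; the helper magmom_line is hypothetical (not part of WhereWulff), and purely for illustration we assign the reported low-end value of 0.63 µB to all thirteen Ru atoms:

```python
def magmom_line(moments):
    """Format per-atom initial moments as the INCAR MAGMOM value,
    compressing runs of equal values into VASP's N*value shorthand."""
    parts, i = [], 0
    while i < len(moments):
        j = i
        while j < len(moments) and moments[j] == moments[i]:
            j += 1                     # extend the run of equal moments
        run = j - i
        parts.append(f"{run}*{moments[i]}" if run > 1 else f"{moments[i]}")
        i = j
    return "MAGMOM = " + " ".join(parts)

# Mo2Ru13PtO32: 2 Mo, 13 Ru, 1 Pt, 32 O (Pt and O classified non-magnetic).
moments = [1.30, 1.20] + [0.63] * 13 + [0.0] + [0.0] * 32
print(magmom_line(moments))
```

The atom order in the moment vector must match the order of atoms in the POSCAR file.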

Conclusions & future work
In this work, we have shown that re-casting the problem of imbalanced magnetic order classification at the structure level to the prediction of the local magnetization at the atom level is itself plagued by imbalanced regression. Even though we end up with more samples than in the magnetic order classification case, the continuous distribution is severely skewed toward 0 µB. We present a two-part model scheme, in the context of GNNs, to circumvent this issue and smooth out the dataset for the purpose of the regression, thereby bypassing bias. We show that while a GNN classifier is successful at featurizing the atoms into embeddings that can be directly mapped onto the magnitude of the magnetic moment, the embeddings are not able to capture the collinear spin state as part of their features, most likely a corollary of the symmetric nature of the local convolution operations that constitute the interaction blocks. This tells us that while these implicit embeddings are capable of capturing features like coordination and oxidation state (crucial to predicting the magnitude of the magnetic moment), they would need an additional explicit feature that represents whether the atom is in the spin-up or spin-down state. This explicit feature would also allow us to solve for the energy as a function of the magnetic ordering.
To present a specific use case for our two-part model, beyond screening for new magnetic materials, we show how it can be helpful in providing initial magmom guesses to VASP, cutting down on the time to reach a configuration that is lower in energy and closer to the true ground state. We also demonstrate the general nature of our scheme by applying our classifier, which had only been trained on AFM structures, to FiM and FM structures, showing that the embeddings can still be used to perform the regression without bias on this expanded dataset with new classes of materials.
There are two major limitations that were brought into focus by the end of this study, both of which could be solved by tweaking the GNN architecture: (1) the need to include the collinear ordering as part of the initial atom embeddings and (2) the need to couple the magmom predictions with the energy predictions via a joint loss, since those properties are inherently linked.
Looking ahead, we intend to implement such a GNN and also work on coupling that model with our first-principles WhereWulff bulk workflow to create a closed-loop pipeline.As part of this self-discovery pipeline, we would screen materials and trigger WhereWulff only when the two-part model predicts an atom embedding that is OOD, at which point we would finetune the model on the new data point.

Figure 1 .
Figure 1. (a) The distribution of atom-level magnetization for the dataset we queried from the Materials Project. The mode lies heavily at 0 µB. The symmetry of the dataset comes from the fact that we queried for structures that had relaxed in an AFM ordering. (b) The distribution of atom-level magnetization after we have filtered out atoms whose local magnetization is less than 0.1 µB in magnitude. Even though we still see that the data exists in clusters, we have removed the bias at 0 µB. The envelopes are the result of Gaussian kernel density estimation (KDE), which shows the extent of smoothing between the dataset in (a) and that in (b).

Figure 2 .
Figure 2. (a) Structures are encoded as graphs with the atoms as nodes and the connections between the atoms as edges. Each atom starts off with initial embeddings as defined by its atomic number. (b) As part of the interaction blocks, the nearest neighbors around each center atom send messages along the edge embeddings via learnable network parameters. After each convolution layer, the messages update the previous center atom's embedding state. The messages also interact via the angle, α_ijk, when they are sent in the context of triplets (i,j,k), and via dihedral angles, γ_ijkl, when they are sent in the context of quadruplets (i,j,k,l). Edge distances, triplet angles and dihedral angles are expanded using an orthonormal basis set before undergoing message passing. (c) The output blocks are dense layers that project the updated atom embeddings onto arbitrary targets, in this case a logit that is proportional to the probability of an atom being magnetic (blue) or not (red). (d) The binary logits then go through softmax layers where they are converted to probabilities before they can be processed in the binary cross-entropy loss criterion. (e) We exploit the atom embeddings from the classification part in order to filter out the over-represented non-magnetic samples. (f) The magnetic atom embeddings are directly regressed onto the absolute value of the magnetic moments.

Figure 3 .
Figure 3. (a) The parity plot on samples that are different from the mode was obtained without resorting to the classifier to filter out the large number of samples that collapse to a non-magnetic state: the model predicts the mode, ~0 µB, and the mean absolute error (MAE) in this case is misleading if one does not exclude samples close to the mode. (b) Parity plot obtained via our two-part model scheme, which rectifies the bias, producing an MAE of 0.1 µB in the range from ~1 µB to ~7 µB.

Figure 4 .
Figure 4. 2D projection of the 256-dimensional embeddings of the test set data from a ternary classification with three classes: non-magnetic (not shown in this figure but depicted as red dots in figure 5), magnetic and positive (magenta), and magnetic and negative (green). We see that the automatic featurization from the interaction blocks cannot discriminate the atoms with positive magnetic moments from those with negative magnetic moments, as those projections lie close to each other in 2D space.

Figure 5 .
Figure 5. 2D projection of the high-dimensional atom embeddings of the test set data using UMAP [37]. Each dot here represents an atom in the global context of a molecule, the global extent depending on how many convolutional layers are employed. The atoms are partitioned based on a binary mask, with red representing atoms exhibiting minimal local magnetization, defined as <0.1 µB, and blue otherwise. Visually, we see that the classifier is able to separate the atoms with magnetic character from those with negligible magnetic state, which translates into an F1 score of ~91% on the test set data. Conveniently, the classifier not only minimizes the overlap between the atoms that have magnetic states and those that are non-magnetic but also yields embeddings that show feature variation within their classes, allowing us to directly perform the regression onto the magnetic moment property.

Figure 6 .
Figure 6. 2D projection of the atom embeddings obtained from passing the bulk material, Mo2Ru13PtO32, through the classifier. We see that the non-magnetic atoms (•) are generally different from the magnetic ones (+), except in the case of the Pt atom, which lies close to the Ru and Mo atoms, all of which are predicted to possess magnetic behavior. Scatter points that are plotted with • markers are those atom embeddings that the classifier projects, through its output block, onto a non-magnetic label, while scatter points plotted with + markers are projected onto the magnetic label. For a detailed reporting of the projection of the atom embeddings onto the magnitude of the magnetic moment, the reader is encouraged to look at table S1 in the SI.