Generalization of Graph-Based Active Learning Relaxation Strategies Across Materials

Although density functional theory (DFT) has aided in accelerating the discovery of new materials, such calculations are computationally expensive, especially for high-throughput efforts. This has prompted extensive exploration of machine-learning-assisted techniques to improve the computational efficiency of DFT. In this study, we present a comprehensive investigation of the broader application of Finetuna, an active learning framework that accelerates structural relaxation in DFT using prior information from graph neural networks pretrained on Open Catalyst Project data. We explore the challenges associated with out-of-domain systems: larger adsorbates such as $C_{>2}$ alcohols on metal surfaces, metal-oxides with spin polarization, and three-dimensional (3D) structures such as zeolites and metal-organic frameworks. By pretraining machine learning models on large datasets and fine-tuning them during the simulation, we demonstrate that the framework can complete relaxations with fewer DFT calculations. For test systems that are less similar to the training systems, a more conservative querying strategy is applied. Our best-performing Finetuna strategy reduces the number of DFT single-point calculations by 80% for alcohols and 3D structures, and by 42% for oxide systems.


Introduction
The urgent need to address climate and societal challenges has prompted the search for innovative solutions, and new material design and discovery have emerged as critical pathways. For instance, new catalysts are being explored for various crucial chemical reactions such as nitrogen reduction to replace the energy-intensive Haber-Bosch process [1,2], and alcohol dehydrogenation for energy storage applications [3]. Metal-oxides are being investigated as catalysts for oxygen reduction and evolution reactions in fuel cells [4,5]. With the continued advancements in simulation tools and increased computing power, we can harness the power of computational chemistry to further accelerate the process of materials discovery. By utilizing computational techniques, researchers can rapidly screen a vast space of chemical compositions, structures, and properties that would otherwise be prohibitively expensive and time-consuming to explore experimentally [6,7]. Simulations can also provide valuable insights into the underlying reaction mechanisms, allowing for the optimization of material performance.
However, the computational cost of first-principles simulations becomes prohibitive for large chemical spaces. In these cases, one prominent approach to accelerating materials discovery is the use of machine learning (ML) methods, which leverage large databases and advanced algorithms to predict the properties and performance of materials. A properly trained ML model is vastly cheaper to evaluate than first-principles tools such as density functional theory (DFT) and can have comparable accuracy, provided it is applied to systems sufficiently similar to its training data [8]. The development of large materials datasets, such as the Open Catalyst 2020 (OC20) and Open Catalyst 2022 (OC22) datasets for metal and metal-oxide materials [9,10], CoRE [11], CSD MOF [12], QMOF [13], and MOFX-DB [14] for metal-organic frameworks (MOFs), and the Database of Zeolite Structures [15] and Zeo-1 [16] for zeolites, has enabled data-driven modeling research on these materials classes. Remarkable progress has been made in developing ML models for accurate interatomic potentials and materials property prediction. Graph neural network (GNN) models are specifically designed to handle the inherent graph-like structure of molecules and crystals, in which atoms and bonds are represented as the nodes and edges of a graph. This allows GNN models to capture the complex interactions between atoms. GNN models such as GemNet [17], GemNet-OC [18], SCN [19], eSCN [20], and M3GNet [21] can learn vast amounts of chemical information from training data, facilitating the efficient screening of materials and the identification of promising candidates for specific applications. For example, the leading ML model on the Open Catalyst Project (OCP) Leaderboard at the time of writing achieves a total energy mean absolute error (MAE) as low as 0.22 eV on OC20-like catalyst structures.
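To make the graph representation concrete, the sketch below builds the edge list of an atomic graph by connecting every pair of atoms closer than a cutoff radius. It is a minimal, non-periodic illustration under assumed inputs (Cartesian positions in Å and a hypothetical `cutoff` value); GNNs such as GemNet construct their graphs inside their own codebases, typically also handling periodic images and neighbor limits, so this is not their implementation.

```python
import math
from itertools import combinations


def radius_graph(positions, cutoff):
    """Return undirected edges (i, j), i < j, between atoms closer than `cutoff` (Å).

    Nodes are atom indices; an edge means the two atoms interact in the graph.
    Non-periodic O(N^2) sketch: periodic images are ignored here.
    """
    edges = []
    for i, j in combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) < cutoff:
            edges.append((i, j))
    return edges


# Hypothetical three-atom structure (Å)
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
print(radius_graph(positions, cutoff=2.0))  # [(0, 1)]
```

With a 3.5 Å cutoff all three pairs are connected instead. The pairwise loop is chosen for clarity; cell lists or k-d trees scale better for large simulation cells.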
Training ML models on large datasets can be time-consuming and computationally demanding, especially in molecular modeling, where diverse chemical structures and properties are involved. One approach to mitigating this challenge is transfer learning. Transfer learning allows a model to use the knowledge and representations learned during pretraining to initialize its parameters for a new dataset, significantly reducing the data and computational resources needed for training [22]. In our prior work, we proposed the Fine-Tuning Accelerated molecular simulations framework (Finetuna) [23] as a promising implementation of online transfer learning with pretrained ML models. Finetuna uses an active querying strategy to determine when to perform a DFT calculation, and fine-tunes the ML model on the DFT results. Specifically, we benchmarked Finetuna on a set of catalyst systems from the OC20 validation set with a GemNet model trained on the OC20 training data. Finetuna showed that local optimizations of OC20-like systems can be greatly accelerated without sacrificing accuracy, reducing the number of DFT calculations by 90% compared with a baseline approach. We note that Finetuna is one of many online active learning approaches for accelerating atomistic simulations. Other on-the-fly active learning frameworks typically start from scratch, using Gaussian process models or simpler neural network potentials [24][25][26][27][28][29].
In this work, we seek to extend Finetuna to different out-of-domain chemical systems and gain insight into the capabilities and limitations of the workflow. We conduct a case study of the Finetuna workflow on three groups of materials, namely $C_{>2}$ alcohols, metal-oxides, and three-dimensional (3D) structures, for different applications. While the OC20 dataset is limited to small ($C_{\le 2}$) adsorbates, we explore the adsorption of $C_{>2}$ alcohol-to-ketone intermediates in this work. We also explore metal-oxide catalysts and the structures of zeolites and MOFs, which deviate significantly from the catalyst surfaces in the original OC20 dataset. Finally, we examine the effect of spin polarization and of systems with significantly more atoms than those in the training dataset, both of which significantly increase the computational cost of DFT. We introduce new querying strategies for these systems, with our best-performing Finetuna strategy reducing DFT calls by 80% for both $C_{>2}$ alcohol and 3D structure systems and by 42% for oxide systems.

Finetuna workflow
The active learning workflow accelerates geometry optimizations by using a pretrained GNN for fast force estimates. As shown in Figure 1, the atomic structure is evaluated by the machine learning potential (MLP), and if a querying criterion is met (for example, the ML-predicted forces reach the convergence criterion or a step threshold is exceeded), a DFT single-point calculation is triggered (also referred to as a "parent call"). The DFT-calculated forces are used to fine-tune the MLP. Between parent calls, the MLP predicts the forces, and the optimizer uses these atomic forces to update the structure. As in other relaxation procedures, the convergence criterion is based on the maximum force in the system.
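The loop described above can be sketched with toy stand-ins. This is a minimal illustration under stated assumptions, not the Finetuna code: the "DFT" calculator is a harmonic well, the "pretrained MLP" has the wrong curvature and minimum, "fine-tuning" simply refits a force offset to the latest DFT frame, and plain steepest descent replaces ASE's BFGS. The query rules follow the baseline strategy (initial point, every K steps, MLP converged), and in this sketch the run stops only when the DFT forces themselves satisfy the 0.05 eV/Å criterion.

```python
FMAX = 0.05   # eV/Å, convergence criterion on the maximum force
K_STEPS = 30  # query DFT at least once every K_STEPS optimizer steps


def dft_forces(x):
    """Hypothetical reference calculator: harmonic well at 1.0 with stiffness 2.0."""
    return [-2.0 * (xi - 1.0) for xi in x]


class ToyMLP:
    """Hypothetical surrogate: a biased 'pretrained' prior plus a learned offset."""

    def __init__(self, n):
        self.offset = [0.0] * n  # per-coordinate correction learned from DFT

    @staticmethod
    def _prior(x):
        # 'Pretrained' guess with the wrong stiffness (1.5) and minimum (0.6)
        return [-1.5 * (xi - 0.6) for xi in x]

    def forces(self, x):
        return [p + c for p, c in zip(self._prior(x), self.offset)]

    def fine_tune(self, x, f_ref):
        # Refit the offset so the model reproduces the DFT forces at x exactly
        self.offset = [fr - p for fr, p in zip(f_ref, self._prior(x))]


def fmax(forces):
    return max(abs(f) for f in forces)


def relax(x, step=0.2, max_steps=1000):
    """Return (relaxed coordinates, number of DFT parent calls)."""
    x = list(x)
    mlp = ToyMLP(len(x))
    n_dft, since_query, query = 0, 0, True  # True: "initial point" query
    for _ in range(max_steps):
        if query:
            f = dft_forces(x)          # parent call
            n_dft += 1
            since_query = 0
            if fmax(f) < FMAX:         # converged on DFT forces: stop
                return x, n_dft
            mlp.fine_tune(x, f)        # fine-tune on the new DFT frame
        else:
            f = mlp.forces(x)          # cheap surrogate forces
        x = [xi + step * fi for xi, fi in zip(x, f)]  # optimizer update
        since_query += 1
        # Baseline rules: MLP believes it has converged, or K steps have elapsed
        query = fmax(mlp.forces(x)) < FMAX or since_query >= K_STEPS
    raise RuntimeError("relaxation did not converge")


x, n_dft = relax([0.0, 2.0])
```

Starting from `[0.0, 2.0]`, the toy run converges with only a handful of DFT calls while the surrogate supplies the forces for the remaining optimizer steps; each fine-tuning round pulls the surrogate's minimum closer to the true one.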
In the Finetuna workflow, many components, e.g., the ML model, the DFT functional and software, the optimizer, and the querying strategy, can be modified to suit specific systems. The MLP can be any of the available pretrained models, such as the GemNet model trained on the OC20 dataset, the GemNet model trained on the OC22 dataset, or the GemNet-OC model trained on both the OC20 and OC22 datasets. We refer to these models as GN-OC20, GN-OC22, and GNOC-OC20+22, respectively. The choice of DFT functional depends mostly on the chemical system of interest; we discuss this further in Section 2.2. We limit the DFT code to the Vienna Ab initio Simulation Package (VASP) to be consistent with the OC20 and OC22 datasets [30][31][32][33][34], and the optimizer to the Broyden-Fletcher-Goldfarb-Shanno (BFGS) implementation in the Atomic Simulation Environment Python package (ASE) to eliminate variations arising from differences between optimizers [35]. Interfaces to other DFT codes can be found in the GitHub repository [36], and a detailed comparison of different optimizers can be found in our prior work [23].
The querying strategy determines when a DFT calculation is performed. The four main strategies are illustrated in Figure 1b. "Initial point" queries at the initial structure. "K-steps" sets a lower bound on the querying frequency by querying at least once every K steps, similar to [23]. "MLP converged" queries when the maximum ML-predicted force is less than the convergence criterion. We also introduce a new strategy, labeled "high MLP error" in Figure 1, which queries the next point whenever the previous training error fails to drop below a set threshold. The training error is defined as the L2 norm of the difference between the DFT forces and the retrained MLP prediction. This strategy is needed for high-error systems because training on a single frame may not improve the model's predictions enough for the framework to converge correctly.
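The four querying strategies can be combined into a single decision rule. The sketch below is an illustration under assumptions: `QueryPolicy` and its `reason` method are hypothetical names, not part of the Finetuna package; the defaults (0.05 eV/Å, 30 steps) follow the baseline settings used in this work; and `error_threshold=None` disables the "high MLP error" rule. The caller is assumed to update `last_train_error` after every fine-tuning round.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryPolicy:
    """Hypothetical bundle of the querying rules described in the text."""
    fmax: float = 0.05                        # eV/Å, convergence criterion
    k_steps: int = 30                         # query at least every k_steps ML steps
    error_threshold: Optional[float] = None   # eV/Å; None disables "high MLP error"

    def reason(self, step, steps_since_query, mlp_fmax, last_train_error=None):
        """Return the name of the triggered strategy, or None to trust the MLP."""
        if step == 0:
            return "initial point"
        if (self.error_threshold is not None
                and last_train_error is not None
                and last_train_error > self.error_threshold):
            # Fine-tuning on the last frame did not fit well enough: query again
            return "high MLP error"
        if steps_since_query >= self.k_steps:
            return "k-steps"
        if mlp_fmax < self.fmax:
            # The MLP claims convergence; verify it with a DFT call
            return "MLP converged"
        return None


baseline = QueryPolicy()
conservative = QueryPolicy(error_threshold=0.05)
```

Checking the error rule before the others mirrors its purpose: if the last fine-tuning round left a large residual, the next step is queried regardless of what the MLP predicts.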
The baseline Finetuna framework uses the GN-OC20 model as the MLP, and queries the parent calculator at the initial point, every 30 ML steps, or when the ML-predicted maximum force is below the convergence criterion (in this work, 0.05 eV/Å). The baseline strategy is applied to all systems discussed herein. However, the GN-OC20 model exhibits different levels of accuracy on the different types of systems tested in this work. We expect these differences to be caused by how well-represented those systems are in the OC20 training dataset. This, in turn, should affect how conservative the user ought to be in setting the Finetuna algorithm querying strategy. Systems that lie further outside the domain of the training data should require more training, and predictions by the model should be less trustworthy. Therefore, we also test different pre-trained MLPs for the oxide systems, and apply the additional query by high MLP error to the 3D structure systems. The detailed strategies will be further explained in Section 2.2.

Alcohol dehydrogenation
Alcohol dehydrogenation has been widely explored as a means to catalytically produce and store hydrogen gas (H$_2$) for fuel cells. Modeling the dehydrogenation of relatively large ($C_{>2}$) alcohol molecules to ketones is complex because it involves multiple possible surface intermediates at every step of the reaction network. Many industrially important catalytic reactions either form or break down alcohols, such as the dehydrogenation of alcohols, the formation of fuels from CO$_2$ reduction, and the catalytic formation of plastics. Here, we assess the performance of Finetuna in modeling a two-step alcohol dehydrogenation reaction. We investigated three out-of-domain adsorbates of different sizes. The surface intermediates investigated are similar to those in the original OC20 training dataset, but these systems are still considered out-of-domain because they are much larger than anything in the pretraining dataset. Because of this similarity, we expect training to be relatively easy and the Finetuna algorithm to converge relatively quickly compared with more out-of-domain systems.
For large organic molecules ($C_{>2}$), the binding energy has contributions from both physisorption and chemisorption. This is due to the large dipole moments inherent to these molecules, which result in significant van der Waals contributions to the binding energy. Modeling these molecules with DFT typically requires the addition of dispersion corrections. Here we use the Bayesian error estimation functional with van der Waals correlation (BEEF-vdW) to account for these dispersion interactions [37]. The OC20 dataset was calculated using the revised Perdew-Burke-Ernzerhof (RPBE) functional without any dispersion correction, and its carbon-based adsorbates are limited to $C_1$ and $C_2$ molecules. To assess the influence of dispersion interactions, we perform single-point DFT calculations with the RPBE and BEEF-vdW functionals on the initial structures of the alcohol systems. We find that the force differences between the two functionals are very small compared with the differences between the DFT forces and the MLP forces. Details can be found in Figure S2 of the Supporting Information.
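The force comparison above can be expressed as a per-system discrepancy metric. As an assumption, the sketch follows the definition used for Figure 3 (the L2 norm of each atom's force difference, averaged over atoms); the force values in the example are made-up placeholders, not results from this study.

```python
import math


def mean_force_discrepancy(forces_a, forces_b):
    """Mean over atoms of the per-atom L2 norm of the force difference (eV/Å)."""
    if len(forces_a) != len(forces_b):
        raise ValueError("force arrays must describe the same atoms")
    norms = [math.dist(fa, fb) for fa, fb in zip(forces_a, forces_b)]
    return sum(norms) / len(norms)


# Placeholder per-atom forces (eV/Å) for one hypothetical two-atom system
rpbe = [[0.10, 0.00, 0.00], [0.00, -0.20, 0.05]]
beef = [[0.12, 0.00, 0.00], [0.00, -0.18, 0.05]]
mlp = [[0.50, 0.10, 0.00], [0.00, -0.60, 0.30]]

functional_gap = mean_force_discrepancy(rpbe, beef)  # about 0.02 eV/Å
model_gap = mean_force_discrepancy(beef, mlp)        # about 0.44 eV/Å
```

Comparing the two numbers per system reproduces the kind of check described above: if the functional gap is much smaller than the DFT-to-MLP gap, the missing dispersion correction is not the dominant source of error.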

Metal-oxides
Metal-oxides have been extensively studied as candidate catalysts for crucial electrochemical reactions, e.g. the oxygen evolution reaction (OER), oxygen reduction reaction (ORR), and hydrogen evolution reaction (HER). However, oxide systems can be challenging to study using DFT calculations due to their complex electronic structure. The release of the OC22 dataset, which focused on metal-oxides, has provided a valuable resource to advance the understanding of metal-oxide systems. Nevertheless, it is important to note that the OC22 dataset represents only a small subset of the extensive design space of oxides, underscoring the need for efficient exploration of these systems. Metal-oxide systems tested in this work are taken from the OC22 validation set. These are also adsorbate/slab configurations similar in form to training data in the OC20 dataset. These systems often exhibit spin ordering, so spin-polarized calculations are generally required. In addition, a Hubbard U correction is applied to certain elements in these systems to improve the description of localized electron states [38]. We suspect this can result in a potential energy surface with multiple nearby local minima, and thus give different outcomes when performing geometric optimizations [39]. For the selected testing systems, we used the same DFT settings from the OC22 dataset [10].
Evaluating pretrained graph models like GemNet on oxide systems results in higher errors than on non-oxide materials, owing to the complexity of oxides. Training on both oxide and non-oxide materials improves model accuracy, but predicting energies and forces for oxide materials remains more difficult. Graph model featurization schemes may fail to capture some complex properties that are more common in, or specific to, oxide materials, such as magnetic effects and long-range charge effects [40]. Unlike metallic systems, semiconductors can start from essentially identical structures but relax to significantly different final geometries, owing to long-range effects of the number of electrons shared across the entire system or to different magnetic configurations of the same structure. It has been shown that the GemNet-OC model trained only on the OC20 dataset performs especially poorly on OC22 prediction tasks; for example, its force MAE is 0.384 eV/Å [10]. Even with many training examples like those in the OC22 dataset, the accuracy of GNNs like GemNet on oxide systems does not approach their accuracy on non-oxide systems. As shown on the OCP leaderboard at the time of writing, the best GemNet model for out-of-distribution force predictions achieves a force MAE of 0.031 eV/Å on OC22, versus 0.023 eV/Å on OC20 [9,10]. Given the anticipated difficulty of this task, in addition to the baseline Finetuna strategy that uses GN-OC20, we also test GN-OC22 and GNOC-OC20+22 as the underlying MLP. The interaction between magnetic spins in spin-polarized systems is one of the contributing factors to the complexity of oxide materials. Spin polarization effects are typically long-range, whereas GNN models like GemNet are developed under the assumption that local interactions dominate [10].

3D structures
Zeolites and MOFs are two common materials with porous structures. They have gained significant attention in recent decades due to their versatile properties and wide-ranging applications in gas separation, water treatment, and catalysis [41][42][43]. One advantage of these 3D structures as catalysts is their tunability. By modifying the composition, structure, and pore size, their selectivity and performance can be calibrated for specific applications [44,45].
The test 3D structures cover a wide range of systems, including copper-doped zeolites, MOFs, and zeolites with a variety of adsorbates such as gold nanoparticles and aromatics. These catalysts are modeled very differently from OC20-like metal surfaces. Rather than adsorbing onto the surface of a dimensionally confined slab, which is the core focus of the OC20 dataset, adsorbates in these fully three-dimensional structures, if present, sit within the pores. Structural relaxations then take place throughout the simulation cell, rather than only in a few atomic layers near the surface and adsorbate. Examples of such complex structures are not found in the OC20 training dataset. While there may be numerous examples of environments containing similar metal-oxygen configurations, the three-dimensional surroundings should make most atomic neighborhoods differ from those in the training data. The metal shells of these 3D structures are also effectively made up of metal-oxide configurations, adding to the difficulty. We therefore anticipate that a GNN model like GemNet, trained only on OC20-like structures, will find it very difficult to adapt to this kind of structure.
To compensate for this, we test another, more conservative querying strategy for Finetuna in addition to the baseline approach. After each parent call in the Finetuna loop, we measure the mean error of the retrained forces by taking the L2 norm of the difference between the parent (DFT) forces and the retrained model prediction for each atom and averaging over the atoms. If this retrained mean force error is still above a threshold, training was insufficient, and a parent DFT call is triggered at the next step of the relaxation as well. This process repeats at every parent call until the model error drops below the threshold (here, 0.05 eV/Å). This approach aims to ensure that the model is sufficiently trained on the new system to make force predictions that move the relaxation in a reasonable direction, reducing the forces and energy.

Results & Discussion
We selected 27 alcohol-to-ketone systems, 10 oxide systems, and 81 3D structure systems and performed both VASP BFGS and Finetuna relaxations on each. Systems that experienced convergence issues with VASP BFGS and systems that failed due to memory limitations are excluded from this section. A full list of the tested systems and results can be found in the Supplementary Information. The systems contain between 50 and 250 atoms. We report the energy difference between the VASP BFGS and Finetuna relaxations in eV/atom to facilitate comparison across system sizes.
A summary of the overall performance of Finetuna compared with VASP BFGS is given in Table 1, and the individual runs are plotted in Figure 2. For each class of systems, we evaluate the efficiency of the Finetuna workflow by calculating the ratio between the total number of DFT calls in the Finetuna relaxations and that in the VASP BFGS relaxations. This metric, referred to as the percentage of DFT calls, represents the overall saving in DFT calculations. Unconverged Finetuna runs are marked with red crosses in Figure 2. We found that none of the metal-oxide systems converged with the baseline GN-OC20 model. This is not surprising given the lack of oxide systems and spin polarization information in the OC20 training data. For simplicity, Figure 2 shows the Finetuna results with the GN-OC22 model; comparisons with the other models can be found in Figure S1 of the Supporting Information. In general, Finetuna underperforms in both accuracy and speed compared with the results in Ref. 23 across all three system types. We believe this is the result of a significant domain shift, which should make realigning the GNN model through fine-tuning slower and less accurate. We consider magnetic effects and 3D structural effects to be the most significant causes of domain shift. By making the Finetuna algorithm more conservative through repeated training when errors are high, the overall parent calls can be brought below 20% for the alcohol-to-ketone and 3D structure systems, and to 58% for the oxide systems.
Across the alcohol-to-ketone, metal-oxide, and 3D structure systems, there are a number of cases where Finetuna does not converge. As shown in Figure 3, most of the unconverged cases exhibit high ML-predicted forces (> 5.0 eV/Å) somewhere along the relaxation trajectory. This implies that trajectories in which the MLP predicts such a large force generally should not be trusted to converge and should be terminated early. In addition, provided the initial structure guesses are reasonable, we would expect such a large force to indicate that the system has moved into a less realistic region of configuration space, so any local minimum it finds would likely be less physically meaningful. The only exceptions are a few of the alcohol-to-ketone systems, where the trajectory converged slowly despite the high error (requiring more than 100% of the VASP BFGS DFT calls). Interestingly, these systems do not appear to find significantly different local minima from the original DFT relaxation, so in these cases Finetuna appears to have taken a roundabout route to a very similar result.
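The two reporting metrics, percentage of DFT calls and energy difference per atom, can be computed as follows. `RunRecord` is a hypothetical container and the numbers in the example are illustrative, not entries from Table 1. The percentage is a ratio of totals over all systems in a class, as described above, rather than an average of per-system ratios, so longer relaxations carry more weight.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Hypothetical per-system record of one VASP BFGS vs. Finetuna comparison."""
    n_atoms: int
    vasp_dft_calls: int       # single points in the VASP BFGS relaxation
    finetuna_dft_calls: int   # parent calls in the Finetuna relaxation
    vasp_energy: float        # relaxed energy from VASP BFGS, eV
    finetuna_energy: float    # relaxed energy from Finetuna, eV


def percentage_dft_calls(runs):
    """Total Finetuna DFT calls as a percentage of total VASP BFGS DFT calls."""
    total_finetuna = sum(r.finetuna_dft_calls for r in runs)
    total_vasp = sum(r.vasp_dft_calls for r in runs)
    return 100.0 * total_finetuna / total_vasp


def energy_difference_per_atom(run):
    """Finetuna minus VASP BFGS relaxed energy, in eV/atom."""
    return (run.finetuna_energy - run.vasp_energy) / run.n_atoms


# Illustrative numbers only
runs = [
    RunRecord(n_atoms=100, vasp_dft_calls=120, finetuna_dft_calls=20,
              vasp_energy=-500.0, finetuna_energy=-500.2),
    RunRecord(n_atoms=50, vasp_dft_calls=80, finetuna_dft_calls=20,
              vasp_energy=-250.0, finetuna_energy=-249.9),
]
print(percentage_dft_calls(runs))  # 20.0
```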
For the oxide systems, we found that the potential energy surfaces are generally more complicated, as evidenced by the discrepancies between the relaxed energies from the two DFT optimizers (VASP BFGS and VASP conjugate gradient (CG); see Figure S1). The total magnetization $M$ is given by the net spin of the electrons in the system:
$$M = \mu_B \int \left[ n_\uparrow(\mathbf{r}) - n_\downarrow(\mathbf{r}) \right] d\mathbf{r},$$
where $\mu_B$ is the Bohr magneton, and $n_\uparrow$ and $n_\downarrow$ are the spin-up and spin-down electron densities. For our purposes, $M$ represents the extent of spin polarization in a given system. Figure 4 shows a case where relaxations with two different models led to nearly identical sets of atomic positions but considerably different magnetizations and energies. This suggests that the ML potential and the performance of the algorithm may be compromised when applied to spin-polarized systems, and highlights the importance of developing MLPs, such as CHGNet [46], that effectively capture magnetic information for these applications.
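On a real-space grid, the magnetization integral becomes a sum over voxels. The sketch below assumes spin densities in electrons/Å³ sampled on a uniform grid with a known voxel volume (hypothetical inputs; parsing actual VASP output is not shown) and returns $M$ in units of $\mu_B$.

```python
def total_magnetization(n_up, n_down, voxel_volume):
    """Discretized M = mu_B * sum over voxels of (n_up - n_down) * dV, in units of mu_B.

    n_up, n_down: spin densities (electrons/Å^3) on the same uniform grid
    voxel_volume: volume of one grid cell (Å^3)
    """
    if len(n_up) != len(n_down):
        raise ValueError("spin densities must share the same grid")
    return voxel_volume * sum(u - d for u, d in zip(n_up, n_down))


# Hypothetical four-voxel grid: net spin is concentrated in the first two voxels
m = total_magnetization([1.0, 0.8, 0.6, 0.2], [0.6, 0.6, 0.6, 0.2], voxel_volume=0.5)
```

Because $M$ depends only on the integrated spin imbalance, two relaxed structures with nearly identical atomic positions can still differ in $M$, which is the situation illustrated for the AgIrO$_4$ system.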

Conclusion
In this study, we assessed the performance of the Finetuna workflow on various chemical systems, including alcohol-to-ketone intermediates, metal-oxides, and MOFs. Although Finetuna showed promising results in its original domain (OC20-like systems), reducing DFT calls by 90%, it was less effective on these out-of-domain systems. The results show that we were able to reduce the number of DFT calls by 85% for alcohol-to-ketone systems, 49% for oxide systems, and 82% for MOF systems, with a few unconverged cases. We believe that the complex electronic interactions in these chemical systems, particularly in the oxides with spin polarization, limit Finetuna's performance. Long-range magnetic interactions add complexity to the potential energy surface. While this has reduced the efficiency of the Finetuna workflow, it also presents opportunities for further exploration.
We have established a heuristic criterion to predict the success of a Finetuna run. Specifically, if the maximum force predicted by the ML model exceeds 5 eV/Å at any point during the relaxation, we recommend using a different initialization strategy, or terminating the simulation and performing DFT calculations instead.
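The 5 eV/Å heuristic can be applied as an online check during a run. In the sketch below, `HighForceMonitor` is a hypothetical helper; following the usual ASE convention, the maximum force is taken as the largest per-atom force norm, and once the threshold has been exceeded the monitor keeps returning `True`, signalling that the run should be stopped or re-initialized.

```python
import math


class HighForceMonitor:
    """Hypothetical helper implementing the 5 eV/Å early-termination heuristic."""

    def __init__(self, threshold=5.0):
        self.threshold = threshold  # eV/Å
        self.peak = 0.0             # largest max force seen so far in this run

    def update(self, forces):
        """forces: per-atom (fx, fy, fz) predicted by the MLP, in eV/Å.

        Returns True if the threshold has been exceeded at any step so far.
        """
        fmax = max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces)
        self.peak = max(self.peak, fmax)
        return self.peak > self.threshold


monitor = HighForceMonitor()
```

Keeping the flag "sticky" matches the heuristic as stated: a single large predicted force anywhere along the trajectory is enough to distrust the run, even if later predictions look benign.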
As the accuracy of ML models improves and more materials datasets become available, we believe that the Finetuna workflow can be applied to a broader range of chemical systems. We anticipate the rapid emergence of new ML models and look forward to integrating them into the workflow. However, to use these new models effectively, further research is needed to explore transfer learning strategies and develop fine-tuning techniques specific to each model. The choice of a pretrained model with suitable architectural features and relevant training data can significantly affect performance and generalizability. Investigating out-of-domain detection methods could enhance the robustness of our approach. Additionally, uncertainty quantification in ML models would be valuable, as it could serve as a querying strategy to guide the selection of informative data points. These future directions hold promise for advancing the efficiency and accuracy of materials discovery. The active learning and fine-tuning approach has the potential to greatly accelerate molecular simulations, leading to the development of more efficient and sustainable materials for a wide range of applications.

Supporting Information: Generalization of Graph-Based Active Learning Relaxation Strategies Across Materials
Xiaoxiao Wang1, * , Joseph Musielewicz

S1 Oxide results
The outcomes of the experiments applying six distinct calculation methods to the metal oxide systems are summarized in Figure S1. The Vienna Ab initio Simulation Package (VASP) Broyden-Fletcher-Goldfarb-Shanno (BFGS) method serves as the baseline; VASP Conjugate Gradient (CG) uses the built-in CG optimization method in VASP; VASP Gaussian Process Minimizer (GPMin) uses Gaussian processes to model the potential energy surface and is an optimization method built into the Atomic Simulation Environment Python package (ASE); and the remaining methods are the Finetuna experiments with GN-OC20, GN-OC22, and GNOC-OC20+22, respectively. None of the Finetuna runs with GN-OC20 converged, so they are not included in the plot.

S2 Functional comparison
The initial structures of the selected alcohol systems are evaluated with the revised PBE (RPBE) and Bayesian error estimation functional with van der Waals correlation (BEEF-vdW) functionals, and with the GN-OC20 model. For each system, Figure S2 plots the L2 norm of the difference between the two sets of DFT forces, and between the BEEF-vdW DFT forces and the MLP forces. Systems with unconverged Finetuna runs are marked with 'x's. As shown in the figure, the differences between the two sets of DFT forces are less than 0.1 eV/Å for all systems, whereas the differences between the BEEF-vdW DFT forces and the GN-OC20 model vary. In particular, when the discrepancy between DFT and MLP is high, i.e., above 0.4 eV/Å, the Finetuna run tends to fail.

S3 Similarity Mapping
To better understand where our test systems sit relative to the distribution of the Open Catalyst 2020 (OC20) training data, we perform a latent space analysis with principal component analysis (PCA), a dimensionality reduction method. The atomic embeddings are extracted from the latent space of the final interaction block of the GemNet model. These descriptors provide a 256-dimensional vector on a per-atom basis, which should describe the atomic neighborhood of each atom in each frame of the training set. We plot the atomic environments on a per-atom basis for these maps because a high prediction error on a single atom should be sufficient to disrupt an entire relaxation. The density of each map gives us a qualitative indication that the alcohol-to-ketone systems should be the most in-domain, while the zeolite systems appear to be the furthest out of domain. However, this type of plot does not give us a quantitative measure with which to set a threshold.
Figure S3: PCA maps of each of the system types.
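A PCA map of this kind can be produced by fitting principal axes on reference (training-set) embeddings and projecting test embeddings onto them. The sketch below uses NumPy's SVD; random arrays stand in for the 256-dimensional GemNet atomic embeddings, whose extraction is not shown, and the function names are hypothetical.

```python
import numpy as np


def fit_pca(reference, n_components=2):
    """Fit principal axes on reference embeddings of shape (n_atoms, n_features)."""
    ref = np.asarray(reference, dtype=float)
    mean = ref.mean(axis=0)
    # Rows of vt are orthonormal principal axes, sorted by decreasing variance
    _, _, vt = np.linalg.svd(ref - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(embeddings, mean, components):
    """Project per-atom embeddings onto previously fitted principal axes."""
    return (np.asarray(embeddings, dtype=float) - mean) @ components.T


# Random stand-ins for 256-dimensional per-atom embeddings
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(500, 256))
test_emb = rng.normal(loc=1.0, size=(50, 256))

mean, components = fit_pca(train_emb)
train_xy = project(train_emb, mean, components)  # background density of the map
test_xy = project(test_emb, mean, components)    # test atoms overlaid on it
```

Fitting on the training embeddings only, rather than on the union of training and test data, keeps the axes fixed so that different test systems are placed on a common map.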

Figure 1: (a) Online active learning workflow. (b) Demonstration of different querying strategies in a sample relaxation trajectory. MLP error is based on the ML-predicted forces. (c) Sample structures of the test systems and the corresponding DFT functionals used in this work.

Figure 2: Finetuna performance compared to VASP BFGS on (a) alcohols, (b) oxides, and (c) zeolite systems. The red 'X' represents the unconverged Finetuna relaxations, and the red 'O' labels the systems where a high MLP maximum force is observed along the relaxation trajectory. Comprehensive information regarding each individual system can be found in the Supplementary Information (Table ??, Table ??, and Table ??).

Figure 3: Performance metrics plotted against the maximum force at a parent call along each Finetuna trajectory. Each point corresponds to a separate relaxation trajectory. The x-axis shows the maximum force error at a parent call, computed by taking the L2 norm of the difference between the DFT and MLP force vectors of each atom and averaging over all atoms in the system at that point.

Figure 4: Detailed analysis of the AgIrO$_4$ system. (a) Change in DFT energy and magnetization along the relaxation. (b) Comparison of the relaxed structures from two Finetuna relaxations.

Figure S1: DFT and Finetuna performance compared to VASP BFGS on metal oxide systems. GN-OC20 experiments are not included in the figure because none of them converged.

Table 1: Summary of the Finetuna performance on alcohol, oxide, and 3D structure systems. The baseline querying strategy calls a DFT calculation at the initial point, every 30 MLP steps, and when the MLP-predicted forces meet the convergence criterion.

Table S1: Summary of DFT and Finetuna performance on oxide systems. The baseline querying strategy calls a DFT calculation at the initial point, every 30 MLP steps, and when the MLP-predicted forces meet the convergence criterion.