Surface segregation in high-entropy alloys from alchemical machine learning

High-entropy alloys (HEAs), containing several metallic elements in near-equimolar proportions, have long been of interest for their unique mechanical properties. More recently, they have emerged as a promising platform for the development of novel heterogeneous catalysts, because of the large design space, and the synergistic effects between their components. In this work we use a machine-learning potential that can model simultaneously up to 25 transition metals to study the tendency of different elements to segregate at the surface of a HEA. We use as a starting point a potential that was previously developed using exclusively crystalline bulk phases, and show that, thanks to the physically-inspired functional form of the model, adding a much smaller number of defective configurations makes it capable of describing surface phenomena. We then present several computational studies of surface segregation, including both a simulation of a 25-element alloy, that provides a rough estimate of the relative surface propensity of the various elements, and targeted studies of CoCrFeMnNi and IrFeCoNiCu, which provide further validation of the model, and insights to guide the modeling and design of alloys for heterogeneous catalysis.


I. INTRODUCTION
Catalysts are widely used in modern industrial chemistry processes to lower the barriers and thus enhance the rates of a multitude of diverse chemical reactions.Among the many different classes of catalysts, a lot of attention has been recently devoted to high-entropy alloys (HEAs).Initially introduced by Yeh 1 and Cantor 2 for metallurgic and mechanical applications, HEAs were shown to exhibit promising catalytic [3][4][5][6][7] and especially electrocatalytic [8][9][10] behavior.
The computational study and modeling of HEAs, and in particular their catalytic properties, is a promising approach to rapidly explore the enormous composition space.However, this is a challenging endeavor: Disordered alloys typically require large unit cells to obtain a statistically representative structure, what makes the ab initio simulations of HEAs computationally expensive and time consuming.Simulations of both elemental metals and HEAs based on empirical interatomic potentials are much faster, but are usually less accurate [37][38][39] , especially when used to model multicomponent structures 40 .Furthermore, most recent examples of traditional potentials for HEAs focus on a very narrow range of compositions and are specifically optimized for a precise scientific question 41,42 .Even machine learning interatomic potentials (MLIPs), which typically address this tradeoff by approximating the outcome of electronic structure calculations [43][44][45][46] , cannot be applied straightforwardly.Many popular models based on atom-centered descriptors 47,48 suffer from an exponential scaling of memory and computational requirements with respect to the number of distinct elements.Recent developments in Graph Neural Network (GNN) 49,50 and equivariant 51,52 models use a chemical space embedding approach that doesn't scale with the size of the chemical space, but lacks interpretability.Both approaches also suffer from the exponential increase in data requirements associated with the large composition space.
In our previous work 53 , we introduced a generalpurpose machine-learning (ML) model for the study of HEAs with up to 25 transition metals in the composition (the HEA25-4-NN), as well as proposed an efficient approach to generate a training set for it.Based on the idea of a physically-motivated and interpretable alchemical contraction of the feature space 54 , the model demonstrated very promising accuracy, robustness and transferability in terms of various bulk HEA simulation scenarios, which is even more remarkable in light of the difficulty in modeling even pure transition metals using MLIPs 55 .However, catalysis research requires the ability to model surfaces and defective structures, which is beyond the capabilities of a model that is trained exclusively on distorted fcc and bcc bulk configurations.Here we extend the scope of the model to study structure and stability of the surfaces of HEAs, with a particular focus on the problem of surface segregation, which is of great importance for catalysis.
We demonstrate that by extending the bulk training set with a small number (less than 20%) of surface and molten structures, it is possible to achieve a similar level of accuracy also for these defective configurations.We trace the transferability of the model to the feature space contraction matrix, which is optimized based on the bulk structures and reduces the chemical complexity of the training problem, as well as the data requirements.We then perform a simulation of the segregation process in a Cantor-style alloy with an equimolar content of all the 25 transition metals included in the training set to evaluate the segregation propensity of different elements.Finally, we study the segregation process in two selected HEAs.

II. THEORY AND COMPUTATIONAL DETAILS
The details of the reference DFT calculations and the structure of the ML model closely follow those used in a recent paper focused on bulk structures 53 , which allows us to perform an insightful comparative study.Here we provide a brief summary of these details and focus on the strategy we adopt to extend the training set to include disordered and inhomogeneous structures.

A. First-principles calculations
The training data on energies and interatomic forces of the structures was obtained from density functional theory (DFT) simulations performed with the Vienna Ab initio Simulation Package (VASP) 56 .The convergence criteria for the electronic selfconsistent cycle was 10 −6 eV, while the cutoff energy for the plane-wave basis set was 550 eV.We used a Γ-centered Monkhorst-Pack 57 scheme with a resolution of 0.04 π Å−1 for sampling the first Brillouin zone.
The behavior of the core electrons and their interaction with the valence electrons was described with projector-augmented wave (PAW) pseudopotentials 58 .Exchange-correlation effects of the electrons were taken into account within the Generalized Gradient Approximation (PBEsol functional 59 ).The vacuum size for the surface slab calculations was set to 20 Å.Following the same reasoning as in Ref. 53, we performed the calculations without spin polarization.Even though it is possible to describe the magnetism within the spin density functional theory (SDFT) approximation (and even though this framework is often used by ML models that incorporate magnetic information), when dealing with such a broad set of transition-metal compounds it is likely that some element combinations require different approaches, such as DFT with Hubbard U and J corrections (DFT+U+J), and dynamical mean-field theory (DMFT) 60,61 .The use of different methods would introduce inconsistencies in the training data, and may also require the adjustment of external parameters such as Hubbard U and J for each individual structure.Hence, non-spin-polarized calculations remain the most consistent choice for this work, even though it limits the accuracy of the model for metals and alloys with strong magnetic behavior.We further discuss the errors associated with the neglect of magnetism in Sections III D and V, and in the Supplementary Information.

B. Machine-learning model
For the sake of consistency between our previous work on bulk HEAs and our current work, we kept the model architecture unchanged.Here, we summarize the main ideas underlying the model, while for all the details, the reader is referred to the Ref. 53.
The model is based on a combination of ridge regressions built upon a chemical composition, two-and three-body correlation features, and a fully-connected neural network (NN) based on three-body features.We use the alchemical compression method introduced in Ref. 54 to prevent the steep increase of model complexity due to the large size of the chemical space.The model connects a representation of the atomic environment A i of the atom i with its contribution to the potential energy of the system V (A i ): (1) Here, the first term V (aeb) (A i ) = w (aeb) ai correspond to an atomic baseline that depends only on the chemical identity a i of each atom.The second term corresponds to a pair potential, that is trained in the full 25 × 25-dimensional space of two-element correlations.The three-body potential instead, is based on descriptors that are contracted from the original, 25-elements chemical space (a) to a much compressed pseudo-element space (b), The coefficients u ba are collected in the alchemical coupling matrix u, that determine the compression from the full to the reduced chemical space.For consistency, we compress the chemical space to 4 pseudoelements, see Ref. 53 for a discussion of convergence.The bra-ket notation represents in a concise manner the (al)chemical dimensions, as well as the indices of the radial basis (n, n ′ ) and the angular momentum channel (l) that discretize the expansion of the neighbor-density correlations.A final NN term is designed to capture the non-linear, many-body components of the target potential: For each environment, a multi-layer perceptron F is applied to the set of compressed 3-body descriptors.
Although, as discussed above, we chose to build a non-magnetic model, one can see quite easily how this framework could be modified to incorporate some recent ideas to describe spin in a ML framework.The simplest approach is to simply train the model on spinpolarized data with a fully relaxed spin subsystem.This approach provides an accurate description of a systems with a clear-cut magnetic ground state (such as the ferromagnetic bcc iron 62 ), but is problematic when there are multiple near-degenerate spin states, e.g.near or above the Curie temperature.Alternatively, one needs to modify the architecture to explicitly describe the magnetic state of the system, so that one can train on different magnetization states, and simulate atomic and spin dynamics simultaneously and thus provides a more accurate, and principled description of the system.This can be achieved by discretizing the spin states and treating different states as separate species 63 , or by associating scalar descriptors with the magnetic state of the atoms 64 .Both of these approaches are easily translated into the alchemicallycompressed architecture.A non-collinear, vectorial treatment of the atomic magnetic moments is also possible 65,66 , which would require more substantial modifications.As discussed above, in our opinion the challenge in including a consistent physical description of magnetism for such a wide chemical space is comparable -if not harder -than the implementation of a suitable ML architecture.

C. Training dataset construction
The starting point for this study is the dataset of Ref. 53, which contains approximately 25000 bulk configurations, each containing between 3 and 25 transition metals (the entire d block except for Tc, Re, Os, Cd, Hg) arranged as a regular, or randomly distorted, fcc or bcc lattice.We refer to this original dataset as "subset O".Although the resulting model was remarkably stable, and it was possible to perform molecular dynamics simulations of mixtures with 25 elements at temperatures well above the melting point, it would be unwise to use it for surface effects.To obtain a model that is capable of a reliable description of the surface segregation process in HEAs with up to 25 d -block elements, we need an extended data set, which we build iteratively by including three additional groups of structures.
The first part of our extended dataset (labeled as "A") follows the same philosophy as the bulk data, but aims to capture surface energetics.We generate a few subsets of on-lattice surface slab structures based on bcc and fcc lattices and Miller indices of (100), (110) and (111).The unit cell of each structure contains 45 atoms, obtained by replicating the appropriatelyoriented primitive cell of the surface according to a 3 × 3 × 5 geometry.The atomic composition is randomly assigned by sampling from a random subset In order to increase the structural diversity, we use replica-exchange molecular dynamics with atom swaps (REMD/MC) for structure generation and FPS for selection.These simulations use a preliminary version of the surface-enabled potential trained on a combination of subset A and the original dataset O.We perform high-temperature (2000 K -4000 K) REMD/MC of 100 FPS-selected HEA bulk systems and intermediate-temperature (300K -1500 K) REMD/MC of 100 FPS-selected HEA surface slabs.From the resulting trajectories, an additional round of FPS selection yields a collection of 1000 molten bulk structures (subset B) and 1000 thermally equilibrated surface slab structures (subset C).
Finally, using a preliminary potential trained on subsets O, A, B, and C, we generate a Cantorstyle alloy surface slab containing all 25 elements in roughly equimolar proportions, perform intermediatetemperature REMD/MC sampling, and select by FPS a set of 500 structures (subset D) that mimics closely the setup we use to investigate the segregation propensity of different elements.All datasets used in this paper are summarized in Table I

III. TRANSFERABILITY OF ALCHEMICAL LEARNING
The modeling scenario we investigate here -the extension of an existing ML model to a substantially different type of configurations -is common in practical applications to chemical and materials modeling.The simplicity and interpretability of the architecture of the HEA25-4-NN model allows us to gain some generally applicable insights into successful strategies.For this reason, as well as because of the excellent performance of HEA25-4-NN in out-of-sample, extrapolative predictions, we chose to use exactly the same model architecture, despite the fact that recent results have shown that deep-learning models have the potential of improving substantially the in-distribution accuracy for the bulk HEA dataset 68 .
The extensive set of experiments in Ref. 53 shows that the HEA25-4-NN model is able to perform both low-temperature and high-temperature MD without actually being trained on MD trajectories, and it even achieves respectable accuracy when making predictions for some elements that are not represented in the training set, thanks to the interpretable form of the alchemical contraction matrix. 53Despite the overall stability, the accuracy for molten configurations or surface structures is much worse than for in-sample predictions, and so it makes sense to expand the training dataset to include more structural diversity.An important question here, with a clear relevance for the broader goal of obtaining potentials that are generally applicable over a large swath of chemical space, is whether compositional and structural degrees of freedom must be sampled independently.In this particular situation, after having satisfactorily learned the behavior of 25 elements in the bulk phase, can we learn their behavior at a surface with a small number of additional training structures or do we need a similar amount of data for each type of chemical environment?

A. Learning curves
The predictions of the HEA25-4-NN model on the new structural subsets are up to 20 times less accurate than the in-sample predictions.While this might appear a very large degradation in performance, one has to consider that these structures are entirely different from the bulk structures included in dataset O, and indeed the errors on the high-temperature bulk structures are much smaller.In Figure 1 we plot the accuracy of a series of models trained by progressively including data from each of the subsets A, B, C. When interpreting these curves, one should keep in mind that (1) the errors are computed separately on holdout sets containing 10% of the structures from all the available datasets, namely O, A, B, C, D; (2) the alchemical weights matrix was taken from the HEA25-4-NN model and kept fixed during the training; (3) the models are trained incrementally, i.e. we start from the model weights of HEA25-4-NN and we further optimize the models after adding a new chunk of structures and after incorporating a new subset; (4) the train set is similarly extended in an incremental fashion when going left to right in the plot; (5) we always include 10000 randomly-selected structures from dataset O to avoid model drift.We also include the results of the HEA25-4-NN, the model trained on the full OABC dataset with optimized alchemical weights matrix (see the details in Sec.III B), together with results of the final HEA25S-4-NN model (see the details in the Sec.III C) in Figure 1 to explicitly demonstrate the overall training dynamics.
As the amount of training data increases, the accuracy of the model on this new data approaches the accuracy on the O subset.It is clear that the model can quickly adapt to new types of structures: adding the "ideal" surfaces from subset A improves the error by almost tenfold error not only on subset A, but also on the MD surface structures in subsets C and D; adding the bulk structures in subset B improves the accuracy on subset B, but also on the slab MD subsets C and D. The accuracy on dataset O degrades slightly initially, but the final model is only marginally worse than the one fitted exclusively on O-like structures.This indicates that, at least when including a substantial part of the original data in the new training, the model is flexible enough to incorporate new structures without substantially affecting the in-sample accuracy.We note however that learning curves exhibit a small slope which indicates that a much larger amounts of data would be needed to improve further the accuracy.Overall, this suggests a nuanced answer to our general question above.The trained model requires only a few structures to capture the gist of completely new types of configurations, and it can use the correlations from one set of structures to accelerate learning on new ones (for reference, the accuracy of the HEA25-4-NN model trained on 200 and 1000 O structures is approximately 150 and 30 meV/atom MAE 53 ).However, reaching uniform accuracy across completely different structural motifs may require almost-uniform train-set sampling.
We also see that keeping fixed alchemical weights has a minimal impact on model accuracy, an observation we elaborate on in Sec.III B. From this we can conclude that all the general information about the interatomic interaction is already contained in the original dataset O, and one only needs to slightly adjust the model weights by adding a small amount of new data during training to reach semi-quantitative accuracy.The total number of structures in datasets A,B,C is less than 20% of the 24'000 bulk structures that were used in Ref. 53 to train the parent modeland the learning curves indicate that an even smaller number may suffice with a limited impact on the accuracy.The relatively small change in model accuracy when adding the dataset D (which effectively corresponds to another iteration of a rudimentary activelearning protocol) is further indication that, within the level of accuracy we seek, the alchemical model can effectively combine information on different structure types, capturing the relevant information on surface effects from the dataset A and the defect properties of high-temperature molecular dynamics from the bulk dataset B to finally transfer this combined knowledge to the molecular dynamics of surface slabs.Test MAE on energies, meV/at.
Optimized weights Figure 1: Learning curves for the validation error computed separately on the various datasets considered in this work.The errors are calculated on a hold-out test set containing 10% of the structures.The labels for the datasets correspond to those in Table I, and visualizations for each structure type are provided for convenience.For each curve, 10000 structures from the dataset O were included in the training set and kept fixed.Each section on the x -axis corresponds to the addition of a fraction of the structures for each subset (x A for the subset A, and so on), while the preceding datasets are included in full.Individual data points marked with crosses correspond to the hold-out errors obtained by direct application of the HEA25-4-NN model from Ref. 53, for a the model trained on the OABC dataset with optimized alchemical weights matrix, and the final HEA25S-4-NN model trained on the full dataset.The inset shows a parity plot of the alchemical coupling matrix u values in the case of model training with fixed u values (x -axis) and optimizable u values (y-axis).

B. Transferability of the alchemical weights matrix
The alchemical coupling matrix u is used to contract the power spectrum features from the original chemical space of 25 elements to a reduced-size alchemical space.In Ref. 53 it was shown that the weights are relatively insensitive to the details of the fit, and that can be interpreted in terms of the layout of the elements in the periodic table .Here we can investigate further the generality of the contraction, by assessing how much it depends on the compositions of the training dataset.To study this, we start from the pre-trained HEA25-4-NN model and consider the dataset that combines the full O, A, B, C subsets, and perform two training exercises -one with fixed alchemical weights and one in which the coupling coefficients are optimizable parameters.The inset in Figure 1 shows that the entries of the coupling matrix remain almost unchanged upon optimization, which indicates that the values obtained from bulk structures are also compatible with the similarity in behavior of elements at surfaces, reinforcing the notion that u contains valuable, interpretable and transferable information about the chemical relations of the elements across the periodic table.As one may expect, the small difference in alchemical weights results in negligible changes of the model predictions compared to the case of fixed alchemical weights.This is evident from the minute changes in test-set errors in Figure 1, where the fixed-weights model is the last point in the "OAB+C" learning curve, and "OABC, opt.weights" indicates the model trained on the same dataset with optimizable alchemical coefficients.In fact, even the training curves of the two models are almost indistinguishable, as discussed in the Supplementary Materials.

C. Final model and dataset
Datasets O -C consistently sample the configuration space for simulations involving both bulk and surface-slab HEA configurations.Subset D serves mainly as a demonstration of convergence with respect to structural diversity.However, given that the application focus of this work is the study of surface propensity of different transition metals, we decided to train a final model that also includes these validation structures, allowing the weights of the alchemical coupling matrix to be optimized during the training to obtain the best possible accuracy with the current architecture.We refer to this combined dataset as the HEA25S dataset, and to this final model as the HEA25S-4-NN potential.Its accuracy on the holdout test sets, computed separately for each subset, is presented as the last points of Figure 1.Inclusion of this last batch of structures leads to a slight improvement of the performance on all test subsets.The error of the model in predicting the energies of the HEA bulk structures, about 10 meV/atom, is effectively unchanged compared to our previous study 53 .Errors on dataset D are comparable, around 9 meV/atom, and those for subsets A-C are only slightly worse: from 13 to 16 meV/atom.We make the final dataset and the model available as part of the data record that accompanies this publication 69   surface structures and that of a fully random, on-lattice configuration.Cleaving energies are computed as the difference between the energies of a bulk and a surface slab configuration, considering separately the cases of a fully-random, MC-relaxed or fully-relaxed configuration.

D. Model validation on structure relaxation and surface processes
In order to verify more quantitatively that the HEA25S-4-NN model can describe chemical and structural realaxation and surface effects, we generate a small set of more targeted test structures.We construct (a) five 2 × 5 × 10 fcc bulk structures of 100 atoms with 25-elements equimolar composition.Atoms are fixed to the ideal lattice positions, and the molar volume adjusted following the same logic used for the main dataset.Starting from these five structures, we use the HEA25S-4-NN model to generate: (b) bulk structures with the atomic ordering relaxed by Monte Carlo atom swaps at T = 300 K, keeping a rigid lattice; (c) fully-relaxed bulk structures using combined REMD/MC sampling, followed by geometry optimization starting from uncorrelated lowtemperature snapshots; (d) (111) slabs of the same size obtained by rigidly cleaving the random structures; (e) MC-relaxed (111) slabs ; (f) fully-relaxed (111) slabs.For each configuration, we recompute the DFT energy, and the overall energy MAE for this set is 25.7 meV/atom -with the largest errors being associated with on-lattice MC structures (see SI).
We then compute quantities that have a more direct bearing on relevant physical processes: Fig. 2(top) shows a parity plot for "relaxation energies" computed as quantify the enthalpic gain from short-range order, surface segregation and/or geometry relaxation.The subscript in these expressions corresponds to a specific subset from the list above.The ML model predicts these terms with an accuracy of 50.3, 12.5, 31.7, and 18.5 meV/atom respectively, which is less than 5% of the typical relaxation energy on average.We note that the typical error on MC relaxation energies prediction is significantly higher than in the case of REMD/MC relaxation.The main reason for such a difference is lack of the MC-relaxed structures in the training set.Therefore, the evaluation of the model on MC relaxation may be considered as another test for the model's generalization capabilities.Fig. 2(bottom) shows a parity plot for the "cleaving energies" While these should not be interpreted to correspond exactly to a surface energy (averaging over multiple configurations, as well as incorporating entropic effects, would be necessary) they gauge the accuracy of the model for enthalpic terms that are directly associated with the creation of a surface.The accuracy (2.3, 14.4, 4.4 meV/ Å2 ) is a fraction of less than 10% the magnitude of the cleaving energy, except for the case of the on-lattice MC relaxed structures, that represent an extrapolative regime.In the SI we also compare relaxation and cleaving energies computed with and without spin polarization.The errors are comparable to or smaller than the errors associated with the ML model, and small on the scale of the typical relaxation or cleaving energies, indicating that neglecting magnetic effects is an acceptable -if harsh -approximation when study-ing finite-temperature and surface effects on the structure and stability of HEAs.

IV. SURFACE SEGREGATION PROPENSITY IN A CANTOR-STYLE ALLOY
In heterogeneous catalysis, the composition and structure of the surface are among the most important factors determining the activity of a material.In this respect, HEAs are interesting because they often exhibit differential segregation of their components at the surface [70][71][72][73][74] , which can be used to control their properties.For example, this effect can be exploited to use small quantities of a rare-earth element, and manipulate the alloy chemistry so that it accumulates at the surface, where it can enhance the catalytical activity. 5,6,31,75,76.Even though the actual segregation propensity will depend on the specific composition, as well as on processing conditions that might affect thermodynamics and kinetics, it is useful to establish an "absolute" scale for the surface affinity of each of the 25 transition metals we consider in this study.To do so, we perform simulations similar to those performed in Ref. 53 for a bulk system.We consider a surface slab of a Cantor-style alloy, containing all 25 elements in equimolar proportions, randomly assigned to the atomic sites of an 864-atom slab with an initial fcc lattice, oriented to expose a (100) surface.We then perform four independent REMD/MC runs, combining replica-exchange molecular dynamics (we use 36 replicas with temperatures distributed from 300 K to 1500 K on a logarithmic scale) with Monte Carlo swaps between atom types.Moleculardynamics details are the same as in Ref. 53 with a time step of 2 fs and an efficient Langevin thermostat.The cell is left free to fluctuate along the inplane directions, with a small applied pressure of 1 bar.The system shows glassy behavior, with rapid disruption of the fcc lattice, and logarithmic relaxation of the potential energy (Fig. 3).This loss of crystallinity is not due to the presence of a surface, but a consequence of the presence of a wide variety of species with very different atomic radii, and was also observed in bulk simulations 53 .The simulation is not fully equilibrated even after 500 ps and several hundred thousands attempted atom swaps.We observe however that the four trajectories behave in a similar way, and that structure and composition profiles remain roughly constant after a few hundred ps.Thus, we average over the last 100 ps of each of the four independent runs to provide a qualitative estimate of the surface affinity.
We compute and analyze the concentration profiles of different elements along the normal direction to the surface plane.Fig. 4 showcases a few representative cases, together with a snapshot of a longitudinal cut of the simulation slab.Concentration profiles for all elements and different temperatures are given in the SI.There are clear-cut differences in the surface segregation propensities, with elements such as Ag accumu- lating at the surface, others such as Mn being strongly depleted, and some such as Ni displaying an intermediate behavior.The concentration profiles show pronounced differences in the peak shapes and positions, which suggests a tendency for some elements to accumulate in the sub-surface layer and to form ordered surface alloys.However, the disordered structure, the slow relaxation and the limited accuracy of the potential make it difficult to provide definitive statements on the precise relaxed geometry of the structure that is formed at the surface.Instead, we compute an integrated, macroscopic indicator of segregation propensity in terms of the Gibbs surface excess Γ a , that is defined for each element a as where N a and N B a correspond to the total number of atoms in the slab, and the number of atoms inside the bulk region of the surface slab.N and N B represent the total numbers of sites in the simulation and inside the bulk region, respectively, and S is the surface area.We define the bulk region of a surface slab as a 10 Åthick region around the center of the slab (shown as a shaded region in Fig. 4).Values of Γ a > 0 indicate the tendency of a given element to accumulate at the surface.If Γ a ≃ 0 , then the element does not have a pronounced affinity, and will have roughly the same concentration in the bulk and at the surface.Finally, if Γ a < 0, the element will tend to stay in the bulk region, and be depleted at the surface.It is important to stress that the definition ( 6) is insensitive to the thickness of the bulk region, which is only used to estimate the concentration of each element "far" from the surface.It avoids specifying explicitly which layers are considered as part of the surface, and it provides an overall measure of the surface affinity that can be readily connected to macroscopic quantities, such as the surface energy 77 .
Figure 5a shows the computed surface excesses for the 25 elements at three representative temperatures.Differences are very pronounced, with elements at the edges of the d block showing surface affinity, and most of the elements in the central part of the d block being depleted at the surface.The temperature dependency of the surface excess is usually weak, except for a few cases such as V or Nb, where it changes by more than 50% between 300 K and 1500 K.There is a remarkable correlation between the surface excess of the different elements and their position on the 2D map constructed in Ref. 53 based on their bulk behavior (Figure 5b).This is perhaps unsurprising, given that the mutual affinity in the bulk and the affinity for the surface are both loosely related to the binding energy between atoms.It is also worth noting that the periodic trends observed for the surface excess, as well as the ordering derived from bulk shortrange-order calculations, reflect the trends in atomic radii across the transition metals, with larger radii at the beginning and the end of each period.This is consistent with the fact that transition metals sur- faces are characterized by an accumulation of tensile stress 78,79 , that is alleviated by the segregation of elements with a large radius.Overall, this observation reinforces the notion that the data-driven optimization of alchemical weights can (re)discover physical trends across the periodic table, 54 and that similarity in behavior in REMD/MC simulations driven by the HEA25S-4-NN model can be used as a guide to performing element substitutions when designing novel alloy compositions.

V. SURFACE SEGREGATION IN SELECTED ALLOYS
The surface segregation study in the previous section is intended to provide insight into the overall surface propensity of transition metals commonly found in HEAs.However, it cannot substitute an explicit study of specific compositions, nor does it provide validation for our computational framework.Ideally, one could assess the reliability of our framework by directly comparing the equilibrium surface composition for specific HEA compositions with the surface distribution of elements observed in experiments.Nevertheless, there are some key concepts that need to be considered when evaluating our results.First, the model is trained on DFT data and thus the accuracy is limited by the accuracy of the DFT reference -which notably, for our model, neglects effects due to magnetism.Second, our sampling protocol allows to achieve highly ergodic sampling, whereas kinetic trapping plays a key role in (meta)stabilizing HEAs.Third, we consider the HEA surface slabs under vacuum conditions, at variance with experimental setups where a significant amount of oxygen is often present.In fact, the presence of a chemically-active environment can dramatically affect the surface concentration of different species, due to the formation of oxides on the surface (see Ref. 70) or the leaching of some of the elements in solution.In this section, we first validate the ML potential against DFT calculations for the prototypical Cantor alloy CoCrFeMnNi and comment on the suitability of "static" approaches to evaluate segregation patterns, which are commonly used in combination with first-principles calculations 70 .Then, we perform simulations for an IrFeCoNiCu alloy and comment on the comparison with experimental data on the surface composition from Ref. 80.

A. CoCrFeMnNi alloy
We begin by ensuring that the accuracy of the HEA25S-4-NN model is sufficient to reproduce the energetics of surface segregation relative to reference DFT calculations.To do so, we apply a simplified version of the protocol used in Ref. 70: we generate multiple slabs using the ideal lattice, an equimolar composition and random element distribution, and compute the difference in enthalpy when all the surface sites are filled with a specific atom type X=Cr, Mn, Fe, Co, Ni, keeping the overall composition fixed: H segr = H X,surf − H rand .This quantity is averaged over 16 different random realizations.
As shown in Fig. 6, there is excellent agreement between the values of segregation enthalpy obtained using the ML potential and those obtained with explicit DFT calculations.Results also agree qualitatively with Ref 70, with Ni being the element with the highest surface propensity (large negative H segr ) and Cr that with the least propensity (large positive H segr ).Mn and Fe have a small positive H segr , while Co has a negative segregation enthalpy (i.e.slightly favors surface sites), at variance with Ref. 70, in which it was found to have a (very small) positive G segr .The quantitative discrepancy can be attributed to the difference in DFT details and to the fact that Ref. 70 includes entropic terms and computes the segregation energy at constant bulk composition, which requires estimating the chemical potential of the elements.We chose to ignore these terms to obtain a more transparent validation of the ML model.We discuss further the details of the calculation of H segr in the Supplementary Information.
Having access to a fully-flexible and inexpensive ML model, we can then verify whether estimates based on static calculations, and on the overly simplified  picture of the formation of a pure surface monolayer on top of a random alloy, provide reliable indications of actual surface propensity.To do this, we prepare a surface slab with a fcc lattice, (111) orientation, and a larger 7 × 7 × 11 supercell containing 539 atoms.We perform a REMD/MC run of this surface slab for 200 ps in the NPT ensemble using a 2 fs timestep, following the same workflow we applied to the 25-elements Cantor alloy.As shown in the SI, the resulting REMD/MC trajectories reach equilibrium very quickly, which allows performing a single run, analyzing the surface composition by averaging over the final 100 ps of the trajectory.Even the replicas at the highest temperature retain an ordered fcc structure, consistent with the fact that the simulations are performed below the melting point.To complement this analysis, we show in the SI the values of H segr computed with spin-polarized DFT, and compare them with the mean enthalpy of structures that are relaxed with MC sampling on a rigid lattice at 300 K using the (non-polarized) HEA25S-4-NN model, and with combined REMD/MC sampling, also at 300 K. Neglecting finite-temperature sampling of short-range order and structural relaxation is an approximation that is at least as severe as neglecting effects due to magnetism.Compositional and structural relaxation stabilize greatly (up to several hundred meV/atom) the slab, in comparison to the static calculations in Ref. 70.This is also true when considering spin-polarized DFT calculations, even though the relaxation procedure neglects magnetic effects.In other terms, performing a full relaxation using a nonmagnetic ML model leads to a better prediction of surface segregation incorporating magnetism a posteriori than using spin-polarized DFT calculations limited to highly-idealized static structures.ML potentials that are capable of describing atomic spins 63,64,66 , in combination with thorough thermodynamic sampling, would obviously provide an even better approximation -even though extending spin-aware models to a similarly broad chemical space raises conceptual issues on the most appropriate physical description of magnetism, and even though finite-temperature effects are likely to reduce the importance of spin correlations for HEAs, that usually have low Curie temperatures 81 .
Analyzing more carefully the surface structure for ML-driven REMD/MC calculations, we observe that the concentration profiles of the elements at 300 K (see Figure 7a) show a very strong nickel enrichment in the first surface layer and depletion for all other elements.The second layer contains a significant excess of chromium, while iron, manganese and cobalt are more abundant from the third layer and deeper into the bulk.These observation are reflected in the Gibbs surface excess, computed according to Eq. ( 6) (see Figure 7b).There is a large surface excess for Ni and strong depletion of Co, which becomes less pronounced at the highest temperature.Mn and Fe have weak depletion, and Cr has almost zero overall surface activity -even though, as evidenced by Fig. 7b, this is the net result of first-layer depletion and secondlayer accumulation.When comparing these observations with the statically-determined H segr , one sees that the strong surface affinity of Ni is consistent between the two approaches and between spin-polarized and non-polarized calculations, as is the fact that Mn and Fe have better affinity for the bulk.The large positive H segr of Cr does not translate in strong depletion, which we attribute both to the fact that Cr appears to find a favorable environment in the subsurface layer, and to the fact that Cr exhibits strong local ordering in the bulk (as observed in Ref. 53), which makes a completely random alloy a poor reference state for the calculation of H segr .Similarly, our static DFT calculations show negative H segr for Co (indicating surface propensity) but finite-temperature sampling shows a negative surface excess Γ Co (indicating surface depletion).These results demonstrate the necessity of performing simulations that allow sampling local bulk relaxation and the formation of nontrivial surface orderings, as well as how drastically the behavior of the elements may differ from static, highly-idealized reference calculation.

B. IrFeCoNiCu alloy
Even though obtaining a realistic description of local ordering is necessary to extract reliable estimates of thermodynamical properties, experimental conditions often require a description of kinetic effects and of the presence of a reactive environment at the surface.To investigate these effects, we consider the case of an equimolar IrFeCoNiCu HEA, for which a detailed experimental characterization of the surface composition has been recently described. 80We first compute the Gibbs surface excess following the same protocol we used for the CoCrFeMnNi alloy.In this case, simulations show high segregation propensity of Cu and Ni (Fig. 8).Cu atoms are predominantly found in the first surface layer, while Ni atoms are mostly found in the second layer (Fig. 8).The remaining elements have negative surface excess and segregate to the bulk, with iridium having the smallest absolute value of Γ.We also observe that increasing temperature does not change the relative order of segregation propensity of the elements, but is associated with a more even distribution of the elements across the cell at high temperatures.Similar to the case of CoCrFeMnNi, spin-polarized DFT calculations confirm the qualitative findings, with both on-lattice and fully-relaxed structure being much more stable than the random reference even when re-computed including magnetic effects.Even though the quantitative value of the segregation enthalpy is considerably smaller, there is no indication of a major discrepancy that might alter fundamentally the trends we observe with the non-magnetic ML potential.
The comparison of these results with the experimental data in Ref. 80, shown in the inset, requires a brief description of the experimental setup.First, the IrFeCoNiCu nanoparticles synthesized in that work were treated with a 0.1 M solution of perchloric acid (HClO 4 ) at room temperature under an applied voltage, which essentially washed away the outer shell of the nanoparticle.Scanning transmission electron microscopy with energy dispersive spectroscopy imaging reveals an iridium-rich region at the surface.As evidenced by the high concentration of all elements but iridium found in the solution, this process is driven by the reactivity of the metals in the aggressive chemical environment, rather than by an intrinsic tendency of Ir to accumulate at the surface.
Our simulations suggest that in the absence of an electrochemical treatment, thermodynamic drive would favor the formation of a Cu/Ni thin layer at the surface.This suggests a broader investigation of the stability of the IrFeCoNiCu alloy, which we tackled by performing a REMD/MC simulation of a 5x5x5 bulk fcc crystal supercell (500 atoms in total), over the same temperature range (300 K -1500 K) used for the slab, and at a constant pressure of 1 bar.Our results (Fig. 9) suggest that the alloy is thermodynamically unstable, leading to precipitation of Cu and Ni even in the bulk phase.Therefore, we expect the homogeneous IrFeCoNiCu nanoparticles from Ref. 80 to be only metastable.The sluggish diffusion that kinetically stabilizes the quasi-random alloy may also affect the surface stability, explaining why the surface remains relatively homogeneous in the absence of a chemical treatment.However, the instability of the bulk alloy and thermodynamic drive of Cu and Ni to segregate at the surface may become a serious problem in catalysis applications, by reducing the surface concentration of Ir over time, especially if operating temperatures are reasonably high and if the environment does not ensure continued leaching of the reactive metals.
It would be desirable to have an alloy that is both thermodynamically stable in bulk form and that shows an intrinsic drive towards surface accumulation of Ir.While chemical treatment might still be useful to accelerate the formation of the Ir layer, such alloy would likely be easier to manufacture and more stable in practical applications.Based on the observation that the Ir-Fe-Co phase remains stable in our REMD/MC simulation and the relatively higher surface propensity of Ir over Fe and Co, we investigated the ternary IrFeCo system, which was recently   patented as a promising candidate for applications in catalysis 82 .We further verify the bulk stability of the IrFeCo by performing exactly the same REMD/MC simulation as for the 5-element composition.In this case, we observe that the relaxed structure remains uniformly mixed even after convergence of the trajectories (see the SI).Given that the highest temperature in the REMD/MC run is 1500 K, we expect the alloy to preserve its phase stability also at high temperature.Furthermore, we also study surface segregation with a setup analogous to that used for the other systems: we create a 7x7x11 fcc surface slab and perform a REMD/MC simulation in this system over the same range of temperatures from 300 K to 1500 K and at constant pressure of 1 bar.We then calculate the surface excess using Eq. ( 6) after averaging the composition of the bulk region over last 100 ps of the trajectory.The results are presented in Figure 10.Ir tends to accumulate at the first surface layer, while Co is abundant in both the first and second layer.As a result of the balance between the propensity of the two elements for the surface and sub-surface layer, the overall value of Γ Co shows a very pronounced temperature dependence.At 300 K Co is the most surfaceactive, followed closely by Ir, while at higher temperature Ir remains abundant in the first layer, while the concentration of Co becomes more uniform, reducing its overall excess to almost zero.The bulk phase stability and the positive Γ Ir at all temperatures make the IrFeCo system potentially interesting for applications, allowing one to create nanoparticle with a thermodynamically-stable Ir surface layer.

VI. CONCLUSIONS
We extend a recently-developed machine-learning potential trained on the DFT energetics of arbitrarily complicated alloys of 25 d -block metals to include surfaces and defective structures, allowing us to study equilibrium surface segregation of the various components.By investigating in detail the effect of incorporating new structures and re-optimizing a pretrained model, we infer some general guidelines for the broadly-relevant goal of training models for chemically and structurally diverse problems.First, it appears that the "alchemical embeddings" we use to reduce the complexity of learning across a large sector of the periodic table are transferable between different types of structural environments.The weights that describe how individual elements are projected on a smaller-dimensional set of pseudo-elements change only minimally if we re-optimize them on the extended data set, and the impact on the model accuracy is negligible.We also observe that very few structures need to be added to improve the accuracy for a new class of configurations: whereas the bulk-only model reached saturation with approximately 10000 structures, a few 100s of surface and molten configurations are sufficient to improve the validation error by almost an order of magnitude.On the other hand, the error saturates at a semi-quantitative level of accuracy, with errors between 10-20 meV/atom that are too large to resolve the fine details of the energetics of intermetallic phases.It appears that more flexible models would be necessary to reach quantitative levels of accuracy 68 , although this may come at the price of reduced extrapolative power, and an increase in the data requirements.
The new model, which we name HEA25S-4-NN, is then used to thoroughly sample configurations and element distributions for a large slab of a 25-elements Cantor-style alloy.Based on the quasi-stationary state of these simulations (that never fully achieve equilibrium due to the glassy nature of the system) we compute the Gibbs surface excess of all 25 elements.This quantity, that can be interpreted as a measure of their surface propensity, shows clear periodic trends, that are consistent with the known surface stress effects that would favor the presence at the surface of atoms with a large metallic radius, and correlates very strongly with a measure of chemical similarity that was derived in Ref. 53 based exclusively on bulk short range ordering information.Thus, one can use the bulk-derived similarities as a guide for alloy design also in terms of the surface activity of the various components.
We then present two examples of the application of the HEA25S-4-NN model to the study of surface segregation of equimolar quinary alloys.We perform a direct validation of the accuracy of the model against a DFT assessment of the surface segregation enthalpy of different species in CoCrFeMnNi, finding errors that are well below 10% of H segr .Furthermore, we demonstrate that the kind of static, idealized calculations that are usually employed to investigate surface propensity with explicit electronic-structure calculations can lead to qualitative errors, such as estimating a strong tendency towards surface depletion of Cr, whereas explicit REMD/MC sampling shows a tendency for Cr to accumulate in the sub-surface layer.The errors associated with treating the HEAs as random and/or rigid lattices are comparable (and usually larger) than those associated with the neglect of magnetism, which is one of the most prominent physical limitations of our ML model.
At the same time, one has to acknowledge that simulations that aim to compute the equilibrium thermodynamics of a solidvacuum interface cannot directly describe more complicated experimental or in operando conditions.Our simulations of a IrFeCoNiCu suggest that the bulk alloy is only metastable, and tends to phase separate into a NiCu and an IrFeCo phases.The instability is also reflected in the strong tendency of Cu and Ni to form a surface bilayer.The accumulation of Ir at the surface of seemingly stable quinary alloy nanoparticles that has been recently observed in experiments 80 can only be understood in terms of the slow diffusion kinetics in the bulk phase, and the leakage of chemically active elements in the aggressive environment used to treat the particles.Based on our simulations, we propose that the ternary IrFeCo alloy may be a more stable composition, that also exhibits a thermodynamic drive to form a stable Ir surface layer at all temperatures.
This study demonstrates the ease by which ML potentials for chemically-diverse datasets can be extended to include new types of structures, and used to capture the qualitative behavior of complex materials in finite-temperature conditions.Even though ad hoc models trained for more restricted sets of compositions and thermodynamic conditions, and the incorporation of magnetic effects at least at the level of spin-polarized DFT, are still needed to achieve errors of a few meV/atom, the semi-quantitative accuracy we achieve is sufficient to gather insights on subtle phenomena such as surface segregation, to study in a detailed way the structure and energetics of specific alloy compositions, and to propose new alloy compositions for applications to heterogeneous catalysis.

DATA AVAILABILITY
All data and code used to train the HEA25S-4-NN model, as well as the fitted parameters and code to run the simulations discussed in this work is available in the Supporting Materials or from publicly-accessible repositories.

Figure 2 :
Figure 2: Parity plots for relaxation (top) and cleaving (bottom) energies computed for 5 different realizations of a 25-element equimolar Cantor-style alloy structure.The symbols identify 5 different random realizations, and colors represent different simulation setups.MAE values in the legend providean overall estimate of the error for each case.Relaxation energies are computed as the difference between the energy of an on-lattice, Monte-Carlo (MC) relaxed configuration, or fully-relaxed (REMD/MC) configurations of bulk and (111) surface structures and that of a fully random, on-lattice configuration.Cleaving energies are computed as the difference between the energies of a bulk and a surface slab configuration, considering separately the cases of a fully-random, MC-relaxed or fully-relaxed configuration.

Figure 3 :Figure 4 :
Figure 3: Typical trajectories of the potential energy for a 36-replica REMD/MC run of the 864-atom surface slab.Each colored line corresponds to one of the replicas, and the target temperature changes as Monte Carlo swaps occur.The full and dashed black lines are obtained by concatenating the segments that correspond to the lowest and highest simulation temperature.Note the logarithmic x axis.

Figure 5 :
Figure 5: Values of the surface excess Γ a at different temperatures for each of 25 elements from the Cantor-style alloy REMD/MC simulation (top).The element similarity map from our previous work 53 , based on simulation of bulk Cantor-style alloy of the same composition at 1253 K and color-coded according to the surface excess values at the same temperature from this work (bottom).

Figure 6 :
Figure 6: Segregation enthalpies per surface atom of CoCrFeMnNi elements at 300 K, calculated with the HEA25S-4-NN potential (blue) and with the reference DFT setup (orange).Non-magnetic PAW DFT data on Gibbs segregation energy from Ref. 70 is provided for comparison (green).The error bars for HEA25S-4-NN and DFT results represent the standard deviation of the values obtained from 16 independent calculations.The reference DFT data uncertainty of 40 meV is taken from Ref. 70.

Figure 7 :
Figure 7: (a) Concentration profiles of the elements from the REMD/MC simulation of the CoCrFeMnNi alloy at 300 K (top) and a single snapshot of the system taken from the end of the relaxation trajectory (bottom).The z-coordinate represents the distance from the center of the surface slab, while the light blue area on the figure and a dashed line represent the bulk region and the Gibbs surface plane respectively.Atoms in the snapshot and concentration profiles follow the same color scheme.(b) Gibbs surface excess Γ a at different temperatures for the elements in the CoCrFeMnNi alloy from the REMD/MC simulations.

Figure 8 :
Figure 8: (a) Concentration profiles of the elements from the REMD/MC simulation of the IrFeCoNiCu alloy at 300 K (top) and a single snapshot of the system taken from the end of the relaxation trajectory (bottom).The z-coordinate represents the distance from the center of the surface slab, while the light blue area on the figure and a dashed line represent the bulk region and the Gibbs surface plane respectively.Atoms in the snapshot and concentration profiles follow the same color scheme.(b) Gibbs surface excess Γ a at different temperatures for the elements in the IrFeCoNiCu alloy from the REMD/MC simulations.Experimental data on the concentration of dissolved metals C after electrochemical treatment of the IrFeCoNiCu nanoparticles in a 0.1 M HClO 4 solution under the applied voltage from Ref. 80 is provided in the inset for comparison.

Figure 9 :
Figure 9: Front (a) and side (b) views of the IrFeCoNiCu bulk system, relaxed at 300 K in the REMD/MC simulation.Brown and green spheres represent Cu and Ni atoms, while the remaining elements transparent.Precipitation of Cu and Ni is observed after relaxation, which indicates instability of the alloy.

Figure 10 :
Figure 10: (a) Concentration profiles of the elements from the REMD/MC simulation of the IrFeCo alloy 300 K (top) and a single snapshot of the system taken from the end of the relaxation trajectory (bottom).The z-coordinate represents the distance from the center of the surface slab, while the light blue area in the figure and a dashed line represent the bulk region and the Gibbs surface place respectively.Atoms in the snapshot and concentration profiles follow the same color scheme.(b) Gibbs surface excess Γ a at different temperatures for the elements in IrFeCo alloy from the REMD/MC simulations. .