Explainable machine learning assisted molecular-level insights for enhanced specific stiffness exploiting the large compositional space of AlCoCrFeNi high entropy alloys

Design of high entropy alloys (HEA) presents a significant challenge due to the large compositional space and composition-specific variation in their functional behavior. The traditional alloy design would include trial-and-error prototyping and high-throughput experimentation, which again is challenging due to large-scale fabrication and experimentation. To address these challenges, this article presents a computational strategy for HEA design based on the seamless integration of quasi-random sampling, molecular dynamics (MD) simulations and machine learning (ML). A limited number of algorithmically chosen molecular-level simulations are performed to create a Gaussian process-based computational mapping between the varying concentrations of constituent elements of the HEA and effective properties like Young’s modulus and density. The computationally efficient ML models are subsequently exploited for large-scale predictions and multi-objective functionality attainment with non-aligned goals. The study reveals that there exists a strong negative correlation between Al concentration and the desired effective properties of AlCoCrFeNi HEA, whereas the Ni concentration exhibits a strong positive correlation. The deformation mechanism further shows that excessive increase of Al concentration leads to a higher percentage of face-centered cubic to body-centered cubic phase transformation which is found to be relatively lower in the HEA with reduced Al concentration. Such physical insights during the deformation process would be crucial in the alloy design process along with the data-driven predictions. As an integral part of this investigation, the developed ML models are interpreted based on Shapley Additive exPlanations, which are essential to explain and understand the model’s mechanism along with meaningful deployment. 
The data-driven strategy presented here will lead to devising an efficient explainable ML-based bottom-up approach to alloy design for multi-objective non-aligned functionality attainment.


Introduction
The notion of alloy formation by utilizing multiple principal elements, each typically present in atomic percentages ranging from 5% to 35%, results in a novel class of materials system known as high entropy alloys (HEAs) [1,2]. This unique feature of HEAs in terms of wide compositional space (multiple principal elements) promotes mutual solubility and enhanced configurational entropy [3,4]. The mutual solubility and high configurational entropy ensure the formation of solid solutions (SS) with the presence of uniform phases. In contrast to the anticipated complex phase formation in multi-principal element alloys, HEAs typically exhibit relatively simple phases, primarily consisting of face-centered cubic (fcc), body-centered cubic (bcc), or a combination of both [3]. This structural simplicity is responsible for the exceptional properties of HEAs, including robust cryogenic toughness, high strength, thermal stability at elevated temperatures, and excellent resistance to corrosion and wear [5–8]. Proportions of the constituting components of HEA are found to have a significant effect on the overall effective physical properties. The overarching theme of this paper is to computationally map the high-dimensional compositional space of HEAs, and subsequently achieve non-aligned multifunctionality by exploiting the emerging capabilities of machine learning (ML) through an explainable framework.
The microstructural properties of 3D-printed AlCoCuFeNi HEAs revealed the inherent simplicity of the atomic structure, which is reported to contribute significantly to the HEAs' enhanced microhardness and compressive strength [9]. Furthermore, the inherent highly ordered atomic structure, as demonstrated by the AlCoCrFe2.5Ni HEA [10], leads to increased strength and enhanced wear resistance. The variation in an individual atomic concentration (e.g. Al ranging from 0 to 1.8 in molar ratio) resulted in prominent microstructural evolution of AlxCoCrFeNi, especially under varying temperature conditions. Joseph et al [11] reported the unprecedented presence of a single fcc phase in the Al0.3CoCrFeNi HEA and also demonstrated a substantial enhancement in the rate of work hardening during compression. These findings highlight the remarkable mechanical characteristics of HEAs, which have piqued the interest of the materials science community in harnessing their exceptional mechanical behavior in multi-faceted applications. The literature suggests that AlCoCrFeNi HEA offers the potential modulation of strength-to-weight ratio by varying the Al concentration. The increase in strength/stiffness with respect to comparable density (i.e. increase in specific strength/stiffness) can open up a novel avenue for a broad range of structural applications such as aircraft, automobile and wind turbine structures. The AlCoCrFeNi HEA is widely used in manufacturing high-temperature structural components for aerospace engines, protective coatings for engineered surfaces, and soft magnetic materials for reducing eddy current losses [2].
The development of lightweight HEA (LWHEA) has great potential in addressing the constraints associated with low strength-to-weight ratio in structural applications [12–15]. LWHEAs typically incorporate elements such as Mg, Al, Li, Cr, Fe, Ti, Mo, V, etc. [14]. For instance, Youssef et al [16] investigated a mechanical alloying based low-density nanocrystalline HEA that included Al, Li, Mg, Sc, and Ti as alloying elements, which demonstrated exceptional mechanical properties. It initially formed a single-phase fcc structure during ball milling and transitioned to a single-phase hexagonal close-packed (HCP) structure after annealing. This alloy exhibited an impressive strength-to-weight ratio, surpassing other nanocrystalline alloys and ceramics. However, the literature suggests that even though Mg as an alloying element offers low density, it also contributes some undesirable characteristics such as low tensile strength, limited room temperature formability, corrosion susceptibility, and high cost [17–19]. Likewise, the incorporation of Ti provides specific strength but makes the alloy expensive [17]. Liu et al [20] investigated the effect of decreasing Ti content and increasing Al concentration on the mechanical properties of AlxCoCrFeNiTi0.25 high-entropy alloys to minimize costs and improve mechanical behavior. All of the HEA samples they tested displayed significant work hardening behavior, with the Al0.5CrFeNiTi0.25 alloy exhibiting the highest fracture strength and plastic-strain limit, with values of 3.47 GPa and 40%, respectively, along with a yield strength of 1.88 GPa. Joseph et al [11] demonstrated that an increase in Al concentration improved the strength of the AlxCoCrFeNi HEA system but at the expense of ductility. Nevertheless, due to aluminum's inherent weakness in terms of strength, excessive addition of Al is detrimental to alloy strength [21,22]. Additionally, Zhang et al [22] investigated the mechanical properties of AlxCoCrFeNiTi HEAs and reported a gradual decrease in compressive strength and elastic modulus with increasing Al content from x = 1.0 to 2.0. This is attributed to the saturation of the Al SS when the Al content exceeds x = 1.0. Therefore, to design advanced LWHEAs with desired physical and mechanical properties, it is crucial to adjust the alloy composition for maintaining the minimal density.
In practice, alloy design can take a variety of forms, such as tailoring element compositions or altering the microstructures of existing alloys to obtain specific characteristics. Alloy design involves basic deformation mechanisms, trial-and-error prototyping, high-throughput experimentation, computational techniques, and theoretical calculations. However, when investigating the wide compositional space of HEAs for identifying new materials, the typical trial-and-error approach, which has traditionally been utilized in alloy design, proves costly and time-consuming. The exploitation of evolving computational tools can significantly expedite the alloy design process by predicting desired properties and elucidating the underlying mechanisms. Consequently, computationally efficient strategies are essential to rapidly identify alloys with optimal characteristics, often with non-aligned functionalities.
Among the array of computational methods, molecular dynamics (MD) stands out as a widely adopted tool to comprehensively understand the physical behavior of materials systems on the atomic scale [23–35]. For example, Jiang et al [28] conducted MD simulations to investigate the microstructure evolution, deformation mechanisms, and mechanical properties of AlxCoCrFeNi HEAs under uniaxial tension. Their study considered the varying Al concentration in HEA, which revealed that the increased Al concentration adversely affected tensile properties. Additionally, the study highlighted how Al concentration modulates the influence of temperature and strain rate on Young's modulus and yield stress of HEA, with the complex influence of temperature and strain rate on its dislocation density. Doan et al [36] unveiled that higher Al content in AlCoCrFeNi high-entropy alloys contributes to increased nanoimprinting force. This finding emphasized the significant impact of Al concentration on the mechanical properties of the material during the nanoimprinting process. Wang et al [37] explored the tensile behavior of fcc heterogeneous CoNiFeAlxCu1−x HEAs, investigating the effects of strain rates, Al concentration, and degree of grain heterogeneity. The analysis demonstrated that as the Al concentration decreases, the stable stacking fault energy increases, leading to a higher tensile yield stress. Additionally, increasing the size of large grains within the heterogeneous grain structure improves plasticity by enhancing the fcc to bcc phase transformation, ultimately leading to increased uniform ductility in large grains. These insights provide valuable guidance for optimizing material structure parameters in heterogeneous grain structure HEAs to achieve superior mechanical properties. Though MD simulations can provide an acceptable level of accuracy for analyzing the effect of compositional space and inherent structures of HEAs, it is rather computationally expensive to identify the optimum HEA configurations based on iterative realizations and analyses. In this context, we propose to couple the emerging capabilities of ML with the computational strength of MD simulations, so that the compositional space of HEAs can be explored more thoroughly without the exorbitant computational demand of a solely physics-based simulation.
Over the past few years, there has been an extraordinary surge in the development of ML-based predictive frameworks, especially in the domain of materials science [38–41]. The exceptional capability of ML in revealing deep insights and performing data-driven materials discovery holds great promise for addressing challenges in the theoretical modeling of HEAs [42–44]. Wen et al [45] presented a materials design strategy, which incorporated an ML surrogate model and experimental design algorithms to identify AlCoCrCuFeNi HEAs with enhanced hardness. Risal et al [46] presented the integration of ML, showcasing its ability to not only automate the classification and prediction of phases, including SS, intermetallic, and mixed phases (SS + MM), but also accelerate the development of novel HEA materials with unique microstructures. A good number of studies have adopted a combined MD and ML based approach that leverages ML techniques to assist MD simulations in predicting phase formations [47,48] and mechanical properties [49–52] of HEAs.
The concise literature review presented in the preceding paragraphs highlights a few challenges associated with the strategy for alloy design, such as the precise characterization of variation in the wide compositional space of HEA and its influence on the functional behavior, as well as the challenges associated with large-scale fabrication and experimentation. Owing to such challenges, it is essential to devise a high-fidelity computational framework for alloy design, which can eliminate the need for large-scale experimentation and lead to an efficient exploration of the design space of HEA. To address these challenges, this article aims to propose a computational strategy based on the seamless integration of quasi-random sampling, MD simulations and ML (refer to figure 1). The developed ML models are further interpreted based on Shapley Additive exPlanations (SHAP). The SHAP explanations are essential to understanding the model's mechanism of decision-making, which makes model deployment meaningful and explainable. The critical notions of explainability and interpretability in ML-based exploration of HEAs have not been investigated yet.
A limited number of algorithmically chosen molecular-level simulations would be performed to create a Gaussian process-based computational mapping between the varying concentrations of constituent elements of the HEA and effective properties like Young's modulus and density. The computationally efficient ML models would subsequently be exploited for large-scale predictions and multi-objective functionality attainment with non-aligned goals. As an integral part of this study, the deformation mechanisms and phase transformation behavior will be studied to explain the effect of elemental composition. The data-driven strategy presented here will lead to devising an efficient explainable ML-based bottom-up approach to alloy design for multi-objective non-aligned functionality attainments such as low density and high stiffness simultaneously. Hereafter, the article is organized as follows: section 2 provides a detailed understanding of the computational methodology adopted in the present study, section 3 discusses the numerically quantified responses gathered from the presented data-driven framework, and finally, section 4 presents the concluding remarks and perspectives.

Methodology
In this section, a detailed understanding of the methodology adopted for solving the data-driven multi-objective non-aligned goal-attainment problem is presented. A flowchart of the ML-based approach to developing the computationally efficient framework is illustrated in figure 2. The investigation is initiated by generating 128 quasi-random samples with the help of the Sobol sequence algorithm [53], wherein the samples are generated within the parametric range (5%-35%) of variation of individual constituent elements of AlCoCrFeNi HEA (refer to figure 3). The summation constraint of 100% is maintained for each sample while generating the random samples. In the quasi-random sequence sampling, each sample is obtained by mapping a Sobol parameter onto the parametric range: α_i denotes the randomly perturbed sample, SP_i represents the random Sobol parameter, which varies between 0 and 1, and i denotes the sample index. It is to be noted that while constructing the training sample space, a quasi-random (Sobol) sequence [54] is utilized, whereas the sample space for the predictions is generated using Monte Carlo sampling (MCS). Both sampling methods, MCS and Sobol sequence sampling, generate random samples within the defined parametric range. However, when compared with Sobol sequence sampling, the samples generated by MCS lack uniformity and may form clusters separated by wide gaps, which defeats the purpose of random sampling, especially when dealing with a low number of samples. This can cause a slow convergence for MCS in terms of the sample size required for ML model formation. Hence, in the presented framework, the sample space (128 samples) for training the ML models is generated by utilizing the quasi-random Sobol sequence, which is designed to cover the sample space more evenly than MCS. During the generation of the prediction sample space (10 000 samples), MCS is adopted since uneven sampling does not raise any convergence issue during prediction and testing (refer to figure 3). The summation of the elemental compositions is maintained at 100 by constraining the individual variations as

$$\mathrm{Al} + \mathrm{Co} + \mathrm{Cr} + \mathrm{Fe} + \mathrm{Ni} = 100.$$
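The constrained quasi-random sampling described above can be illustrated with a minimal sketch. Several assumptions are made here and are not taken from the study: a Halton (van der Corput) sequence is swapped in for the Sobol sequence so that the example needs only the Python standard library, four elements are drawn independently and the fifth is set to the remainder, and samples whose remainder leaves the 5%-35% window are rejected. The function names are hypothetical.

```python
def van_der_corput(n, base):
    """Return the n-th element (n >= 1) of the van der Corput sequence in `base`."""
    q, denom = 0.0, 1.0
    while n > 0:
        n, rem = divmod(n, base)
        denom *= base
        q += rem / denom
    return q


def quasi_random_compositions(n_samples, lo=5.0, hi=35.0, total=100.0):
    """Generate [Al, Co, Cr, Fe, Ni] compositions (at.%) that sum to `total`.

    ASSUMPTION: Halton points stand in for Sobol points, and the fifth element
    is the remainder; the study's own constraint handling is not specified.
    """
    bases = (2, 3, 5, 7)  # one prime base per independently sampled element
    samples, index = [], 1
    while len(samples) < n_samples:
        # Linear map of each quasi-random parameter in [0, 1) onto [lo, hi)
        first_four = [lo + van_der_corput(index, b) * (hi - lo) for b in bases]
        index += 1
        last = total - sum(first_four)
        if lo <= last <= hi:  # reject points that violate the range for Ni
            samples.append(first_four + [last])
    return samples


compositions = quasi_random_compositions(128)
print(len(compositions), [round(c, 2) for c in compositions[0]])
```

Treating the last element as a remainder is simple but not symmetric across the five elements; a production workflow would more likely rely on a dedicated quasi-Monte Carlo library and a symmetric constraint treatment.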

Modeling and simulation
The Sobol sequence sampling based random compositional space of AlCoCrFeNi HEA is utilized to model 128 different HEA configurations. The HEA configurations are modeled as a bar of size 71.2 × 71.2 × 284.8 Å³ in a LAMMPS [55] environment. The interatomic interaction among the constituent elements of AlCoCrFeNi HEA is modeled by utilizing the EAM/alloy forcefield [56]. The simulation box is enforced with periodic boundary conditions in each direction with a timestep of 0.001 ps. Before performing the simulation of uniaxial tensile deformation, the HEA model is subjected to energy minimization for 50 ps by utilizing the conjugate gradient algorithm. In the next step, a pressure and temperature equilibration of the HEA model is performed for another 50 ps under the NPT (constant-temperature, constant-pressure) ensemble, during which the variation and convergence of the intrinsic density of the HEA configuration are recorded. The equilibrated HEA model is then subjected to uniaxial tensile deformation with a strain rate of 0.5 ps⁻¹ for 200 ps (0.1 strain, up to elastic region). The strain and corresponding stress in the Z direction are recorded throughout the simulation. The structural phase changes and subsequent deformation mechanisms illustrated in the final stage of investigation are explored by utilizing the common neighbor analysis (CNA) [57] platform through OVITO [58].
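The text does not state how Young's modulus is extracted from the recorded stress-strain data. One common choice, assumed here purely for illustration, is the least-squares slope of stress against strain over a small-strain window. The data below are synthetic and the 0.02 strain window is a hypothetical choice.

```python
def youngs_modulus(strain, stress):
    """Least-squares slope of stress versus strain (same units as stress)."""
    n = len(strain)
    mean_e = sum(strain) / n
    mean_s = sum(stress) / n
    sxy = sum((e - mean_e) * (s - mean_s) for e, s in zip(strain, stress))
    sxx = sum((e - mean_e) ** 2 for e in strain)
    return sxy / sxx


# Synthetic, perfectly linear-elastic data with a 120 GPa modulus (not MD output)
strain = [i * 0.001 for i in range(21)]   # 0 to 0.02
stress = [120.0 * e for e in strain]      # GPa
E = youngs_modulus(strain, stress)
print(f"Fitted Young's modulus: {E:.1f} GPa")
```

With real MD output the fitted value depends on the chosen window, so the window should be reported alongside the modulus.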

Development of ML model
The uniaxial tensile deformation is performed for 128 quasi-randomly generated samples in an MD environment, yielding a dataset containing varied atomic fractions of individual constituent elements of AlCoCrFeNi HEA as input parameters and MD-derived outcomes (Young's modulus (Y) and density (ρ)). The dataset is utilized to construct Gaussian process regression (GPR) models, to establish a generalized relationship between varying concentrations of constituent elements of HEA with Y and ρ. Section SM1 of the supplementary material provides a fundamental understanding of the GPR model. The rationale for employing GPR models is their ability to properly capture nonlinear interactions and uncertainties in data. Unlike conventional regression approaches, which mostly assume linear correlations between variables, GP models are adaptable and may account for complicated interactions between input data and output predictions. It has been noted in previously reported studies that GP models present a high-fidelity predictive framework based on a relatively smaller training dataset for further deployment in exploratory data analytics [59,60].
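As a rough illustration of how a GPR maps composition to a response, the following sketch implements only the predictive mean of a Gaussian process with a squared-exponential kernel. It is not the model of the study: the kernel, the fixed length scale, the noise level, and the four training points with their response values are all assumptions made for the example, and the predictive variance is omitted.

```python
import math


def rbf(a, b, length_scale=10.0, variance=1.0):
    """Squared-exponential kernel between two composition vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * math.exp(-0.5 * d2 / length_scale ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gpr_fit(X, y, noise=1e-6):
    """Precompute alpha = (K + noise * I)^-1 (y - mean) for later predictions."""
    mean_y = sum(y) / len(y)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    alpha = solve(K, [v - mean_y for v in y])
    return {"X": X, "alpha": alpha, "mean": mean_y}


def gpr_predict(model, x_new):
    """Predictive mean at a new composition."""
    k = [rbf(x_new, xi) for xi in model["X"]]
    return model["mean"] + sum(ki * ai for ki, ai in zip(k, model["alpha"]))


# Toy training set: [Al, Co, Cr, Fe, Ni] in at.% with made-up responses (GPa)
X_train = [[5.0, 23.75, 23.75, 23.75, 23.75], [20.0, 20.0, 20.0, 20.0, 20.0],
           [35.0, 16.25, 16.25, 16.25, 16.25], [20.0, 20.0, 20.0, 5.0, 35.0]]
y_train = [140.0, 110.0, 80.0, 115.0]
model = gpr_fit(X_train, y_train)
print(round(gpr_predict(model, [12.5, 21.875, 21.875, 21.875, 21.875]), 2))
```

In practice the kernel hyperparameters would be tuned rather than fixed, as done through Bayesian optimization in the present framework.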
For developing the GPR models, the dataset is randomly divided into training (N = 108 samples) and validation (N = 20 samples) samples (refer to figure 3). It is to be noted that to ensure the sound generalization capability of the models, they are first trained with the training samples and then validated with a separate set of validation samples (which were not utilized while training the model). In addition to that, while training the models, hyperparameter tuning is performed by utilizing Bayesian optimization to enforce the optimal hyperparameters while constructing the models. The hyperparameter tuning is performed for 30 iteration steps by utilizing 'Expected improvement (EI)' as the acquisition function, which resulted in global minima of mean square error for both GPR models (GPR_Y and GPR_ρ) (refer to figure S1 in supplementary material). The range of hyperparameter search during Bayesian optimization and subsequent selection of optimized hyperparameters while training the GPR models for Young's modulus (GPR_Y) and density (GPR_ρ) are presented in table 1. The optimized hyperparameters are utilized to train the GPR models, wherein a k-fold (k = 9) cross-validation scheme is implemented for enforcing the simultaneous training-testing scheme and utilizing complete data in the training (and testing) process. The predictive accuracy of the developed models is assessed by observing the sample-to-sample error in prediction. The relative error (%) in prediction is evaluated as follows

$$\varepsilon_i = \frac{y_i - \bar{y}_i}{y_i} \times 100,$$

where i denotes the sample index, ε denotes the relative error in prediction, y denotes the MD simulation generated response, and ȳ refers to the GPR predicted response. Thus, to summarize the training and testing process, the following two stages of validation are adopted: (1) a k-fold (k = 9) cross-validation scheme during training and formation of the GPR model, (2) further testing using 20 completely unseen samples which are not involved in the training process. Two levels of validation will lead to adequate confidence in the prediction capability of the machine-learning models.
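The two validation ingredients above, the sample-wise relative error and the k-fold split, can be sketched as follows. The sign convention of the error (MD value minus GPR value, divided by the MD value) and the contiguous, unshuffled folds are assumptions of this sketch; the response values are invented for illustration.

```python
def relative_error_percent(y_md, y_gpr):
    """Signed relative error (%) per sample; ASSUMED convention: (MD - GPR) / MD."""
    return [100.0 * (a - b) / a for a, b in zip(y_md, y_gpr)]


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


# 108 training samples split into 9 folds gives 12 test samples per fold
folds = list(k_fold_indices(108, 9))
print(len(folds), len(folds[0][0]), len(folds[0][1]))

# Hypothetical MD and GPR values (GPa), not results from the study
errors = relative_error_percent([100.0, 80.0], [99.0, 80.8])
print([round(e, 3) for e in errors])
```

Shuffling the indices before splitting is usually advisable when the sample order carries structure, as is the case for a quasi-random sequence.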

Deployment of the ML model for explainable optimized material design
Once adequate predictive accuracy is achieved from the developed ML models, the models are deployed for large-scale predictions corresponding to the unexplored sample spaces (refer to figure 3). The large-scale predictions are utilized to reveal deep insights into the critical relationship between the compositional space of HEA and its functional responses. In this regard, the large-scale predictions are at first utilized for understanding the influence of individual parametric variation on the stiffness and density of AlCoCrFeNi HEA. Further, to understand the magnitude of the statistical significance of variation in the individual atomic fraction of constituent elements of HEA on the quantities of interest, a data-driven sensitivity analysis is performed. The sensitivity analysis is performed by evaluating the relative coefficient of variation (RCV), which can be calculated as follows

$$\mathrm{RCV}_x = \frac{\mathrm{COV}_x}{\sum_{x} \mathrm{COV}_x}, \qquad \mathrm{COV}_x = \frac{\sigma_x}{\mu_x}.$$

Here, COV_x stands for the coefficient of variation with respect to x (x ∈ an individual constituent element of AlCoCrFeNi HEA), σ_x stands for the standard deviation of the response corresponding to variation in x alone, and µ_x stands for the mean of the response corresponding to variation in x alone. For instance, with the variation in the elemental concentration of Al from 5% to 35%, while Co, Cr, Fe, and Ni remain equi-atomic, σ_Al and µ_Al are calculated for the output responses Y and ρ, which are further used to evaluate COV_Al. Likewise, COV_Co, COV_Cr, COV_Fe, and COV_Ni are evaluated individually and subsequently, RCV_x is calculated.
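A small sketch of the sensitivity measure follows. It assumes that the RCV of an element is its coefficient of variation divided by the sum of the coefficients of variation over all five elements, and it uses the population standard deviation; both choices are assumptions of this example. The response values are invented and only mimic a strong Al and Ni influence.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """COV = standard deviation / mean of the response for one element sweep."""
    return pstdev(values) / mean(values)


def relative_cov(responses_by_element):
    """Normalize each COV by the sum over all elements (ASSUMED RCV definition)."""
    cov = {el: coefficient_of_variation(v) for el, v in responses_by_element.items()}
    total = sum(cov.values())
    return {el: c / total for el, c in cov.items()}


# Hypothetical Young's modulus predictions (GPa) for each one-at-a-time sweep
responses = {
    "Al": [140.0, 125.0, 110.0, 95.0, 80.0],
    "Co": [108.0, 109.0, 110.0, 111.0, 112.0],
    "Cr": [104.0, 107.0, 110.0, 113.0, 116.0],
    "Fe": [106.0, 108.0, 110.0, 112.0, 114.0],
    "Ni": [95.0, 103.0, 110.0, 118.0, 125.0],
}
rcv = relative_cov(responses)
for element, value in sorted(rcv.items(), key=lambda kv: -kv[1]):
    print(f"{element}: {value:.3f}")
```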
The constructed GPR models are further utilized in performing multi-objective genetic algorithm (MOGA) optimization for designing the LWHEA with sufficiently high stiffness (non-aligned functionalities). A detailed overview of the genetic algorithm-based multi-objective optimization is provided in section SM2 of the supplementary material. The optimization is performed by developing a MATLAB code with a population size of 100, a cross-over rate of 0.8 and by enforcing the adaptive mutation function. The compositional space of an individual HEA configuration is featured into a numerical input vector (five-dimensional vector based on the elemental composition of AlCoCrFeNi HEA) in a similar fashion as carried out during the construction of the sample space for ML model construction. The search space (elemental concentration of Al, Co, Cr, Fe, and Ni) of the generations is restricted to between 5% and 35%, with the constraint that the composition sums to 100. To ensure that the summation is restricted exactly to 100, a feedback loop is utilized in the code, which only passes the generations for which elemental compositions add up to 100. The MOGA framework utilizes the developed ML models as the fitness functions that predict the solutions for the selected generations obtained from the search space. The cross-over rate of 0.8 ensures that there is sufficient probability (80%) of the directional (maximize Y and minimize ρ) change in the solutions for achieving the optimality.
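The feasibility filter described above can be sketched as a simple check, shown here in Python rather than MATLAB. The blend cross-over is an assumption of this sketch (the operators used in the study's code are not specified); it is chosen because a convex combination of two feasible parents automatically satisfies both the 5%-35% bounds and the summation constraint, so no child is rejected by the filter.

```python
import random


def is_feasible(comp, lo=5.0, hi=35.0, total=100.0, tol=1e-9):
    """True if the composition sums to `total` and every element is in [lo, hi]."""
    in_bounds = all(lo - tol <= c <= hi + tol for c in comp)
    return in_bounds and abs(sum(comp) - total) <= tol


def blend_crossover(parent_a, parent_b, rng):
    """Convex blend of two parents (HYPOTHETICAL operator, not the study's)."""
    w = rng.random()
    return [w * a + (1.0 - w) * b for a, b in zip(parent_a, parent_b)]


rng = random.Random(0)                       # fixed seed for repeatability
parent_a = [5.0, 20.0, 20.0, 20.0, 35.0]     # [Al, Co, Cr, Fe, Ni] in at.%
parent_b = [10.0, 30.0, 25.0, 5.0, 30.0]
children = [blend_crossover(parent_a, parent_b, rng) for _ in range(100)]
print(all(is_feasible(child) for child in children))
```

A mutation operator would need its own repair or rejection step, since a random perturbation of one element generally breaks the summation constraint.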
In general, the developed ML models are utilized as a black-box for exploring the parametric variation range for the desired properties. For instance, in this case, the ML models explore the variation in the atomic fraction of constituent elements of AlCoCrFeNi HEA to predict Young's modulus and density of unknown atomic compositions. Even though the models' efficient prediction capabilities can be directly deployed, the underlying mechanism of the models' functioning remains unclear (refer to figure 1(B)). The ML models developed in the present study are therefore further subjected to SHAP [61] investigation. The SHAP analysis explores the individual and compound influence of the input features on the prediction-making process of the developed models, leading to an explainable component of the ML outcomes.
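To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a five-feature model by enumerating all feature coalitions, with absent features replaced by a single baseline composition. This is a simplified illustration, not the SHAP library's algorithm, which typically averages over a background dataset and uses approximations. The linear `toy_model` and its coefficients are hypothetical and do not represent the trained GPR models.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at x relative to a single baseline."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(comp):
    """HYPOTHETICAL linear surrogate for Young's modulus (GPa)."""
    al, co, cr, fe, ni = comp
    return 100.0 - 1.5 * al + 0.2 * co + 0.4 * cr + 0.3 * fe + 1.0 * ni


x = [5.0, 20.0, 20.0, 20.0, 35.0]      # low Al, high Ni
baseline = [20.0] * 5                  # equiatomic reference
phi = shapley_values(toy_model, x, baseline)
print(dict(zip(("Al", "Co", "Cr", "Fe", "Ni"), (round(p, 6) for p in phi))))
```

The attributions sum to the difference between the prediction at x and the prediction at the baseline, which is the additivity property that makes Shapley-based explanations easy to read.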

ML-based insights on the compositional space of HEAs
This section discusses the numerically quantifiable findings derived from the proposed data-driven framework. The analysis begins by validating the MD simulation outcomes (Young's modulus (Y) and density (ρ)) with the results reported in the previous literature. For example, Jiang et al [62] reported Young's modulus of Al0.1CoCrFeNi HEA (with the Al concentration as 2.43%) and Al0.7CoCrFeNi HEA (with the Al concentration as 14.9%) as 118.4 GPa and 78 GPa, respectively. They obtained this result by performing the MD simulation of uniaxial tensile deformation. The MD simulation of uniaxial tensile deformation performed for the same configurations in the current investigation resulted in Young's modulus of 118.6 GPa (for Al0.1CoCrFeNi) and 78.6 GPa (for Al0.7CoCrFeNi). The observations are also found consistent with the values reported in a recent study by Barman and Dey [35]. The density of the Al0.1CoCrFeNi HEA configuration is evaluated as 8.009 g cm⁻³, which is in close agreement with the published literature [63,64]. Thus, the observations obtained from the MD simulation are in strong agreement with the experimental and theoretical observations reported in the past literature. It is to be noted that the current study only considers the fcc crystal structure of the HEA configurations; however, the crystal structure of the alloy is heavily influenced by the processing methods. With the change in crystal structure, the properties of HEA tend to vary, even for the same compositional space [36]. Such aspects should be carefully investigated in future studies. With adequate confidence in the MD simulation-driven observations, the simulations of uniaxial tensile deformation are extended for Sobol sequence sampling to generate 128 samples (training and validation data, refer to figure 3).
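The Al concentrations quoted for the two validation alloys follow from the molar ratio x in AlxCoCrFeNi, since the four remaining elements each contribute one mole. A short check, with a hypothetical helper name:

```python
def al_atomic_percent(x, n_other=4):
    """Atomic percentage of Al in Al_x CoCrFeNi with n_other equimolar elements."""
    return 100.0 * x / (x + n_other)


for x in (0.1, 0.7):
    print(f"x = {x}: {al_atomic_percent(x):.2f} at.% Al")
```

The values agree with the quoted 2.43% and 14.9% to within rounding.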
The samples generated by utilizing Sobol sequence sampling consisted of 128 combinations of varying compositions of each constituent element (Fe, Ni, Co, Cr, and Al), wherein an individual elemental composition varies between 5% and 35%. While generating each sample, the summation constraint of 100% atomic fraction is imposed to ensure that the sum of the individual atomic fractions of each element equals 100. The Young's modulus and density of the 128 AlCoCrFeNi HEA configurations are recorded as an outcome of the uniaxial tensile deformation, resulting in a dataset of 128 samples, where the compositional variation (atomic concentration of Al, Fe, Ni, Co, and Cr) is considered as input parameters, and the Young's modulus and density are considered as responses. The statistical correlation between input parameters and the desired responses is illustrated in figure 4, wherein the heatmap of Pearson's correlation matrix highlights the correlation mapping of individual input parameters with the responses. It is evident from figure 4 that Al concentration exhibits a strong negative correlation with both of the responses (Y and ρ); in contrast, the Ni concentration exhibits a strong positive correlation. The variation in atomic concentration of Cr, Fe, and Co exhibits a mild influence on Young's modulus and density of AlCoCrFeNi HEA.
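Each cell of such a correlation heatmap is a Pearson coefficient between two columns of the dataset. A minimal sketch of that computation follows; the five data points are invented to mimic a strongly negative Al-stiffness trend and are not values from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical Al concentration (at.%) and Young's modulus (GPa)
al = [5.0, 10.0, 20.0, 30.0, 35.0]
modulus = [140.0, 130.0, 110.0, 92.0, 80.0]
r = pearson(al, modulus)
print(f"r(Al, Y) = {r:.3f}")
```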
The correlation mapping also reveals that there exists a strong positive correlation between Young's modulus (Y) and density (ρ). It is to be noted that the presented correlogram is based on the dataset constructed by performing 128 MD simulations. The strong positive correlation between Young's modulus and density (i.e. as stiffness increases, density increases) poses a challenge in developing LWHEAs with sufficient stiffness. Hence, a data-driven framework can help in understanding the relationship between HEA compositional space and functional responses such as Young's modulus and density, and can assume a significant role in optimizing compositional atomic fractions to produce a lightweight alloy system with maximum stiffness. Thus, the MD simulation based dataset is used in the following stage to develop Gaussian process ML models for performing large-scale predictions of the quantities of interest (Young's modulus and density) for unknown AlCoCrFeNi HEA configurations (refer to figure 3 for prediction data). The sample space is randomly divided into training (N = 108 samples) and validation (N = 20 samples) datasets for developing the ML models. It is worth noting that to maintain a sound generalization capability of the constructed models, the models are validated using an unknown dataset (which is not introduced during training).
Figure 5 illustrates the validation of ML models using scatter plots and error plots for individual responses. Figures 5(A) and (B) highlight the validation of ML models for predicting Young's modulus, with the scatter plot (figure 5(A)) revealing a close match between the predicted and true (MD) values of Y regardless of the training or validation samples. This is supported by the probability density function (PDF) of the prediction error percentage (see figure 5(B)), which illustrates that the maximum percentage error in the prediction is in the range of ±2%, regardless of the training or validation samples. Further, the probability of prediction error is lower for a higher number of samples. The constructed ML model corresponding to Young's modulus resulted in mean absolute error (MAE) values of 0.39 and 0.475 during training and testing, respectively. A similar ML validation is performed in the case of prediction of density (ρ) as well (refer to figures 5(C) and (D)), wherein a close match between the true and predicted values of ρ is observed (refer to figure 5(C)). The major population of the prediction error percentage lies within ±0.2% (refer to figure 5(D)), regardless of the training or validation samples. The MAE values corresponding to the ML model for density were observed as 0.0024 and 0.003843 during training and testing, respectively. The model assessment metrics are presented in table S1 of the supplementary material. In the next stage, large-scale predictions for the unknown compositional space of AlCoCrFeNi HEA are made using the constructed models once sufficient predictive accuracy is achieved. The observations gained from the large-scale predictions can be helpful in revealing deep insights into the influence of compositional space on the functional responses of the HEA. In order to achieve that, an MCS-based prediction dataset (N = 10 000 samples (refer to figure 2)) is constructed. By utilizing such a large-scale random sampling, five different prediction datasets are constructed, wherein at a time an individual compositional element is varied from 5% to 35%, while the remaining elements are kept equiatomic. The constructed ML models are deployed to predict Young's modulus and density (refer to figure 6) for these five unknown sample spaces. The observations gained from these predictions provide a clear perspective on the influence of an individual compositional variation on the functional responses (Y and ρ) of AlCoCrFeNi HEAs. The parametric influence of individual atomic fractions of constituent elements of AlCoCrFeNi HEA reveals that regardless of the responses (Y or ρ), the increase in Al concentration has a negative influence, whereas the increase in the concentration of the remaining elements demonstrates a positive influence (refer to figure 6). The variation trends (%) in quantities of interest as a function of variation in an individual compositional element from 5% to 35% atomic concentration, drawn from the data-driven investigation presented in figure 6, are further highlighted in table 2. Here, the compositional-space-dependent percentage variation in desired quantities (Y and ρ) is evaluated with respect to the observation at the minimum concentration of an individual atomic fraction.
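The one-at-a-time prediction datasets and the percentage variation relative to the minimum concentration can be sketched as below. An evenly spaced grid is used here instead of Monte Carlo draws, which is an assumption made to keep the example deterministic; the response values passed to `percent_variation` are hypothetical.

```python
ELEMENTS = ("Al", "Co", "Cr", "Fe", "Ni")


def sweep(element, n_points=7, lo=5.0, hi=35.0, total=100.0):
    """Vary one element from lo to hi while the other four stay equiatomic."""
    idx = ELEMENTS.index(element)
    rows = []
    for k in range(n_points):
        c = lo + (hi - lo) * k / (n_points - 1)
        rest = (total - c) / (len(ELEMENTS) - 1)   # equal share for the others
        row = [rest] * len(ELEMENTS)
        row[idx] = c
        rows.append(row)
    return rows


def percent_variation(responses):
    """Percentage change relative to the response at the minimum concentration."""
    ref = responses[0]
    return [100.0 * (r - ref) / ref for r in responses]


al_sweep = sweep("Al")
print(al_sweep[0], al_sweep[-1])
print(percent_variation([100.0, 90.0, 80.0]))   # hypothetical responses
```

Feeding each sweep through a trained surrogate and then through `percent_variation` gives the kind of trend summary reported per element.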
In the next stage, the large-scale predictions derived from the GPR models are utilized to perform sensitivity analysis. Figure 7(A) illustrates the sensitivity analysis for Young's modulus of AlCoCrFeNi HEA, wherein the statistical significance of the elemental composition of the HEA decreases in the order of Al, Ni, Cr, Fe, and Co. This suggests that Young's modulus varies significantly as the atomic concentrations of Al and Ni vary. This is supported by the boxplots in figure 7(B), which show that the variation in Al and Ni concentration leads to a relatively higher variation in Young's modulus than the variation in Cr, Fe and Co concentration. For density, the variation in Al, Ni and Co concentration has a relatively higher influence. It is to be noted that the observations gathered from figure 7 reinforce the observations and conclusions drawn from figure 6 and table 2.
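A sensitivity ranking based on the relative coefficient of variation (the measure named in the caption of figure 7) might be computed along the following lines. The exact definition used in the study is not given in this section, so the normalisation below (each element's coefficient of variation divided by the sum over all elements) is an assumption, and the response values are hypothetical toy numbers, not data from the study.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return pstdev(values) / abs(m)


def relative_sensitivity(responses_by_element):
    """Normalise per-element coefficients of variation so they sum to 1.

    `responses_by_element` maps an element to the predicted responses obtained
    while only that element's concentration is varied (assumed definition).
    """
    cov = {el: coefficient_of_variation(v) for el, v in responses_by_element.items()}
    total = sum(cov.values())
    if total == 0:
        raise ValueError("all responses are constant; sensitivities undefined")
    return {el: c / total for el, c in cov.items()}


# Hypothetical predicted Young's modulus values (GPa), for illustration only.
toy_responses = {
    "Al": [200.0, 150.0, 100.0],
    "Ni": [120.0, 160.0, 200.0],
    "Co": [150.0, 152.0, 154.0],
}
sensitivity = relative_sensitivity(toy_responses)
ranking = sorted(sensitivity, key=sensitivity.get, reverse=True)
```

With these toy inputs, the element whose variation spreads the response most relative to its mean ranks first.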

Data-driven multi-functionality attainment
The numerical analysis presented so far establishes that an increase in Al concentration in AlCoCrFeNi HEA drastically reduces its stiffness and density. On the contrary, an increase in Ni concentration leads to a drastic increment in the stiffness and density of the alloy. In addition, an increase in Cr concentration leads to a mild increase in the stiffness of the HEA, whereas an increment in Co concentration substantially contributes to the increase in the density of the HEA. Owing to such a complex relationship between the elemental composition and the desired responses of the HEA, it is critical to find the optimal compositional space of the AlCoCrFeNi HEA that addresses the trade-off between stiffness and density effectively. Hence, in the following stage of the investigation, the developed ML models are exploited to perform MOGA optimization for non-aligned multi-functionality attainment.
The Pareto front obtained from MOGA optimization is illustrated in figure 8. The optimal solutions obtained from the GPR-based MOGA are validated by performing a few sets of separate MD simulations (refer to figure 8), which corroborate an adequate level of accuracy. The optimal compositions of the Pareto solution can be utilized to find the optimal compositional space of AlCoCrFeNi HEA, wherein the trade-off between stiffness and density can be accounted for in the alloy design process.
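The notion of a Pareto front for the two non-aligned objectives (maximise stiffness, minimise density) can be illustrated with a plain non-dominated filter. This is not the MOGA used in the study: it only shows the dominance test that any such optimiser relies on, and the candidate values below are hypothetical.

```python
def dominates(a, b):
    """Return True if candidate `a` dominates candidate `b`.

    Each candidate is a (stiffness, density) pair; stiffness is maximised and
    density is minimised. `a` dominates `b` if it is no worse in both
    objectives and strictly better in at least one.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical (Young's modulus in GPa, density in g/cm^3) pairs, illustration only.
toy_candidates = [
    (200.0, 7.9),
    (180.0, 7.2),
    (170.0, 7.5),  # dominated by (180.0, 7.2)
    (120.0, 6.4),
    (110.0, 6.6),  # dominated by (120.0, 6.4)
]
front = pareto_front(toy_candidates)
```

The surviving points trade stiffness against density, which is the kind of trade-off that regions 1, 2 and 3 of figure 8 describe.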

Al concentration dependent deformation physics of AlCoCrFeNi HEA
The data-driven investigation presented in the preceding paragraphs results in the optimal compositional space of AlCoCrFeNi HEA, wherein it is observed that the variation in Al concentration greatly influences the responses (Young's modulus and density). With this understanding, in this subsection, the deformation mechanisms of the three random AlCoCrFeNi HEA configurations (composition-1, 2 and 3) shown in figure 8 are examined, with the corresponding stress-strain responses presented in figure 9(B). The stress-strain behavior maintains linearity within the elastic limit (refer to figure 9(B)), indicating elastic deformation in the HEAs. However, beyond the yield point, a sudden stress drop takes place due to atomic structural transformations during quasi-static uniaxial tensile deformation. Notably, the magnitude of this stress drop at the emergence of yielding is more pronounced in sample-1 (the HEA configuration with the lower Al atomic fraction). The critical stress and strain (corresponding to the yield point) decrease significantly with the increase in Al concentration from 5% to 30% (sample-1 ≈ 5% Al, to sample-3 ≈ 30% Al), suggesting an early initiation of yielding. In particular, an 81.67% decrease in yield stress (from 13.15 GPa to 2.41 GPa) is recorded as the Al concentration increases from 5% to 30%, while the yield strain decreases from 0.125 to 0.054, marking a 56.8% reduction. This is explained by the formation of active shear bands triggered by an increase in Al content, which leads to substantial deformation and a decrease in yield strength [62]. The CNA-based quantitative representation of the influence of Al concentration on microstructural evolution is depicted in figure 9(C). These observations reveal that the initial fcc structure is maintained at nearly 100% during elastic deformation, which is also evident in figure 9(A) (refer to points a1, a2, and a3). However, the intrinsic fcc crystal structure of AlCoCrFeNi HEA undergoes a sudden transformation with an increase in applied strain, particularly at the emergence of yielding (especially for the cases of ≈15% and ≈30% Al concentration). As illustrated in figure 9, these rapid structural transformations are mainly contributed by the development of stacking faults and twin boundaries, which lead to the post-yield stress reduction. The emergence of structural transformation takes place at a relatively lower strain (b2, b3) as the Al concentration is increased, leading to a lower yield strain. It is worth noting that the initiation of atomic structural transformation decelerates at higher Al concentrations (i.e., early occurrence of structural transformation in the case of Al30), leading to a reduced degree of stress drop in the stress-strain responses, as demonstrated in figures 9(B) and (C). Additionally, during tensile deformation, AlCoCrFeNi HEA exhibits a prevalence of bcc atomic structure at higher Al concentrations, which is not as prominent in the cases of sample-1 and sample-2 (refer to figures 9(A) and (C)).
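The percentage reductions quoted above for the yield stress and yield strain follow from the reported end-point values. The short check below reproduces that arithmetic; the helper name `percent_reduction` is introduced here only for illustration.

```python
def percent_reduction(initial, final):
    """Percentage decrease from `initial` to `final`, relative to `initial`."""
    if initial == 0:
        raise ValueError("initial value must be non-zero")
    return 100.0 * (initial - final) / initial


# Values reported in the text for roughly 5% Al (sample-1) and 30% Al (sample-3).
yield_stress_drop = percent_reduction(13.15, 2.41)    # yield stress in GPa
yield_strain_drop = percent_reduction(0.125, 0.054)   # yield strain

print(f"yield stress reduction: {yield_stress_drop:.2f}%")
print(f"yield strain reduction: {yield_strain_drop:.1f}%")
```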
The distinct hues of atoms in figure 9(A) indicate different types of atomic structures: amorphous atoms are shown in grey, and atoms in fcc, bcc, and hcp structures are shown in green, blue, and red, respectively. Figure 9(A) (a1, b1, and c1) illustrates the atomic structural transformations in sample-1 at different strain levels (refer to figure 9(B)): a1 (strain of 0.12, pre-yielding), b1 (strain of 0.125, emergence of yielding), and c1 (strain of 0.128, post-yielding). Initially, at a1 (0.12, within the elastic range), the intrinsic fcc structure primarily exists. As the strain increases to b1 (0.125), the transformation of the fcc to the hcp crystal structure takes place, which becomes more prominent at c1 (0.128). These hcp layers are commonly identified as twin boundaries and stacking faults [35]. The rapid stress drops observed in figure 9(B) at the start of yielding are attributed to the rapid generation of hcp planes (twin boundaries and stacking faults). It is worth noting that with the increase in Al concentration to 14.95%, the formation of these stacking fault layers takes place at a relatively lower strain of b2 (0.098). Furthermore, as the Al concentration increases to ≈30%, the fcc atomic structure is largely replaced by the bcc structure, especially evident at strain b3 (0.0628), indicating the prevalence of the bcc structure at higher Al concentrations. Such physical insights during the deformation process would be crucial in the alloy design process along with the ML-based data-driven predictions, specifically for the identified optimum configurations with different degrees of functionality attainment.

SHAP driven explainability
Predictive frameworks based on computationally efficient ML models frequently serve as a black box, making interpretation of the model's performance challenging. Using SHAP [61,65] values to mitigate such challenges is a promising approach, as the model's prediction performance can be assessed in terms of the model's dependency on specific parameters during decision-making. SHAP indicators are derived from the Shapley values of game theory and are used to interpret how each player contributes to a collaborative game. With this intent, the input features (considered to construct the ML model) are viewed as individual players, and the contributions of individual features are compared to explain the decision-making process of the ML model.
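The game-theoretic idea behind SHAP can be made concrete with an exact Shapley value computation over a small set of players. The sketch below enumerates every coalition, which is feasible only for a handful of features; it is not the SHAP library's estimator, and the toy value function (including its numbers) is purely hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, players):
    """Exact Shapley values by enumerating all coalitions.

    `value_fn` maps a frozenset of players to the payoff of that coalition.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {p}) - value_fn(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical payoff: Al lowers it, Ni raises it, Co has no effect,
    and Al and Ni together add a small interaction term."""
    v = 0.0
    if "Al" in coalition:
        v -= 3.0
    if "Ni" in coalition:
        v += 2.0
    if "Al" in coalition and "Ni" in coalition:
        v += 1.0
    return v


players = ["Al", "Ni", "Co"]
phi = shapley_values(toy_value, players)
```

The values sum to the payoff of the full coalition minus that of the empty one (the efficiency property), and the interaction term is shared equally between the two players involved.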
In this section, the SHAP-driven explainability investigation is presented to comprehend the mechanism of prediction made by the developed ML models (GPR_Y and GPR_ρ). The SHAP summary plots for the large-scale (10 000 samples) prediction space are explored (refer to figures 10(A) and (B)) to obtain deep insights. The features with positive SHAP values have a positive influence over the model's prediction capability and vice-versa. With this understanding, it is evident from figures 10(A) and (B) that the Al and Ni concentrations have a relatively higher impact on the model's prediction capability for both GPR_Y and GPR_ρ. In addition, lower Al concentrations have a positive impact (and vice-versa) on the predictions produced by GPR_Y and GPR_ρ. Higher Ni concentrations, on the other hand, have a positive effect on the models' prediction capabilities. It is to be noted that the SHAP-driven summary plots reinforce the understanding gained from the earlier sensitivity analysis illustrated in figure 7.
To understand the models' prediction mechanism in more detail, SHAP force plots (refer to figure 10(C)) are obtained specifically for the optimal elemental concentration denoted by point 2 in figure 8. The SHAP values corresponding to each input feature are determined for predicting Young's modulus and density of the Al14.95Ni24.78Cr20.05Fe21.82Co18.4 HEA configuration (refer to figure 10(D)). The force plots in figure 10(C) show that, when predicting Young's modulus and density of the suggested optimal HEA configuration, the models give relatively more (positive) weightage to the Al and Ni concentrations and less (negative) weightage to the Co concentration. Similar conclusions can be drawn from the bar plots in figure 10(D), where very small, negative Shapley values are obtained for Co in both GPR models (GPR_Y and GPR_ρ). So far, the SHAP-driven explainability of the constructed models reveals that the variation in Al concentration has a relatively higher influence on the predictions, whereas the variation in Co concentration has the least impact on the prediction mechanism of the models. To understand the interactive influence of Al and Co concentrations on the prediction mechanism of the GPR models, the distribution of SHAP values for the predictions provided by GPR_Y and GPR_ρ is investigated (refer to figures 10(E) and (F)). Figure 10(E) shows that lower concentrations of Co and Al together result in negative SHAP values of Co, i.e. the developed model GPR_Y is negatively affected by this combination. Similar interpretations can be made for the combination of higher concentrations of Co and Al together. A higher concentration of Al in combination with a lower concentration of Co, or vice versa, mostly leads to positive SHAP values of Co, i.e. these combinations positively affect the developed GPR_Y model. It is evident from figure 10(F) that, for the other model (GPR_ρ), a Co concentration higher than 15% leads to positive SHAP values irrespective of the Al concentration, which indicates that a Co concentration above 15% has a positive influence on the GPR_ρ prediction. Such qualifiable explainability in the ML prediction would bring more confidence and insights into the outcomes of ML-based analyses.

Summary and perspective
In this article, we propose an explainable ML framework for HEAs, leading to the attainment of multifunctionality with non-aligned objectives. The design of HEAs presents a significant challenge due to the large compositional space and composition-specific variation in their functional behavior. Traditional alloy design includes trial-and-error prototyping and high-throughput experimentation, which is itself challenging due to large-scale fabrication and time-consuming experimentation. To address these challenges, a computational strategy for HEA design is presented here based on the seamless integration of quasi-random sampling, MD simulations and ML, along with explainable insights through SHAP.
A total of 128 algorithmically chosen molecular-level simulations are performed to create a Gaussian process-based computational mapping between the varying concentrations of the constituent elements of the HEA (Al, Fe, Ni, Cr, and Co atoms) and effective properties such as Young's modulus (Y) and density (ρ). The computationally efficient ML models are subsequently exploited for large-scale predictions and multi-objective functionality attainment with non-aligned goals based on iterative multi-objective minimization solvers. The salient outcomes derived from the explainable data-driven analysis presented in this article are summarized below.
1. Pearson's correlation matrix-based observations revealed that Al concentration exhibits a strong negative correlation with both of the effective properties (Y and ρ). In contrast, the Ni concentration exhibits a strong positive correlation. The correlation mapping also revealed that there exists a strong positive correlation between Young's modulus (Y) and density (ρ).
2. The developed ML models demonstrated exceptional generalization capability, with a prediction error in the range of ±2% while predicting Young's modulus and ±0.2% while predicting the density of AlCoCrFeNi HEA.
3. The SHAP-driven explainability of the developed models revealed that the Al and Ni concentrations have a relatively higher impact on the models' effective property prediction capability.
4. The parametric influence of the individual atomic fractions of the constituent elements of AlCoCrFeNi HEA reveals that, regardless of the response (Y or ρ), an increase in Al concentration has a negative influence, whereas an increase in the concentration of the remaining elements demonstrates a positive influence.
5. The increase in Al concentration (alone) from 5% to 35% resulted in a 70% decrease in Young's modulus and a 27% decrease in the density of AlCoCrFeNi HEA. In contrast, the increase in Ni concentration (alone) from 5% to 35% resulted in a 127% increase in Young's modulus and a 13% increase in the density of AlCoCrFeNi HEA.
6. The data-driven sensitivity analysis for Young's modulus of the AlCoCrFeNi HEA revealed that the statistical significance of the elemental composition of the HEA decreases in the order of Al, Ni, Cr, Fe, and Co, whereas, in the case of the sensitivity analysis of density, the variation in Al, Ni and Co concentration was found to have a relatively higher influence than the variation in Cr and Fe.
7. The deformation mechanism of the AlCoCrFeNi HEA configuration with a higher concentration of Al exhibits a rapid structural transformation (fcc to bcc), which is primarily attributed to the formation of twin boundaries and stacking faults, ultimately leading to the post-yield stress drop.
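The Pearson correlation referred to in the first outcome above can be computed as in the following sketch. The concentration and modulus values are hypothetical toy numbers chosen only to produce a strongly negative coefficient; they are not results from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical Al concentrations (at.%) and Young's modulus values (GPa).
al_concentration = [5.0, 15.0, 25.0, 35.0]
youngs_modulus_toy = [210.0, 170.0, 120.0, 70.0]
r_al_y = pearson_r(al_concentration, youngs_modulus_toy)
```

A coefficient close to -1 corresponds to the strong negative correlation described for Al, and a coefficient close to +1 to the positive correlation described for Ni.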
The primary challenge in alloy design using efficient ML models lies in the efficient featurization of the input space. In this regard, utilizing molecular or atomic descriptors for constructing the input feature space can be an alternative, which promotes scalability in the design of HEAs [49]. On the other hand, chemical information-driven feature vectors [66,67] have been successfully used in the construction of input features for developing ML models corresponding to HEA design. Though such featurization approaches have proven to be fairly effective, they also present a major challenge in terms of high dimensionality, which leads to an inevitable difficulty in solving the inverse problem. For instance, if the optimal feature vector is derived rather than the optimal atomic composition (as done in this study), it becomes extremely difficult to pinpoint the alloy's compositional space in terms of the participating elements using the optimal high-dimensional feature vector. Further, training an ML model involving a high-dimensional input parameter space requires a significantly large number of training samples, leading to higher computational expenses. To mitigate such challenges, in the present study, we utilized the individual atomic composition of AlCoCrFeNi HEA as the input features. Since the current investigation deals with HEA design considering fixed individual elements, the atomic composition-based featurization leads to a high-fidelity computational framework. The present approach is very effective when the participating atoms are pre-decided and their individual fractions are utilized as design parameters. However, for developing a more generic ML model where the participating atoms are not pre-decided (including their number), utilizing molecular or atomic descriptors and chemical information-driven feature vectors may be more advantageous. The proposed concepts of explainability and Gaussian process-based ML can be extended to such predictive frameworks.

Conclusions
This article presents an efficient and explainable ML-assisted computational framework for designing multifunctional AlCoCrFeNi HEAs with non-aligned objectives such as enhanced stiffness and low effective density. The proposed hybrid simulation framework is capable of eliminating or substantially reducing the requirement for high-throughput experimentation, and it can further be extended to designing other HEAs as well as to finding the optimal elemental concentrations under multi-faceted functional demands.

Figure 1. Overview of the computational data-driven framework for designing lightweight AlCoCrFeNi HEA with enhanced stiffness. (A) To generate the dataset required for developing the ML model, 128 MD simulations of uniaxial tensile deformation are performed for the HEA configurations with quasi-random atomic composition. The dataset is subsequently utilized to develop computationally efficient ML models. Large-scale predictions of the desired responses (Young's modulus (Y) and density (ρ)) for unknown HEA configurations are produced by exploiting the generalized GPR models. The ML models are utilized further for performing sensitivity analysis and genetic algorithm based multi-functionality attainment to determine the optimum atomic composition of AlCoCrFeNi HEA by minimizing the density and maximizing the stiffness. (B) Explainability analysis of the constructed ML models, leading to enhanced insights.

Figure 2. Flow diagram of the stage-wise approach for machine learning-based data-driven HEA design strategy.
Al$_{100-(a+b+c+d)}$Co$_a$Cr$_b$Fe$_c$Ni$_d$ (a: atomic fraction of Co, b: atomic fraction of Cr, c: atomic fraction of Fe, d: atomic fraction of Ni).

Figure 3. Quasi-random Sobol sequence sampling space of input parameters (constituent elements of AlCoCrFeNi HEA) for generating the training, validation and prediction dataset.

Figure 5. Validation of Gaussian process regression (GPR) based machine learning (ML) model.(A) Scatter plot between true and predicted Young's modulus, (B) Probability density function (PDF) plot for the sample-wise percentage error in the prediction of Young's modulus, (C) Scatter plot between true and predicted density, (D) probability density function (PDF) plot for the sample wise percentage error in the prediction of density.

Figure 6. Large-scale prediction capability of the developed GPR model. The variation in Young's modulus (Y) and density (ρ) of AlCoCrFeNi HEA as a function of variation in the atomic concentration of (A) Al, (B) Ni, (C) Cr, (D) Fe, (E) Co.

Figure 7. Data-driven sensitivity analysis.(A) Sensitivity analysis for the Young's modulus of AlCoCrFeNi HEA based on relative coefficient of variation, (B) magnitude of variation in Young's modulus of AlCoCrFeNi HEA as a function of variation in its constituent elements, (C) sensitivity analysis for the density of AlCoCrFeNi HEA based on relative coefficient of variation, (D) magnitude of variation in density of AlCoCrFeNi HEA as a function of variation in its constituent elements.

Figure 8. Non-aligned multi-functionality attainment. GPR-driven multi-objective genetic algorithm (MOGA) based optimization is carried out. The Pareto solution is presented by the red circular points and the corresponding validation by MD simulation is presented by yellow hexagonal points. The regions 1, 2 and 3 correspond to the lower, medium, and higher concentrations of Al, respectively. These regions are characterized as: 1 (high stiffness and density), 2 (moderate stiffness with relatively low density), and 3 (lowered stiffness and density).

Figure 10. SHAP driven explainability and interpretability of constructed ML models. (A) SHAP summary plot for large-scale predictions made by GPR_Y; (B) SHAP summary plot for large-scale predictions made by GPR_ρ; (C) SHAP force plots for an individual prediction made by GPR_Y and GPR_ρ corresponding to an optimal elemental composition suggested by GPR-driven MOGA (refer to point 2 denoted in figure 8); (D) the input feature specific SHAP values while making the prediction of Young's modulus and density of the Al14.95Ni24.78Cr20.05Fe21.82Co18.4 HEA configuration; (E) the interactive influence of Al and Co concentration on the prediction mechanism of GPR_Y; (F) the interactive influence of Al and Co concentration on the prediction mechanism of GPR_ρ. From figures 10(A) and (B), it is noted that Al has the highest impact on ML prediction. Considering that it is not possible to present all possible feature interactions due to the limitation of space, we have shown representative results involving Al in (E) and (F). However, other possible numerical results can be obtained following a similar approach.

Table 1. Range of hyperparameter search used during Bayesian optimization based hyperparameter tuning, and post-optimization best hyperparameters corresponding to GPR_Y and GPR_ρ.

Table 2. The percentage variation trends in the functional properties as a function of the increase in individual constituent elements of AlCoCrFeNi HEA. Here ↑ and ↓ denote the increase and decrease, respectively.