Comparative assessment of supervised machine learning algorithms for predicting geometric characteristics of laser cladded Inconel 718

Laser cladding, an innovative surface modification and coating preparation process, has emerged as a research hotspot in material surface modification and green remanufacturing domains. In the laser cladding process, the interaction between laser light, powder particles, and the substrate results in a complicated mapping between process parameters and clad layer quality. This work aims to shed light on this mapping using fast-evolving machine learning algorithms. A full factorial experimental design comprising 64 groups was employed to clad Inconel 718 powder on an A286 substrate. Analysis of variance, contour plots, and surface plots were used to explore the effects of laser power, powder feeding rate, and scanning speed on the width, height, and dilution rate of the cladding. The performance of the predictive models was evaluated using the index of merit (IM), which includes mean square error (MSE), mean absolute error (MAE), and coefficient of determination (R2). By comparing the performance of the models, it was found that the Extra Trees, Random Forest Regression (RFR), Decision Tree Regression (DTR), and XGBoost algorithms exhibited the highest predictive accuracy. Specifically, the Extra Trees algorithm outperformed other machine learning models in predicting the cladding width, while the RFR algorithm excelled in predicting the cladding height. The DTR algorithm demonstrated the best performance in predicting the cladding dilution rate. The R2 values for width, height, and dilution rate were found to be 0.949, 0.954, and 0.912, respectively, for these three models.


Introduction
Laser cladding is an advanced surface modification and coating preparation technology that integrates material preparation and surface configuration, and it is a crucial support technology for green remanufacturing. Laser cladding exploits the high energy density of a laser beam to melt metal powder (or wire) and deposit it onto the surface of the target substrate, forming coatings with high hardness, high abrasion resistance, high corrosion resistance, etc. The cladding process enables surface modification of critical parts as well as the repair of surface damage, and it has been widely used in the fields of aviation, aerospace, automotive, machinery and so on [1,2].
The process parameters of the laser cladding process have a profound effect on the quality characteristics of the resulting cladded layer (i.e., the geometrical and mechanical properties). The coupling of light, powder, and substrate properties leads to a complex mapping relationship between process parameters and cladding layer [3,4]. In order to optimize laser cladding, many scholars are exploring the influence of the different process parameters on the coating morphology and mechanical properties [5,6]. In the study by Liu et al, 27 sets of experiments were selected using a full factorial design (FFD) to investigate the interplay between laser power (P), powder feed rate (F), and scanning speed (S) on the cross-sectional dimensions (i.e., width and height) of a cladded deposit consisting of ferrous self-melting alloy powder. A nonlinear fitting model was used to fit the relationship between geometrical parameters and process parameters [7]. Li et al used a single-track orthogonal experiment to explore the influence of process parameters (substrate tilt angle, P, F, and S) on the cladding layer width and height during the laser cladding deposition of Ni60A powder on a tilted substrate. The results showed that an increase in tilt angle would increase the cladding layer width and decrease the height, with the width being more significantly affected by laser power, and the height being more influenced by S and F [8].
Xu et al used Taguchi's method to design an L16 orthogonal experiment for cladding In718 on an A286 substrate [9]. They analyzed the effects of P, F, and S on micro-hardness, load bearing capacity, yield strength, ultimate tensile strength, and elongation and failure. In addition, they showed that mechanical properties could be further improved using optimized process parameters. Despite the differences in the selection of process parameters across studies, it is still evident that P, F, and S generally have an important influence in different experimental studies.
The optimization of laser cladding process parameters is often divided into three stages: experimental design, predictive modeling, and parameter optimization. The research methodology adopted in each stage affects the final results. Currently, robust experimental designs for laser cladding often employ factorial experiments, Taguchi analysis methods, and response surface methodology [5][6][7].
To realize the complex mapping between laser cladding process parameters and cladding layer quality, three methods are commonly used: (i) statistical analysis, which establishes a regression model between the process parameters and the response [10][11][12]; (ii) finite element analysis, in which a three-dimensional model controls each parameter variable, simulates the laser cladding experiment, and predicts the desired experimental results [13][14][15][16]; (iii) machine learning (ML) algorithms such as Random Forest Regression (RFR), Support Vector Machine (SVM), Artificial Neural Network (ANN), and Deep Learning [16][17][18][19][20][21].
Alizadeh-Sh et al clad In718 alloy powder on the surface of A286 iron-based superalloy. They proposed an empirical statistical analysis based on the linear regression (LR) method to analyze the cladding process. The critical geometrical characteristics (i.e., width, height, angle, dilution rate) required to avoid solidification cracking during the process were predicted [22]. Lian et al carried out laser cladding on curved surfaces and used the response surface method to establish mathematical models for width, height, and dilution rate. The authors obtained the relationship between the response and the process parameters, and experimentally verified the model's reliability. However, it is difficult to describe this correlation with regression equations alone when dealing with nonlinear data and data characteristics involving complex polynomials [23]. Song et al established a three-dimensional finite element model of the laser cladding process of 7075 aluminum alloy powder on 2024 aluminum alloy substrate, and obtained the temperature field and residual stress field generated during the process, in order to analyze the effects of different laser power, scanning speed, cladding layer lengths, cladding layer patterns, and cladding angles on the fatigue life of aircraft fuselage [24]. Wolff et al predicted the effects of process parameters on temperature distribution, liquid metal flow, cladding layer geometry, and dilution rate in the melt pool during the laser cladding process by establishing a three-dimensional thermo-fluid dynamics model and using a surface contour calculation method based on minimizing surface free energy [25]. However, such numerical models require many assumptions, which often do not align well with the actual cladding process. This misalignment results in simulation outcomes that struggle to offer meaningful guidance for the actual process. Additionally, the finite element method often consumes a significant amount of time during the solving process.
To circumvent the above issues, many scholars have recently turned their attention to rapidly developing research methods such as ML to achieve better process predictions. Omar et al studied the applicability of common ML algorithms such as Gaussian Process Regression, Decision Tree Regression (DTR), Random Forest Regression (RFR), Support Vector Regression (SVR), Gradient Boosting Regressor (GBR), and Multi-layer Perceptron (MLP) in predicting friction welding process parameters. The results showed that GBR, SVR, and Gaussian Process Regression had the highest accuracy, with a percentage error of less than 3% [17]. According to literature research, commonly used ML algorithms in establishing prediction models for laser cladding include Back-propagation Neural Network (BP), SVR, DTR, RFR, GBR, etc. [3,20,[26][27][28][29][30][31].
Although numerous scholars have conducted detailed studies on predicting laser cladding process parameters, fewer have undertaken in-depth comparative analyses of these ML algorithms to determine their accuracy and applicability. Therefore, the primary focus of this study is to compare the predictive accuracy of commonly used ML algorithms in determining the geometric characteristics of laser-cladded layers of Inconel 718. Specifically, our study centers on K-Nearest Neighbors (KNN), Back-propagation Neural Network (BP), Support Vector Regression (SVR), Decision Tree Regression (DTR), Random Forest Regression (RFR), Extra Trees, and XGBoost. Our objective is to establish which ML algorithms exhibit higher precision in predicting process parameters for laser cladding. We conducted 64 FFD experiments on Inconel 718 clad on an A286 substrate. First, we applied an analysis of variance to assess the contribution of each parameter to the various responses and the statistical significance of the cladding process. Second, traditional analytical methods were used to investigate the relationship between the three inputs, namely laser power (P), powder feed rate (F), and scanning speed (S), and the three responses, namely cladding width (W), cladding height (H), and dilution rate (D). Finally, we optimized the hyperparameters in the machine learning models using genetic algorithms (GA). Subsequently, we assessed the performance of these optimized ML algorithms using the index of merit (IM), and comparative evaluations were performed.

Laser cladding
Figure 1 shows the ZKZM-Z06 laser cladding system employed in this work. The laser cladding system, which is installed in a 4-axis CNC machine, consists of a 6 kW fiber laser, a coaxial laser cladding head, and powder feeding and cooling systems. Disc-shaped samples (ø150 mm × 10 mm) consisting of iron-based high-temperature alloy (A286) were used as the substrate material. Nickel-based high-temperature alloy In718 powder was selected as the cladding material. To ensure good flowability, In718 powder with spherical particles 45-150 μm in diameter was chosen, and its micrograph, obtained through a Scanning Electron Microscope (SEM, FEI Quanta 250), is shown in figure 2.
The chemical composition of the substrate was determined using carbon-sulfur analysis and atomic spectroscopy, as shown in table 1. Before the experiments, the powders were dried in a vacuum oven at 100 °C for 2 h. Prior to laser cladding, the A286 substrates were ground using 180# SiC sandpaper to remove oxides and contaminants from the substrate surface and wiped clean using absolute (anhydrous) ethanol. Single-track cladded layers were deposited on the pre-treated upper surface of the substrates, with representative dimensions of 5 mm × 8 mm × 10 mm. Subsequently, the samples were sequentially ground using SiC sandpaper with grit sizes 400#, 800#, 1200#, 1500#, and 2000# and then polished with W2.5 abrasive paste. The sample cross-sections were subsequently etched using Kalling's reagent (i.e., a mixture of 100 ml water, 100 ml HCl, and 5 g CuCl2). Finally, the etched cross-sections were observed using an optical microscope (OM, LEICA DM4), and the geometric characteristics of the cladding layer were measured and evaluated using an image processing program (ImageJ, National Institutes of Health, USA).

Design of experiments
In this study, a full factorial design (FFD) approach was employed to design the experiments, considering the varying degrees of influence from different process parameters. The laser spot diameter was set constant at 5 mm, and argon gas was used as the shielding and carrier gas for powder delivery. The gas flow rate was maintained at 15 L/min. Based on literature research, three main process parameters, namely P, F, and S, were selected as independent variables. Each parameter was set at 4 levels, as shown in table 2. The 64 sets of parameter combinations obtained from the full factorial experimental design are shown in table 3. Figure 3 presents the measured geometric characteristics of the cross-section of the single-track claddings. The geometric characteristics, including W, H, melt pool depth (h), cladding area (Ac), and fusion area (Af), were measured using ImageJ software.
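The enumeration of a three-factor, four-level full factorial design can be sketched as follows. This is a minimal illustration, not the authors' procedure: the laser power levels are the four values named later in the text, while the feed-rate and scanning-speed entries are placeholder level indices because table 2 is not reproduced here.

```python
from itertools import product

# Laser power levels (W) are the four values named in the text.
power_levels = [800, 1300, 1800, 2300]
# Placeholder level indices, NOT the physical values from table 2.
feed_levels = [1, 2, 3, 4]     # powder feed rate F (hypothetical)
speed_levels = [1, 2, 3, 4]    # scanning speed S (hypothetical)

def full_factorial(*factor_levels):
    """Return every combination of the given factor levels as tuples."""
    return list(product(*factor_levels))

design = full_factorial(power_levels, feed_levels, speed_levels)
print(len(design))  # 4 * 4 * 4 runs
```

Each tuple in `design` is one (P, F, S) run; replacing the placeholder indices with the physical levels would give the run list of the experiment.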
The dilution rate (D) is an important parameter which indicates the degree of bonding between the fusion cladding and the substrate. In the present work, it was calculated using equation (1) [32]:

$$D = \frac{A_f}{A_c + A_f} \times 100\% \quad (1)$$

where A_f is the area of the fusion zone, and A_c is the area of the cladding layer.
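As a small worked example, the dilution rate can be computed from the two measured areas. The sketch assumes the common area-based definition D = A_f / (A_c + A_f); the function name and the sample areas are hypothetical.

```python
def dilution_rate(fusion_area, clad_area):
    """Area-based dilution D = A_f / (A_c + A_f), returned as a fraction."""
    total = fusion_area + clad_area
    if total <= 0:
        raise ValueError("areas must sum to a positive value")
    return fusion_area / total

# Hypothetical areas in mm^2, for illustration only.
d = dilution_rate(fusion_area=1.0, clad_area=3.0)
print(d)  # 0.25
```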

Machine learning methods and selection of hyperparameters
Drawing upon relevant literature and the authors' own investigations into mainstream machine learning techniques, this paper employs a selection of algorithms, including K-Nearest Neighbors (KNN), Back-propagation Neural Network (BP), Support Vector Regression (SVR), Decision Tree Regression (DTR), Random Forest Regression (RFR), Extra Trees, and XGBoost [3,20,[26][27][28][29][30][31]. The objective is to compare their predictive performance in the context of this study.
To avoid issues such as overfitting and underfitting, this study employs 5-fold cross-validation. In this validation technique, k denotes the number of parts (folds) into which the data is divided; k-1 folds are used for training, and the remaining fold is used for testing the model. The evaluation metrics of the cross-validation set can be used to continuously adjust the hyperparameters to obtain a reliable and stable model. Additionally, different ML algorithms have their own characteristics, and the hyperparameters that significantly impact the prediction results may vary depending on the specific problem. Therefore, this study selects important

K-nearest neighbor
The K-nearest neighbor (KNN) regression algorithm operates on the principle of distance similarity. It selects the K closest neighbors to the sample to be predicted by computing the distance between the target sample and each sample in the training set. Prediction is then made based on the labels of these neighbors. The algorithm's workflow involves storing samples and their labels during the training phase, creating a sample space. In the prediction phase, the distance between a test sample and each training set sample is calculated, and the K nearest neighbors are chosen based on these distances. For regression problems, the average value of the K neighbors is used as the prediction result. By selecting an appropriate K value and distance metric, the KNN algorithm can accurately predict regression problems. Euclidean distance was used as the distance metric, and after optimizing the K value through the genetic algorithm (GA), a value of 5 was selected [26].
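The prediction step described above can be sketched in a few lines. This is an illustrative pure-Python version with Euclidean distance and unweighted averaging; the function name and the toy data are assumptions, and features would normally be scaled before computing distances.

```python
import math

def knn_predict(train_x, train_y, query, k=5):
    """Predict by averaging the targets of the k nearest training samples."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of samples")
    # Euclidean distance from the query to every stored training sample.
    distances = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(y for _, y in nearest) / k

# Hypothetical two-feature samples and targets.
train_x = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(knn_predict(train_x, train_y, query=(0.2, 0.2), k=3))  # 2.0
```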

BP neural network
The BP neural network (BPNN, abbreviated BP) is a neural network regression algorithm that trains the network weights in two phases, forward propagation and back-propagation, so that the network learns the mapping relationship between inputs and outputs. The loss function is minimized by continuously adjusting the weights and biases, which makes the method suitable for dealing with nonlinear problems. The complexity and learning capability of the network are determined by parameters such as the number of hidden layers and the number of neurons in each hidden layer. The learning rate determines the step size of each weight update. A learning rate that is too large may cause oscillations, while a learning rate that is too small may result in slow convergence. The number of iterations determines the number of training cycles. Too many iterations may lead to overfitting, while too few may result in underfitting [4,5,20,27].
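The effect of the learning rate on the weight update can be illustrated with plain gradient descent on a one-parameter quadratic loss. This is a toy stand-in for a network weight, not a BP network, and the three learning rates are arbitrary illustrative values.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimise the toy loss L(w) = w**2 and return the visited weights."""
    w = start
    path = [w]
    for _ in range(steps):
        grad = 2.0 * w                  # dL/dw for L(w) = w**2
        w = w - learning_rate * grad    # step size is set by the learning rate
        path.append(w)
    return path

small = gradient_descent(0.01)  # too small: still far from the minimum
good = gradient_descent(0.3)    # moderate: converges quickly towards 0
large = gradient_descent(1.1)   # too large: sign flips and magnitude grows
print(abs(small[-1]), abs(good[-1]), abs(large[-1]))
```

With the large rate each update overshoots the minimum, so the weight oscillates in sign while its magnitude grows; with the small rate the weight has barely moved after 20 steps.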

Support vector regression
Support Vector Regression (SVR) is a regression algorithm based on Support Vector Machines that fits the data by finding the optimal hyperplane and minimizing the prediction error as much as possible. It is suitable for dealing with high-dimensional data and nonlinear problems. One of its most important hyperparameters is the kernel function. Through the comparison of different kernel functions, it was found that the model based on the radial basis function is more suitable for predicting the geometric characteristics of the cladding [3].

Decision tree regression
In DTR, each tree node represents a feature, and the input data is partitioned based on that feature until a predetermined stopping condition is met. At each node, DTR uses a criterion to select the best feature and split point; commonly used criteria include mean square error (MSE) and mean absolute error (MAE).
During the prediction phase, the input data follows the branches of the tree and eventually reaches a leaf node, which stores the average or median value of the samples in that subset. This value serves as the prediction for that subset [17,19]. Several important parameters need to be set in DTR. These include the maximum depth of the tree, the minimum number of samples in a leaf node, and the minimum number of samples required for a split. These parameters determine the complexity and generalization ability of the tree. A larger maximum depth or a smaller minimum number of samples may lead to overfitting, while a smaller maximum depth or a larger minimum number of samples may lead to underfitting. Adjusting the maximum depth through GA improves the performance and generalization ability of the DTR model, so that more accurate prediction results can be obtained.
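The split selection at a single node can be sketched as follows: every candidate threshold on one feature is scored by the size-weighted MSE of the two child subsets, and the lowest score wins. The function names and sample values are hypothetical; a full tree would repeat this search recursively over all features until a stopping condition such as the maximum depth is reached.

```python
def mse(values):
    """Mean squared deviation of the values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def best_split(xs, ys, min_samples_leaf=1):
    """Return (threshold, weighted_mse) of the best split on one feature."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(min_samples_leaf, len(pairs) - min_samples_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * mse(left) + len(right) * mse(right)) / len(pairs)
        if score < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, score)
    return best

# Hypothetical feature values and responses, for illustration only.
threshold, score = best_split([800, 1300, 1800, 2300], [2.0, 2.2, 4.0, 4.1])
print(threshold, score)
```

Raising `min_samples_leaf` removes candidate splits that would create very small leaves, which is one of the ways the hyperparameters above limit tree complexity.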

Random forest regression
The Random Forest Regression (RFR) algorithm is an ensemble learning method that constructs multiple decision trees by randomly selecting data and features. The final result is obtained by voting or averaging the predictions of these trees. It has high accuracy and generalization ability [17,19]. In the Random Forest model, important hyperparameters include the number of decision trees and the maximum depth [27].

Extra-trees
Extra Trees, also known as Extremely Randomized Trees, further increase the randomness on top of the Random Forest. Extra Trees randomly select features and split points when constructing each decision tree instead of optimizing criteria. This additional randomness can enhance the diversity of the model and reduce the risk of overfitting, making it suitable for handling high-dimensional data and nonlinear problems. The important hyperparameters of Extra Trees are the same as those in the Random Forest algorithm.

XGBoost
XGBoost trains each weak learner by optimizing the gradient of the loss function and obtains the final prediction by weighted summation. The gradient boosting algorithm adjusts the sample and learner weights in each iteration to minimize the loss function. The accuracy and efficiency of the model are improved by using regularization and parallel processing, which makes it suitable for dealing with high-dimensional data and nonlinear problems [17,20]. After optimization through GA, the learning rate is set to 0.1. Random Forest and Extra Trees belong to the Bagging model, while XGBoost belongs to the Boosting model. The Bagging model is a parallel ensemble learning method that constructs multiple base learners by randomly sampling the training set with replacement. The final prediction is made through voting or averaging. The Boosting model is a sequential ensemble learning method that iteratively trains multiple weak learners. In each iteration, the sample weights are adjusted based on the predictions from the previous round to improve the model's performance.
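The sequential nature of boosting can be illustrated with a minimal gradient-boosting loop on squared error, in which each round fits a one-split stump to the residuals of the current ensemble. This is a simplified sketch and not XGBoost itself, which adds regularization and second-order gradient information; the function names and toy data are hypothetical, and the learning rate of 0.1 mirrors the value quoted above.

```python
def fit_stump(xs, residuals):
    """Least-squares one-split stump on one feature: (threshold, left, right)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for cut in range(1, len(order)):
        lo, hi = order[:cut], order[cut:]
        if xs[lo[-1]] == xs[hi[0]]:
            continue  # equal feature values cannot be separated
        left = sum(residuals[i] for i in lo) / len(lo)
        right = sum(residuals[i] for i in hi) / len(hi)
        sse = (sum((residuals[i] - left) ** 2 for i in lo)
               + sum((residuals[i] - right) ** 2 for i in hi))
        if best is None or sse < best[0]:
            best = (sse, (xs[lo[-1]] + xs[hi[0]]) / 2, left, right)
    if best is None:
        raise ValueError("feature has no distinct values to split on")
    return best[1:]

def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Sequentially fit stumps to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of the loss 0.5 * (y - p)**2.
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, left, right = fit_stump(xs, residuals)
        stumps.append((thr, left, right))
        preds = [p + learning_rate * (left if x <= thr else right)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def predict(model, x):
    """Sum the base value and the shrunken contributions of all stumps."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(l if x <= t else r for t, l, r in stumps)

def train_mse(model, xs, ys):
    """Mean squared error of the model on the given samples."""
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Hypothetical one-feature data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 3.2]
model = boost(xs, ys)
print(train_mse(model, xs, ys))
```

A bagging ensemble would instead fit each tree independently on a bootstrap resample and average the predictions, so its trees could be trained in parallel.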

Index of merit for the evaluation of the precision of ML algorithms
Using the 59 samples obtained from the experiments, the dataset was randomly divided into a training set (70%) and a validation set (30%). To eliminate the dimensional impact among response values, a min-max scaling preprocessing was performed. In particular, the data was scaled linearly to [0, 1] using equation (2):

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \quad (2)$$

where x is the original value, x' is the scaled value, and x_min and x_max are the minimum and maximum values of the corresponding variable. To mitigate the randomness of the ML algorithms, five consecutive calculations were performed using the hyperparameters shown in table 4. The best-performing result was selected as the accuracy evaluation metric for the chosen algorithm. In this study, we adopted the index of merit (IM) introduced by Barrionuevo et al [17,19] to assess the predictive accuracy. This index combines the MSE, MAE, and R2 metrics to provide a comprehensive measure of algorithm accuracy. Notice that a value closer to 0 indicates better overall performance.
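The scaling step and the three component metrics can be sketched as below. The function names and sample values are hypothetical. The IM combination itself is deliberately left out, since its defining equation is not reproduced in this text.

```python
def min_max_scale(values):
    """Scale values linearly to [0, 1] as (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant column")
    return [(v - lo) / (hi - lo) for v in values]

def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined for a constant target")
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot

scaled_power = min_max_scale([800, 1300, 1800, 2300])
# Hypothetical scaled targets and predictions.
y_true = [0.0, 0.5, 1.0]
y_pred = [0.1, 0.5, 0.9]
print(scaled_power, mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```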

Results and discussion
Five samples were dislodged after the experiment, leaving 59 usable samples. The single-track cross-section maps obtained from optical microscopy (OM) are shown in figure 5, representing the claddings prepared at 800 W, 1300 W, 1800 W, and 2300 W laser power (P), respectively.

Statistical analysis
Analysis of variance (ANOVA) can be used to investigate whether the process parameters significantly affect the results of the laser cladding process. ANOVA is usually based on the assumption of normal distribution, so the obtained responses need to be tested for normality. Figure 6 shows the residual normal probability plots for the measured W, H, and D. It can be observed that the data points are distributed along a straight line, indicating that W, H, and D all follow a normal distribution. The importance of p-values lies in their ability to help us perform hypothesis testing, i.e., to determine whether the sample data supports the null hypothesis. If the p-value is smaller than the predetermined significance level (typically 0.05 or 0.01), we can reject the null hypothesis and conclude that there is a significant difference between the sample data and the hypothesized population mean.
The ANOVA results are shown in table 5. Except for F and S, which have insignificant effects on W and D, respectively, the p-values of P, F, and S on W, H, and D are all less than 0.01, indicating that they have extremely significant effects. The contribution rates of P, F, and S to W are 90.39%, 1.16%, and 8.45%, respectively, indicating that P has the greatest influence on W. For H, the contribution rates of P, F, and S are 43.85%, 9.30%, and 46.85%, respectively, indicating that P and S have the greatest impact on H. For D, the contribution rates of P, F, and S are 63.80%, 0.48%, and 35.58%, respectively, indicating that P and S significantly influence the dilution rate.
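A contribution rate of this kind can be obtained by normalizing the factor sums of squares. The sketch below assumes the rates are normalized over the three factors only, which is consistent with the reported values for W and H each summing to 100%; the sums of squares used here are hypothetical and are not the values behind table 5.

```python
def contribution_rates(sum_of_squares):
    """Convert per-factor sums of squares into percentage contributions."""
    total = sum(sum_of_squares.values())
    if total <= 0:
        raise ValueError("total sum of squares must be positive")
    return {name: 100.0 * ss / total for name, ss in sum_of_squares.items()}

# Hypothetical sums of squares, NOT the values behind table 5.
rates = contribution_rates({"P": 9.0, "F": 0.5, "S": 0.5})
print(rates)
```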

Contour plot and surface plot analysis
To visually examine the relationship between the process parameters and the geometrical characteristics of the cladding, surface and contour plots of the response results versus the cladding parameters are made from the 59 sets of experimental data obtained. The concept of powder distribution density (I) is introduced to synthesize the powder volume measure [33]:  where d is the laser spot diameter. Surface plots are drawn with I and P as independent variables and each response as dependent variable, which enables the surface and contour plots of the same response to be displayed in a single plot.

Cladding width
As illustrated in figure 7(a), the contour plot shows the highest gradient variation along the direction of increasing power, while the gradient variation along the direction of I is relatively smaller. This suggests that P has a more significant impact on W compared to I. Additionally, as shown in figure 7(b), a positive correlation between P and W is observed. Increasing F initially increases W, which then decreases. Conversely, increasing S results in a decrease in W. This ultimately leads to a non-linear relationship between I and W. It follows that when I is constant, an increase in P within the melting power range can melt more powder per unit time, thereby improving powder utilization and increasing W. When P is constant and maintained within the melting power range, increasing I leads to more powder per unit time, resulting in a larger W. However, if I becomes too large, it can cause shielding effects, leading to a decrease in W. This explanation effectively accounts for the overall increasing trend of W.

Cladding height
Similarly, as shown in figure 8(a), the contour plot exhibits the highest gradient variation along the direction of I, while the gradient variation along the direction of P is relatively smaller. This indicates that H is more influenced by I than by P. Further observation of figure 8 reveals that increasing P leads to a gradual decrease in the slope of H, increasing F initially increases H and then decreases it, and increasing S leads to a decrease in H. Ultimately, this results in a non-linear relationship between I and H. Within the melting range, when P remains constant, a larger I leads to a greater amount of powder deposition per unit time, resulting in a larger H. When I is constant, increasing P leads to a larger amount of melted powder, thereby increasing H.

Dilution rate
In the laser cladding process, where high-energy lasers are used to melt cladding powder and form a melt pool on the substrate, the deposition of cladding material on the substrate surface is key to enhancing its surface properties. One crucial indicator is the formation of a sufficiently strong metallurgical bond between the deposited material and the substrate, or previously deposited layer [20]. The dilution rate (D) is typically used to gauge the bonding strength in laser cladding processes. The surface and contour plots in figures 9(a) and (b) show that D increases with increasing P. This is because a higher P increases the melt pool area, which allows more powder material to be melted and bonded to the substrate material, inducing more material to fill the microscopic voids on the surface of the substrate, which increases the contact area between the materials and in turn increases D.
Moreover, the overall effect of I on the dilution rate exhibits an increasing trend.

Evaluation of model performance
To address the non-linear mapping between process parameters and the characteristics of the cladded layer, eight ML algorithms, namely linear regression (LR), KNN, BP, SVR, DTR, RFR, Extra Trees, and XGBoost, were employed to predict the geometric characteristics of laser cladding. The accuracy performance metric, IM, was obtained for each algorithm's prediction of the laser cladding response values, and the results are summarized in table 6. Cross-validation is a method for assessing the performance of a model by dividing the dataset into a training set and a validation set, which allows for an assessment of the model's ability to generalize over unseen data. As shown in table 6, for the eight prediction methods used in this study, the IM obtained with 5-fold cross-validation is inferior to that obtained on the validation set. This is because cross-validation provides a more stringent evaluation of the model. In 5-fold cross-validation, the data is divided into five mutually exclusive subsets for model training and prediction, with each subset serving as the test set in turn. The cross-validation results reflect the model's performance on new, unseen data. This implies that the model is tested on a broader range of data, leading to a higher prediction error. Therefore, the predictive performance of cross-validation may exhibit larger variance and error [34,35]. Furthermore, looking at the IM results of the 5-fold cross-validation among the eight prediction methods, the generalization capabilities of LR and tree-based models outperform those of KNN, BP, and SVR. This is due to LR being a parametric model that assumes a linear relationship between features and the target. This simple assumption makes LR less prone to overfitting during training, hence it has better generalization capabilities. The better IM of tree-based models in cross-validation could be attributed to their binary decision structure for prediction, which is insensitive to outliers. Moreover, models like Random Forest Regression (RFR), Extra Trees, and XGBoost are ensemble models that improve prediction performance by integrating multiple decision trees. The ensemble method can average out the noise of individual decision trees, enhancing the model's stability and robustness.
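The partitioning used in 5-fold cross-validation can be sketched as follows: the sample indices are shuffled once and dealt into five mutually exclusive folds, and each fold serves as the test set in turn. The function name and the fixed seed are assumptions made for illustration; the sample count of 59 is taken from the text.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k mutually exclusive folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(59, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Averaging a metric over the five test folds gives the cross-validated score, so every sample is used for testing exactly once.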
As discerned from figure 10, panels (b)-(h) exhibit a high degree of fit for cladding width prediction. From figures 10, 11 and 12, it can be seen that the tree-based algorithms DTR, RFR, Extra Trees, and XGBoost have the best predictive performance and accuracy. LR, KNN, BP, and SVR have poorer predictive performance due to the complex mapping relationship between laser cladding process parameters and cladding layer quality. Simple LR cannot accomplish such a complex prediction task. The KNN algorithm predicts based on the distance between data points, but when dealing with high-dimensional data, distance calculation becomes very difficult, leading to a decline in predictive performance. Because a single response, the cladding width, was used to determine the hyperparameters during the selection process, predictive performance may be poorer for the height and dilution rate. This is the reason why SVR and BP cannot fit the mapping relationship between the process parameters and the height and dilution rate well [3][4][5][6].
The magnitude of the IM values calculated by the DTR, RFR, Extra Trees, and XGBoost prediction algorithms shows that Extra Trees is the optimal prediction model for cladding width, improving performance by 36.8%, 21.8%, and 46% compared to DTR, RFR, and XGBoost, respectively. RFR has the best predictive performance for cladding height, improving performance by 56.4%, 45.5%, and 57.9% compared to DTR, Extra Trees, and XGBoost, respectively. DTR has superior predictive ability for dilution rate, improving performance by 58.4%, 31.6%, and 58.3% compared to RFR, Extra Trees, and XGBoost, respectively. In general, the four tree-based ML algorithms DTR, RFR, Extra Trees, and XGBoost, by adopting tree structure, bagging, and boosting strategies, can effectively predict the results of the laser cladding process, providing powerful tools for the optimization and control of the laser cladding process.

Importance of different independent variables
Tree-based models possess a 'feature importance' attribute, which aids in analyzing the influence of different independent variables or features on the output of prediction results. Each independent variable is assigned a value from 0 to 1, with the total sum being 1. As seen in figure 13(a), among the factors influencing width in the laser cladding process, laser power has the highest score and the greatest contribution in the four tree-based model algorithms DTR, RFR, Extra Trees, and XGBoost, with respective values of 81%, 37.7%, 82.3%, and 46.9%. In the ANOVA of table 5, the contributions of each factor are 90.39%, 1.16%, and 8.45%, respectively, indicating that the Extra Trees algorithm can predict width effectively. As seen in figure 13(b), laser power and scanning speed each contribute about 40% to height, which corresponds well with the ANOVA results in table 5 of 43.85%, 9.30%, and 46.85%. This suggests that the RFR algorithm, which has the smallest IM, can predict height and that the contribution rate of each factor to height is reliable. As seen in figure 13(c), P and F are significant factors influencing D, both around 40%. Compared with the ANOVA results in table 5 of 63.80%, 35.68%, and 0.52%, DTR can be used as the prediction algorithm for dilution rate. The 'feature importance' attribute results are consistent with our ANOVA results, further validating the effectiveness of our chosen ML algorithms.

Conclusion
This paper uses a full factorial experimental design to clad an Inconel 718 coating on an A286 substrate, with laser power, powder feed rate, and scanning speed as inputs, and cladding width, cladding height, and dilution rate as responses. The effects of the factors on the responses were studied using ANOVA and surface plots, and the outcomes of the laser cladding process were predicted using common ML algorithms, providing a reference for ML in laser cladding process prediction.
The conclusions of this study are as follows: (1) The ANOVA results show that laser power, scanning speed, and powder feed rate all have extremely significant effects on the laser cladding process, except that the powder feed rate has only a significant effect on the dilution rate. Among them, laser power contributes the most to cladding width (90.39%) and dilution rate (63.80%), while powder feed rate contributes the most to cladding height (46.85%).
(2) The contour and surface plots show that increasing the laser power increases cladding width, height, and dilution rate, while increasing the powder distribution density increases width and height and decreases dilution rate.
(3) Of the eight machine learning algorithms applied (LR, KNN, BP, SVR, DTR, RFR, Extra Trees, and XGBoost), DTR, RFR, Extra Trees, and XGBoost exhibited superior predictive performance. Specifically, Extra Trees, RFR, and DTR achieved the highest predictive accuracy for cladding width, height, and dilution rate, respectively, with index of merit (IM) values of 0.122, 0.114, and 0.186.
(4) The 'feature importance' attribute of the tree-based models plays a role similar to the contribution rate in ANOVA. It provides valuable insight into the influence of the different variables on the prediction results, aligns well with the ANOVA findings, and further validates the effectiveness of the chosen ML algorithms.

Figure 1. Laser cladding equipment employed in the present work.

Figure 3. Schematic representation of the cross-sectional geometric parameters of a single-track laser cladding.

Figure 4. Flowchart of the procedure employed to predict geometric characteristics of laser cladding using ML algorithms.

Figure 5. OM images showing the cross-section of the cladded layer as deposited using various combinations of processing parameters. (a) Power 800 W, (b) Power 1300 W, (c) Power 1800 W, (d) Power 2300 W.

Figure 7. (a) Surface plot and contour plot of W about P and I, (b) Main effect diagram of W.

Figure 8. (a) Surface plot and contour plot of H about P and I, (b) Main effect diagram of H.

Figure 9. (a) Surface plot and contour plot of D about P and I, (b) Main effect diagram of D.

Figure 13. Histogram of feature importance for cladding parameters based on tree-based models. (a) Cladding width, (b) cladding height, (c) dilution rate.

Table 2. Key parameters of the laser cladding process at different levels.

Table 3. Full factorial experimental data set.
H Yang et al
hyperparameters based on previous research to ensure that the ML algorithms can achieve good predictive performance and generalization ability. For the ML algorithms in table 4, heuristic algorithms are used to optimize the hyperparameters. The schematic is shown in figure 4.

Table 6. Statistical evaluation of global accuracy performance indicators for various prediction algorithms in predicting geometric characteristics.

Table 6 reveals that the maximum R2 value is 0.949 for Extra Trees and the minimum is 0.866 for KNN. Combining the evaluation indicators MSE, MAE, and R2 as per equation (2), the IM values for DTR, RFR, Extra Trees, and XGBoost in predicting cladding width are 0.192, 0.156, 0.122, and 0.226, respectively, indicating that Extra Trees has the best predictive performance for cladding width.
Figures 11(e)-(g) demonstrate superior fitting results for cladding height prediction. Table 6 shows that the maximum R2 value is 0.954 for RFR and the minimum is 0.836 for DTR. Combining the evaluation indicators MSE, MAE, and R2 as per equation (3), the IM values for DTR, RFR, Extra Trees, and XGBoost in predicting cladding height are 0.330, 0.114, 0.264, and 0.342, respectively, indicating that RFR has the best predictive performance for cladding height.
Figures 12(e)-(h) show that DTR, RFR, Extra Trees, and XGBoost have better predictive performance for the dilution rate, and table 6 shows that the largest R2 value is 0.912 for DTR and the smallest is 0.743 for RFR. According to equation (3), which combines MSE, MAE, and R2 into the IM, the IM values of DTR, RFR, Extra Trees, and XGBoost for dilution rate prediction are 0.186, 0.477, 0.272, and 0.446, respectively, which indicates that DTR has the best prediction performance. All four algorithms have IM values greater than 0.3 for dilution rate, indicating that, despite min-max scaling of the data, the small dilution rates and numerous zero values degrade the fitting results.
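The three indicators that enter the IM can be computed with scikit-learn, as in the minimal sketch below. It assumes that the min-max scaling is applied to the response using the range of the measured values, which is one possible reading of the preprocessing described above. The helper `scaled_metrics` and the example numbers are hypothetical and are not taken from table 6. The IM itself is defined by the paper's own equation and is not reproduced here.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import MinMaxScaler


def scaled_metrics(y_true, y_pred):
    """Return MSE, MAE and R2 after min-max scaling with the range of y_true.

    Assumption: the scaler is fitted on the measured response only, and the
    same transform is then applied to the predictions.
    """
    y_true = np.asarray(y_true, dtype=float).reshape(-1, 1)
    y_pred = np.asarray(y_pred, dtype=float).reshape(-1, 1)
    scaler = MinMaxScaler().fit(y_true)
    t = scaler.transform(y_true).ravel()
    p = scaler.transform(y_pred).ravel()
    return {
        "MSE": mean_squared_error(t, p),
        "MAE": mean_absolute_error(t, p),
        "R2": r2_score(t, p),
    }


# Hypothetical dilution-rate-like values with repeated zeros (illustrative only).
measured = [0.0, 0.0, 5.0, 10.0]
predicted = [1.0, 0.0, 5.0, 9.0]
metrics = scaled_metrics(measured, predicted)
print(metrics)
```

Because the same affine transform is applied to the measured and predicted values, R2 is unaffected by the scaling, whereas MSE and MAE depend on the range of the response. This is one reason a response with a narrow range and many zeros can yield comparatively poor scaled error values.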