Machine learning models to predict the relationship between printing parameters and tensile strength of 3D Poly (lactic acid) scaffolds for tissue engineering applications

3D printing is an effective method to prepare 3D scaffolds for tissue engineering applications. However, optimization of printing conditions to obtain suitable mechanical properties for various tissue engineering applications is costly and time consuming. To address this problem, in this study, scikit-learn Python machine learning library was used to apply four machine learning-based approaches which are ordinary least squares (OLS) linear regression, random forest (RF), light gradient Boost (LGBM), extreme gradient boosting (XGB) and artificial neural network models to understand the relationship between 3D printing parameters and tensile strength of poly(lactic acid) (PLA). 68 combinations of process parameters for nozzle temperature, printing speed, layer height and tensile strength were used from investigated research papers. Then, datasets were divided as training (80%) and test (20%). After building the OLS linear regression, RF, LGBM, XGB and artificial neural network models, the correlation heatmap and feature importance of each printing parameter for tensile strength values were determined, respectively. Then, the tensile strength was predicted for real datasets to evaluate the performance of the models. The results demonstrate that XGB model was the most successful in predicting tensile strength among the studied models with an R 2 value of 0.98 and 0.94 for train and test values, respectively. A close R 2 value for the train and test also indicated that there was no overfitting of the data to the model. Finally, SHAP analysis shows significance of each feature on prediction of tensile strength. This study can be extended for independent variables including nozzle pressure, strut size and molecular weight of PLA and dependent variables such as elongation and elastic modulus of PLA which may be a powerful tool to predict the mechanical properties of scaffolds for tissue engineering applications.


Introduction
3D printing is a technique in tissue engineering for the production of 3D structures.This technique is divided into extrusion, material jetting, and vat polymerization.Extrusion-based printing is the most commonly studied technique in the tissue engineering field [1][2][3].
Machine learning is a rapidly rising field and has high potential in 3D printing in terms of developing materials and processes.In fact, several studies investigated models to predict various properties of the printed object from various printing parameters [4].Ruberu et al [5] optimized 3D printing parameters of methacrylated hyaluronic acid/gelatin methacrylate composite ink by machine learning.Conev et al [6] utilized RF regression to determine effect of different 3D printing parameters on 3D printed poly (propylene fumarate).Menon et al [7] used a hierarchical machine learning framework to produce biomaterials with high print fidelity at a high printing speed.
With machine learning technique, mechanical properties may be predicted effectively before many trial and errors [2].In this study, for the first time, ordinary least squares (OLS) linear regression, random forest (RF), extreme gradient boosting (XGB), light boosting (LGBM) and artificial neural network models were used to predict tensile strength from nozzle temperature, printing speed, infill density and layer height.The feature importance of the independent variables and correlation heatmap were determined to study the strength of the relationship between independent variables and tensile strength.Overall, XGB was found to be the most effective model to predict tensile strength among the studied algorithms.

Methodology
2.1.Data collection Datasets of nozzle temperature, printing speed, layer height and tensile strength were obtained from four research papers [8][9][10][11].Overall, this dataset covered 68 possible combinations of printing speed (40-80 mm s −1 ), nozzle temperature (205 °C-220 °C), layer height (0.15-0.8 mm), infill density (2%-100%) and tensile strength of PLLA.Nozzle temperature is the temperature that the material inside the nozzle is held while printing.Layer height is the distance between the needle and the deposition layer.Printing speed is the distance deposited per second.Infill density is the percentage of the material being deposited [3].Figures 1(a) and (b) show the variation of the dependent variables and the independent variable for the experiments and distribution of tensile strength values, respectively.Table 1 shows the average and standard deviation of the used parameters and the tensile strength and table 7 is presented in the supplementary file which shows all the processing conditions and the resultant tensile strength.

Computational modeling
All machine learning algorithms were implemented using Python with Pandas, Numpy, Scipy, Matplotlib, Seaborn and Scikit-learn.Five different models were  used in the study which were OLS linear regression, RF, XGB, LGBM and artificial neural network regression [12].The Python codes used are provided at https://github.com/duyguege/machine-learning.git.

Linear regression
OLS is a linear regression technique to find the bestfitting line between the independent and dependent variables.In our model, there is a single dependent and four independent variables in which the OLS method minimizes the sum of squared differences between the actual data and their prediction values as given in equation (1) [13].
where Y i and y i are actual data and its prediction, respectively.

RF regressor
RF was first developed by Breiman et al in 2001 which is a decision tree based ensemble learning algorithm [14][15][16][17].RF is a bagging method (bootstrap aggregation) which builds independent estimators (trees) and the mean is found to make a prediction [18].In each node of the tree, one feature is selected at each node and splitting is made to produce a sub-node which is called a child node.The splits are done by minimizing variance by searching through splits.When splitting cannot be performed any longer, the node is named a leaf node.In RF, instead, a set of decision trees are constructed.Then, the final prediction is made by the average of each prediction which improves accuracy compared to decision tree regression [13,14,19].

Training, hyper tuning and validation processes
In this study, the dataset was split into training (80%) and test partitions (20%).The training dataset was used for model building and the test dataset was used for its evaluation.For application of the models, Scikit-learn library from Python was used.For decision tree and gradient boosting regression models (RF, XGB and LGBM), hyperparameter tuning was performed for subsample ratio of columns, number of estimators, maximum depth and learning rate (shrinkage factor) after determination of the best parameters which is shown in table 3.For artificial neural networks alpha and hidden layer sizes were selected from values of 0.1, 0.01, 0.02, 0.005 and (20,20), (100,50,150) and (300,200,150).The best parameters were found as 0.005 and (100,50,150) respectively for alpha and hidden layer sizes.The best parameter for both training and test sets was selected to prevent underfitting and overfitting [25].10-fold cross-validation was used for performance assessment which randomly splits the dataset into ten equal folds [31].
Then, the model was again applied in the final stage to determine the predicted values of the tensile strength [32].

Correlation heatmap
A correlation heatmap was produced by Seaborn module of Phyton to determine the strength of relationship between the independent variables (nozzle temperature, printing speed, layer height and infill density) and the dependent variable (tensile strength) [25,33].A higher correlation coefficient (r) indicates multicollinearity between independent variables [18].

Feature importance
The features importance was determined using an embedded function in the Scikit-learn implementation of the RF model.Then the features were ranked according to their importance [6, 34].  3) and (4), respectively.

Determination of the model performance
where n is the total number of the samples, Y i is the actual data, y i is its prediction y i is the test value and whose mean values are denoted by Ȳ and ȳ, respectively.For R 2 , the range of values for degree of success are given in table 4. The Y i versus y i curves are given in figure 3 for each model.

Shapley additive explanation
Lundberg and Lee [38] recently developed Shaply Addive explanation (SHAP) technique in 2017 which enables analysis of complex relationships in machine learning models.In this study, SHAP which is a model interpretation package in Phyton is utilized to further understand

Results and discussion
In this study, the data was fitted to OLS linear regression, RF, LGBM, XGB and artificial neural  network models than the performance of the models were tested.Having homogeneous distribution of the independent variables is important to build a successful model.Figure 4 shows the variation of the independent variables and their frequencies.
According to figure 4, overall, the variables have quite a large variation.However, for infill densities, the variation is limited in the range lower than 60%.For printing speed, the data is larger for above 40 mm s −1 .We may assume that a more homogeneous distribution of the data may be better for more successful predictions.Figure 5 shows the summary of the study.Using Seaborn in Phyton, a correlation heatmap is produced which is used to evaluate the correlation between printing parameters and tensile strength.In figure 6, the correlation heatmap presents the correlation between the components in the matrix, in which the r value is the correlation coefficient.A larger correlation coefficient value indicates a stronger relation between these two components.
The correlation heatmap shows that printing speed had a negative correlation with tensile strength which indicated that as printing speed reduced, the tensile strength tended to increase.Among other parameters, infill density showed the highest correlation with the tensile strength following with the layer height and nozzle temperature.Additionally, correlation coefficient was lower than 0.8 between each independent variable which indicated that there was no collinearity between independent variables [18].
Feature importance was also determined from ensemble models.The ranking of the independent variables based on the feature importance function of Scikit-learn is shown in figure 7.As shown in figures 6 and 7, a slower printing speed increases resolution and leads to better accuracy of the constructed scaffolds.A faster printing rate may lead to formation of artifacts which ultimately reduces quality of the printed constructs [39].As layer height increased, a higher tensile strength was also observed [40].A direct relationship between infill density and increase of tensile strength is expected [41].Nozzle temperature also affects the tensile strength due to the relation of PLA temperature and its viscosity [42].After running the models, R 2 train, R 2 test, MAE, and RMSE values were found for the OLS linear regression, RF, XGB and LGBM models as presented in table 5.For training, R 2 value was highest for XGB model followed by RF, OLS and LGBM models.For testing, R 2 was highest for XGB followed by RF, LGBM and OLS linear regression models.According to table 5 and figure 8 (a), RF had R 2 of higher than 0.6 for both training and test data; however, R 2 for train was much higher than test which indicated overfitting.For linear regression and LGBM, R 2 values were lower than 0.6 for the training dataset which implied that the models were unsatisfactory [18].Additionally, for OLS and LGBM regression model, R 2 for test was higher than training, which showed that test data fitted better to the model than the training data.
For XGB, R 2 for training and test dataset were above 0.94 and they were almost the same.This  indicated that the XGB model worked very successfully for the dataset.XGB model also led to low RMSE and MAE values which further supported that the performance of XGB was highest for prediction of tensile strength.LGBM had higher RMSE and MAE values for test than train data which might further indicate that it led to overfitting.Literature also indicates that LGBM had a worse prediction than XGB as it is more prone to overfitting [23].Furthermore, artificial neural networks had poor prediction of tensile strength which is indicated from the low R 2 .Neural networks also led to overfitting as R 2 of train was much higher than R 2 of the test.This is because of relatively poor performance of artificial neural network models with small datasets.
[31]. Figure 3 compares the predicted values and the experimental results for OLS, random forest, LGBM and XGB models.Artificial neural networks is eliminated at this stage due to its low R 2 .Figure 3 shows that there are outliers in all models.For all models, most of the data points distribute around the diagonal, while some values deviate from the diagonal line.This deviation is most likely attributed to the limited datasets which can be improved with a larger variation of dataset.Also, data is gathered from four different papers.Although 3D printing technique is kept as exrusion-based and the same polymer is used for all studies, gathering data from a single study may lead to improved predictions.This way, molecular weight, nozzle diameter and the model of the extrusion based printer may be also controlled.However, currently, in the literature, it is not possible to find large enough dataset from a single source.Figure 9 shows the average test value and average respective predictions of tensile strength with the practiced models for low,  medium and high value of tensile strength.Table 6 shows the index number of the independent and dependent variables chosen from each (lower, medium and higher) range from table 7. Figure 9(a) shows that the average predicted values of the tensile strength had quite similar values with the tensile strength.Only significant difference was for predicted tensile strength from OLS regression from the lower range test values.
From figure 9(b), it is clearly observed that XGB led to lower % difference compared with the test values of tensile strength compared to other models.For models, the prediction is worse for lower and higher values of tensile strength.This could be due to a lower number of data points for lower and higher tensile strength values compared to middle-range tensile strength values.Moreover, the dataset might not be suitable for linear regression.The improved prediction for XGB could be accounted for the loss function it employs for minimization and the second-order Taylor approximation normalizing the weak learners in ensemble modeling [23].Table 6 shows the % difference between the test values and the model predictions.Figure 10 shows distribution of experimental and predicted values after executing OLS, RF, XGB and LGBM models depending on the index number in table 7.
To increase the accuracy of the models, further experimentation may be required focusing on collecting data with larger coverage of printing speed, layer height and nozzle temperature on examining a larger variety of material compositions.Such additional data would improve machine learning approaches.It is also important to note that our dataset was obtained from four different publications which might increase the variability than using a single source of dataset.If the dataset is generated from a single study, a higher accuracy may be achieved using the applied models.Figures 11 and 12 show the SHAP summary plot and dependence plot, respectively.In figure 11, SHAP summary plot shows impact of each feature value on prediction of tensile strength.The high values of the features are given as red and low values of the feature are given as blue.
According to shap summary plot, low values of infill density have negative contribution on the prediction.The high values of infill density have positive influence on the prediction.This is possibly because there is more data for high infill density values in the dataset.Layer height is observed to have negative influence for most of the values.For printing speed, high values have negative influence and high values     to figure 14, the highest tensile strength was achieved for 0.6 mm and 0.8 mm layer height while printing speed, nozzle temperature and infill density were constant.To achieve the highest tensile strength, it is predicted that 3D printing of PLA with 60% infill density at 220 °C with a printing speed of 20 mm sec −1 at a layer height of 0.6 mm may be suitable for bone regeneration applications.In the future, adaptive Boosting (Adaboost) may also be applied as a promising approach for a prediction with higher accuracy.Moreover, deep learning may be also applied for larger datasets to make predictions of mechanical properties of biomaterials for different 3D printing parameters.This study demonstrates the potential benefit of machine learning to form the core of a guidance system for determining suitable printing parameters, thus reducing experimentation [6].

Conclusion
In this study, datasets were collected from four different publications on effect of nozzle temperature, layer height, printing speed and infill density on tensile strength.Then, OLS linear regression, RF, LGBM, XGB and artificial neural network models were applied to test their accuracy to predict tensile strength.R 2 , RMSE and MAE values for all models indicated that XGB regression yielded a more  successful prediction of tensile strength values.Furthermore, SHAP analysis indicated that high infill densities and low printing speed had a positive influence on the prediction of tensile strength.A large dataset may improve the accuracy of the predictions.Also, a wider range of independent variables could be used to reveal the importance of each printing parameter such as infill pattern, gauge length, bed temperature, molecular weight of PLA.Finally, in addition to tensile strength, other important parameters qualifying the mechanical properties may be also included such as elongation and Youngs modulus.Overall, machine learning is a promising tool to predict tensile strength values and reduce the number of tests required for optimization of 3D printing parameters for tissue engineering applications.

Figure 1 .
Figure 1.Shows (a) the variation of the dependent variables and the independent variable for the experiments and (b) distribution of tensile strength values.

Figure 2 .
Figure 2. Difference of tree growth of (a) XGB and (b) LGBM boosting regression models.

Figure 4 .
Figure 4. Variation of the independent variables (a) temperature, (b) infill density, (c) layer height and (d) print speed.
Figure 8 shows (a) the comparison of R 2 train and test values and (a) MAE, RMSE for the test values.Since R 2 is more convenient for comparing suitability of the models, more focus was given to this parameter [23].

Figure 8 .
Figure 8.Comparison of (a) R 2 of train and test values and (b) MAE, RMSE for test values of OLS, RF, XGB, LGBM and neural networks models.

Figure 9 .
Figure 9. Test value and respective predictions of tensile strength with the practiced models and (b) % difference with tensile strength (p < 0.05 in comparison to the control group).
have positive influence on prediction.Nozzle temperature has mostly positive influence on the prediction at some of the high values however there is no obvious trend observed for this feature.In figure13, tensile strength was predicted by using XGB model for different values of nozzle temperature for constant printing speed, layer height and infill density.According to figure 13, XGB model predicted that the tensile strength would increase if the processing temperature of PLA decreased to 220°C from 225 °C.However, further reduction of processing temperature of PLA would lead to reduction of tensile strength.This information would be beneficial for the determination of printing parameters of PLA.For bone tissue engineering applications, 60% in fill density is suitable for osteogenic differentiation[3].Therefore while keeping infill density as 60%, nozzle temperature as 220 °C and printing speed as 20 mm s −1 , tensile strength was predicted for different layer heights by XGB model as in figure14.According

Table 1 .
Average values, standard deviation, standard error of temperature, infill density, layer height, printing speed and tensile strength.

Table 3 .
Hyperparameters and their best parameters for tuning by using RF, LGBM and XGB models (best parameters are given in bold).
of each feature value minus prediction made without a feature value.A negative and positive SHAP value indicates negative and positive contribution of the feature value for prediction of tensile strength, respectively.SHAP summary plot is drawn to show contribution of each value on tensile strength.Then SHAP dependence plot is drawn to show effect of each feature value (infill density, layer height, printing speed and nozzle temperature), separately.Primary y-axis shows the SHAP value and secondary y-axis shows a color bar illustrating the high feature values in red and low feature values in blue.p < 0.05 (in comparison to the test values).

Table 5 .
MAE test , R 2 and RMSE test data for different models.

Table 6 .
% Difference with test value and the models' predicted tensile strength values for the lower, medium and higher range.