Microstructure property classification of nickel-based superalloys using deep learning

Nickel-based superalloys have a wide range of applications in high-temperature, high-stress domains due to their unique mechanical properties. Under mechanical loading at high temperatures, rafting occurs, which reduces the service life of these materials. Rafting is heavily affected by the loading conditions associated with plastic strain; therefore, understanding plastic strain evolution can help predict these materials' service life. This research classifies nickel-based superalloys with respect to creep strain using deep learning techniques, which eliminate the need for manual feature extraction from complex microstructures. Phase-field simulation data that displayed results similar to experiments were used to build models from pre-trained neural networks with several convolutional neural network architectures and hyper-parameters. The optimized hyper-parameters were transferred to scanning electron microscopy images of nickel-based superalloys to build a new model. This fine-tuning process helped mitigate the effect of a small experimental dataset. The built models achieved a classification accuracy of 97.74% on phase-field data and 100% accuracy on experimental data after fine-tuning.


Introduction
Nickel-based superalloys are widely used for high-temperature, high-pressure, and high-stress applications due to their excellent mechanical properties, which include good fatigue and creep resistance even at severe temperatures, and high mechanical strength [1,2]. These excellent mechanical properties emanate from their two-phase microstructure of a disordered γ matrix phase and a γ' precipitate phase with the L1₂ crystal structure. The ordered γ' phase, with a volume fraction of about 60%-70%, acts as a strengthening phase, suppressing dislocation movement and atomic diffusion during creep deformation [3][4][5].
Owing to this microstructure, their use in the aerospace industry and in power-generation plants is becoming increasingly widespread, accounting for about 50% of aircraft turbine engines by weight [6]. Further development and improvement have led to higher-efficiency turbine engines, with a significant impact on environmental emissions. Their application niche subjects them to continuous exposure to mechanical loading at high temperatures. This induces directional coarsening called rafting, creep deformation, and microstructural evolution, leading to a reduction in mechanical properties and thus a reduced service life of the material [7,8]. In addition, rafting increases the plastic strain in the γ matrix channels and at the γ/γ' interphases as the channels widen. As a result, it is crucial to analyze the microstructural evolution up to creep failure using creep tests of increasing time and strain. Image analysis techniques in materials science, enhanced by advances in computers and electronic devices, may aid in understanding structure-property relationships in materials, and thus microstructural evolution. Mansur et al successfully used machine learning methods in conjunction with image processing to investigate the microstructural evolution of nickel-based superalloys [9]. However, due to the extreme complexity of microstructural features, it is often difficult to understand the various parameters that play specific roles in the evolution preceding creep failure. Deep learning has been shown to learn the features for microstructural recognition from unprocessed input images alongside the classification task, thus eliminating the need for manual feature engineering.
This research aims to study and understand the alloy microstructure with respect to specific parameters such as strain using deep learning techniques, as several investigations have revealed that rafting occurs due to loading conditions associated with plastic strain and high-temperature fatigue loading [10]. This research has the following structure. First, a brief description of computer vision and deep learning applications in materials science is given, followed by an examination of the most commonly used deep learning techniques and architectures for microstructural image classification. Finally, deep learning techniques are applied to our current task, and the results are analyzed.

Computer vision and deep learning
The role of microstructural images in the growth of materials science cannot be overemphasized, as they play a critical role in the understanding of materials and groups of microstructures through feature analysis. Proper interpretation of digital microstructural images can help to identify features to support segmentation, characterization, and high-precision comparison of microstructures [11]. The field of computer vision, which is rapidly evolving and gaining attention due to its accomplishments in interpreting and extracting digital image information, can be extended to the interpretation of microstructural images. Computer vision can be defined as the development of algorithms and models that comprehend and recognize information in images by modeling the complexity of the human visual system [12].
Computer vision's growth has greatly been enhanced by deep learning, a branch of machine learning inspired by the human brain's structure and function, consisting of neurons that perform basic operations and coordinate with one another before a decision is made [12]. Deep learning's success in solving complex problems stems from its ability to learn complicated structures in complex data with little or no human intervention [13]. Traditional machine learning techniques required careful human engineering with domain expertise and knowledge to extract meaningful features from raw data, whereas deep learning approaches automatically learn these features from the raw data [14].
In recent years, the convolutional neural network (CNN) has been the most suitable algorithm for solving computer vision tasks due to its accuracy in handling image tasks. The CNN LeNet-5 was first illustrated by LeCun (figure 1) in 1998 for digit recognition [15]. Still, it was not until 2012 that CNNs gained prominence, when AlexNet emerged as the winner of the ImageNet ILSVRC competition with a 15.4% error rate, besting the runner-up by 10.8 percentage points [16]. AlexNet was built following the same pattern as the LeNet-5 architecture. This renewed interest led to numerous research efforts in the field, and an improved version, ZFNet, with the same structure as AlexNet, became the ILSVRC winner in 2013 [17]. The Inception and VGG-16 architectures further demonstrated that CNN accuracy improved with increasing depth, with Inception winning the ILSVRC 2014 competition and VGG-16 finishing first runner-up [18,19]. Further improvements in CNN accuracy came with the introduction of skip connections, which made deeper networks possible with lower complexity. ResNet, built with skip connections, had 152 layers and emerged victorious in the 2015 ILSVRC challenge [20].
The growth of deep learning algorithms has also been extended to materials science, with Azimi et al showing the classification of steels into martensite, bainite, and pearlite [21], Chowdury et al using deep learning to recognize dendritic morphologies [22], and Iglesias et al using deep learning for optical micrographs [23]. Feng et al also used deep learning to predict material defects with limited datasets [24]. Deep learning networks were utilized in several works by Lin et al to forecast the hot deformation behavior of alloys and analyze the hot deformation behavior of Al-Zn-Mg-Cu and Ni-based superalloys [25,26]. Chen et al used neural networks in conjunction with cellular automaton simulation to develop a material design framework that successfully optimizes processing parameters for target microstructures, resulting in fine and uniform Ni-based superalloys [27], while Lin et al investigated hot compression behaviors and microstructures using artificial neural networks [28].
This study aims to apply deep learning methods to the microstructural classification of creep strain, overcoming the challenge of limited data by first training deep layers on phase-field simulation data, which has been shown to exhibit results similar to experiments, and then transferring the knowledge gained from the simulated data to solve our classification task on experimental data.

Convolutional neural networks
The most well-known deep learning algorithm is the convolutional neural network, a type of artificial neural network that has demonstrated exceptional success in pattern recognition and image processing [29]. CNNs capture the spatial information between an image's pixels by applying relevant filters. The components of a CNN are outlined in the next section.

Features of convolutional network
Convolution layer: the convolution operation can be described as the combined integration of two functions, illustrating how one function modifies the other. The convolution layer is a type of linear operation that performs distinct feature extraction of the input image by applying filters to generate a feature map, as shown in figure 2.
The filters, also known as kernels, detect different patterns in the images and preserve the spatial relationship between pixels. The complexity of the extracted features increases with increasing depth, with the highest-level features extracted in the final layers. Equation (1) describes the convolution ( * ) of the input data with the filters:

$(h_k)_{ij} = (W_k * x)_{ij} + b_k$, (1)

where $(h_k)_{ij}$ specifies the neuron's output in the kth feature map at position (i, j), k = 1, . . ., K indexes the feature maps of the convolution layer, x denotes the input data, $W_k$ the weights, and $b_k$ the bias of the kth feature map [21]. Pooling layer: the pooling layers perform a non-linear downsampling operation, reducing the number of feature-map parameters and making the representation robust against distortions and noise. This reduces overfitting and saves computational cost. The two types of pooling, max pooling and average pooling, are shown in figure 3. Max pooling is commonly used, as it best preserves the spatial invariance in the image by returning the maximum value from a patch of input features [30]. The hyper-parameters of the pooling layer are the filter size and the stride, with a 2 × 2 filter and a stride of 2 being the usual recommendation.
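To make the two operations concrete, the sketch below (pure Python, with a made-up 4 × 4 "image" and a single hand-picked 2 × 2 filter, not any filter learned by the networks in this work) applies a valid convolution in the sense of equation (1) and then 2 × 2 max pooling:

```python
def conv2d_valid(x, w, b=0.0):
    """Valid 2D convolution (cross-correlation, as in CNN libraries):
    slide the filter w over the input x and return the feature map."""
    fh, fw = len(w), len(w[0])
    out_h = len(x) - fh + 1
    out_w = len(x[0]) - fw + 1
    return [[sum(x[i + a][j + c] * w[a][c]
                 for a in range(fh) for c in range(fw)) + b
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool2x2(h):
    """2x2 max pooling with stride 2: keep the largest value per patch."""
    return [[max(h[i][j], h[i][j + 1], h[i + 1][j], h[i + 1][j + 1])
             for j in range(0, len(h[0]) - 1, 2)]
            for i in range(0, len(h) - 1, 2)]

image = [[1, 2, 0, 1],
         [3, 1, 1, 0],
         [0, 2, 4, 1],
         [1, 0, 1, 2]]
edge_filter = [[1, -1],
               [1, -1]]                          # responds to vertical intensity changes
feature_map = conv2d_valid(image, edge_filter)   # 3x3 feature map
pooled = max_pool2x2(feature_map)                # downsampled to 1x1 here
```

Real convolution layers apply many such filters per layer, operate on multi-channel batches, and learn the filter weights during training.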
Fully connected layers: the output of the pooling layer is flattened and passed to the fully connected layer, which resembles a generic neural network. This layer combines features for classification with the linear equation indicated in equation (2), whose weights are adjusted during training:

$y_k = \sum_l W_{kl} x_l + b_k$, (2)

where $y_k$ is the kth output neuron and $W_{kl}$ indicates the weight connecting $x_l$ and $y_k$ [21]. Activation functions: CNNs employ activation functions in the network to introduce the desired non-linearity, detect non-linear features, and learn more complex models. The ReLU activation function ReLU(x) = max(0, x) was used, as it is easy to train and speeds up training.
Classification layer and loss function: the final layer outputs the class of the input image. A softmax function, denoted by equation (3), is used to force the output to represent a probability distribution across the discrete alternatives:

$P(y = j \mid X) = e^{y_j} / \sum_{k=1}^{K} e^{y_k}$, (3)

where y is the input vector to the layer; through this function the categorical probability for the jth class given the input X takes a value within (0, 1) [21]. The loss function is used to optimize the parameters to maximize the architecture's effectiveness. The cross-entropy function H shown in equation (4) is used, as taking the logarithm lets backpropagation register even slight improvements:

$H(P, Q) = -\sum_x P(x) \log Q(x)$, (4)

where P(x) is the correct distribution of the data, equal to 1 for the true class and 0 for the others, and Q(x) denotes the predicted class probabilities [21].
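A minimal sketch of the softmax and cross-entropy computations of equations (3) and (4), in pure Python with illustrative class scores:

```python
import math

def softmax(scores):
    """Equation (3): turn raw class scores into probabilities in (0, 1).
    Subtracting the max first is the usual numerical-stability trick."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p_true, q_pred):
    """Equation (4): H(P, Q) = -sum_x P(x) log Q(x). With a one-hot P this
    reduces to -log of the probability assigned to the true class."""
    return -sum(p * math.log(q) for p, q in zip(p_true, q_pred) if p > 0)

probs = softmax([2.0, 1.0, 0.1])         # probabilities summing to 1
loss = cross_entropy([1, 0, 0], probs)   # -log P(true class)
```

Note how a small improvement in the true-class probability produces a measurable drop in the loss, which is what drives the weight updates during backpropagation.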

Training a CNN network
Training a CNN is a global optimization problem involving determining the best-fitting parameters by minimizing the loss function. The input data, an image, is propagated forward through the model, initialized with an arbitrary set of parameters, to produce an output. The loss on the output is then determined using the loss function, which evaluates the discrepancy between the predicted label and the true label of the input for a given set of parameter values. The loss function is reduced using stochastic gradient descent, an iterative optimization algorithm that calculates the gradient of the loss function by backpropagation and updates the model's weights. Training can be improved by repeatedly passing the data through the model; each full pass through the network is known as an epoch. The learning rate determines how strongly the weights are modified.
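The loop described above can be sketched end-to-end on a toy problem. The example below trains a single-neuron logistic classifier with stochastic gradient descent; the data, learning rate, and epoch count are illustrative, not values from this study:

```python
import math, random

def train_logreg(data, epochs=200, lr=0.5):
    """Minimal training loop: forward pass -> loss gradient -> weight
    update, repeated once per sample, with one full pass = one epoch.
    `data` is a list of ((x1, x2), label) pairs with labels 0 or 1."""
    random.seed(0)
    w = [random.uniform(-0.1, 0.1) for _ in range(2)]  # arbitrary initial weights
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            z = w[0] * x[0] + w[1] * x[1] + b
            p = 1.0 / (1.0 + math.exp(-z))   # forward pass (sigmoid output)
            err = p - y                      # gradient of cross-entropy wrt z
            w[0] -= lr * err * x[0]          # backpropagated updates,
            w[1] -= lr * err * x[1]          # scaled by the learning rate
            b -= lr * err
    return w, b

# Linearly separable toy data: class 1 when x1 + x2 > 1
data = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0),
        ((1, 1), 1), ((2, 1), 1), ((1, 2), 1)]
w, b = train_logreg(data)
```

A CNN replaces the single neuron with millions of convolutional weights, but the forward-loss-backward-update cycle is identical.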

CNN architectures
Architecture involves how the layers in the neural network are arranged. The architectures employed with the available computational resources are the AlexNet and ResNet architectures.

ResNet
The ResNet architecture, proposed by He et al, introduced residual learning and helped solve the vanishing-gradient problem that arises as networks become deeper, resulting in accuracy loss. Skip connections provide identity mappings, described in detail in [20], which allow deeper networks to be stacked without adding parameters or computational complexity, hence avoiding the degradation in accuracy.
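The skip connection can be sketched in a few lines: the block computes a residual F(x) and adds the input back, so driving the residual to zero recovers the identity mapping (the vectors and transforms below are illustrative, not actual ResNet layers):

```python
def residual_block(x, transform):
    """Sketch of a skip connection: the block learns a residual F(x) and
    the identity x is added back, so output = F(x) + x. If the weights of
    `transform` are driven to zero, the block reduces to the identity map,
    which is what makes very deep stacks trainable."""
    return [f + xi for f, xi in zip(transform(x), x)]

# With a zero transform the block is exactly the identity:
identity_out = residual_block([1.0, 2.0, 3.0], lambda v: [0.0] * len(v))
# With a non-trivial transform the residual is added on top of the input:
shifted_out = residual_block([1.0, 2.0, 3.0], lambda v: [0.1 * t for t in v])
```

Because the identity path carries the gradient directly to earlier layers, stacking many such blocks (34 or 152 layers) does not starve the early layers of gradient signal.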

Methods
In this section, we illustrate the model architectures and approaches taken to classify nickel-based superalloys with respect to their creep strain levels. First, we acquire the datasets of simulated and experimental images, then perform dataset pre-processing and augmentation. Next, we apply deep learning models to the simulated dataset with various parameter values, obtaining an optimized model whose weights and parameters are transferred to the experimental dataset by fine-tuning. Finally, a model is built using the fine-tuned values with the experimental dataset.

Dataset
The dataset investigated comprises phase-field simulated images and experimental SEM images of nickel-based superalloys. The phase-field dataset was acquired from the STKS department [31][32][33][34] through a simulation with an applied stress of 350 MPa and a temperature of 950 °C, shown in figure 6. The simulation attained creep strains similar to those of experiments. The phase-field simulation microstructure covers creep strain levels from 0.0%-1.0% at different interval steps. A total of 128 slices were cut from the microstructure at each step in the x, y, and z-directions, producing 10 752 images. The images were then cropped to 390 × 390 pixels to eliminate regions without relevant information (figure 7). The images were sorted according to the classes of creep strain levels 0.0%-0.4%, 0.4%-0.6%, 0.6%-0.8%, and 0.8%-1.0%. The experimental images were provided by researchers from the Lehrstuhl für Werkstoffwissenschaft, Institute for Materials, Ruhr-University Bochum [35,36]. The images were obtained from dendritic regions in the ERBO/1 single-crystal nickel-based superalloy at varying strain levels, as shown in figure 8. A total of 15 images, ranging from prior to creep up to 0.4%, 0.6%, 1.0%, and 2.0% strain, were obtained by conducting interrupted tensile creep tests in the [001] direction at 950 °C and 350 MPa. All images were generated using SEM at a magnification of ×10 000 and transformed to grayscale images with eight-bit depth and 1024 × 768 pixels. The experimental images were sorted into strain levels of 0.0%-0.4%, 0.4%-0.6%, 0.6%-1.0%, and greater than 1.0%.
The SEM images used for the further training of the model showed creep strain results similar to the simulated data under the same temperature and stress conditions.

Data preprocessing and augmentation
The images of both datasets were split randomly into training and testing data at an eight-to-two ratio. The training data were used to select and build the model parameters, while the test data were used after training to evaluate the model's performance, simulating the deployment of our model on unseen real-world cases. The random initialization helps the generalization of the model, so that it does not learn from only specific instances.
Figure 5. Phase-field simulated images: step 12, slice 12, at 0.7% strain in the x, y, and z-directions before and after cropping [51].
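The random eight-to-two split can be sketched as follows (the file names and the fixed seed are illustrative only, chosen so the sketch is reproducible):

```python
import random

def train_test_split(items, train_fraction=0.8, seed=42):
    """Randomly split a dataset into training and testing subsets at the
    eight-to-two ratio used in this work. Shuffling before the cut avoids
    the model seeing only specific instances."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

images = [f"slice_{i:04d}.png" for i in range(10752)]  # one name per simulated slice
train, test = train_test_split(images)
```

The test subset is held out entirely until training is complete, so the reported accuracies reflect performance on data the model has never seen.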
Data augmentation was used to compensate for the limited availability of the experimental data, as we were unable to obtain additional images online: the majority of nickel-based superalloy images found online are unlabeled, i.e. without the proper strain levels. Data augmentation applies different transformations to improve the size and quality of datasets, helping to solve the problem of limited data [37]. It was applied here to expand the variety of data available to train the models, transforming the images by flipping horizontally and vertically, zooming, rotating by various angles, and varying the intensity to brighten and darken the images, as shown in figure 9. Data augmentation boosts model regularization by expanding the training image dataset, which improves the model's generalization and reduces overfitting. The images were resized to the specific sizes required by the model architectures.
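A minimal sketch of such augmentations on a tiny grayscale "image" stored as a nested list (the transformations mirror those listed above; the pixel values and intensity offset are illustrative):

```python
import random

def hflip(img):    return [row[::-1] for row in img]              # horizontal flip
def vflip(img):    return img[::-1]                               # vertical flip
def rotate90(img): return [list(row) for row in zip(*img[::-1])]  # 90-degree rotation

def brightness(img, delta):
    """Brighten (delta > 0) or darken (delta < 0) an 8-bit grayscale
    image, clamping to the valid 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

def augment(img, rng=random):
    """Apply one randomly chosen transformation, as done to enlarge the
    experimental dataset."""
    op = rng.choice([hflip, vflip, rotate90,
                     lambda m: brightness(m, 30), lambda m: brightness(m, -30)])
    return op(img)

tile = [[0, 50], [100, 255]]
```

In practice these operations are drawn from a framework's transform pipeline and applied on the fly each epoch, so the model rarely sees an identical image twice.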

Implementation details for simulated images
The classification of simulated images was done with several convolutional neural networks of varying architectures, limited by the available computational capacity (ResNet18, ResNet34, and AlexNet). Transfer learning, which uses knowledge gained from previously trained models to solve a new task, was used to build the model. Our current task uses pre-trained ConvNets from the PyTorch libraries that have been trained on millions of images from the open-source ILSVRC dataset [45]. Due to the dissimilarities between the current dataset and the dataset used for pre-training, the layers in the pre-trained model were unfrozen, making it possible for them to be trained and updated. A differential learning rate was employed for training, as the final layers are more likely to require additional training. The batch size was alternated between 16, 32, and 64 due to the GPU RAM constraint, and the model predictions were evaluated using the 'error rate' metric, which is introduced in a later section. The weights were saved and exported as a Python pickle file after training, to be employed on the experimental images.
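The idea of a differential learning rate can be sketched without any framework: each layer group receives its own step size in the SGD update, with the pre-trained early layers changed least. The grouping into three blocks mirrors the common fastai convention, but the weights, gradients, and rate values below are illustrative, not the ones used in training:

```python
def sgd_step_differential(param_groups, grads, lrs):
    """One SGD update with differential learning rates: earlier
    (pre-trained, generic) layer groups get small rates, later
    task-specific layers get larger ones. `param_groups`, `grads`,
    and `lrs` are parallel lists with one entry per group."""
    return [[w - lr * g for w, g in zip(ws, gs)]
            for ws, gs, lr in zip(param_groups, grads, lrs)]

groups = [[1.0, 1.0],   # early layers (generic features, barely touched)
          [1.0, 1.0],   # middle layers
          [1.0, 1.0]]   # final layers (task-specific, trained hardest)
grads = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
updated = sgd_step_differential(groups, grads, lrs=[1e-5, 1e-4, 1e-3])
```

The early layers therefore retain most of the ILSVRC-learned filters, while the final layers adapt quickly to the microstructure classes.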

Implementation details for experimental images
The classification model for experimental images was fine-tuned using the previously saved weights of the simulation model. This helps to mitigate the lack of experimental images, as some similar features have already been learned from the simulation images and can be applied to the experimental data. It provides better initialization of the parameters, which aids model generalization, reduces overfitting, decreases training time, and improves accuracy. This model was created using the simulation model's optimized architectures and hyper-parameters.
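The weight transfer underlying this fine-tuning can be sketched as follows; the layer names, shapes, and values are illustrative stand-ins for the actual saved state of the simulation model, not its real ResNet34 state dictionary:

```python
def fine_tune_init(saved_weights, num_new_classes, seed_value=0.01):
    """Initialize an experimental-image model from the weights saved
    after training on the simulation data: every layer is copied except
    the final classification layer, which is re-created for the new set
    of strain classes."""
    new_weights = dict(saved_weights)               # reuse learned features
    new_weights["classifier"] = [seed_value] * num_new_classes  # fresh head
    return new_weights

simulation_model = {"conv1": [0.2, -0.1], "conv2": [0.05, 0.3],
                    "classifier": [0.7, -0.4, 0.1, 0.2]}
experimental_model = fine_tune_init(simulation_model, num_new_classes=4)
```

Training then resumes from this initialization, so only the residual differences between simulated and experimental microstructures need to be learned from the 15 SEM images.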

Performance evaluation
After implementing and training our model, it is necessary to evaluate its performance to determine how well the model learned from the data. Metrics are employed in the evaluation of the model's predictions and can be used to compare different models. The error rate, which estimates the number of misclassified instances with respect to the total number of instances, is the key metric used. The major classification metrics can be computed from the confusion matrix, which is introduced below.

Confusion matrix
A confusion matrix summarizes the model's performance with respect to the test data for which label values were known. Table 1 shows a confusion matrix.
Here TPs (true positives) are values that were correctly predicted to be positive, TNs (true negatives) are values that were correctly predicted to be negative, FPs (false positives) are values wrongly predicted as positive, and FNs (false negatives) are values wrongly predicted as negative. The diagonal of the confusion matrix represents the correctly predicted cases. The accuracy, precision, recall (sensitivity), specificity, F1, and error rate metrics can all be calculated from the confusion matrix.
Accuracy indicates the number of samples correctly classified in comparison to the total number of samples:

Accuracy = (TP + TN) / (TP + TN + FP + FN),

with precision = TP / (TP + FP) and recall = TP / (TP + FN). F1 is the harmonic mean of precision and recall:

F1 = 2 × (precision × recall) / (precision + recall).

The error rate ER is indicated by

ER = (FP + FN) / (TP + TN + FP + FN) = 1 − accuracy.

The metrics error rate and precision are the essential measures employed, as they reflect the ratio of misclassified classes and produce values that can easily be interpreted and used.
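These metrics can be computed directly from the confusion-matrix counts of a binary (one-vs-rest) problem; the counts below are illustrative:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics above from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    error_rate = (fp + fn) / total          # equivalently 1 - accuracy
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "error_rate": error_rate}

m = classification_metrics(tp=90, tn=85, fp=5, fn=10)
```

For the four-class strain problem, the same quantities are obtained per class by treating that class as positive and all others as negative, then averaging.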

Results and discussions
Firstly, we discuss the findings acquired when training the CNN with the simulation dataset and then present the results achieved by applying this method to the experimental dataset.

Evaluation of data augmentation
The effect of data augmentation on the images was analyzed by comparing the model's accuracies with and without data augmentation. For the phase-field images, it was noticed that the model performed slightly better without augmentation, with an accuracy of 97.74% against 97.63% with data augmentation, for a batch size of 16. The model could only be trained with a batch size of 16 without augmentation, as part of the included augmentation resizes the images in the batch before they are passed to the model.
The effect of augmentation on the experimental images was more profound as it helped reduce the training and validation loss. Similar accuracies were achieved in both cases.

Evaluation of network architecture
The architectures used in this report were AlexNet, ResNet18, and ResNet34. Due to the limitation of RAM, additional networks could not be used. Tables 2-4 show the different architectures used and their error rates after training for five epochs each.
From our observations and the results presented in tables 2-4, the ResNet34 network was found to be the most suitable model for the classification task, as it outperformed the other networks in each of the comparative cases.

Evaluation of hyper-parameters
The efficiency of the model is highly dependent on the model's parameters and hyper-parameters. Model parameters are model-dependent parameters whose values are learned during training. Hyper-parameters, on the other hand, cannot be learned directly from training the model and must be supplied before the training process begins. Hyper-parameters govern the entire structure of the model; thus, selecting the right hyper-parameters is critical for the model's accuracy. The hyper-parameter selection is often unique to the task. Optimum hyper-parameters are obtained by training different models with various hyper-parameter values and selecting the most accurate values by evaluating the different models. The hyper-parameters used in this model include the batch size, learning rate, and loss function.
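The search over hyper-parameter values can be sketched as a simple grid search; the candidate values and error rates below are made up for illustration and do not reproduce the numbers in tables 2-6:

```python
from itertools import product

def grid_search(train_and_eval, batch_sizes, learning_rates):
    """Train one model per hyper-parameter combination and keep the one
    with the lowest error rate. `train_and_eval` stands in for a full
    training run followed by evaluation on held-out data."""
    best = None
    for bs, lr in product(batch_sizes, learning_rates):
        error = train_and_eval(bs, lr)
        if best is None or error < best[0]:
            best = (error, {"batch_size": bs, "learning_rate": lr})
    return best

# Toy stand-in for real training runs (errors invented for illustration):
fake_results = {(16, 1e-3): 0.024, (32, 1e-3): 0.023, (64, 1e-3): 0.034,
                (16, 1e-2): 0.060, (32, 1e-2): 0.055, (64, 1e-2): 0.070}
error, params = grid_search(lambda bs, lr: fake_results[(bs, lr)],
                            [16, 32, 64], [1e-3, 1e-2])
```

Each evaluation here corresponds to a full training run in practice, which is why the search was limited to a small set of candidate values.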

Batch size.
The batch size specifies the number of samples that must pass through the network before the model variables are updated. Large batch sizes can harm the model's generalization, while small batch sizes require more update steps per epoch. Batching reduces the computational memory required to train our model and helps train the model faster. For better regularization, smaller batch sizes are recommended [46], as they help improve accuracy. Due to computational memory limits, batch sizes greater than 64 could not be implemented. Table 5 shows the results for the different batch sizes with the ResNet34 architecture, using cyclic learning rates [47]. The different batch sizes produced similar accuracies; the batch size of 16 without data augmentation and the batch size of 32 produced the highest accuracies. A plot of the training and validation loss used to evaluate the model's fit is shown in figure 10. With the exception of batch size 16 without data augmentation, the runtime for one epoch was found to decrease with increasing batch size.

Learning rate.
The learning-rate hyper-parameter governs how strongly the model responds to the estimated error whenever the model weights are adjusted [48]. Low learning rates result in more reliable training but slower optimization, while large learning rates can result in divergence, as the optimizer can overshoot the true minimum. The learning rate η enters the weight update

$w \leftarrow w - \eta \, \partial L / \partial w$,

where w are the weights and L is the loss function. The choice of learning rate is very important for training our model, and the FastAI 'learning rate finder' helps to give a good estimate of an optimal learning rate. Figure 11 shows the plot of loss versus learning rate, with appropriate learning-rate values observed prior to the curve's minimum. The learning-rate value at the minimum point itself is not chosen, as it is too large and the model would not be able to converge. A learning rate of 1 × 10⁻³ was employed in training the model with frozen layers, while a cyclic learning rate was implemented after unfreezing the weights to let the learning rate vary within a range of values, thus helping the model converge faster and improving accuracy [47]. The cyclic range obtained from the learning rate finder's suggestion was between 1 × 10⁻⁹ and 1 × 10⁻². Table 6 shows the comparison of fixed learning rates on frozen layers and cyclic learning rates after unfreezing. The addition of cyclic learning rates was found to reduce the percentage errors alongside the training and validation loss for all batch sizes.

Loss function.
The loss function is used to optimize the model by evaluating the error value, which describes how the model's output differs from the expected output [21]. The stochastic gradient descent method is used to minimize the loss function by back-propagating the error to the first layer and updating the model weights at each iteration. The gradients are calculated through backpropagation [49].
The cross-entropy loss function is defined by equation (4); label-smoothing cross-entropy instead uses the smoothed target distribution

$P^{LS}(x) = (1 − α) P(x) + α/K$,

where α is the label-smoothing parameter and K is the number of label classes. Both loss functions were applied in the model, with cross-entropy achieving greater accuracies and lower training and validation losses.
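A minimal sketch of the smoothed target distribution and the resulting loss (α = 0.1 is a common default used here for illustration, not necessarily the value applied in this study):

```python
import math

def smoothed_targets(true_class, num_classes, alpha=0.1):
    """Replace the one-hot target with the label-smoothed distribution:
    (1 - alpha) on the true class plus a uniform alpha / K everywhere."""
    return [(1.0 - alpha) * (1.0 if k == true_class else 0.0) + alpha / num_classes
            for k in range(num_classes)]

def smoothed_cross_entropy(targets, predicted_probs):
    """Cross-entropy of the predictions against the smoothed targets."""
    return -sum(p * math.log(q) for p, q in zip(targets, predicted_probs))

targets = smoothed_targets(true_class=0, num_classes=4)
loss_uniform = smoothed_cross_entropy(targets, [0.25, 0.25, 0.25, 0.25])
```

The smoothing keeps the model from driving its predicted probabilities all the way to 0 or 1, which acts as a mild regularizer; in this work, however, plain cross-entropy performed better.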

Computational runtime
Due to the computational intensity of deep learning models, evaluation of computational time and memory is essential when computational resources are limited. Figure 12 shows the computational time of one training epoch for the different architectures with respect to batch size and learning rate. The training time increases with the depth of the architecture, stemming from the fact that deep networks are harder to train. Increasing the batch size improves parallelization on GPUs, thereby making training faster for the ResNet networks. The effect of increasing batch size on the AlexNet network was found to vary. Cyclic learning rates were observed to increase the training time for one epoch, as they cycle through a range of values, but they help the model converge faster. Tables 7 and 8 show the confusion matrices of the most accurate models built using the phase-field datasets with and without data augmentation. The majority of misclassified instances were found to be in the neighboring classes.

Experimental dataset results
The optimized parameters obtained after hyper-parameter tuning were applied to the experimental data by fine-tuning. Data augmentation helped reduce the validation and training loss, while similar accuracies were achieved with and without augmentation. Overfitting is a fundamental problem in deep learning that prevents effective generalization from observed to unseen data. It was handled here by applying several techniques. Data augmentation increased the amount of available data to aid the model's generalization. Pre-trained networks also helped generalization, since the model was initially trained on a more extensive training set from a similar domain. Cyclic learning rates improved the model's convergence with fewer training iterations, reducing the chance of learning unwanted parameters and noise. Choosing an appropriate loss function helped the model's regularization. Tables 9 and 10 show the confusion matrices of the experimental images with and without fine-tuning. A 100% accuracy was achieved with fine-tuning, classifying all instances correctly, while a 98.86% accuracy was achieved without fine-tuning.

Testing on literature images
The model was tested on images obtained from the literature that were not used to train it. These images, all of dendritic regions of nickel-based superalloys, indicated only the rafting condition and not the percentage strain.
The predicted outcomes are shown in table 11. The source column in table 11 specifies the papers and figures from which the images were obtained. When a figure included more than one image, the images were cropped into separate images. The microstructure of CMSX-4 before and after rafting is depicted in table 11 images 1 and 2. Table 11 images 3 and 4 show cropped SEM images of DD5 before and after rafting. Table 11 images 5 and 6 show unrafted and rafted CMSX-4 superalloy, respectively, with image 5 representing the as-received microstructure and image 6 representing an N-type rafted structure; both images were cropped from a larger image. TEM images of SRR99 superalloy are shown in table 11 images 7-9.
Applying our model to image 1 of table 11, from source figure 1 [3], we obtained that this microstructure corresponds to the initial state of the CMSX-4 superalloy, with a probability of 62.89% for the 0.0%-0.4% class. Image 2, from source figure 2 [3], corresponds to the rafted microstructure of CMSX-4, showing a probability of 99.83% for the > 1.0% class. Image 3, cropped from source figure 1(a) [4], corresponds to the initial microstructure of the DD5 superalloy prior to deformation, with a probability of 100% for the 0.0%-0.4% class, and image 4, cropped from source figure 1(b) [4], with a probability of 99.98% for the > 1.0% class, corresponds to a rafted structure after creep deformation. Overall, the predictions in table 11 were found to be consistent with the state of the microstructure.
The results were predicted correctly for microstructural length scales within the tested range of 1 μm to 5 μm, and also in the presence of random noise in the images, such as embedded text.

Conclusion
This paper investigates the feasibility of using deep learning approaches to classify nickel-based superalloys. Pre-trained models were applied to phase-field simulation data to train various models with different hyper-parameter values. The optimized hyper-parameter values were then applied in a second training phase with SEM images of nickel-based superalloy. This fine-tuning procedure was carried out to help alleviate the shortage of experimental data when training the model.
The ResNet34 architecture was found to be the most appropriate for the strain classification task. The model achieved an accuracy of 97.74% without data augmentation and 97.63% with data augmentation on the phase-field simulation data. After fine-tuning, it achieved an accuracy of 100% on the experimental images, as opposed to 98.86% without fine-tuning.
The study shows that simulation datasets can be used to train models in a similar domain to produce accurate and efficient models by applying the fine-tuned parameters to the experimental dataset. The model also proved to be independent of image length scale, with scales ranging between 1 μm and 5 μm found to give accurate results.