Surface Defect Detection: A feature-based transfer learning approach

Surface defect detection is critical for maintaining product quality in manufacturing. In this work, we apply a feature-based transfer learning approach for surface defect classification on the NEU surface defect database. The database contains defects across 6 categories captured under various conditions. We utilised two pretrained convolutional neural network (CNN) architectures - VGG16 and InceptionV3 - by removing the final classification layer and using the CNN as a fixed feature extractor. The output feature vectors were classified using a logistic regression (LR) model. The data was split into train, validation, and test sets with a 70:15:15 ratio. The VGG16-LR model achieved classification accuracy (CA) of 100%, 98%, and 99% for the train, validation, and test sets respectively. The InceptionV3-LR model attained CA of 100%, 91%, and 92% for train, validation, and test. The results demonstrate the effectiveness of transfer learning with CNN feature extraction for surface defect detection on challenging multi-category industrial datasets. Further work includes tuning hyperparameters and evaluating additional architectures.


Introduction
Surface defects in manufactured products can lead to functional or aesthetic issues, making detection critical for quality control. Manual inspection is time-consuming, subjective, and inefficient for modern high-throughput manufacturing [1]-[3]. Automated computer vision techniques present a compelling solution. Recent work has shown success applying deep convolutional neural networks (CNNs) for defect classification. However, training CNNs requires large labelled datasets which are costly to obtain in practice.
Transfer learning provides a way to take CNNs pretrained on general image datasets and transfer the learned features to new tasks with limited data. This technique has been applied successfully in a range of applications [4]-[9]. With regard to defect detection, Tabl et al. [10] used a fine-tuned ResNet-50 CNN model to classify manufactured parts as either normal or defective. The reported accuracy ranged widely, from 48% to 96% depending on batch size and number of epochs, indicating inconsistent performance.
Mat Jizat et al. [5] compared an InceptionV3 CNN paired with logistic regression (LR), k-nearest neighbors (kNN), support vector machine (SVM), and stochastic gradient descent (SGD) classifiers for classifying wafer defects. The InceptionV3-LR pipeline performed best on both the training and test data among the methods investigated. Another study addressed common steel surface defects such as scratches, pitting, inclusions, and patches in order to separate defective from non-defective surfaces [11]. The authors used a CNN with the Xception architecture to detect these defects, achieving accuracies between 85% and 94%. Guan et al. [12] compared a new VSD network against VGG19 and ResNet on the NEU dataset and showed that the VSD network could achieve an overall accuracy of 89.86% on the validation dataset.
In this work, we present a feature-based transfer learning approach for multi-category surface defect classification on the Northeastern University (NEU) database. The aim of the present study is to evaluate how well two pre-trained CNN models, i.e., VGG16 and InceptionV3, extract meaningful features for the classification of the defects.

Methodology
In this study, we investigate and compare two widely used pre-trained convolutional neural network (CNN) architectures - VGG16 and InceptionV3 - for feature extraction and defect classification. VGG16 was developed by the Visual Geometry Group at the University of Oxford. It consists of 16 weight layers (13 convolutional and 3 fully connected) and uses consecutive convolutional and max pooling layers of increasing depth to extract hierarchical features, with the fully connected layers at the end for classification. We use a version of VGG16 without the fully connected layers, replacing them with a logistic regression (LR) classifier. It is worth noting that the original VGG16 model was pre-trained on millions of ImageNet images for object recognition tasks.
InceptionV3, in turn, was developed by Google for image classification and recognition applications. It is built from inception modules, architectural blocks that apply convolutions of different sizes in parallel to capture features at multiple scales. Multiple inception modules are stacked for deeper representations. Like the VGG16 model, InceptionV3 was pre-trained on the extensive ImageNet dataset. The key difference between the two architectures is that InceptionV3 relies on parallel convolutions for multi-scale feature learning, while VGG16 uses consecutive convolutional layers in a hierarchical fashion.
In our study, we leverage the transfer learning capacity of these two pre-trained models by extracting features from the convolutional layers, which contain generic representations useful for many vision tasks. The extracted feature representations are then fed into an LR classifier for defect classification. Our experiments allow a comparison of the transferability of features from consecutive versus parallel convolutional architectures.
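The feature-extraction step described above can be sketched as follows. This is a minimal sketch rather than the code used in the study: it assumes the `tf.keras.applications` API, the standard input sizes of 224×224 (VGG16) and 299×299 (InceptionV3), and global average pooling of the last convolutional feature map, since the text does not state how the feature maps are turned into vectors. The names `build_extractor` and `extract_features` are hypothetical. Grayscale inputs would first need to be resized and replicated to three channels, a step omitted here.

```python
import numpy as np

# Standard input size per backbone (assumption: the Keras defaults are used).
INPUT_SIZE = {"vgg16": 224, "inception_v3": 299}


def build_extractor(name, weights="imagenet"):
    """Return (model, preprocess_fn) for a frozen backbone without its classifier head."""
    if name not in INPUT_SIZE:
        raise ValueError(f"unknown backbone: {name}")

    # Imported lazily so that this module can be loaded without TensorFlow installed.
    import tensorflow as tf
    apps = tf.keras.applications

    if name == "vgg16":
        ctor, preprocess = apps.VGG16, apps.vgg16.preprocess_input
    else:
        ctor, preprocess = apps.InceptionV3, apps.inception_v3.preprocess_input

    size = INPUT_SIZE[name]
    # include_top=False drops the fully connected head; pooling="avg" reduces the
    # final feature map to one vector per image (an assumption, see the lead-in).
    model = ctor(weights=weights, include_top=False, pooling="avg",
                 input_shape=(size, size, 3))
    model.trainable = False  # fixed feature extractor, no fine-tuning
    return model, preprocess


def extract_features(model, preprocess, images):
    """images: array of shape (n, size, size, 3) holding RGB values in [0, 255]."""
    batch = preprocess(np.array(images, dtype="float32"))  # copy, then normalise
    return model.predict(batch, verbose=0)
```

Under these assumptions each image is mapped to a 512-dimensional vector by VGG16 and a 2048-dimensional vector by InceptionV3, and these vectors are what the LR classifier would be trained on.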
We evaluate the models on a surface defect classification dataset, namely the NEU database (https://www.kaggle.com/datasets/kaustubhdikshit/neu-surface-defect-database). It covers six types of surface defects, i.e., crazing, inclusion, patches, pitted surface, rolled-in scale and scratches [12]. Figure 1 illustrates samples of the different defects investigated. The dataset contains 300 samples of each defect type. A stratified 70:15:15 split was used to form the training, validation and testing datasets. The analysis was carried out in the Spyder Python IDE (running on Python 3.7). The LR classifier from the sklearn library was used with its default hyperparameters, whilst the Keras library was invoked for the VGG16 and InceptionV3 architectures. The present study provides insight into the representational differences between these two CNN architectures and their ability to generalize to surface defect detection. The classification accuracy (CA) and the confusion matrix (CM) are used as the performance indicators for the formulated pipelines.
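The split and classifier setup described above can be sketched as follows. This is a minimal sketch under stated assumptions: the features are synthetic stand-ins (random clusters, not CNN features), the 70:15:15 split is produced by two successive stratified calls to `train_test_split`, and the random seed and feature dimension are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for extracted features: 6 classes x 300 samples, 32 dimensions.
n_classes, per_class, dim = 6, 300, 32
y = np.repeat(np.arange(n_classes), per_class)
centres = rng.normal(scale=3.0, size=(n_classes, dim))
X = centres[y] + rng.normal(size=(y.size, dim))

# 70:15:15 in two stratified steps: hold out 30%, then split the hold-out in half.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.50, stratify=y_hold, random_state=0)

# Logistic regression with sklearn's default hyperparameters.
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Classification accuracy (CA) per subset; 1800 samples give 1260/270/270.
for name, Xs, ys in [("train", X_train, y_train),
                     ("validation", X_val, y_val),
                     ("test", X_test, y_test)]:
    ca = accuracy_score(ys, clf.predict(Xs))
    print(f"{name}: n={len(ys)}, CA={ca:.3f}")
```

With 300 images per class, stratification keeps 210, 45 and 45 images of every defect type in the training, validation and test subsets respectively.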

Results and Discussion
The results shown in Figure 2 demonstrate that using a pre-trained VGG16 model coupled with a logistic regression (LR) classifier was highly effective for classifying defects in this image dataset. This VGG16-LR pipeline achieved classification accuracies of 100%, 98% and 99% on the training, test and validation sets respectively, i.e., with no errors on the training set and only a few misclassifications on the held-out sets. This suggests the VGG16 model was able to extract meaningful features from the images that enabled accurate defect classification. However, combining the InceptionV3 model with an LR classifier led to much poorer performance, with classification accuracies of only 91% and 92% on the test and validation sets, despite 100% training accuracy.

Conclusion
In conclusion, the pre-trained VGG16 model coupled with a simple logistic regression classifier provides an excellent pipeline for classifying defects in this image dataset, substantially outperforming the InceptionV3-LR approach. The VGG16 model is able to extract highly discriminative features from the images, enabling near-perfect classification accuracy. Moving forward, the VGG16-LR pipeline could be explored for real-time defect detection in manufacturing quality control. Further hyperparameter tuning and model optimization could potentially improve performance, and the effect of other pre-trained models and of different classifiers shall also be explored. Beyond this specific application, the results highlight the importance of selecting an appropriate pre-trained model for a given computer vision task and dataset.

Figure 1. Sample defect images from the NEU database.

Figure 3 (Testing CM) and Figure 4 (Validation CM) provide further evidence that the InceptionV3 model yielded a less effective feature representation than VGG16, leading to multiple misclassifications. The different defect classes, i.e., crazing, inclusion, patches, pitted surface, rolled-in scale and scratches, are denoted as 0, 1, 2, 3, 4 and 5, respectively. In summary, the VGG16 model demonstrated a superior ability over InceptionV3 to extract discriminative features from this defect image dataset when used with an LR classifier. The VGG16-LR pipeline provided excellent defect classification performance, while the features from the InceptionV3-LR pipeline were less discriminative, resulting in poorer classification accuracy.
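For readers who wish to reproduce this kind of analysis, the following minimal sketch shows how a confusion matrix, the classification accuracy and the per-class recall can be computed with sklearn using the class indices 0 to 5 defined above. The label arrays are hypothetical values chosen purely for illustration; they are not the results of this study.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

CLASSES = ["crazing", "inclusion", "patches",
           "pitted surface", "rolled-in scale", "scratches"]

# Hypothetical labels for illustration only; these are NOT the paper's results.
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
y_pred = np.array([0, 0, 1, 5, 2, 2, 3, 1, 4, 4, 5, 5])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=np.arange(len(CLASSES)))

accuracy = np.trace(cm) / cm.sum()      # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)   # per-class fraction correctly recovered

print(cm)
print(f"CA = {accuracy:.3f}")
for idx, name in enumerate(CLASSES):
    print(f"{idx} {name:<16} recall = {recall[idx]:.2f}")
```

Off-diagonal entries identify which defect types are confused with one another; in this toy example one inclusion sample is predicted as scratches and one pitted-surface sample as inclusion.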