Explainable attention-based fused convolutional neural network (XAFCNN) for tire defect detection: an industrial case study

Ensuring tire quality is crucial in the manufacturing industry, particularly for race cars, where defective tires present a significant safety risk. Visual inspection for defects in tires is essential; however, identifying defects in complex, textured tires has proven to be a challenging task. This paper tackles this challenge by introducing XAFCNN, an Explainable Attention-based Fused Convolutional Neural Network for tire defect detection. XAFCNN's novel architecture, including a Special Attention Module (SAM) and custom CNN structure, coupled with Grad-CAM visualization, prevents overfitting, enhances local feature mapping, enables detection of small defects, and offers valuable insights into the model's reasoning, enabling confident interpretation of its predictions. The model was trained on a dataset from a leading global tire manufacturer, including 38,710 x-ray images of defective tires and 83,985 defect-free tire images, covering 15 defect types and 50 design patterns. The results demonstrate the model's exceptional performance compared to the literature, achieving a recall rate of 86.85%, a precision of 98.5%, an F1 score of 92.31%, and an overall accuracy of 95.40%. This research, with its substantial dataset and high-performing model, advances automated tire defect detection, satisfying the industry


Introduction
The exponential growth of the world population has resulted in a substantial surge in the need for transportation vehicles, including trucks, buses, and cars. In response to this increased demand, numerous firms have increased their production of tires, a crucial component of automobiles. Nevertheless, the annual return of defective tires at a significant rate of 7% leads to a substantial financial burden of $100 million in restitution. Despite the substantial resources allocated to tire manufacturing, the industry continues to struggle to guarantee superior products and to minimize defective tire returns in order to reduce financial losses [1]. Reducing the number of returned tires requires comprehensive quality checks, including the identification of defects using x-ray imaging. However, the tire manufacturing industry currently relies on manual labor for this task, which causes significant delays and substantial expense. Moreover, manual inspection is subjective, inefficient, time-consuming, and susceptible to bias, and it demands sustained concentration and attentiveness from workers [2].
Furthermore, safety is a key driving factor behind the development and implementation of tire defect detection systems. Faulty tires can result in compromised vehicle handling, accidents, and ultimately fatal collisions. These technologies enable the timely detection and resolution of flaws, thereby reducing accidents and preserving human lives [3].
The complex composite structures of tires make defects hard to recognize: textures vary within and between a tire's layers, and certain faults, such as bubbles in the tread and foreign objects in the bead, have low contrast [4]. To examine these structures, specialized x-ray cameras capture comprehensive 360-degree radiographic images of the tire. These images depict the internal components and assembly of the tire [5]. The stripes visible in an x-ray image correspond to the arrangement of steel wires and rubber components, and the configuration of these stripes may indicate a specific defect. The steel wires are uniformly distributed over the left, center, and right sections of the tire. A careful examination of these elongated x-ray images can reveal discrepancies that may be challenging to identify by other means [4].
Most of the tire fault detection systems discussed in the literature face two distinct difficulties. The first is the existence of more than 200 unique designs and standards, which leads to considerable diversity in tire images. The second is the diverse range of defects observed in tire production, which encompasses over 20 defect types with distinct characteristics [1].
In recent years, notable breakthroughs in deep learning (DL) have produced creative solutions for several industrial problems, including tire defect detection [6]. The convolutional neural network (CNN) is one of the most popular deep learning architectures and is frequently used in classification applications [7,8]. Although CNNs have achieved impressive results in numerous practical applications, their intrinsic stochastic nature can erode confidence in the reliability of their results [9]. To improve the dependability of CNN models, it is imperative to clarify the reasoning behind their decisions, thereby promoting transparency and building confidence in their outcomes. Consequently, explainable artificial intelligence (XAI) methods, such as Grad-CAM, have recently gained increasing significance [10].
XAI aims to improve the transparency and dependability of AI systems by providing visual explanations in the form of heat maps, produced with methods like Grad-CAM [11]. Transparency is crucial in tire defect detection because it fosters confidence among quality control operators, regulatory agencies, and other stakeholders. Understanding how and why an AI system assigns a given defect category is crucial for the endorsement and adoption of such systems within the tire manufacturing sector. XAI enables operators to assess and validate the decisions made by AI algorithms and thus helps them reach well-informed evaluations of tire quality. This support not only enhances the overall effectiveness of defect identification but also gives operators a deeper understanding of the variables that influence the output of the AI system [12].
Motivated by the significant financial burden and safety risks associated with defective tires, this work addresses the limitations of both manual inspection and existing AI methods. These limitations stem from the complex composite structures of tires, the diversity of tire designs and standards, and the existence of over 20 different defect types. Furthermore, the existing literature lacks applications of attention mechanisms and XAI methods to tire defect detection. In response, our research aims to improve the accuracy and reliability of tire defect detection. This is achieved by incorporating attention mechanisms and XAI within the proposed CNN model and by presenting the largest dataset for this problem.
The proposed methodology combines two key components: the attention-based fused convolutional neural network (AFCNN) architecture and the Grad-CAM (gradient-weighted class activation mapping) technique. At the core of our approach lie the Special Attention Module (SAM) and the bespoke AFCNN structure. These elements were designed to enrich the features extracted from tire images, thereby enhancing the model's capacity to detect even the smallest defects. Grad-CAM plays a pivotal role in our research because it visualizes the specific regions within an image that the model prioritizes when generating predictions. This not only raises our confidence in the model's outcomes but also helps identify the visual indicators that contribute to the detection of defects. Figure 1 illustrates the workflow of the proposed approach.
As illustrated by the workflow diagram, the contributions of this paper can be summarized as follows:
• This study incorporates the Grad-CAM method into an attention-based fused CNN (XAFCNN) architecture to address intricate problems such as tire defect detection. The methodology not only improves overall efficiency but also builds a higher level of confidence in the process of identifying defects.
• A Special Attention Module (SAM) is proposed to mitigate overfitting and improve the performance of the model by enhancing the limited feature maps. As a result, the model demonstrates an improved capability to detect small objects, such as foreign materials, within the data.
• The existing literature reveals a notable lack of publicly accessible datasets of tire x-ray images. We address this gap by collecting and annotating a comprehensive dataset of tire x-ray images exhibiting fifteen distinct types of defects, originating from about fifty unique design patterns. This dataset is used for training and testing the proposed model, leading to effective identification of defective tires despite variations in specifications, designs, and types of defects. This particular challenge has not been adequately addressed in other research.
• The proposed approach is an automated end-to-end scheme for detecting tire defects. To validate the effectiveness and generalization of the method, comparative experiments were conducted. The results demonstrate the high performance and robustness of the approach, highlighting its potential for practical implementation in real-world tire inspection scenarios.
The remainder of the paper is organized as follows. Section 2 offers a concise summary of the relevant literature. Section 3 presents the datasets collected for this investigation. The approach employed in this study is outlined in section 4. Section 5 describes the experimental setup and explains the research findings. Finally, section 6 concludes with an analysis of the study's findings and suggestions for future research.

Related works
Deep learning (DL) techniques have become increasingly prominent in manufacturing settings due to their ability to learn autonomously from data, identify underlying patterns, and make precise recommendations. These technologies can significantly change industrial operations, leading to highly efficient smart facilities [13]. In manufacturing, for example, DL models can extract meaningful information from imprecise sensory input, thereby contributing to the development of intelligent production systems. One of the key advantages of deep learning over classical machine learning is its ability to learn features autonomously, without external involvement [14].
Within deep learning, the CNN is widely recognized as a fundamental technique and is highly regarded for its adaptability. It has a broad spectrum of applications, including object identification, image classification, and recognition [15]. The evolution of CNNs has produced seven distinct categories, each characterized by its own enhancements, such as structural reformulations, regularization techniques, and parameter optimizations. These categories are feature map exploitation, channel boosting, width, multipath, depth, spatial exploitation, and attention-based convolutional neural networks [16,17].
In recent years, a number of CNN-based approaches have been proposed for identifying and classifying tire defects. For example, a faster region-based CNN (Faster R-CNN) is used to identify bubble faults in [18]. In [19], researchers introduced a multi-column CNN (MC-CNN) model, while in [20], an AlexNet-based classifier was developed for this purpose. These methods use x-ray images that contain six different types of defects, including Normal-Cords (NC), Bulk-Sidewall (BS), Cords-Distance (CD), Belt-Joint-Open (BJO), Sidewall-Foreign-Matter (SFM), and Belt-Foreign-Matter (BFM). Another approach, suggested in [21], uses a pre-trained VGG16 model along with a fully convolutional network (FCN) to detect tire defects. However, this approach focuses on only four types of defects in tread and sidewall tire images. Despite these efforts, there remains a need for more accurate and efficient methods for tire defect detection and classification.
The TireNet model, proposed in [1], is an end-to-end technique for practical x-ray image-based tire defect identification. Inspired by the periodic features of tire x-ray images, it uses a Siamese network as part of a downstream classifier to capture defect features. The labeled dataset used in that work consisted of 120,000 tire images (100,000 qualified tires and 20,000 defective tires). The model achieved a miss rate of 0.17% and outperformed YOLO, SSD, and Faster R-CNN in terms of recall.
In [22], a two-stage CNN model is developed for tire defect detection by merging an improved pyramid scene parsing network with an optimized YOLOv3. The model is tested on six types of defects using the CIoU loss function and achieves an average precision of 91.39%. Later, the authors propose another model based on a deep convolutional sparse-coding network (DCScNet) in [23]. This model uses sparse coding to extract tire features and achieves an accuracy of 96.8% when tested on the same dataset. Recently, an enhanced YOLO network with an attention mechanism (TD-YOLOA) was proposed for tire defect identification [24]. The authors address the limitations of existing methods by introducing an efficient layer aggregation network (ELAN), a spatial pyramid pooling with cross-stage partial convolution (SPPCSPC), and a convolutional block attention module (CBAM). Evaluated on a dataset of common tire defects, the method achieves a mean average precision (mAP) of 91.3% and a processing time of 9.28 ms per tire subimage, which are 0.5% and 0.65 ms better than the existing methods.
Generative adversarial networks (GANs) have also recently been applied to tire defect detection. For example, [25] presents a Wasserstein Generative Adversarial Network (WGAN) method for identifying defects in imbalanced tire x-ray data. The method uses the WGAN to produce high-quality and diverse tire x-ray defect images, thereby addressing the problem of imbalanced datasets. To improve generation for minority classes, the WGAN is developed using a pre-trained model that handles the feature similarity of different defect grades within the same type. In addition, an improved deep CNN model is used to classify the defect images. The experiments reveal that the suggested approach outperforms widely used models in classifying imbalanced tire x-ray defects.

Methodology and materials
This section provides an in-depth analysis of the dataset used as a fundamental element in this investigation. Furthermore, it offers a detailed explanation of the proposed theoretical framework, accompanied by a thorough analysis of its practical application.

Dataset description
Throughout the tire manufacturing process, a wide range of malfunctions may arise at different phases, from the initial creation of the tire material through the subsequent cooking of the tire carcasses, which are manufactured using various methods. One example is foreign material that can be incorporated into the tire at any stage of production. Metal detectors can detect foreign materials that contain metal, but non-metallic foreign items, which such detectors miss, can only be identified by radiographic quality control systems.
An additional category of malfunction pertains to defects in the joints of the textile and metallic belts used inside the internal framework of the tire. These defects may arise from several factors, including layer overlapping, open overlap of the joint, incorrect angles or slippage in joint formation, horizontally offset joints, and infeasible end-to-end joining. The cords, yarns, and wires incorporated into tire belts are arranged at diverse angles, typically using multiple belts to achieve a diagonal orientation. Nevertheless, coiling ropes or wires in parallel, rather than diagonally, can also result in defects.
The tires employed in this investigation encompass several forms of defects, and it is crucial to detect and address them in order to guarantee the manufacture of tires of superior quality. Figure 2 displays a selection of x-ray images depicting various instances of these faults.
The experimental dataset used in this study was obtained from the Pirelli Automobile Tyres İzmit Factory, situated in Turkiye. To begin data collection, we acquired 'long' x-ray images captured with x-ray cameras. These images comprised the primary dataset, encompassing both defective and acceptable tires. Given the relatively low frequency of defective tires in real-world manufacturing, we conducted data collection over a period of eighteen months. As a result, a total of 4,912 images of defective tires and 83,985 images of qualifying tires were compiled. These images were examined and classified by the quality inspectors at the Pirelli Factory. Because the resulting dataset is imbalanced, an augmentation methodology was employed to increase the number of defective tire images and obtain a more evenly distributed dataset.

Data augmentation
Data augmentation is a widely used strategy in artificial intelligence that enlarges the training dataset by generating synthetic data. This is accomplished by applying minor modifications to existing data rather than gathering new data. Augmentation techniques, such as warping or purposeful oversampling, can introduce variations like translation, rotation, or scale. With more data generated in this way, models can be trained with greater effectiveness and resilience. However, the augmentation technique must accurately represent real-life situations in order to achieve the best possible performance during training. Therefore, in this study, we have exclusively employed vertical shifting and brightness alteration to generate supplementary images. These procedures were selected to replicate authentic deviations that may arise during tire manufacturing operations, which enhances the resilience of the model. They produce modified versions of the initial images that retain the same types of defects that could appear in actual tires, so that a model trained on the augmented images is robust and generalizes to novel, unobserved data. Augmentation approaches that deviate from real-world scenarios or introduce irrelevant changes may lead to overfitting, in which the model performs well on the training data but poorly on unseen data. Hence, it is imperative to select augmentation methods that balance the need for diverse and realistic data against the need for model generalization and robustness.
In order to execute the proposed augmentation strategy, we followed a series of procedures that guaranteed the efficacy and precision of our dataset.
• Initially, the dataset consisting of 4,912 defective images was divided into two subsets: a training subset comprising 80% of the original dataset and a testing subset including the remaining 20%.
• To ensure the integrity of the testing process, the augmentation technique was applied separately to each subset, which prevents augmented copies of a training image from appearing in the test set. As a consequence, the number of defective tire images rose to a cumulative count of 38,710.
• Finally, the initial set of 83,985 non-defective images was divided into two subsets: 80% of the images were allocated for training, while the remaining 20% were set aside for testing. This ensured the appropriate allocation of images among the subsets and the precision of our outcomes.
• An expert examined all the augmented defective images to verify their fidelity to the original images.
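The split-then-augment procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the shift range, the brightness range, and the number of copies per image are all assumptions, and the vertical shift is modeled here as a wrap-around roll along the row axis.

```python
import numpy as np

def vertical_shift(image, shift):
    """Shift rows by `shift` pixels with wrap-around (assumed shift model)."""
    return np.roll(image, shift, axis=0)

def adjust_brightness(image, factor):
    """Scale intensities by `factor` and clip to the 8-bit range."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)

def split_then_augment(images, train_fraction=0.8, copies=3, seed=0):
    """Split first, then augment each subset on its own.

    Augmenting after the split keeps every augmented copy in the same
    subset as its source image, so no test image has a near-duplicate
    in the training set.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    cut = int(train_fraction * len(images))
    subsets = {
        "train": [images[i] for i in order[:cut]],
        "test": [images[i] for i in order[cut:]],
    }
    augmented = {}
    for name, subset in subsets.items():
        out = []
        for img in subset:
            out.append(img)  # keep the original image
            for _ in range(copies):
                shift = int(rng.integers(-20, 21))     # hypothetical range
                factor = float(rng.uniform(0.8, 1.2))  # hypothetical range
                out.append(adjust_brightness(vertical_shift(img, shift), factor))
        augmented[name] = out
    return augmented
```

With `copies=3`, each subset grows by a factor of four; the ratio actually used in the study is not stated, so this value is only a placeholder.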

Proposed model
Our methodology, shown in figure 1, begins by collecting, labeling, and splitting the x-ray tire images into training and testing sets. Subsequently, we augment and normalize the class of defective tires through a custom augmentation technique. Each image is then resized to 1000 × 500 × 3 and fed into our proposed XAFCNN model. The XAFCNN model combines Special Attention Module (SAM), Depthwise Separable Convolution (DSC) [26], Separable Convolution (SC) [27], and Convolutional Batch Normalization (CBN) layers [28], which work together to extract the important local features inherent to tire images. The step-by-step construction of the proposed XAFCNN model is illustrated in figure 3. The XAFCNN architecture fuses the Special Attention Module (SAM) with convolution layers, offering a hybrid solution to several limitations of conventional convolutional networks. Our XAFCNN model aims to accurately identify local features in tire images, especially small objects, which have been difficult to recognize in the literature for the following reasons. Small objects lose context during image preprocessing and downsampling for model input, making them hard to recognize. If the model cannot handle this loss, it may miss detections. Existing tire defect detection models may not be scalable enough to recognize small items, and the distinctive properties of such items may not be extracted.
The XAFCNN architecture we propose addresses these restrictions in multiple ways. First, in preprocessing, each image is processed at 1000 × 500 × 3 to retain as much information as possible about small objects. Second, our model is built to extract distinctive features from small objects by extending the width of filters while maintaining depth to avoid overfitting. We achieve this through the introduction of a Special Attention Module (SAM), as detailed in figure 4, in which varying filter depths are employed for each filter, allowing for more effective feature extraction and thereby enhancing the model's capacity to recognize small objects with precision. Figure 1 shows a unique inception layer with changing filter depths for each filter, which improves feature extraction and the model's ability to distinguish small objects.
The architecture of the proposed XAFCNN model consists entirely of SAM, DSC, and CBN layers, organized into three blocks. The data first enters the A block, then passes four times through the B block, and finally passes through the C block. Figure 3 presents a detailed overview of the three blocks. In the SAM layer, the input undergoes DSC using three different-sized filters (1 × 1, 3 × 3, and 5 × 5), followed by max-pooling and 1 × 1 convolution. The outputs are merged and then passed to the subsequent layer. This process is followed by a max-pooling layer to reduce the number of features. Batch normalization is applied after each convolution and SC layer. All SC layers use a depth multiplier of 1 (no depth expansion). The A block comprises two SAM layers, with filter depths gradually increasing twice. The SAM layer's filter depths are constant in the B and C blocks. All sizes were selected to match the size of the residual layers added in the model structure flow graph.

Special attention module (SAM)
One of the key contributions of this study is the lightweight Special Attention Module (SAM). The proposed module uses depthwise separable convolution layers with different filter sizes, as depicted in figure 4. The SAM module uses an attention mechanism to selectively attend to various regions within the input image. This is achieved by employing varying filter sizes, which enhances feature extraction and increases robustness. Furthermore, it gives models a deeper understanding of the contextual aspects of images. It therefore facilitates the precise identification and localization of objects within images by selectively focusing on prominent object characteristics. The model's ability to perceive and interpret complicated scenes is enhanced by its ability to capture contextual relationships.
Depthwise separable convolutions offer several advantages over standard convolutions. First, they help mitigate overfitting by reducing the number of parameters involved. Second, they have lower computing costs, making them particularly advantageous for real-time applications such as tire defect detection. Nevertheless, decreasing the number of parameters can reduce the performance of the model. The proposed module combines the advantages of DSCs with the capability to extract more comprehensive features through the SAM layer, which helps mitigate overfitting while keeping the number of parameters in the model low.
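The parameter saving of a depthwise separable convolution can be made concrete with a short calculation. The sketch below counts weights (biases ignored) for the three kernel sizes used in the SAM layer. The channel counts of 64 inputs and 128 outputs are hypothetical and are not taken from the paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one k x k x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out

def separable_conv_params(kernel, c_in, c_out, depth_multiplier=1):
    """Weights in a depthwise separable convolution.

    Depthwise step: one k x k filter per input channel (times the multiplier).
    Pointwise step: a 1 x 1 convolution mixing channels into c_out outputs.
    """
    depthwise = kernel * kernel * c_in * depth_multiplier
    pointwise = c_in * depth_multiplier * c_out
    return depthwise + pointwise

# Hypothetical channel counts, chosen only for illustration.
C_IN, C_OUT = 64, 128
for k in (1, 3, 5):
    std = standard_conv_params(k, C_IN, C_OUT)
    sep = separable_conv_params(k, C_IN, C_OUT)
    print(f"{k}x{k}: standard={std}, separable={sep}")
```

Under these assumed sizes the saving grows with kernel size: the 3 × 3 and 5 × 5 branches need far fewer weights in separable form, while a 1 × 1 kernel gains nothing because the pointwise step is already a 1 × 1 convolution.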

Experimental results
This section is organized into four parts. The evaluation metrics employed in this study are stated first. The experimental configuration is then described, followed by a comprehensive analysis of the experimental findings. Finally, past studies employing classification-based methodologies are examined and compared.

Evaluation metrics
The proposed model is evaluated using multiple metrics: accuracy, precision, recall, F-score, the Receiver Operating Characteristic (ROC) curve, the area under the curve (AUC), and the confusion matrix [29,30]. The first four metrics are calculated as follows.

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \]

\[ \mathrm{F\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \]

The terms TP and TN denote the frames that have been correctly predicted as defective and non-defective, respectively. In contrast, FP denotes non-defective frames that have been incorrectly predicted as defective, and FN denotes defective frames that have been incorrectly predicted as non-defective.
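As a small illustration, the standard definitions of these metrics can be computed directly from confusion-matrix counts. The function names are ours, and no confusion-matrix counts from the paper are used. The only reported figures used are the precision of 98.5% and recall of 86.85% from the abstract, as a consistency check on the stated F1 score.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def f1_from_rates(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Consistency check on the figures reported in the abstract.
reported_f1 = f1_from_rates(0.985, 0.8685)
print(f"F1 from reported precision and recall: {reported_f1:.4f}")
```

The harmonic mean of the reported precision and recall agrees with the reported F1 score of 92.31% to two decimal places.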

Experimental setup
Training the proposed approach on such a large dataset requires substantial computational resources. An Nvidia GeForce RTX 3060 GPU was therefore used in our experiments. Moreover, the 80/20 principle is commonly recommended in the literature for partitioning training and testing datasets [31]. Hence, we randomly shuffled all the images and then separated them into two sets: 80% for training and 20% for validation.

Experimental results
The goal of this section is to evaluate and compare the performance of the proposed XAFCNN model in the context of tire defect detection. To assess its effectiveness, we examine its behavior on the testing dataset using confusion matrices, as shown in figure 5. The models selected for comparison, AlexNet and MC-CNN, were chosen because of their documented success in addressing similar challenges in the literature.
This evaluation goes beyond simply comparing models; it includes a careful examination of classification metrics derived from the confusion matrix, as shown in figure 6. These metrics, namely accuracy, precision, recall, and F1-score, provide a more complete understanding of the XAFCNN model's capabilities in tire defect detection. Results indicate a progressive improvement from AlexNet to MC-CNN, with XAFCNN outperforming both. XAFCNN achieves the highest accuracy, precision, recall, and F1-score, making it a promising choice for the current problem. This indicates that the model effectively identifies positive cases (recall), keeps incorrect positive identifications low (precision), and strikes a desirable balance between the two (F1-score), while also achieving high overall accuracy.
To make the evaluation more thorough, additional measures, namely the receiver operating characteristic (ROC) curve and the area under the curve (AUC) score, have been used. Together, the confusion matrices, classification metrics, ROC curves, and AUC scores provide a comprehensive evaluation of how accurately the proposed XAFCNN model categorizes tire images within the testing dataset.

XAI Grad-CAM analysis
This subsection presents XAI Grad-CAM heatmaps for selected samples from both the defective and non-defective classes, as generated by the proposed model. The heatmaps are visual representations of the specific regions within the input images that played a crucial role in the model's decision-making. Regions shown in shades of red had a substantial influence on the decision made by the model, whereas regions shown in blue had limited impact on the final determination. Figure 8 shows the Grad-CAM heatmaps, which provide valuable insights into the decision-making process of the model for both defective and non-defective tire samples. For the defective class, it is evident that the model is strongly influenced by particular regions within the input images: the reddish patches correspond to the defective areas in the images. In contrast, reddish patches are rather infrequent in non-defective images.
This finding highlights the model's capacity to identify and rank the important regions within the input images, enabling it to make well-informed predictions. The Grad-CAM heatmaps offer visual indicators that enrich our understanding of how the proposed model analyzes data and discerns relevant features within the tire samples.
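The heatmaps described above follow the general Grad-CAM recipe: each feature map of a convolutional layer is weighted by the spatial average of the class-score gradient with respect to that map, the weighted maps are summed, and negative values are clipped. The sketch below illustrates that recipe only; it is not the authors' implementation. It assumes the feature maps and gradients have already been extracted from some layer, and the function name and toy values are hypothetical.

```python
def grad_cam(feature_maps, gradients):
    """Standard Grad-CAM combination step.

    feature_maps: list of K channels, each an H x W nested list of activations.
    gradients:    same shape; gradient of the class score w.r.t. each activation.
    Returns an H x W heatmap scaled to [0, 1] (1 = strongest influence).
    """
    k = len(feature_maps)
    h = len(feature_maps[0])
    w = len(feature_maps[0][0])

    # Channel weights: global average of the gradients over each map.
    weights = [sum(sum(row) for row in grad) / (h * w) for grad in gradients]

    # Weighted sum over channels, followed by ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(weights[c] * feature_maps[c][i][j] for c in range(k)))
            for j in range(w)
        ]
        for i in range(h)
    ]

    # Normalize so the map can be rendered with a blue-to-red colormap.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy 2-channel, 2 x 2 example, for illustration only.
feature_maps = [[[1, 0], [0, 2]], [[0, 3], [1, 0]]]
gradients = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
heatmap = grad_cam(feature_maps, gradients)
```

In practice the low-resolution heatmap is upsampled to the input size and overlaid on the x-ray image, which produces the red (high influence) and blue (low influence) regions discussed in the text.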

Comparison with Previous Works
Table 1 provides a detailed evaluation of the performance of the XAFCNN model presented in this study for detecting tire defects. It offers a direct comparison with other state-of-the-art approaches, giving insight into the model's capabilities across different levels of complexity. The table highlights a significant point: identifying tire defects becomes harder as the number of defect types increases, because differentiating between many defect categories is compounded by their growing complexity. The table therefore not only serves as a means of evaluating performance but also highlights the effectiveness of the proposed XAFCNN model in handling this increasing complexity.
The table illustrates how the accuracy of past models depends on the number of defect types they were designed to handle. Previous models achieved higher accuracy when tested on a limited range of defect types, but their efficacy appears to decline as the diversity of defect types increases. This finding underlines the inherent challenge of classifying a diverse array of defect categories, since differentiating between complex and varied flaws becomes increasingly difficult.
The value of the proposed model is reinforced by the distinctiveness of the dataset, which comprises fifty distinct tire pattern designs with varying structures and proportions. This makes the task considerably more challenging, as the model must detect defects across a variety of patterns and their associated intricacies. The robustness and adaptability of the model are demonstrated by its ability to achieve strong performance despite these difficulties.
By synthesizing and presenting these performance measures, table 1 not only quantifies the success of the model but also establishes a benchmark for researchers and practitioners working on tire defect identification. The evaluation allows for a comprehensive analysis of the model's capabilities and limitations in multi-class defect detection scenarios, thereby contributing to the progress of the field.

Discussion and conclusion
This study represents a significant advancement in the field of automated tire defect detection, addressing key challenges that have historically plagued this domain. Traditional approaches have struggled due to the complex, anisotropic, multi-textured rubber layers present in both intra-tire and inter-tire constructions, compounded by the diverse range of tire designs (over 200) and the multitude of potential defects (over 20 types). Additionally, existing models in the literature lack interpretability, hindering user confidence in their results. Our proposed XAFCNN architecture addresses these shortcomings by fusing two key components: the attention-based fused convolutional neural network (AFCNN) and the Grad-CAM technique. The core of this approach lies in the Special Attention Module (SAM) and the bespoke AFCNN structure, which extract richer features from tires, enabling accurate detection of even subtle defects. Grad-CAM further enhances interpretability by visualizing the specific regions within an image that the model prioritizes during prediction.
This study utilizes a unique and extensive dataset of 38,710 x-ray images of defective tires and 83,985 images of defect-free tires, acquired from a leading global tire manufacturer. Our proposed XAFCNN model significantly outperforms existing models like AlexNet and MC-CNN, achieving a recall rate of 86.85%, precision of 98.5%, F1-score of 92.31%, and a total accuracy of 95.40%. These metrics demonstrate the model's ability to handle the complex characteristics of diverse defect types and rubber materials.
The findings of this study demonstrate significant progress in the field of automated tire defect detection. By introducing a unique dataset and a highly capable model, XAFCNN, the study addresses an important industry demand for precise and reliable inspection methods. Notably, XAFCNN has been successfully deployed on a manufacturing line, demonstrating its efficacy in assisting decision-makers and enhancing inspection speed and accuracy. Although the model exceeds an accuracy of 95%, there remains a clear need for an even more accurate model to address the identified issues comprehensively.

Figure 2. Sample x-ray images for tire defects.

Figure 5. Comparison of confusion matrices on the testing dataset: Evaluating the performance of the proposed XAFCNN model against other models documented in the literature.

Figure 6. Comparison of classification metrics on the testing dataset: Evaluating the performance of the proposed XAFCNN model against other models documented in the literature.

Figure 7. Comparison of receiver operating characteristic (ROC) curves and area under the curve (AUC) on the testing dataset: Assessing the performance of the proposed XAFCNN model in comparison to other models documented in the literature.

Figure 8. Grad-CAM heatmaps for selected defective and non-defective tire samples.

Table 1. Comparison with Previous Works.