Efficient fabric defect detection based on lightweight model

Juncheng Zou; Fuli Yang

doi:10.1088/2631-8695/adb0a3

1. Introduction

The textile industry is a vital sector globally, contributing significantly through manufacturing processes that transform raw materials into finished fabrics. Quality control, particularly in detecting fabric surface defects, is critical for maintaining product quality, ensuring customer satisfaction, and reducing production costs associated with rework or waste disposal. However, the task of fabric defect detection is challenging due to variability in fabric textures, sizes of defects, and complexity of background patterns.

Traditional fabric inspection has relied on manual visual examination by human operators, which is labor-intensive, time-consuming, and prone to errors caused by fatigue or subjective judgment. Automated systems based on traditional techniques, such as filter-based methods, statistical approaches, and classifiers like K-Nearest Neighbors (KNN), have been developed to address these limitations. These methods work well for simple patterns, but they struggle with complex textures, varying lighting conditions, and subtle defects, because they rely on hand-crafted features and adapt poorly to diverse fabric types [1].

Recent approaches in fabric surface defect detection have predominantly focused on deep learning models, particularly Convolutional Neural Networks (CNNs). While basic CNN architectures showed promising results, they faced limitations in processing speed and detection accuracy for small defects. The introduction of YOLO (You Only Look Once) frameworks attempted to address these issues by enabling real-time object detection [2]. However, these implementations often struggle with three key challenges: detecting small-scale defects, handling low-contrast anomalies, and maintaining accuracy across diverse fabric textures common in textile manufacturing.

To address these limitations, researchers have integrated attention mechanisms into deep learning models. The Convolutional Block Attention Module (CBAM) [3] and the Deformable Convolution Feature Extraction Module (DCFE) [4] have demonstrated significant improvements in feature extraction and defect detection accuracy. For example, Zhao et al [5] tackled the trade-off between processing speed and accuracy by combining attention mechanisms with multi-source fusion in a modified ShuffleNetV2 architecture. Wang et al [6] further improved small-defect detection with YOLOv4-SA, which integrates spatial attention into YOLOv4 and achieves notable gains in mean Average Precision (mAP), particularly for minor defects.

Despite these advances, significant challenges persist in automated fabric defect detection. Current models struggle to achieve real-time processing, robustness across varying textures, and accurate detection of minor defects at the same time [7]. Specialized models such as TDB-YOLO, with its small target detection layer and Bidirectional Feature Pyramid Network (BiFPN) [8], and FEM-Net, with its cascading channel space attention module [9], have made progress, but they often optimize one aspect at the expense of others.

A gap therefore remains for a model that handles small, low-contrast defects effectively, stays robust across different textures, and runs in real time while remaining practical in industrial settings. Closing this gap calls for an approach that combines advanced feature extraction techniques with attention mechanisms to improve both the accuracy and the efficiency of defect detection.

Motivated by these persistent challenges in fabric surface defect detection, this research aims to develop a novel model based on the YOLO framework that effectively addresses small, low-contrast defects and complex textile backgrounds. The specific objectives are: (1) integrating advanced feature extraction techniques to enhance feature representation; (2) incorporating attention mechanisms to focus on critical areas; and (3) improving both accuracy and efficiency in defect detection.

The rest of this paper is organized as follows: section 2 discusses related work, providing a comprehensive overview of existing methods and highlighting the gaps that our proposed approach addresses. Section 3 outlines our proposed method in detail, including its architecture, the integrated attention mechanisms, and the advanced feature extraction techniques employed. Section 4 presents the experimental setup and dataset used for training and evaluation. Section 5 reports the results and discusses their implications, demonstrating the superior performance of our approach. Finally, section 6 concludes the paper and outlines potential directions for future research.

2. Related work

Reliable detection of fabric surface defects is central to quality control in the textile industry, yet it is complicated by the variability of fabric textures and defect sizes and by complex background patterns. This section reviews prior work on surface defect inspection, feature extraction and fusion, and attention mechanisms.

2.1. Surface defect inspection

Recent work on fabric surface defect detection has shifted towards machine learning and deep learning techniques to improve accuracy and efficiency. State-of-the-art methods are predominantly based on deep learning, with notable approaches including ASC-YOLO, feature enhancement methods, and hybrid models combining YOLO and R-CNN. Li et al [2] introduced an improved YOLOv4 algorithm that incorporates the Efficient Channel Attention (ECA)-DenseNet-BC-121 feature extraction network and the Dual Context Feature Enhancement (DCFE) module. ECA-DenseNet-BC-121 enhances feature representation by focusing on important channels, while the DCFE module captures both local and global context to improve detection accuracy. Zhao et al [10] proposed SE-SSDNet, which integrates the Squeeze-and-Excitation (SE) module with the Single Shot MultiBox Detector (SSD) to address the asymmetry between detecting large and small defects; the SE module adaptively recalibrates channel-wise feature responses, improving the identification of defects of varying sizes. ASC-YOLO enhances defect feature extraction with an architecture that incorporates global contextual relationships and attention mechanisms [11], which helps it cope with complex backgrounds and varying defect scales. A method based on feature enhancement and neighboring-information complementation targets the same challenges [12]. Hybrid models, such as those combining YOLOv4 and R-CNN, improve detection accuracy while retaining real-time processing capability [13]. A comprehensive survey [14] reviewed state-of-the-art methods and highlighted the transition from traditional feature-based approaches to learning-based algorithms.
Traditional methods, such as filter-based and feature-based techniques, have been effective for specific textures but struggle with complex patterns. For instance, K-Nearest Neighbors (KNN) can be effective for simple defect detection tasks but may not match the accuracy of deep learning approaches in more complex scenarios [1]. Deep learning methods, in turn, rely on large training datasets, are prone to overfitting, and still have difficulty detecting small or subtle defects against background noise [15]. Real-time processing is essential for practical applications but is difficult to achieve with complex models, and robustness to varying fabric textures and the detection of minor defects also require further research. In summary, deep learning has significantly advanced surface defect detection by addressing the limitations of traditional methods, yet real-time processing, robustness to varying textures, and the detection of minor defects remain open problems. Future research should therefore focus on more efficient architectures that operate in real time while maintaining high accuracy; integrating attention mechanisms and exploring hybrid models could bring further improvements, and generalization across different fabric types is also a critical area for investigation.

2.2. Feature extraction and fusion

In recent years, significant effort has been devoted to feature extraction and fusion techniques that improve the accuracy and efficiency of surface defect detection, both in fabrics and in other industrial products. We group these methods into three categories according to their underlying approach: attention-based, multi-scale, and transfer learning-based methods. Zhang et al [16] proposed a dual-structure attention-based multi-level feature fusion network (DaMFFN) to address the challenges of detecting small defects and intra-class variations. Hu and Lin et al [17] introduced DFFNet, a lightweight fusion network that employs partial convolution and a Feature Enhancement Aggregation Module (FEAM) to optimize feature fusion and improve defect localization at low computational complexity. Fu et al [18] proposed a rich feature extraction and fusion model that effectively combines semantic and detailed information for surface defect inspection. For PCB surface defect detection, a few-shot learning method based on feature enhancement and multi-scale fusion was proposed to improve the recognition of small defects [19]. In addition, a machine vision method based on multi-feature extraction was developed for the automatic classification of weld surface defects [20]. The TDB-YOLO model integrates a small target detection layer and a Bidirectional Feature Pyramid Network (BiFPN) to enhance feature extraction and fusion [8]. FEM-Net introduces a cascading channel space attention module and an Adaptive Weighted Feature Fusion Module to blend multiscale features effectively [9]. Furthermore, a feature-based transfer learning approach using pretrained CNNs such as VGG16 and InceptionV3 has shown promising results [21]. Collectively, these techniques improve detection performance, particularly in complex industrial environments, by strengthening feature representation and robustness against disturbances.
Despite these advances, real-time processing, robustness to varying textures, and accurate detection of minor defects remain open challenges, and addressing them is necessary for more efficient and practical fabric defect detection systems.

2.3. Attention mechanisms

Attention mechanisms have significantly improved surface defect detection by addressing key challenges such as feature extraction, multi-scale integration, and complex defect characteristics. We group recent advances by the type of attention mechanism employed and by their performance on diverse datasets. Attention modules such as the Convolutional Block Attention Module (CBAM) [3] and the Deformable Convolution Feature Extraction Module (DCFE) [4] help models focus on critical defect features, leading to higher detection accuracy in challenging settings such as aluminum profiles, welding defects, and transparent materials with small, low-contrast defects. Notable works include Zhao et al [5], who proposed a fast PCB surface defect detection method that combines an attention mechanism with multi-source fusion, using a modified ShuffleNetV2 with CBAM to emphasize valid information. Wang et al [6] introduced YOLOv4-SA, which integrates spatial attention with YOLOv4 for tiny defect detection and achieves substantial improvements in mean Average Precision (mAP). Optimizing attention through modules such as SE has also been shown to enhance performance significantly in noisy environments such as steel surface inspection [22]. Haitao et al [7] developed DMSA-YOLOv3, which employs a dual multi-scale attention mechanism to address low efficiency and poor accuracy in micro-scale defect detection. Despite these advances, real-time processing, robustness to varying textures, and accurate detection of minor defects remain challenging, and future research should address these limitations to build more robust and efficient systems for diverse industrial applications.
Overall, the synergy between advanced feature extraction techniques and tailored attention mechanisms is crucial for enhancing the precision and efficiency of surface defect detection across various domains.

3. Method

3.1. YOLOv8-mini

YOLOv8 [23] was developed by Ultralytics as a further optimization of YOLOv5. It uses a CSPNet-based backbone to strengthen feature extraction, adopts Feature Pyramid Network (FPN) and Path Aggregation Network (PAN) neck structures for multi-scale object detection, and switches to anchor-free detection. The YOLOv8m model has approximately 25.9 million parameters and is positioned as a medium-sized model.

YOLOv8 employs a Convolutional Neural Network (CNN) backbone, CSPDarknet, which efficiently extracts multi-scale features from input images. Built from stacked convolution and C2f blocks, CSPDarknet captures hierarchical feature maps ranging from low-level textures to high-level semantic information. For feature fusion, the neck of YOLOv8 uses an optimized Path Aggregation Network (PANet) that improves the information flow among feature levels. The head adopts an anchor-free method for bounding box prediction, which simplifies prediction, reduces the number of hyperparameters, and makes the model more adaptable to objects with different aspect ratios and scales.
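As a minimal sketch of the anchor-free prediction described above, the Python below decodes one box in the style used by Ultralytics YOLOv8: for each grid-cell centre, the head predicts the distances to the four box edges, each read from a discrete distribution (Distribution Focal Loss, DFL) as an expected value. The function names, the use of 16 bins (which we understand to be the Ultralytics default), and the example numbers are illustrative assumptions, not details taken from this paper.

```python
import math


def dfl_expectation(logits):
    """Return the expected bin index under softmax(logits).

    DFL-style heads predict a distance as a distribution over bins 0..n-1;
    the decoded distance (in grid units) is the mean of that distribution.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return sum(i * e / total for i, e in enumerate(exps))


def decode_box(col, row, ltrb, stride):
    """Anchor-free decoding for one grid cell.

    The anchor point is the centre of cell (col, row); ltrb holds distances
    (in grid units) to the left, top, right and bottom edges. The result is
    (x1, y1, x2, y2) in input-image pixels.
    """
    ax, ay = col + 0.5, row + 0.5
    left, top, right, bottom = ltrb
    return ((ax - left) * stride, (ay - top) * stride,
            (ax + right) * stride, (ay + bottom) * stride)


# Hypothetical head output for one cell of the stride-8 feature map (16 bins per side)
peaked = [-1e9] * 16
peaked[2] = 0.0  # all probability mass on bin 2
distances = (dfl_expectation(peaked), 1.0, 3.0, 4.0)
print(decode_box(10, 5, distances, stride=8))  # (68.0, 36.0, 108.0, 76.0)
```

No anchor-box sizes or aspect ratios appear anywhere in the decoding, which is the reduction in hyperparameters that the paragraph refers to.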

In the YOLOv8-mini model, the lightweight CSPPC module replaced the C2f module in the Backbone, as shown in figure 1. This reduced the number of parameters and the computational cost of the model and also decreased redundancy in the feature maps. The MLCA attention mechanism was then added to the Neck to strengthen the informative content of the feature maps and further improve the localization of fabric defects. Finally, BiFPN was used to rebuild the feature fusion network, enabling a bidirectional flow of feature information across scales and making feature extraction more efficient.

Figure 1. The architecture of YOLOv8-mini.

3.2. Cross stage partial and partial convolution

In this section, we address the challenges associated with traditional target detection models in fabric defect detection. These models often have a large number of parameters and complex computations, which can lead to inefficiencies in processing time and memory usage. To overcome these limitations, we propose a novel lightweight module called Cross Stage Partial & Partial Convolution (CSPPC).

The CSPPC module is designed to reduce computational effort while remaining compatible with the subsequent network structure. It consists of two 1 × 1 convolutional layers and several CSPPCBottleneck modules. The first 1 × 1 convolution scales the input features, and a Split operation then divides them into separate paths to enable efficient data flow. Multiple CSPPCBottleneck modules process the features, after which the features from the different paths are combined and integrated by the second 1 × 1 convolution, which restores the channel scale for output. Figure 2 shows the internal structure of the CSPPC module.
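The channel bookkeeping of such a split-and-concatenate block can be sketched as follows. We assume, without confirmation from the text, that CSPPC keeps the wiring of the C2f module it replaces in Ultralytics YOLOv8: the first 1 × 1 convolution produces 2c channels, the Split yields two c-channel halves, each bottleneck processes the most recent map and appends one more c-channel output, and the second 1 × 1 convolution fuses the concatenation. The function names and the expansion ratio `e` are hypothetical.

```python
def csppc_channel_plan(c_in, c_out, n_bottlenecks, e=0.5):
    """Channel widths through a C2f-style block (assumed CSPPC layout)."""
    c_hidden = int(c_out * e)
    first_1x1 = (c_in, 2 * c_hidden)    # scales the input features
    paths = [c_hidden, c_hidden]        # Split into two halves
    for _ in range(n_bottlenecks):
        paths.append(c_hidden)          # each bottleneck appends one more map
    concat = sum(paths)                 # (2 + n) * c_hidden channels
    last_1x1 = (concat, c_out)          # restores the output width
    return {"first_1x1": first_1x1, "paths": paths,
            "concat": concat, "last_1x1": last_1x1}


def conv1x1_weights(c_in, c_out):
    """Weight count of a 1 x 1 convolution (bias and BatchNorm ignored)."""
    return c_in * c_out


plan = csppc_channel_plan(256, 256, n_bottlenecks=2)
print(plan["concat"])  # 512
print(conv1x1_weights(*plan["first_1x1"]) + conv1x1_weights(*plan["last_1x1"]))  # 196608
```

Because every path keeps only c_hidden channels, adding a bottleneck widens the final 1 × 1 convolution by c_hidden input channels rather than duplicating the full block width.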

Figure 2. CSPPC structure.

The proposed CSPPC module builds on the principle of Partial Convolution (PConv [24]) and inherits the design concept of the Cross Stage Partial Network (CSPNet [25]). Following PConv, the module divides the input channels into two groups: one group undergoes conventional convolution while the other remains unchanged. Specifically, each CSPPCBottleneck module applies convolution to only half of its input channels, which substantially reduces computational complexity while preserving feature diversity; the unchanged channels retain the original feature information, balancing feature extraction against computational efficiency. This selective processing reduces redundant computation on the feature maps and makes memory use during detection more efficient, giving CSPPC a lighter processing mechanism than the corresponding modules of traditional target detection models.
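The saving from convolving only part of the channels follows from the cost estimates in the PConv paper [24]: a k × k convolution over an h × w map with c input and output channels costs h·w·k²·c² multiply-accumulates, whereas PConv convolves only c_p channels and costs h·w·k²·c_p². With the half-channel split used in CSPPCBottleneck (c_p = c/2) the cost falls to a quarter; the PConv paper itself defaults to c_p = c/4, i.e. one sixteenth. The sketch below computes these estimates; the feature-map size and channel count in the example are illustrative assumptions.

```python
def conv_flops(h, w, k, c):
    """Multiply-accumulates of a k x k conv, c -> c channels, on an h x w map."""
    return h * w * k * k * c * c


def pconv_flops(h, w, k, c, ratio=0.5):
    """Partial convolution: only c_p = ratio * c channels are convolved;
    the remaining channels are passed through at no arithmetic cost."""
    c_p = int(c * ratio)
    return h * w * k * k * c_p * c_p


def memory_access(h, w, k, c):
    """Approximate memory access: read + write an h x w x c map, plus k*k*c*c weights."""
    return h * w * 2 * c + k * k * c * c


h, w, k, c = 80, 80, 3, 128  # hypothetical stride-8 map of a 640 x 640 input
print(pconv_flops(h, w, k, c) / conv_flops(h, w, k, c))              # 0.25
print(pconv_flops(h, w, k, c, ratio=0.25) / conv_flops(h, w, k, c))  # 0.0625
# For PConv, only the c_p convolved channels are read and written
c_p = c // 2
print(memory_access(h, w, k, c_p) / memory_access(h, w, k, c))
```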

3.3. Coordinate attention

In fabric defect detection, traditional target detection models often face challenges such as information overload and reduced accuracy. To overcome these limitations, we employ Coordinate Attention (CA [26]), which enhances the detection performance of a network by embedding positional information into channel attention. The motivation for using CA in fabric defect detection is to handle the large volume of visual information in textile images effectively while maintaining detection accuracy.

Because positional information is embedded into channel attention, CA weights each feature according to both its channel and its spatial position. This allows the model to prioritize regions of the fabric image that are more likely to contain defects, based on coordinate relationships, and to down-weight regions that are less likely to, so that the network's capacity is concentrated on relevant areas. This targeted approach is particularly effective in textile inspection, where defects often exhibit distinct spatial patterns and distributions.

As shown in figure 3, CA uses two parallel one-dimensional feature encoding processes that aggregate features along the horizontal and vertical spatial directions, respectively. This bidirectional aggregation produces a spatial context map that helps the model capture both local and global spatial relationships. Encoding the two directions separately lets the model capture defect patterns that extend in different orientations, which is crucial for identifying elongated defects such as yarn breaks or weaving irregularities. CA then normalizes and activates the encoded feature maps to sharpen feature differentiation, so that the most relevant spatial-channel correlations are emphasized.
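A minimal pure-Python sketch of the two directional pooling steps and the final reweighting of CA follows. In the original CA design [26], the pooled maps are concatenated, passed through a shared 1 × 1 convolution with normalization and a non-linearity, split again, and mapped by two further 1 × 1 convolutions before the sigmoid; to keep the example self-contained we replace those learned layers with the identity, so this is a simplification rather than a trainable implementation. The function names are hypothetical.

```python
import math


def coordinate_pool(x):
    """Directional average pooling for a C x H x W tensor stored as nested lists.

    z_h[c][i] averages row i over the width; z_w[c][j] averages column j
    over the height, so position along each axis is preserved.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    z_h = [[sum(x[c][i]) / W for i in range(H)] for c in range(C)]
    z_w = [[sum(x[c][i][j] for i in range(H)) / H for j in range(W)] for c in range(C)]
    return z_h, z_w


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def coordinate_attention_sketch(x):
    """y[c][i][j] = x[c][i][j] * g_h[c][i] * g_w[c][j].

    Simplification: the learned 1 x 1 convolutions of CA are replaced by the
    identity, so the gates are just sigmoids of the pooled values.
    """
    z_h, z_w = coordinate_pool(x)
    g_h = [[sigmoid(v) for v in row] for row in z_h]
    g_w = [[sigmoid(v) for v in row] for row in z_w]
    return [[[x[c][i][j] * g_h[c][i] * g_w[c][j]
              for j in range(len(x[c][i]))]
             for i in range(len(x[c]))]
            for c in range(len(x))]


x = [[[1.0, 3.0], [5.0, 7.0]]]  # one channel, 2 x 2 map
print(coordinate_pool(x))  # ([[2.0, 6.0]], [[3.0, 5.0]])
y = coordinate_attention_sketch(x)
```

Because the row gate and the column gate are kept separate, strong responses along one row raise the weight of that entire row, which is how the directional encoding favours elongated defects.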

Figure 3. CA structure.

By prioritizing features according to their spatial relevance, this coordinate-guided attention acts as an efficient information filter for complex textile images, helping the network focus on defect-relevant regions while maintaining high detection accuracy.

3.4. Mixed local channel attention

In fabric defect detection, capturing both local and global information is crucial for high accuracy and efficiency. To this end, we introduced the Mixed Local Channel Attention (MLCA [27]) module, which combines local attention with channel attention to improve feature representation and detection accuracy, as shown in figure 4. MLCA processes the input features with local average pooling (LAP) and global average pooling (GAP). LAP captures local information by summarizing the features of small regions, while GAP captures global context by summarizing the entire feature map. The pooled features are then transformed by a 1D convolution across the channel dimension, which models interactions between neighboring channels while leaving the pooled spatial layout unchanged.

Figure 4. MLCA structure.

After rearrangement, the local and global pooled features are fused through multiplication and addition operations, which incorporates global context into the feature maps. This helps the model focus on useful features and improves its ability to capture features at different scales. Finally, unpooling operations restore the features to their original spatial dimensions, so that the model can exploit both local and global information. By leveraging MLCA's ability to capture local and global information, we improved the accuracy and efficiency of our fabric defect detection system.
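The pooling, channel-convolution, fusion, and unpooling steps of MLCA can be sketched in pure Python as below. Several details are our own assumptions rather than statements from the text: the local grid size `ks`, the fixed three-tap channel kernel (learned in practice), applying that 1D convolution across channels separately at each grid cell, fusing the two sigmoid-activated branches with a weighted sum controlled by a hypothetical mixing weight `lam`, and using nearest-neighbour expansion for unpooling.

```python
import math


def local_avg_pool(fmap, ks):
    """Average-pool one H x W map into a ks x ks grid (H and W divisible by ks)."""
    H, W = len(fmap), len(fmap[0])
    bh, bw = H // ks, W // ks
    return [[sum(fmap[i][j] for i in range(p * bh, (p + 1) * bh)
                 for j in range(q * bw, (q + 1) * bw)) / (bh * bw)
             for q in range(ks)] for p in range(ks)]


def channel_conv1d(values, kernel):
    """1D convolution across the channel axis with zero padding (length preserved)."""
    n, pad = len(values), len(kernel) // 2
    return [sum(kernel[t] * values[c + t - pad]
                for t in range(len(kernel)) if 0 <= c + t - pad < n)
            for c in range(n)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def mlca_sketch(x, ks=2, kernel=(1.0, 1.0, 1.0), lam=0.5):
    """x: C x H x W nested lists. Returns x reweighted by mixed local/global attention."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    local = [local_avg_pool(x[c], ks) for c in range(C)]             # LAP: C x ks x ks
    glob = [sum(map(sum, local[c])) / (ks * ks) for c in range(C)]  # GAP: C values
    # Channel convolution at every grid cell (local branch) and once (global branch)
    local_att = [[[0.0] * ks for _ in range(ks)] for _ in range(C)]
    for p in range(ks):
        for q in range(ks):
            mixed = channel_conv1d([local[c][p][q] for c in range(C)], kernel)
            for c in range(C):
                local_att[c][p][q] = sigmoid(mixed[c])
    glob_att = [sigmoid(v) for v in channel_conv1d(glob, kernel)]
    # Fuse the branches, unpool each grid cell over its block, and reweight x
    return [[[x[c][i][j] * ((1 - lam) * local_att[c][i * ks // H][j * ks // W]
                            + lam * glob_att[c])
              for j in range(W)] for i in range(H)] for c in range(C)]


fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(local_avg_pool(fmap, 2))  # [[3.5, 5.5], [11.5, 13.5]]
ones = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]  # 2 channels, 4 x 4
out = mlca_sketch(ones)
print(round(out[0][0][0], 4))  # 0.8808
```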

3.5. Bi-directional feature pyramid networks

In fabric defect detection, efficient feature fusion is crucial for achieving high accuracy and efficiency. To address this challenge, we introduced BiFPN as an optimized feature fusion network to improve the performance of our model.

BiFPN fuses features of different scales more effectively by establishing bidirectional (top-down and bottom-up) paths, so that information flows and fuses across scales more efficiently [28]. Its weighted fusion lets the network pay more attention to the more informative input features, which improves the robustness of our fabric defect detection system.
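BiFPN [28] weights its inputs with fast normalized fusion, O = Σ_i w_i·I_i / (ε + Σ_j w_j), where a ReLU keeps each learnable weight w_i non-negative. The text does not state which fusion variant our implementation uses, so the sketch below, which applies this rule to flattened feature maps of equal size, is illustrative; the weights and feature values in the example are made up.

```python
def fast_normalized_fusion(inputs, weights, eps=1e-4):
    """Fuse equally sized feature maps (flattened to lists) with normalized weights."""
    w = [max(0.0, wi) for wi in weights]  # ReLU keeps each weight non-negative
    denom = eps + sum(w)
    return [sum(w[k] * feat[i] for k, feat in enumerate(inputs)) / denom
            for i in range(len(inputs[0]))]


# Two resized inputs to one fusion node; the second is weighted three times as heavily
p4_in = [1.0, 2.0, 3.0]
p5_up = [5.0, 6.0, 7.0]
print(fast_normalized_fusion([p4_in, p5_up], [1.0, 3.0], eps=0.0))  # [4.0, 5.0, 6.0]
```

Because the normalized weights sum to just under one, each fused output stays on the scale of its inputs, and training can push the weight of a less informative input towards zero.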

In our implementation (figure 5(b)), BiFPN streamlines feature fusion in two ways. First, nodes with only one input edge, which contribute little to feature fusion, are removed, yielding a simplified bidirectional network. Second, when an original input node and an output node are at the same level, an extra edge is added from the input to the output, so that more features are fused at little extra cost. By leveraging BiFPN's efficient multi-scale feature fusion, we improved the accuracy and efficiency of our fabric defect detection system.
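The two simplifications can be made concrete with a small graph builder for one BiFPN layer. The three pyramid levels P3 to P5 match the outputs that the YOLOv8 neck passes to its detection head, but the exact topology of figure 5(b) is not given in the text, so the node names and wiring here follow the generic BiFPN layout of [28] and should be read as an assumption.

```python
def bifpn_layer(levels):
    """Return {node: [input nodes]} for one BiFPN layer.

    levels are ordered from highest to lowest resolution, e.g. ["P3", "P4", "P5"].
    """
    graph = {}
    # Top-down path. The lowest-resolution level gets no intermediate node:
    # it would have a single input edge, so BiFPN removes it.
    prev = f"{levels[-1]}_in"
    for lvl in reversed(levels[1:-1]):
        graph[f"{lvl}_td"] = [f"{lvl}_in", prev]
        prev = f"{lvl}_td"
    graph[f"{levels[0]}_out"] = [f"{levels[0]}_in", prev]
    prev = f"{levels[0]}_out"
    # Bottom-up path. Each output node also receives the original same-level
    # input; for the middle levels this is the extra skip edge.
    for lvl in levels[1:]:
        inputs = [f"{lvl}_in"]
        if f"{lvl}_td" in graph:
            inputs.append(f"{lvl}_td")
        inputs.append(prev)
        graph[f"{lvl}_out"] = inputs
        prev = f"{lvl}_out"
    return graph


for node, srcs in bifpn_layer(["P3", "P4", "P5"]).items():
    print(node, "<-", srcs)
```

Every node in the resulting graph has at least two inputs, and only the middle level P4 gains the extra same-level edge alongside its top-down node.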

Figure 5. (a) Original neck part. (b) Improved neck part.

4. Experiment

4.1. Dataset

The dataset used in this experiment comes from the Xuelang Manufacturing AI Challenge [29]. The training data consist of 706 images, each 2560 × 920 pixels, covering 46 defect types, including common defects such as scratch holes, stains, threading, and thin fabric. To mitigate sample imbalance and improve the robustness of the model, we applied data augmentation, which reduces the model's sensitivity to individual images and its over-dependence on particular features, helps avoid overfitting, and enriches the original dataset. The augmentation techniques included random rotation, cropping, translation, noise addition, flipping, and changes to the brightness, chroma, and saturation of the samples. Augmenting the 706 images expanded the dataset to 4236 images (six per original image), improving the generalization ability of the model. The dataset was then divided into training, validation, and test sets in a ratio of 8:1:1.
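The arithmetic of the expansion (706 × 6 = 4236) and of the 8:1:1 split can be sketched as below. Splitting by source image, so that every augmented copy of a photo lands in the same subset as its original, is a design choice we add for illustration (the text does not say how the split was drawn); it keeps near-duplicate images out of both the training and test sets at once. The function name, the seed, and the (source_id, copy_index) tuples are hypothetical.

```python
import random


def split_by_source(num_sources, copies_per_source, ratios=(8, 1, 1), seed=0):
    """Split source images by the given ratios, then expand each into its augmented copies."""
    ids = list(range(num_sources))
    random.Random(seed).shuffle(ids)
    total = sum(ratios)
    n_train = num_sources * ratios[0] // total
    n_val = num_sources * ratios[1] // total
    groups = {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # the remainder goes to the test set
    }
    # Copy index 0 stands for the original image, higher indices for augmented versions
    return {name: [(src, k) for src in srcs for k in range(copies_per_source)]
            for name, srcs in groups.items()}


splits = split_by_source(706, 6)
print({name: len(items) for name, items in splits.items()})
# {'train': 3384, 'val': 420, 'test': 432}
```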

4.2. Experimental setup

The experimental equipment is listed in table 1. The input image size is 640 × 640, each training run lasts 150 epochs, the batch size is 16, and the initial learning rate is 0.01.

Table 1. Experimental setup.

Component	Parameters
CPU	Intel(R) 8352V @ 2.10 GHz
GPU	RTX 4090 (24 GB)
CUDA	11.3
Python	3.8
Framework	PyTorch 1.10.0
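For readers reproducing the setup, the hyperparameters of section 4.2 map onto Ultralytics-style training arguments as below (`imgsz`, `epochs`, `batch`, and `lr0` are existing Ultralytics argument names; the dataset YAML path in the comment is hypothetical). Other settings, such as the optimizer and learning-rate schedule, are not stated in the text and are not filled in here. The snippet also estimates the number of optimizer steps, assuming the training set holds about 80% of the 4236 images.

```python
import math

# Hyperparameters reported in section 4.2
train_args = {
    "imgsz": 640,   # input images resized to 640 x 640
    "epochs": 150,  # training epochs
    "batch": 16,    # batch size
    "lr0": 0.01,    # initial learning rate
}
# With Ultralytics installed, these could be passed as, e.g.:
#   YOLO("yolov8m.yaml").train(data="fabric.yaml", **train_args)  # fabric.yaml is hypothetical


def steps_per_epoch(num_images, batch):
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(num_images / batch)


num_train = int(4236 * 0.8)  # about 3388 training images under an 8:1:1 split
per_epoch = steps_per_epoch(num_train, train_args["batch"])
print(per_epoch, per_epoch * train_args["epochs"])  # 212 31800
```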

4.3. Evaluation criteria

To verify that the trained detection model performs well, we evaluate the improved model comprehensively with standard object detection metrics: precision, recall, average precision (AP), mean average precision (mAP), number of parameters, and floating-point operations (FLOPs). These metrics are briefly described below.

Precision (P): Precision is the proportion of samples predicted as positive by the model that are actually positive. It measures how reliable the model's positive predictions are.

Recall (R): Unlike precision, recall is the proportion of actual positive samples that the model correctly identifies. It measures how well the model covers all real positives, that is, how few true defects are missed.

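Average Precision (AP): AP is the area under the precision-recall curve, and mean Average Precision (mAP) is the mean of AP over all N categories; mAP50 denotes mAP computed with an Intersection over Union (IoU) threshold of 0.5 for counting a detection as correct. With TP, FP, and FN denoting true positives, false positives, and false negatives, these metrics are computed as follows: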

$$P=\frac{TP}{TP+FP} \tag{1}$$

$$R=\frac{TP}{TP+FN} \tag{2}$$

$$AP=\int_{0}^{1}P(R)\,dR \tag{3}$$

$$mAP=\frac{1}{N}\sum_{i=1}^{N}AP_{i} \tag{4}$$
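As a concrete reading of equations (1) to (4), the sketch below computes P, R, the IoU used to decide whether a detection counts as a true positive, per-class AP by all-point interpolation of the precision-recall curve (the PASCAL VOC 2010+ convention), and mAP. Evaluation toolkits differ in how they interpolate the curve (Ultralytics, for instance, samples it at 101 recall points), and the text does not specify the variant used, so treat this as one standard approximation of equation (3). Function names and inputs are hypothetical.

```python
def precision(tp, fp):
    """Equation (1)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Equation (2)."""
    return tp / (tp + fn) if tp + fn else 0.0


def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2); mAP50 counts a match as TP when IoU >= 0.5."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(scores, is_tp, num_gt):
    """Equation (3) for one class, by all-point interpolation of the P-R curve.

    scores: detection confidences; is_tp: whether each detection matched a
    ground-truth box; num_gt: number of ground-truth boxes of this class.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    rec, prec = [], []
    for i in order:  # sweep the confidence threshold from high to low
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        rec.append(tp / num_gt)
        prec.append(tp / (tp + fp))
    mrec = [0.0] + rec + [1.0]
    mpre = [1.0] + prec + [0.0]
    for k in range(len(mpre) - 2, -1, -1):  # make precision non-increasing in recall
        mpre[k] = max(mpre[k], mpre[k + 1])
    return sum((mrec[k + 1] - mrec[k]) * mpre[k + 1] for k in range(len(mrec) - 1))


def mean_average_precision(ap_per_class):
    """Equation (4): mean of AP over the N classes."""
    return sum(ap_per_class) / len(ap_per_class)


ap = average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2)
print(round(ap, 4))  # 0.8333
```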

5. Results and discussion

5.1. Implementation details

This experiment compares six representative object detection models, SSD [30], EfficientDet [31], CenterNet [32], YOLOv7 [33], PRC-Light YOLO [34], and YOLOv8m, with our improved model, YOLOv8-mini, on fabric defect detection. We assess their trade-offs between precision, recall, computational cost, and model size, as shown in table 2.

Table 2. Comparison of different models.

Model	P/%	R/%	Params/M	FLOPs/G	mAP50/%
SSD [30]	83.2	52.9	105.2	87.4	73.1
EfficientDet [31]	78.7	49.2	52.1	34.9	69.7
CenterNet [32]	83.8	54.7	125	69.6	75.6
YOLOv7 [33]	82.6	69.7	37.2	105.2	75.4
PRC-Light YOLO [34]	85.9	78.4	30.5	83.6	83.0
YOLOv8m [23]	98.0	91.0	25.9	79.3	89.3
YOLOv8-mini	96.8	93.0	18.8	65.1	93.3

EfficientDet, designed as an efficient detector, has the lowest computational cost (34.9 GFLOPs) but a relatively low mAP50 (69.7%) compared with the other algorithms. This is likely because its EfficientNet backbone reduces computational cost at the expense of some precision.

Compared with existing detectors, the proposed YOLOv8-mini shows clear improvements across multiple metrics. Recall (R) increased by 40.1 percentage points over SSD, 43.8 over EfficientDet, 38.3 over CenterNet, 23.3 over YOLOv7, 14.6 over PRC-Light YOLO, and 2.0 over YOLOv8m. Higher recall means that YOLOv8-mini misses fewer true defects, which is particularly important in fabric defect detection, where missed defects can be costly.

In terms of model parameters (Params), YOLOv8-mini is considerably smaller: 82.1% fewer than SSD, 63.9% fewer than EfficientDet, 85.0% fewer than CenterNet, 49.5% fewer than YOLOv7, 38.4% fewer than PRC-Light YOLO, and 27.4% fewer than YOLOv8m. This reduction reflects the efficiency of the architecture and its potential for deployment in resource-constrained environments.

The mean Average Precision (mAP50) also improved: by 20.2 percentage points over SSD, 23.6 over EfficientDet, 17.7 over CenterNet, 17.9 over YOLOv7, 10.3 over PRC-Light YOLO, and 4.0 over YOLOv8m. Together with the higher recall, these gains indicate that YOLOv8-mini achieves better overall detection accuracy while remaining efficient.

Regarding computational cost (GFLOPs), YOLOv8-mini (65.1 G) is lighter than SSD, CenterNet, YOLOv7, PRC-Light YOLO, and YOLOv8m, though heavier than EfficientDet. In precision (P), YOLOv8-mini outperforms SSD, EfficientDet, CenterNet, YOLOv7, and PRC-Light YOLO, with only a marginal decrease of 1.2 percentage points relative to YOLOv8m. This small loss in precision is outweighed by the gains in recall, mAP50, and model size.
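The comparisons above mix two kinds of difference: recall and mAP50 gains are absolute differences in percentage points between table 2 entries, whereas parameter savings are relative reductions. The short script below recomputes both from table 2 so the figures can be checked; it only re-derives numbers already in the table, and the helper names are hypothetical.

```python
# Table 2 values for the baselines: (P %, R %, Params M, GFLOPs, mAP50 %)
baselines = {
    "SSD": (83.2, 52.9, 105.2, 87.4, 73.1),
    "EfficientDet": (78.7, 49.2, 52.1, 34.9, 69.7),
    "CenterNet": (83.8, 54.7, 125.0, 69.6, 75.6),
    "YOLOv7": (82.6, 69.7, 37.2, 105.2, 75.4),
    "PRC-Light YOLO": (85.9, 78.4, 30.5, 83.6, 83.0),
    "YOLOv8m": (98.0, 91.0, 25.9, 79.3, 89.3),
}
ours = (96.8, 93.0, 18.8, 65.1, 93.3)  # YOLOv8-mini


def point_gain(idx, name):
    """Absolute difference in percentage points (used for R and mAP50)."""
    return round(ours[idx] - baselines[name][idx], 1)


def relative_reduction(idx, name):
    """Relative reduction in percent (used for Params)."""
    base = baselines[name][idx]
    return round(100 * (base - ours[idx]) / base, 1)


for name in baselines:
    print(f"{name:15s} R +{point_gain(1, name)} pp  mAP50 +{point_gain(4, name)} pp  "
          f"Params -{relative_reduction(2, name)}%")
```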

As shown in table 3, YOLOv8-mini performs well across most fabric defect categories, with mAP50 above 90% for all listed categories but one. It reaches 99.5% mAP50 on challenging targets such as subtle oil stains, stains, retouching, and yellow stains, which supports its effectiveness for fabric defect detection. However, mAP50 for the 'scarf' category is notably lower, at 77.3%. Several factors may contribute. First, scarf defects typically present complex geometric patterns that vary significantly in size and orientation, making them harder to detect consistently. Second, they often resemble normal variations in fabric texture, particularly under certain lighting conditions, which can lead to false negatives. Third, our analysis suggests that the relative scarcity of scarf samples in the training dataset may have limited the model's ability to learn robust features for this category. Despite this limitation, the overall performance of YOLOv8-mini remains strong, which we attribute to its key architectural components: the CSPPC module reduces feature map redundancy, the MLCA attention mechanism improves the recognition of non-obvious defects, and BiFPN feature fusion further improves detection accuracy. Together they balance a lightweight design with high precision.

Table 3. mAP50 of YOLOv8-mini on selected fabric defect categories.

Defect	mAP50/%	Defect	mAP50/%	Defect	mAP50/%
Hole cleaning	96.4	Oil stain	99.5	Capillus	95.5
Darts	90.3	Hole	99.5	Back edge	99.5
Picking	90.3	Hair spot	94.2	Yellow stain	99.5
Thin tissue	95.6	Stain	99.5	Tight yarn	99.5
Suspending warp	93.2	Bow yarn	90.8	Shear hole	99.5
Scarf	77.3	Retouching	99.5	Mispick	99.5

Figure 6 shows the prediction results. Except for puncture defects, whose detection score is 0.8, all other defect types reach detection scores of 0.9.

Figure 6. Results from YOLOv8-mini.

5.2. Ablation study

To understand the impact of the CSPPC module in our proposed method, we conducted an ablation study where the C2f module was replaced by the CSPPC module in the Neck part. We aimed to evaluate how this substitution affects model complexity and detection accuracy. The experiments were conducted under the same setup as before, with the only difference being the module replacement.

We assessed model performance using Params, GFLOPs, and mAP50. The results, presented in table 4, show that replacing the C2f module with the CSPPC module significantly reduced model complexity: Params decreased by 22% (from 25.9M to 20.2M), and GFLOPs dropped by 18% (from 79.3G to 65.0G). This reduction in complexity did not come at the expense of detection accuracy; mAP50 rose by 1.0 percentage point (from 89.3% to 90.3%).

Table 4. CSPPC comparison experiment.

Model	Params/M	GFLOPs/G	mAP50/%
YOLOv8m	25.9	79.3	89.3
YOLOv8m+CSPPC	20.2	65.0	90.3
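The relative reductions quoted above follow directly from table 4. A minimal Python check, using only the values in the table (the helper name `relative_change` is illustrative), makes the arithmetic explicit and shows why the mAP50 change is reported in percentage points rather than as a relative change:

```python
# Values copied from table 4: baseline YOLOv8m vs. YOLOv8m with CSPPC in the Neck.
BASELINE = {"params_m": 25.9, "gflops": 79.3, "map50": 89.3}
WITH_CSPPC = {"params_m": 20.2, "gflops": 65.0, "map50": 90.3}


def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`; negative values are reductions."""
    return (after - before) / before * 100.0


params_change = relative_change(BASELINE["params_m"], WITH_CSPPC["params_m"])
gflops_change = relative_change(BASELINE["gflops"], WITH_CSPPC["gflops"])
# mAP50 is already a percentage, so report the absolute difference in points.
map50_gain = WITH_CSPPC["map50"] - BASELINE["map50"]

print(f"Params: {params_change:+.1f}%")                 # Params: -22.0%
print(f"GFLOPs: {gflops_change:+.1f}%")                 # GFLOPs: -18.0%
print(f"mAP50:  {map50_gain:+.1f} percentage points")   # mAP50:  +1.0 percentage points
```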

The ablation study indicates that the CSPPC module reduces model complexity without any loss in overall accuracy. The reductions in Params and GFLOPs demonstrate the efficiency of the CSPPC module in feature fusion, and the small gain in mAP50 shows that the lighter module does not trade overall accuracy for efficiency, although accuracy still drops for some individual defect types, as the per-defect results in tables 5 and 6 show.

Figures 7 and 8 provide further insight into the impact of the CSPPC module: figure 7 shows the training prediction results, and figure 8 shows the performance evaluation. To evaluate the module further, we compared per-defect detection accuracy with the C2f module (table 5) and with the CSPPC module (table 6). The CSPPC module markedly improved detection of several defect types, most notably retouching and capillus, whose mAP50 rose by 41.5 and 74.6 percentage points, respectively. However, mAP50 decreased for other types, most notably picking and shear hole, by 11.2 and 22.1 percentage points, respectively. These results suggest that the CSPPC module captures more detailed features for several defect types, and table 4 shows that the net effect on overall mAP50 is positive.

Figure 7. Detection result.

Figure 8. Performance evaluation.

Table 5. Per-defect mAP50 (%) with the original C2f module (before improvement).

Defect type	mAP50/%	Defect type	mAP50/%	Defect type	mAP50/%
Hole cleaning	93.1	Oil stain	99.5	Capillus	24.9
Darts	87.6	Hole	99.5	Back edge	99.5
Picking	96.3	Hair spot	85.7	Yellow stain	94.1
Thin tissue	95.3	Stain	98.2	Tight yarn	99.5
Suspending warp	90.4	Bow yarn	91.8	Shear hole	84.0
Scarf	68.0	Retouching	58.0	Mispick	99.5

Table 6. Per-defect mAP50 (%) after replacing the C2f module with the CSPPC module.

Defect type	mAP50/%	Defect type	mAP50/%	Defect type	mAP50/%
Hole cleaning	93.4	Oil stain	99.5	Capillus	99.5
Darts	84.4	Hole	99.5	Back edge	99.5
Picking	85.1	Hair spot	83.3	Yellow stain	99.5
Thin tissue	92.2	Stain	99.4	Tight yarn	99.5
Suspending warp	89.0	Bow yarn	83.5	Shear hole	61.9
Scarf	64.1	Retouching	99.5	Mispick	99.5
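The per-defect changes discussed above can be recomputed from tables 5 and 6. The short Python sketch below copies the table values (the dictionary and function names are illustrative) and sorts defect types into improved, declined, and unchanged groups, which shows how mixed the per-category effect of the CSPPC module is even though overall mAP50 rises:

```python
# Per-defect mAP50 (%) copied from table 5 (C2f) and table 6 (CSPPC).
C2F = {
    "Hole cleaning": 93.1, "Darts": 87.6, "Picking": 96.3, "Thin tissue": 95.3,
    "Suspending warp": 90.4, "Scarf": 68.0, "Oil stain": 99.5, "Hole": 99.5,
    "Hair spot": 85.7, "Stain": 98.2, "Bow yarn": 91.8, "Retouching": 58.0,
    "Capillus": 24.9, "Back edge": 99.5, "Yellow stain": 94.1,
    "Tight yarn": 99.5, "Shear hole": 84.0, "Mispick": 99.5,
}
CSPPC = {
    "Hole cleaning": 93.4, "Darts": 84.4, "Picking": 85.1, "Thin tissue": 92.2,
    "Suspending warp": 89.0, "Scarf": 64.1, "Oil stain": 99.5, "Hole": 99.5,
    "Hair spot": 83.3, "Stain": 99.4, "Bow yarn": 83.5, "Retouching": 99.5,
    "Capillus": 99.5, "Back edge": 99.5, "Yellow stain": 99.5,
    "Tight yarn": 99.5, "Shear hole": 61.9, "Mispick": 99.5,
}


def per_defect_delta(before: dict, after: dict) -> dict:
    """Change in mAP50 per defect type, in percentage points, rounded to 0.1."""
    return {name: round(after[name] - before[name], 1) for name in before}


deltas = per_defect_delta(C2F, CSPPC)
improved = [n for n, d in deltas.items() if d > 0]
declined = [n for n, d in deltas.items() if d < 0]
unchanged = [n for n, d in deltas.items() if d == 0]

for name, d in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{name:16s} {d:+.1f}")
print(len(improved), "improved,", len(declined), "declined,", len(unchanged), "unchanged")
```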

The performance evaluation results are presented in figure 8. Although convergence is slower after the CSPPC module is introduced, both key performance indicators, mAP50 and precision, improve.

Figure 8 also shows that, after the C2f module is replaced with the CSPPC module, the detection performance for suspending warp defects trends clearly upward. These results support the effectiveness of the CSPPC module in improving detection accuracy.

We attribute these results to the combination of CSP and PConv in the CSPPC module, which both reduces computation and improves detection accuracy. CSP enhances the learning capability of the network, increases feature diversity, and reduces computation. PConv, in turn, reduces redundant computation and memory access, which improves the operating efficiency of the model and lowers its computational cost. The C2f module performs well in feature fusion: it effectively integrates feature information from different levels, which makes its detection accuracy superior to that of the CSPPC module for large-target defects. However, the C2f module cannot reduce redundant computation and memory access as the CSPPC module does, and it performs worse on small targets. Finally, although replacing the C2f module with the CSPPC module in the Neck yields good overall performance, the same replacement in the Backbone degrades performance.
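The saving attributed to PConv can be made concrete with the cost model from the FasterNet work that introduced PConv: a regular k x k convolution over c channels costs h·w·k²·c² FLOPs, whereas PConv convolves only c_p channels and passes the rest through, costing h·w·k²·c_p². The Python sketch below implements that cost model; the layer size (40 x 40, 3 x 3, 256 channels) and the partial ratio of 1/4 are illustrative assumptions, not the configuration used in YOLOv8-mini:

```python
def conv_flops(h: int, w: int, k: int, c: int) -> int:
    """FLOPs (multiply-accumulates) of a regular k x k conv with c input and c output channels."""
    return h * w * k * k * c * c


def pconv_flops(h: int, w: int, k: int, c: int, ratio: float = 0.25) -> int:
    """PConv convolves only c_p = ratio * c channels; the remaining channels pass through."""
    c_p = int(c * ratio)
    return h * w * k * k * c_p * c_p


def conv_memory_access(h: int, w: int, k: int, c: int) -> int:
    """Approximate memory access: read and write the feature map, plus the kernel weights."""
    return h * w * 2 * c + k * k * c * c


def pconv_memory_access(h: int, w: int, k: int, c: int, ratio: float = 0.25) -> int:
    """Same cost model restricted to the c_p convolved channels."""
    c_p = int(c * ratio)
    return h * w * 2 * c_p + k * k * c_p * c_p


# Illustrative layer (assumed values): 40 x 40 feature map, 3 x 3 kernel, 256 channels.
H, W, K, C = 40, 40, 3, 256
flops_ratio = pconv_flops(H, W, K, C) / conv_flops(H, W, K, C)
mem_ratio = pconv_memory_access(H, W, K, C) / conv_memory_access(H, W, K, C)
print(f"PConv FLOPs relative to conv: {flops_ratio:.4f}")  # 0.0625, i.e. ratio**2
print(f"PConv memory access relative to conv: {mem_ratio:.3f}")
```

Because FLOPs scale with the square of the convolved channel count, a partial ratio of 1/4 cuts them to 1/16, while memory access falls by a smaller factor since the feature-map term scales only linearly with c_p.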

As tables 7–9 show, adding the CA attention mechanism increased mAP50 by 0.9 percentage points. The MLCA attention mechanism increased mAP50 by 1.4 percentage points over the baseline (0.5 points more than CA), and it matched or exceeded CA on every listed defect type except thin tissue and bow yarn. Although the CA attention mechanism improves model performance, it has clear disadvantages compared with MLCA. First, its computational cost is higher (79.2 versus 79.1 GFLOPs in table 7), which may consume considerable resources on large-scale data or complex tasks. Second, it becomes less efficient on large feature maps, because the overhead of encoding positional information along long spatial dimensions increases significantly. MLCA also processes spatial information more comprehensively. Although CA integrates location information into channel attention, its capture of features in the spatial dimension may still be limited. In contrast, MLCA combines local average pooling and global average pooling to extract key information from the input feature map, attending to local details while also capturing global context. In addition, MLCA improves model performance without significantly increasing the computational burden.
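The mechanism described above, local and global average pooling feeding a lightweight channel attention, can be illustrated with a simplified, dependency-free Python sketch. It is not the published MLCA implementation: the function `mlca_like` is hypothetical, the 1D channel kernel is fixed here although it is learned in practice, and the grid size `s` and the 0.5 local/global mixing weight are illustrative assumptions:

```python
import math
from typing import List, Sequence

Tensor3 = List[List[List[float]]]  # feature map indexed as [channel][row][col]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def local_avg_pool(x: Tensor3, s: int) -> Tensor3:
    """Average-pool every H x W channel to an s x s grid (H and W assumed divisible by s)."""
    bh, bw = len(x[0]) // s, len(x[0][0]) // s
    pooled = []
    for ch in x:
        grid = []
        for i in range(s):
            row = []
            for j in range(s):
                block = [ch[r][q] for r in range(i * bh, (i + 1) * bh)
                         for q in range(j * bw, (j + 1) * bw)]
                row.append(sum(block) / len(block))
            grid.append(row)
        pooled.append(grid)
    return pooled


def channel_conv1d(v: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """ECA-style 1D convolution across channels, zero-padded (odd kernel length)."""
    pad, n = len(kernel) // 2, len(v)
    return [
        sum(kernel[t] * v[c + t - pad] for t in range(len(kernel)) if 0 <= c + t - pad < n)
        for c in range(n)
    ]


def mlca_like(x: Tensor3, s: int = 2, kernel: Sequence[float] = (0.25, 0.5, 0.25),
              local_weight: float = 0.5) -> Tensor3:
    """Simplified mixed local/global channel attention (hypothetical sketch, not official MLCA)."""
    num_c, h, w = len(x), len(x[0]), len(x[0][0])
    local = local_avg_pool(x, s)  # [C][s][s]: local context per grid cell
    glob = [sum(sum(r) for r in ch) / (s * s) for ch in local]  # [C]: global average
    att_global = [sigmoid(v) for v in channel_conv1d(glob, kernel)]
    att_local = [[[0.0] * s for _ in range(s)] for _ in range(num_c)]
    for i in range(s):
        for j in range(s):
            cell = channel_conv1d([local[c][i][j] for c in range(num_c)], kernel)
            for c in range(num_c):
                att_local[c][i][j] = sigmoid(cell[c])
    # Mix both attentions and apply each cell's weight to its (H/s) x (W/s) block.
    bh, bw = h // s, w // s
    return [
        [
            [
                x[c][r][q] * (local_weight * att_local[c][r // bh][q // bw]
                              + (1.0 - local_weight) * att_global[c])
                for q in range(w)
            ]
            for r in range(h)
        ]
        for c in range(num_c)
    ]


# Usage: a constant 3-channel 4 x 4 map. With an identity kernel every weight is sigmoid(1).
x = [[[1.0] * 4 for _ in range(4)] for _ in range(3)]
y = mlca_like(x, s=2, kernel=(0.0, 1.0, 0.0))
print(round(y[0][0][0], 4))  # 0.7311
```

The local branch lets each grid cell reweight channels differently, while the global branch supplies one image-wide weight per channel; mixing the two is what allows the attention to reflect local detail and global context at once.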

Table 7. The comparison of different attention mechanisms.

Model	Params/M	GFLOPs/G	mAP50/%
YOLOv8m	25.9	79.3	89.3
YOLOv8m+CA	25.8	79.2	90.2
YOLOv8m+MLCA	25.8	79.1	90.7

Table 8. Per-defect mAP50 (%) with the CA attention mechanism.

Defect type	mAP50/%	Defect type	mAP50/%	Defect type	mAP50/%
Hole cleaning	93.7	Oil stain	99.5	Capillus	99.5
Darts	87.4	Hole	99.5	Back edge	99.5
Picking	87.6	Hair spot	87.2	Yellow stain	99.5
Thin tissue	96.6	Stain	99.4	Tight yarn	99.5
Suspending warp	90.7	Bow yarn	94.6	Shear hole	51.5
Scarf	52.7	Retouching	52.0	Mispick	99.5

Table 9. Per-defect mAP50 (%) with the MLCA attention mechanism.

Defect type	mAP50/%	Defect type	mAP50/%	Defect type	mAP50/%
Hole cleaning	95.6	Oil stain	99.5	Capillus	99.7
Darts	90.1	Hole	99.5	Back edge	99.5
Picking	90.8	Hair spot	95.0	Yellow stain	99.5
Thin tissue	96.1	Stain	99.5	Tight yarn	99.5
Suspending warp	92.9	Bow yarn	89.8	Shear hole	94.8
Scarf	63.7	Retouching	99.5	Mispick	99.5

We conducted a series of experiments in which the CA and MLCA attention mechanisms were each added to the Head of YOLOv8. Figure 9 shows the training prediction results, with (a) the CA attention mechanism and (b) the MLCA attention mechanism, and figure 10 shows the performance evaluation. Both training runs converge successfully, and the detection results of the two mechanisms look similar in these figures.

Figure 9. Detection result.

Figure 10. Performance evaluation.

Replacing the PAN module with the BiFPN module in the Neck improved model performance. As shown in table 10, mAP50 increased from 89.3% to 91.0% and Params decreased from 25.9M to 23.8M, while GFLOPs rose slightly, from 79.3G to 80.2G. We attribute this improvement mainly to BiFPN's bidirectional feature flow, which optimizes feature fusion across levels.
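BiFPN fuses features at each node with a weighted sum, using the "fast normalized fusion" rule from EfficientDet: O = Σᵢ wᵢ·Iᵢ / (ε + Σⱼ wⱼ), with each weight kept non-negative by a ReLU. The Python sketch below shows one top-down node and one bottom-up node on toy 1D "feature maps". The helper names, the nearest-neighbour and average resizing, and all numeric values are illustrative assumptions; the convolution that follows each fusion in a real BiFPN is omitted:

```python
from typing import List, Sequence


def fast_normalized_fusion(
    inputs: Sequence[Sequence[float]], weights: Sequence[float], eps: float = 1e-4
) -> List[float]:
    """Fuse same-shaped (flattened) feature maps as sum_i w_i * I_i / (eps + sum_j w_j)."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input feature map")
    if len({len(f) for f in inputs}) != 1:
        raise ValueError("inputs must be resized to the same shape before fusion")
    w = [max(0.0, wi) for wi in weights]  # ReLU keeps every weight non-negative
    norm = eps + sum(w)
    return [sum(wi * f[k] for wi, f in zip(w, inputs)) / norm for k in range(len(inputs[0]))]


def upsample_nearest(feature: Sequence[float], factor: int = 2) -> List[float]:
    """Toy stand-in for resizing a coarser level up to a finer one."""
    return [v for v in feature for _ in range(factor)]


def downsample_avg(feature: Sequence[float], factor: int = 2) -> List[float]:
    """Toy stand-in for resizing a finer level down to a coarser one."""
    return [sum(feature[i:i + factor]) / factor for i in range(0, len(feature), factor)]


p4_in = [1.0, 2.0, 3.0, 4.0]  # finer level
p5_in = [10.0, 20.0]          # coarser level

# Top-down path: P5 information flows into P4.
p4_td = fast_normalized_fusion([p4_in, upsample_nearest(p5_in)], weights=[1.0, 1.0])
# Bottom-up path: the fused P4 flows back into P5.
p5_out = fast_normalized_fusion([p5_in, downsample_avg(p4_td)], weights=[1.0, 1.0])
print(p4_td)
print(p5_out)
```

Normalizing by the sum of weights keeps the fused output on the same scale as its inputs, and the small ε avoids division by zero when every learned weight is driven to zero.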

Table 10. Comparison after BiFPN improvement.

Model	Params/M	GFLOPs/G	mAP50/%
YOLOv8m	25.9	79.3	89.3
YOLOv8m+BiFPN	23.8	80.2	91.0

Table 11 shows that, after BiFPN is adopted, detection of small targets such as capillus (hair particles) and hair spots improves markedly. This indicates that BiFPN strengthens the model's ability to capture small targets, which raises overall detection accuracy.

Table 11. Per-defect mAP50 (%) after adopting BiFPN.

Defect type	mAP50/%	Defect type	mAP50/%	Defect type	mAP50/%
Hole cleaning	94.9	Oil stain	99.5	Capillus	95.5
Darts	89.1	Hole	99.5	Back edge	99.5
Picking	89.2	Hair spot	92.7	Yellow stain	99.5
Thin tissue	96.0	Stain	99.5	Tight yarn	99.5
Suspending warp	93.3	Bow yarn	91.2	Shear hole	99.5
Scarf	67.7	Retouching	99.5	Mispick	99.5

The final training run not only converges but also achieves higher mAP50 and mAP50-95. Figure 11 further shows that, after the BiFPN structure is adopted, the model's prediction scores on the fabric remain above 0.8 in the second training run. Because of the lightweight BiFPN structure in the Neck, the model did not quickly converge to a satisfactory level in the first training run, as shown in figure 12(a). To improve training, we adjusted the strategy for the second run: the best-performing model from the first run was used as the pre-trained weights, and training continued from it for 150 epochs. After this adjustment, training improved significantly, as shown in figure 12(b). In particular, the prediction scores for defects such as suspending warp, shear hole, and hair spot reach 0.9.

Figure 11. Result after BiFPN improvement.

Figure 12. Performance evaluation.

6. Conclusion

This study proposes a defect detection model named YOLOv8-mini to address three critical challenges: (1) large model parameter size, (2) heavy computational load, and (3) low detection accuracy. We propose the CSPPC module, built on PConv, which replaces some convolutional layers to make the network lightweight. The model also employs the MLCA attention mechanism, which integrates local and global average pooling to reduce computational cost while improving detection accuracy. In the feature fusion stage, BiFPN is adopted, which reduces the number of model parameters and enhances detection accuracy without significantly increasing the computational burden. Experimental results show that, compared with YOLOv8m, our model achieves (1) a 27.4% reduction in Params, (2) a 17.9% reduction in GFLOPs, and (3) a 4% improvement in mAP50.

Our work has significant implications for the textile industry, where fabric defect detection is crucial for quality control. The proposed model provides a new solution for efficient and accurate defect detection, with potential applications in various industrial settings.

While our model achieves excellent results, future research directions should focus on several key areas. First, enhancing the detection ability of YOLOv8-mini for large target defects remains a priority. Second, further computational optimization could be explored through specialized training frameworks. Recent work by [35] demonstrates how hierarchical and progressive learning strategies can improve model focus on key information without increasing computational burden, suggesting potential alternative approaches to our attention-based method. Third, additional optimization approaches could be investigated, including optimizing model structure, adopting more efficient algorithms, or introducing other technical means to improve detection efficiency.

Acknowledgments

This work is supported by the Excellent Youth Science and Technology Talents Project of Huizhou City (2023EQ050038), the Huizhou University Professor and Doctor Launch Project (2022JB006), the Open Projects Program of the State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS2024116), the Key Construction Discipline Research Project of Guangdong Province (2022ZDJS056), and the Key Research Platforms and Projects of Universities in Guangdong Province (2023KCXTD036, 2024GCZX009, 2021ZDZX1012).

Data availability statement

No new data were created or analysed in this study.