Automatic stent struts detection in optical coherence tomography based on a multiple attention convolutional model

Objective. Intravascular optical coherence tomography is a useful tool to assess stent apposition and expansion, thus guiding percutaneous coronary intervention and minimizing surgical risk. However, each OCT pullback may contain thousands of stent struts, which are tiny and dense, making manual stent labeling slow and costly in terms of medical resources. Approach. This paper proposes a multiple attention convolutional model for automatic stent strut detection in OCT images. Multiple attention mechanisms were utilized to strengthen the feature extraction and feature fusion capabilities. In addition, to precisely detect tiny stent struts, the model integrates multiple anchor boxes to predict targets in the output. Main results. The model was trained on 4625 frames of OCT images from 37 patients and tested on 1156 frames of OCT images from 9 patients, achieving a precision of 0.9790 and a recall of 0.9541, significantly better than mainstream convolutional models. In terms of detection speed, the model achieved 25.2 ms per image. OCT images from different collection systems, collection times, and challenging scenarios were tested experimentally, and the model demonstrated stable robustness, achieving precision and recall higher than 0.9630. Meanwhile, a clear 3D reconstruction of the stent was achieved. Significance. In conclusion, the proposed model addresses the slow speed and heavy demand on medical manpower of manual analysis. It enhances the detection efficiency for tiny and dense stent struts, thus facilitating the application of OCT quantitative analysis in real clinical scenarios.


Introduction
Coronary atherosclerotic heart disease is characterized by stenosis or obstruction of the vascular lumen due to coronary atherosclerosis, resulting in myocardial ischemia, hypoxia, or necrosis (Spînu et al 2022). It is one of the leading causes of death worldwide (Lloyd-Jones et al 2023). The use of intravascular optical coherence tomography (IVOCT) has become widespread in the diagnosis of coronary artery disease (Tian et al 2023). The IVOCT system decouples the axial resolution from the lateral resolution, greatly improving image resolution (Huang et al 2022). The axial resolution, typically less than 20 μm, depends on factors such as the wavelength, bandwidth, and dispersion of the light source. The lateral resolution is determined by the imaging catheter and typically ranges from 20 to 60 μm. At the current stage of coronary artery disease diagnosis, researchers have conducted extensive studies on vascular boundaries, detection of coronary artery lesions, and assessment of coronary artery stenosis. For instance, Zhi et al (2023) proposed a bilateral collaboration learning approach for vessel contour detection in intracoronary images to address the differences between the OCT and intravascular ultrasound (IVUS) domains. Lee et al (2020), Sun et al (2022), Li et al (2022a), Celi et al (2014) and Wang et al (2023) used deep learning methods to classify and segment lesions in coronary arteries. They also conducted quantitative evaluations of the segmented lesion areas, including measurements of area, depth, and angle. Coronary artery stenosis was graded based on this quantitative assessment of the lesions.
In addition, percutaneous coronary intervention (PCI) is currently an effective treatment for coronary artery disease, typically performed in conjunction with stent implantation to open narrowed blood vessels and reduce the recurrence of vessel blockage after treatment (Buccheri et al 2016). OCT images play a crucial role in accurately assessing stent apposition and neointimal coverage, thereby enhancing the accuracy of stent assessment. Clinically, several hundred images containing thousands of tiny and dense stent struts are generated for each OCT pullback. Hence, manual quantification of OCT images is a time-consuming and cumbersome task, susceptible to inter- and intra-observer variability. Automatic stent strut detection is necessary to provide quantitative data on stent struts within intraoperative time.
At present, coronary stent strut detection has been extensively studied using both traditional algorithms (Xu et al 2011, Wang et al 2012, 2014, Cao et al 2018a) and deep learning techniques (Zhou et al 2019, Wu et al 2020, Huang et al 2021, Yang et al 2021, Yu et al 2021). The stent types include bare metal stents (BMS), drug-eluting stents (DES), and bioresorbable vascular scaffolds (BVS). In OCT images, these stents appear in cross-section as highly reflective points with shadows behind them, because near-infrared light cannot penetrate the metal. Traditional algorithms mostly rely on these features to detect stent struts. Xu et al (2011) developed an automatic algorithm to locate deeply buried stent struts and quantify the restenosis burden. The technique was based on an improved steerable filter for computing the local ridge strength and orientation of OCT images, together with an ellipsoid fitting algorithm and continuity criteria for obtaining globally optimal stent localization. Wang et al (2012) proposed a global intensity distribution-based method using Prewitt compass filters to detect stent struts, which showed convincing performance. Wang et al (2014) detected and measured bioresorbable vascular scaffolds based on the black core region in baseline and follow-up OCT image sequences. Their method achieved a detection rate of 93.7% and a false positive rate of 1.8% for 4691 BVS stent struts. Cao et al (2018a) used an Adaboost-trained cascade classifier to identify a region of interest (ROI) fully encompassing each stent strut. To segment the strut boundary within the ROI, they used dynamic programming and automatically performed malapposition analysis based on the boundary segmentation results. The method was tested on 7 pullbacks involving 5821 BVS stent struts, achieving a precision of 91.5% and a false positive rate of 12.1%. Traditional algorithms can detect the positions of stent struts, but automatic detection methods with fixed thresholds carry greater uncertainty and struggle to meet the precision and recall requirements of stent detection in the medical field. Therefore, in recent years, researchers have adopted deep learning methods to avoid this problem.
Object detection and pattern recognition have made significant progress with the rapid development of deep learning. Deep learning has been widely used in intelligent detection and diagnosis based on medical images. Deep learning-based stent strut detection and analysis have demonstrated higher accuracy and robustness compared with traditional methods. Zhou et al (2019) proposed an automatic detection method for BVS based on U-shaped convolutional neural networks (U-Net). Wu et al (2020) used deep convolutional neural networks (CNN) to detect and segment intracoronary stent struts in OCT images. They improved the probability of stent strut detection and mitigated the impact of spatial information loss on tiny stent regions during feature extraction. Huang et al (2021) proposed a weakly supervised fusion U-Net method based on convolutional attention mechanisms and dilated convolution to delineate the contour of OCT bioabsorbable stents. Their approach obtained better results than U-Net, FCN, and SegNet. Yu et al (2021) proposed hybrid algorithms (U-Dense and U-Mobile) by adapting the standard U-Net, MobileNetV2 and DenseNet121 for the segmentation of both metal stents and BVS from intravascular OCT pullback images. Yang et al (2021) developed deep learning methods to automatically analyze stent struts with both thin (0.3 mm) and very thick (>0.3 mm) tissue coverage, and to accurately analyze stent area for vessels with multiple stents. In contrast to the above-mentioned two-stage deep learning algorithms, one-stage object detection algorithms, including YOLO (You Only Look Once) (Redmon et al 2016, Redmon and Farhadi 2017, 2018, Bochkovskiy et al 2020, Wang et al 2022, Li et al 2022b, 2023a, Li et al 2023b), DETR (Zhu et al 2021a), RetinaNet (Lin et al 2017), and EfficientDet (Tan et al 2020), exhibit low complexity and high speed. In particular, YOLO uses anchor boxes to combine classification with target localization, achieving high efficiency, flexibility, and good generalization performance, and has played an important role in object detection. YOLOv5 is one of the most widely applied algorithms in the YOLO series for target detection, and it exhibits excellent performance in terms of detection speed and small target detection. The detection of coronary artery stents requires high precision and recall, but YOLOv5's recall is not ideal. Therefore, we made modifications to improve its precision and recall.
In this paper, we propose a multiple attention convolutional model with a hierarchy similar to YOLOv5, consisting of Backbone, Neck and Head parts, for automatic stent strut detection. To enhance the detection of tiny and dense stent struts, we introduced the squeeze and excitation (SE) attention mechanism module into the feature extraction part of our model. This module dynamically adjusts the weight of each channel to enhance the feature extraction ability. To improve the fusion of image features of different dimensions and accurately identify the attention regions in the image, we added the convolutional block attention module (CBAM) to the feature fusion part of the model. Additionally, in the prediction part of the model, we added a smaller detection head to detect tiny stent struts in larger feature maps. The model was trained on 4625 frames of OCT images from 37 patients and tested on 1156 frames of OCT images from 9 patients; these OCT images were labeled by three experienced clinical experts. Our model achieved a precision of 0.9790 and a recall of 0.9541, outperforming Mask-RCNN (He et al 2017), Faster-RCNN (Ren et al 2015), YOLOv3 (Redmon and Farhadi 2018), YOLOv5 (Li et al 2023a), YOLOv6 (Li et al 2022b), YOLOv7 (Wang et al 2022), YOLOv8 (Li et al 2023b) and DETR (Zhu et al 2021a). In terms of detection speed, our model achieved 25.2 ms per image, which is faster than Mask-RCNN, Faster-RCNN, YOLOv3, DETR, YOLOv7 and YOLOv8 and slower than YOLOv5 and YOLOv6. At the same time, the ablation experiments showed that adding the detection head and introducing the attention mechanisms both improved detection performance. In addition, our model exhibited good and stable performance on OCT images collected with different systems, in challenging scenarios, and at different collection times. To thoroughly validate our approach, we compared its performance with three experienced analysts and with several other mainstream methods. We also evaluated the accuracy of quantitative measurements of the stent strut abscissa and ordinate.

Multiple attention convolutional model
The multiple attention convolutional model was designed based on YOLOv5, as shown in figure 1, and is divided into a Backbone for feature extraction, a Neck for feature fusion, and a Head for classification and regression of detection targets. The purple dotted boxes mark the parts modified compared to YOLOv5. The attention mechanism (Mnih et al 2014) allows the model to pay attention to important local features and ignore other irrelevant feature information in the image by giving different weights to different local features. We first improved the Backbone network of the model by adding the SE (Hu et al 2020) module at the bottom of the Backbone. The SE attention mechanism introduces a channel attention module that dynamically adjusts the weight of each channel and selects the more important channel information, thus enhancing the feature extraction ability. The CBAM (Woo et al 2018) was then introduced into the Neck. The CBAM attention mechanism weights each channel and spatial position of the feature map, helping the network focus on more important regions of the feature maps while ignoring relatively unimportant areas. This enables the network to better handle complex visual scenes and improves its detection performance. Finally, the model integrates multiple anchor boxes to predict stent struts in the Head.

Backbone modifications
The Backbone of the multiple attention convolutional model first applies a Focus layer (shown in figure 1(b)), which uses tensor reshaping and slicing to reduce spatial resolution and increase the network depth. It then employs CBS convolution layers and C3_x modules to stack features and extract deep-level features. The CBS convolution layer (shown in figure 1(c)) includes a convolution operation (Conv2d), batch normalization (BN), and the SiLU activation function to fully map target features and improve the non-linear fitting ability of the detection model. The C3_x module uses CBS and Bottleneck blocks (shown in figure 1(d)) to gradually extract intra-layer features and stack deeper networks, as shown in figure 1(e), where x represents the number of Bottleneck blocks. The Bottleneck module adopts a residual structure to extract features, allowing an increase in network width and eliminating redundant gradient information. The SPP module (shown in figure 1(f)) uses serial max pooling (Maxpool) operations to further separate the most significant context features and expand the model's receptive field. Finally, we incorporated an SE attention module to extract important channel information.
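For illustration, a minimal PyTorch sketch of the CBS block and the Focus slicing operation described above; channel widths, kernel sizes, and class names are our own assumptions rather than the authors' exact configuration:

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv2d + batch normalization + SiLU activation."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Focus(nn.Module):
    """Slice the input into four spatial sub-grids and concatenate them along the
    channel axis, halving resolution and quadrupling channels, then apply a CBS."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.conv = CBS(4 * c_in, c_out, k)

    def forward(self, x):
        # (B, C, H, W) -> (B, 4C, H/2, W/2)
        sliced = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                            x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(sliced)
```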
The SE attention mechanism allows the model to pay attention to important local features and ignore irrelevant feature information in the image by assigning different weights to local features. Therefore, introducing attention mechanisms into deep learning networks simplifies the model and speeds up computation. The SE mechanism is a channel attention module consisting of two main parts, squeeze and excitation, as shown in figure 2. The squeeze operation compresses the feature map into a vector through global average pooling (GAP) to obtain $z_c$, which is defined as follows:

$$z_c = F_{sq}(u_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} u_c(i, j) \quad (1)$$

where $u_c$ is the input feature map, $z_c$ is the output vector of the squeeze operation, $c$ represents the channel, and $H \times W$ is the spatial size of the feature map. The excitation operation includes two fully connected layers and two activation functions, the rectified linear unit (ReLU) and the sigmoid. The SE module calculates the weight value of each channel, which is then assigned to the corresponding channel to extract channel attention. The formulas are as follows:

$$s = F_{ex}(z, W) = \sigma\left(W_2\,\delta(W_1 z)\right) \quad (2)$$

$$\tilde{X}_c = s_c \cdot u_c \quad (3)$$

where $\sigma$ denotes the sigmoid function, $\delta$ the ReLU function, $W_1 \in \mathbb{R}^{(C/r)\times C}$ and $W_2 \in \mathbb{R}^{C\times (C/r)}$ are learnable weight matrices, $r$ is a scaling parameter with a size of 16, and $\tilde{X}_c$ is the output of the SE module.
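A compact PyTorch rendering of the SE block as described by equations (1)-(3); this is a sketch using the stated reduction ratio r = 16, with module naming of our own:

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention with reduction ratio r = 16."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)      # global average pooling (squeeze)
        self.excite = nn.Sequential(                # two FC layers with ReLU and sigmoid
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        z = self.squeeze(x).view(b, c)              # z_c: one scalar per channel
        s = self.excite(z).view(b, c, 1, 1)         # s_c: channel weights in (0, 1)
        return x * s                                # reweight the input feature map
```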

Neck modifications
The Neck component adopts an advanced feature fusion structure based on the Path Aggregation Network (PANet) (Liu et al 2018). PANet is a two-way fusion network combining top-down and bottom-up paths. The top-down path merges high-level context features with low-level positional features and predicts multi-scale features independently, resulting in a significant improvement in detecting small objects. The bottom-up feature pyramid structure preserves shallower positional features, further boosting the model's overall feature extraction ability.
To more effectively fuse the multi-scale features extracted by the Backbone, CBAM and SPP were incorporated into the Neck module to identify areas of interest in the image, as illustrated in figure 1. In the modified top-down path, CBAM was added after C3_6, while in the modified bottom-up path, SPP and CBAM were added before and after C3_6, respectively. The SPP layer converts feature maps of varying sizes into fixed-size feature vectors. These feature vectors are obtained through max pooling of the feature maps and are concatenated to minimize information loss.
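A minimal sketch of an SPP block of the kind described above, using serial max-pooling operations whose intermediate outputs are concatenated with the input; the pooling kernel size and channel widths are assumptions of ours, not the paper's exact values:

```python
import torch
import torch.nn as nn

def conv_bn_silu(c_in, c_out, k=1):
    """Small helper: Conv2d + BN + SiLU (stand-in for the CBS block)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, 1, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU())

class SPP(nn.Module):
    """Serial max pooling; concatenating intermediate outputs yields a fixed
    channel width while enlarging the receptive field."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_mid = c_in // 2
        self.reduce = conv_bn_silu(c_in, c_mid)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
        self.fuse = conv_bn_silu(4 * c_mid, c_out)

    def forward(self, x):
        x = self.reduce(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)          # stacked pools emulate progressively larger kernels
        return self.fuse(torch.cat([x, y1, y2, y3], dim=1))
```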
The CBAM is a simple but effective hybrid attention module that combines a channel attention module and a spatial attention module into a sequential channel-then-spatial attention structure, as shown in figure 3. The channel attention module computes the average and maximum values of the input feature map over the spatial dimensions by GAP and global max pooling (GMP), respectively. The two pooled descriptors are passed through a shared multi-layer perceptron (MLP), summed, and then passed through a sigmoid activation function to obtain the channel attention map $M_c(F)$, defined as follows:

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{GAP}(F)) + \mathrm{MLP}(\mathrm{GMP}(F))\big) = \sigma\big(W_1(W_0(F^{c}_{avg})) + W_1(W_0(F^{c}_{max}))\big) \quad (4)$$

$$F' = M_c(F) \otimes F \quad (5)$$

where $\sigma$ denotes the sigmoid function, the MLP weights $W_0$ and $W_1$ are shared for both inputs, the ReLU activation function follows $W_0$, $\otimes$ denotes element-wise multiplication, and $F'$ is the output of the channel attention module. The spatial attention module aggregates feature information from the channel attention module using a channel pooling (Channel Pool) layer composed of the GAP and GMP over the channels. It then uses a 7 × 7 convolutional layer (7 × 7 Conv) to extract spatial feature information, followed by a sigmoid activation function. This process allows the model to focus on relevant regions in the image, further improving its ability to identify and localize objects of interest. The formulas are as follows:

$$M_s(F') = \sigma\big(f^{7\times 7}([\mathrm{GAP}(F');\ \mathrm{GMP}(F')])\big) \quad (6)$$

$$F'' = M_s(F') \otimes F' \quad (7)$$

where $\sigma$ denotes the sigmoid function, $f^{7\times 7}$ represents a convolution operation with a filter size of 7 × 7, and $F''$ is the final refined output.
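A minimal PyTorch sketch of a CBAM block following equations (4)-(7); the reduction ratio and class layout are assumptions of ours:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as in equations (4)-(7)."""
    def __init__(self, channels, r=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP applied to GAP and GMP descriptors
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
        )
        # Spatial attention: 7 x 7 convolution over the channel-pooled maps
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                        # GAP branch
        mx = self.mlp(x.amax(dim=(2, 3)))                         # GMP branch
        m_c = torch.sigmoid(avg + mx).view(b, c, 1, 1)            # M_c(F)
        x = x * m_c                                               # F'
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)  # channel pool
        m_s = torch.sigmoid(self.spatial(pooled))                 # M_s(F')
        return x * m_s                                            # F'' (refined output)
```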

Head modifications
The detection Head is responsible for detecting the location and category of objects from the feature maps extracted by the Backbone and Neck. Since the stent struts in OCT images are small, we added an additional detection head with a 160 × 160 resolution to improve detection efficiency. The purple dotted box in the Head of figure 1 shows the location of the additional detection head, which is generated from low-level, high-resolution feature maps and is more sensitive to tiny stent struts. Adding this detection head improved the detection performance for small and dense objects.
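To illustrate why the extra head targets tiny struts: on a 640 × 640 input, a 160 × 160 prediction grid corresponds to a stride of 4 pixels, so each grid cell covers a much smaller image patch than the coarser heads. A small sketch of this relationship; the other strides (8, 16, 32) are the usual YOLOv5 defaults and are assumed here, not stated in the paper:

```python
def head_grids(img_size=640, strides=(4, 8, 16, 32)):
    """Grid resolution of each detection head; stride 4 is the added 160 x 160 head,
    strides 8/16/32 are the standard YOLOv5 heads (assumed)."""
    return {s: img_size // s for s in strides}

print(head_grids())  # {4: 160, 8: 80, 16: 40, 32: 20}
```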

Experimental data
The OCT images used in this study consist of 12 466 frames from 46 patients. These 12 466 frames include images without stent struts and images with stent struts. We excluded 6685 frames without stent struts and used the 5781 frames with stent struts to construct the dataset. The images in the dataset were labeled by three experienced clinical experts. The dataset was divided into training and test sets using an 8:2 ratio, with 4625 images of 37 patients in the training set and 1156 images of 9 patients in the test set. The OCT images were collected from the Chinese PLA General Hospital (Beijing, China) and Jinling Hospital (Nanjing, China) using the F1 OCT system (Nanjing Forssmann Medical Technology Co., Ltd, China) and the C7-XR™ system (LightLab Imaging/St. Jude Medical, Westford, MA), respectively. The F1 system has an axial resolution of less than 20 μm and a lateral resolution of less than 100 μm, while the C7-XR™ system has an axial
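Because the split assigns whole patients to each set (37 to training, 9 to test), a patient-level split like the following sketch is implied; the file layout, patient identifiers, and random seed below are placeholders of our own:

```python
import random
from collections import defaultdict

def split_by_patient(frames, train_ratio=0.8, seed=0):
    """frames: list of (patient_id, image_path). Splits whole patients so that no
    patient contributes frames to both the training and the test set."""
    by_patient = defaultdict(list)
    for pid, path in frames:
        by_patient[pid].append(path)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_train = round(train_ratio * len(patients))
    train = [p for pid in patients[:n_train] for p in by_patient[pid]]
    test = [p for pid in patients[n_train:] for p in by_patient[pid]]
    return train, test
```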

Implementation details
We used an NVIDIA RTX 3060 GPU for both training and testing. To ensure stable detection performance, we trained each model for 300 epochs using the Adam optimizer with an initial learning rate of 0.001. The input image size was set to 640 × 640 pixels, allowing a maximum batch size of 8. In the design of our model, most parameters, including kernel sizes and filter numbers, followed the parameter design of TPH-YOLOv5 proposed by Zhu et al (2021b). We conducted five sets of experiments in this study. Experiment 1 compared the detection performance of our model with existing mainstream models on our dataset. Experiment 2 was an ablation experiment that verified the performance improvement resulting from each model modification. Experiment 3 compared the detection performance of our proposed model on images collected from the different OCT systems. Experiment 4 evaluated the model's performance on images collected post-PCI and at follow-up. Lastly, Experiment 5 tested the model in different challenging scenarios.
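For reproducibility, the training setup reduces to roughly the following configuration; the values are taken from the text, while the config format itself is an assumption of ours:

```python
train_config = {
    "epochs": 300,
    "optimizer": "Adam",
    "initial_lr": 1e-3,
    "img_size": 640,        # input images resized to 640 x 640 pixels
    "batch_size": 8,        # maximum batch size fitting an RTX 3060 GPU
    "iou_threshold": 0.5,   # IoU above which a prediction counts as correct
}
```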

Objective evaluation indicators of experimental results
In the field of object detection, the commonly used objective evaluation indicators include Precision, Recall, and F1-score, as shown in equations (8)-(10). Precision represents the percentage of correctly detected stent struts among all predicted stent struts, while Recall reflects the percentage of correctly detected stent struts among all actual stent struts. F1-score is a composite metric that combines Precision and Recall in a mutually constraining way.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (8)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (9)$$

$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (10)$$
where TP, FP, and FN refer to the true positive, false positive, and false negative counts, respectively. TP is the number of correctly predicted stent struts, while FP is the number of non-strut regions incorrectly predicted as stent struts. FN is the number of stent struts that are not detected by the model. IOU is the intersection over union between the predicted stent strut region and the manually labeled stent strut region. Following Redmon and Farhadi (2018), Bochkovskiy et al (2020), and Li et al (2023a), we set the IOU threshold to 0.5. Therefore, when the IOU is higher than 0.5, a predicted stent strut is judged to be correct.
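A minimal sketch of how the three metrics follow from the TP/FP/FN counts once predictions are matched to labels at IOU ≥ 0.5; the greedy one-to-one matching shown here is a simplification of our own, not necessarily the authors' exact evaluation code:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def precision_recall_f1(pred_boxes, gt_boxes, thr=0.5):
    matched, tp = set(), 0
    for p in pred_boxes:  # greedily match each prediction to the best unmatched label
        candidates = [i for i in range(len(gt_boxes)) if i not in matched]
        best = max(candidates, key=lambda i: iou(p, gt_boxes[i]), default=None)
        if best is not None and iou(p, gt_boxes[best]) >= thr:
            matched.add(best)
            tp += 1
    fp = len(pred_boxes) - tp
    fn = len(gt_boxes) - tp
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    return precision, recall, f1
```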

Comparative experiments
We trained various advanced network models and ablation models on our dataset and evaluated their performance on the test set. The comparative models included two-stage networks such as Mask-RCNN and Faster-RCNN, and one-stage networks such as YOLOv3, YOLOv5, DETR, YOLOv6, YOLOv7 and YOLOv8. Table 1 demonstrates the superior performance of our model, with a detection precision of 0.9790, a recall of 0.9541, and an F1-score of 0.9664, compared with the other models. Our model achieved a detection speed of 25.2 ms per image, slower only than the YOLOv5 and YOLOv6 models.
The ablation experiments compared the performance of five models: the original YOLOv5, YOLOv5 with an additional detection head (YOLOv5+1D), YOLOv5 with SE in the Backbone, YOLOv5 with CBAMs in the Neck, and our full model. The results are shown in table 2. Considering the Precision, Recall and F1-score, the experimental results showed that adding the detection head and introducing the attention mechanisms improved detection performance.
YOLOv5 was first enhanced by adding a detection head. In comparison to the original YOLOv5, the recall of YOLOv5+1D did not improve, but the precision increased by 1.09%. The experimental results demonstrated that adding a detection head at the large-scale 160 × 160 detection layer made the model more sensitive to small stent struts, improving the detection precision for small stent struts. Furthermore, the SE and CBAM attention mechanisms were separately incorporated into YOLOv5. Compared with the original YOLOv5, YOLOv5 with the SE attention mechanism improved by 0.24% in precision and 0.92% in recall, and YOLOv5 with the CBAM attention mechanism improved by 0.14% in precision and 0.9% in recall. These experiments confirmed that introducing both attention mechanisms enhanced the performance of the model, especially the recall. Our model improved by 1.34% in precision and 1.79% in recall compared with the original YOLOv5; the precision increased to 0.9790, the recall to 0.9541, and the F1-score to 0.9664. In terms of detection speed, the increased complexity of our model architecture resulted in a slower detection speed. Our model achieved a detection time of 25.2 ms per image, which is 9.4 ms slower than YOLOv5.

Evaluation of test results
The OCT image test sets were divided according to OCT system, collection time, and challenging scenario. The systems included F1 and C7-XR™. The collection time was divided into two categories: post-PCI and follow-up. Challenging scenarios were classified into five categories: stent over side branch, stent thrombosis, severe stent malapposition, overlapping stents, and residual blood. This categorization enabled evaluation of the model's performance in different scenarios, providing a comprehensive understanding of its robustness and scalability. Table 3 presents the model test results for OCT images obtained from the different systems of the two hospitals. The testing performance for the Chinese PLA General Hospital was found to be superior to that for Jinling Hospital. Results from both clinical centers showed homogeneously high precision (>0.961), recall (>0.978) and F1-score (>0.970).
Table 4 shows the performance of different models, evaluated independently, on images collected at different stent implantation times. The proposed model achieved the best performance in both the post-PCI and follow-up scenarios. For all models except YOLOv5, detection performance on post-PCI images was better than on follow-up images.
Table 5 presents the results of the subgroup analysis for challenging scenarios, including stent over side branch, stent thrombosis, severe stent malapposition, overlapping stents, and residual blood. The analysis indicated that the model's performance in these scenarios was comparable to that on the general dataset. The maximum precision, 0.9752, was obtained when detecting severe stent malapposition. Among the challenging scenarios, the most favorable was residual blood, with precision, recall, and F1-score of 0.9720, 0.9817, and 0.9768, respectively.
Then, we also tested the model's performance on different kinds of images, and the detection and recognition results are shown in figure 5. Figures 5(a) and (b) show stent detection at the post-PCI and follow-up periods, respectively. Figures 5(c)-(g) depict the results of stent detection in challenging scenarios: (c) stent over side branch, (d) stent thrombosis, (e) severe malapposition, (f) overlapping stents, and (g) residual blood due to incomplete flushing. Our model achieved high detection performance across different detection systems, detection times, and challenging scenarios, but there were still some cases of missed detection, as shown in figures 5(h)-(j). The coloured arrows point to the missed stent struts. The missed stent struts have lower brightness or brightness similar to the surrounding tissue, resulting in unclear strut outlines and increased detection difficulty. In the experimental validation, only a few of the tens of thousands of stent struts were missed.
We calculated the correlation between the center positions of automatically detected stent struts and manually labeled stent struts. The image size was normalized in these experiments. A strong correlation was achieved between automatic and manual analysis of the stent strut abscissa (r = 0.9603, p < 0.0001, figure 6(a)). The Bland-Altman plot showed overall good consistency, with only a slight deviation between the algorithm and the analysts (95% limits of agreement 0.00097 ± 0.05021, figure 6(b)). Our algorithm also showed good correlation (r = 0.9870, p < 0.0001, figure 6(c)) and overall good consistency (95% limits of agreement −0.00209 ± 0.03467, figure 6(d)) for the stent strut ordinate.
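This agreement analysis can be reproduced with a few lines of NumPy/SciPy; the following is a sketch assuming two arrays of normalized strut-center coordinates, with variable names of our own:

```python
import numpy as np
from scipy.stats import pearsonr

def agreement(auto_coords, manual_coords):
    """Pearson correlation plus Bland-Altman bias and 95% limits of agreement
    between automatically detected and manually labeled strut coordinates."""
    auto = np.asarray(auto_coords, dtype=float)
    manual = np.asarray(manual_coords, dtype=float)
    r, p = pearsonr(auto, manual)
    diff = auto - manual
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)   # half-width of the 95% limits of agreement
    return r, p, bias, (bias - half_width, bias + half_width)
```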

3D reconstruction of stent
In an OCT system, the catheter contains an optical fiber that transmits near-infrared light and focuses it on the blood vessel wall. The catheter's rotary scanning enables 360-degree imaging, and pulling back the optical fiber captures continuous blood vessel images. We used the frame sequence of the images as the Z coordinate and the center positions of the stent struts as the abscissa and ordinate to perform 3D reconstruction of the stent. The comparison of the 3D reconstructions of stent localization in OCT images by our model and by the experts' manual annotation is shown in figure 7. Figure 7(a) shows the longitudinal section view of a coronary artery with a stent implanted, and figure 7(b) shows the split view of the stent centers. Through 3D reconstruction of the stent, it was possible to clearly determine the expansion status of the stent and whether any abnormalities occurred. The model's 3D reconstruction of the stent exhibits a high degree of structural similarity to the manually annotated stent. Some slight differences are indicated with blue arrows.
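Conceptually, the reconstruction stacks per-frame strut centers along the pullback axis; a minimal sketch of this idea follows, where the frame spacing and the plotting details are our assumptions rather than the authors' pipeline:

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection)

def reconstruct_stent(detections, frame_spacing=0.2):
    """detections: list of (frame_index, x_center, y_center) per strut. Uses the
    frame index as the Z coordinate (scaled by the assumed pullback spacing in mm)
    and the strut center as the X/Y coordinates."""
    pts = np.array([(x, y, f * frame_spacing) for f, x, y in detections])
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2], s=4)
    ax.set_xlabel("x (abscissa)")
    ax.set_ylabel("y (ordinate)")
    ax.set_zlabel("pullback (mm)")
    plt.show()
    return pts
```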

Discussion
In this study, we proposed a multiple attention convolutional model for coronary artery stent recognition and detection in OCT images. The model was based on YOLOv5 and modified to address the challenges of detecting tiny coronary artery stent struts. Specifically, we designed multiple prediction heads to automatically extract the best features for annotating stent struts. Additionally, we applied the SE and CBAM attention mechanisms to extract and fuse stent strut features in the images for more accurate detection. We trained our proposed model using a large-scale training dataset. The results on the test set demonstrated that the proposed model had satisfactory detection performance and outperformed other mainstream CNN-based methods. The ablation experiments demonstrated that all proposed modifications were effective in improving the Precision, Recall and F1-score of stent strut detection. Because we increased the complexity of our model, its detection speed of 25.2 ms per image was slower than YOLOv5 and YOLOv6, but faster than the other models listed in table 1. In the correlation analysis of center positions between automatically detected and manually labeled stent struts, the results were in good agreement for both abscissa and ordinate.
Although our model achieved high detection performance, we found a few instances of missed stent struts during extensive data validation, as shown in figures 5(h), (i), and (j). The missed stent struts have lower brightness or brightness similar to the surrounding tissue, resulting in unclear strut outlines, which prevents the desired IOU value from being achieved for stent annotation. From the analysis in table 4, the detection performance at follow-up is inferior to that post-PCI. The main reason is that follow-up images contain more stent struts with brightness similar to the tissue compared with post-PCI images. In the future, we plan to address this limitation by designing a contextual feature extraction module and by adding images containing stent struts with lower brightness or brightness similar to the tissue to the dataset.
Traditional detection methods based on stent features (Xu et al 2011, Wang et al 2012, 2014, Cao et al 2018a) achieved automatic detection and segmentation of stent struts, but they were prone to errors in challenging environments. We compared our proposed algorithm with existing deep learning-based stent strut detection methods (Jiang et al 2020, Wu et al 2020, Huang et al 2021, Yang et al 2021, Yu et al 2021), as shown in table 6. Our model's detection performance was slightly lower than that of the R-FCN network model employed by Jiang et al (2020). However, these methods were evaluated on internal datasets, since no publicly available OCT dataset exists. Meanwhile, the models of Jiang et al (2020), Wu et al (2020), Huang et al (2021), Yang et al (2021) and Yu et al (2021) have no publicly accessible code for us to train on our dataset. We therefore trained the public YOLOv3 model (Redmon and Farhadi 2018) on our dataset. The outcomes in table 1 revealed that YOLOv3 achieved a detection precision of 0.9709, approximately 1% higher than that reported by Jiang et al (2020), and a recall of 0.8807, approximately 7% lower. Considering the overall performance, YOLOv3 performed worse on our dataset than on the dataset of Jiang et al (2020).
In addition, we found that the different OCT systems yielded similar precision, but images acquired at the Chinese PLA General Hospital using the F1 system had better recall and F1-score than those acquired at Jinling Hospital using the C7-XR™ system. We then performed stent strut detection in challenging scenarios, including stent over side branch, stent thrombosis, severe stent malapposition, overlapping stents, and residual blood. We observed that these challenging scenarios could all affect the imaging characteristics of the stent struts, resulting in poorer detection performance. In addition, the detection results showed that although the model may miss some stent struts in the images, the overall detection performance was stable. In OCT images, the fan-shaped shadow caused by the catheter can hide some stent struts. During OCT image acquisition, there is continuous feature information between frames. Our model only learns features from individual frames and thus fails to fully utilize the inter-frame continuity for feature learning. In the future, we plan to incorporate 3D convolutions to capture temporal features for predicting occluded stents and to enhance feature learning between frames.
Efficient and timely stent detection can help evaluate the effectiveness of stent placement, provide medical professionals with reference information for pathological diagnosis, and minimize the risk of stent failure. Although our proposed method has some limitations, we believe that our work has great potential to facilitate personalized treatment in precise PCI and stent failure cases.

Conclusion
In this paper, we proposed a multiple attention convolutional model based on YOLOv5 for the automatic detection of coronary stent struts. We improved feature extraction and feature fusion by adding attention mechanisms to the Backbone and Neck, respectively. An additional detection head was also incorporated to improve stent strut detection performance and model stability. We tested different networks on the same dataset and validated the effectiveness of the additional prediction head and the attention mechanisms in improving model performance.
Our model achieved significantly better detection performance than mainstream CNN-based models, such as Mask-RCNN, Faster-RCNN, YOLOv3, YOLOv5, YOLOv6, YOLOv7, and YOLOv8. Furthermore, our model performed well in accurately detecting stent struts and localizing detection boxes in OCT images across different systems, collection times, and challenging scenarios. Validation studies demonstrated that our proposed method achieved good agreement with human analysts in stent detection. The model's 3D reconstruction of the stent exhibits a high degree of structural similarity to the manually annotated stent. Our proposed model can also be applied to other medical image recognition tasks. In the future, we plan to construct 3D CNNs that exploit image frame continuity to improve stent detection performance.

Figure 1. Structure of the multiple attention convolutional model.

Figure 2. Structure of the SE attention module. GAP refers to global average pooling.

Figure 3. Structure of the CBAM attention module. GAP and GMP refer to global average pooling and global max pooling, respectively.

Figure 4. OCT image preprocessing. (a) OCT image in polar coordinates. (b) OCT image in Cartesian coordinates. (c) Color OCT image rendered using ImageJ.

Figure 5. Stent strut detection results for typical images. (a) Images collected post-PCI and (b) at follow-up. (c)-(g) Typical images of stent detection in five challenging scenarios: (c) stent over side branch, (d) stent thrombosis, (e) severe malapposition, (f) overlapping stents, and (g) residual blood due to incomplete flushing. (h)-(j) Missed stent strut detections; coloured arrows point to the missed stent struts.

Figure 7. 3D reconstruction of the stent. (a) The longitudinal section view of a coronary artery with a stent implanted. (b) The split view of the stent centers.


Table 1. Performance of the different convolutional detection methods on the dataset.

Table 2. Ablation experiments on the dataset.

Table 3. Performance on images from the two OCT systems.

Table 4. Performance of different models on images collected at different times.

Table 5. Performance on images from different challenging scenarios.

Table 6. Comparison of the performance between our proposed algorithm and previous studies.