Ultrasound Image Classification of Thyroid Nodules Based on Attention Mechanism

Ultrasound imaging is currently the most common technique for examining the thyroid gland and is used for the preliminary diagnosis of benign and malignant thyroid nodules. In clinical diagnosis, judging the condition depends heavily on the physician's clinical experience, and the lower level of medical care in remote areas has led to a higher rate of misdiagnosis of benign and malignant nodules. To improve diagnostic accuracy and the effectiveness of treatment, this work proposes an attention-based model for classifying thyroid nodules as benign or malignant. First, the collected data are cleaned and pre-processed, including data enhancement, to construct a new dataset. Then, a network model based on the Lightweight Global Attention Module (LGAM) is constructed for the benign and malignant classification of nodules. Finally, experimental analysis and comparison show that the LGAM-based thyroid nodule classification model achieves better results on the dataset than the compared models, with a best classification accuracy of 95.16%.


Introduction
Accurate diagnosis of cancer has always been one of the significant issues in clinical work. Rapid and accurate assessment of suspicious lesions is essential to improve the survival rate and treatment efficiency. The clinical incidence of thyroid nodules is currently increasing year by year, with about 41-67% of the population suffering from this disease [1]. About 7-15% of these thyroid nodules are malignant tumors of the thyroid gland [2]. Only a small fraction of the many thyroid nodules found are therefore malignant, and a reliable method is needed to accurately distinguish between benign and malignant tumors. Currently, the conventional methods used by most physicians to discriminate between benign and malignant tumors are puncture biopsy and ultrasonography [3]. Of these, ultrasound imaging has become the preferred protocol for thyroid disease examination and nodule screening because it is real-time, radiation-free, and inexpensive. However, because ultrasound images are affected by echoes and scattered noise, a doctor's diagnosis of benign and malignant thyroid tumors from ultrasound images is relatively subjective and depends strongly on the doctor's clinical experience [4]. Computer-aided diagnosis of ultrasound images to distinguish benign from malignant thyroid tumors is therefore highly desirable.
In recent years, deep learning techniques have gradually been applied in the field of medical image processing. Previously, image processing of ultrasonography was mainly applied to the breast. Compared with breast ultrasound images, thyroid ultrasound images have more complex echogenic structures and require more feature extraction [5]. Therefore, most current methods are less effective when applied to thyroid images, and deep learning-based thyroid nodule classification has once again become a hot research direction [6]. Song et al. proposed a cascaded neural network that confirms the feasibility of CNNs for the detection and recognition of thyroid nodules [7]. Sun et al. proposed an FCN-AlexNet-based thyroid nodule classification method [8]. High accuracy rates can be achieved by the above methods [9]. However, these networks have a large structure and high equipment requirements, and cannot meet the real-time requirements of practical use [10].
Based on the above analysis, this paper proposes an LGAM-based ultrasound image classification model for thyroid nodules. Compared with existing studies, the model has a smaller structure and stronger real-time performance while preserving average accuracy. In addition, an improved self-attention mechanism that captures rich contextual dependencies is incorporated to solve the thyroid nodule classification task.

Construction of the data set
Ultrasound images of 79 patients with thyroid nodules were collected in a tertiary care hospital from July to September 2022. As shown in Figure 1, the original dataset consists of benign and malignant ultrasound images of thyroid nodules collected by specialized physicians. From the examination records of the 79 patients, a total of 1245 thyroid ultrasound images were obtained. To construct a new dataset suitable for network training, the following steps were performed.
Step 1: Data cleaning: Data cleaning refers to the process of reexamining and verifying the data. The original hospital ultrasound image dataset was cleaned to remove non-compliant images, such as duplicates and images with no nodules or multiple nodules, leaving 586 usable images in JPG format.
Step 2: Uniform size: Extra markers on the border of the thyroid ultrasound images, such as machine model, diagnosis time, patient information, hospital name, and other markers unrelated to nodule diagnosis, were removed to avoid the influence of surrounding information on the benign and malignant discrimination of thyroid tumors. The actual ultrasound region was cropped to obtain a square image with a resolution of 640×640 pixels; the result is recorded as dataset A.
Step 3: Data enhancement: Data enhancement was applied to dataset A, including methods such as increasing brightness and contrast, flipping, and rotating, to obtain dataset B.
Finally, a total of 3516 ultrasound images in JPG format were obtained. The dataset is divided into training and test sets in a ratio of about 9:1.
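The bookkeeping behind these numbers can be sketched as follows. The final count of 3516 is exactly six times the 586 cleaned images, which is consistent with keeping each original plus five augmented variants. The specific operation names, the file names, and the helpers `augment_names` and `split_dataset` are illustrative assumptions rather than the authors' pipeline, and an exact 9:1 split is assumed where the text says "about 9:1".

```python
import random

# Hypothetical variant labels: the paper names brightness, contrast,
# flipping and rotation, but does not list the exact operations used.
VARIANTS = ["orig", "bright", "contrast", "hflip", "vflip", "rot"]


def augment_names(name):
    """Return the file names of one image and its assumed augmented variants."""
    stem = name.rsplit(".", 1)[0]
    return [f"{stem}_{op}.jpg" for op in VARIANTS]


def split_dataset(items, train_ratio=0.9, seed=0):
    """Shuffle reproducibly and split into (train, test) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_ratio)
    return items[:n_train], items[n_train:]


# Dataset A: 586 cleaned, cropped images (placeholder file names).
dataset_a = [f"img_{i:04d}.jpg" for i in range(586)]
# Dataset B: every image expanded to six variants, 586 * 6 = 3516.
dataset_b = [v for name in dataset_a for v in augment_names(name)]

train, test = split_dataset(dataset_b)
print(len(dataset_b), len(train), len(test))  # 3516 3164 352
```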

Construction of LGAM network
As shown in Figure 2, to improve the diagnostic efficiency of the thyroid nodule ultrasonography model while ensuring its accuracy, this paper proposes a residual network model based on a lightweight global attention module (LGAM) for nodule classification. Specifically, the first layer C1 contains a convolutional layer with a kernel size of 7 and a max pooling layer with a size of 3. The following layer C2 contains two convolutional layers with a kernel size of 3 and a lightweight global attention module, with a shortcut connection applied in the module. C3, C4, and C5 have the same structure as C2. Finally, a mean pooling layer and a fully connected layer produce the benign or malignant classification result for the nodule.
Most existing deep learning-based thyroid nodule classification network models contain more than 50 convolutional layers; these networks perform deep feature extraction by stacking a large number of convolutional layers, which leads to low efficiency [11]. Therefore, this paper proposes a lightweight global attention module that models the global context in order to capture the dependencies between long-range features.
As shown in Figure 3, the original feature map A is fed into three branches, each with a convolutional layer with a 1 × 1 kernel, to obtain feature maps B, C, and D respectively. After an average pooling operation, feature map C is matrix-multiplied with feature map B, and a Softmax operation then yields feature map E. Feature map D, after an average pooling operation, is matrix-multiplied with feature map E and then upsampled. The result is superimposed on the original feature map A in the channel dimension to obtain the final attention feature map F.
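One possible reading of the module described above can be sketched in PyTorch. The text leaves several details open, so the following are assumptions rather than the authors' implementation: all three branches are average-pooled (so that the upsampling step is needed), B and C use a reduced channel count, upsampling is nearest-neighbour, and "superimposing" is read as element-wise addition with A (concatenation would be the other reading). The class name `LGAMSketch` and the parameters `reduction` and `pool` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LGAMSketch(nn.Module):
    """Illustrative global attention block computed at reduced resolution."""

    def __init__(self, channels, reduction=8, pool=2):
        super().__init__()
        inner = max(channels // reduction, 1)  # assumed channel reduction
        # Three 1x1 convolution branches producing B, C and D from A.
        self.to_b = nn.Conv2d(channels, inner, kernel_size=1)
        self.to_c = nn.Conv2d(channels, inner, kernel_size=1)
        self.to_d = nn.Conv2d(channels, channels, kernel_size=1)
        self.pool = pool

    def forward(self, a):
        n, ch, h, w = a.shape
        # Average pooling shrinks the number of positions, so the
        # attention matrix is (P x P) with P = (H/pool) * (W/pool).
        b = F.avg_pool2d(self.to_b(a), self.pool)
        c = F.avg_pool2d(self.to_c(a), self.pool)
        d = F.avg_pool2d(self.to_d(a), self.pool)
        hp, wp = b.shape[2], b.shape[3]

        b = b.flatten(2).transpose(1, 2)        # (N, P, inner)
        c = c.flatten(2)                        # (N, inner, P)
        e = F.softmax(torch.bmm(b, c), dim=-1)  # (N, P, P), rows sum to 1

        d = d.flatten(2)                        # (N, C, P)
        out = torch.bmm(d, e.transpose(1, 2))   # (N, C, P)
        out = out.reshape(n, ch, hp, wp)
        # Upsample back to the input resolution before combining with A.
        out = F.interpolate(out, size=(h, w), mode="nearest")
        return a + out                          # assumed element-wise sum


x = torch.randn(2, 64, 32, 32)
print(LGAMSketch(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```

Because attention is computed over the pooled positions, the attention matrix here has 256 × 256 entries instead of 1024 × 1024 for a 32 × 32 input, which is the kind of saving a lightweight design would aim for.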

Experiment Preparation
The experiments were built on the PyTorch framework, and the model was trained on an NVIDIA RTX 3050 GPU under Windows 11. The number of iterations is initially set to 200, the learning rate is 10^-3, and the batch size is 16.
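These settings can be collected in a small configuration object, sketched below. The class `TrainConfig` and its field names are hypothetical, "iterations" is read here as training epochs (an assumption), and the training-set size of 3164 assumes an exact 9:1 split of the 3516 images. The optimizer is not stated in the text and is therefore left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    epochs: int = 200            # "number of iterations", read as epochs (assumption)
    learning_rate: float = 1e-3  # 10^-3 as stated in the text
    batch_size: int = 16

    def steps_per_epoch(self, n_train: int) -> int:
        """Number of mini-batches per epoch, counting a final partial batch."""
        return math.ceil(n_train / self.batch_size)


cfg = TrainConfig()
n_train = 3164  # assumption: 90% of the 3516 images, rounded
steps = cfg.steps_per_epoch(n_train)
print(steps, steps * cfg.epochs)  # 198 39600
```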
Based on commonly used medical image classification indicators, this article uses four classification indicators to evaluate the classification results of the model: accuracy, precision, recall, and F1 score. The higher the scores of these four indicators, the better the model performance.
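For reference, the standard definitions of these indicators in terms of binary confusion-matrix counts can be written as a short function. Treating the malignant class as positive is an assumption, the function name `binary_metrics` is hypothetical, and the counts in the usage line are made up for illustration; they are not results from this paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fn: positive (assumed malignant) samples predicted correctly/incorrectly.
    tn/fp: negative (assumed benign) samples predicted correctly/incorrectly.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts only.
scores = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print({name: round(value, 4) for name, value in scores.items()})
```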

Experimental Results
To verify the effectiveness of the network model based on the Lightweight Global Attention Module (LGAM) in classifying thyroid nodules in ultrasound images, the dataset of this paper was input to MobilenetV2, ResNet18/34/50, and the LGAM-based network model, and their performance was compared. With consistent parameter settings for the five models, the comparison results on the dataset of this paper are shown in Table 1. According to the evaluation indexes, under the same parameter settings the LGAM-based network model achieves the best classification results on this dataset, with a 5.7% improvement in accuracy and an 11.3% improvement in precision compared with MobilenetV2, a 7.8% improvement in recall compared with ResNet18, the model with the lowest recall, and a 5.6% improvement in F1-score compared with ResNet18.
Among them, the three ResNet models of different depths and the LGAM-based network model have higher accuracy rates, and their confusion matrices are shown in Figure 4.

Conclusion
This article constructs a network model for the benign and malignant classification of thyroid nodules based on the Lightweight Global Attention Module (LGAM). The experimental results show that the network model can be applied well to small sample sets of ultrasound images, and that diagnostic efficiency is improved while the accuracy of benign and malignant nodule classification is maintained. However, although the network model in this work improves diagnostic efficiency compared with other models, it has only been tested on the dataset constructed in this paper and has not been validated on other ultrasound datasets of thyroid nodules. Future research should continue to expand the dataset of this paper and input other datasets into the model to further verify its feasibility. We also hope to apply the model to actual ultrasound diagnosis to assist physicians in decision making.

Experimental Principles and Procedure
In this paper, an attention mechanism-based network model is proposed for the benign and malignant classification of thyroid tumors. The collected data are first cleaned and preprocessed to obtain a new dataset for network training. Then a network model based on the Lightweight Global Attention Module (LGAM) is constructed to model the global context, which allows nodule features to be extracted more effectively and quickly. Finally, the dataset is input into the network to obtain the final benign or malignant classification results.

Figure 1. Images of the raw data collected at the hospital.

Figure 4. Confusion matrix plot of four models.

Further, to obtain the best model parameters for the LGAM-based network, comparison experiments were conducted by changing the number of iterations; the experimental results are shown in Figure 5. The evaluation indexes level off when the number of iterations reaches 300. Therefore, this paper selects 300 iterations as the model parameter. At this setting, the LGAM-based network model gives its best classification performance on the dataset proposed in this paper, and the values of the four evaluation indexes are all at a high level, with the accuracy reaching 95.16%.

Figure 5. Comparison of experimental results for different numbers of iterations.

Table 1. Comparison of experimental results of different models on the data set of this paper.