Deep learning enhanced ultra-fast SPECT/CT bone scan in patients with suspected malignancy: quantitative assessment and clinical performance

Objectives. To evaluate the clinical performance of deep learning-enhanced ultrafast single photon emission computed tomography/computed tomography (SPECT/CT) bone scans in patients with suspected malignancy. Approach. In this prospective study, 102 patients with potential malignancy were enrolled and underwent a 20 min SPECT/CT and a 3 min SPECT scan. A deep learning model was applied to generate algorithm-enhanced images (3 min DL SPECT). The reference modality was the 20 min SPECT/CT scan. Two reviewers independently evaluated general image quality, Tc-99m MDP distribution, artifacts, and diagnostic confidence of 20 min SPECT/CT, 3 min SPECT/CT, and 3 min DL SPECT/CT images. The sensitivity, specificity, accuracy, and interobserver agreement were calculated. The lesion maximum standard uptake value (SUVmax) of the 3 min DL and 20 min SPECT/CT images was analyzed. The peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) were evaluated. Main results. The 3 min DL SPECT/CT images received significantly better scores than the 20 min SPECT/CT images for general image quality, Tc-99m MDP distribution, artifacts, and diagnostic confidence (P < 0.0001). The diagnostic performance of the 20 min and 3 min DL SPECT/CT images was similar for reviewer 1 (paired χ² = 0.333, P = 0.564) and reviewer 2 (paired χ² = 0.05, P = 0.823). The diagnosis results for the 20 min (kappa = 0.822) and 3 min DL (kappa = 0.732) SPECT/CT images showed high interobserver agreement. The 3 min DL SPECT/CT images had significantly higher PSNR and SSIM than the 3 min SPECT/CT images (51.44 versus 38.44, P < 0.0001; 0.863 versus 0.752, P < 0.0001). The SUVmax of the 3 min DL and 20 min SPECT/CT images showed a strong linear relationship (r = 0.991; P < 0.0001). Significance. Ultrafast SPECT/CT acquired in approximately 1/7 of the standard acquisition time can be enhanced by a deep learning method to achieve image quality and diagnostic value comparable to those of standard acquisition.


Introduction
SPECT bone scintigraphy is a widely used imaging technology in nuclear medicine with diverse applications (Brenner et al 2012). Technetium 99m-methylenediphosphonate (Tc-99m MDP) is a widely used radiopharmaceutical for bone scanning. Despite the high sensitivity of bone scintigraphy, its specificity is relatively poor. This deficiency has been improved by the introduction of hybrid SPECT/CT (Ghanem et al 2020), which utilizes the precise anatomical localization from registered CT images (Utsunomiya et al 2006, Mariani et al 2010, Palmedo et al 2014). However, the multimodal information obtained by SPECT/CT comes at the cost of a long scanning time. In general, one patient may require a SPECT/CT scan time of up to 30 min in clinical practice. The patients examined are usually those with malignant tumors and often cannot tolerate prolonged immobility (Picone et al 2021). Body movement during the prolonged examination can result in inaccurate SPECT/CT fusion and motion artifacts (Shao et al 2021). Therefore, there is a pressing need to reduce the scanning time to improve patient comfort and suppress patient motion without sacrificing image quality. A short scanning time can also boost scanner throughput and improve clinical productivity, but it lowers the count statistics and can degrade image quality through increased Poisson noise and a poor contrast-to-noise ratio.
Research aimed at shortening the acquisition time falls into two classes. One improves the scanning system with high-efficiency detectors, and the other reconstructs high-quality SPECT images from sparse data. The introduction of cadmium-zinc-telluride (CZT) detector-based SPECT systems enabled higher-sensitivity imaging than conventional NaI detectors, which also allows optimization of acquisition protocols, including imaging time (Ljungberg and Pretorius 2018). van der Velden et al introduced an efficient radioembolization procedure with acquisition protocols using nonuniform projection durations (van der Velden et al 2019). Ali et al introduced the ordered-subset expectation maximization with resolution recovery (OSEM-RR) algorithm to enable half-time SPECT myocardial perfusion imaging (Ali et al 2009). However, the physical limitations of these techniques prevent them from being used with older SPECT systems, or they cannot achieve much faster imaging without sacrificing image quality.
To address the trade-off between examination time and image quality, deep learning-based methods have been used in SPECT/CT reconstruction in recent years (Lin et al 2021, Yang et al 2021). Yang et al applied a deep learning method to synthesize attenuation-corrected cardiac SPECT images from noncorrected SPECT images without an additional image reconstruction process (Yang et al 2021). Ramon et al simulated low-dose SPECT myocardial perfusion imaging (MPI) scans by statistical subsampling of the counts obtained from standard clinical doses to form the paired dataset necessary for training a deep learning network (Ramon et al 2018). They also acquired corresponding low-dose list-mode data by accepting or rejecting the photon projection with a given probability and thereby obtained images with reduced dose levels of 1/2, 1/4, 1/8, and 1/16 of the standard acquisition (Ramon et al 2020). Deep learning-generated synthesized projections constructed by a deep convolutional U-Net model were added to ¹⁷⁷Lu-SPECT scans with sparsely acquired projections, which considerably reduced image degradation and shortened the scanning time (Ryden et al 2021). Another three-dimensional residual U-Net model was used to reconstruct full acquisition-time images from short acquisition-time images, reducing the scan time for pediatric Tc-99m dimercaptosuccinic acid SPECT (Lin et al 2021). Shiri et al compared the performance of a deep learning method to reduce the scan time in SPECT MPI through two approaches, namely, by cutting off angular projections and by reducing the acquisition time per projection (Shiri et al 2021). A previous work showed that the image quality of SPECT bone scans could be significantly improved in terms of peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) (Pan et al 2022). However, the diagnostic performance of the DL-enhanced SPECT bone images was not assessed.
In this study, we evaluated the performance of deep learning-enhanced SPECT/CT images acquired from subjects with suspected bone metastases and evaluated whether the deep learning approach could meet clinical diagnostic needs.

Subjects
This study was approved by the institutional review board. All patients signed informed consent forms before examination. One hundred and two subjects with suspected bone metastases from March 2021 to March 2022 were enrolled in this prospective study. The inclusion criteria were as follows: (1) adult age (18 years or older), (2) clinically suspected tumor bone metastasis, (3) ability to tolerate SPECT, and (4) ability to provide informed consent or assent according to the guidelines of the Clinical Research Ethics Committee. The exclusion criteria were as follows: (1) pregnancy, (2) inability to tolerate SPECT, and (3) inability or unwillingness to provide written informed consent on the part of either the research participants or their parents or legal guardians. The subject information was obtained from the medical record.

SPECT/CT acquisition
All patients were injected with 9-11 MBq kg⁻¹ (0.24-0.29 mCi kg⁻¹) Tc-99m MDP. Whole-body scans and quantitative bone SPECT/CT imaging were performed using SPECT/CT (Siemens Symbia Intevo, Erlangen, Germany). Whole-body planar imaging was performed approximately 3-4 h post-injection. Areas of suspected malignancy were examined with a 20 min SPECT/CT (standard time) scan followed by a 3 min SPECT scan (approximately 1/7 of the standard scan time). The patient was instructed to remain still during the examination. Scans containing motion artifacts that would affect the evaluation of image quality were discarded after visual inspection. The scanning matrix was 256 × 256, and the zoom factor was 1.0.
Step-and-shoot mode with a total of 120 projections (60 steps) over 360° was used, with 20 s per step for the standard-time SPECT scans and 3 s per step for the 1/7 standard-time SPECT scans. Subsequently, a CT scan was performed at 130 kV and 130 mA. The CT data were reconstructed using a sharp bone kernel with a 5 mm slice thickness (B50s) and a smooth attenuation-correction kernel with a 3 mm slice thickness (B31s). SPECT reconstruction with attenuation correction was performed using the B31s CT attenuation map. The ordered subsets conjugate gradient enhanced xSPECT reconstruction algorithm (xSPECT/CT, Siemens Symbia Intevo) with 2 subsets and 28 iterations without postsmoothing was used to generate quantification measurements such as the SUVmax.
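The relation between the per-step dwell time and the nominal scan durations can be checked with a few lines of arithmetic. The sketch below is only an illustration: it ignores gantry motion between steps, and the function and variable names are hypothetical.

```python
# Step-and-shoot timing check for the two protocols described above.
STEPS = 60  # detector positions reported in the text (120 projections in total)


def acquisition_minutes(seconds_per_step: float, steps: int = STEPS) -> float:
    """Total dwell time in minutes, ignoring gantry motion between steps."""
    return seconds_per_step * steps / 60.0


standard_min = acquisition_minutes(20)  # standard-time protocol, 20 s per step
fast_min = acquisition_minutes(3)       # ultrafast protocol, 3 s per step
ratio = fast_min / standard_min         # fraction of the standard dwell time

print(standard_min, fast_min, round(ratio, 3))
```

The ratio of 3 s to 20 s per step is 0.15, which is close to the 1/7 (about 0.143) quoted for the fast protocol.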

Imaging process
A pretrained deep learning model with integrated multiscale and multimodality features based on the 3 min SPECT images and corresponding CT images was applied to generate enhanced SPECT images (3 min DL SPECT) (Pan et al 2022). The architecture of the network is shown in figure 1. The main architecture is a U-Net-like encoder-decoder, where each stage is replaced by an N-layer residual U-block (RSU). Each RSU consists of a mixture of different-sized encoder-decoder structures, which helps capture contextual information on different scales more efficiently. In the last two layers of the architecture, the downsampling and upsampling structures were discarded because the resolution of the feature maps is too low. This kind of block is referred to as RS in the figure. The 3 min SPECT SUV images were directly used as the network input without extra scaling. The CT images were interpolated to the size of the 3 min SPECT images and were normalized by the image mean before being input to the network. Three consecutive SPECT and CT images were concatenated into a 6-channel matrix and fed into the network each time. Three slices of enhanced SPECT were predicted, and the average value was calculated if one slice was inferred multiple times. In the training process, a lesion attention mask and a combination of the L1 loss and the SSIM loss were used to ensure the accuracy of the synthesized image values and the distinguishability of the structures and important ROIs. Deep supervision was used to accelerate the training process.
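The three-slice input and the averaging of repeatedly inferred slices can be sketched as a sliding window over the volume. This is a minimal illustration, not the authors' implementation: the stride of one slice, the channel order (SPECT first, then CT), the use of the whole-volume mean for CT normalization, and the `model` callable are all assumptions.

```python
import numpy as np


def enhance_volume(spect, ct, model):
    """Slide a 3-slice window over the volume and average overlapping predictions.

    spect, ct: arrays of shape (n_slices, H, W), co-registered and on the same grid.
    model: hypothetical callable mapping a (6, H, W) array to a (3, H, W) array.
    """
    n = spect.shape[0]
    ct_norm = ct / ct.mean()                  # assumed: normalise CT by the volume mean
    total = np.zeros(spect.shape, dtype=np.float64)
    hits = np.zeros(n, dtype=np.int64)        # number of predictions per slice
    for start in range(n - 2):                # assumed stride of one slice
        window = slice(start, start + 3)
        x = np.concatenate([spect[window], ct_norm[window]], axis=0)  # 6 channels
        total[window] += model(x)
        hits[window] += 1
    return total / hits[:, None, None]        # mean over repeated inferences


# Usage with a placeholder "model" that simply returns the SPECT channels.
demo_spect = np.arange(5 * 2 * 2, dtype=np.float64).reshape(5, 2, 2)
demo_ct = np.ones((5, 2, 2))
demo_out = enhance_volume(demo_spect, demo_ct, lambda x: x[:3])
```

With a stride of one, interior slices are predicted three times and edge slices once or twice, which is why the per-slice count is tracked separately.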

Image evaluation
The 20 min, 3 min and 3 min DL SPECT/CT images were evaluated independently by 2 nuclear medicine physicians with 5 and 10 years of experience. A 5-point Likert scale (1, unacceptable image quality; 2, suboptimal image quality; 3, acceptable image quality; 4, good image quality; 5, excellent image quality) was used to score the three groups of images to evaluate their general image quality (the overall visual impression of the images), Tc-99m MDP distribution, presence of artifacts, and general diagnostic confidence. A score of 3 or higher indicated that the image met the requirement for image quality for clinical diagnosis. The largest lesion with the highest SUV for each subject was defined as the volume of interest, which was drawn using Siemens 3D Isocontour, and the SUVmax was automatically calculated. All analyses were performed while the evaluators were blinded to the image acquisition information.
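To quantitatively evaluate the performance of the synthesized images, the PSNR and SSIM were used as evaluation metrics. The 20 min SPECT/CT images were treated as the ground-truth images. The PSNR for the synthesized image is defined, in its standard form, as

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right),$$

where MAX is the maximum possible pixel value and MSE is the mean squared error between the synthesized image and the 20 min SPECT image. The SSIM is defined as

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)},$$

where $\mu_x$ and $\sigma_x^2$ are the average and variance of the pixel values of the input synthesized image, respectively, $\mu_y$ and $\sigma_y^2$ are the average and variance of the pixel values of the input 20 min SPECT image, $\sigma_{xy}$ is the covariance of the pixel values of the two images, and $c_1$ and $c_2$ are small constants that prevent the denominator from being zero. SSIM was calculated using the scikit-image (0.17.2; metrics.structural_similarity) package.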
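A minimal sketch of the two metrics, assuming the standard textbook definitions, is given below. The SSIM here is a single-window (global) simplification; scikit-image's `structural_similarity` instead evaluates the same expression in local windows and averages the result, so the numbers will generally differ. The constants `k1 = 0.01` and `k2 = 0.03` are the conventional defaults, and `data_range` is an assumed parameter for the span of pixel values.

```python
import numpy as np


def psnr(x, y, data_range):
    """Peak signal-to-noise ratio in dB between image x and reference y."""
    mse = np.mean((x - y) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)


def global_ssim(x, y, data_range, k1=0.01, k2=0.03):
    """SSIM computed once over the whole image (no sliding window)."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Toy example: a reference image and a copy shifted by one intensity unit.
reference = np.array([[0.0, 1.0], [2.0, 3.0]])
shifted = reference + 1.0
```

For the toy pair, the mean squared error is 1, so with `data_range=10` the PSNR is 20 dB, and the SSIM of any image with itself is 1.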

Diagnostic performance
The 20 min SPECT/CT images of all patients were read by the two reviewers. Benign and malignant lesions were determined based on pathological diagnosis, imaging examinations (20 min SPECT/CT, CT, MRI, PET/CT), and clinical follow-up data (for example, a lesion that became larger during imaging follow-up and showed more concentrated tracer uptake than before was considered malignant). If the results were inconsistent, another senior physician was included to make a judgment, and the final status of the lesion (benign or malignant) was determined as the gold standard. To dilute the memory effect, the two readers independently read randomly and blindly presented 3 min SPECT and 3 min DL SPECT images after one month and determined whether the lesions were benign (negative) or malignant (positive); images with a quality score of 1 were considered negative. Sensitivity, specificity, accuracy, and interobserver agreement were calculated. The diagnostic performance analyses were performed at the patient level.
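The patient-level metrics follow directly from the four cells of a confusion table against the gold standard. The sketch below uses hypothetical counts for a cohort of 102 patients with 31 positives; they are chosen for illustration and are not taken from the study's tables.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity, accuracy) from confusion-table counts."""
    sensitivity = tp / (tp + fn)                 # positives correctly called malignant
    specificity = tn / (tn + fp)                 # negatives correctly called benign
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # all correct calls
    return sensitivity, specificity, accuracy


# Hypothetical counts: 31 patients with metastases, 71 without.
sens, spec, acc = diagnostic_metrics(tp=28, fn=3, tn=62, fp=9)
print(round(sens, 3), round(spec, 3), round(acc, 3))
```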

Statistical analysis
Statistical analyses were performed using GraphPad Prism (8.0.0). The kappa consistency test was used to evaluate the consistency of the two reviewers with the 20 min, 3 min, and 3 min DL SPECT images. The quantitative PSNR and SSIM metrics were compared with the paired Student's t test. Bland-Altman analysis and Spearman correlation analysis were used to demonstrate the consistency of the SUVmax between the 3 min, 3 min DL, and 20 min SPECT/CT images. The McNemar test was used to assess the differences in diagnostic performance. P < 0.05 was considered statistically significant.
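For readers who want to reproduce the agreement and paired-comparison statistics outside GraphPad Prism, a minimal sketch of Cohen's kappa and the McNemar test for 2 × 2 tables follows. The uncorrected McNemar statistic is used here, which is an assumption about the software settings; the p-value uses the identity that a chi-square survival function with one degree of freedom equals erfc(sqrt(chi2 / 2)). All counts in the example are hypothetical.

```python
import math


def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """Kappa for a 2x2 table: a = both positive, b = reader 1 positive only,
    c = reader 2 positive only, d = both negative."""
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)


def mcnemar(b: int, c: int):
    """Uncorrected McNemar chi-square and p-value from the discordant counts."""
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square, 1 degree of freedom
    return chi2, p_value


kappa_demo = cohen_kappa(20, 5, 10, 15)   # hypothetical reader agreement table
chi2_demo, p_demo = mcnemar(2, 1)         # hypothetical discordant pairs
print(round(kappa_demo, 3), round(chi2_demo, 3), round(p_demo, 3))
```

Only the discordant pairs enter the McNemar statistic, so two protocols can differ in a few individual readings and still show no significant difference in diagnostic performance.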

Patient characteristics
A total of 102 patients (55 males and 47 females, mean age 60 ± 13 years, age range 26-87 years, mean body mass index (BMI) 23.5 ± 3.4 kg m⁻², BMI range 15.8-32.0 kg m⁻²) with suspected malignancy were included in this study. Thirty-one patients were eventually diagnosed with bone metastases. The patient characteristics are summarized in table 1.

Image quantitative analysis
The 3 min images were too noisy to have clinical value and demonstrated severely disrupted structures and considerable differences from the 20 min SPECT images. These images were therefore rated 1 point, indicating that they were of insufficient quality for clinical diagnosis. The 3 min DL and 20 min images displayed excellent image quality (figures 2-4). As shown in figure 5, the mean 5-point Likert scale scores assigned by the 2 reviewers to the 3 min DL images were higher than those assigned to the 20 min images in terms of general image quality, Tc-99m MDP distribution, artifacts, and diagnostic confidence (mean ± SD: 3.44 ± 0.64 versus 3.10 ± 0.64, P < 0.0001; 3.45 ± 0.64 versus 3.07 ± 0.62, P < 0.0001; 3.46 ± 0.63 versus 3.10 ± 0.61, P < 0.0001; 3.78 ± 0.66 versus 3.46 ± 0.68, P < 0.0001). The SUVmax values obtained from the 3 min DL and 20 min images were not significantly different (P = 0.973). Bland-Altman analysis showed a mean SUVmax difference of −0.0055, and the 95% limits of agreement (−2.031, 2.020) contained 38 of the 40 SUVmax differences (figure 6). Spearman correlation analysis indicated a strong association between the SUVmax values of the 3 min DL and 20 min SPECT images (r = 0.991, P < 0.0001). The PSNR and SSIM of the 3 min DL images were significantly higher than those of the 3 min images (51.44 versus 38.44, P < 0.0001; 0.863 versus 0.752, P < 0.0001) (table 2).
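The Bland-Altman quantities reported above (mean difference and 95% limits of agreement) can be computed as in the following sketch. The limits are taken as the mean difference ± 1.96 sample standard deviations, which is the usual convention and is assumed here; the SUVmax values in the example are made up for illustration.

```python
import statistics


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired SUVmax values (enhanced fast scan versus standard scan).
suv_dl = [10.0, 12.0, 8.0, 15.0]
suv_std = [9.0, 13.0, 8.0, 14.0]
bias, lower, upper = bland_altman(suv_dl, suv_std)
print(round(bias, 3), round(lower, 3), round(upper, 3))
```

Applied in reverse to the reported limits, a width of 2.020 − (−2.031) = 4.051 corresponds to a standard deviation of the differences of about 1.03 SUV units.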

Interrater agreement analysis
The diagnosis results for the 20 min (kappa = 0.822) and 3 min DL (kappa = 0.732) images showed high consistency between the two reviewers. The diagnostic performance of the 20 min and 3 min DL images in distinguishing benign from malignant lesions was similar according to both reviewer 1 (paired χ² = 0.333, P = 0.564) and reviewer 2 (paired χ² = 0.05, P = 0.823). The sensitivity, specificity and accuracy of the 20 min and 3 min DL images were similar according to both reviewer 1 (0.903 versus 0.806, 0.873 versus 0.873, and 0.882 versus 0.853, respectively) and reviewer 2 (0.867 versus 0.806, 0.944 versus 0.936, and 0.912 versus 0.920, respectively) (all P > 0.05) (table 3).

Discussion
SPECT bone scintigraphy continues to be a high-volume nuclear imaging procedure, offering the advantages of total body examination with high sensitivity but poor specificity (Brenner et al 2012). The use of additional SPECT/CT for evaluating suspicious or equivocal lesions has been shown to improve diagnostic confidence and specificity (Utsunomiya et al 2006, Mariani et al 2010, Palmedo et al 2014). However, the extra multimodal information obtained with SPECT/CT leads to an increased scanning time. Reducing the acquisition time and radioactive dose in SPECT/CT has become a major focus of attention for researchers. A shorter acquisition time may be considered more comfortable for patients, especially children, people with obesity and patients experiencing chronic or acute pain, who are more likely to be unable to tolerate prolonged immobility during the acquisition period. Reducing the injected radiotracer dose can further reduce the amount of radiation to which children and patients who require multiple follow-ups are exposed during the examination. A previous study generated high-quality bone scan SPECT images from SPECT images acquired at 1/7 the standard scan time using a deep learning method with a small sample and confirmed that this method yielded significant improvements in image quality in terms of the noise level, details of the anatomical structures and SUV accuracy (Pan et al 2022). Previous deep learning-based SPECT enhancement studies were conducted based on simulated images from randomly undersampled list-mode data (Ramon et al 2018, 2020, Yang et al 2021). To imitate the real clinical fast-scan environment, this work adopted consecutive examinations and used the same reconstruction method. The fast scans of 102 patients with different types of cancer were enhanced by a pretrained deep learning model.
The SPECT/CT image quality in this study was assessed by two independent nuclear radiologists with 5 and 10 years of experience in SPECT/CT diagnosis. The reading results of both radiologists showed that the image quality of the 3 min SPECT images was insufficient to have diagnostic value, with scores of 1 point on a 5-point Likert scale, and these images were not included in the statistical evaluation. Although standard acquisition always received the highest score in deep learning-based PET image enhancement (Ly et al 2021), the general image quality, distribution of Tc-99m MDP, presence of artifacts, and general diagnostic confidence of the 3 min DL images were significantly superior to those of the 20 min images (P < 0.0001) in this SPECT bone scan study. Compared with the 20 min SPECT images, a reduction in the noise level of soft tissue and a more uniform and coherent distribution of the radiotracer in normal bone were observed in the 3 min DL SPECT images, as shown in figures 2-4. This could be a result of the high noise level of the SPECT system and the tendency of the deep learning method to generate expected pixel values (Lehtinen et al 2018). The diagnosis results for both the 20 min and 3 min DL SPECT/CT images showed substantial agreement in the 2 reviewers' image evaluations (kappa = 0.822 and 0.732). A SPECT study showed that shortening the SPECT acquisition time to 1/4 of the standard time resulted in excellent interobserver agreement without affecting the diagnosis (Zacho et al 2017). In contrast to their findings, when the scanning time was reduced to 3 min in our study, the image quality was greatly degraded and the images had a grainy appearance. After the application of deep learning technology, the algorithm-enhanced 3 min images showed excellent image quality and met the needs of clinical diagnosis in our study.
Figure. An 81-year-old man with lung cancer after chemotherapy. A, B 20 min SPECT and SPECT/CT lumbar sagittal images, respectively. The concentration of radiotracer in the osteophyte of the 5th lumbar vertebra is shown (SUVmax = 11.24). C The MIP image shows the concentration of radiotracer distributed at the edge of the lumbar spine. The two reviewers assigned scores of 4 and 4 to the general image quality, 3 and 3 to the distribution of Tc-99m MDP, 4 and 3 to the presence of artifacts, and 4 and 4 to the general diagnostic confidence. D, E 3 min DL SPECT and SPECT/CT lumbar sagittal images showing radiotracer distribution and concentration in the osteophytes of the 5th lumbar vertebra (SUVmax = 11.03). F MIP image shows marginal lumbar radiotracer distribution and concentration in multiple locations. The general image quality, detail of Tc-99m MDP, presence of artifacts, and general diagnostic confidence were all given scores of 4 points by both reviewers. G, H 3 min SPECT and SPECT/CT lumbar sagittal images. I MIP images. The two reviewers both assigned a score of 1 point to the general image quality, distribution of Tc-99m MDP, presence of artifacts, and general diagnostic confidence. J-L Transverse, sagittal and coronal CT images show hyperosteogeny at the edge of the 5th lumbar vertebra (arrow). The overall radiotracer concentration is slightly lower than that of the 20 min image, the Tc-99m MDP distribution is more uniform, and the left iliac crest edge of F (arrow) is smoother than that of C (arrow).
Tc-99m MDP chemisorbs and binds to hydroxyapatite crystals and thus serves as a marker of bone turnover and bone perfusion. It rapidly localizes to bone and clears quickly from the background, making it favorable for imaging (Van den Wyngaert et al 2016); even a 5% change in bone turnover can be detected on bone imaging, which can often detect active bone formation in the skeleton related to malignant and benign disease (Vijayanathan et al 2009). For some conditions, such as infection, trauma, or sclerosis, focusing on increased radioactive tracer uptake alone could lead to false-positive results (Van den Wyngaert et al 2016). Therefore, a diagnosis based on SPECT/CT imaging relies to some extent on CT imaging and the readers' experience. The diagnostic performance of the 20 min and 3 min DL SPECT/CT images was not significantly different according to both reviewers in our study. These results are similar to those reported in traditional SPECT/CT bone scan studies (Zhang et al 2020, Mostafa et al 2021). Although the sensitivity of the 3 min DL SPECT/CT images was 10% lower than that of the 20 min images, this difference was not statistically significant and would be considered acceptable in clinical practice. One reason for this result could be that the overall concentration of radiotracer in the 3 min DL SPECT/CT images was slightly lower than that of the 20 min images. This study showed that 3 min DL SPECT/CT images not only allowed a shorter scanning time but also reached the diagnostic level of traditional SPECT/CT images. Fast scanning is particularly beneficial for patients who have difficulty maintaining a prolonged horizontal position for SPECT/CT imaging due to pain or other reasons. A previous study showed that the SUVmax obtained from SPECT/CT could play an important role in differentiating benign from malignant bone diseases (Qi et al 2021).
In this study, the SUVmax obtained from the 3 min DL and 20 min SPECT images was not significantly different (P = 0.973), and there was a strong linear relationship between the SUVmax of the lesions in the two imaging sets (r = 0.991; P < 0.0001). The results indicated that the quantitative SUVmax values of 3 min ultrafast SPECT/CT bone scan images enhanced by our deep learning algorithm can also be used in clinical practice. The deep learning model is capable of processing DICOM images of bone SPECT that have undergone attenuation correction. The model can be installed as a plugin on the reporting workstation in the PACS system, allowing direct processing and export of the 3 min attenuation-corrected SPECT DICOM images as 3 min DL SPECT images. This convenience is expected to improve the efficiency of bone scanning.
To analyze the performance of the proposed method in complex clinical environments, we applied the model to CT images with different scan parameters and reconstruction settings together with the same fast-scanned SPECT. Figure 7 shows one slice of the registered CT and the corresponding enhanced SPECT. The quantitative results showed that SPECT enhanced using CT acquired at 100 kVp with a 1.5 mm slice thickness and smoothed with a 51 kernel achieved the best PSNR and SSIM in the tested case. However, all enhanced SPECT images reached much higher PSNR and SSIM than the 3 min SPECT.
Although this research explored the clinical performance of a deep learning method in SPECT bone scans, it has several limitations. We addressed most tumor types in this study, but few positive cases for bone metastasis were included. Pathological verification of some bone lesions is challenging, and the gold standard, based on the clinical history and other imaging studies, might cause bias in the results. 1/7 reduction in scan time images were

Conclusion
This study evaluated the image quality and diagnostic efficacy of ultrafast SPECT/CT bone imaging enabled by a deep learning approach in a relatively large population. The clinical qualitative and quantitative measurements of the deep learning-enhanced scans, acquired approximately 7 times faster, were comparable to those of standard-of-care SPECT/CT scans. The research findings indicated that employing the deep learning enhancement technique could enable ultrafast, and very likely ultra-low-dose, SPECT scans in future routine clinical practice.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary information files).

Declaration of competing interest
Author NG is a stockholder of RadioDynamic Healthcare. Author BP works for RadioDynamic Healthcare.

Ethical statement
All methods were performed in accordance with the ethical standards as laid down in the Declaration of Helsinki and its later amendments or comparable ethical standards. Institutional Review Board approval from Ethical Review of Medical Ethics Committee of Shanghai East Hospital EC. D(BG). 009. 02.1 was obtained. Written informed consent was obtained from all participants.

Funding
This study received funding from the Key Discipline Construction Project of Shanghai Pudong New Area Health Commission, PWZxk2022-12.

ORCID iDs
Nan-Jie Gong https://orcid.org/0000-0001-9249-413X
Jun Zhao https://orcid.org/0000-0002-9887-5512

Figure 7. Comparison of SPECT enhanced with different CT images. A standard 20 min SPECT. B original 3 min SPECT. C, D CT scan was performed at 100 kV, 130 mA and reconstructed using a smooth attenuation-correction kernel with a 3 mm slice thickness and corresponding enhanced SPECT. E, F CT scan was performed at 100 kV, 130 mA and reconstructed using a sharp bone kernel with a 1.5 mm slice thickness and corresponding enhanced SPECT. G, H CT scan was performed at 130 kV, 130 mA and reconstructed using a smooth attenuation-correction kernel with a 3 mm slice thickness and corresponding enhanced SPECT. I, J CT scan was performed at 100 kV, 130 mA and reconstructed using a sharp bone kernel with a 3 mm slice thickness and corresponding enhanced SPECT.