Convolutional neural network based attenuation correction for 123I-FP-CIT SPECT with focused striatum imaging

Yuan Chen; Marlies C Goorden; Freek J Beekman

doi:10.1088/1361-6560/ac2470

1. Introduction

SPECT with ¹²³I-FP-CIT can be used to visualize the dopamine transporter (DaT) distribution in the brain. This enables assessment of parkinsonian syndromes, particularly differentiation of Parkinson's disease from essential tremor and of dementia with Lewy bodies from Alzheimer's disease (Catafau and Tolosa 2004, Hauser and Grosset 2012, Oliveira et al 2021). Current clinical assessment of ¹²³I-FP-CIT scans relies mainly on visual inspection of the extent of DaT reduction in the striatum, the striatal shape and its symmetry (Djang et al 2012, Park 2012). Relative quantification by calculating regional striatal uptake ratios could reduce inter- and intra-observer variability and may enable longitudinal studies to monitor disease progression (Winogrodzka et al 2001) and therapeutic effects (Parkinson Study Group 2002). For accurate relative quantification, correction for photon attenuation in the patient's head is recommended by guidelines (Djang et al 2012, Morbelli et al 2020).

Ideally, attenuation correction (AC) would be performed based on an attenuation map ( $\mu$ -map) derived from a perfectly registered CT scan. This $\mu$ -map provides the tissue attenuation coefficient at each voxel in the patient. However, such a CT scan is often not available and, when acquired, increases the radiation dose to the patient. Moreover, possible errors in image registration can induce quantitative inaccuracies in SPECT images (Rajeevan et al 1998, Goetze et al 2007, Crespo et al 2008). Besides the CT based approach, manually drawing an ellipse around the head contour and assuming uniform attenuation within the ellipse is widely used for attenuation map approximation in brain SPECT studies (Tavares et al 2013, Rahmim et al 2017). This ellipse method, however, suffers from observer subjectivity and from an inaccurate representation of the head contour and internal head anatomy.

Apart from the use of an additional CT scan or a simple ellipse, automatic approaches based only on SPECT data have been investigated, which can be broadly classified into two categories. The first category contains sophisticated methods using SPECT photopeak projections to estimate the attenuation map and activity map either simultaneously by means of joint reconstruction (Censor et al 1979, Nuyts et al 1999, Krol et al 2001) or independently by applying data consistency conditions (Welch et al 1997, Bronnikov 2000, Gourion et al 2002, Yan and Zeng 2009). This approach, however, has limited utility in clinical routine due to cross-talk artefacts, instability and computational complexity. The second category consists of contour-detection methods that assume uniform attenuation within the contour. Such a contour can be obtained by automatic edge-detection in projection space or on non-corrected SPECT images (Macey et al 1988, Younes et al 1988, Hebert et al 1995, Pan et al 1996, Tossici-Bolt et al 2011). Automatic contour detection techniques are not commonly applied clinically, possibly because their added complexity yields only a minimal improvement in accuracy compared to the manually drawn ellipse approach. Interestingly, the value of SPECT images reconstructed from a scatter window has been emphasized in these contour detection studies for edge determination (Macey et al 1988, Wallis et al 1995, Pan et al 1996). This is justifiable as Compton scatter is the dominant photon-tissue interaction for clinical SPECT, and the probability of Compton scatter is proportional to the tissue density (with a maximal probability at the skull and almost zero outside the body). Thus, the tissue density information embedded in scattered data could be helpful to highlight the tissue boundaries.

Lately, deep learning with neural networks has been applied to estimate $\mu$ -maps using SPECT-data-only for clinical (Shi et al 2020) and simulated (Yu et al 2021) ⁹⁹ᵐTc-tetrofosmin myocardial scans and PET-data-only for clinical ¹⁸F-FDG brain PET scans (Liu et al 2018, Reimold et al 2019). Our group has recently also demonstrated a convolutional neural network (CNN) approach to estimate $\mu$ -maps for ⁹⁹ᵐTc-HMPAO full brain perfusion scans (Chen et al 2021) based on a Monte Carlo (MC) study assuming a multi-pinhole clinical SPECT geometry (G-SPECT-I (Beekman et al 2015)). In this study, both the primary and scattered photons from SPECT emission data were used via multiple image reconstructions from different energy windows to obtain as much attenuation information as possible. Using these multi-energy SPECT images, a patch-voxel CNN with an encoder architecture was implemented to transform a 4D SPECT patch (3D SPECT plus one energy dimension) to a single attenuation coefficient for the central voxel of the patch. Such a patch-voxel approach was used due to its advantage of requiring a reduced number of parameters and providing an increased amount of training data compared to the full-image to full-image approaches e.g. U-Net (Ronneberger et al 2015). Accurate attenuation maps were obtained with the proposed CNN approach for the ⁹⁹ᵐTc-HMPAO full brain perfusion scans.

For ¹²³I-FP-CIT SPECT, the activity distribution is more localized while the amount of activity in the brain is lower (standard injection dose of 185 MBq) compared to a brain perfusion scan (standard injection dose of 925 MBq), which leads to a limited number of primary and scatter events being captured and potentially utilized. Additionally, clinical assessment of a ¹²³I-FP-CIT scan often uses only a few transaxial slices around the striatum (e.g. 20 mm thick slices (Winogrodzka et al 2001)) rather than the full axial length of the brain. Previously, we demonstrated with a simulation study that for ¹²³I-FP-CIT scans, focused striatum imaging with a confined axial length can maximize the count yield without sacrificing image quality (Chen et al 2018). In case only a few SPECT slices are scanned, $\mu$ -maps that could be beneficial for ¹²³I-FP-CIT SPECT AC may not be fully estimated. Therefore, the validity of the CNN based method for the axially focused ¹²³I-FP-CIT scans needs to be investigated.

The aim of this paper is to verify the CNN based approaches for automatic $\mu$ -map estimation for G-SPECT-I ¹²³I-FP-CIT imaging. To this end, SPECT data were acquired with a protocol aimed at imaging a few slices centered at the striatum based on the G-SPECT-I geometry. Besides the patch-voxel CNN that was implemented in our previous work, we also tested two other networks that have been used in relevant recent studies which estimate $\mu$ -maps with a patch-patch or image-image based method (Liu et al 2018, Shiri et al 2020, Shi et al 2020). The proposed strategy was evaluated using MC simulations based on the G-SPECT-I geometry. Additionally, as the added value of AC on clinical ¹²³I-FP-CIT scans is debatable, the impact of AC was also reported to check in which cases CNN based AC could be beneficial. Quantitative accuracy of the CNN approach was assessed on the network estimated $\mu$ -maps and on attenuation corrected SPECT images.

2. Methods

2.1. G-SPECT-I system

The G-SPECT-I (Beekman et al 2015) consists of nine large-area NaI stationary detectors, a multi-pinhole collimator and a precisely controlled xyz-stage used for bed translation (see figure 1). All pinholes are simultaneously 'viewing' a central volume from which complete data is obtained without any bed movement. This central volume is thus referred to as the complete data volume (CDV, see figure 1). For a scan of an object larger than the CDV, the bed is translated to extend the scanning region with sufficient sampling. In this work, a total number of 8 bed translations (2 axial translations combined with 4 transaxial translations) was used based on findings in Chen et al (2018) for an optimal focused striatum scan to maximize the count yield. This bed translation trajectory ensures an axial scanning length of about 57 mm which is long enough to cover the entire striatum (35 mm). All pinhole projections from all bed positions together were used simultaneously for image reconstruction using the so-called scanning focus method (Vastenhouw and Beekman 2007). Other details concerning G-SPECT-I are described in Chen et al (2018).

**Figure 1.** Illustration of the G-SPECT-I scanner (the left image) when a small-bore collimator dedicated for brain imaging was mounted, and the focused striatum imaging strategy (the right image). The CDV is the volume 'seen' by all pinholes; it has a transaxial diameter of 100 mm and an axial length of 60 mm. The patient bed can be shifted in *xyz* directions to position different parts of the patient head into the CDV and thus extend the scanning region with sufficient sampling. With the focused striatum imaging strategy, the scanning region is confined in axial direction to maximize the count yield from the striatum. The data truncation region is 'seen' by only part of the pinholes and thus sampling is not complete.

2.2. Simulated SPECT scans

To mimic realistic SPECT scans, full system MC simulations were performed. These MC simulated scans were used as input to the CNN for $\mu$ -map estimation. Accuracy of AC using these $\mu$ -maps was evaluated subsequently. This evaluation was done on MC simulated realistic images as well as on noise-free simulated SPECT images. The noise-free images were included to better visualize and quantify AC effects when using different $\mu$ -maps. Both simulation methods are summarized in figure 2 and are described in more detail below.

**Figure 2.** Illustration of SPECT data simulation. (a) MC simulations of realistic ¹²³I-FP-CIT scans; for each phantom, an activity and attenuation map were generated and were used for MC simulation. Background counts due to cosmic radiation were emulated and added to the MC simulated projection data to make the simulation more realistic. Five SPECT reconstructions were performed from different energy windows. The image set was used as input to CNN for μ-map estimation. (b) Noise-free simulated images on which the effects of attenuation correction with different μ-maps were studied. The MC simulation uses the attenuation map with each region assigned with a material (skull or brain), while for noise-free simulations, the maps with each region assigned with an attenuation coefficient were used (subject to the different requirements of both simulators).

2.2.1. Digital phantoms

The publicly available Brainweb dataset containing 20 phantoms generated from normal subject scans was used. For each phantom, an activity distribution map and an attenuation map were generated (see figure 2). The activity map was obtained by assigning ¹²³I-ioflupane (159 keV) to the striatum and the background (which is the rest of the brain and the skin) with a concentration ratio in the range of 1.5:1–11:1 as typically seen in clinical settings (Dickson et al 2010, Niñerola-Baizán et al 2018). In addition, we made sure that the concentration in the putamen was not higher than in the caudate. We also allowed the concentrations in striatal substructures (caudate and putamen) to differ between the two hemispheres. In this way, normal and abnormal tracer uptakes (i.e. uniform global striatal uptake reduction as well as unilateral and bilateral uptake asymmetries between caudate and putamen) could be covered, as in Niñerola-Baizán et al (2018). A total activity of 7.4 MBq on average was put in the phantom (resembling an injected dose of 185 MBq and a brain uptake of 4% at the time of imaging (Volterrani et al 2019)). This activity was varied between phantoms following a normal distribution with a standard deviation $\sigma$ of 10%.

The attenuation map was obtained by tissue segmentation. Regions of skull, skin, blood, muscle, brain, water, fat and air structures were segmented. These regions were assigned attenuation coefficients of 0.232, 0.148, 0.143, 0.141, 0.140, 0.135, 0.123 and 0 cm⁻¹, respectively. These values were calculated based on the chemical composition of each tissue and the mass attenuation coefficient given in NIST (National Institute of Standards and Technology 2021) for photons at 159 keV (Hubbell and Seltzer 1995). This map with designated coefficients was considered to be the ground truth (GT) $\mu$ -map that was later used for training of the neural networks. Phantoms were randomly rotated (−20° to 20°) and translated (−10 to 10 mm) to make the dataset more variable. All phantoms were down-sampled using trilinear interpolation to a voxel size of 1.0 × 1.0 × 1.0 mm³ from their original voxel size of 0.5 × 0.5 × 0.5 mm³.
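As an illustration of this assignment step, the sketch below maps a segmented label volume to a GT $\mu$ -map with a lookup table. The coefficient values are those listed above; the integer label convention (`TISSUES`) and the function name are hypothetical and are not taken from the Brainweb data or from the original implementation.

```python
import numpy as np

# Linear attenuation coefficients (cm^-1) at 159 keV, as listed in the text.
MU_CM = {
    "skull": 0.232, "skin": 0.148, "blood": 0.143, "muscle": 0.141,
    "brain": 0.140, "water": 0.135, "fat": 0.123, "air": 0.0,
}

# Hypothetical label convention: the position in this list is the label value.
TISSUES = ["air", "fat", "water", "brain", "muscle", "blood", "skin", "skull"]


def labels_to_mu_map(labels):
    """Convert an integer tissue-label array into a mu-map in cm^-1."""
    lut = np.array([MU_CM[name] for name in TISSUES])
    return lut[np.asarray(labels)]
```

Because the lookup is a plain array indexing operation, it works unchanged for 2D slices and 3D volumes.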

2.2.2. MC simulated realistic projections

MC simulations of the 20 phantoms were performed with Geant4 Application for Tomographic Emission (GATE) (Jan et al 2004) with geometry based on G-SPECT-I. The MC simulation assumes a total scan time of 30 min. In addition, cosmic background counts were added to the MC projections to make the simulation more realistic. More details of the MC simulation in GATE and the emulation of cosmic background counts with the G-SPECT-I system are described in Chen et al (2021).

2.2.3. Noise-free simulated projections

For all 20 phantoms, noise-free forward projections were generated with the VRT simulator. This simulator takes the system geometry (i.e. the precise pinhole and detector positions and detector orientations) as input and models the collimator and patient attenuation but ignores scatter (Goorden et al 2016, Wang et al 2017), as shown in figure 2. No noise or cosmic radiation counts were modeled in the VRT simulated projections.

2.2.4. Image reconstruction

A system matrix calculated based on the VRT simulator was used for image reconstruction of the MC simulated projections and the noise-free simulated projections (see figure 2). No patient AC was performed during reconstruction. All image reconstructions were performed on a 1.5 mm grid, larger than the voxel size of the digital phantoms, to mimic a continuous activity distribution reconstructed on a discrete grid. Similarity regulated OSEM (Vaissier et al 2016) with 8 subsets and 10 iterations was implemented for image reconstruction.

As demonstrated in our previous study (Chen et al 2021), accurate estimation of the $\mu$ -maps can be obtained when photopeak as well as scatter window reconstructed images were used as input to CNN. Therefore, five SPECT reconstructions were conducted from different energy windows for the MC simulated data (see figure 2). One reconstruction used the photons detected in the photopeak window combined with a triple energy window scatter correction (32 keV width centered at 159 keV for the photopeak window and 6.4 keV width at each side for scatter correction) and four additional reconstructions were done from different windows (32 keV width centered at 139, 119, 99 and 79 keV respectively). For the VRT simulated projection data, only primary photons were simulated and could thus be reconstructed, resulting in noise- and scatter-free SPECT scans (see figure 2).
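The window settings above can be written out explicitly. The sketch below computes the window bounds and a triple energy window (TEW) scatter estimate using the standard trapezoidal TEW formula. It is assumed here that the two 6.4 keV side windows directly adjoin the photopeak window and that the standard TEW formula applies; neither detail is stated in the text, and the function names are illustrative only.

```python
import numpy as np


def window(center_kev, width_kev):
    """Return the (lower, upper) bounds of an energy window in keV."""
    return (center_kev - width_kev / 2.0, center_kev + width_kev / 2.0)


# Photopeak window: 32 keV wide, centered at 159 keV.
PHOTOPEAK = window(159.0, 32.0)

# Four additional 32 keV windows used as extra network inputs.
SCATTER_WINDOWS = [window(c, 32.0) for c in (139.0, 119.0, 99.0, 79.0)]


def tew_scatter(c_low, c_high, w_side=6.4, w_peak=32.0):
    """Trapezoidal TEW estimate of scatter counts in the photopeak window.

    c_low and c_high are the counts in the lower and upper side windows
    (scalars or arrays of projection pixels).
    """
    c_low = np.asarray(c_low, dtype=float)
    c_high = np.asarray(c_high, dtype=float)
    return (c_low / w_side + c_high / w_side) * w_peak / 2.0
```

Note that with these settings the 139 keV window (123 to 155 keV) overlaps the photopeak window (143 to 175 keV), as follows directly from the stated centers and widths.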

2.3. CNN $\mu$ -map estimation

The MC simulated SPECT scans were preprocessed and subsequently used as input to the CNNs for $\mu$ -map estimation. These steps are explained in detail in the subsections below.

2.3.1. Image preprocessing

The pre-processing of MC simulated images includes a step of cylindrical masking (diameter 240 mm) to remove artefacts outside the head and a step of intensity normalization to ensure a similar dynamic range for scans from different phantoms, as in Chen et al (2021). In addition, the input SPECT images were down-sampled (tri-linearly) from their original voxel size of 1.5 × 1.5 × 1.5 mm³ to a voxel size of 3 × 3 × 3 mm³ before being fed into the neural network; the larger voxel size speeds up the training process.
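A minimal sketch of these three steps is given below. Two points are assumptions: the normalization is taken to be division by the image maximum (the exact scheme is described in Chen et al (2021) and may differ), and the factor-two down-sampling is implemented as a 2 × 2 × 2 block mean, which coincides with trilinear interpolation only when each new voxel center lies exactly at the center of its block.

```python
import numpy as np


def cylindrical_mask(shape_zyx, voxel_mm=1.5, diameter_mm=240.0):
    """Boolean mask of an axial cylinder centered on the transaxial grid."""
    _, ny, nx = shape_zyx
    y = (np.arange(ny) - (ny - 1) / 2.0) * voxel_mm
    x = (np.arange(nx) - (nx - 1) / 2.0) * voxel_mm
    inside = (y[:, None] ** 2 + x[None, :] ** 2) <= (diameter_mm / 2.0) ** 2
    return np.broadcast_to(inside, shape_zyx)


def preprocess(img):
    """Mask, normalize and down-sample a (z, y, x) SPECT volume."""
    masked = np.where(cylindrical_mask(img.shape), img, 0.0)
    peak = masked.max()
    if peak > 0:
        masked = masked / peak  # assumed normalization: scale to [0, 1]
    # Crop to even dimensions, then average each 2 x 2 x 2 block.
    nz, ny, nx = (s // 2 * 2 for s in masked.shape)
    blocks = masked[:nz, :ny, :nx].reshape(nz // 2, 2, ny // 2, 2, nx // 2, 2)
    return blocks.mean(axis=(1, 3, 5))
```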

2.3.2. CNN architectures

The patch-voxel CNN that estimates $\mu$ -maps voxel-wise with an encoder architecture as implemented in our previous work was used (Chen et al 2021). Additionally, two networks with an encoder–decoder architecture that estimate $\mu$ -maps in a patch-wise or image-wise manner were tested. Such an encoder–decoder architecture has the advantage of preserving neighborhood information in the output space. For the three networks, 2D patches in the spatial domain (xy plane) were used as input. This is to avoid inter-slice interference particularly for slices close to the edge of the scanning region where neighboring slices might suffer from data truncation artefacts (note that there are only a limited number of slices in the scanning region due to the focused striatum imaging strategy). Each 2D patch underwent multiple stages of 2D convolutions and pooling (see figure 3). These three networks are explained below.

(1)
Patch-voxel CNN: this network takes SPECT image patches centered at each voxel as input to predict the corresponding attenuation coefficient of the central voxel from each patch as output (see figure 3). The input image patch has a dimension of 21 × 21 voxels taken from the 2D transaxial slices × 5 energies, while the output has a dimension of 1 voxel.
(2)
Patch-patch CNN: this network takes SPECT image patches as input to predict the corresponding attenuation map patches at the same location in image space (see figure 3). The input SPECT patch was given even dimensions because of the down-sampling and up-sampling operations in the U-Net architecture. Thus, a dimension of 20 × 20 voxels × 5 energies was used as input with an output size of 20 × 20 voxels. In the testing phase, the entire 2D image slice was used in the network for prediction. Thus, the attenuation coefficient of each voxel was the mean value among predictions from all patches covering that voxel (20 × 20 patches).
(3)
Image-image CNN: this network has a U-Net architecture as in the patch-patch CNN, while here each slice (72 × 80 voxels × 5 energies) was used as input to predict the attenuation map for the corresponding slice (72 × 80 voxels).
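The averaging of overlapping patch predictions described for the patch-patch CNN can be sketched as follows. A stride of one voxel is assumed, which gives 20 × 20 covering patches for interior voxels; `predict_patch` is a hypothetical stand-in for the trained network and is not part of the original implementation.

```python
import numpy as np


def average_overlapping_patches(predict_patch, image, patch=20):
    """Average patch-wise predictions over a full slice.

    image: array of shape (H, W, C) with C energy windows.
    predict_patch: callable mapping a (patch, patch, C) array to a
        (patch, patch) array of attenuation coefficients.
    """
    h, w, _ = image.shape
    acc = np.zeros((h, w))
    count = np.zeros((h, w))
    for i in range(h - patch + 1):
        for j in range(w - patch + 1):
            pred = predict_patch(image[i:i + patch, j:j + patch])
            acc[i:i + patch, j:j + patch] += pred
            count[i:i + patch, j:j + patch] += 1
    # Every voxel is covered at least once when H and W are >= patch.
    return acc / count
```

Voxels near the slice border are covered by fewer patches than interior voxels, so their estimates are averaged over fewer predictions.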

**Figure 3.** Network architecture of the patch-voxel, patch-patch and image-image CNN (Conv: 3 × 3 convolution; Pool: 2 × 2 max pooling; FC: fully connected; Upconv: 2 × 2 up-sampling). Every convolutional layer was followed with a layer of batch normalization and a layer of ReLU activation. Each fully connected layer was followed by a sigmoid activation. The number of filters is indicated in the figure below the layers.

2.3.3. Model training

Training was done on five randomly selected phantoms while testing was performed on the remaining 15 phantoms. All CNNs were trained with 15k samples randomly selected with replacement in each epoch. For the patch-voxel and patch-patch CNN, a balanced selection of the patches was ensured for the three main tissue classes (air, soft tissue and bone). Data augmentation was performed with random rotation (−20° to 20°) and translation (−10 to 10 mm). The networks were trained to minimize the mean square error between the predicted attenuation coefficient $\mu$ and the GT $\mu$ -map. The Adam optimizer (Kingma and Jimmy 2014) with default settings and a batch size of 15 was used in the training. No validation set was used to determine the optimal epoch. The network was trained for 200 epochs for convergence. This work was implemented using TensorFlow.
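The balanced patch selection can be illustrated with a small sampling routine. The thresholds used to assign a voxel to air, soft tissue or bone are illustrative assumptions chosen to separate the GT coefficients listed in section 2.2.1, and the function names are hypothetical.

```python
import numpy as np


def tissue_class(mu):
    """Classify coefficients: 0 = air, 1 = soft tissue, 2 = bone.

    The thresholds 0.06 and 0.19 cm^-1 are assumptions for illustration.
    """
    return np.digitize(mu, [0.06, 0.19])


def balanced_centres(mu_slice, n_samples, rng):
    """Draw patch-centre indices with equal counts per tissue class.

    Sampling is with replacement. Each class must be present in mu_slice,
    otherwise rng.integers raises an error.
    """
    classes = tissue_class(mu_slice)
    per_class = n_samples // 3
    picks = []
    for c in range(3):
        idx = np.argwhere(classes == c)
        chosen = rng.integers(0, len(idx), size=per_class)
        picks.append(idx[chosen])
    return np.concatenate(picks)
```

With `n_samples=15000`, matching the 15k samples per epoch stated above, each class contributes 5000 patch centres.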

2.4. AC using the $\mu$ -maps

Based on the $\mu$ -maps estimated from MC simulated data, an adapted multi-pinhole first-order Chang's method (Chen et al 2021) was applied to check AC effects on SPECT images when using different $\mu$ -maps. This adapted multi-pinhole Chang's method first checks, at every bed position, whether a voxel is seen by a pinhole. If so, the transmission along the corresponding projection line (from that voxel center to the pinhole center) is counted and weighted by the pinhole's sensitivity for that voxel. The transmission fraction for the corresponding voxel is then the weighted average transmission value among all projection lines that are counted.
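The sensitivity-weighted averaging can be sketched as below. This is a simplified illustration, not the implementation of Chen et al (2021): line integrals are approximated by nearest-voxel sampling, positions are given in voxel units, the list of pinholes is assumed to contain only those that see the voxel (over all bed positions), and the correction factor is taken as the reciprocal of the mean transmission, as in the conventional first-order Chang method.

```python
import numpy as np


def transmission(mu_map, start, end, voxel_cm=0.3, n_steps=200):
    """exp(-line integral of mu) between two points in a 3D mu-map.

    start and end are positions in voxel units; samples that fall
    outside the volume are treated as air (mu = 0).
    """
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    t = (np.arange(n_steps) + 0.5) / n_steps
    pts = start + t[:, None] * (end - start)
    idx = np.rint(pts).astype(int)
    inside = np.all((idx >= 0) & (idx < mu_map.shape), axis=1)
    mu = np.zeros(n_steps)
    mu[inside] = mu_map[tuple(idx[inside].T)]
    length_cm = np.linalg.norm(end - start) * voxel_cm
    return np.exp(-mu.mean() * length_cm)


def chang_factor(mu_map, voxel, pinholes, sensitivities):
    """Correction factor: 1 / sensitivity-weighted mean transmission."""
    trans = np.array([transmission(mu_map, voxel, p) for p in pinholes])
    w = np.asarray(sensitivities, dtype=float)
    return 1.0 / (np.sum(w * trans) / np.sum(w))
```

Multiplying the uncorrected voxel value by `chang_factor` then gives the attenuation corrected value under these assumptions.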

2.5. Evaluation

2.5.1. Attenuation maps

The accuracy of the CNN estimated $\mu$ -map was evaluated by calculating the peak signal to noise ratio (PSNR) defined in equation (1) and the structural similarity index metric (SSIM) given by equation (3). In the equations, $n$ is the number of voxels involved in the calculation for each scan. Only voxels in the head (i.e. soft tissue or bone) are taken into account. In equation (1), MSE is the mean square error defined in equation (2); Max is the maximal image intensity of the GT $\mu$ -map (0.232 cm⁻¹ for the bone here). In equation (3), $\overline{{\mu }_{CNN}}$ and $\overline{{\mu }_{GT}}$ are the mean values of the CNN estimated and GT $\mu$ -maps; ${\left({\sigma }_{CNN}\right)}^{2}$ and ${\left({\sigma }_{GT}\right)}^{2}$ represent the variances of the CNN estimated and GT $\mu$ -maps; ${\sigma }_{CNN,GT}$ is the covariance of the CNN estimated and GT $\mu$ -maps. Higher values of PSNR and SSIM indicate better quality of the estimated $\mu$ -map. Definitions of the PSNR, MSE and SSIM are given below

$\begin{eqnarray}&&PSNR=10\cdot {\mathrm{log}}_{10}\left(\displaystyle \frac{Ma{x}^{2}}{MSE}\right),\end{eqnarray} \tag{ 1 }$

$\begin{eqnarray}&&MSE=\displaystyle \frac{1}{\,n}\displaystyle \sum _{j=1}^{n}{\left({\mu }_{CNN}^{j}-{\mu }_{GT}^{j}\right)}^{2},\end{eqnarray} \tag{ 2 }$

$\begin{eqnarray}&&SSIM=\,\displaystyle \frac{2\cdot \overline{{\mu }_{CNN}}\cdot \overline{{\mu }_{GT}}}{{\left(\overline{{\mu }_{CNN}}\right)}^{2}+\,{\left(\overline{{\mu }_{GT}}\right)}^{2}}\,\displaystyle \frac{2\cdot {\sigma }_{CNN,GT}}{{\left({\sigma }_{CNN}\right)}^{2}+\,{\left({\sigma }_{GT}\right)}^{2}}.\end{eqnarray} \tag{ 3 }$
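Equations (1)–(3) translate directly into a few lines of NumPy. The sketch below evaluates both metrics globally over the head voxels, as defined above; `head_mask` is a hypothetical boolean array selecting soft tissue and bone voxels, and population (not sample) variances and covariance are assumed.

```python
import numpy as np


def psnr(mu_cnn, mu_gt, head_mask, max_val=0.232):
    """PSNR in dB following equations (1) and (2)."""
    diff = mu_cnn[head_mask] - mu_gt[head_mask]
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)


def global_ssim(mu_cnn, mu_gt, head_mask):
    """SSIM following equation (3), computed over all head voxels at once."""
    a = mu_cnn[head_mask]
    b = mu_gt[head_mask]
    mean_term = 2.0 * a.mean() * b.mean() / (a.mean() ** 2 + b.mean() ** 2)
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    var_term = 2.0 * cov / (a.var() + b.var())
    return mean_term * var_term
```

For example, a uniform error equal to 10% of Max gives a PSNR of 20 dB, while identical maps give an SSIM of 1.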

2.5.2. Attenuation corrected SPECT images

The CNN estimated $\mu$ -maps were evaluated on SPECT images via a step of AC. As the presence of noise in the MC simulated images may hamper visualization of AC effects with different $\mu$ -maps, assessment of $\mu$ -maps was done on the noise-free simulated SPECT images and on the noisy MC simulated SPECT images.

2.5.2.1. Visual inspection

SPECT images that are corrected using the CNN estimated $\mu$ -maps (CNN-AC) were compared to the ground-truth-AC (GT-AC) image that uses the GT $\mu$ -maps for correction. All SPECT images shown in this paper for visual comparisons were smoothed using a 3D Gaussian post filter with 6 mm FWHM.

2.5.2.2. Relative quantitative analysis

Regional striatal uptake ratios were calculated in localized regions of interest (ROIs). The specific binding ratio (SBR) and the asymmetry index (AI) were calculated as defined by

$\begin{eqnarray}&&SBR=\displaystyle \frac{{C}_{target}-{C}_{bkg}}{{C}_{bkg}},\end{eqnarray} \tag{ 4 }$

$\begin{eqnarray}&&AI=2\,\times \displaystyle \frac{SB{R}_{R}-SB{R}_{L}}{SB{R}_{R}+SB{R}_{L}}\,\times 100 \% \end{eqnarray} \tag{ 5 }$

Here ${C}_{target}$ and ${C}_{bkg}$ are the mean DaT image intensities in the target ROI and the reference ROI respectively, while $SB{R}_{R}$ and $SB{R}_{L}$ refer to the $SBR$ of a target ROI in the right and left hemisphere respectively. Eight localized sub-regions of the striatum were defined as the target ROIs (see figure 4). These localized ROIs in the striatum were drawn with a diameter of 10.5 mm in the transaxial plane and were placed over 9 mm slices in the axial direction (thus each has a volume of 0.78 ml). The reference region (see figure 4) was obtained using the Southampton method (Tossici-Bolt et al 2006).
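Equations (4) and (5), together with the quoted ROI volume, can be checked with the short sketch below. The function names are illustrative, and the volume calculation assumes a cylindrical ROI.

```python
import math


def sbr(c_target, c_bkg):
    """Specific binding ratio, equation (4)."""
    return (c_target - c_bkg) / c_bkg


def asymmetry_index(sbr_right, sbr_left):
    """Asymmetry index in percent, equation (5)."""
    return 2.0 * (sbr_right - sbr_left) / (sbr_right + sbr_left) * 100.0


def roi_volume_ml(diameter_mm=10.5, length_mm=9.0):
    """Volume of a cylindrical ROI in ml (1 ml = 1000 mm^3)."""
    return math.pi * (diameter_mm / 2.0) ** 2 * length_mm / 1000.0
```

With the default arguments, `roi_volume_ml()` evaluates to approximately 0.78 ml, consistent with the value stated above.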

**Figure 4.** Illustration of the eight localized regions and the reference ROI for quantitative analysis. Regions of caudate, anterior putamen, middle putamen and posterior putamen in the left and right hemisphere are depicted. Each region has a diameter of 10.5 mm and an axial length of 9 mm (thus a volume of about 0.78 ml). Caud: caudate; Ante Put: anterior putamen; Mid Put: middle putamen; Post Put: posterior putamen. The reference ROI was generated using the Southampton method.

The SBRs and AIs calculated from the CNN-AC images were compared to those of GT-AC, and their deviations were computed as in equations (6) and (7). In the equations, $DE{V}_{SBR}$ and $DE{V}_{AI}$ denote the SBR and AI deviations from the GT-AC image. The deviation of AI is calculated directly by subtracting the $A{I}_{GT-AC}$ since AI is already a normalized index expressed in percentage

$\begin{eqnarray}&&DE{V}_{SBR}\left( \% \right)=\displaystyle \frac{| SB{R}_{CNN-AC}-SB{R}_{GT-AC}| }{SB{R}_{GT-AC}}\times 100 \% ,\end{eqnarray} \tag{ 6 }$

$\begin{eqnarray}&&DE{V}_{AI}\,\left( \% \right)=| A{I}_{CNN-AC}-A{I}_{GT-AC}| .\end{eqnarray} \tag{ 7 }$
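The two deviation measures differ only in whether the absolute difference is normalized, as the following sketch of equations (6) and (7) makes explicit (function names are illustrative).

```python
def dev_sbr(sbr_cnn_ac, sbr_gt_ac):
    """Relative SBR deviation in percent, equation (6)."""
    return abs(sbr_cnn_ac - sbr_gt_ac) / sbr_gt_ac * 100.0


def dev_ai(ai_cnn_ac, ai_gt_ac):
    """Absolute AI deviation in percentage points, equation (7)."""
    return abs(ai_cnn_ac - ai_gt_ac)
```

For instance, an SBR of 2.05 against a GT-AC value of 2.00 corresponds to a deviation of 2.5%, whereas AIs of 5% and 3% differ by 2 percentage points.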

Besides evaluating the deviations, correlations between SBRs derived from images with different AC methods and those of the GT-AC were assessed. The intra-class correlation coefficient (ICC) (Shrout and Fleiss 1979, McGraw and Seok 1996) was calculated using the two-way mixed effects model for absolute agreement assessment as in Rosario et al (2011). High $ICC$ values indicate a strong agreement between two sets of measurements in their absolute values. Additionally, Pearson's correlation coefficient ( $r$ ) was included to measure the linear association between two sets of measurements.
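The difference between the two coefficients can be made concrete with a small example. The sketch below uses the single-measurement, absolute-agreement ICC computed from two-way ANOVA mean squares; it is assumed here that this is the form used in the analysis, which the text does not specify in detail.

```python
import numpy as np


def icc_a1(x, y):
    """Single-measure, absolute-agreement ICC from two-way ANOVA mean squares."""
    data = np.column_stack([x, y]).astype(float)
    n, k = data.shape
    grand = data.mean()
    ss_rows = k * np.sum((data.mean(axis=1) - grand) ** 2)  # between subjects
    ss_cols = n * np.sum((data.mean(axis=0) - grand) ** 2)  # between methods
    ss_err = np.sum((data - grand) ** 2) - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k / n * (ms_c - ms_e))


def pearson_r(x, y):
    """Pearson's linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])
```

A constant offset between two sets of measurements leaves $r$ at 1 but lowers the ICC, which is why both coefficients are informative when one method is systematically biased relative to the other.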

2.5.2.3. Absolute quantification in kBq ml⁻¹

Compared to relative quantification, absolute quantification assesses regional tracer concentrations rather than uptake ratios between regions. Absolute quantification is currently rarely implemented for ¹²³I-FP-CIT scans. Applications in PET and SPECT studies (Beauregard et al 2011, El Naqa 2014) show the potential value of such an assessment, thus we include it in the present work. Here, the striatal binding value (SBV) is calculated as the mean concentration in a ROI (kBq ml⁻¹). The deviation of the SBVs calculated from the CNN-AC images from those of GT-AC was computed according to equation (8)

$\begin{eqnarray}&&DE{V}_{SBV}\,\left( \% \right)=\displaystyle \frac{| SB{V}_{CNN-AC}-SB{V}_{GT-AC}| }{SB{V}_{GT-AC}}\,\times 100 \% .\end{eqnarray} \tag{ 8 }$

For all the quantitative analysis, measurements were performed on the unfiltered SPECT images to avoid any bias from filtering.

2.5.3. Comparison to ellipse based method

A traditional ellipse based $\mu$ -map approximation method was included for comparison. To reduce the subjectivity of manual placement, an ellipse was automatically defined based on a threshold of 12% on the reconstructed SPECT images as in Papanastasiou et al (2020). Specifically, SPECT images were re-oriented to the original symmetric position (recall that phantoms were rotated prior to the GATE simulation to make the dataset more diverse). Subsequently, the lengths of the minor and major axes of the ellipse were set to the maximum $x$ and $y$ dimensions of the contour (which is determined by the threshold). A visual check of the ellipse (and adjustment if needed) was done to minimize a possible shift between the ellipse and SPECT image. Finally, the ellipse was uniformly filled with an attenuation coefficient of $\mu$ = 0.14 cm⁻¹ (corresponding to brain tissue) as commonly used in relevant work (Tossici-Bolt et al 2017, Morbelli et al 2020, Papanastasiou et al 2020). AC results using the fitted ellipse based $\mu$ -maps are denoted as Fit-ellipse AC.
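The automatic ellipse definition can be sketched for a single transaxial slice as follows. It is assumed that the 12% threshold is relative to the slice maximum and that the slice has already been re-oriented so that the ellipse axes align with the image axes; the manual check and adjustment step is not modeled, and the function name is hypothetical.

```python
import numpy as np


def fit_ellipse_mu_map(slice_img, threshold_frac=0.12, mu_value=0.14):
    """Uniform elliptical mu-map (cm^-1) from a thresholded 2D SPECT slice."""
    mask = slice_img >= threshold_frac * slice_img.max()
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    # Center and semi-axes from the bounding box of the thresholded contour.
    cy = (rows[0] + rows[-1]) / 2.0
    cx = (cols[0] + cols[-1]) / 2.0
    ry = (rows[-1] - rows[0] + 1) / 2.0
    rx = (cols[-1] - cols[0] + 1) / 2.0
    yy, xx = np.indices(slice_img.shape)
    inside = ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0
    return np.where(inside, mu_value, 0.0)
```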

3. Results

3.1. $\mu$ -maps

Figure 5 gives a comparison of the CNN estimated and the GT $\mu$ -maps. Alongside the $\mu$ -map slices, the corresponding SPECT image slices showing the tracer distribution are also included in the first two rows. Note that all CNN estimated $\mu$ -maps were obtained from the MC simulated SPECT image sets (reconstructed from different energy windows). The center of the striatum in axial direction is defined to be at 0 mm.

**Figure 5.** Comparison of the $\mu$ -maps obtained using different CNNs. Slices within an axial length of 48 mm that are essential for a ¹²³I-FP-CIT scan are shown. Note that all CNN estimated $\mu$ -maps were obtained from the MC simulated SPECT image sets (reconstructed from different energy windows).

Figure 5 shows that with the axially focused ¹²³I-FP-CIT imaging strategy, $\mu$ -maps centered at the striatum that are essential for ¹²³I-FP-CIT scan inspection and quantification could be accurately estimated with the three CNN architectures. Among the three CNNs, the patch-voxel CNN gives slightly more noisy $\mu$ -maps. This (i.e. the noise on the $\mu$ -map) is circumvented in the patch-patch and image-image CNNs where neighborhood information is preserved in the output space with an encoder–decoder framework.

Table 1 provides the PSNR and SSIM results of the CNN estimated $\mu$ -maps. Voxels in slices within the 48 mm axial range that is relevant for ¹²³I-FP-CIT assessment were used for the calculation. This table shows that the patch-voxel CNN gives slightly inferior PSNR and SSIM values. However, the differences between the results of the three networks are small.

Table 1. Mean and standard deviations of PSNR and SSIM for the $\mu$ -maps obtained with various CNN methods. The errors were calculated from the slices within an axial length of 48 mm that are essential for ¹²³I-FP-CIT SPECT. P values were calculated using paired t tests.

| Method | PSNR (dB) | SSIM | p |
|---|---|---|---|
| Patch-voxel CNN $\mu$ -map | 32.1 ± 2.0 | 0.89 ± 0.03 | < 0.001 |
| Patch-patch CNN $\mu$ -map | 33.7 ± 2.0 | 0.92 ± 0.02 | < 0.001 |
| Image-image CNN $\mu$ -map | 33.2 ± 1.7 | 0.91 ± 0.03 | < 0.001 |

3.2. SPECT images with different AC

3.2.1. Visual inspection

A visual comparison of the attenuation corrected SPECT images as well as a line profile comparison are shown in figures 6 and 7 respectively. Figure 6 shows that the striatum structure looks similar for all AC methods including No-AC. Compared to the GT-AC, the No-AC gives an increased activity distribution at the periphery and outside of the brain. The differences between all three CNN-AC and the GT-AC are small as confirmed in figure 7.

**Figure 6.** Comparison of the attenuation corrected SPECT images. Attenuation correction results performed on both the noise-free simulated scans (the left part) and on noisy MC simulated scans (the right part) are displayed.

**Figure 7.** Image profiles taken from the noise-free simulated SPECT images in figure 6. The profiles are taken along the yellow lines indicated in figure 6 (images on the first row of the left panel). The lines have a width and thickness of 4.5 mm. A zoomed view of some parts of the profiles is displayed at the top of the figure.

3.2.2. Regional quantitative analysis

Figure 8 gives Bland–Altman plots of the SBR differences from the GT-AC image across 120 regions (eight sub-regions for all 15 test subjects). The deviations in percentage for the SBRs and AIs are summarized in the figure. Figure 9 gives the scatter plot of the SBRs for correlation analysis.

**Figure 9.** Correlation between GT-AC and CNN-ACs, as well as between GT-AC and No-AC for the SBRs. Values are calculated from noise-free simulated SPECT images.

Figure 8 shows that the differences between the three CNN-ACs and GT-AC for the SBRs are small (differences are close to the zero line). Among the three CNN methods, the patch-voxel CNN-AC shows a slightly wider spread and thus a larger difference of the SBRs from GT-AC. The deviation from the GT-AC is $\leqslant$ 2.5% for all three CNN-ACs and 3.5% for Fit-ellipse AC. No-AC systematically underestimates SBRs by 13.1%, as indicated by the strong correlation ( $r$ $\geqslant$ 0.99) between the GT-AC based SBRs and the values obtained with No-AC in figure 9. The impact of the different AC methods on the asymmetry index is small (within 3.6%).
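To make the relation between a systematic underestimation and a strong correlation concrete, the sketch below computes Bland–Altman style statistics and the Pearson correlation for synthetic data. It is only an illustration: the SBR values are randomly generated, the percent difference is taken relative to the reference (an assumption; the classical Bland–Altman plot uses the mean of both methods), and the function names are hypothetical.

```python
import numpy as np

def bland_altman_percent(reference, test):
    """Return (bias, lower, upper) of the percent differences.

    Assumption: differences are expressed relative to the reference,
    and limits of agreement are bias +/- 1.96 sample standard deviations.
    """
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    diff = 100.0 * (test - reference) / reference
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def pearson_r(x, y):
    """Pearson correlation coefficient of two 1D samples."""
    return float(np.corrcoef(x, y)[0, 1])

rng = np.random.default_rng(0)
# Hypothetical reference SBRs for 120 regions (not data from this study).
sbr_gt = rng.uniform(2.0, 6.0, size=120)
# Purely illustrative: a uniform 13.1% underestimation in every region.
sbr_noac = 0.869 * sbr_gt

bias, lower, upper = bland_altman_percent(sbr_gt, sbr_noac)
r = pearson_r(sbr_gt, sbr_noac)
print(f"bias = {bias:.1f}%, r = {r:.3f}")
```

In this idealized case the bias is large while the correlation is perfect, because a constant scale factor leaves the linear relationship between the two sets of values intact.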

3.2.3. Absolute quantification in kBq ml⁻¹

The capability to obtain accurate regional uptake values with different AC methods is shown in figure 10. The SBVs calculated from the phantoms are included as a reference. Figure 10 shows that, compared to the digital phantom, the GT-AC yields slightly inaccurate estimates of the SBVs. This might be due to the imperfect AC method or to partial volume effects. The three CNN-AC methods achieve SBV accuracies comparable to that of the GT-AC. The deviation from the GT-AC is summarized in table 2, which shows a mean deviation within 2.2% for all three CNN-ACs and a mean SBV deviation of 16.0% and 71.7% for the Fit-ellipse AC and No-AC respectively.

Table 2. Deviation (mean $\pm$ standard deviation) of the SBVs from the GT-AC across 120 regions (8 sub-regions for all 15 subjects). Measurements are calculated on noise-free simulated SPECT images. The terms 'PV', 'PP' and 'II' denote patch-voxel, patch-patch and image-image respectively.

$DE{V}_{SBV}$ (%)	Caudate		Ante. Put.		Mid Put.		Post. Put.		Mean
	L	R	L	R	L	R	L	R
PV CNN-AC	1.5 $\pm$ 1.3	2.1 $\pm$ 1.9	2.3 $\pm$ 2.3	2.3 $\pm$ 2.4	2.4 $\pm$ 2.4	2.2 $\pm$ 2.6	2.4 $\pm$ 2.3	2.1 $\pm$ 2.7	2.2 $\pm$ 2.2
PP CNN-AC	1.4 $\pm$ 1.2	2.2 $\pm$ 1.9	1.5 $\pm$ 1.2	2.6 $\pm$ 2.4	1.5 $\pm$ 1.1	2.6 $\pm$ 2.6	1.4 $\pm$ 1.0	2.6 $\pm$ 2.8	2.0 $\pm$ 1.8
II CNN-AC	1.6 $\pm$ 1.0	1.3 $\pm$ 1.5	2.1 $\pm$ 1.8	1.6 $\pm$ 2.5	1.9 $\pm$ 1.9	1.7 $\pm$ 2.7	1.7 $\pm$ 1.7	1.8 $\pm$ 2.8	1.7 $\pm$ 2.0
Fit-ellipse AC	16.5 $\pm$ 7.3	16.9 $\pm$ 7.2	15.6 $\pm$ 7.3	16.0 $\pm$ 7.2	15.5 $\pm$ 7.3	15.8 $\pm$ 7.1	15.6 $\pm$ 7.1	15.8 $\pm$ 6.9	16.0 $\pm$ 7.2
No-AC	72.1 $\pm$ 0.7	72.0 $\pm$ 0.6	71.7 $\pm$ 0.9	71.3 $\pm$ 0.7	71.9 $\pm$ 0.9	71.3 $\pm$ 0.7	71.9 $\pm$ 0.8	71.3 $\pm$ 0.7	71.7 $\pm$ 0.7
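A per-region summary of the kind reported in table 2 could be assembled along the following lines. This is a hedged sketch, not the analysis code of this study: it assumes that the deviation is the absolute percent difference from the GT-AC value and that the sample standard deviation is reported, and all SBV values are synthetic.

```python
import numpy as np

def dev_percent(values, reference):
    """Element-wise absolute percent deviation from the reference.

    Assumption: the deviation is unsigned; a signed definition would
    drop the absolute value.
    """
    values = np.asarray(values, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return 100.0 * np.abs(values - reference) / reference

rng = np.random.default_rng(1)
# Synthetic SBVs arranged as (15 subjects, 8 striatal sub-regions).
sbv_gt = rng.uniform(20.0, 60.0, size=(15, 8))
# Hypothetical method with about 2% random error per region.
sbv_method = sbv_gt * (1.0 + rng.normal(0.0, 0.02, size=(15, 8)))

dev = dev_percent(sbv_method, sbv_gt)
per_region_mean = dev.mean(axis=0)        # one value per sub-region
per_region_sd = dev.std(axis=0, ddof=1)   # spread across subjects
overall_mean, overall_sd = dev.mean(), dev.std(ddof=1)  # all 120 regions
print(np.round(per_region_mean, 1), round(overall_mean, 1))
```

Keeping the subject and region axes separate makes it straightforward to report both the per-region columns and the pooled mean over all 120 regions.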

3.3. Computation

Training took about 1.8 h, 1.9 h and 1.2 h for the patch-voxel, patch-patch and image-image CNN respectively, on a single NVIDIA 2080 Ti graphics processing unit with 11 GB of memory. Generating the attenuation map for one patient scan took 28.4 s, 60.0 s and 1.3 s for the respective networks.

4. Discussion

In the present work, we demonstrated the feasibility of CNN based approaches for $\mu$ -map estimation using only SPECT data from axially focused ${}_{\,}{}^{123}{\rm{I}}$ -FP-CIT scans. The approaches were tested on a focusing multi-pinhole system in a MC simulation study.

Our visual results show that AC does not affect the shape and symmetry of the striatum much. The main visual effect of AC is that the activity distribution at the periphery and outside of the brain can be well estimated, which may otherwise be incorrectly enhanced (as the No-AC images show). For relative quantification of the SBRs, deviations from the GT-AC were within 2.5% for CNN-AC. No-AC systematically underestimates SBRs by 13.1%. A strong correlation was observed between the GT-AC obtained SBRs and the values obtained with CNN-AC ( $r\,\geqslant$ 0.99). Absolute quantification in terms of the SBV has a deviation from GT-AC within 2.2% for CNN-AC and of 71.7% for No-AC.

Currently, the clinical value of AC for ¹²³I-FP-CIT scans is debated (Lange et al 2014, Lapa et al 2015). Based on our results, the impact of AC is likely insignificant for diagnostic purposes when assessment is based on visual inspection of the striatum. This is in line with previous findings (Lange et al 2014, Akahoshi et al 2017). Likewise, omitting AC may not be an issue for relative quantification when ¹²³I-FP-CIT scans at a single clinical site are processed using the same protocol (e.g. all without AC), given the strong correlation between the SBRs obtained with GT-AC and those obtained with No-AC. However, where AC is a step in a standardized protocol or where a precise measurement of the SBR is helpful (e.g. for multicenter studies in which AC is already performed), AC can be performed. This is certainly true when absolute quantification is preferred. Absolute quantification (for which AC is required) is presently rarely applied to ¹²³I-FP-CIT scans. A recent study suggests that it can be helpful for differentiating normal from pathological ¹²³I-FP-CIT scans (Jreige et al 2020). In these cases, using the CNN estimated $\mu$ -map allows accurate results to be obtained without possible image registration errors and without the need to manually draw an ellipse. Apart from ¹²³I-FP-CIT scans, in other applications where a low activity level is present in the majority of the head, e.g. tumor therapy imaging with ¹³¹I-labeled 81C6 (Cokgor et al 2000), a CNN may also be applied to estimate $\mu$ -maps for precise quantification of the tracer uptake.

A traditional ellipse based method was included in this work for comparison. Results show that the Fit-ellipse AC obtains a $DE{V}_{SBR}$ and a $DE{V}_{SBV}$ of 3.5% and 16.0% respectively, which are smaller than the No-AC results (13.1% and 71.7% respectively) and larger than those of all three CNN-ACs (within 2.5% and 2.2% respectively). Note that the difference of the SBVs between Fit-ellipse AC and GT-AC may be diminished by simple adjustments of the ellipse $\mu$ -map, e.g. by assigning a different $\mu$ value than the one used here or by changing the threshold for the ellipse contour detection. The SBV deviations from the GT-AC reported in the present work are valid solely for the approach implemented here. Besides, the ellipse was generated here such that the symmetry of the AC SPECT images was preserved as much as possible. In clinical practice, placement of the ellipse could be affected by imperfect re-orientation of SPECT images, noise, etc, in addition to the subjectivity due to manual placement. For example, one study reported that while uniform attenuation with an ellipse drawn around a transmission image caused 5% error, placement of the ellipse on the emission image caused 15% error (Rajeevan et al 1998).
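For readers unfamiliar with the ellipse approach, a uniform elliptical $\mu$ -map for a single transaxial slice can be sketched as below. The function name, the grid size, the ellipse parameters and the $\mu$ value of 0.15 cm⁻¹ are placeholders chosen for illustration; they are not the settings of the Fit-ellipse AC evaluated in this work, and no contour detection or fitting step is included.

```python
import numpy as np

def ellipse_mu_map(shape, center, semi_axes, mu):
    """Uniform mu-map: `mu` inside an axis-aligned ellipse, 0 outside.

    shape     -- (rows, cols) of the slice
    center    -- (row, col) of the ellipse center, in pixels
    semi_axes -- (row, col) semi-axis lengths, in pixels
    mu        -- attenuation coefficient assigned inside the ellipse
    """
    rows, cols = np.indices(shape)
    r = (rows - center[0]) / semi_axes[0]
    c = (cols - center[1]) / semi_axes[1]
    inside = r ** 2 + c ** 2 <= 1.0
    return np.where(inside, mu, 0.0)

# Placeholder geometry and mu value (assumptions for this example only).
mu_map = ellipse_mu_map((128, 128), center=(64, 64),
                        semi_axes=(50, 40), mu=0.15)
print(mu_map.shape, int(np.count_nonzero(mu_map)))
```

Changing `mu` or the semi-axes in such a sketch corresponds to the simple adjustments of the ellipse map mentioned above.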

In the present work, three CNN frameworks that estimate the $\mu$ -map in a voxel-wise, patch-wise and image-wise way were tested for the task of interest. We found that the patch-voxel CNN, which was previously used in full brain perfusion imaging (Chen et al 2021), gave slightly noisier $\mu$ -maps when applied to ¹²³I-FP-CIT scans, probably due to the lower activity levels of the latter. Compared to the initial patch-voxel CNN, the patch-patch and image-image CNNs gave slightly smoother and cleaner $\mu$ -maps due to the incorporation of neighboring information in the output space. Among the three frameworks, the image-image architecture attains a slightly better performance in terms of $DE{V}_{SBR}$ and $DE{V}_{SBV}$, and additionally has the advantage of fast computation in training and testing. The image-image CNN treats the image transformation problem (from multi-energy SPECT images to attenuation map) from a global view. In contrast, the patch-voxel and patch-patch CNNs take local regions as input and thus focus more on details. For ¹²³I-FP-CIT scans, the $\mu$ -maps essential for interpretation and quantification are 'oval-shaped' with rather simple structures. This might favor a method such as the image-image CNN that ensures global consistency. Nevertheless, with the focused striatum imaging strategy, the $\mu$ -maps essential for ¹²³I-FP-CIT SPECT inspection and quantification could be accurately estimated with each CNN approach. All three CNN frameworks could achieve precise measurement of the regional uptake values.

End-to-end methods might be valid alternatives for SPECT AC to the CNN μ-map estimation approach proposed in this study. Such methods, as proposed in Yang et al (2019), (2020), (2021), Dong et al (2020), Torkaman et al (2021), directly generate attenuation corrected SPECT/PET images as output and thus eliminate the separate step of performing AC using the μ-map. We opted for a two-step strategy with an intermediate μ-map estimation step as it is less of a black box than the end-to-end approach. For example, in outlier cases where the patient SPECT data are not represented in the training set, e.g. exceptionally abnormal brain anatomy or the presence of motion artefacts, one could refrain from AC using the CNN estimated μ-map when that μ-map appears incorrect.

Limitations of the present work include a lack of validation using clinical data. Results presented in this work were based on a limited dataset from MC simulations of subjects with normal brain anatomies. The clinical value of the proposed method would be better validated on a large dataset with diverse brain anatomies from real patient scans. Besides, the proposed method was tested on the multi-pinhole G-SPECT-I geometry as this is an ultra-high resolution system currently under development at our institute. The CNN based approach may be translated to other SPECT scanners, yet the accuracy would need to be evaluated.

5. Conclusion

We have demonstrated the feasibility of a CNN based approach to generate $\mu$ -maps using only SPECT data from ¹²³I-FP-CIT scans with a focused striatum scan strategy. Our results based on a MC simulation study show that the impact of GT-AC versus CNN-AC or No-AC on striatal shape and symmetry is minimal. A strong correlation is observed between the GT-AC based SBRs and the values obtained with CNN-AC and No-AC. While SBRs and SBVs are underestimated by No-AC, they can be precisely quantified with CNN-AC. Thus, a CNN estimated $\mu$ -map could be a promising substitute for a CT $\mu$ -map, although further validation with patient scans in clinical cohorts is needed.

Acknowledgments

Financial disclosures of authors: FB is an employee and shareholder of MILabs BV. This work was conducted with financial support of the Netherlands Organization for Scientific Research (NWO), Physics Valorization Prize 'Ultra-fast, ultra-sensitive and ultra-high resolution SPECT', co-financed by MILabs BV.

Data availability statement

N.A
