Simulating effects of brain atrophy in longitudinal PET imaging with an anthropomorphic brain phantom

In longitudinal positron emission tomography (PET), the presence of volumetric changes over time can lead to an overestimation or underestimation of the true changes in the quantified PET signal due to the partial volume effect (PVE) introduced by the limited spatial resolution of existing PET cameras and reconstruction algorithms. Here, a 3D-printed anthropomorphic brain phantom with attachable striata in three sizes was designed to enable controlled volumetric changes. Using a method to eliminate the non-radioactive plastic wall, and manipulating binding potential (BP) levels by adding different numbers of events from list-mode acquisitions, we investigated the artificial volume dependence of BP due to PVE, and potential bias arising from varying BP. Comparing multiple reconstruction algorithms, we found that a high-resolution ordered-subsets expectation maximization algorithm with spatially variant point-spread function resolution modeling provided the most accurate data. For striatum, the BP changed by 0.08% for every 1% volume change, but for smaller volumes such as the posterior caudate the artificial change in BP was as high as 0.7% per 1% volume change. A simple gross correction for striatal volume is unsatisfactory, as the amplitude of the PVE on the BP differs depending on where in the striatum the change occurred. Therefore, to correctly interpret age-related longitudinal changes in the BP, we must account for volumetric changes also within a structure, rather than across the whole volume. The present 3D-printing technology, combined with the wall removal method, can be implemented to gain knowledge about the predictable bias introduced by the PVE differences in uptake regions of varying shape.


Introduction
Imaging techniques, such as positron emission tomography (PET), can provide unique neurochemical information about the brain, including the binding of neurotransmitters to dopamine D1 (Wang et al 1998) or D2 (Nevalainen et al 2015) receptors in aging or amyloid burden in dementia (Brendel et al 2015). In PET studies of aging and dementia, the intricate interplay between brain structure and function must be considered, as changes in brain volume may occur in a longitudinal context (Raz et al 2003, Gogtay et al 2004, Nyberg et al 2010), possibly influencing changes in the quantified PET signal. One methodological issue that has attracted much attention is the partial volume effect (PVE) (Hoffman et al 1979). The PVE disturbs the measurements in PET imaging due to the limited spatial resolution of reconstructed images (Frouin et al 2002, Soret et al 2007, Erlandsson 2012, Hutton et al 2013, Rahmim et al 2013). Because of the PVE, the signal may be underestimated in tissues with high binding potential (BP) due to spill-out into surrounding tissues with fewer binding sites. Similarly, the signal may be overestimated in low-binding tissue. The magnitude of the PVE is related to the spatial resolution in the reconstructed images and the volume of a given structure. The PVE is quantified as a recovery coefficient (RC), which is the measured signal relative to the true signal. Certain regions are more vulnerable to the PVE than others, e.g. when the surface to volume ratio is high (Dewaraja et al 2001). One such region is the striatum, which includes subregions that are heterogeneous in shape (i.e. surface to volume ratios), making it a candidate region for demonstrating potential issues that may arise from the PVE.
In a longitudinal design, the volume of the striatum is expected to shrink by approximately 1% annually from young adulthood into old age (Raz et al 2003). Therefore, related PVEs may lead to false conclusions regarding changes in D2 receptors during aging. For example, consider the following hypothetical scenario: striatal volume and BP are both reduced by 5% between two time points. The question then is whether there was truly a reduction in BP or if volumetric reductions induced a false reduction in BP without a true change.
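The hypothetical scenario above can be made concrete with a small calculation. The sketch below is illustrative only: it assumes the artificial BP change scales linearly with the volume change, and it borrows the two slopes reported in the abstract (0.08% and 0.7% BP change per 1% volume change). The function names are hypothetical, and treating percentage changes as additive is a first-order approximation.

```python
def artificial_bp_change(volume_change_pct, pve_slope):
    """Apparent BP change (%) caused only by a volume change.

    Assumes a linear PVE: pve_slope is the % BP change per 1% volume change.
    """
    return volume_change_pct * pve_slope


def residual_bp_change(observed_bp_change_pct, volume_change_pct, pve_slope):
    """Part of the observed BP change (%) not explained by the volume change."""
    return observed_bp_change_pct - artificial_bp_change(volume_change_pct, pve_slope)


# Scenario from the text: volume and BP both decrease by 5%.
observed_bp_change = -5.0
volume_change = -5.0

# Slopes taken from the abstract (% BP change per 1% volume change).
slopes = {"whole striatum": 0.08, "posterior caudate": 0.7}

for region, slope in slopes.items():
    artefact = artificial_bp_change(volume_change, slope)
    residual = residual_bp_change(observed_bp_change, volume_change, slope)
    print(f"{region}: artefact {artefact:.2f}%, residual {residual:.2f}%")
```

Under these assumptions, about 0.4 percentage points of the 5% reduction would be artificial for the whole striatum, compared with about 3.5 percentage points for the posterior caudate.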
The severity of the PVE problem has led to the development of high-resolution reconstruction algorithms (Hudson and Larkin 1994, Alessio et al 2010, Rahmim et al 2013). Two reconstruction methods for PET image data are filtered back projection (FBP) and ordered-subsets expectation maximization (OSEM) algorithms. OSEM algorithms using resolution modeling may improve image resolution compared to FBP (Rahmim et al 2013, Meechai et al 2015). However, these algorithms are not without problems. For example, they may overestimate the PET signal in low-uptake regions (Reilhac et al 2008, Walker et al 2011), cause edge artifacts in areas where there are sharp edges in uptake between regions (Alessio et al 2010, Rogasch et al 2014), and cause increased variability in smaller regions (Rahmim et al 2013), possibly reducing test-retest reliability (Chow et al 2009, Alakurtti et al 2015).
The importance of measuring the PVE in a controlled and realistic manner has led to the use of brain phantoms, such as the [18F]fluorodeoxyglucose (FDG) optimized 3D Hoffman brain phantom (Hoffman et al 1990, Meechai et al 2015) or other anthropomorphic head phantoms (Frouin et al 2002). Measurement with phantoms that allow neighboring cavities to be filled with different concentrations of radioactivity reflects real human imaging in which tracer is also distributed outside target regions. This is important because without radioactive backgrounds (i.e. cold backgrounds), there is not only spill-out from target regions, but also spill-in from surrounding regions (Srinivas et al 2009). However, a problem in previous studies using anthropomorphic phantoms is that the volumes of structures were fixed and the PET signal at varying volumes was not tested directly (Frouin et al 2002, Maebatake et al 2015). Another problem with many previous phantom experiments is the existence of cold walls separating the hot background from target structures, introducing an artifact that does not exist in the living brain (Hofheinz et al 2010). Attempts to overcome the cold wall problem have included creating gelatin spheres (Sydoff et al 2014). However, this process is time consuming and the spheres cannot be reused after scanning. Recently, artificially merging a hot background phantom and a cold background-hot structure phantom was presented as another approach (Lajtos et al 2014). However, a manual shift in the positioning of the background and target phantoms is required within the same acquisition. In the present study, we provide an alternative approach for removing the cold wall in the raw data by reusing the same hot background scan for subsequent target scans, thereby minimizing between-scan variability.
The aim of this study was to investigate factors that can influence the accuracy of interpretations regarding the measurement of PET signals in the striatum in longitudinal projects by using a 3D-printed brain phantom (figure 1) with three striatal sizes. The evaluated factors include scanner reliability (using four different reconstruction algorithms), controlled small differences in the volume of the striatum (i.e. PVE), and varying background to striatum concentration ratios to check for any systematic bias occurring at a range of BP values.

Brain phantom
A phantom brain (figure 1) based on the MNI (www.bic.mni.mcgill.ca/) brain surface and the left and right striatal structures was printed by an Objet Eden 330 (Stratasys, MN, US) 3D printer using RGD720 and Veroclear for the brain surface and striatal structures, respectively. The brain surface phantom served as a container for the background solution, and the varying permutations of three differently sized left and right striata could be reproducibly attached in a location emulating the true anatomy of the brain. The striatal inserts were produced in three different sizes (small, medium, and large volume, figures 1(e)-(g)) with a uniform stepwise decrease of 10% from the large to medium and small sizes (see table 1 for exact volumes in cm³). To prevent absorption of water into the plastic, the inside and outside of the striatal inserts were polished.

Data acquisition
PET and computerized tomography (CT) images were acquired on a Discovery 690 PET/CT (General Electric, WI, US), at the Department of Nuclear Medicine, Umeå University Hospital. Prior to the PET scan, the phantom was attached to a plastic holder aligned to a laser crosshair to keep the same geometry for all scans. A CT scan (400 mA, 120 kV, 0.8 s/revolution) was acquired and used for attenuation correction and volume of interest (VOI) delineation. PET data were acquired in list mode.
Images were reconstructed using four methods, all with 0.97 × 0.97 × 3.27 mm voxels in a 256 × 256 matrix, 47 slices with a 25 cm field-of-view. The four methods were FBP with either a 4 or 6 mm Hanning filter (FBP 4 mm or FBP 6 mm); VuePoint HD, an OSEM with 24 subsets, 2 iterations (Bettinardi et al 2011) (from here on referred to as OSEM); and SharpIR, an OSEM using spatially variant resolution modeling of the point-spread function (PSF) with 24 subsets, 6 iterations (Ross and Stearns 2009, Alessio et al 2010, Bettinardi et al 2011) (from here on referred to as OSEM + PSF). All images were then resliced into 1 × 1 × 1 mm voxels. The OSEM reconstruction has an image resolution of 7.3 mm full width at half maximum (FWHM), and the OSEM + PSF reconstruction has a resolution of 3.2 mm FWHM, as measured from images acquired with a radioactive point source in air (Wallstén et al 2013).
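To give a feel for how the two quoted resolutions (3.2 and 7.3 mm FWHM) translate into signal recovery, the following is a deliberately simplified 1D toy model, not the phantom analysis itself. It assumes a uniform hot object of hypothetical width in a cold background, blurred by a Gaussian PSF, so only spill-out is modelled; all function names are illustrative.

```python
import math


def gaussian_sigma(fwhm_mm):
    """Convert FWHM to the standard deviation of a Gaussian."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def blurred_box(x_mm, width_mm, fwhm_mm):
    """Value at x of a unit-height box (centred at 0) convolved with a Gaussian PSF."""
    scale = gaussian_sigma(fwhm_mm) * math.sqrt(2.0)
    half = width_mm / 2.0
    return 0.5 * (math.erf((x_mm + half) / scale) - math.erf((x_mm - half) / scale))


def recovery_coefficient_1d(width_mm, fwhm_mm, n_samples=1000):
    """Mean blurred signal inside the box relative to the true signal of 1.0."""
    half = width_mm / 2.0
    step = width_mm / n_samples
    xs = [-half + (i + 0.5) * step for i in range(n_samples)]
    return sum(blurred_box(x, width_mm, fwhm_mm) for x in xs) / n_samples


# Hypothetical object widths (mm); FWHM values are those quoted in the text.
for width in (10.0, 9.0):
    for fwhm in (3.2, 7.3):
        rc = recovery_coefficient_1d(width, fwhm)
        print(f"width {width:4.1f} mm, FWHM {fwhm} mm: RC = {rc:.3f}")
```

In this toy model the recovery is lower for the coarser resolution and drops further when the object shrinks, which is the qualitative behaviour the phantom experiment is designed to quantify in a realistic 3D geometry with a hot background.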
We devised a robust method to vary the activity ratio between two filled striata and background, while minimizing errors from mixing concentrations of radioactive solutions. This method relies on a single background scan, and by using radioactive decay and list-mode acquisitions from striatal scans, different numbers of events were added to reconstruct scans with different BP values (described in detail in subsequent sections).
Note (table 1): For each size, the volumes of the left and right striata were averaged and are reported in cm³ (mean ± standard deviation). RCs for the reconstructions are reported as mean ± standard deviation. FBP = filtered-back projection, OSEM = ordered-subsets expectation maximization algorithm, PSF = point-spread function.

Filling procedure
The single background brain scan was performed by filling the background container (figure 1(a), without striatal inserts) with 4318 Bq/ml [18F]FDG and acquiring a 30 min scan (3 × 10 min frames). On seven different days, radioactive striata, in non-radioactive background, were scanned in multiple configurations (left versus right): small-medium, small-large, medium-small, medium-medium, medium-large, large-small, and large-medium. A radioactive standard solution was prepared each day, from which each striatum was filled using separate syringes. The radioactive striata were scanned for over 2 h. For each striatal insert, four 0.5 ml samples were extracted and placed in Eppendorf tubes: two samples from the syringe used for filling the striatum, and two samples extracted from the striatum after scanning. The radioactivity concentrations in these samples were measured with a well-counter (developed in-house) to independently measure the true concentration in each striatum. In addition, to verify that leakage had not occurred during the scan, for each striatum, the two samples extracted from the syringe used for filling and the two samples extracted after scanning were averaged and then compared. Leakage of the striatal inserts was not observed (the average decay-corrected concentration was 24 679 ± 633 Bq/ml before the scan versus 24 575 ± 690 Bq/ml after the scan; a paired samples t-test indicated no difference (p-value = 0.12)).

Varying striatum to background ratios
To vary the striatum to background ratio, i.e. BP, each striatal scan was binned into four independent sets with different numbers of events. Thus, four complete 30 min scans (3 × 10 min) with varying activity in the striatum were obtained using the same standard solution. The three frames were all decay-corrected to the start of the first frame. The goal was to reach a BP in the reconstructed images of approximately 2.5, 3.0, 3.5, and 4.0. This range is comparable to scanning with [11C]raclopride in human populations (Jonasson et al 2014, Nevalainen et al 2015).
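Because the striatal activity decays during the long list-mode acquisition while the background scan is fixed, later time windows contribute fewer striatal events. The sketch below estimates how much decay time separates two target ratios. It is a rough planning aid under stated assumptions, not the binning procedure actually used: it assumes the striatum to background ratio scales with the instantaneous striatal activity, ignores the finite window length, and uses hypothetical function names. The only physical constant is the fluorine-18 half-life of about 109.77 min.

```python
import math

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes


def decay_factor(elapsed_min, half_life_min=F18_HALF_LIFE_MIN):
    """Fraction of the initial activity remaining after elapsed_min minutes."""
    return 0.5 ** (elapsed_min / half_life_min)


def delay_for_target_fraction(target_fraction, half_life_min=F18_HALF_LIFE_MIN):
    """Minutes of decay needed for the activity to fall to target_fraction."""
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    return -half_life_min * math.log2(target_fraction)


# Hypothetical planning example: the first window yields a ratio of 4.0 and
# lower ratios are wanted from later windows of the same acquisition.
first_ratio = 4.0
for target_ratio in (4.0, 3.5, 3.0, 2.5):
    delay = delay_for_target_fraction(target_ratio / first_ratio)
    print(f"target ratio {target_ratio}: start window about {delay:.0f} min later")
```

Under these assumptions the lowest target ratio is reached well within the scan duration of over 2 h reported above.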

Removal of the cold plastic wall separating striatum and background
In order to remove the effect from the non-radioactive plastic container walls separating the striata from the background, summations of raw data sinograms were performed. For each striatal scan (3 × 10 min) the raw data sinogram was summed with the sinogram from the radioactive background scan (figure 2). Realistic scatter was achieved because the phantom was filled with water in all cases, and the density difference between the plastic in the striatum to water in the background scan should only change the scatter by a negligible amount due to the small mass and minimal density variation of the striatum walls. The randoms from singles data from the background were used to represent the randoms from the summed sinograms, which makes sense since in this experimental setup the majority of the counts are coming from background. The dead-time was <5% during the scans.
The summed raw data produced a new sinogram, which was reconstructed according to the algorithms described above. This method is more robust than repeatedly preparing and filling eight solutions to achieve four BP levels (four background and four striatum), because between-scan differences from imperfect radioactivity measurements and mixing are minimized. It is also more time-efficient.
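The idea behind the sinogram summation can be illustrated with a 1D activity profile. Since forward projection is linear, summing two sinograms corresponds to summing the underlying activity distributions, so the toy below works directly on profiles. It is a simplified sketch with hypothetical concentrations and voxel positions; it ignores noise, attenuation differences, randoms, and dead time.

```python
def profile(n_voxels, segments):
    """Build a 1D activity profile from (start, stop, value) segments."""
    values = [0.0] * n_voxels
    for start, stop, value in segments:
        for i in range(start, stop):
            values[i] = value
    return values


N = 20
BKG, STRIATUM = 1.0, 4.0  # hypothetical activity concentrations

# Background scan: container filled without inserts, so activity is everywhere.
background_scan = profile(N, [(0, N, BKG)])

# Striatal scan: hot striatum (voxels 8-11) in a cold background; the plastic
# wall (voxels 7 and 12) is cold as well.
striatal_scan = profile(N, [(8, 12, STRIATUM)])

# For comparison: one physical fill with hot background and hot striatum,
# which leaves a cold gap at the wall voxels.
single_fill = profile(N, [(0, 7, BKG), (8, 12, STRIATUM), (13, N, BKG)])

# Summation removes the cold gap.
summed = [b + s for b, s in zip(background_scan, striatal_scan)]

print("single fill:", single_fill[5:15])
print("summed     :", summed[5:15])
```

In the summed profile the wall voxels carry background activity instead of zero. In this toy setup the striatal voxels contain the striatal plus the background concentration, which needs to be kept in mind when defining the true ratio.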

Volume-of-interest delineation
Two separate striatum VOIs were created from the CT image by applying a 3D region-growing algorithm known as the 'flood fill' algorithm to delineate the liquid volumes (https://mathworks.com/matlabcentral/fileexchange/12184-floodfill3d). In short, a seed voxel within each striatum is selected manually, whereby the algorithm fills the volume based on a manually set threshold. We set the threshold at 25, where the Hounsfield units indicated a shift from water to plastic. A cerebellum VOI was drawn on five axial slices that approximated the cerebellar volume in the living brain. To determine how different parts of the striatal volume were influenced, the striatum was split into putamen and caudate parts, dividing the two structures with a straight posterior-anterior line. The VOI setting was pragmatic and made it easier to achieve similar placement of VOIs independent of size while resembling the appearance of anatomical VOIs in humans. To further investigate dependence along the anterior-posterior direction, the caudate and putamen parts were further split into three equally long segments, named posterior, middle, and anterior caudate or putamen.
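A seeded region-growing step of this kind can be sketched as follows. This is a generic breadth-first flood fill, not the MATLAB File Exchange code referenced above. It assumes 6-connectivity and assumes that a voxel is accepted when its value lies within a tolerance of the seed value; how the threshold of 25 was applied in the original tool is not specified here.

```python
from collections import deque


def flood_fill_3d(volume, seed, tolerance):
    """Return the set of (z, y, x) voxels connected to seed within tolerance.

    volume is a nested list indexed as volume[z][y][x]; seed is a (z, y, x) tuple.
    A neighbour is accepted if |value - seed value| <= tolerance (assumption).
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    seed_value = volume[seed[0]][seed[1]][seed[2]]
    filled = {seed}
    queue = deque([seed])
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in neighbours:
            pz, py, px = z + dz, y + dy, x + dx
            if not (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx):
                continue  # outside the image
            if (pz, py, px) in filled:
                continue  # already visited
            if abs(volume[pz][py][px] - seed_value) <= tolerance:
                filled.add((pz, py, px))
                queue.append((pz, py, px))
    return filled


# Hypothetical single-slice CT: plastic (100 HU) around a 3 x 3 block of
# water (0 HU), plus one isolated water voxel that must not be included.
ct = [[[100] * 7 for _ in range(5)]]
for y in range(1, 4):
    for x in range(1, 4):
        ct[0][y][x] = 0
ct[0][2][5] = 0

voi = flood_fill_3d(ct, seed=(0, 2, 2), tolerance=25)
print(len(voi), "voxels in VOI")
```

The isolated water voxel is excluded because it is separated from the seed by plastic, which is the property that lets a single seed delineate one liquid-filled cavity.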

Statistical analysis
All analyses were performed using R (www.r-project.org) and imlook4d (https://dicom-port.com). To test scanner reliability, the three frames within each 3 × 10 min scan were compared to the mean signal of the same three frames and for each structure separately.
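One way to express this frame-to-frame comparison is as the mean absolute percentage deviation of each frame from the mean of the three frames. The sketch below assumes that definition, which is an interpretation of the description above rather than a documented formula, and uses hypothetical frame values.

```python
def frame_spread_percent(frame_values):
    """Mean absolute deviation of frames from their mean, in percent of the mean."""
    mean = sum(frame_values) / len(frame_values)
    deviations = [abs(value - mean) / mean * 100.0 for value in frame_values]
    return sum(deviations) / len(deviations)


# Hypothetical decay-corrected VOI means (Bq/ml) for three 10 min frames.
frames = [20150.0, 20080.0, 20110.0]
print(f"spread: {frame_spread_percent(frames):.2f}%")
```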
For the other analyses, RCs were calculated to estimate the deviation from the true signal in the reconstructed images. The measured (BP_M) and true (BP_T) BP acquired with this phantom are analogous to a bolus plus constant infusion approach in the living brain after equilibrium is reached (Innis et al 2007) and were calculated using the following formula:
$$BP_M = \frac{M_{striatum} - M_{bkg}}{M_{bkg}}, \qquad BP_T = \frac{T_{striatum} - T_{bkg}}{T_{bkg}}$$
where M_striatum is the measured concentration in the striatum, M_bkg the measured background concentration, and T the corresponding true concentrations measured with the well-counter.
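As a worked illustration, the BP at equilibrium can be computed as the target concentration minus the background concentration, divided by the background concentration. The sketch below assumes that form for both the measured and the true BP; the function name and all concentrations are hypothetical.

```python
def binding_potential(c_striatum, c_background):
    """BP as (target - background) / background, assuming equilibrium."""
    if c_background <= 0:
        raise ValueError("background concentration must be positive")
    return (c_striatum - c_background) / c_background


# Hypothetical concentrations in Bq/ml.
bp_true = binding_potential(c_striatum=20000.0, c_background=4000.0)      # well-counter
bp_measured = binding_potential(c_striatum=17600.0, c_background=4000.0)  # PET image

print(f"BP_T = {bp_true:.2f}, BP_M = {bp_measured:.2f}")
```

With these illustrative numbers the true BP is 4.0 and the measured BP is 3.4, so the image recovers 85% of the true value.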
The RC was defined according to the following formula:
$$RC = \frac{BP_M}{BP_T}$$
For estimating variations in the RC arising from controlled variations in volume and BP we looked at the average spread in RC across the volumes and structures. In addition, to minimize between-session differences in radioactivity concentrations and well counter accuracy, we performed linear regressions to estimate variations in the RC in striatum by analyzing RC and volume ratios of the left and right striata compared within, rather than between, sessions. In other words, for the two striata scanned together on a given day, the volume of the left striatum VOI was divided by the volume of the right VOI. Similarly, the RC of the left striatum was divided by the RC of the right striatum (and adjusted such that 20% larger volume or higher RC in left striatum is represented as 1.2 along the regression line, and 20% in right striatum as 0.8). Linear regression with the volume ratio predicting the RC ratio was performed, and the regression slope can be interpreted as the effect of PVE on BP. Another set of linear regressions was performed to test whether the level of BP measured with the well-counter predicted RC. Finally, to test whether reconstruction algorithms statistically differed from one another in terms of PVE, we used t-tests to compare the regression slopes (i.e. effects from volume or BP):
$$t = \frac{k_1 - k_2}{\sqrt{SE_{k_1}^2 + SE_{k_2}^2}}$$
where k_1 and k_2 are the slope coefficients for any given pair of reconstruction methods, and SE_{k_1} and SE_{k_2} are their respective standard errors.
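The regression and slope comparison described above can be sketched in a few lines. The code assumes an ordinary least-squares fit of the RC ratio on the volume ratio and the common statistic for comparing two independent slopes, their difference divided by the root of the summed squared standard errors. All data values are hypothetical and the function names are illustrative; the original analysis was done in R.

```python
import math


def ols_slope(x, y):
    """Simple linear regression; returns (slope, intercept, standard error of slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, intercept, se_slope


def slope_difference_t(k1, se_k1, k2, se_k2):
    """t statistic for the difference between two independent regression slopes."""
    return (k1 - k2) / math.sqrt(se_k1 ** 2 + se_k2 ** 2)


# Hypothetical within-session left/right ratios for two reconstructions.
volume_ratio = [0.81, 0.90, 1.00, 1.11, 1.23]
rc_ratio_a = [0.984, 0.993, 1.000, 1.008, 1.019]  # weak volume dependence
rc_ratio_b = [0.930, 0.962, 1.000, 1.041, 1.082]  # strong volume dependence

k_a, _, se_a = ols_slope(volume_ratio, rc_ratio_a)
k_b, _, se_b = ols_slope(volume_ratio, rc_ratio_b)
print(f"slope A = {k_a:.3f}, slope B = {k_b:.3f}")
print(f"t = {slope_difference_t(k_b, se_b, k_a, se_a):.1f}")
```

A slope of 0.08 in such a fit would correspond to a 0.08% change in RC, and hence in measured BP, per 1% difference in volume.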

Reliability
Scanner reliability was high for all four reconstructions tested, i.e. OSEM + PSF, OSEM, and FBP with either a 4 or a 6 mm filter. The percentage difference from the mean signal of the three frames within a scan gave an average spread of 0.23% ± 0.16% for OSEM + PSF. Similar values were obtained for OSEM (0.22% ± 0.15%), and both FBP reconstructions (0.14% ± 0.10%).

Volume dependencies
Our analyses of RCs from controlled volumetric changes, simulating atrophy in aging, revealed that all reconstructions were affected negatively (table 1), resulting in significantly lower RC for smaller volumes. Linear regression analyses of within-scan volume and RC ratios are reported in table 2. Tests of the difference between slopes indicated that OSEM + PSF (with a PVE of 0.08% per 1% difference in volume) had significantly lower volume dependencies than the other reconstructions (all p < 0.002), indicating the advantage of using the OSEM + PSF algorithm when changes in volume between time points are expected.
To better understand to which degree the PVE affects intrastructural variations within striatum, the putamen and caudate were divided into anterior, mid, and posterior parts. The existence of RC gradients for the OSEM + PSF reconstruction is visualized in figure 3 for the caudate and putamen, and the RCs for the subdivisions are shown in table 3. The middle parts, where recovery was higher, were least influenced by a change in the volume for both the caudate and putamen parts of the striatum model. The anterior part was affected by volume in both the caudate and putamen parts, whereas the posterior part of the caudate was most influenced by volume.

Uptake ratio dependencies
Varying striatum to background ratios did not appear to influence the RC, as indicated by nonsignificant linear regressions with measured BPs predicting RCs for each of the four reconstructions (table 4). This suggests that the BP will not be overestimated at the lower striatum to background ratios which are expected in later examinations in a longitudinal scenario (figure 4). The reconstructions also did not differ from each other, as indicated by t-tests comparing slopes (i.e. the slopes showing the degree and direction of any systematic bias in RCs introduced from varying levels of BP, all p > 0.6) (table 4).

Discussion
In the present study, we used a phantom model of the human brain to evaluate scanner reliability, the effect of controlled small volumetric differences in the striatum on the PET signal, and systematic bias occurring at different BP levels. With a 3D-printed phantom we have demonstrated a general method to investigate how complex PVEs from a specific radioactive distribution can be modelled and quantified. We presented a method to use phantom measurements where non-physiological container walls between compartments are removed by adding different numbers of events from list-mode acquisitions of striata and a single hot background scan, while at the same time producing varying BP levels from a single standard solution of radioactivity. Scanner reliability was high for all reconstructions tested. We showed that volume influences the measured BP (table 2, figure 3), whereas a change in BP did not introduce further bias into the measurements (table 4, figure 4), which would be a realistic scenario in a longitudinal context of aging. The design could be implemented equally well for other regions of interest. Figure 5 summarizes the findings and shows how the PET signal is influenced in various scenarios.

Volume dependencies
To improve comparisons between subjects or within subjects when volume is expected to change over time (Raz et al 2003, Gogtay et al 2004), it is beneficial to reconstruct data using a method that minimizes the influence of variations in size. Of the four reconstructions tested, the high-resolution OSEM + PSF algorithm was superior, showing an induced PVE of <1% per 10% size difference as indicated by the slope of the regression predicting RC with volume (table 2). Reliability was good for all reconstruction methods, with a mean variability between the three frames in a scan of 0.14-0.23%. Notably, variability was less than 0.1 percentage points higher for the OSEM + PSF reconstruction compared to FBP, indicating that use of the high-resolution algorithm does not incur reliability costs.
Overestimation of the true reduction in signal will be greater in striatal regions more susceptible to a PVE. This overestimation could increase if striata are smaller than in the present study as the PVE is not necessarily linear with volume (Dewaraja et al 2001). Signs of this nonlinearity can be seen in figure 3 and from the larger difference in RC between the small and medium structures than between the medium and large structures. When dividing the putamen and caudate into posterior, mid, and anterior parts, it is evident that the change in RCs was as high as 13.7% in posterior caudate, compared to a 4.6% change in putamen (table 3). Thus, a change in volume may lead to different estimated changes in neuroreceptor density/availability in separate parts of the striatum (around 0% in the mid putamen to approximately 0.7% in the posterior caudate per every 1% difference in volume). The mid part was the least affected part of the striatum phantom, likely due to the voxels being surrounded by high-binding voxels both posteriorly and anteriorly, as well as the relatively large surface to volume ratio of neighboring regions (Dewaraja et al 2001). Due to the varying influence of the PVE, especially along the caudate, corrections for the total change in volume are likely suboptimal at best, as can be inferred from figure 3.
Table 4. Uptake ratio dependencies (columns: Reconstruction, Slope).

Uptake ratio dependencies
Despite the superior spatial characteristics of OSEM + PSF, concerns have been raised that the BP may be affected at lower uptakes when using OSEM reconstructions (Reilhac et al 2008, Walker et al 2011, Jian et al 2015). Importantly, none of the reconstructions introduced any significant RC bias due to different BP levels (table 4, figure 4). We argue from the results that, at physiologically correct BP levels, mimicking the results of Nevalainen et al (2015), in combination with the commercially available OSEM algorithms employed in this phantom study, lower uptakes do not cause any significant error in BP. However, we do not discard the possibility that a bias may arise at levels lower than those examined in this article (Reilhac et al 2008, Walker et al 2011, Jian et al 2015). Nevertheless, the measured BPs compare well with the range of BP reported for dopamine D2 receptors using the same camera in a large sample of older adults (Nevalainen et al 2015).
Figure 5. The mean frame to frame variability for each reconstruction is shown. In the upper right corner, the influence on the estimated BP of a 1% volume reduction for each reconstruction is shown. In the lower left, when a reduction in BP occurs without a change in volume there is no effect on the measured signal. In the least stable scenario (bottom right), the measured signal is influenced by volume through PVE (Erlandsson 2012) and by bias introduced by changes in the target to background ratio (i.e. BP) (Walker et al 2011), which was not the case in our study. The four reconstructions included one OSEM + PSF reconstruction with resolution recovery, one OSEM reconstruction, and FBP with either a 4 or 6 mm Hanning filter. FBP = filtered-back projection, OSEM = ordered-subsets expectation maximization algorithm, PSF = point-spread function.

Future directions
Further improvement of the resolution problem may be necessary to accurately estimate the PET signal in all striatal voxels, e.g. by applying partial volume correction (PVC) (Erlandsson et al 2006, Erlandsson 2012). Voxel-wise correction of the PVE may improve both VOI (Brendel et al 2015) and voxel-wise analyses (Drzezga et al 2008), providing a possible method for correcting the artefactual uptake gradients created in the striatum by PVEs that remain even with OSEM and resolution modelling. The anatomically based multiple-target correction (MTC) method (Erlandsson et al 2006, Erlandsson 2012) gave good results for the phantom (data not shown). Although successful in reducing PET uptake volume dependencies, anatomically based PVC methods are sensitive to errors in registration and segmentation (Frouin et al 2002, Hutton et al 2013). MTC and related PVC methods assume homogeneous uptake within regions (Müller-Gärtner et al 1992, Rousset et al 1998, Erlandsson 2012). Therefore, uptake gradients (i.e. heterogeneous uptake) should not be present within structures. The assumption of homogeneous uptake in the striatum was however questioned by Alakurtti et al (2013), who showed a rostrocaudal D2 receptor BP gradient in the striatum. Although a direct comparison to their study cannot be made due to differences in VOI definitions, the artefactual gradients in figure 3 resemble the gradients reported previously, albeit at a lower resolution. Phantoms similar to the one used in the present study could be useful for validating claims relating to gradients for a given PET camera and be used to verify the accuracy of a particular PVC method for a target of interest whose volume can be varied.
The advantages of our approach are manifold. First, the same structure can be reused for any number of scans. Second, positioning is not time-dependent and repositioning is made easy by using a holder aligned to a laser crosshair. Third, the same radioactive solution with the exact same hot background activity can be used to generate multiple target to background ratios by binning list-mode time windows for the striatal scans. This reduces noise in analyses that depend on variations in the mixed solutions and measured solution concentrations. Finally, a considerable amount of time can be spared using our approach rather than determining each ratio in separate scans.

Limitations
A possible improvement in our methodology would be to choose a plastic material whose density exactly matches that of water. However, the density difference between water and plastic is negligible and should not be a major issue. A limitation of our design is that the plastic wall cannot be completely eliminated in phantoms with separate cavities for neighboring target structures; for example, separate cavities for the putamen and caudate would allow the use of different BPs in anatomically separate structures (Frouin et al 2002). We also did not create a method by which the PVE problem could be corrected. However, our design could be useful for validating results from developments in PVC.

Conclusion
To conclude, reliability was high for all reconstructions tested. The OSEM + PSF reconstruction had a substantially reduced PVE compared to the other reconstruction methods. Importantly, no activity-dependent bias was introduced by this iterative reconstruction at the physiological radioactivity levels modelled. Hence, of the reconstructions tested, OSEM + PSF should be the preferred algorithm when volumetric changes over time are evident. We observed that small apparent changes in BP occur with decreasing striatal volume, with sizable effects observed in striatal subregions (as much as a 0.7% reduction in BP per 1% decrease in posterior caudate volume). It is therefore crucial to consider these volumetric changes when interpreting longitudinal changes in the PET signal.