Stylized versus voxel phantoms: quantification of internal organ chord length distances

Dosimetric calculations, whether for radiation protection or nuclear medicine applications, are greatly influenced by the use of computational models of humans, called anthropomorphic phantoms. As anatomical models of phantoms have evolved and expanded, so has the need to quantify the differences among these representations that yield variations in organ dose coefficients, whether from external radiation sources or internal emitters. This work extends previous efforts to quantify the differences in organ positioning within the body between a stylized and a voxel phantom series. Where prior work focused on the organ depth distribution vis-à-vis the surface of the phantom models, the work described here quantifies the intra-organ and inter-organ distributions through calculation of the mean chord lengths. The revised Oak Ridge National Laboratory stylized phantom series and the University of Florida/National Cancer Institute voxel phantom series, each including newborn, 1-, 5-, 10- and 15 year old, and adult phantoms, were compared. Organ distances in the stylized phantoms were computed using a ray-tracing technique available through Monte Carlo radiation transport simulations in MCNP6. Organ distances in the voxel phantoms were found using phantom matrix manipulation. Quantification of differences in organ chord lengths between the phantom series showed that the organs of the stylized phantom series are typically situated farther away from one another than those of the voxel phantom series. The impact of this work was to characterize the intra-organ and inter-organ distributions to explain the variations in updated internal dose coefficient quantities (i.e. specific absorbed fractions) while providing relevant data defining the spatial and volumetric organ distributions in the phantoms for use in subsequent internal dosimetric computations, with prospective relevance to patient-specific individualized dosimetry, as well as informing machine learning definition of organs using these reference models.


Introduction
As computational processing power has advanced over time, it has become possible to produce increasingly high-fidelity, anatomically representative models and to use them within radiation transport simulations; these anatomical models are called anthropomorphic phantoms (Xu 2007, Xu 2014). First generation models, called mathematical (stylized) phantoms, defined the organs of the body as a collection of simplified surface equations that make up the organ and tissue contours of the body. One such example of a stylized phantom series, the Oak Ridge National Laboratory (ORNL) stylized family of phantoms (Cristy and Eckerman 1987), has historically been adopted by international bodies to represent the human anatomy in

Background
The stylized phantom series developed at ORNL, which was revised in Han et al (2006), included anatomical updates to the head, brain, extrathoracic airways, kidneys, urinary bladder, and rectosigmoid colon definitions. Han et al further added organ definitions for the salivary glands, alimentary tract mucosa, and respiratory tract airways. This revised ORNL (RORNL) series is composed of six ages: the newborn, 1-, 5-, 10-, 15 year old, and the adult (used interchangeably with 30 year old or age 30) phantoms. The RORNL series is hermaphroditic, in that one phantom contains the sex-specific organs of both the male and female anatomy. The UF/NCI series is a hybrid phantom series which has previously been converted into voxel format for use in Monte Carlo simulation tools (Lamart et al 2011, Dewji et al 2018, Khamwan et al 2018, Schwarz et al 2021a, 2021b). Figure 1 highlights the contrasting geometries of the two series (Schwarz et al 2011).

Specific absorbed fractions
According to ICRP Publication 133 (ICRP 2016) and the Medical Internal Radiation Dose (MIRD) Pamphlet 21 schema, the absorbed dose $D(r_T)$ from radiation emissions from source region $r_S$ to target region $r_T$ can be calculated as follows:
$$D(r_T) = \sum_{r_S} \tilde{A}(r_S)\, S(r_T \leftarrow r_S), \qquad (1)$$
where $\tilde{A}(r_S)$ is the time-integrated activity of the radionuclide in the source region and $S(r_T \leftarrow r_S)$ is the mean absorbed dose to the target region per nuclear decay in the source region. The S value can be calculated as follows:
$$S(r_T \leftarrow r_S) = \sum_i E_i\, Y_i\, \Phi(r_T \leftarrow r_S, E_i), \qquad (2)$$
where $E_i$ is the emission energy and $Y_i$ is the yield of the $i$th transformation of the radionuclide, and $\Phi(r_T \leftarrow r_S, E_i)$ is the specific absorbed fraction (SAF) for a radiation particle of energy $E_i$. The SAF can then be defined as the ratio of the absorbed fraction (AF) $\phi(r_T \leftarrow r_S, E_i)$ (fraction of particle energy $E_i$ emitted within $r_S$ deposited in the target region $r_T$) to the mass $m(r_T)$ of the target region:
$$\Phi(r_T \leftarrow r_S, E_i) = \frac{\phi(r_T \leftarrow r_S, E_i)}{m(r_T)}. \qquad (3)$$
Kaddouch and El Khayati 2020, and Schwarz et al 2021a, 2021b, Griffin et al 2022a, 2022b.
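The relationship between SAF, S value, and absorbed dose described above can be sketched numerically. The following is a minimal illustration only: the emission spectrum, SAF values, and time-integrated activity are hypothetical placeholders, and the function names are not taken from any published code.

```python
MEV_TO_JOULE = 1.602176634e-13  # exact SI conversion, joules per MeV


def s_value(emissions, saf_per_kg):
    """S(r_T <- r_S) in Gy per decay: sum over emissions of E_i * Y_i * SAF_i."""
    return sum(
        energy_mev * MEV_TO_JOULE * yield_per_decay * saf
        for (energy_mev, yield_per_decay), saf in zip(emissions, saf_per_kg)
    )


def absorbed_dose(time_integrated_activity, s_values):
    """D(r_T) in Gy: sum over source regions of A~(r_S) * S(r_T <- r_S)."""
    return sum(time_integrated_activity[src] * s_values[src] for src in s_values)


# Hypothetical two-line emission spectrum: (energy in MeV, yield per decay).
emissions = [(0.5, 0.8), (0.03, 0.1)]
# Hypothetical SAFs (kg^-1) at those two energies for one source-target pair.
saf_per_kg = [2.0e-3, 5.0e-2]

s_thyroid = s_value(emissions, saf_per_kg)
# Hypothetical time-integrated activity of 1e9 decays (Bq s) in the source.
dose = absorbed_dose({"thyroid": 1.0e9}, {"thyroid": s_thyroid})
print(f"S = {s_thyroid:.3e} Gy/decay, D = {dose:.3e} Gy")
```

Because the SAF already includes division by the target mass, the S value follows directly from the emission energies and yields, which is why geometric changes that alter the SAF carry through linearly to the dose.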

Review of prior comparative work
In prior work by Lamart et al (2011), the S values and SAFs for 11 target organs for ¹³¹I (with the thyroid as the source) were calculated for the ICRP series of phantoms (ICRP 2009), the UF/NCI phantoms (Lee et al 2009), and the ORNL stylized phantoms (Cristy and Eckerman 1987, Han 2006). These values were separately acquired for validation purposes. The comparison between the UF/NCI and ORNL stylized phantoms was of particular interest. In the comparison of the S values, the UF/NCI phantoms generally reported higher values than the ORNL stylized phantoms. Exceptions included the brain, salivary glands, and pancreas. In the comparison of the SAFs, this relationship was consistent with the aforementioned trends, though the differences between the two phantoms decreased as the photon energies increased. This is physically consistent, as the absorption of photons for organs in both phantoms would decrease as the energy increases, until at the extreme they would be nearly equal. This study builds upon motivating efforts conducted by Lamart et al to compare chord length distributions from the thyroid to the 11 target organs in order to explain the differences in S values and SAFs. The Lamart study demonstrated that the ORNL phantoms generally had a higher probability density of large chord lengths compared to the UF/NCI hybrid phantoms. In other words, the thyroid-target pairs had a greater distance between them. This increased distance is consistent with the lower S values obtained for the ORNL phantoms. A study involving the use of reference phantoms in nuclear medicine (Zvereva et al 2017) utilized Monte Carlo to compute dosimetric quantities from positron-emitting ¹⁸F.
Although the two phantoms pertinent to that study were the ICRP stylized and the Visible Human voxel phantoms (the latter obtained through analyzing CT data from the National Library of Medicine's Visible Human Project (Ackerman 1995)), the work provided a clear comparison of both the SAF and the dose calculated using voxel and stylized phantoms. The study further reported that uncertainties in the calculated doses between these two different phantom types could be decreased below 25% by manipulating the position and size of the organs of the stylized phantom to match the voxel phantom more accurately. Assuming comparable material compositions, this work justifies the need to compare and quantify the geometric differences between phantom types for more accurate estimation of how their dosimetric quantities change relative to one another.
Recent studies by Schwarz et al (2021a, 2021b), which employed the UF/NCI hybrid phantom series, demonstrated that organ distances play a significant role in the calculation of electron SAFs. Longer chord length distances that occur at greater pediatric ages have the effect of decreasing the estimated SAF (Schwarz et al 2021a). An example presented was the comparison of the liver-to-gallbladder SAF between the newborn and 1 year old phantoms. A comparatively lower SAF value is reported in the 1 year old phantom due to a significant decrease of collisional energy losses at low and intermediate electron energies. This was attributed to a larger separation distance in the 1 year old phantom than in the newborn. If material composition is assumed to be consistent across these two ages, the study by Schwarz et al demonstrates that the SAFs decrease as the sizes and separation of the organs increase. With increased age, these SAFs would then decrease as the average distance of the chord length distribution increased between defined source-target organs, with the exception of a few source-target pairs that do not follow this relationship. These studies by Schwarz et al demonstrated a decrease in SAF for liver and thyroid self-irradiation (same source/target organ), as well as a general decrease in cross-irradiation SAF (different source/target organs) with increased age, again with a few source-target pair exceptions.
This size dependence of the cross-irradiation SAF has been observed beyond pediatric hybrid phantoms. A study by Marine et al (2010) examined the SAFs for a series of constructed adult stylized phantoms of different sizes. These sizes kept a constant body mass index while changing height and weight based on percentiles of the population (10th, 25th, 75th and 90th). In performing the SAF calculations, one purpose of the Marine study was to partially justify large uncertainties in evaluating radiopharmaceutical dose using a general biokinetic model and standard dose tables rather than patient-specific dosimetry. The results from the Schwarz et al studies (2021a, 2021b) are consistent with what was found in the Marine study. Even with different phantom types, the SAFs generally decreased as the organ sizes increased.

Methods
The methodology employed in this study was developed to determine the range of distances between sampled points in the organs of each phantom which are relevant to internal dosimetry; these point-to-point distances from one source to one target organ in the body will be referred to herein as 'chord lengths'. For each source-target organ pair, 200 000 or more chord lengths were gathered and binned at 1 mm intervals to create a chord length distribution. These chord length distributions were then compared between the two phantom series to explain differences in internal dose estimates between the phantom models. The newer UF/NCI voxel phantom series contains most of the source and target regions defined by ICRP. The older RORNL stylized series, on the other hand, has fewer defined organs, making this series the limiting factor for which comparisons can be made. The source-target pairs that were comparable in this work are given in table 1. Additionally, chord length distributions were calculated for the individual bones of the phantoms, both as source and target regions. Fractions of the active, inactive, and shallow marrow contained within each bone were then used to create a weighted sum estimate of the distributions for active and inactive marrow as source regions and active and shallow marrow as target regions (ICRP 2016, 2020c).
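The binning and marrow-weighting steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: chord lengths are given in millimetres, the bone names and marrow fractions are hypothetical, and the function names are illustrative rather than taken from the study's scripts.

```python
from collections import Counter


def chord_length_distribution(chord_lengths_mm, bin_width_mm=1.0):
    """Return {bin_index: probability}; bin k covers [k, k + 1) * bin_width_mm."""
    counts = Counter(int(length // bin_width_mm) for length in chord_lengths_mm)
    total = sum(counts.values())
    return {k: n / total for k, n in sorted(counts.items())}


def weighted_sum(distributions, weights):
    """Combine per-bone distributions using (normalized) marrow fractions."""
    norm = sum(weights.values())
    combined = Counter()
    for bone, dist in distributions.items():
        for k, probability in dist.items():
            combined[k] += weights[bone] / norm * probability
    return dict(sorted(combined.items()))


# Four example chord lengths (mm) binned at 1 mm intervals.
dist = chord_length_distribution([0.2, 0.7, 1.5, 2.9])

# Hypothetical per-bone distributions and marrow fractions (not real data).
marrow = weighted_sum(
    {"femur": {10: 1.0}, "ribs": {20: 1.0}},
    {"femur": 0.25, "ribs": 0.75},
)
print(dist, marrow)
```

Normalizing each histogram to unit area before weighting keeps the combined marrow distribution a valid probability distribution regardless of how many chord lengths were sampled per bone.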

Stylized phantom chord length computation
Chord lengths were determined within the stylized RORNL phantom series using the Monte Carlo N-Particle code version 6.2 (MCNP6) (Pelowitz et al 2014). In the first step, MCNP6 was used to identify points in a virtual coordinate system that correspond to each source and target organ location in the stylized phantom. The organs of the RORNL series are composed of one or more regions in the phantom (or 'cells' of the MCNP6 model). By creating a bounding box surrounding the phantom and utilizing the source rejection technique, particle histories were generated in the model starting in one desired cell at a time (an inherent limitation of MCNP6 is that source rejection cannot be used for more than one cell at a time). The PTRAC (Particle Track Output) feature was then used to record a list of each source particle generation event and its coordinate location to a text file. A Python script was then written to read this text file and attribute an index to each location corresponding to a source particle generation event (X-5 Monte Carlo Team, 2003). Two random indices were then generated within this index range, corresponding to one location in the source and another location in the target. The distance between these two random locations was recorded as one history. This was repeated for 2000 locations in each source cell, and at each source location 2000 target locations were selected in a given target cell, for a total of 4 000 000 distances for every source-target pair. The chord length distributions for a source-target pair were then derived by making a weighted average of the distributions for each constituent cell in the source and target organ. The distribution for one cell was weighted by its mass out of the total mass of the source or target organ.
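The index-based pairing of recorded source locations can be sketched as below. This is an illustrative reconstruction, not the authors' script: the coordinate lists stand in for locations parsed from a PTRAC file (parsing is not shown), the two box-shaped cells are hypothetical, and the sample counts are reduced from the 2000 × 2000 used in the text so the example runs quickly.

```python
import math
import random


def sample_chord_lengths(source_points, target_points, n_source, n_target, rng):
    """Pair n_source random source points with n_target random target points each."""
    lengths = []
    for _ in range(n_source):
        src = source_points[rng.randrange(len(source_points))]
        for _ in range(n_target):
            tgt = target_points[rng.randrange(len(target_points))]
            lengths.append(math.dist(src, tgt))  # Euclidean distance
    return lengths


rng = random.Random(0)
# Stand-ins for PTRAC source locations: a unit cube at the origin (source cell)
# and a unit cube shifted 10 cm along x (target cell). Hypothetical geometry.
source_points = [(rng.random(), rng.random(), rng.random()) for _ in range(500)]
target_points = [(10 + rng.random(), rng.random(), rng.random()) for _ in range(500)]

chord_lengths = sample_chord_lengths(source_points, target_points, 200, 200, rng)
print(len(chord_lengths), sum(chord_lengths) / len(chord_lengths))
```

For organs built from several cells, the same routine would be run per cell pair and the resulting distributions combined with mass weights, as the paragraph above describes.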

Voxel phantom chord length computation
Chord lengths within the UF/NCI voxel phantom series were computed using MATLAB (Unpingco 2008, MATLAB 2022). The voxel phantoms were read into MATLAB in 8 bit unsigned binary format, where each voxel of the phantom is encoded in sequence as an 8 bit tag identifier (ID). Each tag ID represents an organ or tissue within the body. Given the known phantom dimensions and resolution, the indices of the voxel phantom matrix can be translated into points in physical space. For each source-target pair, cumulative distribution functions were created to randomly sample one voxel out of all the voxels of the source organ and one out of all the voxels of the target organ; the probability of a voxel being chosen was directly proportional to the mass within the voxel out of the total source or target organ mass. These distribution functions were used 200 000 times to randomly sample the voxels of the source and target organs. Within each sampled voxel, a randomly sampled location was chosen. The resulting distances between each sampled position in the source and target organs created the chord length distribution for the pair.
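The mass-weighted voxel sampling described above can be sketched in Python (the study itself used MATLAB). Everything specific here is an assumption for illustration: the tiny 4 × 2 × 2 phantom, the tag IDs, the x-fastest storage order, the voxel size, and the equal voxel masses are hypothetical, and only 1000 chord lengths are drawn rather than 200 000.

```python
import bisect
import itertools
import math
import random


def organ_voxels(tags, shape, tag_id):
    """Return (i, j, k) indices of voxels carrying tag_id in a flat, x-fastest array."""
    nx, ny, _nz = shape
    return [
        (n % nx, (n // nx) % ny, n // (nx * ny))
        for n, tag in enumerate(tags)
        if tag == tag_id
    ]


def make_sampler(voxels, masses, voxel_size_cm, rng):
    """Build a sampler that picks voxels with probability proportional to mass."""
    cdf = list(itertools.accumulate(masses))  # unnormalized cumulative distribution

    def sample_point():
        u = rng.random() * cdf[-1]
        index = min(bisect.bisect_right(cdf, u), len(voxels) - 1)
        # Uniform random position inside the chosen voxel, in physical units.
        return tuple(
            (idx + rng.random()) * size
            for idx, size in zip(voxels[index], voxel_size_cm)
        )

    return sample_point


# Hypothetical 8 bit phantom: tag 1 = source organ, tag 2 = target organ.
shape = (4, 2, 2)
tags = bytes([1, 1] + [0] * 12 + [2, 2])
voxel_size_cm = (0.2, 0.2, 0.2)

rng = random.Random(0)
source_voxels = organ_voxels(tags, shape, 1)
target_voxels = organ_voxels(tags, shape, 2)
sample_source = make_sampler(source_voxels, [1.0] * len(source_voxels), voxel_size_cm, rng)
sample_target = make_sampler(target_voxels, [1.0] * len(target_voxels), voxel_size_cm, rng)

chord_lengths = [math.dist(sample_source(), sample_target()) for _ in range(1000)]
print(sum(chord_lengths) / len(chord_lengths))
```

Sampling a continuous position inside each chosen voxel, rather than using voxel centres, avoids quantizing the chord lengths to the voxel grid.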

Multiple regression analysis
As a demonstration of the predictive power of this dataset, multiple regression analysis was performed on a subset of the chord length data to estimate SAFs. Data were pulled from Villoing et al (2020) on SAFs for three  2022) was used to train several multiple regression models and select the best-performing algorithm. Four separate Gaussian Process Regression (GPR) models were tested: the Rational Quadratic GPR, Exponential GPR, Squared Exponential GPR, and Matern 5/2 GPR; GPR models were chosen due to their improved performance over other model types in this specific application. Within the multiple regression models, the photon SAF was used as the response variable; for the predictor variables, five data descriptors related to the chord length distribution were used: the minimum, first quartile, median, third quartile, and maximum chord lengths. The models were trained with 70% of the chord length data, and the remaining portion was used for testing purposes.
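The predictor construction and the 70/30 split can be sketched as below; the regression model itself is not reproduced. The quartile convention (Python's "inclusive" method), the synthetic chord lengths, and the placeholder response values are all assumptions for illustration and are not data from the study.

```python
import random
import statistics


def five_number_summary(chord_lengths):
    """Minimum, first quartile, median, third quartile, and maximum."""
    q1, median, q3 = statistics.quantiles(chord_lengths, n=4, method="inclusive")
    return (min(chord_lengths), q1, median, q3, max(chord_lengths))


def train_test_split(rows, train_fraction, rng):
    """Shuffle a copy of rows and split it into training and testing portions."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(1)
rows = []
for pair_index in range(10):
    # Synthetic chord lengths (cm) standing in for one source-target pair.
    lengths = [rng.uniform(5.0, 40.0) for _ in range(1000)]
    placeholder_saf = 1.0 / (1.0 + pair_index)  # hypothetical response, not real data
    rows.append((five_number_summary(lengths), placeholder_saf))

train_rows, test_rows = train_test_split(rows, 0.7, rng)
print(len(train_rows), len(test_rows))
```

Each row pairs the five predictors with one response value, which is the shape a regression routine would expect for training and testing.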

Results and discussion
This work compared distributions from 44 source organs and 32 target organs at the six different reference ages for a total of 8448 source-target distributions between stylized and voxel series. The distributions that are presented highlight relationships discussed in the aforementioned studies and illustrate the utility of the methods used. The remaining distributions are provided in PNG format (given in Electronic Supplemental Data) with each individual graph including distributions from both phantom sets.

Inter-organ comparisons
From the difference in the chord length distributions, as well as the assumption of consistent material composition between phantoms, the relationship between the S value for the two phantom sets can be inferred. The dependence of the SAF (and in turn, the S value) on the source-to-target distance gives rise to a purely geometric dependence when material composition can reasonably be estimated to be consistent (the density changes are negligible across phantoms for any given material). The geometric dependence can be shown in the difference between chord lengths from a source to a target. Much like the work by Lamart et al (2011), the source-target distributions in this study included the thyroid as one of the 79 source organs.
An example of the utility of this dataset is its ability to predict the scale of changes in S values. The chord length distributions for the thyroid-pancreas source-target pair are shown in figure 2. The mean distances in the chord length distributions of the RORNL, UF/NCI Female (UFNCI(F)), and UF/NCI Male (UFNCI(M)) phantoms are 36.99 ± 2.18 cm, 30.07 ± 1.61 cm, and 34.93 ± 1.50 cm, respectively. The relative mean distance increases by a factor of 3.35 when comparing the differences between the RORNL to the UFNCI(M) and UFNCI(F) phantoms. Likewise, relative differences in the S values presented in Lamart et al (2011) increase by a factor of 3 comparing the RORNL to the UFNCI(F) and UFNCI(M) phantoms. The order is reversed in the S value comparison as the S value depends on the inverse of the separation distance. This has been shown for the thyroid in work by Yeom et al (2020), where the positioning of the thyroid within the neck (i.e. the height location of the thyroid) was shown to significantly impact cross-irradiation S values in patient-specific internal dosimetry.
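One reading of the factor quoted above is the ratio of the two differences in mean distance relative to the RORNL phantom. The short check below follows that interpretation, which is an assumption; it uses only the rounded means reported in the text.

```python
# Mean thyroid-pancreas chord lengths (cm) as quoted in the text.
mean_cm = {"RORNL": 36.99, "UFNCI(F)": 30.07, "UFNCI(M)": 34.93}

diff_female = mean_cm["RORNL"] - mean_cm["UFNCI(F)"]  # RORNL minus female
diff_male = mean_cm["RORNL"] - mean_cm["UFNCI(M)"]    # RORNL minus male
ratio = diff_female / diff_male

print(f"{diff_female:.2f} cm / {diff_male:.2f} cm = {ratio:.2f}")
```

With the rounded means this gives a ratio of about 3.4, in line with the factor of 3.35 quoted in the text; the small difference is plausibly due to rounding of the tabulated means.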
In addition to predicting the scale of the changes, general trends in organ positioning can also be observed by using the data more qualitatively. Figure 3 compares the mean chord lengths from the rectosigmoid colon wall to the bladder in six phantom ages. The shifting of organs that takes place to make room for reproductive organs (e.g. the uterus, ovaries) can be seen through the comparison of the RORNL and UFNCI phantoms. The pediatric UFNCI phantoms are nearly identical for all organs; however, the bladder is an organ that is shifted at all ages (figure 3). The difference in how far the organ shifts with age introduces more uncertainty between phantoms when calculating dosimetric quantities and should be accounted for. An important distinction is that while the pediatric UFNCI phantoms are identical, there are small differences between the male and female UFNCI distributions, which are attributed to errors in using a finite sampling space. This uncertainty in the position must also be included when calculating dosimetric quantities.
Differences in organ position between phantom sets were quantified through the differences in the chord length distributions across phantom sets. By using the differences in the mean chord length distances across phantoms, the variability in the position of the different organs could be determined. In order to determine the organ with the most variation in position, the magnitudes of the differences between the mean distances for each source-target pair were obtained for all the phantom types and tabulated from the largest difference to the smallest. This was performed for the adult ages only because the greatest variability was expected in the largest phantoms. Bone sites and organs with multiple structures were excluded, as well as repeated organ combinations (reversing the source and target locations). Within the top twenty largest differences in the mean distances, the number of times an organ was repeated was tallied. The most tallied organ was the thyroid, as given in table 3.
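The ranking and tallying procedure can be sketched as follows. The choice of the largest pairwise difference between phantom types as the ranking metric is an assumption, as are all values other than the thyroid-pancreas means quoted earlier; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def rank_pairs(mean_by_phantom):
    """Rank organ pairs by the largest difference in mean chord length between phantoms."""
    spreads = {}
    for (source, target), means in mean_by_phantom.items():
        key = tuple(sorted((source, target)))  # treat A->B and B->A as one pair
        spread = max(abs(a - b) for a, b in combinations(means.values(), 2))
        spreads[key] = max(spread, spreads.get(key, 0.0))
    return sorted(spreads.items(), key=lambda item: item[1], reverse=True)


def tally_organs(ranked, top_n=20):
    """Count how often each organ appears among the top_n ranked pairs."""
    counts = Counter()
    for (organ_a, organ_b), _spread in ranked[:top_n]:
        counts.update({organ_a, organ_b})
    return counts


# Mean chord lengths (cm); only the thyroid-pancreas values come from the text,
# the other two pairs are hypothetical placeholders.
means = {
    ("thyroid", "pancreas"): {"RORNL": 36.99, "UFNCI(F)": 30.07, "UFNCI(M)": 34.93},
    ("thyroid", "bladder"): {"RORNL": 60.0, "UFNCI(F)": 55.0, "UFNCI(M)": 57.0},
    ("liver", "stomach"): {"RORNL": 12.0, "UFNCI(F)": 11.5, "UFNCI(M)": 11.8},
}

ranked = rank_pairs(means)
tally = tally_organs(ranked, top_n=2)
print(ranked[0], tally.most_common(1))
```

Sorting the organ names within each key is what removes the reversed source-target combinations before tallying.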
In figure 4, the chord length distributions between the thyroid and stomach (figure 4(a)), lungs (figure 4(b)), and bladder (figure 4(c)) are presented. The stomach was selected as a key source organ for ingestion pathways; the lungs were selected as a key source organ for inhalation pathways; and the bladder was selected because it is a crucial organ for the excretion of radionuclides, particularly in nuclear medicine applications such as PET imaging and radiopharmaceutical therapy. These chord length distributions can be useful indicators of the changes in dosimetric values between phantom generations; for instance, the lungs to thyroid AFs for a 0.5 MeV photon are 9.89 × 10⁻⁵ in the RORNL adult phantom (Han 2005) and 1.95 × 10⁻⁴ and 2.70 × 10⁻⁴ in the UF/NCI adult male and female phantoms, respectively. This result is expected as the larger chord lengths correspond to the smaller AFs.

Intra-organ comparisons
While inter-organ distances can describe the differences between the phantoms in terms of organ location, intra-organ chord length distributions can demonstrate the differences in the size and shape of the organs. To determine the organs with the most variation in the size/shape, the differences between phantom sets of the mean chord length, as well as the standard deviation, were first computed. The largest differences in the mean and standard deviation were then analyzed separately, from which a variation coefficient was calculated.
In the evaluation of single structure organs, emphasis was placed on two cases: organs with varying means and consistent standard deviations, and organs with varying standard deviations and consistent means. The first case describes an organ that has a consistent size, but a varying shape (e.g. rounding edges, stretching in one axis while shrinking in another). The second describes an organ that has a consistent shape but a varying size (e.g. symmetric elongation, compression). An important distinction is that while the scenarios depicted are true for symmetric shapes, asymmetry causes this reasoning to be imperfect. Another limitation of this analysis is that it does not account for the actual shape of the distribution, so for any combination of shapes only the mean and standard deviation were considered. These surrogate measures of the size and shape were, however, adequate for a less rigorous two variable analysis. In order to place more emphasis on these two cases, a variation coefficient was obtained by dividing the differences in the mean by the differences in the standard deviation. The larger values contained the organs with large mean differences and small standard deviation differences, and the opposite for the smaller values. This coefficient was calculated for both the RORNL-UFNCI(F) and RORNL-UFNCI(M) pairs, and homogeneity between these pairs was preferred. This was to ensure that the differences between hermaphrodite RORNL phantoms and the male/female UF/NCI phantoms were solely dependent on defining the size and shape of the organs, rather than the anatomical differences between the male and female phantoms. The effect that these anatomical differences have is creating more variability in the volumes, and thus the masses of the organs. 
While studies have shown that within certain ranges of energies the SAF can reasonably be approximated to be constant, particularly for electron sources, small changes in the
Many organs showed consistent means as well as standard deviations across phantom types. One of the single structure organs with the highest variation coefficient was the gallbladder. This chord length distribution is presented in figure 5.
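The variation coefficient defined above (difference in mean divided by difference in standard deviation between two phantom sets) can be sketched as below. The use of the sample standard deviation and the handling of a zero denominator are assumptions, and the two short lists are hypothetical chord lengths chosen so the arithmetic is easy to follow.

```python
import math
import statistics


def variation_coefficient(lengths_a, lengths_b):
    """|difference in mean| / |difference in standard deviation| for two samples."""
    mean_diff = abs(statistics.fmean(lengths_a) - statistics.fmean(lengths_b))
    std_diff = abs(statistics.stdev(lengths_a) - statistics.stdev(lengths_b))
    if std_diff == 0.0:
        return math.inf  # identical spread: the ratio is unbounded
    return mean_diff / std_diff


# Hypothetical intra-organ chord lengths (cm) for one organ in two phantoms.
phantom_a = [1.0, 2.0, 3.0, 4.0, 5.0]
phantom_b = [2.0, 4.0, 6.0, 8.0, 10.0]

coefficient = variation_coefficient(phantom_a, phantom_b)
print(f"variation coefficient = {coefficient:.3f}")
```

Large values flag organs whose mean chord length differs while the spread is similar, and small values flag the opposite case, matching the two cases emphasized in the text.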
The single structure organ with the smallest variation coefficient was the thyroid. This is presented in figure 6. The larger standard deviation for both UF/NCI phantoms was indicative of the organ being larger than the RORNL phantom thyroid. The thyroid self-irradiation AFs for a 0.03 MeV photon are 0.163 in the RORNL adult phantom (Han 2005) and 0.1445 and 0.1275 in the UF/NCI adult male and female phantoms, respectively. This result is expected, because larger distances within an organ correspond to larger organ size, which in turn corresponds to larger self-irradiation AFs.
The differences in mean chord lengths across phantom sets were investigated at different ages for five different organ self-distances and tabulated in table 4. This was to observe whether one phantom tended to grow larger than the other at each age. The relationship between the mean chord lengths in each phantom type was found not to be consistent with age. In general, the adult phantom demonstrated the least consistency across phantom types. For the pediatric phantoms, the similarity in the mean chord lengths depended on the source-target organs. Because there is no consistent trend among the phantom types in the mean intra-organ distances, the observation by Lamart et al (2011) of the tendency of the S values to be lower in ORNL phantoms is better explained by an increasing distance between the thyroid and these targets with greater ages rather than a change in target size.
The differences in the standard deviations across phantom sets were generally found to be higher in organs that contained multiple structures. One reason for this is that the UF/NCI phantoms are separated into male and female phantoms, while the RORNL phantoms do not consider these differences. Evidence for this explanation can be seen by the comparison of the distributions of the adrenal glands, shown in figure 7.
Another reason the organs with multiple structures presented significant differences is that, as demonstrated in the cases of inter-organ distances, the positions relative to the other structures play a role rather than simply the size and the shape. This explanation helps to understand the differences in the distribution of the salivary glands shown in figure 8. The chord length distributions at six different age groups are presented to show how the size affects the distribution. In this distribution the changes in standard deviation with age can be used to describe the size growth, and the distance between the lower limit of the error is influenced by the distance between structures as well as the organ size.

Multiple regression analysis and future applications
Test results showed the Exponential GPR model to provide the best performance in estimating photon SAFs, based on its high r-squared values and the symmetry in its predictions around ground truth. Table 5 shows the strength of the Exponential GPR model in consistently making SAF predictions with relatively low errors at all three tested photon energies; figure 9 shows a plot from the testing portion of the dataset of the predicted SAFs using this model versus the ground truth SAFs. Further testing results may be found in the supplementary data. These preliminary findings show the potential strength of the chord length dataset to predict photon SAFs without the need for Monte Carlo simulation.
Table 4. Differences in the mean intra-organ chord lengths within the thyroid, brain, heart wall, liver, and bladder wall between RORNL and UFNCI phantoms.
Chord length data can be applicable to prospective dosimetry efforts using other approaches as well. Chord lengths may be used to make quick estimates of SAFs when combined with point kernel methodology that can convert source-to-target distance into estimated dose; this type of effort has already been shown to perform well in internal neutron dose estimates (i.e. spontaneous fission neutron emitters) when compared to full Monte Carlo simulations (Griffin et al 2022b). Generating chord length data may also be of utility in future phantom development, such as maintaining consistency with ICRP reference anatomy or in the systematic increase or decrease in inter-organ distances depending on height or weight of the phantom for population-specific radiation protection or for individualized nuclear medicine. If applied on patient-specific data, the mean distance/vector can inform organ identification and contouring vis-à-vis the mean, with potential to expand in machine learning contouring of organ segmentation or treatment planning.
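The idea of combining chord lengths with a point kernel can be sketched for photons as below. This is not the neutron kernel of Griffin et al (2022b): it is the textbook point-isotropic form for a uniform medium with the buildup factor set to 1, and the coefficient values are rough, water-like numbers near 0.5 MeV assumed purely for illustration.

```python
import math


def photon_point_kernel(r_cm, mu_per_cm, mu_en_over_rho_cm2_per_g):
    """Point-isotropic SAF (g^-1) at distance r in a uniform medium, buildup = 1."""
    attenuation = math.exp(-mu_per_cm * r_cm)
    return mu_en_over_rho_cm2_per_g * attenuation / (4.0 * math.pi * r_cm ** 2)


def estimate_saf(chord_lengths_cm, mu_per_cm, mu_en_over_rho_cm2_per_g):
    """Average the point kernel over sampled source-to-target chord lengths."""
    total = sum(
        photon_point_kernel(r, mu_per_cm, mu_en_over_rho_cm2_per_g)
        for r in chord_lengths_cm
    )
    return total / len(chord_lengths_cm)


# Assumed, approximate coefficients for a water-like medium (illustrative only).
MU_PER_CM = 0.097            # linear attenuation coefficient, cm^-1
MU_EN_OVER_RHO = 0.033       # mass energy-absorption coefficient, cm^2 g^-1

# Hypothetical chord lengths (cm) for one source-target pair.
chord_lengths_cm = [28.0, 30.0, 32.0, 35.0]
saf_estimate = estimate_saf(chord_lengths_cm, MU_PER_CM, MU_EN_OVER_RHO)
print(f"estimated SAF = {saf_estimate:.3e} g^-1")
```

Averaging the kernel over the full chord length distribution, rather than evaluating it at the mean distance, matters because the kernel falls off steeply and nonlinearly with distance.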

Limitations
As is the case with most phantom studies, this inquiry into the organ differences did not account for physiological processes that may change the size (deformability) or anatomical location of any organs. More data are needed, for example, on how the phantoms should be adjusted for respiration, when the lungs expand, or for the growth of the stomach, intestines, and bladder during ingestion and excretion. This theme of the static nature of current phantoms is discussed in a study by Yang et al (2021), which sought to obtain proper dimensions for lung phantom construction. Utilization of respiratory-specific lung phantoms (Chang et al 2021) may be explored to determine parametric variability of lung volume respiration with computed SAF to account for significant changes that occur during the breathing process. Another limitation of this study that was not addressed was material composition. In evaluating the distances between organs from the stylized and voxel phantoms, the material composition of each path will vary. While it has been observed that the theory of reciprocity holds for various organs within the RORNL stylized phantom series (Hiller et al 2017), there is currently no quantification of potential differences in the material composition along these distances. Such differences may negate or further enhance the disparity of particles from one organ reaching another organ as suggested by the results of the organ distance calculations. Results from  are interpreted to suggest that source-to-target material compositions will have a negligible impact on pairs of organs that do not transcend a large bone structure, such as the pelvis, clavicle, or cranium. Thus, it is anticipated that source-to-target organ pairs that reside above the pelvis, below the clavicle, and within the rib cage should be largely unaffected by this phenomenon while source-to-target organ pairs that stray from this criterion should be increasingly affected.
Figure 9. Regression-predicted estimates versus ground truth data on adult photon SAFs in the UF/NCI phantom series (photon energy of 0.5 MeV) when using the Exponential GPR model with the test portion of the chord length distribution dataset. The solid line represents where the data points would lie in a perfect prediction model.

Conclusions
In this study, the geometric differences between the UF/NCI hybrid and RORNL stylized phantoms were investigated. By using large numbers of random variables, chord length distributions were obtained for both phantom types and compared with one another. This work demonstrates the utility of this geometric comparison in its ability to predict the scale of S value and SAF changes along with the general trends in their behavior. These S value changes are crucial to the calculation of dose variability under the same circumstances. Thus, the chord length distributions summarized in this study may be useful in future work that combines distance distributions and point kernels (precalculated distance-to-dose) to create dose estimates from phantom geometry alone, without MC simulation, with potential applications in computing internal dose quantities and in predictive machine learning. These applications would further ease the computational burden of model-based dose calculation algorithms. Future work accounting for variability in organ deformation will enhance the ability to label and characterize organs for individualized internal dosimetric estimates, allowing individualized patient dosimetry to be conducted more rapidly rather than relying on reference-only dosimetry, with potential expansion to multi-scale dosimetry.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary information files). Data will be available from 1 February 2023.

Funding
This work was funded in part by the intramural program of the National Institutes of Health, National Cancer Institute, Division of Cancer Epidemiology and Genetics. The calculations in this work were performed in part on the NIH High-Performance Computing Biowulf Cluster (http://hpc.nih.gov).