An automated methodology for whole-body, multimodality tracking of individual cancer lesions

Objective. Manual analysis of individual cancer lesions to assess disease response is clinically impractical and requires automated lesion tracking methodologies. However, no methodology has been developed for whole-body individual lesion tracking, across an arbitrary number of scans, and acquired with various imaging modalities. Approach. This study introduces a lesion tracking methodology and benchmarked it using 23 68Ga-DOTATATE PET/CT and PET/MR images of eight neuroendocrine tumor patients. The methodology consists of six steps: (1) alignment of multiple scans via image registration, (2) body-part labeling, (3) automatic lesion-wise dilation, (4) clustering of lesions based on local lesion shape metrics, (5) assignment of lesion tracks, and (6) output of a lesion graph. Registration performance was evaluated via landmark distance, lesion matching accuracy was evaluated between each image pair, and lesion tracking accuracy was evaluated via identical track ratio. Sensitivity studies were performed to evaluate the impact of lesion dilation (fixed versus automatic dilation), anatomic location, image modalities (inter- versus intra-modality), registration mode (direct versus indirect registration), and track size (number of time-points and lesions) on lesion matching and tracking performance. Main results. Manual contouring yielded 956 lesions, 1570 lesion-matching decisions, and 493 lesion tracks. The median residual registration error was 2.5 mm. The automatic lesion dilation led to 0.90 overall lesion matching accuracy, and an 88% identical track ratio. The methodology is robust regarding anatomic locations, image modalities, and registration modes. The number of scans had a moderate negative impact on the identical track ratio (94% for 2 scans, 91% for 3 scans, and 81% for 4 scans). The number of lesions substantially impacted the identical track ratio (93% for 2 nodes versus 54% for ≥5 nodes). Significance. The developed methodology resulted in high lesion-matching accuracy and enables automated lesion tracking in PET/CT and PET/MR.


Introduction
Medical imaging plays a vital role in the assessment of cancer response to treatment by providing anatomical and functional information.Anatomical information is invaluable since treatment agents that cause the shrinkage of solid tumors are correlated with improvement in overall survival (Eisenhauer et al 2009).Furthermore, functional information captured by molecular imaging is becoming increasingly important for lesion identification and treatment monitoring (Weissleder 2006, Baum et al 2010, Vanderhoek et al 2013, Scarpelli et al 2018).Importantly, the spatial resolution capabilities confer medical imaging with the capacity to provide lesion-specific information.This allows for the identification of inter-lesion treatment response heterogeneity,

Methodology description
The developed lesion tracking methodology was benchmarked on a cohort of metastatic neuroendocrine tumor (NET) patients, which was retrospectively collected from the University of Wisconsin Hospitals and Clinics Picture Archiving and Communication System (UWHC-PACS).All the ethical guidelines and requirements of the institutional review board were followed.The inclusion criteria were metastatic NET patients that had received both PET/CT or PET/MR whole-body scans as part of their care in UWHC, PET/CT and PET/MR were used because of having whole-body fields-of-view (FOVs), and the PET image provides great lesion conspicuity.This population of metastatic NET patients poses two relevant challenges to the lesion tracking task.First, NET diagnosis and staging relies on multiple anatomical and functional imaging modalities (Pirasteh et al 2021).Second, NET patients undergo numerous imaging procedures due to the tumors' slow growing nature and patients' relatively long overall survival (5 year survival rate of 77%) (Figiel et al 2020).
The lesion tracking methodology acts on an image series with associated lesion masks, i.e. it assumes that all lesions have been identified and segmented on all scans in the series and the individual lesion components are separated and numbered based on connected components analysis.In this study, an imaging scientist (author versus) identified and contoured the cancer lesions manually, assisted by a nuclear medicine physician (author SBP).Segmentation was performed on the fused PET/CT images of all time-points by observing and manually segmenting regions of conspicuous elevated tracer uptake in relation to the local tracer background.Volumes of naturally occurring elevated uptake were not included in the segmentations.
Lesion matching is performed between each possible image-pair within a series, then the lesion matching decisions are combined to establish lesion tracking across the entire image series.Differently from lesion tracking, lesion matching is defined as establishing the correspondence between lesions in one single image-pair.For each image-pair, matching decisions are made to identify each lesion as a corresponding lesion (CL), a new lesion (NL), or a disappearing lesion (DL).The concepts of lesion tracking and matching decisions are illustrated in figure 1.The developed methodology has six steps: multimodality image registration, lesion body-part labeling, region-based lesion-wise dilation, projection-constrained lesion clustering, combinatorial linear assignment, and lesion graph display (figure 2).
In step 1, all images from different time-points in a series are registered to the baseline image, resulting in N 1 -registrations of image-pairs, where N is the number of images in a series.The intra-patient nature of the registration averts the need for registration atlases, which are commonly used for inter-patient registrations (Hellier et al 2001).For every image-pair, the registration mode can be either direct or indirect, direct registration is defined between the baseline image and any other subsequent image, whereas indirect registration is defined as the registration of two subsequent images whose transformation fields were calculated to baseline image.The registrations are performed using a whole-body rigid registration (WRR) followed by whole-body deformable registration (WDR).The WRR performs an initial alignment of the images to prepare for the more detailed deformable step.In the WDR, a hierarchical control-grid B-spline free form deformation (FFD) is used with a thin metal sheet bending energy regularizer (Rueckert et al 1999).This approach has been previously validated for whole-body intra-modality image registration (Santoro-Fernandes et al 2021).The optimization metrics used are the normalized mutual information for inter-modality registration (e.g.CT to MR) (Studholme et al 1998) and the normalized cross-correlation for intra-modality (e.g.CT to CT and MR to MR) (Avants et al 2008).Both metrics are optimized using a gradient descent approach (Klein et al 2007) with the cost function where C denotes the optimization metric (normalized mutual information or normalized cross-correlation) as a function of the moving image (I M ), the fixed image (I F ), and the set of parameters of the image transformation (  m), which are translation and rotation parameters for WRR, and the parameters of a deformation field for the WDR (Sederberg and Parry 1986).The function ( )  R m is a penalty term (bending energy) that enforces the smoothness of the transformation (S and Wahba 2006).The optimal transformation field T n (for image n) is then applied to the coordinates of the lesion masks L n associated with each of the N images in the series, generating a registered lesion mask L .
nT The transformation field is the identity for n 1 = (baseline image).

=
In this study, the registration was performed using the anatomical information contained in the CT and MR scans to compute the transformation fields, which were then applied to the binary lesion masks.The registration step was implemented using elastix (Klein et al 2010, Shamonin et al 2014) with the SimpleElastix python wrapper (Marstal et al 2016) and using a variation of the gain factor variables published by Akbarzadeh et al (2013) (supplemental material).
In step 2 the individual lesions are labeled according to the anatomical region (body-part) where they are located.The body-parts used were head and neck, chest, pelvis, abdomen, spine, arms, and legs.Lungs were defined as a subset of the chest and liver as a subset of the abdomen.A whole-body segmentation atlas is registered to the baseline image of every series following the same registration procedure described in step 1.Every lesion is labeled according to the overlap of the lesion's volume to the body-part segmentation atlas.If more than one body-part overlaps with a lesion volume, the lesion is labeled according to the greatest volume of overlap.
In step 3, the lesion contours are dilated to account for possible inaccuracies in the registration and increase the probability of lesion superposition.The dilation magnitude is decided independently for each lesion and is based on the local density of lesions in the anatomical region where each lesion is situated.A morphologic conformal lesion dilation operation is applied to each lesion in an image.The lesion-specific dilation magnitude (D i ) is defined for each lesion as

=
where d i j , is the distance between a lesion i and every other lesion j in the same image, and D max is a user defined parameter to set a maximum allowed dilation magnitude.Figure 3 illustrates the definition of D .
i A new dilated lesion mask L nT D , is then defined as

=
where i indexes each lesion in the mask.After the dilation, the overlaps between the individual lesions (i and j) of different time-points (n and q) are calculated for every possible image-pair in the series as The matrices M n q , are square matrices containing the volume of overlap between the lesions of time-points n and q.They have 2 w elements where w is the maximal integer between the number of lesions identified in time-point n ( n w ) and time-point q ( q w ).The number of superposition matrices M n q , created after the whole image series evaluation is ( )/ N N 1 2. - In step 4, disconnected lesion regions are clustered if they result from lesion merging or splitting.This can occur as part of the treatment response when a lesion volume shrinks and splits into many smaller lesion components or when many lesion components grow and merge as part of disease progression (Shafiei et al 2021).A subroutine was developed to perform clustering decisions (figure 4).The clustering decisions are made for every image-pair by selecting the lesion pairs that have a common intersecting lesion and using geometrical considerations to decide if these pairs consist of separate lesions or if they are components of the same lesion that have merged or split.In short, lesions are clustered if a characteristic distance between them (u) is smaller than the longest chord of the common intersecting lesion (d).The lesion clustering subroutine consists of the substeps in table 1.
In step 4.1, the orientation t is defined as the weighted mean of all the vectors connecting each voxel of both lesions in the pair  (A) Shows the lesions in the initial condition, where a lesion C has shrunk to a much smaller lesion L i in the time-point 2 and a new lesion L j appeared.In (B) the direction t is determined.In (C), the characteristic distance u is determined and L i and L j are projected to create the constrained contour C .
c In (D) the longest chord distance d is determined and compared to u.The lesions L i and L j are not clustered since d u.
< Importantly, lesions L i and L j would be wrongfully clustered if the lesion projection was not considered since * d u > .
Table 1.Description of the lesion clustering subroutine.

Sub-step
Description Rationale

Determines a connecting orientation t between the lesion pair
The distances u and d will be measured along this orientation 4.2 Defines a characteristic separation distance u between both lesions in the pair, where u is aligned with the orientation t The distance u characterizes the separation of the lesion pair 4.3 Projects the lesion pair contours along t to constrain the common intersecting lesion contour C The measurement of d needs to be locally constrained Determines the length d of the longest chord within the con- strained common intersecting lesion contour (C C ), where d is aligned with the orientation t The distance d defines a limit separation assuming the lesion pair originated from lesion merging (or splitting)

4.5
Compares d and u to make the clustering decision (positive for cluster if d u > ) Makes the final clustering decision where  t i is a connecting vector, V is the total number of connecting vectors, and the weights w i are defined as the inverse L2 norm of each vector In step 4.2, the distance u is determined as the 95th percentile of the distribution of the norms of all vectors  t i that satisfy the angular constraint this imposes a soft constraint that the vectors being considered to determine u are aligned with t .The 5 degrees value is a parameter of the methodology that can be adapted according to each application.The 95th percentile was chosen instead of a maximal operation because it is more robust against outliers in the distribution (Huff et al 2021).
In step 4.3, a constrained version (denoted by C c ) of the common intersecting lesion contour (C) is determined by projecting the contours (L i and L j ) of both lesions in the pair in the orientation t , generating the projected contours (L i p , and L j p , ).C c is then defined as C needs to be constrained to account for lesions of significantly different dimensions (as demonstrated in figure 4(D)).
In step 4.4,  d is determined as the longest of the chords this enforces again that the chord  d is oriented in a similar orientation as t , hence the distances d and u are both measured along similar orientations.
Finally, in step 4.5, u and d are compared, and if u d < the lesion pair is considered as a clustered, rather than as two disconnected lesions.Figure 4 illustrates the clustering sub-routine.After step 4, the superposition matrices M n q , for the whole series are reevaluated based on the altered lesion masks, generating modified matrices M .n q ,

¢
In step 5, lesion matching decisions are made by evaluating every possible image-pair combination in the series and establishing lesion correspondence.The combinatorial linear assignment process defines a cost matrix K for each matrix M n q , ¢ as /M 1 .
n q i j n q i j , , , , Lesion matching is reduced then to a linear assignment problem of finding the optimal permutation matrices A n q , that globally maximize the overlap between the lesions in the image-pair (Jaqaman et al 2008) within the constraints to the permutation matrix A ( ) A A 0 or 1 and 0 or 1 12 where the sum of the matrix elements can be zero if there are no matches for the lesions represented by those rows and columns.The Munkres assignment algorithm (Munkres 1957) is used to find each optimal matrix A n q , via a non-greedy minimization of the cost matrices .
n q , K Finally, in step 6, the lesion tracking information of the image series is summarized as a lesion graph (Yan et al 2018, Kuckertz et al 2021, Szeskin et al 2023).This structure has one layer per image in the series and each node represents one lesion in each time-point, if many lesion components are clustered, each component is still represented by an individual node in the lesion graph.The edges represent the temporal correspondence between the lesions, i.e. the lesion matches.The lesion graph is illustrated in figure 1.A lesion track is defined by a connected subset of the graph that is disconnected from every other subset.In this study, the methodology's steps 2 through 6 were implemented in MATLAB version R2022b (The Mathworks Inc.).

Performance of image registration
The distances between the centroids of landmark lesions were used to evaluate the accuracy of image registration.A lesion was used as landmark when there was no more than 50% volume variation of that lesion within the image-pair being registered.This volume constraint was introduced to limit the biasing in the expected value of the centroid difference by lesions significantly growing or shrinking.Three registration accuracy sensitivity analyses were performed.The first compared the WRR and the WDR accuracies.The second compared direct versus indirect registration modes.The third sensitivity analysis was done to compare the accuracies of intra-modality (CT-CT and MR-MR) against inter-modality registration (CT-MR).

Performance of lesion matching and tracking
Reference lesion matches were manually prepared by an imaging scientist (versus) and served to evaluate the methodology's lesion matching and lesion tracking performances.The matching accuracy was evaluated as the number of correct matching decisions divided by the total number of reference matching decisions.Accuracy was separately determined for CL and for new and disappearing lesions (NL/DL).To evaluate the lesion tracking accuracy, the number of automatic lesion tracks identified by the methodology with an identical reference track was determined.An identical reference track means that the automatic and the reference tracks contain the same lesions and lesion matches.The identical track ratio was determined by dividing the number of identical tracks by the total number of reference tracks.

Sensitivity analyses
Five sensitivity analyses were performed as part of this study.First, a lesion dilation magnitude sensitivity analysis was performed by evaluating the impact in lesion matching accuracy of employing various fixed magnitudes of lesion dilation (0-18 mm in 3 mm steps) and using the automatic dilation (as described in section 2, step 3).The optimal dilation strategy was determined as the one that maximized overall lesion matching accuracy.For the anatomic distribution sensitivity analysis, a reference set of body-part labels was manually established by an imaging scientist (author versus) and the lesion body-part labeling accuracy was determined as the number of correct automatic labels divided by the total number of lesions.The lesion body-part labeling, the lesion matching, and the identical track ratio accuracies were evaluated in relation to the body-part in which the lesions were located.A body-part labeling confusion matrix was prepared (Stehman 1997).Regarding the image modality sensitivity analysis, the impact on lesion matching accuracy was evaluated with respect to the image modalities involved in the registration step of the methodology.This was done by inter-comparing the lesion matching accuracy of CT-MR, CT-CT, and MR-MR registrations.The registration mode sensitivity analysis was performed by evaluating the lesion matching accuracy for the direct and the indirect registration modes.Finally, the track size sensitivity analysis consisted of evaluating the identical track ratio as a function of the number of scans in the track and of the number of lesion contours included in the track (number of nodes).The identical track ratio could not be used in the image modality and in the registration mode sensitivity analyses.To measure the identical track ratio, every scan in a patient's series is necessary.Therefore, the modalities and the registration modes cannot be separated.

Failure modes
The reason for every false lesion matching decision made by the automated lesion tracking methodology was investigated and a qualitative analysis was performed to group the errors into similar failure modes.

Statistical analysis
For the evaluation of registration accuracy, the landmark distributions were compared using the Wilcoxon signed-rank test.The direct and indirect registration landmark distributions were compared using the Wilcoxon rank test.For the testing of different imaging modalities, a Kruskal-Wallis' test was used to evaluate if all distributions could be drawn from the same underlying distribution, followed by a Wilcoxon rank test for difference in the distributions, if necessary.The significance level considered was p 0.05.

<
The Bonferroni method was used to correct for multiple comparisons.The patient-wise standard deviation was calculated for matching accuracies and identical track ratios.The patient-wise Pearson correlation coefficient between the overall accuracy and the identical track ratio was calculated to evaluate the degree of dependence between lesion matching and lesion tracking accuracies.The standard deviations of the accuracies and identical track ratios were calculated across patients and across lesion body-parts as a metric of the methodology's robustness.In the intra-versus inter-modality registration accuracy analysis and the image modality sensitivity analysis, only images from the patients who had both PET/CT and PET/MR were used.And in the registration mode sensitivity analysis, only images from the patients who had more than two images in their series were used.These measures were taken to avoid category imbalance.All accuracies were reported in the interval from 0 to 1.

Results
The search returned 8 patients that were scanned a total of 23 times, resulting in 7 PET/MRs and 16 PET/CTs scans.The PET/CT images were acquired on the scanners GE Discovery 710, GE Discovery MI, and Siemens Biograph mCT.The PET/MR images were acquired on a GE Signa Scanner.The PET tracer for all images was 68 Ga-DOTATATE.The median injected dose was 156 MBq (range: 115-467), and the median time interval from radiopharmaceutical injection to image acquisition was 83 min (range: 53-101).More details about the image acquisitions are presented in table 2. The images had varying voxel-sizes and all images were resampled to 2 × 2 × 2 mm 3 .
There were 956 unique lesion contours in the patient population.After visual analysis, lesions were grouped into 493 reference lesion tracks consisting of 1570 reference matching decisions.The results per patient are shown in table 3.

Performance of image registration
Out of the 956 lesions in the dataset, 352 (37%) fulfilled the volume variation constraint to be landmark lesions.The median landmark distance (figure 5(A)) using WRR was 4.3 mm and using WDR it was 2.5 mm.The registration accuracy for WDR was significantly higher than for WRR (p 0.001 < ).The two most prominent outliers seen on the WDR distribution (25.7 and 26.7 mm) come from lesions whose diameters are considerably larger than 25 mm (figure 6), therefore these registrations can still be considered acceptable.The results for the comparison between direct and indirect WDR are shown in figure 5(B).Direct and indirect registrations had non-distinguishable performances (p 0.4 = ).Finally, the intra-and inter-modality distributions of centroid distances (figure 5(C)) were significantly different.The Kruskal-Wallis p-value for the Chi-Square test was  p 0.001, the pairwise tests were done using a Bonferroni-corrected significance level of 0.02.Only patients who received all three possible modality combinations (patients 1 and 2) were included in the inter-modality analysis, consequently 265 of 352 lesion pairs were used.The median centroid distance for CT-MR, MR-MR, and CT-CT registrations were 2.9, 2.4, and 1.7 mm, respectively.Inter-modality (CT-MR) registration performance was significantly worse than intramodality performance (CT-CT and MR-MR).Both intra-modality registration performances (CT-CT and MR-MR) were indistinguishable.

Performance of lesion matching and tracking
WDR was used for all lesion matching and tracking performed in this work since it showed superior registration accuracy (than WRR) and was previously demonstrated to yield optimal lesion matching results (Santoro-Fernandes et al 2021).Without the use of lesion dilation (step 3), the identical track ratio was 83%, the overall matching accuracy was 0.86, CL matching accuracy was 0.77 and NL/DL matching accuracy was 0.96.

Sensitivity analyses
The automated lesion matching accuracy and identical track ratio for each fixed dilation magnitude and for the automatic dilation approach are shown in figure 7.In general, the automatic dilation magnitude surpassed any of the fixed dilation magnitude results, leading to 432 out of 493 identical tracks (88% identical track ratio) and an overall accuracy of 0.90.The CL accuracy for 9 and 12 mm dilations was better than for the automatic dilation   case, however the slightly better CL accuracy was offset in the overall accuracy by the inferior NL/DL accuracy.Automatic dilation led to CL accuracy of 0.84 and NL/DL accuracy of 0.96.Table 4 shows the performance metrics per patient in the automatic dilation case.The Pearson's correlation coefficient between the overall accuracy (indicative of lesion matching performance) and the identical track ratio (indicative of lesion tracking performance) was 0.28.r = Regarding anatomic distribution, the 956 lesions were distributed from head to thighs, with greater prevalence in the abdominal area.The automatic lesion body-part determination step had a sensitivity of 0.84, correctly labeling 803 lesions out of the 956, figure 8 shows the confusion matrix for lesion body-part labeling.Table 5 shows the number of lesions per body-part and the results of the anatomic distribution sensitivity analysis.
When evaluating the dependence of lesion tracking performance with imaging modality involved in the registration step (table 6), using different modalities (CT-MR) resulted in a slightly lower performance than using the same modality for registration (CT-CT and MR-MR).
Table 7 shows the lesion matching accuracy results for the direct and indirect registration modes.The results for both registration modes were similar, with CL accuracy results that differed by 0.01, the same NL/DL accuracy, and overall accuracy that differed by only 0.01.  Figure 9 shows the identical track ratio as a function of numbers of scans and number of nodes in the tracks.Tracks with 4 scans were exclusively from patient 1, 7, and 8 and tracks with 3 scans were exclusively from patient 2. Most lesion tracks had fewer than 5 nodes (lesions).There were 22 tracks with five or more nodes.This includes 12 tracks with 5 nodes, 3 tracks with 6 nodes, 3 tracks with 7 nodes, 2 tracks with 9 nodes, and 2 individual tracks each with 11 and 19 nodes.

Failure modes
There were 164 errors out of 1570 (10%) automated matching decisions made using automatic dilation.The most common failure mode was in the registration (98/164 errors), where lesions were incorrectly superimposed due to inaccuracies in registration.This failure mode was further divided into four sub modes (table 8).The borderline sub mode was defined as lesions that did not intersect but shared a border.The liver and spine registration failure sub modes denote that an inaccuracy in registration of these organs was the driver of the error.The second most common failure mode was inaccurate lesion clustering (44 errors), where incorrect lesion clustering resulted in a matching error.Less frequent failure modes were shoulder lesion (9 errors), which failed to intersect when the patient's arm positioning changed significantly between scans (e.g.patient imaged with arms up, then imaged with arms down).The resection failure mode (8 errors) happened exclusively for one patient (patient 2) that had the inferior portion of the liver resected between scans.In this case, the WDR wrongfully distorts the resected liver to match the unresected liver, resulting in matching errors.Finally, the least common failure mode was field of view (FOV) (5 errors), which happened in the edges of the imaging FOV where the registration had higher inaccuracies, impairing lesion intersection.

Discussion
In this work we introduced a lesion tracking methodology that was validated on a population of neuroendocrine tumor patients with high-burden metastatic disease.The methodology is based on lesion contour superposition from registration of longitudinal multimodality images to the baseline image of an image series.Image registration was accurate, with a median residual landmark error of 2.5 mm for WDR (figure 5) which is comparable to previously published landmark based residual errors for whole-body PET/CT (Santoro-Fernandes et al 2021) and for lung CT (Yin et al 2011).We studied the lesion matching and tracking  performances under various fixed and automatic dilations, of which the latter resulted in the highest proportion of identical lesion tracks (88%) and overall accuracy (0.90) (figure 7).The weak correlation ( 0.28 r = ) between the overall accuracy (indicative of lesion matching performance) and the identical track ratio (indicative of lesion tracking performance) suggests a considerable degree of independence between the two metrics.Lesion tracking accuracy variation by anatomical region was low.The lowest lesion matching accuracy was observed in the spine (0.88), and the lowest identical track ratio was observed in the arms (78%), with standard deviations of 0.03 for lesion matching accuracy and 8% for identical track ratio across anatomical regions (table 5) This suggests that the methodology performance is robust regarding body-part.Use of inter-modality led to inferior registration accuracy (figure 5) and lesion matching (table 6), indicating a degree of correlation between registration and lesion matching performances, and the importance of accurate registration as a prerequisite for accurate lesion matching.
Our novel lesion tracking methodology presents a few advantages when compared to the 4D connectivity approach, which is the most commonly used and relies entirely on the lesion superposition caused by registration (Metcalf et al 1992, Kikinis et al 1999, Gerig et al 2000, Köhler et al 2019, Kuckertz et al 2021).The first advantage is the automatic lesion dilation (step 3), which increases the probability of establishing lesion correspondence without relying on registration accuracy alone.A second advantage is the ability to establish lesion tracks even when the lesion falls below the detectability limit in an intermediate time-point (intermittent lesion-figure 1), which is possible in our methodology due to the combinatorial linear assignment (step 5).A third advantage is avoiding the wrongful joining of tracks caused by spurious overlap.Instead, the linear assignment (step 5) disregards spurious overlaps by globally optimizing lesion overlap volume, while lesion clustering (step 4) enables the matching of disjoint lesions.A fourth advantage is performing whole-body lesion tracking.Previous works have been limited to brain (Metcalf et al 1992, Kikinis et al 1999, Gerig et al 2000, Shahar and Greenspan 2005, Köhler et al 2019, Kuckertz et al 2021), liver (Kuckertz et al 2022, Szeskin et al 2023), or bone tissue (Yip and Jeraj 2014).The fifth advantage is performing lesion tracking with multiple image modalities (PET/CT and PET/MR), whereas previous works that developed whole-body lesion matching were limited to PET/CT (Hering et al 2021, Santoro-Fernandes et al 2021) or CT images (Yan et al 2018).In summary, the lesion tracking methodology developed in this work achieved highly accurate results and is the first registration-based whole-body methodology that does not present the 4D connectivity limitations and accepts multiple imaging modalities as input.Other novelties of our methodology include accounting for lesion merging and splitting by considering lesion morphology and angular constraints, and an automatic selection of the magnitude for the conformal lesion dilation to account for local variations in lesion density.Finally, the introduced methodology is tracer independent since it assumes a pre-existing lesion mask, and it is also agnostic about the method used to obtain the lesion masks.
The method introduced by Yan et al is a non-registration approach to lesion tracking that consists of evaluating the similarity between lesion features derived using deep learning methods (Yan et al 2018).They performed whole-body lesion tracking limited to a single modality (CT).Their work is the only whole-body lesion matching work that reported on lesion matching accuracy.They reported precision and recall, which were respectively 0.86 and 0.92, leading to an F-score of 0.89.Our methodology had similar results, with slightly higher precision (0.89), lower recall (0.84), and F-score (0.87).The gain in precision and loss in recall are minor and suggest the non-inferiority of the methodology introduced in this work.Importantly, the methodology introduced herein has the additional advantage of handling multiple imaging modalities (supplemental material, figure 1).In a recent work, Szeskin et al introduced a simultaneous lesion tracking and segmentation approach that utilizes image registration followed by a convolutional neural network simultaneous segmentation approach.Their work is specific for liver lesions on CECT and was benchmarked on 50 scan-pairs containing 492 lesions.They report precision of 0.86 and recall of 0.90, leading to a F-score of 0.88.Again, the lesion matching metrics were very similar when compared to this work (0.89 precision, 0.84 recall, 0.87 F-score).In summary, our methodology achieved comparable lesion matching results on a more strenous test, that of tracking lesions across multimodalities, multiple time-points, and spread through the whole-body.
The main challenge of our work was to determine the reference lesion matches and tracks, which is subject to unavoidable observer bias.Nevertheless, most of the lesion matching decision are conspicuous, and the decisions that elicit confusion would also be subject to inter-observer variability in a clinical setting, therefore this challenge does not take away from the potential clinical utility of the developed methodology.Another challenge is the limited number of patients available for benchmarking the methodology.Despite only testing on 8 patients, many matching decisions (1570) were available to validate the lesion matching performance.Furthermore, the standard deviation across patients for lesion matching accuracy was 0.06 and for identical track ratio was 12% (table 6), which indicates a small outcome variation between patients.However, further investigations with more patients and different applications (imaging modalities, FOVs, cancer types, etc) are necessary to evaluate the robustness of the methodology.A third challenge was the different arm positioning between PET/CT and PET/MR (above the head, to the side of the torso), which introduces anatomical discrepancies that cannot be reconciled by the deformable registration approach without unrealistic image distortion.To circumvent this limitation, the whole-body images were constrained to head, torso, and legs when needed (arm lesions were absent in these instances).
The use of the lesion tracking methodology in clinical applications should be cautious since an important drop in the identical track ratio was observed as the lesion tracks increased in number of scans and in number of nodes (figure 9).This result is expected since identical track ratio is a very strict metric (a single error suffices to break the identicalness).This issue can be diminished in clinical applications by considering the most recent image series subset for patient evaluation, especially when the patient is surveilled over many years.More importantly, automatic lesion tracking should serve as an aid to the physician, allowing for human intervention to correct errors in the lesion tracks.A work by Moltz et al (2012) has shown that automated lesion matching can save time and reduce inter-reader variability of clinical decisions, suggesting that human correction can be faster than the infeasible task of manual lesion tracking.The feasibility of the human corrections is reliant on sufficient automatic tracking accuracy, which this work has shown to be encouraging (88% overall identical track ratio).Another factor that facilitates human correction of lesion tracks is that errors in lesion matching often lead to joined lesion tracks rather than completely incorrect decisions.This is mostly observed in anatomical regions of dense lesion population.Figure 10 illustrates a situation where three reference lesion tracks joined because of wrong automated lesion matches.In a clinical setting, providing that the implementation of the automated lesion tracking methodology allows for rapid and guided manual intervention, an operator could separate the three tracks in less time than individually matching all 15 lesion involved contours.
Individual lesion tracking allows for the construction of lesion graphs (figure 1), which are mathematical representations of the pathology identified in the image series of a metastatic cancer patient.The nodes of the graph represent the lesions and can encode information such as lesion volume, tracer uptake, grey levels, radiomics metrics, or any other quantitative information desired.The edges of the graph (lesion matches) establish the temporal relationship between these quantities, making for a comprehensive tool in the quantitative imaging context.Graphs have been previously used as input for predictive models (Duvenaud et al 2015, Cangea et al 2018), using lesion graphs and lesion-level information as predictive metrics for patient response to cancer treatment holds immense potential and will be pursued in future developments of this work.Other future investigations include lesion tracking of images with intersecting but unequal FOVs, establishing the degree of certainty of the matchings to guide manual lesion track corrections, and automatically addressing different arms positionings.

Conclusion
In this work we have developed a whole-body and multimodal lesion tracking methodology that enables lesionlevel treatment response assessment of metastatic cancer patients.The developed methodology achieved high accuracy in lesion matching and tracking in a NET patient population imaged with 68 Ga-DOTATATE PET/CT and PET/Mr The novel methodology introduced for tracking lesions through all time-points of an image series was made possible by using image-pair-wise lesion matching decisions with a combinatorial approach that leads to lesion tracks.Our results are evidence that individual lesion tracking is feasible in a multimodality setting.However, the methodology benchmarking presented is limited and further validation is needed to identify the methodology's strengths and weaknesses and to consider clinical translation.

Figure 2 .
Figure 2. Schematic representation of the lesion tracking methodology's main steps.

Figure 3 .
Figure 3. Illustration of process to determine the automatic lesion dilation.

Figure 4 .
Figure 4. Illustration of the clustering subroutine.(A) Shows the lesions in the initial condition, where a lesion C has shrunk to a much smaller lesion L i in the time-point 2 and a new lesion L j appeared.In (B) the direction t is determined.In (C), the characteristic distance u is determined and L i and L j are projected to create the constrained contour C .cIn (D) the longest chord distance d is determined and compared to u.The lesions L i and L j are not clustered since d u.< Importantly, lesions L i and L j would be wrongfully clustered if the lesion projection was not considered since * d u > .

Figure 5 .
Figure 5. Registration performance sensitivity analysis with respect to (A) registration technique, comparing whole-body rigid registration (WRR) and whole-body deformable registration (WDR) techniques.(B) By direct versus indirect registration.(C) By modalities involved in the registration.

Figure 6 .
Figure 6.Coronal slice of abdominal region displaying the lesion involved in causing centroid distance outliers observed in figure 5(A) (25.7 mm).The PET signal was omitted to improve visibility of the anatomic structures.

Figure 8 .
Figure 8. Confusion matrix for lesion body-part labeling.'Abdomen-Liver' is a sub-category of the 'Abdomen' category, and 'Chest-Lungs' is a sub-category of the 'Chest' category.

Figure 9 .
Figure 9. Track size sensitivity analysis of identical track ratio results.The numbers next to the markers indicate how many tracks were present in each category.

Figure 10 .
Figure 10.LHS: sagittal slices of the thoracic spine for each time-point.The PET signal was omitted to improve visibility of the anatomic structures.The red dashed outline indicates the clusters determined by the automatic tracking methodology.RHS: section of the reference lesion graph containing the lesion tracks, reference clusters are indicated by a red contour around nodes.Due to matching errors, the three tracks shown in the figure were joined during automated tracking.

Table 2 .
Patient population and imaging information.
a Image acquired outside UWHC and imported to UWHC-PACS.

Table 3 .
Results from the reference lesion matching analysis, used to benchmark the methodology.

Table 4 .
Lesion matching and lesion tracking accuracies shown by patient.Results using automatic dilation.

Table 5 .
Results of lesion body-part sensitivity analysis, using automatic dilation.Body-part sub-categories are indicated in italic.

Table 6 .
Image modality sensitivity analysis of lesion matching accuracy results, using automatic dilation.Only patients 1 and 2 were considered.

Table 7 .
Registration mode sensitivity analysis of lesion matching accuracy results, using automatic dilation.Only patients 1, 2, 7 and 8 were considered.

Table 8 .
Breakdown of the failure modes involved in the 108 errors made using automatic dilation.Failure sub-modes are indicated in italic.