Magnetic resonance biomarker assessment software (MR-BIAS): an automated open-source tool for the ISMRM/NIST system phantom

Objective. To provide an open-source software for repeatable and efficient quantification of T 1 and T 2 relaxation times with the ISMRM/NIST system phantom. Quantitative magnetic resonance imaging (qMRI) biomarkers have the potential to improve disease detection, staging and monitoring of treatment response. Reference objects, such as the system phantom, play a major role in translating qMRI methods into the clinic. The currently available open-source software for ISMRM/NIST system phantom analysis, Phantom Viewer (PV), includes manual steps that are subject to variability. Approach. We developed the Magnetic Resonance BIomarker Assessment Software (MR-BIAS) to automatically extract system phantom relaxation times. The inter-observer variability (IOV) and time efficiency of MR-BIAS and PV was observed in six volunteers analysing three phantom datasets. The IOV was measured with the coefficient of variation (CV) of percent bias (%bias) in T 1 and T 2 with respect to NMR reference values. The accuracy of MR-BIAS was compared to a custom script from a published study of twelve phantom datasets. This included comparison of overall bias and %bias for variable inversion recovery (T 1 VIR), variable flip angle (T 1 VFA) and multiple spin-echo (T 2 MSE) relaxation models. Main results. MR-BIAS had a lower mean CV with T 1 VIR (0.03%) and T 2 MSE (0.05%) in comparison to PV with T 1 VIR (1.28%) and T 2 MSE (4.55%). The mean analysis duration was 9.7 times faster for MR-BIAS (0.8 min) than PV (7.6 min). There was no statistically significant difference in the overall bias, or the %bias for the majority of ROIs, as calculated by MR-BIAS or the custom script for all models. Significance. MR-BIAS has demonstrated repeatable and efficient analysis of the ISMRM/NIST system phantom, with comparable accuracy to previous studies. The software is freely available to the MRI community, providing a framework to automate required analysis tasks, with the flexibility to explore open questions and accelerate biomarker research.


Introduction
Imaging biomarkers give insight into biological processes and have potential to improve clinical tasks such as screening, diagnosis, prognosis and monitoring of response during and after therapy (Omenn et al 2012). In a drug discovery setting, biomarkers can reduce the approval time and cost of novel therapies by the early identification of potentially successful or ineffectual therapies (Workman et al 2006). Two common quantitative magnetic resonance imaging (qMRI) methods are dynamic contrast enhanced imaging (DCE-MRI), which derives microvasculature properties from contrast agent kinetics, and diffusion weighted imaging (DW-MRI) Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. that infers tissue cellularity from water mobility (Shukla-Dave et al 2019). Imaging biomarkers derived from these qMRI methods can provide a non-invasive measure of early changes in the vascularity or cellularity of cancerous tumours, which often precede later morphological changes (Li and Padhani 2012).
The emergence of MRI simulation and MRI guided radiotherapy provides a unique opportunity to develop and translate qMRI biomarkers for oncology (van Houdt et al 2021). Translation of an imaging biomarker into clinical trials and patient care requires technical, biological and clinical validation (O'Connor et al 2017). Technical validation includes the assessment of repeatability, reproducibility and bias between the image biomarker and a true underlying value (Kessler et al 2015). Reference objects with known physical properties, often referred to as phantoms, are recommended for technical validation (O'Connor et al 2017, Shukla-Dave et al 2019, van Houdt et al 2021, Weingärtner et al 2022 and provide an approximation of in-vivo technical performance. In multisite clinical trials, phantoms can assist with the development of standardised image acquisition protocols (Shukla-Dave et al 2019) or allow the use of different optimised protocols at each site (van Houdt et al 2020). Phantom measurements can provide quality assurance throughout an imaging study by detecting changes in scanner configuration or performance (Gunter et al 2009, Keenan et al 2019. Multiple in-house and commercial phantoms have been designed for the assessment of qMRI methods (Keenan et al 2018). This report focuses on a phantom developed by the National Institute of Standards and Technology (NIST) and the International Society of Magnetic Resonance in Medicine (ISMRM), the NIST/ ISMRM system phantom (Stupic et al 2021). The phantom is suitable for evaluating a number of MRI image properties and includes vial arrays of doped water to assess proton density, longitudinal relaxation (T 1 ) and transverse relaxation (T 2 ). Existing system phantom relaxation studies include investigations of repeatability (Jiang et al 2017, Bane et al 2018, Carr et al 2021, reproducibility (Bane et al 2018) and detection of imaging changes due to system upgrades (Keenan et al 2019).
Analysis of system phantom images is often performed with custom software (Bane et al 2018, Carr et al 2021. An open-source software package (Stupic et al 2021) and a commercial solution (qCal-MR™, CaliberMRI, USA) are also available. Phantom Viewer (PV), the open-source software designed by NIST, requires manual region of interest (ROI) placement that may introduce inter-observer variability (IOV). A review outlining the need for an MRI system phantom suggests that software for relaxation analysis should be open-source and enable researchers to include their own models (Keenan et al 2018).
We have developed an open-source software called Magnetic Resonance BIomarker Assessment Software (MR-BIAS) which fully automates the analysis of images of the ISMRM/NIST system phantom for T 1 and T 2 quantification, and is available from http://github.com/JamesCKorte/mrbias. MR-BIAS includes automated ROI detection to reduce IOV and was developed with software extensibility principles (Hillard 2019) to allow users to include new relaxation models with minimal modification to the software. To assess IOV and time efficiency we observed volunteers using our automated software (MR-BIAS) compared to PV. To validate technical performance we compared relaxation times in the human tissue range, as calculated by MR-BIAS with commonly used signal models, against scan data evaluated independently in a previously published study (Carr et al 2021).

Image analysis software
To analyse phantom images, MR-BIAS automates four main tasks (figure 1): image sorting, ROI detection, model fitting and result reporting. MR-BIAS has a command line interface and requires two inputs; a directory of images to analyse and a configuration file to control the image analysis. MRI datasets are sorted into geometric images, T 1 and T 2 image sets. ROIs are detected by registering the geometric image to a predefined ROI atlas. MRI parameters are estimated in the detected ROIs by fitting relaxation models to the T 1 and T 2 image sets. The software has three outputs; a text based log of the success or failure of the analysis, a PDF formatted visual summary of the analysis and a comma separated value (.csv) file of the model fitting results. The tool is written in Python (v3.7) and uses well established python packages for image analysis and reporting.
2.1.1. Image sorting Quality assurance scanning of a system phantom results in multiple MRI image series which are stored in the digital imaging and communications in medicine (DICOM) format on a picture archiving and communication system (PACS). MR-BIAS requires the user to export DICOM images to a local computer hard drive for analysis. To automatically sort the image files, the software scans a user defined image directory and extracts DICOM metadata to identify which image files are associated with geometric images, T 1 image sets or T 2 image sets. Geometric images are used for ROI detection and require a field of view (FOV) covering the T 1 and T 2 arrays in the phantom. The T 1 and T 2 image sets are used for model fitting. All images in a specific T 1 or T 2 image set must have identical spatial parameters with only the acquisition parameters such as inversion time, flip angle or echo time differing between images. The software supports the geometric image, T 1 and T 2 image sets having different spatial parameters.

Region of interest detection
The system phantom has a T 1 array and a T 2 array for relaxation measurements. The T 1 array consists of fourteen vials of NiCl 2 doped water and the T 2 array consists of fourteen vials of MnCl 2 doped water. We use an atlasbased approach (supplementary figure 1) to detect these twenty-eight ROIs. The atlas consists of a geometric image and manually defined spherical regions, with a 10 mm diameter and a centroid as measured in an image viewer. The target geometric image is rigidly registered to the geometric image of the atlas. The inverse of the resulting image transform is then applied to the atlas ROIs to transform them onto the target geometric image. The transformed ROIs are then resampled to spatially match each of the relaxation image sets. For valid ROI detection, the phantom cannot be physically moved between acquisition of the geometric image and relaxation image sets.
Image registration is a two stage process using the Python bindings for SimpleITK (Lowekamp et al 2013) (1.2.4) involving a rough angular alignment followed by a fine rigid adjustment. The angular alignment stage is Figure 1. The MR-BIAS automated workflow requires two inputs; a directory of images (DICOM format) to analyse and a configuration file (.yaml format) to control the image analysis. Images are sorted into geometric images for ROI detection and T 1 and T 2 image sets for fitting of relaxation models. The software has three outputs; a text based log of the analysis, a PDF formatted visual summary and a comma separated value file of the model fitting results.
an exhaustive search of a rigid 3D rotation around all three axes, with 15°angular spacing. A mean squared error metric is used to select the closest matching 3D rotation from the exhaustive search. The second stage registration uses a gradient descent optimiser with a 3D rigid transform (rotation and translation) and a normalised cross-correlation metric.

Parametric model fitting
To estimate T 1 and T 2 relaxation times, we fit commonly used parametric signal models (Milford et al 2015, Bane et al 2018, Keenan et al 2019, Carr et al 2021 to the T 1 and T 2 image sets. The software currently supports T 1 estimation with variable inversion recovery (T 1 VIR ) and variable flip angle models (T 1 VFA ), and T 2 estimation with a multiple spin-echo model (T 2 MSE ). Data from each ROI is pre-processed with a customisable list of operations that are applied in the following order: exclusion from a user defined list, exclusion of saturated signals, averaging over the ROI, and intensity normalisation. The majority of pre-processing operations are performed automatically as a default setting or if explicitly enabled by the user. An exception to this is the user defined exclusion list that, if enabled, requires the user to specify which measurements to exclude. The T 1 and T 2 signal models are fit to the pre-processed data using non-linear least squares optimisation via the Levenberg-Marquardt method as implemented by lmfit (Newville et al 2016) (1.0.2). The quality of each model fit is noted in the PDF report (Supplementary Report 1) and a results data file. For lists of pre-processing options and signal model details see Supplementary Tables 1 and 2. 2.1.3.1. T 1 inversion recovery signal model Estimation of T 1 from a series of inversion recovery images with variable inversion time, T IR , were performed with the following signal equation which includes the equilibrium magnetisation, M 0 , inversion factor, F inv , repetition time, T R , and noise factor, n.
2.1.3.2. T 1 variable flip angle signal model Estimation of T 1 from a series of images with variable flip angle, α, were performed with the following signal equation,

Evaluation of the software
We conducted a volunteer study to evaluate the IOV and time efficiency of MR-BIAS. The quantitative performance of the tool was validated against a previously published study (Carr et al 2021). Evaluation was conducted on two ISMRM/NIST system phantom datasets: a small test dataset (n = 3) collected at Peter MacCallum Cancer Centre (Melbourne, Australia) and a longitudinal dataset (Carr et al 2021) (n = 12) collected over a year duration at Liverpool Hospital (Sydney, Australia). Both centres acquired image data on 3 T Siemens Skyra MRI scanners, each centre had its own system phantom (serial#:130-0093, serial#:130-0111) and followed the acquisition protocol in the phantom product manual (QalibreMD 2016). Further information is provided in the supplementary material, including the MRI sequence parameters (Supplementary table 3) and dataset details (Supplementary table 4).

Inter-observer variability and efficiency
To compare IOV and time efficiency, we observed six volunteers analyse three sets of phantom images with an earlier version of MR-BIAS and a semi-automated analysis program called PV (Stupic et al 2021). The volunteers had varying levels of experience with PV and used the graphical interface for the selection of image data, manual ROI placement, curve fitting and saving of a text based report to disk. The analysis included T 1 estimation with a T 1 VIR model and T 2 estimation with the T 2 MSE model. Volunteers were unable to physically access the same computer due to external factors, so remotely accessed a computer with both MR-BIAS and PV installed to perform the analysis.
We observed each volunteer performing the analysis and recorded the time taken to perform the following tasks: file sorting/selection, ROI detection/definition, curve fitting and reporting. Task time was measured from timestamps in the MR-BIAS log files and manually recorded with a stopwatch for PV. The IOV was measured using the coefficient of variation (CV = σ/μ) where μ is the mean and σ is the standard deviation of the estimated relaxation times for each ROI in a dataset, across all volunteers.

Quantitative performance
The quantitative performance of MR-BIAS was compared to that of a custom script used in a previous study from another research centre (Carr et al 2021). The study dataset consisted of twelve scans of a system phantom, acquired monthly on a Siemens 3 T Skyra MRI simulator. T 1 relaxation was estimated with both T 1 VIR and T 1 VFA models. T 2 relaxation was estimated with the T 2 MSE model. No temperature correction was performed, as the temperature range of this dataset was reported as 18°C-22°C, with no observed effect on the measured T 1 and T 2 times.
For all models, the signal was normalised to the maximum of the average signal. The signals in some ROI can become saturated, clipped to a maximum value, if they exceed the dynamic range of the MRI receiver hardware. Pre-processing for T 1 VFA and T 2 MSE models was configured to automatically exclude saturated voxels from ROIs, or to exclude the entire ROI if >50% of voxels in the ROI were saturated (see Supplementary table 1).
The MR-BIAS analysis settings outlined in the following paragraph were selected to match as closely as possible with the previous study (Carr et al 2021). For all models, the signal was averaged over all voxels in an ROI. The T 1 VFA model was fit on a central axial slice of the 3D image, the study excluded images with a flip angle of 15°for all ROIs, and excluded flip angles of 15°, 20°, 25°and 30°for a specific T 1 ROI (reference T 1 = 121 ms) due to signal saturation. The T 2 MSE model estimation excluded images with a 10 ms echo time due to expected and observed inconsistent phase coherences (Milford et al 2015).
Performance was evaluated with percentage bias (Weingärtner et al 2022) calculated as, whereȲ i is the mean estimated value over potentially repeated measurements and X i is the ground truth measurement for the ith region of interest. The overall bias (Weingärtner et al 2022) across phantom vials in a given relaxation array was calculated as, where N is the number of phantom vials. Overall bias and %bias were calculated for each monthly MRI dataset, with manufacturer provided NMR measurements at 20°C as the ground truth (see Supplementary Report 1 for reference NMR values). To test for significant performance differences (p < 0.05) between MR-BIAS and the custom script, we used a two-tailed Wilcoxon signed-rank test. The Wilcoxon test was performed on each relaxation model with SciPy (v1.7) (Virtanen et al 2020) to test for %bias differences for each ROI and for differences in overall bias. Performance evaluation was restricted to phantom compartments within a reliable physiological range (Carr et al 2021): nine T 1 vials (reference T 1 = 121-1884 ms) and nine T 2 vials (reference T 2 = 19-378 ms).

Inter-observer variability and efficiency
The coefficient of variation in T 1 and T 2 relaxation rates as analysed by six volunteers is shown in figure 3. The coefficient of variation is lower in all ROIs when estimating T 1 and T 2 with the automated software (MR-BIAS) as compared to the semi-automated software (PV) on the three datasets. The average coefficient of variation across all ROIs and datasets for T 1 is 0.03% (MR-BIAS) and 1.28% (PV), and for T 2 is 0.05% (MR-BIAS) and 4.55% (PV).
A comparison of task duration ( figure 4) shows that the automated software takes less time to complete all tasks than the semi-automated software. The largest task duration difference was observed between manual ROI placement (PV) and automated ROI detection (MR-BIAS). The average total analysis time per dataset was 9.7 times faster with the automated software, at 0.78 min (MR-BIAS) as compared to 7.64 min. (PV). MR-BIAS was also more consistent than PV, with less deviation in the coefficient of variation and total analysis time.

Quantitative performance
A total of 11 of the 12 datasets were analysed. One dataset was manually excluded, as it did not meet the ROI detection requirements (the phantom position changed between geometric imaging and the T 1 and T 2 imaging). There was no significant difference in the overall bias of results from MR-BIAS as compared to the previously published study for any of the relaxation models ( figure 5(a)). For the majority of the ROIs, there was no statistically significant difference in the %bias for the T 1 VIR , T 1 VFA or T 2 MSE measurements (figures 5(b)-(d)). The %bias of T 1 VIR was significantly smaller (p < 0.01) in one ROI (reference T 1 = 987 ms) as calculated by MR-BIAS ( figure 5(b)). The %bias of T 1 VFA was significantly larger (p < 0.05) in one ROI (reference T 1 = 121 ms) as calculated by MR-BIAS ( figure 5(c)).

Discussion
We have presented an automated software for analysis of the relaxation arrays of the ISMRM/NIST system phantom. The automated analysis shows a reduced IOV and a reduced total task duration in comparison to the semi-automated analysis tool. The reduction in IOV and total task duration can be largely attributed to the use of automated ROI detection over the manual placement of ROIs. The lack of statistically significant differences in overall bias, or a significant difference in %bias for the majority of ROIs, demonstrates the accuracy of the automated software for T 1 VIR , T 1 VFA and T 2 MSE models. Significant differences in a minority of ROIs for T 1 VIR (1/9) and T 1 VFA (1/9) are attributed to minor differences in ROI position as compared to the prior study (Carr et al 2021).
Setting up MR-BIAS for routine clinical or research use requires basic programming skills, such as the installation of Python and required packages. However, once established, it will enable users with minimal training to analyse system phantom data by specifying a path to an image directory. The command line interface is well suited to research tasks such as the batch processing of multiple datasets, analysing a dataset with multiple models or a sensitivity analysis of model parameters. MR-BIAS has been developed with software extensibility principles (Hillard 2019) to allow future support for different MRI scanners, ROI detection methods and signal models to be implemented with minimal change to the software (Supplementary figure 2). To date, we have modified MR-BIAS to analyse images from an Elekta Unity MR-Linac (Phillips Marlin 1.5 T) and have implemented multiple T 1 VIR models with minor programming effort. There were a number of limitations in our volunteer study of IOV and time efficiency. The remote connection to the computer used for analysis led to a noticeable delay for some volunteers, which may have increased the task time for ROI placement. The majority of volunteers (4/6) were unfamiliar with the system phantom and the analysis task, we would expect a more experienced PV operator to have a reduced task time. The image data was pre-sorted (i.e. all the T 1 VIR images in a folder) for analysis with PV, the file selection task may have had a longer duration if the volunteers were required to manually sort the images. The automated analysis may be affected by the quality of image data and the configuration settings. To assist with image registration used for ROI detection, we recommend that geometric images have a 3D field of view covering the entire phantom and an approximately isotropic resolution of 2 mm or less. The default configuration settings may not be optimal for analysing data from all MRI scanners, a potential solution is to provide scanner specific configuration files and related acquisition protocols on the online repository. MR-BIAS supports custom configuration to allow experienced operators to control the analysis, whilst standard  acquisition and analysis settings may assist with repeatability of routine measurements by less experienced operators.
MR-BIAS was designed to quantify T 1 and T 2 from system phantom data, but could be modified to analyse other elements of the system phantom such as the proton density array, fiducial array, resolution inserts or slice profile ramps. The presented framework automates a number of tasks common to phantom analysis in general. It is foreseeable that this tool could be easily adapted to analyse other quantitative phantoms, such as the NIST diffusion phantom (Boss et al 2014, Palacios et al 2017 by adding a diffusion phantom template for ROI detection and a diffusion model for ADC quantification. The accuracy of atlas-based ROI detection may be affected by the geometric accuracy of different MRI sequences (Walker et al 2014) and manufacturing tolerances between phantoms. To account for offset ROI positions in the current software, the user can manually adjust the central coordinates in the atlas file. A future improvement would be to automatically refine the detected ROI positions using shape based computer vision techniques, such as those applied to automated analysis of PET phantoms (Ulrich et al 2018). The accuracy of T 1 estimation with the T 1 VFA model is affected by deviation of the flip angle (Stikov et al 2015) and could be improved by adding B 1 correction to a future version of the software. Our immediate focus will be collaboration with the research community to further validate the tool and add support for images from other MRI scanner vendors.

Conclusions
MR-BIAS is an open-source software for accurate, repeatable and efficient quantification of T 1 and T 2 relaxation with the ISMRM/NIST system phantom. The software has the potential to streamline quality assurance tasks, and is freely available from http://github.com/JamesCKorte/mrbias. Further use and development of the software by the broader research community is encouraged. Figure 5. The quantitative performance of (purple) MR-BIAS in comparison to (white) a custom script from a previously published study (Carr et al 2021). There was no statistically significant difference in (a) the overall bias for T 1 VIR , T 1 VFA and T 2 MSE measurements across 11 datasets as calculated by MR-BIAS and the custom script. The %bias in each ROI across 11 datasets for (b) T 1 measured with variable inversion recovery (c) T 1 measured with variable flip angle and (d) T 2 measured with multiple spin-echoes. Significant differences between MR-BIAS and the custom script for each ROI are marked with a single asterisk * , p < 0.05, and a double asterisk ** , p < 0.01.