Unified Bayesian network for uncertainty quantification of physiological parameters in dynamic contrast enhanced (DCE) MRI of the liver

Objective. Physiological parameter estimation is affected by intrinsic ambiguity in the data such as noise and model inaccuracies. The aim of this work is to provide a deep learning framework for accurate parameter and uncertainty estimates for DCE-MRI in the liver. Approach. Concentration time curves are simulated to train a Bayesian neural network (BNN). Training of the BNN involves minimization of a loss function that jointly minimizes the aleatoric and epistemic uncertainties. Uncertainty estimation is evaluated for different noise levels and for different out of distribution (OD) cases, i.e. where the data during inference differs strongly to the data during training. The accuracy of parameter estimates are compared to a nonlinear least squares (NLLS) fitting in numerical simulations and in vivo data of a patient suffering from hepatic tumor lesions. Main results. BNN achieved lower root-mean-squared-errors (RMSE) than the NLLS for the simulated data. RMSE of BNN was on overage of all noise levels lower by 33% ± 1.9% for k trans, 22% ± 6% for v e and 89% ± 5% for v p than the NLLS. The aleatoric uncertainties of the parameters increased with increasing noise level, whereas the epistemic uncertainty increased when a BNN was evaluated with OD data. For the in vivo data, more robust parameter estimations were obtained by the BNN than the NLLS fit. In addition, the differences between estimated parameters for healthy and tumor regions-of-interest were significant (p < 0.0001). Significance. The proposed framework allowed for accurate parameter estimates for quantitative DCE-MRI. In addition, the BNN provided uncertainty estimates which highlighted cases of high noise and in which the training data did not match the data during inference. This is important for clinical application because it would indicate cases in which the trained model is inadequate and additional training with an adapted training data set is required.


Introduction
Dynamic contrast-enhanced MR (DCE-MR) imaging has been used as a non-invasive in vivo imaging modality for diagnosis and treatment monitoring of cancer by injection of a gadolinium (Gd) based contrast agent (CA) (Choyke et al 2003, Jackson et al 2007, Gurney-Champion et al 2020, Dündar et al 2022).A series of dynamic T1weighted images are then acquired to capture the temporal changes in the CA concentration in the tissue.
Quantitative analysis of temporal changes (concentration time curves, CTCs) with tracer kinetic modeling enables the estimation of quantitative physiological parameters, such as vascular permeability and tissue perfusion (Heye et al 2016, Fang et al 2021, Ottens et al 2022).These parameters have been shown to improve diagnosis and treatment planning of various diseases (Berks et al 2021, Wang et al 2022).Nonlinear least squares (NLLS) fit (Ahearn et al 2005) is conventionally used for the fitting of tracer kinetic models to the measured

Methods
In this section, we describe the proposed parameter estimation and uncertainty quantification framework.The core of our approach is a BNN that employs an input-dependent heteroscedastic noise model for parameter estimation and capturing of aleatoric uncertainty.A variational approximation technique is used for modeling epistemic uncertainty.The parameters of the BNN are learned based on simulated data.The data is simulated using an extended Tofts (eTofts) model (Tofts et al 1999) and AIFs obtained from in vivo patients DCE scan.The structure of the framework is shown in figure 1.

BNN architecture
The BNN takes the one-dimensional CA concentration in the plasma, C p (t) and the CA concentration in the tissue, C(t) simulated with the corresponding C p (t), as inputs.A multiscale temporal filter bank (Bliesener et al 2020) was applied to each of the inputs C p (t) and C(t) separately to intrinsically learn the relation between C p (t) and C(t).Each temporal filter bank consists of three 1D convolutional layers with different filter length (i.e. 10, 5 and 3) and stride length (10, 1 and 1) to extract low-level, medium-level and high-level temporal features, respectively.
The pre-connected temporal filter banks to the C p (t) and C(t) with high, medium and small filter sizes are intended to extract features with high, medium and low temporal resolution from each of the inputs, respectively.To observe the fast dynamics of the flow of the CA at the early phase, a filter with the smallest size and stride combination was used.This provides a high-temporal resolution information to capture the flow (Cristina 2015, Bliesener et al 2020).With medium temporal resolution, i.e. medium filter size, longer time scales could be observed.This enables the estimation of ve as the backflux of the contrast agents happens on longer time scales (Cuenod and Balvay 2013, Hansen et al 2019, Bliesener et al 2020).On the other hand, the filters with the highest filter and stride size provide a low temporal resolution information.In low temporal resolutions, contrast leakages could not be observed as they require longer time periods.The total tissue concentration, C(t), which is calculated as C(t) = v p C p (t) + v e C e (t) becomes C(t) = v p C p (t) when there are no contrast leakages (Sourbron andBuckley 2013, Bliesener et al 2020).Hence, the ratio of C(t) and C p (t) for the low temporal resolution filters allow computation of v p when there are no contrast leakages.The outputs of the extracted temporal features from each of the temporal filter banks are concatenated to fuse input information of C p (t) and C(t) as shown in figure 1.
The BNN consists of six fully connected layers as shown in figure 1 which have 600, 400, 300, 200 and 100 and 6 neurons in each of the layers.The network architecture in Bliesener et al (2020) was adopted and customized into a BNN with two additional layers and output neurons specific to the application of the proposed approach.The network depth was increased to six layers to proportionally distribute neurons in the third, fourth and fifth layers.Leaky rectified linear units (RELU) activation functions were used for the first five layers to introduce nonlinearity into the mapping and the last layer has sigmoid activation function to yield parameters in physiological ranges of 0 and 1.To ensure that the estimated k trans values lay in physiological ranges, the sigmoid activation function for k trans estimation was customized pre-activation to yield parameter values in ranges of 0 and 2.
The outputs of the BNN are the posterior distribution of three parameters of the eTofts model θ = {k trans , v e and v p } and their aleatoric uncertainty σ a = {σ a − k trans , σ a − v e and σ a − v p }, for each voxel.The eTofts model parameters θ and aleatoric uncertainty (σ a ) are iteratively updated during the network training to learn a posterior distribution of the parameters and aleatoric uncertainties for θ.A point estimate of the parameters is obtained by sampling from the posterior distribution.In contrast, the epistemic uncertainties σ e = {σ e − k trans , σ e − v e and σ e − v p } are not the direct outputs of the BNN.In order to obtain the epistemic uncertainty, the trained network is run n times (n = 100) and the standard deviation of the model prediction is computed.This quantifies the average variance of the model prediction.We therefore estimate, for each input C(t)and C p (t), the mean of the parameters θ, the aleatoric (σ a ) and epistemic (σ e ) uncertainties.

Extended tofts model
The eTofts model (Tofts 1997) is a two compartmental model, which describes the flow of the contrast agent between the extavascular extracellular space (EES) and the plasma space (vascular space).The eTofts model is solved to estimate three physiological parameters namely, the transfer constant (k trans ), fractional volume of the EES (v e ) and fractional volume of the plasma space (v p ). k trans -

(
) min 1 is the rate by which the contrast agent diffuses from the plasma space to the EES while v e and v p are the EES and plasma space volume per unit tissue volume, respectively.The eTofts model can be written as (Tofts 1997, Heye et al 2016): where C(t) is the time-dependent CA concentration in the tissue, C p (t) is the concentration of the CA in plasma space over time t and Δt is the time delay of the arrival of the CA at the liver tissue.

Uncertainty quantification
In the proposed framework, two separate sources of uncertainty are modeled.Aleatoric uncertainty represents the ambiguity that exists in the data.It cannot be reduced by having more training data as it is inherent to the data (Hüllermeier and Waegeman 2021, Tanno et al 2021, Muchen et al 2022).Epistemic uncertainty quantifies the uncertainties in the trained network that emerge from limited training data (Klepaczko et al 2020).This uncertainty describes the presence of features that do not exist in the training data, and can be reduced by adapting the training data (Hüllermeier and Waegeman 2021, Tanno et al 2021, Muchen et al 2022).In the rest of this section, we explain how these two sources of uncertainties are estimated.

Aleatoric uncertainty and heteroscedastic noise model
A heteroscedastic noise model (Glang et al 2020), which depends on the input data, is applied to quantify the aleatoric uncertainties.We applied the heteroscedastic noise model because this model describes the variance as a function of the value of the input.For the DCE-data used in this study, the main source of incoherent 'noiselike' artefacts are residual undersampling artefacts.The amplitude of these artefacts is expected to scale with the overall amplitude of the image at each time point.The loss function of the BNN applies the heteroscedastic noise model to optimize the input-dependent varying variance, i.e. the aleatoric uncertainty, as a function of the difference between the reference and estimated parameters.The aleatoric uncertainty is enforced to be large when the difference between the reference and estimated parameters is large (equation ( 2)).This ensures robustness of the network to noise and outliers in the training data (Tanno et al 2021) and is crucial for a robust DCE parameter estimation in the presence of noise in the training CTCs.
The loss function given in equation (2) uses this noise model to optimize the training of the BNN.During the training process, the parameters of the eTofts model q ˆand the aleatoric uncertainties (σ a ) of each parameter are both optimized through minimization of this loss function.The loss function is a sum of three terms The first term is the squared sum of differences between the ground truth parameters θ i and estimated parameters q ˆi divided by the aleatoric uncertainty s a i for each voxel i.This denominator enforces s a i to be large when the difference between the reference and estimated parameters is large i.e. high uncertainty is given to voxels with high error.While the second term prevents the spread of s a i from growing indefinitely, the third term p ( ) log 2 n 2 represents a constant from the Gaussian likelihood function (Glang et al 2020).

. Epistemic uncertainty and bayesian inference
Given the CTCs, BNN are able to quantify the epistemic uncertainties by providing a posterior distribution of the parameter estimates.In this work, Bayesian inference (Mittermeier et al 2019) is employed for capturing the epistemic uncertainty due to insufficient training data.The exact Bayesian inference of neural network weights is intractable.Therefore, a variational approximation of the true posterior distribution is required.Variational approximation techniques represent weights of neural networks as a probability distribution instead of point estimates.Thus, a variational approximation of the true posterior is given by q(w|β), where β denotes the parameters of a Gaussian distribution learned by minimizing the variational free energy shown in equation (3) (Blundell et al 2015).
which can be approximated as: The cost function in equation (3) has two terms, namely, the Kullback-Leibler (KL) divergence and the likelihood function.The first term with the KL divergence is data independent and is used to compute the epistemic uncertainty by learning a posterior distribution of the network weights, q(w|β).The prior p(w) provides information about the distribution of the weights prior to observing the CTCs.We have used standard normal Gaussian priors for the weights of the BNN.
During training, we learn the parameters (mean and standard deviation) of the Gaussian distribution β = (μ, α) by minimizing the KL divergence in equation (3).We assumed that β follows a Gaussian distribution.The second term on the other hand determines the data likelihood.In order to compute the likelihood function, Monte Carlo (MC) approximation is performed by sampling network weights w j from the variational posterior distribution (w j ≈ q(w j |β)) as expressed in equation (4) (Blundell et al 2015).

Combined loss function
The combined loss function, shown in equation (5), jointly optimizes the aleatoric and epistemic uncertainties The first term (a) is the heteroscedastic Gaussian noise model in equation (2), which is evaluated at the end of the forward pass to capture the aleatoric uncertainty.The second term (b) is evaluated layer-wise to reduce the KL divergence between the approximate posterior and prior distributions of the weights to capture the epistemic uncertainty (Blundell et al 2015).

Experiments
The proposed deep learning framework was trained on simulated data.We evaluated the performance with simulated and patient data.

Simulated DCE-MRI data
Due to respiratory motion and relatively large field-of-view (FOV) during imaging, high quality 3D DCE-MRI data of the liver is challenging to acquire.In order to overcome the lack of high quality data, we carried out numerical simulations of CTCs to create a training data set.The eTofts model was applied to generate simulated CTCs C(t), using different combinations of pharmacokinetic parameters, θ = (k trans , v e and v p ) and an AIF, describing CA uptake in healthy tissue and tumors.For each patient data, C p (t) was extracted from manually placed ROI on the hepatic artery.For each C p (t), we simulated 75 000 CTCs with the following parameter ranges: max NL was a scaling factor denoting noise levels (NL) in percentages and ( ) C t max was the maximum value of the CTCs in noise-free tissues.Overall, the simulated data was used to investigate the potential of the proposed framework for accurate physiological parameter and uncertainty estimation on different scenarios that mimic in vivo patient data.

Network training
The simulated CTCs were split into 80%, 10% and 10% training, testing and validation dataset, respectively.The testing dataset was used to asses the generalization of the model while the validation dataset was used to evaluate the convergence of the training algorithm (Fang et al 2021).Furthermore, the network weights were initialized with standard normal Gaussian priors and trained with ADAM optimizer with a learning rate of 1e −5 and mini batches of size 50.The network was implemented in Python using keras library with Tensorflow backend (Abadi et al 2016) and was trained for 200 iterations with a GPU workstation (NVIDIA GeForce RTX 2080).Figure 2 showed the changes in training and validation loss over epochs for a BNN trained with 15% noise level.The gradual decrease in the training loss shows the BNN's ability to learn useful features that maps the input CTCs into physiological parameters (Ulas et al 2019).

Patient DCE-MRI data
Data set from five patients with tumor lesions in the liver were used to assess the performance of the BNN trained with simulated data.DCE data of the liver were acquired using a 3T Biograph mMR hybrid scanner (Siemens Healthcare, Erlangen, Germany) for a duration of 5 minutes after administering a bolus of 0.01 mmol kg −1 of hepato-specific contrast agent (gadoxeate disodium).It was a continuous acquisition during free-breathing and motion-corrected image reconstruction was used to obtain DCE images with a temporal resolution of 6 seconds.The acquisition parameters were: TR/TE = 3.3 ms/1.36 ms, flip angle = 12 °, FOV=345 × 345 mm 2 , spatial resolution = 1.5 mm 2 , and partial Fourier factor = 5/8.More details on the patient data acquisition parameters can be found in (Ippoliti et al 2019).

Evaluation on simulated data
The performance of the proposed framework was evaluated by using simulated data.

Accuracy of parameter estimation
We evaluated the accuracy of parameter estimation using the BNN for different noise levels.λ in equation ( 6) was set to 0.006, 0.03, 0.06, 0.09 and 0.12, which corresponds to noise levels of 1%, 5%, 10%, 15% and 20%, respectively.Δt in equation (1) was set to zero for the evaluations on the simulated data.
Network performance.We quantitatively evaluated the parameter estimation by calculating the root-meansquared-errors (RMSE) and R 2 (Inge 1987) values of the output of the network and the reference parameters for different noise levels.In addition, parameter estimation was carried out with a NLLS method for comparison purpose.NLLS fitting was implemented using the inbuilt python minimize function.The NelderMead simplex algorithm (Donald and Lloyd 1975) was applied to minimize the residual between the predicted and measured CTCs voxel-wise.The initial parameters of the NLLS fitting were 0.8, 0.1, 0.01 and 0.001 for k trans , v e , v p and Δt, respectively.In addition, we set the lower bounds of the fitting to [0.001, 0.001, 0.001, 0] and upper bounds to [2, 1, 1, 0.4] for k trans , v e , v p and Δt, respectively to constrain the parameters to physiologically plausible ranges.Δt was set in minutes.
We applied Akaike information criterion (AIC) (Bozdogan 1987) to compare the model performance between the NLLS fit and the BNN on a test data.The AIC for regression models are calculated as (Banks and Joyner 2017): where n is the total number of samples, ò is the mean square error, and k is the total number of parameters estimated by the model.The best performing model would show a lower AIC.OD-NL.We investigated the impact of applying test data with OD-NL on the uncertainty estimate.A BNN was trained with NL ä [5%, 10%] and tested with individual NL of 15% and 20% (table 1).The AIF was the same for training and testing.
OD-CTC.We evaluated how varying the size of training data influenced parameter and uncertainty estimation.A BNN was trained with CTCs using parameters k trans , v e and v p from evenly spaced values in step size of 0.04, 0.02 and 0.02, respectively.The step size for k trans was increased to 0.04 to decrease the training size by 50%.A second BNN was trained with CTCs generated from k trans , v e and v p values with step size of 0.02 for all parameters (i.e.50% more training data).The AIF and NL was the same for training and testing (table 1).
OD-AIF.We investigated the impact of applying test data with OD-AIF on the uncertainty estimate.The reference AIF was modified to a OD-AIF to mimic errors in the estimation of the uptake of a contrast agent in a patient.We broadened the peak amplitude of the AIF by a factor of two and shifted it by six time points (i.e.36 seconds) to generate an OD-AIF.In the liver, delays of the AIF upto 20 seconds are most likely considered to be physiological (Miyazaki et al 2008, Chouhan et al 2016).Hence, a delay of 36 s represents an OD-AIF since larger delay are less likely to be physiological.The modified AIF represented an OD or inaccurate AIF.The NL was the same for training and testing (table 1).
OD-Δt.The sensitivity of the uncertainty estimate to the shifting of the CTCs was investigated by applying test data with OD-Δt.The reference AIF was shifted by two time points (i.e. 12 s) to generate CTCs with OD-Δt.The NL was the same for training and testing (table 1).
3.4.4.Analysis.Coverage evaluation.We evaluated the use of uncertainty as a metric to measure errors of parameter estimates.Errors (ò i = q q -| ˆ| i i ) were computed as the absolute difference between the ground truth parameters (θ i ) and BNN parameter estimates (q ˆi) for each voxel i.The errors were a combination of several sources of uncertainties.Hence, we computed the total uncertainty (s of each voxel i as a combination of the aleatoric (s a i ) and epistemic uncertainties (s e i ).The association between the estimated uncertainties and errors of parameter estimates were evaluated by computing the number of voxels (%) with errors below the 90% confidence interval of the total uncertainty, i.e.  s < * z i t i , where z was the critical value of 1.64.

In vivo application
The BNN trained with simulated data was evaluated by applying it to patients data set to estimate physiological parameters and their uncertainties.

Parameter estimation
Parameter estimation was performed for every voxel within the liver.The liver is most likely characterized by delays of the AIF upto 20 seconds, which are considered to be physiological (Miyazaki et

Uncertainty evaluation of OD data
We evaluated uncertainty estimates for two different OD cases: (i) not having enough training data (OD-CTC) and (ii) use of inaccurate AIF (OD-AIF).OD-CTC.We simulated an insufficient number of training samples to investigate the influence of lack of CTCs in the training data set on the estimated uncertainty.A BNN was trained with simulated CTCs using parameters k trans , v e and v p from evenly spaced values in step size of 0.04, 0.02 and 0.02 and five patients AIF, respectively.This indicates the case of 50% less training data as compared to CTCs generated with step size of 0.02 for all parameters.
OD-AIF.We simulated the use of inaccurate AIFs for generating the training samples and investigated their effect on the parameters and uncertainty estimate.A BNN was trained with CTCs generated without the testing data AIF.The patient data with this OD-AIF for validating the BNN was a unique scan with the same acquisition parameters as the training data as recommended in good machine learning practices (Aggarwal et al 2023).

Regions-of-Interest (ROI) analysis
We manually chose two ROIs, with one encompassing a tumor lesion and the other healthy liver tissue.The parameter estimates for the healthy and tumor ROI by the BNN and NLLS method were compared.We evaluated the statistical significance of difference of each parameter estimate of both a healthy and a tumor ROI.In addition, the statistical significance of differences of parameter estimates for the tumor and normal tissue ROI was evaluated.

Accuracy of parameter estimation
Examples of noise-free and 5% noise affected reference CTCs from a test data are shown in figure 3. The predicted curves approximated the reference noise-free curves very well.In addition, BNN determined k trans , v e and v p parameter values close to the reference.Table 2 summarized the RMSE and R 2 values for 0%, 1% ,5%, 10% and 15% NLs for the NLLS method and the BNN.NLLS achieved the lowest RMSE and highest R 2 value for k trans for 0% noise level.For all other noise levels and parameters, BNN achieved lower RMSE and higher R 2 values than NLLS.RMSE of BNN was on average of all noise levels by 33% ± 1.9% for k trans , 22% ± 6% for v e and 89% ± 5% for v p lower than the NLLS method.In addition, NLLS was more sensitive to noise than BNN with a performance deterioration already for a 1% noise level.The AIC values for the BNN decreased by 28.5% ±7.5%, 12.7% ±6% and 12.2% ±2.7% for k trans , v e and v p on average over all noise levels, respectively.

Uncertainty evaluation of ID data
Figure 4 illustrates BNN estimates of aleatoric and epistemic uncertainties for different NLs.
The aleatoric uncertainties of the test data increased (p < 0.001) when NL was increased from 5% to 10%, 15% and 20% as presented in figure 4 for k trans , v e and v p .The network yielded lower aleatoric uncertainties for lower NLs and higher uncertainties for higher NLs.In addition, testing a BNN with ID data changed the epistemic uncertainty (figure 4) by a smaller value of 0.08 ± 2.7%, 0.03 ± 3.4%, 0.02 ± 5% for k trans , v e and v p on average of all noise levels, respectively.Overall, the epistemic uncertainties look similar for all ID test data.

Analysis Coverage evaluation.
A BNN tested with ID-NLs had a coverage greater than 94% for all NLs and parameters, k trans , v e and v p as summarized in table 3. The percentage of voxels with ID-NLs significantly decreased (p < 0.05) with increasing noise levels.For a BNN tested with OD-NLs, the coverage decreased for the OD data (p < 0.001).

4.2.
In vivo application 4.2.1.Parameter estimation An exemplary slice of the DCE-MR data acquired at = t 4.1 min showing the anatomy of the liver is displayed in figure 6(a).Physiological parameter maps estimated by the NLLS method and the trained BNN are illustrated in figure 6(b).Tumor lesions showed high k trans and v p and low v e values.Generally, BNN showed similar parameter maps with the NLLS giving less noisy estimations, particularly for k trans and v e .For v p , BNN and NLLS showed large differences.The estimated parameter maps for the BNN trained with OD training data is shown in figure 7(b).The results for OD-CTCs and OD-AIFs also showed high k trans and v p and low v e values for tumor lesions.

Computational cost
The BNN required 0.1 min to estimate the eTofts model parameters, aleatoric and epistemic uncertainties from a DCE-MR slice with 192 × 192 voxels on a GPU.In contrast, the NLLS fitting required 270 min to estimate eTofts model parameters on a CPU.The BNN substantially decreased the time cost and was faster than the NLLS fitting.

Uncertainty evaluation of ID data
The aleatoric and epistemic uncertainties of the DCE-MR slice in figure 6(a) are separately shown in figure 8 and figure 9, respectively.In addition, the aleatoric and epistemic uncertainties of the four patients in figures 10(a)-(d) are shown in figures 10(i)-(l) and figures 10(m)-(p), respectively.Similar to the aleatoric and epistemic uncertainties of the ID data of figures 8 and 9, the uncertainties for the four patients were high, especially in the regions of the tumor.

Uncertainty evaluation of OD data
Aleatoric uncertainties of parameters estimated by a BNN trained with smaller training data size (OD-CTC) showed similar values to the results of the ID data as shown in figure 8. On the other hand, the aleatoric uncertainties for k trans and v p of the OD-AIFs showed increased uncertainty than the ID BNN.
However, epistemic uncertainties of all parameters, particularly k trans and v p increased for the OD data (figure 9).(c).Differences between healthy and tumor ROIs were significant for all parameters (p < 0.0001).In addition, the difference between each parameter estimate of both a healthy and a tumor ROI were significant (p < 0.0001).The difference between parameter estimates (k trans , v e and v p ) for a healthy and tumor ROI were also significant (p < 0.0001).Tumors were characterized by high k trans and v p and low v e values in both BNN and NLLS.

Discussion
We implemented a deep learning based framework for both parameter and uncertainty estimation using simulated data, and showed its application in patients DCE-MRI data.Our proposed framework improved the performance of parameter estimation and was more robust to noise than the NLLS method as summarized in table 2.Moreover, BNN was able to assign each voxel its own aleatoric and epistemic uncertainty instead of assigning global uncertainty values for certain set of samples.Such voxel-based uncertainties provide more reliable measurements as compared to global bounds that are prone to either over-or under-estimations (Bliesener et al 2020).
BNN was sensitive to different NLs if all the NLs were ID of the training data (figure 4).Such sensitivity to variations in NLs is important when dealing with patient data because the NLs in DCE-MR varies due to variation in patient size, administered contrast agent dose, and residual motion and under-sampling artefacts (Jiao et al 2020, Ippoliti et al 2021, Pandey et al 2021).Therefore, the BNN could be applied to different NLs and it could capture the NLs in the data.
As expected, for ID data different NLs had only a very small effect on the epistemic uncertainty with an average change of 0.08 ± 2.7%, 0.03 ± 3.4%, 0.02 ± 5%, for k trans , v e and v p over all noise levels, respectively.The epistemic uncertainty increased when a BNN was evaluated with OD data such as, NLs, CTCs, AIF and Δt as illustrated in figures 5(e), (f), (g) and (h), respectively.Insufficient training data increased the variance of the posterior distribution, thereby resulting in higher epistemic uncertainty (Tanno et al 2021).
BNN demonstrated higher aleatoric uncertainty for k trans with an out-of-distribution test data AIF and Δt as depicted in figures 5(c) and (d), respectively.This suggests that OD data also impacts on the estimation of the aleatoirc uncertainty.The network yields less reliable output for OD data which is correctly reflected in a higher epistemic uncertainty.In addition, the network cannot estimate the aleatoric uncertainty as well as for ID data leading to an apparent increase of the aleatoric uncertainty.Further studies are required to investigate this effect in more detail.
Overall, our proposed BNN was able to separate the different sources of uncertainty which resulted from the underlying noise or residual under-sampling artefacts (aleatoric uncertainty) or insufficient training data (epistemic uncertainty).
The RMSE was associated with the total uncertainties and the coverage was greater than 90% for all ID-NL and greater than 80% for OD-NL experiment as shown in table 3. The association between the errors and uncertainty estimates were better characterized with ID-NL because a BNN could capture the variations of NL if the testing data was ID of the training data.This indicates the use of uncertainty as a metric to measure errors of parameter estimates.
Experiments on five patients data set revealed the potential of BNN trained with a simulated data set in estimating k trans , v e and v p of eTofts model parameters (figures 6(b) and 10).The results proved that the simulated data was a good representation of the patient data and the BNN was well trained and generalized the training data (Zou et al 2020).Consistent to the simulations, k trans of a patient data set showed the highest aleatoric uncertainty followed by v p .For the epistemic uncertainty, v e showed the lowest values.This was also consistent with the results from the simulation.
The analysis with and without incorporating the test data AIF showed the utility of epistemic uncertainty in explaining performance of parameter estimation with respect to insufficient training data.The epistemic uncertainty of the parameters increased when the test data AIF was not taken into account.Moreover, the epistemic uncertainty increased when a larger step size was applied.This demonstrates the feasibility of epistemic   In this study, we used a single input two-compartment eTofts model.The use of a dual input twocompartment eTofts model (Yang et al 2016, Chouhan et al 2017, Liu et al 2019, Li et al 2020) which takes the arterial input function (AIF) and portal input function (PIF) as an input would also be possible with our network design and could lead to a more accurate estimation of perfusion in the liver.We applied Δt values up to 18 s in steps of one time-point (6 s) and generated CTCs with the shifted AIFs to confine it with physiological ranges of the liver (Miyazaki et al 2008, Chouhan et al 2016).Estimation of Δt as an output parameter would also be possible with our network design and could lead to a more accurate estimation of perfusion in the liver.Future work with the estimation of Δt as an output parameter could enhance the reproducibility of liver DCE MRIbased physiological parameters estimation and could be of clinical interest (Chouhan et al 2016).Further studies are required to investigate the usability of Δt as a diagnostic parameter.The evaluations with five patients showed the ability of the BNN to estimate physiological parameters better than the NLLS.Nevertheless, larger studies are required to carry out a full assessment of the benefits and limitations of the proposed approach in clinical practice (Aggarwal et al 2023).One limitation of the BNN is the use of a fixed temporal resolution of 6 seconds for simulating the training C p (t) and C(t).Future work is required to improve the generalizability of the BNN to consider varying temporal resolutions.
The epistemic uncertainties indicated how well the BNN performed with respect to less training data and cases where the training data differs from the application data.One limitation of this work is, that it does not utilise XAI methods.The use of explainable AI (XAI) techniques would give further rationales on the estimated parameters of the BNN.One possibility could be the use of post-hoc XAI techniques, such as feature attribution methods (Jiménez-Luna et al 2020, Letzgus et al 2022) to investigate the trained BNN parameter estimates by applying input CTCs simulated with OD-AIF or OD-Δt.This can be used to visually show the BNN's performance with OD-data.In addition, the resilience of the trained BNN to noise in estimating the physiological parameters could be visualized with feature attribution methods.By varying the noise levels in the input CTCs, the effect of noise in the input on the trained BNN parameter estimates can be visualized.Future work with the application of the above mentioned XAI methods could further improve the interpretability of the BNN.

Conclusion
In this work, we were able to estimate aleatoric and epistemic uncertainties associated with parameter estimation of DCE-MR data.Our proposed framework was evaluated by numerical simulations and applied to data set of patients suffering from hepatic tumor lesions.The proposed BNN outperformed the standard NLLS method in accuracy of parameter estimation and computational cost.In addition, BNN was able to estimate the different sources of uncertainty which resulted from the underlying noise or residual under-sampling artefacts (aleatoric uncertainty) or insufficient training data (epistemic uncertainty).Aleatoric uncertainties increase with increasing NLs while, the epistemic uncertainties increase for AIFs, NLs, Δt or CTCs, which were not part of the training data (out-of-distribution).

Figure 1 .
Figure 1.The proposed deep learning framework for parameter estimation and uncertainty quantification.The extended Tofts (eTofts) model, f, is applied to the physiological parameters of the eTofts model, θ = {k trans , v e and v p }, time delay, Δt and timedependent CA concentration in the plasma, C p (t) to simulate the time-dependent CA concentration in the tissue, C(t).The BNN takes C(t) and C p (t) as input to estimate parameters of the eTofts model q ˆ, aleatoric uncertainties, σ a = {σ a − k trans , σ a − v e and σ a − v p } and epistemic uncertainties, σ e = {σ e − k trans , σ e − v e and σ e − v p } by minimizing the loss function L. μ and σ, which are the mean and standard deviation of a Gaussian distribution, are updated iteratively together with q ˆand σ a for the minimization of L.
01, 1] and v p ä [0.01, 0.3] from evenly spaced values in step size of 0.02 to confine them within physiological ranges (Sourbron and Buckley 2011, Ottens et al 2022).The temporal resolution of C(t) and C p (t) was 6 seconds.Noise level between patients can vary due to variation in patient size, administered contrast agent dose, residual motion and under-sampling artefacts (Jiao et al 2020, Ippoliti et al 2021, Pandey et al 2021).To simulate the effect of signal noise and residual under-sampling artefacts in in vivo patient data (Garpebring et al 2013, Bliesener et al 2020, Klepaczko et al 2020, Ottens et al 2022), Gaussian noise (n i ) was added to the CTCs, i.e. noise were random samples drawn from a Gaussian distribution with mean (μ = 0), and standard deviation:

Figure 2 .
Figure 2. Training and validation loss obtained by training a BNN using CTCs with 15% noise level.

Figure 4 .
Figure 4. BNN estimates of aleatoric (σ a ) and epistemic (σ e ) uncertainties for varying NLs for (a) k trans , (b) v e and (c) v p .
Moreover, the DCE-MR slice of four more additional patients are illustrated in figures 10(a)-(d).The corresponding k trans maps estimated by the BNN for the four patients are shown in figures 10(e)-(h).The tumors showed high k trans values.

Figure 6 .
Figure 6.(a) Exemplary slice of the DCE-MR data for a patient with hepatic tumor lesions (depicted as 'darker' spots in the liver) acquired at = t 4.1 min, (b) Estimated parameter maps using NLLS and BNN for k trans , v e and v p .Each row represents a given parameter and the two columns for each row represent the NLLS and BNN.

Figure 7 .
Figure 7. (a) Exemplary slice of the DCE-MR data for a patient with hepatic tumor lesions (depicted as 'darker' spots in the liver) acquired at = t 4.1 min, (b) estimated parameter maps (k trans , v e and v p ) using a test data set consisting of ID data, OD-CTCs and OD- AIFs.Each row represents a given parameter and the first two columns for each row represent the ID and OD-CTCs, while the third column presents the OD-AIFs.
4.2.5.Regions-of-interest (ROI) analysisThe ROIs for a tumor lesion and a healthy region are shown in figure11(a).For the NLLS (figure 11(b)), k trans and ve for the tumor showed an overlap.In addition, k trans and v p of the tumor overlapped with k trans and v p of the healthy region.High variance for the parameter estimates was observed in k trans for the tumor and v e of the healthy region.On the contrary, BNN parameter estimates for k trans , v e and v p in the selected ROIs provided a clear distinction and discrimination between healthy and tumor regions as shown in figure11

Figure 8 .
Figure 8. Aleatoric uncertainties (σ a ) estimated by BNN for ID data, OD-CTC and OD-AIF for k trans , v e and v p .Each row represents a given parameter and the first two columns for each row represent the ID and OD-CTCs, while the third column presents the OD-AIFs.

Figure 9 .
Figure 9. Epistemic uncertainties (σ e ) estimated by BNN for ID data, OD-CTC and OD-AIF for k trans , v e and v p .Each row represents a given parameter and the first two columns for each row represent the ID and OD-CTCs, while the third column presents the OD-AIFs.

Figure 10 .
Figure 10.(a)-(d) Exemplary slices of the DCE-MR data for four patients with hepatic tumor lesions (depicted as 'darker' spots in the liver), (e)-(h), estimated k trans maps for each patient, (i)-(l), aleatoric uncertainties (σ a ) for k trans , (m)-(p), epistemic uncertainties (σ e ) for k trans .Each column represent the corresponding estimates for a patient.
3.4.2.Uncertainty evaluation of in-distribution (ID) dataWe simulated ID training data by applying NL ä [1%, 5%, 10%, 15%, 20%] and evaluated if the aleatoric uncertainty was sensitive to different NL in the data.ID data represented the behavior of the training data.A BNN with NL ä [1%, 5%, 10%, 15%, 20%] was trained.Then, the BNN was applied to test data with a NL of 1%, 5%, 10%, 15% and 20% separately to investigate the effect of different noise levels on the estimated uncertainty.The training and testing noise levels are shown in table 1.The AIF was the same for training and testing.
3.4.3.Uncertainty evaluation of out-of-distribution (OD) dataWe evaluated uncertainty estimate for four different OD cases: (i) training data with noise levels different to the application data (OD-NL) (ii) not having enough training data (OD-CTC) (iii) use of inaccurate AIF (OD-AIF) (iv) uncorrected delays in the bolus arrival (OD-Δt).The same AIF was applied for the training and testing data of OD-NL and OD-CTC experiments.For OD-AIF and OD-Δt experiments, different AIF for training and testing were applied.

Table 1 .
Noise levels and step sizes applied in training and testing of independent BNNs.For all experiments, we have kept the training and testing dataset the same.The difference are the added NL, step sizes and whether the AIF and Δt were ID or OD.The BNN for both the simulated and in vivo test data were trained based on simulated data.In addition to the parameter combinations, also the figure numbers are given where the results of the respective evaluations are shown.We evaluated the time cost of the BNN to estimate the eTofts model parameters, aleatoric and epistemic uncertainties from a DCE-MR slice with 192 × 192 voxels on a GPU.In addition, the time cost of the NLLS fitting for estimating the eTofts model parameters was evaluated using the Python time module (Fang et al 2021) on a CPU.
al 2008al , Chouhan et al 2016)).We applied Δt values up to 18 seconds in steps of one time-point (6 seconds) and generated CTCs with the shifted AIFs.First, a network was trained using a combination of CTCs simulated with NL ä [10%, 15%, 20%], Δt ä [0 s, 6 s, 12 s, 18 s] and five AIFs extracted from patient data.Next, physiological parameter maps (k trans , v e and v p ) were estimated by applying the trained BNN on a DCE-MR patient data set.During training, the BNN learns the shifts in the CTCs arising from the delays in the arrival of the CA.Hence, the inference on patient data will be non-uniform across different voxels of a liver tissue.3.5.3.Uncertainty evaluation of ID dataWe simulated ID training data by applying NL ä [10%, 15%, 20%], Δt ä [0 s, 6 s, 12 s, 18 s] and AIFs from five patients, respectively.Although the true NL is unknown in patient data, we selected the NLs for training visually such that CTCs generated with these NL could be regarded as ID noise levels.The AIFs of the training CTCs were ID as we applied different Δt values that mimic the possible AIF shifts in the in vivo data.

Table 2 .
RMSE and R 2 between reference and estimated parameters for NLLS and BNN.Best results (lowest RMSE and highest R 2 ) are shown in bold.

Table 3 .
Percentages of voxels with error estimates within the 90% confidence interval of the total uncertainty.The coverage was greater for ID-NL (shown in bold) as compared to OD-NL.