Deep-learning-based image segmentation for image-based computational hemodynamic analysis of abdominal aortic aneurysms: a comparison study

Computational hemodynamics is increasingly being used to quantify hemodynamic characteristics in and around abdominal aortic aneurysms (AAAs) in a patient-specific fashion. However, time-consuming manual annotation hinders the clinical translation of computational hemodynamic analysis. Thus, we investigate the feasibility of using deep-learning-based image segmentation methods to reduce the time required for manual segmentation. Two recent deep-learning-based image segmentation methods, ARU-Net and CACU-Net, were used to test the feasibility of automated computer model creation for computational hemodynamic analysis. Morphological features and hemodynamic metrics of 30 computed tomography angiography (CTA) scans were compared between predictions and manual models. The DICE score for both networks was 0.916, and the correlation value was above 0.95, indicating their ability to generate models comparable to human segmentation. The Bland-Altman analysis shows good agreement between deep-learning and manual segmentation results. Compared with manual model recreation for computational hemodynamics, the time for automated computer model generation was significantly reduced (from ∼2 h to ∼10 min). Automated image segmentation can thus significantly reduce the time spent recreating patient-specific AAA models. Moreover, our study showed that both CACU-Net and ARU-Net could accomplish AAA segmentation, with CACU-Net outperforming ARU-Net in accuracy and time savings.


Introduction
An abdominal aortic aneurysm (AAA) is a localized dilation of the descending aorta, typically occurring in the abdominal region. It is more prevalent in men over 65, affecting approximately 2% of the population [1][2][3]. With aging societies, particularly in Western countries, millions of seniors are affected by AAAs, posing significant healthcare and socio-economic challenges. Knowing the risk of rupture and the expected growth rate of an AAA is valuable in clinical practice. For instance, treatment is recommended when the AAA diameter reaches 5.5 cm in men or 5.0-5.4 cm in women, while smaller AAAs undergo imaging surveillance [4][5][6]. Assessing AAA growth rates aids in determining optimal surveillance intervals, balancing clinical needs against the economic strain on patients [7,8].
In a few recent studies [12][13][14][15], when hemodynamic metrics, morphological features, and patient health information were combined using machine learning, the prediction performance for AAAs' growth status was encouraging (i.e., the area under the curve [AUC] of the receiver operating characteristic [ROC] curve > 0.8). In 'image-based' computational hemodynamics [16][17][18][19], medical imaging data (e.g., computed tomography angiography [CTA], digital subtraction angiography [DSA], etc) from individual patients are used to create 'patient-specific' geometrical models. Those geometrical models are then meshed for computational fluid dynamics (CFD) simulations to obtain hemodynamic parameters. Although computational hemodynamics is gaining attention and has the potential for clinical translation, a few roadblocks prevent its translation into the clinical workflow. A typical computational hemodynamics protocol uses manual segmentation to obtain vascular geometry; all four studies mentioned above used manual segmentation [12][13][14][15]. Nevertheless, manual segmentation is labor-intensive, time-consuming, and affected by the operator's subjectivity.
Automatic segmentation models based on deep learning are gaining considerable attention in medical image analysis. In an earlier publication [20], we quantitatively compared the segmentation performance of five published state-of-the-art models: Graph-Cuts [21], 3DUNet [22], SegNet [23], 3DResUNet [24], and KiU-Net [25], for their ability to segment the AAA lumen. The DICE scores of the four early CNN-based models (3DUNet [22], SegNet [23], 3DResUNet [24], and KiU-Net [25]) ranged between 0.70 and 0.77. After qualitative inspection of geometries segmented by those four CNN-based models, we concluded that those published CNN models needed further development.
Motivated by this unmet need, our group proposed an innovative deep-learning neural network named Attention-based Residual U-Net (ARU-Net) [26], the first deep-learning-based image segmentation method tested for computational hemodynamics in cerebral aneurysms. We recently developed a Context-Aware Cascaded U-Net (CACU-Net) to classify AAAs' lumen and intraluminal thrombosis [20]. Although both models showed promising results, neither was tested for computational hemodynamic analyses in AAA applications. Moreover, adopting deep-learning-based image segmentation is afflicted by a fundamental problem: a limited understanding of the failure modes of existing models prevents the effective development of newer ones. Consequently, understanding the limitations of both models while applying them to model creation for computational hemodynamics will provide further insight, driving innovation and accelerating continued development.
To this end, our primary goal is to investigate the feasibility of using two published deep-learning segmentation algorithms (i.e., ARU-Net and CACU-Net) for computational hemodynamic analyses of AAAs. In addition to traditional metrics for performance evaluations of image segmentation (e.g., DICE score, sensitivity, precision, etc), we also compared morphological and hemodynamic metrics obtained from ARU-Net and CACU-Net models to those obtained by a human (expert) user.
Our contributions are summarized below: (1) This is the first study in which computer models created by deep-learning image segmentation methods are directly used for the computational hemodynamics of AAAs. Thus, this study gains new insights into integrating deep-learning segmentation methods and computational hemodynamics. Such integration is vital in translating computerized hemodynamic analyses of AAAs into the clinical workflow. (2) We investigate how varying structures of deep-learning image segmentation methods (i.e., two modifications of the classic U-Net model [27]) influence vessel geometry delineation and subsequent CFD results derived from those geometries. Since those simulated hemodynamic metrics are often used for predicting AAAs' growth status and intracranial aneurysm (IA) rupture status [12,15,28], changes in those hemodynamic metrics may have consequences.

Related works
Vascular image segmentation methods (e.g., reviews by Moccia et al [29] and Ciecholewski et al [30]) can be divided into (1) traditional and (2) deep-learning-based image segmentation methods, as briefly summarized below.

Traditional image segmentation methods
Traditional methods, also called semi-automatic methods, are image segmentation techniques that primarily rely on prior knowledge. Freiman et al [31] presented an iterative model-constrained graph-cut algorithm to segment the AAA by iteratively growing it from an initial manual annotation. Zohios et al [32] developed a geometrical method for level set-based AAA segmentation through boundary curve fitting and reconstruction on coarse manual annotations. A major drawback of traditional methods is their lack of generality: users must provide model-specific manual input for the algorithms to perform further processing.

Deep learning-based image segmentation methods
Deep learning methods are widely appreciated in the medical imaging field due to their excellent feature-learning ability. They can be further broken down into two classes: (1) U-Net structures and (2) cascade networks.
For the U-Net and its variants, Patel et al [33] exploited a DeepMedic [34]-based architecture for aneurysm detection from DSA sequences and compared its performance with the 3D U-Net [35] model. Their models are computationally expensive and do not generalize across datasets. Hong and Sheikh [36] presented a deep belief network (DBN) for detecting and segmenting the preoperative AAA region in 2D CTA. As one of the U-Net variants, our previously proposed ARU-Net [26] has multiple advantages compared to other methods: (1) the depth-aware attention gates use grid-based gating, which ensures that the attention coefficients focus on local areas to preserve small secondary blood vessels; (2) dense label prediction retains a large amount of detailed knowledge and location information; (3) the simplicity of ARU-Net's structure keeps its computational cost relatively low. However, despite its excellent focal information learning, global information extraction is a weakness of ARU-Net.
Cascade networks often contain coarse and fine segmentation stages. Chen et al [37] proposed CCDU-Net for segmenting aneurysms in 3D TOF-MRA images. CCDU-Net is a cascade network consisting of a convolutional neural network for coarse segmentation and a DU-Net for fine segmentation. The dual-channel input of DU-Net can augment vascular morphological information by combining the vessel image with its contour image. However, the performance of the fine segmentation stage can be significantly affected by the results of the coarse segmentation stage, which is prone to false and missed detections. Wu et al [38] presented a cascade of a fine-tuned feature pyramid network (FPN) and a traditional 3D V-Net. The FPN detects the aneurysm location, followed by a dual-channel ResNet aneurysm classifier to increase accuracy; the detected aneurysm is then segmented by the 3D V-Net. The limitation of this work is that it only fine-tuned the FPN model and cannot be applied to small datasets. As a state-of-the-art method, our proposed CACU-Net [20] has the following advantages: (1) the second stage of the training process augments both the low-level appearance features of the raw images and the predicted probability maps carrying high-level shape information, which allows the classifier to correct early prediction errors by exploiting new contextual features; (2) benefiting from the learned local shape and connectivity contained in the posterior distribution of labels, CACU-Net can effectively mine discriminative contextual information around the target to improve segmentation performance; (3) CACU-Net naturally handles the balance between image features and contextual knowledge thanks to deeper supervision and a flexible configuration. As a trade-off of its strong long-range global information detection, however, minor image details might be ignored.

Methods and materials
Recall that our goal is to evaluate the overall performance of the two recently published neural networks regarding computer model creation for AAAs. Essential details of those two neural networks and of our qualitative and quantitative comparison methodology are included below for completeness. Our overall workflow is shown in figure 1.

Data acquisition
Thirty (30) CTA image sets randomly selected from our internal database were used for this study. Dr McBane, a Cardiologist from the Cardiovascular Medicine Department of Mayo Clinic, provided the imaging data used in this study. The institutional review board at Michigan Technological University (Houghton, MI, USA) approved our study. All CTA data were stored in Digital Imaging and Communications in Medicine (DICOM) format.
The in-plane matrix size of the CTA data is 512 × 512, and the slice number ranged from 214 to 2433, representing a slice thickness of 0.5-1.25 mm/slice. The in-plane resolution ranged from 0.75 × 0.75 mm to 0.95 × 0.95 mm.

Figure 1. A schematic diagram showing the overall workflow of this study. Our protocol includes four steps indicated by different colors. In Step 1 (blue textboxes), two different operators manually segmented the raw DICOM data. The segmentation produced by an experienced operator (human 1) was used as training labels (ground truth) for the respective deep-learning-based segmentation algorithm. In subsequent Step 2 (green textboxes), training and testing of the CNN and postprocessing of the predicted models were conducted. In Step 3 (yellow textboxes), 3D volumetric computer models were generated (part A), followed by running CFD simulations (part B). In Step 4 (violet textbox), morphological and hemodynamic analyses were performed.

Manual annotation
Since ARU-Net and CACU-Net used supervised learning, all 30 image sets were first annotated by an expert (i.e., human 1), and those labels were used as the ground truth (GT) for the training of the ARU-Net and CACU-Net models. Another human user (human 2) also manually annotated the CTA data; hereafter, their results are referred to as manual segmentation results.
Manual annotation started with importing a DICOM image set into the Mimics Innovation Suite (V.24.0, Materialise Inc., Leuven, Belgium) for initial processing. A mask representing the major region of the descending aorta was first generated based on the intensity differences between the aorta (including the AAA) and its surroundings. The mask was then transformed into a stereolithography (STL) file, in which unstructured 3D triangles represent the lumen. All irregularities and errors in the STL file were fixed using 3-Matic software (Version 16.0, Materialise Inc., Leuven, Belgium). All major outlets were maintained, including the mesenteric arteries, celiac trunk, and renal arteries. Smaller outlet vessels were removed to reduce structural complexity, as we verified that the omission of those small secondary arteries has a negligible impact on aneurysmal hemodynamics. This study followed the segmentation workflow of Rezaeitaleshmahalleh et al [15], verified by our clinical collaborators (a cardiologist and a vascular surgeon) from the Mayo Clinic.

Neural network implementation
ARU-Net overview
The Attention-based residual U-Net (ARU-Net) [26] was modified from a classical 3D U-Net structure, as shown in figure 2. Compared to the classic 3D U-Net, the ARU-Net implemented in this study integrates an attention gate module at each decoding layer to enhance spatial resolution. Specifically, the attention gate module was added to the long skip connection, where each encoder propagates information to the decoder.
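ARU-Net's exact depth-aware gate is defined in [26]; as an illustration only, a generic additive grid-attention gate on a 3D skip connection (assuming the gating signal has already been upsampled to the skip resolution) can be sketched as:

```python
import torch
import torch.nn as nn

class AttentionGate3D(nn.Module):
    """Generic additive grid-attention gate on a 3D skip connection.

    A sketch only: ARU-Net's exact depth-aware gate is defined in [26].
    Assumes the gating signal has been upsampled to the skip resolution.
    """
    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        self.theta = nn.Conv3d(skip_ch, inter_ch, kernel_size=1)  # skip branch
        self.phi = nn.Conv3d(gate_ch, inter_ch, kernel_size=1)    # gating branch
        self.psi = nn.Conv3d(inter_ch, 1, kernel_size=1)          # to one channel

    def forward(self, skip, gate):
        # One attention coefficient per voxel of the skip feature map
        alpha = torch.sigmoid(self.psi(torch.relu(self.theta(skip) + self.phi(gate))))
        return skip * alpha  # re-weight skip features before decoding
```

Because the attention map is computed per voxel (grid-based gating), the coefficients can emphasize small secondary vessels that a global gating signal would suppress.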

CACU-Net overview
Context-aware cascaded U-Net [20] (CACU-Net, figure 3) consists of a residual 3D U-Net and an auto-context 3D U-Net. Notably, the last convolutional layer adopts a context module for capturing long-range contextual information. The major advantage of CACU-Net is its ability to incorporate multi-scale contextual information through a two-stage training process. In the first stage, the context-aware 3D U-Net captures contextual information at low resolution. The second stage takes that low-resolution information as input to the full-resolution 3D U-Net and extracts contextual information at higher resolution.
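The two-stage flow described above can be sketched as follows; note that the single Conv3d stages here are placeholders for the full residual and auto-context 3D U-Nets of the actual CACU-Net [20]:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadeSketch(nn.Module):
    """Simplified two-stage auto-context cascade.

    The single Conv3d stages below are placeholders; the real CACU-Net
    stages are full residual and auto-context 3D U-Nets [20].
    """
    def __init__(self, in_ch, n_classes):
        super().__init__()
        self.stage1 = nn.Conv3d(in_ch, n_classes, 3, padding=1)
        self.stage2 = nn.Conv3d(in_ch + n_classes, n_classes, 3, padding=1)

    def forward(self, x):
        # Stage 1: coarse prediction on a downsampled (low-resolution) volume
        low = F.interpolate(x, scale_factor=0.5, mode='trilinear',
                            align_corners=False)
        coarse = torch.softmax(self.stage1(low), dim=1)
        # Probability maps carried back up to full resolution as context
        context = F.interpolate(coarse, size=x.shape[2:], mode='trilinear',
                                align_corners=False)
        # Stage 2: raw image + stage-1 probabilities as auto-context input
        return self.stage2(torch.cat([x, context], dim=1))
```

Feeding the stage-1 probability maps alongside the raw image is what lets the second stage correct early prediction errors using contextual shape information.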

Training of deep learning models
Before training, all raw imaging data in DICOM format were converted to the Neuroimaging Informatics Technology Initiative (NIfTI, or nii) format. At the training stage, 20 randomly selected cases were used. Both ARU-Net and CACU-Net were implemented in the PyTorch framework (Version 1.11.0) and trained on a computer node with dual Tesla V100 PCIe GPUs (32 GB RAM each) for a total of 600 epochs. The batch size was set to 2 due to the limitation of CUDA memory, and the initial learning rate was set to 10⁻⁴. For ARU-Net, a supervised decay coefficient of 0.33 was applied, and an Adam optimizer was selected for training. For CACU-Net, the Adam algorithm optimized the learning rate, with adaptive adjustment guided by a multi-step learning rate (MultiStepLR) schedule. The remaining 10 cases were used for testing and generation of predicted models.
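In PyTorch terms, the stated training setup might be configured as below. This is a hedged sketch: the network is a placeholder, and the milestone epochs are illustrative assumptions, since the text specifies only a MultiStepLR schedule and a decay coefficient of 0.33.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder network; the actual ARU-Net/CACU-Net definitions are in [26, 20].
net = torch.nn.Conv3d(1, 2, kernel_size=3, padding=1)

optimizer = Adam(net.parameters(), lr=1e-4)  # initial learning rate of 10^-4

# Milestone epochs are illustrative assumptions, not stated in the paper.
scheduler = MultiStepLR(optimizer, milestones=[200, 400], gamma=0.33)

for epoch in range(600):  # 600 epochs, batch size 2 per the text
    # ... forward pass, loss, loss.backward() over batches of two volumes ...
    optimizer.step()
    scheduler.step()      # decays the learning rate by 0.33 at each milestone
```

With two milestones, the learning rate ends at 10⁻⁴ × 0.33², illustrating how the decay coefficient interacts with the schedule.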
The deep-learning segmentation results were quantitatively compared with manual labels using several morphological comparison metrics, including the DICE score, relative volumetric error (RVE; the ratio of the absolute volume error to the measured volume), sensitivity, specificity, HD95 (95% Hausdorff distance [39]), and average symmetric surface distance (ASSD; the average of all distances from points on the boundary of the machine-segmented region to the boundary of the ground truth).
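As a concrete illustration of the voxel-overlap metrics above (not the authors' exact scripts; HD95 and ASSD additionally require surface-distance computations and are omitted), a minimal numpy sketch might look like:

```python
import numpy as np

def overlap_metrics(pred, gt):
    """Voxel-overlap metrics between binary prediction and ground-truth masks.

    pred, gt: boolean arrays of identical shape (True = lumen voxel).
    """
    tp = np.count_nonzero(pred & gt)     # true positives
    fp = np.count_nonzero(pred & ~gt)    # false positives
    fn = np.count_nonzero(~pred & gt)    # false negatives
    tn = np.count_nonzero(~pred & ~gt)   # true negatives
    dice = 2 * tp / (2 * tp + fp + fn)   # DICE similarity coefficient
    sensitivity = tp / (tp + fn)         # true-positive rate
    specificity = tn / (tn + fp)         # true-negative rate
    # Relative volumetric error: |V_pred - V_gt| / V_gt
    rve = abs(int(pred.sum()) - int(gt.sum())) / int(gt.sum())
    return dice, sensitivity, specificity, rve
```

The same definitions apply per case; averaging over the test cases gives the cohort-level values reported in table 1.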

Volumetric model and CFD simulation
The CFD workflow was verified against both phase-contrast magnetic resonance angiography (PC-MRA) [40,41] and Doppler ultrasound [42]. More detailed protocols have been published previously [15,19,43,44]; thus, we only provide essential information below.
Following our previous publication [15], cylindrical flow extensions were added to all models to eliminate the impact of plug-flow boundary conditions [45,46]. The TetGen mesh generator within VMTK was then used to generate computational meshes with five boundary layers. The generated volumetric meshes ranged from around 5 million to 9 million tetrahedral elements, depending on the complexity and size of the vasculature.
The volumetric mesh was then loaded into ANSYS FLUENT (V.17.0, ANSYS-FLUENT Inc., Canonsburg, PA, USA) for CFD simulations. The CFD simulation setting, including mesh sensitivity tests, was identical to a recent publication [15]. Upon completion of CFD simulations, morphological and hemodynamic parameters were calculated with C++ and Python scripts using VMTK. Brief descriptions of those morphological and hemodynamic parameters can be found in Supplementary Material Part 1.
Three statistical analysis methods were applied to verify the agreement between manual and deep learning-based segmentation, including Pearson's correlation coefficient (PCC), linear regression, and the Bland-Altman method [47].
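A minimal numpy sketch of these three agreement analyses (an illustration, not the authors' exact scripts) might look like:

```python
import numpy as np

def agreement(manual, auto):
    """Agreement between manual and automated measurements of one metric.

    manual, auto: 1-D arrays with one value per case.
    Returns Pearson's r, the regression slope, the Bland-Altman bias,
    and the 95% limits of agreement.
    """
    pcc = np.corrcoef(manual, auto)[0, 1]            # Pearson's correlation
    slope, _intercept = np.polyfit(manual, auto, 1)  # linear regression fit
    diff = auto - manual
    bias = diff.mean()                               # mean difference (bias)
    half_width = 1.96 * diff.std(ddof=1)             # 95% limits half-width
    return pcc, slope, bias, (bias - half_width, bias + half_width)
```

A positive bias here means the automated measurement tends to exceed the manual one, which is how the volume overestimation discussed later is detected.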

Image segmentation performance
For quantitative evaluation of the algorithms' segmentation performance, all surface models were compared as whole geometries using six different parameters, shown in table 1. The results for ARU-Net and CACU-Net are comparable, especially the DICE score, sensitivity, and specificity. All three of those values exceed 0.9, and the RVE values are around 0.15 at minimum; our results indicate a good overlap between the AI-segmented surface models and the ground truth. Furthermore, the average HD95 values are 11.614 mm and 8.208 mm, and the ASSD values are 2.266 mm and 2.181 mm, for ARU-Net and CACU-Net, respectively, representing a minor geometrical discrepancy in the deep-learning-based segmentation.
Recall that typical AAA diameters are around 50 mm; our segmentation errors might be sufficiently small for adopting our methods in the clinical workflow. However, our methods must still be validated with more data, preferably from an independent cohort.
Overall, our preliminary results show good consistency between human and deep-learning-based segmentation: only 2 out of 10 cases have some degree of surface error. Figure 4 demonstrates that both algorithms falsely extracted unwanted regions and thus overestimated the extent of the AAA. Despite these differences, the simulated wall shear stress values still exhibit high similarity.
Nevertheless, in the two cases shown in figure 5, segmentation outcomes of the same imaging data differed between ARU-Net and CACU-Net. Notably, ARU-Net preserved smaller vessels, while CACU-Net missed those small vessels and preserved extra length on larger vessels.
After vasculature segmentation from CTA, considerable manual processing/editing remained to create computer models for computational hemodynamics. Consequently, the processing time needed for CFD model creation includes both image segmentation and post-segmentation editing time. The time usage recorded in table 2 shows that each case's CFD model creation took around 2 h on average for manually segmented geometries (20%-30% image segmentation time + 70%-80% editing time).
In contrast, the CFD model creation time was reduced to around 10 min on average for ARU-Net and CACU-Net. It is worth noting that the inference time for the trained ARU-Net and CACU-Net models is low (approximately 30-60 s). Hence, our results in table 2 indicate that only limited manual editing is required to meet the minimum requirements for successfully running CFD simulations when geometries are obtained from ARU-Net and CACU-Net.

Quantitative comparison analysis
To better evaluate the performance of the ARU-Net and CACU-Net models, we performed Pearson correlation coefficient and linear regression analyses to verify the linearity and consistency between the different segmentations: ground truth (annotated by human 1), human 2, and automated segmentation. The average PCC and average slope between the two human operators are 0.967 and 0.933, indicating good correlation and linearity. Regarding the automated segmentation methods, the average PCC and slope for ARU-Net are 0.951 and 0.897, respectively; the corresponding numbers for CACU-Net are 0.965 and 0.930. Overall, there was good correlation and linearity between deep-learning-based and manual segmentation, with CACU-Net slightly outperforming ARU-Net.
Table S1 (in the Supplementary material) shows that the Bland-Altman analysis yields a generally positive bias on the morphological parameters, except for the aspect ratio and UI (shown in bold). This observation indicates that deep-learning segmentation tends to overestimate the size of the aneurysm. Furthermore, ARU-Net has a higher bias than CACU-Net.
Qualitative comparisons of all 10 cases can be found in Supplementary material (figure S2).

Discussion
Based on the quantitative and qualitative analysis results, ARU-Net and CACU-Net performed well when applied to segment AAAs, particularly in creating CFD models. Although both networks have almost identical DICE scores, sensitivity, and specificity, CACU-Net has lower average HD95 and ASSD values, suggesting its output is more similar to the ground truth. To understand the cause of this performance difference, we need to compare the two networks from an algorithmic perspective. ARU-Net was originally proposed for cerebral aneurysm segmentation, where the small blood vessels of the brain circulation form complex network structures. It utilizes an attention module for local information enhancement and extraction, allowing the network to correctly separate the focal region from the surrounding noise.
In contrast, CACU-Net was proposed for simultaneously classifying AAAs' lumen and intraluminal thrombosis. The cascade network and context module enable long-range context detection and integration. With a more advanced context-aware configuration, CACU-Net performed slightly better in our case than ARU-Net.
As shown in figure S2 in the Supplementary material, ARU-Net and CACU-Net provided vessel geometries of good quality, and their quality was generally comparable. However, ARU-Net and CACU-Net typically produced longer small vessels. Differences in the small upstream vessels, as depicted in figure 5, appear to have minimal to negligible effects on the hemodynamics within the aneurysm, for two reasons.
First, those small vessels are 1-3 mm in diameter, while the aorta's diameter is often between 30 and 40 mm. Second, during CFD model generation, small vessels are first shortened and then fitted with a cylindrical extension (see section 3.4). Also, small branches off the iliac artery are removed for CFD model creation. Collectively, such length differences of small arteries have little impact on the created CFD models. In only one case (Case 2 in figure S2) did both networks preserve a small vessel not included in the ground truth. The diameter of the preserved vessel is around 2.7 mm, about 1/11 of the inlet diameter. However, there remains the possibility of localized clinical investigations in which focal hemodynamics could indeed differ in the presence or absence of these small vessels.
It is also important to note that, in the framework of computational hemodynamics, retaining upstream and downstream vessels with reasonable dimensions allows us to maintain the integrity of our computer models. However, what counts as reasonable dimensions is anatomy-specific. For instance, including vessels of 3.0 mm in diameter (1/10 of the aorta) or more around an AAA may be reasonable, whereas when modeling the brain circulation, we should include smaller upstream and downstream vessels (perhaps 0.5 mm or more). This consideration has not yet been addressed and thus motivates us to develop image segmentation algorithms that allow a degree of user control (e.g., over vessel length, size, etc). Regarding CFD model creation, manual annotation of the original image is highly subjective, and inter- and intra-operator variability hampers reproducibility. As a result, many other researchers are putting effort into mitigating human error [48][49][50]. In this study, the PCC and linear regression slope between the two human segmentations are 0.967 and 0.933, as shown in table 3. The results are relatively consistent, as demonstrated by the Bland-Altman analysis in table S1. We hope that deep-learning-based segmentation can be integrated into the computational hemodynamics workflow to (1) reduce the time for model creation and (2) enhance reproducibility.
In those two cases, both networks preserved small falsely identified regions, causing outliers in the linear regression and Bland-Altman analyses. This observation also partially explains the positive biases in the estimated AAA volumes. Moreover, in figure 4, fewer falsely extracted regions (see red arrows in figure 4) are generated by CACU-Net.
As shown in figure 6, the potential cause of those surface errors is the calcified regions (see red arrows in figure 6), which have high intensity values in the image. We find a large thrombus region in the left plot of figure 6 (see the area between the red and yellow contours). Moreover, the suspected thrombus region might connect with a calcified aortic wall. Low or no contrast exists to differentiate the cavity from the intraluminal thrombosis, causing both ARU-Net and CACU-Net to make mistakes, as shown in figure 6. Both cases required manual editing during the post-processing stage. More work remains to segment low-contrast regions accurately.
Our results in table S1 suggest positive biases exist, i.e., the aneurysm volume is overestimated. As a result, the average WSS is underestimated and the low-WSS area (LSA) is overestimated. Recall that larger aneurysms lead to smaller overall WSS and larger low-WSS areas; that explains the biases found among the hemodynamic parameters, as shown in table S1.
One limitation of this study is the use of a rigid-wall boundary condition for the CFD simulations, i.e., the wall or boundary surface is treated as entirely fixed and immovable. While this assumption is convenient for computational efficiency, it may influence the accuracy of hemodynamic assessments of AAAs. In contrast, other studies [51][52][53][54] have employed fluid-structure interaction (FSI) models, which hold greater physiological accuracy. The FSI model accounts for the arterial wall's deformation in response to blood flow, whereas adopting a rigid wall may lead to underestimation of vortex development and overestimation of shear stress due to the exclusion of wall compliance. It may be worthwhile to conduct a comparative study to gain a more comprehensive understanding of how different flow models influence the outcomes of CFD simulations. Another limitation is our small data size; our ongoing plan is to extend this research to a large patient dataset.

Conclusion
We implemented ARU-Net and CACU-Net for AAA segmentation in this feasibility study. Both automated segmentation methods can significantly reduce the model recreation time (from ∼2 h to ∼10 min). The segmentation results from both networks show an excellent correlation with the outcomes from an experienced human operator. Compared with ARU-Net, CACU-Net's performance was slightly better. More importantly, the simulated hemodynamic metrics using models created by the two AI models were comparable to those obtained using models created by a human expert user. We noticed that the segmentation results were relatively poor under extreme circumstances (e.g., massive thrombosis combined with multiple calcifications). Hence, our ongoing work is to further improve our ability to generate CFD models.
One of our primary research goals is to establish an integrated clinical pipeline for physicians to gain access to essential hemodynamic information during medical decision-making. Hence, the lack of automated medical image segmentation is a critical hindrance to overcome. With a reliable and accurate automated segmentation method, our future research will focus on assembling an efficient workflow that physicians can utilize.

Acknowledgments
We thank Dr Robert McBane from Mayo Clinic (Rochester, MN) for providing imaging data used in this study. We also want to thank former lab members (Mr Tonie Johnson and Kevin Sunderland) for their contributions to early discussions and data processing. Dr Brian Yuan at Michigan Technological University provided technical support for using Applied Computing's GPU cluster. We thank the American Heart Association (POST1022454 to NM) and Michigan Technological University (HRI Research Fellowship to ZL) for funding support.

Data availability statement
The data cannot be made publicly available upon publication because they are owned by a third party, and the terms of use prevent public distribution. The data that support the findings of this study are available upon reasonable request from the authors.