Deep neural network-based approach to improving radiomics analysis reproducibility in liver cancer: effect on image resampling

Pengfei Yang; Lei Xu; Yidong Wan; Jing Yang; Yi Xue; Yangkang Jiang; Chen Luo; Jing Wang; Tianye Niu

doi:10.1088/1361-6560/ac16e8

Abbreviations

ST	slice thickness
CT	computed tomography
DNN	deep neural network
LiTS	liver tumor segmentation benchmark
DnCNN	denoising convolutional neural networks
RMSE	root-mean-square-error
Conv	convolutional layers
SSIM	structural similarity
PSNR	peak signal-to-noise ratio
CCCs	concordance correlation coefficients
GLCM	gray level co-occurrence matrix
GLRLM	gray-level run-length matrix
GLSZM	gray-level size zone matrix
NGTDM	neighborhood gray-tone difference matrix.

Introduction

The radiomics approach, which derives quantitative features from medical images, has shown good predictive value in liver tumors for tasks such as lesion type classification (Shen et al 2020), microscopic vascular invasion prediction (Xu et al 2019), tumor response prediction (Reimer et al 2018) and prognosis (Kim et al 2019). CT is one of the most commonly used imaging modalities for liver lesion evaluation and staging in the clinic (Hennedige and Venkatesh, 2013). Further, CT is also the most widely used image type in radiomics studies of liver tumors (Wakabayashi et al 2019). Yet, a challenge in CT-based radiomics analysis is feature reproducibility (Bi et al 2019).

As more radiomics studies are conducted for different cancer types and image modalities, the reproducibility of radiomics studies is increasingly becoming a significant factor hindering the clinical application of radiomics research (Yang et al 2020). Therefore, the standardization of radiomics studies has become an important research field. The feature values in CT radiomics analysis were reported to be affected by acquisition and reconstruction parameters (Choe et al 2019). Meyer et al found that, in metastatic liver lesions, around 90% of radiomics features were susceptible to variations in CT acquisition and reconstruction settings, including dose levels, ST, reconstruction kernels, and algorithm settings (Meyer et al 2019). Orlhac et al tested a method for normalizing radiomics features from different datasets to improve reproducibility (Orlhac et al 2019). Choe et al demonstrated the possibility of using DNN to convert CT images reconstructed with different kernels to improve feature reproducibility (Choe et al 2019). Among the reconstruction parameters, the ST showed the most considerable impact on feature reproducibility (Meyer et al 2019). More studies concerning the compensation of ST variations are needed to facilitate the standardization of radiomics studies.

Image resampling is commonly used to unify varying pixel spacings and STs, especially in multi-center studies that include CT images from different scanners (Mackin et al 2017; Shafiq-ul-Hassan et al 2017). As a tradeoff between data storage requirements (Bellon et al 2014) and radiation dose (Kanal et al 2007), CT images are usually reconstructed with small pixel spacing and large ST in the clinic. As such, upsampling CT images from thick to thin ST is applied in radiomics studies (Mackin et al 2017, Bologna et al 2019) to generate CT images with a smaller (He et al 2016, Li et al 2018) and isotropic voxel size (Bologna et al 2019) for better prognostic performance. Interpolation-based methods are currently widely used in radiomics research for upsampling images (Su et al 2019, Ligero et al 2021). However, this approach might affect the accuracy of radiomics features in characterizing tumor heterogeneity on medical images (Huang et al 2018) and hamper reproducibility across datasets (Traverso et al 2018).

The DNN scheme has been successfully applied in the field of super-resolution of natural images (Yang et al 2019) and transferred to the application of CT image upsampling (Li et al 2021). For example, Umehara et al constructed a three-layer super-resolution convolutional neural network and investigated its application in chest CT images (Umehara et al 2018). Park et al utilized a convolutional neural network with a modified U-net structure to upsample and denoise brain CT images (Park et al 2018). You et al investigated the application of a super-resolution generative adversarial network on cadaveric ankle specimens and abdominal CT images (You et al 2020). Previous studies also demonstrated the possibility of using the DNN model to generate high-resolution chest CT images to improve feature reproducibility in pulmonary nodules and lung cancer (Choe et al 2019, Park et al 2019). With the development of more and more CT radiomics-based tools for liver tumors, comprehensively analyzing the effect of ST and reducing the loss in upsampling is essential to improve study reproducibility. Yet, to the best of our knowledge, no previous study has utilized DNN to upsample liver CT images and analyzed the improvement of radiomics feature reproducibility in liver tumors. Introducing a DNN scheme to image resampling might improve the accuracy of up-sampling the ST of liver CT images and the reproducibility of the corresponding radiomics features. This study aims to investigate the effect of traditional up-sampling methods on the reproducibility of CT radiomics features of liver tumors and to investigate the improvement achieved using the DNN scheme.

Methods

Datasets description

The imaging dataset from the LiTS (Bilic et al 2019) was used for analysis in this study. The CT images of 201 patients were included in this dataset. This dataset was collected from seven clinical centers worldwide and contained CT images of liver tumors with diverse subtypes (primary, secondary, and metastatic). The CT images were also acquired with varying scanning protocols (scanners, enhancement contrasts, x-ray energy, tube current, etc), and could therefore represent the commonly used CT images of liver tumors. The pixel spacings of these images ranged from 0.56 mm × 0.56 mm to 1.00 mm × 1.00 mm, and the STs ranged from 0.45 to 6 mm. Only the CT images with an ST smaller than or equal to 1 mm were selected for further analysis. All the liver and tumor regions in the CT images had been segmented by trained radiologists at each clinical center and verified by three experienced radiologists.

Datasets preparation for training DNN

A previous study revealed that both the liver tumor and the neighboring liver region around the tumor (tumor ring) could provide prognostic value (Sun et al 2018). As such, we performed training on both the tumor region and surrounding liver regions. We cropped the selected CT images with the minimum box that contained the whole liver region. All the CT images in the box regions were used to train the DNN. The workflow for dataset preparation is summarized in figure 1. Firstly, as we focus on up-sampling in the vertical direction, the original CT images in the transverse plane were resliced to images in the coronal and sagittal planes. All the resliced images were then resampled to a pixel spacing of 1.00 mm × 1.00 mm to serve as the high-resolution ground truth. Secondly, as STs of 3 and 5 mm are commonly used in liver CT scans (Cozzi et al 2017, Nie et al 2020), two datasets were generated by down-sampling the high-resolution images to pixel spacings of 1.00 mm × 3.00 mm and 1.00 mm × 5.00 mm. Thirdly, the two datasets were up-sampled to a pixel spacing of 1.00 mm × 1.00 mm using the 1-D bicubic interpolation approach slice by slice, resulting in the low-resolution samples (input). The residual images between the high- and low-resolution images were then acquired as the labels (output). Finally, because the training of the DNN requires samples and labels of fixed size, all the samples and labels of varying sizes were randomly cropped into patches of 50 × 50 (Zhang et al 2017). This yielded two datasets for the two training tasks: Dataset 3 mm for conversion from 3 to 1 mm and Dataset 5 mm for conversion from 5 to 1 mm. For each image at each training epoch, we empirically extracted 128 patches. All the training patches were randomly divided by patient into the training, validation, and testing groups with proportions of 60%, 10%, and 30%, respectively.
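The sample and label construction described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, block averaging is an assumed model of thick-slice acquisition (the text does not state how down-sampling was done), and linear interpolation via `np.interp` stands in for the 1-D bicubic interpolation used in the study.

```python
import numpy as np

def downsample_rows(img, factor):
    """Average every `factor` consecutive rows (assumed thick-slice model)."""
    n = (img.shape[0] // factor) * factor      # drop an incomplete trailing block
    return img[:n].reshape(-1, factor, img.shape[1]).mean(axis=1)

def upsample_rows(img, factor):
    """Interpolate back to 1 mm row spacing, column by column.

    Linear interpolation is a stand-in for the bicubic scheme in the text.
    """
    n_low = img.shape[0]
    z_low = (np.arange(n_low) + 0.5) * factor  # centres of the thick rows (mm)
    z_high = np.arange(n_low * factor) + 0.5   # centres of the 1 mm rows (mm)
    cols = [np.interp(z_high, z_low, img[:, j]) for j in range(img.shape[1])]
    return np.stack(cols, axis=1)

def make_training_pair(high, factor, patch=50, rng=None):
    """Return one (sample, label) patch pair from a resliced 1 mm image."""
    rng = np.random.default_rng() if rng is None else rng
    low = upsample_rows(downsample_rows(high, factor), factor)
    high = high[: low.shape[0]]                # match the cropped height
    residual = high - low                      # label: high minus low resolution
    r = rng.integers(0, low.shape[0] - patch + 1)
    c = rng.integers(0, low.shape[1] - patch + 1)
    window = (slice(r, r + patch), slice(c, c + patch))
    return low[window], residual[window]
```

Calling `make_training_pair(image, 3)` repeatedly (128 times per image per epoch in the study) yields a fresh random set of 50 × 50 patches each time, which is how the re-cropping across epochs arises.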

**Figure 1.** The workflow of datasets preparation. Note: the content in the brackets showed the pixel spacing of corresponding images; ↓×3 / ↓×5: downsample scale.

DNN structure

The residual learning strategy was applied in the training of the DNN. The model was trained to learn the residual between the high- and low-resolution images from the low-resolution image. The DnCNN architecture has exhibited high effectiveness in denoising, single image super-resolution, and deblocking of natural images, showing good extension ability (Zhang et al 2017). As such, we constructed a DNN using a modified DnCNN architecture for super-resolution of liver CT images. The network consisted of twenty 2D Conv layers with 64 filters of kernel size 3 × 3 × 64. Rectified linear units (ReLU) were connected after the Conv layers, except for the last layer. Batch normalization layers were added between the Conv and ReLU layers, except for the first and last Conv layers. The RMSE between the actual residual image and that generated by the DNN was used as the training loss. Two DNNs were trained for the up-sampling tasks of 3 mm–1 mm (DNN-3) and 5 mm–1 mm (DNN-5), respectively. The networks were trained with the following parameters: epoch count: 40; mini-batch size: 128; initial learning rate: 0.01; learning rate decay schedule: 10. The training process was performed on a computer with one NVIDIA 1080Ti GPU, an Intel® Core™ i7-6900K CPU @ 3.20 GHz, and 128 GB of RAM, using the Deep Learning Toolbox in MATLAB.
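The layer arrangement described above can be written out explicitly, together with the receptive field it implies. The sketch below is illustrative only: the helper names are hypothetical, and it assumes stride-1 convolutions, a single-channel input and a single-channel residual output, none of which is stated in the text.

```python
def dncnn_layout(depth=20, filters=64, kernel=3):
    """List the layers of each block in a DnCNN-style network."""
    layers = []
    for i in range(depth):
        first, last = i == 0, i == depth - 1
        c_in = 1 if first else filters       # assumed single-channel CT input
        c_out = 1 if last else filters       # assumed single-channel residual
        block = [f"Conv{kernel}x{kernel}({c_in}->{c_out})"]
        if not first and not last:
            block.append("BatchNorm")        # no BN on first and last Conv
        if not last:
            block.append("ReLU")             # no ReLU after the last Conv
        layers.append(block)
    return layers

def receptive_field(depth=20, kernel=3):
    """Receptive field of `depth` stacked stride-1 convolutions."""
    return 1 + depth * (kernel - 1)

if __name__ == "__main__":
    for index, block in enumerate(dncnn_layout(), start=1):
        print(index, " + ".join(block))
    print("receptive field:", receptive_field(), "pixels")
```

Under these assumptions the receptive field is 41 × 41 pixels, which fits inside the 50 × 50 training patches.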

Radiomics feature extraction

Two volumes of interest (VOIs) were used for radiomics feature extraction, i.e. the tumor region and the tumor ring region. First, the tumor ring was acquired by performing dilation and erosion, each with a width of 2 mm, on the original tumor region defined by experts, resulting in a tumor ring with 4 mm thickness, as suggested by Sun et al (2018). Then, radiomics features were extracted from the original tumor and the corresponding ring region. A total of 540 commonly used radiomics features were extracted, including raw features and wavelet-based features. The raw features quantify the intensity and texture of VOIs from the original CT images. Wavelet-based features are extracted from images decomposed and reconstructed by wavelet transform. The prefixes 'HHH_,' 'HHL_,' 'HLH_,' 'LHH_,' 'LLH_,' 'LHL_,' 'HLL_,' and 'LLL_' represent the eight subtypes of wavelet features, denoting the different combinations of signal components, namely high-frequency (H) and low-frequency (L) components, in the X, Y, and Z directions, respectively. The description of these features can be found in our previous study (Yang et al 2020). In addition, the reproducibility of shape features under variation of up-sampling methods was assessed as described in supplementary I (available online at stacks.iop.org/PMB/66/165009/mmedia). Only the CT images containing the tumor region were used for feature extraction and reproducibility analysis.
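The tumor ring construction can be illustrated with binary morphology on the tumor mask. The following is a sketch under stated assumptions rather than the procedure actually used: it assumes 1 mm isotropic voxels (so 2 mm corresponds to two iterations) and a face-connected structuring element, and the function names are hypothetical.

```python
import numpy as np

def dilate(mask, iterations):
    """Binary dilation with a face-connected element (6 neighbours in 3D)."""
    out = np.asarray(mask, dtype=bool)
    core = (slice(1, -1),) * out.ndim
    for _ in range(iterations):
        padded = np.pad(out, 1)              # pad with False on every side
        grown = padded.copy()
        for axis in range(out.ndim):
            grown |= np.roll(padded, 1, axis=axis)
            grown |= np.roll(padded, -1, axis=axis)
        out = grown[core]                    # discard the padding again
    return out

def erode(mask, iterations):
    """Binary erosion as the complement of dilating the background.

    Voxels outside the array are treated as tumor, so the array border
    itself does not erode the mask.
    """
    return ~dilate(~np.asarray(mask, dtype=bool), iterations)

def tumor_ring(mask, width_vox=2):
    """Voxels within `width_vox` of the tumor boundary, on both sides."""
    return dilate(mask, width_vox) & ~erode(mask, width_vox)
```

With `width_vox=2` on a 1 mm grid, the ring extends 2 voxels outward and 2 voxels inward from the tumor boundary, giving the 4 mm thickness described above.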

Quality evaluation and statistical analysis

Two widely used image quality metrics, the SSIM and PSNR, were used to assess the image quality of the liver region on CT images (Zhou et al 2004). The SSIM between given images $x$ and $y$ was calculated as follows:

$$\mathrm{SSIM}=\frac{\left(2{\mu }_{x}{\mu }_{y}+{h}_{1}\right)\left(2{\delta }_{xy}+{h}_{2}\right)}{\left({{\mu }_{x}}^{2}+{{\mu }_{y}}^{2}+{h}_{1}\right)\left({{\delta }_{x}}^{2}+{{\delta }_{y}}^{2}+{h}_{2}\right)},$$

$${h}_{1}={\left(0.01L\right)}^{2},\quad {h}_{2}={\left(0.03L\right)}^{2},$$

where ${\mu }_{x}$ and ${\mu }_{y}$ are the mean values of images $x$ and $y$, ${{\delta }_{x}}^{2}$ and ${{\delta }_{y}}^{2}$ are the variances of $x$ and $y$, ${\delta }_{xy}$ is the covariance of $x$ and $y$, ${h}_{1}$ and ${h}_{2}$ are two constants, and $L$ is the range of pixel values in $x$ and $y$. The PSNR of a given image $x$ with respect to the reference image $\dot{x}$ is defined as:

$$\mathrm{PSNR}=10\,{\log }_{10}\left(\frac{{\mathrm{MAX}_{\dot{x}}}^{2}}{\mathrm{MSE}}\right),$$

where $\mathrm{MAX}_{\dot{x}}$ is the maximum pixel value in $\dot{x}$ and MSE is the mean squared error between $x$ and $\dot{x}$. The CCC between the features extracted from high-resolution images and upsampled images was used to assess the reproducibility of each feature (Tsai 2017). A CCC cutoff of 0.85 was set as the threshold for selecting reproducible features (Tanaka et al 2019). For comparison, the traditional interpolation-based up-sampling approaches, i.e. the nearest neighbor (Nearest), bilinear (Linear), and bicubic (Cubic) interpolation methods, were also evaluated for both image quality and the reproducibility of the corresponding radiomics features. Finally, the paired t-test and Mann–Whitney U test were used to compare image quality and feature reproducibility, where appropriate.
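The three metrics used in this section can be written compactly in NumPy. This sketch is not the evaluation code of the study: SSIM is computed here from global image statistics exactly as in the formula above (common implementations average the same expression over local windows), the CCC follows Lin's standard definition, and treating a CCC equal to the cutoff as reproducible is an assumption. All function names are hypothetical.

```python
import numpy as np

def ssim_global(x, y, data_range):
    """SSIM from global means, variances and covariance of two images."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    h1, h2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + h1) * (2 * cov + h2)
    den = (mx ** 2 + my ** 2 + h1) * (x.var() + y.var() + h2)
    return num / den

def psnr(x, ref):
    """PSNR in dB; the peak value is the maximum of the reference image.

    Identical images give MSE = 0, for which the PSNR is unbounded.
    """
    x, ref = np.asarray(x, dtype=float), np.asarray(ref, dtype=float)
    mse = np.mean((x - ref) ** 2)
    return 10 * np.log10(ref.max() ** 2 / mse)

def ccc(a, b):
    """Lin's concordance correlation coefficient between two feature vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    cov = ((a - a.mean()) * (b - b.mean())).mean()
    return 2 * cov / (a.var() + b.var() + (a.mean() - b.mean()) ** 2)

def is_reproducible(ccc_value, cutoff=0.85):
    """Apply the 0.85 cutoff (inclusive comparison is an assumption)."""
    return ccc_value >= cutoff
```

Unlike the Pearson correlation, the CCC penalizes a systematic offset: two feature vectors that differ by a constant shift are perfectly correlated but have a CCC below 1.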

Results

Datasets characteristics

Among the 201 patients in the LiTS dataset, the 108 patients who met the inclusion criterion (ST of the CT images equal to or smaller than 1 mm) were used in this study. Of these, 72 patients had both tumor and liver region labels on the CT images, and 36 patients had no tumor regions. The maximum number of tumors per patient was 75. The tumor size in this dataset ranged from 38 mm³ to 349 cm³. Finally, 63, 11, and 34 patients were included in the training, validation, and testing groups, respectively. The corresponding numbers of resliced images in the three groups were 28902 (61.91%), 4438 (9.51%), and 13344 (28.58%).

Image quality evaluation

The ground truth and the images converted using different methods are shown in figure 2. The images converted by the DNN restored more details than those converted using the other three methods, especially in the box region marked by red dotted lines. The RMSE in the liver, tumor, and tumor ring regions using different methods in the test dataset is summarized in table 1. The mean ± standard deviation (mean ± std) of the RMSE using the DNN was 13.03 ± 3.90, 14.43 ± 3.94 and 14.64 ± 3.91 in the liver, tumor, and tumor ring regions, respectively, for the task of 3 mm to 1 mm, and 17.63 ± 4.02, 19.80 ± 4.81 and 20.58 ± 5.76 for the task of 5 mm to 1 mm. The RMSE using the DNN was smaller than that using the other three methods. The improvement in RMSE using the DNN was also observed in different sub-groups, as shown in supplementary tables S2 (3–1 mm) and S3 (5–1 mm).

Table 1. The RMSE in the liver, tumor, and tumor ring regions using different methods in the test datasets.

RMSE	Liver	Tumor	Tumor ring
3–1 mm
Nearest	18.29 ± 4.33	18.92 ± 6.09	22.87 ± 9.45
Linear	16.75 ± 4.25	17.63 ± 5.39	20.38 ± 8.50
Cubic	15.72 ± 4.23	16.71 ± 5.13	18.92 ± 7.86
DNN	13.03 ± 3.90	14.43 ± 3.94	14.64 ± 3.91
5–1 mm
Nearest	28.20 ± 4.81	25.86 ± 8.04	33.40 ± 15.05
Linear	25.13 ± 4.59	24.28 ± 7.80	29.63 ± 13.50
Cubic	23.75 ± 4.61	23.32 ± 7.35	28.00 ± 12.68
DNN	17.63 ± 4.02	19.80 ± 4.81	20.58 ± 5.76

Both DNN-3 and DNN-5 showed significant improvement in image quality compared with traditional interpolation methods. The boxplots of SSIM and PSNR values for images up-sampled from 3 mm and 5 mm thickness in the test group are shown in figure 3. The means and standard deviations of the SSIM and PSNR of images converted using different methods in the test dataset are shown in table 2. The boxplots of SSIM and PSNR values using different methods in different sub-groups are shown in supplementary figures S1(a) and S3(a). The statistical comparison results between different methods are labeled above the boxplots. The DNN method showed significantly higher SSIM and PSNR than all the interpolation-based methods (p < 0.05). For DNN-3 in the test group, the mean ± std of SSIM using the Nearest, Linear, Cubic, and DNN schemes was 0.91 ± 0.08, 0.94 ± 0.06, 0.94 ± 0.05, and 0.97 ± 0.02, respectively. The corresponding PSNR of the four methods was 20.61 ± 8.30 dB, 26.62 ± 8.55 dB, 27.23 ± 8.56 dB, and 34.60 ± 7.14 dB. The DNN method also showed significantly higher PSNR and SSIM values (p < 0.05) than the traditional methods in the training and validation groups, showing the good reproducibility of the DNN method. The Cubic approach showed the best performance among the three interpolation-based methods, as shown in supplementary figures S1(b) and S3(b).

**Figure 3.** Comparison of the SSIM and PSNR values of converted CT images using DNN and bicubic (Cubic), bilinear (Linear), nearest neighbor (Nearest) interpolation schemes in the test datasets. The stars on the paired groups in the boxplots showed the range of corresponding p-value. Note: *: *p-value* < 0.05; **: *p-value* < 0.01; ***: *p-value* < 0.001.

Table 2. The SSIM and PSNR results using different methods in the test datasets.

Methods	3–1 mm		5–1 mm	
	SSIM	PSNR (dB)	SSIM	PSNR (dB)
Nearest	0.91 ± 0.08	20.61 ± 8.30	0.85 ± 0.11	17.16 ± 7.54
Linear	0.94 ± 0.06	26.62 ± 8.55	0.89 ± 0.08	22.79 ± 8.28
Cubic	0.94 ± 0.05	27.23 ± 8.56	0.90 ± 0.08	22.00 ± 8.66
DNN	0.97 ± 0.02	34.60 ± 7.14	0.94 ± 0.04	30.95 ± 7.06

Radiomics feature reproducibility evaluation

The distribution of robust radiomics features from images converted using different methods was shown in table 3 (3–1 mm) and table 4 (5–1 mm). The features derived from the 3 mm ST images showed better reproducibility than those from 5 mm ST images. For tumor region features from 3 mm ST images using the Cubic interpolation method in the test group, 416 out of 540 (77%) features were found reproducible, while only 322 features (60%) from 5 mm ST images were stable. The comparison result of radiomics feature reproducibility between the tumor and tumor ring regions is summarized in table 5. The comparison was performed based on converted CT images in the test dataset. The tumor and tumor ring regions showed equal average CCC values in the task of 3–1 mm. In the job of 5–1 mm, the tumor ring region showed a higher mean CCC value. All the statistical analyses showed no significant difference between the tumor and tumor ring region (p > 0.05). The CCCs of the radiomics features showed a positive correlation with SSIM and PSNR of the corresponding volumes, as shown in tables 2 and 5.

Table 3. Reproducible radiomics features from images converted using different methods in the task of 3–1 mm.

Category	Nearest		Linear		Cubic		DNN
	Tumor	Tumor ring	Tumor	Tumor ring	Tumor	Tumor ring	Tumor	Tumor ring
Raw features
Intensity	4(57%)	7(100%)	4(57%)	7(100%)	4(57%)	7(100%)	7(100%)	7(100%)
GLCM	4(18%)	16(73%)	5(23%)	18(82%)	6(27%)	18(82%)	14(64%)	18(82%)
GLRLM	2(15%)	7(54%)	8(62%)	11(85%)	5(38%)	11(85%)	12(92%)	11(85%)
GLSZM	5(38%)	3(23%)	7(54%)	9(69%)	6(46%)	11(85%)	13(100%)	11(85%)
NGTDM	2(40%)	4(80%)	2(40%)	4(80%)	2(40%)	4(80%)	2(40%)	3(60%)
Sum	17(28%)	37(62%)	26(43%)	49(82%)	23(38%)	51(85%)	48(80%)	50(83%)
Wavelet-based features
LLL	31(52%)	33(55%)	52(87%)	55(92%)	57(95%)	56(93%)	56(93%)	59(98%)
LLH	37(62%)	30(50%)	55(92%)	48(80%)	57(95%)	50(83%)	57(95%)	56(93%)
LHL	28(47%)	37(62%)	47(78%)	47(78%)	48(80%)	49(82%)	56(93%)	58(97%)
LHH	31(52%)	33(55%)	49(82%)	48(80%)	52(87%)	51(85%)	57(95%)	52(87%)
HLL	25(42%)	43(72%)	34(57%)	46(77%)	42(70%)	49(82%)	55(92%)	52(87%)
HLH	26(43%)	37(62%)	43(72%)	53(88%)	46(77%)	55(92%)	50(83%)	56(93%)
HHL	29(48%)	26(43%)	48(80%)	42(70%)	47(78%)	41(68%)	46(77%)	45(75%)
HHH	25(42%)	27(45%)	44(73%)	42(70%)	44(73%)	44(73%)	45(75%)	53(88%)
Sum	232(48%)	266(55%)	372(78%)	381(79%)	393(82%)	395(82%)	422(88%)	431(90%)
Total	249(46%)	303(56%)	398(74%)	430(80%)	416(77%)	446(83%)	470(87%)	481(89%)

Table 4. Reproducible radiomics features from images converted using different methods in the task of 5–1 mm.

Category	Nearest		Linear		Cubic		DNN
	Tumor	Tumor ring	Tumor	Tumor ring	Tumor	Tumor ring	Tumor	Tumor ring
Raw features
Intensity	2(29%)	4(57%)	2(29%)	5(71%)	3(43%)	5(71%)	5(71%)	7(100%)
GLCM	0(0%)	1(5%)	2(9%)	7(32%)	2(9%)	9(41%)	7(32%)	18(82%)
GLRLM	1(8%)	2(15%)	7(54%)	8(62%)	7(54%)	9(69%)	7(54%)	11(85%)
GLSZM	1(8%)	1(8%)	3(23%)	6(46%)	3(23%)	6(46%)	7(54%)	8(62%)
NGTDM	1(20%)	3(60%)	2(40%)	2(40%)	2(40%)	3(60%)	0(0%)	3(60%)
Sum	5(8%)	11(18%)	16(27%)	28(47%)	17(28%)	32(53%)	26(43%)	47(78%)
Wavelet-based features
LLL	16(27%)	18(30%)	26(43%)	26(43%)	30(50%)	31(52%)	52(87%)	55(92%)
LLH	20(33%)	15(25%)	36(60%)	27(45%)	42(70%)	27(45%)	51(85%)	47(78%)
LHL	17(28%)	24(40%)	35(58%)	31(52%)	39(65%)	39(65%)	46(77%)	44(73%)
LHH	19(32%)	29(48%)	38(63%)	38(63%)	38(63%)	40(67%)	40(67%)	42(70%)
HLL	14(23%)	25(42%)	33(55%)	34(57%)	30(50%)	35(58%)	42(70%)	40(67%)
HLH	19(32%)	20(33%)	40(67%)	46(77%)	40(67%)	49(82%)	37(62%)	35(58%)
HHL	20(33%)	15(25%)	43(72%)	32(53%)	46(77%)	33(55%)	40(67%)	38(63%)
HHH	18(30%)	16(27%)	3(5%)	35(58%)	40(67%)	36(60%)	45(75%)	34(57%)
Sum	143(30%)	162(34%)	254(53%)	269(56%)	305(64%)	290(60%)	353(74%)	335(70%)
Total	148(27%)	173(32%)	270(50%)	297(55%)	322(60%)	322(60%)	379(70%)	382(71%)

Table 5. Comparison of radiomics feature reproducibility from the tumor and tumor ring regions in the test dataset.

CCC (Mean ± Std)	Tumor	Tumor ring	p-value
	3–1 mm
Nearest	0.73 ± 0.27	0.73 ± 0.31	0.07
Linear	0.89 ± 0.14	0.89 ± 0.15	0.76
Cubic	0.90 ± 0.13	0.90 ± 0.15	0.44
DNN	0.93 ± 0.10	0.93 ± 0.12	0.56
	5–1 mm
Nearest	0.61 ± 0.30	0.63 ± 0.32	0.12
Linear	0.80 ± 0.22	0.82 ± 0.19	0.71
Cubic	0.82 ± 0.21	0.84 ± 0.19	0.64
DNN	0.87 ± 0.16	0.88 ± 0.15	0.61

The DNN method showed significantly higher CCC values than the interpolation-based methods, demonstrating improved reproducibility of radiomics features. The boxplot of feature CCC values in the test datasets is shown in figure 4. In the tumor region, compared with the Cubic approach, the number of reproducible features increased by 54 (10%) and 57 (11%) in the tasks of 3 mm–1 mm and 5 mm–1 mm, respectively. The improvements in the tumor ring region were 35 (6%) features in the 3–1 mm conversion and 60 (11%) in the 5–1 mm conversion. The boxplots of CCC values in different sub-groups are shown in supplementary figures S2 (3–1 mm) and S4 (5–1 mm).
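The gains quoted above follow directly from the 'Total' rows of tables 3 and 4, with each percentage taken relative to the 540 extracted features. The short check below reproduces the arithmetic; the dictionary layout and function name are illustrative only.

```python
TOTAL_FEATURES = 540

# Reproducible feature counts from the "Total" rows of tables 3 and 4.
REPRODUCIBLE = {
    ("3-1 mm", "tumor"): {"Cubic": 416, "DNN": 470},
    ("3-1 mm", "tumor ring"): {"Cubic": 446, "DNN": 481},
    ("5-1 mm", "tumor"): {"Cubic": 322, "DNN": 379},
    ("5-1 mm", "tumor ring"): {"Cubic": 322, "DNN": 382},
}

def gain_over_cubic(counts, total=TOTAL_FEATURES):
    """Return (extra features, percentage of all features) for DNN vs Cubic."""
    extra = counts["DNN"] - counts["Cubic"]
    return extra, round(100 * extra / total)

if __name__ == "__main__":
    for (task, region), counts in REPRODUCIBLE.items():
        extra, percent = gain_over_cubic(counts)
        print(f"{task}, {region}: +{extra} features ({percent}%)")
```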

**Figure 4.** Comparison of the CCC value of features from converted CT images using DNN and bicubic (Cubic), bilinear (Linear), nearest neighbor (Nearest) interpolation schemes in the test datasets. The stars on the paired groups in the boxplots showed the range of corresponding *p-value*. Note: *: *p-value* < 0.05; ***: *p-value* < 0.001.

In the tumor region, the wavelet-based features showed higher robustness to the conversion of ST compared with raw features. For example, in the task of 5–1 mm, among the features extracted from images converted using the DNN, 43% (26) of the raw features were found reproducible, while 74% (353) of the wavelet-based features were stable. After conversion by the DNN, the GLCM- and NGTDM-based features showed the lowest reproducibility among the texture-based radiomics features in both tasks. Among the wavelet-based tumor region features, the 'HLH_' and 'HHL_' sub-groups showed the lowest reproducibility in the tasks of 5–1 mm and 3–1 mm, respectively. The sub-groups with more high-pass filters ('HHH_,' 'HHL_,' 'HLH_,' 'LHH_') showed fewer reproducible features in both tasks. In the shape feature robustness analysis, the mask images converted by the Cubic interpolation method could be used for robust shape feature extraction in both the 3–1 and 5–1 mm tasks, as shown in supplementary I.

Discussion

In this study, the DNN was applied to up-sample the ST of liver cancer CT images from 3 to 1 mm and 5 to 1 mm. The DNN method showed significant improvement of image quality in PSNR and SSIM, compared with interpolation-based methods. We also demonstrated the capability of DNN in improving the reproducibility of radiomics features after ST conversion, which could serve as a tool to enhance the reproducibility of CT-based radiomics studies in liver cancer.

To fit input images of varying sizes, we applied a patching strategy in generating the training datasets. To make the cropped patches cover the entire liver region as much as possible, we extracted 128 patches from each image at each epoch. Cutting images into patches could also help the DNN learn local image patterns, which could help recover high-resolution images (Zhao et al 2018). Furthermore, by re-cropping patches at different epochs, slightly different training samples were used in different training periods, which could reduce overfitting by preventing the DNN from memorizing the exact details of the training data.

We analyzed the effect of up-sampling methods on radiomics features extraction, including intensity and texture features. Shape features are also commonly used in radiomics analysis (Xie et al 2019). Shape features are derived from the mask image depicting the volume of interest. The reproducibility of shape features might also be affected when the mask images are up-sampled from low-resolution images. As the proposed DNN in this study is trained using CT images and is unsuitable for upsampling low-resolution binary mask images, we only compared the shape feature robustness among the interpolation-based methods. We found that most shape features are robust to the error in up-sampling mask images, and the Cubic interpolation method is capable and could be recommended for up-sampling mask images in liver cancer radiomics studies.

Although the tumor and tumor ring regions have different image patterns, we found no significant difference in feature reproducibility between the two regions concerning the effect of upsampling. The tumor region showed higher up-sampling accuracy, namely a smaller RMSE than the tumor ring region, as shown in table 1, while the corresponding feature reproducibility showed no significant improvement. The gray level normalization process in radiomics feature extraction might explain this. This process normalizes the VOI into a fixed set of gray levels and might reduce the effect of the difference in up-sampling accuracy between the tumor and tumor ring regions.

Park et al applied the DNN based super-resolution algorithms to improve the reproducibility of CT radiomics features in lung cancer (Park et al 2019). We found that among the wavelet-based features, the features from images transformed by more low-pass filters showed higher robustness, which was consistent with the study of Park et al. In wavelet-feature extraction, the original CT image was decomposed into eight categories by performing a wavelet transform on three directions (X, Y, Z). In each direction, the image signal was decomposed into the high-frequency (H) and low-frequency (L) components by the wavelet transform. The difference between high-resolution and low-resolution images can be treated as a high-frequency signal. By performing low-pass filters on the original image, the difference between high-resolution and low-resolution images might be reduced, resulting in higher reproducibility.
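The eight-way decomposition described above can be illustrated with a one-level Haar transform applied along each axis in turn. This is a sketch only: the wavelet family and whether a decimated or undecimated transform was used are not stated here, so the decimated orthonormal Haar wavelet below is an assumption, as are the function names and the mapping of array axes 0, 1, 2 to X, Y, Z.

```python
import numpy as np

def haar_split(vol, axis):
    """One-level orthonormal Haar split along one axis: returns (low, high).

    The size along `axis` must be even.
    """
    moved = np.moveaxis(vol, axis, 0)
    even, odd = moved[0::2], moved[1::2]
    low = (even + odd) / np.sqrt(2)          # local average: low-pass (L)
    high = (even - odd) / np.sqrt(2)         # local difference: high-pass (H)
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)

def wavelet_subbands(vol):
    """Return the eight sub-bands keyed 'LLL' to 'HHH' (letters: X, Y, Z)."""
    bands = {"": np.asarray(vol, dtype=float)}
    for axis in range(3):
        split = {}
        for name, data in bands.items():
            low, high = haar_split(data, axis)
            split[name + "L"] = low
            split[name + "H"] = high
        bands = split
    return bands
```

In this sketch, detail that varies from slice to slice along Z lands only in the sub-bands whose third letter is 'H', while sub-bands with an 'L' in that position see a locally averaged signal. This is consistent with the argument above that low-pass filtering reduces the difference between high- and low-resolution images.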

In liver tumor regions, the wavelet-based features were more stable than the raw features from original images, whereas in lung tumors (Park et al 2019), wavelet-based features were found to be less robust than raw features. This might be explained by the difference between the characteristics of lung and liver tumors on CT images. The area adjacent to lung tumors is air-filled lung tissue, while the area surrounding liver tumors is solid liver tissue. The difference in tumor characteristics might result in variation of feature reproducibility. We observed that in the tumor ring region, which, like lung tumors, has higher contrast, the reproducibility of wavelet features is not always higher than that of raw features. For example, in the task of 5–1 mm (table 4), among the features extracted from images converted using the DNN, 78% (47) of the raw features were found reproducible, while 70% (335) of the wavelet-based features were stable, which is in accordance with the result in lung tumors. As such, the reproducibility of wavelet features in the liver tumor ring region could vary under different upsampling conditions and should be properly evaluated before modeling in radiomics studies.

Compared with the study of Park et al, we performed a more comprehensive analysis focusing on the ST conversion of CT images in liver cancer. We analyzed the effect on the reproducibility of radiomics features in both the liver tumor and tumor ring regions. Apart from constructing the DNN, we also compared the traditional interpolation-based methods, using SSIM and PSNR to evaluate image quality and CCC to quantify feature stability.

He et al and Li et al demonstrated that higher classification accuracy could be achieved in radiomics studies by using CT images with thinner ST (He et al 2016, Li et al 2018). CT images with thin ST contain more information and could more accurately reflect tumor heterogeneity. However, the traditional interpolation methods were found to restore only around 55% and 80% reproducible features from the 5 mm ST and 3 mm ST images, respectively, which could potentially affect the accuracy and extension ability of radiomics models. More studies are needed to investigate the appropriate voxel size in CT-based radiomics studies of liver cancer. The DNN-based approach showed a promising result in improving the reproducibility of features, with the corresponding proportions increasing to around 70% and 90%, which could reduce the potential effect of ST.

There are some limitations in this study. Firstly, the datasets used in this study were comparatively small, which might cause variation between the sample distributions in the training and testing datasets. Secondly, we tested the reproducibility of individual features without further validating radiomics model accuracy, such as the variation of the radiomics signature value and the corresponding classification accuracy. Thirdly, we only applied a 2D-DNN for ST conversion without further investigating the performance of a 3D-DNN. In the future, we will further optimize the DNN structure used for up-sampling, enroll more datasets from local medical centers, and validate our findings with specific clinical targets.

Conclusion

The DNN based scheme restored more reproducible radiomics features from thick-ST liver CTs than conventional interpolation-based methods and might improve the standardization of radiomics studies of liver cancer.
