Deep learning method for reducing metal artifacts in dental cone-beam CT using supplementary information from intra-oral scan

Objective. Recently, dental cone-beam computed tomography (CBCT) methods have been improved to significantly reduce radiation dose while maintaining image resolution with minimal equipment cost. In low-dose CBCT environments, metallic inserts such as implants, crowns, and dental fillings cause severe artifacts, which result in a significant loss of the morphological structures of teeth in reconstructed images. Such metal artifacts prevent accurate 3D bone-teeth-jaw modeling for diagnosis and treatment planning. However, existing metal artifact reduction (MAR) methods remain limited in their ability to handle the loss of the morphological structures of teeth in reconstructed CT images. In this study, we developed a MAR method that aims to restore these anatomical details. Approach. The proposed MAR approach is a two-stage deep learning-based method. In the first stage, we employ a deep learning network that utilizes intra-oral scan data as side-inputs and performs multi-task learning of auxiliary tooth segmentation. The network is designed to improve the learning ability of capturing teeth-related features effectively while mitigating metal artifacts. In the second stage, a 3D bone-teeth-jaw model is constructed with weighted thresholding, where the weighting region is determined depending on the geometry of the intra-oral scan data. Main results. The results of numerical simulations and clinical experiments are presented to demonstrate the feasibility of the proposed approach. Significance. We propose for the first time a MAR method using radiation-free intra-oral scan data as supplemental information on the morphological structures of teeth, which is designed to perform accurate 3D bone-teeth-jaw modeling in low-dose CBCT environments.


Introduction
Dental cone-beam computed tomography (CBCT) has been developed to significantly reduce radiation dose while maintaining image resolution with minimal equipment cost, and is increasingly being used in several dental applications such as implant planning and dental and maxillofacial surgery (Sukovic 2003, Marchetti et al 2007, Miracle and Mukherji 2009, Gupta and Ali 2013, Scarfe et al 2017, Weiss and Read-Fuller 2019). Currently, the removal of metallic object-related artifacts poses a major challenge in low-dose dental CBCT. Metallic objects produce severe streaking and shadowing artifacts, which cause a significant loss of the morphological structures of teeth in CT images. Consequently, such artifacts interfere with 3D bone-teeth-jaw modeling for diagnosis and treatment planning in clinical practice (Santler et al 1998, Gateno et al 2007, Schulze et al 2011, Nardi et al 2015). As the number of people with metallic oral appliances such as implants, crowns, and dental fillings continues to increase, metal artifacts have become common and their reduction has drawn increased attention (Draenert et al 2007, Sanders et al 2007). Metal artifact reduction (MAR) is a very difficult problem because the generation of metal-induced streaking and shadowing artifacts is intricately intertwined with interactions between metal, bone, and tissue involving various factors such as beam hardening, scattering, nonlinear partial volume effects, photon starvation, and a high degree of attenuation inhomogeneity (i.e. metal, bone, tissue, air) (Schulze et al 2011, Gjesteby et al 2016). MAR is much more challenging in the low-dose dental CBCT environment owing to offset detection, truncation of the field of view (FOV), low radiation dose, and 3D characteristics in image reconstruction (Bayaraa et al 2020). See figure 1. Moreover, when multiple, strongly attenuating metallic inserts occupy a significant area, low radiation doses often cause photon starvation along metal traces. These effects result in severe loss or disruption of tooth structures around the inserts in the reconstructed image.
Numerous MAR methods have been developed, including dual-energy approaches (Alvarez and Macovski 1976, Lehmann et al 1981, Yu et al 2012), statistical iterative correction (De Man et al 2001, Elbakri and Fessler 2002, Williamson et al 2002, Menvielle et al 2005, O'Sullivan and Benac 2007), sinogram inpainting-based correction (Kalender et al 1987, Bazalova et al 2007, Abdoli et al 2010, Meyer et al 2010, Park et al 2013), and deep learning methods (Park et al 2018, Zhang and Yu 2018, Gjesteby et al 2019, Lin et al 2019, Yu et al 2020). Although these methods have been shown to mitigate metal-induced artifacts, their performance in dental applications remains unsatisfactory, and they have limitations in low-dose dental CBCT environments. Figure 2 highlights that the recovery of corrupted and missing details remains an arduous task even with state-of-the-art deep learning methods. There seems to be a fundamental limitation in accurately restoring the morphological structure of teeth using only sinogram data severely damaged by metal inserts.
Figure 1. Low-dose dental CBCT uses a small detector with an offset array. The small detector size leads to a small scanner FOV, which causes the patient's head to be truncated in the sinogram data in the transversal direction. This incomplete sinogram data can be combined with beam hardening of the teeth, creating streaking artifacts. Photon starvation is very common in low-dose dental x-ray CBCT, especially when the patient has many implants.
Figure 2. Dental CBCT image and segmented teeth before and after applying deep learning (DL)-based MAR. Even though the deep learning method enhances the overall image quality, it still struggles to recover corrupted tooth details, as indicated by the yellow arrows. In addition, as indicated by the orange arrows, it is very hard to separate two teeth (covered by dental crowns) that appear to be attached to each other because of metal-induced loss of morphological information in the CT data.
In this study, we propose a deep learning-based MAR method using radiation-free intra-oral scan data as supplemental information on tooth morphological structure, as shown in figure 3. The proposed MAR method is a two-stage approach designed to perform accurate 3D bone-teeth-jaw modeling. In the first stage, we employ a deep learning network that utilizes intra-oral scan data as side-inputs and performs multi-task learning of auxiliary tooth segmentation. The network is designed to improve the learning ability of capturing teeth-related features effectively while mitigating metal artifacts. The suitable incorporation of explicit shape-prior information from intra-oral scan data into deep learning models can provide significant benefits in terms of accuracy, learnability, feature extraction, and so forth (Liu et al 2021). In the second stage, a 3D bone-teeth-jaw model is constructed with weighted thresholding, where the weighting region is determined depending on the geometry of the intra-oral scan data. We also adopted a simulation approach to train the proposed deep learning network. For each metal-free CBCT scan, the corresponding metal-affected CBCT and intra-oral scans are generated using a self-developed data generation tool that does not involve any time-consuming and labor-intensive manual processes.
We conducted numerical simulations and clinical experiments to investigate the potential impact of the use of intra-oral scan data in MAR and bone-teeth-jaw modeling. The results of the experiments demonstrate the feasibility of the proposed method and the benefits of using intra-oral scan data in low-dose dental CBCT environments.

Method
In low-dose dental CBCT, the measured sinogram data $P$ can be expressed as

$$P = \mathcal{T}\left(-\ln\left(\int \eta(E)\,\exp\left(-\mathcal{A}\mu_E\right)\mathrm{d}E\right) + n\right). \tag{1}$$

Here, $\mu_E$ is the attenuation coefficient distribution of a 3D human body to be scanned at an energy $E$, $\eta$ is the normalized energy distribution of the x-ray source, $\mathcal{A}$ is a cone beam projection, $n$ is the CT noise, and $\mathcal{T}$ is the truncation caused by the size and arrangement of the detector (typically, small and offset). See figure 1. In the presence of metallic objects inside the FOV, the standard FDK algorithm (Feldkamp et al 1984) produces severe streaking and shadowing artifacts that degrade the image quality of maxillofacial structures. Hence, high-quality 3D bone-teeth-jaw modeling is arduous when only the reconstructed image is available.
The goal of the proposed method is to provide a high-quality 3D bone-teeth-jaw (or maxillofacial) model from metal-affected sinogram data $P$ by leveraging intra-oral scan data $O$. The output should be competitive with a 'gold-standard' bone-teeth-jaw model $y_{\mathrm{mf}}$ acquired from an artifact-free CT image $y$ that is reconstructed from $P^{*}$, where $P^{*}$ represents the artifact-free sinogram data corresponding to $P$. The intra-oral scan data $O$ provide 3D tooth surfaces that can be useful as prior information about tooth geometry. It is assumed that the intra-oral scan data $O$ provide exact tooth boundary information.
The proposed method is based on an image-to-image learning approach and weighted thresholding, both of which leverage intra-oral scan data as explicit shape-prior information on tooth geometry for MAR. The reconstruction map $f$ can be expressed as

$$f(P, O) = f_{\alpha\text{-WT}}\big(f_{\mathrm{IE}}(\mathcal{A}^{\dagger}_{\mathrm{FDK}}P,\ O),\ O\big), \tag{2}$$

where
• $\mathcal{A}^{\dagger}_{\mathrm{FDK}}$ is the weighted FDK algorithm involving the sinogram extrapolation method for addressing the offset detector arrangement and FOV truncation.
• $f_{\mathrm{IE}}$ is the image-enhancing network based on tooth geometry prior information, which mitigates metal-related artifacts.
• $f_{\alpha\text{-WT}}$ is a weighted thresholding, wherein the weighting region is determined on the basis of the α-shape from intra-oral scan data. This procedure is used to further remove the remaining streaking artifacts around the teeth when constructing a bone-teeth-jaw model.
Here, the input of $f$ is a pair of metal-affected sinogram data $P$ and intra-oral scan data $O$, and the output is a bone-teeth-jaw model (i.e. $f: (P, O) \mapsto y_{\mathrm{mf}}$). The overall process is illustrated in figure 4.
Figure 3. 3D dental CBCT and intra-oral scan data. The intra-oral scan data can provide 3D surface information of teeth. We assume that intra-oral scanning provides exact tooth boundary information.
Stage 1. Image-enhancing network $f_{\mathrm{IE}}$
In our experience, an image-domain learning approach can mitigate metal-related artifacts effectively, but it tends to be weak at recovering tooth shape, especially where the shape has been destroyed by severe artifacts or is missing. To compensate for this weakness, we take advantage of supplemental shape information from intra-oral scan data. We emphasize that data acquisition by the intra-oral scanner does not increase the total radiation exposure of the patient. Let $x$ be a 3D CBCT image reconstructed using the FDK algorithm (i.e. $x = \mathcal{A}^{\dagger}_{\mathrm{FDK}}P$). To reduce the artifacts in $x$ while recovering tooth shape, two strategies are adopted: side-input layers and multi-task learning. First, additional information from intra-oral scan data is repeatedly injected during feature extraction in the encoding path. These side inputs can help the network extract tooth shape while compensating for missing or severely disrupted structures through the high-quality shape information provided by intra-oral scan data. Second, multi-task learning is applied, which learns image reconstruction and auxiliary tooth segmentation in parallel. In the medical imaging field, it has been reported that deep learning-based image reconstruction ability can be boosted by learning other image-related tasks, such as segmentation and registration (Liu et al 2021). In terms of image recovery, the auxiliary tooth segmentation is expected to reveal the shapes of the teeth in the decoding path and to promote, through the shared parameters, the inference of tooth features, which carry joint domain information of the interrelated tasks. Figure 4 shows the overall procedure of the proposed image-enhancing network $f_{\mathrm{IE}}$. Inspired by M-net (Mehta and Sivaswamy 2017), the proposed network has side-input and side-output layers. In the side-input layers, intra-oral scan data $O$ with suitable resizing is repeatedly added to the encoding path after a 3 × 3 convolution. In the side-output layers, tooth segmentation masks are obtained during the decoding path. The detailed backbone structure can be found in Mehta and Sivaswamy (2017).
Figure 4. Overall process of the proposed metal artifact reduction method with explicit shape-prior of intra-oral scan data for low-dose dental CBCT-based bone-teeth-jaw modeling.
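Let $s_0^{(i),j}$ denote the final network output for the $j$th slice of the $i$th training data, and let $\{s_k^{(i),j}\}_k$ be the set of side outputs in the decoding path. The network $f_{\mathrm{IE}}$ is trained by minimizing, over all training data and slices, the standard $\ell_2$ loss $\mathcal{L}_{\ell_2}$ on the output image together with the cross-entropy loss $\mathcal{L}_{\mathrm{ce}}$ on the tooth segmentation outputs. For convenience, the notation $f_{\mathrm{IE}}(x, O)$ is used to represent the output image (i.e. the first channel output).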
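The training objective described above combines an ℓ2 image loss with cross-entropy losses on the tooth segmentation outputs. The following is a minimal NumPy sketch of such a multi-task loss for a single 2D slice, not the authors' implementation. The function names, the equal weighting of the terms, the binary form of the cross-entropy, and the block-averaging used to resize the tooth mask to each side-output scale are all assumptions made for illustration.

```python
import numpy as np

def l2_loss(pred, target):
    """Mean squared error between a predicted and a ground-truth image."""
    return float(np.mean((pred - target) ** 2))

def binary_cross_entropy(prob, mask, eps=1e-7):
    """Cross-entropy between predicted tooth probabilities and a binary mask."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(mask * np.log(p) + (1.0 - mask) * np.log(1.0 - p)))

def downsample(mask, factor):
    """Resize a 2D mask by block averaging, then threshold at 0.5 (assumed)."""
    h, w = mask.shape
    blocks = mask.reshape(h // factor, factor, w // factor, factor)
    return (blocks.mean(axis=(1, 3)) >= 0.5).astype(mask.dtype)

def multitask_loss(image_out, seg_outs, y, tooth_mask):
    """l2 loss on the image channel plus cross-entropy over all segmentation outputs.

    image_out : first-channel output (artifact-reduced slice)
    seg_outs  : list of tooth probability maps (final and side outputs),
                each at a scale that divides the full resolution
    y         : ground-truth metal-free slice
    tooth_mask: full-resolution binary tooth label
    """
    total = l2_loss(image_out, y)
    for seg in seg_outs:
        factor = tooth_mask.shape[0] // seg.shape[0]
        total += binary_cross_entropy(seg, downsample(tooth_mask, factor))
    return total
```

In this sketch, a perfect image and perfect segmentation maps give a loss close to zero, while an error in either task raises the total, which is how the shared parameters receive gradients from both tasks.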
Stage 2. α-shape-based weighted thresholding $f_{\alpha\text{-WT}}$
The next step is bone-teeth-jaw modeling from the metal-artifact-reduced CBCT image obtained in the previous stage. A final 3D bone-teeth-jaw model is obtained by weighted thresholding, which can further reduce the streaking artifacts remaining around the teeth. The weighting region is determined by the geometry of the intra-oral scan data $O$. To extract the geometry, the α-shape technique (Edelsbrunner and Mucke 1994) is used. It provides a family of piecewise linear lines associated with the shape of the teeth. Figure 5 shows the overall process. The model is evaluated at each point $p$ in the grid of $y_{\mathrm{mf}}$ by thresholding the image $f_{\mathrm{IE}}(x, O)$ with a thresholding constant $\tau$, where the weighting is applied in a thresholding region $\mathcal{R}_O$ obtained using the α-shape from $O$.

The region $\mathcal{R}_O$ is obtained as follows. For given intra-oral scan data $O$, let $\mathcal{S}$ be the point cloud corresponding to $O$. The α-shape of $\mathcal{S}$, denoted by $\mathcal{S}_{\alpha}$, is a polytope whose boundary $\partial\mathcal{S}_{\alpha}$ is defined by

$$\partial\mathcal{S}_{\alpha} = \{\Delta_T \mid T \subset \mathcal{S},\ |T| \le 3,\ \Delta_T \text{ is } \alpha\text{-exposed}\}, \tag{8}$$

where $\Delta_T$ denotes the simplex spanned by $T$, and $\Delta_T$ is α-exposed if and only if there exists an open ball $B_{\alpha}$ with radius $\alpha$ such that $B_{\alpha} \cap \mathcal{S} = \emptyset$ and $\partial B_{\alpha} \cap \mathcal{S} = T$. Here, $\partial B_{\alpha}$ is the boundary of $B_{\alpha}$. After the α-shape is obtained, an extension direction at each vertex of $\mathcal{S}_{\alpha}$ is defined by taking the average of the normal vectors of the faces that contain the vertex. Along this direction, $\mathcal{S}_{\alpha}$ is extended while preserving its shape and converted into a binary mask $\alpha_O$, in which the region inside the shape boundary is filled with one. Finally, the region $\mathcal{R}_O$ is determined from $\alpha_O$ and the binary mask in which the inner part of the tooth surfaces in $O$ is filled with one.
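To make the idea of region-dependent thresholding concrete, the following NumPy sketch applies a stricter effective threshold inside a region around the teeth than elsewhere. It is only an illustration under stated assumptions: the multiplicative weight, the function names, and the construction of the region as the extended α-shape mask minus the tooth interior are hypothetical choices, not the rule used in the paper.

```python
import numpy as np

def weighting_region(extended_mask, tooth_mask):
    """Assumed region: inside the extended alpha-shape mask but outside the teeth."""
    return extended_mask.astype(bool) & ~tooth_mask.astype(bool)

def weighted_threshold(image, region, tau, weight=0.5):
    """Binary model from an image with a down-weighted intensity inside `region`.

    Voxels inside `region` are multiplied by `weight` (assumed, 0 < weight <= 1)
    before comparison with `tau`, so residual streaks near the teeth are less
    likely to be kept than bone or tooth voxels elsewhere.
    """
    w = np.where(region, weight, 1.0)
    return (w * image > tau).astype(np.uint8)
```

With `tau = 1.0` and `weight = 0.5`, a streak of intensity 1.2 inside the region is rejected, whereas the same intensity outside the region is kept.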

Experiment setting
The sinogram data of a real patient were obtained from a commercial CBCT machine (Q-FACE, HDXWILL). The sinogram size was 1200 × 654 × 658 with a real scale of 0.2 mm for each detector axis, where 1200 is the number of uniformly sampled projection views in [0, 2π), and 654 × 658 is the number of samples measured by the 2D flat detector for each projection view. CBCT images were reconstructed with a size of 800 × 800 × 400 voxels and a real scale of 0.2 mm. For cone beam projection, an open-source code known as TIGRE (Biguri et al 2016) was used, where the projection algorithm was implemented by a ray-driven method. Scattering was not considered in this study. All simulated data were consistently generated to have the same scale as the real data. A self-developed fully-automated paired data generation tool was used. The detailed process is described in section 3.2. Figure 6 shows several samples of the data simulated using the data generation tool.
Metal-free CBCT sinogram data were collected from 20 patients without any metallic objects and were used for data generation. Metal-affected CBCT data were collected from nine patients and were used for testing. Among the metal-affected data, real intra-oral scan data were provided for one patient. The intra-oral scan data were acquired with a scanner (i500, MEDIT) and provided in the standard triangle language (STL) file format. The set of its vertices is a point cloud in millimeters, where the maxilla and mandible are represented by approximately 100 000 and 70 000 points, respectively. For registration into the dental CBCT system, a previously described method was applied.
All deep learning experiments were conducted in a PyTorch environment (Paszke et al 2019) on a computer system equipped with two Intel Xeon E5-2630 v4 CPUs, 128 GB of DDR4 RAM, and four NVIDIA GeForce RTX 2080 Ti GPUs. The optimization was conducted using the Adam optimizer (Kingma and Ba 2014) on multiple GPUs. Batch normalization was applied to achieve fast convergence and minimization (Ioffe and Szegedy 2015). The network capacity (i.e. feature and network depths) was minimized as much as possible while maintaining the backbone structure because of the huge computational cost associated with the CBCT image size of 800 × 800 × 400.
For α-shape implementation, open source packages, Visualization ToolKit (VTK) and Alpha Shape Toolbox (AST), were used. The adaptive values α and τ were selected empirically.

Fully-automated paired data generation
We generated a realistic paired training dataset for MAR through the following procedure, which does not involve any time-consuming and labor-intensive manual process (see figure 7 for the overall workflow). As a first step, fully-automated individual tooth segmentation was performed on metal-free CBCT data by using a previously reported technique. Several tooth positions at which virtual metal objects could be placed were chosen randomly. For the crown case, a crown mask was constructed by cutting the roots of the chosen teeth based on crown height information for each tooth (Nelson 2014), and then by applying the erosion process. The crown thickness was randomly set from 0.6 to 1.4 mm. For the implant case, instead of erosion, another process was applied to create an implant screw bar. A line was defined for each tooth that passed through the two tooth center points in the lowest and middle slices, except those containing a tooth root. Then, the root parts were filled with circles whose centers were located on the line, and whose radius was set empirically. Using the generated dental crown or tooth implant mask, metal-affected sinogram data were artificially synthesized using the Beer-Lambert law (1) and combined with the metal-free sinogram.
The simulated projection data were generated at a tube voltage of 85 kVp. A metal attenuation coefficient was randomly assigned from {Au, Pd, Ni, Cr, Zr, Al}. For the numerical simulation, the energy distribution of the x-ray source and the attenuation coefficient values were those described elsewhere (Hubbell and Seltzer 1995, Mahesh 2013). Poisson and Gaussian noise were added to account for the CT noise. A total of 20 metal-free scans were split into two disjoint sets (i.e. 15 and 5 scans) and used for training and testing, respectively. The two sets share no ground truth (i.e. metal-free scan). From the 15 scans, a total of 60 paired data (4 data from each scan) were generated and used only for training. From the 5 scans, 10 test data (2 data from each scan) were generated. Here, the number of inserted metal objects was randomly set from two to five.
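The synthesis step above can be sketched as follows: a polychromatic Beer-Lambert projection is computed per ray from material path lengths, and Poisson and Gaussian noise are added in the photon-count domain. This is a minimal NumPy sketch rather than the authors' tool. The function names, the incident photon count, the Gaussian noise level, and the attenuation values in the usage note are placeholders chosen for illustration, not the tabulated values used in the study.

```python
import numpy as np

def polychromatic_projection(path_lengths, mu, spectrum):
    """Return -ln( sum_E eta(E) * exp(-sum_m mu_m(E) * L_m) ) for every ray.

    path_lengths: (n_rays, n_materials) intersection length per material
    mu          : (n_energies, n_materials) attenuation coefficients
    spectrum    : (n_energies,) normalized source spectrum (sums to 1)
    """
    line_integrals = path_lengths @ mu.T            # (n_rays, n_energies)
    transmitted = np.exp(-line_integrals) @ spectrum
    return -np.log(transmitted)

def add_noise(projection, photons=1e5, sigma=1.0, rng=None):
    """Add Poisson (quantum) and Gaussian (electronic) noise to log-projections."""
    rng = np.random.default_rng() if rng is None else rng
    expected = photons * np.exp(-projection)
    counts = rng.poisson(expected) + rng.normal(0.0, sigma, size=expected.shape)
    counts = np.clip(counts, 1.0, None)  # avoid log(0) under photon starvation
    return -np.log(counts / photons)
```

For a single energy the projection reduces to the linear line integral. With two energies, doubling the path length less than doubles the projection value, which is the beam-hardening nonlinearity that gives rise to streaking and shadowing around metal.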
The intra-oral scan data was simulated as a boundary mask of teeth with inserted metal objects. The boundary mask was obtained by applying the erosion process to the segmented teeth and inserted metal objects.
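As an illustration of how a boundary mask can be derived by erosion, the following NumPy sketch erodes a binary mask once and keeps the pixels that the erosion removes. The 2D setting, the 4-connected structuring element, and the function names are assumptions for this example; the paper does not specify them.

```python
import numpy as np

def binary_erosion(mask):
    """Erode a 2D boolean mask with a 4-connected (cross) structuring element."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    return (padded[1:-1, 1:-1]
            & padded[:-2, 1:-1] & padded[2:, 1:-1]
            & padded[1:-1, :-2] & padded[1:-1, 2:])

def boundary_mask(mask):
    """Boundary = pixels of the mask that do not survive one erosion step."""
    mask = mask.astype(bool)
    return mask & ~binary_erosion(mask)
```

For a filled 3 × 3 square, only the center pixel survives the erosion, so the boundary mask consists of the eight surrounding pixels.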

Experimental results
To investigate the advantages of the proposed network, performance comparisons were conducted with various MAR methods. The experiments were based on three test sets: synthesized CBCT data + simulated intra-oral scan data, clinical CBCT data + simulated intra-oral scan data, and clinical CBCT data + real intra-oral scan data. Qualitative and quantitative evaluations were conducted on the synthesized CBCT dataset in which the corresponding ground-truth images are given. For clinical CBCT data, qualitative evaluations were performed. For a quantitative comparison of tooth shape restoration near the metallic objects, we computed the Hausdorff distance (Huttenlocher et al 1993) between the tooth boundary segmented manually from a CT image and the corresponding intra-oral scan data. Here, the Hausdorff distance was computed on a region around the metal.
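For reference, the symmetric Hausdorff distance between two boundary point sets can be computed as in the following NumPy sketch. The function name and the `spacing` parameter are illustrative; the restriction to a region around the metal and the manual segmentation step are not reproduced here.

```python
import numpy as np

def hausdorff_distance(a, b, spacing=1.0):
    """Symmetric Hausdorff distance between point sets a (n, d) and b (m, d).

    `spacing` converts grid indices to physical units (e.g. 0.2 mm per voxel).
    """
    a = np.asarray(a, dtype=float) * spacing
    b = np.asarray(b, dtype=float) * spacing
    # Pairwise Euclidean distances, shape (n, m).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Largest nearest-neighbour distance in each direction, then the maximum.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```

The distance is zero only when every point of each set coincides with a point of the other, so a single displaced boundary point near a metal object is enough to raise the value.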
It should be mentioned that the comparison with other methods is not strictly fair, because the proposed method takes advantage of additional information from intra-oral scan data. Figure 8 and table 1 show qualitative and quantitative performance comparisons of the proposed network with linear interpolation, an image domain network, a sinogram domain network, and a sinogram inpainting network. For the linear interpolation, the sinogram reflection technique reported by Bayaraa et al (2020) was applied to deal with metal trace truncation. Image thresholding was used to extract metal traces. For the image domain network, a U-net (Ronneberger et al 2015) was trained to map directly from an uncorrected image to the corresponding ground truth image. For the sinogram domain network, a U-net was trained to map directly from an uncorrected sinogram to the corresponding ground truth sinogram. For the sinogram inpainting network, a U-net was trained such that only the metal traces in the sinogram were corrected by the network output.
Figure 8. Comparison of metal artifact reduction over simulated data with various MAR methods: linear interpolation (LI), image domain learning (Img DL), sinogram domain learning (Sino DL), sinogram inpainting learning (Sino Inpaint DL), and the proposed network. Case 1 is the best MAR case and Case 2 is the worst MAR case. The Hausdorff distance between the tooth boundary segmented manually from a CT image and the corresponding intra-oral scan data is provided as a yellow value. The distance was computed in the region of a yellow box.

Test on synthesized CBCT and simulated intra-oral scan data
In the experiments, the proposed network exhibited the best performance, significantly improving the shape quality of teeth and bone associated with bone-teeth-jaw modeling. In particular, the proposed network appears to recover the tooth shape well, even when it is substantially disrupted or missing because of metal-related artifacts. The performance of the proposed method was also validated via 4-fold cross validation. See the appendix for details.
As shown in figure 9 and table 1, an ablation study of multi-task learning (MT) and the side-input layer (SI) in the proposed network was conducted qualitatively and quantitatively. MT alone did not improve the reconstruction ability, either quantitatively or qualitatively. Either SI alone or the combination of SI and MT enhanced the reconstruction performance both qualitatively and quantitatively. The combination of SI and MT appears to provide the best result owing to a synergistic effect.

Figure 10 shows a comparison on the test set of real clinical CBCT data and simulated intra-oral scan data, where the intra-oral scan data were obtained by tooth segmentation from the clinical CBCT data. Here, the method of Jang et al (2021) was utilized, which provides considerably accurate tooth segmentation, even in the presence of metal-related artifacts. Several simulated intra-oral scan data are shown in the first column of figure 10. In three cases from different patients, the proposed network successfully reduced metal artifacts while recovering the boundary of the teeth effectively, whereas the image domain network tended to suffer from loss, blurring, or disruption of the tooth boundary around metal objects.

Test on clinical CBCT and real intra-oral scan data
Figure 11 shows the results reconstructed using clinical CBCT and real intra-oral scan data. The proposed method consistently preserves or recovers the boundary of the teeth around metal objects compared with the image domain network. See the regions highlighted by yellow arrows in figure 11. The performance of the proposed method was also compared when using simulated and real intra-oral scan data for the same clinical CBCT data. There was some performance degradation in the case of real intra-oral scan data relative to the simulated intra-oral scan case. See the region indicated by the orange arrows in figure 11.

3D bone-teeth-jaw model construction
Figure 12 shows 3D segmented bone-teeth-jaw models obtained by uncorrected image + image thresholding, the proposed network + image thresholding, and the proposed method (the proposed network + the proposed α-shape-based weighted thresholding). The result was obtained using clinical CBCT data and real intra-oral scan data. The proposed method clearly enhanced the quality of the 3D bone-teeth-jaw model so that it depicted the tooth and bone structures more precisely. The α-shape-based weighted thresholding was found to be effective with real intra-oral scan data for high-quality bone-teeth-jaw modeling.

Conclusion and discussion
This study is a first attempt to pave the way toward MAR utilizing the shape prior from intra-oral scan data. The utilization of radiation-free intra-oral scan data is meaningful given that dental CBCT is being developed toward minimizing radiation exposure while maintaining diagnostic image quality. Our experiments demonstrated the potential of intra-oral scan data to have a significant positive effect on the restoration of tooth shape lost to metal-related artifacts.
To train the proposed network, a paired dataset of metal-artifacted data, metal-artifact-free data, and intra-oral scan data is required, but data accessibility is limited in clinical practice. Hence, the data generation tool was utilized to provide a realistic paired dataset, where the intra-oral scan data for training were simulated as a set of boundaries of individual teeth segmented in an artifact-free CBCT image. However, the simulation did not fully reflect the real scanning environment, such as scanning protocol, condition, and performance. The difference between the training and test domains caused the performance degradation. The performance of the proposed network on real intra-oral scan data can be improved if more realistic simulated intra-oral scan data or a sufficient number of real intra-oral scan data can be obtained for training (Hyun et al 2021).
Figure 10. Comparison of metal artifact reduction with clinical CBCT data and simulated intra-oral scan data: image domain learning (Img DL) and the proposed method. The Hausdorff distance between the tooth boundary segmented manually from a CT image and the corresponding intra-oral scan data is provided as a yellow value. The distance was computed in the region of a yellow box.
Intra-oral scanning is very accurate for small-area scans, but its accuracy gradually decreases as the scan moves away from its starting point because of cumulative stitching errors (Nagy et al 2020). Recent advances in intra-oral scan technology have significantly reduced stitching errors in full-arch description (Winkler and Gkantidis 2020). Specifically, an average full-arch description error of 0.008 83 ± 0.010 88 mm in an in vivo analysis was reported by Kwon et al (2021) for the intra-oral scanner (i500, MEDIT, Seoul, South Korea) used in this study. In our CBCT imaging setup with a spatial resolution of 0.2 mm, the errors can produce variations of 1 pixel or 2 pixels. This error can be effectively addressed using a previously proposed stitching error correction method, which mitigates the possible influence of error-related shape variation on MAR performance.
Figure 11. Comparison of metal artifact reduction with clinical CBCT data: image domain learning (Img DL), the proposed method with real intra-oral scan data, and the proposed method with simulated intra-oral scan data. In the second row, we provide an overlay of each reconstructed image with the corresponding intra-oral scan data (solid apricot line). The Hausdorff distance between the tooth boundary segmented manually from a CT image and the corresponding intra-oral scan data is provided as an orange (for real intra-oral scan) or yellow (for simulated intra-oral scan) value. The distance was computed in the region of a yellow box.
Figure 12. CBCT-based 3D bone-teeth-jaw modeling via the proposed method with clinical CBCT and real intra-oral scan data. The Hausdorff distance between the tooth boundary obtained from a model and the corresponding real intra-oral scan data is provided as an orange value. Here, the value was obtained by computing the Hausdorff distance at each 2D slice in the region of a yellow box and then taking the average over the slices.
The ability of the proposed MAR method can be further improved through more complex network architectures and a larger training dataset. However, there is a trade-off with the total computational cost of learning, which can be critical, especially in high-dimensional data applications. Even for the simple M-net architecture shown in figure 4, at least 10 days are required to train for 300 epochs with a dataset of 60 image volumes under the computational resources used in this study. Even though the use of sophisticated networks or large training datasets can potentially enhance MAR capability, the associated hurdles involving high dimensionality should be addressed for practical dental CBCT applications.

Appendix. Cross validation for MAR performance comparison
A total of twenty metal-free scans were equally split into four non-overlapping folds. One fold was retained for testing, and the remaining three folds were used for training. This validation process was repeated four times, once for each fold. From each scan in a fold, we generated four (for a training fold) or two (for a test fold) realistic metal-artifacted data using the method described in section 3.2. In each iteration (or partition), a total of 60 paired data (= 4 syntheses × 5 scans × 3 folds) were used for training, and a total of 10 paired data (= 2 syntheses × 5 scans × 1 fold) were used for testing. Figure A1 illustrates the 4-fold cross-validation process. Table A1 shows the normalized mean square error for the test data in the four different partitions. The result shows that the proposed method outperforms the Img DL method on all partitions.
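The partitioning arithmetic above can be checked with a short Python sketch. The function names and the use of consecutive scan indices for the folds are assumptions for illustration; the actual assignment of scans to folds is not stated in the text.

```python
def four_fold_partitions(scan_ids, n_folds=4):
    """Split scans into equal, non-overlapping folds and list (train, test) pairs."""
    fold_size = len(scan_ids) // n_folds
    folds = [scan_ids[k * fold_size:(k + 1) * fold_size] for k in range(n_folds)]
    partitions = []
    for k in range(n_folds):
        test = folds[k]
        train = [s for j, fold in enumerate(folds) if j != k for s in fold]
        partitions.append((train, test))
    return partitions

def count_paired_data(train, test, per_train_scan=4, per_test_scan=2):
    """Number of synthesized pairs: 4 per training scan and 2 per test scan."""
    return len(train) * per_train_scan, len(test) * per_test_scan

if __name__ == "__main__":
    for train, test in four_fold_partitions(list(range(20))):
        print(count_paired_data(train, test))
```

Each of the four partitions uses 15 training scans and 5 test scans with no scan shared between them, which yields the 60 training pairs and 10 test pairs stated above.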