A computational strategy for the identification of pulmonary squamous cell carcinoma in computerized tomography images

The objective of this work is to propose a computational strategy to identify lung squamous cell carcinoma in three-dimensional (3D) multislice computerized tomography databases. The strategy consists of pre-processing, segmentation, and post-processing stages. During pre-processing, a gradient-based anisotropic diffusion algorithm and a filter bank are used to address image noise and artifacts. During segmentation, the region growing technique is applied to the pre-processed images. Finally, during post-processing, a morphological dilation filter is applied to the segmented images. To assess the performance of the proposed strategy, the relative percentage error (RPE) is used to compare the dilated segmentations of the squamous cell carcinoma with the segmentations generated manually by a pulmonologist. The combination of parameters linked to the lowest RPE establishes the optimal parameters of each of the algorithms that make up the proposed strategy.


Introduction
In clinical and social contexts, the many diseases that affect the lungs are of great interest. In 2018, the World Health Organization (WHO) published the main causes of death in the world, which include cardiovascular diseases, cancer, diabetes and chronic lung diseases [1].
One of the lung diseases of special interest for the present work is lung cancer and, specifically, squamous cell carcinoma (SqCC). Lung cancer is classified into non-small cell cancer (NSCC) and small cell cancer (SCC). Approximately 85% of the cases correspond to NSCC and 15% to SCC. The main types of NSCC are adenocarcinoma and squamous cell carcinoma. SqCC forms in the large airways (bronchi) that connect the trachea to the lung. Initially, the cancer is found only in the lung, but it usually metastasizes later [2]. Smoking is the most likely cause of SqCC; the risk of lung cancer increases with the number of years a person smokes and the number of cigarettes smoked each day. Less frequent causes include breathing second-hand smoke, occupational exposure to asbestos or other carcinogens, exposure to radon, previous treatment with radiotherapy or chemotherapy, and infection with the human immunodeficiency virus (HIV) [3].
IOP Conf. Series: Journal of Physics: Conf. Series 1160 (2019) 012004, IOP Publishing, doi: 10.1088/1742-6596/1160/1/012004
One way to detect SqCC is through multislice computerized tomography (MSCT), which is considered the standard in pulmonary imaging because it provides anatomical data of great utility in the diagnosis and monitoring of patients with lung diseases. MSCT makes it possible to observe the size, shape and position of any tumor in the lung, and can help locate enlarged lymph nodes that may contain cancer as a result of metastasis [4].
The images acquired with a tomograph present problems that affect the quality of the information they contain. These problems are noise (Poisson type) and artifacts (dark band, shading, ring and staircase); thus, the images must be processed with computational techniques that minimize such imperfections [5].
In addition, to identify pulmonary anatomical structures of interest, the pulmonologist performs a manual segmentation process that is operator-dependent and generally time-consuming. For this reason, the present work proposes an effective and efficient computational strategy for identifying SqCC in MSCT images, in order to improve reproducibility and reduce the time required to detect the pathology. For the same reason, research on the segmentation of lung tumors using computational techniques has been carried out worldwide. In this sense, Yang et al. [6] propose pulmonary tumor segmentation based on template matching and region growing.
Their method consists of three steps. First, the bone is removed from the computerized tomography (CT) images. Second, a multi-scale Gaussian filter is used to locate lung tumors in three-dimensional positron emission tomography (PET) images, in order to eliminate the liver and heart, which have similar voxel values but different texture. Third, the seeds are obtained automatically through template matching, so that the segmentation process is fully automatic. A Euclidean distance measure is added to the region growing criterion in order to accelerate the convergence of the segmentation and avoid over-segmentation. The method was tested on 10 PET-CT images of 5 patients with lung tumors, and the average Dice coefficient (Dc) was greater than 0.90.
In addition, Ait et al. [7] propose a segmentation of pulmonary CT images using the U-net convolutional neural network architecture, one of the most widely used architectures in deep learning for image segmentation. The architecture consists of a contracting path that extracts high-level information and a symmetric expanding path that recovers the necessary information. The experimental results show a precise segmentation with a Dc of 0.9502.
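For reference, the Dice coefficient reported in these studies measures the overlap between two binary segmentations A and B as Dc = 2|A ∩ B| / (|A| + |B|). The following minimal Python sketch illustrates the computation; the function name, the flat-list mask representation and the convention for two empty masks are assumptions made for illustration and are not taken from [6] or [7].

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks given as flat sequences.

    Hypothetical helper: masks are flattened voxel lists where any
    truthy value marks a voxel that belongs to the segmentation.
    """
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    if len(a) != len(b):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labelled as tumor in both segmentations.
    overlap = sum(1 for p, q in zip(a, b) if p and q)
    total = sum(a) + sum(b)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * overlap / total
```

A value of 1 indicates identical segmentations and a value of 0 indicates no overlap.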
The present investigation extends the work presented in [8] to pulmonary images. The main contributions are: a) the development of an efficient automatic algorithm to segment the SqCC; b) the use of the relative percentage error (RPE) to perform a comparative study between manual and automatic segmentations, in such a way that the RPE allows establishing the optimal parameters of the algorithms that make up the proposed technique.

Description of the databases
The databases (DB) used are available at: https://xnat.bmia.nl/app/action/ProjectDownloadAction/project/stwstrategyln1. They were acquired through the MSCT modality and consist of three-dimensional (3D) images corresponding to chest studies of 3 male patients. Figure 1 presents a schematic diagram that summarizes the computational algorithms that make up the automatic technique for segmentation of the SqCC.

Description of the algorithms that are part of the proposed technique
Of all the algorithms that appear in the block diagram presented in Figure 1, only the anisotropic diffusion filter is described here, since the rest of the algorithms are described in detail in [6,8,9] and [10]. The seed required for region growing is obtained with a least squares support vector machine (LSSVM), by applying a process analogous to that presented in [8].
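For orientation only, seeded region growing can be sketched in a few lines of Python. This is a generic variant that accepts 6-connected voxels whose intensity lies within a tolerance of the seed intensity; the acceptance criterion, the function name and the nested-list volume layout are assumptions for illustration and may differ from the criterion detailed in the cited references.

```python
from collections import deque


def region_growing_3d(volume, seed, tolerance):
    """Grow a region from `seed` over a volume indexed as volume[z][y][x].

    Hypothetical criterion: a neighbor joins the region when its
    intensity differs from the seed intensity by at most `tolerance`.
    Returns a boolean mask with the same shape as `volume`.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    sz, sy, sx = seed
    seed_value = volume[sz][sy][sx]
    mask = [[[False] * width for _ in range(height)] for _ in range(depth)]
    mask[sz][sy][sx] = True
    queue = deque([(sz, sy, sx)])
    # 6-connectivity: face neighbors along z, y and x.
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in offsets:
            nz, ny, nx = z + dz, y + dy, x + dx
            inside = 0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
            if not inside or mask[nz][ny][nx]:
                continue
            if abs(volume[nz][ny][nx] - seed_value) <= tolerance:
                mask[nz][ny][nx] = True
                queue.append((nz, ny, nx))
    return mask
```

In the proposed technique the seed coordinates would come from the LSSVM rather than being supplied by hand.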

Gradient-based anisotropic diffusion filter (ADF).
Anisotropic diffusion filters, and their discrete implementation based on the approximation of partial derivatives by finite differences, were introduced in image processing by Perona and Malik [11]. The purpose of applying such filters is to smooth the information contained within the regions delimited by the edges of the objects present in an image. Anisotropic diffusion filters can be modeled mathematically by Equation (1).
$$I_t(x, t) = \operatorname{div}\big(c(x, t)\,\nabla I(x, t)\big) \qquad (1)$$
where $\nabla I(x, t)$ is the gradient of the image at voxel $x$ during iteration (time) $t$, $I_t(x, t)$ is the partial derivative of $I(x, t)$ with respect to $t$, and $c(x, t)$ is the conductivity function given by Equation (2).
$$c(x, t) = \exp\left(-\left(\frac{\lVert \nabla I(x, t) \rVert}{k}\right)^{2}\right) \qquad (2)$$
where $k$ is the conductivity parameter. As observed in Equations (1) and (2), anisotropic filters use an edge detector that guides the diffusion process. Normally, such equations are solved numerically using finite differences, by means of an explicit scheme that smooths the image iteratively at each time increment. However, in the presence of noisy contours, diffusion filters tend to degrade the edges of the images they process in proportion to the number of iterations. For this reason, the number of iterations must be chosen carefully, so that this degradation is not excessive.
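The explicit finite-difference scheme mentioned above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it assumes the exponential Perona-Malik conductivity exp(-(|d|/k)^2) evaluated on the difference d between face neighbors, unit voxel spacing, a 6-neighborhood and zero-flux borders. The function name, the default values and the nested-list layout volume[z][y][x] are hypothetical.

```python
import math


def anisotropic_diffusion_3d(volume, iterations=5, time_step=0.0625, k=1.0):
    """Explicit Perona-Malik diffusion on a volume indexed as volume[z][y][x].

    Assumptions (for illustration): exponential conductivity, unit voxel
    spacing, 6-neighborhood, and no flux across the volume border.
    The input volume is not modified; a smoothed copy is returned.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    current = [[[float(v) for v in row] for row in plane] for plane in volume]
    for _ in range(iterations):
        updated = [[row[:] for row in plane] for plane in current]
        for z in range(depth):
            for y in range(height):
                for x in range(width):
                    flux = 0.0
                    for dz, dy, dx in offsets:
                        nz, ny, nx = z + dz, y + dy, x + dx
                        if 0 <= nz < depth and 0 <= ny < height and 0 <= nx < width:
                            diff = current[nz][ny][nx] - current[z][y][x]
                            # Conductivity is close to 1 for small differences
                            # (smoothing) and close to 0 across strong edges.
                            flux += math.exp(-((diff / k) ** 2)) * diff
                    updated[z][y][x] = current[z][y][x] + time_step * flux
        current = updated
    return current
```

With this discretization each update stays within the range of the neighboring values when the time step does not exceed 1/6, which is why a small default is used in the sketch.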

Obtaining optimal parameters
The adequate performance of the proposed technique requires obtaining optimal parameters for each of the algorithms that compose it. To do this, using the DB as a reference, the parameters of the algorithm being tuned are modified by systematically sweeping the values that belong to certain ranges, as described below.
The parameters of the gradient-based anisotropic diffusion filter are the number of iterations [1, 100], the time step [0.01, 1] and the conductivity [0.1, 10].
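The systematic sweep over parameter ranges can be sketched as an exhaustive grid search that keeps the combination with the smallest error. The sampling of each range is not stated in the text, so the grid below is hypothetical; `error_of` is also a hypothetical callable that would run the pipeline with the given parameters and return the error against the manual segmentation.

```python
import itertools

# Hypothetical sampling of the ranges quoted for the diffusion filter.
HYPOTHETICAL_ADF_GRID = {
    "iterations": [1, 25, 50, 100],
    "time_step": [0.01, 0.05, 0.1, 1.0],
    "conductivity": [0.1, 1.0, 5.0, 10.0],
}


def grid_search(parameter_grid, error_of):
    """Return (best_params, best_error) over every combination in the grid.

    `error_of` receives a dict mapping parameter names to values and must
    return a number to be minimized. Ties keep the first combination found.
    """
    names = sorted(parameter_grid)
    best_params, best_error = None, float("inf")
    for values in itertools.product(*(parameter_grid[name] for name in names)):
        params = dict(zip(names, values))
        error = error_of(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error
```

Exhaustive search is simple and reproducible, at the price of a cost that grows with the product of the grid sizes.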
The parameters of the filter based on Gaussian smoothing, in the 3D domain, are: a) the standard deviations in each direction in which filtering is applied; b) the size of the Gaussian kernel.
During smoothing, to reduce the number of parameters of the Gaussian filter, the size of its neighborhood is arbitrarily set to 3 × 3 × 3, while its standard deviation takes all the values included in the interval, with a step size of 0.25.
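As an illustration of this setup, the sketch below builds a normalized 3 × 3 × 3 Gaussian kernel for a given standard deviation and enumerates candidate standard deviations with a step of 0.25. It assumes the same standard deviation in all three directions; the function names are hypothetical, and the interval bounds passed to `candidate_sigmas` are placeholders supplied by the caller, since they are not taken from the text.

```python
import math


def gaussian_kernel_3x3x3(sigma):
    """Normalized isotropic Gaussian weights on a 3 x 3 x 3 neighborhood."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    offsets = (-1, 0, 1)
    weights = [
        [
            [math.exp(-(dz * dz + dy * dy + dx * dx) / (2.0 * sigma * sigma))
             for dx in offsets]
            for dy in offsets
        ]
        for dz in offsets
    ]
    total = sum(w for plane in weights for row in plane for w in row)
    # Dividing by the total makes the weights sum to 1, preserving mean intensity.
    return [[[w / total for w in row] for row in plane] for plane in weights]


def candidate_sigmas(low, high, step=0.25):
    """Standard deviations from `low` to `high` inclusive, spaced by `step`."""
    count = int(round((high - low) / step))
    return [low + i * step for i in range(count + 1)]
```

Each candidate kernel would then be convolved with the diffused volume and scored during tuning.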
The parameters of the LSSVM, σ² and γ, are established assuming that the cost function is convex and performing the experiments described in [8]. The optimal parameters of the LSSVM are those values of γ and σ² that correspond to the minimum relative percentage error, calculated from the manual coordinates of the reference seed, established by the pulmonologist, and the automatic ones generated by the LSSVM.
During the tuning process, each of the automatic segmentations of the tumor, obtained using the region growing (RG) technique, is compared with the manual segmentation of the tumor generated by a pulmonologist by means of the RPE, which is calculated using the mathematical model given by Equation (3).
$$\mathrm{RPE} = \frac{\lvert V_m - V_a \rvert}{V_m} \times 100 \qquad (3)$$
where $V_m$ and $V_a$ are the manual and automatic segmentation volumes, respectively.
Figure 2 shows a 2D view of both the original SqCC and the processed versions after applying the proposed technique to the databases considered. The SqCC of each database is identified with a red cross on each of the images.
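The comparison between manual and automatic volumes can be sketched as follows, assuming the usual form of the relative percentage error, in which the absolute volume difference is divided by the manual (reference) volume and multiplied by 100. The function names and the voxel-counting volume estimate are assumptions made for illustration.

```python
def mask_volume(mask, voxel_volume):
    """Volume of a binary mask indexed as mask[z][y][x].

    `voxel_volume` is the physical volume of one voxel (for example in
    cubic millimeters), taken from the image spacing.
    """
    voxels = sum(1 for plane in mask for row in plane for v in row if v)
    return voxels * voxel_volume


def relative_percentage_error(manual_volume, automatic_volume):
    """Relative percentage error with the manual volume as the reference."""
    if manual_volume <= 0:
        raise ValueError("manual volume must be positive")
    return 100.0 * abs(manual_volume - automatic_volume) / manual_volume
```

For instance, a manual volume of 100 and an automatic volume of 95, in the same units, give an error of 5%.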

Qualitative results
The rows of Figure 2, from first to last, show the following images: original, anisotropic diffusion, Gaussian smoothing, gradient magnitude, erosion, segmentation and dilation.

Quantitative results
The tuning process for a particular filter stops when the optimum value of its parameters is obtained according to the relative percentage error. Table 1 and Table 2 show the optimal parameters of the proposed strategy for the segmentation of the lung tumors. Table 3 presents the volumes of each tumor for both the manual segmentation (performed by the specialist) and the automatic segmentation, together with the corresponding relative percentage errors.

Contributions, differences and expectations
The detection of lung cancer through digital image processing is an important tool for diagnosis and treatment. Different methods have been proposed to detect the cancer cells of a tumor present in the lungs.
A detailed study of the literature shows a bias towards techniques such as multiscale templates, deep learning and fuzzy logic, whose high computational cost limits their use with three-dimensional MSCT images.
In contrast, the proposed strategy generates the three-dimensional segmentations of the SqCC with a computation time of 25.98 seconds, which includes the complete execution of the strategy.
Moreover, the strategy exhibits a robust design and development, and obtains effective segmentations based on mathematical models.
It is expected that the automatic segmentations generated by the proposed method will be useful to promote and deepen the study of the real anatomy of the structures linked to the lungs. Future work could evaluate the performance of the proposed technique on a larger number of pulmonary MSCT databases belonging to different subjects.

Conclusions
In the present work, an automatic technique was developed that allows an accurate segmentation of the SqCC present in computed tomography images.
In addition, the filter bank considered made it possible to minimize the problems present in the MSCT images, such as noise and artifacts.
The designed technique behaves effectively when segmenting SqCC, since the relative percentage error did not exceed 5%.
The segmentations generated automatically by the proposed technique allow the volume of each SqCC to be calculated accurately and efficiently.