Lung cancer classification using wavelet recurrent neural network

In this book chapter, we combine Wavelet and Recurrent Neural Network to classify lung cancer based on lung image. Wavelet is used to remove the noise from the original image, while Recurrent Neural Network is used for the classification process.

1.1. Introduction

Cancer is one of the leading causes of death in the world [1]. One of the most common types is lung cancer which has the highest mortality rate among all types of cancer [2]. Early detection may increase the survival rate of patients significantly. An early detection method to find out if someone has lung cancer is a lung radiology test (lung image). This image provides information on whether the lung is normal or not. A normal lung will show no nodules in the image, whereas an abnormal lung will show some nodules in the image. However, these nodules do not always indicate lung cancer since they can be caused by other diseases such as pneumonia or tuberculosis [3]. Nodules detected in the lung can be classified into two categories, namely non-cancerous nodules (benign) and cancerous nodules (malignant).

Therefore, to be able to classify the nodules for early detection of lung cancer based on the lung image, we need classification methods. The neural network model has been developed by researchers to classify lung cancer based on lung image. The recurrent neural network is one type of neural network model that has a feedback link on the network. As an image contains noise, the noise needs to be removed from the original image before applying the neural network network model for the classification process. This is known as the image denoising process. This process aims to get an image with better/higher quality. One of the methods of image denoising is using wavelet. The model that combines wavelet and recurrent neural network is called wavelet recurrent neural network (WRNN) [4].

Most of the models for lung cancer classification based on lung cancer images are various types of neural network model with binarization image pre-processing. The previous research for lung cancer classification has been done using the neural network model and it provides an accuracy of 80% [5]. Further research has been done using recurrent neural network model and it provides result with accuracy of 81.33% [6]. This research is not concerned with the image characteristics. Therefore, in this chapter, we will use WRNN to classify lung cancer based on lung images. Wavelet is concerned with the image characteristics and is used to remove the noise from the original image, while recurrent neural network is used for the classification process. As it is an elaborated model, it is expected to produce better/higher accuracy.

1.2. Lung cancer and lung image

1.2.1. Lung cancer

Cancer is a leading cause of death worldwide. According to the report, there were an estimated 9.6 million deaths in 2018 caused by cancer [1]. Globally, 1 in 6 deaths is caused by cancer. Lung cancer is the most common cancer and is responsible for 2.09 million cases. Moreover lung cancer is also the most common cause of cancer death with 1.76 million deaths [2].

Smoking is not the only cause of lung cancer. However, smoking is a significant factor that causes lung cancer. It is responsible for about 22% of deaths from lung cancer [1]. Active smokers are the most vulnerable group to lung cancer. Passive smokers also have a risk of lung cancer. Cigarettes and cigarette smoke contain more than 4000 chemicals and more than 60 toxic and carcinogenic substances. In the initial stages, the toxic substances do not affect the function of the lung organs, but the longer a person smokes, the damage to lung tissue will increase. This damage will cause cells to react abnormally and uncontrollably until cancer cells finally appear.

Early detection may increase the survival rate of patients significantly. Early detection that can be used to establish whether someone has lung cancer can be done through lung radiology tests (images of the lungs), called chest x-ray radiography. These images provide information on whether the lungs are normal or not. Normal lungs will show no nodules in the image, whereas abnormal lungs will show some nodules in the lung images. However, these nodules are not always indications of lung cancer since they can be caused by other diseases such as pneumonia or tuberculosis [3]. Nodules detected in the lungs can be classified into two categories, namely non-cancerous nodules (benign) and cancerous nodules (malignant).

1.2.2. Lung image

The image is a matrix of the pixel. It is arranged in rows and columns. The pixel describes the row and column location. It also has intensity symbolized as p(x,y), where x is the row location and y is the column location of the pixel. Based on its intensity, the image can be categorized into three types, namely RGB, grayscale, and binary.

1.2.2.1. RGB image

RGB (red, green, blue) are the basic colors accepted by human eyes. Each pixel in the RGB image represents three basic colors. Each base color has intensity with a minimum value of 0 and a maximum value of 255. Mathematically, the intensity of each basic color of an RGB image is:

$\begin{eqnarray}&&0\leqslant p(x,y)\leqslant 255\end{eqnarray} \tag{ 1.1 }$

Figure 1.1 is an example of an RGB image. This image can be defined in a matrix sized 8×8 which contains its pixel. The red color matrix of figure 1.1 is as follows:

The green color matrix of figure 1.1 is as follows:

The blue color matrix of figure 1.1 is as follows:

1.2.2.2. Grayscale image

A grayscale image has only one intensity value for each pixel. It determines that red = green = blue. Each pixel of the grayscale image is a gray shadow that has an intensity value of 0 (black) to 255 (white). Mathematically, the intensity of the grayscale image can be written as follows:

$\begin{eqnarray}&&0\leqslant p(x,y)\leqslant 255\end{eqnarray} \tag{ 1.2 }$

Figure 1.2 is lung image as one of the example of grayscale image lung image [7]. The graysacle matrix of figure 1.2 is as follows:

1.2.2.3. Binary image

A binary image only contains two colors, that is black (0) and white (1). Mathematically, the intensity of a binary image can be defined as follows.

$\begin{eqnarray}&&p\left(x,y\right)=\left\{\begin{array}{c}\,0,{\rm{black}}\\ 1,{\rm{white}}\end{array}\right.\end{eqnarray} \tag{ 1.3 }$

If we transform figure 1.2 to a binary image, the image will be as figure 1.3. The matrix of figure 1.3 is as follows:

1.2.3. Image processing

Image processing applies to visually analyzing the lung image. This is a method for performing several operations on an image. Image processing aims to extract useful information from an image. Some extraction features to signify the lung classification can be extracted using several image processing techniques, such as an enhancement method to get a better image quality. Enhancements are applied to reduce or eliminate noise.

Noise is random speckles on the surface of the image, it is not part of the image. One of the most common noise models is additive noise. This noise adds uniformly in the field of imagery. Additive noise is formulated as follows:

$\begin{eqnarray}&&p\left(x,y\right)={p}_{a}\left(x,y\right)+{p}_{n}\left(x,y\right)\end{eqnarray} \tag{ 1.4 }$

where p(x,y) is the pixel value the original image, ${p}_{a}\left(x,y\right)$ is the pixel value of the denoised image, and $\,{p}_{n}\left(x,y\right)$ is the pixel value of the noise.

Removing noise from an image is called as the denoising step/pre-processing process before the image processing. One technique of the denoising step is to use the wavelet transforms. Wavelet will be discussed in the next section.

After the denoised image is obtained, the next step is to get feature extraction by applying the gray level co-occurrence matrix (GLCM). This is a second-order statistical method which is different from first-order statistical methods. Second-order statistical methods involve environmental values with the consideration that the input is in the form of a matrix. In the next section, we will discuss the GLCM to obtain some feature extractions.

1.3. Classification process

1.3.1. Classification

Classification is defined as a grouping process based on the characteristics of similarity and difference. This can be used when the information about how the data is grouped has been predetermined. The training process is carried out with training data that has been labeled into groups, while the testing process will classify the testing data into the existing groups. Classification is a type of supervised learning. Supervised learning is a machine learning task to learn functions that map input to output based on examples of input–output pairs [8].

Sometimes there is a misinterpretation between classifications and clustering. Both classification and clustering are used to categorize objects into one or several classes based on features/characteristics. They appear to be a similar process, but in this case clustering is used when it is not known how the data should be grouped. The number of groups in the clustering process is self-assumed by the model without predetermination. The output of the clustering process is grouped data. The differences of classification and clustering are summarized in table 1.1.

Table 1.1. Classification and clustering.

Classification	Clustering
Supervised learning	Unsupervised learning
Infinite set of input data	Finite set of data
Based on training data	Based on prior knowledge
Output target	No output target
Used to classify future observations	Used to understand the data
Goal: assigning new input to a class	Goal: finding similarities within given data

Based on the summarization table 1.1, classification is type of supervised learning. There is an output target in classification, while clustering is a type of unsupervised learning. There is no output target in clustering. Classification is more complex than clustering since there are many levels in the classification process, whereas the clustering process is only a grouping process based on similarities of the given data. Examples of classification methods are logistic regression, Naive Bayes classifier, support vector machines, neural networks, etc. Examples of clustering methods are the k-means clustering algorithm, the fuzzy c-means clustering algorithm, the Gaussian clustering algorithm, etc. In the next section we will discuss classification using neural networks.

1.3.2. Features extraction

Image extraction is a technique for collecting features from an image. The GLCM method is one of the methods that can be used for image extraction. This type of image extractions results in 14 features, i.e., energy, contrast, correlation, the sum of square variance, Inverse Difference Moment (IDM), sum average, sum entropy, sum variance, entropy, difference variance, difference entropy, maximum probability, homogeneity, and dissimilarity [9]. All of the features extraction formulated as following detailed:

1.
Energy: energy is the sum of square elements in GLCM. If the energy value equals 1 then the image is a constants image. The equation of energy is:
$\begin{eqnarray}&&{\rm{Energy}}=\displaystyle \sum _{x}\displaystyle \sum _{y}{\left(\displaystyle \frac{p\left(x,y\right)}{R}\right)}^{2}\end{eqnarray} \tag{ 1.5 }$
with p(x,y) = pixel value row- x and column- y, $R=\sum p\left(x,y\right)$ . R is a constant of normalization such that the normalized pixel value satisfies the mass probability function characteristics, that is $0\leqslant \frac{p\left(x,y\right)}{R}\leqslant 1$ and $\sum \frac{p\left(x,y\right)}{R}=1$ .
2.
Contrast: contrast is the measure of contrast intensity between one pixel and another pixel. It is formulated as follows.
$\begin{eqnarray}&&{\rm{Contrast}}=\displaystyle \sum _{{x}}\displaystyle \sum _{{y}}\displaystyle \frac{{p}\left({x},{y}\right)}{{R}}{({x}-{y})}^{2}\end{eqnarray} \tag{ 1.6 }$
3.
Correlation: correlation is used to measure the linear level dependence of gray (gray level) between pixels in one position to another position. The equation of correlation is formulated as follows:
$\begin{eqnarray}&&{\rm{Correlation}}=\displaystyle \sum _{{x}}\displaystyle \sum _{{y}}\displaystyle \frac{\displaystyle \frac{{xyp}\left({x},{y}\right)}{{R}}-{\mu }_{{x}}{\mu }_{{y}}}{{{\rm{\sigma }}}_{{x}}{{\rm{\sigma }}}_{{y}}}\end{eqnarray} \tag{ 1.7 }$
with ${{\rm{\mu }}}_{{\rm{x}}}$ = $\displaystyle \sum _{{x}}\displaystyle \sum _{{y}}{xp}\left({x},{y}\right)$ , ${{\rm{\mu }}}_{{y}}=\displaystyle \sum _{{x}}\displaystyle \sum _{{y}}{yp}\left({x},{y}\right)$ , ${{\rm{\sigma }}}_{{x}}=\displaystyle \sum _{{x}}\displaystyle \sum _{{y}}{\left({x}-{{\rm{\mu }}}_{{x}}\right)}^{2}{p}\left({x},{y}\right),$ and ${{\rm{\sigma }}}_{{y}}=\displaystyle \sum _{{x}}\displaystyle \sum _{{y}}{\left({x}-{{\rm{\mu }}}_{{y}}\right)}^{2}{p}\left({x},{y}\right).$
4.
Sum of square variance: the equation of sum of square variance is:
$\begin{eqnarray}&&{\rm{Sum\; of\; Square\; Variance}}=\displaystyle \sum _{x}\displaystyle \sum _{y}\displaystyle \frac{p\left(x,y\right)}{R}{(x-\mu )}^{2}\end{eqnarray} \tag{ 1.8 }$
with μ= $\frac{\displaystyle \sum _{x}\displaystyle \sum _{y}p\left(x,y\right)}{{xy}}$ .
5.
Inverse different moment: inverse different moment is a measure of local homogeneity. It is formulated as follows.
$\begin{eqnarray}&&{\rm{Inverse\; Different\; Moment}}=\displaystyle \sum _{x}\displaystyle \sum _{y}\displaystyle \frac{\displaystyle \frac{p\left(x,y\right)}{R}}{1+{\left(x-y\right)}^{2}}\end{eqnarray} \tag{ 1.9 }$
6.
Sum average: sum average is formulated as follows.
$\begin{eqnarray}&&{\rm{Sum\; Avearge}}=-\displaystyle \sum _{k}k{p}_{x+y}(k)\end{eqnarray} \tag{ 1.10 }$
with ${p}_{x+y}(k)$ = $\displaystyle \sum _{x=1}^{{N}_{g}}\displaystyle \sum _{y=1}^{{N}_{g}}\frac{p\left(x,y\right)}{R}{\rm{;}}x+y=k{\rm{;}}k=2,3,\ldots ,2{N}_{g}$ and ${N}_{g}$ is the biggest dimension of the image row and column.
7.
Sum entropy: sum entropy is formulated as follows.
$\begin{eqnarray}&&{\rm{Sum\; Entropy}}=-\displaystyle \sum _{k}{p}_{x+y}(k){\rm{log}}{\rm{}}{p}_{x+y}(k)\end{eqnarray} \tag{ 1.11 }$
8.
Sum variance: sum variance shows how many gray levels vary from the average value. The equation of sum variance is as follows.
$\begin{eqnarray}&&{\rm{Sum\; Variance}}=\displaystyle \sum _{k}{\left(i-{SE}\right)}^{2}{p}_{x+y}(k)\end{eqnarray} \tag{ 1.12 }$
9.
Entropy: the equation of entropy is:
$\begin{eqnarray}&&{\rm{Entropy}}=-\displaystyle \sum _{x}\displaystyle \sum _{y}\displaystyle \frac{p\left(x,y\right)}{R}{\rm{log}}p(x,y)\end{eqnarray} \tag{ 1.13 }$
10.
Difference variance: difference variance is formulated as follows.
$\begin{eqnarray}&&{\rm{Different\; Variance}}={\rm{variance}}\,{\rm{of}}\,{p}_{x-y}(k)\end{eqnarray} \tag{ 1.14 }$
with ${{p}_{x-y}}_{({\rm{k}})}$ = $\displaystyle \sum _{x=1}^{{N}_{g}}\displaystyle \sum _{y=1}^{{N}_{g}}\frac{p\left(x,y\right)}{R},\left|x-y\right|=k$ and $k=0,1,\,2,\ldots ,({N}_{g}-1)$ .
11.
Difference entropy: difference entropy is formulated as follows.
$\begin{eqnarray}&&{\rm{Different\; Entropy}}=-\displaystyle \sum _{k}{p}_{x+y}(k){\rm{log}}\,{p}_{x+y}(k)\end{eqnarray} \tag{ 1.15 }$
12.
Maximum probability: the equation of maximum probability is:
$\begin{eqnarray}&&{\rm{Maximum\; Probability}}={{\rm{\max }}}_{x,y}\{\displaystyle \frac{p\left(x,y\right)}{R}\}\end{eqnarray} \tag{ 1.16 }$
13.
Homogeneity: homogeneity is a measure of the closeness of distribution of elements in the GLCM to the diagonal of the GLCM. It is formulated as follows:
$\begin{eqnarray}&&{\rm{Homogeneity}}=\displaystyle \sum _{x}\displaystyle \sum _{y}\displaystyle \frac{\displaystyle \frac{p\left(x,y\right)}{R}}{1+{\rm{| }}x-y{\rm{| }}}\end{eqnarray} \tag{ 1.17 }$
14.
Dissimilarity shows the difference of each pixel. It is formulated as follows. Dissimilarity:

$\begin{eqnarray}&&{\rm{Dissimilarity}}=\displaystyle \sum _{x}\displaystyle \sum _{y}\displaystyle \frac{p\left(x,y\right)}{R}{\rm{| }}x-y{\rm{| }}\end{eqnarray} \tag{ 1.18 }$

Here is an example of feature extraction's manual calculation. Let us do features extraction of figure 1.3, then the energy and dissimilarity value are calculated as follow:

$\begin{eqnarray*}\begin{array}{rcl}R&=&\sum p(x,y)=p\left(1,1\right)+\,p\left(1,2\right)+\ldots \,+\,p\left(8,7\right)+\,p\left(8,8\right)=1+\,1+\ldots \\&&+1+0=55\end{array}\end{eqnarray*}$

$\begin{eqnarray*}\begin{array}{rcl}{\rm{Energy}}&=&\displaystyle \sum _{x}\displaystyle \sum _{y}{p(x,y)}^{2}={\displaystyle \frac{p\left(1,1\right)}{R}}^{2}+\ldots +\,{\displaystyle \frac{p\left(8,7\right)}{R}}^{2}+{\displaystyle \frac{p\left(8,8\right)}{R}}^{2}={\displaystyle \frac{1}{55}}^{2}+\ldots\\&& +{\displaystyle \frac{1}{55}}^{2}+{\displaystyle \frac{0}{55}}^{2}=0.018182\end{array}\end{eqnarray*}$

$\begin{eqnarray*}\begin{array}{rcl}{\rm{Dissimilarity}}&=&\displaystyle \sum _{x}\displaystyle \sum _{y}\displaystyle \frac{p\left(x,y\right)}{R}{\rm{| }}x-y{\rm{| }}={\displaystyle \frac{p\left(1,1\right)}{R}}^{2}\left|1-1\right|+{\displaystyle \frac{p\left(1,2\right)}{R}}^{2}\left|1-2\right|+\ldots \\&&+\,{\displaystyle \frac{p\left(8,7\right)}{R}}^{2}\left|8-1\right|+{\displaystyle \frac{p\left(8,8\right)}{R}}^{2}\left|8-8\right|={\displaystyle \frac{1}{55}}^{2}.0+{\displaystyle \frac{1}{55}}^{2}.1+\,\ldots \\&&+{\displaystyle \frac{1}{55}}^{2}.1+{\displaystyle \frac{0}{55}}^{2}.0=0.05124\end{array}\end{eqnarray*}$

From the calculation above, we can conclude that figure 1.3 has an energy value 0.018 182 and a dissimilarity value 0.051 24.

1.3.3. Wavelet

Wavelet is an analytical tool commonly used to present data or functions operators into different frequency components, and then review each component with a resolution according to its scale [10]. Wavelet is a small wave that has its energy concentrated in time to give a tool for the analysis of transient, non-stationary, or time-varying phenomena [11]. Wavelet can concentrate image energy in some coefficients. The coefficients separate into two categories: coefficients with high energy and coefficients with low energy. The coefficients with low energy can be removed since they do not give significant information.

Wavelet has a gender, i.e., the father wavelet φ and the mother wavelet ψ [12]. The integral of the father wavelet equals 1 and the integral of the mother wavelet equals 0. They are mathematically defined as follows.

$\begin{eqnarray}&&\int \varphi \left(t\right){dt}=1\,\mathrm{and}\int \psi \left(t\right){dt}=0\,\end{eqnarray} \tag{ 1.19 }$

The father wavelet is used to represent the smooth and low-frequency part of the signal and the mother wavelet is used to represent the detailed and high-frequency part of the signal.

1.3.3.1. Haar wavelet

Haar Wavelet is the base of the other wavelet types (figure 1.4). The mother wavelet function is defined as follows:

$\begin{eqnarray}{\psi }_{j,k}\left(x\right)=\left\{\begin{array}{c}1,{\rm{\& }}x\epsilon \left[0,\displaystyle \frac{1}{2})\right.\\ -\,1,{\rm{\& }}x\epsilon \left[\displaystyle \frac{1}{2},1)\right.\\ 0,\,{others}\end{array}\right.\end{eqnarray} \tag{ 1.20 }$

While the father wavelet function is defined as follows.

$\begin{eqnarray}{\varphi }_{j,k}\left(x\right)=\left\{\begin{array}{c}1,{\rm{\& }}x\epsilon [0,1)\\ 0,\,{others}\end{array}\right.\end{eqnarray} \tag{ 1.21 }$

**Figure 1.4.** Haar wavelet.
Download figure:
Standard image High-resolution image

Given a signal that is defined by $f=({f}_{1},{f}_{2},{f}_{3},\ldots ,{f}_{N})$ , where $N={2}^{J}$ and J is a positive integer, so the first level of Haar transformation decomposed the signal f becoming two sub-signal ${a}^{1}$ and ${d}^{1}$ , where the length of each sub-signal is half the length of signal f. Sub-signal ${a}^{1}=({a}_{1},{a}_{2},{a}_{3},\ldots {a}_{\frac{N}{2}})$ is the trend component of Haar transformation and the sub-signal ${d}^{1}=({d}_{1},{d}_{2},{d}_{3},\ldots {d}_{\frac{N}{2}})$ is the fluctuation component of the Haar transformation. Scaling signal ${V}_{m}^{1}$ of the first level of Haar wavelet is ${V}_{1}^{1}=({\alpha }_{1},{\alpha }_{2},0,0,\ldots ,0,0)$ , ${V}_{2}^{1}=(0,0,{\alpha }_{1},{\alpha }_{2},0,0,\ldots ,0,0)$ ,..., ${V}_{N/2}^{1}=(0,0,\ldots ,0,0,{\alpha }_{1},{\alpha }_{2})$ , where ${\alpha }_{1}={\alpha }_{2}=\frac{1}{\sqrt{2}}$ , while the wavelet signal ${W}_{m}^{1}$ of the first level of Haar wavelet is ${V}_{1}^{1}=({\beta }_{1},{\beta }_{2},0,0,\ldots ,0,0)$ , ${V}_{2}^{1}=(0,0,{\beta }_{1},{\beta }_{2},0,0,\ldots ,0,0)$ , ..., ${V}_{N/2}^{1}=(0,0,\ldots ,0,0,{\beta }_{1},{\beta }_{2})$ , where ${\beta }_{1}=\frac{1}{\sqrt{2}}$ and ${\beta }_{2}=-\frac{1}{\sqrt{2}}$ . Then the sub-signal of the Haar transformation can be written as follows:

$\begin{eqnarray}&&{a}_{m}=f.{V}_{m}^{1}\,\mathrm{and}\,{d}_{m}=f.{W}_{m}^{1}\end{eqnarray} \tag{ 1.22 }$

where $m=1,2,3,\ldots ,\frac{N}{2}$ . The wavelet decomposition of each level is as follows.

$\begin{eqnarray}&&{f}_{\longrightarrow }^{{H}_{1}}\left({a}^{1}| {d}^{1}\right)\end{eqnarray} \tag{ 1.23 }$

Then, analogously with the pyramid algorithm, the next level of Haar transformation is as follows:

$\begin{eqnarray}&&{f}_{\longrightarrow }^{{H}_{2}}({a}^{2}{\rm{| }}{d}^{2}{\rm{| }}{d}^{1})\end{eqnarray} \tag{ 1.24 }$

where ${a}^{2}$ and ${d}^{2}$ are calculated from

$\begin{eqnarray}&&{{a}^{1}}_{\longrightarrow }^{{H}_{1}}({a}^{2}{\rm{| }}{d}^{2})\end{eqnarray} \tag{ 1.25 }$

1.3.3.2. Daubhecies wavelet

Daubhecies wavelet is a kind of wavelet which was recognized by Ingrid Daubchecies. There are many kinds of Daubhecies wavelet. In this research, the type of Daubhecies wavelet used is Daubh4. The concept of the Daubh4 wavelet is similar to the Haar wavelet, the difference is the scaling signal ${V}_{m}^{1}$ and the wavelet function ${W}_{m}^{1}$ . Scaling signal ${V}_{m}^{1}$ of the first level of Daubh4 wavelet is ${V}_{1}^{1}=({\alpha }_{1},{\alpha }_{2},{\alpha }_{3},{\alpha }_{4},\ldots ,0,0)$ , ${V}_{2}^{1}=(0,0,0,0,{\alpha }_{1},{\alpha }_{2},{\alpha }_{3},{\alpha }_{4},\ldots ,0,0)$ , ..., ${V}_{N/2}^{1}=(0,0,\ldots ,0,0,{\alpha }_{1},{\alpha }_{2},{\alpha }_{3},{\alpha }_{4})$ , where ${\alpha }_{1}=\frac{1+\sqrt{3}}{4\sqrt{2}},{\alpha }_{2}=\frac{3+\sqrt{3}}{4\sqrt{2}},{\alpha }_{3}=\frac{3-\sqrt{3}}{4\sqrt{2}}$ , and ${\alpha }_{4}=\frac{1-\sqrt{3}}{4\sqrt{2}}$ . While the wavelet signal ${W}_{m}^{1}$ of the first level of Daubh4 wavelet is ${W}_{1}^{1}=({\beta }_{1},{\beta }_{2},{\beta }_{3},{\beta }_{4},\ldots ,0,0)$ , ${W}_{2}^{1}=(0,0,0,0,{\beta }_{1},{\beta }_{2},{\beta }_{3},{\beta }_{4},\ldots ,0,0)$ , ..., ${W}_{N/2}^{1}=(0,0,\ldots ,0,0,{\beta }_{1},{\beta }_{2},{\beta }_{3},{\beta }_{4})$ , where ${\beta }_{1}=\frac{1-\sqrt{3}}{4\sqrt{2}},{\beta }_{2}=\frac{\sqrt{3}-3}{4\sqrt{2}},{\beta }_{3}=\frac{3+\sqrt{3}}{4\sqrt{2}}$ , and ${\beta }_{4}=\frac{-1-\sqrt{3}}{4\sqrt{2}}$ . The decomposition of the first level of Daubh4 wavelet is as follows.

$\begin{eqnarray}&&{f}_{\longrightarrow }^{{D}_{1}}({a}^{1}{\rm{| }}{d}^{1})\end{eqnarray} \tag{ 1.26 }$

Then, analogously with the pyramid algorithm, the next level of Daubh4 transformation is as follows:

$\begin{eqnarray}&&{f}_{\longrightarrow }^{{D}_{2}}({a}^{2}{\rm{| }}{d}^{2}{\rm{| }}{d}^{1})\end{eqnarray} \tag{ 1.27 }$

where ${a}^{2}$ and ${d}^{2}$ are calculated from

$\begin{eqnarray}&&{{a}^{1}}_{\longrightarrow }^{{D}_{1}}({a}^{2}{\rm{| }}{d}^{2})\end{eqnarray} \tag{ 1.28 }$

Here is an example of manual calculation, given a signal $f=\left({f}_{1},{f}_{2},{f}_{3},{f}_{4},{f}_{5},{f}_{6},{f}_{7},{f}_{8}\right)=(2,1,3,-1,2,4,3,4)$ , then the signal f can be transformed as $j=1,\ldots ,(\left({{\rm{log}}}_{2}8)-1=2\right)$ . The Haar level 1 transformation is as follows.

$\begin{eqnarray*}&&{a}_{1}^{1}=\displaystyle \frac{{f}_{1}+{f}_{2}}{\sqrt{2}}=\displaystyle \frac{2+1}{\sqrt{2}}=\displaystyle \frac{3}{\sqrt{2}}\end{eqnarray*}$

$\begin{eqnarray*}&&{a}_{2}^{1}=\displaystyle \frac{{f}_{3}+{f}_{4}}{\sqrt{2}}=\displaystyle \frac{3+(-1)}{\sqrt{2}}=\displaystyle \frac{2}{\sqrt{2}}\end{eqnarray*}$

$\begin{eqnarray*}&&{a}_{3}^{1}=\displaystyle \frac{{f}_{5}+{f}_{6}}{\sqrt{2}}=\displaystyle \frac{2+4}{\sqrt{2}}=\displaystyle \frac{6}{\sqrt{2}}\end{eqnarray*}$

$\begin{eqnarray*}&&{a}_{4}^{1}=\displaystyle \frac{{f}_{7}+{f}_{8}}{\sqrt{2}}=\displaystyle \frac{3+4}{\sqrt{2}}=\displaystyle \frac{7}{\sqrt{2}}\end{eqnarray*}$

and

$\begin{eqnarray*}&&{d}_{1}^{1}=\displaystyle \frac{{f}_{2}-{f}_{1}}{\sqrt{2}}=\displaystyle \frac{1-2}{\sqrt{2}}=-\displaystyle \frac{1}{\sqrt{2}}\end{eqnarray*}$

$\begin{eqnarray*}&&{d}_{2}^{1}=\displaystyle \frac{{f}_{4}-{f}_{3}}{\sqrt{2}}=\displaystyle \frac{-1-3}{\sqrt{2}}=-\displaystyle \frac{4}{\sqrt{2}}\end{eqnarray*}$

$\begin{eqnarray*}&&{d}_{3}^{1}=\displaystyle \frac{{f}_{6}-{f}_{5}}{\sqrt{2}}=\displaystyle \frac{4-2}{\sqrt{2}}=\displaystyle \frac{2}{\sqrt{2}}\end{eqnarray*}$

$\begin{eqnarray*}&&{d}_{4}^{1}=\displaystyle \frac{{f}_{8}-{f}_{7}}{\sqrt{2}}=\displaystyle \frac{4-3}{\sqrt{2}}=\displaystyle \frac{1}{\sqrt{2}}\end{eqnarray*}$

It can be written as follows.

$\begin{eqnarray*}&&{f}_{\longrightarrow }^{{H}_{1}}\left({a}_{1}^{1},{a}_{2}^{1},{a}_{3}^{1},{a}_{4}^{1}| {d}_{1}^{1},{d}_{2}^{1},{d}_{3}^{1},{d}_{4}^{1}\right)\end{eqnarray*}$

$\begin{eqnarray*}&&{f}_{\longrightarrow }^{{H}_{1}}\left(\displaystyle \frac{3}{\sqrt{2}},\displaystyle \frac{2}{\sqrt{2}},\displaystyle \frac{6}{\sqrt{2}},\displaystyle \frac{7}{\sqrt{2}}| -\displaystyle \frac{1}{\sqrt{2}},-\displaystyle \frac{4}{\sqrt{2}},\displaystyle \frac{2}{\sqrt{2}},\displaystyle \frac{1}{\sqrt{2}}\right)\end{eqnarray*}$

Haar level 2 transformation is as follows.

$\begin{eqnarray*}&&{a}_{1}^{2}=\displaystyle \frac{{a}_{1}^{1}+{a}_{2}^{1}}{\sqrt{2}}=\displaystyle \frac{\displaystyle \frac{3}{\sqrt{2}}+\displaystyle \frac{2}{\sqrt{2}}}{\sqrt{2}}=\displaystyle \frac{5}{2}\end{eqnarray*}$

$\begin{eqnarray*}&&{a}_{2}^{2}=\displaystyle \frac{{a}_{3}^{1}+{a}_{4}^{1}}{\sqrt{2}}=\displaystyle \frac{\displaystyle \frac{6}{\sqrt{2}}+\displaystyle \frac{7}{\sqrt{2}}}{\sqrt{2}}=\displaystyle \frac{13}{2}\end{eqnarray*}$

$\begin{eqnarray*}&&{d}_{1}^{2}=\displaystyle \frac{{a}_{2}^{1}-{a}_{1}^{1}}{\sqrt{2}}=\displaystyle \frac{\displaystyle \frac{2}{\sqrt{2}}-\displaystyle \frac{3}{\sqrt{2}}}{\sqrt{2}}=\displaystyle \frac{-1}{2}\end{eqnarray*}$

$\begin{eqnarray*}&&{d}_{2}^{2}=\displaystyle \frac{{a}_{2}^{1}-{a}_{1}^{1}}{\sqrt{2}}=\displaystyle \frac{\displaystyle \frac{7}{\sqrt{2}}-\displaystyle \frac{6}{\sqrt{2}}}{\sqrt{2}}=\displaystyle \frac{1}{2}\end{eqnarray*}$

It is also can be written as follows.

$\begin{eqnarray*}&&{f}_{\longrightarrow }^{{H}_{2}}\left(\left.{a}_{1}^{2},{a}_{2}^{2}\right|{d}_{1}^{2},{d}_{2}^{2}| {d}_{1}^{1},{d}_{2}^{1},{d}_{3}^{1},{d}_{4}^{1}\right)\end{eqnarray*}$

$\begin{eqnarray*}&&{f}_{\longrightarrow }^{{H}_{2}}\left(\left.\displaystyle \frac{5}{2},\displaystyle \frac{11}{2}\right|-\displaystyle \frac{1}{2},\,\displaystyle \frac{1}{2}| -\displaystyle \frac{1}{\sqrt{2}},-\displaystyle \frac{4}{\sqrt{2}},\displaystyle \frac{2}{\sqrt{2}},\displaystyle \frac{1}{\sqrt{2}}\right)\end{eqnarray*}$

Furthermore, the first average signal and the first fluctuation signal are:

$\begin{eqnarray*}&&{A}^{1}=\left(\displaystyle \frac{{a}_{1}^{1}}{\sqrt{2}},\displaystyle \frac{{a}_{1}^{1}}{\sqrt{2}},\,\ldots ,\,\displaystyle \frac{{a}_{4}^{1}}{\sqrt{2}},\displaystyle \frac{{a}_{4}^{1}}{\sqrt{2}}\right)=\left(1.5,\,1.5,\,1,\,1,\,3,\,3,\,3.5,\,3.5\right)\end{eqnarray*}$

$\begin{eqnarray*}&&{D}^{1}=\left(\displaystyle \frac{{d}_{1}^{1}}{\sqrt{2}},-\displaystyle \frac{{d}_{1}^{1}}{\sqrt{2}},\,\ldots ,\,\displaystyle \frac{{d}_{N/2}^{1}}{\sqrt{2}},-\displaystyle \frac{{d}_{N/2}^{1}}{\sqrt{2}}\right)=\left(0.5,\,-0.5,\,2,-\,2,\,-1,\,1,\,-0.5,\,0.5\right)\end{eqnarray*}$

Then the signal $f={A}^{1}+{D}^{1}=(2,1,3,-1,2,4,3,4)$ . In the same way, the second average signal and the second fluctuation signal are:

$\begin{eqnarray*}&&{A}^{2}=(1.25,\,1.25,\,1.25,\,1.25,\,3.25,\,3.25,\,3.25,3.25)\end{eqnarray*}$

$\begin{eqnarray*}&&{D}^{2}=(0.25,\,0.25,\,-0.25,\,-0.25,\,-0.25,\,-0.25,\,0.25,\,0.25)\end{eqnarray*}$

Then we get ${A}^{1}=\left(1.5,\,1.5,\,1,\,1,\,3,\,3,\,3.5,\,3.5\right)$ and $f={A}^{2}+{D}^{2}+{D}^{1}=(2,1,3,-1,2,4,3,4)$ .

1.3.3.3. Image denoising with wavelet

Wavelet 2D is used for image denoising. The algorithm used for image denoising is the tree-adapted wavelet shrinkage (TAWS) algorithm. The TAWS algorithm is a simple, but highly effective, wavelet-based image denoising algorithm [13]. The wavelet transformation for an image can be calculated for an image with odd numbers of rows and columns. Consider an image f,

$\begin{eqnarray}f=\left(\begin{array}{c}\begin{array}{ccc}{f}_{1,M} & {f}_{2,M} & \ldots \\ \vdots & \vdots & \ddots \\ {f}_{1,2} & {f}_{2,2} & \ldots \end{array}\,\begin{array}{c}{f}_{N,M}\\ \vdots \\ {f}_{N,2}\end{array}\\ \begin{array}{ccc}{f}_{1,1} & {f}_{2,2} & \ldots \end{array}\,{f}_{N,1}\end{array}\right)\end{eqnarray} \tag{ 1.29 }$

The first level of wavelet transformation is started by calculating the 1D wavelet transform (first level) on each row of f, so that it produces a new image. On the new image obtained from step 1, calculate the same 1D wavelet transforms on each of its columns. The wavelet transformation of an image f can be symbolized as follows:

$\begin{eqnarray}f\longmapsto \left(\begin{array}{cc}{h}^{1} & {d}^{1}\\ {a}^{1} & {\mathit{\unicode[Book Antiqua]{x76}}}^{1}\end{array}\right)\end{eqnarray} \tag{ 1.30 }$

where the sub-images ${h}^{1}$ , ${d}^{1}$ , ${a}^{1}$ , and ${\mathit{\unicode[Book Antiqua]{x76}}}^{1}$ each of them has M/2 rows and N/2 columns. The sub-image ${a}^{1}$ is trends along rows of f followed by computing trends along with columns. The sub-image ${d}^{1}$ is created from the fluctuations along both rows and columns. The sub-image ${h}^{1}$ is created by computing trends along the image f followed by computing the fluctuations along with columns. The sub-image ${\mathit{\unicode[Book Antiqua]{x76}}}^{1}$ is the reverse of the sub-image ${h}^{1}$ . The next step is thresholding with the formula as follows.

$\begin{eqnarray}&&{f}_{t}\left(x,y\right)=\left\{\begin{array}{c}\left|\,f(x,y)\right|-{T}_{B},\,{\rm{if}}\,\left|\,f\left(x,y\right)\right|\geqslant \,{T}_{B}\,\\ 0,\,{\rm{if}}\,\left|\,f\left(x,y\right)\right|\lt \,{T}_{B}\end{array}\right.\end{eqnarray} \tag{ 1.31 }$

where ${T}_{B}=\sigma \sqrt{2{log}M}$ , σ is the standard deviation of the noise, and M is the maximum dimension of the rows and column of the image. Then the last step is calculating the inverse of the wavelet transformation. Then, analogously with the pyramid algorithm, the next level of the transformation is as follows:

$\begin{eqnarray}f\longmapsto \left(\begin{array}{cc}{h}^{1} & {d}^{1}\\ \begin{array}{cc}{h}^{2} & {d}^{2}\\ {a}^{2} & {\mathit{\unicode[Book Antiqua]{x76}}}^{2}\end{array} & {\mathit{\unicode[Book Antiqua]{x76}}}^{1}\end{array}\right)\end{eqnarray} \tag{ 1.32 }$

where ${h}^{2}$ , ${d}^{2}$ , ${a}^{2}$ , and ${\mathit{\unicode[Book Antiqua]{x76}}}^{2}$ are calculated from

$\begin{eqnarray}{a}^{1}\longmapsto \left(\begin{array}{cc}{h}^{2} & {d}^{2}\\ {a}^{2} & {\mathit{\unicode[Book Antiqua]{x76}}}^{2}\end{array}\right)\end{eqnarray} \tag{ 1.33 }$

1.3.4. Machine learning

Artificial intelligence (AI) has been discussed by scientists worldwide for decades. The main idea of AI initially suggests that machines ccould think like humans. AI is a simulation of human intelligence (HI) processed using a machine or computer. AI develops a computer system to do a task that involves a thought process like the human brain. In human intelligence, the cognitive process is complex, the decisions and thought processes are influenced by subjective factors. The decisions in AI are objective as they are not influenced by any motivation or emotions.

Machine learning is approach learning using mathematics equations linearly to make the decision. Machine learning is an AI. It describes an AI system that is taught to learn and make decisions by examining large amounts of input data. It makes calculated predictions based on analyzing the input information and performs tasks that are considered to require human intelligence. In the machine learning approach, the computer does not rely on the rules programmed by humans. It looks for certain patterns appearing in the data. These patterns are learned by the computer/machine and become a reference when the system is used.

Machine learning imitates human intelligence to make decision by providing the systems with the ability to automatically learn and improve from experience. Machine learning has the ability to generalize decisions using training processes. The training process is a repeated process of a dataset which contains input and target information. There are some machine learning alghorithms such as neural networks, perceptron, learning vector quantization, recurrent neural network, etc.

1.3.5. Neural network

1.3.5.1. Activation function

Neural network (NN) is an algorithm that is inspired by the workings of neurons in the human brain that are interconnected through chemical reactions. Each neuron will flow information that is connected to each synapsis. NN is an architecture consisting of many neurons that work together to respond to the inputs. Neurons are information processing units that form the basis of NN operations. NN consists of three layers, i.e., the input layer, the hidden layer, and the output layer. The input layer receives inputs (problems) from the outside and the output layer provides answers to the problems received by the input layer. The input layer can be connected to the output layer via a hidden layer.

Each NN neuron is activated by an activation function. The activation function determines the output of a unit. It converts the input signal into an output signal which will be sent to another unit [14]. Activation functions that can be used as NN activators include linear functions, binary stepping functions, bipolar functions, binary sigmoid functions, and bipolar sigmoid function. The output value in the linear function is the same as the input value. The linear function (figure 1.5) is formulated as follows.

$\begin{eqnarray}&&f\left(x\right)=x\end{eqnarray} \tag{ 1.34 }$

The binary steps function is used to convert the input of a variable whose value is continuous to a binary output (0 and 1). The binary steps function (figure 1.6) is formulated as follows [15]:

$\begin{eqnarray}&&f\left(x\right)={\{}_{0,\,{\rm{others}}}^{1,\,x\geqslant 0}\end{eqnarray} \tag{ 1.35 }$

**Figure 1.5.** Linear function.
Download figure:
Standard image High-resolution image

**Figure 1.6.** Binary step function.
Download figure:
Standard image High-resolution image

The bipolar function is similar to the binary step function. The difference lies in the output value produced. The bipolar function produces an output value of 1 and −1. Bipolar functions, as shown in figure 1.7, are formulated as follows:

$\begin{eqnarray}&&f\left(x\right)={\{}_{-1,\,x\lt 0}^{\,1,\,x\geqslant 0}\end{eqnarray} \tag{ 1.36 }$

**Figure 1.7.** Bipolar function.
Download figure:
Standard image High-resolution image

The binary sigmoid or log sig function has non-linear characteristics. It can be used to solve a complex problem with non-linear characteristics. The binary sigmoid function produces a value between 0 and 1 (figure 1.8). It is mathematically defined as follows.

$\begin{eqnarray}&&f\left(x\right)=\displaystyle \frac{1}{1+{e}^{-x}}\end{eqnarray} \tag{ 1.37 }$

**Figure 1.8.** Binary sigmoid function.
Download figure:
Standard image High-resolution image

The bipolar or tansig sigmoid function is an activation function similar to the binary sigmoid function, the difference is in the range of output values. Bipolar sigmoid functions produce output values between −1 and 1 (figure 1.9). Bipolar sigmoid functions are formulated as follows.

$\begin{eqnarray}&&f\left(x\right)=\displaystyle \frac{1-{e}^{-x}}{1+{e}^{-x}}\,\end{eqnarray} \tag{ 1.38 }$

**Figure 1.9.** Bipolar sigmoid function.
Download figure:
Standard image High-resolution image

1.3.5.2. Architecture

Network architecture is an arrangement of neurons arranged in the NN layer. There are three types of NN architecture., i.e., NN with a single layer (single layered network), NN with multiple layers (multi-layer network), and NN with competitive layers. Single layer NN (figure 1.10) is the simplest type of NN architecture. NN with a single layer does not use a hidden layer. The input layer is directly connected to the output layer. NN with a single layer is an advanced feed network or feedforward [16]. Figure 1.10 shows that there are $l$ neurons in the input layer and $n$ neurons in the output layers, meanwhile ${w}_{{ik}}$ is the weight connected input layer to the output layer with $i=1,2,\ldots ,l$ and $k=1,2,\ldots ,n$ . The weighted input signal ( ${z\_i}{n}_{k}$ ) to the neuron $\,{Z}_{k}$ is the sum of the signal neuron weight ${X}_{1},{X}_{2},\ldots ,{X}_{l}$ , then:

$\begin{eqnarray}&&{z{\rm{\_}}{in}}_{k}=\displaystyle \sum _{i=1}^{l}\displaystyle \sum _{k=1}^{n}{x}_{i}{w}_{{ik}}\end{eqnarray} \tag{ 1.39 }$

Activation ${z}_{k}$ from neuron ${Z}_{k}$ uses activation function f, then ${z}_{k}=f({{z\_in}}_{k})$ .

**Figure 1.10.** Single layered neural network.
Download figure:
Standard image High-resolution image

If a single layer NN does not have a hidden layer then in NN with multiple layers there is at least one hidden layer (figure 1.11). Signals from the input layer on NN with multiple layers are transmitted to the output layer through one or more hidden layers. Figure 1.11 shows there are l neurons in the input layer, m neurons in the hidden layer, and n neurons in the output layer. The weight that connects the input layer to the hidden layer is ${w}_{{ik}}$ with i=1,2,...,l and k=1,2,...,m. The weight that connects the hidden layer to the output layer is ${\mathit{\unicode[Book Antiqua]{x76}}}_{{kr}}$ with k=1,2,...,m and r=1,2,...,n. The weighted input signal ( ${y\_i}{n}_{k}$ ) to the neuron $\,{Y}_{k}$ is the sum of neuron signal weight ${X}_{1},{X}_{2},\ldots ,{X}_{l}$ then

$\begin{eqnarray}&&{y{\rm{\_}}{in}}_{k}=\displaystyle \sum _{i=1}^{l}\displaystyle \sum _{k=1}^{n}{x}_{i}{w}_{{ik}}\end{eqnarray} \tag{ 1.40 }$

**Figure 1.11.** Multiple layered neural network.
Download figure:
Standard image High-resolution image

Activation ${y}_{k}$ from neuron ${Y}_{k}$ uses activation function f, then ${y}_{k}=f({{y\_in}}_{k})$ . The weighted input signal ( ${z\_i}{n}_{r}$ ) to the neuron $\,{Z}_{r}$ is the sum of neuron signal weight ${Y}_{1},{Y}_{2},\ldots ,{Y}_{n}$ then

$\begin{eqnarray}&&{z{\rm{\_}}{in}}_{r}=\displaystyle \sum _{k=1}^{m}\displaystyle \sum _{r=1}^{n}{y}_{k}{\mathit{\unicode[Book Antiqua]{x76}}}_{{kr}}\end{eqnarray} \tag{ 1.41 }$

Activation ${z}_{r}$ from neuron ${Z}_{r}$ uses activation function f, then ${z}_{r}=f({{z\_in}}_{r})$ .

The NN architecture with competitive layers is different from single layer NN or multiple layer NN. The neurons in the NN with the competitive layer can be interconnected. One of the examples of competitive layered NN is the recurrent neural network (RNN). It is a NN that has a feedback link. The layer on the RNN includes the input layer, the output layer, and the hidden layer as in the other networks, but in the RNN there is at least one feedback layer. RNN can be divided into two, i.e., the Elman RNN network and the Hopfield RNN network. The Elman RNN network has feedback from the hidden layer to the input layer while the Hopfield RNN network has feedback at each layer.

1.3.6. Recurrent neural network

RNN is a neural network with a feedback link. There are four layers in RNN: input layer, hidden layer, output layer, and feedback link. Activation function is used to connect one layer to another layer. The feedback link accommodates the network output to be re-input to the network. There are two RNN networks, namely the Hopfield network and the Elman network. The Hopfield network is a network single layer feedback with symmetrical weights introduced by John Hopfield in 1982. In this chapter, the researchers use RNN from the Elman Network which has a feedback link from the hidden layer to the input layer. The RNN architecture with the Elman Network is shown in figure 1.12.

**Figure 1.12.** Recurrent neural network architecture. Copyright (2017) IEEE. Reprinted, with permission, from [4].
Download figure:
Standard image High-resolution image

1.3.7. Mean square error

Mean square error (MSE) measures the average of the errors of the network. It is the average squared difference between target and output of the network. The smaller the MSE value from which model is used for classification, the better the resulting classification. MSE is formulated as follows [17]:

$\begin{eqnarray}&&{\rm{MSE}}=\displaystyle \frac{1}{n}\displaystyle \sum _{t=1}^{n}{({g}_{t}-{z}_{t})}^{2}\end{eqnarray} \tag{ 1.42 }$

${g}_{t}$ is the target, ${z}_{t}$ is the output, t =1,2,...n, and n = numbers of observed data.

1.3.8. Sensitivity, specificity, and accuracy

The possibilities that can happen in a diagnostic test are shown in table 1.2. There are four results for diagnostic tests. There are true positive (a) which means that sick patients are correctly identified as sick, false positive (b) which means that healthy patients are incorrectly identified as sick, false negative (c) which means that sick patients arec incorrectly identified as healthy, and true negative (d) which means that healthy patients are correctly identified as healthy.

Table 1.2. Diagnostic test.

Measure	True situation
Measure	Performance indicator present	Performance indicator absent
Positive	True positive (a)	False positive (b)
Negative	False negative (c)	True negative (d)
Total	(a)+(c)	(b)+(d)

The sensitivity is the proportion ratio of true positive over the total number of true positive and false negative. It is mathematically calculated as follows:

$\begin{eqnarray}&&{\rm{sensitivity}}=\displaystyle \frac{a}{a+c}\end{eqnarray} \tag{ 1.43 }$

The specificity is the proportion ratio of true negative over the total number of false positive and true negative. It is mathematically formulated as follows:

$\begin{eqnarray}&&{\rm{specificity}}=\displaystyle \frac{d}{b+d}\end{eqnarray} \tag{ 1.44 }$

The accuracy is the proportion ratio of the total number of true positive and true negative over the whole total number of cases. It is measured by:

$\begin{eqnarray}&&{\rm{accuracy}}=\displaystyle \frac{a+d}{a+b+c+d}\end{eqnarray} \tag{ 1.45 }$

1.4. Dataset

The Japanese Society of Radiology Technology (JSRT) released the Digital Image Database [7]. It is an open-access database of lung image radiology. This research uses JSRT dataset for as many as 100 lung images with details as follows:

1.
the image format is *.jpg,
2.
grayscale image,
3.
the matrix of the image is 2048 × 2048.

The dataset is divided into three part (figure 1.13) and each image is named with prefix format as follows:

1.
35 normal lung images, named with the prefix a, such as a-1.jpg; a-2.jpg; a-3.jpg, etc
2.
35 benign lung images, named with the prefix b, such as b-1.jpg; b-2.jpg; b-3.jpg, etc
3.
35 malignant lung images, named with the prefix c, such as c-1.jpg; c-2.jpg; c-3.jpg, etc

There were 100 lung images from the Japanese Society Radiology and Technology used in this paper (35 normal lung images, 33 benign lung images, and 32 malignant lung images). The images are grayscale images. Figures 1.14, 1.15, and 1.16 are the example of the normal lung image, benign lung image, and malignant lung image, respectively. The length of the matrix is 2048 that is ${2}^{J}$ where J=11. So that the image matrix can be decomposed with Haar Wavelet and Daubhecies Wavelet until level 10.

**Figure 1.14.** Normal lung image. Credit: Japanese Society of Radiological Technology.
Download figure:
Standard image High-resolution image

**Figure 1.15.** Benign lung image. Credit: Japanese Society of Radiological Technology.
Download figure:
Standard image High-resolution image

**Figure 1.16.** Malignant lung image. Credit: Japanese Society of Radiological Technology.
Download figure:
Standard image High-resolution image

1.5. Modeling wavelet recurrent neural network for lung cancer nodule classification

1.5.1. Image denoising using wavelet

The denoised lung images with Haar Wavelet and Daubh4 Wavelet level 1 to level 10 were shown in figure 1.17 and 1.18. Figure 1.17 and 1.18 show denoised images were more obscure as the level increase. In this chapter we chose Haar Wavelet Level 4 for the denoising step and Daubh4 Wavelet Level 6. Another Haar Wavelet level and Daubh4 Wavelet level may be chosen for other research.

**Figure 1.18.** Denoising with Daubh4 wavelet.
Download figure:
Standard image High-resolution image

1.5.2. Wavelet recurrent neural network for lung cancer classification

WRNN modelling begins with the process of image denoising with the wavelet method. The denoised image is then extracted by the GLCM method. Feature extraction is used as the input variables in the classification process with RNN. The classification process uses the stage of the lung image as the target variable. Targets and outputs of the network are numerical data, namely 1 for normal lung images, 2 for benign lung images, and 3 for malignant lung images. In the classification process, there are three criteria, that is if the output value (z) is 1<y⩽1,5 then coded by 1 (the classification result is normal), if the output value (z) is 1,5<y⩽2,5, then coded by 2 (the classification result is benign), and if the output value (z) is 2,5<y<3, then coded by 3 (the classification result is malignant). The next step is dividing the data. It was divided into training data (75 images of the lungs) and testing data (25 images of the lungs). The network used is RNN with backpropagation algorithm. The next procedure of choosing the best model is as follows.

1.
Choosing the denoised image: The denoised image can be directly extracted by the GLCM method. However, the denoised image can be transformed as a binary image before the extraction process. Table 1.3 describes the MSE of classification process with denoised image directly and binary image. Based on table 1.3, the best MSE is given by the extraction using Matlab software and without binarization process. Then the next step is using the 14 features extraction resulted from Matlab Software and without binarization process.
2.
Choosing how many neurons in the hidden layer: Choosing how many neurons in the hidden layer is done by calculating the MSE value for the training data and testing the data as follows (table 1.4).Based on table 1.4, the RNN network with three neurons for the Haar Wavelet is better than the RNN network with other neurons in the hidden layer. Menawhile for the Daubh4 Wavelet, it is chosen by using only one hidden layer on the network.
3.
Input Elimination: Table 1.5 is the MSE value for data training and data testing for the elimination input process.Based on table 1.5, 14 features are chosen as input variables for Haar wavelet and 13 Features (without X10) as input variables for the Daubh4 wavelet.

Table 1.3. MSE for choosing the denoised image.

Extraction	Wavelet	MSE data training	MSE data testing
With binarization	Haar	0.483 27	0.638 45
With binarization	Daubh4	0.397 73	1.1982
Without binarization	Haar	0.467 22 ^*	0.573 63 ^*
Without binarization	Daubh4	0.398 64 ^*	0.412 26 ^*

^*Chosen model.

Table 1.4. MSE value for choosing how many neurons in the hidden layer.

Haar wavelet			Daubh4 wavelet
Neuron	MSE data training	MSE data testing	Neuron	MSE data training	MSE data testing
1	0.467 22	0.573 63	1	0.398 64 ^*	0.412 26 ^*
2	0.415 38	0.512 01	2	0.315 49	0.484 66
3	0.382 06 ^*	0.455 53 ^*	3	0.357 19	0.477 06
4	0.345 26	0.584 08	4	0.315 85	0.523 37
5	0.292 48	0.696 55	5	0.308 12	1.5948

^*Chosen model.

Table 1.5. MSE value for input elimination.

Haar wavelet
Elimination	Input (features)	MSE training	MSE testing
	14	0.382 06 ^*	0.455 53 ^*
${X}_{8}$	13	0.369 33	0.572 39
${{\boldsymbol{X}}}_{{\boldsymbol{9}}}$	13	0.357 07	0.438 78
${X}_{3}$	13	0.469 62	0.912 22
${X}_{5}$	13	0.4859	0.605 63
${X}_{13}$	13	0.374 78	0.6787
Daubh4 wavelet
${\rm{Elimination}}$	${\rm{Input}}({\rm{features}})$	${\rm{MSE\; training}}$	${\rm{MSE\; testing}}$
$-$	$14\,$	$0.398\,64$	0.412 26 ^*
${X}_{10}$	13	0.394 94 ^*	0.411 05 ^*
${X}_{9}$	13	0.398 98	0.411 54
${X}_{3}$	13	0.458 64	0.641 94
${X}_{5}$	13	0.423 06	0.537 84
${X}_{13}$	13	0.442 13	0.7417
${X}_{9},{X}_{8}$	12	0.401 71	0.8217
${X}_{9},{X}_{3}$	12	0.458 35	0.838 02
${X}_{9},{X}_{5}$	12	0.371 95	0.603 21
${X}_{9},{X}_{13}$	12	0.475 99	0.486 46

^*Chosen model.

1.6. Results and discussion

The classification results using Haar WRNN and Daubh4 WRNN are shown in table 1.6 for training data and table 1.7 for testing data. Based on table 1.6 and table 1.7, the sensitivity, specificity, and accuracy of the model can be calculated (table 1.8 and table 1.9).

Table 1.6. Diagnostic test for training data.

	TP	FP	FN	TN
Haar WRNN	45	9	3	18
Daubh4 WRNN	46	14	2	13

Table 1.7. Diagnostic test for testing data.

	TP	FP	FN	TN
Haar WRNN	15	2	2	6
Daubh4 WRNN	14	3	3	5

Table 1.8. Sensitivity, specificity, and accuracy for training data.

	Sensitivity	Specificity	Accuracy
Haar WRNN	93.75%	66.67%	84%
Daubh4 WRNN	95.83%	48.15%	78.67%

Table 1.9. Sensitivity, specificity, and accuracy for testing data.

	Sensitivity	Specificity	Accuracy
Haar WRNN	88.24%	74%	84%
Daubh4 WRNN	82.35%	62.5%	76%

As shown in table 1.8 the accuracy when using Haar WRNN for training data is better than the accuracy when using Daubh4 WRNN for training data. The sensitivity, specificity, and accuracy using Haar WRNN for training data is respectively 93.75%, 66.67%, and 84%.

As shown in table 1.9 the accuracy using Haar WRNN for testing data is better than the accuracy using Daubh4 WRNN for tresting data. The sensitivity, specificity, and accuracy using Haar WRNN for testing data is respectively 88.24%, 62.5%, and 84%.

The highest accuracy value is given by WRNN model with Haar Wavelet that is 84% for tarining and testing data. However this model is not optimal to use as an indicator when the result is negative (normal lung) and when the result is positive (lung cancer). The accuracy of WRNN model was better than the previous research using NN [5] and RNN [6] which have accuracy respectively 80% and 81.33%.

1.7. Conclusion

The best WRNN model for lung classification was WRNN with three hidden layers and 14 features extraction as the input variables. The results it provided of sensitivity, specificity, and accuracy were respectively 93.75%, 66.67%, and 84% for training data and 88.24%, 75%, and 84% for testing data. The accuracy of WRNN model was better than the previous research using NN [5] and RNN [6] which have accuracy respectively 80% and 81.33%. Elaboration using other wavelet models such as Daubhecies, Symlet, or Coiflet can be done to denoise images. Meanwhile other image extractions and other neural networks such as Hopfield network and the convolutional neural network can be developed to get better accuracy in terms of lung cancer classification.

References

[1]World Health Organizations 2018 https://who.int/news-room/fact-sheets/detail/cancer
- Go to reference in chapter
[2]2020 Int. Agency for Research on Cancer World Cancer Report 2020 ed C P Wild, E Weiderpass and B W Stewart (Lyon: World Health Organization)
- Go to reference in chapter
[3]Udhesani K A G, Meegama R G N and Fernando T G I 2011 Statistical feature-based neural network approach for the detection of lung cancer Int. J. Image Process. vol.5 425–34
- Go to reference in chapter
[4]Nurtiyasari D, Rosadi D and Abdurakhman 2017 The application of wavelet recurrent neural network for lung cancer classification 2017 3rd Int. Conf. on Science and Technology - Computer (ICST), Yogyakarta 127–30
- Go to reference in chapter
- Crossref
[5]Miah M and Yousuf M A 2015 Detection of lung cancer from CT image using image processing and neural network Int'l Conf. on Electrical Engineering and Information & Communication Technology 1–6
- Go to reference in chapter
[6]Nurtiyasari D 2014 The application of recurrent neural network model and recurrent neuro fuzzy model for lung cancer nodule classification BSc Thesis Yogyakarta State University
- Go to reference in chapter
[7]1997 Japanese Society of Radiology Technology Digital Image Database
- Go to reference in chapter
[8]Russell S J and Norvig P 2010 Artificial Intelligence: A Modern Approach 3rd edn (Englewood Cliffs, NJ: Prentice-Hall)
- Go to reference in chapter
[9]Harralick R M, Shanmugam K and Dinstein I 1973 Texture features for image classification IEEE Trans. on System, Man and Cybernet. 3 610
- Go to reference in chapter
- Crossref
[10]Daubechies I 1992 Ten lectures of wavelets CBMS-NSF Regional Conf. Series in Applied Mathematics, vol. 61 (Philadelpia: Society for Industrial and Applied Mathematics (SIAM))
- Go to reference in chapter
[11]Burrus C S et al 1998 Introduction to Wavelets and Wavelet Transforms. (Englewood Cliffs, NJ: Prentice-Hall)
- Go to reference in chapter
[12]Bruce A and Gao H 1996 Applied Wavelet Analysis with S-PLUS. (New York: Springer)
- Go to reference in chapter
[13]Walker J S 2008 A Primer on Wavelets and their Scientific Applications. (Boca Raton, FL: Chapman and Hall (CRC Press))
- Go to reference in chapter
[14]Fausett L 1994 Fundamentals of Neural Network-Architectures, Algorithms, and Applications. (Englewood Cliffs, NJ: Prentice-Hall)
- Go to reference in chapter
[15]Yeung D S et al 2010 Sensitivity Analysis for Neural Network. (Berlin: Springer)
- Go to reference in chapter
[16]Haykin S 1999 Neural Networks and Comprehensive Foundation. (New York: Prentice-Hall)
- Go to reference in chapter
[17]Hanke J E and Wichern D W 2005 Bussiness Forecasting 8th edn (Englewood Cliffs, NJ: Prentice-Hall)
- Go to reference in chapter

Export references: BibTeX RIS

Footnotes

*
Chosen model.
*
Chosen model.
*
Chosen model.

Lung cancer classification using wavelet recurrent neural network

Chapter navigation

Export citation and abstract

Permissions

Share this chapter

Dates

Chapter DOI

Books links

Abstract

1.1. Introduction

1.2. Lung cancer and lung image

1.2.1. Lung cancer

1.2.2. Lung image

1.2.2.1. RGB image

1.2.2.2. Grayscale image

1.2.2.3. Binary image

1.2.3. Image processing

1.3. Classification process

1.3.1. Classification

1.3.2. Features extraction

1.3.3. Wavelet

1.3.3.1. Haar wavelet

1.3.3.2. Daubhecies wavelet

1.3.3.3. Image denoising with wavelet

1.3.4. Machine learning

1.3.5. Neural network

1.3.5.1. Activation function

1.3.5.2. Architecture

1.3.6. Recurrent neural network

1.3.7. Mean square error

1.3.8. Sensitivity, specificity, and accuracy

1.4. Dataset

1.5. Modeling wavelet recurrent neural network for lung cancer nodule classification

1.5.1. Image denoising using wavelet

1.5.2. Wavelet recurrent neural network for lung cancer classification

1.6. Results and discussion

1.7. Conclusion

References

Footnotes