DNA Microarray Image Segmentation Using Markov Random Field Algorithm

Analysing a deoxyribonucleic acid (DNA) microarray image involves three processing stages that enhance the image and preserve its important information: gridding, segmentation, and intensity extraction. Of these three stages, segmentation is considered the most difficult, as its function is to differentiate between foreground and background features. The foreground elements form the object, that is, the vital information in the image, while the background holds information that is less critical for DNA microarray image analysis. This paper presents a study that applies the Markov random field (MRF) segmentation algorithm to a DNA microarray image. The MRF algorithm evaluates the current pixel based on its neighbouring pixels. The experimental results show that MRF segmentation improves foreground identification relative to the initial K-means labelling of a DNA microarray image.


Introduction
Scientists can investigate thousands of gene expressions simultaneously using a DNA microarray image [1]. Initially, the genes are placed on a glass slide containing thousands of probes [2]. The glass slide is then used to perform hybridisation between two samples. The two cDNA (complementary DNA) samples are stained with different fluorescent dyes: Cy3 dye is used for the normal sample, and Cy5 dye is used for the malignant sample [3]. When the hybridisation step is completed, the DNA microarray image is created, and the intensity of the spots on the image is calculated. The intensity of each spot shows its state, and the aggregate results allow scientists to evaluate and study gene expression [4], [5]. A high-quality DNA microarray image is required to generate this information.
The DNA microarray image may become contaminated with noise during the scanning procedure, compromising gene expression analyses [6]. One way of improving and optimising the microarray image is image processing [7]. The processing of the microarray image is divided into three parts. First, gridding (addressing) is employed to determine the location of each spot. Second, segmentation is employed to detect the features in the image's foreground (object). Finally, intensity extraction is employed to calculate the intensity of each spot [8].
This research presents MRF segmentation for a DNA microarray image and evaluates the performance of the algorithm based on the experimental results. Section II introduces and explores various approaches to image segmentation using MRF algorithms. Section III describes the approach employed in this investigation. Section IV discusses and analyses the experimental data, and Section V concludes this study.

Markov Random Field
The MRF algorithm evaluates the current pixel value by taking its neighbouring pixels into consideration [14]. Figure 1 shows an example of a neighbourhood system, where N0 denotes the site of interest and N1, N2, N3, N4, and N5 denote its neighbours. Pixels labelled N1 are the sites of the first-order neighbourhood system, as shown in Figure 1 (a). Pixels labelled N1 and N2 together are the sites of the second-order neighbourhood system, as shown in Figure 1 (b). Figure 1 (c) shows the nth-order neighbourhood system for n = 5. The neighbouring sites can be viewed as a single element enclosing the site N0.
The groupings of sites within a neighbourhood system are known as cliques. Figure 2 shows examples of several types of clique. The cliques for the first-order neighbourhood system are shown in Figure 2 (a), and the cliques in (a), (b), (c), and (d) together constitute the second-order neighbourhood system. The number of clique types therefore increases as the order of the neighbourhood increases.
The probability in equation (1) is defined as the Gibbs distribution [14]. The parameter Z is the normalising constant, β is a positive constant, and U(x) is the energy function, also known as the Gibbs energy [15]. The energy U(x) is the sum of the clique potentials Vc(x) over all cliques of the given neighbourhood system. The sites s and n are neighbours of each other, where n is an element of the neighbourhood system of s [15].
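The neighbourhood systems and pair-site clique energy described above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the first-order system is taken to be the four horizontal and vertical neighbours and the second-order system to add the four diagonals, and the pair-site potential is assumed to be a Potts-style term (−β for equal labels, +β otherwise) with β folded into the potential. The exact clique potential used in [15] may differ.

```python
# Offsets (row, col) of the neighbours of a site N0 on a regular pixel grid.
FIRST_ORDER = [(-1, 0), (1, 0), (0, -1), (0, 1)]                    # N1 sites
SECOND_ORDER = FIRST_ORDER + [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # N1 and N2 sites


def neighbours(labels, r, c, offsets):
    """Return the labels of the in-bounds neighbours of site (r, c)."""
    rows, cols = len(labels), len(labels[0])
    return [labels[r + dr][c + dc]
            for dr, dc in offsets
            if 0 <= r + dr < rows and 0 <= c + dc < cols]


def pair_potential(label_s, label_n, beta=1.0):
    """Assumed Potts-style pair-site potential: -beta if equal, +beta otherwise."""
    return -beta if label_s == label_n else beta


def gibbs_energy(labels, r, c, candidate, offsets=SECOND_ORDER, beta=1.0):
    """Sum of pair-site potentials between a candidate label at (r, c)
    and the current labels of its neighbours."""
    return sum(pair_potential(candidate, n, beta)
               for n in neighbours(labels, r, c, offsets))


# Hypothetical 3x3 label image: 1 = background, 2 = foreground.
labels = [[1, 1, 1],
          [1, 2, 2],
          [1, 2, 2]]
energy_fg = gibbs_energy(labels, 1, 1, candidate=2)
energy_bg = gibbs_energy(labels, 1, 1, candidate=1)
```

In this toy example the centre site has five background and three foreground neighbours, so the background candidate receives the lower Gibbs energy. Sites on the image border simply have fewer neighbours.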
Equation (5) states Bayes' theorem, which is represented by equation (6). Because the observed image Y is fixed, P(Y) does not depend on the labelling X and can be treated as a constant. Therefore, the posterior probability P(X|Y) is proportional to the product of the prior probability P(X) and the likelihood P(Y|X), as expressed in equation (7) [15].
Equation (8) defines the conditional probability P(Y|X), which is assumed to follow a Gaussian distribution, with the image intensity representing either the foreground or the background [15].
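As a minimal sketch of the Gaussian likelihood described above, the following Python functions compute the density of a pixel intensity under one class and its negative logarithm, which is the form that enters an energy minimisation. The function names and the example intensities, means, and standard deviations are hypothetical and are not taken from the experiment.

```python
import math


def gaussian_likelihood(y, mu, sigma):
    """Gaussian density of intensity y for a class with mean mu and std sigma."""
    return (math.exp(-((y - mu) ** 2) / (2 * sigma ** 2))
            / (math.sqrt(2 * math.pi) * sigma))


def likelihood_energy(y, mu, sigma):
    """Negative log of the Gaussian density (lower means a better fit)."""
    return math.log(math.sqrt(2 * math.pi) * sigma) + (y - mu) ** 2 / (2 * sigma ** 2)


# Hypothetical class parameters: a bright foreground and a dark background.
pixel = 200
energy_fg = likelihood_energy(pixel, mu=180, sigma=20)
energy_bg = likelihood_energy(pixel, mu=40, sigma=20)
```

For this bright pixel the foreground energy is lower than the background energy, so the likelihood term alone favours the foreground label.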
Here, y is the observed image, and μs and σs are the mean and standard deviation of the Gaussian distribution for the label xs. Equation (9) is the maximum a posteriori (MAP) estimate of the posterior given by equation (7). Equation (10) is obtained by substituting equations (1) and (8) into equation (9). Equation (10) is then negated, which converts the maximisation into the minimisation problem stated in equation (11) [15].
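As a sketch of the derivation described above, the chain from the Gibbs prior and Bayes' theorem to the energy minimisation can be written as follows. It assumes the common form of the Gibbs distribution in which β multiplies the energy, conditional independence of the pixels given the labels, and lowercase x and y for the labelling and the observed image. The exact form and numbering of equations (1) to (11) in [15] may differ.

```latex
\begin{align*}
P(x) &= \frac{1}{Z}\exp\bigl(-\beta\,U(x)\bigr),
  \qquad U(x) = \sum_{c \in C} V_c(x) \\
P(x \mid y) &= \frac{P(y \mid x)\,P(x)}{P(y)} \;\propto\; P(y \mid x)\,P(x) \\
P(y \mid x) &= \prod_{s} \frac{1}{\sqrt{2\pi}\,\sigma_{x_s}}
  \exp\!\left(-\frac{(y_s - \mu_{x_s})^2}{2\sigma_{x_s}^2}\right) \\
\hat{x} &= \arg\max_{x}\; P(y \mid x)\,P(x) \\
        &= \arg\min_{x} \left\{ \sum_{s} \left[ \ln\!\bigl(\sqrt{2\pi}\,\sigma_{x_s}\bigr)
           + \frac{(y_s - \mu_{x_s})^2}{2\sigma_{x_s}^2} \right] + \beta\,U(x) \right\}
\end{align*}
```

The last step takes the negative logarithm of the product and drops the term ln Z, which does not depend on x.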

Methodology
This section discusses the segmentation of a DNA microarray image [16] (2200×7300 pixels) using the MRF algorithm [15]. Figure 3 shows a fragment of size 446×431 pixels taken from the DNA microarray image, which is used as the input image for this work. The yellow box shown in Figure 3 marks the worst case for this segmentation process. Figure 4 presents a flow chart of MRF segmentation. Firstly, the input image is converted into a greyscale image. Then, an initial labelled image is generated from the input image. Next, MRF segmentation is applied to the initial labelled image to generate a new labelled image. This process continues until the maximum number of iterations is reached. Finally, the segmented image is produced once the iterations are completed.
In this study, a second-order neighbourhood system was chosen, and the number of iterations was set to 5. All experiments were performed using MATLAB R2019a on a Windows 7 operating system with a 2.50 GHz Intel Core i5 CPU and 8 GB of RAM.
The first labelling was generated as the initial segmentation of the input image using the K-means algorithm, with the initial foreground labelled as two and the initial background labelled as one. The second-order neighbourhood system is used to compute the Gibbs energy, as stated in equation (3). Under this system, only the neighbours labelled N1 and N2 are compared with each site N0, as shown in Figure 5 (a). Accordingly, the Gibbs energy is calculated from the horizontal, vertical, and diagonal pair-sites, as shown in Figure 5. The total energy is the sum of the Gibbs energy and the log-likelihood energy, the latter calculated from equation (8). It is evaluated for each candidate label of the site N0, namely '1' and '2'.
After both total energies are computed, the label with the minimum total energy is selected as the new label of the site N0. The total energy of the next site is then calculated, and so on, until the new labelled image is generated. This new labelled image is the result of applying MRF segmentation to the initial labelled image, and the process is repeated until the maximum number of iterations is reached. After five iterations, the final segmented image is generated. In the segmented image, label '2' represents the foreground, while label '1' represents the background.
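The iterative relabelling described in this section can be sketched in Python as follows. This is a simplified illustration rather than the MATLAB implementation used in the experiments. It assumes a Potts-style pair-site potential (−β for equal labels, +β otherwise), class means and standard deviations estimated once from the initial labelling, and a synchronous update in which every new label is computed from the previous labelled image. The function names and the parameter `beta` are hypothetical.

```python
import math

# Second-order neighbourhood: horizontal, vertical, and diagonal pair-sites.
SECOND_ORDER = [(-1, 0), (1, 0), (0, -1), (0, 1),
                (-1, -1), (-1, 1), (1, -1), (1, 1)]


def class_stats(image, labels, label):
    """Mean and standard deviation of the intensities assigned to `label`."""
    values = [v for img_row, lab_row in zip(image, labels)
              for v, l in zip(img_row, lab_row) if l == label]
    if not values:
        raise ValueError(f"label {label} is absent from the initial labelling")
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, max(math.sqrt(var), 1e-6)  # floor avoids division by zero


def total_energy(image, labels, r, c, label, stats, beta):
    """Likelihood energy plus Gibbs energy for a candidate label at site (r, c)."""
    mu, sigma = stats[label]
    likelihood = (math.log(math.sqrt(2 * math.pi) * sigma)
                  + (image[r][c] - mu) ** 2 / (2 * sigma ** 2))
    gibbs = 0.0
    for dr, dc in SECOND_ORDER:
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(image) and 0 <= cc < len(image[0]):
            gibbs += -beta if labels[rr][cc] == label else beta
    return likelihood + gibbs


def mrf_segment(image, initial_labels, iterations=5, beta=1.0):
    """Relabel every site with the minimum-energy label, `iterations` times."""
    labels = [row[:] for row in initial_labels]
    stats = {k: class_stats(image, labels, k) for k in (1, 2)}
    for _ in range(iterations):
        labels = [[min((1, 2),
                       key=lambda k: total_energy(image, labels, r, c, k, stats, beta))
                   for c in range(len(image[0]))]
                  for r in range(len(image))]
    return labels
```

With a greyscale image and an initial labelled image given as nested lists of equal shape, `mrf_segment(image, initial_labels)` returns the labelled image after five iterations. An isolated site whose intensity is ambiguous tends to take the label of its neighbours, because the Gibbs term then outweighs the likelihood term.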

Results and Discussion
The previous section described the MRF algorithm and the steps used for segmentation in this experiment. The segmentation experiment was performed using MATLAB simulation tools on a Windows operating system. The experimental results of all steps are presented here. Firstly, the microarray image is cropped to the size described above. Next, this cropped image is converted from true colour (RGB) to a greyscale image, as shown in Figure 3. Figure 6 presents the initial labelled image and the MRF segmentation result after five iterations. The initial labelled image is generated from the input image using the K-means algorithm, as shown in Figure 6 (a). The labels are determined by the foreground and background means: a label of 2 is assigned if the intensity is closer to the foreground mean; otherwise, a label of 1 is assigned. Next, based on the initial labelled image, the likelihood energy and the Gibbs energy are computed, and a new labelled image is produced from the total energy. Finally, the process is repeated until the maximum number of iterations is reached, and the MRF segmentation result is shown in Figure 6 (b). The value 2 represents the foreground, while the value 1 represents the background. Figure 6 (b) shows that the MRF result improves the foreground identification compared to the initial labelled image.
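The initial labelling step described above can be sketched as a two-cluster K-means on the pixel intensities. This is a minimal illustration rather than the MATLAB routine used in the experiments. It assumes that the foreground spots are brighter than the background and that the two centroids are initialised at the minimum and maximum intensities. The function name is hypothetical.

```python
def kmeans_initial_labels(image, max_iter=100):
    """Two-cluster K-means on intensities; returns (labels, bg_mean, fg_mean).

    Label 2 marks pixels closer to the brighter (foreground) mean,
    label 1 marks the remaining (background) pixels.
    """
    pixels = [v for row in image for v in row]
    low, high = float(min(pixels)), float(max(pixels))  # assumed initial centroids
    for _ in range(max_iter):
        bg = [v for v in pixels if abs(v - low) <= abs(v - high)]
        fg = [v for v in pixels if abs(v - low) > abs(v - high)]
        new_low = sum(bg) / len(bg) if bg else low
        new_high = sum(fg) / len(fg) if fg else high
        if new_low == low and new_high == high:
            break  # centroids no longer move
        low, high = new_low, new_high
    labels = [[2 if abs(v - high) < abs(v - low) else 1 for v in row]
              for row in image]
    return labels, low, high


# Hypothetical 5x5 greyscale fragment with one bright spot.
image = [
    [40, 60, 40, 60, 40],
    [60, 130, 170, 140, 60],
    [40, 160, 95, 150, 40],
    [60, 165, 135, 150, 60],
    [40, 60, 40, 60, 40],
]
labels, bg_mean, fg_mean = kmeans_initial_labels(image)
```

In this toy fragment the dim centre pixel of the spot (intensity 95) is closer to the background mean and is therefore initially labelled 1, which is the kind of error that a neighbourhood-based refinement is intended to correct.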

Conclusion
In this paper, a segmentation process based on the MRF algorithm is applied to a DNA microarray image, demonstrating that this approach is suitable for the task. Evaluating each pixel together with its neighbouring pixels improves the classification of pixels into foreground and background features. The results show that MRF segmentation improves foreground identification over the initial K-means labelling of a DNA microarray image.