3D Texture Segmentation using Supervised Methods

Supervised learning methods have been widely used for image classification in fields ranging from medicine to industry. Traditional methods among them have certain limitations when addressing complex problems. The most common and effective approaches involve Convolutional Neural Networks (CNNs), such as U-Net. However, most studies employ CNNs in their 2D form, which can limit the classification of 3D objects. The purpose of this paper is to propose the use of 3D CNNs to potentially enhance the classification of 3D data, particularly X-ray micro-computed tomography images of reservoir rock samples. Our focus is on examining the performance of the 3D U-Net architecture, a supervised classification approach, in segmenting various 3D rock textures.


Introduction
Carbonate reservoirs are known for their complex petrophysical behaviours, which arise from their heterogeneous pore structures at several length scales. As a consequence, data acquisition, petrophysical characterization, and reservoir classification tend to be extremely challenging [1][2][3][4][5][6][7][8][9][10]. Nevertheless, estimating the petrophysical properties of carbonate reservoir rocks is one of the priorities of the oil industry. Hence, we propose supervised textural classification algorithms using machine learning methods to classify rock samples.
Computed tomography has become widely used in scientific, medical, and industrial research as a nondestructive, high-resolution means of imaging internal structure [11][12][13][14][15][16][17][18][19]. A large volume of research exploits texture for segmentation or classification across modalities such as magnetic resonance imaging (MRI), ultrasound, computed tomography (CT), and microscopy. In particular, Digital Rock Physics (DRP) is a complementary tool to conventional laboratory measurements, estimating rock properties from micro-computed tomography images.
Supervised learning algorithms require a set known as training data. For image segmentation, these sets consist of images and their corresponding labels or masks. Supervised algorithms use these data to learn how to generate accurate labels for new data. Some are traditional methods, such as Support Vector Classifiers; however, these do not scale well to large datasets and are sensitive to noise. This is where deep learning algorithms come in: more advanced techniques that overcome the limits of traditional methods.
With the development of new deep learning technologies in recent years, deep learning has become a common tool for image classification of carbonate rocks. For instance, convolutional neural networks (CNNs) are a type of learning-based technique that is highly suited to texture classification. One of the most popular CNNs is the U-Net architecture, which received hundreds of citations within a short time of its release [2]. It is a frequently used tool for segmentation and analysis, and it has been applied in various studies to segment rock samples.
The majority of studies segment 3D rock samples using U-Net and other CNNs in their 2D form, operating over 2D slices. However, an important limitation of 2D CNNs is that 2D views of rock textures may fail to capture some 3D patterns and rock features [3]. As a result, most of these studies recommend further research using 3D CNNs, which may yield a sharper picture and detect more patterns. In this paper, we introduce two 3D CNNs with great promise for improving segmentation accuracy and apply one of them to segment different rock textures.

Deep Learning Algorithm: 3D Convolutional Neural Networks
A convolutional neural network (CNN) is a deep learning algorithm primarily used to analyse images, usually through supervised learning [4][5]. Based on the dimensionality of the convolutional kernels used, CNNs for segmentation can be divided into categories. 2D CNNs use 2D operations and convolutional kernels to segment a single slice; since they accept only one slice as input, they cannot leverage context from neighbouring slices. Voxel information from nearby slices can help predict segmentation maps, which raises the importance of 3D CNNs [5,21]. 3D CNNs solve this problem by applying 3D convolutional kernels to volumetric patches of a scan. Although this approach may improve performance, the increased number of parameters adds to the computational cost. In this paper, we introduce two 3D convolutional neural networks: 3D U-Net and 3D U-ResNet.

3D U-Net
U-Net is a convolutional neural network architecture designed for biomedical image segmentation. It takes its name from its U-shaped construction, which comprises a contracting and an expanding path. The contracting path acts as a feature extractor, using a series of convolutional and pooling layers to capture the context of the input image. The expanding path builds the segmentation mask by upsampling the features and concatenating them with the corresponding features from the contracting path [5,20]. The U-Net architecture was first described by Ronneberger et al., as shown in figure 1.

Figure 1. 2D U-Net architecture
Figure 1 shows the structure of a 2D U-Net, which executes 2D operations. In our case, we will use an extended model with extra depth. 3D U-Net was first developed to process volumetric data in medical image analysis. It is based on the standard 2D U-Net architecture; the only distinction is that the operations are performed in 3D rather than 2D. In particular, 3D U-Net takes 3D volumes as input and employs 3D convolution, 3D max-pooling, and 3D up-convolutional layers, as shown in figure 2 [6].

Figure 2. 3D U-Net architecture
Before discussing the 3D architecture, consider the following examples of how the 3D operations are performed. Figure 3 illustrates a 3D convolution that takes a 4x4x4 input matrix and produces a 3x3x3 output using a 2x2x2 kernel and a stride of 1. Each 2x2x2 submatrix is multiplied element-by-element with the kernel K and the products are summed [7].
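To make the operation in figure 3 concrete, the following NumPy sketch implements a minimal valid 3D convolution (strictly, a cross-correlation, as is conventional in CNNs). The `conv3d` helper and the example values are illustrative only, not part of the paper's implementation.

```python
import numpy as np

def conv3d(volume, kernel):
    """Valid 3D convolution (no padding, stride 1): slide the kernel over
    the volume and sum the element-wise products at each position."""
    kd, kh, kw = kernel.shape
    d, h, w = volume.shape
    out = np.zeros((d - kd + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(volume[i:i+kd, j:j+kh, k:k+kw] * kernel)
    return out

volume = np.arange(64, dtype=float).reshape(4, 4, 4)  # 4x4x4 input
kernel = np.ones((2, 2, 2))                           # 2x2x2 kernel
result = conv3d(volume, kernel)
print(result.shape)  # (3, 3, 3), matching the figure
```

As in the figure, a 4x4x4 input with a 2x2x2 kernel and stride 1 yields a 3x3x3 output.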

Figure 3. Example of 3D Convolution operation with kernel of size 2x2x2
Another important concept in CNNs is the pooling layer, which compresses a large image into a smaller one. One common pooling operation is max pooling, which reduces an image's dimensions by selecting the highest value from non-overlapping sections of the image [22]. Figure 4 illustrates a 2x2x2 max pooling operation executed on a 4x4x4 matrix [8]. The procedure determines the highest value within each non-overlapping 2x2x2 sub-matrix of the larger 4x4x4 matrix; the output is a compressed 2x2x2 matrix containing these maximum values.

These are the main operations involved in our 3D U-Net architecture, shown in figure 2. The architecture forms a U shape reflecting the two paths of the network. In the encoder path, each layer has two 3 × 3 × 3 convolutions, each followed by a ReLU, and a 2 × 2 × 2 max-pooling with strides of two in each dimension. In the decoder path, each layer consists of a 2 × 2 × 2 up-convolution with strides of two in each dimension, followed by two 3 × 3 × 3 convolutions, each followed by a ReLU. The encoder path provides essential high-resolution features to the decoder path through shortcut connections between layers of equal resolution. A final 1 × 1 × 1 convolution reduces the number of output channels to the number of labels, which is 3. Each block applies batch normalization before each ReLU; during training, each batch is normalized with its own mean and standard deviation, and these values are used to update global statistics [6].
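The max-pooling step in figure 4 can be sketched in a few lines of NumPy. The reshape-based `max_pool3d` helper below is an illustrative implementation: it splits the volume into non-overlapping blocks and keeps each block's maximum.

```python
import numpy as np

def max_pool3d(volume, size=2):
    """Max pooling with stride equal to the pool size: reshape the volume
    into non-overlapping size^3 blocks, then reduce each block to its max."""
    d, h, w = volume.shape
    blocks = volume.reshape(d // size, size, h // size, size, w // size, size)
    return blocks.max(axis=(1, 3, 5))

volume = np.arange(64, dtype=float).reshape(4, 4, 4)  # 4x4x4 input
pooled = max_pool3d(volume)
print(pooled.shape)  # (2, 2, 2), as in the figure
```

Each entry of the 2x2x2 output is the maximum of one 2x2x2 sub-block of the 4x4x4 input.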

3D U-ResNet
U-ResNet is a network that combines two CNNs, U-Net and ResNet. It uses a combination of short and long skip connections, allowing networks to scale well as complexity increases. Since we have already seen the architecture of U-Net, we include a brief introduction to ResNet to understand U-ResNet [9]. ResNet refers to the residual network. Its architecture is not a symmetric encoder-decoder, as it has a significantly larger encoding part. The encoder is composed of repeated residual blocks linked by short skip connections. Each residual block has a pair of convolution layers followed by a batch normalization layer and a ReLU activation function. The architecture of ResNet used in [9] is shown in figure 5. The core idea of ResNet is the short skip connections between blocks, which allow the network's accuracy to increase steadily with network depth. This network is used for image classification as well as, in modified form, super-resolution [9].
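The residual connection at the heart of ResNet can be summarized in one line: the block outputs F(x) + x, so the layers only need to learn the residual F. A toy sketch, where the `transform` callable stands in for the convolution / batch-norm / ReLU stack:

```python
import numpy as np

def residual_block(x, transform):
    """Core ResNet idea: learn a residual F(x) and add the input back,
    giving output F(x) + x (the short skip connection)."""
    return transform(x) + x

x = np.array([1.0, 2.0, 3.0])
out = residual_block(x, lambda v: 0.1 * v)  # toy residual function F
print(out)  # [1.1 2.2 3.3]
```

Because the identity path is always present, gradients flow through the skip even when F is near zero, which is what lets accuracy keep improving with depth.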

Figure 5. The architecture of ResNet
The main strength of both U-Net and ResNet is the use of skip connections, which has steadily increased their performance. Figure 6 shows the architecture of U-ResNet, which integrates the two networks' primary strengths, short and long skip connections, into a single network. This network is designed in 3D and uses 3D operations to improve 3D object segmentation accuracy [9].

Optimization Algorithm
The Adam optimizer is a mathematical technique used in neural network training. It combines the benefits of two other algorithms: AdaGrad and RMSProp. During training, the Adam optimizer adapts the network's learning rates so that it converges to the optimal set of parameters more effectively. The optimizer is widely used because it is easy to implement, requires little memory, and works effectively for problems with many parameters or large datasets [10,23].
[10] introduced Adam and summarized the algorithm in the pseudo-code shown in figure 7. Adam first requires the following predefined parameters:
• α: the step size.
• β1 and β2: the exponential decay rates for the first and second moment estimates, respectively.
• ε: a small constant to avoid division by zero.
The recommended settings for these parameters are given in figure 7. The Adam method also requires a function f(θ), the loss function of the model to be trained, which measures the difference between the actual labels and those predicted by the network; θ denotes the model parameters.

Figure 7. ADAM Algorithm
Adam begins by initializing m0 and v0 to zero. mt denotes the first moment at step t, an estimate of the mean of the gradient; vt denotes the second moment at step t, an estimate of the (uncentered) variance of the gradient. In each iteration, Adam computes the gradient of the loss function and updates mt and vt. Since the moments are initialized to zero, mt and vt are biased towards zero; to overcome this, each moment is divided by a bias-correction coefficient. These bias-corrected values are then used to update the parameter vector θ. The process repeats until we either obtain parameters with minimal loss or reach the last training iteration [10].
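The update described above can be sketched in NumPy. This follows the published Adam algorithm of [10] with its recommended defaults; the `adam_step` helper and the toy quadratic loss are illustrative, not the paper's training code.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are the running first/second moment
    estimates; t is the 1-based step index used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad        # biased first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # biased second moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize the toy loss f(theta) = theta^2, whose gradient is 2*theta.
theta, m, v = np.array(1.0), 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(float(theta))  # approaches 0, the minimizer of the loss
```

Note that the effective step size is roughly α while the gradient keeps a consistent sign, regardless of the gradient's magnitude, which is why Adam behaves well across very different parameter scales.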

Network Configuration
In this research, we implement the 3D U-Net introduced earlier, owing to its significant advantages over the other methods. To prepare for testing, we must first establish the network configuration. We implement the 3D U-Net architecture with the help of the built-in unet3dLayers function in MATLAB. This function builds a 3D U-Net with a pixel classification layer, which predicts categorical labels for each voxel of a volumetric image. The network consists of 55 layers in total, as indicated in Table 1.
As mentioned earlier, our optimization method will be the Adam optimizer, as it combines the advantages of AdaGrad and RMSProp. It updates the internal parameters of the 3D U-Net to improve its accuracy. Training will run for 50 epochs with an initial learning rate of 10⁻³ and a mini-batch size of 16.

Training stage
To train the 3D U-Net described in the previous section, we use a diverse set of textures originating from rock samples and 3D geometries of size 256x256x256. A slice from each texture is shown in figure 8. The geometries include synthetic fractures, sphere packings, and sand packs of different-sized particles, representing a range of systems. We generate two montage blocks, each consisting of four 3D texture datasets and sized 512x512x256, along with corresponding labels that assign a unique label to each dataset. A slice from each block is displayed in figure 9. We use 25% of the data for testing and evaluating the network's performance, and the remaining 75% for training. Specifically, we employ a 512x512x192 volume for training and a 512x512x64 volume for testing. The training data are generated by iterating over 64x64x64 regions of the training volume.
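The patching of the training volume can be sketched as follows. The `extract_patches` helper is hypothetical and assumes non-overlapping 64x64x64 regions, which the text does not specify explicitly.

```python
import numpy as np

def extract_patches(volume, patch=64):
    """Iterate over non-overlapping patch x patch x patch regions of a
    volume, mirroring how training samples are drawn from the training image."""
    d, h, w = volume.shape
    patches = []
    for i in range(0, d - patch + 1, patch):
        for j in range(0, h - patch + 1, patch):
            for k in range(0, w - patch + 1, patch):
                patches.append(volume[i:i+patch, j:j+patch, k:k+patch])
    return patches

# Hypothetical stand-in for the 512x512x192 training volume described above.
train = np.zeros((512, 512, 192), dtype=np.uint8)
patches = extract_patches(train)
print(len(patches))  # 8 * 8 * 3 = 192 patches of size 64x64x64
```

With this split, the 512x512x192 training volume yields 192 cubes of side 64, each a single training sample for the network.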

Blocks as validation data
To evaluate the performance of the U-Net, we assembled two blocks comprising four distinct textures each, encompassing rock-sample textures and artificially created 3D geometries. Some of these textures were binary, so we converted them to grayscale by multiplying them by a random number between 0 and 255. The resulting volumes were sized 256x256x256 voxels and combined with the mask shown in figure 10 to create a composite image arrangement.

Model Quality Assessment
During validation, the performance of the model is assessed using a range of measurements: the confusion matrix, intersection over union (IoU), and pixel accuracy. The confusion matrix is a tabular illustration showing the number of pixels that were correctly and incorrectly classified for each class. The IoU measures how much each class's actual and predicted segmentations overlap. The pixel classification is calculated from the class labels assigned to each selected texture, and the following metric is used to evaluate the model's performance:

Accuracy = (TP + TN) / (TP + TN + FP + FN),

where TP stands for True Positives, TN for True Negatives, FP for False Positives, and FN for False Negatives.

The results obtained for 3D texture segmentation using the described 3D U-Net architecture are quite impressive, as shown in figures 11 and 12. Almost every texture has a single unique colour, indicating that it was classified correctly with minor errors. Tables 2 through 5 measure the accuracy of the network using several metrics. Tables 2 and 4 show that the first montage gave a Global Accuracy of 0.98956 and the second montage a Global Accuracy of 0.99469; the model predicted the correct label for nearly 99% of the voxels in the segmented volumes. This is an impressive result, indicating that the model has learned to segment the 3D textures with a high degree of accuracy. The Mean Accuracy for both montages is also high, indicating that the network accurately classifies the different classes in the textures.
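These metrics can be sketched in NumPy. The `segmentation_metrics` helper below is an illustrative implementation of the confusion matrix, global accuracy, and per-class IoU, not the MATLAB routine used in the study.

```python
import numpy as np

def segmentation_metrics(y_true, y_pred, n_classes):
    """Confusion matrix, global accuracy, and per-class IoU for a labelled
    volume. Rows of the matrix are true classes, columns predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true.ravel(), y_pred.ravel()):
        cm[t, p] += 1
    global_acc = np.trace(cm) / cm.sum()   # correct voxels / all voxels
    iou = []
    for c in range(n_classes):
        tp = cm[c, c]
        fp = cm[:, c].sum() - tp           # predicted c, but was not c
        fn = cm[c, :].sum() - tp           # was c, but predicted otherwise
        iou.append(tp / (tp + fp + fn) if tp + fp + fn else 0.0)
    return cm, global_acc, iou

# Toy 2x2 labelled image with three classes, one voxel misclassified.
y_true = np.array([[0, 0], [1, 2]])
y_pred = np.array([[0, 1], [1, 2]])
cm, acc, iou = segmentation_metrics(y_true, y_pred, 3)
print(acc)  # 3 of 4 voxels correct -> 0.75
```

Mean IoU is then simply the average of the per-class IoU values, while a weighted IoU weights each class's IoU by its voxel count.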
The Mean IoU values for both montages are also quite high, at 0.97943 and 0.98946 respectively, indicating that the network segments the textures accurately, with a high degree of overlap between the ground-truth and predicted segmentations. The Weighted IoU values for both montages are likewise high, indicating accurate segmentation across all classes, with higher weight given to the larger classes.
The confusion matrix shows the number of true positive, false positive, false negative, and true negative predictions for each class. The results show that the network accurately classifies the different classes in both montages. For example, in the first montage, class 4 has many true positives, with only 2 false negatives and 8,517 false positives, indicating that the network identifies this class reliably. Similarly, in the second montage, class 3 has many true positives and very few false positives or false negatives, indicating accurate classification.
The Mean BF-Score is also high for both montages, at 0.92677 and 0.96342 respectively, indicating that the model accurately segments the boundaries between different textures.
Overall, the results obtained for 3D texture segmentation are very promising. The high Global Accuracy, Mean Accuracy, Mean IoU, Weighted IoU, and Mean BF-Score values indicate that the network segments the textures accurately and classifies the different classes correctly in both montages. These results suggest that 3D U-Net can be an effective tool for 3D rock texture segmentation, with potential applications in mineral exploration, rock characterization, and geological mapping.

Conclusion
In conclusion, this paper explored the task of 3D texture segmentation using supervised methods. Most traditional approaches have limitations: they are sensitive to noise and may not work well for complex tasks. To overcome these limitations, Convolutional Neural Networks (CNNs) were introduced as a powerful alternative.
Specifically, the paper focused on an advanced architecture for 3D texture segmentation: 3D U-Net. The model demonstrated superior performance in capturing both local and global context by incorporating skip connections, and was selected for implementation on rock textures. The Adam optimizer was employed to train the model, enabling efficient convergence and improved generalization. The results were highly promising, with the implemented 3D U-Net achieving an impressive global accuracy of approximately 99% on the two rock-texture montages. These accuracy results indicate that the network can accurately segment the textures and classify the different classes in both montages. The success of the 3D U-Net model in accurately delineating rock textures further supports its potential in domains requiring precise segmentation of complex 3D structures.
However, it is important to acknowledge some limitations and avenues for future research. Training was very time-consuming: the 3D U-Net took about 27 hours on the first block and over 30 hours on the second. Reducing the number of iterations might speed up the training process. It would also be beneficial to evaluate the proposed methods on more diverse textures and datasets to ensure generalizability. Additionally, further validation and testing may be necessary before the model can be used in real-world applications.

Figure 6. The architecture of U-ResNet

Figure 8. A slice from each texture used in the training stage.

Figure 10. Illustration of the mask that corresponds to the montage of textures.

Table 2: Accuracy results for block 1.

Table 4: Accuracy results for block 2.