The Implementation of a Deep Neural Network (DNN) Approach in a Case Study Predicting the Distribution of Carbon Dioxide (CO2) Gas Saturation

Predicting the distribution of CO2 gas saturation is one example of how multiphase flow can be evaluated in Carbon Capture and Storage (CCS). The TOUGH2 simulator is one of the numerical simulators commonly used for multiphase flow simulation. Conventional numerical simulations have several drawbacks, including the need for high grid spatial resolution and high processing costs. The deep neural network (DNN) is one of the most effective deep learning approaches for predicting the distribution of CO2 gas saturation. A deep neural network consists of three types of interconnected layers: input, hidden, and output layers. The DNN learns from the input data according to the previously constructed architecture, and it requires a large quantity of data as input. Thus, in this study, we use 700 data points for each of the train_a and train_b variables. The trained DNN model then predicts the distribution of CO2 gas saturation automatically. This technique can handle complex data patterns, such as gas saturation in multiphase flow problems. The reconstruction loss results show that the loss value decreases as the number of epochs increases. Furthermore, we trained with 3 and 4 epochs to determine the difference in results between the two. The model with 4 epochs and a regularization weight of 10⁻³ obtained the lowest error value of 0.4305. In summary, this model is capable of predicting the CO2 gas saturation distribution, but more research is needed to produce more optimal results. We hope this research will help monitor multiphase flow in CCS systems in the future by forecasting the distribution of CO2 gas saturation.


Introduction
Carbon Capture and Storage (CCS) is a technology that can potentially reduce carbon emissions by up to 85% by 2050 [1]. In a CCS system, once injection is complete, the CO2 in the permeable storage reservoir is controlled by several conditions (fluid pressure, temperature, composition, and stress field) as well as rock properties (porosity, permeability, density) [2]. Above the permeable layer there is a seal (caprock), so the CO2 cannot move directly upwards. The CO2 subsequently spreads and steadily migrates upslope. Migration proceeds until it reaches a trap in the outermost layer, where the CO2 collects. Therefore, multiphase flow is one of the things that needs to be analyzed in this system [3], because multiphase flow can be used to tackle subsurface flow issues [4]. Subsurface geological heterogeneity can produce variations in permeability and capillary pressure [5]. By describing how well the fluid is able to move through the system, effective permeability is critical in multiphase systems for precise simulation and forecasting of fluid flow [6]. A low permeability value is caused by the existence of a low gas saturation level [7].
Multiphase flows are often simulated using numerical simulations [8]. The most frequently used numerical simulator is TOUGH2. Conventional computational simulations have a number of restrictions, including the need for high grid spatial resolution [9], [7] along with expensive processing costs [10]. One of the algorithms used to address the inadequacies of conventional computational simulation is the Deep Neural Network (DNN), a three-layer artificial neural network consisting of an input layer, a hidden layer, and an output layer [11]. Each connection between neurons has a weight, initialized as a random number, which is updated in each iteration. The weight values are adjusted by providing feedback to the input node. There is also a bias at each layer to help the machine generalize the learning process [12]. Figure 1 shows the activation function in a neural network. The activation function is a mechanism that sums all input signals and determines whether the sum has reached the threshold or not, thereby triggering the appearance of an output signal [13]. According to previous research, the advantages of the DNN technique are that DNN results for image and voice recognition can be quite accurate [14], [15]. Moreover, the DNN technique has the advantage of producing CO2 migration projections with precision comparable to conventional numerical models [16].

Figure 1. Perceptron concept
The reconstruction loss function must be utilized when developing the DNN model to demonstrate how well the network can reconstruct data. The gap between the predicted and true values is calculated using a common loss function, the Mean Square Error (MSE). The lower the value, the more capable the model is of reconstructing the input data, and the procedure is equivalent to using the objective function in the inverse problem. This approach was chosen because greater carbon dioxide gas saturation in multiphase flows is usually linked with increased movement, necessitating exceptional accuracy in plume prediction [17].
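For reference, the MSE takes its standard form (a textbook definition; the paper itself does not print the equation), where $y_i$ is the true saturation value, $\hat{y}_i$ the predicted value, and $N$ the number of samples:

```latex
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2
```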
The 3D temporal model is meant to extract temporal data from the model to be predicted. This model includes a temporal layer capable of simulating the depth of the temporal convolutional kernel, making it ideal for acquiring temporal data in the immediate, medium, and distant futures. The design is made up of three major components: an encoder, a processor, and a decoder [18]. Therefore, the goal of this study is to accurately predict how CO2 gas saturation distributes using the Deep Neural Network (DNN) approach. As a result, it is expected that our methods will be able to address the increasing need to evaluate the storage of carbon dioxide.

Methodology
Anaconda (with a Python 3.6 environment), Jupyter Notebook, and Microsoft Office are all used in this study. This study was conducted using computer hardware that met the following requirements:

DNN architecture development
The first step starts by importing the libraries required for the script to execute as needed. Then, the regularization weight value to be used is determined. Developing the 3D temporal architecture is the next step in this process. The 3D temporal architecture consists of an encoder, a processor, and a decoder, as shown in the overall workflow in Figure 2. Following that, the architecture development process proceeds by compiling the specified layers to produce a Variational Autoencoder model [19]. The final stage in constructing this architecture is to create the output model. A minimal sketch of this encoder-processor-decoder layout is given below.
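The following is a minimal tensorflow.keras sketch of the encoder-processor-decoder layout, not the authors' exact script. The input shape, kernel size, filter counts, and strides follow the descriptions given later for Figures 5 and 6; details such as the reduced layer depth shown here, the "same" padding (standing in for the reflection padding of the original), and the sigmoid output are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def residual_block(x, filters):
    # Processor block: two 3D convolutions plus a shortcut (add) connection.
    shortcut = x
    y = layers.Conv3D(filters, (3, 3, 3), padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv3D(filters, (3, 3, 3), padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])  # residual connection
    return layers.Activation("relu")(y)

inputs = layers.Input(shape=(96, 200, 24, 1))  # height, width, depth, channel

# Encoder: the stride-2 convolution halves each spatial dimension.
x = layers.Conv3D(32, (3, 3, 3), strides=2, padding="same", activation="relu")(inputs)
x = layers.Conv3D(64, (3, 3, 3), strides=1, padding="same", activation="relu")(x)

# Processor: stacked residual convolution blocks (8 in the described model).
for _ in range(8):
    x = residual_block(x, 64)

# Decoder: upsampling plus convolution restores the input resolution.
x = layers.UpSampling3D(size=(2, 2, 2))(x)
x = layers.Conv3D(1, (3, 3, 3), padding="same", activation="sigmoid")(x)

model = Model(inputs, x)
model.summary()  # prints the layer/output-shape/parameter table, as in Figure 6
```

Note that this sketch keeps only one downsampling stage for brevity; the full model described below uses 6 convolution layers in the encoder and 6 deconvolution layers in the decoder.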

Data training process
The data training procedure starts with importing the necessary libraries and loading the shapes based on the previously generated architecture model. The data from both the test and train sets are then input and shuffled. Following the loading of the data sets, the loss function is established and the specifications for training are determined. Several sub-steps must be completed in the process of determining the training specifications: defining the training data specification (epochs, batch size, learning rate), compiling the ADAM optimizer and loss function, calculating the total loss, updating the optimizer parameters, conducting training iterations and model evaluation, and determining the directory for the model output. The training step iterates over every epoch and batch, and the model resulting from the training data is saved at the conclusion [19].
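A minimal sketch of this training specification, reusing the model from the architecture sketch above, could look as follows; the batch size, learning rate, file names, and validation split are illustrative assumptions, while the epoch count and loss function follow the text.

```python
import numpy as np
import tensorflow as tf

EPOCHS = 4            # 3 and 4 epochs are compared in this study
BATCH_SIZE = 4        # illustrative; the exact value is not stated here
LEARNING_RATE = 1e-3  # illustrative default for the ADAM optimizer

# train_a: reservoir/injection inputs; train_b: CO2 gas saturation targets.
train_a = np.load("train_a.npy")  # hypothetical file names
train_b = np.load("train_b.npy")

# Shuffle the training pairs together, as described above.
idx = np.random.permutation(len(train_a))
train_a, train_b = train_a[idx], train_b[idx]

# Compile the ADAM optimizer with the MSE reconstruction loss.
model.compile(optimizer=tf.keras.optimizers.Adam(LEARNING_RATE), loss="mse")

# Iterate over every epoch and batch; a held-out split stands in for the
# eval reconstruction loss reported in the quantitative analysis.
history = model.fit(train_a, train_b, epochs=EPOCHS,
                    batch_size=BATCH_SIZE, validation_split=0.1)

model.save("dnn_saturation_model.h5")  # save to the chosen output directory
```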

Process of data prediction and depiction
Importing the necessary libraries is the first step in the data prediction and visualization process. The steps are then completed by inputting the test data to be predicted by the model. Then, the trained model produced by the preceding training procedure is loaded. The prediction procedure uses a combination of the test data and the trained model. Afterwards, a plot of the predicted results is generated to visualize the model's output [19].
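A minimal sketch of this prediction-and-plotting step is shown below; the file names, the displayed slice, and the colormap are illustrative assumptions, and the three-panel layout mirrors Figures 7 to 10.

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

test_a = np.load("test_a.npy")  # hypothetical file names
test_b = np.load("test_b.npy")

# Load the trained model produced by the preceding training procedure.
model = tf.keras.models.load_model("dnn_saturation_model.h5")
pred_b = model.predict(test_a)

# Plot one depth slice: numerical result, DNN prediction, and absolute error.
sample, k = 0, 12
panels = [test_b[sample, :, :, k, 0],
          pred_b[sample, :, :, k, 0],
          np.abs(test_b[sample, :, :, k, 0] - pred_b[sample, :, :, k, 0])]
titles = ["Numerical simulation", "DNN prediction", "Error"]

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, img, title in zip(axes, panels, titles):
    im = ax.imshow(img, vmin=0, vmax=1)  # saturation scale from 0 to 1
    ax.set_title(title)
    fig.colorbar(im, ax=ax)
plt.show()
```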

Input Data Quality Analysis
A training dataset is required for an algorithm to carry out the learning process. A training dataset is one that is used to build a model during the "training" stage. After training on the train data, a model must be evaluated with new data to determine its performance; this new data is generally referred to as test data. The test and training datasets in this study contain A and B data. The train_a and test_a data sets are made up of reservoir conditions (starting pressure, temperature, and formation thickness), the geological model (permeability), and the injection design (injection rate, injection time, and perforation thickness). Meanwhile, the train_b and test_b data are separately processed data from the Eclipse (e300) program in the form of CO2 gas saturation values between 0 and 1.
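As an illustration of how such a split could be prepared (a sketch assuming a single combined dataset and the 70:30 composition described below; the file names are hypothetical):

```python
import numpy as np

data_a = np.load("data_a.npy")  # hypothetical combined input dataset
data_b = np.load("data_b.npy")  # matching CO2 gas saturation dataset

# Shuffle, then split 70:30 into train and test sets
# (1000 samples would yield the 700/300 composition used in this study).
idx = np.random.permutation(len(data_a))
n_train = int(0.7 * len(data_a))
train_a, test_a = data_a[idx[:n_train]], data_a[idx[n_train:]]
train_b, test_b = data_b[idx[:n_train]], data_b[idx[n_train:]]
```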

DNN model analysis
The produced DNN model script can be assessed based on the layers used to construct the Deep Neural Network (DNN) architecture. The first is the 3D convolutional layer, which executes convolution operations on three-dimensional input data. The following layer is reflection padding, which is applied to the input volume; when the convolution procedure is conducted, this layer retains the spatial dimension of the input. Then there is a batch normalization layer, which helps to speed up the training process by normalizing the input of each mini batch. An activation layer is used next to activate the output of the prior layer; this DNN model uses ReLU (Rectified Linear Unit) as its activation function. Following that is an add layer, which implements shortcut connections in the residual connection layers. Finally, there is an upsampling layer, which performs upsampling operations on the input data.
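Keras has no built-in 3D reflection padding layer, so a model of this kind would typically implement it as a small custom layer wrapping tf.pad; the sketch below is an illustrative assumption, not the authors' code.

```python
import tensorflow as tf
from tensorflow.keras import layers

class ReflectionPadding3D(layers.Layer):
    """Pads the three spatial dimensions of a 5D tensor by mirroring edges."""
    def __init__(self, pad=(1, 1, 1), **kwargs):
        super().__init__(**kwargs)
        self.pad = pad

    def call(self, x):
        d, h, w = self.pad
        # Leave the batch and channel dimensions unpadded so that a following
        # (3,3,3) convolution retains the spatial dimension of the input.
        return tf.pad(x, [[0, 0], [d, d], [h, h], [w, w], [0, 0]],
                      mode="REFLECT")
```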
Furthermore, the hyperparameter values that can be examined from this DNN model include the amount of train and test data, the number of epochs, and the batch size. The number of training data points used in this study is 700 for each of the train_a and train_b variables, while the test data comprise 300 data points for each of test_a and test_b. Three elements influenced the amount of data loaded in this training process. First, the train and test data are massive, preventing the computer from loading all of the data. Second, based on the availability and complexity of the data, the composition ratio of the amount of data used is 70:30. Third, the computer's limited capacity for the training process caused technical challenges when the procedure was loaded with greater quantities of data. Ideally, the more data used, the better the results will be. However, this must be tailored to the availability and complexity of the data, the computer's performance, and the research duration.
The number of epochs chosen can also affect the training process and results. The greater the number of epochs, the more opportunities the model has to learn the features in the dataset through repeated passes. Therefore, 3 and 4 epochs were used in this investigation.
Regarding Figure 5, it can be seen that 6 convolution layers are used in the encoder, 8 residual convolution layers in the processor, and 6 deconvolution layers in the decoder section. Convolution layer 1 uses a filter with a size of (3,3,3) and 32 filters in total. This layer uses stride 2, which means the kernel shifts by 2 positions across the input matrix. Then, convolution layer 2 uses a filter with size (3,3,3) for a total of 64 filters; this layer uses stride 1, which means the kernel shifts by 1 position. This pattern continues through the deconvolution layers.

Figure 6. Summary of DNN model architecture
According to Figure 6 above, there are 4 columns that summarize the composition of the DNN model that has been created. The first column lists the layers used; the second column gives the size of the output produced; the third column, the number of parameters; and the fourth column, information about where the layers are connected. For example, the input image defined previously is (96, 200, 24, 1): 96 pixels in the height dimension, 200 pixels in the width dimension, 24 in the depth dimension, and 1 colour channel (a single channel usually indicates a grayscale image). In convolution layer 1, the output shape becomes (None, 48, 100, 12, 32). This size is reduced from the input size because this layer uses filters of size (3,3,3), 32 filters in total, and stride (2,2,2). Convolution layer 1 is connected to the previous input layer. Then, the number of parameters can be calculated with the following formula: Param = (kernel size × kernel size × kernel size × input channels + 1) × number of filters. Thus, the number of parameters in convolution layer 1 becomes: Param = (3 × 3 × 3 × 1 + 1) × 32 = 896.
This calculation also applies to convolution layer 2 and beyond.
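As a quick check of this formula, the short sketch below computes the parameter counts; the layer-2 figure assumes its input channels are the 32 feature maps produced by layer 1 (an assumption consistent with the layer description, not a number quoted in the paper).

```python
def conv3d_params(kernel, in_channels, n_filters):
    # Param = (kernel^3 * input channels + 1) * number of filters
    return (kernel ** 3 * in_channels + 1) * n_filters

print(conv3d_params(3, 1, 32))   # convolution layer 1: 896
print(conv3d_params(3, 32, 64))  # convolution layer 2: 55,360 (assumed inputs)
```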

Qualitative analysis predicting the distribution of CO2 gas saturation
The code utilized in this study to predict the distribution of CO2 gas saturation is a modified version of the one produced by Wen et al (2021), which may be accessed at the following website (https://github.com/gegewen/ccsnet_v1.0). Figures 7 to 10 present the results of this visualization, which differ significantly from the findings presented in the research paper by Wen et al [17]. The visual representation of the distribution prediction results in the published paper appears nearly identical to the numerical modelling results, demonstrating that the error is insignificant. In this study, however, the prediction results differed significantly from the original input. The differences in findings are attributable to variances in the parameters employed, including differences in the number of epochs, regularization weight values, batch sizes, and the number of test samples [19]. This indicates that changing the hyperparameters has an effect on the prediction results.

Quantitative analysis predicting the distribution of CO2 gas saturation
The application of parameters commonly seen in deep learning is required for quantitative analysis. The variable used in this investigation is the reconstruction loss. Reconstruction loss is determined using the Mean Square Error (MSE) approach to quantify the values of the train and eval reconstruction losses [19]. We only employed regularization weights of 10⁻³ and 10⁻⁵ in this study because the results revealed substantial differences between the two.

Figure 12 and Figure 13 show the calculated results with a regularization weight of 10⁻⁵. As the epoch count increases, the value of the reconstruction loss is expected to get lower; the ideal reconstruction loss value is very close to zero. For the model with three epochs, the train and eval loss values decrease as the number of epochs increases. The train reconstruction loss decreases according to the trend, and the eval reconstruction loss shows a similar trend, with the loss value getting smaller. Figure 12 further demonstrates that the eval reconstruction loss values are lower than the train reconstruction loss values, which signifies that the model performed better on the testing data [19]. The model with four epochs has much lower train and eval loss values as the number of epochs increases, with both the train and eval reconstruction losses following the same decreasing trend.

A further metric that can be used to assess the reliability of a model is the error value. The Root Mean Square Error (RMSE) is a widely applied error measurement in deep learning models. To put it simply, the RMSE value is derived by taking the square root of the mean squared difference between the predicted and actual test data values [19]. These formulas result in the RMSE values shown in Table 1 below. Referring to the RMSE evaluation findings in Table 1, the model with a regularization weight of 10⁻⁵ and 4 epochs has a smaller error value than the model with the same regularization weight and 3 epochs. Furthermore, the model with a 10⁻³ regularization weight and 4 epochs has a lower error value than the model with the same regularization weight and 3 epochs. When all models are compared, it is apparent that the model with a regularization weight of 10⁻³ and 4 epochs has the lowest error. In summary, when assessed against the remaining three models, this last one is most comparable to the numerical models.
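In standard form (a textbook definition; the text above describes the formula only in words), the RMSE over $N$ test samples is:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}
```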

Conclusion
Based on the results of this study, we can conclude that the model is suitable for estimating the CO2 gas saturation distribution. This is demonstrated by the resulting reconstruction loss value, which decreases as the number of epochs increases. In addition, the measured RMSE values ranged from 0.4305 to 0.4530. To attain more accurate results, this model will need to be refined further.

Recommendations
According to the data analysis findings, the authors recommend that future research use actual field data as input in order to better depict actual conditions. The authors suggest utilizing more than 700 training data points, more than 300 test data points, and 4 epochs to help the model better grasp the data. If the regularization weight is set too high, the autoencoder may underfit the data and miss critical features; conversely, if the regularization weight is too low, the autoencoder may overfit the data and perform poorly on new data. Therefore, based on this research, the authors suggest a regularization weight of 10⁻³.

Figure 4. Workflow of data prediction and visualization

In addition to the amount of data and the number of epochs, the batch size used during the training process can affect the training time of a DNN. By processing multiple data points at once, batching saves training time; however, the training process can slow down when the batch size is enormous.

Figure 10. The comparison of numerical simulation results, DNN prediction results, and error using epoch 4 and regularization weight 10⁻³

In Figure 7, Figure 8, Figure 9, and Figure 10, three images are produced for each case: the numerical simulation results, the DNN prediction outputs, and the errors. In each image, the X axis represents distance, while the Y axis represents formation thickness. Furthermore, the range of saturation values is represented by a colour scale, with a value of 0 shown as dark blue and a value of 1 shown as yellow.

Figure 7's distribution visualization findings show the highest saturation, highlighted in yellow, on the left side. The saturation distribution then shifts to the right side, corresponding to the rise of the lower saturation values [19]. Similarly, Figure 8's saturation distribution visualization shows that the saturation distribution widens to the right side, with high saturation values on the opposite side. The predicted model results in Figure 7 and Figure 8 seem similar at first sight because the numbers of epochs used vary only slightly. Similarly, because the number of epochs varies slightly, the visual representations of the distribution in Figure 9 and Figure 10 appear comparable.

Figure 11. The visualization of previous research prediction results [14]

Figure 11 depicts three categories of result images: the predicted result for 1.3 years, the predicted result for 10.4 years, and the predicted result for 30 years. Each of these image sets has three output images: the numerical simulation, the CNN prediction results, and the errors [19].

Figure 12. Reconstruction loss results on training and testing data with 3 epochs and a regularization weight of 10⁻⁵

Figure 13. Reconstruction loss results on training and testing data with 4 epochs and a regularization weight of 10⁻⁵

Figure 13 also shows that the eval reconstruction loss is greater than the train reconstruction loss in the initial epoch. The first epoch demonstrates overfitting.

Figure 14. Reconstruction loss results on training and testing data with 3 epochs and a regularization weight of 10⁻³

Generally, the train loss and eval loss values of the model with three epochs decrease as the epoch number increases. The train reconstruction loss decreases in line with its trend, and the eval reconstruction loss shows a comparable trend, with the loss value lowering. In addition, Figure 14 demonstrates that the eval reconstruction loss is larger than the train reconstruction loss in the initial epoch. The first epoch demonstrates overfitting [19].

Figure 15. Reconstruction loss results on training and testing data with 4 epochs and a regularization weight of 10⁻³

The model with four epochs has much lower train and eval loss values as the number of epochs increases. The train reconstruction loss decreases according to its trend, and the eval reconstruction loss shows the same trend, with the loss value reducing. Figure 15 also shows that the eval reconstruction loss is greater than the train reconstruction loss in the initial epoch. The first epoch demonstrates overfitting.

Table 1. RMSE calculation results of the model