Processing synthetic seabed logging (SBL) data using Gaussian Process regression

This paper presents a study on processing one-dimensional (1D) synthetic seabed logging (SBL) data using Gaussian Process Regression (GPR). The data were generated with Computer Simulation Technology (CST) software. Seabed Logging (SBL) is an application of electromagnetic (EM) waves, emitted from a controlled source, to discover hydrocarbon-saturated layers beneath the seabed. In this paper, GPR is proposed as a processing tool that provides additional information for the SBL application. GPR provides predicted mean values together with an uncertainty measure in terms of ± standard deviation. The procedure for Gaussian Process (GP) regression is described in detail. The squared exponential (SE) function is chosen as the covariance function in the GPR, because it produces smooth, infinitely differentiable functional estimates. The log-marginal likelihood is then optimized to infer the hyper-parameters of the SE covariance function. For model validation, the mean square error (MSE) is calculated to assess the reliability of the GPR model on the synthetic SBL data. The results indicate that GPR is an appropriate tool for processing nonlinear SBL data, providing uncertainty quantification with low MSE.


Introduction
A Gaussian Process (GP) is a statistical model that can be used for Bayesian supervised learning. According to [1], supervised learning involves two tasks: regression and classification. This paper focuses only on Gaussian Process regression (GPR), since the quantity to be estimated, the magnitude of the E-field, is a continuous dependent variable. GPR places a prior directly on the function space, f, without parameterizing the function. As a result, a GP can be fitted to many different forms of data without any parameter alteration, as long as a suitable covariance function is defined for each type of data. Processing 1D synthetic SBL data using a GPR model could therefore provide useful information for the computation and visualization of EM wave applications.
The SBL application has been widely used in deep offshore hydrocarbon (HC) exploration over the last decades. This frequency-domain technique exploits EM waves to make reliable measurements of the subsurface electrical resistivity beneath the seabed. The marine controlled-source electromagnetic (CSEM) technique has been considered a necessary tool for interpreting acquired seismic imaging data in order to reduce exploration risk and ambiguity [2]. This is because seismic imaging cannot distinguish whether the fluid in a potential reservoir is hydrocarbon or water [3]. Moreover, SBL surveys generate large amounts of noisy data [4]. Thus, a flexible model that can quantify uncertainty is required during data processing to handle the noise.

Literature Review
This section gives a brief overview of GPR, the SBL application and the Computer Simulation Technology (CST) software.

Gaussian Process Regression (GPR)
Basically, a Gaussian Process regression (GPR) model is a distribution over functions. This probabilistic, non-parametric model is very flexible [5]. GPR is known as a probabilistic kernel machine and has been applied in many scientific and engineering areas, including machine learning, geostatistics and electronics [6]. The main advantage of GPR is that it provides a variance for the predicted mean, which serves as a measure of uncertainty. Over-fitting can also be reduced in the predictions because GPR follows a Bayesian framework [6].
Theoretically, a GP is defined as a collection of random variables, any finite number of which have a joint Gaussian distribution [7]. A GP is fully specified by its mean function, $m(x)$, and its covariance function, $k(x, x')$, and a function that follows a GP distribution is written as
$$ f(x) \sim \mathcal{GP}\left( m(x), k(x, x') \right) $$
Detailed explanations of GPR are given in the methodology section.

Seabed Logging (SBL) Application
The Seabed Logging (SBL) application is usually employed in deep offshore environments to locate hydrocarbon reservoirs beneath the seabed using EM waves. This application relies on hydrocarbons having a higher electrical resistivity than their surrounding host rocks [8]. The researchers in [9] stated that hydrocarbons have an electrical resistivity of 30-500 Ωm, compared with 0.5-2 Ωm for seawater and 1-2 Ωm for sedimentary rocks. SBL has been successfully applied in various offshore geological settings, such as offshore West Africa, Norway, Brazil and the Gulf of Mexico [10]. Figure 1 shows a schematic diagram of the SBL application during a survey. During an SBL survey, EM waves are emitted from a mobile source, known as a Horizontal Electrical Dipole (HED) transmitter, and EM receivers record the returned EM signals [11]. The HED transmitter is usually positioned 30-40 m above the seabed. According to [12], low-frequency EM waves (0.01-10 Hz) are used in the survey to generate longer wavelengths, which govern the distance the signal can travel. Conceptually, the electric field (E-field) of the EM energy is attenuated more slowly in a highly resistive medium than in a highly conductive (low-resistivity) medium, owing to the skin depth effect.
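As a rough illustration of the skin depth effect, the sketch below evaluates the standard plane-wave skin depth, $\delta = \sqrt{2\rho/(\omega\mu_0)} \approx 503\sqrt{\rho/f}$ m, for the lower ends of the resistivity ranges quoted from [9]. It assumes non-magnetic media ($\mu = \mu_0$) and uses the 0.125 Hz source frequency of this study's model; the function name `skin_depth` is hypothetical, and the plane-wave formula is only a simplification of the full 3D propagation that CST solves.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability in H/m; assumes non-magnetic media


def skin_depth(resistivity_ohm_m, frequency_hz):
    """Plane-wave skin depth in metres: delta = sqrt(2 * rho / (omega * mu_0))."""
    omega = 2.0 * math.pi * frequency_hz
    return math.sqrt(2.0 * resistivity_ohm_m / (omega * MU_0))


if __name__ == "__main__":
    f = 0.125  # Hz, the source frequency used in this study's simulation model
    # Lower bounds of the resistivity ranges quoted from [9].
    for name, rho in [("seawater", 0.5), ("sediment", 1.0), ("hydrocarbon", 30.0)]:
        print(f"{name:<12} {rho:>5.1f} ohm-m -> skin depth ~ {skin_depth(rho, f):7.0f} m")
```

At 0.125 Hz this gives roughly 1 km for 0.5 Ωm seawater but about 7.8 km for a 30 Ωm hydrocarbon layer, which is consistent with the E-field decaying more slowly through the resistive layer.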

Computer Simulation Technology (CST) Software
Good propagation of the EM signal can provide useful information about the structure of the subsurface beneath the seabed. According to [13], Maxwell's equations are required to predict the propagation of low-frequency EM waves. CST software is able to discretize Maxwell's equations in order to probe the resistivity contrast [14]. CST EM Studio uses the finite integration technique (FIT) to solve the equations on grid cells within a finite calculation domain [15], using either hexahedral or tetrahedral mesh elements. Researchers in [16] used tetrahedral mesh elements in CSEM modeling.

Methodology
This section explains the methodology of this study, including the GPR procedure.

SBL Modeling and Synthetic Data Acquisition
The synthetic SBL data are generated with Computer Simulation Technology (CST) software. The simulation model is a rectangular prism consisting of air, seawater, sediment (overburden and underburden layers) and hydrocarbon. The overburden thickness of the SBL model is fixed at 500 m. A 270 m long transmitter is positioned at the center of the model, 30 m above the seabed. The EM wave frequency used in this study is 0.125 Hz. The SBL model measures 10 km × 10 km × 5 km, and the hydrocarbon layer measures 10 km × 5 km × 0.2 km. Each layer in the simulation model has its own permittivity, electrical conductivity and thermal conductivity, and the specific physical properties of each layer follow [4]. Figure 2 shows the simulation model of the SBL application.

Gaussian Process Regression (GPR)
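Researchers in [5] mentioned that basic GPR uses a zero mean function for generalization purposes. A zero-mean prior does not imply a zero mean in the posterior distribution. Hence, the GP distribution on the function values, $\mathbf{f}$, at the $n$ training inputs can be written as
$$ \mathbf{f} \sim \mathcal{N}(\mathbf{0}, K) \tag{1} $$
where $K$ is the $n \times n$ covariance matrix whose elements are the covariance function evaluated at the training inputs, $K_{ij} = k(x_i, x_j)$.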
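The zero-mean prior in equation (1) can be made concrete with a minimal sketch that builds $K$ from a covariance function and draws one function sample at a set of inputs through a Cholesky factor: if $K = LL^{\top}$ and $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, I)$, then $L\mathbf{z} \sim \mathcal{N}(\mathbf{0}, K)$. It assumes the SE covariance of equation (2) with unit hyper-parameters as the example kernel and a small diagonal jitter for numerical stability; all names are hypothetical, and this is not the GPML implementation used in the study.

```python
import math
import random


def se_kernel(x1, x2, sigma_f=1.0, length=1.0):
    # Example covariance (SE, equation (2)); any valid covariance function could be used.
    return sigma_f ** 2 * math.exp(-((x1 - x2) ** 2) / (2.0 * length ** 2))


def covariance_matrix(xs, kernel):
    # K[i][j] = k(x_i, x_j): the n x n matrix in f ~ N(0, K).
    return [[kernel(a, b) for b in xs] for a in xs]


def cholesky(K, jitter=1e-8):
    # Lower-triangular L with L L^T = K + jitter * I; the jitter is an assumed stabiliser.
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s + jitter <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][i] = math.sqrt(s + jitter)
            else:
                L[i][j] = s / L[j][j]
    return L


def sample_prior(xs, kernel, rng):
    # f = L z with z ~ N(0, I) has covariance L L^T: one draw from the zero-mean GP prior.
    L = cholesky(covariance_matrix(xs, kernel))
    z = [rng.gauss(0.0, 1.0) for _ in xs]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(xs))]


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    draw = sample_prior(xs, se_kernel, random.Random(0))
    print([round(v, 3) for v in draw])
```

Each call with a different random generator gives a different smooth function; the posterior discussed below replaces this prior once training data are observed.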
The smoothness of the GPR model is governed by the correlation behavior encoded in the covariance function [7], so the reliability of the GPR model depends on how well the covariance function performs. The squared exponential (SE) function is therefore chosen as the covariance function here, as it is a popular choice among covariance functions in the kernel machine field [1]. The SE covariance function can be defined as

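$$ k(x, x') = \sigma_f^2 \exp\left( -\frac{(x - x')^2}{2\ell^2} \right) \tag{2} $$
where $\sigma_f^2$ is the signal variance and $\ell$ is the characteristic length-scale. The hyper-parameters, $\theta = \{\sigma_f, \ell\}$, must always be positive and must be chosen sensibly to obtain the best correlation between data points. This corresponds to optimizing the log-marginal likelihood function. The negative log-marginal likelihood [17] is defined as
$$ -\log p(\mathbf{y} \mid X, \theta) = \frac{1}{2}\mathbf{y}^{\top} K^{-1} \mathbf{y} + \frac{1}{2}\log|K| + \frac{n}{2}\log 2\pi \tag{3} $$
where $\mathbf{y}$ is the vector of $n$ training outputs at the training inputs $X$. GPR is able to predict the expected output at any desired input, also called the test input, $x_*$. Under the zero-mean prior, the training outputs and the test output $f_*$ are jointly Gaussian,
$$ \begin{bmatrix} \mathbf{y} \\ f_* \end{bmatrix} \sim \mathcal{N}\left( \mathbf{0}, \begin{bmatrix} K & K_* \\ K_*^{\top} & k(x_*, x_*) \end{bmatrix} \right) \tag{4} $$
where
$$ K_* = \left[ k(x_1, x_*), \ldots, k(x_n, x_*) \right]^{\top} \tag{5} $$
Conditioning on the training data gives the posterior distribution
$$ f_* \mid X, \mathbf{y}, x_* \sim \mathcal{N}\left( m_*, k_* \right) \tag{6} $$
This posterior distribution is exploited to predict the corresponding test output [18], namely the estimated mean, $m_*$, and the estimated variance, $k_*$:
$$ m_* = K_*^{\top} K^{-1} \mathbf{y} \tag{7} $$
$$ k_* = k(x_*, x_*) - K_*^{\top} K^{-1} K_* \tag{8} $$
Both equations are essential for GPR prediction.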
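The prediction steps in equations (2), (3), (7) and (8) can be sketched in pure Python for 1D inputs. This is a minimal, hypothetical implementation, not the GPML toolbox used in this study. It assumes noise-free observations (no separate noise hyper-parameter appears among the optimized values reported in the results), solves with a Cholesky factor instead of forming $K^{-1}$ explicitly, and adds a small diagonal jitter, which is an assumption and not part of the model, so that $K$ stays numerically positive definite.

```python
import math


def se_kernel(x1, x2, sigma_f, length):
    """Equation (2): k(x, x') = sigma_f^2 * exp(-(x - x')^2 / (2 * length^2))."""
    return sigma_f ** 2 * math.exp(-((x1 - x2) ** 2) / (2.0 * length ** 2))


def cholesky(K, jitter=1e-8):
    """Lower-triangular L with L L^T = K + jitter * I (jitter is an assumed stabiliser)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s + jitter <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][i] = math.sqrt(s + jitter)
            else:
                L[i][j] = s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L v = b."""
    v = []
    for i in range(len(b)):
        v.append((b[i] - sum(L[i][k] * v[k] for k in range(i))) / L[i][i])
    return v


def solve_lower_transpose(L, b):
    """Back substitution: solve L^T v = b."""
    n = len(b)
    v = [0.0] * n
    for i in reversed(range(n)):
        v[i] = (b[i] - sum(L[k][i] * v[k] for k in range(i + 1, n))) / L[i][i]
    return v


def _factorise(X, y, sigma_f, length):
    # Build K from equation (2) and compute alpha = K^{-1} y with two triangular solves.
    K = [[se_kernel(a, b, sigma_f, length) for b in X] for a in X]
    L = cholesky(K)
    alpha = solve_lower_transpose(L, solve_lower(L, y))
    return L, alpha


def neg_log_marginal_likelihood(X, y, sigma_f, length):
    """Equation (3), using log|K| = 2 * sum(log L_ii)."""
    L, alpha = _factorise(X, y, sigma_f, length)
    n = len(X)
    data_fit = 0.5 * sum(yi * ai for yi, ai in zip(y, alpha))
    complexity = sum(math.log(L[i][i]) for i in range(n))
    return data_fit + complexity + 0.5 * n * math.log(2.0 * math.pi)


def predict(X, y, x_star, sigma_f, length):
    """Equations (7) and (8): posterior mean m_* and variance k_* at x_star."""
    L, alpha = _factorise(X, y, sigma_f, length)
    k_star = [se_kernel(xi, x_star, sigma_f, length) for xi in X]  # covariances k(x_i, x_*)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)  # K_*^T K^{-1} K_* equals |L^{-1} K_*|^2
    var = se_kernel(x_star, x_star, sigma_f, length) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)  # clip tiny negative values caused by rounding


if __name__ == "__main__":
    # Toy 1D data for illustration only; not the SBL data of this study.
    X = [0.0, 1.0, 2.0, 3.0]
    y = [0.0, 0.8, 0.9, 0.1]
    print(neg_log_marginal_likelihood(X, y, 1.0, 1.0))
    m, k = predict(X, y, 1.5, 1.0, 1.0)
    print(m, m - 2 * math.sqrt(k), m + 2 * math.sqrt(k))
```

`predict` returns the pair $(m_*, k_*)$, and the $\pm 2\sigma$ band is then $m_* \pm 2\sqrt{k_*}$. When the length-scale is very large relative to the input spacing, $K$ may become ill-conditioned, in which case a larger jitter or an explicit noise term would likely be needed.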

Calculating Mean Square Error (MSE)
Mean square error (MSE) is one of many prediction error measures used to quantify the differences between the true values and the predicted values; these differences are examined to evaluate the performance of the prediction. According to [9], the error is the amount by which the estimator differs from the quantity being estimated. For $n$ test points with true values $y_i$ and predicted values $y_{*i}$, the MSE is
$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - y_{*i} \right)^2 \tag{9} $$
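Equation (9) translates directly into code. The helper name below is hypothetical and is shown only as a minimal sketch of the calculation.

```python
def mean_squared_error(y_true, y_pred):
    # Equation (9): average of the squared differences between true and predicted values.
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

With a single test point, as in the five-point demonstration in the results, the MSE reduces to the squared error of that one prediction.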

Results and Discussion
GPR was fitted using the Gaussian Processes for Machine Learning (GPML) toolbox for MATLAB written by [19]. The first five data points, consisting of four training points and one test input, were selected to demonstrate the GP regression. The steps used to estimate the target test output, $y_*$ (the log10 of the E-field magnitude), at the test input (offset) $x_* = 2399.26$ are described below. First, initial hyper-parameters were drawn randomly from the prior distribution. Starting from these values, equation (3) was minimized numerically to obtain an estimate of the hyper-parameters; this process is called optimization. The optimization was repeated, and the estimate with the smallest negative log-marginal likelihood was selected as the optimal set of hyper-parameters. The optimal hyper-parameters used in this demonstration are $\sigma_f = 6.63671909$ and $\ell = 12309.01914$. Then, the covariance matrix $K$ in (1) was constructed using equation (2), and the predictive mean and variance in equations (7) and (8) were calculated.
To determine the reliability of the GPR model, the estimated test output was validated by calculating the MSE, as in equation (9), between the true output value at $x_* = 2399.26$ and the predicted value. The calculated MSE is $1.4086 \times 10^{-13}$. This value is close to zero, which indicates a very small prediction error. However, this MSE was calculated for only one data point; the MSE for a larger number of data points is discussed later.
Figures 3 and 4 show the GPR model fitted to this dataset. Because the error bar is barely visible at full scale, a zoomed-in view is provided so that the 95% confidence interval at the test point can be seen clearly. In these figures, the x-axis represents the offset and the y-axis represents the magnitude of the E-field on a log10 scale, which is used to make the data easier to interpret.
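The random-restart procedure described above (draw random initial hyper-parameters, minimize equation (3) locally, keep the smallest value) can be sketched generically. GPML ships its own gradient-based minimizer; here a simple derivative-free compass search is swapped in for brevity, so the sketch illustrates the restart logic rather than reproducing the toolbox. Optimizing over $\log\sigma_f$ and $\log\ell$ would keep both hyper-parameters positive, as required. In the demo, a toy quadratic stands in for the negative log-marginal likelihood, and all function names are hypothetical.

```python
import math
import random


def compass_search(objective, start, step=1.0, tol=1e-6, max_iter=10_000):
    # Derivative-free local minimiser: try +/- step along each coordinate and
    # halve the step whenever no coordinate move lowers the objective.
    x = list(start)
    fx = objective(x)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step
                f_trial = objective(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            step *= 0.5
    return x, fx


def multi_start_minimize(objective, sample_start, n_restarts, rng):
    # Restart the local search from random initial points and keep the run with
    # the smallest objective value (e.g. the smallest negative log-marginal likelihood).
    best_x, best_f = None, math.inf
    for _ in range(n_restarts):
        x, fx = compass_search(objective, sample_start(rng))
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


if __name__ == "__main__":
    # Toy stand-in for equation (3); a real objective would map
    # theta = (log sigma_f, log l) to the negative log-marginal likelihood.
    def toy_objective(theta):
        return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2

    def random_start(rng):
        return [rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]

    best, value = multi_start_minimize(toy_objective, random_start, 5, random.Random(0))
    print(best, value)
```

Restarts matter because the marginal likelihood can have several local optima; keeping the lowest value over independent starts is a simple guard against settling in a poor one.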
In Figure 3, the round symbols denote the training data and the zoomed-in cross symbol denotes the test observation, $y_*$. In Figure 4, the vertical line designates $\pm 2\sigma$, which quantifies the uncertainty of the prediction for the test data. Figure 4 shows that the actual point lies within this vertical line (approximately the 95% confidence interval) around the predicted mean, which indicates that the prediction fits the dataset well.
This paper also applies GPR to a dataset of 1199 points to determine whether GPR can fit a large amount of data. The dataset consisted of 766 training points and 433 test points, following the suggestion in [20] that about one-third of the dataset should be used as test data. The hyper-parameters were inferred with the multivariate optimization procedure described above. Figure 5 shows the GPR model fitted to this larger dataset, where the grey region designates $\pm 2\sigma$ (the 95% confidence interval) for the predicted test data. The MSE between the 433 actual and predicted test values is $2.7044 \times 10^{-5}$. This indicates that the GPR model also fits a large amount of SBL data well, with a 95% confidence interval.
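The validation of the larger dataset can be sketched with two hypothetical helpers: one that randomly holds out a fraction of the indices as test data, and one that measures how many true test values fall inside the $m_* \pm 2\sqrt{k_*}$ band. The paper does not state how its 433 test points were chosen, so random selection here is an assumption.

```python
import random


def train_test_split_indices(n, test_fraction, seed=0):
    # Hypothetical helper: shuffle indices 0..n-1 and hold out round(n * test_fraction) as test data.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def interval_coverage(y_true, mean, var, z=2.0):
    # Fraction of points with |y - m| <= z * sqrt(var); z = 2 is roughly a 95% interval.
    if not y_true or not (len(y_true) == len(mean) == len(var)):
        raise ValueError("inputs must be non-empty and of equal length")
    inside = sum(1 for t, m, v in zip(y_true, mean, var) if abs(t - m) <= z * v ** 0.5)
    return inside / len(y_true)


if __name__ == "__main__":
    train, test = train_test_split_indices(1199, 1 / 3)
    print(len(train), len(test))  # prints 799 400
```

With `test_fraction=1/3` the helper holds out 400 of 1199 points; the 766/433 split reported above corresponds to `test_fraction=433/1199`. A coverage close to 0.95 on the test set would be consistent with well-calibrated $\pm 2\sigma$ bands.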

Conclusions
This paper shows that Gaussian Process regression (GPR) fits 1D synthetic SBL data well, with low MSE. The predicted outputs are also accompanied by uncertainty quantification in the form of a 95% confidence interval. Further processing of 1D SBL data using GPR should be considered in future work. GPR has good potential as a data processing tool that provides useful hydrocarbon information, especially for de-risking hydrocarbon exploration with the SBL application.