A case study of petrophysical prediction using machine learning integrated with interval inversion in a tight sand reservoir in Egypt

This study presents a new algorithm for reservoir characterization using borehole logging data, which integrates unsupervised machine learning techniques and interval inversion to automatically determine layer boundaries and petrophysical parameters. The research aims to reduce the time and manual input required for borehole inversion to estimate petrophysical parameters. The algorithm was used to predict the layer boundaries of sand-shale intercalations for both synthetic and field wireline log data. Field well logging measurements were obtained from the Jurassic reservoir of an oil and gas field in Egypt. The reservoir is composed of a tight sandstone layer with significant heterogeneity due to diagenesis, which converts kaolinite into illite and results in a decrease in porosity and permeability. The algorithm was used to predict the petrophysical parameters of this reservoir. The field data from the well reveal that the reservoir is made up of sandstone of varying quality, which impacts storage capacity and hydrocarbon saturation. The algorithm demonstrates consistent convergence of the data distance at 7.5%. Overall, the integration of the new cluster technique and interval inversion can improve the time-intensive and laborious process of borehole data inversion to estimate petrophysical parameters.


Introduction
Borehole geophysics provides important information about the subsurface conditions at the location of the drilled wells. The evaluation of wireline logs can be used to gain insights into the volume of rock constituents, storage capacity, and fluid saturation [1]. Clustering can be defined as a density estimation problem [2]. In other words, the clustering problem aims to group data points of similar characteristics into one group that differs from the other groups. The similarity of data values can be measured using the distance between the data points; the closest data points belong to the same cluster, while the farthest points belong to another cluster [3]. Partitioning clustering, such as k-means clustering, relies on initializing a number of centroids, measuring the distance between the data points and each centroid, and then updating the location of these centroids until no change occurs [4]. This kind of clustering has two common weaknesses: the first is its dependency on the initial centroid locations, and the second is its sensitivity to outliers [5]. To overcome these disadvantages, this research proposes the use of an iterative algorithm called Most Frequent Value (MFV) for automatically refining the calculated distances as well as updating the location of the centroids. The MFV algorithm has been used in many applications to guarantee the stability of convergence in different optimization problems [6,7]. Integrating the MFV with partitioning clustering produces the so-called MFV-clustering, by which an accurate solution can be obtained.
1295 (2024) 012008 IOP Publishing doi:10.1088/1755-1315/1295/1/012008
The stability of the optimization process is a function of several factors, such as the initial model, the shape and properties of the cost function, hyperparameter tuning, and the quantity of data and parameters [8,9]. One approach to improve the stability and reliability of the borehole optimization problem is to analyze the sensitivity of various parameters. Reference [10] presented the importance of sensitivity analysis of borehole geophysics for aquifer characterization. In that study, the sensitivity of several geophysical parameters, such as electrical resistivity and borehole flow, was investigated with respect to subsurface parameters such as porosity and permeability. A sensitivity study can be a good guide for choosing the parameters of the optimization process [11].
The traditional assessment techniques include cross-plot techniques for the determination of lithology; moreover, they need a sequential workflow to turn the raw multidimensional wireline log data into petrophysical parameters [12]. However, traditional techniques often suffer from limitations such as sensitivity to noise and cumulative errors. To overcome such limitations, well logs can be assessed in a numerical inverse problem called joint inversion [13]. The joint inversion problem of well logging can be solved using linearized inversion methods [14]. Nevertheless, the linearized joint inversion gives a proper and quick solution only when there is a good understanding of the prior information [15]. The prior information includes laboratory measurements of core data and layer boundaries. Besides that, the traditional joint inversion of borehole geophysics has some limitations, such as a low overdetermination ratio, non-uniqueness, and computational complexity. One approach to overcome these limitations is to use global optimization to solve the inverse problem [16]. The key wireline logs used include gamma ray, resistivity, neutron-density, and sonic. These measurements respond to lithology, porosity, and pore fluid content in the reservoir.

MFV Clustering
The most frequent value method is based on weighting the calculated distance between data points and centroids, where the weights are calculated automatically by an iterative algorithm [17]. The weighted average is calculated with a symmetric weight function. In the first step, the dihesion (ε) is calculated from the maximum and minimum of the data set. In the later iterations, the Most Frequent Value (M) and ε are updated from each other, so the optimal weights are predicted automatically by the MFV update equations. The MFV algorithm automatically tests the calculated distance and gives high weights to the points in the most frequent range and low weights to the points far from it; the resulting distance is called Steiner's distance. Furthermore, the centroid location update is based on the most frequent value, not the arithmetic mean, of each cluster's data. Therefore, the Steiner weights can be used for calculating the robust Steiner's distance, and the new centroids may be determined using Steiner's weights (Equation 3).
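The iterative reweighting described above can be illustrated with a minimal sketch. It assumes the form of Steiner's update rules commonly quoted in the MFV literature (Cauchy-type weights ε²/(ε² + (x − M)²) and a dihesion update with a factor of 3); the starting values, the function name `most_frequent_value`, and the sample readings are hypothetical and may differ from the authors' implementation.

```python
import math


def most_frequent_value(values, max_iter=100, tol=1e-10):
    """Estimate the most frequent value M and the dihesion eps of a 1-D sample.

    Sketch of Steiner's iteratively reweighted scheme as commonly stated in
    the MFV literature (an assumption; not taken from the paper's equations).
    Returns (M, eps, weights).
    """
    # Assumed starting values: arithmetic mean, and a dihesion derived
    # from the range (maximum minus minimum) of the data.
    m = sum(values) / len(values)
    eps = max(math.sqrt(3.0) / 2.0 * (max(values) - min(values)), 1e-12)

    for _ in range(max_iter):
        eps2 = eps * eps
        sq_dev = [(x - m) ** 2 for x in values]

        # Steiner weights: close to 1 in the most frequent range, small far away.
        weights = [eps2 / (eps2 + d) for d in sq_dev]
        m_new = sum(w * x for w, x in zip(weights, values)) / sum(weights)

        # Dihesion update computed from the same squared deviations.
        num = sum(d / (eps2 + d) ** 2 for d in sq_dev)
        den = sum(1.0 / (eps2 + d) ** 2 for d in sq_dev)
        eps_new = max(math.sqrt(3.0 * num / den), 1e-12)

        converged = abs(m_new - m) < tol and abs(eps_new - eps) < tol
        m, eps = m_new, eps_new
        if converged:
            break

    eps2 = eps * eps
    weights = [eps2 / (eps2 + (x - m) ** 2) for x in values]
    return m, eps, weights


# Hypothetical log readings with a single outlier.
readings = [10.0, 10.2, 9.9, 10.1, 9.8, 50.0]
m, eps, weights = most_frequent_value(readings)
mean = sum(readings) / len(readings)
print(f"mean = {mean:.2f}, MFV = {m:.2f}, outlier weight = {weights[-1]:.5f}")
```

In this example the arithmetic mean is pulled toward the outlier, while the most frequent value stays near the bulk of the readings. Used as the centroid update of a partitioning cluster, the same estimator would limit the outlier sensitivity noted in the Introduction.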

Interval Inversion
In borehole logging inversion, the petrophysical parameters can be optimized simultaneously using the so-called joint inversion. The interval inversion models the geophysical log response through rock physics relationships that link lithology, porosity, fluid saturation, and other parameters to the measured data. It solves the inverse problem by iteratively adjusting the parameters to match the observed logs, generating estimates of porosity, hydrocarbon saturation, shale volume, and other petrophysical properties from the geophysical measurements. The relationship between the petrophysical parameters and the logging data is expressed by the so-called response function. The linear inversion technique of borehole geophysics relies on solving an overdetermined inverse problem at each depth point, which is called point-by-point inversion. This kind of inversion has a limited overdetermination ratio, which restricts the number of unknowns that can be involved in the inverse problem. In contrast, interval inversion can optimize the petrophysical parameters over certain depth intervals. This is achieved by describing the variation of each petrophysical parameter as a discretized polynomial series expansion, in which the i-th parameter is expanded to degree q in terms of basis functions.
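To make the contrast between point-by-point and interval inversion concrete, the sketch below compares the overdetermination ratio (data per unknown) of the two formulations and evaluates a series expansion of one parameter. The numbers of logs, unknowns, depth samples, and coefficients, the power basis, and the porosity coefficients are all hypothetical.

```python
def overdetermination_ratio(n_data, n_unknowns):
    """Number of data per unknown in the inverse problem."""
    return n_data / n_unknowns


def series_parameter(z, coefficients, basis):
    """Evaluate sum_q B_q * Psi_q(z) for one petrophysical parameter."""
    return sum(b * basis(q, z) for q, b in enumerate(coefficients))


def power_basis(q, z):
    """Assumed basis function: z**q on a depth coordinate scaled to [0, 1]."""
    return z ** q


# Hypothetical set-up: 5 logs, 4 petrophysical unknowns, 100 depth samples,
# and 3 expansion coefficients per parameter.
n_logs, n_params, n_depths, n_coeffs = 5, 4, 100, 3

# Point-by-point: 5 data against 4 unknowns at every depth.
point_ratio = overdetermination_ratio(n_logs, n_params)

# Interval: 500 data against 12 expansion coefficients for the whole interval.
interval_ratio = overdetermination_ratio(n_logs * n_depths, n_params * n_coeffs)

# Porosity varying linearly with scaled depth (made-up coefficients).
porosity_mid = series_parameter(0.5, [0.20, 0.10], power_basis)

print(point_ratio, round(interval_ratio, 1), round(porosity_mid, 3))
```

Because the unknowns become a small set of expansion coefficients shared by all depth samples in the interval, the amount of data per unknown grows with the interval length.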
The boundaries of the interval of interest are usually detected by using the Heaviside function as a basis function, which expresses the petrophysical properties as a sequence of homogeneous layers.
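A layer-wise homogeneous model of this kind can be sketched as a sum of boxcar functions, each built from the difference of two Heaviside steps. The boundary depths and porosity values below are hypothetical, and the convention H(0) = 1 is an assumption of this sketch.

```python
def heaviside(x):
    """Unit step function, with H(0) = 1 assumed."""
    return 1.0 if x >= 0 else 0.0


def layered_parameter(z, boundaries, values):
    """Piecewise-constant parameter built from Heaviside steps.

    boundaries: ascending depths [Z_0, Z_1, ..., Z_Q]
    values: one constant value per layer (Q values)
    """
    total = 0.0
    for q, value in enumerate(values):
        # Boxcar that equals 1 for Z_q <= z < Z_{q+1} and 0 elsewhere.
        boxcar = heaviside(z - boundaries[q]) - heaviside(z - boundaries[q + 1])
        total += value * boxcar
    return total


# Hypothetical two-layer model: low porosity above 5 m, higher porosity below.
boundaries = [0.0, 5.0, 37.0]
porosity_per_layer = [0.05, 0.12]

shallow = layered_parameter(2.0, boundaries, porosity_per_layer)
deep = layered_parameter(10.0, boundaries, porosity_per_layer)
print(shallow, deep)
```

In such a formulation each layer contributes a single unknown per parameter, and the boundary depths must be supplied or estimated separately.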
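Building on that concept, the model update equation optimizes the series expansion coefficients that represent the variation of the model parameters in certain intervals as follows:

$$\mathbf{B}^{k+1} = \mathbf{B}^{k} + \left( (\mathbf{G}^{k})^{T} \mathbf{G}^{k} + \lambda \mathbf{I} \right)^{-1} (\mathbf{G}^{k})^{T} \delta\mathbf{d}^{k} \qquad (6)$$

where $\mathbf{B}$ denotes the vector of series expansion coefficients, $\mathbf{G}$ the Jacobian (sensitivity) matrix, $\lambda$ the damping factor, $\mathbf{I}$ the identity matrix, $\delta\mathbf{d}$ the difference between the measured and calculated data, and $k$ the iteration index.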
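A damped least squares update of this type can be sketched in a few lines. The example assumes a linear toy problem with a fixed, hypothetical Jacobian and made-up coefficients. In an actual interval inversion the Jacobian would be recomputed from the response functions at every iteration. The helper names (`solve_linear`, `damped_update`, `forward`) are illustrative only.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def damped_update(coeffs, jacobian, residual, damping):
    """One step: coeffs + (G^T G + damping * I)^-1 G^T residual."""
    n = len(coeffs)
    normal = [[sum(row[i] * row[j] for row in jacobian)
               + (damping if i == j else 0.0)
               for j in range(n)] for i in range(n)]
    gradient = [sum(row[i] * r for row, r in zip(jacobian, residual))
                for i in range(n)]
    step = solve_linear(normal, gradient)
    return [c + s for c, s in zip(coeffs, step)]


def forward(jacobian, coeffs):
    """Linear toy response: calculated data = G * coeffs."""
    return [sum(g * c for g, c in zip(row, coeffs)) for row in jacobian]


# Hypothetical problem: 3 data, 2 unknown expansion coefficients.
jacobian = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
true_coeffs = [0.2, 0.6]
measured = forward(jacobian, true_coeffs)

coeffs = [0.0, 0.0]
for _ in range(20):
    calculated = forward(jacobian, coeffs)
    residual = [m - c for m, c in zip(measured, calculated)]
    coeffs = damped_update(coeffs, jacobian, residual, damping=0.1)

print([round(c, 6) for c in coeffs])
```

The damping term keeps the normal matrix invertible when the problem is poorly conditioned, at the cost of smaller steps, which is why the update is applied repeatedly until the residual stops decreasing.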

The integration between MFV-cluster and interval inversion
The borehole geophysical inversion problem consists of two parts: the first is the geometrical identification of the promising interval, while the second is the prediction of the petrophysical parameters. The petrophysical parameters can be predicted using linear interval inversion, while the geometrical part needs more robust techniques that solve the problem in sequential order. This research presents a new integration of MFV-clustering and interval inversion by which the geometrical and petrophysical parameters can be automatically predicted in one joint inversion method. The proposed workflow is shown in Figure 1.
The integration of the MFV-cluster and interval inversion can be used in place of the Heaviside basis functions. The cluster analysis arranges the data into groups of similar lithological characteristics. This prior information is useful in borehole geophysical inversion for choosing the initial model as input for the inversion procedure; in other words, examining the cluster number log enables formation separation and provides a preliminary estimate of the layer thicknesses. This research proposes the simultaneous detection of the initial model boundaries and the petrophysical parameters, since the variation in lithology along a borehole is highly correlated with the cluster log. Layer boundaries can be read automatically by computer processing from changes in the cluster number shown on the log. The layer boundary coordinates estimated in the MFV-clustering phase serve as input for the interval inversion technique. This prior information allows an initial model to be developed for each predefined interval. In terms of the algorithm, the workflow has two loops: the outer main loop is the MFV clustering, which separates the different lithological intervals, and the inner loop then performs the interval inversion according to the decided interval boundaries.
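Reading layer boundaries from the cluster number log can be sketched as follows. The function name, the choice of placing each boundary midway between two neighbouring samples, and the example depths and labels are assumptions made for illustration.

```python
def intervals_from_labels(depths, labels):
    """Convert a cluster-label log into (top, base, label) intervals.

    A boundary is placed midway between two consecutive samples whose
    cluster labels differ (an assumed convention).
    """
    intervals = []
    top = depths[0]
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            boundary = 0.5 * (depths[i - 1] + depths[i])
            intervals.append((top, boundary, labels[i - 1]))
            top = boundary
    intervals.append((top, depths[-1], labels[-1]))
    return intervals


# Hypothetical cluster log sampled every metre.
depths = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
labels = [1, 1, 1, 0, 0, 1]

for top, base, label in intervals_from_labels(depths, labels):
    print(f"{top:4.1f} - {base:4.1f} m: cluster {label}")
```

Each returned interval could then be passed to the inner inversion loop as one predefined layer with its own initial model.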

Results
The proposed workflow was tested using a field dataset from a tight sand gas reservoir in the northwestern part of Egypt. The wireline logging dataset recorded the change of parameters of the Lower Safa member of the Khatatba Jurassic sequence.

Rock physics understanding
To understand the lithological characteristics and fluid effects of the reservoir, the physical characteristics of the rocks were studied. Figure 2 shows the rock physics analysis results used to characterize the lithology and fluid saturation of the reservoir. The left cross-plot shows the relationship between the shear and compressional slowness; the data points follow the expected trends for sandstone, limestone, and dolomite, with gas saturation evident from lowered Vp/Vs ratios. The right cross-plot of Vp/Vs versus effective porosity reveals a clear cutoff value of 1.75 for Vp/Vs, differentiating sandstone reservoirs from non-sandstones. The rock physics study concludes that the reservoir is saturated with gas and consists mainly of sandstones. However, the reservoir is interbedded with shale intervals, and the quality of the sandstone layers is controlled by the type of shale filling the pores. The study of the core samples showed that they represent two different sandstones, one deposited in a fluvial sedimentary environment and one deposited in a tidal sedimentary environment. All this information is important for validating the interval detection and the predicted petrophysical parameters.
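The Vp/Vs cutoff can be applied directly to sonic slowness logs, because velocity is the reciprocal of slowness and therefore Vp/Vs equals the shear slowness divided by the compressional slowness. The sketch below assumes that sandstone plots below the 1.75 cutoff, which the text does not state explicitly, and uses hypothetical slowness values.

```python
def vp_vs_ratio(dtc, dts):
    """Vp/Vs from compressional (dtc) and shear (dts) slowness in the same units."""
    return dts / dtc


def is_sandstone(dtc, dts, cutoff=1.75):
    """Flag a sample as sandstone when Vp/Vs is below the cutoff.

    The direction of the inequality is an assumption of this sketch.
    """
    return vp_vs_ratio(dtc, dts) < cutoff


# Hypothetical slowness pairs (dtc, dts) in microseconds per foot.
samples = [(70.0, 112.0), (80.0, 160.0)]
flags = [is_sandstone(dtc, dts) for dtc, dts in samples]
print(flags)
```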

Sedimentological study
The sedimentological study of the collected core samples concluded that the samples were acquired from different sandstone layers. The variety of sedimentary structures reflects the high degree of heterogeneity of the reservoir. In addition to the investigation of the sedimentary structures, SEM and petrographic analyses were performed to delineate the effect of the diagenetic processes. The pore spaces were filled by quartz overgrowth, and authigenic kaolinite also fills the inter-granular pore spaces. Furthermore, the quartz overgrowth was heavily covered by fibrous illite. Finally, kaolinite was transformed into illite. The presence of illite indicates that the permeability of the reservoir may be reduced according to the degree of diagenesis. Figure 4 shows two types of pore space cementation that affect the reservoir quality: (a) and (b) microcrystalline kaolinite aggregates (thin section petrography), (c) and (d) quartz overgrowth and fibrous illite (SEM photomicrograph).

The fully automated inversion problem
The MFV-clustering was able to estimate the differences between the data points. The boundary decision threshold, which converts the cluster labels into layer boundaries, was 2 meters. That is, if a change in labels spans no more than 2 meters, it is counted as part of the same layer, while if the change spans more than 2 meters, a boundary is placed. Figure 5 shows the fit between the calculated and measured data, where the first track shows the rescaled depth representing the thickness of the reservoir from 0 to 37 m, and the last track shows the labels resulting from the cluster loop with the boundaries between the different intervals. Figure 6 shows the predicted petrophysical parameters. The proposed algorithm shows that the reservoir consists mainly of sandstone intervals with some shale streaks that can affect the vertical homogeneity of the reservoir. The first five meters of the reservoir represent a shale interval that reduces both the porosity and the fluid saturation. The integration of MFV-clustering and interval inversion could isolate the shale layer 7 and perform a separate interval inversion for this shale layer to improve the fit between the calculated and measured datasets. The stability of the fully automated inversion problem was tested using the data distance convergence cross-plot. The comparison between the inversion using the whole recorded interval as one layer and the cluster-defined inversion shows that the proposed approach is more stable in its convergence. In addition, the data distance of the MFV-cluster-assisted interval inversion converges stably to 7.5%, which is lower than that of the conventional inversion, which converged to 10%. Figure 7 shows the data distance convergence for both the MFV-cluster-assisted interval inversion and the conventional interval inversion.
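One possible reading of the 2 m boundary decision is sketched below: a run of identical cluster labels that is no thicker than the threshold is absorbed into the layer above it, so only thicker runs create boundaries. The function name, the uniform sampling step, and the example labels are hypothetical, and the authors' actual rule may differ in detail.

```python
def apply_thickness_rule(labels, step, min_thickness=2.0):
    """Absorb label runs no thicker than min_thickness into the layer above.

    labels: cluster label per depth sample (uniform sampling assumed)
    step: sampling interval in metres
    """
    cleaned = list(labels)
    n = len(cleaned)
    i = 0
    while i < n:
        # Find the end of the current run of identical labels.
        j = i
        while j < n and cleaned[j] == cleaned[i]:
            j += 1
        thickness = (j - i) * step
        # The topmost run has no layer above it, so it is always kept.
        if i > 0 and thickness <= min_thickness:
            for k in range(i, j):
                cleaned[k] = cleaned[i - 1]
        i = j
    return cleaned


# Hypothetical label log sampled every 0.5 m: a 1 m streak of cluster 1
# inside cluster 0, followed by a 3 m layer of cluster 1.
raw_labels = [0] * 6 + [1] * 2 + [0] * 6 + [1] * 6
clean_labels = apply_thickness_rule(raw_labels, step=0.5)
print(clean_labels)
```

In this example the 1 m streak is merged into the surrounding layer, while the 3 m layer at the bottom is kept and produces a boundary.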

Conclusion
This study demonstrates a new integrated workflow that utilizes unsupervised machine learning and interval inversion techniques for efficient reservoir characterization from wireline logging data. The approach combines Most Frequent Value clustering to automatically detect lithological layers from borehole measurements, coupled with an interval inversion that estimates petrophysical parameters within each layer. Application to data from a heterogeneous gas-bearing sandstone reservoir in Egypt showed the technique can rapidly characterize complex formations by leveraging wireline logs. The clustering improved convergence stability over standard methods. It revealed sequences of sandstone units with interbedded shales, consistent with rock physics analysis. Overall, the integration of data analytics and physics-based inversion extracts more value from existing borehole data to deliver an automated characterization solution where manual log analysis is time intensive. This has significant potential to rapidly characterize reservoirs where wells have been drilled but measurements remain under-utilized. The technique brings together machine learning and modeling workflows to enhance subsurface analysis.

Figure 1. The workflow of the interval inversion and MFV-cluster integration.

Figure 2. Rock physics study of the Lower Safa member. The Vp-Vs cross-plot shows the gas effect of the well (left), and the multi-well cross-plot shows the cut-off value of sandstone (1.75) (right).

Figure 5. Fitting between measured (solid black lines) and calculated data (red dashed lines).

Figure 6. Petrophysical parameters prediction (red dashed lines are porosity, water saturation, and hydrocarbon volume from left to right, the yellow dashed line is sand volume, and the blue dashed line is water volume).