Rockburst classification based on cross reconstruction learning under small-sample conditions

Rockburst is a common geological hazard in deep geotechnical engineering, and its accurate prediction is vital for designing prevention measures. This study therefore proposes a new classification prediction method, Cross Reconstruction Learning (CR), built on conventional machine learning algorithms and metric learning strategies. First, the method partitions and recombines the original dataset: the features of each sample are crossed and reconstructed with the features of the other samples in the set. During this combination, each new sample is labelled according to whether the two constituent samples share the same class, forming a new sample set. A range of machine learning algorithms is then trained and tested on this new dataset. Finally, a class voting mechanism decodes the test-set predictions through probability assignment, converting the predicted labels back into rockburst classes and yielding the final classification. The proposed model was trained on a database of 239 case samples, and its performance was compared against models that have performed well in current rockburst prediction research (KNN, XGBoost, and Random Forest). The results show that, when combined with the Cross Reconstruction Learning method, the performance metrics of the KNN algorithm declined, because the feature dimensionality of the combined dataset is doubled; the metrics of the ensemble models XGBoost and Random Forest, in contrast, improved markedly compared with the original classification models. A comparison across multiple performance metrics showed that the CR-XGBoost model outperformed the others in every evaluation, offering useful guidance for practical engineering applications.


Introduction
Rockburst is a dynamic instability disaster in which, under high geostress conditions in underground engineering, rock abruptly releases stored elastic strain energy when disturbed by excavation or other loads, resulting in rock spalling, fragmentation, and ejection [1]. Rockburst incidents have been documented extensively in numerous lead-zinc mines across China. For instance, as mining depth increased at the Huize Lead-Zinc Mine, geostress-related disasters such as rockbursts became more frequent, causing visible equipment damage [2]. During deep stope backfilling at the Fankou Lead-Zinc Mine, the highly stressed rock mass was disturbed by blasting, which aggravated rockburst occurrences and seriously impaired operational safety [3]. As shallow mineral resources are depleted, deep mining is progressively becoming the mainstay of mineral resource exploitation [4], posing severe challenges for rockburst mitigation [5]. Accurate rockburst prediction is therefore indispensable for the development of future underground space engineering.
At present, numerous scholars have applied machine learning methods to rockburst prediction. For instance, Zhou et al. [6] built a database of 132 rockburst cases and used the Particle Swarm Optimization (PSO) algorithm to accelerate the tuning of SVM hyperparameters, finding that the hybrid model performed robustly. X. Yin et al. [7] applied data mining techniques to reduce dimensionality and to identify and replace outliers in 246 collected sets of rockburst data, and then used the Stacking ensemble method to combine and compare different models; the results indicated that Stacking has distinct advantages when handling imbalanced data. The accuracy and reliability of machine-learning-based rockburst prediction depend largely on the quantity and quality of rockburst cases [8]. However, most existing studies are based on only one to two hundred data sets, which yields less reliable prediction models. Since the number of documented rockburst cases is unlikely to grow substantially in the short term, improving the prediction accuracy of models built on the existing modest volume of data is an urgent issue in current research.
To address this challenge, this study introduces a new prediction method, Cross Reconstruction Learning (CR), which is rooted in metric learning strategies [9] and conventional machine learning algorithms. Models that have proven effective in previous rockburst prediction research (KNN, XGBoost, and Random Forest) [10-12] are combined with the proposed Cross Reconstruction technique to build new hybrid prediction models. Evaluation and comparison of the different prediction schemes show that the proposed Cross Reconstruction strategy effectively improves the predictive accuracy of the model.

Dataset description
Based on the rockburst cases collected by J. Zhou et al. [26], discrepancies and missing data were corrected and supplemented, yielding a total of 239 complete rockburst cases. These cases were used to construct a rockburst class prediction database. The collected rockburst instances are classified into four classes according to the severity of the occurrence and the failure characteristics: none rockburst (39 cases), light rockburst (71 cases), moderate rockburst (97 cases), and strong rockburst (32 cases); the dataset therefore exhibits class imbalance. The database is partitioned into a training set (80% of the samples) and a test set (20% of the samples) for model training and performance validation. To address the class imbalance, stratified sampling is used to split the original database. A semi-violin plot is used to examine each feature variable in the training and test sets, as illustrated in Figure 1; the figure shows the scatter distribution, kernel density distribution, and other information for each feature variable in the two sets. The distribution differences of each feature variable between the separated training and test sets are comparatively small. Considering that anomalies can occur under actual operating conditions, data including outliers are retained for model training and prediction.
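As an illustration of this splitting step, the sketch below produces a stratified 80/20 split with scikit-learn. It is a minimal sketch, assuming the 239 cases are held in a pandas DataFrame with a class column named "label"; the column name, file name, and helper name are hypothetical, not taken from the paper.

import pandas as pd
from sklearn.model_selection import train_test_split

def stratified_split(df: pd.DataFrame, label_col: str = "label",
                     test_size: float = 0.2, seed: int = 42):
    """Split the rockburst case table into stratified train/test sets.

    Stratifying on the class column keeps the proportions of the four
    rockburst classes (none/light/moderate/strong) roughly equal in both
    subsets, which matters because the classes are imbalanced.
    """
    X = df.drop(columns=[label_col])
    y = df[label_col]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    return X_tr, X_te, y_tr, y_te

# Example usage, with the class proportions printed as a quick sanity check:
# df = pd.read_csv("rockburst_cases.csv")   # hypothetical file name
# X_tr, X_te, y_tr, y_te = stratified_split(df)
# print(y_tr.value_counts(normalize=True))
# print(y_te.value_counts(normalize=True))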

Cross reconstruction learning
The principle of the cross reconstruction learning method is shown in Figure 2 and is introduced in detail as follows.

(1) Constructing the combined training set. First, the training set is duplicated, and the feature variable names in the original training set and the duplicated training set are given the suffixes 1 and 2, respectively. The samples of the original training set and the duplicated training set are then cross-reconstructed pairwise, and the class code of each new sample is determined by the similarity or difference of the two combined sample categories: if the two samples belong to the same class, the new sample is labelled Y = 1; if they belong to different classes, it is labelled Y = 0. Each new sample therefore consists of the suffix-1 features of the first sample, the suffix-2 features of the second sample, and the label Y, where Y takes the value 0 or 1. Because the underlying problem is multi-class, the number of new samples with class 0 greatly exceeds the number with class 1, so random undersampling is applied to remove samples from the larger class. The remaining new samples form the combined training set.

(2) Constructing the combined test set. First, the feature variable names of the original test set are given the suffix 2. If the original training set is class-imbalanced when the combined test set is constructed, the subsequent decoding step is affected; random undersampling is therefore applied to the original training set so that all categories contain an equal number of samples. The processed original training set and the original test set are then cross-reconstructed sample by sample, and the resulting new samples form the combined test set; the class labels of these new samples are supplied by the subsequent model predictions.
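A minimal sketch of steps (1) and (2) is given below, assuming the X_tr, y_tr, and X_te objects produced by the stratified split above and using pandas for the pairwise combination. The helper names (cross_reconstruct, undersample_equal, build_combined_sets) and the bookkeeping columns _cls1 and _idx2 are illustrative choices, not the paper's implementation.

from typing import Optional

import pandas as pd

def cross_reconstruct(left_X: pd.DataFrame, left_y: pd.Series,
                      right_X: pd.DataFrame,
                      right_y: Optional[pd.Series] = None) -> pd.DataFrame:
    """Pair every left sample with every right sample (cross reconstruction).

    Left features get the suffix '_1' and right features the suffix '_2'.
    Bookkeeping columns keep the left partner's class ('_cls1') and the
    right partner's original index ('_idx2') for later decoding.  When
    right_y is given (training phase) the binary label Y records whether
    the two rockburst classes agree; in the test phase Y is left out and
    is supplied later by the model's predictions.
    """
    left = left_X.add_suffix("_1").assign(_cls1=left_y.to_numpy())
    right = right_X.add_suffix("_2").assign(_idx2=right_X.index.to_numpy())
    if right_y is not None:
        right = right.assign(_cls2=right_y.to_numpy())
    pairs = left.merge(right, how="cross")   # all (left, right) combinations
    if right_y is not None:
        pairs["Y"] = (pairs["_cls1"] == pairs["_cls2"]).astype(int)
    return pairs

def undersample_equal(X: pd.DataFrame, y: pd.Series, seed: int = 42):
    """Randomly undersample so that every rockburst class has the same count."""
    df = X.assign(_cls=y.to_numpy())
    n = df["_cls"].value_counts().min()
    df = df.groupby("_cls", group_keys=False).sample(n=n, random_state=seed)
    return df.drop(columns="_cls"), df["_cls"]

def build_combined_sets(X_tr, y_tr, X_te, seed: int = 42):
    """Steps (1) and (2): build the combined training and combined test sets."""
    # (1) pair every training sample with every training sample, label Y by
    #     class agreement, then undersample the majority class Y = 0
    train_pairs = cross_reconstruct(X_tr, y_tr, X_tr, y_tr)
    pos = train_pairs[train_pairs["Y"] == 1]
    neg = train_pairs[train_pairs["Y"] == 0].sample(n=len(pos), random_state=seed)
    combined_train = pd.concat([pos, neg]).sample(frac=1, random_state=seed)

    # (2) balance the training classes, then pair each test sample with every
    #     balanced training sample; Y is unknown here and comes from the model
    X_bal, y_bal = undersample_equal(X_tr, y_tr, seed)
    combined_test = cross_reconstruct(X_bal, y_bal, X_te)
    return combined_train, combined_test, y_bal

# Example usage (continuing from the previous sketch):
# combined_train, combined_test, y_bal = build_combined_sets(X_tr, y_tr, X_te)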
(3) Decoding the predicted class labels of the combined test set. The samples of the combined test set and their corresponding prediction results are merged to form the query test set. For each sample in the original test set, all samples in the query test set in which it appears as the suffix-2 component are retrieved. Among those new samples whose predicted class is Y = 1, the class of the suffix-1 component in the original training set is tallied. The frequency of each class is computed with formulas (1) and (2), and the class with the highest frequency is selected as the rockburst classification result for that test sample. This process is repeated until prediction results have been obtained for all samples of the original test set.

$P_k = n_k / N, \quad k = 0, 1, 2, 3$ (1)

$\hat{y}_x = \arg\max_{k \in \{0, 1, 2, 3\}} P_k$ (2)

where $P_k$ is the probability of class $k$; $n_k$ is the number of new samples predicted as 1 whose suffix-1 component belongs to class $k$; $k$ is the class label; $Y$ is the predicted class; $N$ is the number of samples of each class in the original training set after undersampling; and $\hat{y}_x$ is the rockburst prediction class for sample $x$, i.e., the class corresponding to the maximum value among $P_0$ to $P_3$.
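The decoding step and formulas (1)-(2) can be sketched as follows, again assuming the rockburst classes are encoded as 0-3 and reusing the combined test set and bookkeeping columns from the previous sketch; decode_predictions is a hypothetical helper name, not the paper's code.

import numpy as np
import pandas as pd

def decode_predictions(combined_test: pd.DataFrame, pair_pred: np.ndarray,
                       n_per_class: int, classes=(0, 1, 2, 3)) -> pd.Series:
    """Step (3): turn binary pair predictions back into rockburst classes.

    combined_test carries the left partner's class ('_cls1') and the test
    sample's original index ('_idx2'); pair_pred holds the model's binary
    output for each pair (1 = predicted to be the same class).  n_per_class
    is N in formula (1), the per-class count of the undersampled training set.
    """
    q = combined_test.assign(_pred=pair_pred)          # the "query test set"
    decoded = {}
    for idx, block in q.groupby("_idx2"):
        # n_k: how often class k appears among left partners of Y = 1 pairs
        votes = block.loc[block["_pred"] == 1, "_cls1"].value_counts()
        # formula (1): P_k = n_k / N
        probs = {k: votes.get(k, 0) / n_per_class for k in classes}
        # formula (2): take the class with the highest frequency
        # (if no pair is predicted as a match, every P_k is 0 and class 0 wins)
        decoded[idx] = max(probs, key=probs.get)
    return pd.Series(decoded).sort_index()

# Example usage (continuing from the previous sketch; `model` is any binary
# classifier trained on the combined training set):
# feats = [c for c in combined_train.columns if not c.startswith("_") and c != "Y"]
# pair_pred = model.predict(combined_test[feats])
# y_hat = decode_predictions(combined_test, pair_pred,
#                            n_per_class=int(y_bal.value_counts().min()))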

Model assessment indicators
To ensure objectivity in the final comparison of prediction results, all models predict the categories of the original test set, which remains entirely independent throughout model training. Several evaluation metrics are selected for the comparative assessment, including Accuracy, Precision, Recall, and F1-score. Because the classes in the original test set are imbalanced, Precision, Recall, and F1-score are computed with the macro-average method to appraise the prediction results.
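With scikit-learn, the four metrics can be computed as in the following sketch; the macro average weights each rockburst class equally, and the dictionary keys mirror the ACC, PRE_M, REC_M, and F1_M notation used in the results below.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate(y_true, y_pred) -> dict:
    """Compute the four metrics used for model comparison (macro-averaged
    Precision, Recall, and F1 so each class contributes equally)."""
    return {
        "ACC":   accuracy_score(y_true, y_pred),
        "PRE_M": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "REC_M": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "F1_M":  f1_score(y_true, y_pred, average="macro", zero_division=0),
    }

# Example: evaluate(y_te.loc[y_hat.index], y_hat) compares the decoded
# predictions from the previous sketch with the held-out test labels.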

Results and discussion
The results of each evaluation metric are presented in Table 1. After applying the cross-reconstruction learning method, the predictive performance of the KNN model declines slightly. The reason is that, once the combined training set is constructed, the feature dimensionality is doubled, which increases the distance between samples in the high-dimensional space; this is unfavourable for the KNN model, which predicts test-set class labels by measuring distances between samples. In contrast, every evaluation metric of the ensemble models XGBoost and RF improves considerably, indicating that the cross-reconstruction learning method works better for models based on decision-tree ensembles and confirming that the data combination method can enhance model accuracy. Among the schemes, the data combination classification prediction scheme based on the XGBoost model performed best in all metric evaluations (ACC = 0.81, PRE_M = 0.86, REC_M = 0.75, F1_M = 0.78).
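For context, a comparison of the plain and CR schemes along the lines reported in Table 1 could be assembled as sketched below. This reuses the helpers from the previous sketches, uses default hyperparameters rather than the paper's settings, and assumes integer class codes 0-3; it is an illustrative outline, not the authors' experimental code.

import pandas as pd
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

def compare_schemes(X_tr, y_tr, X_te, y_te,
                    combined_train, combined_test, y_bal) -> pd.DataFrame:
    """Fit each base learner with and without cross reconstruction and
    collect the four evaluation metrics for every scheme."""
    base_models = {
        "KNN": KNeighborsClassifier(),
        "RF": RandomForestClassifier(random_state=42),
        "XGBoost": XGBClassifier(random_state=42),
    }
    feats = [c for c in combined_train.columns if not c.startswith("_") and c != "Y"]
    results = {}
    for name, model in base_models.items():
        # Plain scheme: train and predict in the original feature space.
        plain_pred = model.fit(X_tr, y_tr).predict(X_te)
        results[name] = evaluate(y_te, plain_pred)

        # CR scheme: train on pair data, predict pair labels, then decode.
        cr_model = clone(model)
        pair_pred = cr_model.fit(combined_train[feats],
                                 combined_train["Y"]).predict(combined_test[feats])
        y_hat = decode_predictions(combined_test, pair_pred,
                                   n_per_class=int(y_bal.value_counts().min()))
        results["CR-" + name] = evaluate(y_te.loc[y_hat.index], y_hat)
    return pd.DataFrame(results).T   # rows: schemes; columns: ACC/PRE_M/REC_M/F1_M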

Conclusions
This paper proposes a classification prediction method, Cross Reconstruction Learning, that effectively improves the prediction accuracy of models trained on small-sample datasets. Comparison and analysis of the prediction results of the various model schemes show that, with the Cross Reconstruction Learning method, the performance of the KNN model decreases because of the increased feature dimensionality, whereas the ensemble models XGBoost and RF show significant improvements in all prediction assessment results, confirming the stability and effectiveness of the proposed method. Furthermore, a multi-metric comparison shows that the hybrid CR-XGBoost model achieves the best prediction results (ACC = 0.81, PRE_M = 0.86, REC_M = 0.75, F1_M = 0.78), providing valuable guidance for practical rockburst prediction in engineering projects.

Figure 1. Comparative distribution of each feature variable in the training and test sets.


Figure 2. Schematic diagram of the principle of cross-reconstruction learning.

Table 1. Performance evaluation of models.