Surface and Underwater Acoustic Source Recognition Using Array Feature Extraction Based on Machine Learning

Sound source recognition is an important application of passive sonar, and distinguishing between surface and underwater sound sources has long been a difficult problem. To address surface/underwater (S/U) source identification, this paper proposes a novel machine learning method based on array feature extraction. First, simulation data are generated with KRAKEN according to the experimental environment of SACLANT 1993. Second, array features are extracted from both the simulation data and the experimental data. Third, the precision, recall, F1 score and accuracy of a GBDT classifier are evaluated in three different frequency bands. The results show that a model trained on the extracted array features effectively overcomes the poor accuracy of some channels in single-channel classification and achieves good experimental accuracy. The experimental accuracies in the three frequency bands are 0.9920, 0.9857 and 0.9713, respectively.


Introduction
With the rapid development of underwater operation technology, underwater target detection has attracted attention from many countries. As an important part of this field, distinguishing between surface and underwater acoustic sources still faces many challenges.
There are many methods for surface and underwater target identification, but they have not yet reached the expected performance. The mode-trapping-based approach is one of the most classic of these [1]. It identifies the overall difference in the shape of the measured modal spectra between the surface and underwater source classes, a difference caused by mode trapping.
Several studies [1~3] have addressed this binary classification problem. Premus et al. [2] proposed a matched subspace method in 2007, which is suitable for depth discrimination in shallow-water waveguides. In 2013, Premus et al. [2] used horizontal line arrays (HLA) to discriminate sources at depths of 9 m and 60 m. Yang [3] proposed a data-based matched-mode source localization method in 2014.
In recent years, many fields have adopted machine learning to solve related problems [4,5], and machine learning methods can be considerably more efficient than traditional methods. In machine learning, surface/underwater discrimination is a binary (0/1) classification problem: a label of 0 or 1 is assigned based on the features of the input. In 2022, Wen Zhang et al. [6] distinguished S/U sound sources using only one hydrophone, based on three supervised ML models: k-nearest neighbor (kNN), random subspace kNN (RS-kNN), and ResNet-18.
This paper combines array signal processing with machine learning, which is the main novelty of this work. Array signal processing is a common approach to traditional underwater acoustic problems, and introducing it has improved the detection performance of many models to some extent.
Because ocean observation data are limited, this paper uses KRAKEN [7] to generate the training data, while the test data are measured sea-trial data. A classical machine learning method, the Gradient Boosting Decision Tree (GBDT) [8], is adopted.
This paper consists of four parts. Section 2 introduces the ML classifier (GBDT) and the VLA signal processing, and describes how a simulation is set up from the experimental environment to generate simulation data. In Section 3, the GBDT model is trained on the simulation data, and the trained model is then used to analyze the experimental data in different frequency bands. Finally, Section 4 gives the summary and discussion.

GBDT
GBDT stands for Gradient Boosting Decision Tree. For binary classification, GBDT uses a sequence of gradient-boosted trees to fit the log-odds of the positive class, and the classification model can be expressed as

$$P(y=1 \mid x) = \frac{1}{1+\exp\left(-F_M(x)\right)}$$

where $x$ is the input, $y=1$ denotes the positive class, $F_M(x)$ is the final strong learner, and $P(y=1 \mid x)$ is the probability that $y=1$ given the input sample $x$.
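To make the probability model above concrete, the following is a minimal from-scratch sketch of GBDT binary classification. It assumes depth-1 regression trees (stumps) rather than the deeper trees used in this paper, and Newton-step leaf values for the log loss. The names `sigmoid`, `fit_stump` and `fit_gbdt` are hypothetical and do not come from any library; a real model would normally use an existing GBDT implementation.

```python
import numpy as np

def sigmoid(score):
    """Map the additive log-odds score F_M(x) to P(y = 1 | x)."""
    return 1.0 / (1.0 + np.exp(-score))

def fit_stump(X, r, h):
    """Fit a depth-1 regression tree (stump) to pseudo-residuals r.

    The split minimizes the squared error of r; each leaf value is a
    Newton step sum(r) / sum(h) for the log loss, where h = p(1 - p).
    """
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # skip the max so the right leaf is non-empty
            left = X[:, j] <= t
            right = ~left
            sse = (((r[left] - r[left].mean()) ** 2).sum()
                   + ((r[right] - r[right].mean()) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, t,
                        r[left].sum() / h[left].sum(),
                        r[right].sum() / h[right].sum())
    _, j, t, v_left, v_right = best
    return lambda Z: np.where(Z[:, j] <= t, v_left, v_right)

def fit_gbdt(X, y, n_trees=50, learning_rate=0.1):
    """Hypothetical minimal GBDT for labels y in {0, 1}; returns a predict_proba function."""
    p0 = y.mean()
    f0 = np.log(p0 / (1.0 - p0))  # initial score: log-odds of the base rate
    score = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        p = sigmoid(score)
        r = y - p              # negative gradient of the log loss w.r.t. the score
        h = p * (1.0 - p)      # second derivative of the log loss
        tree = fit_stump(X, r, h)
        score = score + learning_rate * tree(X)
        trees.append(tree)

    def predict_proba(Z):
        s = np.full(len(Z), f0)
        for tree in trees:
            s = s + learning_rate * tree(Z)
        return sigmoid(s)

    return predict_proba
```

Each round fits a tree to the pseudo-residuals y - P(y=1|x), adds a copy scaled by the learning rate to the log-odds score F_M(x), and the sigmoid maps the accumulated score back to a probability.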

Array Feature Extraction
In array feature extraction, the covariance matrix plays a central role. Each element of the covariance matrix is the covariance between a pair of elements of a random vector, so the covariance matrix is the natural generalization of the variance of a scalar random variable to a high-dimensional random vector.
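Let $X = (X_1, X_2, \ldots, X_n)^T$ be an n-dimensional random signal. The covariance matrix $C$ of $X$ is

$$C = (c_{ij})_{n \times n} = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{pmatrix}$$

where $c_{ij} = \operatorname{cov}(X_i, X_j) = E\left[(X_i - E[X_i])(X_j - E[X_j])\right]$ is the covariance of the two random signals $X_i$ and $X_j$.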
The covariance matrix represents the pairwise linear correlation among a group of random signals [9]. In this paper, the covariances between different channels are calculated and used as features.

Data preprocessing
The experimental information of SACLANT 1993
The SACLANT Centre carried out an experiment in the shallow water area north of Elba Island on October 26 and 27, 1993 [10,11]. The sound speed profile and geometry of the experiment are shown in Figure 1, and the environment information is shown in Table 1.
On October 26, the transmitted signal was pseudorandom noise (PRN) in the 300~350 Hz band. On October 27, a mobile underwater source at a depth of approximately 69 m was deployed from a moving ship. The underwater source emitted an acoustic signal with a center frequency of about 170 Hz for 30 seconds, then paused for 30 seconds, and this cycle was repeated 10 times. The surface ship was under way throughout.
In general, the draft of a vessel in shallow water does not exceed 20 m. Therefore, following [12,13], this paper uses 30 m as the critical depth and divides all sound sources into two categories: surface and underwater. The source depth is sampled at 1 m intervals over 1~90 m, giving 90 discrete points. The total number of samples is therefore 2790 (=31 × 90), comprising 930 (=31 × 30) surface samples and 1860 (=31 × 60) underwater samples. The 930 samples from surface sources are labeled 0, and the 1860 samples from underwater sources are labeled 1. The spectra are sampled at 0.5 Hz intervals. As mentioned above, there are three frequency bands: 20~72 Hz, 150~210 Hz and 300~350 Hz. Accordingly, the numbers of features (frequency points) in the three bands are 105, 121 and 101, respectively.
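The sample and feature counts above follow from simple bookkeeping, sketched below in plain Python. The helper names are hypothetical. Treating a source at exactly 30 m as surface is inferred from 930 = 31 × 30, and the meaning of the factor 31 is not stated in this section, so it is kept as a bare constant.

```python
CRITICAL_DEPTH_M = 30      # depth <= 30 m is labeled surface (inferred from the counts)
N_REPEAT = 31              # the factor 31 from the text; what it indexes is not stated here
DEPTHS_M = range(1, 91)    # 1 m step over 1~90 m -> 90 depths
DF_HZ = 0.5                # spectral resolution

def depth_label(depth_m):
    """0 = surface, 1 = underwater (hypothetical helper)."""
    return 0 if depth_m <= CRITICAL_DEPTH_M else 1

def n_freq_points(f_lo, f_hi, df=DF_HZ):
    """Number of frequency bins in [f_lo, f_hi], endpoints included, at spacing df."""
    return int(round((f_hi - f_lo) / df)) + 1

labels = [depth_label(d) for d in DEPTHS_M]
n_surface = N_REPEAT * labels.count(0)
n_underwater = N_REPEAT * labels.count(1)
n_features = {band: n_freq_points(*band) for band in [(20, 72), (150, 210), (300, 350)]}

print(n_surface, n_underwater, n_surface + n_underwater)  # 930 1860 2790
print(n_features)
```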
The simulation data are spectra in dB. Finally, to reduce the adverse effect of singular (outlier) samples, the simulated data are normalized row by row to [-1,1].
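The text states only that each sample is normalized row by row to [-1,1]. A minimal sketch, assuming min-max scaling per row (the exact scaling used by the authors is not given), is:

```python
import numpy as np

def normalize_rows(X):
    """Min-max scale each row (sample) of X to [-1, 1].

    Rows with zero range are mapped to -1 to avoid division by zero.
    """
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)
    return 2.0 * (X - lo) / span - 1.0
```

Because each row is scaled by its own minimum and maximum, differences in absolute level between samples are removed and only the spectral shape within the band is kept.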

The experimental data
In the experiment on October 26, the underwater data contain 301056 sampling points at a sampling rate of 1000 Hz, i.e., 301.056 seconds [13]. The experimental data are segmented into samples of 2000 points with an overlap of 1800 points (a step of 200 points). For the underwater target, the experimental data thus form a 1496 × 2000 matrix. A Fourier transform is applied to each row, and only the frequency points within 300~350 Hz are retained, giving a final sample matrix of 1496 × 101. All data are normalized row by row to [-1,1], and all samples are labeled 1. In the experiment on October 27, the recorded data (containing both the surface and the underwater source) contain 602056 sampling points at a sampling rate of 1000 Hz, i.e., 602.056 seconds [13]. After the same segmentation as above, the experimental data form a 3001 × 2000 matrix. For the surface target, a Fourier transform is applied to each row and only the frequency points within 20~72 Hz are retained, giving a final sample matrix of 3001 × 105; all data are normalized row by row to [-1,1], and all samples are labeled 0. For the underwater target, the experimental data likewise form a 3001 × 2000 matrix; a Fourier transform is applied to each row and only the frequency points within 150~210 Hz are retained, giving a final sample matrix of 3001 × 121. All data are normalized row by row to [-1,1], and all samples are labeled 1.
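The segmentation and band selection described above can be sketched with NumPy. The window length (2000 points), overlap (1800 points, i.e., a step of 200) and sampling rate (1000 Hz) come from the text; expressing the FFT magnitude in dB is an assumption made here to match the dB scale of the simulated spectra, and the helper names are hypothetical.

```python
import numpy as np

def n_segments(n_points, win=2000, overlap=1800):
    """Number of full windows of length win with the given overlap."""
    step = win - overlap
    return (n_points - win) // step + 1

def segment(x, win=2000, overlap=1800):
    """Split a 1-D signal into overlapping rows of length win (a trailing partial window is dropped)."""
    x = np.asarray(x, dtype=float)
    step = win - overlap
    starts = step * np.arange(n_segments(len(x), win, overlap))
    return x[starts[:, None] + np.arange(win)[None, :]]

def band_spectrum_db(segments, fs=1000.0, f_lo=300.0, f_hi=350.0):
    """Row-wise FFT magnitude in dB, keeping only bins in [f_lo, f_hi]."""
    spec = np.fft.rfft(segments, axis=1)
    freqs = np.fft.rfftfreq(segments.shape[1], d=1.0 / fs)
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    return 20.0 * np.log10(np.abs(spec[:, keep]) + 1e-12)  # dB scale is an assumption
```

With a 2000-point window at 1000 Hz the bin spacing is 0.5 Hz, so 300~350 Hz yields 101 bins, and the window-count formula reproduces the 1496 and 3001 samples quoted above.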

Feature extraction
Next, array features are extracted by computing the covariance matrix between different channels. Using all 48 channels gives 1128 features per sample, where 1128 = 48 × 47 / 2 equals the number of distinct channel pairs, so the training set of covariance features is 2790 × 1128. The experimental covariance feature set is 1496 × 1128 for October 26 and 3001 × 1128 for October 27.
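The feature construction above can be sketched as follows, under stated assumptions: each sample is taken to be a 48 × n_freq array (one normalized spectrum per channel), the covariance is computed across frequency points with `numpy.cov`, and only the strictly upper-triangular entries (distinct channel pairs, excluding the per-channel variances on the diagonal) are kept. The last choice is inferred from 1128 = 48 × 47 / 2 and is not stated in the text; the helper names are hypothetical.

```python
import numpy as np

N_CHANNELS = 48

def covariance_features(sample):
    """Covariance features of one sample.

    sample: array of shape (n_channels, n_freq), one spectrum per channel.
    Returns the covariances of all distinct channel pairs (strict upper triangle).
    """
    C = np.cov(sample)  # rows = channels (variables), columns = frequency points
    rows, cols = np.triu_indices(C.shape[0], k=1)
    return C[rows, cols]

def build_feature_matrix(samples):
    """Stack per-sample features into an (n_samples, n_pairs) matrix."""
    return np.stack([covariance_features(s) for s in samples])

print(N_CHANNELS * (N_CHANNELS - 1) // 2)  # 1128 features per sample
```

Since the covariance matrix is symmetric, the lower triangle repeats the upper one, so keeping one triangle loses no pairwise information.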

Results and Analysis
In this section, the GBDT model is used to classify vertical line array (VLA) data in different frequency bands. Section 3.1 shows the results of single-channel classification of surface and underwater targets, and Sections 3.2~3.4 show the results of multi-channel combined classification.

The results of single channel
The verification results (precision, recall, F1) of single-channel classification on the simulation data in the 20~72 Hz band are shown in Figure 3a, and the single-channel simulation and experimental accuracies are shown in Figure 3b [13]. The corresponding verification results for the 150~210 Hz band are shown in Figure 4a, and the single-channel simulation and experimental accuracies in Figure 4b [13].
As Figures 3 and 4 show, although some channels achieve high accuracy, a few channels still have poor accuracy, in some cases close to 0.

The results of 20~72 Hz
In this section, the hyperparameters of the GBDT model are as follows: the learning rate (step size) is 0.1, the maximum number of iterations is 190, the maximum tree depth is 16, the minimum number of samples in a leaf node is 6, the minimum number of samples required to split an internal node is 19, and the maximum number of features is the total number of features divided by 2. The same parameters are used in Sections 3.3 and 3.4. In the 20~72 Hz band, the precision, recall, F1 score and accuracy on the simulation covariance features are 0.9839, 0.9946, 0.9892 and 0.9928, respectively, and the accuracy on the experimental covariance features is 0.9920, as shown in Table 2. Table 2 shows that the trained model achieves high precision and recall, and the experimental accuracy is also high. The confusion matrix is shown in Figure 5a and the ROC curve in Figure 5b, where labels 0 and 1 denote surface and underwater sources, respectively. In the confusion matrix, 99.5% of surface sources are correctly classified as surface sources and 0.5% are misclassified as underwater sources, while 98.4% of underwater sources are correctly classified as underwater sources and 1.6% are misclassified as surface sources. The AUC of the model is 0.989, which indicates good classification performance.
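As a consistency check on Table 2, the sketch below computes the standard metrics from confusion-matrix counts. It treats underwater (label 1) as the positive class, which is an assumption since the text does not name the positive class, and the helper names are hypothetical. Because F1 is the harmonic mean of precision and recall, it can also be recomputed directly from the reported values.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from binary confusion-matrix counts.

    Assumes label 1 (underwater) is the positive class.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = f1_from_pr(precision, recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# Reported (precision, recall) pairs from Table 2 for the three bands.
for p, r in [(0.9839, 0.9946), (0.9847, 1.0), (1.0, 0.9895)]:
    print(round(f1_from_pr(p, r), 4))  # prints 0.9892, 0.9923, 0.9947 (one per line)
```

Recomputing F1 from the reported precision and recall reproduces the reported F1 scores of all three bands to four decimal places.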

The results of 150~210Hz
In the 150~210 Hz band, the precision, recall, F1 score and accuracy on the simulation covariance features are 0.9847, 1, 0.9923 and 0.9946, respectively, and the accuracy on the experimental covariance features is 0.9857, as shown in Table 2. The confusion matrix is shown in Figure 6a and the ROC curve in Figure 6b. All surface sources (100%) are correctly classified as surface sources, with none (0%) misclassified, while 98.4% of underwater sources are correctly classified as underwater sources and 1.6% are misclassified. The AUC of the model is 0.992.

The results of 300~350Hz
In the 300~350 Hz band, the precision, recall, F1 score and accuracy on the simulation covariance features are 1, 0.9895, 0.9947 and 0.9964, respectively, and the accuracy on the experimental covariance features is 0.9713, as shown in Table 2. The confusion matrix is shown in Figure 7a and the ROC curve in Figure 7b. Of the surface sources, 98.9% are correctly classified as surface sources and 1.1% are misclassified, while all underwater sources (100%) are correctly classified as underwater sources, with none (0%) misclassified. The AUC of the model is 0.995.
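The AUC values reported in Sections 3.2~3.4 summarize each ROC curve in a single number. As a reminder of what that number means, the following sketch computes AUC from scratch as the probability that a randomly chosen positive sample (label 1, underwater) receives a higher score than a randomly chosen negative sample (label 0, surface), counting ties as one half. The function name is hypothetical and the toy scores are illustrative, not data from this paper.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison (Mann-Whitney form); labels are 0/1, 1 = positive."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))

print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; for large test sets a rank-based formulation gives the same value more efficiently.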

Analysis
Section 3.1 shows the single-channel classification results, in which some channels have poor accuracy. Sections 3.2 to 3.4 show the results of combining the channels in three different frequency bands, as summarized in Table 2. The experimental accuracies in the three bands are 0.9920, 0.9857 and 0.9713, respectively. Compared with the single-channel results in Section 3.1, the array feature extraction method overcomes the poor accuracy of some channels and achieves stable classification results. The model trained with array features obtains better simulation and experimental accuracy, which indicates that the method is effective in improving classification accuracy.

Conclusions
A machine learning method for S/U sound source recognition based on array feature extraction is proposed. The array feature extraction method converts the original frequency-band features into covariance features between different channels, making effective use of the multi-channel data, and good experimental results are obtained. It overcomes the poor accuracy of some channels in the single-channel results and yields stable classification performance. The experimental accuracies in the three frequency bands are 0.9920, 0.9857 and 0.9713, respectively. In future work, the influence of noise on classification will be studied in order to further improve the experimental accuracy.

Figure 3. (a) The verification results (precision, recall, and F1) of the simulation data in the 20~72 Hz band. (b) Simulation accuracy and experimental accuracy for all 48 single channels.

Figure 4. (a) The verification results (precision, recall, and F1) of the simulation data in the 150~210 Hz band. (b) Simulation accuracy and experimental accuracy for all 48 single channels.

Figure 5. (a) The confusion matrix of the model in the 20~72 Hz band. (b) The ROC curve of the model.

Figure 6. (a) The confusion matrix of the model in the 150~210 Hz band. (b) The ROC curve of the model.

Figure 7. (a) The confusion matrix of the model in the 300~350 Hz band. (b) The ROC curve of the model.

Table 1. The environment information of SACLANT 1993.

Table 2. The results (precision, recall, F1 score and accuracy) of the final model on simulation and experimental data in different frequency bands.