Research on Feature Extraction of Underwater Acoustic Target Radiation Noise Based on Machine Learning Algorithm

Underwater acoustic target recognition is an important technology in the field of underwater acoustics, with great economic and military value. Feature extraction from underwater acoustic target radiated noise signals is the key to achieving acoustic target recognition. This study addresses the feature extraction task for acoustic targets and extracts 10 types of features, forming a 252-dimensional feature vector, from three domains: the time domain, the frequency domain, and the auditory domain. Classification and recognition experiments with 7 machine learning algorithms show that the recognition performance of ensemble classifiers is much better than that of single classifiers. For the different types of features, this study combines three ensemble learning algorithms with feature selection algorithms to select from the original 252-dimensional features. The feature selection experiments show that the wrapper feature selection algorithm is the most effective: the feature vector can be reduced to 40 dimensions while the recognition accuracy remains no less than 92.8%. These results provide feature extraction guidance for underwater acoustic target recognition.


Introduction
Underwater acoustic target recognition is a pattern recognition technique that uses a target's radiated noise characteristics to identify its attributes, and it has great economic and military value. To effectively counter the aggression of hostile forces, safeguard our country's maritime development rights and interests, and firmly defend national sovereignty, security, and development interests, we must vigorously develop naval equipment and technology and enhance the survival and combat capabilities of our navy's ships in confrontations with hostile military forces. Underwater acoustic target recognition technology has therefore become a research focus for scientific and technological workers. At present, there are two main technical approaches in the field of underwater acoustic target recognition: one is to calculate the physical parameters of the target from its acoustic characteristics and match them against a database of target physical parameters; the other is to match the target's acoustic characteristics directly against an acoustic characteristic database. Limited by the complex underwater acoustic channel and the low signal-to-noise ratio of the target acoustic signal, the first method performs poorly and cannot effectively identify individual targets. Pattern recognition based on signal processing and acoustic feature extraction has therefore become the mainstream method of underwater acoustic target recognition. The purpose of signal processing and acoustic feature extraction is to mine, from the target radiated noise signal, useful information that can distinguish different targets. The purpose of pattern recognition is to establish a mapping model between the target acoustic characteristics and the target category attributes in the database, and to design a suitable classifier that realizes underwater acoustic target type recognition.
Researchers have conducted extensive studies on extracting feature information that helps distinguish target attributes from underwater acoustic target radiated noise. In [1], a 9-dimensional feature vector is constructed, containing zero-crossing wavelength, peak-to-peak amplitude, zero-crossing wavelength difference, and wave train area; it achieves a recognition rate of 89.5% on the test data with a support vector machine (SVM) using a radial basis function (RBF) kernel. In [2], the line spectrum and average power spectrum features of underwater acoustic target radiated noise are extracted using the Fourier transform. In [3], a high-order cumulant feature extraction method based on the Hilbert-Huang transform is proposed, which first applies the Hilbert-Huang transform to the target signal and then extracts high-order cumulant features from the resulting intrinsic mode functions. Kang et al. [4] proposed a sparse representation classification method for underwater acoustic targets based on passive sonar signals. They first extracted the power spectrum features of the target radiated signals, then constructed an over-complete dictionary from the power spectrum features of samples with known target types, realized a sparse representation of the power spectrum features of unknown targets, and performed target classification on this basis. Their experiments show that, given power spectrum features, classification based on sparse decomposition outperforms traditional classifiers such as nearest neighbor and support vector machines. In [5], t-SNE was verified to have a good dimensionality reduction effect on extracted spectral features, with the best results when the frequency range of target radiated noise is within 10-150 Hz. In [6], a restricted Boltzmann machine (RBM) is used to automatically encode and decode the power spectrum and demodulation spectrum of ship radiated noise data, realizing data augmentation of ship radiated noise data, designing automatic encoding feature extraction based on the power spectrum and demodulation spectrum, and using a BP neural network for target classification. The experiments show that features fusing the power spectrum and modulation spectrum achieve better recognition accuracy than the power spectrum alone, and that the model with data augmentation outperforms the model without it. In [7], a Gammatone filter is used for underwater acoustic target feature extraction, and 19 machine learning classifiers are used to classify 16 underwater acoustic targets. The experiments show that Gammatone frequency cepstral coefficients (GFCC) have better classification performance than MFCC, AR, and zero-crossing features.
Building on existing research, this paper extracts features of underwater acoustic target radiated noise signals from three domains: the time domain, the frequency domain, and the auditory domain. It also studies the importance of the different types of signal features using currently popular machine learning algorithms and provides feature extraction guidance for underwater acoustic target classification and identification.

Introduction to underwater acoustic target recognition dataset
The dataset used in this paper is a hybrid underwater acoustic target recognition dataset composed of the ShipsEar dataset and simulated submarine data. The ShipsEar dataset records the radiated noise of different types of vessels, including fishing boats, ro-ro ships, ocean liners, pilot boats, motorboats, and others, comprising 11 categories of targets plus one category of underwater background noise. The data were sampled with a 24-bit DigitalHyd SR-1 recorder manufactured by MarSensing Lda (Faro, Portugal) at a sampling frequency of 52,734 Hz. The submarine dataset is a simulated dataset with a sampling frequency of 2,500 Hz. Figure 1 shows the duration statistics of the ShipsEar dataset, and Figure 2 shows the time-domain waveforms of sample signals, with the horizontal axis representing time and the vertical axis representing normalized amplitude.

Feature extraction techniques of underwater acoustic target
Feature extraction is the process of extracting feature parameters that can characterize the target information from the target radiation noise signal while eliminating, as much as possible, information irrelevant to identification, ensuring that the extracted feature parameters satisfy "intra-class invariance" and "inter-class discrimination": the feature parameters remain invariant within the same category of targets and discriminate between different categories of targets. Classifier design uses the extracted feature information to establish the mapping between the target category and the feature vector, completing the final recognition task. This paper performs feature extraction on the target radiation noise in the time domain, frequency domain, and auditory domain.

Waveform feature of time domain.
The time-domain features of target radiated noise are among the earliest features used to identify target attributes. These features capture the waveform information of target radiated noise from different aspects. Common time-domain features include the zero-crossing rate, peak-to-peak amplitude distribution, wavelength difference distribution, and wave train area. Because of interference from complex ocean ambient noise, the recognition performance of time-domain features alone is unsatisfactory. This paper extracts the 11-dimensional time-domain feature parameters proposed in [8] as the time-domain features for underwater acoustic target classification and recognition. The calculation of the 11-dimensional feature parameters is shown in Table 2, where N denotes the number of samples of the target signal.

Spectrum feature of frequency domain.
The sources of acoustic target radiated noise mainly consist of three aspects: the first is the mechanical noise produced by the vibration of the target's internal mechanical structure, which mainly comprises a low-frequency line spectrum and a broadband spectrum; the second is the cavitation noise and rotational noise caused by the propeller spinning in water, which mainly comprises a high-frequency broadband spectrum and a line spectrum; the third is the hydrodynamic noise induced by water flow passing over the target, which mainly involves a broadband spectrum. This paper extracts 10-dimensional continuous spectrum features, 50-dimensional LOFAR spectrum features [9], and 6-dimensional DEMON spectrum features [9] from the underwater acoustic target radiated noise dataset. Figure 3 shows the LOFAR spectrum diagrams of six target signals.
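The paper does not give its LOFAR and DEMON implementations, but both can be sketched with standard signal-processing tools: a LOFAR display is essentially a short-time power spectrum in dB, and a DEMON spectrum is the power spectrum of the signal envelope, which reveals propeller modulation lines. The FFT sizes, function names, and the synthetic amplitude-modulated noise signal below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.signal import hilbert, stft, welch

def lofar(x, sr, nperseg=1024):
    """LOFAR-style spectrogram: short-time power spectra in dB,
    showing line-spectrum structure over time."""
    f, t, Z = stft(x, fs=sr, nperseg=nperseg)
    return f, t, 10 * np.log10(np.abs(Z) ** 2 + 1e-12)

def demon(x, sr, nperseg=16384):
    """DEMON-style spectrum: Welch power spectrum of the Hilbert
    envelope, exposing low-frequency amplitude modulation."""
    envelope = np.abs(hilbert(x))
    f, p = welch(envelope, fs=sr, nperseg=nperseg)
    return f, p

sr = 52734  # ShipsEar sampling rate
t = np.arange(0, 1.0, 1.0 / sr)
rng = np.random.default_rng(0)
# Synthetic cavitation-like signal: broadband noise modulated at 10 Hz
x = (1 + 0.5 * np.sin(2 * np.pi * 10 * t)) * rng.standard_normal(len(t))
f_lofar, t_lofar, S = lofar(x, sr)
f_demon, p_demon = demon(x, sr)
```

On such a signal the DEMON spectrum shows a clear line near the 10 Hz modulation frequency, which is exactly the shaft/blade-rate information the DEMON features summarize.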

Auditory spectrum feature.
The human auditory system can identify sounds of interest in highly complex environments. Researchers have investigated the human auditory system in depth and developed a series of feature extraction methods based on auditory attributes, mainly including Mel frequency cepstral coefficients (MFCC) [10], Gammatone filter cepstral coefficients (GFCC) [11], linear prediction cepstral coefficients (LPCC) [12], perceptual linear prediction (PLP) [13], and timbre and loudness features (Zwicker loudness). This paper experiments with the above auditory-domain features on the underwater target radiated noise dataset.
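As one example of the auditory-domain features, the MFCCs of a single frame follow the chain power spectrum, mel filterbank, logarithm, DCT. The filter count, FFT size, and helper names below are illustrative choices, not the paper's configuration.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):          # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):          # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(frame, sr, n_mfcc=13, n_filters=26):
    """MFCCs of one windowed frame: power spectrum -> mel filterbank
    -> log -> DCT, keeping the first n_mfcc coefficients."""
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame)) ** 2 / n_fft
    log_energy = np.log(mel_filterbank(n_filters, n_fft, sr) @ spec + 1e-10)
    return dct(log_energy, type=2, norm='ortho')[:n_mfcc]

sr = 52734  # ShipsEar sampling rate
frame = np.hanning(1024) * np.random.default_rng(0).standard_normal(1024)
coeffs = mfcc(frame, sr)
```

GFCC follows the same pipeline with a Gammatone filterbank in place of the mel filterbank.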

Data preprocessing
To identify underwater acoustic targets, directly using the acoustic signal data after sonar beamforming for feature extraction and classification may be affected by environmental noise and equipment variations, which can degrade the performance of the classifier.Therefore, preprocessing of the target radiated noise data is necessary.

Pre-emphasis.
As shown in Figure 2, the radiated noise of different targets has abundant low-frequency and high-frequency components. During actual propagation of the radiated noise, the ocean channel attenuates the high-frequency part much more strongly. To extract reliable signal characteristics, pre-emphasis can be used to boost the high-frequency part of the signal. Let x(n) denote the sampled radiated noise signal; Equation (1) is applied to perform pre-emphasis:

y(n) = x(n) - αx(n-1),  (1)

where α is the pre-emphasis coefficient, commonly taken as 0.97 in prior studies, and the filter is applied to the raw acoustic target signal. Figures 4 and 5 compare the signal before and after pre-emphasis in the time and frequency domains.

After pre-emphasis, the signal is segmented into fixed-length segments, each of which is called a frame. The framing process slides a fixed-length window function over the original signal and performs a multiplication operation. As the window slides, successive frames overlap with each other to preserve the integrity of the original signal. Figure 6 shows a schematic diagram of signal framing.
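The pre-emphasis filter y(n) = x(n) - αx(n-1) is straightforward to implement; the default α = 0.97 below is the value commonly used in the literature and is an assumption here, since the paper elides the exact coefficient.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """y(n) = x(n) - alpha * x(n-1); boosts the high-frequency
    content attenuated by the ocean channel."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]                      # first sample has no predecessor
    y[1:] = x[1:] - alpha * x[:-1]
    return y

# A constant (purely low-frequency) signal is almost cancelled
x = np.array([1.0, 1.0, 1.0, 1.0])
y = pre_emphasis(x)
```

On the constant input above, every output sample after the first equals 1 - 0.97 = 0.03, illustrating how the filter suppresses low frequencies relative to high ones.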
Frame splitting inevitably applies a window function to each frame of the signal, which is equivalent to multiplying the window function and the target signal in the time domain and convolving their spectra in the frequency domain. This causes spectral leakage in the target signal. To mitigate spectral leakage, this paper uses the Hanning window for all windowing in the preprocessing stage; each frame is 3 s long, with 50% overlap between frames.
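A minimal sketch of the framing-and-windowing step, using the parameters stated above (Hanning window, 3 s frames, 50% overlap) and ShipsEar's 52,734 Hz sampling rate; the random test signal and the function name are illustrative.

```python
import numpy as np

def frame_signal(x, frame_len, hop_len):
    """Split x into overlapping frames and apply a Hanning window
    to each frame to reduce spectral leakage."""
    n_frames = 1 + (len(x) - frame_len) // hop_len
    win = np.hanning(frame_len)
    return np.stack([
        x[i * hop_len : i * hop_len + frame_len] * win
        for i in range(n_frames)
    ])

sr = 52734                # ShipsEar sampling rate
frame_len = 3 * sr        # 3 s frames, as in the paper
hop_len = frame_len // 2  # 50% overlap
x = np.random.default_rng(0).standard_normal(10 * sr)  # 10 s stand-in signal
frames = frame_signal(x, frame_len, hop_len)
```

A 10 s signal yields 5 overlapping 3 s frames; the Hanning window tapers each frame to zero at its edges.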

Signal amplitude normalization.
The amplitude of target radiated noise measures the loudness of the sound source. However, the sound signal attenuates with distance during propagation, so the amplitude information is strongly affected by the distance between the target and the hydrophone. Moreover, with the development of noise reduction technology, the distance information reflected by the amplitude becomes even more ambiguous. To reduce the adverse effects of the received signal having different amplitudes under different conditions, this paper applies amplitude normalization to each frame of the target radiated noise signal after framing, using Equations (2) and (3):

x̄ = (1/N) Σ x(n),  (2)

x_norm(n) = (x(n) - x̄) / max|x(n) - x̄|,  (3)

which remove the frame mean and scale each frame to unit peak amplitude.

For each frame of the signal, 252-dimensional classification features of three categories (time domain, frequency domain, and auditory domain) are extracted and concatenated into a 17,853×252 feature matrix, which is input to the classifier for classification and recognition. The training and test sets are divided in a 3:1 ratio, giving 13,389 training samples and 4,464 test samples. To avoid the adverse effects of the differing scales of the features on model training and recognition, the fused feature vectors are standardized. The experimental environment is based on the scikit-learn library under Python 3.7, and the hardware platform uses an AMD Ryzen 7 5700G CPU and an NVIDIA GeForce RTX 3060 Ti GPU.
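The preprocessing and dataset-assembly steps above can be sketched as follows. The per-frame normalization shown (mean removal plus peak scaling) is one common form consistent with the description and is an assumption; the feature matrix is random stand-in data with the paper's dimensions, and the 3:1 split reproduces the stated 13,389/4,464 sample counts.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def normalize_frame(frame):
    """Remove the frame mean, then scale to unit peak amplitude
    (one common form of per-frame amplitude normalization)."""
    centered = frame - frame.mean()
    peak = np.max(np.abs(centered))
    return centered / peak if peak > 0 else centered

# Hypothetical fused feature matrix: 17,853 frames x 252 features
rng = np.random.default_rng(0)
X = rng.standard_normal((17853, 252))
y_labels = rng.integers(0, 6, size=17853)  # 5 targets + 1 noise class

# 3:1 train/test split, then standardize using training statistics only
X_train, X_test, y_train, y_test = train_test_split(
    X, y_labels, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)
```

Fitting the scaler on the training split only and reusing it on the test split avoids leaking test statistics into training.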

Classification recognition experiment and result analysis
Based on the feature extraction experiments, this paper performed classification and recognition experiments using the basic classifiers KNN, decision tree, and support vector machine, and the ensemble learning classifiers AdaBoost, random forest, XGBoost, and GBDT. For each algorithm, the model hyperparameters were chosen by grid search. Table 4 shows the hyperparameter settings and recognition accuracy of the three basic classifier models. As can be seen from Table 4, among the three machine learning base learners, the decision tree model has the shortest training time, but its recognition accuracy is lower than that of the K-nearest neighbor algorithm and the support vector machine. The KNN algorithm has the best overall performance in terms of training time and recognition accuracy. To further study the performance of the three models on the underwater acoustic target dataset, Figure 8 shows the confusion matrices of their recognition results on the test set. As shown in Figure 8, all three machine learning algorithms suffer from severe confusion in recognizing Class A and Class B targets. The KNN and SVM algorithms misclassify Class A and Class B targets as Class C and Class D targets with a probability of nearly 50%, while the decision tree algorithm classifies 66% of Class B targets as Class C targets.
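Hyperparameter selection by grid search, as described above, looks roughly like this in scikit-learn (the paper's stated environment); the synthetic dataset and the KNN parameter grid are illustrative assumptions, not the paper's Table 4 settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data; in the paper this is the standardized 252-dim feature set
X, y = make_classification(n_samples=600, n_features=30, n_informative=10,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

# Grid search with 5-fold cross-validation over an illustrative KNN grid
grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={"n_neighbors": [3, 5, 7, 9],
                                "weights": ["uniform", "distance"]},
                    cv=5)
grid.fit(X_tr, y_tr)
test_acc = grid.score(X_te, y_te)
```

The same pattern, with a different estimator and parameter grid, applies to the decision tree and SVM models.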
To further investigate the potential of machine learning algorithms in acoustic target recognition, we conducted experiments using the ensemble learning algorithms AdaBoost, random forest (RF), GBDT, and XGBoost. Table 5 presents the hyperparameter settings, recognition accuracy, and training time of the four ensemble learning algorithms. From Table 5, the recognition accuracy of the AdaBoost algorithm is significantly lower than that of the other three ensemble learning algorithms. The recognition accuracy of the GBDT algorithm is high, but its training time is considerably longer than that of the other algorithms. The XGBoost algorithm has the highest recognition accuracy with a balanced training time. The random forest algorithm trains fastest, and its recognition performance is also good. Figure 9 shows the confusion matrix plots of the four algorithms. According to these comparative experiments, the ensemble learning algorithms random forest, GBDT, and XGBoost outperform the single classifiers KNN, DT, and SVM on the underwater acoustic target recognition task. Considering both the training time and the confusion matrix results, the XGBoost method is the most applicable to underwater acoustic target recognition.
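A sketch of the ensemble comparison using scikit-learn's AdaBoost, random forest, and GBDT implementations; XGBoost, which the paper also tests, lives in the separate xgboost package but exposes the same fit/score interface. The synthetic data and hyperparameters are stand-ins, not the paper's Table 5 settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=30, n_informative=10,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

# Same data, same split: compare the three ensemble families directly
models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
    "GBDT": GradientBoostingClassifier(n_estimators=50, random_state=0),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
          for name, m in models.items()}
```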

Results and analysis of the feature selection experiment
In the 252-dimensional feature vector, different types of features have different impacts on the classifier recognition performance.To further optimize the classification recognition model and improve the model recognition efficiency, this paper explores the effects of different domain features on the classification recognition results by conducting feature selection experiments on the extracted feature vectors.
This paper conducts feature selection experiments on the 252-dimensional feature vectors using three ensemble learning algorithms: random forest, GBDT, and XGBoost. The experiments evaluate three kinds of feature selection methods, filter, embedded, and wrapper, in the target recognition task. Since some models have significant weaknesses in recognizing Class A and Class B targets, the main evaluation metric during feature selection is the recognition accuracy on Class A and Class B targets. Taking into account the performance of the three ensemble learning algorithms when trained on the full feature vector, the feature selection criteria are set as follows: the recognition accuracy on Class A and Class B targets should reach 70% for RF, 80% for GBDT, and 85% for XGBoost. In Figure 10, subfigure (a) presents the mean recognition accuracy on Class A and Class B targets under five variance-threshold conditions. It reveals that the recognition accuracy of the GBDT and XGBoost algorithms satisfies the criterion only when the feature vector dimension reaches 204, so the effect of this feature selection method is limited. In subfigure (b), the horizontal axis represents the feature vector dimension selected by the chi-square test, and the vertical axis the mean recognition rate of the three ensemble learning models on Class A and Class B targets. Under the feature selection criteria, the feature vectors chosen for the RF and XGBoost models need to reach 200 dimensions, and for the GBDT model 190 dimensions. Subfigure (b) also shows that when the chi-square test is used for feature selection, the model recognition accuracy is strongly influenced by the feature vector dimension, and a higher dimension is necessary to guarantee model performance. Subfigure (c) shows that feature selection based on mutual information performs significantly better than the variance and chi-square test methods: all three algorithms meet the accuracy requirement once the feature dimension reaches 100, a 60% reduction relative to the original feature vector.
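The three filter methods discussed above map directly onto scikit-learn's feature-selection utilities; the synthetic data, the k=20 target dimension, and the variance threshold are illustrative. Note that the chi-square test requires non-negative inputs, hence the min-max scaling.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectKBest, VarianceThreshold,
                                       chi2, mutual_info_classif)
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=400, n_features=50, n_informative=8,
                           random_state=0)

# Variance filter: drop near-constant features (univariate)
X_var = VarianceThreshold(threshold=0.1).fit_transform(X)

# Chi-square test: requires non-negative inputs
X_pos = MinMaxScaler().fit_transform(X)
X_chi2 = SelectKBest(chi2, k=20).fit_transform(X_pos, y)

# Mutual information: captures nonlinear feature-label dependence
X_mi = SelectKBest(mutual_info_classif, k=20).fit_transform(X, y)
```

The reduced matrices are then fed to the ensemble classifiers, and the per-class accuracy on the weakest classes is tracked as the feature dimension varies.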

Wrapping.
Wrapper feature selection algorithms determine a feature subset tailored to each classifier. Their advantage is that they can achieve higher classification recognition rates with lower feature vector dimensions; their disadvantage is slower computation. This paper uses recursive feature elimination to test the recognition performance of the three ensemble learning models, with feature dimensions ranging from 10 to 120 and 10 features eliminated in each recursion. Figure 11 shows how the average recognition rate on Class A and Class B targets changes with the feature dimension. It shows that with the wrapper method, the recognition accuracy of all three ensemble learning classifiers satisfies the requirement at a feature dimension of 30, so the feature selection efficiency is very high, which fully illustrates the effect of "customization". However, compared with the filter method, the selection algorithm takes much longer to run.
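Recursive feature elimination, as described above, can be sketched with scikit-learn's RFE wrapped around a random forest: the estimator is refit repeatedly, features are ranked by importance, and the weakest block is dropped each round. The dataset, step size, and target dimension below are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=400, n_features=50, n_informative=8,
                           random_state=0)

# Wrapper selection: refit the classifier itself at each elimination step,
# dropping 10 features per round until 10 remain
rfe = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
          n_features_to_select=10, step=10)
X_sel = rfe.fit_transform(X, y)
```

The repeated refitting is exactly why the wrapper method is slower than the filter methods but yields a subset tuned to the specific classifier.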

Embedding.
Embedded feature selection integrates feature selection and model training, completing both in the same optimization process. In the embedded feature selection experiment, an importance index of each feature relative to each model is obtained, and feature selection for the different models is carried out according to this index. In this paper, the importance index of each feature is obtained in the experiment, the median of the feature importance indices is used as the selection threshold, and features with importance greater than the median are selected for classification and recognition. The feature selection and classification recognition accuracy of the three ensemble learning models are shown in Table 6.
Table 6. Results of feature selection by embedding.
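The embedded selection rule described above (keep the features whose importance exceeds the median importance) can be sketched as follows; the model choice and synthetic data are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=50, n_informative=8,
                           random_state=0)

# Importance indices fall out of training itself (embedded selection)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = model.feature_importances_

# Keep features whose importance is strictly greater than the median
mask = importances > np.median(importances)
X_sel = X[:, mask]
```

Because the threshold is the median, roughly half of the features survive; each ensemble model yields its own importance vector and hence its own subset.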

Experimental results and analysis.
This paper conducts various feature selection experiments and combines the feature vectors derived by the RF, GBDT, and XGBoost algorithms under wrapper-method feature selection to form a 40-dimensional feature vector, which retains only 16% of the components of the original 252-dimensional feature vector. The composition of the feature vector is illustrated in Figure 12. To illustrate the impact of feature selection, we take three features from the 40-dimensional feature vector and display the dataset in a three-dimensional coordinate system, as shown in Figure 13. In this three-dimensional feature space, target samples belonging to the same category are approximately grouped, and there is some separation between target samples of different categories. On this basis, the models were retrained and evaluated using the 40-dimensional feature vector. Figure 14 shows the confusion matrix diagrams of the three algorithms. From Figure 14, after feature selection the recognition performance of the three algorithms remains high; in particular, the random forest algorithm exceeds its results under the original 252-dimensional feature vector. Table 7 compares the training time of the three models before and after feature selection. In general, after feature selection the model recognition performance was preserved while the training efficiency was greatly improved. Based on the feature scores in Table 8, this paper adopts TOPSIS, a multi-criteria decision analysis technique that ranks alternatives by their closeness to an ideal solution, to perform a statistical analysis of the different features. Among the 10 categories of features, this paper concludes that LPCC features, continuous spectrum features, loudness features, timbre features, and time-domain features should be prioritized for underwater acoustic target recognition.
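The TOPSIS ranking used for the feature-score analysis can be sketched as follows: normalize the score matrix, find the ideal and anti-ideal points, and rank alternatives by relative closeness. The score matrix below is a hypothetical stand-in, not the paper's Table 8 data, and the equal criterion weights are an assumption.

```python
import numpy as np

def topsis(scores, weights=None, benefit=None):
    """Rank alternatives (rows) by closeness to the ideal solution.
    scores: (n_alternatives, n_criteria) matrix; higher-is-better
    criteria by default."""
    m = np.asarray(scores, dtype=float)
    n_alt, n_crit = m.shape
    if weights is None:
        weights = np.ones(n_crit) / n_crit
    if benefit is None:
        benefit = np.ones(n_crit, dtype=bool)
    # Vector-normalize each criterion column, then apply weights
    v = (m / np.linalg.norm(m, axis=0)) * weights
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - anti, axis=1)
    return d_neg / (d_pos + d_neg)  # closeness coefficient in [0, 1]

# Hypothetical scores of 4 feature types under 3 selection methods
scores = np.array([[0.9, 0.8, 0.7],
                   [0.4, 0.5, 0.6],
                   [0.8, 0.9, 0.9],
                   [0.2, 0.3, 0.1]])
closeness = topsis(scores)
```

The feature type with the largest closeness coefficient (here the third row, which scores consistently high under all three methods) is the one prioritized for extraction.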

Conclusions and prospects
This paper studies machine learning-based underwater acoustic target recognition, focusing on the impact of feature extraction on acoustic target recognition, and mainly accomplishes the following tasks: (1) It studies preprocessing techniques for acoustic target signals, feature extraction methods, and the classification performance of different machine learning algorithms. After preprocessing the acoustic target radiated noise signals with pre-emphasis, framing, and windowing, it extracts 252-dimensional feature vectors for classification from the time domain, frequency domain, and auditory domain. It uses three base learning algorithms, KNN, decision tree, and support vector machine, to classify five acoustic targets and one class of ocean environment noise. The three algorithms have significant performance defects in recognizing targets A and B. When the base learners do not meet the requirements, multiple learners can be combined into an ensemble. Following this idea, this paper evaluates four ensemble learning algorithms (AdaBoost, random forest, GBDT, and XGBoost); except for AdaBoost, whose recognition performance is poor, the ensemble learning algorithms achieve better results than the single base learners.
(2) Based on the three ensemble learning algorithms, it evaluates three feature selection approaches, the filter method, the wrapper method, and the embedded method, and studies how model recognition performance changes as the feature dimension is reduced. In the wrapper-method feature selection experiment, the three ensemble learning algorithms maintain high recognition performance with 40-dimensional features, which improves the model training efficiency.
(3) It uses feature selection algorithms to compute importance indicators for the 252-dimensional features and comprehensively considers the average score of each feature. Under the current dataset conditions, this paper concludes that, among the 10 categories of features, LPCC features, continuous spectrum features, loudness features, timbre features, and time-domain features should be preferentially extracted for acoustic target recognition.

Figure 2. Time-domain waveform of the underwater acoustic target sample signal.

Figures 4 and 5 present the time-domain and frequency-domain contrast plots of a 170 ms segment of acoustic target data before and after pre-emphasis; pre-emphasis increases the variation of the target signal and amplifies its high-frequency components.

Figure 4. Time-domain comparison before and after pre-emphasis.

Figure 5. Frequency-domain comparison before and after pre-emphasis.

Frame splitting and windowing.
Due to the different sample sizes and audio lengths of the various targets in the original dataset, it is difficult to extract signal features in a standardized way. Moreover, target radiated noise signals generally exhibit non-stationary characteristics, which limits conventional Fourier analysis methods and hinders signal analysis. Research shows that over a short interval of a few seconds, the non-stationary underwater target radiated noise can be treated as a stationary signal for processing and analysis. In this paper, the pre-emphasized underwater signals are framed, so that target radiated noise of different lengths is segmented into many fixed-length segments.
(ETAI-2023, Journal of Physics: Conference Series 2644 (2023) 012008, IOP Publishing, doi:10.1088/1742-6596/2644/1/012008)

Data aggregation and feature extraction overview
Data aggregation.
After preprocessing the underwater acoustic signals, a target signal dataset with normalized amplitude and uniform duration is obtained.

Figure 8. Confusion matrices of the three machine learning algorithms.

Figure 9. Confusion matrices of the four ensemble learning algorithms.
As shown in Figure 9(a), the random forest algorithm has fast recognition and high overall accuracy, but its recognition accuracy for Class A and Class B targets is low, and it misclassifies them as Class C with a 14% probability. Subfigures (b) and (c) reveal that the GBDT and XGBoost algorithms are also less accurate in recognizing Class A and Class B targets than the other four classes, but they outperform the random forest algorithm. Subfigure (d) illustrates that the poor performance of the AdaBoost algorithm is due to its inability to effectively identify Class A, B, and C targets.

Filtering.
For feature selection based on filter methods, this paper experimentally verified the variance filtering method based on univariate analysis, and the chi-square test and mutual information methods based on multivariate correlation analysis.
Figure 10. Results of feature selection by filtering: (a) results based on the variance selection method; (b) accuracy based on the chi-square test; (c) accuracy based on mutual information.

Figure 11. Results of feature selection by wrapping.

Figure 12. Results of feature selection.

Figure 13. Target sample distribution under the condition of three-dimensional features.

Figure 14. Confusion matrices of the ensemble learning algorithms after feature selection.

Table 1. Classification of underwater acoustic target data.

Table 2. Time-domain features of underwater acoustic target recognition.
Table 3 presents the data of the 5 types of target signals and 1 type of environmental noise signal after preprocessing.

Feature extraction experiment.
Following the feature extraction process, feature extraction is performed on the preprocessed acoustic target signals, resulting in 252-dimensional features for target classification and recognition. Detailed information on the features is shown in Figure 7.

Table 4. Hyperparameter settings and recognition performance of the machine learning models.

Table 5. Hyperparameter settings and recognition performance of the ensemble learning models.

Table 7. Comparison of training time before and after feature selection.
To evaluate feature optimization and selection in the underwater acoustic target recognition task, the feature vectors obtained by filter-method and embedded-method feature selection were scored (the wrapper method only selects a feature subset and produces no feature importance index); Table 8 shows the average score of each type of feature. Note that the score scales vary across the different feature selection methods.

Table 8. Feature score statistics by the feature selection algorithms.