Separability Measure Supervised Network for Radar Target Recognition

In the radar automatic target recognition (RATR) field, radar high-resolution range profiles (HRRPs) have garnered significant attention. While traditional methods focus on extracting features with physical explanations, such as power spectra and FFT magnitudes, the effectiveness of these features relies heavily on personal experience and skills. In contrast, deep learning networks have shown strong competence in extracting discriminative features from HRRPs. However, the feature extraction procedure of deep learning networks is guided solely by the targets' label information, which has almost no correlation with feature separability. As a result, this approach can lead to poor convergence and limited recognition performance. To address this issue, we propose a Separability Measure Supervised Network (SMSN), which integrates a separability measure based on the rate-distortion function into the loss function to direct the training of the network. Comparative experiments on an airplane electromagnetic simulation HRRP dataset demonstrate that SMSN achieves higher recognition accuracy than the backbone networks, with significantly improved feature separability.


Introduction
In the Radar Automatic Target Recognition (RATR) field, the high-resolution range profile (HRRP) is a crucial data source due to its efficiency and low complexity [1]. It provides valuable knowledge regarding target structure, including scattering distribution, geometry size, and target shape [2]. Additionally, HRRP data is easy to obtain and places low requirements on radar systems. As a result, research on HRRP-based radar target recognition has garnered widespread attention in radar communities as a promising approach.
Generally, the HRRP-based approach to radar target recognition involves three steps: data preprocessing, feature extraction, and classifier design. In recognition, data quality and feature separability are crucial for improving recognition performance. Various methods have been proposed to address the amplitude, translation, and orientation sensitivity present in HRRP data [3]. However, current feature extraction algorithms do not explicitly optimize feature separability.
Early traditional HRRP recognition methods extract features with physical explanations, including the power spectrum, FFT magnitude, etc. Although these features are interpretable, their separability depends heavily on personal experience and skills. Recent studies suggest that neural networks can extract more discriminative features from HRRPs than traditional shallow feature learning methods. The variational autoencoder (VAE), a deep feature learning method, was among the first applied to HRRP recognition. For example, to learn hierarchical features of the HRRP, Feng et al. [4] introduced a stacked corrective auto-encoder (SCAE) model. However, the VAE, as a typical unsupervised generative model, may not create a discriminative subspace without label information.

To address this issue, Du et al. [5] used the conditional variational auto-encoder (CVAE) with label information to obtain discriminative latent representations. Similarly, Liao et al. [6] constructed a complex VAE that exploits the magnitude and phase information of HRRP echoes. In addition to VAE models, the convolutional neural network (CNN) has been applied to extract the corresponding spectrogram features of HRRPs [7], and recurrent neural networks (RNNs) have been used to learn the sequential information over time between HRRP range cells [8]. In general, the deep feature learning models described above improve recognition performance by incorporating label information into the loss function. However, with this approach the convergence of the loss function depends only on the gap between the predicted and actual labels. As a result, such label-guided models fail to fully utilize the separability information of latent features during training, which limits recognition performance.
To address this issue, the proposed model introduces a separability measure, an inherent property that describes how data points belonging to different classes are mixed with each other [9]. In [10], existing classification complexity measures are summarized from a data separability perspective, primarily assessing the Euclidean distance between intra-class and inter-class data. However, such distance-based methods raise an important question: whether different distance metrics, such as the Euclidean, Manhattan, and Minkowski distances, are suitable for assessing data separability in high-dimensional spaces [11]. In contrast, the separability measure based on the rate-distortion function [12] is more robust to changes in feature dimensionality, as it utilizes the singular values of the feature matrix to measure the distance between inter-class samples. Recent research has applied rate-distortion theory to explain neural network models, optimal feature learning methods, and so on [13].
In this study, we propose the SMSN, a separability measure supervised network for HRRP recognition. The SMSN is composed of three modules. The backbone module follows the autoencoder (AE) structure and extracts latent features of the input HRRP data in an unsupervised manner; this unsupervised latent feature generation is driven by optimizing the reconstruction loss. The separability measure module evaluates the separability of the latent features and applies the rate-distortion function to calculate the separability loss. The loss fusion module combines the two losses above with predetermined weights to construct the total loss. The SMSN extracts separable latent features for the training and test data, and a linear support vector machine (SVM) classifies these features to complete the recognition process. Extensive experiments on an airplane electromagnetic simulation dataset demonstrate that SMSN improves the backbone networks' recognition capability with enhanced feature separability. This paper's primary contributions are:
1) Our proposed method extracts more separable features. We introduce the rate-distortion-based separability measure to quantify feature separability. By optimizing the separability loss during training, intra-class samples become more clustered and inter-class samples more dispersed.
2) Our proposed model achieves higher recognition accuracy. Ablation experiments demonstrate a significant improvement in recognition performance with fewer iterations compared to the plain AE and VAE models.

Backbone module of the AE and VAE structure
The backbone module extracts hidden layer features for recognition by an AE or VAE framework consisting of fully connected neural network units, as shown in Figure 1. To extract separable latent features, a separability measure is introduced into the loss function.
Let $X$ denote the input HRRP and $\hat{X} = g(f(X))$ represent the output of the AE model; then the reconstruction loss function is expressed as

$$L_{AE} = \|X - \hat{X}\|_2^2. \tag{1}$$

The VAE model is a variant of the AE model, which is actually a variational inference model. Its loss function appends the Kullback-Leibler (KL) divergence to the reconstruction error, constraining the latent feature $Z$ to obey a Gaussian distribution. The VAE loss function can be defined as

$$L_{VAE} = -\mathbb{E}_{q_\Phi(Z|X)}\left[\log p_\theta(X|Z)\right] + D_{KL}\left(q_\Phi(Z|X) \,\|\, p(Z)\right), \tag{2}$$

where $q_\Phi(Z|X)$ is defined as the probability distribution of the hidden variable $Z$ generated by the input $X$, $p_\theta(X|Z)$ denotes the probability distribution of the output $X$ sampled from the reparametrized $Z$, and $\Phi$, $\theta$ respectively hold the parameters of the encoder $f$ and the decoder $g$.
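Under the common assumptions of a mean-squared reconstruction error and a diagonal-Gaussian posterior with a standard-normal prior, the two loss terms can be sketched in numpy (a framework-agnostic illustration, not the authors' exact implementation):

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    # Mean squared reconstruction error ||X - X_hat||^2, averaged over the batch.
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))

def kl_divergence(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), averaged over the batch.
    return np.mean(-0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=1))

def vae_loss(x, x_hat, mu, log_var):
    # L_VAE = reconstruction term + KL regularizer on the latent feature Z.
    return reconstruction_loss(x, x_hat) + kl_divergence(mu, log_var)
```

Both terms vanish when the reconstruction is exact and the posterior matches the prior, which is the sanity check for any VAE loss implementation.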

Separability measure module
The separability measure module quantifies the separability of the latent feature extracted by the backbone module. Specifically, assume an HRRP latent feature $Z = [z_1, \ldots, z_m] \in \mathbb{R}^{n \times m}$ with $m$ samples of $n$ dimensions and an encoding precision $\varepsilon > 0$. Let $\Pi = \{\Pi_j \in \mathbb{R}^{m \times m}\}_{j=1}^{k}$ be the label matrix of $Z$ over the $k$ classes, with $\Pi_j(i, i)$ indicating whether $z_i$ belongs to class $j$. Our data separability measure based on the rate-distortion function is

$$L_{SEP} = R(Z, \varepsilon) - R(Z, \varepsilon \mid \Pi), \tag{3}$$

where $R(Z, \varepsilon \mid \Pi)$ and $R(Z, \varepsilon)$ denote the local and global coding rates of the data. According to Cover and Thomas' [14] definition of rate-distortion, $R(Z, \varepsilon)$ is the minimum number of binary bits required to encode $Z$ such that the expected decoding error is less than $\varepsilon$. The estimated coding rate of a zero-mean $Z$ can be defined as

$$R(Z, \varepsilon) = \frac{1}{2} \log\det\!\left(I + \frac{n}{m\varepsilon^2} ZZ^{T}\right). \tag{4}$$

Furthermore, suppose $Z$ contains samples from $k$ classes, so that $Z = Z_1 \cup Z_2 \cup \cdots \cup Z_k$, where the data $Z_j$ of each class $j$ also occupy a certain volume in a low-dimensional subspace. Applying the coding rate in Equation (4) to each subset, $R(Z, \varepsilon \mid \Pi)$ is given by

$$R(Z, \varepsilon \mid \Pi) = \sum_{j=1}^{k} \frac{\operatorname{tr}(\Pi_j)}{2m} \log\det\!\left(I + \frac{n}{\operatorname{tr}(\Pi_j)\varepsilon^2} Z\Pi_j Z^{T}\right). \tag{5}$$

The loss function in Equation (3) guides the network to learn a more separable, lower-dimensional feature than the original HRRP data. During training, maximizing $L_{SEP}$ drives $R(Z, \varepsilon)$ higher and $R(Z, \varepsilon \mid \Pi)$ lower. Consequently, the volume spanned by all features $Z$ expands to its maximum, while each class $Z_j$ compresses to its minimum. The hidden layer features thus become discriminative between classes while preserving intra-class similarity.
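The coding-rate-based measure described above can be sketched as follows. This is a minimal numpy illustration in the style of rate-reduction objectives; the precision ε and the input shapes are assumptions, and class membership is passed as a label vector rather than the diagonal matrices Π for brevity:

```python
import numpy as np

def coding_rate(Z, eps):
    # Global coding rate R(Z, eps) = 1/2 log det(I + n/(m*eps^2) Z Z^T),
    # for a feature matrix Z of shape (n, m): n dimensions, m samples.
    n, m = Z.shape
    return 0.5 * np.linalg.slogdet(np.eye(n) + (n / (m * eps ** 2)) * Z @ Z.T)[1]

def separability_loss(Z, labels, eps=0.5):
    # L_SEP = R(Z, eps) - R(Z, eps | Pi): the global rate minus the
    # class-conditional rates, each weighted by its class proportion.
    n, m = Z.shape
    local_rate = 0.0
    for j in np.unique(labels):
        Zj = Z[:, labels == j]
        mj = Zj.shape[1]
        local_rate += (mj / (2 * m)) * np.linalg.slogdet(
            np.eye(n) + (n / (mj * eps ** 2)) * Zj @ Zj.T)[1]
    return coding_rate(Z, eps) - local_rate
```

Because log-det is concave, the difference is non-negative, and it grows as class subspaces become more orthogonal to each other, which is exactly the behavior the training objective rewards.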

Loss fusion module
The loss fusion module combines the reconstruction loss $L_{AE}$ or $L_{VAE}$ with the separability loss $L_{SEP}$. Considering the magnitude gap between the two types of loss, a preset hyperparameter $\lambda$ is applied to balance their values: $\lambda = 0.00001$ for the AE model and $\lambda = 100$ for the VAE model. Since $L_{SEP}$ is to be maximized while the total loss is minimized, the complete loss function can be represented as

$$L_{AESEP} = L_{AE} - \lambda L_{SEP}, \qquad L_{VAESEP} = L_{VAE} - \lambda L_{SEP}. \tag{6}$$
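Assuming the sign convention that the separability term is subtracted from the reconstruction loss (so that minimizing the total loss maximizes separability), the fusion step reduces to a one-line weighted combination:

```python
def total_loss(recon_loss, sep_loss, lam):
    # L_total = L_rec - lam * L_SEP: minimizing it lowers the reconstruction
    # error while pushing the separability measure L_SEP upward.
    # lam is the preset balancing weight (e.g. 1e-5 for AE, 100 for VAE).
    return recon_loss - lam * sep_loss
```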

Experimental results and analysis
In this section, we outline the implementation and results of our experiments. First, we introduce the airplane electromagnetic simulation dataset. Next, we preprocess the data using L2 normalization and maximum correlation alignment to address HRRP amplitude and translation issues and improve data quality. Finally, we conduct an ablative experiment to demonstrate the superiority of our proposed separability measure supervised network. Our comparative results, which include the linear SVM's recognition accuracy, the cosine similarity matrix between samples, and t-distributed stochastic neighbor embedding (t-SNE) visualizations, demonstrate that our proposed method (SMSN) extracts more separable latent features and achieves better recognition performance. All experiments were carried out in PyTorch on a laptop with an NVIDIA GeForce MX150 graphics card.

Dataset
In this study, we experiment with the F-35, F-117, and P-51 aircraft types from the aircraft electromagnetic calculation dataset. Figure 2 displays their 3D models. The aircraft data are simulated for an X-band radar over a frequency range of 9.5 GHz to 10.5 GHz in 5 MHz steps. The dataset size is 901×101×201, where 901 is the number of HRRP samples taken every 0.1 degrees over the radar azimuth angle range from 0 to 90 degrees, 101 is the number of HRRP samples taken every 0.1 degrees over the radar pitch angle range from 0 to 10 degrees, and 201 is the dimension of one HRRP. For the training set, we select the HRRPs at the 46th to 49th pitch angles over all azimuth angles for the three aircraft types, while the 50th pitch angle is used for the test set. The training set consists of 10,812 instances, and the test set contains 2,703 instances. Figure 3 illustrates the three types of HRRPs in the training and test data, with the 46th to 49th pitch angles of the training HRRPs plotted in the same figure. As shown in Figure 3, each HRRP has varying amplitude and center translation, which differentiates it from intra-class samples and makes recognition challenging.
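The split described above can be sketched as follows. This is a hypothetical reconstruction that assumes the per-aircraft data tensor has shape (901 azimuths, 101 pitch angles, 201 range cells) and that the pitch indices in the paper are 1-based:

```python
import numpy as np

def split_by_pitch(hrrp):
    # hrrp: one aircraft type, shape (901, 101, 201).
    # Pitch angles 46-49 (1-indexed) -> training, pitch angle 50 -> test.
    train = hrrp[:, 45:49, :].reshape(-1, 201)  # 901 * 4 = 3604 samples
    test = hrrp[:, 49, :].reshape(-1, 201)      # 901 samples
    return train, test
```

Across the three aircraft types this yields 3 × 3604 = 10,812 training instances and 3 × 901 = 2,703 test instances, matching the counts reported above.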

Data preprocessing results
To enhance the quality of the HRRP data and address amplitude and translation issues, we apply two preprocessing techniques: amplitude L2 normalization and the maximum correlation alignment (MCA) method. Their effectiveness is validated by visualizing the F-117 HRRPs at the 48th pitch angle before and after preprocessing; Figure 4 illustrates the comparison. Specifically, as shown in Figures 4(a) and 4(b), the amplitude of the HRRPs is normalized by the L2 normalization technique, making the magnitude feature of each HRRP more prominent. Furthermore, as shown in Figures 4(c) and 4(d), the MCA method aligns the HRRP sequences, resulting in a more symmetrical center of gravity for the HRRPs.
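Both preprocessing steps can be sketched in a few lines of numpy. The FFT-based circular cross-correlation below is one common way to implement maximum-correlation alignment against a template profile; the paper does not specify its exact variant, so treat this as an illustrative assumption:

```python
import numpy as np

def l2_normalize(x):
    # Scale each HRRP to unit L2 norm to remove amplitude sensitivity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def max_correlation_align(x, template):
    # Circularly shift x to the lag that maximizes its cross-correlation
    # with the template profile, compensating translation along range cells.
    corr = np.fft.ifft(np.fft.fft(template) * np.conj(np.fft.fft(x))).real
    shift = int(np.argmax(corr))
    return np.roll(x, shift)
```

In practice the template can be a fixed reference profile or the running mean of already-aligned profiles; either choice removes the center translation visible in Figure 3.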

Comparative and ablative experiment results
On the basis of the aforementioned simulated dataset, comparison experiments using conventional AE and VAE methods, as well as ablative experiments using the proposed method, are conducted in this section. Our backbone network is composed of a series of fully connected layers. For the encoder, the numbers of neural units per layer are set to 201, 1500, 500, and 50, respectively; for the decoder, they are 500, 1500, and 201. The Adam optimizer with a learning rate of 0.001 is applied to train the network. Each loss function summarized in Section II corresponds to one of the methods shown in Table 2. L_AESEP and L_VAESEP denote our proposed methods trained with the feature separability measure, while the traditional L_AE and L_VAE models are considered as comparison methods in the ablative experiment. The highest recognition accuracy of each approach is displayed in Table 2. Additionally, we provide the recognition accuracy curves over 100 iterations for each model in Figure 5.
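A minimal numpy sketch of the fully connected backbone with the layer widths listed above (encoder 201→1500→500→50, decoder 50→500→1500→201); the ReLU activations and He-style initialization are assumptions, since the paper does not state them:

```python
import numpy as np

ENC_SIZES = [201, 1500, 500, 50]
DEC_SIZES = [50, 500, 1500, 201]

def init_mlp(sizes, rng):
    # One (weight, bias) pair per fully connected layer.
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    # ReLU on hidden layers, linear output layer.
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x
```

The 50-dimensional encoder output is the latent feature Z fed to both the separability measure and, after training, the linear SVM classifier.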
Table 2. The highest accuracy of models with different loss functions.
In general, the proposed SMSN model outperforms the AE and VAE models in terms of recognition accuracy. Specifically, the method L_VAESEP achieves nearly 10% higher accuracy than the method L_VAE, and the method L_AESEP also achieves a significant accuracy improvement, reaching the highest accuracy of 0.9926. Furthermore, Figure 5 shows that the separability measure supervised methods perform strongly from the very beginning of the training process, indicating that the separability loss is optimized early and the SMSN model focuses on mining a more separable latent feature.

To demonstrate how our proposed SMSN extracts more separable features and improves recognition accuracy, the cosine similarity metric is utilized to calculate the similarity between instances, and a feature reduction map is generated using the t-SNE method. The resulting cosine similarity matrix and t-SNE map of the latent features of each model are presented in Figure 6 and Figure 7, respectively.
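The similarity matrix in Figure 6 is the standard pairwise cosine similarity over latent features, which can be computed directly (a generic sketch, assuming samples are stored as rows):

```python
import numpy as np

def cosine_similarity_matrix(Z):
    # Pairwise cosine similarity between row-vector samples in Z:
    # normalize each row to unit length, then take the Gram matrix.
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return Zn @ Zn.T
```

For well-separated features, this matrix shows bright blocks on the diagonal (high intra-class similarity) and dark off-diagonal blocks (low inter-class similarity).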

Figure 1. The framework of the separability measure supervised network.

Figure 2. The three types of aircraft 3D models.
Figure 3. The three types of HRRPs in the training and test data.

Figure 5. The accuracy comparison results during 100 iterations.

Figure 6. The cosine similarity matrix between instances.

Figure 7. The 2D t-SNE visualization results of the test data's latent features.

Table 1. Three distinct types of aircraft simulation settings. The table lists the specific simulation parameters for each aircraft type.