Minimum Mean Square Distance Estimation of Subspaces in the Presence of Gaussian Sources with Application to STAP Detection

We consider the problem of signal subspace estimation from a given sample set. We rely on the Bayesian framework developed in [1] to obtain minimum mean square distance (MMSD) estimators, which minimize the expected distance between the true projection matrix UU^T and its estimate ÛÛ^T. In this work, we extend the estimators of [1] to the context of the linear model with Gaussian sources, with respectively Bingham and von Mises Fisher priors for the basis Ū. Numerical simulations are provided in order to assess the performance of the proposed estimators. The interest of the considered approach is finally illustrated on a real data set, for a Space-Time Adaptive Processing (STAP) application.


Introduction
Subspace estimation is a ubiquitous issue in signal processing. It aims at estimating the low-dimensional subspace in which the information (usually signals of interest) lies. This problem arises in numerous applications, such as PCA, DOA estimation [2], interference cancellation [11] and signal detection [10].
In the literature, a widely used model is the linear model, where the data matrix is modeled as Y = US + N, in which U spans the subspace of interest, S is the signal of interest and N is an additive noise. The addressed problem is to estimate U from the observations Y. The most usual means of estimating this subspace is the singular value decomposition (SVD) of the sample covariance matrix (SCM), i.e., taking the eigenvectors associated with the largest eigenvalues of the SCM. This usually provides an accurate estimate of the range space R(U) of U at high signal-to-noise ratio (SNR) and/or for a large sample support. However, this method shows its limits outside these asymptotic regimes. A possible solution to overcome this issue is to incorporate some prior knowledge into the estimation process, as proposed in [1], [5]. For example, in a Bayesian setting, the minimum mean square error (MMSE) estimator [5], which minimizes the average Euclidean distance between the true subspace U and its estimate Û, and the minimum mean square distance (MMSD) estimator [1], which minimizes the distance between the true projection matrix UU^T and its estimate ÛÛ^T, were proposed. More specifically, in [1] the MMSD is derived for both linear and covariance models with a uniform prior distribution for S. In this paper, we extend the work of [1] for the linear model to the case where the signal S is assumed to follow a Gaussian distribution, with respectively Bingham and von Mises Fisher prior distributions for U, as in [1].

Theory background and model setup
Throughout this paper, we denote by Y ∈ R^{N×K} the data matrix, U ∈ R^{N×P} the unknown subspace of interest, S ∈ R^{P×K} the signal of interest and N ∈ R^{N×K} the additive noise, where N denotes the number of antennas, K the number of snapshots and P the dimension of the subspace of interest (P ≪ N); p(Y | U) is the conditional probability of Y given U. Tr{·} stands for the trace of a given matrix and E{·} is the expectation of a given random variable.

Minimum mean square distance (MMSD)
The MMSD estimator minimizes the average distance between the range spaces R(Û) and R(U), measured by the Frobenius norm of the difference of the associated projection matrices, which is a natural metric on the Grassmann manifold G_{N,P} (the set of P-dimensional subspaces of R^N), as proven in [3], [4]. This estimator is therefore expressed as:

Û_mmsd = arg min_Û E{ ||ÛÛ^T − UU^T||_F^2 } = arg max_Û E{ Tr{Û^T UU^T Û} }.

As shown in [1], the MMSD estimator is obtained in general from the P principal eigenvectors of the matrix M(p(U | Y)) = ∫ UU^T p(U | Y) dU [1], [5], i.e.,

Û_mmsd = P_P{ ∫ UU^T p(U | Y) dU },   (1)

where P_P{·} is the operator that returns the P eigenvectors associated with the largest eigenvalues of a given matrix. The generic expression of the MMSD estimator depends on p(U | Y), which has to be specified based on both the data model and the prior distributions of S and U.
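In practice, (1) reduces to an eigendecomposition of the posterior mean of the projector. A minimal NumPy sketch (the function names are ours, not from [1]) of the operator P_P{·} and of the squared subspace distance being minimized:

```python
import numpy as np

def principal_eigvecs(M, P):
    """P_P{.}: the P eigenvectors of the symmetric matrix M associated
    with its P largest eigenvalues, returned as columns."""
    eigvals, eigvecs = np.linalg.eigh(M)             # ascending eigenvalue order
    return eigvecs[:, np.argsort(eigvals)[::-1][:P]]

def squared_projection_distance(U1, U2):
    """||U1 U1^T - U2 U2^T||_F^2, the squared subspace distance whose
    expectation the MMSD estimator minimizes."""
    D = U1 @ U1.T - U2 @ U2.T
    return np.linalg.norm(D, 'fro') ** 2
```

For two P-dimensional subspaces this distance ranges from 0 (identical subspaces) to 2P (orthogonal subspaces).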

Model setup
In this paper, we assume that the data is the sum of a signal of interest and a zero-mean white Gaussian noise. More precisely, the columns of the additive noise N are i.i.d. and each follows a N(0, σ² I) distribution, and the columns of the signal of interest S are i.i.d. and each follows a N(0, σ_S² I) distribution. By contrast, we note that in [1], S is assumed to follow a uniform distribution. The linear Gaussian model is given by:

Y = US + N.   (2)

Since we adopt a Bayesian framework in this paper, a prior distribution must be assigned to U. On the Grassmann and Stiefel manifolds, the best-known distributions are the Bingham and von Mises Fisher (vMF) distributions [7], [8], which are specified below.
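As an illustration, data following this model can be generated as below (NumPy sketch; the function name is ours):

```python
import numpy as np

def generate_data(U, K, sigma_s, sigma, rng):
    """Draw Y = U S + N with i.i.d. columns S_k ~ N(0, sigma_s^2 I_P)
    and N_k ~ N(0, sigma^2 I_N), as in the linear Gaussian model."""
    N_dim, P = U.shape
    S = sigma_s * rng.standard_normal((P, K))
    noise = sigma * rng.standard_normal((N_dim, K))
    return U @ S + noise
```

Each column of Y then has covariance σ_S² UU^T + σ² I, which is the structure exploited when deriving p(Y | U).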

General Bingham von Mises Fisher distribution (BMF)
The BMF distribution has parameter matrices A ∈ R^{N×N}, B ∈ R^{P×P} and C ∈ R^{N×P} [6]. In this paper, it is sufficient to assume that B is a diagonal matrix, A is a symmetric matrix and C is an orthogonal matrix. The matrix U is said to follow a BMF(A, B, C) distribution if its probability density function is given by:

p_BMF(U) ∝ etr{C^T U + B U^T A U},

in which etr{·} stands for the exponential of the trace of a given matrix and ∝ indicates "proportional to". The Bingham and vMF distributions are special cases of this distribution.

Bingham distribution
The Bingham distribution [7] is a particular case of the BMF distribution where C is a null matrix. The matrix U is said to follow a Bingham distribution with parameters A and B, i.e., U ∼ B(A, B), if its probability density function reads

p_B(U) ∝ etr{B U^T A U},

in which A gathers the prior knowledge about the subspace of interest. Here, we focus on the case where A is proportional to the projection matrix onto the subspace spanned by the basis Ū. Thus, setting B = I and A = κ ŪŪ^T:

p_B(U) ∝ etr{κ U^T ŪŪ^T U},   (3)

where Ū is therefore the center of the distribution and κ is its concentration parameter.

vMF distribution
The vMF distribution [8] is also a special case of the BMF distribution, for which A and B are null. U is said to follow a vMF distribution with parameter C, i.e., U ∼ vMF(C), if its probability density function is given by:

p_vMF(U) ∝ etr{C^T U}.

As previously, we focus on the vMF distribution where C is proportional to a unitary matrix Ū. Setting C = κ Ū:

p_vMF(U) ∝ etr{κ Ū^T U}.   (4)

Remark 1: Note that the Bingham distribution p_B characterizes the distribution of the range space R(U) of U, while the vMF distribution p_vMF characterizes the distribution of the orthonormal basis U. Hence, as defined in (3) and (4), p_B is invariant by rotation (U → UQ for any orthogonal Q ∈ R^{P×P}) while p_vMF is not.
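Remark 1 can be checked numerically. The sketch below (NumPy; unnormalized log-densities, which suffice for the comparison) verifies that the Bingham density is unchanged when U is rotated from the right, whereas the vMF density is not:

```python
import numpy as np

def log_bingham(U, Ubar, kappa):
    """Unnormalized log-density of B(kappa * Ubar Ubar^T, I), cf. (3)."""
    return kappa * np.trace(U.T @ Ubar @ Ubar.T @ U)

def log_vmf(U, Ubar, kappa):
    """Unnormalized log-density of vMF(kappa * Ubar), cf. (4)."""
    return kappa * np.trace(Ubar.T @ U)
```

Since UQ Q^T U^T = UU^T for any orthogonal Q, the Bingham value depends on U only through its range space, as stated in Remark 1.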

Proposed MMSD estimators in presence of Gaussian sources
Here, we develop the MMSD estimator of Section 2.1 for the model specified in Section 2.2, using the prior distributions of U given above. Let us first derive the conditional probability of Y given U. Since the columns of Y are i.i.d. N(0, σ_S² UU^T + σ² I), one obtains:

p(Y | U) ∝ etr{α_G U^T YY^T U},  with  α_G = σ_S² / (2σ²(σ² + σ_S²)).   (5)

Bingham prior
We start by deriving the posterior probability of U, which is given by:

p_B-G(U | Y) ∝ p(Y | U) p_B(U) ∝ etr{U^T (α_G YY^T + κ ŪŪ^T) U}.

Then, deriving the MMSD estimator from (1), we obtain:

Û_mmsd-B-G = P_P{ ∫ UU^T etr{U^T (α_G YY^T + κ ŪŪ^T) U} dU }.

At this point, we use Proposition 1 of [1], which states that ∫ UU^T etr{U^T (α_G YY^T + κ ŪŪ^T) U} dU and α_G YY^T + κ ŪŪ^T share the same eigenvectors, with their corresponding eigenvalues in the same order. Consequently, we obtain:

Û_mmsd-B-G = P_P{ α_G YY^T + κ ŪŪ^T }.   (6)
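The closed form (6) is straightforward to implement. A minimal NumPy sketch (function name ours; it assumes α_G = σ_S²/(2σ²(σ² + σ_S²)), the scalar arising from the Gaussian likelihood):

```python
import numpy as np

def mmsd_bingham_gaussian(Y, Ubar, kappa, sigma_s, sigma):
    """MMSD estimate under the Bingham prior, cf. (6): the P principal
    eigenvectors of alpha_G Y Y^T + kappa Ubar Ubar^T."""
    # scalar weighting the data term (our derivation of the likelihood)
    alpha_g = sigma_s**2 / (2 * sigma**2 * (sigma**2 + sigma_s**2))
    M = alpha_g * (Y @ Y.T) + kappa * (Ubar @ Ubar.T)
    P = Ubar.shape[1]
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, np.argsort(eigvals)[::-1][:P]]
```

With no data (Y = 0) the estimate falls back onto the prior Ū, while for κ = 0 it reduces to the SVD-of-the-SCM solution.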

vMF prior
We start by deriving the posterior probability of U under the vMF prior:

p_vMF-G(U | Y) ∝ p(Y | U) p_vMF(U) ∝ etr{κ Ū^T U + α_G U^T YY^T U},

i.e., U | Y ∼ BMF(YY^T, α_G I, κ Ū). Then, we use (1) to derive the estimator. Unfortunately, there is no closed form of the MMSD estimator in this context. However, an efficient Gibbs sampler can be used to draw from the BMF distribution [6]. We use it to compute the so-called induced arithmetic mean (IAM) [9] of the draws U^(n) ∼ BMF(YY^T, α_G I, κ Ū), and the MMSD estimator is then given by:

Û_mmsd-vMF-G = P_P{ (1/N_r) Σ_{n=N_bi+1}^{N_bi+N_r} U^(n) (U^(n))^T },   (7)

where N_bi stands for the number of burn-in samples discarded from the Markov chain and N_r is the number of samples used to approximate the estimator.
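The IAM step in (7) can be sketched as follows (NumPy; the BMF Gibbs sampler of [6] itself is not reproduced here, the function simply post-processes a list of draws U^(n)):

```python
import numpy as np

def mmsd_from_gibbs_samples(samples, P, n_burn_in):
    """Approximate (7): discard the burn-in draws, average the projectors
    U^(n) U^(n)^T (induced arithmetic mean), then keep the P principal
    eigenvectors of the average."""
    kept = samples[n_burn_in:]
    iam = sum(U @ U.T for U in kept) / len(kept)
    eigvals, eigvecs = np.linalg.eigh(iam)
    return eigvecs[:, np.argsort(eigvals)[::-1][:P]]
```

Note that the IAM is not itself a projector in general, which is why the final P_P{·} step is needed to map it back to an orthonormal basis.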
Discussion: We recall the expressions of the estimators derived for the linear model in [1], where S has a uniform prior distribution. For the Bingham prior, the estimator is given by Û_mmsd-B-U = P_P{ α_U YY^T + κ ŪŪ^T }. For the vMF prior, the MMSD estimator is obtained as in (7) with U^(n) ∼ BMF(YY^T, α_U I, κ Ū), where α_U = 1/(2σ²). The main difference between our MMSD estimators (6), (7) and those of [1] lies in the scalar α_G from (6). Notice that for low SNR, α_G ≪ α_U, which means that our estimators are more shifted towards the prior Ū than those of [1], while for high SNR, α_G ≈ α_U, which means that the estimators coincide, as shown in Fig. 1.

Simulation
In this section, we illustrate the performance of the proposed estimators by evaluating the average fraction of energy (AFE). The AFE of a given estimator Û is expressed as AFE(Û) = E{Tr{U^T ÛÛ^T U}}, which is considered as a performance criterion since it evaluates the closeness between the estimate and the randomly generated matrix U. We set K = 5, N = 20, P = 5, SNR = 0 dB, κ = 20 and Ū = [I_P 0]^T. We compare the proposed MMSD estimators, denoted Û_mmsd-B-G and Û_mmsd-vMF-G, with those presented in [1], denoted respectively Û_mmsd-B-U and Û_mmsd-vMF-U, and with the SVD estimator of the SCM, denoted Û_SCM, which takes no prior knowledge of the subspace into account, i.e., treats U as deterministic. For computing the IAM, we fix N_bi = 10 and N_r = 200. The figures below examine the impact of the SNR and of K on the performance of the estimators, for a Gaussian prior on S. According to Fig. 2 and Fig. 3, the SVD estimator shows poor performance outside the asymptotic regimes. We also notice that for low SNR and/or a small number of snapshots, the proposed estimators (6), (7) perform better than their respective counterparts recalled in the discussion above. In the asymptotic regimes, the proposed estimators have almost the same performance as the latter.
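For reference, the fraction of energy for a single Monte Carlo run can be computed as below (NumPy; this unnormalized version is upper-bounded by P, reached when R(Û) = R(U)):

```python
import numpy as np

def fraction_of_energy(U_true, U_hat):
    """Tr{U^T Uhat Uhat^T U} for one realization; averaging this quantity
    over Monte Carlo runs yields the AFE of the estimator Uhat."""
    return float(np.trace(U_true.T @ U_hat @ U_hat.T @ U_true))
```

Averaging this quantity over independent draws of (U, S, N) gives the curves plotted against SNR and K.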

Application to STAP low rank detection
In this section, we illustrate the interest of the MMSD estimators for space-time adaptive processing (STAP) on a real data set provided by the French agency DGA/MI [16]. STAP [12] is applied to airborne radar in order to detect moving targets. Typically, the radar receiver consists of an array of Q antenna elements processing M pulses in a coherent processing interval, yielding N_c = MQ degrees of freedom.
In the classical framework [15], we assume that the K + 1 samples are drawn as:

z_0 = d + c_0 + n_0,    z_k = c_k + n_k,  k = 1, …, K,   (8)

where the tested cell z_0 potentially contains a target d = α_0 p(θ, v_t) to be detected; α_0 is the amplitude, p is the normalized steering vector, v_t is the target speed and θ is the DOA of the target (α_0, v_t and θ are unknown). For each sample, the additive noise is the sum of the ground clutter c and white Gaussian noise n. The secondary data {z_k} are assumed to be i.i.d., signal and outlier free. Thanks to the Brennan rule [13], the ground clutter is known to be contained in a low-dimensional subspace, with projector Π_c = U_c U_c^H of (known) rank R. Hence, concatenating the secondary data, we obtain in matrix form Z = [z_1, …, z_K] = C + N, with R(C) ⊂ R(U_c). It is worth mentioning that, even if the fluctuation of the clutter c may not be white Gaussian in practice, the results reveal that the considered framework remains robust to such deviations (cf. Figure 4). We investigate the performance of the low-rank adaptive normalized matched filter (ANMF) detector [12]:

Λ̂(z_0) = |p^H Π̂_c^⊥ z_0|² / ( (p^H Π̂_c^⊥ p)(z_0^H Π̂_c^⊥ z_0) ),

where Π̂_c^⊥ = I − Π̂_c and Π̂_c = Û_c Û_c^H is a clutter subspace projector (CSP) estimated from the secondary data {z_k}. The performance of a given detector depends on the accuracy of the CSP estimator from which it is built. This is why we compare the performance of the following detectors: Λ̂_SCM, where the CSP is estimated from the SVD of the SCM; Λ̂_SFPE, where the CSP is estimated from the SVD of the shrinkage fixed point estimator with minimal regularization parameter γ [14]; Λ̂_mmsd-B-G, where the CSP is estimated using our proposed scheme (6) — since the parameters α_G and κ are unknown here, we illustrate the performance of the estimator defined as Û_mmsd-B-G = P_R{ ZZ^H + β ŪŪ^H } for different values of the ratio parameter β = κ/α_G, the prior Ū being computed from the SVD of the STAP covariance matrix model of [12]; and Λ̂_prior, the (non-adaptive) detector using the prior Ū from the SVD of the STAP covariance matrix model of [12].
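The low-rank ANMF statistic described above can be sketched as follows (NumPy; `Uc_hat` is any estimate of the clutter subspace basis with orthonormal columns, e.g. one of the CSP estimators compared here):

```python
import numpy as np

def lr_anmf(z0, p, Uc_hat):
    """Low-rank ANMF: project the tested cell and the steering vector onto
    the orthogonal complement of the estimated clutter subspace, then form
    the normalized matched filter on the residuals. Output lies in [0, 1]."""
    P_perp = np.eye(Uc_hat.shape[0]) - Uc_hat @ Uc_hat.conj().T
    num = np.abs(p.conj().T @ P_perp @ z0) ** 2
    den = (p.conj().T @ P_perp @ p).real * (z0.conj().T @ P_perp @ z0).real
    return float(num / den)
```

Sweeping p(θ, v_t) over a grid of angles and speeds produces the detector outputs displayed in Figure 4.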
Figure 4 displays the output of the different detectors (i.e., the value of the ANMF statistic over a grid of target parameters) for a configuration where the tested cell z_0 contains 10 targets and the number of secondary data is limited but still larger than the evaluated clutter rank (2R < K = 100 < N_c/2). We observe that, in this context, the detector Λ̂_SFPE allows for target detection with a relatively small false alarm rate. The detector Λ̂_mmsd-B-G represents a trade-off between Λ̂_SCM (i.e., β = 0) and Λ̂_prior (β → ∞). As observed, Λ̂_SCM does not allow for target detection, since the target responses appear to be canceled as well. This is probably due to the spreading of the targets in the secondary data and the poor robustness of the SCM. The fully deterministic detector Λ̂_prior does not cancel the ground clutter: we can only observe false alarms on the clutter ridge and no distinguishable targets. However, the detector Λ̂_mmsd-B-G provides a trade-off that allows for both interference rejection and reliable target detection. This detector even outperforms Λ̂_SFPE for appropriate values of the parameter β, which illustrates the interest of introducing some prior information in an adaptive subspace estimation process.
As prospects of this work, it would be interesting to develop adaptive methods for the selection of the parameter β = κ/α_G. A second point to be investigated is the development of MMSD estimators that account for other signal models (e.g., covariance models, colored and non-Gaussian sources).
Data set parameters (Figure 4): the radar velocity is V = 100 m/s, the inter-element spacing is d = 0.3 m and the pulse repetition frequency is f_r = 1 kHz. The clutter rank, computed from the Brennan rule [13], is R = 45 and the CNR is evaluated around 20 dB.