Towards Robust Subspace Clustering via Joint Feature Extraction and Cauchy Loss Function

The purpose of subspace clustering is to discover the similarity between samples by learning a self-representation matrix, and it has been widely employed in machine learning and pattern recognition. Most existing subspace clustering techniques discover subspace structures from raw data and simply adopt the L2 loss to characterize the reconstruction error. To overcome these limitations, we propose a novel robust model named Feature extraction and Cauchy loss function-based Subspace Clustering (FCSC). FCSC performs low-dimensional and low-rank feature extraction simultaneously while suppressing large noise in the data, thereby generating a more ideal similarity matrix. Furthermore, we provide an efficient iterative strategy to solve the resulting problem. Extensive experiments on benchmark datasets confirm that FCSC is more robust than several advanced subspace clustering algorithms.


Introduction
Subspace clustering is an effective approach for handling high-dimensional data, and has been widely applied in diverse fields, including gene expression analysis, face clustering, motion segmentation, and character recognition [1][2][3]. Broadly, subspace clustering methods are divided into algebraic methods, iterative methods, spectral clustering-based methods, and statistical methods [4]. Among these techniques, the most effective are the spectral clustering-based approaches [3], which linearly express each sample point using all sample points lying in the same subspace.
Spectral clustering approaches generally involve two fundamental steps. The first is to learn a similarity matrix from the data. The final partitioning is then generated by segmenting the obtained similarity matrix with spectral clustering [2]. Least Squares Regression (LSR) [5], Sparse Subspace Clustering (SSC) [2], and Low Rank Representation (LRR) [3] are the most representative works among spectral clustering-based techniques. SSC employs the L1 norm to regularize the coefficient matrix and find the sparsest representation of every data sample. To adequately reflect the global structure, LRR utilizes the nuclear norm to discover the lowest-rank representation of all data samples. LSR studies the grouping effect and theoretically ensures that similar samples are grouped into the same subspace.
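As a concrete illustration of the second step, the following minimal Python sketch (function and variable names are ours, not from the cited works) symmetrizes a learned coefficient matrix into an affinity matrix and segments it with off-the-shelf spectral clustering:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def segment_coefficient_matrix(C, n_clusters):
    """Second step of the pipeline: segment a learned q x q
    coefficient matrix C with spectral clustering.

    C is assumed to come from any self-representation model
    (SSC, LRR, LSR, ...); this helper only performs the common
    symmetrization and segmentation stage."""
    # Symmetrize to obtain a valid nonnegative affinity matrix.
    W = 0.5 * (np.abs(C) + np.abs(C.T))
    model = SpectralClustering(n_clusters=n_clusters,
                               affinity="precomputed",
                               assign_labels="kmeans")
    return model.fit_predict(W)
```

The first step, learning C itself, is where the methods above differ; the segmentation stage is shared.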
However, previous works generally utilize the raw data directly [6] and focus on selecting an appropriate regularization for the coefficient matrix [4], while ignoring the influence of real-world noise [7]. For example, face images are frequently occluded by masks or sunglasses and are affected by varying lighting conditions, which introduces complex noise. How to handle noise in the subspace clustering model is therefore a crucial issue that influences the ultimate clustering performance. Commonly used techniques employ L1-norm or L2-norm strategies, which are limited to handling specific types of noise, such as Gaussian or Laplacian noise [8], and are unable to deal with the complicated noise present in real data.
To address the above issue, we propose a robust subspace clustering method that combines feature extraction and the Cauchy loss function in a unified framework. Specifically, the low-rank and low-dimensional structure of the data is exploited to remove noise. We utilize feature extraction to convert the original samples into a new feature space, generating latent representations with low-dimensional and low-rank structure. We then apply self-representation on the learned latent representation to construct the reconstruction error, and use the Cauchy loss function to characterize this error and achieve substantial noise compression. For clarity, the main contributions are highlighted below:
• We propose a robust subspace clustering approach, named FCSC. The model incorporates feature extraction into subspace clustering, which not only extracts low-dimensional features but also ensures that the generated features have a low-rank structure. Such operations alleviate the impact of real-world noise.
• We introduce the Cauchy loss function into the subspace clustering model, which effectively compresses large noise in the samples and makes the proposed approach more robust to various forms of noise in real-world data.
• An efficient iterative technique is designed to solve the FCSC optimization problem. Moreover, comparisons with several advanced subspace clustering algorithms confirm the robustness and effectiveness of our model.

Model Establishment
Given a dataset $X = [x_1, x_2, \cdots, x_q] \in \mathbb{R}^{p \times q}$ with feature dimension $p$ and number of samples $q$.
The main principle of subspace clustering is to use self-representation to create a coefficient matrix for the original data [2]. In addition, a deterministic regularization is imposed on the coefficient matrix, encouraging it towards a block-diagonal structure. To learn the coefficient matrix $C \in \mathbb{R}^{q \times q}$, the specific form is

$$\min_{C}\ \|X - XC\|_F^2 + \kappa\,\Delta(C), \quad \text{s.t.}\ \operatorname{diag}(C) = 0, \tag{1}$$

where $\kappa > 0$ is the parameter and $\Delta(\cdot)$ is a certain regularization. The constraint $\operatorname{diag}(C) = 0$ is customary to avoid trivial solutions. However, equation (1) is not robust to large noise. We are motivated by the fact that the influence function of the Cauchy loss function has an upper bound and approaches zero as the error increases [6]. We also simply adopt the Frobenius norm on the coefficient matrix to avoid the trivial solution and achieve the grouping effect. Therefore, we have the following form

$$\min_{C}\ \sum_{i=1}^{q} \log\!\left(1 + \frac{\|x_i - Xc_i\|_2^2}{\tau^2}\right) + \kappa\,\|C\|_F^2, \quad \text{s.t.}\ \operatorname{diag}(C) = 0, \tag{2}$$

where $\tau$ is a constant. To further improve model robustness, we use feature extraction to generate a low-rank and low-dimensional latent representation instead of directly using the original data as input. We then acquire the final objective function

$$\min_{C,\,P,\,Z}\ \sum_{i=1}^{q} \log\!\left(1 + \frac{\|z_i - Zc_i\|_2^2}{\tau^2}\right) + \kappa\,\|C\|_F^2 + \sigma\,\|Z\|_*, \quad \text{s.t.}\ Z = P^{\top}X,\ \operatorname{diag}(C) = 0, \tag{3}$$

where $\sigma > 0$ is also a parameter. Moreover, $P \in \mathbb{R}^{p \times h}$ is a learnable projection whose purpose is to recover the authentic feature representation $Z \in \mathbb{R}^{h \times q}$ from the original data $X$, where $h$ ($h < p$) is the dimension of the authentic feature representation. The Cauchy loss term $\log(1 + (\cdot)^2/\tau^2)$ measures the reconstruction error on the authentic feature representation, and the term $\|Z\|_*$ with the constraint $Z = P^{\top}X$ ensures that the learned representation has a low-dimensional and low-rank structure.
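To make the objective concrete, the following Python sketch evaluates equation (3) for given variables, assuming the sample-wise Cauchy loss form reconstructed above; it is an illustration, not the authors' implementation:

```python
import numpy as np

def fcsc_objective(X, C, P, tau, kappa, sigma):
    """Evaluate the FCSC objective of equation (3).

    A minimal sketch: Z = P^T X is the latent representation, the
    Cauchy loss is applied column-wise to the self-representation
    residual, and the nuclear norm of Z encourages low rank."""
    Z = P.T @ X                                    # latent representation, h x q
    R = Z - Z @ C                                  # self-representation residual
    cauchy = np.sum(np.log1p(np.sum(R**2, axis=0) / tau**2))
    frob = kappa * np.sum(C**2)                    # grouping-effect regularizer
    nuclear = sigma * np.linalg.norm(Z, ord="nuc") # low-rank regularizer
    return cauchy + frob + nuclear
```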

Optimization Algorithm
Note that equation (3) is a non-convex optimization problem containing three variables to be optimized. We adopt the alternating direction method of multipliers (ADMM) [3], an effective method for solving multi-variable problems. We first introduce an auxiliary variable $J$ to replace $Z$ in the nuclear-norm term, and obtain the augmented Lagrangian

$$\mathcal{L}(C, Z, P, J) = \sum_{i=1}^{q} \log\!\left(1 + \frac{\|z_i - Zc_i\|_2^2}{\tau^2}\right) + \kappa\,\|C\|_F^2 + \sigma\,\|J\|_* + \langle \Lambda_1,\, Z - P^{\top}X \rangle + \langle \Lambda_2,\, Z - J \rangle + \frac{\gamma}{2}\left(\|Z - P^{\top}X\|_F^2 + \|Z - J\|_F^2\right), \tag{4}$$

where $\Lambda_1$ and $\Lambda_2$ are the Lagrangian multipliers and $\gamma$ is a penalty parameter. Accordingly, equation (4) can be minimized by alternately optimizing the sub-problems for $C$, $Z$, $P$, and $J$ in equations (5)-(8).

Updating $C$ via equation (5): We take the derivative with respect to $C$ and set it to zero; the resulting closed-form solution is given in equation (9).

Updating $Z$ via equation (6): Similar to equation (5), we obtain equation (10). Equation (10) is a Lyapunov (Sylvester) matrix equation, and adopting the Bartels-Stewart algorithm ensures that it has a closed-form solution [7].
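For reference, SciPy exposes the Bartels-Stewart algorithm through scipy.linalg.solve_sylvester, which solves equations of the form $AZ + ZB = Q$; the matrices below are random placeholders rather than the actual coefficients of equation (10):

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Illustrative Sylvester system A Z + Z B = Q, standing in for the
# Z-subproblem of equation (6); A, B, Q here are random placeholders.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((8, 8))
Q = rng.standard_normal((5, 8))

Z = solve_sylvester(A, B, Q)          # Bartels-Stewart under the hood
print(np.allclose(A @ Z + Z @ B, Q))  # True: closed-form solution
```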
Updating $P$ via equation (7): With the other variables fixed, the $P$ sub-problem is a convex optimization problem, and we have a closed-form solution.

Updating $J$ via equation (8): By applying the Singular Value Thresholding operator, the closed-form solution for $J$ is easily obtained [3].
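The Singular Value Thresholding operator itself is standard; a minimal sketch follows (the exact matrix it is applied to depends on the ADMM splitting above):

```python
import numpy as np

def svt(M, threshold):
    """Singular Value Thresholding operator D_threshold(M):
    shrink the singular values of M by `threshold` and drop those
    that fall below it. Solves min_J threshold*||J||_* + 0.5*||J - M||_F^2,
    the form of the J-subproblem in equation (8)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - threshold, 0.0)  # soft-threshold the spectrum
    return (U * s_shrunk) @ Vt
```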
Updating $\gamma$, $\Lambda_1$, and $\Lambda_2$: They are updated by

$$\Lambda_1 \leftarrow \Lambda_1 + \gamma\,(Z - P^{\top}X), \qquad \Lambda_2 \leftarrow \Lambda_2 + \gamma\,(Z - J), \qquad \gamma \leftarrow \min(\theta\gamma,\ \gamma_{\max}),$$

where we set $\theta = 1.2$, initialize $\gamma = 10^{-6}$, and cap it at $\gamma_{\max} = 10^{6}$. Based on this update criterion, we iterate until convergence and obtain the optimal representation coefficient matrix $C$. Finally, we acquire the clustering results by applying a spectral clustering strategy [9] to segment the affinity matrix $(|C| + |C^{\top}|)/2$ for subspace clustering [2].
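A sketch of the multiplier and penalty updates, written against the constraints $Z = P^{\top}X$ and $Z = J$ from our reconstruction of equation (4); variable names are illustrative:

```python
def update_multipliers(Lam1, Lam2, gamma, Z, P, X, J,
                       theta=1.2, gamma_max=1e6):
    """One round of the end-of-iteration ADMM updates (a sketch;
    Z, P, J come from the sub-problem solvers above)."""
    Lam1 = Lam1 + gamma * (Z - P.T @ X)    # constraint Z = P^T X
    Lam2 = Lam2 + gamma * (Z - J)          # constraint Z = J
    gamma = min(theta * gamma, gamma_max)  # geometrically grow the penalty
    return Lam1, Lam2, gamma
```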

Comparison of Clustering
In this part, we compare the clustering performance of our FCSC with competing methods on the COIL-20, Extended YaleB, USPS, and Robust-NUST datasets. The results are displayed in Table 1.
From the results, we find that our proposed approach achieves the highest ACC and NMI on the COIL-20, Extended YaleB, USPS, and Robust-NUST datasets. On the Robust-NUST dataset in particular, FCSC shows significant advantages over the competing methods, demonstrating the feasibility of robust subspace clustering.

Sensitivity Analysis
Our proposed FCSC method has two parameters, κ and σ, that need to be tuned. We therefore investigate the impact of different parameter settings on clustering performance across all benchmark datasets; the detailed results are shown in Figure 1. Using the NMI metric for evaluation, we observe that the clustering performance is somewhat sensitive to the parameter settings. Choosing appropriate parameters thus remains important, and is still an open problem in machine learning applications.
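A typical way to produce such a sensitivity study is a logarithmic grid search; the sketch below assumes a hypothetical driver run_fcsc that returns predicted cluster labels (the grid ranges are illustrative, not the paper's):

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

kappas = np.logspace(-3, 3, 7)  # assumed grid for kappa
sigmas = np.logspace(-3, 3, 7)  # assumed grid for sigma

def sensitivity_grid(X, y_true, n_clusters, run_fcsc):
    """Sweep (kappa, sigma) and record NMI against ground truth.
    `run_fcsc` is a hypothetical driver for the FCSC solver."""
    scores = np.zeros((len(kappas), len(sigmas)))
    for i, kappa in enumerate(kappas):
        for j, sigma in enumerate(sigmas):
            y_pred = run_fcsc(X, n_clusters, kappa=kappa, sigma=sigma)
            scores[i, j] = normalized_mutual_info_score(y_true, y_pred)
    return scores
```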

Figure 1.
Parameter sensitivity analysis of FCSC on benchmark datasets.

Evaluation on Noise Dataset
To evaluate the performance of our FCSC approach against noise, we randomly select 50 classes from Robust-NUST and vary the percentage of replaced pixels from 10% to 80%, representing different degrees of damage. The experimental results are presented in Figure 2. The clustering results obtained by FCSC decrease slowly as the degree of noise damage increases, whereas the clustering performance of existing methods degrades rapidly. This supports our motivation: the proposed FCSC model can effectively cope with noise in the data.
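For reproducibility, the following sketch shows one plausible pixel-corruption protocol; the text does not specify the replacement values, so the uniform-noise choice here is an assumption:

```python
import numpy as np

def corrupt_pixels(images, ratio, rng=None):
    """Randomly replace a `ratio` fraction of pixels with uniform
    noise, mimicking the 10%-80% corruption protocol (a sketch of
    one plausible protocol, not the paper's exact procedure)."""
    rng = np.random.default_rng(rng)
    corrupted = images.copy().astype(float)
    mask = rng.random(images.shape) < ratio  # pixels to damage
    corrupted[mask] = rng.random(np.count_nonzero(mask)) * images.max()
    return corrupted
```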

Conclusion
In this article, we proposed Feature extraction and Cauchy loss function-based Subspace Clustering (FCSC). FCSC adopts a feature extraction strategy to learn a low-rank and low-dimensional latent representation, recovering a "clean" representation from the original data. Furthermore, we introduced the Cauchy loss function, which compresses large noise in the data. Finally, extensive experiments on several challenging datasets confirmed the robustness and superiority of the proposed FCSC approach.

Figure 2.
The clustering results of the tested methods on the noise dataset.

Table 1.
Performance comparison of all compared methods on the four benchmark datasets.