LDA Extension via Oblique Projection

Classical linear discriminant analysis (LDA) was previously modified by orthogonal projection into null space LDA (N_LDA) and direct LDA (D_LDA) to solve the small sample size (SSS) problem. In this paper, the author proposes an extension of LDA by oblique projection, in which N_LDA and D_LDA are included as special cases, to reduce the discriminative information loss resulting from using N_LDA or D_LDA alone. The effectiveness of the proposed algorithm is tested on image forensics and face recognition.


Introduction
Linear discriminant analysis (LDA) is one of the most popular tools in pattern recognition [1]. The objective of LDA is to find a projection W that maximizes the ratio of the between-class scatter matrix, S_b, against the within-class scatter matrix, S_w, via the following Fisher criterion:

W* = arg max_W |W^T S_b W| / |W^T S_w W|. (1)

One of the major drawbacks of LDA is the so-called small sample size (SSS) problem encountered in many practical applications involving high-dimensional data [1]: when the number of training samples is smaller than the data dimension, S_w becomes singular and criterion (1) cannot be applied directly. Many techniques have been proposed to solve this problem; Ref. [2] may be consulted for details. Among them, the most notable approaches are null space LDA (N_LDA) and direct LDA (D_LDA) [2,3], which are based on the following modified Fisher criterion [4]:

W^T S_w W = 0 and W^T S_b W ≠ 0. (2)

Since N_LDA and D_LDA are based on orthogonal projection, and the two conditions in equation (2) generally cannot be satisfied simultaneously, the process is usually carried out in two steps: N_LDA first finds the null space of S_w and then discards that of S_b, while D_LDA takes the reverse order. In this manner, much discriminative information may be lost.
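As a concrete illustration of the SSS problem, consider the following minimal numpy sketch on synthetic data (not part of the paper's experiments): with fewer samples than dimensions, S_w is necessarily rank-deficient, so the classical Fisher criterion, which requires inverting S_w, breaks down.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy SSS setting: dimension d = 50, but only n = 12 samples in c = 3 classes.
# rank(S_w) is at most n - c = 9 < d, so S_w is singular.
d, n_per_class, c = 50, 4, 3
X = [rng.normal(loc=k, size=(n_per_class, d)) for k in range(c)]

mean_all = np.vstack(X).mean(axis=0)
S_w = np.zeros((d, d))
S_b = np.zeros((d, d))
for Xk in X:
    mu_k = Xk.mean(axis=0)
    S_w += (Xk - mu_k).T @ (Xk - mu_k)                           # within-class scatter
    S_b += len(Xk) * np.outer(mu_k - mean_all, mu_k - mean_all)  # between-class scatter

print(np.linalg.matrix_rank(S_w))   # 9: S_w cannot be inverted
print(np.linalg.matrix_rank(S_b))   # 2: at most c - 1
```

The rank deficiency of S_w is exactly what forces the null-space and direct variants discussed next.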
In this paper, an extension of LDA by oblique projection, called OP_LDA, is proposed to reduce the possible loss of discriminative information in N_LDA or D_LDA. Experimental results on image forensics and face recognition show the satisfactory performance of the proposed method.
The rest of this paper is organized as follows. N_LDA and D_LDA are briefly reviewed in Section 2. Section 3 describes OP_LDA and discusses the relationship among N_LDA, D_LDA, and OP_LDA. In Section 4, we evaluate the performance of OP_LDA and compare it with LDA, N_LDA, D_LDA, and two-dimensional LDA (2D_LDA) [5]. Finally, conclusions are given in Section 5.

N_LDA
In N_LDA, the discriminative information is found in the null space of S_w in two steps [2]:
Step 1: Find the null space of S_w, Null(S_w).
Step 2: Let S_b′ = P_Null(Sw)(S_b), and then

W = P_Col(Sb′), (3)

where P_Null(Sw)(S_b) is the matrix obtained by orthogonally projecting the column vectors of S_b onto Null(S_w), Col(S_b′) is the column space of S_b′, and P_Col(Sb′) is the orthogonal projection operator onto Col(S_b′).
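The two steps above can be sketched in numpy (a hypothetical illustration on random low-rank scatter matrices, not the paper's code; `null_space` finds an orthonormal basis of Null(S_w) via SVD):

```python
import numpy as np

def null_space(M, tol=1e-10):
    # Orthonormal basis of Null(M) for a symmetric matrix M, via SVD.
    U, s, _ = np.linalg.svd(M)
    return U[:, s <= tol * s.max()]

rng = np.random.default_rng(1)
d = 20
A = rng.normal(size=(d, 5))
B = rng.normal(size=(d, 2))
S_w = A @ A.T          # hypothetical within-class scatter, rank 5
S_b = B @ B.T          # hypothetical between-class scatter, rank 2

# Step 1: null space of S_w.
N = null_space(S_w)                       # d x (d - rank(S_w))
# Step 2: project the columns of S_b onto Null(S_w) ...
S_b_prime = N @ N.T @ S_b
# ... and keep an orthonormal basis of Col(S_b') as the discriminant directions.
U, s, _ = np.linalg.svd(S_b_prime)
W = U[:, s > 1e-10 * s.max()]

# W satisfies the modified criterion: W^T S_w W = 0 and W^T S_b W != 0.
print(np.allclose(W.T @ S_w @ W, 0), np.allclose(W.T @ S_b @ W, 0))   # True False
```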

D_LDA
In D_LDA, the discriminative information is found in the column space of S_b in two steps [3]:
Step 1: Find the column space of S_b, Col(S_b).
Step 2: Let S_w′ = P_Col(Sb)(S_w), and then

W = P_Null|Col(Sb)(S_w′), (4)

where P_Col(Sb)(S_w) is the matrix obtained by orthogonally projecting the column vectors of S_w onto Col(S_b), and Null|Col(Sb)(S_w′) denotes the null space of S_w′ within Col(S_b). Since rank(S_b), the rank of the matrix S_b, is usually smaller than rank(S_w), in which case Col(S_w′) = Col(S_b) and Null|Col(Sb)(S_w′) = {0}, the solution of D_LDA is given by

W = P_Col(Sb). (5)
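A corresponding numpy sketch of the two D_LDA steps (again on hypothetical random scatter matrices; in the usual case rank(S_b) < rank(S_w), the solution reduces to the orthogonal projector onto Col(S_b)):

```python
import numpy as np

def col_space(M, tol=1e-10):
    # Orthonormal basis of Col(M) for a symmetric matrix M, via SVD.
    U, s, _ = np.linalg.svd(M)
    return U[:, s > tol * s.max()]

rng = np.random.default_rng(2)
d = 20
A = rng.normal(size=(d, 8))
B = rng.normal(size=(d, 2))
S_w = A @ A.T          # rank 8
S_b = B @ B.T          # rank 2 < rank(S_w)

# Step 1: column space of S_b.
C = col_space(S_b)                    # d x rank(S_b)
# Step 2: project S_w onto Col(S_b).
S_w_prime = C @ C.T @ S_w
# Here Col(S_w') = Col(S_b), so the solution is the projector onto Col(S_b),
# represented by the orthonormal basis C.
W = C

print(np.linalg.matrix_rank(S_w_prime) == np.linalg.matrix_rank(S_b))   # True
```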

OP_LDA
We present a new LDA algorithm based on oblique projection, named OP_LDA. The objective of OP_LDA is to find a projection operator W that satisfies

W S_w = 0 and W S_b = S_b, (6)

which is also a special case of equation (2).
The key idea of our algorithm is that W is an oblique projection operator, whereas the traditional methods take it to be an orthogonal projection. A particular solution of equation (6) is given by

W = P_Col(Sb)|Col(Sw), (7)

where P_Col(Sb)|Col(Sw) is the oblique projection operator onto Col(S_b) along Col(S_w); it can be computed from Col(S_w) and Col(S_b), and Ref. [6] may be consulted for details.
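One standard way to build such an oblique projector is sketched below, under the assumption that the two column spaces are disjoint and together span the whole space (the paper's own construction follows Ref. [6] and may differ): stack bases of the two subspaces, invert, and keep only the Col(S_b) coefficients.

```python
import numpy as np

def oblique_projector(B, A):
    """Projector onto Col(B) along Col(A), assuming Col(B) and Col(A) are
    disjoint and together span the whole space (the direct-sum condition)."""
    M = np.hstack([B, A])
    # Any x decomposes as x = B c1 + A c2 with [c1; c2] = M^{-1} x;
    # the projector keeps only the B-part.
    M_inv = np.linalg.inv(M)
    return B @ M_inv[:B.shape[1], :]

rng = np.random.default_rng(3)
B = rng.normal(size=(5, 2))    # basis of Col(S_b), say
A = rng.normal(size=(5, 3))    # basis of Col(S_w)
P = oblique_projector(B, A)

# P fixes Col(B), annihilates Col(A), and is idempotent but not symmetric.
print(np.allclose(P @ B, B), np.allclose(P @ A, 0), np.allclose(P @ P, P))
# True True True
```

The lack of symmetry is exactly what distinguishes an oblique projector from the orthogonal projectors used in N_LDA and D_LDA.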

Discussion
As shown in Figure 1, Figure 2 and Figure 3, there are three ways to obtain v′: (a) discarding the component in Col(S_w), (b) keeping the component in Col(S_b), and (c) discarding the component in Col(S_w) and keeping that in Col(S_b) simultaneously. It is thus clear that OP_LDA includes N_LDA and D_LDA as special cases when Col(S_w) and Col(S_b) are orthogonal. The traditional methods emphasize S_w or S_b separately, while our method emphasizes S_w and S_b simultaneously. To discard Col(S_w) in N_LDA, S_b is converted into S_b′ by projection onto Null(S_w). Since S_b and S_b′ differ, discriminative information may be lost by adopting S_b′. A similar drawback may appear in D_LDA. Therefore, the loss of discriminative information in N_LDA or D_LDA may be reduced by adopting S_w and S_b directly, as in OP_LDA.

Analysis
The following condition must be satisfied when using P_Col(Sb)|Col(Sw): the direct sum of Col(S_b) and Col(S_w) must equal the input space [6]. Since the direct sum is the sum of two disjoint subspaces, this entails two requirements: the sum of Col(S_b) and Col(S_w) must equal the input space, and Col(S_b) and Col(S_w) must be disjoint, i.e., Col(S_w)∩Col(S_b)={0}.
Since the sum of Col(S_b) and Col(S_w) is in general only a subspace of the input space, P_Col(Sb)|Col(Sw) is undefined for vectors outside the sum space. In this case we therefore project vectors onto Col(S_t) first, where S_t denotes the total scatter matrix. Since Null(S_w)∩Null(S_b)=Null(S_t) has been proven in Ref. [7], and S_w, S_b and S_t are symmetric matrices, it follows that Col(S_t) is the sum of Col(S_b) and Col(S_w).
If Col(S_w)∩Col(S_b)≠{0}, P_Col(Sb)|Col(Sw) does not exist. This is a more complicated problem. In this paper we consider a simple scheme as follows:

W = P_Col(Sb)|P_Col(Sw)Null(Sb) P_Col(St), (8)

where P_Col(St) is the orthogonal projection operator onto Col(S_t), P_Col(Sw)Null(S_b) is the space obtained by orthogonally projecting the vectors of Null(S_b) onto Col(S_w), and P_Col(Sb)|P_Col(Sw)Null(Sb) is the oblique projection operator onto Col(S_b) along P_Col(Sw)Null(S_b). Figure 4 depicts a simple example of our method, where v is an arbitrary vector in the input space and v′ is the new vector obtained from v by equation (8). The intent is that v′ lies in Col(S_b) while the components in Col(S_w) are, so far as possible, discarded.

Experimental setup
We illustrate the efficacy of the proposed method on image forensics and face recognition. As this is application-oriented development, some parameters and schemes have been determined empirically in our implementation.
To find the null space and the column space of an arbitrary symmetric matrix M, we first perform the singular value decomposition M = UΣV^T, and then let Col(M) = span{u_1,…,u_r} and Null(M) = span{u_(r+1),…,u_n}, where U = [u_1,…,u_r,u_(r+1),…,u_n], u_1,…,u_r correspond to the nonzero singular values, and u_(r+1),…,u_n correspond to the zero singular values [8].
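This decomposition can be wrapped in a small helper (a numpy sketch; the tolerance for "zero" singular values is an assumption, since a floating-point SVD never returns exact zeros):

```python
import numpy as np

def spaces(M, tol=1e-10):
    """Orthonormal bases of Col(M) and Null(M) for a symmetric matrix M."""
    U, s, _ = np.linalg.svd(M)       # singular values come back sorted descending
    r = int(np.sum(s > tol * s.max()))
    return U[:, :r], U[:, r:]        # Col(M), Null(M)

rng = np.random.default_rng(5)
A = rng.normal(size=(10, 4))
M = A @ A.T                          # symmetric, rank 4

col, null = spaces(M)
print(col.shape[1], null.shape[1])   # 4 6
print(np.allclose(M @ null, 0))      # True: Null(M) is annihilated by M
```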
To confirm whether Col(S_b) and Col(S_w) are disjoint, we use a simple check based on the following dimension formula [9]: since Col(S_t) is the sum of Col(S_b) and Col(S_w), the two spaces are disjoint if and only if rank(S_b) + rank(S_w) = rank(S_t).
To evaluate the performance of the proposed method, about half of each image dataset is randomly selected to train the classifier, and the remaining images are used for testing.
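The rank check is a one-liner (numpy sketch on hypothetical scatter matrices built from independent random factors, for which the overlap is generically trivial):

```python
import numpy as np

rank = np.linalg.matrix_rank

rng = np.random.default_rng(6)
d = 20
B = rng.normal(size=(d, 2))
A = rng.normal(size=(d, 5))
S_b = B @ B.T
S_w = A @ A.T
S_t = S_b + S_w                 # Col(S_t) = Col(S_b) + Col(S_w)

# Dimension formula: disjoint iff the ranks add up.
disjoint = rank(S_b) + rank(S_w) == rank(S_t)
print(disjoint)                 # True for independent random factors
```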

Image forensics
OP_LDA is first tested on blurring detection in image forensics. The dataset and feature extraction are the same as in Ref. [10], where details can be found. The dataset consists of 183 authentic images and 183 forged images. Three types of feature vectors, consisting respectively of the singular values of the gray image matrix, correlation coefficients of the double blurring operation, and image quality metrics, are extracted and then fused for forgery detection.
For comparison, Table 1 lists the detection rates of four schemes: feature fusion by LDA, N_LDA, D_LDA, and OP_LDA, each followed by a Euclidean distance (ED) classifier. In this table, "--" means that LDA is ineffective due to the SSS problem. Since image forensics is in fact a binary classification problem, and in this case Null(S_w) is lost in D_LDA, D_LDA performs poorly. Col(S_b), in contrast, is kept in N_LDA, so N_LDA obtains the same result as OP_LDA in this experiment.

Face recognition
OP_LDA is further tested on face recognition using three face datasets: YALE, ORL, and PIE; details can be found in Ref. [5]. The YALE dataset contains 165 face images of 15 persons; the image size is 100×100, and we subsample the images to 25×25. The ORL dataset contains 400 face images of 40 persons; the image size is 92×112, and we subsample the images to 23×28. The PIE dataset is a subset of the CMU_PIE face image dataset; it contains 6615 face images of 63 persons, the image size is 486×640, and we subsample the images to 98×128. Table 2 lists the recognition rates of five schemes: feature extraction by LDA, N_LDA, D_LDA, OP_LDA, and 2D_LDA, each followed by an ED classifier. The results demonstrate that the proposed method achieves performance comparable to 2D_LDA [5]. Compared to the prior art in [2,3], OP_LDA outperforms N_LDA and D_LDA on the YALE and ORL datasets. Due to the SSS problem, LDA is ineffective in these experiments.

Conclusions
An extension of LDA by oblique projection for solving the SSS problem has been proposed in this paper. Instead of orthogonal projection, oblique projection is adopted to reduce the possible loss of discriminative information in N_LDA or D_LDA, two popular modified versions of traditional LDA. Experimental results on image forensics and face recognition show the satisfactory performance of the proposed method. Converting OP_LDA into a nonlinear version via the kernel method is one of our future tasks.