Personal credit assessment based on improved SVDD algorithm

To address the credit evaluation problem faced by commercial banks, this paper proposes a new personal credit evaluation method built on the existing SVDD algorithm, namely an improved SVDD algorithm. The theory and derivation of the algorithm are presented in detail, and on this basis the method is tested on an example; the results show that the improved SVDD does indeed outperform standard SVDD.


Support vector data description
Support Vector Data Description (SVDD) was proposed by Tax et al. [4][5] in 1999 on the basis of minimum enclosing ball (MEB) theory and the support vector machine (SVM). The basic idea is to find a hypersphere in the feature space that encloses all or most of the target samples while minimizing the volume it encloses, so that target samples fall inside the hypersphere as far as possible and non-target samples are excluded from it as far as possible, thereby achieving the division into categories.

Improved support vector data description derivation
Support vector data description is a special kind of SVM, but unlike the standard SVM, which is a supervised learning method, it can be applied when labels are unavailable. When building a credit evaluation model, it is sometimes not known whether a customer is a "good credit" or a "bad credit" customer; in this situation supervised methods cannot be applied, and unsupervised learning methods become the effective means. Under normal circumstances, however, the accuracy of unsupervised learning is lower.
Therefore, this paper presents an improved SVDD method based on standard SVDD to improve its learning accuracy. The objective function in the primal problem of standard SVDD is

$$\min_{R,\,a,\,\xi}\; R^2 + C\sum_{i=1}^{n}\xi_i \quad \text{s.t.} \quad \|\phi(x_i)-a\|^2 \le R^2+\xi_i,\;\; \xi_i \ge 0, \qquad (1)$$

where $\xi_i$ is a slack variable, $a$ is a column vector indicating the centre of the sphere being sought, and $\phi(\cdot)$ is a feature map that takes a sample from the input space to the feature space. $C>0$ is a penalty parameter: the bigger $C$ is, the greater the penalty for misclassification. By setting the parameter $C$, a compromise is made between the radius $R$ of the hypersphere and the number of training samples it can contain: when $C$ is large, the model tries to put every sample into the target class; when $C$ is small, it tries to compress the size of the ball. The improved SVDD replaces the linear loss by a quadratic one:

$$\min_{R,\,a,\,\xi}\; R^2 + C\sum_{i=1}^{n}\xi_i^2 \qquad (2)$$
$$\text{s.t.}\quad \|\phi(x_i)-a\|^2 \le R^2+\xi_i. \qquad (3)$$

It is worth noting that the constraint $\xi_i \ge 0$ is not needed in the improved SVDD: if $\xi_i < 0$ held for some $i$ at an optimal solution, replacing it by $\xi_i = 0$ would still satisfy constraint (3) while making the objective value (2) smaller, which contradicts $\xi_i$ being part of the optimal solution.
Introducing the Lagrange multipliers $\alpha_i \ge 0$, the Lagrange function corresponding to problem (2)(3) is

$$L(R,a,\xi,\alpha) = R^2 + C\sum_i \xi_i^2 - \sum_i \alpha_i\left(R^2+\xi_i - \|\phi(x_i)-a\|^2\right). \qquad (5)$$

Setting the partial derivatives of $L$ in equation (5) with respect to $R$, $a$ and $\xi_i$ equal to zero gives

$$\sum_i \alpha_i = 1, \qquad (6)$$
$$a = \sum_i \alpha_i \phi(x_i), \qquad (7)$$
$$\xi_i = \frac{\alpha_i}{2C}. \qquad (8)$$
Bringing equations (6)-(8) into the Lagrange function (5), the dual of the optimization problem is obtained:

$$\max_{\alpha}\; \sum_i \alpha_i K(x_i,x_i) - \sum_{i,j}\alpha_i\alpha_j K(x_i,x_j) - \frac{1}{4C}\sum_i \alpha_i^2 \qquad (9)$$
$$\text{s.t.}\quad \sum_i \alpha_i = 1, \quad \alpha_i \ge 0, \qquad (10)$$

where $K(x_i,x_j) = \langle\phi(x_i),\phi(x_j)\rangle$. It can be seen from the derivation and the dual problem that, unlike in standard SVDD, the value of $C$ does not appear in the constraints of the dual problem; instead it directly affects the quadratic term of its objective function. In addition, the improved SVDD is the form of standard SVDD with a quadratic loss function, constructed by raising the deviation in the risk term from first order to second order.
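As a concrete illustration, the dual problem (9)-(10) is a small quadratic program and can be solved numerically. The following Python sketch is an illustrative stand-in for the paper's MATLAB implementation; the toy data, the kernel width g, and the penalty C are assumed values, not taken from the paper.

```python
# Sketch: solving the improved-SVDD dual (9)-(10) with SciPy's SLSQP solver.
# Toy data and parameter values (g, C) are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))           # toy "good credit" samples
g = 0.5                                # Gaussian kernel width (assumed)
C = 10.0                               # penalty parameter (assumed)

# Gaussian (RBF) kernel matrix K_ij = exp(-g * ||x_i - x_j||^2)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-g * sq)
n = len(X)

def neg_dual(a):
    # Negative of the dual objective (9), since `minimize` minimizes:
    #   -sum_i a_i K_ii + sum_ij a_i a_j K_ij + (1/(4C)) sum_i a_i^2
    return -a @ np.diag(K) + a @ K @ a + a @ a / (4 * C)

cons = {"type": "eq", "fun": lambda a: a.sum() - 1.0}   # sum_i alpha_i = 1
bnds = [(0.0, None)] * n                                # alpha_i >= 0, no upper bound
res = minimize(neg_dual, np.full(n, 1.0 / n), bounds=bnds, constraints=cons)
alpha = res.x                                           # dual solution alpha*
```

Note how the quadratic-loss term shows up only as the `a @ a / (4 * C)` piece of the objective, while the constraints carry no C, exactly as observed above.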
According to optimization theory, the KKT conditions for the improved SVDD problem are

$$\alpha_i\left(R^2+\xi_i-\|\phi(x_i)-a\|^2\right)=0, \quad \alpha_i \ge 0, \quad \xi_i = \frac{\alpha_i}{2C}.$$

Depending on the value of $\alpha_i$, these conditions can be further refined into the following cases: when $\alpha_i = 0$, the sample lies inside or on the hypersphere; when $\alpha_i > 0$, the constraint is active, $\|\phi(x_i)-a\|^2 = R^2 + \xi_i$ with $\xi_i = \alpha_i/(2C) > 0$, and the sample is a support vector. The decision function is as follows:

$$f(x) = R^2 - \|\phi(x)-a\|^2 = R^2 - K(x,x) + 2\sum_i\alpha_i K(x_i,x) - \sum_{i,j}\alpha_i\alpha_j K(x_i,x_j),$$

where $R$ is the distance from any support vector on the hypersphere to the centre of the sphere.

When $f(x) \ge 0$, the corresponding point is judged to be a normal point; otherwise it is judged to be an abnormal point.
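The decision rule can be sketched as follows. The data, the kernel width g, the uniform placeholder for the dual solution alpha, and the choice of R² as the largest training distance are all illustrative assumptions (in practice alpha and R would come from solving the dual), not values from the paper.

```python
# Sketch of the decision function f(x) = R^2 - ||phi(x) - a||^2 via the kernel
# expansion. alpha is a uniform placeholder for the dual solution (assumption).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))                  # training ("good credit") samples
g = 0.5                                       # Gaussian kernel width (assumed)
alpha = np.full(len(X), 1.0 / len(X))         # placeholder dual solution

def k(A, B):
    """Gaussian kernel matrix between row-sample arrays A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-g * sq)

K = k(X, X)
const = alpha @ K @ alpha                     # sum_ij alpha_i alpha_j K_ij

def dist2(x):
    """Squared feature-space distance from phi(x) to the centre a."""
    x = np.atleast_2d(x)
    return 1.0 - 2.0 * (k(x, X) @ alpha) + const   # K(x,x) = 1 for the RBF kernel

# Radius: distance from a boundary support vector to the centre; as a stand-in
# we take the largest training distance, so every training point gets f >= 0.
R2 = dist2(X).max()

def f(x):
    return R2 - dist2(x)                      # >= 0: normal point; < 0: abnormal
```

A point far from the training cloud, e.g. `f(np.array([8.0, 8.0]))`, yields a negative score and is flagged abnormal.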

Algorithm
From the above derivation, the improved SVDD algorithm can be stated as follows:
Step 1: Input the data set and standardize the data using the Z-score method;
Step 2: Select an appropriate kernel function $K(x_i,x_j)$ (e.g., the Gaussian kernel), and use cross-validation to select the parameters $C$ and $g$;
Step 3: Construct and solve the optimization problem (9), and find the optimal solution $\alpha^*$;
Step 4: Compute the centre $a$ and the radius $R$ of the hypersphere from $\alpha^*$;
Step 5: Construct the decision function $f(x)$;
Step 6: Classify the samples and output the sample categories.
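Step 1 (Z-score standardization) can be sketched as below; the small feature matrix is a made-up example with one row per customer and one column per attribute.

```python
# Sketch of Step 1: Z-score standardization of a feature matrix.
# The sample values (age, credit amount) are made up for illustration.
import numpy as np

X = np.array([[25.0, 1200.0],
              [40.0, 3000.0],
              [33.0,  800.0]])

mu = X.mean(axis=0)       # per-attribute mean
sigma = X.std(axis=0)     # per-attribute standard deviation
Xz = (X - mu) / sigma     # each column now has mean 0 and std 1
```

Standardizing first matters for the Gaussian kernel, since otherwise attributes on large scales (e.g. credit amount) would dominate the distances.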

Data simulation experiment
The improved SVDD algorithm is applied to the corresponding problems in personal credit assessment. We treat the 'good credit' customer sample points as normal points and the 'bad credit' customer sample points as abnormal points, establishing an unsupervised-learning SVDD credit evaluation model. To compare performance, credit evaluation models based on SVDD and on the improved SVDD were built using the comparatively authoritative German credit data, to verify the effectiveness of the improved SVDD. The experiments were carried out in MATLAB R2012a. The kernel function $K(x_i,x_j)$ of all models is the Gaussian radial basis kernel.

Determination of parameters C and g
In the test, all parameters were selected by a 5-fold cross-validation [6] grid optimization method. The penalty parameter C and the kernel parameter g were each searched over a grid, with initial values C = 1 and g = 0.1; the 5-fold cross-validation grid optimization selects the optimal (C, g) pair. The optimal parameters for the German credit data [7] evaluation model are C = 10.5561 and g = 6.6029 (as shown in Figure 1). To better reflect the relationship between (C, g) pairs and accuracy, Figure 2 gives the corresponding 3D view. Once the parameters are determined, the data can be substituted into the model for credit evaluation analysis.
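The 5-fold cross-validated grid search can be sketched as follows. The grids, the toy data, and the `score_fn` placeholder are all assumptions for illustration; in the paper's experiment `score_fn` would fit the improved SVDD on the training folds and return its accuracy on the held-out fold.

```python
# Sketch of 5-fold cross-validated grid search over (C, g).
# score_fn is a hypothetical placeholder for "fit improved SVDD, score fold".
import itertools
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))                  # toy standardized feature matrix

def score_fn(C, g, X_train, X_val):
    # Placeholder score in [0, 1]: fraction of validation points whose mean
    # Gaussian-kernel similarity to the training set clears a C-scaled cutoff.
    sq = ((X_val[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    sim = np.exp(-g * sq).mean(axis=1)
    return (sim > sim.mean() / (1 + 1 / C)).mean()

C_grid = [2.0 ** p for p in range(-2, 6)]      # assumed search grid for C
g_grid = [2.0 ** p for p in range(-4, 3)]      # assumed search grid for g

idx = rng.permutation(len(X))
folds = np.array_split(idx, 5)                 # the 5 cross-validation folds

best = (None, None, -1.0)
for C, g in itertools.product(C_grid, g_grid):
    scores = []
    for i in range(5):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(5) if j != i])
        scores.append(score_fn(C, g, X[train], X[val]))
    mean_score = float(np.mean(scores))
    if mean_score > best[2]:
        best = (C, g, mean_score)

best_C, best_g, best_score = best              # chosen (C, g) pair
```

Each (C, g) pair is scored by its average over the five held-out folds, and the pair with the highest mean score is kept.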

Test results and analysis
After the parameters are determined, the test is conducted; the results are shown in Table 1. Table 1 shows the test results on the German credit data. For the improved SVDD model, the accuracy on the training set is 66.72% and on the test set 68.77%, which is clearly better than the SVDD result but lower than the SVM result.
The experimental results show that the improved SVDD is indeed more effective than SVDD; however, the improved SVDD, like SVDD, is an unsupervised learning method, and its performance remains below that of the standard, supervised SVM.

Conclusion
Based on standard SVDD, this paper proposes a new algorithm, an improved SVDD algorithm. Data experiments on credit evaluation data, comparing its performance with that of SVDD and of the standard SVM, show that the improved SVDD is indeed more effective than SVDD.