Differential search algorithm for biobjective feature selection

Feature selection (FS) is a complex optimization problem with important real-world applications. Generally, the main target of FS is to reduce the dimensionality of the feature space and enhance the effectiveness of problem solving. Owing to their population-based characteristics, various evolutionary algorithms (EAs) have been proposed to solve feature selection problems over the past decades. However, the majority of them only consider single-objective optimization, while many real-world problems have multiple objectives; it is therefore necessary to design more suitable and effective EAs for multi-objective FS. In this paper, a biobjective FS algorithm based on the differential search algorithm (DSA) is designed to solve FS problems, which minimizes the number of selected features and maximizes the classification accuracy. The results of simulation experiments and statistical analysis on 15 classification datasets, compared with four other state-of-the-art EAs, show that the proposed DSA can not only obtain better optimization performance but also achieve competitive convergence accuracy.


Introduction
In recent years, with the development of information technologies such as machine learning and artificial intelligence, huge amounts of data have been produced and collected [1]. However, high-dimensional data may contain redundant, irrelevant and noisy features. Such features harm the learning process and classification effectiveness by overextending the search space [2]. As the number of features increases, the classification accuracy decreases sharply and the training time grows quickly [3]. Eliminating redundant or irrelevant features can effectively mitigate the curse of dimensionality. A feature selection (FS) algorithm can effectively reduce the dimensionality of data, shorten the training time and enhance the classification performance [4][5]. A multi-objective FS algorithm often contains two main objectives: 1) enhancing the classification precision and 2) reducing the number of features, which potentially conflict with each other [6]. Over the past decades, many meta-heuristics, including evolutionary algorithms (EAs), have been proposed to solve FS problems. Compared with traditional methods, EAs require no domain knowledge or assumptions about the search space. Moreover, their population-based search strategy can generate a set of nondominated solutions that trade off the conflicting objectives in a single run, which makes them particularly suitable for multi-objective optimization problems.
In this paper, a biobjective feature selection algorithm based on the differential search algorithm (DSA) is designed to solve the feature selection problem, minimizing the number of selected features while maximizing the classification accuracy.

The proposed DSA for FS
In this paper, a binary differential search algorithm is proposed for the biobjective feature selection task.

Differential search algorithm
Differential search algorithm (DSA) is a recently proposed derivative-free global heuristic algorithm for solving unconstrained optimization problems [7][8]. The DSA simulates the migration of a species of living beings.
In DSA, N represents the number of members in the superorganism (the population), D denotes the dimension of the problem to be solved, and MaxT is the maximum number of iterations. Each artificial organism X_i (i = 1, 2, ..., N) is initialized as

    X_{i,j} = lb_j + r · (ub_j − lb_j),    (1)

where r is a uniformly distributed random number in [0, 1], and lb_j and ub_j are the lower and upper bounds of the j-th dimension. After initialization, the mechanism of DSA for finding a stopover site S_i in the areas among the artificial organisms can be described by a Brownian-like random walk model. A randomly selected individual from the population is used to generate a target (donor) vector:

    S_i = X_i + Scale · (X_{RandOrder(i)} − X_i),    (2)

where the RandOrder(·) function randomly permutes the order of the artificial organism members in the current population. The magnitude of the change in the position of each individual is controlled by the Scale value, shown as Eq.(3):

    Scale = randg(2 · r1) · (r2 − r3),    (3)

where randg(·) generates a gamma-distributed random number and r1, r2, r3 are uniformly distributed random numbers in [0, 1]. A greedy selection operation is used to select the next stopover point, shown as Eq.(5):

    X_i^{t+1} = S_i^t  if f(S_i^t) ≤ f(X_i^t),  otherwise X_i^{t+1} = X_i^t,    (5)

where t is the current iteration and f(·) represents the objective function.
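The update rules above can be sketched as follows. This is a minimal continuous-space illustration of one DSA run, assuming a simple sphere objective and plain clipping for out-of-bounds values; it is not the paper's binary variant.

```python
import numpy as np

rng = np.random.default_rng(42)

def sphere(x):
    # Illustrative objective: minimize the sum of squares.
    return np.sum(x ** 2, axis=-1)

N, D, MaxT = 30, 10, 100      # population size, problem dimension, iterations
lb, ub = -5.0, 5.0            # search bounds

# Eq.(1): random initialization of the superorganism.
X = lb + rng.random((N, D)) * (ub - lb)
fX = sphere(X)
init_best = fX.min()

for t in range(MaxT):
    # RandOrder: a random permutation of the current population (donor).
    donor = X[rng.permutation(N)]
    # Eq.(3): gamma-distributed scale factor.
    scale = rng.gamma(2.0 * rng.random()) * (rng.random() - rng.random())
    # Eq.(2): Brownian-like walk toward the donor gives the stopover sites.
    S = np.clip(X + scale * (donor - X), lb, ub)
    # Eq.(5): greedy selection between stopover site and current position.
    fS = sphere(S)
    better = fS <= fX
    X[better], fX[better] = S[better], fS[better]

print(init_best, "->", fX.min())
```

Because the greedy selection in Eq.(5) never accepts a worse position, the best fitness in the population is non-increasing over iterations.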

Binarization of features
To establish a binary DSA algorithm for solving feature selection problems, the individuals (solutions) are restricted to the binary values {0, 1}, where 1 indicates that a feature is selected and 0 that it is not. Therefore, the lower and upper boundaries of each feature are set to 0 and 1, respectively, i.e., lb = 0 and ub = 1. During population initialization, each feature is initialized by Eq.(1) and then restricted to 0 or 1 as in Eq.(6).
At the same time, the position of each stopover site S_i is updated by Eq.(7). Features that fall out of bounds after updating are restricted to 0 or 1 by the reinitialization mechanism.
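A sketch of this binarization step follows. Since Eqs.(6)–(7) are not reproduced in this extract, the rounding rule and the random reinitialization of out-of-bounds entries below are assumptions, chosen as the simplest forms consistent with the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize(pos):
    """Round a continuous position in [0, 1] to a binary feature mask
    (1 = feature selected, 0 = not selected). Assumed form of Eq.(6)."""
    return (pos >= 0.5).astype(int)

def repair(pos, rng):
    """Reinitialization mechanism (assumed): out-of-bounds entries are
    reset to a random 0/1 value."""
    out = (pos < 0.0) | (pos > 1.0)
    pos = pos.copy()
    pos[out] = rng.integers(0, 2, size=out.sum())
    return pos

D = 8
raw = rng.random(D)                           # Eq.(1)-style init in [0, 1]
mask = binarize(raw)                          # binary restriction
stopover = raw + 1.5 * (rng.random(D) - raw)  # a continuous update step
mask2 = binarize(repair(stopover, rng))
print(mask, mask2)
```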

Methodology and fitness function
Suppose that DataSet is a dataset containing K instances with D features. A feature selection problem can then be described as follows: select L_f features (L_f < D) from all D features so that some objectives, such as classification accuracy or classification error, are optimized. Because the number of selected features determines the computational cost of a classification algorithm, it is also a key objective [9]. Therefore, this paper considers the following two objectives: maximizing the classification accuracy (maxACC) and minimizing the number of selected features (L_f). In order to unify the two objectives into maximization and create a single-objective fitness function that combines them, the fitness function is modelled by Eq.(8).
In this paper, the whole dataset is randomly divided into a training set and a test set by the 5-fold cross-validation method. The k-nearest neighbor (k-NN) classifier [10] is adopted, with k = 3. In Eq.(8), ACC is the classification accuracy, calculated by dividing the number of correctly classified instances by the total number of instances, and maxACC represents the maximum ACC among the 5 cross-validation folds. w_f denotes the weight factor for the number of selected features; in this paper, w_f is set to 0.8.
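Eq.(8) itself is not reproduced in this extract. A common way to scalarize the two objectives into one maximization fitness is a weighted sum of the accuracy and the feature-reduction ratio; how Eq.(8) actually distributes the weight w_f between the two terms is an assumption in the sketch below.

```python
def fitness(acc, n_selected, n_total, w_f=0.8):
    """Illustrative scalarized fitness: weighted sum of classification
    accuracy and feature reduction. NOTE: the weight assignment is an
    assumption, not necessarily the paper's exact Eq.(8)."""
    reduction = (n_total - n_selected) / n_total   # larger = fewer features
    return w_f * acc + (1.0 - w_f) * reduction

# Fewer features at the same accuracy yields a higher fitness.
print(fitness(0.9, 3, 10), fitness(0.9, 6, 10))
```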

Comparison algorithms and related parameter settings
To evaluate the performance of the proposed DSA for solving FS problems, 5-fold cross-validation is employed. Within each split, 80% of the data is used for training, while the remaining 20% is used for fitness-function validation [11]. The classification accuracy of the k-NN classifier with k = 3 and the number of selected features are used to establish the fitness function. The performance is compared with four classical FS algorithms: particle swarm optimization (PSO) [12], cuckoo search (CS) [13], the butterfly optimization algorithm (BOA) [14] and the pathfinder algorithm (PFA) [15]. These comparison algorithms and their related parameter settings, which follow the suggestions in the corresponding references, are listed in Table 1.
For the sake of fairness, each algorithm is independently run 30 times, the population size is N = 30, and the termination condition of all algorithms is the maximum number of function evaluations (MaxFEs), which is set to 3000. The experiments employ 15 classification datasets from the UCI machine learning repository, which has been used extensively in numerous model- and algorithm-evaluation studies. The detailed information is listed in Table 2. As Table 2 shows, the number of features ranges from 5 to 60, and both binary and multi-class classification datasets are included.
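The evaluation protocol described above (3-NN with 5-fold cross-validation on a candidate feature subset) can be sketched as follows. A synthetic dataset stands in for a UCI one, and the feature mask is an arbitrary example solution, not one produced by DSA.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for one of the UCI classification datasets.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=1)

# Hypothetical candidate solution: a binary mask selecting 4 of 10 features.
mask = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0], dtype=bool)

# Evaluate the subset with 3-NN under 5-fold cross-validation.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3),
                         X[:, mask], y, cv=5)
acc_max = scores.max()        # maxACC over the 5 folds, as used in Eq.(8)
print(len(scores), acc_max)
```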

Experimental results and analysis
To evaluate the performance of all algorithms for solving FS problems, six quantitative indicators are adopted: the average (Mean), standard deviation (Std), worst value (Worst), rank of each algorithm on every dataset (Rank), total rank, and final rank of each algorithm over all datasets. The Mean records the average over 30 independent runs; the larger the Mean, the better the performance. Std measures the dispersion of the results; the smaller the Std, the more stable the performance. Rank is sorted according to the Mean value: the best performance, i.e., the maximum Mean, is recorded as 1, the second best as 2, and so on. The total rank is the cumulative sum of Rank over all datasets; the smaller the total rank, the better the overall performance. Table 3 shows the comparison results of all algorithms on the selected datasets. According to the no-free-lunch (NFL) theorem, no single algorithm can obtain the best performance on all datasets.
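The Rank, total-rank and final-rank indicators can be computed as below. The Mean values here are hypothetical numbers for illustration only, not the paper's Table 3.

```python
import numpy as np

# Hypothetical Mean fitness of 3 algorithms (rows) on 4 datasets (columns);
# these numbers are illustrative, not the paper's results.
means = np.array([
    [0.90, 0.85, 0.88, 0.91],   # "DSA"
    [0.88, 0.86, 0.84, 0.90],   # "PSO"
    [0.87, 0.83, 0.85, 0.89],   # "CS"
])

# Rank 1 = largest Mean on each dataset (column-wise, no ties assumed).
order = (-means).argsort(axis=0)
ranks = order.argsort(axis=0) + 1

total_rank = ranks.sum(axis=1)               # smaller total = better overall
final_rank = total_rank.argsort().argsort() + 1
print(ranks)
print(total_rank, final_rank)
```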
As Table 3 shows, the best Mean and Rank are marked in bold. Among the 15 datasets, the proposed DSA ranks first on 9 datasets (D1, D2, D5, D8-D12 and D15) and second on 3 datasets (D3, D4 and D7); on the remaining datasets, DSA ranks third. DSA has a total rank of 24 and the first final rank over all 15 datasets. On dataset D1, all algorithms have the same performance. As for the final rank, BOA ranks second, PSO third, PFA fourth, and CS fifth. In summary, DSA shows competitive performance in the quantitative analysis across the 15 datasets.

Convergence analysis
In order to intuitively compare the convergence performance of all algorithms, their convergence curves on the different datasets are shown in Figure 1. As Figure 1 shows, DSA obtains better performance in both convergence accuracy and convergence rate across the different types of datasets.

Statistical analysis
Statistical analysis of the experimental results of the proposed algorithm is an important and meaningful task [16]. For pairwise comparison of the problem-solving success of evolutionary algorithms, a problem-based or multi-problem-based statistical comparison method can be used [17]. In this paper, the Wilcoxon Signed-Rank Test (WSRT), a non-parametric test, is used to perform pairwise statistical tests on the experimental results in Table 3, with a significance level of 0.05. The results are shown in Table 4, where '+' denotes that DSA outperforms the compared algorithm, '=' represents no significant statistical difference between the two algorithms, and '-' indicates that DSA performs worse. The last row shows the total counts in the +/=/- format. As Table 4 shows, the performance of DSA is significantly better than that of the compared algorithms, especially CS and PFA.
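The pairwise WSRT comparison can be sketched with `scipy.stats.wilcoxon`. The per-run fitness values below are synthetic, not the paper's results; the +/=/- labeling follows the convention of Table 4.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(7)

# Synthetic per-run fitness of DSA and one competitor on a dataset
# (30 independent paired runs); illustrative data only.
dsa = 0.90 + 0.01 * rng.standard_normal(30)
comp = 0.87 + 0.01 * rng.standard_normal(30)

# Paired non-parametric test at significance level 0.05.
stat, p = wilcoxon(dsa, comp)
if p < 0.05:
    symbol = '+' if dsa.mean() > comp.mean() else '-'
else:
    symbol = '='
print(round(p, 6), symbol)
```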

Conclusion
A feature selection (FS) algorithm can effectively reduce the dimensionality of data, shorten the training time and enhance the classification performance, so research on feature selection algorithms is a hot topic in the field of machine learning. In this paper, a binary DSA is proposed to solve biobjective feature selection problems. Firstly, the initial population is randomly generated in the binary search space. Then, the binary mechanism is also used for position updating and out-of-bounds processing. Meanwhile, maximizing the classification accuracy and minimizing the number of selected features are combined into a unified fitness function. The proposed DSA is compared with PSO, CS, BOA and PFA on 15 UCI datasets. The experimental and statistical results show that the proposed DSA outperforms the other algorithms and has better convergence ability. As future work, the proposed binary mechanism and fitness function will be applied to other swarm-based evolutionary algorithms. We will also evaluate the performance of the proposed DSA on complex high-dimensional datasets and real-world optimization problems.