The Optimization of Multi-classifier Ensemble Method Based on Dynamic Weighted Voting

Generally, different deep learning classification models achieve different performance on the same data set. The existing weighted voting method can combine the results of multiple models and thereby improve classification performance. However, its accuracy is affected by the accuracy of every model in the ensemble. In this paper, we propose a dynamic weighted voting method. Our method dynamically selects models for each data set and integrates them according to their weights, thereby improving classification accuracy. We evaluated the method on three data sets, CIFAR10, CIFAR100, and Existing, where it increased accuracy by about 0.65%, 0.91%, and 0.78% respectively compared with the existing weighted voting method.


Introduction
Generally, different deep learning classification models differ in their ability to extract features from the same data set. At the same time, the prediction results of these classifiers are complementary on some samples, so integrating multiple classifiers can improve the overall classification performance. The existing weighted voting method combines the results of the models according to the weight of each classifier, based on its output information. However, classifiers with poor classification performance may reduce the overall accuracy. To solve this problem, this paper proposes a new ensemble method based on the existing weighted voting method. It improves performance in two ways: first, we expand the candidate set of models; second, we dynamically select the proper number of models based on the validation set.
The rest of this paper is organized as follows: the first section introduces the background of this paper, the second section reviews related works, the third section presents our method, the fourth section describes and analyzes the experimental results, and finally, the fifth section concludes the paper and outlines future work.

Related Works
In recent years, computer science research has flourished with the adoption of deep learning [1][2][3][4]. Image classification has also attracted a large number of researchers and is widely used in various fields, such as medical image segmentation [5], face recognition [6], and remote sensing image classification [7]. LeCun et al. proposed the LeNet model [8] and achieved good results in handwritten digit recognition on the MNIST data set. This model takes an image as input, extracts feature information through stacked convolution and pooling layers acting as a feature extractor, and finally classifies the image through a fully connected layer and an activation function. Although CNN (convolutional neural network) [9] models have continuously improved classification accuracy, the ability of a single model is still limited; its accuracy depends on many factors, such as the image size or the vanishing gradient problem [10]. Ensemble methods [11] combine multiple models with certain strategies to improve the overall classification performance.
Voting methods [12][13] combine classifiers using the idea of ensemble learning. Different from the simple voting method, the weighted voting method [14] assigns different weights to different classifiers and then combines their outputs. Weighted voting is widely used: for example, Mu et al. [15] applied weighted voting to face recognition and speech recognition, and Moustafa et al. [16] used weighted voting to predict software bugs. The weight allocation scheme has a direct impact on the overall performance, so accurately assigning large weights to the base classifiers with large contributions is a key problem. Dogan [17] proposed a new weight allocation method from this perspective. Different from the existing literature, this method provides only a reward mechanism: on the validation set, the weight of each base classifier that classifies an instance correctly is increased by the percentage of base classifiers that classify that instance incorrectly. However, the disadvantage of this method is that it combines all the base classifiers. Although it exploits the complementary information among the base classifiers to a certain extent, integrating all of them makes the result sensitive to the accuracy of each classifier: even if a poorly performing base classifier receives a small weight, it still affects the overall classification performance. Therefore, selecting the most complementary base classifiers for integration is important. Tasci et al. [18] proposed generating a series of variant models based on different image enhancement and pre-processing steps, and then selecting some of the best models for weighted voting integration. This method has certain advantages over the latest results in tuberculosis detection.
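The reward mechanism described above can be sketched as follows. This is a minimal numpy illustration of the idea, not the authors' implementation; the function name and the initial unit weights are assumptions.

```python
import numpy as np

def reward_weights(val_preds, val_labels):
    """Reward-based weight allocation (a sketch of the mechanism in [17]).

    val_preds:  (n_classifiers, n_samples) predicted labels on the validation set
    val_labels: (n_samples,) true labels

    For each instance, every classifier that predicts it correctly has its
    weight increased by the fraction of classifiers that got it wrong.
    """
    n_clf, n_samples = val_preds.shape
    weights = np.ones(n_clf)                   # initial weights (assumed uniform)
    for i in range(n_samples):
        correct = val_preds[:, i] == val_labels[i]
        wrong_frac = 1.0 - correct.mean()      # percentage of classifiers that erred
        weights[correct] += wrong_frac         # reward the correct classifiers
    return weights / weights.sum()             # normalize to sum to 1
```

Note that every classifier keeps a nonzero weight, which is exactly the drawback discussed above: a weak classifier still contributes to the final vote.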
In this paper, we try to dynamically select among the trained models. Compared with the existing method that applies voting only to the predicted labels of the models, we combine the outputs of the models, namely the probabilities of the labels, to achieve higher accuracy. Furthermore, the number of models in the combination is also dynamically selected based on the validation set, which reduces the effect of low-accuracy models.

Materials and Method
The framework of our method is shown in Figure 1. Each base classifier outputs a probability distribution over the labels for a given sample, and these distributions are combined according to the classifiers' weights. At the final step, the label with the highest confidence is taken as the final decision of this method. In addition, we also consider selecting a certain number of best classifiers for integration; the number n of optimal base classifiers is obtained on the validation set.
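The combination and selection steps above can be sketched as follows. This is a minimal numpy sketch under the assumption that each classifier's softmax outputs are stacked into a matrix; the function names and argument layout are illustrative, not taken from the paper.

```python
import numpy as np

def weighted_vote(probs, weights, order, n):
    """Combine the probability outputs of the top-n classifiers.

    probs:   (n_classifiers, n_samples, n_classes) softmax outputs
    weights: (n_classifiers,) classifier weights from the validation set
    order:   classifier indices sorted from best to worst (the ordered set C)
    n:       number of best classifiers selected on the validation set
    """
    top = order[:n]
    # Weighted sum of the selected classifiers' probability matrices.
    combined = np.tensordot(weights[top], probs[top], axes=1)
    return combined.argmax(axis=1)  # label with the highest confidence

def select_n(probs, weights, order, val_labels):
    """Choose the n that maximizes accuracy on the validation set."""
    accs = [(weighted_vote(probs, weights, order, n) == val_labels).mean()
            for n in range(1, len(order) + 1)]
    return int(np.argmax(accs)) + 1
```

In use, `select_n` is run once on the validation set, and the resulting n is then fixed when `weighted_vote` is applied to the test set.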

Data Processing
In order to demonstrate the effectiveness of the proposed method, three public data sets, CIFAR10, CIFAR100, and Existing, are selected and compared with the existing weighted voting method. The three data sets and the sizes of their training sets are described in Table 1. The CIFAR10 data set has 60000 instances in total and 10 classes, with 6000 instances per class. 5000 instances are randomly drawn from each class, so a total of 50000 instances are used as the training set to train the base classifiers; the remaining 10000 instances are used as the validation and test sets. CIFAR100 is similar to CIFAR10, except that it has 100 classes with 600 instances each; 500 instances of each class form the training set and the rest form the validation and test sets. The Existing data set has 100 classes with 600 instances each, of which 480 per class form the training set and the rest form the validation and test sets. The images are normalized to facilitate training of the network. During training, data augmentation is applied to the images: the data set is further expanded by randomly flipping the images horizontally, which prevents overfitting and enhances the robustness of the network.
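The normalization and horizontal-flip augmentation described above can be sketched in a few lines. This is a plain numpy sketch for illustration; the function name, the [0, 1] normalization range, the flip probability of 0.5, and the fixed random seed are assumptions, not details given in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed assumed, for reproducibility

def preprocess(img, train=True):
    """Normalize an HWC uint8 image and, during training, randomly flip it.

    img: (height, width, channels) array with values in 0..255
    """
    x = img.astype(np.float32) / 255.0    # normalize pixel values to [0, 1]
    if train and rng.random() < 0.5:      # flip with probability 0.5
        x = x[:, ::-1, :]                 # reverse the width axis
    return x
```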

Experimental Process
In this paper, six convolutional neural networks, VoVNet [19], VGG [20], Res2Net [21], RepVGG [22], ResNet [23], and DenseNet [24], are used as the base classifiers. First, each image is input at a size of 256×256. The models extract features from the image through a series of convolution and pooling layers and finally classify it through a fully connected layer, outputting a probability vector, that is, the probability of the image corresponding to each label. We also use models pre-trained on ImageNet. The advantage of this is that a model pre-trained on a large data set already captures some general features, and training on this basis can greatly reduce the training cost. The mini-batch gradient descent method is adopted, that is, the training samples are divided into multiple batches of the same size for training, which both updates the weights and improves the convergence speed. At the same time, in order to process the data conveniently, we handle the probability distributions in matrix form.
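Handling the probability distributions in matrix form means each classifier's output is a matrix whose rows are samples and whose columns are label probabilities. A minimal numpy sketch of converting network logits into such a matrix (the numerically stable softmax shown here is a standard technique; the example logits are made up):

```python
import numpy as np

def softmax(logits):
    """Convert a (n_samples, n_classes) logit matrix into row-wise probabilities."""
    z = logits - logits.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Example: two samples, three classes; each row of prob_matrix sums to 1.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.5]])
prob_matrix = softmax(logits)
```

Stacking one such matrix per classifier gives the three-dimensional array that the weighted combination operates on.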
The experimental process is shown in Figure 2. By running the trained base classifiers on the validation set, we obtain the weight of each base classifier, the number n of selected classifiers, and the ordered set C of classifiers. After these parameters are determined, the base classifiers are run on the test set to output the probability distribution of each classifier over the test samples. We divide the validation set and the test set in different proportions, which further shows that when the validation set is large enough, the performance of our method on the validation set and on the test set becomes closer. Table 2 shows the experimental results on the CIFAR10 data set. We use the average accuracy of the classifiers (that is, the accuracies of the base classifiers summed and then averaged) and the existing weighted classification accuracy for comparison. When the validation set and the test set are divided in the ratio of 1:9, we integrate different numbers of the best base classifiers on the validation set and determine that the optimal number n is 3. Selecting the first three base classifiers for integration on the test set, the accuracy of our method is 12.63% higher than the average accuracy Ave_Acc and 0.65% higher than the existing weighted accuracy. By dividing the validation set and the test set in different proportions, it can be observed from Table 2 that the number n of best base classifiers selected on the validation set is relatively stable and is consistent with the number n determined on the test set. Table 3 shows the experimental results on the CIFAR100 data set. The accuracy of our method is 17.2% higher than the average accuracy Ave_Acc and 0.91% higher than the existing weighted accuracy.
When the validation set and the test set are divided in the ratio of 1:9, the number n of best base classifiers is determined as 5 on the validation set but 6 on the test set. Our analysis is that this is affected by the size of the validation set: as the validation set grows, that is, as the division proportion is adjusted, the n determined on the validation set tends to become consistent with the n determined on the test set. Table 4 shows the experimental results on the Existing data set. The accuracy of our method is 22.13% higher than the average accuracy Ave_Acc and 0.78% higher than the existing weighted accuracy.
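The division of the held-out instances into validation and test sets by a given ratio can be sketched as follows; this is an illustrative numpy helper (function name, random shuffling, and seed are assumptions), not the paper's splitting code.

```python
import numpy as np

def split_holdout(indices, val_ratio, seed=0):
    """Split held-out instance indices into validation and test sets.

    val_ratio: fraction assigned to the validation set
               (e.g. a 1:9 validation-to-test split -> val_ratio = 0.1)
    """
    rng = np.random.default_rng(seed)     # seed assumed, for reproducibility
    idx = rng.permutation(np.asarray(indices))
    n_val = int(len(idx) * val_ratio)
    return idx[:n_val], idx[n_val:]       # (validation indices, test indices)
```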

Conclusion and Future Work
In this paper, we proposed a dynamic weighted voting method on the outputs of models to achieve higher performance. Compared with the existing methods, ours can dynamically select the number of models, which better fits the data set. Furthermore, our weighting scheme is based on the probabilities of the labels for each sample, which achieves higher performance than using only the predicted labels. In future work, we will try to combine our method with other information; for example, the distribution of labels in an environment [25] can be used to further improve classification accuracy.