Air traffic complexity evaluation with novel complexity features and mRMR-XGBoost

Air traffic complexity evaluation is a critical problem in air traffic system operation, especially for air traffic safety and air traffic controller deployment. Many studies use mathematical modeling or machine learning methods to evaluate air traffic complexity. However, accurate evaluation remains challenging because effective complexity-related features are lacking and the relationship between features and complexity is complicated. Building on the existing complexity features, this paper proposes additional domain features, time-dependent features and data distribution features to construct a novel air traffic complexity feature set. An mRMR feature selection method is then applied to filter out redundant features. Finally, the filtered feature set and the corresponding air traffic complexity levels are input into an XGBoost model to learn the relationship between them, so that the complexity of new air traffic data can be evaluated with high performance. The experimental results show that the proposed features benefit the evaluation of air traffic complexity, and that the XGBoost model with the mRMR method can effectively select the important features and mine the relationships within air traffic complexity data, improving overall evaluation performance by at least 5% while using less than half of the original number of features.


Introduction
The air transport industry is developing rapidly as demand for moving goods and people grows, and air traffic volume is surging as a result. To manage this traffic safely and efficiently, the airspace is divided into smaller sectors that are managed separately, and each sector has a group of air traffic controllers (ATCos) responsible for directing aircraft. However, ATCo resources are limited and can barely meet the demands of increasingly busy air traffic. To keep the workload of ATCos within a safe range, the workload must be monitored in real time and ATCo resources must be allocated reasonably across sectors. Both tasks depend on an accurate evaluation of air traffic complexity.
Research on air traffic complexity evaluation follows two major directions: mathematical modeling and machine learning. The former characterizes air traffic complexity by constructing a mathematical model or by identifying the most relevant indicators [1], such as conflict probability, conflict resolution difficulty and the Lyapunov exponent. The latter regards air traffic complexity evaluation as a machine learning problem [2][3][4]. By considering as many types of features that influence air traffic complexity as possible and using a machine learning model to learn the relationship between these features and traffic complexity, air traffic complexity can be evaluated.
ISTTCA 2020, IOP Conf. Series: Earth and Environmental Science 638 (2021) 012036, IOP Publishing, doi:10.1088/1755-1315/638/1/012036
Many studies apply machine learning models to air traffic complexity evaluation, but they mainly focus on improving the model algorithm and pay little attention to complexity-related features. One reason is that many factors influence complexity, which makes them difficult to summarize and organize. Another is that air traffic complexity is an abstract concept, so it is difficult to construct new, effective features that describe it.
It is well known that the ceiling of machine learning model performance depends on the quantity and quality of the input features. To further improve the performance of air traffic complexity evaluation, this paper summarizes and organizes the commonly used complexity features and supplements them with additional domain-related complexity features. It also proposes time-dependent features and data distribution features in order to describe air traffic complexity comprehensively. To reduce the impact of redundant features, we introduce the mRMR method to screen the important features, and then use an ensemble learning model (XGBoost) to mine the information between the screened features and the complexity labels, building a high-performance air traffic complexity evaluation model.

Air traffic complexity features analysis and novel features proposal
Air traffic complexity is abstract and difficult to describe and quantify in detail. To evaluate it, we need to express it through the relevant influencing factors. In this section, we analyze the existing common complexity features and then supplement them with other domain-related features. We also propose time-dependent features and data distribution features to provide a more complete description of air traffic complexity.

Prevailing air traffic complexity features
Many scholars have studied features that influence the level of air traffic complexity. At present, the commonly used air traffic complexity feature set is the one proposed by Gianazza and Guittet [5]. It includes the number of aircraft in different flight states or in different time periods ($N$, $N^2$, $N_{ds}$, $N_{cl}$, $F_5$, $F_{15}$, $F_{30}$, $F_{60}$, $Dens$), parameters of aircraft movement rate, the convergence and dispersion of aircraft groups, the difficulty of flight conflict resolution, conflict probability, and so on. This feature set consists of 28 features in total and has been consistently found to be important in reflecting air traffic complexity [2][3][4]. For a more thorough review of these features, readers are referred to the cited literature.

Additional domain knowledge features
Apart from the complexity-related factors considered in Section 2.1, other factors can be taken into account in air traffic complexity evaluation. Since a change in the motion state of an aircraft may change the entire air traffic situation, parameters describing such changes are needed. This paper therefore adds the number of aircraft in level flight and the number of aircraft whose heading, speed or altitude is changing. In addition, altitude, speed and the corresponding control strategy differ between aircraft types. To reflect the influence of aircraft type, we use the aircraft type mixing ratio as a complexity-related feature. Furthermore, the radiotelephony communication of the controller reflects controller workload, which is closely related to air traffic complexity, so communication time and communication frequency features are designed. Finally, following [6], we add two features that reflect vertically approaching trends.

Time-dependent features
Air traffic complexity reflects the flight situation, which changes gradually in a large airspace sector. Complexity is therefore continuous, and the complexity in adjacent periods is correlated. Specifically, the complexity of the previous period may affect the complexity of the next period, and the two will have similar characteristics. Time-dependent features can be designed to capture this temporal correlation. In this paper, two kinds of time-related features are proposed: time information features (the day of the week, the hour of the day and the minute of the hour to which each complexity sample corresponds) and the flight status in adjacent time periods (the number of aircraft in a heading, altitude or speed change state within 3 minutes before and after a given sample, and the total time and frequency of radiotelephony communication within 3 minutes before and after it).
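The two kinds of time-dependent features can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-based data layout, and the choice to sum the per-minute counts over each 3-minute window are all assumptions, since the paper does not specify how the window is aggregated.

```python
from datetime import datetime, timedelta

def time_info_features(ts):
    """Day of week (0 = Monday), hour of day and minute of hour for one sample."""
    return {"day_of_week": ts.weekday(), "hour": ts.hour, "minute": ts.minute}

def window_sum(series, ts, minutes, direction):
    """Sum a per-minute quantity over the `minutes` before (direction=-1) or after (+1) ts.

    `series` maps a minute-resolution datetime to a value (hypothetical layout);
    minutes with no record count as 0.
    """
    return sum(
        series.get(ts + direction * timedelta(minutes=k), 0)
        for k in range(1, minutes + 1)
    )

def time_dependent_features(ts, heading_changes, window=3):
    """Time information plus adjacent-period counts for one one-minute sample."""
    feats = time_info_features(ts)
    # Assumed aggregation: total count over the 3 minutes before and after.
    feats["heading_chg_prev"] = window_sum(heading_changes, ts, window, -1)
    feats["heading_chg_next"] = window_sum(heading_changes, ts, window, +1)
    return feats
```

The same `window_sum` helper could be applied to altitude or speed change counts and to radiotelephony time and frequency. Note that the "after" window uses information from later minutes, which is only available when samples are evaluated retrospectively.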

Data distribution features
Although domain knowledge features and time-dependent features provide operational information, there may still be room to extract more information from the data. In many practical machine learning tasks, exploratory analysis of the actual data provides a basis for feature engineering. One such technique is to extract new features from regularities in the data distribution. These features may have no direct operational meaning, but because they capture regularities in the data distribution, they may help the model learn. This paper therefore proposes a benchmark distance feature based on the distribution of the actual air traffic complexity feature data. Specifically, the first step is to plot the joint distribution of two features and color the samples by complexity level, so as to observe the relationship between the joint distribution of different features and the complexity category.
The following two figures illustrate such feature data distributions. There is an obvious difference in location between samples with high complexity and samples with lower complexity, so the two can be distinguished by their spatial distribution. This paper therefore sets the midpoint of the most complex sample group as a benchmark, and defines the distance from the corresponding feature values of each sample to this benchmark as a data distribution feature. For our data, 10 feature pairs with such clearly separated distributions were identified, and 10 corresponding data distribution features were constructed.
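A benchmark distance feature of this kind can be sketched in a few lines. The sketch assumes that the "midpoint" is the centroid (mean) of the most complex group and that the distance is Euclidean; the paper states neither, and the function name and argument layout are hypothetical.

```python
import math

def benchmark_distance(points, labels, top_level):
    """Distance from every sample to the centroid of the most complex group.

    points: list of (x, y) values for one feature pair.
    labels: complexity level of each sample.
    top_level: label of the most complex group (assumed benchmark group).
    """
    group = [p for p, lab in zip(points, labels) if lab == top_level]
    if not group:
        raise ValueError("no samples at the benchmark complexity level")
    # Assumption: the benchmark "midpoint" is the arithmetic mean of the group.
    cx = sum(p[0] for p in group) / len(group)
    cy = sum(p[1] for p in group) / len(group)
    # Assumption: Euclidean distance in the raw (unscaled) feature plane.
    return [math.hypot(p[0] - cx, p[1] - cy) for p in points]
```

If the two features have very different scales, standardizing them before computing the distance may be preferable. To avoid label leakage, the benchmark would normally be computed from training samples only and then reused for test samples.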

XGBoost classification model with mRMR method
A comprehensive feature set plays an important role in machine learning, so in the previous section we constructed a detailed feature set that describes air traffic complexity from different perspectives. However, more features are not always better. Redundant and useless features often occur and can mislead the model, degrading its predictive performance. On the one hand, the features must be reviewed carefully before they are input to the machine learning model to ensure that they are effective. On the other hand, a robust and powerful model is needed to mine the relationships within the data and ensure excellent learning performance. In this section, we introduce the mRMR feature selection method and the XGBoost classification model used in this paper.

mRMR feature selection method
From the viewpoint of information theory, the purpose of feature selection is to find a feature set $S$ with $m$ features $\{x_i\}$ that jointly have the largest dependency on the target class $c$. This scheme, called Max-Dependency, can be represented as $\max D(S, c)$ with $D = I(\{x_i, i = 1, \dots, m\}; c)$, where $I$ denotes mutual information. Because this criterion is hard to implement and time-consuming, an alternative is to select features based on the maximal relevance criterion, called Max-Relevance. Max-Relevance searches for features satisfying $\max D(S, c)$ with $D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c)$, which approximates Max-Dependency with the mean of the mutual information values between each individual feature and the class $c$. Features selected according to Max-Relevance are likely to be highly redundant. When two features depend strongly on each other, the class-discriminative power would not change much if one of them were removed. Therefore, the Min-Redundancy criterion can be added to select mutually exclusive features, which is $\min R(S)$ with $R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j)$.
The criterion combining the above two constraints is called "minimal-redundancy-maximal-relevance" (mRMR) [7]. It can be expressed as $\max \Phi(D, R)$ with $\Phi = D - R$. In practice, incremental search methods can be used to find a near-optimal feature set. Suppose we already have $S_{m-1}$, the feature set with $m-1$ features. The task is to select the $m$-th feature from the set $\{X - S_{m-1}\}$, where $X$ is the full feature set. The incremental algorithm optimizes the following condition:

$$\max_{x_j \in X - S_{m-1}} \left[ I(x_j; c) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \right] \qquad (1)$$
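The incremental search can be sketched as a greedy loop that, at each step, picks the remaining feature with the largest relevance minus mean redundancy. The sketch below assumes that all features and the target are already discrete (continuous complexity features would first need to be binned), estimates mutual information from empirical counts, and uses hypothetical function names; it is an illustration of the criterion, not the implementation used in the experiments.

```python
from collections import Counter
from math import log

def mutual_info(a, b):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts.
    return sum(
        (nab / n) * log(nab * n / (pa[x] * pb[y]))
        for (x, y), nab in pab.items()
    )

def mrmr(features, target, m):
    """Greedy mRMR selection.

    features: dict mapping a feature name to its discrete column.
    target: class labels. Returns up to m feature names in selection order.
    """
    relevance = {name: mutual_info(col, target) for name, col in features.items()}
    selected = []
    while len(selected) < min(m, len(features)):
        def score(name):
            if not selected:
                return relevance[name]  # first pick: pure Max-Relevance
            redundancy = sum(
                mutual_info(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy
        best = max((n for n in features if n not in selected), key=score)
        selected.append(best)
    return selected
```

With a duplicated feature in the candidate pool, the redundancy term drives the second pick toward a feature that carries different information, even when the duplicate is equally relevant on its own.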

XGBoost classification model
XGBoost is short for "extreme gradient boosting" and is designed to be a scalable machine learning system for tree boosting [8]. Its parallel tree boosting and regularization strategy enable it to run much faster and achieve state-of-the-art results in many machine learning problems. As an ensemble method, the basic idea of XGBoost is to combine several weak models into a strong one, which can be written as:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i) \qquad (2)$$

where $f_k$ is a weak model and $K$ is the number of weak models. As a tree boosting method, the core of XGBoost is Newton boosting, which searches for the optimal parameters by minimizing the objective function

$$\mathrm{Obj} = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$

where $l$ is the loss function and $\Omega$ is the regularization term. They measure the performance and control the complexity of the model, respectively.
The ensemble model is trained in an additive manner. At each iteration a new weak model $f_t$ is added to improve the current model, and the new objective function is formed as:

$$\mathrm{Obj}^{(t)} = \sum_{i} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \qquad (5)$$

where $\hat{y}_i^{(t-1)}$ is the prediction for the $i$-th sample from the previous iterations and $f_t$ is the weak model at the $t$-th iteration. XGBoost includes several improvements that promote classification performance, such as overfitting prevention and computational enhancements, so we adopt it as our air traffic complexity learning model.
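The additive training scheme can be illustrated with a toy Newton boosting loop. The sketch below is a deliberately simplified stand-in and not XGBoost itself: it assumes a single numeric feature, squared-error loss $\frac{1}{2}(y - \hat{y})^2$, depth-one trees (stumps), an L2 leaf penalty `lam` and a shrinkage factor `eta`, all of which are illustrative choices. Each leaf weight is the Newton step $-G/(H + \lambda)$, where $G$ and $H$ are the sums of first and second derivatives of the loss in that leaf.

```python
def fit_stump(x, g, h, lam):
    """Best single-split stump under the second-order (Newton) objective."""
    values = sorted(set(x))
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values to split")
    best = None
    for thr in values[:-1]:  # the largest value cannot separate any samples
        left = [i for i, v in enumerate(x) if v <= thr]
        right = [i for i, v in enumerate(x) if v > thr]
        gl, hl = sum(g[i] for i in left), sum(h[i] for i in left)
        gr, hr = sum(g[i] for i in right), sum(h[i] for i in right)
        # Sum of child scores; the parent term is the same for every
        # threshold, so maximizing this is equivalent to maximizing the gain.
        score = gl * gl / (hl + lam) + gr * gr / (hr + lam)
        if best is None or score > best[0]:
            best = (score, thr, -gl / (hl + lam), -gr / (hr + lam))
    _, thr, w_left, w_right = best
    return lambda v: w_left if v <= thr else w_right

def newton_boost(x, y, rounds=20, lam=1.0, eta=0.5):
    """Additive model y_hat = sum_k eta * f_k(x) for squared-error loss."""
    trees, pred = [], [0.0] * len(x)
    for _ in range(rounds):
        g = [p - t for p, t in zip(pred, y)]  # first derivative of the loss
        h = [1.0] * len(x)                    # second derivative of the loss
        f = fit_stump(x, g, h, lam)
        trees.append(f)
        pred = [p + eta * f(v) for p, v in zip(pred, x)]
    return lambda v: sum(eta * f(v) for f in trees)
```

For multi-class complexity levels, the real library replaces the squared-error derivatives with those of a softmax loss and grows deeper trees over many features; the additive structure of the update stays the same.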

Dataset preparation and evaluation metrics
In this section, to verify the effectiveness of the proposed air traffic complexity features and the XGBoost classification model with the mRMR method, several experiments are carried out on real air traffic operation data from a Guangzhou en-route airspace in China, recorded from December 1 to December 15, 2019. After data pre-processing, checking and screening, we collected a total of 3605 valid samples. Each sample corresponds to a one-minute air traffic scenario of the en-route airspace and consists of the complexity factors (50 features) described in Section 2 and a corresponding complexity level (one of 5 levels) assigned by air traffic management experts.
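To evaluate the performance of the different experiments, overall accuracy (Acc), mean absolute error (MAE) and F1-score are selected as the evaluation metrics. Their definitions are as follows:

$$\mathrm{Acc} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}(\hat{y}_i = y_i), \qquad \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right|, \qquad F_1 = \frac{2 \cdot P \cdot R}{P + R}$$

where $N$ is the number of evaluated samples, $y_i$ and $\hat{y}_i$ are the true and predicted complexity levels of the $i$-th sample, and $P$ and $R$ denote precision and recall.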
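The three metrics can be computed directly from lists of true and predicted complexity levels, as in the sketch below. The macro average used for the multi-class F1-score is an assumption; the paper does not state which averaging scheme was applied.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted level equals the true level."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Mean absolute difference between predicted and true levels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because the complexity levels are ordinal, MAE complements accuracy: it penalizes a prediction that is two levels off more heavily than one that is off by a single level.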

Verifying the effectiveness of novel air traffic complexity features and XGBoost
As mentioned in Section 2, in addition to the prevailing complexity features (PF) used in existing research, this paper proposes three further types of features: additional domain features (ADF), time-dependent features (TDF) and data distribution features (DDF). In this part of the experiment, we carried out an ablation study, combining the different feature types to examine the effectiveness of the proposed features. From the 4 feature types, we constructed a total of 8 complexity feature sets (FS_1: PF; FS_2: PF + ADF; FS_3: PF + TDF; FS_4: PF + DDF; FS_5: PF + ADF + TDF; FS_6: PF + ADF + DDF; FS_7: PF + TDF + DDF; FS_8: PF + ADF + TDF + DDF), where FS stands for feature set and the number is its serial number.
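The eight feature sets are the base set PF combined with every subset of the three proposed feature types. A short sketch that enumerates them in the same order as FS_1 to FS_8 (the function name and list-based representation are illustrative):

```python
from itertools import combinations

BASE = "PF"
EXTRAS = ["ADF", "TDF", "DDF"]

def ablation_feature_sets(base=BASE, extras=EXTRAS):
    """Map FS_1..FS_8 to the feature types each set contains."""
    sets = {}
    for size in range(len(extras) + 1):
        for combo in combinations(extras, size):
            # Numbering follows subset size first, then the order in `extras`.
            sets[f"FS_{len(sets) + 1}"] = [base, *combo]
    return sets
```

In an actual experiment each feature type name would be replaced by the list of column names belonging to that type.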
We divided the complete data set into a training set and a test set in a 4:1 ratio. To preserve the balance between samples of different categories, stratified sampling is adopted. The experimental results on the test set are shown in Table 1. In each column of the table, the two best results are highlighted in boldface, and the sum column counts how many times the corresponding feature set obtains one of the two best results, which reflects the merits of the feature set. In each row, the best result is underlined. We make the following observations from these results:
• From the perspective of the feature sets, that is, by comparing the performance of different feature sets across the classifiers, some feature sets perform poorly in evaluating air traffic complexity. Specifically, FS_2 and FS_4 do not achieve a top-two result on any classifier. Other feature sets perform better, such as FS_3, FS_5 and FS_7. According to the sum column, FS_3 has the largest value, obtaining top-two results on more than 6 classifiers. FS_5 and FS_7 follow, achieving top-two results on 3-6 classifiers. Considering that FS_1 is the basic feature set, the poor performance of FS_4 indicates that adding data distribution features alone may not significantly improve on the basic feature set. In contrast, FS_3, FS_5, FS_7 and FS_8 all contain time-dependent features, indicating that time-dependent information plays a vital role in evaluating air traffic complexity.
• From the perspective of the classifiers, that is, by comparing the performance of different classifiers on each feature set, XGBoost achieves excellent results. On every feature set, it achieves the best evaluation results of all classifiers. The optimal evaluation performance (Acc: 79.94%, MAE: 0.2064) is achieved with FS_8, which shows that XGBoost has strong feature adaptability and learning ability. KNN, RF, SVM, LP and GBDT came next in evaluation performance, while LR and DT were the worst.
• It is worth noting that FS_8 yields the best results of all experiments with XGBoost but does not perform as well with other classifiers: it achieves a top-two result on only one classifier. This shows that some features may be useless for some classifiers, so more features are not necessarily better. Key features have a great impact on model performance, whereas redundant and ineffective features do not help and may even hinder the evaluation performance, so appropriate feature selection is necessary.
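A 4:1 stratified split can be sketched with the standard library alone. The function below is an illustrative stand-in for whatever tooling the authors used; the name, the fixed seed and the rounding of the per-class test size are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_ratio=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in about the same ratio."""
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for lab in sorted(by_class):
        idx = by_class[lab]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_ratio)  # per-class test size
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)
```

Splitting within each complexity level keeps rare levels represented in both sets, which matters when the five levels are unevenly distributed.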

Performance study on XGBoost model with mRMR method
In order to improve the performance of the model, it is necessary to provide an effective feature set for the model. At the same time, feature selection can also greatly reduce the feature dimension, speed up model training and calculation, and can also provide assistance for the analysis of important features. The mRMR method used in this paper is a feature selection method based on information theory, and shows excellent results on a large number of machine learning tasks.
To compare the effects of different feature selection methods, this paper selects four other feature selection methods (ReliefF, Backward, Gini index and CMIM) and a random feature selection strategy, using XGBoost as the base classifier to study their performance in evaluating air traffic complexity. The experimental results are shown in Table 2, which lists the optimal accuracy and the corresponding number of features for each feature selection method. In addition, we set a suboptimal threshold for each metric to further observe the changes in the number of features. The results show that the mRMR method achieves the best results under all performance metrics, ahead of the other feature selection methods, with only 21 features. This means that, using less than half of the original number of features, we can achieve better performance than with the full feature set. The other feature selection methods are also effective to some degree: while reducing the number of features used, they maintain evaluation performance similar to that of the full feature set. This shows that feature selection for air traffic complexity features is effective and has prospects in practical engineering applications.
To further understand the intermediate behavior of the feature selection methods, we provide the performance curves of the different methods as the number of features changes. The left panel shows the trend of Acc, the right panel shows the trend of MAE, and the horizontal axis represents the number of features. As can be seen from Figure 2, the evaluation performance of the model generally improves as the number of features increases, because adding features is likely to provide more information to the model. The mRMR method reaches its optimum when 21 features are selected and then declines slowly, indicating that the first 21 features provide the most useful information and that the subsequent features do not add enough information to improve the learning model.
The Gini method improves fastest, achieving an accuracy of more than 75% with fewer than 10 features. The CMIM method is superior to the random method in the first 35- Although the features selected in the later period continue to be beneficial to the model, they are inferior to the features selected by the random method. The Backward and ReliefF methods show the opposite behavior. Before the number of selected features reaches 17 and 27, respectively, the selected feature set is not as good as that of the random method, but the relationship then reverses and performance stays at a high level in the later stage.

Conclusion
In this paper, we propose domain knowledge features, time-dependent features and data distribution features related to air traffic complexity. These newly proposed features are combined with the prevailing complexity features to form a more complete feature set for characterizing air traffic complexity. To ensure the effectiveness of the feature set, the mRMR method is introduced to select the most useful features for the machine learning model. Finally, we use the XGBoost model to learn from the data set after feature screening and obtain the air traffic complexity evaluation model. The experimental results illustrate the effectiveness of the proposed complexity features and the outstanding performance of the XGBoost model with the mRMR method.
In academic research, the complexity features and feature selection methods proposed in this paper can help scholars analyze air traffic complexity at a deeper level and uncover the mechanisms by which air traffic complexity forms. In engineering applications, the lower-dimensional feature set obtained through feature selection is easier to deploy and analyze in actual operations, and also reduces computational cost to a certain extent.