NOTAM Text Analysis and Classification Based on Attention Mechanism

As one of the most important carriers of aeronautical information, the Notice to Airmen (NOTAM) provides key aeronautical safety information during flight operations. However, a large number of NOTAM texts lack systematic organization and utilization. To improve the application of NOTAM texts, natural language processing (NLP) can be used to classify and analyze them, and their effective processing is significant for monitoring aviation safety. In this paper, a NOTAM corpus set is built from raw data collected by web crawler technology. Based on the aeronautical intelligence knowledge system, the corpus set is systematically pre-processed and analyzed, and the key information in NOTAMs and the corpus features are extracted. Considering the high dimensionality and sparsity of the corpus set, the traditional bi-directional Recurrent Neural Network (Bi-RNN) architecture is improved by adding an attention layer, which not only allows the Bi-RNN to capture hidden dependencies to improve sparse features, but also allows the attention mechanism to weigh the key information based on its perceived importance to optimize the text representation. Experiments show that the attention-based Bi-RNN network outperforms the other four models, achieving a precision of 94.17% and an F1-score of 93.66%. The model rapidly increases the utilization of text features while reducing the loss, which demonstrates that it is a useful tool for NOTAM text classification and aviation safety surveillance.


Introduction
As an important part of the integrated aeronautical information series, a NOTAM announces a regulation or temporary change that pilots and relevant aviation personnel must understand and master. It is distributed by telecommunication and contains information concerning the establishment, condition or change in any aeronautical facility, service, system or hazard, the timely knowledge of which is essential to personnel and systems concerned with flight operations. Each NOTAM text contains, in addition to other information, a description of the event written in natural language, and every NOTAM text is assigned codes from predefined taxonomies. Complexity arises, on the one hand, from the need to categorize the text and, on the other hand, from the need to analyze and understand the reports from a global point of view. The overall purpose is to develop tools that help categorize and analyze the data.
In recent years, with the rapid development of natural language processing technology, techniques including text classification, text analysis and text mining have been widely and deeply applied in various domains. In the general domain, Kim [1] captured the essential information in sentences and extracted the local relevance of text by means of multiple convolution kernels of different sizes in a convolutional neural network (CNN). Zhang [2] also applied a similar CNN to the text classification task and introduced character-level features to enhance performance. However, CNNs perform poorly on longer sequences. Hao [3] and Cho [4] adopted the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models, respectively, for classification tasks. In addition, Yang [5] focused on document classification by proposing a hierarchical attention network.
However, the methods summarized above cannot be migrated entirely into a specific domain. In the agricultural domain, Shi [6] applied a Support Vector Machine (SVM) model to an agricultural dataset for the classification task. Medical text contains a large number of complicated features, and Hughes [7] took a practical approach to classify it automatically at the sentence level with a CNN. Sulea [8] collected legal cases from the French Supreme Court and used machine learning techniques to predict the rulings made by judges. Rosa [9] focused on military chat posts that contain items of interest and evaluated several current text categorization and feature selection methodologies on them. Yang [10] proposed a novel methodology for producing plausible counterfactual explanations, while exploring the regularization benefits of adversarial training on language models in the financial domain. The studies above cover the medical, agricultural, military, legal and financial domains, whereas the aeronautical intelligence information that we study is rarely investigated. As for aviation incident reports, Tanguy [11] designed a number of Natural Language Processing (NLP) techniques to mine and manage the information. Xu [12] described a text analysis framework and used aviation datasets to implement NLP tasks. Alkhamisi [13] explored an intelligent safety supervision system to manage and predict risk in aviation operations through an ensemble of machine learning algorithms. Rose [14] introduced a methodology for the analysis of aviation safety narratives based on text-based accounts of in-flight events and the categorical metadata parameters that accompany them.
The studies summarized above are characterized by an abundant corpus and great portability in general-domain text analysis and classification tasks. However, because aeronautical information has a sparse corpus and high noise, current text analysis and classification methods cannot be applied to it directly. To some extent, this restricts the intelligent analysis and development of aeronautical intelligence.
To deal with the problems given above, we first build a NOTAM corpus set and then apply it to an NLP task. In the first part, a large number of NOTAM texts are fetched from the website through web crawler technology. Based on the NOTAM rules, the key information in each NOTAM is extracted to make up the corpus set. In the second part, we apply the corpus to a text classification task, which helps aviation bodies handle innumerable NOTAM texts. To improve the performance of text classification, we adjust and optimize some key parameters in the model.

Construction of corpus set
An aeronautical corpus plays an essential role in natural language processing tasks; unfortunately, there is little corpus in the aviation industry for researchers to mine further. The National General Aviation Flight Information Database currently houses over 700,000 h of per-second time-series flight data recorder readings generated by over 400,000 flights from eight fleets of aircraft and over 190 participating private individuals [15]. A public database of cosmic radiation measurements in aviation covers more than 4500 flights with more than 130 000 measurements [16]. A flight delay prediction system built a dataset for the proposed scheme, in which automatic dependent surveillance-broadcast (ADS-B) messages are received, preprocessed, and integrated with other information such as weather conditions, flight schedules, and airport information [17]. None of these datasets contains NOTAM text, so we build a NOTAM corpus set to fill this gap.

Corpus collection
NOTAMs cover abundant content and change swiftly, so they are transmitted over the aeronautical fixed telecommunication network (AFTN). To satisfy the transmission requirements, a NOTAM must be issued in a specific format using specific characters. Because of this, we collect raw NOTAM data from the Federal Aviation Administration through web crawler technology. The raw data is cleaned and filtered by regular expressions to remove useless information, yielding 98088 raw NOTAM texts. As shown in Table 1, a complete NOTAM text includes an ID number, qualifiers, time and location information, and the text itself, for example the text item "E) CHONGZHOU VOR/DME 'CZH' 114.5MHZ/CH92X U/S DUE TO FLTCK." Each piece of content expresses information relevant to aeronautical intelligence. The basic analysis of a NOTAM text is illustrated in Figure 1.

Text analysis
Fundamental information
Fundamental information refers to the ID, the qualifiers, and the time and location. This information must adhere to a fixed format, which makes it easy to extract, and it relies heavily on the aeronautical intelligence rules. We use regular expressions to collect it into the corpus set. Some hidden features are also mined, including the length, the duration and the importance. Manual checking is also carried out to verify that the information is accurate. An example of processed fundamental information is shown in Table 2, and the statistics of the scope information are shown in Figure 2.
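The fixed-format extraction described above can be sketched with regular expressions. This is only an illustration, not the extraction code used for the corpus: the sample NOTAM is hypothetical apart from its E) item, which is the example text quoted earlier, and the patterns `Q_LINE` and `ITEM` and the function `parse_notam` are names introduced here. The Q) layout assumed is the usual ICAO one (FIR, Q-code, traffic, purpose, scope, lower and upper limits, coordinates and radius).

```python
import re
from datetime import datetime

# Hypothetical sample; only the E) item comes from the text above.
SAMPLE = (
    "A0123/21 NOTAMN\n"
    "Q) ZPKM/QNMAS/IV/BO/AE/000/999/3036N10342E025\n"
    "A) ZUUU B) 2103010000 C) 2103012359\n"
    "E) CHONGZHOU VOR/DME 'CZH' 114.5MHZ/CH92X U/S DUE TO FLTCK."
)

# Qualifier line: FIR/Q-code/traffic/purpose/scope/lower/upper/coordinates+radius.
Q_LINE = re.compile(
    r"Q\)\s*(?P<fir>[A-Z]{4})/(?P<qcode>Q[A-Z]{4})/(?P<traffic>[A-Z]{1,2})/"
    r"(?P<purpose>[A-Z]{1,3})/(?P<scope>[A-Z]{1,2})/"
    r"(?P<lower>\d{3})/(?P<upper>\d{3})/"
    r"(?P<coord>\d{4}[NS]\d{5}[EW])(?P<radius>\d{3})"
)
# Items A) to G): each value runs until the next item marker or the end of the text.
ITEM = re.compile(r"\b([A-G])\)\s*(.*?)(?=\s+[A-G]\)\s|\Z)", re.S)


def _parse_time(value):
    """Parse a ten-digit YYMMDDHHMM group; return None for PERM, EST, etc."""
    if value and re.fullmatch(r"\d{10}", value):
        return datetime.strptime(value, "%y%m%d%H%M")
    return None


def parse_notam(raw):
    """Extract qualifiers, location, validity period and two derived features."""
    match = Q_LINE.search(raw)
    record = match.groupdict() if match else {}
    items = dict(ITEM.findall(raw))
    start, end = _parse_time(items.get("B")), _parse_time(items.get("C"))
    record["location"] = items.get("A")
    record["start"], record["end"] = start, end
    # Derived ("hidden") features: duration in hours and length in tokens.
    record["duration_h"] = (
        (end - start).total_seconds() / 3600 if start and end else None
    )
    record["text"] = items.get("E", "")
    record["length"] = len(record["text"].split())
    return record


record = parse_notam(SAMPLE)
```

For the sample, `record["qcode"]` is `QNMAS`, the scope is `AE`, and the validity period is just under 24 hours. Real NOTAMs contain variants (for example `PERM` in item C) that this sketch maps to `None` rather than handling in full.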

Main body text.
The main body text cannot be fed directly into any task because it is unstructured, so the texts are first processed by the means that NLP usually adopts in general domains. Punctuation and stop words are handled by loading a vocabulary, the special characters that interfere with feature extraction are cleaned and filtered, and the words are segmented with a word segmentation toolkit. To address the distinctive characteristics of the NOTAM corpus, our work involves the following: 1) Abbreviation analysis: a total of 996 abbreviation pairs are replaced according to the list of abbreviations provided by the International Civil Aviation Organization (ICAO). 2) Digit retention: although numerical data is often ignored in the general domain, in NOTAMs it expresses radio frequency, runway number, distance, azimuth, flight level and other information. As an essential feature, this information can help to improve the accuracy of the text classification task. The comparison of text before and after analysis is presented in tabular form; after analysis, a text reads, for example: soil collapse with <distance> depth <distance> threshold <number> <distance> north centerline advise tower instructions. According to the TF-IDF weights, the extracted concepts are sorted, and a heat map of the 100 words with the highest weights in the main body text is shown in Figure 3.
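A minimal sketch of the two NOTAM-specific steps (abbreviation replacement and digit retention) is given below. The abbreviation dictionary is a four-entry stand-in for the 996-pair ICAO list, the stop-word set is assumed, the input sentence is hypothetical, and the `<frequency>` tag is an illustrative addition; only `<distance>` and `<number>` appear in the processed example above.

```python
import re

# Tiny stand-in for the ICAO abbreviation list (assumption: four entries only).
ABBREVIATIONS = {
    "RWY": "runway",
    "THR": "threshold",
    "TWR": "tower",
    "U/S": "unserviceable",
}

# Assumed stop-word list; the vocabulary actually loaded is not given in the text.
STOP_WORDS = {"of", "the", "to", "at"}

# Order matters: unit-bearing patterns are tried before the bare number pattern.
PLACEHOLDERS = [
    (re.compile(r"^\d+(\.\d+)?(M|FT|KM|NM)$"), "<distance>"),
    (re.compile(r"^\d+(\.\d+)?(MHZ|KHZ)$"), "<frequency>"),  # illustrative tag
    (re.compile(r"^\d+(\.\d+)?$"), "<number>"),
]


def preprocess(text):
    """Tokenize on whitespace, tag numeric tokens, expand abbreviations."""
    tokens = []
    for raw in text.upper().split():
        token = raw.strip(".,;:()'\"")
        if not token:
            continue
        for pattern, tag in PLACEHOLDERS:
            if pattern.match(token):
                tokens.append(tag)  # keep the numeric feature as a typed tag
                break
        else:
            word = ABBREVIATIONS.get(token, token).lower()
            if word not in STOP_WORDS:
                tokens.append(word)
    return tokens


# Hypothetical raw text chosen to reproduce the processed example above.
raw_text = ("SOIL COLLAPSE WITH 2M DEPTH 5M THR 18 300M NORTH OF CENTERLINE, "
            "ADVISE TWR INSTRUCTIONS.")
processed = " ".join(preprocess(raw_text))
```

Replacing numbers with typed tags rather than deleting them keeps the information that a distance or runway number was present while avoiding a separate vocabulary entry for every numeric value.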

Application of corpus set
In NOTAM management, it is difficult for aeronautical intelligence personnel to handle massive numbers of NOTAM texts quickly and accurately. However, NOTAM texts are not equally important, and recognizing the more important ones helps to avoid accidents and improve the safety of aviation operations. Using the extracted information, a text classification task based on a deep learning network is implemented to distinguish which NOTAMs should be handled with a high priority. The task is divided into three parts: data calling, model building and experiment. We call the theme and main body information in the corpus set to accomplish the task. This information comprises 43133 texts, whose distribution is shown in

Model building
The attention-based RNN model not only allows the Bi-RNN network to capture hidden dependencies to improve sparse features, but also allows the attention mechanism to weigh the key information based on its perceived importance to optimize the text representation. The architecture of the attention-based Bi-RNN is shown in Figure 4.

Fig. 4 Architecture of Att-BiRNN Model
The input layer contains n words expressed as a sequence W1, W2, ..., Wn. The word embedding layer maps the pre-trained word vectors into a fixed-dimension word embedding matrix and feeds them into the Bi-RNN model. An RNN mines the contextual semantic features of words and captures some word-to-word dependencies, but some hidden dependencies are lost. The Bi-RNN is therefore introduced: by propagating through forward and backward RNNs, it ties context information to the hidden context dependencies and so improves the sparse semantic features of short text. Then the attention layer generates a representation from the Bi-RNN outputs. Finally, the classifier, a softmax layer, predicts the result from the representation fed by the attention layer.
The LSTM unit first computes its gates by equations (1)(2), then updates the inner cell state c_t by equation (3), and finally updates the hidden state h_t by equations (4)(5). The development of the LSTM network has led to many network variants, such as the GRU network. GRU and LSTM have similar performance, and both can effectively handle gradient problems.
The traditional RNNs described above only learn hidden information from preceding time steps. The Bi-RNN layer consists of two sub-layers: one moving forward in time steps and one moving backward. To compute the hidden state, we perform an element-wise sum of the hidden states computed by the two sub-layers, as in equation (6).
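The combination of the two sub-layers and the attention weighting can be sketched in plain Python as follows. The element-wise sum follows the description above; the scoring function (a dot product between an assumed learned vector `w` and the tanh of each hidden state, normalized by softmax) is one common attention formulation and is an assumption here, since the exact attention equations are not given in this section. All numeric values are made up for illustration.

```python
import math


def elementwise_sum(forward, backward):
    """Combine forward and backward hidden states at each time step."""
    return [[f + b for f, b in zip(fs, bs)] for fs, bs in zip(forward, backward)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden, w):
    """Weight each time step by its score and return (weights, representation)."""
    # Score of step t: w . tanh(h_t)  (assumed scoring function).
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h_t)) for h_t in hidden]
    alpha = softmax(scores)
    dim = len(hidden[0])
    # Representation: attention-weighted sum of the hidden states.
    rep = [sum(a * h_t[d] for a, h_t in zip(alpha, hidden)) for d in range(dim)]
    return alpha, rep


# Toy hidden states for a three-word text with hidden size 2 (illustrative values).
forward = [[0.1, 0.2], [0.4, 0.1], [0.3, 0.3]]
backward = [[0.2, 0.0], [0.1, 0.3], [0.0, 0.3]]
hidden = elementwise_sum(forward, backward)
alpha, rep = attention_pool(hidden, w=[1.0, -0.5])
```

The weights `alpha` sum to one, so `rep` is a convex combination of the hidden states; a softmax classifier would then operate on `rep`.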

Design and performance of experiment
The experimental data, 43133 texts in total, is extracted from the corpus set and divided into a training set, a validation set and a test set at a ratio of 8:1:1. The experimental parameters are as follows: word vector dimension embedding_size = 300, number of neurons in the fully connected layer hidden_dim = 128, learning rate = 0.001, batch size = 128, dropout rate drop_out = 0.5 (50% of the units are randomly deactivated), and epochs = 10.
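As an illustration of this setup, the following sketch collects the stated hyperparameters in a dictionary and performs an 8:1:1 split. The shuffling, the seed and the rounding rule (integer division, with the remainder going to the test set) are assumptions; the text does not say how the split was drawn.

```python
import random

# Hyperparameters as stated above; the key names are chosen for this sketch.
CONFIG = {
    "embedding_size": 300,
    "hidden_dim": 128,
    "learning_rate": 0.001,
    "batch_size": 128,
    "drop_out": 0.5,
    "epochs": 10,
}


def split_dataset(samples, ratios=(8, 1, 1), seed=42):
    """Shuffle and split into train/validation/test by the given ratios."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed (assumed) for repeatability
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder, so no sample is dropped
    return train, val, test


# Indices stand in for the 43133 texts.
train, val, test = split_dataset(range(43133))
```

Under these assumptions the 43133 texts split into 34506 training, 4313 validation and 4314 test samples.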
We take cross-entropy loss, precision, accuracy, recall and F1-score as the indicators for comprehensive evaluation. Table 5 lists the index values of the attention-based Bi-LSTM and Bi-GRU models for NOTAM classification, including accuracy, recall and F1-score. Each model performs well on all three indices. Overall, the Bi-LSTM model outperforms the Bi-GRU model, with F1-scores of 93.66% and 93.04%, respectively.
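For reference, per-class precision, recall and F1-score can be computed as in the sketch below. The toy labels are made up, and the macro (unweighted) average is an assumption, since the text does not state how the per-class values are aggregated into the overall figures.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall and F1 over all classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Toy labels using NOTAM category letters (illustrative values only).
y_true = ["L", "L", "M", "M", "S", "K"]
y_pred = ["L", "M", "M", "M", "S", "K"]
scores = per_class_scores(y_true, y_pred)
macro_p, macro_r, macro_f1 = macro_average(scores)
```

In the toy data, one L-class text is misclassified as M-class, which lowers the recall of L and the precision of M while leaving S and K at 1.0.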

Tab. 5 Multi-indicator Effects of NOTAM Classification Tasks
Table 5 shows that, compared with other texts, the K-class checklist texts have significant numerical features and fewer data samples, so their classification results all reach 100%. It is noteworthy that the main reason for the effectiveness of the G-class GNSS texts in the Bi-GRU model is that the number of texts is small, and the Bi-GRU model differs from the Bi-LSTM model in its sensitivity to sample size. The precision of the L-class texts on lighting facilities and the S-class texts on meteorological information is lower, and they share many similar features with the M-class texts on the operation and landing area and the P-class texts on air traffic rules. Furthermore, all four types of text are short, but the M-class and P-class texts have far more data. This suggests that the amount of data affects how well the model fits to some extent. Therefore, in view of the unbalanced data volume, this paper further explores the impact of balanced and unbalanced data on the classification performance of the above model. At the same time, the maximum text length of the model is set to 32, 64 and 128 characters for comparison. F1-score is the main evaluation index, and the results are displayed in Table 6.
Table 6 shows that the F1-score of most categories is better on the balanced data set than on the unbalanced data set. On the balanced data set, the F1-scores of L-class and S-class texts are higher than on the unbalanced data set, while the F1-scores of M-class and P-class texts on the unbalanced data set are mostly lower than on the balanced one. Considering the difference in the numbers of L-class, S-class, M-class and P-class texts, this suggests that the data quantity has a certain influence on the classification performance of the model, which is consistent with the conclusion drawn from Table 5. In addition, Table 6 shows that the F1-score of the model increases as the maximum text length increases.
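The two factors varied in this experiment, class balance and maximum text length, can be sketched as follows. How the balanced data set was actually constructed is not stated, so random downsampling to the smallest class is an assumption, as are the padding token, the seed, and the treatment of length in tokens rather than characters.

```python
import random
from collections import defaultdict


def balance_by_downsampling(samples, seed=0):
    """Downsample every class to the size of the smallest class (assumed method)."""
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    return balanced


def truncate(tokens, max_len, pad="<pad>"):
    """Clip a token list to max_len and right-pad shorter ones."""
    clipped = tokens[:max_len]
    return clipped + [pad] * (max_len - len(clipped))


# Toy data: M and P are large classes, L is small (illustrative counts).
samples = [("m text", "M")] * 5 + [("p text", "P")] * 4 + [("l text", "L")] * 2
balanced = balance_by_downsampling(samples)
# One fixed-length copy of the same tokens for each maximum length compared.
fixed = {n: truncate("runway closed due to work".split(), n) for n in (32, 64, 128)}
```

Downsampling discards data from the large M and P classes, which is one possible reason their scores could change between the two settings; oversampling the small classes would be an alternative design.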

Fig. 5 Classification F1-Score of NOTAM Text in Different Max_Len
Figure 5 shows the classification accuracy of the model on the balanced and unbalanced data sets for different maximum text lengths. As the maximum text length increases, the classification accuracy increases on both data sets. On the whole, the model predicts better on the balanced data set than on the unbalanced one.

Comparison of different models.
To verify the effectiveness of the model, four groups of controlled experiments are designed, conducted on the fastText, RNN, CNN and Bi-RNN models respectively. During the experiments, the same basic hyperparameters are set for each model: the dropout rate is 0.5, the number of epochs is 10, and the maximum text length is 128. Table 6 shows the comparison of the different deep learning models in accuracy, recall and F1-score. Comparing the models, the F1-score of the recurrent neural network model combined with the attention mechanism is significantly higher than that of the other models, which shows that the model can enhance feature extraction and transmission, reduce the loss of features, and improve the classification performance. Figure 6 and Figure 7 show the accuracy and loss of the different models on the validation data set. The attention-based Bi-LSTM and attention-based Bi-GRU models do not fluctuate strongly during training; they converge quickly and keep the highest accuracy in the subsequent training, which further reflects the advantages of the model.

Conclusion
In this paper, we focus on building a NOTAM corpus and applying it. We integrate and analyze the corpus according to the characteristics of the domain, and use deep learning to build a Bi-RNN model combined with an attention mechanism to complete the text classification task. This model can capture the dependency information hidden in context and pay more attention to the key information of the text, so as to improve the accuracy of text classification. The main conclusions are as follows:
(1) A NOTAM corpus set is established by web crawler technology and text analysis theory. Qualitatively, this corpus follows the basic rules of aeronautical intelligence throughout and can be reused or shared with other aviation NLP tasks. Quantitatively, it covers 98088 finely processed NOTAM texts, and its data features can be expressed visually.
(2) The attention-based Bi-RNN neural network is used to build an automatic text classification model for NOTAMs, which classifies texts into 14 categories quickly and accurately and avoids the time-consuming processing and poor classification performance of shallow learning models. The accuracy and F1-score of the model reach 94.17% and 93.66%, which is superior to the other four text classification models (the fastText, CNN, RNN and Bi-RNN models).
(3) By comparing the text classification results of different categories, we can see that the categories with large amounts of data are more sensitive to the attention-based Bi-RNN model. By adjusting the maximum length and the balance of the text, the problem of insufficient data can be alleviated to a certain extent. On the whole, as the maximum length increases, the F1-score increases correspondingly, and the classification results on balanced data sets are better than those on unbalanced data sets.
There is a large amount of information in NOTAMs, and the analysis and classification methods vary with the content and users. This paper explores only one way to process the texts, based on deep learning. In the next step, we need to consider other NLP systems for NOTAMs and start from the perspective of reporting the status of transactions and emergencies, so as to realize multi-angle and all-round processing of notices. In addition, the accuracy of the model can be further improved by adjusting the parameters of feature fusion, so as to achieve accurate detection of aviation safety issues.