Open Information Extraction for Waste Incineration NIMBY based on Bert Network in China

Unstructured data makes it difficult to build an emergency knowledge base. Information extraction can help people obtain structured information faster and more accurately. This paper proposes an information extraction method based on the BERT neural network that extracts information from open Chinese text about Waste Incineration NIMBY. The method provides a scientific scheme for constructing a structured knowledge base of Waste Incineration NIMBY, which in turn supports decision-making for resolving Waste Incineration NIMBY conflicts. Experiments show that the precision rate, recall rate and F-value of the information extraction reach 91% with the BERT network, which basically meets the requirements of building a structured knowledge base of Waste Incineration NIMBY.


Introduction
The China Statistical Yearbook 2019 shows that China's domestic waste removal and transportation volume reached 200 million tons, of which about 45% was treated by incineration [1]. Although waste incineration is characterized by harmlessness, volume reduction and resource utilization, the siting of incineration plants often causes conflict [2]. Processing the text about the NIMBY problem from government websites and scientific literature, and building a knowledge base of Waste Incineration NIMBY from it, is of great significance for providing scientific and effective decision-making solutions to the problem. Building such a knowledge base faces many difficulties, such as incomplete information, large data volume and inaccurate text analysis. In recent years, advances in artificial intelligence and natural language processing have provided a feasible solution to these problems. The purpose of this paper is to use open information extraction to extract entity information from text collected by a web crawler, so as to provide a scientific and reasonable knowledge base of Waste Incineration NIMBY and a scientific decision-making scheme for solving the Waste Incineration NIMBY problem.

Information extraction
Information extraction refers to organizing unorganized, unstructured information into organized, structured data for storage and utilization [3]. Its main task is to extract information points from various documents and integrate them in a unified form. Information extraction methods can be roughly divided into two kinds: rule-based methods and machine learning based methods. Among rule-based methods, the authors of [4] proposed using distance and location constraints between entities to obtain candidate triples, and then filtering them with sentence rules to extract open entity relationships. Qiu Qizhi et al. [5] proposed an information extraction method based on style and vocabulary, with an average accuracy of more than 80%. Machine learning based methods mainly mark and identify designated entities on a statistical basis. Statistical models such as the HMM (Hidden Markov Model) [6] and the CRF (Conditional Random Field) have been applied to information extraction. In addition, neural network models such as LSTM (Long Short-Term Memory) [7] and Bi-LSTM (Bidirectional Long Short-Term Memory) [8] are also used to extract entity information from text.
In 2018, Jacob Devlin et al. at Google released BERT, a large-scale pre-trained language model based on Bidirectional Encoder Representations from Transformers [9,10]. Their experimental results show that a bidirectionally trained model has a deeper understanding of context than a unidirectional one. By fine-tuning the pre-trained model, BERT achieved striking results on the machine reading comprehension benchmark SQuAD 1.1, surpassing human performance on both measures. It also set new records on 11 different NLP tasks, including pushing the GLUE benchmark to 80.4% (an absolute improvement of 7.6%) and MultiNLI accuracy to 86.7% (an absolute improvement of 5.6%). Google also released multilingual and Chinese BERT models, which makes it possible to use BERT for Chinese information extraction. In this paper, BERT is used to extract information from Chinese text on Waste Incineration NIMBY.

BERT framework
In this paper, the BERT neural network is trained on the Waste Incineration NIMBY data set so that it can mark and extract the specified entities and entity relationships in a given web text. This is the basic work of building a knowledge base of Waste Incineration NIMBY, which can support decision-making for resolving Waste Incineration NIMBY conflicts. The BERT network consists of three layers: an input layer, a transformer processing layer and an output layer. The overall structure of the model is shown in Figure 1. The first layer is the word vector representation of the sentence, the second layer is the word encoding produced by the transformer encoder, and the third layer is the word encoding corresponding to the second layer. Each of the three layers is given a weight; the encoding results are multiplied by their respective weights and then summed to produce the output.
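The weighted combination of the three layer encodings can be illustrated with a minimal sketch in plain Python. This is not the implementation used in the paper: the function name `combine_layers`, the toy three-dimensional vectors and the weight values are all assumptions chosen only to show the arithmetic.

```python
from typing import List, Sequence


def combine_layers(layers: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[float]:
    """Multiply each layer's encoding by its weight and sum element-wise."""
    if len(layers) != len(weights):
        raise ValueError("one weight per layer is required")
    dim = len(layers[0])
    if any(len(layer) != dim for layer in layers):
        raise ValueError("all layer encodings must have the same dimension")
    output = [0.0] * dim
    for layer, weight in zip(layers, weights):
        for i, value in enumerate(layer):
            output[i] += weight * value
    return output


# Toy encodings of a single token from the three layers (hypothetical values).
layer1 = [1.0, 0.0, 2.0]
layer2 = [0.0, 1.0, 2.0]
layer3 = [2.0, 2.0, 0.0]

# Hypothetical layer weights; in a trained model these would be learned.
combined = combine_layers([layer1, layer2, layer3], [0.5, 0.25, 0.25])
print(combined)
```

In a real model each encoding would have hundreds of dimensions and the weights would be learned during training rather than fixed by hand.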

Information extraction framework
In this paper, the process of entity information extraction for Chinese text on Waste Incineration NIMBY is as follows. First, text on Waste Incineration NIMBY is obtained from relevant websites, government documents and scientific literature. The acquired text is then processed to remove symbols and marks that the information extraction program cannot identify, forming a data set that the BERT model can process. Finally, the data set is fed to a BERT model trained on a large amount of Chinese Waste Incineration NIMBY data, and the required entity information and entity relationships are extracted. Figure 2 shows the information extraction architecture for Waste Incineration NIMBY knowledge base construction studied in this paper. The architecture comprises three modules:
(1) Information acquisition module: Web crawlers written in Python crawl text and documents about Waste Incineration NIMBY from official, news, literature and other websites and save them as the initial network information text.
(2) Text processing module: The initial network information text is processed to remove structural marks and symbols that the BERT model cannot recognize or process. The text is then reorganized sentence by sentence to form the corpus data set.
(3) Information extraction module: This is the core module. It feeds the corpus data set to the BERT neural network model, which processes it and extracts the target entities and related information. The BERT model has been trained on a large amount of Waste Incineration NIMBY data.
In this paper, we obtained papers about Waste Incineration NIMBY from CNKI, such as "the cause of NIMBY movement and its governance paradox: An Empirical Analysis Based on NIMBY movements in Chongqing". The structural marks in the text were then removed to form a corpus data set that can be used for BERT network training. In the experiment, the corpus data set was manually annotated: the target entities and related information were marked to form the train, development and test data sets.
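The text processing module described above (removing structural marks, then reorganizing the text by sentence) might look roughly like the sketch below. It is an illustration under stated assumptions, not the program used in the paper: the structural marks are assumed to be HTML-style tags, sentences are assumed to end with 。, ！, ？, ! or ?, and the names `clean_text` and `split_sentences` are hypothetical.

```python
import re
from typing import List

# Assumption: structural marks are HTML/XML-style tags such as <p> or </div>.
TAG_RE = re.compile(r"<[^>]+>")
# Assumption: whitespace carries no meaning in the crawled Chinese text.
SPACE_RE = re.compile(r"\s+")
# A sentence is a run of non-terminator characters plus an optional terminator.
SENTENCE_RE = re.compile(r"[^。！？!?]+[。！？!?]?")


def clean_text(raw: str) -> str:
    """Remove tag-like structural marks and all whitespace."""
    without_tags = TAG_RE.sub("", raw)
    return SPACE_RE.sub("", without_tags)


def split_sentences(text: str) -> List[str]:
    """Reorganize cleaned text into a list of sentences."""
    return SENTENCE_RE.findall(text)


raw_page = "<p>邻避运动已成为研究热点。</p>\n<div>垃圾焚烧厂 选址引发冲突！</div>"
corpus = split_sentences(clean_text(raw_page))
for sentence in corpus:
    print(sentence)
```

Removing all whitespace is only reasonable for purely Chinese text; a corpus that mixes in English terms would need a gentler normalization step.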

Information entity mark
In the experiment, entity relationship triples (entity1, relation words, entity2) and words related to time and location are marked in the train data set, and the entity relationship triples in the test data set are then extracted by the BERT model. Here, entity1 and entity2 form an entity pair linked by a relationship, and the relation words are the words or word sequences that describe the semantic relationship between the two entities. For example, from the text "Driven by the right discourse, the NIMBY movement has become a research hotspot in urban planning, environmental management, social governance and other fields", we can extract the relational triple (NIMBY movement, become, research hotspot). The tags used in the information extraction experiment are shown in Table 1.
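To make the marking scheme concrete, the sketch below decodes a per-token tag sequence into one (entity1, relation words, entity2) triple. The tag names (`B-E1`, `I-E1`, `B-REL`, `I-REL`, `B-E2`, `I-E2`, `O`) are hypothetical placeholders and are not taken from Table 1; the function name `decode_triple` is likewise an assumption, and the sketch assumes at most one triple per sentence.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]


def decode_triple(tokens: Sequence[str], tags: Sequence[str],
                  joiner: str = " ") -> Optional[Triple]:
    """Collect tokens by role and return one triple, or None if a role is empty.

    Assumes hypothetical BIO-style tags and at most one triple per sentence.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: Dict[str, List[str]] = {"E1": [], "REL": [], "E2": []}
    for token, tag in zip(tokens, tags):
        prefix, _, role = tag.partition("-")
        if prefix in ("B", "I") and role in spans:
            spans[role].append(token)
    if not all(spans.values()):
        return None
    return (joiner.join(spans["E1"]),
            joiner.join(spans["REL"]),
            joiner.join(spans["E2"]))


# The example sentence from the text, shortened and tagged by hand.
tokens = ["the", "NIMBY", "movement", "has", "become", "a", "research", "hotspot"]
tags = ["O", "B-E1", "I-E1", "O", "B-REL", "O", "B-E2", "I-E2"]
triple = decode_triple(tokens, tags)
print(triple)
```

For Chinese text tagged character by character, passing `joiner=""` would concatenate the characters of each span without spaces.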

Evaluation index of experiment
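In this paper, the precision rate, recall rate and F-value are used as the evaluation indexes of the information extraction experiment. The precision rate is the proportion of correctly extracted words among all words extracted by the BERT model. The recall rate is the proportion of correctly extracted words among all words in the test data set that should be extracted. The F-value is taken here in its standard balanced form, the harmonic mean of the precision rate and the recall rate.
$$P = \frac{c_1}{c_2} \qquad (1)$$
$$R = \frac{c_1}{c_3} \qquad (2)$$
$$F = \frac{2PR}{P + R} \qquad (3)$$
In formulas (1)-(3), $P$ is the precision rate, $R$ is the recall rate, $F$ is the F-value, $c_1$ is the number of correctly extracted items, $c_2$ is the total number of extracted items, and $c_3$ is the number of items in the test data set that should be extracted.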
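The three evaluation indexes can be computed from the counts c1, c2 and c3 as in the following sketch. It assumes the F-value is the balanced harmonic mean 2PR/(P+R); the function name `precision_recall_f` and the example counts are hypothetical and are not results from the experiment.

```python
from typing import Tuple


def precision_recall_f(c1: int, c2: int, c3: int) -> Tuple[float, float, float]:
    """Return (P, R, F) from the counts defined in the text.

    c1: number of correctly extracted items
    c2: total number of extracted items
    c3: number of items in the test set that should be extracted
    """
    precision = c1 / c2 if c2 else 0.0
    recall = c1 / c3 if c3 else 0.0
    total = precision + recall
    # Assumption: F is the balanced harmonic mean of precision and recall.
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Made-up counts, for illustration only.
p, r, f = precision_recall_f(c1=90, c2=100, c3=120)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

The guards against zero denominators simply return 0.0 when nothing was extracted or nothing should have been; other conventions are possible.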

Experimental process
In this experiment, computation is carried out on a compute-enhanced cloud server. The CPU of the experimental environment is a 24-core Intel Cascade Lake at 3.0 GHz, and the RAM is 48 GB. First, the constructed train, development and test data sets are used as input. The BERT model, written in Python, is then trained on the train and development data sets to extract features. After that, the Chinese text in the test data set is marked according to the training results. Finally, the precision rate, recall rate and F-value of the experiment are calculated by comparing the marks predicted by the BERT model with the manual marks of the entity relation triples in the test data set.
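The final comparison step, matching predicted triples against manually marked triples, could be sketched as follows. The exact-match criterion, the function name `count_matches` and the toy triples are assumptions made for illustration; the paper does not specify how a prediction is judged correct.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def count_matches(predicted: Iterable[Triple],
                  gold: Iterable[Triple]) -> Tuple[int, int, int]:
    """Return (correct, extracted, expected) counts.

    Assumption: a predicted triple is correct only if it exactly matches
    a manually marked triple; duplicates are counted once.
    """
    predicted_set = set(predicted)
    gold_set = set(gold)
    correct = len(predicted_set & gold_set)
    return correct, len(predicted_set), len(gold_set)


# Toy data, invented for illustration only.
gold_triples = [
    ("NIMBY movement", "become", "research hotspot"),
    ("residents", "oppose", "incineration plant"),
    ("government", "publish", "siting plan"),
]
predicted_triples = [
    ("NIMBY movement", "become", "research hotspot"),
    ("residents", "oppose", "incineration plant"),
    ("government", "approve", "siting plan"),
]

counts = count_matches(predicted_triples, gold_triples)
print(counts)
```

The three counts returned here are the quantities from which the precision rate, recall rate and F-value are then calculated.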

Experimental result
The evaluation indexes of the experiment are shown in Table 2. The experimental results show that the precision rate, recall rate and F-value of the extracted entities are all above 91% when BERT is used to extract information from Chinese text about Waste Incineration NIMBY. According to these results, the information extraction method adopted in this paper extracts information from text about Waste Incineration NIMBY efficiently, basically meeting the experimental expectations.

Conclusion
This paper puts forward a Chinese entity relation information extraction method for text on Waste Incineration NIMBY, which can effectively extract entity relation triples from Chinese text, with an average accuracy of 91%. From the perspective of Waste Incineration NIMBY, the proposed method transforms unstructured Waste Incineration NIMBY information into organized, structured information, which is the basic work both of building a structured knowledge base of Waste Incineration NIMBY and of supporting decision-making for solving the Waste Incineration NIMBY problem. From the perspective of natural language processing, the proposed method shows that the BERT model also performs well in Chinese information extraction. Compared with the traditional rule-based extraction method, the accuracy is improved. The method has practical value for applying Chinese information extraction in other knowledge fields. In the future, we will carry out further research on the project and build the knowledge base of Waste Incineration NIMBY. At the same time, we will further study the application of the BERT model to Chinese information extraction, fine-tune and improve the model, further improve its precision and recall rates, and try to apply this method to other research fields.