ERNIE-based Named Entity Recognition Method for Traffic Accident Cases

Named entity recognition for traffic accident cases, which helps mine key information in traffic accident texts, plays a vital role in downstream tasks such as the construction of road traffic knowledge graphs and intelligent policing. In this paper, we construct a named entity recognition model based on EDE (Entity Data Enhancement)-ERNIE-Bidirectional Gated Recurrent Unit (BiGRU)-Conditional Random Field (CRF) to address the scarcity of traffic accident case data and the poor recognition of long-text entities. First, the amount of accident case data is increased using an entity random substitution method. Next, the text data of traffic accident cases are encoded as dynamic word vectors by the ERNIE pretraining model. Then, the BiGRU network learns long-distance dependencies in the text to improve the model's recognition of long-text entities. Finally, the output sequence is constrained by the CRF layer to complete the named entity recognition model. The experiments use data from real traffic accident cases in a domestic area, and the data enhancement method triples the original data volume. Experimental results show that the EDE-ERNIE-BiGRU-CRF model achieves better F1, recall, and precision than the BERT-BiGRU-CRF, ERNIE-BiGRU-CRF, ERNIE-BiLSTM-CRF, ERNIE-CRF, ERNIE, BiGRU-CRF, and RoBERTa-wwm-ext-BiGRU-CRF entity recognition methods, verifying its effectiveness for entity recognition in traffic accident cases.


Introduction
The transportation and automobile industries of China are undergoing rapid development, with the number of domestic motor vehicles increasing from 240 million in 2014 to 402 million in 2022; motor vehicles have become an indispensable part of daily life. While motor vehicles have brought convenience, they have also caused a series of traffic accidents. Traffic management generates a large quantity of unstructured or semi-structured text data for all types of traffic accidents. These textual data can support auxiliary adjudication and traffic safety prevention. However, much of their value remains untapped, because the large quantity of heterogeneous data is difficult to extract effectively and automatically. Named entity recognition is one of the main tasks in natural language processing; it aims to determine textual entity boundaries and classify entities into predefined categories [1]. Named entity recognition enables the extraction of key information from traffic accident case data and is expected to play an important role in road traffic accident analysis.
There are currently three main approaches to named entity recognition: rule-based approaches, statistical model-based approaches, and deep learning-based approaches. An example of a rule-based approach is the cTAKES project [2], which extracts information from electronic clinical cases using grammar and manually defined language rules. This method is simple to construct but relies on experts to constantly supplement the rules. A statistical model-based approach was demonstrated by Bikel et al. [3], who used Hidden Markov Models for entity recognition in English texts. However, this method depends on the dataset and fails to achieve good results for entity recognition in specific domains. Deep learning-based approaches are the mainstream and more effective line of research. For example, Huang et al. [4] combined a Bidirectional Long Short-Term Memory (BiLSTM) model with a conditional random field for entity extraction and achieved an F1 value of 88.83% on the CoNLL-2003 dataset. The BERT pretrained language model proposed by Devlin et al. [5] further improved the ability to characterize syntactic and semantic information.
With pretrained models widely used in natural language processing, named entity recognition in general-purpose domains has achieved good performance. Entity recognition for traffic accident cases is a specialized-domain task, and many challenges remain. Kumar et al. [6] built convolutional neural network models to recognize addresses for unexpected events such as traffic accidents. Prasad et al. [7] used BERT models for named entity recognition of traffic accident locations, replacing traditional GPS (Global Positioning System) positioning. There are few studies on accident case entity recognition for road traffic in China. Cheng et al. [8] used the BERT model as a named entity recognition approach to construct the ARTCDP (Automated Road Traffic Crash Data Platform) traffic accident data processing platform and produced structured accident data. Fan et al. [9] used the BERT-BiLSTM-CRF model to identify entities in the accident information on a safety management website. An analysis of these studies shows that these methods have problems: the pretrained language models are not tailored to the downstream task, and recognition of accident address-like entities is poor. To address these problems, this paper proposes an EDE-ERNIE-BiGRU-CRF-based named entity recognition model for traffic accident cases. The ERNIE model in this method uses an entity-level MASK approach in its training phase, which benefits the downstream named entity recognition task. Afterward, the dynamic word vectors generated by the ERNIE model are passed through the BiGRU network, which improves the recognition of accident address entities.
The main contributions of this paper are as follows:
(1) Considering the scarcity of traffic accident case data, a traffic accident case entity database is constructed from the existing training data, and the amount of data is increased by the entity random replacement method.
(2) Since language models such as BERT use a word-based masking strategy in the pretraining phase, which destroys the holistic nature of entities, the ERNIE model in this paper uses an entity-level masking strategy suited to the named entity recognition task.
(3) Considering the existence of long-text entities in traffic accident case data, the BiGRU network is used to help the model obtain more contextual semantic information and improve the recognition of long-text entities.

Model construction
The model has four main parts: the data enhancement layer, the ERNIE word embedding layer, the BiGRU network layer, and the CRF layer. Figure 1 shows the structure. First, in the data enhancement layer, the data are enhanced using the entity random replacement method and then converted into a vector representation according to the dictionary provided by the ERNIE model (E1, E2, ..., En in the figure). Next, the vectors are passed into the ERNIE word embedding layer, which converts the traffic accident case data into dynamic word vectors T1, T2, ..., Tn. Then, the BiGRU layer learns the bidirectional semantics of the text, improves the model's understanding of long texts, and outputs prediction scores for each label. Finally, the entity recognition results are obtained through the decoding and constraints of the CRF layer.

Data enhancement layer
Borrowing the idea of the easy data augmentation (EDA) method [10] and considering the characteristics of the named entity recognition task, the data are enhanced by randomly replacing only entities. The entity random replacement method has two main steps. First, the entities in each data item in the training and validation sets are extracted according to their labels and stored separately by entity category to build the entity database; simultaneously, each entity in the data is replaced with an "*entity category*" placeholder. Next, entities from the entity database are randomly filled into the positions of the same category to form new data, which are added to the original dataset. Each position goes through the entity filling process twice, so after enhancement, the data volume reaches three times the original. A schematic diagram of the method is shown in Figure 2.
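The replacement step above can be sketched as follows. This is a toy illustration with hypothetical English stand-in texts and entity values (the paper's real data are Chinese accident narratives); the function, category names, and example sentences are invented for demonstration only.

```python
import random

# Sketch of the entity random replacement method: entities are extracted
# into a per-category database, each entity position in the text is marked
# with a "*category*" placeholder, and every position is then filled with
# randomly chosen same-category entities to form new training samples.

entity_db = {  # hypothetical entity database built from the annotated data
    "LOC": ["Renmin Road", "Jianguo Avenue", "Xinhua Street"],
    "PER": ["Zhang San", "Li Si", "Wang Wu"],
}

template = "*PER* collided with a truck on *LOC*"  # text with placeholders

def augment(template, entity_db, times=2, seed=0):
    """Fill each placeholder with a random entity of the same category."""
    rng = random.Random(seed)
    samples = []
    for _ in range(times):  # each position is filled twice, per the paper
        text = template
        for cat, ents in entity_db.items():
            while f"*{cat}*" in text:
                text = text.replace(f"*{cat}*", rng.choice(ents), 1)
        samples.append(text)
    return samples

new_samples = augment(template, entity_db)
# original sample + two augmented copies -> three times the original volume
```

With `times=2`, each original sample yields two new samples, matching the tripling of data volume described above.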

ERNIE word embedding layer
Word embeddings produced by pretrained models such as ERNIE are contextually relevant: the ERNIE model represents the data as dynamic word vectors that fuse their contexts. The model in this paper uses ERNIE 3.0 [11], which achieved a score of 94.6 on the named entity recognition task of the SuperGLUE list, the highest score on that task.
The backbone network of the ERNIE 3.0 model consists of multiple layers of Transformer-XL [12], which adds a segment-level recurrence mechanism to the Transformer to help the model learn longer textual content.
The BERT model randomly masks some of the words in the data during the training phase and asks the model to predict the masked-out words from the remaining words. Unlike BERT, the ERNIE model incorporates an entity-level masking strategy. Figure 3 shows the schematic diagram.
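The contrast between the two strategies can be illustrated with a toy sketch (plain token strings rather than the subword ids real models operate on; the example sentence and span are invented):

```python
import random

# Toy contrast: BERT-style masking picks tokens independently, so an
# entity such as "Renmin Road" may be only partially masked; ERNIE-style
# entity-level masking masks every token of the entity span together.

tokens = ["The", "car", "hit", "a", "pole", "on", "Renmin", "Road"]
entity_spans = [(6, 8)]  # "Renmin Road" annotated as one LOC entity

def token_level_mask(tokens, rng):
    """BERT-style: mask one randomly chosen token (simplified)."""
    out = list(tokens)
    out[rng.randrange(len(out))] = "[MASK]"
    return out

def entity_level_mask(tokens, spans):
    """ERNIE-style: mask all tokens of a chosen entity span together."""
    out = list(tokens)
    start, end = spans[0]
    for i in range(start, end):
        out[i] = "[MASK]"
    return out

masked = entity_level_mask(tokens, entity_spans)
# -> ["The", "car", "hit", "a", "pole", "on", "[MASK]", "[MASK]"]
```

Because the whole entity is hidden, the model must predict it from context as a unit, which preserves the holistic nature of entities that the word-based strategy destroys.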

BiGRU network layer
Recurrent neural networks are often applied to tasks such as sequence labeling and classification in natural language processing. The Gated Recurrent Unit (GRU) is one of the commonly used recurrent neural networks and can learn and represent long-range dependencies well. The GRU is a simplification of the Long Short-Term Memory (LSTM) network, with comparable effectiveness but higher computational efficiency. The structure of the GRU network is shown in Figure 4; each GRU unit contains an update gate and a reset gate. In this paper, we extend the GRU network in two directions with a BiGRU network, forming a recurrent neural network that runs both forward and backward and concatenating the vectors obtained at each position into a new vector representation. The new vectors better capture the long-distance dependencies in the text, giving the model stronger representation capability.
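A single GRU step can be sketched with scalar toy weights (real cells operate on vectors and matrices; the weight values here are arbitrary illustrations, not trained parameters):

```python
import math

# Minimal scalar GRU step following the standard reset/update-gate
# recurrence referenced in Figure 4 (toy weights, scalar state).

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x_t, h_prev, w, u):
    """One GRU step with scalar weights w/u for each gate."""
    r_t = sigmoid(w["r"] * x_t + u["r"] * h_prev)                 # reset gate
    z_t = sigmoid(w["z"] * x_t + u["z"] * h_prev)                 # update gate
    h_tilde = math.tanh(w["h"] * x_t + u["h"] * (r_t * h_prev))   # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_tilde                   # new state

w = {"r": 0.5, "z": 0.5, "h": 1.0}
u = {"r": 0.5, "z": 0.5, "h": 1.0}

# A BiGRU runs this recurrence forward and backward over the sequence
# and concatenates the two hidden states at each position.
h = 0.0
for x in [1.0, -0.5, 0.25]:
    h = gru_step(x, h, w, u)
```

Because the candidate state passes through tanh and the update gate interpolates between old and new states, the hidden state stays bounded while still carrying information across many steps.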

CRF layer
CRF is a discriminative model commonly used in sequence labeling tasks such as named entity recognition, which mainly predicts the output sequence from the input sequence.In the sequence labeling task, a linear chain conditional random field is usually used, whose input and output are linear sequences.
The algorithm is represented by formulas (5)-(7):

score(x, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=2}^{n} A_{y_{i-1}, y_i}    (5)

P(y \mid x) = \frac{\exp(score(x, y))}{\sum_{\tilde{y} \in Y_x} \exp(score(x, \tilde{y}))}    (6)

y^{*} = \arg\max_{\tilde{y} \in Y_x} score(x, \tilde{y})    (7)

Among them: score(x, y) is the sequence evaluation score; A is the label transition score matrix; P is the label output score matrix obtained from the BiGRU network layer; Y_x is the set of possible label sequences; y* is the maximum-probability result obtained by the Viterbi algorithm.
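The path score and Viterbi decoding can be sketched as follows (toy transition and emission scores with only two hypothetical tags; a real CRF layer learns A during training):

```python
# Minimal CRF scoring and Viterbi decoding sketch: A is the label
# transition score matrix and P the per-position emission scores from
# the BiGRU layer (toy numbers; tags are "O" and "B-LOC" only).

tags = ["O", "B-LOC"]
A = [[0.5, -0.2],   # A[i][j]: score of transitioning from tag i to tag j
     [-0.3, 0.1]]
P = [[1.0, 0.2],    # P[t][j]: emission score of tag j at position t
     [0.3, 1.5],
     [0.8, 0.1]]

def score(P, A, y):
    """score(x, y) = sum of emissions plus transitions along path y."""
    s = sum(P[t][y[t]] for t in range(len(y)))
    s += sum(A[y[t - 1]][y[t]] for t in range(1, len(y)))
    return s

def viterbi(P, A):
    """Return y* = argmax over label paths by dynamic programming."""
    n, k = len(P), len(A)
    dp = [P[0][:]]                     # best score ending in each tag
    back = []                          # backpointers for path recovery
    for t in range(1, n):
        row, ptr = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: dp[-1][i] + A[i][j])
            row.append(dp[-1][best_i] + A[best_i][j] + P[t][j])
            ptr.append(best_i)
        dp.append(row)
        back.append(ptr)
    y = [max(range(k), key=lambda j: dp[-1][j])]
    for ptr in reversed(back):         # walk backpointers to the start
        y.append(ptr[y[-1]])
    return list(reversed(y))

path = viterbi(P, A)
```

At inference time, the decoded path is exact: it attains the same score as a brute-force search over all label sequences, while running in time linear in the sequence length.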

Experimental data preprocessing
The original data come from the text of real traffic accident cases in a domestic area; the text contains information such as the time of the accident, the names of the persons involved, the plate number, the type of number plate, and the address where the accident occurred. A (desensitized) data example is shown in Figure 5. First, the original data were cleaned and de-duplicated to obtain 1420 data items. Then, the traffic accident texts were annotated using the "BIO" [13] entity annotation method.
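Under the BIO scheme, B- marks the first token of an entity, I- its continuation, and O everything else. A toy illustration (English stand-in tokens and assumed category names, since the real data are Chinese):

```python
# Toy BIO-labelled sequence and a helper that recovers (category, text)
# pairs from it; the categories LOC/TIME here are assumptions for the demo.

tokens = ["Accident", "at", "Renmin", "Road", "at", "8", "a.m."]
labels = ["O", "O", "B-LOC", "I-LOC", "O", "B-TIME", "I-TIME"]

def extract_entities(tokens, labels):
    """Collect (category, text) pairs from a BIO-labelled sequence."""
    entities, current, cat = [], [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:                       # close any open entity
                entities.append((cat, " ".join(current)))
            current, cat = [tok], lab[2:]
        elif lab.startswith("I-") and current:
            current.append(tok)               # continue the open entity
        else:
            if current:
                entities.append((cat, " ".join(current)))
            current, cat = [], None
    if current:                               # entity ending at the last token
        entities.append((cat, " ".join(current)))
    return entities

ents = extract_entities(tokens, labels)
# -> [("LOC", "Renmin Road"), ("TIME", "8 a.m.")]
```

This is also the view the evaluation takes: an entity counts as correct only if both its boundary and its category match the gold annotation.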

Experimental data enhancement
First, all entities in the training and test sets are extracted, and the original label positions are replaced by "*entity category*" placeholders. Then, the entities in the traffic accident case data are stored in a local file to form a traffic accident case entity database. Finally, entities are randomly selected from the entity database and filled into the corresponding positions in the data; each position is filled twice, expanding the training set to three times the original. Figure 6 shows the number of each type of entity in the traffic accident case entity database.

Evaluation Metrics
The named entity recognition task usually has many "O" labels, which results in an imbalance of category labels. Thus, recall (R), precision (P), and the F1 score are usually used as evaluation metrics for this task. The calculation is represented by formulas (8)-(10):

P = \frac{TP}{TP + FP}    (8)

R = \frac{TP}{TP + FN}    (9)

F1 = \frac{2PR}{P + R}    (10)

Among them: TP is the number of entities correctly identified by the model; FP is the number of entities incorrectly identified by the model; FN is the number of correct entities not recognized by the model.
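Formulas (8)-(10) translate directly into code (the counts below are arbitrary toy numbers for illustration):

```python
# Precision, recall, and F1 computed from TP/FP/FN counts,
# following formulas (8)-(10).

def prf1(tp, fp, fn):
    """Return (precision, recall, F1) from entity-level counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = prf1(tp=80, fp=10, fn=20)
# p = 80/90 ≈ 0.889, r = 80/100 = 0.8, f1 = 16/19 ≈ 0.842
```

F1 is the harmonic mean of precision and recall, so it penalizes a model that trades one metric heavily against the other, which is why it is the headline metric in the comparisons below.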

Experimental environment and parameter settings
The experimental environment for the model in this paper was PaddlePaddle 2.3.2 and Python 3.7.4. The selected pretraining model was ERNIE-3.0-base with a maximum text input length of 2048, and the AdamW optimizer was used. Table 1 shows the specific model training parameters.

Analysis of experimental results
Following the above experimental parameters, the accident case data were used to train the EDE-ERNIE-BiGRU-CRF model. Seven models (ERNIE-BiGRU-CRF, ERNIE-BiLSTM-CRF, ERNIE-CRF, ERNIE, BERT-BiGRU-CRF, RoBERTa-wwm-ext-BiGRU-CRF, and BiGRU-CRF) were trained under identical experimental conditions for comparison to verify the model of this paper. Table 2 and Figure 7 show the entity recognition results of the eight models on the traffic accident case data. As Table 2 and Figure 7 show, word embedding with the ERNIE, BERT, and RoBERTa-wwm-ext pretrained language models effectively improves model performance; the F1 values are generally approximately 14.5% higher than those of traditional deep learning models such as BiGRU-CRF, which indicates that pretrained language models better characterize the semantic information of the context. Combining all metrics, the EDE-ERNIE-BiGRU-CRF model has good entity recognition. To further explore the model's performance on different entity types, the seqeval tool was introduced to improve the evaluation function of the PaddlePaddle framework. Table 3 shows the F1 values of the eight models for each type of entity recognition on the dataset.
As Table 3 shows, the EDE-ERNIE-BiGRU-CRF model recognizes all types of entities better than the other models. Compared with the ERNIE-BiGRU-CRF model, our model improved the F1 values of NUM and PER entity recognition by 4.91% and 5.26%, respectively, owing to the entity data enhancement. Comparing the ERNIE-BiGRU-CRF model with the ERNIE-CRF model shows that, because LOC and NUM entities are usually long-text entities, the BiGRU network can learn and capture longer sequential information well [14], which improves the overall entity recognition of the model.

Application Analysis
Based on the identification of traffic accident case entities, this paper designed and implemented an applet-based application for "quick entry of traffic accident information." Following the requirements of the "Accident Express" module in the "12123" application, the accident information and the information of the parties involved and their vehicles are filled in by entering the accident case description. This feature enables traffic police to focus on recording accident information on the spot, reduces repeated entry of the same information, and consequently improves the efficiency of traffic police enforcement. The implementation first exports a static graph model through the PaddlePaddle framework, then uses the Paddle Inference tool to expose it as an API, and finally calls the API from the applet. Figure 8 demonstrates the effect.

Conclusions
This paper proposed a traffic accident case entity recognition method based on the EDE-ERNIE-BiGRU-CRF model. The method uses entity random substitution to enhance the data, transforms the traffic accident case data into a dynamic vector representation with the ERNIE model, uses the BiGRU network to capture the contextual information of the text, and obtains the final results through the constraints of the CRF layer. The experimental results show that the EDE-ERNIE-BiGRU-CRF model has higher precision, recall, and F1 value than commonly used entity recognition models. However, due to the limited number of entity categories in traffic accident case data, the helpfulness for traffic accident analysis remains limited at present. Thus, more traffic

Figure 2. Schematic diagram of the data enhancement process.

Figure 3. Schematic diagram of the entity-level masking strategy.

Figure 4. Unit structure of the GRU. Among them, W and U are the weight parameter matrices; σ is the sigmoid activation function; x_t is the input of the unit at the current position; h_{t-1} is the state at the previous position; h_t is the state at the current position; r_t is the operation of the reset gate; z_t is the operation of the update gate. The specific unit calculation can be expressed by formulas (1)-(4):

r_t = \sigma(W_r x_t + U_r h_{t-1})    (1)

z_t = \sigma(W_z x_t + U_z h_{t-1})    (2)

\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}))    (3)

h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t    (4)

where \tilde{h}_t is the candidate state and \odot denotes element-wise multiplication.

Figure 5. Example of traffic accident case data.

Figure 6. Statistical chart of the number of entities.

Table 1. Model training parameters.

Table 2. Entity recognition results of the models.

Table 3. F1 scores of the models for each type of entity.