Event Detection with Word-Word Relation Classification

Event detection (ED) is a critical task in information extraction, aiming to identify event triggers and classify their types. Current research has increasingly focused on fine-grained event types, where a single sentence may contain multiple trigger words and event types. Previous models considered only sentence-level features, neglecting word-word features and word positional information in the text. We propose a novel labeling scheme that treats event detection as a word-word relation recognition task: we first identify relationships between word pairs and then use these relationships to detect triggers and event types. This method allows us to identify triggers and types in a sentence efficiently and concurrently. Leveraging word-pair relationships effectively addresses scenarios in which multiple trigger words appear in a single sentence. Experimental results demonstrate that our approach outperforms several baseline models.


Introduction
Event detection (ED) aims to identify event triggers and classify them into predefined event types. Figure 1 illustrates two event types, where the trigger words "demonstrate" and "war" activate the events "demonstrate" and "attack", respectively.

Figure 1. An example of event detection, where the trigger words "demonstrate" and "war" activate the events "demonstrate" and "attack", respectively.
In practical applications, events are often annotated with numerous fine-grained types, leading to scenarios where a single sentence contains multiple trigger words and event types. Existing models usually select candidate trigger words from a sentence and use a single threshold to decide whether each is a trigger. This approach can miss some triggers when multiple triggers are present in a sentence. In contrast, this paper establishes a relationship for each word pair and then determines the presence of a relationship, representing an event type, based on a predefined relationship score. The advantage of this approach lies in its ability to determine the relationship for each word pair independently via a threshold, without being influenced by other word pairs. As a result, it enables effective fine-grained event detection.
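The independent per-pair thresholding described above can be sketched as follows. This is a minimal illustration with random scores; the array names, the threshold value, and the score source are all assumptions, not details from the paper.

```python
import numpy as np

# Hypothetical sketch: independent thresholding of word-pair relation scores.
np.random.seed(0)
N = 5                      # sentence length
num_types = 3              # number of event types
THRESHOLD = 0.5            # illustrative decision threshold

# scores[i, j, t]: score that word pair (word_i, word_j) holds relation t
scores = np.random.rand(N, N, num_types)

# Each pair/type decision is made independently of every other pair,
# so multiple triggers in one sentence cannot suppress each other.
predictions = scores > THRESHOLD

# Collect predicted (i, j, event_type) relation triples
relations = [(i, j, t) for i, j, t in zip(*np.nonzero(predictions))]
print(len(relations))
```

Because every cell of the grid is compared with the threshold in isolation, a sentence with several triggers simply yields several independent positive cells.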
Moreover, most existing event detection models adopt a two-step process: they first identify event trigger words and then classify the identified triggers. However, this method relies predominantly on the trigger words for event-type selection, overlooking the interconnectedness between individual words in the sentence. This oversight may adversely affect the accuracy of event-type classification.
To address these issues, we present a one-step word-word event detection model. We employ grid construction to comprehensively learn the connections between tokens while simultaneously identifying event triggers and types. The process begins by encoding the sentence with BERT; the encoded hidden states are then fed into a convolutional layer comprising a grid representation module and multigranular dilated convolutions. The tokens are thus transformed into grid representations that capture the relationships between words. Next, we perform word-pair type classification: event types are predicted with a biaffine classifier, while the relationships between word pairs derived from the grid representation are also considered. Finally, the relationships are classified based on the scores obtained from both classifiers.
Our model uses the grid representation to learn relationships between words, which facilitates the classification of word-pair relationships and, in turn, event-type detection. During the classification of word-pair relationships, we also identify the trigger words. This one-step process reduces error propagation and improves the accuracy of event detection.
Our contributions are as follows:
- By treating event detection as a word-word relation recognition task, we effectively utilize lexical-level features and word positional information in the text.
- We introduce a word-pair labeling strategy to tackle the challenges of fine-grained event detection and to address cases where a single sentence contains multiple trigger words.
- Experimental results demonstrate the superiority of our approach over several baseline models in event detection.

Related Work
Event detection (ED) aims to identify event triggers and classify them into predefined types.
In the initial stages, feature-based approaches relied on linguistic features and manually designed task-specific features [1], which frequently resulted in cascading errors. Hong et al. [2] developed a cross-entity event detection model that utilized dependency relationships between various entities and trigger words. Li et al. [3] employed global features to capture the dependency relationships between diverse event trigger words and arguments.
Nguyen et al. [4] pioneered the use of convolutional neural networks for event detection, mitigating the cascading errors caused by feature extraction in traditional methods. More recently, one line of work adopts a machine reading comprehension paradigm, where event types are treated as questions and abundant type information is used for prediction. However, these methods rely on external information for event detection and overlook the interrelationships between event types.
Most existing event detection models are limited by their exclusive focus on sentence-level features, disregarding lexical-level features and word positional information in the text.
We investigate a novel approach that models word-pair relationships to address the event detection problem. This method efficiently captures the relationships between the boundary words and internal words of entities, leading to a more comprehensive understanding of events.

Our approach
The framework's architecture is depicted in Figure 2 and primarily comprises three components. First, we use the BERT encoder to generate contextualized word representations, extracting information from the input sentences. To achieve more precise word-pair relationship classification, we divide the process into two modules that each produce word-pair relationship scores. One module employs convolutional layers to construct and refine grid representations of word pairs, enabling subsequent word-word relationship classification. The other module directly uses a multilayer perceptron (MLP) to obtain word representations and derive word-relationship scores. By combining these two relationship scores, we can effectively identify trigger words and event types.
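The score combination above can be illustrated with a small sketch. The fusion operation (simple addition) and all variable names are assumptions for illustration; the paper does not specify the exact fusion.

```python
import numpy as np

# Minimal sketch of combining the two relation-score modules, assuming
# additive fusion of per-pair scores from both branches.
np.random.seed(1)
N, num_types = 4, 2

conv_scores = np.random.randn(N, N, num_types)  # from the convolutional grid module
mlp_scores = np.random.randn(N, N, num_types)   # from the MLP-based module

combined = conv_scores + mlp_scores             # fused word-pair relation scores
pred_type = combined.argmax(-1)                 # highest-scoring relation per word pair
print(pred_type.shape)
```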

Encoder Layer
BERT serves as the encoder for our model. BERT generates context-based text representations conditioned on tokens while preserving rich textual information. Let a sentence with N tokens be denoted as X = {x_1, x_2, ..., x_N}. These tokens are input into the BERT model, yielding hidden states H = {h_1, h_2, ..., h_N} ∈ R^{N×d}, which serve as the token representations for downstream subtasks. Here, d denotes the dimension of the word representations.
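The shapes involved can be made concrete with a small sketch. A real run would obtain H from BERT (e.g. via the Hugging Face transformers library); here random vectors stand in for the encoder output, and 768 is assumed as the hidden size (BERT-base).

```python
import numpy as np

# Stand-in for the encoder output: a sentence with N tokens is mapped to
# hidden states H of shape (N, d).
N, d = 6, 768                      # 768 = BERT-base hidden size (assumption)
H = np.random.randn(N, d)          # H = {h_1, ..., h_N} ∈ R^{N×d}

h_1 = H[0]                         # representation of the first token
print(H.shape, h_1.shape)
```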

Convolutional layer
To establish and enhance the grid representation between words, we introduce convolutional layers. These consist of two modules: the grid representation module and the multigranular dilated convolution module.

Grid Representation
To build a word-interaction network with high-quality representations, we incorporate conditional layer normalization (CLN) to enhance the learning of information between distinct words. CLN builds upon layer normalization but adaptively generates the scaling factor γ and bias β from conditional information. Given token representations h_i and h_j, the CLN can be formally defined as:

C_ij = CLN(h_i, h_j) = γ_i ⊙ ((h_j − μ) / σ) + β_i,

where γ_i = W_γ h_i + b_γ and β_i = W_β h_i + b_β are generated from the condition h_i, and μ and σ denote the mean and standard deviation, respectively, of the elements in h_j.
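A minimal numpy sketch of CLN, assuming the standard formulation in which the scale and bias are linear functions of the condition vector; the weight initializations here are illustrative only.

```python
import numpy as np

def cln(h_i, h_j, W_gamma, b_gamma, W_beta, b_beta, eps=1e-6):
    """Conditional layer normalization: normalize h_j, with scale and bias
    generated adaptively from the condition vector h_i."""
    gamma = W_gamma @ h_i + b_gamma          # condition-dependent scale γ_i
    beta = W_beta @ h_i + b_beta             # condition-dependent bias β_i
    mu = h_j.mean()                          # mean of elements of h_j
    sigma = h_j.std()                        # std of elements of h_j
    return gamma * (h_j - mu) / (sigma + eps) + beta

d = 8
rng = np.random.default_rng(0)
h_i, h_j = rng.normal(size=d), rng.normal(size=d)
W_g, W_b = rng.normal(size=(d, d)), rng.normal(size=(d, d))
out = cln(h_i, h_j, W_g, np.zeros(d), W_b, np.zeros(d))
print(out.shape)
```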
We employ an MLP to combine the word-pair information C with the word positional (relative-distance) information E^d, thus establishing the initial grid representation.
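The grid construction step can be sketched as follows. The embedding lookup for relative distances, the concatenation-then-MLP fusion, and all layer sizes are assumptions made for illustration.

```python
import numpy as np

# Sketch: for every word pair (i, j), concatenate the pair interaction
# vector (e.g. the CLN output C) with a relative-distance embedding E^d,
# then pass the result through a one-layer MLP.
rng = np.random.default_rng(0)
N, d, d_pos, d_grid = 5, 16, 4, 8

C = rng.normal(size=(N, N, d))              # word-pair interaction features
dist_table = rng.normal(size=(2 * N, d_pos))  # embedding table for distances

# Look up the relative-distance embedding for each pair (i, j)
idx = np.arange(N)[:, None] - np.arange(N)[None, :] + N  # shift to non-negative
Ed = dist_table[idx]                        # shape (N, N, d_pos)

# One-layer MLP (ReLU) fusing word information with positional information
W = rng.normal(size=(d + d_pos, d_grid))
grid = np.maximum(np.concatenate([C, Ed], axis=-1) @ W, 0)
print(grid.shape)
```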

Multigranular Dilated Convolution
To capture the interrelationships between words at different distances, we use several two-dimensional dilated convolutions with varying dilation rates; each dilated convolution module extracts information at a specific distance. The computation of a dilated convolution can be formulated as:

D_l = σ(DConv_l(C)), l ∈ {1, 2, 3},

where D_l denotes the output of the two-dimensional dilated convolution with dilation rate l, and σ denotes the GELU activation function.
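To make the effect of the dilation rate concrete, here is a toy single-channel 2D dilated convolution in numpy (valid padding, no activation). A real model would use a deep-learning framework's convolution layer; this sketch only shows how larger dilation rates sample input positions farther apart.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """Single-channel 2D dilated convolution with valid padding: the kernel
    taps are spaced `dilation` cells apart, enlarging the receptive field."""
    k = kernel.shape[0]
    span = dilation * (k - 1)                # receptive-field span per axis
    H, W = x.shape
    out = np.zeros((H - span, W - span))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + span + 1:dilation, j:j + span + 1:dilation]
            out[i, j] = (patch * kernel).sum()
    return out

x = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3))
out1 = dilated_conv2d(x, kernel, dilation=1)   # nearby word pairs
out2 = dilated_conv2d(x, kernel, dilation=2)   # more distant word pairs
print(out1.shape, out2.shape)
```

With dilation 1 the 3×3 kernel covers a 3×3 window; with dilation 2 it covers a 5×5 window with the same number of parameters, which is how different dilation rates capture word-pair interactions at different distances.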

Baselines
- TriggerQA [5] adopts a question-answering formulation to accomplish the task.
- TANL [6] treats multitasking as translating between natural language and designed annotations for prediction structures.
- Text2Event [7] employs a Seq2Seq model to generate manually designed structures for each event.
- PoKE [8] proposes multiple joint prompting methods, which model interactions between different trigger words or arguments to elicit more complementary knowledge.
- GDAP [9] facilitates the automatic utilization of label semantics in prompt templates.

Main Results
We conduct a comparative analysis with five baseline models. The experimental results are shown in Table 1.
Our model achieves the highest F1 score. This superiority can be attributed to its comprehensive consideration of lexical-level features and lexical positional information. Compared with the GDAP model, our model demonstrates significant improvements: a 5.47% increase in precision, a 0.94% increase in recall, and a 3.38% increase in F1 score.
The GDAP model adopts a template-based approach with separate templates for different event types, so template quality affects the extraction results. In contrast, our word-word-based model employs a unified approach to predict all event types, leading to overall improvements in event detection.
Moreover, compared with TriggerQA, our model shows a 0.88% improvement in F1 score. Our model incorporates word-pair relationships for trigger word and event-type prediction, whereas TriggerQA relies solely on sentence-level features. By considering more interactions between words, our model achieves a higher F1 score and superior performance.

Conclusion
This paper treats event detection as a word-word relation recognition task, which allows efficient, parallel extraction of word-word relations for event trigger words. Our model incorporates a novel word-pair labeling strategy and a score-based classifier. By utilizing a grid representation, the model learns relationships between words and facilitates the classification of word-pair relations, enabling event-type detection. Through this word-pair relation classification, trigger word labeling is accomplished simultaneously, achieving event detection in a single step and reducing error propagation, which improves event detection accuracy. We evaluate our method on the ACE 2005 dataset and demonstrate its superiority over several baseline models. However, our model does not yet address the issue of overlapping trigger words; we plan to tackle this challenge in future work.