A Summary of Research on Intelligent Dialogue Systems

Dialogue systems have long been an important research direction in natural language processing. In recent years, their potential and commercial value have attracted attention, and technologies such as intelligent customer service and chatbots have gradually become popular. This paper first outlines the concept and classification of dialogue systems; it then introduces their constituent modules, focusing on the key technologies and mainstream ideas; finally, it summarizes the problems encountered at this stage and looks ahead to future research directions.


Introduction
Science fiction movies often feature highly intelligent virtual assistants that talk freely with their owners and help them, conveying a strong sense of technology. With the advent of deep learning and related technologies, creating a robot that can chat with us is no longer a fantasy. First, in terms of data sources, a large amount of dialogue data is available on the Internet, including film and television works, articles, and novels. Second, deep learning has been validated on many natural language processing tasks and has performed very well on many complex ones. As a result, a large body of literature on building dialogue systems with deep learning has emerged.

Classification of dialogue systems
A dialogue system can be understood as the use of machine learning, deep learning, and other advanced technologies to make a machine understand human language, correctly recognize the intent behind it, and communicate with or respond to humans freely [1]. The application value of this technology lies in replacing manual work in highly repetitive operations and in producing replies efficiently and to a high standard.
Dialogue systems can be divided into two types according to the number of dialogue rounds [1]: single-round dialogue systems and multi-round dialogue systems. According to the purpose of the dialogue, they can be divided into task-oriented and non-task-oriented dialogue systems, which can also be summarized as task-oriented and small-talk dialogue systems. This article mainly studies non-task-oriented dialogue systems, which can be further divided by mode into two types: retrieval and generative [2].

Retrieval dialogue model
The retrieval model requires a response database. Based on heuristic inference rules, the input, and the context features, it retrieves the most suitable response from the database as output. The advantage of this method is clear: because the response database is built by hand, the returned sentences contain no grammatical errors as long as the quality of the database is ensured [2]. However, there are also certain flaws. For knowledge not covered by the database, the retrieval model cannot respond; it cannot generate any new text, and it cannot be applied to scenarios that have no predefined response.

ICAITA 2020, Journal of Physics: Conference Series 1651 (2020) 012020, IOP Publishing, doi:10.1088/1742-6596/1651/1/012020
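The retrieval procedure in this subsection can be illustrated with a minimal sketch. The response database, the word-overlap (Jaccard) score, and the threshold are all assumptions chosen for illustration; a real system would use a stronger matching model. The names `RESPONSE_DB`, `tokenize`, `jaccard`, and `retrieve` are hypothetical.

```python
import re

# Hypothetical hand-built response database: (stored question, stored reply).
RESPONSE_DB = [
    ("what are your opening hours", "We are open from 9 am to 6 pm."),
    ("how do i reset my password", "Use the 'Forgot password' link on the login page."),
    ("where is my order", "You can track your order on the 'My orders' page."),
]


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, db=RESPONSE_DB, threshold=0.3):
    """Return the reply whose stored question best matches the query.

    Returns None when no entry reaches the threshold (an assumed cut-off),
    which reflects the limitation of retrieval models: inputs that the
    database does not cover cannot be answered.
    """
    query_tokens = tokenize(query)
    best_score, best_reply = 0.0, None
    for question, reply in db:
        score = jaccard(query_tokens, tokenize(question))
        if score > best_score:
            best_score, best_reply = score, reply
    return best_reply if best_score >= threshold else None
```

Because every reply is copied verbatim from the database, the output is always well formed, but a query such as "Tell me a joke about cats" shares no words with any stored question and therefore yields no reply.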

Generative dialogue model
Compared with the retrieval model, the generative model does not depend on a pre-built set of responses; instead, the model learns from the database to generate new responses. The advantage of this method is that it answers more intelligently and is closer to human conversation [3]. However, training such a model is more difficult. First, training demands substantial hardware resources. Second, learning useful features requires a large amount of data. Finally, grammatical errors may appear when the output is long.

Retrieval dialogue model
Retrieval dialogue refers to selecting an appropriate answer from the response database as output through searching and matching. The general construction method is as follows: first, build a response database that contains a large amount of data; then match the input against it using information retrieval and other methods [4]; finally, produce the output according to the matching scores of the candidate answers.

To address the low utilization of dialogue interaction information in retrieval dialogue models, Tao et al. [4] proposed a deep retrieval model in 2019, which defines an interaction-over-interaction (IoI) network. The network is composed of interaction blocks, each containing a self-attention module, an interaction module, and a compression module. The self-attention module extracts the dependencies within the question or within the reply; the interaction module models the dependencies between the question and the reply [4]; the compression module combines the results of the first two modules. Let $U^{k-1}$ and $R^{k-1}$ be the inputs to the $k$-th block, with $U^0 = E_u$ and $R^0 = E_r$. The self-attention module is applied to $U^{k-1}$ and $R^{k-1}$ separately. The interaction module first lets $U^{k-1}$ and $R^{k-1}$ attend to each other; after that, $U^{k-1}$ and $R^{k-1}$ each interact with the corresponding result. Finally, the compression module updates $U^{k-1}$ and $R^{k-1}$ to $U^k$ and $R^k$, which form the output of the block.

In the $k$-th interaction block, three similarity matrices are constructed. These matrices are stacked into a three-dimensional tensor $T^k_i$, and a convolutional neural network extracts matching features from it. The resulting features are mapped to a $d$-dimensional vector $V^k_i$ by a linear transformation, a GRU captures the temporal relationships, and the hidden state is passed to a sigmoid function to obtain the final matching score. The model is validated on three data sets. The IoI model performs better at a moderate interaction depth, and compared with existing models its accuracy improves by 2%.
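The data flow through one interaction block can be sketched as follows. This is an illustration under stated assumptions, not the formulation in [4]: attention here is plain scaled dot-product attention without learned parameters, the interaction step is an element-wise product, and the compression step is assumed to be an average of the intermediate results plus a residual connection. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap the rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Scaled dot-product attention with the memory as both keys and values."""
    scale = math.sqrt(len(query[0]))
    scores = matmul(query, transpose(memory))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, memory)


def hadamard(a, b):
    """Element-wise product of two matrices of the same shape."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def compress(prev, *views):
    """Assumed compression: average the views, then add the block input."""
    return [
        [prev[i][j] + sum(v[i][j] for v in views) / len(views) for j in range(len(prev[i]))]
        for i in range(len(prev))
    ]


def interaction_block(u_prev, r_prev):
    """One block: self-attention, interaction, then compression."""
    u_self, r_self = attend(u_prev, u_prev), attend(r_prev, r_prev)
    u_cross, r_cross = attend(u_prev, r_prev), attend(r_prev, u_prev)
    u_inter, r_inter = hadamard(u_prev, u_cross), hadamard(r_prev, r_cross)
    u_next = compress(u_prev, u_self, u_cross, u_inter)
    r_next = compress(r_prev, r_self, r_cross, r_inter)
    return u_next, r_next


# Toy embeddings: 3 question tokens and 2 reply tokens with d = 2.
E_u = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
E_r = [[0.5, 0.5], [1.0, 0.0]]
U, R = E_u, E_r
for _ in range(2):  # stack two blocks; the depth is arbitrary here
    U, R = interaction_block(U, R)
```

Each block keeps the shapes of the question and reply representations unchanged, which is what allows blocks to be stacked to the desired interaction depth.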

Generative dialogue model
Generative dialogue can be regarded as a seq2seq problem. However, the main reason the seq2seq paradigm succeeds in machine translation is that the output space corresponding to a given input is limited [5]. If the same technology is used to build a dialogue model, two problems arise. The first is that the model cannot attend to the contextual connection. For example, if the previous input is "I am sick today" and the current input is "what to do next", the reply "go to play basketball together" is correct for the current input but does not take the preceding context into account at all. In fact, the reply about playing basketball should not appear in the output space. The second is that attributes such as gender, age, occupation, status, and income affect the output space; these conditions constrain the space of possible replies.

In Figure 4, the upper part is the encoder framework and the lower part is the decoder framework. The overall framework consists of three parts: a self-attention encoder, an incremental encoder, and a collaborative decoder. The self-attention encoder independently encodes the document knowledge and the current dialogue; the incremental encoder encodes multiple rounds of dialogue incrementally and uses the attention mechanism to build a context-sensitive representation of the document knowledge [7]; the collaborative decoder is a two-pass decoder, in which the first decoding pass relies on the dialogue context to generate a response and the second decoding pass uses the document knowledge for further correction.
Encoder construction steps: the multi-head attention method is first used to merge and encode the document knowledge and the contextual dialogue.
Decoder construction steps: the first and second decoding passes share the same network structure, which is composed of four sublayers [8]:
a. multi-head self-attention;
b. multi-head context attention;
c. multi-head utterance attention;
d. position-wise fully connected feed-forward network.
A softmax function is then used to obtain the word probabilities of the first decoding pass. In the second decoding pass, it is only necessary to replace the historical dialogue information in the second sublayer with the current document representation, and the current-round sentence information in the third sublayer with the output of the first decoding pass [9]; the final output is produced after correction with the document knowledge.
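The way the two decoding passes reuse the same four sublayers with different inputs can be sketched as follows. This is a simplified illustration, not the implementation in [8] or [9]: it assumes single-head attention without learned projections, omits layer normalization and the causal mask, and replaces the feed-forward sublayer with a bare ReLU. All function names are hypothetical.

```python
import math


def attend(query, memory):
    """Single-head scaled dot-product attention; memory is both keys and values."""
    scale = math.sqrt(len(query[0]))
    out = []
    for q in query:
        scores = [sum(a * b for a, b in zip(q, m)) / scale for m in memory]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * m[j] for w, m in zip(weights, memory)) for j in range(len(q))])
    return out


def add(a, b):
    """Element-wise sum, used here as a residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def feed_forward(x):
    """Stand-in for the position-wise feed-forward sublayer (ReLU only)."""
    return [[max(0.0, v) for v in row] for row in x]


def decoder_pass(target, second_memory, third_memory):
    """Apply the four sublayers in order; only the two memories differ per pass."""
    h = add(target, attend(target, target))   # a. self-attention
    h = add(h, attend(h, second_memory))      # b. second sublayer
    h = add(h, attend(h, third_memory))       # c. third sublayer
    return feed_forward(h)                    # d. feed-forward


def two_pass_decode(target, context, utterance, document):
    """First pass uses the dialogue only; second pass adds document knowledge."""
    first = decoder_pass(target, context, utterance)
    # Second pass: sublayer b reads the document, sublayer c reads the first-pass output.
    second = decoder_pass(target, document, first)
    return first, second
```

In this sketch the first-pass output does not depend on the document at all, so any change in the document representation can only affect the second, corrective pass.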
The change in the decoder part of the model reduces the perplexity of the results and improves the performance of the overall model [10].
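Perplexity, the metric mentioned above, is the exponential of the average negative log-likelihood that the model assigns to the reference tokens: $\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{t=1}^{N}\log p(y_t)\right)$. The sketch below computes it from per-step decoder logits; the function names and the toy inputs are assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one vector of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def perplexity(step_logits, target_ids):
    """Exponential of the mean negative log-likelihood of the reference tokens."""
    if len(step_logits) != len(target_ids) or not target_ids:
        raise ValueError("need one logit vector per reference token, and at least one token")
    nll = 0.0
    for logits, target in zip(step_logits, target_ids):
        nll -= math.log(softmax(logits)[target])
    return math.exp(nll / len(target_ids))
```

A model that spreads its probability uniformly over a vocabulary of four words has a perplexity of 4, and the value falls toward 1 as the model puts more probability on the reference tokens.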

Conclusion and Future Work
Judging from the current technical level, retrieval dialogue systems are realized more successfully, while dialogue systems overall remain less effective because of contextual and similar problems. As far as commercial applications are concerned, the prospects of generative dialogue systems are very broad, and they represent the current development trend of dialogue systems. The follow-up plan is to build on the above models and to study and integrate ELMo, BERT, and other models in order to achieve a high-performance dialogue system.