A New Training Idea for Machine Learning in NLP

Natural language processing (NLP) is an important direction in the field of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Generally speaking, NLP covers areas such as text retrieval, sentiment analysis, semantic role labeling, machine translation, and dialogue systems. Although artificial intelligence has achieved a certain degree of success in dialogue systems, there is still room for improvement. This article proposes a way to handle multi-turn dialogue between human and machine, so that the machine can take more of the surrounding conversational content into account and its performance comes closer to real human dialogue.


Introduction
Natural language processing, as an important branch of computer science and artificial intelligence, studies theories and methods that realize information interaction between humans and computers through natural language. It is a science that integrates linguistics, computer science, and mathematics. It is not a general study of natural language, but the development of computer systems, especially software systems, that can effectively realize natural language communication; in this sense it is a part of computer science.
Using natural language to communicate with computers is something people have long pursued. It has obvious practical significance as well as important theoretical significance. Among the various branches of natural language processing, dialogue systems and question answering systems are arguably the areas closest to the ultimate goal of NLP. At present, there has been considerable progress in the research and application of conversational systems used for small talk; for example, Microsoft Xiaobing, Apple's Siri, and Xiaomi's Classmate AI are all in practical use. The study of human-computer dialogue will not only help humans understand their own language further, but also, in the process, bring computers closer to true "intelligence" and enable them to achieve more complex functions.

Basic Technology-Neural Networks
The basic neural network maps the input layer to a hidden layer as H = σ(XW + b) (where H, X, W, b are all matrices and σ is the activation function). The process from one hidden layer to the next hidden layer is basically the same as from the input layer to the hidden layer. From the hidden layer to the output layer, the network computes Y = softmax(H'W' + b') and is trained with the cross-entropy loss [2,3].
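The layer equations above can be sketched in plain NumPy. This is a minimal illustration; the layer sizes, random weights, and sigmoid choice of activation are assumptions for the example, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma(z):
    """Sigmoid activation, one common choice for the activation function."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, target_idx):
    """Cross-entropy loss for a single one-hot target."""
    return -np.log(probs[target_idx])

# Input layer -> hidden layer: H = sigma(X W + b)
X = rng.normal(size=(4,))          # one input vector with 4 features
W = rng.normal(size=(4, 8))        # input-to-hidden weights
b = np.zeros(8)
H = sigma(X @ W + b)

# Hidden layer -> output layer: Y = softmax(H W' + b'), scored by cross-entropy
W2 = rng.normal(size=(8, 3))       # hidden-to-output weights, 3 classes
b2 = np.zeros(3)
Y = softmax(H @ W2 + b2)
loss = cross_entropy(Y, target_idx=1)
```

In training, the gradient of this cross-entropy loss would be backpropagated through both layers to update W, b, W2, and b2.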

CNN, RNN
On the basis of the basic neural network, people have continuously improved the algorithms and training methods, so many types of neural networks have appeared. In the direction of image recognition, the Convolutional Neural Network (CNN) is more commonly used. Unlike a basic neural network, which flattens the entire image into a vector, a convolutional neural network does not ignore the position and structure information of the image, and can make better use of that structure to achieve a better analysis of the image. In the field of Natural Language Processing (NLP), the more primitive and basic models are Convolutional Neural Networks and Recurrent Neural Networks (RNN). The characteristic of the RNN is that, when analyzing sequence data, it pays more attention than a basic neural network to the correlation between elements of the sequence, instead of analyzing each element separately. This is determined by the structure of the RNN: for an input sequence [X_1, X_2, ..., X_i, ..., X_n], the recurrence is H_i = f(X_i, H_{i-1}), from which it is easy to see that the RNN captures the correlation within the sequence through H, which is also the reason why RNNs are more common in NLP [4].
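The recurrence H_i = f(X_i, H_{i-1}) can be sketched as follows. The tanh nonlinearity and all weight shapes are illustrative assumptions for a vanilla RNN cell.

```python
import numpy as np

rng = np.random.default_rng(1)
hidden, feat = 6, 4

Wx = rng.normal(scale=0.1, size=(feat, hidden))    # input weights
Wh = rng.normal(scale=0.1, size=(hidden, hidden))  # recurrent weights
b = np.zeros(hidden)

def rnn_step(x_i, h_prev):
    """One step of H_i = f(X_i, H_{i-1}): the new state mixes the current
    input with the previous state, carrying sequence context forward."""
    return np.tanh(x_i @ Wx + h_prev @ Wh + b)

sequence = rng.normal(size=(5, feat))  # [X_1, ..., X_5]
h = np.zeros(hidden)                   # H_0
states = []
for x_i in sequence:
    h = rnn_step(x_i, h)
    states.append(h)
```

Because every H_i depends on H_{i-1}, information from early inputs can influence later states, which is exactly the property a basic feedforward network lacks.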

LSTM, Bidirectional LSTM, GRU
RNN performs better than basic neural networks in natural language processing, but it still has some problems in actual training. When dealing with longer sequences, the structure of the RNN causes gradients to vanish or explode during training. To solve this problem, Long Short-Term Memory (LSTM) was proposed. LSTM adds a new memory cell c, an input gate i, a forget gate f, and an output gate o, so that longer sequence data can be handled better during training. The structure of LSTM is as follows (picture from Fei Jiang, PaddlePaddle platform) [1]:

f_i = σ(W_f · [H_{i-1}, X_i] + b_f)
i_i = σ(W_i · [H_{i-1}, X_i] + b_i)
C̃_i = tanh(W_c · [H_{i-1}, X_i] + b_c)
C_i = f_i * C_{i-1} + i_i * C̃_i
o_i = σ(W_o · [H_{i-1}, X_i] + b_o)
H_i = o_i * tanh(C_i)

In this process, C_i often changes slowly. In the calculation of C_i, the step f_i * C_{i-1} partially forgets the previous memory cell, and the next step i_i * C̃_i updates the memory cell with the current data, so its value remains relatively stable. However, H_i often changes greatly between different data (different moments t_i).
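One LSTM step following the standard gate equations can be sketched in NumPy. The sizes and random weights are illustrative assumptions; a real implementation would also learn these parameters by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(2)
hidden, feat = 5, 3
concat = hidden + feat  # gates act on the concatenation [H_{i-1}, X_i]

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate (forget, input, candidate, output).
Wf, Wi, Wc, Wo = (rng.normal(scale=0.1, size=(concat, hidden)) for _ in range(4))
bf = bi = bc = bo = np.zeros(hidden)

def lstm_step(x_i, h_prev, c_prev):
    z = np.concatenate([h_prev, x_i])
    f = sigma(z @ Wf + bf)          # forget gate: partially erase C_{i-1}
    i = sigma(z @ Wi + bi)          # input gate: admit new information
    c_tilde = np.tanh(z @ Wc + bc)  # candidate memory
    c = f * c_prev + i * c_tilde    # memory cell C_i changes slowly
    o = sigma(z @ Wo + bo)          # output gate
    h = o * np.tanh(c)              # H_i can vary sharply between steps
    return h, c

h = np.zeros(hidden)
c = np.zeros(hidden)
for x_i in rng.normal(size=(4, feat)):  # a short input sequence
    h, c = lstm_step(x_i, h, c)
```

The additive update of c is what lets gradients flow across long sequences without vanishing as quickly as in a plain RNN.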
LSTM pays more attention to the association between the current data and the previous data, but does not take subsequent data into consideration. However, in actual situations, when people analyze the specific meaning of a word in a sentence, they consider not only the words before it but also the words after it. The stacked bidirectional LSTM is based on this idea, and the general approach is easy to understand. The forward LSTM starts from the first word in the sequence and continuously analyzes the next word while correlating it with the previous words. The reverse LSTM starts from the last word in the sequence, moves forward one word at a time, and associates each word with the later part of the sequence that has already been analyzed. A bidirectional LSTM uses the forward and reverse LSTMs alternately across the layers of a stacked model to analyze and process the sequence.
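The bidirectional pattern itself can be sketched compactly. Here a plain tanh recurrence stands in for the full LSTM cell to keep the example short; the point is only that one pass reads left-to-right, another right-to-left, and each position concatenates both views. All parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
hidden, feat = 4, 3

def make_step():
    """Build one recurrent step with its own (random) parameters."""
    Wx = rng.normal(scale=0.1, size=(feat, hidden))
    Wh = rng.normal(scale=0.1, size=(hidden, hidden))
    return lambda x, h: np.tanh(x @ Wx + h @ Wh)

fwd_step, bwd_step = make_step(), make_step()  # independent directions

def run(step, seq):
    h, out = np.zeros(hidden), []
    for x in seq:
        h = step(x, h)
        out.append(h)
    return out

seq = rng.normal(size=(5, feat))
fwd = run(fwd_step, seq)              # position i sees the words before it
bwd = run(bwd_step, seq[::-1])[::-1]  # position i sees the words after it
bi = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```

Each element of `bi` thus encodes both the left and the right context of its position, which is what the word-meaning argument above calls for.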
The GRU was proposed in 2014 and can be regarded as a variant or improvement of LSTM. Unlike LSTM, which needs to obtain two quantities (H_{i-1} and C_{i-1}) from the previous moment in its calculation, GRU only needs to obtain one (H_{i-1}) and uses the update gate values z_i and (1 - z_i) in its calculation. This gating is similar in function to the input gate i and forget gate f in LSTM, and the test performance after training is basically similar to that of LSTM, but GRU is cheaper to compute. The structure of GRU is not elaborated here.
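A one-step GRU sketch makes the comparison concrete: only H_{i-1} is carried between steps (no separate memory cell), and the update gate blends old state and candidate with z_i and (1 - z_i). Weight shapes and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
hidden, feat = 5, 3
concat = hidden + feat

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

# Update gate, reset gate, and candidate weights (biases omitted for brevity).
Wz, Wr, Wh = (rng.normal(scale=0.1, size=(concat, hidden)) for _ in range(3))

def gru_step(x_i, h_prev):
    zx = np.concatenate([h_prev, x_i])
    z = sigma(zx @ Wz)  # update gate z_i
    r = sigma(zx @ Wr)  # reset gate
    h_tilde = np.tanh(np.concatenate([r * h_prev, x_i]) @ Wh)  # candidate
    return (1 - z) * h_prev + z * h_tilde  # blend with z_i and (1 - z_i)

h = np.zeros(hidden)
for x_i in rng.normal(size=(4, feat)):
    h = gru_step(x_i, h)
```

Compared with the LSTM step, this needs three weight matrices instead of four and carries half the recurrent state, which is where the computational saving comes from.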

Problems to be Solved
These algorithms all operate on a single sentence; that is to say, the wider context of the dialogue has little effect. The main factor that influences the machine's reply is the content of the immediately preceding sentence, while the earlier dialogue content and the environment of the conversation receive much less attention. The author takes Microsoft Xiaobing as an example for testing.
Microsoft Xiaobing is an intelligent chat robot released by the Microsoft team. Since its release, the Xiaobing Framework System has led the technological innovation of artificial intelligence and has made many achievements in many related fields.
Here the WeChat official account hosted by Microsoft Xiaobing is used as the experimental subject to check the dialogue effect. (1) The same sentence is sent repeatedly. It can be seen that after the sentence is repeated four times, Microsoft Xiaobing's reply shows that it has noticed the repetition. (2) Two sentences are repeated in alternation. After many repetitions, we find that when Microsoft Xiaobing deals with this more complicated repetitive input, it does not "find something wrong" as it did in case (1).

A Solution to the Problem
In human-machine dialogue training, the machine's initial emotional state is added as an input, and the emotional change after each answer is added to the output. Although this method cannot perfectly remember and take into account the content of the previous dialogue, it can bring the machine's dialogue closer to dialogue between humans, because the machine's answering attitude will change with the content of the preceding conversation.
The specific training idea is to add an initial state, such as the emotional state at that moment, to the input during training, and to add the state fluctuation to the output. However, in real interaction between people, dialogue is affected by many factors: for example, a person's personality may remain stable for a long time, while mood may fluctuate greatly in the short term. Therefore, perhaps the initial emotional state that is added should be not a single value but several numbers; this still needs more testing and research. But just as RNN and LSTM process sequence information better than basic neural networks, this method of adding mood fluctuations can link a reply to the previous and subsequent conversation, and thus handle multi-turn dialogue better.
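The idea above can be sketched as a stateful prediction loop: each turn takes the utterance plus the current emotional state, and each output includes a state fluctuation carried into the next turn. Everything here is a hypothetical placeholder (the toy encoder, the state dimension, the additive update rule), not a tested architecture.

```python
import numpy as np

rng = np.random.default_rng(5)
STATE_DIM = 3  # hypothetical, e.g. valence / arousal / patience

def encode(utterance):
    """Stand-in sentence encoder: a fixed-size pseudo-embedding of bytes."""
    vec = np.zeros(8)
    for i, ch in enumerate(utterance.encode()):
        vec[i % 8] += ch / 255.0
    return vec

W_resp = rng.normal(scale=0.1, size=(8 + STATE_DIM, 8))
W_state = rng.normal(scale=0.1, size=(8 + STATE_DIM, STATE_DIM))

def respond(utterance, emotion):
    """One dialogue turn: reply features plus an updated emotional state."""
    x = np.concatenate([encode(utterance), emotion])
    reply_features = np.tanh(x @ W_resp)  # would feed a decoder in practice
    delta = np.tanh(x @ W_state)          # state fluctuation for this turn
    return reply_features, emotion + 0.1 * delta

emotion = np.zeros(STATE_DIM)             # initial emotional state
for turn in ["hello", "hello", "hello"]:  # repeated input, as in the test
    reply, emotion = respond(turn, emotion)
```

Because `emotion` drifts across turns, identical inputs can yield different replies, which is the behavior the repetition test above found lacking in single-sentence models.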
Admittedly, there are also many problems with this idea. First, with this training method, the ability to memorize the context of a dialogue still cannot be truly achieved. Second, it is harder to collect training data for this method, because there is no very accurate way to judge the emotional states and mood fluctuations of the two sides of a conversation, and the collected data may not be good enough for proper training. Lastly, the training cost will be higher than simply training on single sentences.
For the second problem, artificial intelligence has already achieved certain results in sentiment analysis, which can be used to analyze the emotions and psychological conditions of both sides of a dialogue to a certain extent. In this way, ordinary dialogue data can be processed into data with emotional state annotations and then used for training.
In this method, the author intervenes in the prediction results based on a set of variables, makes certain changes to that set of variables after each prediction, and applies them to the next prediction. The method is not limited to the field of NLP; it can also be used in similar fields. For example, when recommending the nutrition that users should take according to their physical condition, the type and amount of nutrition are not only related to the current physical condition; it may also be necessary to consider previous physical conditions and recommendations. In such fields, the introduction of additional state parameters could help make better recommendations.