Online Ride-hailing Demand Prediction Model Based on GRU & LSTM

While online ride-hailing provides fast and convenient services, it suffers from the same 'customer-seeking' problem as traditional taxis. Taking online ride-hailing demand as the research goal, an online ride-hailing demand forecasting model based on GRU & LSTM is proposed. Firstly, grey relational analysis and other methods are used to analyze which factors affect online ride-hailing demand and how. Secondly, the GRU & LSTM demand prediction model is trained and adjusted; through tuning and comparison experiments, the values of the model's parameters are set appropriately. Finally, the proposed model is verified and evaluated on operating data from Chengdu. The experimental results show that the GRU & LSTM prediction model performs well, and a comparison of predictions on weighted and unweighted data shows that weighted data yields better predictions.


Introduction
In recent years, online ride-hailing has flourished [1][2] and become an important mode of transportation for residents [3]. Although online ride-hailing can meet residents' fine-grained travel needs, it suffers from the same blindness in 'seeking passengers' as traditional taxis and is easily affected by external factors [4], resulting in an imbalance between supply and demand. To achieve a spatio-temporal balance between the supply and demand of online ride-hailing, it is important to forecast online ride-hailing demand accurately.
Ma and Wang [5] first preprocessed and analyzed the trajectory data of online ride-hailing cars within Xi'an's South Second Ring Road from the Gaia open dataset, and then constructed a CNN-LSTM-ARIMA prediction model integrating time and space. Li and Wen [6] first selected …

Related Technology
LSTM (Long Short-Term Memory) is a special kind of recurrent neural network proposed by Hochreiter and Schmidhuber [10]. Since its design, LSTM has been used to solve the long-term dependency problem common in recurrent neural networks: with LSTM, information in a long time series can be transmitted and expressed effectively without useful information from far in the past being forgotten. LSTM can also alleviate the vanishing and exploding gradient problems of RNNs. The internal structure of the LSTM hidden-layer unit is shown in Figure 1.
Figure 1. LSTM's structure
However, due to its complex internal structure, high computational complexity, and large number of parameters, LSTM's training efficiency is far lower than that of a traditional RNN with the same amount of computation, resulting in long training times. Using GRU can alleviate this problem [11].
The Gated Recurrent Unit (GRU) was invented by Cho, van Merrienboer, Bahdanau, and Bengio [12]. GRU greatly simplifies the architecture of traditional LSTM; the gated recurrent unit was proposed to capture dependencies between sequence data separated by large intervals. The internal structure of the GRU is shown in Figure 2. Both GRU and LSTM are designed to handle the backward transfer of information and the gradients of long-term memory, so GRU can be seen as a very effective variant of LSTM. Algorithmically, the GRU model is simpler in construction, trains faster, has fewer parameters, and still trains well.
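The parameter savings can be illustrated with a quick count. The sketch below uses the classic per-layer formulas (LSTM has four gate/candidate blocks, GRU three); the exact counts in a framework may differ slightly (e.g. Keras GRU with reset_after=True adds an extra bias vector), and the layer sizes here are illustrative, not the paper's.

```python
# Parameter counts for a single recurrent layer, classic formulations:
# each block has input weights (units x input_dim), recurrent weights
# (units x units), and a bias (units). LSTM has 4 blocks, GRU has 3.
def lstm_params(units, input_dim):
    return 4 * (units * (units + input_dim) + units)

def gru_params(units, input_dim):
    return 3 * (units * (units + input_dim) + units)

units, input_dim = 64, 9  # 9 input features, as selected in the Data section
print(lstm_params(units, input_dim))  # 4 * (64*73 + 64) = 18944
print(gru_params(units, input_dim))   # 3 * (64*73 + 64) = 14208
```

With the same hidden size, the GRU layer carries exactly three quarters of the LSTM layer's parameters, which is where the faster training comes from.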

Data
The dataset of this research comprises online ride-hailing data and historical weather data for Chengdu. The ride-hailing data is obtained from the Data Open Plan platform; the weather data is obtained through a web crawler.
The Chengdu ride-hailing data includes two parts: trajectory data and order data. The trajectory data consists of track points collected at 2-4 s intervals, with road-binding processing applied to keep the data consistent with the actual road network. The order data records Chengdu online ride-hailing bookings.
Weather data mainly records weather conditions, precipitation, snowfall, temperature, humidity, wind speed, sunshine, ultraviolet radiation, runoff and other related information.
The collected data requires preliminary cleaning to remove duplicate and erroneous records. Moreover, to improve training speed and accuracy, the data's attribute values should be filtered. Combining grey relational analysis and correlation analysis, the characteristic attributes finally retained are demand, time, relative humidity, temperature, evaporation, north wind speed, east wind speed, total cloud cover, and runoff.
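A minimal sketch of this cleaning and attribute-filtering step is given below. The column names and the error marker are hypothetical (the actual field names of the Gaia and weather datasets are not given in the text); only a subset of the nine retained attributes is shown.

```python
import pandas as pd

# Hypothetical merged demand/weather records; real field names may differ.
df = pd.DataFrame({
    "time":        ["08:00", "08:00", "09:00", "10:00"],
    "demand":      [120, 120, 95, -1],        # -1 marks an erroneous record
    "temperature": [21.5, 21.5, 22.0, 22.3],
    "runoff":      [0.0, 0.0, 0.1, 0.0],
})

df = df.drop_duplicates()   # remove duplicate records
df = df[df["demand"] >= 0]  # remove erroneous records
features = ["demand", "time", "temperature", "runoff"]  # attribute filtering
df = df[features]
print(len(df))  # 2 rows remain after cleaning
```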

Model
Because LSTM has the advantage of preserving long dependency relationships, matches the characteristics of temporal data, and accounts for the influence of changes in demand before and after a given moment, it can fit the tidal pattern of online ride-hailing demand well [13]. Therefore, LSTM is chosen as the main model in this paper, combined with a GRU model to reduce iterative training time. The model has six kinds of layers: a GRU layer, a Batch Normalization layer, an LSTM module, dropout layers, an activation-function layer and fully connected layers; the LSTM module itself consists of two LSTM layers, two dropout layers, and one fully connected layer. The overall structure is therefore GRU→BN→LSTM→dropout→LSTM→dropout→Dense→Activation→Dense, as shown in Figure 3.

The GRU layer contains two gated structures, the reset gate and the update gate [14]. The reset gate limits the extent to which the state information of the previous moment is discarded, while the update gate controls how much of the previous moment's state information is transmitted to the current moment, i.e. whether early memory information is retained up to the present. Combining the information retained through the reset gate with the current input, a new candidate memory cell is formed.
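The reset-gate/update-gate arithmetic just described can be sketched for a single GRU time step in NumPy. This is a minimal illustration of the standard gating equations, not the paper's implementation; the weights are random and the layer sizes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step over input x and previous hidden state h_prev."""
    z = sigmoid(Wz @ x + Uz @ h_prev)              # update gate: how much old state to carry forward
    r = sigmoid(Wr @ x + Ur @ h_prev)              # reset gate: how much old state to discard
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))  # candidate memory cell
    return (1 - z) * h_prev + z * h_tilde          # new hidden state

rng = np.random.default_rng(0)
n_in, n_hid = 9, 4  # 9 input features, 4 hidden units (illustrative)
Wz, Uz, Wr, Ur, Wh, Uh = [rng.standard_normal((n_hid, d)) for d in (n_in, n_hid) * 3]
h = gru_step(rng.standard_normal(n_in), np.zeros(n_hid), Wz, Uz, Wr, Ur, Wh, Uh)
print(h.shape)  # (4,)
```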
The BN layer normalizes the data in batches. This not only narrows the selection range of the L2 regularization constraint parameters, but also greatly reduces the dependence of the normalization on Dropout and L2 regularization. The BN layer is added to accelerate training and the convergence process.
In the LSTM layer, the input data first passes through the forget gate [15], which selectively forgets worthless information in the cell state. Then a tanh layer produces the candidate values to be added, and the input gate determines which information will be updated; combining the two parts updates the current cell state [16]. Finally, the output gate determines which part of the cell state is output.
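The gate computations described above correspond to the standard LSTM equations (standard notation, not reproduced in the source: $W$, $U$ are weight matrices, $b$ biases, $\sigma$ the sigmoid, $\odot$ element-wise product):

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate values}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell-state update}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate}\\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state / output}
\end{aligned}
```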
The Dropout layer effectively averages a number of different neural networks: since the overfitting of some 'opposite' sub-models cancels out, overall overfitting is reduced. The presence of the Dropout layer therefore mitigates model overfitting, enhances the network's generalization ability, reduces the number of intermediate features, and thereby reduces redundancy and increases orthogonality between the features at each level.
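This averaging behaviour can be illustrated with inverted dropout, a common formulation (whether the paper uses exactly this variant is not stated): the rescaling by 1/(1-p) keeps the expected activation unchanged, so at inference the full network behaves like the average of the thinned sub-networks.

```python
import numpy as np

def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p, rescale survivors."""
    if not training:
        return x  # at inference the full (implicitly averaged) network is used
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(42)
x = np.ones(100_000)
out = dropout(x, p=0.5, rng=rng)
print(out.mean())  # ≈ 1.0: expectation preserved although ~half the units are zeroed
```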
The activation layer and the fully connected layers integrate the model's feature results to form the output.
In this paper, mean absolute error (MAE) and root mean square error (RMSE) are selected as the model evaluation metrics on the actual data.
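These two standard metrics can be computed directly; a NumPy sketch with illustrative demand values:

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean Absolute Error: average absolute deviation
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    # Root Mean Square Error: penalizes large deviations more strongly
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

y_true = [100, 120, 95, 130]  # illustrative true demand
y_pred = [110, 115, 100, 120]  # illustrative predicted demand
print(mae(y_true, y_pred))   # (10+5+5+10)/4 = 7.5
print(rmse(y_true, y_pred))  # sqrt((100+25+25+100)/4) = sqrt(62.5) ≈ 7.906
```

RMSE is always at least as large as MAE on the same data, which is why the two are often reported together.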
Mean Absolute Error (MAE):

MAE = (1/n) Σᵢ |ŷᵢ − yᵢ|

Root Mean Square Error (RMSE):

RMSE = √( (1/n) Σᵢ (ŷᵢ − yᵢ)² )

The training results under different learning rates are shown in Figure 4. It can be seen that whether the learning rate is 0.2, 0.05 or 0.001, the prediction error decreases as the number of training iterations increases; that is, the more the model is trained, the better it predicts. However, the error drops fastest when the learning rate is 0.05, and for the same number of training iterations, both RMSE and MAE are smallest at a learning rate of 0.05. In summary, setting the learning rate to 0.05 achieves the best training effect.
Then the batch size used in batch iteration was set. Based on empirical values, batch size = 40, 72, 96 and 112 were selected for parameter adjustment. The results are shown in Figure 5. When the number of training iterations is 5200 and the batch size is 96, the minimum error point is reached, and this error is lower than the minimum errors for batch sizes 40, 72 and 112, with relatively few training iterations and relatively fast convergence. At one setting the lowest error point is reached after only 4000 iterations; although that requires the fewest iterations and converges fastest, its accuracy oscillates the most. Therefore, this paper uses batch size = 96, which has the smallest error point, a high convergence rate, and a relatively small range of accuracy oscillation.
Finally, the model parameters are set as shown in Table 1.

Prediction results
One day is randomly chosen to show the model results at different time granularities; the results are shown in Figure 6(a-e).

Analysis
The analysis results for different resolutions p, evaluated with RMSE and MAE, are shown in Figure 7, and the analysis results for different time granularities are shown in Figure 8.
Figure 8. Model's analysis results at different time granularities
For predictions at different time granularities, the maximum error occurred at 60 min, and the minimum error at 10 min and 15 min. By RMSE, the prediction error for Chengdu at the 15 min granularity was 13.253% lower than at 10 min.

Conclusion
In conclusion, the GRU & LSTM based online ride-hailing demand prediction model is effective, and data weighting helps improve the prediction effect. The model can help drivers find passengers, assist operators in management, and assist the government in traffic management.
However, the data dimensions considered in this paper are limited, so data beyond weather can be introduced in future work.

Figure 2. GRU's structure

Figure 3. The layer structure of the proposed model

Experiment

Parameter settings
Firstly, the model's learning rate should be set. The usual learning-rate range is [0.001, 0.2]. Introducing Decay ensures rapid convergence in the early stage of training and prevents oscillation after the optimum is reached. To decay the learning rate exponentially as the number of training rounds increases, the Decay value was set to 0.0001 based on experience. Since the drop in loss during the first training rounds was very large, making the later trend hard to see, only the loss for epochs in [1, 300] is shown. During learning-rate tuning, 5% of the training set was used as a validation set to test the training effect. The training results are shown in Figure 4.
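The decay schedule can be sketched as follows. The paper gives the decay value (0.0001) and says the decline is exponential but not the exact formula, so the form lr_t = lr0 · exp(−decay · t) below is an assumption.

```python
import math

def exp_decayed_lr(lr0, decay, epoch):
    """Exponential learning-rate decay: lr0 * exp(-decay * epoch).
    The exact schedule used in the paper is not stated; this common
    exponential form is an assumption."""
    return lr0 * math.exp(-decay * epoch)

lr0, decay = 0.05, 0.0001  # 0.05 is the rate the paper settles on
for epoch in (0, 100, 300):
    print(exp_decayed_lr(lr0, decay, epoch))
```

With so small a decay value, the rate shrinks only slightly over 300 epochs, consistent with the goal of damping late-stage oscillation rather than freezing learning.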

Figure 4. The model's training results for different learning rates

Figure 5. The model's training results for different batch sizes. The minimum error point is reached at batch size = 96 with 5200 training iterations, below the minimum errors for batch sizes 40, 72 and 112.

Figure 6. Model predictions at different resolutions compared with real data. The degree of curve fitting shows that the prediction effect for Chengdu is good.

Figure 7. Model's analysis results at different resolutions
When weighted data were used for prediction, RMSE and MAE decreased by 40.820% and 41.866% on average. With the resolution p set to 0.2, RMSE decreased by 45.929% and MAE by 50.515%, the best result.