Research on Load Forecasting of Charging Station Based on XGBoost and LSTM Model

With the growing use of electric vehicles, electric vehicle load scheduling is becoming increasingly important in grid power dispatching. Effective dispatching of electric vehicle charging can reduce the peak-valley difference, increase the utilization of grid power supply, and relieve the supply pressure on lines and transformers. Load forecasting describes users' electricity consumption habits over the coming period and provides an important data basis for power dispatching. This paper summarizes the research status of electric vehicle charging load, analyzes traditional charging load research methods, proposes a charging load forecasting method combining XGBoost (Extreme Gradient Boosting) and LSTM (Long Short-Term Memory network), and verifies it with a case study using data from a charging station in Jiangsu. The proposed method builds its feature engineering on the prediction results of the XGBoost model, extracting data features with phase space reconstruction techniques and statistical methods, and then trains an LSTM model for load prediction. Based on the charging record data of domestic charging stations, this paper applies artificial intelligence methods to the charging load forecasting of domestic charging stations for the first time. The charging station load forecasting method studied here can support regional load forecasting research under high electric vehicle penetration and further optimize power dispatching.


Introduction
Urban residents are increasingly concerned about the impact of the environment on their lives. To protect a healthy living environment, the government has begun to promote low-carbon, low-pollution travel, and electric vehicles are entering more and more households. Correspondingly, to make electric vehicle travel more convenient for urban residents, charging stations have gradually become widespread, and charging at charging piles has become one of the main ways residents charge their vehicles [1].
Experiments show that under the natural charging state, the daily charging load curve of electric vehicles is consistent with the daily load curve of the power grid [2].
As an electric load, the charging load of charging piles can help balance the peak-valley difference through effective power dispatching and increase the utilization of the power supplied by the grid. Without dispatching, the charging peak may be superimposed on the peak of the grid's daily load curve, putting excessive pressure on lines and transformers, while the troughs are likewise superimposed, resulting in low power utilization [3]. Conversely, accurate charging load prediction helps the grid strengthen the power dispatching of charging stations and improves the efficiency of power usage.
Nowadays, with the continuous popularization of electric vehicles, the charging load of charging stations becomes more and more important in power dispatching of power grids.
Uncertainty in users' charging habits, in the initial state of charge, and in the charging characteristics of different batteries makes charging demand uncertain [4]. Traditional charging load research methods are mainly based on probabilistic models [5]. Typical probabilistic models include the probability average model, the Monte Carlo sampling model [6], and models based on travel statistics.
The probability average model calculates the charging load at discrete time points using probability averages, and accounts for the probability distributions of the charging start time, the initial state of charge of the battery, and similar quantities. The Monte Carlo sampling model is more commonly used in charging load studies: Monte Carlo simulation samples the charging start time, initial state of charge, and charging duration of each electric vehicle to obtain the charging load distribution [7]. The model based on travel statistics builds on surveys of household vehicle travel, adding the probability distributions of the initial state of charge of the battery and the daily mileage [8].
Traditional charging load research depends on probabilistic models, so the accuracy of charging load prediction depends on the accuracy of the underlying probability statistics. Charging patterns differ from region to region. Furthermore, the traditional models rely on probabilistic descriptions of user habits, battery charging characteristics, and similar factors, which introduces considerable randomness. The traditional probabilistic models therefore do not transfer well to charging loads in different regions. Because charging load data are scarce at this stage and are collected differently across regions, this paper proposes a data-driven method to predict the charging load of a single regional charging station. After establishing the time dependence of the charging station load, the method constructs a time series data set from historical load data and forecasts from it. A data-driven approach avoids the influence of uncertain inputs such as user charging patterns and battery charging characteristics, and improves the accuracy and applicability of charging load prediction.
At present, domestic charging station load forecasting is still at the theoretical stage. Existing charging load forecasting methods for electric vehicles mainly use data from other countries, which are of assured quality and fine granularity. Such high-quality data give the specific charging load of each electric vehicle at each point in time.
Domestic charging load data, by contrast, usually record the charging time and charged energy of each vehicle as a transaction log. The data are coarse-grained and difficult to process. This article is the first to combine XGBoost and LSTM to process load data as a time series under these data limitations; the charging station load is predicted with a time series forecasting method, and relatively high prediction accuracy is obtained.
Based on the charging record data of domestic charging stations, this paper applies artificial intelligence methods to the charging load forecasting of domestic charging stations for the first time. The charging data of one charging station in the region are used as the data source for algorithm verification; the proposed method is also applicable to predicting the charging load of other charging stations.

Different types of charging cars
Different types of electric vehicles have different charging patterns, so the mix of vehicle types needs to be considered as a whole [9]. For example, the charging load data can be classified with a clustering method; the amount of data in each cluster over a period of time then represents a vehicle type, and the ratios give the share of each type in the charging load of the station. This paper studies and predicts load based on data from charging stations in a certain area of Jiangsu. The vehicles charged at the station studied here are buses, which this paper assumes follow the same charging pattern, so the influence of vehicle type on the charging load is ignored. The factors affecting the charging load that need to be considered are therefore mainly the time-of-use electricity price, the charging duration, and the number of vehicles charging in the same time period.

Different prices during different charging hours
To relieve the pressure on power supply during peak periods and improve the utilization of power supply during off-peak periods, the grid usually adopts time-of-use electricity pricing [10], dividing the day into three types of period, each with a different electricity price, to influence users' charging strategy. According to Su Jia Gong [2018] No. 89, the daily peak hours are 08:00-12:00 and 17:00-21:00, the normal hours are 12:00-17:00 and 21:00-24:00, and the valley hours are 00:00-08:00. The prices are 1.0697 yuan/kWh in the peak period, 0.6418 yuan/kWh in the normal period, and 0.3119 yuan/kWh in the valley period.
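The tariff schedule above maps directly onto a lookup by hour of day. The following is a minimal sketch; the function name `tou_price` and the convention that an hour is identified by its starting clock hour (0 to 23) are assumptions made for illustration, while the period boundaries and prices are those stated in the text.

```python
# Prices in yuan/kWh, as stated in the text for the three tariff periods.
PEAK_PRICE = 1.0697
NORMAL_PRICE = 0.6418
VALLEY_PRICE = 0.3119


def tou_price(hour: int) -> float:
    """Return the time-of-use price for the hour starting at `hour` (0-23).

    `tou_price` is a hypothetical helper name, not taken from the paper.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in the range 0..23")
    # Peak: 08:00-12:00 and 17:00-21:00
    if 8 <= hour < 12 or 17 <= hour < 21:
        return PEAK_PRICE
    # Normal: 12:00-17:00 and 21:00-24:00
    if 12 <= hour < 17 or 21 <= hour < 24:
        return NORMAL_PRICE
    # Valley: 00:00-08:00
    return VALLEY_PRICE
```

A value such as `tou_price(t)` could then serve as the electricity price feature for time period t.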

Electric vehicle charging time
In the charging process of an electric vehicle, the energy charged is related to the charging time and the charging load. The charging load in turn depends on the initial state of charge and the charging characteristics of the battery [11]. The data actually obtained in this article are the energy charged by electric vehicles at a charging station in a certain area over a period of time. To obtain the correlation between the charging time and the charging load, and to calculate the charging load in each time period, the charged energy needs to be divided into three parts. The average charging load in the current time period (1 hour in this article) is then calculated with equation (1).
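The exact three-part split and equation (1) are not reproduced in this text, so the sketch below is only one plausible reading: it assumes constant power over a charging session and allocates the session's energy to 1-hour periods in proportion to the time the session overlaps each period. The function name `hourly_average_load` and the record layout (start time, end time, energy in kWh) are hypothetical.

```python
from datetime import datetime, timedelta


def hourly_average_load(start, end, energy_kwh):
    """Split one charging record into average load (kW) per 1-hour period.

    Assumption (not from the paper): power is constant during the session,
    so each period receives energy proportional to its overlap with the
    session. Returns a dict mapping the period start to average load in kW.
    """
    if end <= start:
        raise ValueError("end must be later than start")
    total_seconds = (end - start).total_seconds()
    loads = {}
    # First period starts at the top of the hour containing `start`.
    slot = start.replace(minute=0, second=0, microsecond=0)
    while slot < end:
        slot_end = slot + timedelta(hours=1)
        overlap = (min(end, slot_end) - max(start, slot)).total_seconds()
        energy_in_slot = energy_kwh * overlap / total_seconds
        # Energy (kWh) delivered within a 1-hour period equals the
        # average load (kW) over that period.
        loads[slot] = energy_in_slot / 1.0
        slot = slot_end
    return loads
```

For a session from 08:30 to 10:30 that delivers 60 kWh, this assigns 15 kW to the 08:00 period, 30 kW to the 09:00 period, and 15 kW to the 10:00 period. Summing the results of all sessions per period would give the station load series.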

The number of cars charged in the same time period
While an electric vehicle is charging, connecting a new vehicle to the charging station affects the charging efficiency of the other charging piles and thereby changes the load state of the station [12]. The state of the charging load in turn affects the charging time and the energy charged by each vehicle [13].
When a large number of electric vehicles are connected to charge, the static voltage stability margin of the power system is affected [14], which in turn affects the charging of the vehicles.
This article counts the number of vehicles connected to the charging station in each time period and uses the autocorrelation coefficient method from phase space reconstruction theory [15] to obtain the autocorrelation of the station's charging load across time periods. Highly correlated charging loads are used as model features, and the features are then fed into the LSTM model to predict the energy charged by electric vehicles in the next time period.

XGBoost predicts the charging load in the time period
The experimental data $W_t$ is the energy charged by electric vehicles in time period $t$. To obtain the load $P_t$ in each time period, the relationship between charging time and charged energy is needed.

XGBoost (Extreme Gradient Boosting) is an ensemble learning algorithm based on the boosting idea [16]. Its base learners are either linear models or classification and regression trees (CART) [17]; this article uses classification and regression trees.
The charging time and charged energy values in the experimental data are discretely distributed. XGBoost adds L2 regularization to the loss function to prevent overfitting, which effectively limits the complexity of the model. It also uses a second-order Taylor expansion of the loss function, which simplifies the calculation and speeds up model convergence.
XGBoost also introduces a sampling method that helps control sample imbalance, prevent overfitting, and improve the accuracy of the model. For these reasons, XGBoost is used to fit the relationship between charging time and charged energy.
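The input data in training is $T_n$, the charging time of sample $n$. To increase the nonlinear fitting capacity of the model, $T_n^2$ and $T_n^3$ are added as features. Following the standard XGBoost formulation [16], the model builds its prediction additively, as in equation (12):

$$\hat{y}_i^{(N)} = \sum_{k=1}^{N} f_k(x_i) = \hat{y}_i^{(N-1)} + f_N(x_i) \tag{12}$$

where $i$ indexes the samples, $N$ denotes the $N$-th round of model training, and $f_N$ is the tree added in that round. The objective of round $N$ is the training loss plus a regularization term, equation (13):

$$Obj^{(N)} = \sum_{i} l\left(y_i,\ \hat{y}_i^{(N-1)} + f_N(x_i)\right) + \Omega(f_N), \qquad \Omega(f) = \gamma M + \frac{1}{2}\lambda \sum_{j=1}^{M} w_j^2 \tag{13}$$

where $M$ is the number of leaves of the tree, $w_j$ is the score of leaf $j$, and $\gamma$ and $\lambda$ are regularization coefficients. Performing a second-order Taylor expansion of the loss around $\hat{y}_i^{(N-1)}$ and dropping the constant terms gives equation (14):

$$Obj^{(N)} \approx \sum_{i} \left[ g_i f_N(x_i) + \frac{1}{2} h_i f_N(x_i)^2 \right] + \Omega(f_N) \tag{14}$$

where $g_i$ and $h_i$ are the first and second derivatives of $l(y_i, \hat{y}_i^{(N-1)})$ with respect to $\hat{y}_i^{(N-1)}$. Define $I_j = \{\, i \mid q(x_i) = j \,\}$ as the set of sample points on leaf node $j$, where $q$ is the tree structure that maps a sample to its leaf. Grouping the sum in equation (14) by leaf using $I_j$ gives equation (15):

$$Obj^{(N)} = \sum_{j=1}^{M} \left[ G_j w_j + \frac{1}{2}\left(H_j + \lambda\right) w_j^2 \right] + \gamma M, \qquad G_j = \sum_{i \in I_j} g_i,\quad H_j = \sum_{i \in I_j} h_i \tag{15}$$

Setting the derivative with respect to $w_j$ to zero gives the extreme value, equation (16):

$$w_j^{*} = -\frac{G_j}{H_j + \lambda} \tag{16}$$

The iteration over tree models is thus transformed into an iteration over the leaf nodes of the tree, and the optimal leaf node score is obtained. Substituting the optimal leaf scores into the objective function gives a final objective that measures the quality of the tree structure $q$, as in equation (17):

$$Obj^{*} = -\frac{1}{2} \sum_{j=1}^{M} \frac{G_j^2}{H_j + \lambda} + \gamma M \tag{17}$$

The smaller the score, the better the tree structure. To find the optimal tree structure, a greedy algorithm iteratively adds branches to the tree, starting from a single leaf node. Minimizing the objective function yields the XGBoost model, which is finally used to predict the charged energy.

LSTM (Long Short-Term Memory network) is a special recurrent neural network proposed to solve the long-term dependence problem of the RNN (Recurrent Neural Network) [18]. The basic idea of LSTM is to carry information from much earlier time steps forward to the current step, which addresses the vanishing gradient problem, in which the influence of early data on the current output fades in an RNN [19]. The network structure of LSTM is shown in Fig. 2.

Fig. 2 Long Short-Term Memory network model structure

To obtain the relationship between charging time and charging load in this experiment, XGBoost predicts the charged energy in each time period, and equation (1) is then used to obtain the charging load, giving the time series load data for a period of time.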
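Returning to the XGBoost derivation, the optimal leaf score (equation (16)) and the structure score (equation (17)) can be checked numerically. The sketch below assumes the standard XGBoost formulation with gradient sums G and Hessian sums H per leaf, an L2 coefficient `lam`, and a per-leaf penalty `gamma`; it also assumes a squared-error loss l = ½(y − ŷ)², for which g = ŷ − y and h = 1. The function names are hypothetical, and this is not the paper's implementation.

```python
def leaf_weight(grads, hess, lam):
    """Optimal leaf score w* = -G / (H + lam) for the samples on one leaf."""
    return -sum(grads) / (sum(hess) + lam)


def structure_score(leaves, lam, gamma):
    """Structure score -1/2 * sum_j G_j^2 / (H_j + lam) + gamma * (#leaves).

    `leaves` is a list of (grads, hess) pairs, one pair per leaf.
    Lower is better.
    """
    score = 0.0
    for grads, hess in leaves:
        g_sum, h_sum = sum(grads), sum(hess)
        score += -0.5 * g_sum * g_sum / (h_sum + lam)
    return score + gamma * len(leaves)


def split_gain(left, right, lam, gamma):
    """Reduction in structure score when one leaf is split into two."""
    parent = (left[0] + right[0], left[1] + right[1])
    return (structure_score([parent], lam, gamma)
            - structure_score([left, right], lam, gamma))


# Toy data: targets [1, 1, 3, 3], initial prediction 0, squared-error loss,
# so g = y_hat - y and h = 1 for every sample.
left_leaf = ([-1.0, -1.0], [1.0, 1.0])
right_leaf = ([-3.0, -3.0], [1.0, 1.0])
gain = split_gain(left_leaf, right_leaf, lam=0.0, gamma=0.0)
```

With `lam=0` the leaf scores reduce to the mean residual on each leaf (1 and 3 here), and the positive `gain` is what a greedy split search would maximize over candidate splits.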
This article considers the number of charging electric vehicles to be correlated over time (Fig. 3), and therefore uses LSTM to predict the number of charging vehicles in the next time period.
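To make the LSTM mechanism concrete, the following is a minimal sketch of a single forward step of a standard LSTM cell with scalar input and scalar state. The paper does not state its network configuration or framework; the gate parameterization and the name `lstm_step` are illustrative assumptions, and a real model would use vector states and trained weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One forward step of a scalar LSTM cell.

    `params` maps each gate name ('f', 'i', 'o', 'g') to a tuple
    (w_x, w_h, b) of hypothetical input weight, recurrent weight, and bias.
    Returns the new hidden state h and cell state c.
    """
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("f"))    # forget gate: how much old state to keep
    i = sigmoid(pre_activation("i"))    # input gate: how much new content to write
    o = sigmoid(pre_activation("o"))    # output gate: how much state to expose
    g = math.tanh(pre_activation("g"))  # candidate cell content
    c = f * c_prev + i * g              # additive cell-state update
    h = o * math.tanh(c)
    return h, c
```

The additive update of `c` is what lets information from early time steps persist: when the forget gate stays near 1, the cell state is carried forward largely unchanged.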

Phase space reconstruction
The electric vehicle charging time series used in the experiments is shaped by uncertain factors such as user charging habits, battery state of charge, and charging time. It can be regarded as a nonlinear dynamic system whose motion is irregular and seemingly random, with uncertain, non-repeating, and unpredictable behavior; this is known as chaos in nonlinear dynamic systems.
Ordinary time series analysis mainly studies models in the time domain. For chaotic time series, the calculation of chaotic invariants and the construction and prediction of chaotic models are all carried out in phase space, so phase space reconstruction is a very important step in processing chaotic time series.
Phase space reconstruction has two key parameters: the embedding dimension d and the delay time τ. This paper uses the autocorrelation function to find the delay time τ; the main idea is to extract the linear correlation between sequence values at different lags.
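The role of the two parameters can be illustrated with the usual delay-coordinate embedding, in which each reconstructed point collects d samples spaced τ steps apart. This is a minimal sketch under that standard definition; the function name `delay_embed` is hypothetical and the paper does not state its choice of d.

```python
def delay_embed(series, d, tau):
    """Delay-coordinate embedding of a scalar series.

    Returns the vectors [x(i), x(i + tau), ..., x(i + (d - 1) * tau)]
    for every i at which all d samples exist.
    """
    if d < 1 or tau < 1:
        raise ValueError("d and tau must be positive integers")
    n_vectors = len(series) - (d - 1) * tau
    if n_vectors < 1:
        raise ValueError("series is too short for this d and tau")
    return [[series[i + k * tau] for k in range(d)] for i in range(n_vectors)]
```

For hourly data, `delay_embed(load, d=7, tau=24)` would group the values observed at the same hour on seven consecutive days into one vector.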
For the chaotic sequence x(1), x(2), ..., x(n), the autocorrelation function can be written as equation (19), where μ is the mean of the data. After computing the mean, the data x(1), x(2), ..., x(n) are substituted in to plot the autocorrelation function R(τ) against the delay τ (Fig. 3), which gives the correlation of the number of charging vehicles between time periods. The correlation analysis in Fig. 3 shows that the correlation reaches its maximum at an interval of 24 hours. Therefore, the vehicle counts with the largest correlations are selected as feature inputs (this article takes the number of charging vehicles at the same time of day on seven consecutive days).
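Equation (19) is not reproduced in this text, so the sketch below assumes one common normalized form of the sample autocorrelation, R(τ) = Σ_{i=1}^{n−τ} (x(i) − μ)(x(i+τ) − μ) / Σ_{i=1}^{n} (x(i) − μ)². The toy series, with one spike per 24-hour day, is synthetic and only illustrates how a daily period shows up as a peak at τ = 24.

```python
def autocorrelation(x, tau):
    """Normalized sample autocorrelation of the sequence x at lag tau.

    The normalization used here is an assumption; the paper's equation (19)
    may use a different constant factor.
    """
    n = len(x)
    if not 0 <= tau < n:
        raise ValueError("tau must satisfy 0 <= tau < len(x)")
    mu = sum(x) / n
    denom = sum((v - mu) ** 2 for v in x)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((x[i] - mu) * (x[i + tau] - mu) for i in range(n - tau))
    return num / denom


# Synthetic example: seven days of hourly counts with a spike at 08:00.
series = [1.0 if hour % 24 == 8 else 0.0 for hour in range(7 * 24)]
best_lag = max(range(1, 48), key=lambda tau: autocorrelation(series, tau))
```

Here `best_lag` comes out as 24, mirroring the 24-hour peak the text reports for Fig. 3.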

The number of cars to be charged at the charging station in the next period
Through the autocorrelation coefficient method, historical data with high load correlation are used as features in the LSTM time series model for prediction, with MSE as the evaluation function. The statistical characteristics of the data (mean, median, sum, variance, and standard deviation) are calculated, the influencing factors of time-of-use electricity price and number of charging vehicles are added, and the points with higher correlation are included, giving the features listed in Table 1.

Table 1 Model input features

| Influence factor | Feature input | Feature description |
| --- | --- | --- |
| Historical load of the previous n days in time period t | Load(d-n,t)_mean | Mean load of the previous n days in time period t |
| | Load(d-n,t)_middle | Median load of the previous n days in time period t |
| | Load(d-n,t)_sum | Sum of the load of the previous n days in time period t |
| | Load(d-n,t)_variance | Load variance of the previous n days in time period t |
| | Load(d-n,t)_standard_deviation | Standard deviation of the load of the previous n days in time period t |
| Time-of-use electricity price | L_price(t) | Electricity price during time period t |
| Number of charging cars | L_charge_num(t) | Number of charging cars during time period t |
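The feature row for one time period can be assembled as sketched below. The dictionary keys follow the feature names in Table 1; the function name `load_features` is hypothetical, and the use of population variance and standard deviation is an assumption, since the text does not say whether the population or sample form is used.

```python
import statistics


def load_features(prev_loads, price, charge_num):
    """Build the Table 1 features for one time period t.

    prev_loads: loads in the same time period t on the previous n days.
    price:      time-of-use electricity price during t.
    charge_num: number of charging cars during t.
    """
    return {
        "Load(d-n,t)_mean": statistics.mean(prev_loads),
        "Load(d-n,t)_middle": statistics.median(prev_loads),
        "Load(d-n,t)_sum": sum(prev_loads),
        # Population forms are an assumption, not stated in the paper.
        "Load(d-n,t)_variance": statistics.pvariance(prev_loads),
        "Load(d-n,t)_standard_deviation": statistics.pstdev(prev_loads),
        "L_price(t)": price,
        "L_charge_num(t)": charge_num,
    }


# Illustrative values only: seven previous days at the same hour.
row = load_features([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], price=0.6418, charge_num=5)
```

One such row per time period, together with the correlated lagged values, would form the input sequence for the LSTM model.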

Result analysis
This paper proposes a load forecasting method that combines LSTM and XGBoost for domestic charging station data. When XGBoost is used to calculate the charged energy in the next period, the data from 2018-9-1 to 2018-12-18 are used for training and the data from 2018-12-19 to 2018-12-31 for testing.
Because the load data of the charging station are not yet fine-grained enough and lack time-resolved measurements of the actual load, the load in this paper is obtained as charged energy divided by charging time, which is the average load over a period of time. In reality, the charging load also varies over the course of a charging session. Even under these data conditions, the prediction results obtained in this paper are close to the actual values.
As charging load data become more complete, the prediction method provided in this paper can further expand the features of the data set and the charging patterns it captures, to obtain more accurate prediction results and provide a reliable basis for grid power dispatching.
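The chronological split described above, together with the MSE evaluation function, can be sketched as follows. The date boundaries are those stated in the text; the record layout of (day, value) pairs and the function names are assumptions made for illustration.

```python
from datetime import date

# Boundaries as stated in the text (inclusive on both ends).
TRAIN_START, TRAIN_END = date(2018, 9, 1), date(2018, 12, 18)
TEST_START, TEST_END = date(2018, 12, 19), date(2018, 12, 31)


def split_by_date(records):
    """Split (day, value) records chronologically into train and test sets."""
    train = [r for r in records if TRAIN_START <= r[0] <= TRAIN_END]
    test = [r for r in records if TEST_START <= r[0] <= TEST_END]
    return train, test


def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
```

Splitting by date rather than at random keeps every test sample later than every training sample, which matters for time series because a random split would leak future information into training.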