Statistical approach and multiplicative models for electric vehicles charging behaviour patterns

Since the charging processes of electric vehicles are stochastic and time-dependent, this paper considers an approach based on a statistical analysis of real data on electricity consumption at charging station connection points. Other types of data (geographical, public sites, distance between individual charging stations, etc.) are also taken into account in the analysis. Multiplicative models are the most suitable for studying and forecasting time series with pronounced cyclicity and seasonality. Their application allows us to consider the correlation of the load in the consuming nodes with regional features, climatic factors and seasonality. The method and approach discussed in this paper make it possible to process a large amount of data and to detect cyclicity in the load schedule of electricity facilities. The results of the model will identify the requested charging power in a developing charging infrastructure.


Introduction
The ever-increasing number of electric vehicles and the consequent major changes in the transport sector that we are witnessing today pose enormous challenges to the electricity distribution network and the charging infrastructure connected to it. A careful stakeholder analysis is needed for the proper expansion of the charging infrastructure: from the point of view of electricity suppliers (an appropriate place to provide the necessary charging power without peak overloads of electricity consumption at the point of connection of the charging station); from the point of view of owners of electric vehicles (utilization of the structure for maximum accessibility and services); and from the point of view of government and local government structures. Today, the remarkable evolution in the field of mobility is related to digitalization, autonomous driving, shared mobility and dynamically evolving electric mobility.
The reduction of CO2 emissions, atmospheric pollution and noise, and the controlled dependence on oil, are the focus of this sustainable development. But how do we evaluate the participation of electric vehicles (EVs) in the implementation of each of these important goals? These goals appear quite ambitious, and the current 5.3 million electric vehicles (according to the International Energy Agency) will not meet them. Therefore, there is an urgent need to improve charging technology and build relevant charging station infrastructure. The charging process of electric vehicles is stochastic and time-dependent, and the most common practice is to connect charging stations to an existing electricity supply network. The energy system operators are responsible for managing energy flows and constantly monitoring the balancing energy at the connection points of large consumers with peak loads, such as charging stations (CS). There are two main reasons for this: the load on the local distribution networks and the demands on the transmission network are both increasing strongly, with all the negative consequences this entails. The energy market regulation is a constant corrective. In Bulgaria, the development of the charging station infrastructure (CSI) turns out to be a simple function of the number of registered electric vehicles, without any research or analysis done in advance. No profile prevails at the moment: business customers, private installations and municipal installations are approximately equally represented. It is important to expand the network, first at public places and then at company parking lots. Business interest depends on the number of installed CSs, which are the most difficult part of electric mobility because they require the most preparation.
IOP Publishing, doi:10.1088/1757-899X/1216/1/012008
The secondary market for electric vehicles is developing slowly, mostly because of the undeveloped and incomplete CSI. There are several web sites providing information about charging stations in Bulgaria, as well as worldwide. However, there is no real platform for charging infrastructure management with various options for planning, booking, payment and management, as well as for attracting other participants in this process.
In conclusion, the main steps towards low-carbon economy and more sustainable transport with the increasing use of EVs are careful forecast analysis of appropriate multiplicative models for electricity consumption and subsequent integration of renewable energy sources in the charging infrastructure.

Background
The structure of the energy distribution network to which charging stations are connected is interdependent, and therefore a careful preventive analysis is recommended in order to avoid peak loads. Some factors that determine charging behavior are time-dependent, others are geographical and local, but overall the charging process is defined as stochastic. The results of an optimized multiplicative model for electricity consumption at the connection point of the charging station will indicate the requested power and the location of the expanding charging infrastructure. This is particularly true for countries like Bulgaria, where the rate of use of EVs is described as low and "exploratory", and the charging infrastructure as slowly developing and chaotically positioned. Uncontrolled charging, combined with the strongly increasing number of EVs, causes major problems in the distribution networks in some places. The solution is a "smart charging management algorithm strategy" based on multiplicative forecasting models. In [1, 2] it is shown how the power transformers in power substations (PS) overload when CS are connected. In [1, 2, 3], stochastic models based on the behavior of EV owners during charging are proposed. Selected models are used to evaluate the impact of CS on the distribution network, and as a tool to compare different types of EVs and the consumption of residential sites and other nearby objects. Another stochastic model, used in [4, 5], is based on the consumption of households supplied by the urban distribution network. Analyses of the potential risks of charging according to geographical region, travel time between work and home, average mileage, charging time, etc. began in 2016. This is how the so-called "charging profile of a typical EV battery" is built. According to [5], studying travel patterns and charging behavior is beneficial.
The first approach involves the collection and processing of data for the purpose of forecast analysis of the required charging power.
The second approach is based on building a model of the entire EV fleet and using the data as an indicator of the required power provided by the charging infrastructure at certain points and times [6]. The forecast analysis of the energy consumed by the charging stations is extremely important for evaluating the load on the distribution networks and for determining the exact location and power when the CSI is expanded.

General statement and approach
The most common models for identifying discrete objects and time-dependent processes are:

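Autoregressive model - AR (Auto Regressive)
It is considered the simplest and is described as follows:
$$y(t) + a_1 y(t-1) + \dots + a_{n_a} y(t-n_a) = e(t), \qquad (2)$$
where $y(t)$ is the output, $e(t)$ is a white-noise disturbance and $n_a$ is the order of the model.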
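As a minimal sketch of how such a model can be identified from data, the first-order case $y(t) + a_1 y(t-1) = e(t)$ can be fitted by least squares. The sign convention, the function names `fit_ar1` and `forecast_ar1`, and the sample series are assumptions made here for illustration; they are not taken from the measurements in this paper.

```python
def fit_ar1(y):
    """Least-squares estimate of a1 in y(t) + a1*y(t-1) = e(t)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return -num / den


def forecast_ar1(a1, last_value, steps):
    """Iterate y(t) = -a1*y(t-1) forward, with the noise set to zero."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = -a1 * value
        forecasts.append(value)
    return forecasts


# Hypothetical noise-free series that halves at every step.
series = [8.0 * 0.5 ** t for t in range(6)]
a1 = fit_ar1(series)
next_values = forecast_ar1(a1, series[-1], 2)
print(a1, next_values)
```

For real consumption data the estimate would not be exact, and a higher model order would normally be compared against this first-order baseline.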

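ARX model - (Auto Regressive with eXternal input)
In generalized and expanded form it is given by the expressions:
$$A(q)\,y(t) = B(q)\,u(t-n_k) + e(t),$$
$$y(t) + a_1 y(t-1) + \dots + a_{n_a} y(t-n_a) = b_1 u(t-n_k) + \dots + b_{n_b} u(t-n_k-n_b+1) + e(t), \qquad (3)$$
where $u(t)$ is the external input, $A(q) = 1 + a_1 q^{-1} + \dots + a_{n_a} q^{-n_a}$ and $B(q) = b_1 + b_2 q^{-1} + \dots + b_{n_b} q^{-n_b+1}$ are polynomials in the backward shift operator $q^{-1}$.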
Optimization with ARX models is most effective when applying a polynomial approximation. The forecasts made by ARX models have a high degree of accuracy, especially if the approximating polynomials are of a higher order. A disadvantage is that disturbances are treated as part of the dynamics of the system. These models are widely used because of their main advantage: it is not necessary to know the physical characteristics of the system, which greatly reduces engineering work.
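The point that no physical knowledge of the system is needed can be illustrated with a small least-squares fit. The sketch below assumes a first-order ARX structure $y(t) + a_1 y(t-1) = b_1 u(t-1) + e(t)$ with a delay of one sample; the function name `fit_arx_1_1` and the input and output sequences are hypothetical and serve only as an example.

```python
def fit_arx_1_1(y, u):
    """Least-squares fit of y(t) + a1*y(t-1) = b1*u(t-1) + e(t).

    Solves the 2x2 normal equations for theta = [a1, b1] with the
    regressor vector phi(t) = [-y(t-1), u(t-1)].
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(1, len(y)):
        p1, p2 = -y[t - 1], u[t - 1]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        r1 += p1 * y[t]
        r2 += p2 * y[t]
    det = s11 * s22 - s12 * s12
    a1 = (r1 * s22 - r2 * s12) / det
    b1 = (s11 * r2 - s12 * r1) / det
    return a1, b1


# Hypothetical external input and a noise-free output generated
# with a1 = -0.8 and b1 = 2.0.
u = [1.0, 0.0, 2.0, 1.0, 0.0, 3.0, 1.0, 2.0]
y = [0.0]
for t in range(1, len(u)):
    y.append(0.8 * y[t - 1] + 2.0 * u[t - 1])

a1_hat, b1_hat = fit_arx_1_1(y, u)
print(a1_hat, b1_hat)
```

Only the recorded input and output sequences enter the estimate, which is why this class of models requires little engineering effort.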

ARMAX model - (Auto Regressive - Moving Average with eXternal input)
In this case, the object of study (electricity consumption) is time-dependent. The models used for forecasting are non-deterministic, as it is assumed that they allow us to determine the probability of future values in a certain range. Such models are called stochastic and they play an important role. They are divided into stationary and non-stationary. Stationary models are based on the assumption that the process remains in equilibrium around a relatively constant average level. But often in industry, trade, the economy and so on, there are time series that are best described as non-stationary, which means that they do not have a natural average level. In this case an exponentially weighted moving average value is used. The stochastic model for which forecasting by exponentially weighted moving average is optimal belongs to the class of non-stationary processes known as autoregressive integrated moving average (ARIMA) models. The ARMAX model is given by:
$$A(q)\,y(t) = B(q)\,u(t-n_k) + C(q)\,e(t),$$
where $n_k$ is a value that characterizes the delay.
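The exponentially weighted moving average mentioned above can be sketched in a few lines. The smoothing constant `alpha`, the function name `ewma_forecast` and the sample values are assumptions chosen for illustration only.

```python
def ewma_forecast(series, alpha):
    """One-step-ahead forecast by exponentially weighted moving average.

    level(t) = alpha * z(t) + (1 - alpha) * level(t-1), started at z(0).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1.0 - alpha) * level
    return level


# Hypothetical consumption values (arbitrary units).
forecast = ewma_forecast([10.0, 12.0, 11.0], alpha=0.5)
print(forecast)
```

A larger `alpha` gives more weight to recent observations, which suits series without a natural average level.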
To achieve greater flexibility in fitting the model to the observed time series, it is sometimes more appropriate to combine autoregressive and moving average terms in one model. The combined model is described by the expression:
$$\tilde{z}_t = \phi_1 \tilde{z}_{t-1} + \dots + \phi_p \tilde{z}_{t-p} + a_t - \theta_1 a_{t-1} - \dots - \theta_q a_{t-q},$$
where $\tilde{z}_t = z_t - \mu$ and the $p+q+2$ unknown parameters $\mu, \phi_1, \phi_2, \dots, \phi_p, \theta_1, \theta_2, \dots, \theta_q, \sigma_a^2$ are determined from the observations.
It is important to know that the combined models have the following distinguishing features: stationarity, invertibility and reciprocity. The mixed autoregressive-moving average model is always preferred because of its parsimony. In general, the process is expressed by the equation:
$$\phi(B)\,\tilde{z}_t = \theta(B)\,a_t,$$
where $\phi(B)$ and $\theta(B)$ are polynomials in the backward shift operator $B$ (with $B z_t = z_{t-1}$) of degree $p$ and $q$. Many time series in industry, trade and the economy are non-stationary and do not fluctuate around a relatively fixed average value. Nevertheless, although the process fluctuates around different average levels, the series behaves similarly once the differences in level are taken into account. Such a series can be represented by a generalized autoregressive operator $\varphi(B)$ in which one or more zeros of the polynomial $\varphi(B)$ are equal to 1. When similar features of the time series repeat after every $s$ reference intervals, the time series is said to have periodicity $s$. The most common mistake in the analysis of time series is mixing the procedures for building the forecasting model. Therefore, a common approach to the analysis is to divide the series into three components: trend, seasonal component, and random component. The multiplicative models are suitable for researching and forecasting time series with pronounced seasonal periodicity. If the time series is $z_t$, $t = 1, 2, 3, \dots, N$, a SARIMA model (Seasonal ARIMA model) of order $(p, d, q) \times (P, D, Q)_S$ with average value $\mu$ is generated for it as follows [6,7,8,9]:
$$\phi(B)\,\Phi(B^S)\left[(1-B)^d (1-B^S)^D z_t - \mu\right] = \theta(B)\,\Theta(B^S)\,a_t,$$
where: $\phi(B)$ is a polynomial of order $p$, $p$ is the order of non-seasonal autoregression; $\theta(B)$ is a polynomial of order $q$, $q$ is the order of the non-seasonal component of the moving average; $\Phi(B^S)$ is a polynomial of order $P$, $P$ is the order of seasonal autoregression; $\Theta(B^S)$ is a polynomial of order $Q$, $Q$ is the order of the seasonal component of the moving average; $d$ is the number of differences; $D$ is the number of seasonal differences; $S$ is the length of the season.
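The role of the number of differences d and the number of seasonal differences D can be illustrated with a short sketch that applies the operators $(1-B)^d$ and $(1-B^S)^D$ to a series. The function names and the synthetic series (a linear trend plus a pattern that repeats every four samples) are hypothetical and are not the paper's data.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag): subtract the value observed `lag` steps earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_differencing(series, d, D, season):
    """Apply D seasonal differences of length `season`, then d regular ones."""
    out = list(series)
    for _ in range(D):
        out = difference(out, season)
    for _ in range(d):
        out = difference(out, 1)
    return out


# Hypothetical series: linear trend plus a pattern with season length 4.
pattern = [0, 5, 2, -1]
load = [t + pattern[t % 4] for t in range(12)]

seasonal_only = sarima_differencing(load, d=0, D=1, season=4)
both = sarima_differencing(load, d=1, D=1, season=4)
print(seasonal_only)
print(both)
```

One seasonal difference removes the repeating pattern and leaves the constant growth per season; one further regular difference removes that constant as well.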
The development and analysis of the model is performed in the following sequence:
Descriptive Methods: the data are visualized and the seasonal period is established, if it is not precisely determined in advance.
Seasonal Decomposition: the data are divided into four components: trend ($T_t$), cyclical component ($C_t$), seasonal component ($S_t$), and random component ($R_t$). The data $z_t$ are presented as a product of the individual components, $z_t = T_t \cdot C_t \cdot S_t \cdot R_t$.
User-Specified Model: the most appropriate models are selected.
Automatic Forecasting: automated selection of the model with the best approximation to the source data is performed.
Estimated Autocorrelations, Partial Autocorrelations and Tests for Randomness of Residuals: the autocorrelation function, the partial autocorrelation function and a randomness test are calculated for the residual error, i.e. the difference between data and forecast. The type and values of the functions are used to diagnose the model.
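The seasonal decomposition step can be sketched with the classical ratio-to-moving-average procedure. This is one common way of obtaining a multiplicative decomposition and is not necessarily the procedure used by the software applied in this study. In the sketch, the trend and cyclical components are estimated together by a centered moving average, and the function names and the sample series are hypothetical.

```python
def centered_moving_average(y, period):
    """Trend-cycle estimate; None where the window does not fit."""
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        if period % 2:  # odd period: plain centered average
            trend[t] = sum(y[t - half:t + half + 1]) / period
        else:  # even period: half weights on the two end points
            inner = sum(y[t - half + 1:t + half])
            trend[t] = (0.5 * y[t - half] + inner + 0.5 * y[t + half]) / period
    return trend


def multiplicative_decomposition(y, period):
    """Split y into trend-cycle, seasonal indices and random component."""
    trend = centered_moving_average(y, period)
    ratios = [[] for _ in range(period)]
    for t, level in enumerate(trend):
        if level is not None:
            ratios[t % period].append(y[t] / level)
    raw = [sum(r) / len(r) for r in ratios]
    scale = sum(raw) / period  # normalize indices to an average of 1
    seasonal = [value / scale for value in raw]
    residual = [
        None if level is None else y[t] / (level * seasonal[t % period])
        for t, level in enumerate(trend)
    ]
    return trend, seasonal, residual


# Hypothetical series: constant level 100 times a 3-sample seasonal pattern.
data = [100.0 * factor for factor in [0.5, 1.0, 1.5] * 3]
trend, seasonal, residual = multiplicative_decomposition(data, period=3)
print(seasonal)
```

For a series that follows the multiplicative form exactly, as in this example, the random component equals 1 wherever the trend is defined.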

Results and discussion
The object of monitoring and analysis is the consumption of three publicly available charging stations (CS 1, CS 2 and CS 3), connected to the city distribution network. Considering the problems described in sections 1 and 2, we use the data collected for the energy profiles of the three power substations (PS 1, PS 2, PS 3) supplying these terminals, Figure 1. The data include active and reactive energy measured at 15-minute intervals for three typical months of 2020: January, May and July (2976 values for each month). The energy profile of the terminals to which the CSs are connected is monitored at the same time, Figure 2, Figure 3, Figure 4. The energy profiles represent active and reactive energy for one typical working day of each month, a total of 100 measured values for 24 hours (x-axis). The analysis of the application results and the optimized multiplicative model will indicate the correlation of the load with climatic factors, seasonality, electricity tariff zones and EV charging behavior.
In a three-coordinate system, the data from January (x-axis), May (y-axis) and July (z-axis) are compared, the so-called data distribution density, Figure 5 a). Each monthly energy profile is checked for random data (Normal Probability plot with 95% limits). A Box-and-Whisker plot helps to find the center of the data, the median and the interquartile range, as is done for May, Figure 6. By processing the 288 consumption values of the CSs for each month considered, the non-repeating, minimum and maximum values for each of them are established, Table 1. The density of distribution is presented in Figure 5 b). A completely adequate description of the seasonal time series using a multiplicative model is impossible to obtain. The adequacy of the model gives an idea of how accurate it is (as close as possible to the specific data) and how reliable it is (in which cases it gives good results).
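The sample counts quoted above and the quantities read from a Box-and-Whisker plot can be reproduced with the Python standard library. In the sketch below the function names and the readings are hypothetical, and the "inclusive" quartile method is an assumption, since statistical packages differ in how they interpolate quartiles.

```python
import statistics


def samples_per_month(days, interval_minutes=15):
    """Number of readings in a month at a fixed sampling interval."""
    return days * 24 * 60 // interval_minutes


def box_summary(values):
    """Five-number summary and interquartile range of a data set."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,
    }


# January, May and July all have 31 days.
count = samples_per_month(31)

# Hypothetical consumption readings (arbitrary units), not measured data.
readings = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = box_summary(readings)
print(count, summary)
```

With 31 days sampled every 15 minutes the count is 2976 per month, which matches the figure given in the text.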
The adequacy analysis is performed after the model has been run and the residual error calculated. An autocorrelation check of the residual errors is made using the method of least squares. In addition, a periodogram showing the probabilistic distribution of the residual error is checked. However, both checks are considered relatively rough and do not provide an accurate picture of the adequacy of the model. Selected from a total of seven models, the ARIMA(1,0,1)×(0,1,1)_7 model for forecasting the daily profile of electricity consumption gives very good results, Table 2. According to the analysis of the coefficients of the autocorrelation and partial autocorrelation functions of the residual, a correlation is found for only two of the 24 partial autocorrelation coefficients calculated. The forecast can be made more precise if the relationship between the residual of the data after the forecast analysis and the predicted values is used, Figure 7.
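A residual autocorrelation check of this kind can be sketched as follows. The sketch computes sample autocorrelation coefficients and flags the lags that fall outside the approximate 95% limits for white noise, ±1.96/√N. The function names and the residual sequence are hypothetical, and the paper's own check may differ in detail.

```python
import math


def autocorrelation(x, lag):
    """Sample autocorrelation coefficient of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    den = sum((value - mean) ** 2 for value in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / den


def significant_lags(residuals, max_lag):
    """Lags whose autocorrelation exceeds the approximate 95% limits."""
    bound = 1.96 / math.sqrt(len(residuals))
    return [
        lag
        for lag in range(1, max_lag + 1)
        if abs(autocorrelation(residuals, lag)) > bound
    ]


# Hypothetical residuals that alternate in sign, i.e. clearly not random.
residuals = [1.0, -1.0] * 20
flagged = significant_lags(residuals, max_lag=2)
print(autocorrelation(residuals, 1), flagged)
```

For an adequate model most coefficients are expected to stay inside the limits; a few isolated exceedances out of many lags can occur by chance.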

Conclusion
The disadvantage of the model is the difficulty in predicting values with a low frequency of occurrence, but in all cases, the quality of the data used for the model is of great importance. It is necessary to have a long period of observations and thus to form a database of values of the observed parameters and their change. The electricity consumed in January differs in trend and cyclicity from that consumed in May and July. The reason is compliance with all restrictions and the lockdown due to COVID-19. These measures affect households and industry, and therefore the whole economy. Therefore, another statistical approach is required to study their impact.