Research on load curve fitting method based on K-means algorithm

In some areas of China, fluctuations in new energy output and load leave the power supply unbalanced overall: the electricity supply is abundant, yet there is a shortage of power. To characterise load electricity consumption curves more accurately, this paper proposes a method for fitting them based on the K-means algorithm. First, the characteristics of load electricity consumption are analysed by industry subdivision, and a characteristic index system for load electricity consumption is built. Then, K-means cluster analysis is carried out on the load power consumption curves of the subdivided industries to determine the number of clusters, and curve fitting and a deviation test are carried out for each cluster group. The practical analysis and theoretical research are based on typical industry load power consumption curve data from Ningxia.


Introduction
As China's electric power market mechanism continues to improve and develop, the construction of the power market system places ever higher demands on the level of electric power technology. Load curve characteristic analysis is a basic part of that field and plays an important role in the development of related research [1][2][3]. In power market transactions, market operators analyse the power consumption characteristics of load equipment to better understand how the power system load curve fluctuates, so that they can organise market transactions in a targeted way and firmly maintain the security and stability of the power system [4][5][6].
As China's power market reform progresses and deepens, the flexibility of power market transactions continues to expand, and the competitive pressure among the trading entities in the market is also increasing [7][8]. To improve the satisfaction of power users, attract more users and increase their economic benefits, power enterprises need, in operation and planning, to accurately grasp the electricity consumption behaviour of power users and the conventional electricity consumption behaviour of each industry's production. On that basis they can build power user profiles and optimise the scheduling of their generation resources to meet the electricity needs of market users, and thereby obtain better economic and social benefits [9][10][11].
Scholars usually divide loads into four types: stable continuous load, random fluctuation load, highly flexible load and aggregated adjustable load. The electricity consumption curves of the different load types each have distinct operating characteristics, as shown in Figure 1. To better understand how loads operate, and to use the various flexible load resources fully and efficiently, it is necessary to study the characteristics of the power consumption curves of the different load types and to aggregate load resources for use.
There is abundant theoretical research and practical experience, in China and abroad, on power load curve forecasting, including typical daily load curve forecasting, load curve forecasting under different conditions, and load curve characteristic analysis using different methods [12][13]. However, there has been no research on the typical power consumption curves of the major industries in Ningxia, as shown in Figure 1 below. This paper carries out practical analysis and theoretical research based on typical industry load curve data from Ningxia, and proposes a load curve fitting method based on the K-means algorithm. First, the industry load characteristics are analysed by industry subdivision, and a characteristic index system for load consumption is built. Then, K-means cluster analysis is carried out on the power consumption curves of the subdivided industries to determine the number of clusters, and curve fitting and deviation analysis are carried out for each cluster group.

Characteristic index system of load power consumption curve based on industry segmentation
The load consumption curve reflects the daily consumption behaviour of power users. Selecting a concise and effective set of curve characteristics reduces both the dimensionality of load curve fitting and the complexity of cluster analysis. Therefore, this paper selects characteristic indicators of conventional load, as shown in Table 1.
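As a rough illustration of what such characteristic indicators can look like, the sketch below computes three quantities commonly used to summarise a daily load curve. The choice of indicators (load rate, minimum load rate, peak-valley difference rate) and the function name `load_indicators` are assumptions made for illustration; they are not necessarily the indices listed in Table 1.

```python
def load_indicators(curve):
    """Compute illustrative characteristic indicators for one daily load curve.

    `curve` is a sequence of load readings for one day (for example 24
    hourly values). The indicator set is an assumption, not the paper's
    Table 1.
    """
    if not curve:
        raise ValueError("curve must contain at least one reading")
    peak = max(curve)
    valley = min(curve)
    mean = sum(curve) / len(curve)
    if peak <= 0:
        raise ValueError("peak load must be positive")
    return {
        "load_rate": mean / peak,                    # average load / peak load
        "min_load_rate": valley / peak,              # minimum load / peak load
        "peak_valley_rate": (peak - valley) / peak,  # (peak - valley) / peak load
    }
```

Each daily curve is thereby reduced from its full set of readings to a short indicator vector, which is the kind of dimension reduction the paragraph above describes.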

K-means clustering algorithm
The K-means algorithm is one of the conventional clustering algorithms. Its defining feature is that the number of cluster centres, i.e. the K value, must be determined first. The usual steps of the algorithm are as follows:
(1) Determine the number of categories K into which the clustering group is divided, i.e., the number of cluster centres.
(2) Randomly select K individuals from the clustering group as the initial cluster centres.
(3) Calculate the Euclidean distance of each object in the clustering group from each initial cluster centre, and assign the object to the category of the closest centre.
(4) Compute the new cluster centre of each category from the re-divided cluster group.
(5) Repeat the above procedure until the termination condition is met.
When the value of SSE no longer changes significantly, the cluster centres are basically stable and iteration stops; at that point the grouping of members within each cluster has been determined.
SSE denotes the sum of the squared distances of all clustered objects from their corresponding cluster centres.
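The procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names `euclidean` and `kmeans`, the parameters `tol`, `max_iter` and `seed`, and the choice to keep the old centre when a cluster becomes empty are all hypothetical.

```python
import math
import random


def euclidean(x, c):
    """Euclidean distance between an indicator vector x and a centre c."""
    return math.sqrt(sum((xj - cj) ** 2 for xj, cj in zip(x, c)))


def kmeans(points, k, tol=1e-6, max_iter=100, seed=0):
    """Cluster equal-length indicator vectors; return (centres, labels, sse)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Pick K individuals at random as the initial cluster centres.
    centres = [list(p) for p in rng.sample(points, k)]
    prev_sse = float("inf")
    for _ in range(max_iter):
        # Assign every object to the category of its closest centre.
        labels = [min(range(k), key=lambda i: euclidean(p, centres[i]))
                  for p in points]
        # Recompute each centre as the mean of its current members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # assumption: an empty cluster keeps its old centre
                centres[i] = [sum(col) / len(members) for col in zip(*members)]
        # SSE: sum of squared distances of all objects to their own centre.
        sse = sum(euclidean(p, centres[lab]) ** 2
                  for p, lab in zip(points, labels))
        # Stop once SSE no longer changes by more than the tolerance.
        if abs(prev_sse - sse) <= tol:
            break
        prev_sse = sse
    return centres, labels, sse
```

Calling `kmeans(vectors, 3)` on a list of indicator vectors would return three centres and one label per vector. Because the initial centres are chosen at random, different seeds can lead to different groupings on data that is not well separated.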

Determination of cluster center
K-means clustering first requires the number of cluster centers to be determined. Because judging the number of cluster centers by visually inspecting the trend of the daily load curves gives a large error, the number is determined by calculating the aggregation coefficient.
As can be seen from Figure 2, the broken line has an obvious inflection point when the number of cluster centers is 3, and decreases only gradually afterwards. Therefore, the number of cluster centers is set to 3, and large power users are divided into 3 categories.
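The inflection point is read from Figure 2 by eye. As a hedged illustration of how such a point could be located numerically, the sketch below picks the K at which a decreasing coefficient curve bends most sharply, measured by the largest discrete second difference. The selection rule and the function name `elbow_k` are assumptions; the paper does not state how the inflection point was identified.

```python
def elbow_k(coefficients):
    """Return the K at the sharpest bend of a coefficient-versus-K curve.

    coefficients[i] is the aggregation coefficient (or SSE) for K = i + 1.
    The bend is taken as the largest discrete second difference, which is
    only one of several possible rules.
    """
    if len(coefficients) < 3:
        raise ValueError("need coefficient values for at least three K")
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(coefficients) - 1):
        bend = coefficients[i - 1] - 2 * coefficients[i] + coefficients[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k
```

For a made-up sequence such as `[100, 60, 20, 15, 12, 10]` (illustrative values, not the paper's data), the curve drops steeply up to K = 3 and flattens afterwards, so the rule selects 3.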

Analysis of clustering results
The number of cluster centers is set to 3 and K-means clustering is carried out. Because there is no ... According to the cluster analysis results, the daily load curves are classified, and the trends of the three types of load curve are shown in Figure 3 below. The first type of load curve changes stably and has a high load value. The second type changes chaotically and generally has a low load value. The third type shows a certain regularity: it drops sharply between 18:00 and 20:00, and fluctuates steadily between 20:00 and 00:00 the next day.

Classified load fitting results
1) Category I classified load fitting
The original load data of large users are classified to obtain the first type of data set. A BP neural network is used for prediction, and all data except the load data of the last day are selected for training, giving the following training results. As can be seen from Figure 4, the error of the first type of classified load fitting has strong autocorrelation and is concentrated near 0, which indicates that the prediction error is small and the prediction effect is good. Figure 5 shows the output of the first type of load prediction. Its error fluctuates around 0 and only a very few points jump, indicating that the prediction effect is good.
2) Category II classified load fitting
The original load data of large users are classified to obtain the second type of data set. A BP neural network is used for prediction, and all data except the load data of the last day are selected for training, giving the following training results. As can be seen from Figure 6, the error autocorrelation of the second type of classified load fitting is clearly weaker than that of the first type. Although the error is concentrated near 0, it also fluctuates considerably, indicating that the prediction effect is not as good as that of the first type. Figure 7 shows the output of the second type of load prediction. The error of most of the prediction results fluctuates around 0, and a very few points jump.
3) Category III classified load fitting
The original load data of large users are classified to obtain the third type of data set. A BP neural network is used for prediction, and all data except the load data of the last day are selected for training, giving the following training results. As can be seen from Figure 8, the error autocorrelation of the third type of classified load fitting is poor. Although most of the error is concentrated near 0, it also fluctuates considerably, indicating that the prediction effect is not as good as that of the first type. Figure 9 shows the output of the third type of load prediction; the error of the prediction results fluctuates strongly overall.
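Two mechanical steps in this procedure are easy to make concrete: holding out the last day for testing, and measuring the autocorrelation of the prediction errors. The sketch below shows one way to do both. It does not implement the BP neural network itself, and it is not known how the error diagrams in Figures 4, 6 and 8 were produced; the function names `split_last_day` and `autocorrelation` and the sample-autocorrelation definition used here are assumptions.

```python
def split_last_day(days):
    """Split a list of daily load curves into (training days, last day)."""
    if len(days) < 2:
        raise ValueError("need at least two days of data")
    return days[:-1], days[-1]


def autocorrelation(errors, lag):
    """Sample autocorrelation of a prediction-error series at a given lag."""
    n = len(errors)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(errors)")
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors)
    if var == 0:
        raise ValueError("errors are constant; autocorrelation is undefined")
    # Covariance between the series and itself shifted by `lag` steps.
    cov = sum((errors[t] - mean) * (errors[t + lag] - mean)
              for t in range(n - lag))
    return cov / var
```

Here `errors` would be the differences between the actual and predicted load at each time step of the held-out day, and the value at lag 0 is always 1 by construction.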

Error analysis
Error analysis examines the errors that arise while a system performs its function, including their causes, their consequences for the required goal and the stage at which they occur, so that the error can be reduced to a minimum. After each class is predicted by the BP network, three types of forecast daily load curve are obtained, as shown in Figure 10, which plots the actual and predicted values of each forecast daily load curve. As can be seen from the figure, the first type of predicted daily load curve fits well, while the second and third types fit poorly. To quantify this, the mean absolute percentage error (MAPE) and the coefficient of determination $R^2$ are used in this paper for error analysis. MAPE is calculated by the following formula:
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|$$
where $A_t$ is the actual value, $F_t$ is the predicted value and $n$ is the number of samples. The smaller the MAPE, the smaller the gap between the actual and predicted values and the better the prediction effect. As can be seen from the error evaluation results in Table 3, the error of category 1 is relatively small at 1.66%, within the reasonable error range of 2%, whereas the errors of category 2 and category 3 are relatively large, at 7.84% and 5.32% respectively.
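Both metrics are short to compute. The sketch below follows the MAPE definition in terms of the actual values, predicted values and sample count. The paper does not give a formula for the coefficient of determination, so the standard definition (one minus the ratio of the residual sum of squares to the total sum of squares) is assumed. The function names `mape` and `r_squared` are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, predicted))
    return 100.0 * total / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination, assuming the standard 1 - SS_res / SS_tot."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the actual values are constant")
    return 1.0 - ss_res / ss_tot
```

With illustrative values of 100, 200 and 400 for the actual load and 110, 190 and 400 for the prediction, the absolute percentage errors are 10%, 5% and 0%, so the MAPE is 5%.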

Conclusions
Based on industry segmentation in the Ningxia region, this paper analyzes the characteristics of load power consumption, constructs a characteristic index system of load power consumption, conducts K-means cluster analysis on the load power consumption curves of the subdivided industries, determines the number of clusters, and carries out curve fitting and a deviation test for each cluster group. The following conclusions are drawn:
(1) The first type is the load of ordinary days throughout the year. It accounts for most of the data in the year, so the amount of data is large enough, and the ordinary daily load changes little over the year, so the daily load curve is predicted with high accuracy. The second and third types cover fewer days in a year, so the amount of data available for training is also small. This data limitation is one reason their prediction accuracy is lower than that of the first type.
(2) The second category corresponds to dates that are strongly affected by bad weather. The fluctuation of its load characteristics is mainly caused by the weather, and weather factors do not change regularly with respect to the load. The data collected in this research are only the load fluctuations throughout the year, so the variation of all influencing factors is assumed by default to be contained in the fluctuation of the historical data. Therefore, because of the uncertainty of weather factors, prediction by extrapolating historical data has poor accuracy.
(3) The third category corresponds to the Spring Festival holiday. Since the Spring Festival holiday itself is only two months, there are uncertainties when the power load changes of the whole holiday are used to predict the daily load curve of the last day of the holiday. The variation of power load characteristics also differs between dates. Therefore, the fitting degree of the third type of load prediction is not high.
IOP Publishing doi:10.1088/1742-6596/2703/1/0120459

Figure 1. Electricity consumption curves for four types of typical loads.

(2) Randomly select K individuals from the clustering group as the initial cluster centres.
(3) Calculate the distance of each object in the clustering group from each initial cluster centre, and assign it to the category of the closest initial cluster centre:
$$d(x_i, c) = \sqrt{\sum_{j=1}^{m} (x_{ij} - c_j)^2} \qquad (1)$$
where $d(x_i, c)$ denotes the Euclidean distance of individual $x_i$ from the cluster centre $c$, $x_{ij}$ is the $j$-th indicator of individual $i$, $c_j$ is the cluster centre value for the $j$-th indicator, and $m$ is the number of indicators.

Figure 4. Error analysis diagram of the first type of BP neural network.

Figure 5. Output result of BP neural network.

Figure 6. Error analysis diagram of the second type of BP neural network.

Figure 7. Output result diagram of the second type of BP neural network.

Figure 8. Error analysis diagram of the third type of BP neural network.

Figure 9. Output result of the third type of BP neural network.

Figure 10. Comparative analysis of predicted value and actual value.

Table 1. Load power consumption characteristic index system.