Research on short-term load forecasting based on clustering and deep learning

In power systems, load forecasting is essential to the reliability and efficiency of the power supply. Because power load is affected by many factors, including weather, seasonality, and social activities, its patterns are complex and varied, and traditional forecasting methods can struggle to meet demand. Against this background, this study combines the K-means clustering algorithm with a deep learning model. First, K-means clustering groups historical load data, dates, and temperatures to identify different load patterns; this grouping helps the forecast adapt to different load behaviours, improving its adaptability. A deep learning model is then applied, combining a convolutional neural network (CNN) with a stacked bidirectional gated recurrent unit (BiGRU) network; the two components respectively handle the spatial characteristics and the sequence dependence of the load data. The CNN captures spatial features in the load data, while the BiGRU processes the time-series information to capture the complex dynamics of the load. The K-means clustering information is fed into the CNN and BiGRU models as additional input. The experimental results show that, by fully considering different load patterns and data characteristics, the proposed approach provides more reliable short-term load forecasts, which improves the efficiency of the power supply, reduces energy waste, and eases the burden on the power system.


Introduction
Load forecasting plays an important role in power system management. By forecasting time span, it can be divided into long-term, medium-term, and short-term forecasting; short-term load forecasting focuses on the power load curve over the next few hours to a day and provides the basis on which power departments at all levels formulate daily dispatch plans [6]. Scientific and accurate load forecasting therefore plays an important role in decision-making and implementation.
For many years, scholars at home and abroad have studied a wide range of forecasting methods in order to continually improve short-term load forecasting technology [10]. Numerous methods and techniques have emerged in this field, drawing on statistics, machine learning, artificial intelligence, and deep learning. Since the effectiveness of short-term load forecasting depends mainly on its accuracy, researchers focus on improving forecasting accuracy. A short-term power load forecasting model combining CNN and GRU has been proposed to handle the temporal and nonlinear characteristics of load data [1]; however, that model may fall into local optima, and its hyperparameters are difficult to select. Shen et al. [2] proposed a short-term load forecasting method based on wavelet analysis and cluster analysis, but since only one station area was tested, the results lack representativeness. Chen et al. [3] explored a deep-learning-based LSTM method, but its prediction accuracy was low when the data volume was large. Zhang et al. [4] addressed the challenge that the electricity load of agricultural greenhouses is affected by multiple factors, including power supply capacity and meteorological conditions, and proposed a new short-term load forecasting model [5]; to handle load fluctuation and nonlinearity, the model integrated variational mode decomposition, a convolutional neural network, and a long short-term memory network.
With the rapid development of distributed and renewable energy, the power system has become more complex and changeable [7]. Accurate short-term load forecasting has therefore become critical for supporting stable power system operation, optimising energy allocation, and improving energy efficiency. The purpose of this study is to combine the K-means clustering algorithm with a CNN-BiGRU model for short-term load forecasting in the station area [8]. First, the K-means algorithm classifies the load data based on historical load, date, and temperature. A short-term load forecasting model is then constructed on CNN-BiGRU [9]; the clustering result is taken as an additional input, and the model's prediction accuracy is obtained through training.

K-means algorithm
The K-means algorithm is a partition-based clustering algorithm that is widely used in big data processing because of its fast computation and simple implementation. The algorithm evaluates the similarity between data points with a similarity measure and groups highly similar points into the same cluster. The clustering process of the K-means algorithm consists of the following key steps, as shown in Figure 1:
(1) Initialising cluster centres: K data objects are selected from the dataset as the initial cluster centres.
(2) Assigning data to the nearest centre: For each data object x_i, the distance to each of the K initialised cluster centres is calculated, and the object is assigned to the cluster of the nearest centre.
(3) Updating cluster centres: The mean of the data objects in each cluster is calculated and taken as the new cluster centre.
(4) Reallocating data: The distance of each data object to the new K cluster centres is calculated, and the data are reassigned to the cluster of the closest centre. The updating of cluster centres and the reassignment of data objects then continue until no data object switches cluster or a predetermined number of iterations is reached. This iterative process can be summarised by the following formula:

c_k^(t+1) = (1 / |S_k|) Σ_{x_i ∈ S_k} x_i

where c_k^(t) denotes the kth cluster centre after the tth round of iteration and S_k denotes the kth cluster.
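The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration of the standard algorithm, not the paper's implementation; the random initialisation and the convergence test are simplifying assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-means: assign each point to its nearest centre, then update centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), k, replace=False)]  # step (1): initialise centres
    for _ in range(n_iter):
        # steps (2)/(4): assign each point to the cluster of its nearest centre
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step (3): recompute each centre as the mean of its cluster
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centres):  # stop once no point changes cluster
            break
        centres = new
    return labels, centres
```

On two well-separated groups of points, the loop converges in a handful of iterations and recovers the grouping.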

Convolutional neural network model (CNN).
CNN models use local connectivity and shared weights to capture local features of power load data. CNNs have a compact structure and strong representational capability; a CNN model is generally composed of an input layer, convolutional layers, pooling layers, and fully connected layers, and these layers work together to build the complete model.


Convolution operation (Convolution).
As shown in Figure 2, convolution is the core operation of a CNN and is designed to extract key features from the input data. A convolution operation multiplies the convolution kernel (or filter) element-by-element with a local patch of the input and accumulates the results. The kernel is a small matrix that slides over the input data, and local features are extracted through dot-product operations. For a one-dimensional input x, a kernel w of size h, and offset c, each output element is y(i) = Σ_{j=1}^{h} w(j) x(i + j − 1) + c. Through this key step, the CNN efficiently captures local patterns of the input data, such as edges and textures, enabling targeted feature extraction.
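The sliding dot product can be illustrated for the one-dimensional case. This is a minimal sketch; the kernel values and the "valid" sliding range are illustrative, and, as in deep learning libraries, the kernel is applied without flipping (i.e., cross-correlation).

```python
import numpy as np

def conv1d(x, kernel, bias=0.0):
    """Valid 1-D convolution: slide the kernel over x, taking a dot product at each step."""
    h = len(kernel)  # kernel size
    return np.array([np.dot(x[i:i + h], kernel) + bias
                     for i in range(len(x) - h + 1)])
```

For example, the difference kernel [1, 0, −1] applied to [1, 2, 3, 4] responds to the local slope of the input.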

Pooling operation (Pooling).
CNNs share parameters across the entire input through weight sharing, i.e., the same convolutional kernel is applied at every position. This design reduces the number of parameters in the model and improves scalability: weight sharing lets CNNs learn generic features rather than learning features independently for each position. Pooling operations then reduce the size of the feature map, lowering the computational burden and increasing the model's invariance to translation. Maximum pooling, Eq. (6), is the most common pooling method; it selects the maximum value within a specified region as the output:

P(i, j) = max_{0 ≤ m, n < k} I(i·k + m, j·k + n)    (6)

where P(i, j) is the output after maximum pooling, I is the input data, and k is the size of the pooling window. Through these mechanisms, CNNs remain sensitive to local changes while retaining a degree of robustness.
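Eq. (6) can be illustrated with non-overlapping windows (a sketch assuming a stride equal to the window size k):

```python
import numpy as np

def max_pool2d(I, k):
    """Non-overlapping k x k max pooling: each output cell keeps its window's maximum."""
    H, W = I.shape
    # group the input into k x k blocks, then reduce each block to its maximum
    out = I[:H - H % k, :W - W % k].reshape(H // k, k, W // k, k)
    return out.max(axis=(1, 3))
```

A 4 × 4 input pooled with k = 2 yields a 2 × 2 map of block maxima.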

Fully connected layers.
A CNN generally contains multiple convolutional layers, with each layer focusing on extracting features at a different level. This hierarchical structure forms a feature pyramid that helps the model understand the input data more fully. The convolutional layers are usually followed by a fully connected layer, whose role is to translate the feature maps into the final output:

Z = XW + b    (7)

where X is the input matrix with each row representing the feature vector of a sample, W is the weight matrix, b is the bias vector, and Z is the output matrix of the fully connected layer with each row corresponding to the output of a sample.
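A minimal numeric instance of Eq. (7); the matrices here are illustrative only.

```python
import numpy as np

# Eq. (7): each row of X is one sample's feature vector; Z = XW + b
X = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2 samples, 2 features
W = np.array([[1.0, 0.0], [0.0, 1.0]])   # weight matrix (identity for illustration)
b = np.array([0.5, -0.5])                # bias vector, broadcast across rows
Z = X @ W + b                            # one output row per sample
```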

Gated recurrent unit (GRU)
The gated recurrent unit (GRU) is an evolution of the recurrent neural network (RNN). The GRU is designed to overcome the gradient-vanishing problem that standard RNNs often face when processing long sequential data.
The origin of the GRU can be traced back to long short-term memory (LSTM), another RNN variant designed to cope with long sequences. LSTM introduces a gating mechanism that finely controls the flow of information through forget, input, and output gates, which effectively mitigates the vanishing-gradient problem and allows long-term dependencies to be captured more efficiently. The GRU simplifies and improves on this design, achieving a lighter network structure while retaining the ability to model long sequential data effectively.

Update gate.
The update gate determines how important the input information of the current time step is for updating the internal state. It is calculated by the following formula:

z_t = σ(W_z · [h_{t−1}, x_t])

where z_t is the output of the update gate, σ is the sigmoid activation function, W_z is the associated weight matrix, h_{t−1} is the hidden state of the previous time step, and x_t is the input of the current time step.

Reset gate.
The reset gate is used to regulate the effect of the hidden state of the previous time step on the current time step. It is calculated using the following formula:

r_t = σ(W_r · [h_{t−1}, x_t])

where r_t is the output of the reset gate, σ is the sigmoid activation function, W_r is the associated weight matrix, h_{t−1} is the hidden state of the previous time step, and x_t is the input of the current time step, as shown in Figure 3.
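The two gate equations can be sketched directly. This is a toy example: the weight matrices and dimensions are illustrative, and the bias terms that full GRU formulations include are omitted for brevity.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_gates(h_prev, x_t, W_z, W_r):
    """z_t = sigma(W_z . [h_{t-1}, x_t]),  r_t = sigma(W_r . [h_{t-1}, x_t])."""
    concat = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    z_t = sigmoid(W_z @ concat)              # update gate, values in (0, 1)
    r_t = sigmoid(W_r @ concat)              # reset gate, values in (0, 1)
    return z_t, r_t
```

The sigmoid squashes each gate activation into (0, 1), so the gates act as soft switches on the information flow.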

Short-term load forecasting algorithm based on clustering and deep learning
The dataset for this experiment includes power load, date, and temperature data. First, the data are cleaned to deal with any missing values or outliers, the temporal characteristics of the date information are extracted, and the temperature and historical load data are integrated. K-means clustering is then applied to group different load patterns into different clusters; in this paper, the same date is used as the basis for clustering, which helps identify data points with similar load characteristics and provides a strong basis for subsequent modelling. The cluster information from K-means is finally used as an additional input to the combined CNN and BiGRU models, so that the model takes dates, temperatures, and historical load data into account and better captures the complexity of the spatio-temporal relationships in load forecasting, as shown in Figure 4.

If there are missing values, a fill-based imputation strategy can be considered. Relevant features are extracted from the date, such as whether it is a working day and whether it is a holiday; statistical features are extracted from the temperature data, such as the mean, maximum, and minimum values. The power load dataset is partitioned by date, temperature, and historical load so that each sample is associated with its corresponding time information and other feature attributes, capturing the seasonality and periodicity of time. For the historical load data, the sliding-window technique is used to extract sequence features, converting the series into continuous time steps.

To ensure that each feature carries the same weight when distances are calculated, and to prevent features with larger scales from having too much influence on the clustering result, this paper normalises the input data into the range (0, 1). The normalisation formula is:

X = (x − x_min) / (x_max − x_min)    (10)

where X is the normalised input, x is the actual input, x_max is the maximum of the actual input values, and x_min is the minimum of the actual input values.
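Eq. (10) applied column-wise looks as follows (a sketch; it assumes each feature column has distinct minimum and maximum values):

```python
import numpy as np

def min_max(x):
    """Eq. (10): scale each feature column of x into the range (0, 1)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
```

Each column's minimum maps to 0 and its maximum maps to 1, so differently scaled features contribute equally to the distance calculation.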

The K-means algorithm is used to cluster the dataset, and the features of each cluster are analysed, including power load, temperature, and date characteristics, so that the behaviour of each cluster under different conditions is understood. The distribution of the clusters across date characteristics is observed and analysed for the presence of seasonal and cyclical patterns.

The processed dataset is organized into a format acceptable to the model, ensuring that each time step has a corresponding feature and target (load value).
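One way to organise the series with the sliding-window technique mentioned above, pairing each window of past steps with its forecast target (a sketch; the window lengths are illustrative):

```python
import numpy as np

def make_windows(series, n_in, n_out):
    """Slide a window over the load series: n_in past steps -> n_out future steps."""
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])                 # input window (features)
        y.append(series[i + n_in:i + n_in + n_out])  # target window (load values)
    return np.array(X), np.array(y)
```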
The processed data are used as the input layer of the CNN model. Temporal and spatial features are extracted with convolution operations, and activation functions perform feature mapping to capture local patterns in the data. In this paper, two convolutional layers are used, each with 16 convolution kernels, and max pooling selects the maximum value within each small region. The CNN layer extracts important feature information from the input data through convolution, activation, pooling, and related steps.
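The CNN stage described here could look as follows in Keras (a sketch: the kernel size of 3 and the input shape of 96 time steps × 5 features are assumptions not stated in the paper):

```python
import tensorflow as tf

# Two convolutional layers with 16 kernels each, followed by max pooling over
# each small region. Kernel size 3 and the (96, 5) input shape are assumptions.
cnn = tf.keras.Sequential([
    tf.keras.Input(shape=(96, 5)),
    tf.keras.layers.Conv1D(16, 3, activation='relu'),
    tf.keras.layers.Conv1D(16, 3, activation='relu'),
    tf.keras.layers.MaxPooling1D(pool_size=2),
])
```

With these shapes, the two valid convolutions shorten the sequence from 96 to 92 steps and the pooling halves it to 46, with 16 feature channels.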

Bidirectional GRU (BiGRU) layer.
In a GRU, the transfer of information is unidirectional, so it captures only the information preceding the current load data. In load data processing, however, the output at the current moment needs to be correlated with the states of both the preceding and following moments in order to learn the characteristics of the load data in more detail. Such global associations help the model understand the dynamic changes and trends of the load data more accurately and improve its ability to capture complex load features, thereby enhancing load-behaviour prediction. To meet this need, this paper introduces a bidirectional gated recurrent unit (BiGRU) to achieve bidirectional state association. After processing by the CNN layer, the feature data are fed into the BiGRU layer to model the sequence data. The BiGRU layer controls the flow of information through its gating mechanism, learns long-term dependencies in the sequence data, and further extracts the semantic information of the time series.
A BiGRU includes GRU units in two directions, forward and backward. The forward GRU unit passes information from the first time step of the sequence to the last, while the backward GRU unit passes information from the last time step to the first. This bidirectionality allows the model to take into account both past and future context and to learn sequence features better. In this paper, three BiGRU layers are stacked to enhance the features and obtain more feature information. The final BiGRU output W_t is the splice of the forward and backward hidden states, W_t = [→h_t, ←h_t].
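A three-layer stacked BiGRU in Keras might look like this (a sketch; the 32 units per direction and the input shape are assumptions, and the default 'concat' merge mode reproduces the splice of forward and backward hidden states):

```python
import tensorflow as tf

# Three stacked bidirectional GRU layers; 32 units per direction is an assumption.
bigru = tf.keras.Sequential([
    tf.keras.Input(shape=(46, 16)),   # e.g. the shape produced by the CNN stage
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32)),  # final concat: 64 units
])
```

The first two layers return full sequences so the next BiGRU layer can consume them; the last returns only the final concatenated state.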

Output layer.
Using the K-means cluster information as an additional input to the CNN-BiGRU, the fully connected layer integrates the features extracted by the preceding layers and maps them to the final output, the predicted load value.
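Putting the pieces together, one plausible way to feed the cluster information as an additional input is to concatenate it with the CNN-BiGRU features before the fully connected output layer. This is a sketch with assumed shapes (a one-hot cluster label of size 4 and 96 forecast points), and only one BiGRU layer is shown for brevity:

```python
import tensorflow as tf

# Hypothetical shapes: 96 past time steps x 5 features; k = 4 one-hot cluster label.
seq_in = tf.keras.Input(shape=(96, 5), name='sequence')
cluster_in = tf.keras.Input(shape=(4,), name='cluster')   # K-means cluster info

x = tf.keras.layers.Conv1D(16, 3, activation='relu')(seq_in)
x = tf.keras.layers.Conv1D(16, 3, activation='relu')(x)
x = tf.keras.layers.MaxPooling1D(2)(x)
x = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32))(x)

merged = tf.keras.layers.Concatenate()([x, cluster_in])   # append cluster features
out = tf.keras.layers.Dense(96)(merged)                   # next day's 96 load points

model = tf.keras.Model([seq_in, cluster_in], out)
```

The fully connected layer then maps the merged feature vector to the predicted load curve.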

Experimental environment
The operating system used for the experiment is CentOS; the processor is an AMD 3600 at 3.60 GHz, the memory size is 16 GB, the hard disk is 512 GB, and the graphics card is an RTX 2080 Ti. TensorFlow 2.5.0 is used as the underlying framework, and the code is written in Python 3.7.0.

Experimental dataset
Power load data (collected every 15 minutes, giving 96 data points per day) and date and temperature data (daily average, minimum, and maximum temperature) are available for a region from 1 January 2010 to 1 January 2014. The training, validation, and test sets were split in a ratio of 8:1:1.
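The chronological 8:1:1 split can be sketched as follows (an illustration, not the paper's exact code):

```python
import numpy as np

def split_811(X, y):
    """Chronological 8:1:1 split into train / validation / test sets."""
    n = len(X)
    i1, i2 = int(n * 0.8), int(n * 0.9)
    return (X[:i1], y[:i1]), (X[i1:i2], y[i1:i2]), (X[i2:], y[i2:])
```

Splitting chronologically, rather than shuffling, keeps future load values out of the training set.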

Experimental parameters
With the other parameters fixed, the main parameters of the improved model can be adjusted by varying the tunable parameters to obtain the optimal configuration of the fusion model, as shown in Table 1.

Experimental evaluation criteria
According to the national grid's load forecasting and evaluation standards, power load carries a degree of uncertainty and contingency. Moreover, there are often differences between short-term load forecasts and the actual values, so evaluation criteria need to be adopted to measure the merits of a model. A commonly used criterion is the mean absolute percentage error (MAPE):

MAPE = (100% / m) Σ_{i=1}^{m} |x_i − y_i| / y_i

where x_i is the predicted value of the load at the ith point, y_i is the actual value of the load at the ith point, and m represents the number of samples.
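MAPE (and RMSE, another common criterion) can be computed as follows (a sketch of the standard definitions):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Note that MAPE divides by the actual values, so it is undefined when a true load value is zero.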

Analysis of experimental results
Experiments with baseline models on the constructed dataset demonstrate the advantages of the improved clustering algorithm with multi-layer information and the improved CNN-BiGRU model proposed in this paper. The results show that the proposed model achieves higher accuracy. The comparison models adopted in this paper are as follows.
(2) CNN-BiGRU algorithm: the improved CNN-BiGRU model trained on its own, without the clustering algorithm superimposed.
(3) K-means and CNN-BiGRU algorithm (K-C-B): the model proposed in this paper. The conclusions drawn from the experimental results in Figure 5 and Table 2 are as follows: (1) The comparison of the K-C-B model against the other algorithms proves experimentally the effectiveness of the three-layer stacked BiGRU proposed in this paper. The MAPE of K-C-B improves on that of the comparison algorithms by 1.64%, which shows that global correlation helps the model understand the dynamic changes and trends of the load data more accurately and improves its ability to capture complex load characteristics, thereby enhancing the accurate prediction and analysis of load behaviour.
(2) The comparison between the standalone CNN-BiGRU model and the K-means and CNN-BiGRU algorithm verifies the validity of using the proposed clustering output as an additional input: the MAPE of CNN-BiGRU alone is 0.59% higher than that of the K-means and CNN-BiGRU algorithm, which confirms that clustering on the same date helps identify data points with similar load characteristics.

Conclusion
The experimental results show that the K-means model clusters the power load, date, and temperature data clearly and reveals their trends across different characteristics. The CNN-BiGRU model performs well in short-term power load prediction, but during training the model's complexity produced signs of overfitting, leading to discrepancies with the true values. It is therefore necessary to fine-tune the model structure and introduce appropriate regularisation. Overall, the results of this experiment provide useful insights for short-term load forecasting in power systems, and further optimisation remains an important direction for future research.


Table 2. Comparison of model results.
Figure 5. Experimental results of each model.