Short-term load prediction with LSTM and FCNN models based on attention mechanisms

To address the large load-prediction errors that substantial load fluctuations cause in microgrid economic dispatch, a short-term load forecasting method based on the attention mechanism and combining LSTM and FCNN models is proposed. The attention mechanism is introduced to enhance the LSTM model's ability to recognize and exploit key information; the LSTM processes historical load data with temporal continuity, selected by a sliding window, while the FCNN processes daily static features such as the day's maximum temperature, minimum temperature, and weather conditions. The outputs of the two networks are then fused in a fully connected layer to generate the final load forecast. Training uses six months of actual power-load values and regional meteorological data. The forecasts are evaluated against both traditional and machine-learning algorithms, and the experimental results show the superior prediction accuracy of the proposed model.


Introduction
The stable operation of the power system and the optimization of economic benefits depend crucially on accurate prediction of power loads. Especially in microgrid environments, accurate short-term load forecasting enables the economic dispatch of the grid, improving energy-use efficiency, reducing operating costs, and at the same time improving grid reliability and stability [1]. However, power loads are affected by many factors, such as climate change, daily activity patterns, and industrial production, and these complex influences give power loads strongly nonlinear, time-series characteristics.
Current load-forecasting methods fall into two major categories: statistical approaches [2][3][4][5][6][7][8] and methods rooted in machine learning. In [2], multiple linear regression is used to build a linear formula for load forecasting from a small number of large data variables; however, the relationship between electrical load and its influencing factors is frequently nonlinear, so the forecasting accuracy needs improvement. In [3], a spatial load forecasting method for data-scarce settings is proposed using a generative adversarial network, which requires substantial computational resources; the training process is complex, and if training fails, the quality of the generated Class II historical load data may be low, degrading the final predictions. In [4], Dai and Zhao compare the ARIMA model and the grey model, weighing their benefits and drawbacks for short-term load prediction; the grey model is less accurate than ARIMA, but ARIMA requires stationary data, with mean and variance constant in time. In [5], a short-term power-load prediction model is presented that uses feature selection and least-squares support vector machines to find optimal input features; however, the model combines several optimization and feature-selection methods, so its complexity is high. In [6], a seasonal CNN-LSTM model based on the attention mechanism is proposed for short-term electricity-load forecasting. In [7], feature vectors are first extracted by a CNN and fed to an LSTM network as a time series, and the LSTM then performs short-term load forecasting. In [8], the random forest algorithm is introduced; it requires little manual setting and tuning of hyper-parameters, but its adaptability and flexibility across different problems may be poor. In [9], the load data is split into high- and low-frequency components by wavelet transform, and a hybrid neural network is built for each component separately; however, the model construction is complicated and demands large computational resources.
To improve the accuracy of load prediction, a short-term forecasting method based on the attention mechanism and combining LSTM and fully connected networks, denoted Attention-LSTM-FCNN, is proposed. This method exploits the complementary strengths of LSTM and fully connected networks. First, historical load data with temporal continuity is processed by the LSTM to capture the temporal characteristics of the data. Then, daily static features, such as the day's maximum and minimum temperatures and weather conditions, are processed by fully connected networks to extract the effects of these factors on the load. Finally, the outputs of the two networks are fused in a dense layer to generate the final load forecast. To validate the accuracy of the proposed model, its predictions are compared with those of linear regression, support vector machine, random forest, LSTM, and CNN-LSTM models.

LSTM model
LSTM is a distinct variant of the recurrent neural network (RNN), specially engineered to circumvent the long-term dependency problem. Introduced by Hochreiter and Schmidhuber in 1997, LSTM has since seen numerous enhancements and modifications from various researchers, including the addition of a 'forget gate' [10]. The configuration of LSTM is depicted in Figure 1, and the specific operations of the LSTM at time step t are shown in Equations (1)-(6).
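The equations referenced above follow the standard LSTM formulation with a forget gate; a common statement of Equations (1)-(6) is:

```latex
\begin{align}
f_t &= \sigma\!\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\!\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{C}_t &= \tanh\!\left(W_C [h_{t-1}, x_t] + b_C\right) && \text{(candidate cell state)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell-state update)} \\
o_t &= \sigma\!\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(C_t) && \text{(hidden state)}
\end{align}
```

Here $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, and $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state with the current input.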

FCNN model
In a fully connected network, every neuron is connected to all neurons in the neighboring layers. Each neuron receives a set of inputs; these are multiplied by the corresponding weights, the results are summed, and a nonlinear transformation is applied through the activation function to produce the neuron's output, as expressed in Equation (7).
y = ι(w·x + b) (7)

In the above formula, ι is the activation function, w is the weight matrix, x is the neuron input, b is the bias, and y is the output.
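Equation (7) can be sketched directly in NumPy; the weight, bias, and input values below are illustrative, and tanh stands in for an arbitrary activation function:

```python
import numpy as np

def dense_neuron(x, w, b, activation=np.tanh):
    # Equation (7): y = activation(w . x + b)
    return activation(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # neuron inputs
w = np.array([0.1, 0.2, 0.3])    # weights
b = 0.05                         # bias
y = dense_neuron(x, w, b)        # tanh(0.5)
```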

Attention Mechanism
After the LSTM layer, the attention mechanism is integrated by adding a SeqSelfAttention layer. This layer acts as an information filter, directing the model's attention to the parts of the input sequence most relevant to the load-prediction task. In this layer, 'sigmoid' is used as the attention activation function, which bounds each attention score between 0 and 1. The attention score represents the weight, or importance, given to each element of the input sequence. The sigmoid function keeps the attention scores bounded, making the model more stable and improving its predictive performance. For example, recent historical data may be more important than earlier historical data when forecasting electricity load [11].
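A minimal NumPy sketch of the idea behind sigmoid-scored self-attention follows. The projection matrices here are hypothetical, randomly initialised stand-ins for learned parameters, and this multiplicative scoring is a simplification of the additive scoring that a SeqSelfAttention layer typically learns:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_self_attention(states, rng):
    """Re-weight a sequence of hidden states with sigmoid attention scores."""
    T, d = states.shape
    # Hypothetical learned projections, randomly initialised for this sketch.
    W_q = rng.normal(scale=0.5, size=(d, d))
    W_k = rng.normal(scale=0.5, size=(d, d))
    q, k = states @ W_q, states @ W_k
    scores = sigmoid(q @ k.T / np.sqrt(d))   # (T, T), every score in (0, 1)
    return scores @ states, scores           # context vectors, attention map

rng = np.random.default_rng(0)
hidden = rng.normal(size=(5, 4))             # 5 time steps, 4 hidden units
context, attention = sigmoid_self_attention(hidden, rng)
```

Because every score passes through the sigmoid, each time step receives a bounded weight, mirroring the stabilising effect described above.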

Attention-LSTM-FCNN Network Hybrid Modeling
In this paper, an iterative prediction scheme is used to train a hybrid model capable of predicting the load at the 96 time points of a day. The load data at 96 points per day for the 20 days preceding the forecast day is selected as a time-series feature map of size 1920×1, using a sliding window with a step size of 1, and is fed into the LSTM network for feature extraction. Static features of the forecast day, such as maximum temperature, minimum temperature, weather conditions, daytime wind direction, nighttime wind direction, and whether the day is a weekday or a holiday, form a 1×64 vector that is input to the fully connected network. Finally, the two branches are merged, and the 96-point load values for the forecast day are output through one fully connected layer.
The architecture of Attention-LSTM-FCNN is depicted in Figure 2.
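The fusion step can be sketched as a NumPy forward pass. The input sizes (1920×1 sequence, 1×64 static vector, 96 outputs) come from the paper; the branch output widths and random weights below are illustrative assumptions, not the paper's trained parameters:

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed branch summaries: the LSTM branch condenses the 1920-point load
# sequence, the FCNN branch condenses the 64 static features.
lstm_branch_out = rng.normal(size=(1, 64))    # hypothetical width
fcnn_branch_out = rng.normal(size=(1, 32))    # hypothetical width

# Merge the two branches by concatenation.
fused = np.concatenate([lstm_branch_out, fcnn_branch_out], axis=1)

# Final fully connected layer mapping fused features to 96 load points,
# one per 15-minute interval of the forecast day.
W_out = rng.normal(scale=0.01, size=(fused.shape[1], 96))
b_out = np.zeros(96)
prediction = fused @ W_out + b_out
```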

Experimental evaluation indicators
Two metrics are employed to evaluate the discrepancy between predicted and actual outcomes: the Mean Absolute Percentage Error (MAPE), given in Equation (8), and the Maximum Absolute Percentage Error (MaxAPE), given in Equation (9):

MAPE = (100%/n) × Σ_{t=1}^{n} |actual_t − prediction_t| / actual_t (8)

MaxAPE = 100% × max_{1≤t≤n} |actual_t − prediction_t| / actual_t (9)

In the formulas above, n is the number of time intervals within the forecast day, actual_t is the real load value at time t, and prediction_t is the forecasted load value at the same moment.
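The two metrics, as defined by the symbols above, can be implemented directly; the sample values are illustrative:

```python
import numpy as np

def mape(actual, prediction):
    # Mean Absolute Percentage Error over the n forecast intervals, Eq. (8)
    actual, prediction = np.asarray(actual, float), np.asarray(prediction, float)
    return 100.0 * np.mean(np.abs(actual - prediction) / actual)

def max_ape(actual, prediction):
    # Maximum Absolute Percentage Error over the forecast day, Eq. (9)
    actual, prediction = np.asarray(actual, float), np.asarray(prediction, float)
    return 100.0 * np.max(np.abs(actual - prediction) / actual)

a = [100.0, 200.0, 400.0]   # actual loads
p = [110.0, 190.0, 400.0]   # predicted loads
# mape(a, p) -> 5.0 ; max_ape(a, p) -> 10.0
```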

Case study analysis
The load dataset from Question A of the 9th Mathematical Modeling Competition for Electricians 2016 [12] is used; load data from January 8, 2018, to July 10, 2018 (sampled every 15 minutes) serves as the historical load data. A sliding window selects the load data of the previous 20 days together with the relevant influencing factors of the day to be predicted: maximum temperature, minimum temperature, weather conditions, daytime wind direction, and nighttime wind direction; additional influencing factors, such as whether the day is a weekday or a holiday, are also included. The model then predicts the load at the 96 time points of the following day. To evaluate the effectiveness of Attention-LSTM-FCNN for short-term load forecasting, linear regression, support vector machine, random forest, LSTM, and CNN-LSTM models are selected for comparative analysis of the forecasting results.

Handling of abnormal data
A sliding window is devised to improve the accuracy of the model's predictions. The mean and standard deviation of the sliding window are calculated, and the Z-score of each point is computed as expressed in Equation (10) [13]:

Z = (X − rolling_mean) / rolling_std (10)

In the above formula, X is any load value within the window, rolling_mean is the window mean, and rolling_std is the window standard deviation.
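A minimal sketch of rolling Z-score screening, using an assumed window length of 4 and an injected outlier:

```python
import numpy as np

def rolling_zscores(load, window):
    # Z = (X - rolling_mean) / rolling_std, Equation (10)
    load = np.asarray(load, float)
    z = np.full(load.shape, np.nan)          # no score before a full window
    for t in range(window - 1, len(load)):
        win = load[t - window + 1 : t + 1]
        mu, sd = win.mean(), win.std()
        if sd > 0:
            z[t] = (load[t] - mu) / sd
    return z

series = [10.0, 11.0, 10.5, 50.0, 10.8, 11.2]   # 50.0 is an injected outlier
z = rolling_zscores(series, window=4)
```

Points whose |Z| exceeds a chosen threshold would then be flagged as abnormal and replaced before training.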

Normalization of data
The load data is normalized according to Equation (11) to speed up model training:

Y' = (Y − Y_min) / (Y_max − Y_min) (11)

In the above formula, Y' is the normalized value, Y_min is the smallest value in the historical load dataset, and Y_max is the largest value in the same dataset.
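Min-max normalization is a one-liner in NumPy; the load values below are illustrative:

```python
import numpy as np

def normalize(Y):
    # Y' = (Y - Y_min) / (Y_max - Y_min), Equation (11)
    Y = np.asarray(Y, float)
    return (Y - Y.min()) / (Y.max() - Y.min())

load = np.array([300.0, 450.0, 600.0])   # raw load values (e.g. MW)
scaled = normalize(load)                 # maps onto [0, 1]
```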

Results and Analysis
The control variable method is used to tune the model, varying the number of network layers, the number of neurons, and the activation functions of both the LSTM and the FCNN. The criterion Y for selecting the network structure weights the training-set loss (loss) at 30% and the validation-set loss (val_loss) at 70%, as shown in Equation (12):

Y = 0.3 × loss + 0.7 × val_loss (12)

The loss value for each network structure is shown in Table 1, which demonstrates the adaptations made to the network architecture to boost the model's predictive accuracy: varying the number of LSTM layers, adjusting neuron counts, and experimenting with different activation functions. A key enhancement is the integration of an attention mechanism into the LSTM layer. This adjustment produces a noticeable improvement in the model's performance, as shown by the reduced loss values on both the training and validation sets, indicating heightened precision in the model's predictions.
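The structure-selection rule of Equation (12) amounts to picking the candidate with the smallest weighted loss; the candidate structures and loss values below are hypothetical placeholders, not the paper's measured results:

```python
# Weighted selection criterion from Equation (12)
def criterion(loss, val_loss):
    return 0.3 * loss + 0.7 * val_loss

# Hypothetical candidates: structure name -> (training loss, validation loss)
candidates = {
    "1-layer LSTM / 1-layer FCNN": (0.020, 0.035),
    "2-layer LSTM / 2-layer FCNN": (0.018, 0.024),
    "3-layer LSTM / 2-layer FCNN": (0.015, 0.031),
}

# Select the structure minimizing Y
best = min(candidates, key=lambda name: criterion(*candidates[name]))
```

Weighting val_loss more heavily favors structures that generalize rather than merely fit the training set.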
Table 2 presents the basic conditions of the forecast day. The attention mechanism improves the accuracy of load forecasting, as shown in Figures 3 and 4. The actual load values at the 96 points of the forecast day, together with the corresponding predictions of each model, are depicted in Figure 5. The proposed Attention-LSTM-FCNN model clearly tracks the real load curve more closely and predicts more accurately than the other models. To visualize the differences between the prediction models more directly, the MAPE is computed every 4 points, giving 24 points per day, as shown in Figure 6. The one-day MAPE and the maximum MAPE of each model are listed in Table 3.

Conclusions
The Attention-LSTM-FCNN hybrid short-term load forecasting method processes historical load data and the influencing factors of load forecasting separately: Attention-LSTM extracts effective feature vectors from the historical load data, while the FCNN extracts information useful for load forecasting from the input static features. Through multilayer neuron computation and the nonlinear transformation of the activation function, these features are mapped into a new space that better represents their relationship with the power load. In comparisons with the linear regression, support vector machine, random forest, LSTM, and CNN-LSTM models, the proposed hybrid model clearly exhibits higher prediction accuracy. Such accurate load prediction benefits the economical and precise planning of microgrids.

Figure 3. Effect of attention mechanisms on load forecasting in microgrids.

Figure 4. MAPE of load forecasting with and without attention mechanisms.

Figure 5. Comparison of predicted and actual curves of different models.

Figure 6. 24-point MAPE for each prediction model.

Table 1. After comparison, both the LSTM network and the FCNN network are determined to be two-layered.

Table 1. Loss values for each network structure.

Table 3. Comparison of model predictions for randomly selected forecast days.