Energy consumption forecasting with deep learning

This research develops a machine learning model for predicting household electricity consumption. It leverages a multidimensional time-series dataset encompassing energy consumption profiles, customer characteristics, and meteorological information. A comprehensive exploration of diverse deep learning architectures is conducted, encompassing variants of recurrent neural networks (RNNs) and temporal convolutional networks (TCNs), with the traditional autoregressive integrated moving average (ARIMA) model serving as a reference. The empirical findings underscore the substantial enhancement in forecasting accuracy attributable to the inclusion of meteorological data, with the most favorable outcomes attained by the temporal convolutional networks. Additionally, we investigate the impact of the input duration and the number of prediction steps on model performance, emphasizing the pivotal role of selecting these quantities well to augment predictive precision. In sum, this investigation underscores the potential of deep learning for electricity consumption forecasting, presenting pragmatic methodologies and recommendations for household electricity consumption prediction.


Introduction
With the growing concern regarding the close relationship between energy consumption and environmental sustainability, forecasting electricity usage is gaining importance as a key aspect of energy management and planning. Accurately predicting household electricity consumption is not only crucial for energy companies in terms of resource allocation and grid scheduling but also directly impacts energy cost management for household users. Nevertheless, household electricity consumption forecasting encounters several challenges, including the complexity of multidimensional time series, the influence of seasonal and weather changes, and the modeling of long-term dependencies.
In this context, deep learning technology has emerged as a forefront research area in electricity consumption forecasting, owing to its exceptional feature extraction and modeling capabilities. In recent years, the field of time series forecasting has witnessed notable advancements driven by deep learning models, including recurrent neural networks (RNNs) and their various iterations, as well as the more recently introduced temporal convolutional networks (TCNs). These models not only excel at capturing intricate patterns within time series data but also adeptly handle seasonal variations and nonlinear relationships. Additionally, the introduction of external factors, such as weather data, has further enhanced their predictive performance.
Prior work has also observed that the practice of curating data before constructing predictive models yields notably superior results compared to the direct application of machine learning model building.
Wang and Wu [20] presented a comprehensive model for short-term wind power forecasting, anchored on the T-LSTNet_markov architecture. Their methodology comprises several essential stages, the first of which is the preprocessing and enhancement of the initial dataset. They then apply the T-LSTNet model to predict wind power generation from the raw data, followed by an error correction step that combines the k-means approach with a weighted Markov process to further improve accuracy. Empirical findings from a case study of a wind farm in the Inner Mongolia Autonomous Region of China demonstrate a notable improvement in forecast precision after the error correction is applied.

Data description
This study investigates the prediction of household energy consumption and the development of predictive algorithms. It uses a publicly available dataset first described in the research article "Evaluating Short-Term Forecasting of Multiple Time Series in IoT Environments," accessible at https://fordatis.fraunhofer.de/handle/fordatis/215. The dataset comprises three files in xlsx format: 20201015_consumption.xlsx (21.06 MB), 20201015_profiles.xlsx (19.21 kB), and 20201015_weather.xlsx (18 MB). Together they contain energy consumption profiles for 499 customers as time series with hourly resolution, customer profiles, and weather data; the weather file holds outside-temperature time series, also at hourly resolution, corresponding to the region of each customer's location.
The data for this study cover twelve consecutive months and are divided according to the classic 75%:25% split ratio into training and testing sets, as depicted in Figure 1. Under this division, approximately nine months of data are allocated for model training, with the remaining three months reserved for model testing. The temporal span extends from January 1st, 2019, at 00:00 to December 31st, 2019, at 23:00, with hourly time intervals. Using these data, we generated a line chart in which the x-axis denotes time and the y-axis denotes energy consumption; the blue region represents the training set and the red region the testing set. This partitioning ensures a comprehensive evaluation of model performance across different time periods.
As a first baseline, we employ the most recent known value as the predictive estimate for the current time point, illustrated on an individual user's energy consumption data, which includes anomalous observations. During the training phase, we compare actual values against their corresponding previous values and calculate a series of error metrics: bias, root mean squared error (RMSE), mean squared error (MSE), mean absolute percentage error (MAPE), and mean absolute error (MAE).
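This last-known-value (persistence) baseline and the five error metrics can be sketched as follows; the consumption series here is toy data, not the dataset used in the study.

```python
# Sketch of the persistence baseline: each hour is predicted by the
# previous hour's observed consumption. The series values are illustrative.
import numpy as np

def error_metrics(actual, predicted):
    """Return the five error metrics used in this study."""
    err = predicted - actual
    bias = err.mean()
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / actual)) * 100  # assumes no zero actuals
    return {"BIAS": bias, "RMSE": rmse, "MSE": mse, "MAPE": mape, "MAE": mae}

consumption = np.array([1.2, 1.5, 1.1, 0.9, 1.4, 1.6])  # hourly kWh (toy data)
actual = consumption[1:]        # values to predict
predicted = consumption[:-1]    # last known value as the forecast
metrics = error_metrics(actual, predicted)
```

The same `error_metrics` helper can then score every later model on identical terms.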
Subsequently, we explore the simple moving average method with a sliding window of size five hours. This method calculates the moving average value within a specified period (e.g., five hours) and updates it as time progresses. We apply this method to the training set data, calculate predicted values using the moving average, and compute corresponding error metrics for both the training and testing sets.
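A minimal sketch of this moving-average forecast, again on toy hourly data: each prediction is the mean of the preceding five observed hours.

```python
# Simple moving average forecast with a 5-hour window (toy data):
# the prediction for hour t is the mean of the previous five hours.
import numpy as np

window = 5
consumption = np.array([1.2, 1.5, 1.1, 0.9, 1.4, 1.6, 1.3, 1.0, 1.2, 1.5])

# Each row of `histories` is one 5-hour history; its mean predicts the next hour.
histories = np.lib.stride_tricks.sliding_window_view(consumption[:-1], window)
predictions = histories.mean(axis=1)
actual = consumption[window:]
mae = np.mean(np.abs(predictions - actual))
```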
Furthermore, we introduce the autoregressive integrated moving average (ARIMA) model, which leverages past time step values and relevant lag terms to predict future values. We train an ARIMA model using a specific user's data as an example and utilize the trained model to predict the testing set data. We evaluate the model's performance using ARIMA's predicted results and the previously mentioned error metrics.
We adopt the Vector Autoregression (VAR) model to forecast future energy consumption, since it can incorporate climatic variables and follows rigorous procedures for both data processing and model selection. The main purpose here is to analyze the relationship between consumption patterns and weather conditions. First, the consumption data and weather data for the selected user are merged into a single uniform dataset, in which 'consumption' and 'weather' serve as input features and 'consumption' is the target variable. We then split the dataset into two subsets: a training set covering January 1, 2019, to September 30, 2019, and a testing set covering October 1, 2019, to December 31, 2019.
To determine the appropriate lag order for the VAR model, we employ information criteria, including AIC, BIC, FPE, and HQIC. Based on the results, we select a lag order of 24 as the optimal model parameter and fit the VAR model with it. The fitted model's summary statistics include the regression coefficients and the statistical significance of the relationship between consumption and weather.
The model results reveal a close correlation between consumption and the current hour, as well as consumption over the previous several hours within a lag period of 24 hours, and weather factors.However, it should be noted that some lag period regression coefficients do not exhibit significance.From the model fit, we gain preliminary insights into the extent of consumption's relationship with weather.
Next, we predict electricity consumption in the testing set using the training set data. Leveraging the VAR model, we predict hourly consumption based on historical information from the training set, and we then compute the error metrics, including bias, root mean squared error (RMSE), mean squared error (MSE), mean absolute percentage error (MAPE), and mean absolute error (MAE), for the predicted values against the actual values.
The final results demonstrate that the VAR model exhibits certain predictive capabilities for electricity consumption. However, its prediction errors are larger than those of the previously used ARIMA model. Given the complexity of the prediction task and the observed model performance, we further explore deep learning methods to enhance prediction accuracy.

Gated recurrent unit (GRU)
The Gated Recurrent Unit (GRU) [21] stands as a variant of recurrent neural networks (RNNs) tailored for processing sequential data. It effectively addresses the vanishing gradient problem and exhibits superior performance in capturing long-range dependencies compared to conventional RNNs. GRUs have demonstrated success across various tasks, including sequence modeling and natural language processing.
The GRU unit comprises two pivotal components: an update gate and a reset gate, which control the flow and retention of information. These gates endow GRUs with the capability to manage long-range dependencies adeptly. The computational process of the GRU can be represented by the following equations:

$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$
$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$
$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)$
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$

Here are the key components:
• $x_t$ is the input at the current time step.
• $h_{t-1}$ signifies the hidden state from the preceding time step.
• $r_t$ signifies the reset gate, which regulates the influence of the previous hidden state on the candidate state.
• $z_t$ stands as the update gate, governing the equilibrium between retaining the previous hidden state and employing the candidate state.
• $\tilde{h}_t$ denotes the candidate hidden state, its computation dependent on the present input and the reset gate.
• $\sigma$ denotes the sigmoid function.
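One step of the standard GRU update can be sketched in plain numpy; the weight shapes and random initialization are illustrative assumptions, not trained parameters.

```python
# Minimal numpy sketch of one GRU step (standard GRU equations).
# Shapes and the random initialization are illustrative assumptions.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU update; W, U, b hold the reset/update/candidate parameters."""
    r = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])            # reset gate
    z = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])            # update gate
    h_tilde = np.tanh(W["h"] @ x_t + U["h"] @ (r * h_prev) + b["h"])  # candidate
    return (1 - z) * h_prev + z * h_tilde                           # new hidden state

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(size=(n_hid, n_in)) for k in "rzh"}
U = {k: rng.normal(size=(n_hid, n_hid)) for k in "rzh"}
b = {k: np.zeros(n_hid) for k in "rzh"}
h = gru_step(rng.normal(size=n_in), np.zeros(n_hid), W, U, b)
```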

Long Short-Term memory (LSTM)
Within the field of recurrent neural networks (RNNs), the Long Short-Term Memory (LSTM) model represents an important advance. It was carefully designed to address the vanishing gradient problem while capturing complex dependencies within sequential data [22]. The LSTM distinguishes itself from the traditional RNN architecture through a more elaborate memory cell structure. This breakthrough has led to its ubiquitous application across fields including sequence modeling, language translation, and speech recognition.
Central to the LSTM architecture is the inclusion of a cell state, which functions as a conveyor for information transfer across time steps. Additionally, it features three crucial gates: the input gate, the forget gate, and the output gate. These gates meticulously manage the flow of information into and out of the cell state.
The computations within the LSTM can be elucidated with the following equations:

$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$
$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$
$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$
$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$
$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$
$h_t = o_t \odot \tanh(c_t)$

Here are the key components:
• $x_t$ is the input at the current time step.
• $h_{t-1}$ signifies the hidden state from the preceding time step.
• $f_t$ signifies the forget gate, determining which information is to be discarded from the cell state.
• $i_t$ is the input gate, governing the introduction of new information into the cell state.
• $\tilde{c}_t$ is the candidate cell state, potentially incorporated into the cell state.
• $c_t$ stands for the updated cell state, a blend of the prior cell state and the candidate state.
• $o_t$ represents the output gate, controlling the extent to which the cell state contributes to the hidden state.
• Weight matrices $W_f, W_i, W_c, W_o$ and $U_f, U_i, U_c, U_o$ are used in the calculations.
• $\sigma$ plays a fundamental role within the gating mechanisms.
• $\tanh$ is utilized in specific computations.
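One step of the standard LSTM update can likewise be sketched in numpy; shapes and initialization are illustrative assumptions.

```python
# Minimal numpy sketch of one LSTM step (standard LSTM equations).
# Shapes and the random initialization are illustrative assumptions.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])         # forget gate
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])         # input gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])   # candidate cell state
    c = f * c_prev + i * c_tilde                                  # updated cell state
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])         # output gate
    return o * np.tanh(c), c                                      # hidden state, cell state

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(size=(n_hid, n_in)) for k in "fico"}
U = {k: rng.normal(size=(n_hid, n_hid)) for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
```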

Temporal convolutional network (TCN)
The Temporal Convolutional Network (TCN) [23] is a deep learning model crafted for processing sequential data. Instead of the recurrence used in typical recurrent models, the TCN employs dilated convolutions to capture long-range dependencies within the data. It has attracted considerable interest for the effectiveness with which it handles sequences of varied lengths and for its highly parallelizable architecture. The input sequence is processed by a series of stacked dilated convolutional layers; the dilations let the model examine a wider context window without a substantial increase in parameter count, so it can effectively represent both long-term and short-term dependencies within the sequence.
The pivotal characteristic of the TCN lies in its utilization of dilated convolutions. The computational process can be depicted by the following equation:

$y_t = \sum_{i=0}^{k-1} f_i \cdot x_{t - d \cdot i}$

Here is the breakdown:
• $y_t$ represents the output at time step $t$.
• $f_i$ stands for the convolutional filter weight positioned at index $i$.
• $k$ denotes the kernel size, effectively determining the size of the receptive field.
• $d$ serves as the dilation factor, governing the spacing between filter elements.

GRU and LSTM have the advantage of capturing long-term dependencies. However, they remain theoretically susceptible to gradient problems, making them less desirable for the energy consumption forecasting problem considered in this study. The TCN, in contrast, offers efficient training and strong learning capacity.
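The dilated causal convolution at the core of the TCN can be sketched directly from this equation; the toy input and kernel weights are illustrative, and zero left-padding keeps the operation causal.

```python
# Minimal sketch of a dilated causal convolution, the core TCN operation:
# y_t = sum_{i=0}^{k-1} f_i * x_{t - d*i}. Zero left-padding preserves
# causality; the input sequence and kernel weights are illustrative.
import numpy as np

def dilated_causal_conv(x, f, d):
    """Apply one dilated causal convolution so that y has the same length as x."""
    k = len(f)
    padded = np.concatenate([np.zeros((k - 1) * d), x])  # pad the left edge
    return np.array([
        sum(f[i] * padded[t + (k - 1) * d - d * i] for i in range(k))
        for t in range(len(x))
    ])

x = np.arange(1.0, 9.0)  # toy input sequence: 1, 2, ..., 8
y = dilated_causal_conv(x, f=np.array([0.5, 0.25, 0.25]), d=2)
```

With dilation $d = 2$ and kernel size $k = 3$, each output mixes the current value with those 2 and 4 steps back; stacking layers with growing dilations is what gives the TCN its wide receptive field.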

Experiment and discussion
Within the scope of this inquiry, we evaluated the performance of several models that predict power use, using three metrics: root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The models investigated include the Simple Moving Average (SMA), Autoregressive Integrated Moving Average (ARIMA), Vector Autoregression (VAR), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Temporal Convolutional Network (TCN). Rigorous experimental analysis shows that the TCN model consistently produced the best results across all three metrics, indicating a superior fitting ability relative to its contemporaries. We therefore advocate incorporating weather-related information in multi-time-series prediction wherever practicable; in pragmatic applications, this is especially important given the influence of meteorological variables on prediction accuracy.
RMSE (Root Mean Square Error) quantifies the disparity between predicted values and ground-truth values:

$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$

MAE (Mean Absolute Error) is the mean of the absolute errors:

$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$

MAPE (Mean Absolute Percentage Error) is sensitive to relative errors and invariant to a global rescaling of the target variable, making it suitable for problems where the target variable varies widely in magnitude:

$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$

The evaluation results are summarized in Table 1. For RMSE, MAE, and MAPE, smaller values indicate better model performance, and the performance differences between models are apparent from the table: deep learning models outperform the traditional models in general, and the TCN attains the smallest error when weather information is included. Considering first the models without weather data, RMSE, MAE, and MAPE decrease as we move from the traditional models to the deep learning models, indicating progressively stronger predictive capability; among these, the TCN model achieved the lowest RMSE, MAPE, and MAE. This suggests that the TCN and GRU exhibit favorable electricity consumption prediction performance even in the absence of weather data.
Upon introducing weather data, the predictive capabilities of the models generally improve; in particular, the TCN model's MAPE falls from 134.407 to 89.445, signifying a substantial enhancement in prediction accuracy. This strongly underscores the importance of incorporating weather data for boosting predictive performance.
When comparing models with and without weather data, it becomes evident that models incorporating weather data typically offer more accurate predictions, with the TCN model benefiting the most. Nonetheless, it is worth noting that the LSTM and GRU models experience relatively smaller increases in RMSE and MAE upon the inclusion of weather data.

Conclusion
Based on the experimental outcomes of this study, it is evident that incorporating weather data is an effective approach for enhancing the accuracy of user electricity consumption predictions, and the TCN model emerges as the optimal choice for predictive performance when integrated with weather data. To enhance prediction precision, we therefore advocate combining the TCN model with weather data. This synergy presents promising avenues for future research. The first direction involves exploring graph-based deep learning models [24] for energy consumption forecasting, as these models have demonstrated effectiveness on similar problems. The second entails distributed learning techniques [25], which would be well suited to real-world systems, enabling scalable and efficient predictions. The third revolves around the joint forecasting of weather and energy consumption [26], a potentially more effective approach that considers the interplay between these two critical factors. This holistic approach could lead to enhanced predictive capabilities and improved energy management strategies.

Figure 1. Training and test split.
Some examples of the forecasting results of the TCN with and without weather are shown in Figure 2 and Figure 3.

Figure 2. TCN model performance without weather information.

Figure 3. TCN model performance with weather information.
Subsequently, we evaluated the TCN model in greater depth by analyzing the effect of the input time length and the output prediction step; Figures 4 and 5 present these findings.

Figure 4. Influence of the input time length on the TCN model.

Figure 5. Influence of the output prediction step on the TCN model.

Table 1. Evaluation results of different models.