Informer-Model-Based Wind Power Forecasting in the Presence of Tropical Storms

When severe tropical storms pass through, regional wind speeds fluctuate greatly and the volatility of wind farm output increases significantly. Tropical storms also last a long time, which makes short-term time series prediction models ineffective, and the resulting unstable turbine output has a greater impact on power system dispatch. This paper first examines the characteristics of tropical storm movement, namely the change in wind speed, and then uses the Informer long-sequence time series forecasting model to predict the total change in wind turbine output over the next 10 days as the storm passes. A case study shows that the Informer model performs well in predicting long-sequence wind power output during a tropical storm.


Introduction
To address resource shortages and environmental pollution worldwide and to meet the growing demands of economic development, power systems are expanding steadily, and their dispatch control and overall structure are becoming increasingly complex. In the emerging renewable-oriented power system, wind power has been developed vigorously around the world because of its abundant resources, freedom from pollution and potential for large-scale deployment. However, its intermittency and volatility create serious dispatch problems for the power system. Moreover, turbine output is strongly affected by climatic conditions; in extreme weather in particular, turbines may cut out frequently, which increases the disturbance that external conditions impose on the power system and greatly reduces its stability. By application scenario, wind power can be divided into onshore and offshore wind power. Offshore wind is harder to develop than onshore wind, and its overall installed scale is still much smaller than that of onshore wind farms, but with breakthroughs in core technologies it has better application prospects than onshore wind. Most onshore wind farms are built at high altitude, and current research on them focuses mainly on blade icing detection and on anti-icing and de-icing technology. Offshore wind farms are not exposed to blade icing, but extreme weather events such as tropical storms bring drastic wind speed changes that cause large fluctuations in turbine output; as the share of wind capacity increases, maintaining the power balance of the power system becomes a major challenge [1][2][3].
Wind power prediction under severe tropical storms is a typical time series prediction problem. To capture temporal characteristics, RNN models are widely used, but RNNs suffer from vanishing and exploding gradients as the network deepens. Reference [4] uses an adaptive weight learning module to select the best weights for the outputs of a CNN-LSTM module and a GRU module, building a combined CNN-LSTM & GRU short-term prediction model, but it still performs poorly at capturing long-term dependencies. Reference [5] combines recurrent neural networks (RNNs) with autoregression (AR) to produce probabilistic short-term wind speed forecasts.
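The vanishing and exploding gradient problem can be illustrated with a deliberately simplified stand-in for an RNN: a scalar linear recurrence h_t = w * h_{t-1} with no inputs and no nonlinearity (an assumption made only for illustration). By the chain rule, the gradient of h_T with respect to h_0 is w multiplied T times, so it shrinks or grows geometrically with sequence length.

```python
def recurrent_gradient(w: float, steps: int) -> float:
    """Gradient d h_T / d h_0 for the toy recurrence h_t = w * h_{t-1}."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # chain rule: each step contributes d h_t / d h_{t-1} = w
    return grad


if __name__ == "__main__":
    for w in (0.9, 1.0, 1.1):
        print(f"w={w}: gradient after 100 steps = {recurrent_gradient(w, 100):.3e}")
```

With w = 0.9 the gradient after 100 steps is about 2.7e-5 (vanishing), and with w = 1.1 it is about 1.4e4 (exploding), which is why long storm sequences are hard for plain RNNs.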

Characteristics of the wind power generation with tropical storms present
A severe tropical storm is a grade of tropical cyclone, a disturbance that develops gradually from a tropical depression; under China's national grading standard (GB/T 19201-2006), its maximum wind speed near the centre is 24.5 to 32.6 m/s, one grade above a tropical storm (17.2 to 24.4 m/s). The strong winds, heavy rainfall and large waves generated by tropical storms have a major impact on coastal urban infrastructure, shipping and agricultural production. Tropical storms form under conditions of high sea surface temperature, a suitable flow field and sufficient geostrophic deflection (Coriolis) force.
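As a quick reference for the wind speed grades discussed above, the helper below maps the maximum wind speed near a cyclone's centre to a grade name. The thresholds are assumed from China's national standard GB/T 19201-2006; they are not derived from this paper's data.

```python
# Lower bounds (m/s) of each grade, assumed from GB/T 19201-2006, highest first.
GRADES = [
    (51.0, "super typhoon"),
    (41.5, "severe typhoon"),
    (32.7, "typhoon"),
    (24.5, "severe tropical storm"),
    (17.2, "tropical storm"),
    (10.8, "tropical depression"),
]


def cyclone_grade(max_wind_ms: float) -> str:
    """Return the tropical cyclone grade for a maximum wind speed near the centre."""
    for lower_bound, name in GRADES:
        if max_wind_ms >= lower_bound:
            return name
    return "below tropical depression"


if __name__ == "__main__":
    for v in (15.0, 20.0, 28.0, 35.0):
        print(v, cyclone_grade(v))
```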

Figure 1 Wind speed and wind power profile of a wind farm
When the wind speed exceeds the rated wind speed, the turbine's control system regulates rotor speed so that output varies only slightly around rated power, keeping generation stable; when the wind speed falls below the rated value, maximum power point tracking is used to maximise the unit's generation efficiency [7]. Turbine power is therefore positively correlated with ambient wind speed. To examine the factors affecting turbine output further, wind speed and power statistics from Hong Kong, China are plotted in Figure 1. Turbine output is strongly coupled to wind speed: although the turbine has control systems such as a gearbox, large variations in wind speed are the main reason its output is difficult to control [8]. Many studies have shown that as a tropical storm moves, only wind farms near the storm centre experience excessive wind speeds that cause frequent shutdowns, while peripheral wind farms may even run at full output with ample wind resources: wind speeds in the central area can exceed the cut-out threshold, whereas farms at the edge capture the most suitable wind energy. This static picture has limited applicability, however, because storms move fast, at up to 30 km/h. As Figure 2 shows, Storm Dujuan kept a high translation speed throughout its track, so a wind farm at the edge of a storm may find itself in the core of the storm within a few hours, with a dramatic change in local real-time wind speed. If a turbine operates abnormally because of wind speed or other causes, its output deviates from the scheduled value and its internal structure may be damaged. For safety, coastal counties in China shut down all wind turbines along the track of a strong tropical storm, which significantly reduces the overall reserve capacity of the power system and increases its operating costs [9].
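The operating regions described above (no output below cut-in, tracking region up to rated speed, output held near rated power above it, shutdown at cut-out) can be sketched as an idealised power curve. All parameters here (cut-in 3 m/s, rated 12 m/s, cut-out 25 m/s, 2 MW rating) and the cubic interpolation are hypothetical illustrations, not the turbines behind Figure 1.

```python
def turbine_power(v, cut_in=3.0, rated_speed=12.0, cut_out=25.0, rated_power=2.0):
    """Idealised turbine power (MW) at wind speed v (m/s); all defaults are hypothetical."""
    if v < cut_in or v >= cut_out:
        return 0.0  # below cut-in: no generation; at/above cut-out: shut down for safety
    if v >= rated_speed:
        return rated_power  # control system holds output at rated power
    # Tracking region: power assumed to grow with the cube of wind speed.
    frac = (v**3 - cut_in**3) / (rated_speed**3 - cut_in**3)
    return rated_power * frac


if __name__ == "__main__":
    for v in (2.0, 8.0, 11.0, 15.0, 26.0):
        print(f"{v:5.1f} m/s -> {turbine_power(v):.3f} MW")
```

Under these assumptions, a storm that pushes local wind speed from 11 m/s (about 1.5 MW) past 25 m/s drops the unit to zero output, which is the kind of abrupt change the forecasting model has to capture.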

Severe tropical storm wind power prediction
Because of its intermittency and volatility, wind power seriously affects grid stability, and accurate wind power forecasts can guide dispatch control strategies for power system operation. Wind power forecasting can be divided into short-term (hours), long-term (days) and ultra-long-term (days to months) forecasting. Short-term forecasting helps resolve real-time power imbalances and optimise system scheduling. Long-term forecasting helps power plants plan generation, schedule maintenance at the right time and ensure that turbines deliver maximum economic value within the effective wind speed range. Ultra-long-term forecasting is mainly used for planning problems such as wind farm siting [10][11][12].
China's offshore wind power is concentrated mostly along the southeastern coast, where strong tropical storms are frequent and the output of wind farms along the storm track is highly uncontrollable. Connecting these farms to the regional grid can partly absorb the large wind power fluctuations near the storm centre [12], but as the share of wind power in China gradually increases, such fluctuations can seriously degrade the power quality of the grid and even cause large regional blackouts. For example, on 28 September 2016 a high-wind event in South Australia led to the large-scale disconnection of renewable generation (wind power accounted for more than 30% of generation) and eventually developed into a 50-hour blackout.
Figure 3 shows the real-time wind speed as a tropical storm passed over an area of the southern coast, together with the difference between successive samples. Over the two days, the wind speed in this area ranged from 0 m/s to nearly 18 m/s, a range of almost 18 m/s, and the variance of the series is 19.425 m²/s². These large differences show that wind speed in the area fluctuates strongly, and under strong tropical winds the position of the wind farm relative to the storm centre changes over time, which further increases the instability of generation. For wind power forecasting under these conditions, China's climate means that offshore wind power is suitable essentially only along the southeastern coast, so wind farm siting studies can largely be excluded; to ensure forecasting and dispatch accuracy, the forecast horizon should cover the entire passage of the storm, which makes this a long-term forecasting task [13]. Under tropical cyclones, the wind speed at different locations within a wind farm area also changes as the cyclone moves, showing time-series fluctuations. By extracting features from regional historical wind speed series recorded under past storms, the regional wind speed under future storms can be predicted, providing guidance for turbine control strategies. However, a storm usually takes 3 to 10 days to pass through a wind farm area, so prediction ability over long time scales is particularly important, and prediction difficulty increases markedly with the length of the forecast series [14][15][16].
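The statistics quoted for Figure 3 (extreme values, range, variance and the successive-difference curve) can be reproduced for any wind speed series with a few NumPy calls. This is a sketch: the paper does not say whether its variance is the population or the sample variance, so population variance (ddof=0) is assumed here, and the example series is made up rather than the measured data.

```python
import numpy as np


def wind_speed_stats(speeds):
    """Extremes, range, variance and largest step of a wind speed series (m/s)."""
    x = np.asarray(speeds, dtype=float)
    diffs = np.diff(x)  # change between consecutive samples (the differential curve)
    return {
        "min": x.min(),
        "max": x.max(),
        "range": x.max() - x.min(),
        "variance": x.var(),  # population variance (ddof=0); pass ddof=1 for sample variance
        "max_abs_step": np.abs(diffs).max(),
    }


if __name__ == "__main__":
    toy_series = [2.0, 5.5, 9.0, 14.0, 17.8, 12.0, 6.0, 0.0]  # hypothetical values
    print(wind_speed_stats(toy_series))
```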

Figure 4 Informer model structure diagram
The Informer model is a long-sequence time series forecasting model based on the attention mechanism, developed at Beihang University. It adopts the encoder-decoder architecture of the Transformer and, on this basis, proposes ProbSparse self-attention to reduce the time complexity of self-attention to $O(L \log L)$, a self-attention distilling mechanism to reduce the model's computation, and a generative decoder that outputs the whole prediction sequence in one forward pass, which significantly reduces prediction time [17].
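A minimal NumPy sketch of the ProbSparse idea, written for clarity rather than speed. For every query it computes the sparsity measure $M(q_i, K)$, the log-sum-exp of the query's scaled scores minus their mean, keeps full softmax attention only for the u highest-scoring queries, and gives the remaining "lazy" queries the mean of V. The published Informer avoids this exact scoring by sampling keys and using a max-minus-mean approximation, with u = c·ln L_Q for a sampling factor c; those optimisations are omitted, and the function names are hypothetical.

```python
import numpy as np


def query_sparsity(Q, K):
    """M(q_i, K): log-sum-exp of scaled scores minus their arithmetic mean, per query."""
    d = Q.shape[1]
    scores = Q @ K.T / np.sqrt(d)  # (L_Q, L_K)
    row_max = scores.max(axis=1, keepdims=True)
    lse = row_max[:, 0] + np.log(np.exp(scores - row_max).sum(axis=1))  # stable log-sum-exp
    return lse - scores.mean(axis=1)


def probsparse_attention(Q, K, V, u):
    """Full attention for the top-u queries by M; every other query outputs mean(V)."""
    d = Q.shape[1]
    M = query_sparsity(Q, K)
    top = np.argsort(M)[-u:]  # indices of the u most "active" queries
    out = np.tile(V.mean(axis=0), (Q.shape[0], 1))  # lazy queries: uniform attention
    scores = Q[top] @ K.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    out[top] = weights @ V
    return out, top


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(96, 16)), rng.normal(size=(96, 16)), rng.normal(size=(96, 16))
    u = int(np.ceil(5 * np.log(96)))  # c = 5 is an assumed sampling factor
    out, top = probsparse_attention(Q, K, V, u)
    print(out.shape, len(top))
```

Because a query whose attention is close to uniform gains little from the softmax, replacing it with mean(V) loses little information, which is the intuition behind ranking queries by $M$.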
The overall network structure of the Informer model is shown in Figure 4. The model consists of two main parts, an encoder and a decoder, both of which take time series data as input. The encoder input is the entire long historical series, whereas the decoder input concatenates a recent segment of the historical series (the start token) with placeholder values over the prediction length; the placeholders are set to zero and stand for the values to be predicted.
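The decoder input described above can be assembled as follows. This sketch assumes a start token of `label_len` recent steps and zero placeholders of `pred_len` steps; the parameter names are hypothetical, chosen to echo common Informer terminology.

```python
import numpy as np


def build_decoder_input(history, label_len, pred_len):
    """Concatenate the last label_len steps of history with pred_len zero placeholders."""
    history = np.asarray(history, dtype=float)
    if history.ndim == 1:
        history = history[:, None]  # treat a 1-D series as a single feature
    start_token = history[-label_len:]  # recent known values guide generation
    placeholder = np.zeros((pred_len, history.shape[1]))  # positions to be predicted
    return np.concatenate([start_token, placeholder], axis=0)


if __name__ == "__main__":
    hist = np.arange(96, dtype=float)
    dec_in = build_decoder_input(hist, label_len=48, pred_len=144)
    print(dec_in.shape)  # 48 known steps followed by 144 placeholders
```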


Encoder
The encoder is a stack of multi-head ProbSparse self-attention layers and distilling layers that extracts long-term dependency features from the input historical time series. The multi-head attention mechanism pays more attention to segments with more pronounced fluctuations and maps the temporal dependencies to intermediate features that carry the information of the series; the self-attention distilling operation then gives more weight to the dominant features and compresses them. Because the output at each position already contains information from the other elements of the sequence, each encoder layer halves the length of its input through convolution and pooling, which significantly reduces the encoder's computation and memory requirements. The attention layers and distilling layers together form the stacked structure shown in Figure 5. The distilling layer consists of a one-dimensional convolutional layer and a max-pooling layer, and the distilling operation from layer i to layer i + 1 is
$$X_{i+1} = \mathrm{MaxPool}\left(\mathrm{ELU}\left(\mathrm{Conv1d}\left([X_i]_{\mathrm{AB}}\right)\right)\right)$$
where $[\cdot]_{\mathrm{AB}}$ denotes the attention block, which contains the multi-head ProbSparse self-attention and its key operations, and Conv1d is a one-dimensional convolution over the time dimension with the ELU activation:
$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ \alpha\left(e^{x} - 1\right), & x \le 0 \end{cases}$$
After several rounds of multi-head ProbSparse self-attention and distilling, the encoder outputs an intermediate representation; stacking these mechanisms also increases the robustness of the model.
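One distilling step, MaxPool(ELU(Conv1d(·))), can be sketched in plain NumPy to show how the sequence length is halved. The kernel size 3, circular padding and max pooling with kernel 3, stride 2 and padding 1 are assumptions of this sketch, and the batch normalisation found in some implementations is omitted.

```python
import numpy as np


def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0)) - 1))


def conv1d_same(x, weight):
    """1-D convolution over time with circular padding.
    x: (L, d_in); weight: (k, d_in, d_out) with odd k. Returns (L, d_out)."""
    L = x.shape[0]
    k = weight.shape[0]
    pad = k // 2
    xp = np.concatenate([x[L - pad:], x, x[:pad]], axis=0)  # wrap around the ends
    out = np.zeros((L, weight.shape[2]))
    for t in range(L):
        window = xp[t:t + k]  # (k, d_in) neighbourhood of step t
        out[t] = np.einsum("ki,kio->o", window, weight)
    return out


def max_pool_stride2(x, kernel=3):
    """Max pooling over time, stride 2, padding 1: length L -> ceil(L / 2)."""
    L, d = x.shape
    pad = np.full((1, d), -np.inf)
    xp = np.concatenate([pad, x, pad], axis=0)
    out_len = (L + 2 - kernel) // 2 + 1
    return np.stack([xp[2 * t:2 * t + kernel].max(axis=0) for t in range(out_len)])


def distil(x, weight):
    """One distilling step between encoder layers: MaxPool(ELU(Conv1d(x)))."""
    return max_pool_stride2(elu(conv1d_same(x, weight)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(96, 8))          # attention-block output, 96 steps
    w = 0.1 * rng.normal(size=(3, 8, 8))  # hypothetical conv weights
    print(distil(x, w).shape)             # the next layer sees 48 steps
```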

Decoder
The decoder produces the time series forecast from its input sequence and the encoder output. It consists of two multi-head attention layers. The ProbSparse self-attention uses masking in its computation so that each position attends only to earlier positions, weighs the attention over the time series, and generates the output vectors for the placeholders to be predicted in a single forward pass. After the generative decoder, each position to be predicted has an output vector, and passing these vectors through a fully connected layer yields the predicted sequence.
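The masking step can be illustrated with dense scaled dot-product attention and a causal mask; this is a simplification that leaves out ProbSparse query selection. Each position t may attend only to positions up to t, so the placeholders cannot draw on values that lie later in the sequence.

```python
import numpy as np


def masked_self_attention(Q, K, V):
    """Causal scaled dot-product self-attention. Q, K, V: (L, d)."""
    L, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    future = np.triu(np.ones((L, L), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)  # forbid attending to later positions
    scores = scores - scores.max(axis=1, keepdims=True)  # diagonal is always finite
    weights = np.exp(scores)  # exp(-inf) = 0 for masked positions
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V, weights


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    Q, K, V = (rng.normal(size=(6, 4)) for _ in range(3))
    out, w = masked_self_attention(Q, K, V)
    print(np.round(w, 2))  # lower-triangular attention weights
```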

Informer Probabilistic Sparse Attention Mechanism
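In general, self-attention takes a (query, key, value) triple as input and computes a scaled dot product from the query, key and value matrices $Q$, $K$ and $V$, where $d$ is the dimension of the input data:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d}}\right)V$$
The attention of the $i$-th query can be written in probabilistic form as
$$\mathcal{A}(q_i, K, V) = \sum_{j} \frac{k(q_i, k_j)}{\sum_{l} k(q_i, k_l)} v_j = \mathbb{E}_{p(k_j \mid q_i)}\left[v_j\right]$$
where $q_i$, $k_i$ and $v_i$ denote the $i$-th rows of $Q$, $K$ and $V$, respectively, and $k(q_i, k_j) = \exp\left(q_i k_j^{\top} / \sqrt{d}\right)$ is the asymmetric exponential kernel. The probability distribution of self-attention is itself sparse and long-tailed: only a small fraction of the dot products contribute most of the attention, and the contribution of the rest can be ignored. To measure the sparsity of the $i$-th query, the KL divergence between its attention distribution $p(k_j \mid q_i)$ and the uniform distribution $q(k_j \mid q_i) = 1/L_K$ is used:
$$KL(q \,\|\, p) = \ln \sum_{l=1}^{L_K} e^{\frac{q_i k_l^{\top}}{\sqrt{d}}} - \frac{1}{L_K}\sum_{j=1}^{L_K} \frac{q_i k_j^{\top}}{\sqrt{d}} - \ln L_K$$
Dropping the constant term gives the query sparsity measurement
$$M(q_i, K) = \ln \sum_{j=1}^{L_K} e^{\frac{q_i k_j^{\top}}{\sqrt{d}}} - \frac{1}{L_K}\sum_{j=1}^{L_K} \frac{q_i k_j^{\top}}{\sqrt{d}}$$
Based on this measure, ProbSparse self-attention lets each key attend only to the $u$ dominant queries:
$$\mathcal{A}(Q, K, V) = \mathrm{softmax}\left(\frac{\bar{Q}K^{\top}}{\sqrt{d}}\right)V$$
where $\bar{Q}$ is a sparse matrix of the same size as $Q$ that contains only the top-$u$ queries under $M(q, K)$.

Simulation experiment and analysis of prediction results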

Evaluation indicators
Common metrics for evaluating model performance are MAE, MSE and RMSE. MAE directly reflects the actual magnitude of the prediction errors. MSE is the mean of the squared differences between the true and predicted values; because its squared form is easy to differentiate, it is often used as the loss function for linear regression. RMSE, the square root of MSE, measures the deviation between the predicted and true values in the original units. In addition, to better evaluate the accuracy of the time series forecast, MAPE is introduced as a further metric. The formulas are:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right|$$
where $\hat{y}_i$ is the model's predicted value, $y_i$ is the actual power value and $n$ is the number of samples.
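The four metrics can be computed directly from the formulas above. One assumption is added here: wind power is often exactly zero during shutdowns, where MAPE is undefined, so this sketch averages the MAPE term only over samples with nonzero actual power (the paper does not say how it handles this case).

```python
import numpy as np


def forecast_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and MAPE (%) between actual and predicted power."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    nonzero = y_true != 0  # assumption: skip zero-power samples for MAPE
    mape = 100.0 * np.mean(np.abs(err[nonzero] / y_true[nonzero]))
    return {"MAE": np.mean(np.abs(err)), "MSE": mse, "RMSE": np.sqrt(mse), "MAPE": mape}


if __name__ == "__main__":
    print(forecast_metrics([1.0, 2.0, 4.0], [1.0, 3.0, 2.0]))
```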

Time series feature extraction
Wind power prediction under severe tropical storms is a time series forecasting problem. Like text, a time series is sequential structured data, and the basic principle is to fit an input-output mapping by mining historical data. The main difference is that time series may exhibit strong coupling, long-range correlation and other characteristics that affect forecasting performance to some extent, so processing the series appropriately and extracting its temporal information is the key problem of prediction [18][19][20]. In theory, a time series consists of a trend term, a seasonal term, a cyclical term and noise; the possible nonlinear coupling among these factors greatly increases the difficulty of prediction, and successive time points are correlated, that is, earlier periods affect the output of later periods [21]. The encoder-decoder mechanism of the Transformer is promising for time series forecasting, but its prediction accuracy degrades sharply as the output sequence grows. To extract the temporal information in the series and meet the accuracy requirements of long-term forecasting, an improved Informer algorithm is applied on this basis.
Theories of how the feature terms of a time series combine can be divided into additive, multiplicative and nonlinear models, among others. This paper adopts the additive model, which assumes that the sum of a seasonal term and a trend term fits the temporal features. The trend term is extracted with a moving average: the mean of each sliding window over the original data is computed to obtain the local trend, and together these give the trend term of the whole series. According to the additive model, the seasonal term is then obtained by subtracting the trend term from the original input series. The encoder takes the historical time series as input, and the decoder input contains two parts, a trend term and a seasonal term. The trend term has two parts: the first is the second half of the trend term obtained by decomposing the historical series with the series decomposition block, which amounts to using the recent trend of the history to initialise the decoder; the second part is filled with zeros. The seasonal term is built similarly: its first part is the recent seasonal term decomposed by the encoder, used for initialisation, and its second part is filled with the mean of the encoder sequence. The main purpose of the encoder is to model the complex seasonal term. Through several series decomposition blocks, the seasonal term is progressively extracted from the original sequence, and it guides the decoder in predicting the future seasonal term [22][23][24].
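The additive decomposition described above (moving-average trend, seasonal term as the residual) can be sketched as follows. The window length of 25 and the edge padding by repeating the first and last values are assumptions of this sketch, chosen so that the trend has the same length as the input.

```python
import numpy as np


def series_decomp(x, kernel_size=25):
    """Additive decomposition: trend = moving average, seasonal = x - trend."""
    x = np.asarray(x, dtype=float)
    pad_front = (kernel_size - 1) // 2
    pad_back = kernel_size - 1 - pad_front
    # Repeat the edge values so every window is full and the output keeps len(x).
    padded = np.concatenate([np.full(pad_front, x[0]), x, np.full(pad_back, x[-1])])
    kernel = np.ones(kernel_size) / kernel_size
    trend = np.convolve(padded, kernel, mode="valid")  # mean of each sliding window
    seasonal = x - trend  # additive model: x = trend + seasonal
    return seasonal, trend


if __name__ == "__main__":
    t = np.arange(288, dtype=float)
    series = 0.02 * t + np.sin(2 * np.pi * t / 144)  # toy trend plus daily cycle
    seasonal, trend = series_decomp(series)
    print(trend[:3], seasonal[:3])
```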

Simulation results
Given the drastic wind speed changes at wind farms during severe tropical storms, this paper takes as its experimental object a severe tropical storm in the southern coastal area with distinct characteristics and a long duration, and extracts the data of one wind farm on the storm track. The training data cover the two years preceding the study period and are sampled every 10 minutes; the input features are real-time wind speed, power and the corresponding time stamps. The encoder input length is set to 96 and the decoder input length to 48. To avoid overfitting, cross-validation is used and a relatively large training set is kept. The number of training epochs is set to 6 and the number of experimental runs to 2 to obtain better training results. To improve prediction accuracy, the sliding output step after the fully connected layer is set to 24 to better capture the temporal characteristics.
To simulate the whole course of turbine output under strong winds, and to give grid dispatch a margin of reaction time based on meteorological knowledge, the proportion of the prediction set is varied so that the forecast horizon is set to 3 days and 10 days. In addition, to examine the model's performance on long series and to check whether the evaluation metrics deteriorate too quickly as the horizon grows, a short-term experiment with a one-day horizon is used as a control. The resulting forecasts are shown in Figure 6, Figure 7 and Figure 8. The 10-day forecast essentially covers the whole influence of the tropical storm on turbine output over time. As the figures show, wind power output under a tropical storm is extremely volatile: a unit can go from half power to nearly full power within hours. This can be attributed to the fast movement of the strong tropical storm, which produces large wind speed differences between regions and rapid changes of wind speed over time at any single location.
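With 10-minute sampling, the 1-day, 3-day and 10-day horizons correspond to 144, 432 and 1440 prediction steps. The sketch below converts a horizon in days to steps and slices a series into training samples; the encoder length 96 and decoder start length 48 come from the text, while the stride-1 sliding window and the function names are assumptions, not the authors' code.

```python
import numpy as np

SAMPLES_PER_DAY = 24 * 60 // 10  # 10-minute sampling -> 144 samples per day


def horizon_steps(days):
    """Number of prediction steps for a horizon given in days."""
    return int(days * SAMPLES_PER_DAY)


def make_windows(series, seq_len=96, label_len=48, pred_len=144):
    """Slide over a (T, d) array with stride 1; return (encoder_in, decoder_start, target)."""
    series = np.asarray(series, dtype=float)
    samples = []
    for s in range(len(series) - seq_len - pred_len + 1):
        enc = series[s:s + seq_len]
        dec_start = series[s + seq_len - label_len:s + seq_len]  # start token for the decoder
        target = series[s + seq_len:s + seq_len + pred_len]
        samples.append((enc, dec_start, target))
    return samples


if __name__ == "__main__":
    for days in (1, 3, 10):
        print(days, "day(s) ->", horizon_steps(days), "steps")
```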
In addition, to assess the model's prediction accuracy on the time series, the MAE, MSE, RMSE and MAPE of the three experiments were calculated and are listed in Table 1. The one-day forecast has the smallest MAE, MSE and RMSE and also the smallest MAPE of the three experiments, showing the lowest error, the highest stability and the best handling of the time series. The figures show that the one-day forecast tracks both the frequency and the amplitude of the actual series closely. As the forecast horizon grows, the frequency is still predicted well, but the amplitude deviation becomes larger, and the MAE, MSE, RMSE and MAPE in Table 1 increase with the horizon. This does not mean, however, that the model handles long-sequence prediction poorly. The increase in the four metrics from the 1-day to the 3-day forecast, and from the 3-day to the 10-day forecast, is not pronounced; although the step from 3 to 10 days extends the horizon by 7 days, more than the entire 3-day horizon, the metrics deteriorate proportionally less, so performance does not collapse as the horizon grows. This suggests that the model performs well on long-sequence forecasting problems.

Conclusion
To address the impact of unstable wind power output on power system dispatch under specific weather conditions, this paper treats wind power prediction as a long-sequence forecasting problem, of which long-term power prediction is one part, and applies the Informer model to capture temporal features and effectively predict the power curve of wind turbines throughout a tropical storm. The work of this paper is summarised as follows:
1) The relationship between wind turbine power and wind speed is summarised; since wind speed and power change sharply and for a long time under severe tropical storms, a long-term prediction method for wind turbine power is proposed to provide a basis for grid generation planning.
2) The improved self-attention mechanism is applied so that each module of the encoder and decoder extracts the trend and seasonal components of the data as fully as possible, and the overall course of wind turbine output under a severe tropical storm is captured with a 10-day forecast horizon; the results show strong potential for long-sequence prediction.
This experiment does not consider the influence of wind direction on turbine output. A strong tropical storm makes the wind direction at a wind farm change very quickly, so adding wind direction as an input feature could take part of the weight in the model's predictions and may help improve prediction accuracy.

Figure 2 Wind speed and moving speed of Tropical Storm 'Dujuan' in 2021

Figure 3 Wind speed and differential curve under the storm
Time series forecasting model

Figure 5 Encoder internal stack structure diagram
Each encoding layer includes a multi-head ProbSparse self-attention layer, a feed-forward network, residual connections and layer normalisation:
$$o = \mathrm{LayerNorm}\left(x + \mathrm{Sublayer}(x)\right) \qquad (1)$$