Research on Attack Identification Method and Device Method Based on Random Forest Algorithm

Because the energy storage system helps stabilize the voltage and frequency of the grid, its operating status information must be obtained in a timely and accurate manner once it is connected to the grid. The operating status data of the energy storage system is uploaded to the higher-level dispatch center via the energy storage coordination control device. However, as the power grid becomes more intelligent and information-based, new cyber attack methods keep emerging, and network-based information exchange carries various security risks. Attackers can analyze the communication protocols in use, or carry out eavesdropping attacks, DoS attacks, tampering with sensitive data, and so on, causing the energy storage coordination control device to obtain wrong dispatch data. The wrong data causes the energy storage system to output the wrong power, which may have the opposite of the intended effect when adjusting the voltage and frequency of the power grid, causing an imbalance in grid voltage and frequency, great economic losses, and even casualties due to equipment failure. Attackers can also use the energy storage coordination control device as a springboard to gradually invade the upper-level dispatch center. In view of these problems, this paper uses a random forest-based network attack detection module to make real-time judgments on all data flows of the energy storage coordination control device. That is, all data flows of the device pass through a trained random forest attack detection model, so that when the device is subjected to a network attack, the attack is detected on the real-time data flows.


Introduction
ICNISC 2020, Journal of Physics: Conference Series 1646 (2020) 012012, IOP Publishing, doi: 10.1088/1742-6596/1646/1/012012
Facing the energy crisis, financial crisis, and climate crisis, people recognize the importance of new energy development. Countries' investment in new energy has increased significantly, and new energy capacity has expanded dramatically. Renewable power generation is the core of new energy development. However, because of seasonal, meteorological, and regional conditions, wind, solar, and marine energy generation is noticeably discontinuous and unstable. The generated power fluctuates greatly, is hard to adjust, and generation and consumption are sometimes out of step. When too much new energy generation capacity is connected to the grid, the stability of the grid is also affected. Pairing the generation with a large-scale energy storage system resolves the time difference between generation and consumption, reduces the impact on the grid of directly connecting intermittent renewable generation, and helps adjust power quality [1][2][3].
Because the energy storage system also helps stabilize the voltage and frequency of the grid, its operating status information must be obtained in a timely and accurate manner once it is connected to the grid. The operating status data of the energy storage system is uploaded to the higher-level dispatch center via the energy storage coordination control device. However, as the power grid becomes more intelligent and information-based, new cyber attack methods keep emerging, and information exchange over the network carries various security risks. Attackers can analyze the communication protocols in use, or carry out eavesdropping attacks, DoS attacks, tampering with sensitive data, and so on, causing the energy storage coordination control device to obtain wrong dispatch data. The wrong data leads to the wrong output power from the energy storage system, which may have the opposite of the intended effect when adjusting the voltage and frequency of the power grid, causing an imbalance in grid voltage and frequency, great economic losses, and even casualties caused by abnormal operation of the equipment. Attackers can also use the energy storage coordination control device as a springboard to gradually invade the upper-level dispatch center [4][5][6][7][8][9].
Existing energy storage coordination control devices have potential safety hazards. Under normal circumstances, even a single security vulnerability gives an attacker an opening. Using the vulnerability as a breakthrough point, the attacker can create a large amount of useless data or send repeated requests to occupy the victim's network resources or interfere with the victim's normal communication. Hackers can easily launch such attacks on the energy storage coordination control device based on known security holes. For example, an attacker can create a large amount of useless data, congesting the network of the energy storage coordination control device so that it cannot communicate normally with the levels above and below it. The attacker can exploit defects in how the device's transmission protocol handles repeated connections, sending repeated connections and repeated requests at high frequency so that the device cannot process other normal requests in a timely manner. An attacker can inject a Trojan into the energy storage coordination control device and use it as a springboard to gradually invade the upper-level dispatch center. The attacker can also exploit defects in the device's transmission protocol to repeatedly send malformed attack data, for example by tampering with the output power of the energy storage system, causing grid voltage oscillations or causing the dispatch center to incorrectly allocate a large amount of system resources, which directly affects the safe and stable operation of the grid.
A new network attack detection module is added to the energy storage coordination controller. It uses a random forest algorithm to inspect the data flow in the energy storage coordination controller and determine whether the controller has been attacked and, if so, by what kind of network attack. The network attack detection model is first trained on data recorded under various network attacks so that it meets the required performance. Figure 1 shows the structure of a conventional energy storage coordination control device and system. As the figure shows, the energy storage coordination control device is the communication bridge that connects the upper-level dispatch center with the lower-level energy storage power station monitoring system and other intelligent devices. Toward the lower level, it collects the PCS working status, PCS charge and discharge power, SOC value, etc., receives the relevant remote signaling data from the lower-level controller, and forwards the relevant remote adjustment and start-stop commands to the lower-level controller. Toward the upper level, it transmits in real time the input and output of the energy storage system, active and reactive power, power consumption, and the voltage, current, power, and other data of the grid connection point, as well as battery SOC, maximum chargeable power, maximum dischargeable power, rated power, and working status.

Grid-Side Energy Storage System Model
The grid data also includes the PT secondary rating, CT secondary rating, current and voltage change dead zone, frequency change dead zone, time synchronization method, device address, A/B network IP address and subnet mask, remotely set power, remotely set reactive power, system reactance value, grid-connected voltage value, reactive power adjustment compensation, input of the hard pressure plate, and opening and closing state of the signal.
The CPU module receives the data stream sent by the communication unit, the measurement power source, and the filtering unit, and sends it to the network attack detection module for real-time detection and classification. Based on the detection and classification results, it determines whether the data stream contains attack behavior. When the data contains attack behavior, an alarm showing the type of attack is displayed on the HMI LCD screen and a log record is generated; when the detection result shows no attack behavior in the grid data, the normal data flow is sent through the communication module to the master station system of the dispatch center. At the same time, the CPU module sends log records to the log module for storage, or sends attack alarms and log records to the master station system of the dispatch center through the communication module.
The measurement module obtains the data stream in the energy storage power station system and uploads it to the CPU module; the wave recorder module sends the data stream of fault records and waveforms to the CPU module; the print module is used for printing; the log module stores log records, data streams, and alarms; the timing module is used for device timing; the HMI LCD screen is used for display. The network attack detection module detects the data stream sent by the CPU module in real time through a random forest model and outputs the detection and classification result to the CPU module.
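The routing decision made by the CPU module can be illustrated with a minimal sketch. It is not the device firmware: the class name `CpuModule`, the label `"normal"`, and the `classify` callback (a stand-in for the network attack detection module) are assumptions introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

NORMAL = "normal"  # hypothetical label meaning "no attack detected"


@dataclass
class CpuModule:
    # Stand-in for the network attack detection module: maps one record
    # of the data stream to a class label (NORMAL or an attack type).
    classify: Callable[[Sequence[float]], str]
    forwarded: List[Sequence[float]] = field(default_factory=list)  # sent upstream
    alarms: List[str] = field(default_factory=list)                 # shown on the HMI
    log: List[dict] = field(default_factory=list)                   # kept by the log module

    def handle(self, record: Sequence[float]) -> bool:
        """Route one record; return True if it was forwarded as normal data."""
        label = self.classify(record)
        if label == NORMAL:
            # No attack behavior found: pass the data on to the dispatch center.
            self.forwarded.append(record)
            return True
        # Attack behavior found: raise an alarm with the attack type,
        # write a log record, and do not forward the suspicious data.
        self.alarms.append(label)
        self.log.append({"attack_type": label, "record": list(record)})
        return False
```

Keeping the classifier behind a callback mirrors the separation described above between the CPU module and the detection module, so the random forest model can be retrained or replaced without touching the routing logic.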

Network Attack Detection Model
This model is a network attack detection model based on the energy storage coordination controller. The random forest-based network attack detection model is obtained as follows:
(1) Using the data stream as training samples, establish a training sample set with N training samples and M features, and construct at least one tree to train on the training sample set. For each tree, samples are drawn at random with replacement from the training sample set. The roughly 2N/3 training samples that are drawn form the training set of the tree, and the remaining training samples are taken as out-of-bag samples (the out-of-bag test set). Then m features, where m < M, are randomly selected from the training set as the basis for the branching of this tree. A tree is built with the features selected in this way, as shown in Figure 4.
(2) Determine the feature values, that is, the importance of each feature in each tree and in the forest, as shown in Figure 5.
Figure 5. Schematic diagram of the importance of each feature value in the forest (panels: the importance of A1 in the t-th tree; the importance of A1 in the forest).
(3) Update and iterate the random forest model, and take the best of the resulting random forest models as the final random forest model [15][16][17][18].
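The sampling in step (1) can be sketched in a few lines of standard-library Python. The function names `bootstrap_split` and `pick_features` are hypothetical, and the sketch only selects sample and feature indices; it does not grow the tree. When N indices are drawn with replacement, the share of distinct samples drawn tends to 1 − 1/e ≈ 63.2%, which is close to the 2N/3 figure quoted above.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement.

    Returns (in_bag, out_of_bag): in_bag may repeat indices,
    out_of_bag lists every index that was never drawn.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def pick_features(n_features, m, rng):
    """Randomly select m of the M = n_features feature indices, with m < M."""
    if not 0 < m < n_features:
        raise ValueError("m must satisfy 0 < m < M")
    return sorted(rng.sample(range(n_features), m))


if __name__ == "__main__":
    rng = random.Random(0)
    in_bag, oob = bootstrap_split(1000, rng)
    print("distinct in-bag share:", len(set(in_bag)) / 1000)
    print("features for this tree:", pick_features(10, 3, rng))
```

Each tree gets its own call to both functions, so different trees see different samples and different feature subsets; the out-of-bag indices are what a later importance calculation would use as the test set for that tree.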
Determining the feature values includes the following steps:
1. Determine whether a feature plays a role in a tree, or is ineffective in that tree, by randomly changing the attribute value of the feature.
2. Compare the error rate on the test set before and after the change, and take the difference in error rate as the importance of the feature in the tree. Doing this once for each of the m features in a tree gives the importance of the m features in that tree. The error rate is obtained using formula (1).
3. Calculate the importance of each feature in the forest: take the mean of the feature's importance over the trees as its importance in the forest, using formula (2):
$$\mathrm{MDA}(A_i) = \frac{1}{n_{sum}} \sum_{t=1}^{n_{sum}} \left( \mathrm{OOBerr}_{ta} - \mathrm{OOBerr}_{tb} \right) \qquad (2)$$
where MDA is the mean decrease in accuracy; A in A_i denotes a feature and i is the index of the feature; n_sum is the number of times feature A_i appears in the forest; OOBerr_ta is the out-of-bag error rate of the t-th tree after the value of attribute A_i is changed; and OOBerr_tb is the out-of-bag error rate of the t-th tree with the normal value of A_i. The out-of-bag error rate is obtained using formula (3).
4. After obtaining the importance of all features in the forest, sort the features by importance, remove some of the features with low importance in the forest, and obtain a new feature set. This completes one iteration.
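Steps 1 to 3 amount to a permutation-style importance measured on the out-of-bag samples. The sketch below is one way to read them, under stated assumptions: each tree is represented only by a hypothetical `predict` callable, "randomly changing the attribute value" is implemented as shuffling that feature's column across the out-of-bag rows, and the error rate is taken to be the fraction of misclassified rows. Under that reading, the per-tree importance is OOBerr_ta − OOBerr_tb and the forest-level importance is its mean over the trees containing the feature.

```python
import random


def error_rate(predict, rows, labels):
    """Fraction of rows whose predicted label differs from the true label."""
    wrong = sum(1 for row, y in zip(rows, labels) if predict(row) != y)
    return wrong / len(rows)


def tree_importance(predict, oob_rows, oob_labels, feature, rng):
    """Importance of one feature in one tree: OOB error after minus before."""
    before = error_rate(predict, oob_rows, oob_labels)   # normal feature values
    column = [row[feature] for row in oob_rows]
    rng.shuffle(column)                                  # randomly change the values
    permuted = []
    for row, value in zip(oob_rows, column):
        changed = list(row)                              # leave the original row intact
        changed[feature] = value
        permuted.append(changed)
    after = error_rate(predict, permuted, oob_labels)    # changed feature values
    return after - before


def forest_importance(per_tree_importances):
    """Mean of the per-tree importances over the trees that use the feature."""
    return sum(per_tree_importances) / len(per_tree_importances)
```

A feature the tree ignores gets an importance of exactly zero, because shuffling it cannot change any prediction; a feature the tree relies on typically gets a positive value.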
Determining the feature values also includes repeating steps 1-4 and gradually removing the relatively poor features. Each repetition generates a new random forest model. This continues until the number of remaining features is m; the best of the random forest models generated along the way is then taken as the final random forest model for real-time detection of data streams. Figure 6 shows the detailed flowchart for determining the feature values.
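The iteration can be sketched as a backward elimination loop. The two callbacks are assumptions, not part of the paper: `score_forest(features)` is a hypothetical function that trains a forest on the given features and returns its error rate, and `rank_features(features)` is a hypothetical function that returns the same features sorted from most to least important. How many features to drop per round is not stated in the text, so `drop_per_round` is a free parameter here.

```python
def eliminate_features(features, m, score_forest, rank_features, drop_per_round=1):
    """Repeatedly drop the least important features until m remain.

    Returns (best_features, best_error) over every forest built on the way,
    including the initial forest that uses all features.
    """
    current = list(features)
    best = (list(current), score_forest(current))
    while len(current) > m:
        ranked = rank_features(current)                  # most important first
        keep = max(m, len(current) - drop_per_round)     # never go below m features
        current = ranked[:keep]
        error = score_forest(current)                    # new forest for this round
        if error < best[1]:
            best = (list(current), error)
    return best
```

Tracking the best model across all rounds, rather than simply returning the last one, matches the description above: the final feature set is the one whose forest performed best, which is not necessarily the smallest set.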

Conclusions
Compared with the prior art, this paper uses all data streams in the energy storage coordination control device as training samples and uses a random forest model as the detection engine to detect and classify the input data streams. When the energy storage coordination control device is under attack, it issues an alarm and isolates the suspicious data while generating log records, thereby improving the information security of the energy storage coordination control device.

Acknowledgments
This work was supported by the National Key R&D Program of China (2018YFB0904900, 2018YFB0904903).