Identification of Disturbances in Power System and DDoS Attacks using Machine Learning

Power system disturbances are a common problem, caused by both man-made and natural events. The most challenging issue is finding the main cause of a disturbance and making an appropriate decision in response to it. Human judgement is not accurate in predicting power disturbances, and it is not easy for operators to monitor the state of the system. To support operators, we apply machine learning techniques to differentiate between categories of power disruption, with a focus on DDoS attacks, which are their main cause. We evaluate various machine learning techniques for discriminating between power disturbances and discuss the advantage of incorporating machine learning to enhance the productivity of power system design.


Introduction
The main goal of a power system is to deliver continuous service to its users. To achieve this goal, these systems were developed with fault tolerance and redundancy. They were designed at a time when computer security was not considered a major issue. Later, power systems were connected to the internet for control and management. This connectivity created the need for more security features to protect against unauthorised access and vulnerabilities in computer networks.
Industrial control systems, such as those used in the electric grid, are complicated and intricate in their design and architecture. Supervisory Control and Data Acquisition (SCADA) systems are connected to networks with multiple connection rules and physical associations. Data are collected from remote locations, and software such as SCADA is connected to various interfaces built for isolated services. This leads to more possible defects in hardware and software and gives attackers a larger attack surface. Every component of the smart grid, from meters to homes and from control rooms to substations, is an easy target for DDoS attackers [1].
Power systems based on present technology are connected to networks, and computer security threats are becoming new menaces to the robustness of these systems [2]. Power companies now add more security to the power system and rely mostly on network security to prevent unauthorised access. This creates new challenges for operators, who must monitor, evaluate and react to disturbances taking place in the power system; their tasks become more complicated because they must also assume the possibility of DDoS attacks. This situation is more complicated and challenging for a human to respond to because […] Forest, AdaBoost. (iv) The potential of our model has been checked on three instances of the datasets: multiclass, three-class and two-class.
The remaining sections of the paper are organised as follows: related works are reviewed in section 2, the proposed model is presented in section 3, experiments and results are described in section 4, and section 5 concludes the paper.

Related Works
Machine learning is a popular technique for identifying unusual and malicious events in intrusion detection for cyber security [3]. These techniques monitor transactions between computers and are trained to learn the behaviour and characterise the patterns in the traffic. This paper extends the application of these techniques to power systems, where the network is the backbone of communication and operations among the components of the power system. The application continuously monitors the different variables related to the devices used for communication in the power system.

Cyber Security in Smart grid
The smart grid has a two-layered structure consisting of physical and cyber systems. The combination of the two creates the cyber-physical environment. The Phasor Measurement Unit (PMU) works in the cyber part and provides current data used to control the physical layer through the energy management system. A process is presented to the cyber-physical environment as a sequence of instructions. The phasor data contain information about voltage and phasor measurements, including the position of devices such as switches, breakers, relays and transformers. The large volume of phasor data supports different algorithms that enhance the performance and reliability of the smart grid [4].
The addition of synchrophasor equipment increases communication network connectivity within utilities and between neighbouring utilities. The incorporation of synchrophasor devices therefore increases exposure to DDoS attacks. Many devices have no security against such attacks. Modern attacks on the power system can be carried out from personal computers over the network. Salmon et al. described the Aurora vulnerability, which shows the capability of an attacker to open and close a breaker over a remote connection so as to cause damage to an electric generator [5]. Vulnerabilities can be exploited by applying malicious settings to Intelligent Electronic Devices (IEDs). Falliere et al. analysed the Stuxnet worm, which changed the settings of control devices to alter the operation of the physical system [6]. Most protocols used in power systems have no security at all, such as IEEE C37.118, used for streaming synchrophasor data, and MODBUS and DNP3, both used to remotely control and monitor IEDs. Masera et al. conducted an in-depth penetration test which showed that substation computers and devices could be targeted [7]. Attackers were also able to reset and crash the computers, which affects real-time tracking and management of the power system.

Intrusion Detection System (IDS)
In recent years, the exposure of the smart grid has motivated researchers to develop new intrusion detection techniques. Researchers from different fields have developed different IDS techniques for the security of the smart grid. Ten et al. proposed an anomaly-based security system to protect IEDs from attack [8]. Their system is host-based and identifies attacks on only a single IED at a time, using data generated from event logs.
Chen et al. introduced a protection method for smart home appliances [9]. Their security method was based on homogeneous rules created by combining three concepts: usability, security and electricity price. A more advanced version of this type of IDS would consider the behaviour of different devices to produce system-level intrusion detection.
Mitchell et al. proposed a specification-based IDS for the smart grid [10]. Their system is based on the behaviour of three types of smart grid devices: distribution access points, head-ends and subscriber meters. Readings are obtained from 22 sensors attached to the three types of devices and are treated as state components. They summarise the data from each component into prescribed ranges and then create 3 state machines with 3456, 1728 and 3456 normal states respectively for the three devices. Their model was expensive because of its high memory requirements. It detects only a small number of attacks because of the small number of sensors used in the IDS. Their method was not scalable and therefore unable to detect new applications and attacks.
Yang et al. introduced a framework for detecting DDoS attacks on synchrophasor systems [11]. Their framework is built from protocol-based rules, access control rules and network rules. Each layer consists of security rules for the synchrophasor system. Their system detects only Denial of Service (DoS) and Man-in-the-Middle (MITM) attacks on IEEE C37.118 and synchrophasor systems.
Zhang et al. proposed an IDS capable of analysing communication traffic at various layers of the system [12]. It can analyse the wide area network, neighbourhood area network and home area network. An expert and intelligent system is applied at each level to protect against cyber-attacks with the help of data mining techniques. These subsystems communicate with a single unified system that monitors the status of network communication to enhance accuracy.
Hadeli et al. introduced an anomaly-based detection method for industrial control systems that analyses the behaviour patterns of the devices used in the system [13]. They analyse IEC 61850, GOOSE messages, Modbus/TCP, Manufacturing Message Specification and network routing protocols. Their analysis uses a file that contains the patterns of all communication networks in the industrial system. Hadeli's method, along with the methods of Zhang and Yang, is well suited to detecting abnormal activities that disturb network traffic. These methods are unable to detect malicious activities that cause no change in network traffic. Hadeli's work cannot identify a malicious attack from an IP address that trips a relay, breaks the flow in a transmission line and causes a complete blackout.
Berthier et al. proposed a specification-based IDS used to identify sequential events in the advanced metering infrastructure (AMI) [14]. They create state machines from two AMI protocols by extracting specifications from device status. They also created a model to verify the correctness of the specifications by model checking. Their IDS is not suitable for the transmission system because of its highly complex applications and disturbances. Building the state machines was also comparatively expensive. Valenzuela et al. introduced a system that detects cyberattacks using optimal power flow instructions [15]. They also supported the view that bad data cause power flow problems in the system. Talebi et al. proposed a system that uses weighted state estimation to identify bad data attacks in the power system [16]. Zonouz et al. introduced an IDS that checks state measurement data using power theory and state estimation, together with results from the network, to calculate the probability that the data are good [17]. All of the above works are efficient at identifying false data, but they work for only one type of attack […] proposed an intrusion detection tool for a gas pipeline system using multiple learning techniques [18]. They used a Modbus RTU dataset for their simulation. State-of-the-art methods were used to distinguish between commands and to identify attacks on data for the SCADA system. They first applied machine learning to detect disturbances in the power system and DDoS attacks.

Proposed Model
Machine learning techniques are used to differentiate between the disturbances that arise in a power system. This section presents the model under assessment, together with a description of the different man-made and natural scenarios. The machine learning and classification methods used are also described.

Representation of Power System
The power system framework is shown in Figure 1. The control system interacts with various smart electronic devices. Syslog and Snort are used to monitor the network. There are four breakers managed by intelligent relays. The IEDs send all information through the substation switch and router to the supervisory control and data acquisition systems. The attack scenarios assume that attackers have access to the substation network and create a threat by sending commands from the switch placed in the substation.
The components of Figure 1 are described below:
• G1, G2 → Power generators
• R1, R2, R3, R4 → IEDs that switch the breakers ON and OFF
• BR1, BR2, BR3, BR4 → Breakers
• Bus B1 to B2 → Line 1
• Bus B2 to B3 → Line 2
Each IED controls a single breaker: R1 manages BR1, R2 manages BR2 and so on. When an IED identifies a fault, it trips its breaker according to the distance protection technique. The IEDs in the system have no internal intelligence to differentiate between valid and invalid commands. Operators can also send commands manually to R1 through R4 to manually trip BR1 through BR4. In our experiments, scenarios with multiple operations are used to check whether valid and invalid commands can be differentiated, so that attacks can be identified among normal operations […] in order to cause a total blackout.
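The relay and breaker arrangement above can be sketched as a small data structure. This is a minimal illustration, not the test bed's software: the `Substation` class, the `sender` argument and the definition of a blackout as "all four breakers open" are assumptions introduced here. The sketch only shows why an IED that executes every command it receives cannot tell an operator from an attacker.

```python
from dataclasses import dataclass, field

# Relay-to-breaker wiring as described in the text: R1 -> BR1, ..., R4 -> BR4.
RELAY_TO_BREAKER = {f"R{i}": f"BR{i}" for i in range(1, 5)}


@dataclass
class Substation:
    """Hypothetical model: True means the breaker is closed (conducting)."""
    breakers: dict = field(
        default_factory=lambda: {b: True for b in RELAY_TO_BREAKER.values()}
    )

    def trip(self, relay, sender):
        # The relay never inspects `sender`: operator and attacker commands
        # are executed identically, which is the weakness described above.
        self.breakers[RELAY_TO_BREAKER[relay]] = False

    def blackout(self):
        # Assumption for illustration: a total blackout means every breaker is open.
        return not any(self.breakers.values())


station = Substation()
station.trip("R1", sender="operator")      # legitimate manual trip
for relay in ("R2", "R3", "R4"):
    station.trip(relay, sender="attacker")  # injected commands are accepted too
print(station.breakers, station.blackout())
```

Because the command path carries no notion of validity, the distinction has to be learned from measurements and logs, which is the role of the classifiers evaluated later.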

Proposed Approach
We assess the performance of our model in identifying intrusions in the power system using Python. The Python simulation tests different machine learning methods on an existing power system model created by Mississippi State University [19]. Intrusion identification is performed on the following datasets [20,21,22,23].
Figure 1. Power System Framework
• Multiclass: 37 event scenarios, covering normal operations, natural events and attack events, each predicted as a separate class by the learners.
• Three class: The 37 events are grouped into 3 classes: natural events (8), attack events (28) and no event (1).
• Two class: The 37 events are grouped into 2 classes: normal operations (9) and attacks (28).
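The relation between the three-class and two-class groupings can be sketched as follows. Only the counts (8, 28 and 1) come from the list above; the label strings, the ordering of the scenarios and the rule that natural events and the no-event scenario together form the 9 normal operations are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical three-class labels for the 37 scenarios. Only the counts
# (8 natural, 28 attack, 1 no-event) are taken from the text.
THREE_CLASS_LABELS = ["natural"] * 8 + ["attack"] * 28 + ["no_event"] * 1

# Assumed collapse rule: natural events and the no-event scenario are both
# treated as normal operation, which reproduces the 9/28 split.
TO_TWO_CLASS = {"natural": "normal", "no_event": "normal", "attack": "attack"}


def collapse_to_two_class(labels):
    """Map three-class labels onto the two-class scheme."""
    return [TO_TWO_CLASS[label] for label in labels]


two_class_labels = collapse_to_two_class(THREE_CLASS_LABELS)
three_class_counts = Counter(THREE_CLASS_LABELS)
two_class_counts = Counter(two_class_labels)
print(three_class_counts)
print(two_class_counts)
```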
The database was created from measurements of thousands of events taking place in the power system and is divided into 15 datasets. The data have been sampled down to 1% in order to evaluate the advantage of small sample sizes. The sampled data contain natural events, attack events and no events, with 1221, 3711 and 294 instances respectively.
The machine learning methods are analysed on the power system data for every grouping, i.e., multiclass, three-class and two-class. The performance of six machine learning methods has been tested on the 15 datasets. The simulation follows the n-fold cross-validation rule. Under this rule, each dataset is divided into n parts; n-1 parts are used to train the classifier and the remaining part is used to test it, and this is repeated until every part has served as the test set. The final accuracy for each dataset is the mean of the accuracies over the n parts.
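The n-fold procedure can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: the 1-nearest-neighbour stand-in classifier, the function names and the toy data are all hypothetical and are not the classifiers or measurements used in the experiments.

```python
import random


def make_folds(n_samples, n_folds, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def nearest_neighbour_predict(train_X, train_y, x):
    """Stand-in classifier: label of the closest training point (1-NN)."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def cross_validate(X, y, n_folds=5, seed=0):
    """Return the mean accuracy (in percent) over n_folds train/test splits."""
    folds = make_folds(len(X), n_folds, seed)
    accuracies = []
    for k, test_idx in enumerate(folds):
        # Train on the other n-1 folds, test on fold k.
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            nearest_neighbour_predict(train_X, train_y, X[i]) == y[i]
            for i in test_idx
        )
        accuracies.append(100.0 * correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Toy, well-separated data (made up for illustration, not PMU measurements).
X = [(float(i), 0.0) for i in range(10)] + [(float(i) + 100.0, 0.0) for i in range(10)]
y = ["normal"] * 10 + ["attack"] * 10
mean_accuracy = cross_validate(X, y, n_folds=5)
print(f"mean accuracy: {mean_accuracy:.1f}%")
```

Each sample is used for testing exactly once, so the mean is not biased by one fortunate or unfortunate split.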

Basics of methods used in classification
The methods used to classify the different datasets are briefly introduced below:
Random Forest: An ensemble of decision trees, each obtained from randomly chosen data samples. Each tree votes, and the most popular class is predicted [24].
OneR: A very simple method that chooses the optimal feature rule from a group of feature rules [25].
Naïve Bayes: This method is based on Bayes' theorem, considers the conditional probability of random variables and is widely applied in machine learning [25,26].
NNge: This method is similar to the K-nearest neighbour method and classifies a data point according to the data points nearest to it [28].
SVM: This technique is based on the concept of sequential minimal optimization. It models the data points in a vector space divided by a hyperplane with the maximum distance between two classes [29,30]. New data points are classified according to their position in the vector space.
AdaBoost: This method is used to enhance the performance of other machine learning methods [31]. Its main objective is to focus on the data misclassified by the other techniques. It uses these data to train a new classifier in order to increase accuracy.
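The reweighting idea behind AdaBoost can be sketched as below. Note the substitution: for brevity the weak learner here is a one-dimensional decision stump rather than the Random Forest used in the experiments, labels are restricted to +1 and -1, and all function names and the toy data are hypothetical.

```python
import math


def best_stump(xs, ys, weights):
    """Find the (error, threshold, polarity) with the lowest weighted error.

    A stump predicts +1 when polarity * (x - threshold) > 0, else -1.
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [1 if polarity * (x - threshold) > 0 else -1 for x in xs]
            err = sum(w for w, p, t in zip(weights, preds, ys) if p != t)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost_fit(xs, ys, n_rounds=3):
    """Train a list of (alpha, threshold, polarity) weighted stumps."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified samples and lower the rest,
        # so the next stump concentrates on the hard cases.
        preds = [1 if polarity * (x - threshold) > 0 else -1 for x in xs]
        weights = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (1 if pol * (x - thr) > 0 else -1) for a, thr, pol in ensemble)
    return 1 if score > 0 else -1


# Toy data that no single stump can separate (made up for illustration).
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost_fit(xs, ys, n_rounds=3)
predictions = [adaboost_predict(ensemble, x) for x in xs]
print(predictions == ys)
```

On this toy set the best single stump still misclassifies three of the ten points, while the weighted vote of three stumps fits all of them, which illustrates why boosting a base learner can raise accuracy.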

Experiments and Results
This section analyses the results obtained from the machine learning methods applied to differentiate between intrusions and other power system disturbances. Six machine learning techniques are evaluated on all fifteen datasets for each of the multiclass, three-class and two-class instances [20,21,22,23]. Accuracy is calculated as a percentage: the number of correct classifications divided by the total number of classifications. Accuracy is the measure of classifier performance, and a high accuracy value indicates good performance. We follow n-fold cross-validation to obtain the results.
Figures 2, 3 and 4 show the classification accuracy for the fifteen datasets in the multiclass, three-class and two-class settings using 6 different machine learning techniques. The performance of OneR, Naïve Bayes, K-Nearest Neighbor, Support Vector Machine (SVM), Random Forest and Random Forest + AdaBoost is depicted in these figures, one by one, for all datasets.
Figure 2 shows the classification accuracy of each machine learning technique on the 15 datasets, obtained with the n-fold cross-validation rule for each dataset. Each line of the graph represents one machine learning technique across the 15 datasets. In terms of accuracy, KNN, Random Forest and Random Forest + AdaBoost are the best of all the classification methods. The performance of Random Forest + AdaBoost is the highest on the multiclass datasets. In Figures 3 and 4, the performance of Random Forest + AdaBoost in detecting intrusions in the smart grid power system is also the highest of all the learning methods.
Figure 3. Accuracy of 15 datasets in Threeclass

Conclusion and Future Works
The main objective of our research is to provide a benchmark for beginners by applying different machine learning methods to the detection of intrusions in smart grid power systems. The combination of Random Forest and AdaBoost produces relatively good results, with a low false positive rate and high accuracy. On the basis of our results, we can say that machine learning is an efficient and reliable technique for detecting attacks and is able to discriminate between attacks. In future, these learning techniques can be applied to big power system data and to further learning models to improve accuracy. This work can also serve as a benchmark for the reliable use of machine learning techniques in this broad domain.