Voltage sag severity analysis based on improved FP-Growth algorithm and AHP algorithm

To accurately evaluate the impact of voltage sags on both utilities and customers, a data mining analysis method based on an improved FP-Growth algorithm and the analytic hierarchy process (AHP) is proposed. The method selects and classifies voltage sag characteristic attributes and combines them with voltage sag severity levels to construct a data mining analysis framework. The improved FP-Growth algorithm is used to mine voltage sag events more efficiently. An association rule matching model built with the AHP algorithm then evaluates the voltage sag severity level, which improves the accuracy of the evaluation results. Finally, a practical example verifies the practicability of the proposed method.


Voltage sag characteristic attribute
The power quality monitoring system collects a large amount of data on voltage sag events, which accumulates over time into a very large database of heterogeneous information. Factors that correspond to the severity of voltage sag must be screened from this database according to certain principles, so as to improve mining efficiency and avoid mining "meaningless" association rules. In this paper, six factors are selected to participate in association rule mining together with the node voltage sag severity, as shown in Table 1. Users can also select other voltage sag characteristic attributes according to their own requirements. In Table 1, "Observation position" refers to the geographical location of the substation where the concerned bus is located; "Voltage grade" means the rated voltage; "Weather" refers to the weather condition at the corresponding substation; "Cause of fault" refers to the cause of the voltage sag; "Season" means the season in which the voltage sag occurs; "Time" is the time segment of the day in which the voltage sag occurs.
To mine association rules, the voltage sag characteristic attributes must be converted into qualitative quantities. In Table 1, "Voltage grade", "Weather" and "Cause of fault" are already qualitative descriptive data. "Season" and "Time" can be divided qualitatively according to social habits.
"Observation position" is discretized by the k-means clustering method into six qualitative quantities, namely northeast, southeast, central, northwest, west and southwest.

Voltage sag characteristic attribute
To measure the impact of voltage sag on sensitive consumers, it is necessary to consider both the severity of the voltage sag and the characteristics of the sensitive consumers. The severity measure should therefore take into account the voltage tolerance capability of sensitive equipment. A specific definition of voltage sag severity is given in Eq. (1), where U is the per-unit voltage amplitude; t is the duration of the voltage sag; curve ( )

Overview of Association rules
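The support and confidence of an association rule $X \Rightarrow Y$ are defined as

$$\mathrm{support}(X \Rightarrow Y) = \frac{|\{t \in D : X \cup Y \subseteq t\}|}{|D|} \qquad (2)$$

$$\mathrm{confidence}(X \Rightarrow Y) = \frac{|\{t \in D : X \cup Y \subseteq t\}|}{|\{t \in D : X \subseteq t\}|} \qquad (3)$$

where $|\cdot|$ denotes the number of elements in a set; Eq. (2) is the proportion of transactions $t$ in database $D$ that contain both $X$ and $Y$; Eq. (3) is the ratio of the number of transactions containing both $X$ and $Y$ to the number of transactions containing $X$.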
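As a minimal sketch of Eq. (2) and Eq. (3), the following Python functions compute support and confidence over a list of transactions. The function names and the four-event database `D` are hypothetical and serve only as an illustration; they are not taken from the monitoring data used in this paper.

```python
def support(transactions, x, y):
    """Eq. (2): share of transactions that contain every item of X and Y."""
    both = set(x) | set(y)
    hits = sum(1 for t in transactions if both <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Eq. (3): transactions containing X and Y over transactions containing X."""
    x = set(x)
    both = x | set(y)
    n_x = sum(1 for t in transactions if x <= set(t))
    n_xy = sum(1 for t in transactions if both <= set(t))
    return n_xy / n_x if n_x else 0.0


# Hypothetical database: each event lists two attributes and a severity level.
D = [
    {"winter", "overcast", "A"},
    {"winter", "rain", "B"},
    {"winter", "overcast", "A"},
    {"summer", "storm", "C"},
]

s = support(D, {"winter", "overcast"}, {"A"})     # 2 of 4 events
c = confidence(D, {"winter", "overcast"}, {"A"})  # 2 of the 2 events with X
```

In this toy database the rule {winter, overcast} ⇒ A has a support of 0.5 and a confidence of 1.0, whereas the weaker rule {winter} ⇒ A has a confidence of 2/3.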

Improved FP-Growth algorithm
The FP-Growth algorithm is an association rule mining algorithm proposed by Jiawei Han in 2000 [17]. It transforms the itemsets of a given database into a Frequent Pattern Tree (FP-Tree) and keeps the correlation information among the itemsets in the FP-Tree. Because of the tree structure of the FP-Tree, branches with the same prefix can be reused, which compresses the data and improves mining efficiency. Compared with the traditional Apriori association rule mining algorithm, FP-Growth generates no candidate sets during mining and only needs to traverse the database twice [18]. The traditional FP-Growth algorithm mines association rules between arbitrary attributes and places no restrictions on the conditions and results of the rules. However, when association rules are used to assess voltage sag severity, the result of a rule must be the voltage sag severity index and the condition must consist of voltage sag characteristic attributes. A large number of invalid rules would therefore be generated if the traditional FP-Growth algorithm were used to mine characteristic attributes and severity indexes. This paper proposes an improved FP-Growth algorithm to avoid this situation. The four steps of the algorithm are as follows:
Step A: Scan the entire database and count the frequency of each characteristic attribute item. Then delete the items that do not meet the minimum support, and finally sort the remaining characteristic attribute items in descending order of occurrence frequency to obtain the frequent itemsets table.
Take the database in Table 2 as an example, where items a, b, c, d, e and f are characteristic attribute items and items g and h are severity index items. All characteristic attribute items are scanned and their occurrence frequencies are counted: item a appears 6 times, items c and b appear 5 times each, item e appears 4 times, item d appears 3 times, and item f appears once. The minimum support is set to 0.25, which, for the 8 events in the database, means that an item must appear at least twice.
Step B: Put the characteristic attribute items that meet the minimum support into the frequent itemsets table, as shown in Table 3. Reorder each record of the database in Table 2 in descending order of the occurrence frequency of its characteristic attribute items, and place the severity index item at the end. The rearranged database is shown in Table 4.
Step C: Scan the database again and create the root node of the FP-Tree, labelled "Null". Every transaction obtained in this second scan is inserted into the FP-Tree in the order given by Table 3 to create a path. If an item already exists on the path while the FP-Tree is being built, the count of the corresponding node is increased by one. Adding the 8 itemsets of Table 4 to the FP-Tree gives the tree shown in Figure 1.
Step D: Mine the FP-Tree. For each branch shown in Figure 1, count the occurrences of the severity index item at the tail node. If this count meets the minimum support, the path ending with this severity index item node is output to the candidate ruleset. The support of the rule is the ratio of the count of the severity index item to the total number of events, and the confidence of the rule is the ratio of the count of the severity index item to the count of the preceding characteristic attribute node. For example, the count of the severity index item g on branch 3 is 2, which meets the minimum support, and the confidence is 1. The rule {a, e, d} ⇒ g is therefore output to the candidate ruleset.
The candidate ruleset obtained according to Figure 1 is shown in Table 5.
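Steps A to D can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this paper: the class `Node`, the function `mine_rules`, the alphabetical tie-break between equally frequent items, and the eight-event database `DB` are all hypothetical, and `DB` is not the content of Table 2.

```python
from collections import Counter


class Node:
    """One FP-Tree node: an item, its count, and its children keyed by item."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}


def mine_rules(transactions, severity_items, min_support):
    n = len(transactions)
    min_count = min_support * n

    # Step A: count characteristic attribute items only; drop infrequent ones.
    freq = Counter(i for t in transactions for i in t if i not in severity_items)
    frequent = {i: c for i, c in freq.items() if c >= min_count}

    # Steps B and C: reorder each transaction by descending frequency
    # (ties broken alphabetically, an assumption), put the severity item
    # last, and insert the resulting path into the tree.
    root = Node(None)
    for t in transactions:
        attrs = sorted((i for i in t if i in frequent),
                       key=lambda i: (-frequent[i], i))
        tail = [i for i in t if i in severity_items]
        node = root
        for item in attrs + tail:
            if item not in node.children:
                node.children[item] = Node(item)
            node = node.children[item]
            node.count += 1

    # Step D: every path ending in a frequent severity node is a candidate
    # rule, stored as (condition, result, support, confidence).
    rules = []

    def walk(node, path):
        for child in node.children.values():
            if child.item not in severity_items:
                walk(child, path + [child.item])
            elif path and child.count >= min_count:
                rules.append((tuple(path), child.item,
                              child.count / n, child.count / node.count))

    walk(root, [])
    return rules


# Hypothetical database: a-f are attribute items, g and h are severity items.
DB = [
    ["a", "b", "c", "g"], ["c", "a", "b", "g"], ["d", "a", "e", "g"],
    ["a", "e", "d", "g"], ["a", "b", "c", "h"], ["c", "e", "h"],
    ["a", "b", "f", "h"], ["b", "c", "e", "g"],
]
rules = mine_rules(DB, severity_items={"g", "h"}, min_support=0.25)
```

For this toy database the sketch yields two candidate rules, {a, b, c} ⇒ g with confidence 2/3 and {a, e, d} ⇒ g with confidence 1, each with support 0.25. Because the severity item can only occupy the tail of a path, no rule with a characteristic attribute as its result is ever generated.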

Association Rule Matching Model
For the mined association rules to guide actual production, they must be matched against the actual fault scenario. With an appropriate matching model, similar rules can still be output when no rule is exactly the same as the actual scenario. In this paper, the AHP algorithm is used to construct the association rule matching model. The AHP algorithm [19] is a multi-objective decision-making method that combines qualitative and quantitative analysis. The steps for building the association rule matching model with the AHP algorithm are as follows:
Step1: According to the actual scenario and the association rule matching model, the maximum matching degree is taken as the target layer, the membership degree of each dimension of the voltage sag characteristic attributes as the criterion layer, and the mined association rule library as the scheme layer.
Step2: Determine the judgment matrix according to the relationships between the indicators of the indicator layer. The judgment matrix A is defined following reference [19]. Its elements are defined in Eq. (4), where n is the number of indicators.
Step3: Calculate the maximum eigenvalue $\lambda_{\max}$ of the judgment matrix and its corresponding eigenvector $\xi$, then normalize the eigenvector to obtain the weight matrix $W$:

$$W = [w_1, w_2, \ldots, w_n] \qquad (5)$$

where $w_i$ is the weight of the i-th index.
Step4: Calculate the consistency index and the consistency ratio of the judgment matrix, and verify whether they meet the requirements [18]. If they do not, the judgment matrix should be appropriately modified.
Step5: Calculate the membership degree of each dimension of the voltage sag characteristic attributes of each association rule, and calculate the matching degree between the association rule and the actual scenario through the weight matrix W, that is, as the weighted sum $\sum_{i=1}^{n} w_i \mu_{ki}$ for the k-th rule, where $\mu_{ki}$ is the membership degree of the i-th characteristic attribute of the k-th association rule. The association rules are then sorted according to the matching degree.
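Steps 3 and 4 can be illustrated with a short, dependency-free Python sketch. The weights are obtained by power iteration, which converges to the normalized principal eigenvector of a positive matrix; the function names and the 3×3 judgment matrix are hypothetical (the model in this paper uses six attributes), and the random index values and the 0.1 threshold are the values commonly tabulated for AHP rather than values quoted from this paper.

```python
# Commonly tabulated random index (RI) values for matrices of order 1 to 9.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(A, iters=100):
    """Return (weights, lambda_max) of judgment matrix A by power iteration."""
    n = len(A)
    w = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]          # normalize so the weights sum to 1
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(Aw[i] / w[i] for i in range(n)) / n
    return w, lam


def consistency_ratio(lam, n):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    ci = (lam - n) / (n - 1)
    return ci / RI[n] if RI[n] else 0.0


# Hypothetical, perfectly consistent judgment matrix (a_ij = w_i / w_j).
A = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
w, lam = ahp_weights(A)
cr = consistency_ratio(lam, len(A))
acceptable = cr < 0.1   # usual acceptance threshold
```

For this consistent matrix the weights are 4/7, 2/7 and 1/7, the maximum eigenvalue equals n = 3, and the consistency ratio is zero. A judgment matrix filled in from expert comparisons will generally give a maximum eigenvalue above n and a positive consistency ratio.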

Validation of Association Rules
The power quality monitoring records of a power company are taken as the original data, and the characteristic attributes of voltage sags are extracted to form the historical database for analysis.
The k-means clustering method is adopted to cluster the latitude and longitude values of the observation positions of the voltage sags, and six qualitative regions are obtained. The weather information of each voltage sag is obtained from the historical records of the Meteorological Bureau for the corresponding time. The fault cause and time information of each sag event are obtained from the monitoring data and classified. By combining fault event reports, monitoring data and the power grid topology, the voltage amplitudes of all bus nodes are calculated for all events. From these amplitudes, together with the duration and the sensitive equipment tolerance of the known historical sag events, the node voltage sag severity is calculated.
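The clustering of observation positions can be sketched as follows. This is a minimal k-means (Lloyd) iteration with fixed initial centroids so that the result is reproducible; the function names and coordinates are hypothetical, only two clusters are used instead of six to keep the example short, and plain Euclidean distance on latitude and longitude is an approximation assumed here for simplicity.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: (point[0] - centroids[k][0]) ** 2
                             + (point[1] - centroids[k][1]) ** 2)


def kmeans(points, centroids, iters=20):
    """Lloyd iterations from the given initial centroids."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster; keep it if empty.
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical (latitude, longitude) pairs of monitored substations.
points = [(30.0, 120.0), (30.2, 120.1), (29.9, 119.8),
          (35.0, 110.0), (35.1, 110.2), (34.8, 109.9)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
```

Each cluster index is then given a qualitative name such as "southeast" by inspecting where its centroid lies, and a new sag event is assigned to a region with `nearest`.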
To verify the accuracy of the association rules, the data from January 2016 to December 2018 are used as training samples to establish a voltage sag severity database. The data from January 2019 to December 2019 are then used as the test set to match and verify the association rules. If the association rules mined from the 2016 to 2018 database retain a similar confidence on the sag events of 2019, the mined association rules can guide users to take corresponding measures to reduce losses.
First, data mining is carried out on the training sample set, and 20 association rules are then selected at random from the mined rules. Referring to Eq. (3), the confidence of association rule i in the test set, $C_S(i)$, is defined as shown in Eq. (7), and the accuracy $\eta$ is defined as shown in Eq. (8).

Association rule matching model
The VSP of the actual scenario is {southeast, 110.0, winter, morning, overcast, single-phase grounding} according to Table 1, and this fault scenario is matched against the mined association rules. When an attribute of a rule is the same as the corresponding attribute of the scenario, its membership degree takes 1; in all other cases it takes 0. The AHP algorithm is used to obtain the weight of the membership degree of each attribute, as shown in Table 6. The matching degree of each association rule is calculated from these weights, and the association rules are output in order of matching degree. The matching results are shown in Table 7. According to Table 7, the severity of voltage sag in this fault scenario can be obtained. The association rule with serial number 1 has both a high confidence and a high matching degree, so its voltage sag severity level can be used to measure the voltage sag severity of this fault scenario. In other words, the voltage sag severity of this fault scenario is "A", indicating that the voltage sag in this fault scenario has little impact on the consumers connected to this node.
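The matching step can be sketched in Python as follows, using the 0/1 membership degree described above. The scenario is the one given in the text; the attribute weights and the three rules are hypothetical placeholders and are not the values of Table 6 or Table 7.

```python
def matching_degree(condition, scenario, weights):
    """Weighted sum of 0/1 memberships: an attribute contributes its weight
    only when the rule's value equals the scenario's value."""
    return sum(w for attr, w in weights.items()
               if condition.get(attr) == scenario[attr])


scenario = {
    "position": "southeast", "voltage_grade": 110.0, "season": "winter",
    "time": "morning", "weather": "overcast",
    "cause": "single-phase grounding",
}

# Hypothetical AHP weights (they sum to 1); not the values of Table 6.
weights = {"position": 0.30, "voltage_grade": 0.10, "season": 0.15,
           "time": 0.10, "weather": 0.15, "cause": 0.20}

# Hypothetical rule conditions; a rule may omit some attributes.
rules = {
    "rule 1": dict(scenario),
    "rule 2": {"position": "southeast", "season": "winter",
               "weather": "rain", "cause": "single-phase grounding"},
    "rule 3": {"season": "summer", "cause": "lightning"},
}

ranked = sorted(
    ((name, matching_degree(cond, scenario, weights))
     for name, cond in rules.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
```

With these placeholder values, rule 1 matches every attribute and has a matching degree of 1, rule 2 matches position, season and cause for a degree of 0.65, and rule 3 matches nothing. An attribute that a rule does not specify is treated here as a mismatch, which is one possible reading of "all other cases".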

Conclusion
To evaluate voltage sag severity, this paper proposes a data mining analysis method based on the FP-Growth algorithm and the analytic hierarchy process. The proposed method mines the association rules between voltage sag characteristic attributes and voltage sag severity, and obtains the voltage sag severity of a fault scenario by matching the actual scenario against the mined association rules. Moreover, the mined association rules can also guide sensitive users in choosing an appropriate entry point. Finally, the practicality of the proposed method is verified by an example analysis.