Voltage Sag Source Location based on the Random Forest

Voltage sags are among the most prominent power quality problems in power systems and threaten the operation of many electrical and electronic devices. Locating voltage sag sources rapidly and accurately helps to assign responsibility between the power supplier and the customer, and provides a reliable basis for future grid development planning. In this paper, data on several electrical characteristics of voltage sags are extracted according to the classical voltage sag location methods and used as training and test samples for a random forest model, which determines whether a voltage sag source lies upstream or downstream of the monitoring point. The random forest model is more accurate than both a single decision tree model and the traditional upstream and downstream voltage sag location methods. The model also makes it possible to determine which of the traditional criteria is more important.


Introduction
The development of smart power grids has accelerated, and many distributed power sources are being connected to the grid. The power quality issues caused by the grid connection of distributed power sources have therefore become a research focus. According to statistics in the literature, voltage sags account for 70%~90% of all power quality problems, and the user complaints they cause account for 80% of all power quality complaints. In comparison, complaints caused by harmonics and switching overvoltages account for less than 20% [1].
Different professional organizations define the voltage sag differently. The Institute of Electrical and Electronics Engineers (IEEE) defines it as a sudden drop in the RMS value of the power-frequency voltage at a point in the system to 90%~10% of the rated value, lasting from 10 ms to 1 min, after which the voltage returns to normal. The International Electrotechnical Commission (IEC) definition differs: the voltage amplitude drops to 90%~1% of the rated value.
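The two definitions quoted above can be written as simple range checks. The following is a minimal sketch; the function names are hypothetical, the voltage is assumed to be expressed per unit of the rated value, and treating the boundaries as inclusive is an assumption not stated in the text.

```python
def is_sag_ieee(rms_pu: float, duration_s: float) -> bool:
    """IEEE criterion as quoted in the text: 10%~90% of rated value, 10 ms to 1 min.

    Inclusive boundaries are an assumption made for this sketch.
    """
    return 0.10 <= rms_pu <= 0.90 and 0.010 <= duration_s <= 60.0


def is_sag_iec(rms_pu: float) -> bool:
    """IEC magnitude criterion as quoted in the text: 1%~90% of rated value."""
    return 0.01 <= rms_pu <= 0.90
```

Under these assumptions, a residual voltage of 0.05 p.u. counts as a sag under the IEC magnitude range but not under the IEEE range.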
Random forest is an ensemble learning model that uses decision trees as its base classifier. A decision tree is a widely used tree-structured classification algorithm in which each internal node selects the optimal splitting attribute for classification. A random forest comprises multiple decision trees trained with the Bagging ensemble learning technique, and the final output is determined by voting on the outputs of the individual trees. Random forests are more robust to problems such as the overfitting of a single decision tree and perform well in classifying high-dimensional data [2].
In this paper, a decision tree model and a random forest model are established. Electrical characteristic data of voltage sags are extracted according to the classical voltage sag location methods to form training and test samples, which are input into the random forest model. The test samples are then used to verify the upstream and downstream location of the voltage sag source.

Positioning method based on disturbance power and disturbance energy
During a voltage sag, the disturbance power and disturbance energy indicate on which side of the monitoring equipment the sag source lies [3]. The change in instantaneous power caused by the disturbance, i.e., the disturbance power, is defined as:

$$DP = P_f - P_{ss} \tag{1}$$

The disturbance energy flowing through the measuring device is defined as:

$$DE(t) = \int_0^t DP(u)\,du \tag{2}$$

Here, $P_f$ is the active power during the disturbance and $P_{ss}$ is the steady-state active power of the system before the disturbance. If DE is positive, the sag source is likely downstream of the monitoring point; if DE is negative, the sag source is upstream of the monitoring point. This method places high demands on the agreement between the disturbance energy and the disturbance power; when the two agree well, it leads to the correct conclusion.
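For sampled data, the disturbance power and disturbance energy can be approximated numerically. The sketch below is illustrative only: it assumes uniformly sampled active power with sampling interval `dt`, uses trapezoidal integration (a choice made here, not taken from the paper), and all function names are hypothetical.

```python
def disturbance_power(p_fault, p_steady):
    """DP = P_f - P_ss for each sample of active power during the disturbance."""
    return [p - p_steady for p in p_fault]


def disturbance_energy(dp, dt):
    """Cumulative trapezoidal integral of DP from 0 up to each sample time."""
    if not dp:
        return []
    de = [0.0]
    for k in range(1, len(dp)):
        de.append(de[-1] + 0.5 * (dp[k - 1] + dp[k]) * dt)
    return de


def locate_by_energy(de_final):
    """Sign rule from the text: positive DE -> downstream, negative DE -> upstream.

    Returning "undetermined" for exactly zero is an assumption of this sketch.
    """
    if de_final > 0:
        return "downstream"
    if de_final < 0:
        return "upstream"
    return "undetermined"


# Toy numbers for illustration only (not simulation results from the paper).
dp = disturbance_power([100.0, 140.0, 140.0, 100.0], p_steady=100.0)
de = disturbance_energy(dp, dt=0.01)
side = locate_by_energy(de[-1])
```

The final element of `de` corresponds to the final value of the disturbance energy used as a feature later in the paper.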

Positioning method based on the system trajectory slope
This method uses the slope of the system trajectory: the location of the voltage sag source is identified by evaluating the slope of a straight line. The line relates the fundamental-frequency current amplitude to the product of the fundamental-frequency voltage amplitude and the power factor for various fault locations [4]. Because the method only requires the slope of the line and does not require calculating other parameters or setting thresholds, it is straightforward to apply. It also makes extensive use of the measured data, which improves its reliability.
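One way to obtain such a slope from measurements is an ordinary least-squares fit. The sketch below assumes the current amplitude is placed on the horizontal axis and the product of voltage amplitude and power factor on the vertical axis; the axis assignment, the least-squares fit, and the function name are assumptions for illustration. The text does not state the sign rule applied to the slope, so none is encoded here.

```python
def trajectory_slope(current_amp, voltage_amp, power_factor):
    """Least-squares slope of (I, V*cos(theta)) points recorded during the sag."""
    x = list(current_amp)
    y = [v * pf for v, pf in zip(voltage_amp, power_factor)]
    n = len(x)
    if n < 2 or len(y) != n:
        raise ValueError("need at least two matching samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("current amplitude does not vary; slope is undefined")
    return sxy / sxx


# Toy numbers for illustration only.
k = trajectory_slope([1.0, 2.0, 3.0], [10.0, 8.0, 6.0], [1.0, 1.0, 1.0])
```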

Positioning method based on the current real-part polarity
The sign of the real part of the current is determined by examining the direction of the active current at the monitoring point during the sag. The sag source is upstream of the monitoring device if the real part of the current is negative, and downstream if it is positive [5,6].
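The polarity rule can be expressed directly in code. In this minimal sketch, `theta_rad` is assumed to be the phase angle between voltage and current at the monitoring point, and the function names are hypothetical.

```python
import math


def current_real_part(i_amp, theta_rad):
    """Real part of the current, I*cos(theta)."""
    return i_amp * math.cos(theta_rad)


def locate_by_real_current(i_amp, theta_rad):
    """Sign rule from the text: negative -> upstream, positive -> downstream.

    Returning "undetermined" for exactly zero is an assumption of this sketch.
    """
    real = current_real_part(i_amp, theta_rad)
    if real > 0:
        return "downstream"
    if real < 0:
        return "upstream"
    return "undetermined"
```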

Establish a decision tree model
Decision trees are generated recursively. Starting from the root node, the data are divided into left and right subtrees; each subtree in turn serves as a root that produces new left and right subtrees, and the recursion continues until the leaf nodes are reached. When producing the left and right subtrees from a node, the results of splitting on different attributes are compared, and the optimal attribute is chosen for the split. This process of splitting after comparison is called node splitting [7]. Different comparison rules correspond to different decision tree generation algorithms, including node-splitting algorithms such as CLS, ID3, C4.5, and CART.
This paper takes the C4.5 node-splitting algorithm as an example to establish a decision tree [8]. Because the information gain index is prone to a multi-value bias, the C4.5 algorithm introduces the split information index, calculated as follows:

$$\mathrm{SplitInfo}_A(S) = -\sum_{j=1}^{v} \frac{|S_j|}{|S|} \log_2 \frac{|S_j|}{|S|} \tag{3}$$

where $v$ is the number of values of the attribute $A$ and $S_j$ is the subset of $S$ in which $A$ takes its $j$-th value. $\mathrm{SplitInfo}_A(S)$ represents the uniformity of the training set $S$ when it is split according to attribute $A$. Combining it with the information gain index allows attributes to be selected without the multi-value bias. The information gain ratio is calculated as follows:

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(S)} \tag{4}$$

where $\mathrm{Gain}(A)$ denotes the information gain of $A$.
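The gain ratio can be computed for a discrete attribute as in the following sketch. It assumes categorical attribute values and entropy-based information gain; continuous features, such as the electrical quantities used in this paper, would first need a threshold split, which is not shown. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of one attribute: information gain divided by split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: entropy before the split minus weighted entropy after it.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information penalizes attributes that take many distinct values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0
```

Returning 0.0 when the split information is zero (an attribute with a single value) is a convention chosen here to avoid division by zero.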
The flowchart of the C4.5 decision tree is shown in Figure 1. The C4.5 algorithm overcomes many disadvantages of the ID3 algorithm, makes the generation of the decision tree more reasonable, and improves classification accuracy. However, since the decision tree relies on a single classifier, it still has the following disadvantages: 1) complex classification rules; 2) convergence to a local rather than global optimum; 3) overfitting [9]. Establishing a random forest model based on decision trees can effectively avoid these shortcomings.

Establish a random forest model
The Bootstrap Aggregating (Bagging) ensemble method combines multiple decision tree models into a single ensemble model [10], in which the trees make a joint decision. The random forest algorithm is obtained by adding randomization to several parts of the Bagging method. Its randomness appears at two levels: the random selection of data and the random selection of features. A schematic diagram of the random forest algorithm is shown in Figure 2.
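The two levels of randomness and the final vote can be illustrated with a compact sketch. To keep it short, each base learner here is a one-level decision stump rather than a full C4.5 tree, so this is a simplified stand-in and not the model used in this paper. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority class
        m = majority(y)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_forest(X, y, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]         # random selection of data (bootstrap)
        feats = rng.sample(range(len(X[0])), n_features)  # random selection of features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Final output by majority vote over the individual learners."""
    return majority([predict_stump(s, row) for s in forest])


# Arbitrary, well-separated toy rows with four features each; the numbers carry
# no physical meaning and are not simulation results from the paper.
X_toy = [[-1.0, -2.0, -0.5, -1.5], [-1.2, -1.8, -0.7, -1.1],
         [-0.9, -2.2, -0.6, -1.3], [-1.4, -1.9, -0.8, -1.7],
         [1.0, 2.0, 0.5, 1.5], [1.2, 1.8, 0.7, 1.1],
         [0.9, 2.2, 0.6, 1.3], [1.4, 1.9, 0.8, 1.7]]
y_toy = ["upstream"] * 4 + ["downstream"] * 4
forest = fit_forest(X_toy, y_toy)
```

Each learner sees a different bootstrap sample and a different feature subset, so an individual error is outvoted as long as most learners are correct.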

Simulation verification
We build the dual power supply network shown in Figure 3. The comprehensive criterion for locating voltage sag sources uses the initial peak value of the disturbance power ($DP_{first}$), the final value of the disturbance energy ($DE_{end}$), the slope of the system trajectory ($k$), and the real part of the current ($I\cos\theta$).

Conclusion and Outlook
Based on a comparative analysis of various classical voltage sag location methods, this paper proposes a voltage sag source location method based on random forest. The random forest model is more accurate than both a single decision tree model and the traditional upstream and downstream voltage sag location methods. The model can also indicate which of the traditional criteria is more important.
However, the model used in this paper is relatively simple, and the upstream and downstream judgment of complex networks still needs further research and model optimization.

Figure 1. Decision tree model flow chart

Figure 3. Dual power supply network
Fault points are set at different locations in the dual power supply network, and voltage sag fault data are obtained at different monitoring points. Multiple sets of fault data are used to train the decision tree model. The prediction results on the test set are compared in Figure 4. The prediction accuracy on the test samples is 92.857%.

Figure 4. Comparison of prediction results of the decision tree model
A decision tree model that relies on a single classifier is inevitably limited in accuracy. Therefore, multiple decision tree models are combined and randomization is added to obtain the random forest algorithm. The simulation data are used to train the random forest model. The prediction results on the test set are compared in Figure 5, and the prediction accuracy on the test samples is 100%.

Figure 5. Comparison of prediction results of the random forest model