Research and Application of Power Grid Data Anomaly Tracing Method Based on Time Series Correlation

Power grid data is a “barometer” that reflects the operating trend of the power grid system, so tracing the source of abnormal data generated by the power grid system is crucial for keeping the grid running stably and preventing further malfunctions. This article proposes a maximum time series correlation ring tracing algorithm based on time series correlation. First, a correlation coefficient matrix is calculated for the large amount of time series data collected from measurement points in the power grid system, and a time series correlation graph is constructed using graph theory. The Kruskal algorithm is then used to search for the longest spanning tree in the time series correlation graph. Finally, based on the spanning tree, the breadth-first search (BFS) algorithm is used to obtain the maximum time series correlation ring [1]. Using the time series correlation of the nodes within the ring and the physical topological relationship between nodes, the abnormal data can be traced back to its source, distinguishing whether it was generated by a single point of failure or by a system failure. Most existing fault tracing techniques are based on machine learning algorithms, whose high complexity makes them difficult to apply widely to high-dimensional time series data. Experiments on real power grid data verify the effectiveness of this method for anomaly tracing in high-dimensional time series data, and comparative experiments show that it is superior to machine learning model algorithms in efficiency and stability. In addition, the method does not require extra calculation of the statistical distribution of the samples to be tested, which greatly reduces computational cost.


Introduction
Abnormal data usually refers to collected data that does not satisfy given rules, threshold ranges, constraints, or patterns, or that is unusual under a given model. Anomaly data tracing means analyzing why such data is generated. Because power grids have complex and varied topologies, it is difficult for monitoring personnel to trace the cause of a fault from abnormal data.
At present, research on anomaly data tracing focuses mainly on machine learning. Literature [2] proposes a component fault tracing method for power dispatching control systems based on an information difference graph model, which can trace faults even when the topological relationships between system components are unknown. That study extracts the feature intervals of system components, discretizes them using clustering means to obtain cluster centers, and traces faults according to the clustering results. Literature [3] collects server information through the interfaces provided by the operating system to obtain the server status, locates log content at a specific point in time, and thereby traces anomalies. Literature [4] provides a power system anomaly data tracing method based on a binary tree of data sources and detects data anomalies through source-based anomaly tracing. Literature [5] proposes a discretization method based on feature interval clustering means: the k-means algorithm is used to obtain the cluster centers of each collected feature time series, which serve as the endpoints for dividing discrete intervals, and the interval means are taken as the discretization result. Next, an information difference matrix is constructed from the change rate of the information measure before and after an alarm. Finally, an information difference graph model is built from the features whose alarm information changes strongly and the interaction information among those features, and the interactions between components in the graph model are used to trace the fault. Literature [6] proposes a time series alarm correlation mining algorithm based on network topology, which effectively reduces the candidate set and achieves efficient compression and fast tracing of massive network management data. Literature [7] builds a metering device fault sample database by establishing an offline account sample database and a real-time power consumption curve fault feature sample database, and traces faults based on actual electricity meter testing and data anomaly recognition. Based on an analysis of the output waveforms of wind turbines during high-voltage off-grid events, literature [8] constructs a characteristic indicator system for high-voltage off-grid faults of wind turbines and screens the original indicator system with the Gini index and maximum-relevance minimum-redundancy method. Finally, a combined genetic algorithm, ant colony optimization, and particle swarm optimization scheme is used to optimize the initial weights and biases of a BP neural network to ensure tracing accuracy.
Time series data reflects the operating status of a system over a period of time. When abnormal data is detected [9], it is difficult to trace its source because the relationships between power grid components are complex and diverse. In this paper, the correlation of time series is calculated, graph theory and related algorithms are used to search for the maximum time series correlation ring, and the correlation of the nodes in the ring is then used to trace the abnormal data and judge whether it is caused by a system failure or a single point of failure. The tracing process is described in detail below.

Overall Overview
The measurement points on the components of the power grid topology constitute a correlation set. If the oil level in the transformer oil conservator is too high, the pressure inside the transformer increases, which can cause the pressure relief valve to operate and spray oil; if the oil level is too low, the oil level in the transformer's high-voltage bushing drops, which can easily lead to internal discharge in the bushing and to accidents. Such problems are system-level faults, and real-time monitoring of the conservator oil level through measurement points is an important means of supervision. The busbar power imbalance rate is the difference between the incoming power and the total outgoing power of the busbar at each voltage level of a substation, expressed as a percentage of the incoming power. When a busbar power imbalance occurs, abnormal metering events occur and a large amount of electricity has to be recovered. Such problems are caused by a single point of failure. Real-time monitoring of the busbar power imbalance rate allows abnormal metering events to be detected quickly, and it is also important for checking the wiring correctness of metering devices and the automatic meter reading rate index of the grid-level platform.
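The busbar power imbalance rate is commonly computed as (incoming power - outgoing power) / incoming power × 100%. A minimal Python sketch under that assumption follows; the function name and the readings are hypothetical and are not taken from the paper's data.

```python
def busbar_imbalance_rate(incoming, outgoing):
    """Busbar power imbalance rate in percent (hypothetical helper).

    incoming: power (or energy) values of all lines feeding the busbar
    outgoing: power (or energy) values of all lines leaving the busbar
    """
    total_in = sum(incoming)
    total_out = sum(outgoing)
    if total_in == 0:
        raise ValueError("total incoming power must be non-zero")
    # (incoming - outgoing) / incoming, expressed as a percentage
    return (total_in - total_out) / total_in * 100.0


# Hypothetical readings: 100 units in, 98 units out
rate = busbar_imbalance_rate([60.0, 40.0], [50.0, 48.0])
print(f"imbalance rate: {rate:.2f}%")  # imbalance rate: 2.00%
```

A persistent non-zero rate on one busbar, while correlated measurements stay normal, is the kind of single-point symptom the tracing method is meant to separate from system-level faults.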
Based on the abnormal data, the source is identified as either a system failure or a single point of failure.
The overall process is divided into two stages, training and detection, as shown in figure 1.

Figure 1. Anomaly tracing flowchart
Phase of training: first, the time series correlation of normal time series data is calculated (most of the time series data generated during actual system operation is normal, so this data is not difficult to obtain); second, a graph theory algorithm is used to search for all the time series correlation rings; finally, the corresponding set of time series correlation rings is output.
Phase of detection: the anomaly detection algorithm [9] is used to identify abnormal data. The data carrying the anomaly label is assigned to the time series correlation ring sets according to the physical locations of the measurement points, and the cause of the abnormal data is analyzed within each strongly correlated ring set (correlation greater than the threshold). Finally, according to the tracing results, operation and maintenance personnel are notified promptly so that they can respond, ensuring the stable operation of the power grid system and preventing faults from spreading.
The training and detection phases are described in detail below. First, we introduce how the multidimensional time series data is acquired.

Acquisition of Multidimensional Time Series Data
The process of acquiring multidimensional time series data from measurement points is shown in figure 2.
As shown in figure 2, S1, S2, and S3 can therefore be considered to form a three-dimensional time series group.
Here, $x_i(t_n)$ is the sampling value of measurement point $p_i$ at time $t_n$, and $\bar{x}_i^{T}$ is the sampling mean of measurement point $p_i$ over the time period $T = \{t_1, t_2, \dots, t_T\}$. $V_{thr}$ is defined as the correlation threshold between time series (its value may differ between network topologies). When $R_{ij} > V_{thr}$, $S_i$ and $S_j$ are considered correlated, and the greater $R_{ij}$, the stronger the correlation between the two time series.
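This yields the following set of correlation coefficients (for convenience, the set is written in matrix form):
$$
\mathbf{R} = \begin{pmatrix} R_{11} & R_{12} & \cdots & R_{1K} \\ R_{21} & R_{22} & \cdots & R_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ R_{K1} & R_{K2} & \cdots & R_{KK} \end{pmatrix} \tag{2}
$$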
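The correlation matrix can be computed in a single NumPy call. This sketch assumes the correlation coefficient is the Pearson coefficient, which is what the per-point sampling means defined above suggest; `correlation_matrix` and the toy data are hypothetical.

```python
import numpy as np


def correlation_matrix(data):
    """K x K Pearson correlation matrix R of one time series group.

    data: array of shape (L, K); column i holds the samples x_i(t_1), ..., x_i(t_L)
    of measurement point p_i. A constant column yields NaN entries.
    """
    data = np.asarray(data, dtype=float)
    # rowvar=False: columns (measurement points) are the variables
    return np.corrcoef(data, rowvar=False)


# Hypothetical group: S2 rises with S1, S3 falls as S1 rises
t = np.arange(10, dtype=float)
group = np.column_stack([t, 2.0 * t + 1.0, -t])
R = correlation_matrix(group)
print(np.round(R, 3))
```

The result is symmetric with ones on the diagonal, so only the upper triangle carries new information when the graph is built.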

Construct the Time Series Correlation Graph.
The correlation threshold between time series should be established according to the characteristic indexes of the power system, such as busbar, transformer, and line characteristics, combined with historical sampling data and practical operation and maintenance experience. In this paper, it is assumed that the correlation threshold has already been determined from the topological connections of the power grid, and it is denoted $V_{thr}$.
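Once the threshold is fixed, the graph can be built by keeping an edge only between time series whose correlation is greater than the threshold, weighted by that correlation. The sketch below stores the graph as a dictionary of weighted edges; the helper name, the threshold 0.8, and the matrix values are illustrative assumptions.

```python
import numpy as np


def build_correlation_graph(R, v_thr):
    """Edges of the time series correlation graph (hypothetical helper).

    Keeps an undirected edge (i, j), i < j, whenever R[i, j] > v_thr and uses
    the correlation value as the edge weight. Vertices are column indices.
    """
    R = np.asarray(R, dtype=float)
    K = R.shape[0]
    edges = {}
    for i in range(K):
        for j in range(i + 1, K):  # upper triangle: the graph is undirected
            if R[i, j] > v_thr:
                edges[(i, j)] = float(R[i, j])
    return edges


R = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.85],
              [0.2, 0.85, 1.0]])
print(build_correlation_graph(R, v_thr=0.8))  # {(0, 1): 0.9, (1, 2): 0.85}
```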

Compute the Set of Time Series Correlation Rings.
The time series correlation graph may contain several connected subgraphs; these are the connected components of the time series correlation graph.
The concepts of connected graphs and connected components are as follows. In an undirected graph G, if there is a path from vertex A to vertex B, the two vertices are said to be connected. If every pair of vertices in the graph is connected, the graph is called a connected graph; otherwise it is an unconnected graph. A maximal connected subgraph of G, that is, a connected subgraph to which no further vertex of G can be added while keeping it connected, is called a connected component of G. A connected graph has exactly one connected component, which is the graph itself, while an unconnected graph has several connected components. The unconnected graph in figure 4 has two connected components, and figure 5 shows that the connected component of a connected graph is the graph itself.
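Connected components can be found with a breadth-first search started from each vertex not yet visited. A minimal sketch with hypothetical vertex labels:

```python
from collections import deque


def connected_components(vertices, edges):
    """Connected components of an undirected graph, found by breadth-first search.

    vertices: iterable of vertex labels; edges: iterable of (u, v) pairs.
    Returns a list of vertex sets; an isolated vertex forms its own component.
    """
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Collect every vertex reachable from `start`
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical unconnected graph with two components
comps = connected_components(["S1", "S2", "S3", "S4"], [("S1", "S2"), ("S3", "S4")])
print(sorted(sorted(c) for c in comps))  # [['S1', 'S2'], ['S3', 'S4']]
```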
Computing the set of time series correlation rings means finding the largest time series correlation ring in each connected component of the time series correlation graph.
The maximum time series correlation ring is defined as follows: in the time series correlation graph, the ring with the largest number of nodes is the maximum time series correlation ring of the undirected graph G, as shown in figure 6. The next step is therefore to find the maximum time series correlation ring, that is, to find the longest path that can form a loop in each connected component of the time series correlation graph [10]. For convenience, Cc denotes a connected component.
According to graph theory, finding the maximum time series correlation ring in a connected component is equivalent to finding the longest spanning tree of that connected component [11]. The edge set of a connected component can therefore be divided into the spanning tree edge set and the non-spanning-tree edge set (also called the ring edge set): $Edge_{Cc} = TreeEdge_{Cc} \cup CycleEdge_{Cc}$. We search for the longest spanning tree using the graph in figure 3, with each edge of the connected component weighted by its calculated correlation value, as shown in figure 7.
Algorithm 1: Longest spanning tree search algorithm. The Kruskal algorithm is used to generate the spanning tree. All edges in the connected component are sorted in ascending order of weight, and the edge with the smallest weight is considered first; an edge is selected for the spanning tree as long as it does not form a loop with the edges already selected. For a connected component with K vertices, K - 1 qualifying edges are selected, and these edges form the spanning tree.
Input: a connected component with K vertices. Output: K - 1 qualifying edges.
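A minimal Python sketch of Algorithm 1 follows, using a union-find structure to test whether an edge would close a loop. As in the text, edges are scanned in ascending order of weight; the edges left over are returned as the non-spanning-tree edge set. The function name and the edge weights are hypothetical; the weights are chosen only so that the result matches the spanning tree of figure 12.

```python
def kruskal_spanning_tree(vertices, weighted_edges):
    """Kruskal's algorithm as described in Algorithm 1.

    weighted_edges: dict mapping (u, v) -> weight. Edges are scanned in
    ascending weight order and kept unless they close a loop.
    Returns (tree_edges, cycle_edges) for a connected component.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find root lookup with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree_edges, cycle_edges = [], []
    for (u, v), _weight in sorted(weighted_edges.items(), key=lambda item: item[1]):
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            cycle_edges.append((u, v))  # would form a loop with selected edges
        else:
            parent[root_u] = root_v
            tree_edges.append((u, v))
    return tree_edges, cycle_edges


# Hypothetical correlation weights for the six-vertex example
weights = {("S2", "S4"): 0.81, ("S1", "S3"): 0.82, ("S3", "S4"): 0.83,
           ("S4", "S6"): 0.84, ("S1", "S2"): 0.85, ("S2", "S3"): 0.86,
           ("S2", "S6"): 0.87, ("S1", "S5"): 0.88, ("S5", "S3"): 0.89}
tree, cycle = kruskal_spanning_tree(["S1", "S2", "S3", "S4", "S5", "S6"], weights)
print(tree)   # [('S2', 'S4'), ('S1', 'S3'), ('S3', 'S4'), ('S4', 'S6'), ('S1', 'S5')]
print(cycle)  # [('S1', 'S2'), ('S2', 'S3'), ('S2', 'S6'), ('S5', 'S3')]
```

Continuing the scan after K - 1 edges costs little and directly produces the split $Edge_{Cc} = TreeEdge_{Cc} \cup CycleEdge_{Cc}$ needed by the ring search.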
The following describes how to search for the longest spanning trees.
1. Sort all the edges in figure 7 in ascending order of their values, as shown in table 1.
2. Start from edge {S2,S4}. Since no edge has yet been selected and edge {S2,S4} cannot form a loop on its own, it is selected for the spanning tree, as shown in figure 8.
6. Edges {S1,S2}, {S2,S3}, and {S2,S6} cannot be selected, because each would form a loop with the already selected edges {S1,S3}, {S2,S4}, {S3,S4}, and {S4,S6}. Either {S1,S5} or {S5,S3} can be added without forming a loop, but adding both would form one, so the edge with the lower value is chosen. Here, edge {S1,S5} is added to the selected edges.
As shown in figure 12, 5 edges have now been selected for this connected graph with 6 vertices, and they form the longest spanning tree. The spanning tree in figure 12 may be only one solution for this connected component; that is, several spanning trees may be found for the same connected component. However, the maximum time series correlation ring obtained from the different spanning trees of the same connected component is unique. The following explains how to find the maximum time series correlation ring from the longest spanning tree.
We have divided the edge set of each connected component into the spanning tree edge set and the non-spanning-tree edge set, that is, $Edge_{Cc} = TreeEdge_{Cc} \cup CycleEdge_{Cc}$. Theorem 2 is introduced here: any edge in the non-spanning-tree edge set forms a ring with the spanning tree of the corresponding connected component of the undirected graph. See section 7.2 for the proof. Therefore, we need to find the longest path in the spanning tree that can form a loop.
Algorithm 2: Maximum time series correlation ring search algorithm.
Input: a spanning tree with K vertices, root node r, leaf node set Leafs, and non-spanning-tree edge set CycleEdges.
Output: CycleNode, the set of vertices on the longest loop-forming path in the spanning tree.
1. Starting from node r, find the neighbors of r and add them to the set Neighbors.
2. For each neighbor node in Neighbors, use the BFS algorithm to find the path-point set corresponding to that node.
Algorithm 3: BFS algorithm (figure 12 illustrates it; assume the root node r is S5, whose only neighbor is S1):
1. Take node S1, which has one child S3; S3 has one child S4; and S4 has two children, {S2, S6}:
1.1. S2 has no child node, so the search returns directly to the upper level and continues with S6;
1.2. S6 has no child node, so the search is complete.
Therefore, {S1, S3, S4, S2, S6} is the deep search path-point set for neighbor node S1.
3. For the deep search path-point set of each neighbor node, sort the paths from the neighbor node to each leaf node from longest to shortest by number of path points (under the current rules, these are {S1, S3, S4, S2} and {S1, S3, S4, S6} in figure 12).
4. Take the two longest remaining paths in order and check whether their leaf nodes form an edge in the non-spanning-tree edge set CycleEdges. If they do not, remove the current path and continue with the next path. Otherwise, add all nodes on the paths and the root node r to CycleNode. (Under the current rules, the two longest paths in figure 12 are S1->S3->S4->S2 and S1->S3->S4->S6, and the leaf nodes S2 and S6 form an edge in the non-spanning-tree edge set, so all points on the paths, {S1, S3, S4, S2, S6}, are added to CycleNode.)
5. Return CycleNode (in figure 12, the set {S1, S3, S4, S2, S6}).
After the longest loop-forming path is found, the vertices that do not belong to the ring are examined to determine whether adding them to CycleNode enlarges the ring (increases its length). Theorem 3 is introduced here: if a node belongs to the same spanning tree but is not in CycleNode, and it forms non-spanning-tree edges with at least two nodes in CycleNode, then adding it to CycleNode enlarges the ring. See section 7.3 for the proof. The ring enlargement algorithm is described in detail below.
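A Python sketch of steps 1 to 5 of Algorithm 2 follows. It enumerates the paths from each neighbor of the root down to the leaves with a recursive depth-first traversal, which yields the same path-point sets as the worked example, and, as in that example, records paths from the neighbor node onward, so the root S5 is not in the returned set. The function names are hypothetical, and pairing consecutive paths in sorted order is one reading of step 4.

```python
def paths_to_leaves(tree_adj, node, parent):
    """All downward paths from `node` to a leaf, never stepping back to `parent`."""
    children = [c for c in tree_adj[node] if c != parent]
    if not children:
        return [[node]]
    return [[node] + tail
            for child in children
            for tail in paths_to_leaves(tree_adj, child, node)]


def max_ring_nodes(tree_edges, root, cycle_edges):
    """Sketch of Algorithm 2: vertex set of the longest loop-forming path pair."""
    tree_adj = {}
    for u, v in tree_edges:
        tree_adj.setdefault(u, []).append(v)
        tree_adj.setdefault(v, []).append(u)
    closing = {frozenset(e) for e in cycle_edges}  # non-spanning-tree edges

    cycle_node = set()
    for neighbor in tree_adj[root]:  # step 1: neighbors of the root r
        # Steps 2-3: path-point sets through this neighbor, longest first
        paths = sorted(paths_to_leaves(tree_adj, neighbor, root), key=len, reverse=True)
        # Step 4: compare the two longest remaining paths; if their leaves are not
        # joined by a non-tree edge, drop the current (longer) path and move on
        for first, second in zip(paths, paths[1:]):
            if frozenset((first[-1], second[-1])) in closing:
                cycle_node.update(first, second)
                break
    return cycle_node


# Spanning tree of figure 12 and the non-tree edges named in the worked example
tree = [("S2", "S4"), ("S1", "S3"), ("S3", "S4"), ("S4", "S6"), ("S1", "S5")]
non_tree = [("S1", "S2"), ("S2", "S3"), ("S2", "S6"), ("S5", "S3")]
print(sorted(max_ring_nodes(tree, "S5", non_tree)))  # ['S1', 'S2', 'S3', 'S4', 'S6']
```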
Applying the ring enlargement algorithm (Algorithm 4) to the CycleNode set {S1, S3, S2, S4, S6} in figure 12 yields the longest loop path {S5, S1, S3, S2, S4, S6}, which is the maximum time series correlation ring sequence set.
For the other connected components of the same time series correlation graph, the above algorithms can be applied in turn to obtain multiple maximum time series correlation ring sequences, expressed as $CN_{max} = \{CN_{max1}, CN_{max2}, CN_{max3}, \dots\}$.

Abnormal Data Identification.
For how to detect abnormal data, see reference [9]; the anomaly detection process is not repeated in this paper.
This paper assumes that all data involved in detection has already undergone anomaly detection, that is, each data item is labeled "abnormal" or "normal".

Data Anomaly Tracing.
We now trace the abnormal data obtained in section 4.1.
Algorithm 5: Data anomaly tracing algorithm.
Input: the set of time series correlation sets CNmax and the multidimensional (assumed here to have dimension K, K > 2) time series data to be detected, DTBD (data-to-be-detected).
Output: a K-dimensional time series array DWDT (data-with-detection-tag) carrying the detected anomaly source tag.
1. The sequences in DTBD are divided into m sub-sequence sets, named SS, according to the time series correlation sets contained in CNmax; ... contained in SS will be identified as "fault";
4. If a sequence $S_i$ is abnormal but the other related sequences are normal, all the data corresponding to $S_i$ is identified as "error";
5. Finally, the K-dimensional time series array DWDT (data-with-detection-tag) carrying the anomaly source tag is obtained.
The data in the array DWDT is interpreted as follows. When data is marked as abnormal and the other related data is also abnormal, the anomaly is caused by a system failure. When data is identified as an error and all other related data is normal, this is not a true grid failure but a single point of failure, such as a measurement point that is not working properly.
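The tagging rules above can be sketched as follows. Because steps 2 and 3 of Algorithm 5 are not fully preserved, this is a simplification under stated assumptions: each sequence carries a single "abnormal"/"normal" label rather than per-sample labels, and "the other related data is also abnormal" is read as at least one other sequence in the same ring being abnormal. All names are hypothetical.

```python
def trace_anomalies(rings, labels):
    """Tag abnormal sequences as system "fault" or single-point "error".

    rings: list of sets of sequence names, one per maximum correlation ring (CNmax).
    labels: dict mapping sequence name -> "abnormal" or "normal" (detector output).
    Returns a dict mapping every sequence that appears in a ring to
    "fault", "error", or "normal".
    """
    traced = {}
    for ring in rings:
        abnormal = {s for s in ring if labels.get(s) == "abnormal"}
        for s in ring:
            if s not in abnormal:
                traced[s] = "normal"
            elif len(abnormal) > 1:
                # Other correlated sequences in the ring are abnormal too
                traced[s] = "fault"
            else:
                # Only this sequence is abnormal while its ring peers are normal
                traced[s] = "error"
    return traced


rings = [{"S1", "S2", "S3"}, {"S4", "S5"}]
labels = {"S1": "abnormal", "S2": "abnormal", "S3": "normal",
          "S4": "abnormal", "S5": "normal"}
print(sorted(trace_anomalies(rings, labels).items()))
# [('S1', 'fault'), ('S2', 'fault'), ('S3', 'normal'), ('S4', 'error'), ('S5', 'normal')]
```

Because the maximum rings come from different connected components, they are disjoint, so each sequence receives exactly one tag.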

Performance Analysis
The key steps are constructing the time series correlation graph and finding the maximum time series correlation ring of each connected component. Their time and space performance is analyzed below.

The Complexity of Constructing Time Series Correlation Graph
Assume the length of the time series is L and the sequence dimension is K. The correlation between every pair of columns of the K-dimensional sequence is calculated first. Each pairwise calculation traverses every point of the two sequences once, and there are $K(K-1)/2$ pairs, so the total time complexity is $O(K^2 L)$.

The Complexity of the Spanning Tree Search Algorithm
Assume that the time series correlation graph built from the K-dimensional time series can be divided into M connected components Cc.
To estimate the complexity of the spanning tree search, assume that each connected component has close to K/M vertices; the search then runs M times on components of about K/M vertices each. When there is only one connected component, the spanning tree search runs once on all K vertices.

The Complexity of Searching for the Maximum Time Series Correlation Ring
For the search for the maximum time series correlation ring in each spanning tree, the limiting case is that all K vertices of the time series correlation graph form a single connected component, which is the most expensive case. The number of measurement points in a real system is not very large, so K remains within an acceptable range. A system with large K can be divided into multiple subsystems with strong internal correlation, which reduces the complexity through dimension reduction within each subsystem.

Metrics
Anomaly tracing in multidimensional time series data aims to distinguish whether abnormal data is caused by a system failure or by a single point of failure.
Error data generated by a single point of failure is treated as a negative example, and abnormal data generated by a system failure is treated as a positive example. Each time series group is considered an instance unit, so there are four types of experimental results: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Accuracy formula: In fact, it will occur that the algorithm judgment is correct among the time series data to be detected, thus accuracy and recall rates are not included in the calculation.
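With system failures ("fault") as the positive class and single-point failures ("error") as the negative class, the four outcome counts and the standard recall TP / (TP + FN) can be computed as below. This is a hedged sketch: all names and labels are hypothetical, and only recall is computed because its definition does not depend on the accuracy formula.

```python
def confusion_counts(truth, predicted, positive="fault"):
    """Count TP, FP, TN, FN over time series groups.

    truth, predicted: equal-length sequences of labels, one per group;
    "fault" (system failure) is the positive class, anything else is negative.
    """
    tp = fp = tn = fn = 0
    for actual, guess in zip(truth, predicted):
        if guess == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def recall(tp, fn):
    """Share of true system failures that were recognized as such."""
    return tp / (tp + fn) if tp + fn else 0.0


truth = ["fault", "fault", "error", "error", "fault"]
predicted = ["fault", "error", "error", "fault", "fault"]
tp, fp, tn, fn = confusion_counts(truth, predicted)
print(tp, fp, tn, fn, recall(tp, fn))
```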

Source of Experimental Data
The data used by the algorithm is real sample data collected from measurement points in the power grid system. The training data consists of historical sampling data for three consecutive months; in total, about 500,000 time points of data from 60 measurement points were selected for training. The detection data is collected from the real-time power grid system.
In addition, the multidimensional time series data collected at the measurement points is also the main data source of the real power grid monitoring system and has practical physical significance, so experimental results on this data reflect practical application conditions.

Two Comparison Algorithms
Two comparison algorithms are used as references in this experiment: 1. a constraint-based anomaly detection algorithm, which detects and classifies anomalies by detecting fluctuations in outliers; 2. a cluster analysis algorithm (k-means) [12][13], which extracts features of the time series to find cluster centers and search radii, mines the potential cluster relationships, and thereby determines the class of each sequence.

Comparative Analysis of Experimental Results
For the time series group to be detected, the performance of the algorithm is compared from three aspects: the total number of dimensions, the size of test set and the size of training set.

Effect of the Total Number of Time Series Dimensions.
As the total number of time series dimensions K increases from 10 to 60 columns, the accuracy and recall rates of the proposed maximum time series correlation ring tracing algorithm and of the two comparison algorithms are as follows. Figure 13 shows that, compared with the k-means algorithm, the accuracy of the proposed algorithm remains relatively stable as the sequence dimension increases, averaging about 83%, which is about 8 percentage points higher than that of the k-means algorithm.
Figure 14 shows that the recall rate of the proposed algorithm remains at about 81%, on average about 6 percentage points higher than that of the k-means algorithm. Figure 14 also shows that the recall rate of the k-means algorithm rises with the dimension, because a higher sequence dimension increases the amount of data and reduces the chance that k-means falls into a local optimum.

Figure 13. Accuracy of different dimensions
Figure 14. Recall rate for different dimensions
By exploiting time series correlation, the proposed algorithm is slightly better than the k-means algorithm in accuracy and recall rate. Note that the accuracy and recall rate of the constraint-based anomaly detection algorithm are both below 0.5, so it cannot determine the source of anomalies; its performance is therefore not discussed separately in the subsequent experimental comparisons.

Effect of Test Set Size.
As shown in figure 15, the accuracy of both the proposed algorithm and the k-means algorithm improves as the test set grows. Under certain test set conditions, the accuracy of the k-means algorithm increases faster than that of the proposed algorithm, but it is less stable; in comparison, the proposed algorithm is superior to the k-means algorithm in terms of stability.
Figure 16 compares the recall rate trends of the two algorithms under different test set sizes. As the test set grows, the recall rate of the proposed algorithm decreases significantly, because the larger sample makes the correlation calculation more accurate and closer to the real situation. For k-means, by contrast, the recall rate increases greatly as the test set grows.

Summary
This paper studies a maximum time series correlation ring tracing algorithm based on time series correlation for multidimensional time series data in power grid systems. Using the multidimensional time series correlation calculation, a time series correlation graph is constructed and the set of maximum time series correlation rings is searched; the resulting ring set is then used to assign the data to be detected to the corresponding correlation sets. A correlation threshold is set according to the actual application scenario, and abnormal data is traced within the correlation sets based on this threshold. Experiments on real power grid data show that the proposed algorithm is superior to machine learning model algorithms in efficiency, running speed, and stability, so it adapts well to, and can be applied widely in, the field of anomaly tracing.

Theorem 1
There is only one maximum time series correlation ring in each connected component of a time series correlation graph. Proof: suppose a connected component contains two maximum time series correlation rings, ring 1 and ring 2. (1) If the two rings share a path, ring 1 and ring 2 can be combined into a larger ring 3 according to the connected graph principle, contradicting their maximality; (2) if ring 1 and ring 2 do not intersect, they belong to different connected components, which contradicts the premise that they lie in the same connected component.
Therefore, the theorem is proved.

Theorem 2
Any edge in the non-spanning-tree edge set forms a ring with the spanning tree of the corresponding connected component of the undirected graph.
Proof: the two endpoints of a non-spanning-tree edge lie in the connected component, so by the definition of a spanning tree of a connected graph they are also in the vertex set of the spanning tree. Since there is a path between any two vertices of a spanning tree, a ring must exist: starting from one endpoint of the ring edge, follow the path in the spanning tree to the other endpoint, and then return to the starting point along the ring edge.
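The construction used in this proof can be sketched in Python: a breadth-first search over the spanning tree finds the tree path between the endpoints of a non-tree edge, and that edge closes the path into a ring. The tree below is the figure 12 spanning tree from the worked example; the function name is hypothetical.

```python
from collections import deque


def ring_from_non_tree_edge(tree_edges, edge):
    """Ring closed by a non-spanning-tree edge (u, v), as in the proof of Theorem 2.

    Returns the tree path u -> ... -> v; the edge (v, u) closes it into a ring.
    Assumes u and v belong to the spanning tree, so the path always exists.
    """
    adjacency = {}
    for a, b in tree_edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    u, v = edge
    previous = {u: None}
    queue = deque([u])
    while queue:  # breadth-first search from u until v is reached
        node = queue.popleft()
        if node == v:
            break
        for neighbor in adjacency[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)

    path, node = [], v
    while node is not None:  # walk back from v to u
        path.append(node)
        node = previous[node]
    return path[::-1]


tree = [("S2", "S4"), ("S1", "S3"), ("S3", "S4"), ("S4", "S6"), ("S1", "S5")]
print(ring_from_non_tree_edge(tree, ("S5", "S3")))  # ['S5', 'S1', 'S3']
```

Because a tree has exactly one path between two vertices, the ring closed by each non-tree edge is unique.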

Theorem 3
For a node that belongs to the same spanning tree but is not in the CycleNode, there is at least one

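Figure 2. Acquisition of multidimensional time series data
4. Research on Traceability of Maximum Time Series Correlation Loop
4.1. Phase of Training
4.1.1. Calculate the Correlation Coefficient Matrix.
Based on the normal time series data collected from K measurement points, the correlation is calculated with the following formula:
$$
R_{ij} = \frac{\sum_{n=1}^{T}\left(x_i(t_n) - \bar{x}_i^{T}\right)\left(x_j(t_n) - \bar{x}_j^{T}\right)}{\sqrt{\sum_{n=1}^{T}\left(x_i(t_n) - \bar{x}_i^{T}\right)^{2}}\sqrt{\sum_{n=1}^{T}\left(x_j(t_n) - \bar{x}_j^{T}\right)^{2}}} \tag{1}
$$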

Figure 3. Time series correlation graph
According to figure 3, only time series with strong correlation are connected by edges. When correlation is weak, a node has fewer edges or is even isolated.

Figure 6. The largest time series correlation ring
Figure 7. A connected component of a time series correlation graph
Theorem 1 is introduced here: there is only one maximum time series correlation ring in each connected component of a time series correlation graph. See section 7.1 for the proof. The numbers of nodes in the time series correlation graph, its connected components, and the maximum time series correlation ring satisfy: time series correlation graph ≥ connected component ≥ maximum time series correlation ring.

Figure 12. Spanning tree including edge {S1,S5}
In this section, experiments are conducted on the effects of different test set sizes on the performance of the above algorithms.

Figure 15. Accuracy of different test set sizes
Figure 16. Recall rate of different test set sizes

5.4.3. Effect of Training Set Size.
This section investigates the effects of different training set sizes on the above algorithms.

Figure 17. Accuracy of different training set sizes
Figure 18. Recall rate of different training set sizes
Figure 17 compares the accuracy trends of the algorithms under different training set sizes. As the training set grows, the accuracy of both the proposed algorithm and the k-means algorithm increases, and the accuracy of the proposed algorithm is more stable. Figure 18 compares the recall rate trends under different training set sizes. As the training set grows, the recall rates of both the proposed algorithm and the k-means algorithm decrease, and the proposed algorithm again remains more stable than the k-means algorithm.
5.4.4. Comparison of Algorithm Speed.
Figure 19, figure 20, and figure 21 compare running times under different time series dimensions, test set sizes, and training set sizes. The running time of the proposed algorithm is much lower than that of the k-means algorithm. The proposed algorithm needs some time in the training stage, but using the maximum time series correlation ring vertex sets obtained during training, the anomaly source detection algorithm can output its result within 0.1 s. Because the algorithm does not need to recompute the statistical distribution of the raw data at each detection, its running time is shorter than that of the k-means algorithm.

Figure 19. Running time with different dimensions

Figure 20. Running time for different test set sizes
Figure 21. Running time for different training set sizes

Table 1. Edge weights of connected graphs in ascending order

Algorithm 4: Ring enlargement algorithm.
Input: cyclic vertex set CycleNode, non-cyclic vertex set NoneCycleNode, and non-tree edge set CycleEdges.
Output: the expanded vertex set CycleNodeExt.
1. Iterate through each Node in NoneCycleNode and repeat the following steps:
2. If Node1 and Node2 exist in the CycleNode set such that Node-Node1 and Node-Node2 are both edges in CycleEdges, add Node to CycleNodeExt;
3. Return the expanded vertex set CycleNodeExt.
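A Python sketch of the ring enlargement step follows. Algorithm 4 as listed counts only edges in CycleEdges, but in the figure 12 example the node S5 is joined to the ring by the tree edge {S1,S5} and the non-tree edge {S5,S3}; this sketch therefore counts edges of either kind, an assumption chosen so that it reproduces the result {S5, S1, S3, S2, S4, S6}. The function name is hypothetical.

```python
def enlarge_ring(cycle_node, none_cycle_node, edges):
    """Sketch of Algorithm 4 (ring enlargement) under a stated interpretation.

    A node outside the ring is added when edges of the connected component join
    it to at least two distinct nodes of the ring. `edges` may hold both tree and
    non-tree edges (an assumption made to reproduce the figure 12 example).
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    ring = set(cycle_node)
    cycle_node_ext = set(ring)
    for node in none_cycle_node:
        if len(adjacency.get(node, set()) & ring) >= 2:
            cycle_node_ext.add(node)
    return cycle_node_ext


tree = [("S2", "S4"), ("S1", "S3"), ("S3", "S4"), ("S4", "S6"), ("S1", "S5")]
non_tree = [("S1", "S2"), ("S2", "S3"), ("S2", "S6"), ("S5", "S3")]
ext = enlarge_ring({"S1", "S3", "S2", "S4", "S6"}, {"S5"}, tree + non_tree)
print(sorted(ext))  # ['S1', 'S2', 'S3', 'S4', 'S5', 'S6']
```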