Convergence criterion of power flow calculation based on graph neural network

Current data-driven power flow calculation methods rarely consider the divergence of power flow, and therefore output a false system power flow when given a divergent case. To address this problem, a data-driven power flow convergence criterion based on a DGAT-GPPool graph neural network classifier is proposed. First, since classical graph convolution methods do not consider edge attributes, a double-view graph attention convolution layer is constructed based on line admittance. Second, since existing pooling methods likewise ignore edge attributes and the coarsened graph they produce loses its physical meaning, a grid partition pooling layer is constructed based on the electrical distance between nodes. Finally, 10000 system samples containing different network topologies are generated from the IEEE 14-node system and its extended systems; after training, the accuracy on the testing set reaches 99.3%, and comparative experiments verify the effectiveness of the improvements to graph convolution and graph pooling.


Introduction
With the development of power systems, power grids in different regions are interconnected ever more closely and grid equipment is more heavily loaded. Traditional power flow calculation methods such as the Newton-Raphson (N-R) method are therefore prone to ill-conditioned power flow [1], resulting in slow solution speed and non-convergence of the power flow calculation [2].
As a data-driven approach, power flow analysis based on deep learning can directly fit the mapping between the initial power flow values and the power flow distribution of the real power system without iteratively computing the Jacobian matrix. Its time complexity therefore grows only linearly with the number of nodes, and the ill-conditioned power flow problem caused by a large condition number of the Jacobian matrix does not occur [3].
However, existing deep learning power flow methods only use a regression model to fit the relationship between the initial power flow values and the power flow distribution, without considering non-convergence of the power flow. From the perspective of neural network modelling, computing the power flow distribution is a regression problem whose output is the power flow distribution, whereas judging power flow convergence is a binary classification problem whose output is whether the power flow converges; the two differ fundamentally in model structure [4]. A power flow distribution model built on a regression model therefore cannot distinguish whether the power flow converges, and will still output a false system power flow distribution for a non-convergent input case. This defect greatly limits the practical application of deep learning power flow calculation methods, which must be complemented by pre-judging the input power flow cases. It is therefore necessary to study a data-driven convergence criterion for power flow calculation.
To solve this problem, this paper proposes a new GNN graph classifier suited to the power flow convergence problem, which can serve as a preprocessing step for deep learning power flow distribution models: it preserves their advantage in calculation speed while remedying their inability to consider power flow convergence.

Input feature
The proposed model requires the following node information: the node type $T$, voltage magnitude $V$ and angle $\theta$, active load $P_L$ and reactive load $Q_L$, and active power $P_G$ and reactive power $Q_G$ of the generator. Moreover, because a generator voltage exceeding its limit causes a node type transformation, the generator reactive power bounds $Q_{\max}$ and $Q_{\min}$ are also added to the node feature, so that the proposed model can implicitly account for the effect of node type transformations. The feature vector of node $v_i$ is therefore

$$h_i = [T, V, \theta, P_L, Q_L, P_G, Q_G, Q_{\max}, Q_{\min}]^T \in \mathbb{R}^{9\times 1},$$

and all node features can be represented by the node feature matrix $H = [h_1, \ldots, h_N]^T$, where $N$ is the number of nodes in the graph.
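As an illustration, the node feature matrix can be assembled as follows; the dictionary keys and the `build_feature_matrix` helper are hypothetical, and the numeric values are placeholders rather than actual IEEE 14-bus data:

```python
import numpy as np

# One row per node: [T, V, theta, P_L, Q_L, P_G, Q_G, Q_max, Q_min]
# (hypothetical encoding; T is the node-type code: 0 = PQ, 1 = PV, 2 = V-theta)
def build_feature_matrix(buses):
    """Stack per-node features into the N x 9 matrix H."""
    return np.array([[b["T"], b["V"], b["theta"],
                      b["P_L"], b["Q_L"],
                      b["P_G"], b["Q_G"],
                      b["Q_max"], b["Q_min"]] for b in buses], dtype=float)

buses = [
    {"T": 2, "V": 1.06, "theta": 0.0, "P_L": 0.0, "Q_L": 0.0,
     "P_G": 2.32, "Q_G": -0.17, "Q_max": 10.0, "Q_min": -10.0},
    {"T": 0, "V": 1.0, "theta": 0.0, "P_L": 0.478, "Q_L": -0.039,
     "P_G": 0.0, "Q_G": 0.0, "Q_max": 0.0, "Q_min": 0.0},
]
H = build_feature_matrix(buses)
print(H.shape)  # (2, 9)
```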
Unlike general graph data, the topological connections and edge information in power flow analysis can be represented by the node admittance matrix $Y \in \mathbb{C}^{N\times N}$, which directly reflects the grid topology and the line admittances, so we use the node admittance matrix as the edge feature.
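A minimal sketch of how such a nodal admittance matrix can be built from line data; shunt elements and transformer ratios, which a real matrix would include, are omitted for brevity:

```python
import numpy as np

def build_admittance(n, lines):
    """Nodal admittance matrix from (i, j, r, x) line data.
    Shunt susceptances and transformer ratios are ignored here."""
    Y = np.zeros((n, n), dtype=complex)
    for i, j, r, x in lines:
        y = 1.0 / complex(r, x)   # series admittance of the line
        Y[i, j] -= y              # mutual admittance (off-diagonal)
        Y[j, i] -= y
        Y[i, i] += y              # self admittance (diagonal)
        Y[j, j] += y
    return Y

lines = [(0, 1, 0.01938, 0.05917), (1, 2, 0.04699, 0.19797)]
Y = build_admittance(3, lines)
# without shunt elements, every row of Y sums to zero
print(np.allclose(Y.sum(axis=1), 0))  # True
```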

Graph classifier based on DGAT-GPPool
A GNN graph classification method mainly uses graph convolution to extract node features, summarizes them into graph-level features through graph pooling, and finally feeds the graph-level features into a classifier to obtain the result [4]. The overall structure of the proposed graph classifier is shown in Figure 1.

Heterogeneous feature extraction layer
In the power flow analysis problem, nodes are divided into $PQ$, $PV$ and $V\theta$ types, and each type of node has different initial power flow values. From the viewpoint of power flow calculation, the power grid is therefore naturally a heterogeneous graph. Since nodes in heterogeneous graphs usually carry different features, directly applying a graph neural network designed for homogeneous graphs to a heterogeneous graph is not convincing [5], and some additional work is needed to eliminate the heterogeneity of the graph. We therefore use a separate fully connected layer, which is essentially a learnable linear transformation, to extract the features of each node type; after this step the graph can be treated as homogeneous.
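A minimal sketch of this per-type feature extraction, with hypothetical dimensions and randomly initialized weights standing in for the trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n_types, d_in, d_hid = 3, 9, 5                 # PQ / PV / V-theta; 9 raw features
W = rng.normal(size=(n_types, d_in, d_hid))    # one learnable matrix per type
b = np.zeros((n_types, d_hid))                 # one bias per type

def extract_heterogeneous(H, node_type):
    """Apply the type-specific linear map to each node's raw features."""
    out = np.empty((H.shape[0], d_hid))
    for t in range(n_types):
        mask = node_type == t
        out[mask] = H[mask] @ W[t] + b[t]
    return out

H = rng.normal(size=(4, d_in))
node_type = np.array([0, 0, 1, 2])
Z = extract_heterogeneous(H, node_type)
print(Z.shape)  # (4, 5)
```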

Double-view graph attention network layer
To address the fact that classical graph convolution methods do not consider edge features, a double-view graph attention mechanism is proposed. Exploiting the physical property that line admittance is proportional to the tightness of the connection between nodes, the double-view graph attention mechanism computes a node attention coefficient through the classical graph attention mechanism [6] and an edge attention coefficient from the line admittance, and then combines the attention coefficients of the two views through a learnable weight parameter. The calculation is shown in equations (1)-(3):

$$e_{ij} = \mathrm{LeakyReLU}\left(a^T [W h_i \,\|\, W h_j]\right), \qquad \alpha_{ij}^{\mathrm{node}} = \mathrm{softmax}_j(e_{ij}), \tag{1}$$

$$\alpha_{ij}^{\mathrm{edge}} = \frac{|Y_{ij}|}{\sum_{k\in N(i)} |Y_{ik}|}, \tag{2}$$

$$\alpha_{ij} = w\,\alpha_{ij}^{\mathrm{node}} + (1-w)\,\alpha_{ij}^{\mathrm{edge}}, \tag{3}$$

where $a$ is the learnable vector of the attention mechanism; the $\|$ operator stands for vector concatenation; $\mathrm{LeakyReLU}(\cdot)$ is the activation function providing the nonlinear transformation; $|Y_{ij}|$ is the magnitude of the element in row $i$ and column $j$ of the node admittance matrix; and $w$ is a learnable weight parameter. After computing the comprehensive attention, DGAT updates the node features through the message-passing aggregation mechanism shown in equation (4).
$$h_i' = \sigma\Big(\frac{1}{K}\sum_{k=1}^{K}\sum_{j\in N(i)} \alpha_{ij}^{k}\, W^{k} h_j\Big), \tag{4}$$

where $N(i)$ is the neighbourhood of node $i$; $h_i'$ is the updated feature vector of node $i$; $W^k$ is a learnable parameter matrix shared with the attention mechanism; $K$ is the number of attention heads; and $\sigma(\cdot)$ is a nonlinear activation function. In summary, the proposed graph convolution method DGAT exploits the physical properties of line admittance to better describe the closeness of the connections between grid nodes, improving both the calculation accuracy and the interpretability of the model.
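The double-view attention described above might be sketched as follows for a single head; `dgat_layer` and its arguments are illustrative stand-ins for the trained model, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def leaky_relu(x, slope=0.3):
    return np.where(x > 0, x, slope * x)

def dgat_layer(H, Y, Wp, a, w):
    """Single-head double-view graph attention (sketch).
    H: N x d features, Y: N x N admittance matrix, Wp: d x d' projection,
    a: attention vector of length 2 d', w in [0, 1]: learnable view weight."""
    N, dp = H.shape[0], Wp.shape[1]
    Hp = H @ Wp                                      # W h_i for every node
    mask = (np.abs(Y) > 0) | np.eye(N, dtype=bool)   # neighbours plus self-loop
    # node view: e_ij = LeakyReLU(a^T [W h_i || W h_j]), softmax over j
    E = leaky_relu((Hp @ a[:dp])[:, None] + (Hp @ a[dp:])[None, :])
    E = np.where(mask, E, -np.inf)
    P = np.exp(E - E.max(axis=1, keepdims=True))
    alpha_node = P / P.sum(axis=1, keepdims=True)
    # edge view: attention proportional to admittance magnitude |Y_ij|
    Ymag = np.where(mask, np.abs(Y), 0.0)
    alpha_edge = Ymag / Ymag.sum(axis=1, keepdims=True)
    alpha = w * alpha_node + (1 - w) * alpha_edge    # learnable blend of views
    return np.tanh(alpha @ Hp)                       # aggregate and activate

Y = np.array([[ 2 - 6j, -2 + 6j,  0j],
              [-2 + 6j,  5 - 15j, -3 + 9j],
              [ 0j,     -3 + 9j,  3 - 9j]])
H = rng.normal(size=(3, 4))
Wp, a = rng.normal(size=(4, 5)), rng.normal(size=10)
Hn = dgat_layer(H, Y, Wp, a, w=0.5)
print(Hn.shape)  # (3, 5)
```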

Grid partition base pooling layer
In the power flow convergence problem, the graph pooling method extracts graph-level features from the input graph. However, most graph pooling methods likewise ignore the influence of edge attributes on the graph-level features. To solve this problem, we propose a graph pooling method based on power grid partition, which incorporates the edge features of the graph into the pooling calculation and preserves the physical meaning of the edge attributes after pooling, so that it can be used together with the DGAT method. Compared with graph pooling methods that divide subgraphs by pure graph theory, the proposed method, which divides subgraphs by grid partition, is better suited to the power flow convergence problem.
The subgraph partition method of GPPool first computes the electrical distance between every pair of nodes, then computes the score of every node and selects the nodes with the highest scores as centre nodes, and finally assigns each remaining node to the subgraph of the centre node at the smallest electrical distance. The graph coarsening process of GPPool is shown in Figure 2.

The basis of grid partition is the electrical distance [7]. Since, for a given power supply and network topology, power in the grid is distributed according to impedance, we use the magnitude of the two-port network input impedance to measure the electrical distance between nodes:

$$D_{ij} = |Z_{ii} + Z_{jj} - 2Z_{ij}|,$$

where $D_{ij}$ is the electrical distance between nodes $v_i$ and $v_j$, and $Z_{ij}$ is the element in row $i$ and column $j$ of the node impedance matrix.

After computing the electrical distances, GPPool quantitatively describes the strength of the electrical coupling of a node by the reciprocal of the sum of its electrical distances to all other nodes, which identifies whether a node occupies a key position in the grid in terms of both network topology and line impedance. Because nodes with generators have a greater impact on the power flow convergence result than other nodes, an additional score is assigned to all generator nodes so that they are retained as far as possible during pooling. The calculation formula is shown in equation (9):

$$S_i = \frac{1}{\sum_{j\neq i} D_{ij}} + G_i. \tag{9}$$

After scoring and assignment, the node feature matrix $H_k$ and node admittance matrix $Y_k$ of each subgraph $k$ are extracted from the assignment result. All node features of subgraph $k$ are then aggregated into the subgraph feature, which serves as the node feature of the supernode $k$; for this aggregation the subgraph is fed into the DGAT rater with a learnable parameter matrix. After graph coarsening, GPPool needs to compute an admittance matrix of the coarsened graph nodes that conforms to
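The partition procedure might be sketched as follows, with a random symmetric matrix standing in for a real node impedance matrix and all helper names hypothetical:

```python
import numpy as np

def gppool_partition(Z, is_gen, m):
    """Partition nodes into m subgraphs around high-score centre nodes.
    Z: N x N node impedance matrix; is_gen: mask of PV / V-theta nodes."""
    N = Z.shape[0]
    # electrical distance: magnitude of the two-port input impedance
    D = np.abs(Z.diagonal()[:, None] + Z.diagonal()[None, :] - Z - Z.T)
    # coupling strength = 1 / (sum of electrical distances), plus a
    # bonus of 1 for generator nodes so they survive pooling
    score = 1.0 / (D.sum(axis=1) + 1e-12) + is_gen.astype(float)
    centres = np.argsort(score)[-m:]                     # m highest scores
    assign = centres[np.argmin(D[:, centres], axis=1)]   # nearest centre
    S = np.zeros((N, m))                                 # assignment matrix
    for k, c in enumerate(centres):
        S[assign == c, k] = 1.0
    return S, centres

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5))
Z = A @ A.T + np.eye(5)          # symmetric stand-in impedance matrix
is_gen = np.array([True, False, False, True, False])
S, centres = gppool_partition(Z, is_gen, m=2)
print(S.sum(axis=1))             # each node lands in exactly one subgraph
```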
the actual equivalent simplification law. The matrix $Y_{ma}$ is defined as the admittance matrix containing only the mutual admittances of the coarsened graph, calculated as shown in equation (5):

$$Y_{ma,AB} = -\sum_{i=1}^{r} y_{e_i}, \tag{5}$$

where $m$ is the number of cluster centres set as a hyperparameter, i.e. the number of supernodes in the output graph, and, as shown in Figure 2, $e_1, e_2, \ldots, e_r$ are the $r$ edges connecting subgraphs $A$ and $B$, with $y_{e_i}$ the admittance of edge $e_i$. Since a subgraph contains both nodes and edges, the self-admittances of its nodes and the mutual admittances between them must both be considered when computing the self-admittance of a supernode; these are collected in the diagonal matrix $Y_{sa}$. Because every row and column of a node admittance matrix sums to zero, the node admittance matrix of the output coarsened graph is the sum of the two parts, $Y' = Y_{ma} + Y_{sa}$. The hierarchical graph pooling of GPPool therefore conforms to the equivalent simplification law of the actual power system when coarsening the graph, and the edge features of the coarsened graph retain their physical meaning as line admittances after pooling.

Dataset generation
The training and test sets were generated using the IEEE 14-bus system as the base scenario. Four categories of extended scenarios were considered: node failures, line maintenance, generator maintenance, and equipment commissioning. Figure 3 displays the electrical wiring diagrams of the base and extended scenarios. To create the samples, the node voltage, phase angle, active power, reactive power, active load, reactive load, line resistance and line reactance were modified in both the base and extended scenarios: each parameter was drawn uniformly at random from 80% to 120% of its initial reference value. The samples were then annotated according to the calculation results. In the end, 9000 power flow samples were selected as the training set and 1000 as the testing set; both were designed so that 50% of the samples are converged cases, meeting the requirement of a uniform data distribution.
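The sampling scheme can be sketched as follows; the parameter names are hypothetical and only a handful of values are shown:

```python
import numpy as np

rng = np.random.default_rng(42)

def perturb(base, low=0.8, high=1.2):
    """Draw each parameter uniformly from [80 %, 120 %] of its base value."""
    return {k: v * rng.uniform(low, high) for k, v in base.items()}

base_case = {"V": 1.06, "P_L": 0.478, "Q_L": -0.039,
             "r_line": 0.01938, "x_line": 0.05917}
sample = perturb(base_case)
print(all(0.8 * abs(v) <= abs(sample[k]) <= 1.2 * abs(v)
          for k, v in base_case.items()))  # True
```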

Model structure and hyperparameter setting
In terms of model structure, after heterogeneous feature extraction and node score calculation, the graph is passed through three stacked DGAT-GPPool blocks; each block contains two DGAT layers for node feature extraction, a global GPPool that summarizes the graph-level features, and a hierarchical GPPool for graph coarsening. Finally, the graph-level features summarized at the different levels are concatenated and fed into an MLP binary classifier to obtain the classification result.
In terms of hyperparameter settings, the number of features of every hidden layer in the graph classifier is set to 5, each DGAT layer uses a three-head attention mechanism, and each hierarchical GPPool reduces the number of nodes in its input graph by about one third. Moreover, every activation function in the model is LeakyReLU with a negative slope of 0.3.
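The overall forward pass might be outlined as below; every component is a simplified stand-in (random weights, a mean-pooling readout, a fixed round-robin coarsening) rather than the trained DGAT-GPPool model:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5  # hidden width used throughout the classifier

def dgat(H, Y):            # stand-in for the two DGAT layers of a block
    W = rng.normal(size=(H.shape[1], d))
    return np.tanh(H @ W)

def global_pool(H):        # stand-in for the global GPPool readout
    return H.mean(axis=0)

def coarsen(H, Y):         # stand-in for hierarchical GPPool (~1/3 fewer nodes)
    m = max(1, H.shape[0] - H.shape[0] // 3)
    S = np.zeros((H.shape[0], m))
    S[np.arange(H.shape[0]), np.arange(H.shape[0]) % m] = 1.0
    return S.T @ H, S.T @ Y @ S

def forward(H, Y):
    """Three stacked DGAT-GPPool blocks; the per-level readouts are
    concatenated and fed to an MLP head (here one linear layer)."""
    readouts = []
    for _ in range(3):
        H = dgat(H, Y)
        readouts.append(global_pool(H))
        H, Y = coarsen(H, Y)
    z = np.concatenate(readouts)           # 3 * d graph-level features
    W_out = rng.normal(size=(z.size, 2))
    return z @ W_out                       # logits: converged / diverged

logits = forward(rng.normal(size=(14, 9)), rng.normal(size=(14, 14)))
print(logits.shape)  # (2,)
```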

Model performance test
We used the True Positive Rate (TPR), True Negative Rate (TNR), overall Accuracy (Acc) and F1 score (F1) to assess model performance. The proposed DGAT-GPPool graph classifier was trained and tested for 100 rounds on the dataset described in section 3.1. The training process is shown in Figure 3.
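These four metrics can be computed from the confusion-matrix counts as follows:

```python
import numpy as np

def metrics(y_true, y_pred):
    """TPR, TNR, accuracy and F1 for binary convergence labels
    (1 = converged, 0 = diverged)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tpr = tp / (tp + fn)                 # recall on converged cases
    tnr = tn / (tn + fp)                 # recall on diverged cases
    acc = (tp + tn) / y_true.size
    f1 = 2 * tp / (2 * tp + fp + fn)
    return tpr, tnr, acc, f1

y_true = np.array([1, 1, 1, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1])
print(metrics(y_true, y_pred))
```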

Comparative experiments
To assess the effectiveness of the proposed DGAT-GPPool graph classifier, comparative experiments were conducted against classical graph convolution and graph pooling algorithms. In each experiment, only the corresponding component of the proposed graph classifier was replaced with the component of the comparison model, while the overall model structure was kept unchanged. The comparison was divided into three groups.

Group 1 analyses the influence of the heterogeneous feature extraction layer on model performance: the proposed model is compared with the GCN [8]-SAGPool [9]-ISO model, which lacks a heterogeneous feature extraction layer, and with the baseline GCN-SAGPool model. The performance of each model on the test set is shown in Figure 4 and Table 2.

Group 2 analyses the influence of the graph convolution layer: the proposed DGAT layer is compared with the GCN and GAT layers. All models except the baseline GCN-SAGPool use GPPool as the graph pooling layer. The performance of each model on the test set is shown in Figure 5 and Table 3.

Group 3 analyses the influence of the graph pooling layer: the proposed GPPool layer is compared with MaxPool, MeanPool, TopKPool [10] and SAGPool. All models except the baseline GCN-SAGPool use DGAT as the graph convolution layer. The performance of each model on the test set is shown in Figure 6 and Table 4.

Figure 2. Example of the GPPool layer coarsening a 5-node graph into a 2-node graph.

where $S_i$ is the electrical coupling strength (score) of node $v_i$, $n$ is the number of nodes in the graph, and $G_i$ is the additional score of node $v_i$: $G_i = 1$ when node $v_i$ is a $PV$ or $V\theta$ node, and $G_i = 0$ otherwise. When zoning the subgraphs, let $\Omega_k$ denote the set of nodes assigned to subgraph $k$, and write $v_i \in \Omega_k$ when node $v_i$ is assigned to subgraph $k$. The node assignment matrix $S$ is then built from the assignment result: $s_{ik} = 1$ when $v_i \in \Omega_k$. The nodes within a subgraph may be ordered arbitrarily.

If subgraphs $A$ and $B$ are connected by $r$ edges $e_1, e_2, \ldots, e_r$, the mutual admittance $Y_{ma,AB}$ between supernodes $A$ and $B$ in $Y_{ma}$ is $-\sum_{i=1}^{r} y_{e_i}$, where $y_{e_i}$ is the admittance of edge $e_i$. Since a subgraph contains both nodes and edges, the self-admittances of the nodes in the subgraph and the mutual admittances between them must both be considered when calculating the self-admittance of a supernode; these are collected in the diagonal matrix $Y_{sa}$.

Figure 3 illustrates that the training loss of the proposed DGAT-GPPool graph classifier reaches a plateau after approximately 55 training epochs, indicating that further training does not significantly improve the model's performance. The final performance of the model on the training and testing sets is summarized in Table 1.

Figure 4. Performance of each model in comparison group 1 on the test set.

Figure 5. Performance of each model in comparison group 2 on the test set.

Figure 6. Performance of each model in comparison group 3 on the test set.

Table 2. Best performance of each model in comparison group 1 on the test set.

Table 3. Best performance of each model in comparison group 2 on the test set.