Higher-order random network models

Most existing random network models that describe complex systems in nature and society are developed through connections that indicate a binary relationship between two nodes. However, real-world networks are so complicated that we can only identify many critical hidden structural properties through higher-order structures such as network motifs. Here we propose a framework in which we define higher-order stubs, higher-order degrees, and generating functions for developing higher-order complex network models. Then we develop higher-order random networks with arbitrary higher-order degree distributions. The developed higher-order random networks share critical structural properties with real-world networks, but traditional connection-based random networks fail to exhibit these structural properties. For example, as opposed to connection-based random network models, the proposed higher-order random network models can generate networks with power-law higher-order degree distributions, right-skewed degree distributions, and high average clustering coefficients simultaneously. These properties are also observed on the Internet, the Amazon product co-purchasing network, and collaboration networks. Thus, the proposed higher-order random networks are necessary supplements to traditional connection-based random networks.


Introduction
Many real-world complex systems can be modeled as complex networks. In the study of complex networks, most structural properties of networks are derived from connections or edges [1][2][3]. For example, the degree of a node or vertex is the number of edges incident to the node. Previous research [4] indicates that degree correlations exist in real-world networks. Community structure [5][6][7], which means that nodes in the same community are densely connected, but nodes between different communities are sparsely connected, is also a typical edge-based feature.
To model real-world complex systems, edge-based random network models have been extensively studied. Erdős and Rényi [8] proposed a simple random network model called the ER (Erdős-Rényi) model, which is a widely used model of real-world networks. However, researchers found that real-world networks have critical properties that do not exist in ER random networks. As a result, various complex network models other than the ER model were proposed. Specifically, Watts and Strogatz [9] introduced the WS (Watts-Strogatz) model, which can generate networks with small characteristic path lengths and large clustering coefficients. Later, Barabási and Albert [10] proposed the BA (Barabási-Albert) model, which generates scale-free networks. Then Newman et al. [11] developed random graph models with arbitrary degree distributions. In addition, Papadopoulos et al. [12] improved preferential attachment models so that both popularity and similarity are taken into account.
An edge represents a relationship between two nodes. In contrast, higher-order structures can characterize complex relationships among n (n > 2) nodes. Because real-world networks are complicated, it is necessary to explore network properties based on higher-order structures. Network motifs [13], statistically significant connectivity patterns, illustrate the importance of higher-order structures. For example, motif-based communities reveal critical hidden structural properties of complex networks that cannot be described by traditional edge-based community structure. To effectively identify motif-based communities, motif-based community detection was widely investigated [14][15][16]. The higher-order clustering coefficient [17], which generalizes the traditional edge-based clustering coefficient, is another proposed property based on higher-order structures. Graphlets, which are special subgraphs, were also extensively studied [18].
Besides properties based on higher-order structures, researchers pay attention to higher-order network or graph models [19]. Hypergraphs are classic higher-order graph models, in which hyperedges indicate relationships among an arbitrary number of nodes. Recently, the generative model of clustered hypergraphs was introduced [20]. In addition, the consensus dynamics over temporal hypergraphs were investigated [21]. Moreover, a three-body consensus dynamical model over hypergraphs was developed [22], which features asymmetric roles of interacting agents in a triangle. Simplicial complexes on n vertices [23] can also be used to develop higher-order graph models. For example, exponential random simplicial complexes [24], which generalize exponential random graphs, were proposed. Besides, growing simplicial network models for complete graphs or cliques were introduced [25]. Researchers also introduced and investigated various multilayer network models [26,27], such as multiplex networks [28], interdependent networks [29], and temporal networks [30].
However, current higher-order network models cannot describe many essential features based on higher-order structures in real-world complex systems. For developing higher-order complex network models, we introduce a framework including higher-order stubs, adjacency tensors, higher-order degrees, and generating functions for higher-order degrees. Then we develop higher-order random networks with arbitrary higher-order degree distributions through the proposed framework. In the supplementary material, we also propose higher-order stochastic blockmodels and higher-order preferential attachment models through the proposed framework.

The framework for higher-order complex network models
We first consider the triangle. The triangle-based higher-order stub is presented in figure 1(b). In a network $G$ with $n$ nodes, we define the triangle-based higher-order degree of node $u$ as the number of triangle-based higher-order stubs that $u$ has in $G$. The triangle-based higher-order degree of $u$ therefore equals the number of triangle instances in which $u$ participates. Here, a triangle (or motif) instance is an induced subgraph of $G$ that is isomorphic to the triangle (or motif).
Next, we define triangle-based adjacency tensors. We use a purely covariant tensor of type $(0, 3)$ to represent an adjacency tensor. Let $e_1, \ldots, e_n$, corresponding to the $n$ vertices $1, \ldots, n$, be the standard basis of $\mathbb{R}^n$, and let $e^1, \ldots, e^n$ denote the corresponding dual basis. Then the adjacency tensor $A^{\mathrm{Tr}}$ has the form
$$A^{\mathrm{Tr}} = A^{\mathrm{Tr}}_{ijk}\, e^i \otimes e^j \otimes e^k, \tag{1}$$
where $A^{\mathrm{Tr}}_{ijk}$ is a component of the tensor and Einstein summation notation is used. Given the basis, the adjacency tensor is uniquely determined by its $n^3$ components. Three distinct nodes $i$, $j$, and $k$ are said to satisfy condition $\Gamma$ if the induced subgraph $H$ with node set $\{i, j, k\}$ is isomorphic to the triangle. The components of the triangle-based adjacency tensor $A^{\mathrm{Tr}}$ are defined as
$$A^{\mathrm{Tr}}_{ijk} = \begin{cases} 1 & \text{if } i, j, k \text{ satisfy condition } \Gamma, \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$
Since there is only one type of triangle-based higher-order stub, $A^{\mathrm{Tr}}$ is symmetric, that is, it satisfies
$$A^{\mathrm{Tr}}_{ijk} = A^{\mathrm{Tr}}_{ikj} = A^{\mathrm{Tr}}_{jik} = A^{\mathrm{Tr}}_{jki} = A^{\mathrm{Tr}}_{kij} = A^{\mathrm{Tr}}_{kji}. \tag{3}$$
Further, we define the triangle-based generating function as
$$G^{(2)}_0(x) = \sum_{k=0}^{\infty} p^{(2)}_k x^k, \tag{4}$$
where $p^{(2)}_k$ is the probability that a randomly chosen node has triangle-based higher-order degree $k$. The properties of $G^{(2)}_0(x)$ include
$$G^{(2)}_0(1) = 1, \qquad \left.\frac{\mathrm{d} G^{(2)}_0}{\mathrm{d} x}\right|_{x=1} = \sum_k k\, p^{(2)}_k = z, \tag{5}$$
where $z$ is the average triangle-based higher-order degree.
The degree distributions of many real-world networks approximately follow power laws, namely, $p_k \sim k^{-\tau}$. We observe that the higher-order degree distributions of real-world networks also have distinctive features, such as following power laws. Here we consider the Internet, the Amazon product co-purchasing network [31], and the arXiv COND-MAT (Condensed Matter Physics) collaboration network [32]. Figure 2 shows that the triangle-based higher-order degree distributions of all three networks approximately follow power laws.
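The definitions in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `triangle_degrees` and `generating_function`, the 0-based node labels, and the toy edge list are assumptions made for the example. The brute-force loop visits every unordered node triple, which mirrors the $n^3$ tensor components and is only practical for small graphs.

```python
from itertools import combinations


def triangle_degrees(n, edges):
    """Return d[u] = number of triangle instances containing node u.

    Nodes are 0..n-1 and `edges` lists undirected pairs (u, v) of a
    simple graph (no loops, no parallel edges).
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    degrees = [0] * n
    for i, j, k in combinations(range(n), 3):
        # Nonzero tensor component: {i, j, k} induces a triangle.
        if j in adj[i] and k in adj[i] and k in adj[j]:
            for node in (i, j, k):
                degrees[node] += 1
    return degrees


def generating_function(degrees, x):
    """Evaluate sum_k p_k x^k for the empirical distribution of `degrees`."""
    return sum(x ** d for d in degrees) / len(degrees)


# Toy graph: two triangles sharing node 2, plus a pendant edge (4, 5).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (4, 5)]
deg = triangle_degrees(6, edges)
print(deg)                            # [1, 1, 2, 1, 1, 0]
print(generating_function(deg, 1.0))  # 1.0
```

The sum of the triangle-based higher-order degrees is three times the number of triangle instances, which is why the sampling procedures in the paper require that sum to be divisible by three.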
In the following, we consider higher-order structures with directed edges. Specifically, we consider the FFL (feed-forward loop), whose higher-order stubs are shown in figure 1(d). For the FFL, higher-order stubs are useful for identifying the roles of nodes. For example, in E. coli [13], if node $u$ has an α-stub, then the transcription factor $u$ regulates another transcription factor $v$ (node $v$ has a β-stub), and both $u$ and $v$ modulate the transcription rate of a target gene $w$ (node $w$ has a γ-stub). The α-degree of node $u$ is the number of α-stubs that $u$ has. Here the number of α-stubs of $u$ equals the number of FFL instances containing $u$ such that $u$ has out-degree 2 in each instance. We define the β-degree and the γ-degree of $u$ in the same way as the α-degree.
Then we define the adjacency tensor for the FFL $M_5$. A triplet $(i, j, k)$ of three distinct nodes is said to satisfy condition $\Delta$ if the induced subgraph $H$ with node set $\{i, j, k\}$ is isomorphic to $M_5$ such that $i$ has an α-stub, $j$ has a β-stub, and $k$ has a γ-stub in $H$. Then the components of the FFL-based adjacency tensor $A^{M_5}$ are defined as
$$A^{M_5}_{ijk} = \begin{cases} 1 & \text{if } (i, j, k) \text{ satisfies condition } \Delta, \\ 0 & \text{otherwise.} \end{cases} \tag{6}$$
Similar to adjacency matrices, we can obtain higher-order degrees of nodes from adjacency tensors. For example, the α-degree of node $i$ is
$$d_\alpha(i) = \sum_{j,k} A^{M_5}_{ijk}. \tag{7}$$
Finally, we introduce a generating function for the FFL. The generating function for the joint probability distribution of α-degrees, β-degrees, and γ-degrees is defined by
$$G^{(2)}_1(x, y, z) = \sum_{i,j,k} p^{(2)}_{ijk}\, x^i y^j z^k, \tag{8}$$
where $p^{(2)}_{ijk}$ is the probability that a randomly chosen node has α-degree $i$, β-degree $j$, and γ-degree $k$. Every FFL instance contributes exactly one α-stub, one β-stub, and one γ-stub, so the total numbers of the three types of stubs are equal. Thus we have
$$\left.\frac{\partial G^{(2)}_1}{\partial x}\right|_{x=y=z=1} = \left.\frac{\partial G^{(2)}_1}{\partial y}\right|_{x=y=z=1} = \left.\frac{\partial G^{(2)}_1}{\partial z}\right|_{x=y=z=1}, \quad \text{that is,} \quad z_\alpha = z_\beta = z_\gamma, \tag{9}$$
where $z_\alpha$, $z_\beta$, and $z_\gamma$ are the average α-degree, the average β-degree, and the average γ-degree respectively. Equation (9) is equivalent to the following condition:
$$\sum_{i,j,k} (i - j)\, p^{(2)}_{ijk} = \sum_{i,j,k} (j - k)\, p^{(2)}_{ijk} = 0. \tag{10}$$
Figure 3 presents the FFL-based higher-order degree distributions of the arXiv HEP-PH (High Energy Physics Phenomenology) citation network [33]. All three FFL-based higher-order degree distributions of the arXiv HEP-PH citation network approximately follow power laws.
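As a rough illustration of the α-, β-, and γ-degrees, the sketch below counts FFL instances in a small directed graph. It assumes that an FFL instance on $(i, j, k)$ consists of the arcs $i \to j$, $i \to k$, and $j \to k$ with no reverse arcs among the three nodes (so that the induced subgraph is exactly the FFL). The function name `ffl_degrees` and the toy arc list are hypothetical.

```python
from itertools import permutations


def ffl_degrees(n, arcs):
    """Return (alpha, beta, gamma) degree lists for a simple directed graph.

    An ordered triple (i, j, k) is counted when the induced subgraph on
    {i, j, k} has exactly the arcs i->j, i->k, j->k.
    """
    arcset = set(arcs)
    alpha, beta, gamma = [0] * n, [0] * n, [0] * n
    for i, j, k in permutations(range(n), 3):
        present = (i, j) in arcset and (i, k) in arcset and (j, k) in arcset
        reverse = (j, i) in arcset or (k, i) in arcset or (k, j) in arcset
        if present and not reverse:
            alpha[i] += 1   # i has out-degree 2 in the instance
            beta[j] += 1    # j has in-degree 1 and out-degree 1
            gamma[k] += 1   # k has in-degree 2
    return alpha, beta, gamma


# Toy graph with two FFL instances: (0, 1, 2) and (0, 2, 3).
arcs = [(0, 1), (0, 2), (1, 2), (0, 3), (2, 3)]
alpha, beta, gamma = ffl_degrees(4, arcs)
print(alpha, beta, gamma)  # [2, 0, 0, 0] [0, 1, 1, 0] [0, 0, 1, 1]
```

Because the three roles in an FFL are distinct, each instance matches exactly one ordered triple, and the three degree sums coincide.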
In comparison to hypergraphs and motif adjacency matrices, higher-order stubs and adjacency tensors provide more details about relationships based on higher-order structures.In particular, hyperedges or motif adjacency matrices cannot provide information about types of higher-order stubs.Similar to the directions of edges, the types of higher-order stubs are useful in the study of complex networks.

Higher-order random networks with undirected edges
We develop higher-order random networks with arbitrary higher-order degree distributions through the triangle. It is straightforward to develop other higher-order random network models based on the presented model. The process of generating a network through triangle-based higher-order stubs is as follows. We generate $n$ random numbers according to the triangle-based higher-order degree distribution $p^{(2)}_k$. The only requirement is that the sum of the numbers be divisible by three. If the sum is not divisible by three, we discard the $n$ numbers and generate a new random set of $n$ numbers until the condition is satisfied. After the $n$ numbers are available, we repeatedly choose three triangle-based higher-order stubs at random and join them in the network until all higher-order stubs are chosen. Algorithm 1 presents pseudocode to generate a network $G$ whose triangle-based higher-order degree distribution approximately follows $F$. In the algorithm, a random number $X$ is sampled from distribution $F$. There are many methods [34] to draw a sample from a given distribution. For example, the rejection sampling algorithm can return a random number $X$ with the given distribution. Further, the condition $1 \leqslant \lfloor X \rfloor \leqslant (n-1)(n-2)/2$ requires the triangle-based higher-order degrees to be larger than zero and not too large.
We allow loops and parallel edges, and thus the triangle-based higher-order degree distribution of the generated networks may not follow $p^{(2)}_k$. To resolve this problem, the triangle-based higher-order degree $d^{(2)}(i)$ of a node $i$ incident to loops and (or) parallel edges can be defined as follows. Without loss of generality, the node set is represented as $\{1, \ldots, n\}$. If node $i$ is incident to two parallel edges and a loop, we add two to $d^{(2)}(i)$; if $i$ is incident to two parallel edges and adjacent to a different node $j$ with a loop, we add one to $d^{(2)}(i)$; if $i$ is incident to three loops, we add three to $d^{(2)}(i)$. If endpoints $i$ and $j$ ($i < j$) of two parallel edges have loops, we add two to $d^{(2)}(i)$ and add one to $d^{(2)}(j)$. When computing $d^{(2)}(i)$, we first consider loops and remove all the relevant loops, and then we consider parallel edges (along with their corresponding loops). The above method uniquely determines the triangle-based higher-order degrees, and the modified triangle-based higher-order degrees of any network drawn from the proposed higher-order random network model are identical to the predetermined $n$ random numbers. Figure 4 presents an example. In some cases, the networks drawn from the proposed higher-order random network model have few loops and parallel edges. For example, through numerical simulation, we find that the triangle-based higher-order degree distributions of the simple graphs drawn from the model with $p^{(2)}_k \sim k^{-3}$ (we remove loops and keep only one copy of parallel edges of the original networks) approximately follow the same distribution.
Networks drawn from the developed higher-order random network models have crucial structural properties present in real-world networks. We observe that the degree distributions of the real-world networks in figure 2 are right skewed. Similarly, figure 5(a) indicates that the degree distributions of the proposed higher-order random networks with power-law triangle-based higher-order degree distributions are also right skewed. In contrast, the triangle-based higher-order degrees of random graphs with power-law degree distributions are sparsely distributed (see figure 6(b)). It is not difficult to explain this phenomenon, since random graphs with power-law degree distributions generally have a tree-like structure, and hence contain only a few triangle instances. We also observe that the real-world networks in figure 2 have large average clustering coefficients (see table 1). In addition, table 1 shows that the average clustering coefficients of four real-world networks with power-law degree distributions are between 0.6 and 0.7. Figure 7 indicates that the average clustering coefficients of the networks generated by higher-order random network models with power-law triangle-based higher-order degree distributions are larger than 0.57. However, random graphs with power-law degree distributions have small average clustering coefficients. In particular, figure 7 shows that the average clustering coefficients of random graphs with power-law degree distributions are close to zero when the exponent is greater than 2.4.
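The triangle-based sampling procedure described in this subsection can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' code. The helper `power_law_sampler` (a truncated discrete power law) stands in for the distribution $F$ and is hypothetical, as are the function name `sample_triangle_network` and the 0-based node labels. Shuffling the stub list and reading it off in consecutive triples is used here as an equivalent way of repeatedly drawing three stubs at random without replacement. The result is a multigraph that may contain loops and parallel edges.

```python
import random


def power_law_sampler(tau, k_max):
    """Hypothetical helper: draws k in 1..k_max with weight k**(-tau)."""
    ks = list(range(1, k_max + 1))
    weights = [k ** (-tau) for k in ks]
    return lambda rng: rng.choices(ks, weights=weights)[0]


def sample_triangle_network(n, sampler, rng):
    """Return (degrees, edges) of a multigraph built from triangle stubs."""
    max_deg = (n - 1) * (n - 2) // 2
    while True:
        degrees = []
        while len(degrees) < n:
            x = int(sampler(rng))  # floor of a positive sample
            if 1 <= x <= max_deg:
                degrees.append(x)
        if sum(degrees) % 3 == 0:  # otherwise discard all n numbers
            break

    # One list entry per triangle-based higher-order stub.
    stubs = [u for u, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    edges = []
    for t in range(0, len(stubs), 3):
        h1, h2, h3 = stubs[t:t + 3]
        edges += [(h1, h2), (h1, h3), (h2, h3)]  # place one triangle
    return degrees, edges


rng = random.Random(0)
degrees, edges = sample_triangle_network(30, power_law_sampler(3, 50), rng)
incidence = [0] * 30
for u, v in edges:
    incidence[u] += 1
    incidence[v] += 1
print(sum(degrees) % 3, incidence == [2 * d for d in degrees])  # 0 True
```

In the multigraph, every triangle-based higher-order stub of a node contributes two edge endpoints, so the number of edge endpoints at node $u$ is $2 d^{(2)}(u)$ when a loop is counted twice.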
At the end of this subsection, we introduce another higher-order random network model with arbitrary higher-order degree distributions, which is of theoretical interest. Given a network, some edges may not participate in any triangle instance. Therefore, we define the 'residual network' $H$ of $G$ as the network obtained by removing all triangle instances from $G$. During the deletion of a triangle instance, only the edges of the triangle instance are deleted. Then the node sets of $H$ and $G$ are the same. We define the residual degree of a node in $G$ as the degree of the node in $H$. Next, we define the generating function $G^{(2)}_2(x, y)$ for the joint probability distribution of higher-order degrees and residual degrees as
$$G^{(2)}_2(x, y) = \sum_{i,j} q_{ij}\, x^i y^j, \tag{11}$$
where $q_{ij}$ is the probability that a randomly chosen node from $G$ has triangle-based higher-order degree $i$ and residual degree $j$.
Figure 4. Example of triangle-based higher-order degrees. The triangle-based higher-order degrees are $d^{(2)}(u) = 2$, $d^{(2)}(v) = 1$, and $d^{(2)}(w) = 3$.
Figure 5. Distributions of higher-order random networks with power-law triangle-based higher-order degree distributions. Networks with $10^5$ nodes are generated from the higher-order random network model whose triangle-based higher-order degree distribution $p^{(2)}_k$ satisfies $p^{(2)}_k \sim k^{-3}$, where $k$ represents a triangle-based higher-order degree. The solid line is the function $f(x) = x^{-3}$. (a) Degree distribution of the network. (b) Triangle-based higher-order degree distribution of the network.
Figure 7. Average clustering coefficient as a function of the exponent $\tau$ for the power-law distribution $k^{-\tau}$. Each average clustering coefficient is derived from the following approximation algorithm that runs $10^3$ times on the network with $10^5$ nodes generated from a random network model. The algorithm randomly chooses a node $u$, randomly chooses two neighbors of $u$, and finally checks whether the chosen neighbors are connected. Then the approximate average clustering coefficient is the fraction of triangles found over the $10^3$ runs. The circles correspond to the networks generated by the higher-order random network model with power-law triangle-based higher-order degree distributions. The diamonds correspond to the networks generated by the random network models with power-law degree distributions.
The procedure to generate a network from this triangle-based higher-order random network model is as follows. Initially, there are $n$ isolated nodes. Then we generate a sequence of $n$ pairs $(d^{(2)}(u), r(u))$ representing triangle-based higher-order degrees and residual degrees of nodes according to the distribution $q_{ij}$. We require that the sum $\sum_u d^{(2)}(u)$ be divisible by three and the sum $\sum_u r(u)$ be even. If these conditions are not satisfied, we repeat the following steps until they are: randomly choose a node $u$; delete the pair $(d^{(2)}(u), r(u))$; generate a new pair for $u$ from the distribution $q_{ij}$. Then we repeatedly choose three triangle-based higher-order stubs at random and place a triangle instance on the network by joining them, until all triangle-based higher-order stubs are chosen. Next, we repeatedly choose two stubs at random and place an edge on the network by joining them, until all stubs are chosen. Notice that the sum of degrees is $\sum_u (2 d^{(2)}(u) + r(u))$, which is automatically even.
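The generation procedure with residual degrees can be sketched in the same style. The joint distribution $q_{ij}$ is replaced here by a hypothetical `pair_sampler` that draws the two components independently and uniformly, purely for illustration; the function name `sample_with_residual` is likewise an assumption. Single pairs are resampled until both divisibility conditions hold, as in the text.

```python
import random


def sample_with_residual(n, pair_sampler, rng):
    """Return (pairs, edges): pairs[u] = (triangle degree, residual degree)."""
    pairs = [pair_sampler(rng) for _ in range(n)]
    while (sum(d for d, r in pairs) % 3 != 0
           or sum(r for d, r in pairs) % 2 != 0):
        u = rng.randrange(n)          # randomly choose a node u
        pairs[u] = pair_sampler(rng)  # replace its pair

    tri_stubs = [u for u, (d, r) in enumerate(pairs) for _ in range(d)]
    edge_stubs = [u for u, (d, r) in enumerate(pairs) for _ in range(r)]
    rng.shuffle(tri_stubs)
    rng.shuffle(edge_stubs)

    edges = []
    for t in range(0, len(tri_stubs), 3):   # place triangle instances
        a, b, c = tri_stubs[t:t + 3]
        edges += [(a, b), (a, c), (b, c)]
    for t in range(0, len(edge_stubs), 2):  # place residual edges
        edges.append((edge_stubs[t], edge_stubs[t + 1]))
    return pairs, edges


def pair_sampler(rng):
    """Hypothetical stand-in for q_ij: independent uniform components."""
    return (rng.randint(0, 2), rng.randint(0, 3))


pairs, edges = sample_with_residual(20, pair_sampler, random.Random(1))
incidence = [0] * 20
for u, v in edges:
    incidence[u] += 1
    incidence[v] += 1
print(incidence == [2 * d + r for d, r in pairs])  # True
```

The final check reflects the remark that the degree sum equals $\sum_u (2 d^{(2)}(u) + r(u))$ when loops are counted twice.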

Higher-order random networks with directed edges
In this subsection, we develop higher-order random network models through the FFL. Specifically, we develop higher-order random network models with arbitrary FFL-based higher-order degree distributions. Initially, we randomly generate a set of $n$ triplets $(i_u, j_u, k_u)$ according to the joint probability distribution $p^{(2)}_{ijk}$. Here triplet $(i_u, j_u, k_u)$ indicates that the numbers of α-stubs, β-stubs, and γ-stubs of node $u$ are $i_u$, $j_u$, and $k_u$ respectively. Then we compute the sums $\sum_u i_u$, $\sum_u j_u$, and $\sum_u k_u$. These three sums should be equal, because every FFL instance uses exactly one stub of each type. Otherwise, we repeat the following steps until the three sums are equal: randomly choose a node $u$ and discard the triplet $(i_u, j_u, k_u)$ for $u$; then generate a new triplet according to the distribution $p^{(2)}_{ijk}$. After the $n$ triplets are available, we repeatedly choose an α-stub, a β-stub, and a γ-stub at random and place an FFL instance by connecting the three higher-order stubs, until all higher-order stubs are chosen. Similar to the proposed model for undirected networks, the FFL-based higher-order degree distribution of the directed networks drawn from the model may not follow $p^{(2)}_{ijk}$. To resolve the problem, we can revise the definition of the FFL-based higher-order degree. Figure 8 presents an example. In some cases, the networks drawn from the proposed higher-order random network model have few loops. Then the FFL-based higher-order degree distributions of the simple graph corresponding to the original sampled network approximately follow $p^{(2)}_{ijk}$. Algorithm 2 presents pseudocode to generate a directed network whose FFL-based higher-order degree distribution approximately follows $F$.
Algorithm 2. Sampling a higher-order random network with FFL-based higher-order degrees.
Input: Number of nodes n, distribution F
    Randomly choose an α-stub s1, a β-stub s2, and a γ-stub s3 from S4, S5, and S6 respectively, then remove them from S4, S5, and S6
    Join the α-stub s1, the β-stub s2, and the γ-stub s3   ▷ edges of G are created
Similar to the arXiv HEP-PH citation network, the in-degree and out-degree distributions of the networks generated by the proposed higher-order random network model with power-law FFL-based higher-order degree distributions also approximately follow power laws (see figures 9(d) and (e)). In contrast, for random networks with power-law in-degree and out-degree distributions, the FFL-based higher-order degrees are sparsely distributed (see figures 10(a)-(c)).
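The FFL-based sampling procedure of this subsection can be sketched as below. The sketch follows the prose description, not the full listing of Algorithm 2. The `triplet_sampler` (independent uniform components) is a hypothetical stand-in for the joint distribution, and the function name and 0-based labels are assumptions. Each placed FFL instance adds the arcs α→β, α→γ, and β→γ, and the output may contain loops and parallel arcs.

```python
import random


def sample_ffl_network(n, triplet_sampler, rng):
    """Return (triplets, arcs) of a directed multigraph built from FFL stubs."""
    triplets = [triplet_sampler(rng) for _ in range(n)]

    def balanced(ts):
        totals = [sum(t[c] for t in ts) for c in range(3)]
        return totals[0] == totals[1] == totals[2]

    while not balanced(triplets):
        triplets[rng.randrange(n)] = triplet_sampler(rng)  # resample one node

    # One list per stub type, one entry per stub.
    stub_lists = []
    for c in range(3):
        stubs = [u for u, t in enumerate(triplets) for _ in range(t[c])]
        rng.shuffle(stubs)
        stub_lists.append(stubs)

    arcs = []
    for a, b, g in zip(*stub_lists):  # one alpha-, beta-, gamma-stub each
        arcs += [(a, b), (a, g), (b, g)]
    return triplets, arcs


def triplet_sampler(rng):
    """Hypothetical stand-in for the joint distribution of stub counts."""
    return tuple(rng.randint(0, 2) for _ in range(3))


triplets, arcs = sample_ffl_network(10, triplet_sampler, random.Random(2))
out_deg, in_deg = [0] * 10, [0] * 10
for u, v in arcs:
    out_deg[u] += 1
    in_deg[v] += 1
print(out_deg == [2 * a + b for a, b, g in triplets])  # True
print(in_deg == [b + 2 * g for a, b, g in triplets])   # True
```

An α-stub contributes two outgoing arcs, a β-stub one incoming and one outgoing arc, and a γ-stub two incoming arcs, which is what the two checks confirm.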
In real-world complex networks, there are edges that are not contained in any FFL instance. For example, many regulatory interactions in E. coli are not contained in any FFL instance. We therefore define the 'residual network' $H$ of a directed network $D$ as the network obtained by removing all FFL instances from $D$. During the deletion of an FFL instance, only edges are deleted. Thus the node sets of $H$ and $D$ are the same. We define the residual in-degree or out-degree of a node in $D$ as the in-degree or out-degree of the node in $H$. Then we define the generating function $G^{(2)}_3(v, w, x, y, z)$ for the joint probability distribution of FFL-based higher-order degrees and residual degrees as
$$G^{(2)}_3(v, w, x, y, z) = \sum_{i,j,k,l,m} q_{ijklm}\, v^i w^j x^k y^l z^m, \tag{12}$$
where $q_{ijklm}$ is the probability that a randomly chosen node from $D$ has α-degree $i$, β-degree $j$, γ-degree $k$, residual in-degree $l$, and residual out-degree $m$. In fact, the residual network of a directed network can be regarded as the network that results from a targeted attack in which only the edges of the FFL instances are deleted. For instance, if the regulatory interactions of all the FFL instances are malfunctioning in E. coli, then the resulting network is the residual network of E. coli.
Now we propose another FFL-based higher-order random network model of theoretical interest. Initially, there are $n$ isolated nodes. Next, we generate $n$ tuples $(d_\alpha(u), d_\beta(u), d_\gamma(u), r_{\mathrm{in}}(u), r_{\mathrm{out}}(u))$, one for each node $u$, according to the joint distribution $q_{ijklm}$. Then we compute the sums $\sum_u d_\alpha(u)$, $\sum_u d_\beta(u)$, $\sum_u d_\gamma(u)$, $\sum_u r_{\mathrm{in}}(u)$, and $\sum_u r_{\mathrm{out}}(u)$. We require that $\sum_u d_\alpha(u) = \sum_u d_\beta(u) = \sum_u d_\gamma(u)$ and $\sum_u r_{\mathrm{in}}(u) = \sum_u r_{\mathrm{out}}(u)$. If these conditions are not satisfied, we repeat the following steps until they are: randomly choose a node $u$; discard the tuple $(d_\alpha(u), d_\beta(u), d_\gamma(u), r_{\mathrm{in}}(u), r_{\mathrm{out}}(u))$; generate a new tuple for $u$ from the joint distribution. We then place FFL instances by repeatedly choosing an α-stub, a β-stub, and a γ-stub at random until all higher-order stubs are chosen. Next, we place directed edges by repeatedly choosing an in-stub and an out-stub at random until all stubs are chosen.
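The residual network of a directed network, as defined in this subsection, can be computed for small examples with the sketch below. It assumes the same reading of an FFL instance as an induced subgraph with exactly the arcs $i \to j$, $i \to k$, and $j \to k$. The function name `residual_degrees` and the toy arc list are hypothetical, and the brute-force search over ordered triples is only meant for illustration.

```python
from itertools import permutations


def residual_degrees(n, arcs):
    """Delete the arcs of all FFL instances; return (residual arcs, r_in, r_out)."""
    arcset = set(arcs)
    in_ffl = set()
    for i, j, k in permutations(range(n), 3):
        ffl = {(i, j), (i, k), (j, k)}
        reverse = {(j, i), (k, i), (k, j)}
        if ffl <= arcset and not (reverse & arcset):
            in_ffl |= ffl  # only arcs are deleted; nodes are kept

    residual = arcset - in_ffl
    r_in, r_out = [0] * n, [0] * n
    for u, v in residual:
        r_out[u] += 1
        r_in[v] += 1
    return sorted(residual), r_in, r_out


# Toy graph: one FFL instance on (0, 1, 2) plus the arcs 2->3 and 3->0.
arcs = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]
residual, r_in, r_out = residual_degrees(4, arcs)
print(residual)  # [(2, 3), (3, 0)]
print(r_in)      # [1, 0, 0, 1]
print(r_out)     # [0, 0, 1, 1]
```

The node set is unchanged, and the residual in-degree and out-degree sums agree because every remaining arc has one tail and one head.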

Conclusions
We have introduced the framework for developing higher-order complex network models that exhibit crucial structural properties of real-world complex systems. Then we have proposed higher-order random network models with arbitrary higher-order degree distributions. In the supplementary material, we have introduced additional higher-order network models. In conclusion, we believe that other edge-based notions can also be generalized to the corresponding notions based on higher-order structures by our framework. For example, k-cores [35] can be generalized to higher-order k-cores using the proposed higher-order stubs. It is also imperative to explore dynamical processes by considering higher-order interactions. A notable study in this area was conducted by Shang [36], who studied consensus formation over directed hypergraphs. Specifically, Shang innovatively introduced a consensus formation framework using Petri nets. Inspired by the work of Shang, we will investigate social dynamics over the proposed higher-order complex network models in the future.

Figure 2. Triangle-based higher-order degree distributions of real-world networks. (a) Amazon product co-purchasing network. (b) arXiv COND-MAT collaboration network. (c) Internet.

Figure 3. FFL-based higher-order degree distributions of the arXiv HEP-PH citation network.

Algorithm 1. Sampling a higher-order random network with triangle-based higher-order degrees.
Input: Number of nodes n, distribution F
Output: Network G
V ← {1, . . . , n}
totalDeg ← 0
Let S1 be an empty sequence
do {
    Set S1 to be an empty sequence
    totalDeg ← 0
    while |S1| < n
        Sample a random number X with distribution function F
        if 1 ⩽ ⌊X⌋ ⩽ (n−1)(n−2)/2
            Add the integer ⌊X⌋ into S1
            totalDeg ← totalDeg + ⌊X⌋
} while (totalDeg % 3 ≠ 0)
Let S2 be an empty sequence
for i in V
    Add S1[i] triangle-based higher-order stubs of node i into S2
while (|S2| > 0)
    Randomly choose three elements h1, h2, h3 from S2 and remove them from S2
    Join one stub in h1 to one stub in h2, one stub in h1 to one stub in h3, and one stub in h2 to one stub in h3   ▷ edges of G are created

Figure 6. Distributions of random networks with power-law degree distributions. Networks with $10^5$ nodes are generated from the random network model whose degree distribution $p(k)$ satisfies $p(k) \sim k^{-3}$, where $k$ represents a degree. The solid line is the function $f(x) = x^{-3}$. (a) Degree distribution. (b) Triangle-based higher-order degree distribution.

Figure 9. Distributions of higher-order random networks with power-law FFL-based higher-order degree distributions. Directed networks with $10^5$ nodes are generated from the proposed higher-order random network model whose joint distribution $p^{(2)}_{ijk}$ is a product of three independent power-law distributions such that each distribution is $k^{-3}$, where $k$ represents an α-degree, a β-degree, or a γ-degree. The solid lines correspond to the function $f(x) = x^{-3}$. (a) α-degree distribution. (b) β-degree distribution. (c) γ-degree distribution. (d) In-degree distribution. (e) Out-degree distribution.

Figure 10. Distributions of directed random networks with power-law degree distributions. Directed networks with $10^5$ nodes are generated from the random network model whose joint distribution of in-degrees and out-degrees is a product of two independent power-law distributions such that each distribution is $k^{-3}$, where $k$ represents either an in-degree or an out-degree. (a) α-degree distribution. (b) β-degree distribution. (c) γ-degree distribution. (d) In-degree distribution. (e) Out-degree distribution.

Table 1. Average clustering coefficients of real-world networks.
