Message-Passing on Hypergraphs: Detectability, Phase Transitions and Higher-Order Information

Hypergraphs are widely adopted tools to examine systems with higher-order interactions. Despite recent advancements in methods for community detection in these systems, we still lack a theoretical analysis of their detectability limits. Here, we derive closed-form bounds for community detection in hypergraphs. Using a Message-Passing formulation, we demonstrate that detectability depends on hypergraphs' structural properties, such as the distribution of hyperedge sizes or their assortativity. Our formulation enables a characterization of the entropy of a hypergraph in relation to that of its clique expansion, showing that community detection is enhanced when hyperedges highly overlap on pairs of nodes. We develop an efficient Message-Passing algorithm to learn communities and model parameters on large systems. Additionally, we devise an exact sampling routine to generate synthetic data from our probabilistic model. With these methods, we numerically investigate the boundaries of community detection in synthetic datasets, and extract communities from real systems. Our results extend the understanding of the limits of community detection in hypergraphs and introduce flexible mathematical tools to study systems with higher-order interactions.


Introduction
Modeling complex systems as graphs has broadened our understanding of the macroscopic features that emerge from the interaction of individual units. Among the various aspects of this problem, community detection stands out as a fundamental task, as it provides a coarse-grained description of a network's structural organization. Notably, community structure is observed across different systems, such as food webs [1], spatial migration and gene flow of animal species [2], as well as in social networks [3], power grids [4], and others [5].
In the case of networks with only pairwise interactions, there are solid theoretical results on detectability limits, describing whether the task of community detection can or cannot succeed [6][7][8][9][10][11]. However, many complex systems with interactions that extend beyond pairs are better modeled by hypergraphs [12], which generalize the simpler case of dyadic graphs. Phenomena that have been investigated on graphs are now readily explored on hypergraphs, with examples including diffusion processes, synchronization, phase transitions [13] and, more recently, community structure [14][15][16][17][18].
Extending the rigorous results on detectability transitions for networks to higher-order interactions is a relevant open question.
One of the main obstacles in modeling hypergraphs is their intrinsic complexity, which poses both theoretical and computational challenges and restricts the range of results available in the literature. The difficulty of defining communities in hypergraphs and of deriving theoretical thresholds for their recovery has limited investigations to the study of d-uniform hypergraphs, i.e., hypergraphs that only contain interactions among exactly d nodes [19][20][21][22][23][24][25][26][27].
A related line of literature focuses on the detection of planted sub-hypergraphs [28,29] and on testing for the presence of community structure in hypergraphs [30,31]. Generally, extracting recovery results on non-uniform hypergraphs has proved demanding, with scarce literature on the subject.
Recently, Chodrow et al. [32] conjectured a recoverability threshold for their spectral clustering algorithm on non-uniform hypergraphs. Closer to the scope of our work, Dumitriu and Wang [18] provide a probabilistic model and bounds for the theoretical recovery of communities under the same model. However, such detectability bounds are based on algorithms which are not feasible in practice, and no empirical demonstration of the predicted recovery is provided. Furthermore, all these methods lack a variety of desirable probabilistic features, such as the estimation of marginal probabilities of a node to belong to a community, a principled procedure to sample synthetic hypergraphs with prescribed community structure, and the possibility to investigate the energy landscape of a problem via free energy estimations.
In this work, we address these issues by deriving a precise detectability threshold for hypergraphs that depends on the node degree distribution, the assortativity of the hyperedges, and, crucially, on higher-order properties such as the distribution of hyperedge sizes. Additionally, we show how these properties can be formally described via notions of entropy and information, leading to a clear interpretation of the role of higher-order interactions in detectability.
Our approach is based on a probabilistic generative model and a related Bayesian inference procedure, which we utilize to study the limits of the community detection problem using a Message-Passing (MP) formulation [33][34][35], originating from the cavity method in statistical physics [36,37]. We focus on an extension to hypergraphs of the stochastic block model (SBM) [38,39], a generative model for networks with community structure. Several variants of the SBM [15], and of its mixed-membership version [16,17], have been extended to hypergraphs. The model we utilize is an extension of the dyadic SBM to hypergraphs and allows generalizing the seminal detectability results of Decelle et al. [6,7] to higher-order interactions.
In addition to our theoretical contributions, we derive an algorithmic implementation for inferring both communities and parameters of the model from the data. Our implementation scales well to both large hypergraphs and large hyperedges, owing to a dynamic-programming formulation.
Finally, we show how, with additional combinatorial arguments, one can efficiently sample hypergraphs with arbitrary communities from our probabilistic model. This problem, often studied in conjunction with inference, deserves its own attention when dealing with hypergraphs, as recently discussed in related work [40,41].
Through numerical experiments, we confirm our theoretical calculations by showing that our algorithm accurately recovers the true community structure in synthetic hypergraphs all the way down to the predicted detectability threshold. We also illustrate that our approach gives insights into the community organization of real hypergraphs by analyzing a dataset of group interactions between students in a school. To facilitate reproducibility, we release the code that implements our inference and sampling procedures as open source [42].

The hypergraph stochastic block model
Consider a hypergraph H = (V, E), where V = {1, ..., N} is the set of nodes and E the set of hyperedges. A hyperedge e is a set of two or more nodes. We define Ω = {e : 2 ≤ |e| ≤ D}, the set of all possible hyperedges up to some maximum dimension D ≤ N, with |e| being the size of a hyperedge, i.e., the number of nodes it contains. Notice that E ⊆ Ω. We denote with A_e = 1 the hyperedges e ∈ E and with A_e = 0 the hyperedges e ∈ Ω \ E.
Our Hypergraph Stochastic Block Model (HySBM) is an extension of the classical SBM for graphs [38,39]. It partitions nodes into K communities by assigning a hard membership t_i ∈ [K] ≡ {1, ..., K} to each node i ∈ V, with t = {t_i}_{i∈V} being the membership vector. It does so probabilistically, assuming that the likelihood to observe a hyperedge A_e is a Bernoulli distribution with a parameter that depends on the memberships {t_i}_{i∈e} of its nodes. Formally, the probabilistic model is summarized in Equations (1)-(2), where n = (n_1, ..., n_K) is a vector of prior categorical probabilities for the hard assignments t_i. The Bernoulli probabilities are given in Equation (3), with 0 ≤ p_ab ≤ 1 being elements of a symmetric probability matrix (also referred to as affinity matrix) and κ_|e| a normalizing constant that only depends on the hyperedge size |e|. This constant can take on any value, provided that it yields sparse hypergraphs where π_e/κ_|e| = O(1/N) and valid probabilities π_e/κ_|e|. We develop our theory for a general form of κ_|e| and elaborate more on its choice in Appendix A. In our experiments we utilize the value adopted in [17,41]. Our specific formulation of the likelihood is only one among many alternatives to model communities in hypergraphs. The likelihood we propose has three main properties. First, HySBM reduces to the standard SBM when only pairs are present (as κ_2 = 1). Since we aim to develop a model that generalizes the SBM to hypergraphs, this is an important condition to satisfy. Second, it enables developing the MP equations presented in the following section, which in turn lead to a theoretical characterization of the detectability limits and a computationally efficient algorithmic implementation. Third, likelihoods based on expressions similar to Equation (3) have been shown to describe well higher-order interactions that possibly contain nodes from different communities [41].
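As a concrete illustration of the model above, the following sketch computes the Bernoulli parameter of a hyperedge, assuming (as suggested by the count-based formula used later for sampling) that π_e is the sum of pairwise affinities p_{t_i t_j} over the pairs inside the hyperedge, normalized by κ_|e|. All names here are illustrative, not the authors' reference implementation.

```python
import numpy as np
from itertools import combinations

def hyperedge_probability(e, t, p, kappa):
    """Bernoulli parameter pi_e / kappa_|e| for a hyperedge e.

    e: tuple of node ids; t: array of community labels;
    p: symmetric K x K affinity matrix; kappa: dict size -> normalizer.
    Assumes pi_e = sum of pairwise affinities within e (a sketch of Eq. (3)).
    """
    pi_e = sum(p[t[i], t[j]] for i, j in combinations(e, 2))
    return pi_e / kappa[len(e)]

# toy example: two assortative communities
p = np.array([[0.9, 0.1], [0.1, 0.9]])
t = np.array([0, 0, 1, 1])
kappa = {2: 1.0, 3: 3.0}                       # kappa_2 = 1 recovers the SBM
prob_pair = hyperedge_probability((0, 1), t, p, kappa)     # within-community pair
prob_triple = hyperedge_probability((0, 1, 2), t, p, kappa)  # mixed triple
```

With κ_2 = 1 the dyadic case reduces to the standard SBM probability p_{t_i t_j}, as required by the first property above.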
For convenience, we work with a rescaled affinity matrix c = Np, which is of order O(1) in the sparse regime. This yields the log-likelihood in Equation (4), where const. denotes quantities that do not depend on the parameters of the model.

Induced factor graph representation
The probabilistic model in Equations (1)-(2) has a negative log-likelihood that can be interpreted as the Hamiltonian of a Gibbs-Boltzmann distribution on the community assignments t, as in Equation (5), where Z is the partition function of the system, which corresponds to the marginal likelihood of the data. The quantity F = − log Z is also called the free energy. The equivalence in Equation (5) allows interpreting the probabilistic model in terms of factor graphs [34]. Here, the function nodes are hyperedges f ∈ Ω, and variable nodes are elements of V. The interactions between function and variable nodes can be read directly from the log-likelihood in Equation (4). In other words, the probabilistic model induces a factor graph F = (V, F, E) with variable nodes V = V, function nodes F = Ω and edges E = {(i, f) ∈ V × F : i ∈ f}. In Figure 1 we show a graphical representation of the equivalence between hypergraphs and factor graphs. For any variable node i and function node f of the factor graph we define the neighbors, or boundaries, as ∂i = {f ∈ F : (i, f) ∈ E}, being all function nodes adjacent to i, and ∂f = {i ∈ V : (i, f) ∈ E}, being all variable nodes adjacent to f.

Message-Passing (MP)
Given the factor graph representation of HySBM, we can perform Bayesian inference of the community assignments via message-passing. Originally obtained from the cavity method on spin glasses [36,37], MP allows estimating marginal distributions on the variable nodes of a graphical model by iteratively updating messages, auxiliary variables that operate on the edges of the factor graph. The efficiency of MP comes from the fact that the structure of the factor graph favors locally distributed updates. Although exact theoretical results are only proven on trees, MP has been shown to obtain strong performance also on locally tree-like graphs [34] and it has been extended to dense graphs with short loops [43,44].

Figure 1: Representing hypergraphs as factor graphs. (a) We depict a hypergraph and its factor graph equivalent. In the factor graph F, function nodes represent hyperedges. Notice that, while the node sets are the same in both representations, due to the presence of all possible hyperedges in the log-likelihood in Equation (4), the factor graph does not only contain the observed interactions E (black), but also the unobserved ones Ω \ E (gray). (b) In factor graphs, there are two types of messages: variable-to-function node q_{i→e} (red), and function-to-variable node q_{e→i} (blue).
Applying MP to our model, the inference procedure yields expressions for the marginal probabilities q_i(a) of a node i to be assigned to any given community a ∈ [K]. Their values are obtained as solutions to closed-form fixed-point equations, which involve messages q_{i→e}(t_i) from variable to function nodes, and q_{e→i}(t_i) from function to variable nodes. The messages follow the sum-product updates and yield the marginal distributions in Equations (6)-(8). Notice that, compared to those for graphs, the MP equations for hypergraphs in Equations (6)-(8) present additional challenges. First, in graphs the updates simplify. One can in fact collapse the two types of messages (and equations) into a unique one, since paths (i, f, j) in the factor graph reduce to pairwise interactions (i, j) between nodes. This simplification is not possible in hypergraphs, as one function node may connect more than two variable nodes. Second, the dimensionality of the MP equations grows faster when accounting for higher-order interactions. Here, the number of function nodes is equal to Σ_{d=2}^{D} (N choose d), which is O(2^N) at large D = N. In contrast, one gets O(N^2) pairwise messages in the updates for graphs. To produce computationally feasible MP updates one can assume sparsity, as already done in the dyadic case. We outline such updates in the following theorem.
Theorem 1. Assuming sparse hypergraphs where c = O(1), the MP updates satisfy the following fixed-point equations to leading order in N. For all hyperedges e ∈ E and nodes i ∈ e, the messages and marginals are given by Equations (9)-(12); in particular,

q_{e→i}(t_i) ∝ Σ_{t_j : j ∈ ∂e\i} π_e Π_{j ∈ ∂e\i} q_{j→e}(t_j).   (10)

The updates in Equations (9)-(12) are in principle computationally feasible, as products over function nodes f ∈ E have replaced products over the entire space f ∈ Ω. In sparse graphs, which we observe in many real datasets, E is much smaller than the original Ω, thus significantly decreasing the computational cost. An intuitive justification of Theorem 1, which we formalize in its proof, is that the observed interactions f ∈ E hold most of the weight in the updates of their neighbors, while the unobserved ones f ∈ Ω \ E send approximately constant messages and thus can be absorbed in the external field h introduced in Equation (12). This idea is inspired by the dyadic MP equations in Decelle et al. [6]. However, in contrast to MP on graphs, a vanilla implementation of the updates is still not scalable in hypergraphs, as the computational cost of Equation (10) is O(K^{|e|−1}). To tackle this issue, we develop a dynamic programming approach that reduces the complexity to O(K^2 |e|). Dynamic programming is exact, as it does not rely on further approximations of the MP updates; its detailed derivations are provided in Appendix D.1.
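To make the O(K^2 |e|) claim concrete, here is a minimal sketch of one way the sum in Equation (10) can be computed efficiently, assuming π_e is a sum of pairwise affinities (the form used elsewhere in this paper): linearity lets the expectation over the product measure of incoming messages split into a part involving pairs that exclude i and a part linear in t_i. This is an illustrative computation, not the authors' Appendix D.1 derivation.

```python
import numpy as np

def message_e_to_i(q_others, p):
    """Function-to-variable message for one hyperedge in O(K^2 |e|).

    q_others: (|e|-1, K) incoming messages q_{j->e} (rows sum to 1);
    p: symmetric K x K affinity matrix. Returns the length-K message for i.
    Assumes pi_e = sum of pairwise affinities inside e (a sketch, see text).
    """
    m = q_others.sum(axis=0)                     # sum of incoming marginals
    quad = m @ p @ m                             # includes spurious self-pairs
    self_terms = np.einsum('jk,kl,jl->', q_others, p, q_others)
    pairs_without_i = 0.5 * (quad - self_terms)  # E[sum_{j<k, j,k != i} p_{t_j t_k}]
    msg = pairs_without_i + p @ m                # add the pairs (i, j), linear in t_i
    return msg / msg.sum()
```

The cost is dominated by the matrix-vector products, i.e., O(K^2) per hyperedge after the O(K |e|) accumulation of m, instead of enumerating all K^{|e|−1} assignments.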
The fixed-point equations of Theorem 1 naturally suggest an algorithmic implementation of the MP inference procedure. We present a pseudocode for it in Appendix D.2.

Expectation-Maximization to learn the model parameters
We have presented an MP routine for inferring the community assignments {t_i}_{i∈V}. Now, we derive closed-form updates for the model parameters c, n via an Expectation-Maximization (EM) routine [45]. Differentiating the log-likelihood in Equation (4) with respect to n, and imposing the constraint Σ_{a∈[K]} n_a = 1, yields the update in Equation (13). Notice that this update depends on the MP results, as N_a = |{i ∈ V : arg max_b q_i(b) = a}| is the count of nodes assigned to community a according to the inferred marginals.
To update the rescaled affinity c we adopt a variational approach, where we maximize a lower bound of the log-likelihood, or, equivalently, minimize a variational free energy.
In Appendix C, we show detailed derivations for the fixed-point updates in Equation (14), where #^e_{ab} = Σ_{i<j∈e} δ_{t_i a} δ_{t_j b} is the count of dyadic interactions between two communities a, b within a hyperedge e. In practice, when inferring t, n, c one proceeds by alternating MP inference of t, as presented in Section 3.2, with the updates of c and n in Equations (13)-(14) until convergence. A pseudocode for the EM procedure is presented in Appendix D.2.
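The prior update just described, n_a = N_a / N with N_a the hard-assignment counts, can be sketched directly from the MP marginals (the affinity update of Equation (14) requires the appendix derivations and is omitted here):

```python
import numpy as np

def update_priors(q_marginals):
    """EM update for the community priors n from MP marginals.

    q_marginals: (N, K) array of per-node marginals q_i(a).
    Hard-assigns each node to arg max_a q_i(a), then sets n_a = N_a / N,
    matching the count-based update described in the text.
    """
    N, K = q_marginals.shape
    hard = q_marginals.argmax(axis=1)
    counts = np.bincount(hard, minlength=K)
    return counts / N
```

In the full EM loop this update alternates with MP sweeps over t and the fixed-point update of c until convergence.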

Sampling from the generative model
One of the main advantages of using a probabilistic formulation is the ability to generate data with a desired community structure. Among other tasks, this can be used in particular to test detectability results like the ones we theoretically derive in the following section. However, in hypergraphs, writing a probabilistic model does not directly imply the ability to sample from it, as is typically the case for graphs [40,41]. In fact, while the O(N^2) configuration space of graphs allows performing sampling explicitly, in the case of hypergraphs the exploding configuration space Ω makes this task prohibitive, even for hypergraphs with a moderate number of nodes and hyperedge sizes.
We propose a sampling algorithm that can efficiently scale and produce hypergraphs with dimensions in the tens or hundreds of thousands of nodes. We exploit the hard-membership nature of the assignments to obtain exact sampling via combinatorial arguments, as opposed to the approximate sampling in recent work for mixed-membership models [41]. The key observation to obtain an efficient algorithm is that the hyperedge probabilities do not depend on the nodes they contain, but only on their community assignments, as implied by Equation (3).
With this in mind, we define the auxiliary quantity #^e_a, the count of nodes in e that belong to community a ∈ [K]. Crucially, the hyperedge probability depends only on these counts, as shown in Equation (16). Therefore, all hyperedges with different nodes, but same counts #^e_1, ..., #^e_K, have equal probability.
Using Equation (16), we sample hypergraphs as in Algorithm 1 with the following steps. (i) Iterate over the count combinations #. (ii) Count the candidate hyperedges. Notice that there are N_# = C(N_1, #_1) · ... · C(N_K, #_K) hyperedges satisfying the count #, since we can choose #_a nodes from the N_a nodes in each community a. (iii) Sample the number of hyperedges.
Importantly, we do not sample the individual hyperedges, but the number of observed hyperedges. Since the individual hyperedges are independent Bernoulli variables with the same probability, their sum X follows the binomial distribution in Equation (17), with probability π_# fixed, determined by #, and number of realizations N_#. Sampling directly from Equation (17) is numerically challenging for large N_# and κ_d, hence we adopt a series of numerical approximations summarized in Appendix E.1. (iv) Sample the hyperedges.
Given the count X of hyperedges sampled from Equation (17), we can sample the hyperedges. This operation is performed by independently sampling, X times, #_a nodes from each community a. Notice that this procedure might yield repeated hyperedges, which are not allowed. In sparse regimes, this event has low probability [46]. As a sensible approximation, we delete repeated hyperedges.
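Steps (ii)-(iv) for a single count vector # can be sketched as follows. This is a minimal illustration of the combinatorial argument (names and the direct binomial draw are our simplification; the paper's Appendix E.1 uses numerical approximations for large N_# instead):

```python
import math
import numpy as np

def num_configs(community_sizes, counts):
    """N_# = prod_a C(N_a, #_a): candidate hyperedges for a count vector #."""
    return math.prod(math.comb(Na, ka) for Na, ka in zip(community_sizes, counts))

def sample_hyperedges_for_count(rng, nodes_by_community, counts, pi):
    """Sample X ~ Binomial(N_#, pi) hyperedges for one count vector #,
    then draw each hyperedge by picking #_a nodes uniformly without
    replacement from each community a; repeated hyperedges are deleted."""
    n_configs = num_configs([len(v) for v in nodes_by_community], counts)
    X = rng.binomial(n_configs, pi)
    edges = set()
    for _ in range(X):
        e = []
        for a, k in enumerate(counts):
            e.extend(rng.choice(nodes_by_community[a], size=k, replace=False))
        edges.add(frozenset(e))  # duplicates collapse here
    return edges

rng = np.random.default_rng(0)
communities = [[0, 1, 2], [3, 4]]
edges = sample_hyperedges_for_count(rng, communities, (1, 1), 1.0)
```

The full sampler simply loops this routine over hyperedge sizes d and count combinations #, as in Algorithm 1.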
Owing to this sampling procedure, our results are not limited to theoretical derivations, but can be tested numerically on synthetic data, as we show in Appendix E.2. In Appendix E.1 we give a detailed analysis of the complexity, which is asymptotically upper bounded by O(N log N). A pseudocode for this procedure is shown in Algorithm 1, and we provide an open source implementation of the sampling procedure [42].

Detectability bounds
Besides providing a valid and efficient inference algorithm, one of the main advantages of MP is the possibility of deriving closed-form expressions for the detectability of planted communities. The transition from detectable to undetectable regimes was first shown to exist in MP-based inference models for graphs [6], and gave rise to an extensive body of literature on theoretical detectability limits and sharp phase transitions [8,9]. Here, we extend these classical arguments to hypergraphs, and find relevant differences when higher-order interactions are considered.
In line with previous work, we restrict our study to the case where groups have constant expected degrees. In fact, in settings where such an assumption does not hold, it is possible to obtain good classification by simply clustering nodes based on their degrees [6]. Formally, we assume the condition in Equation (18) for some fixed constant c. Notice that Equation (18) does not immediately imply a constant degree for the groups, as in hypergraphs the expected degree is defined

Algorithm 1 Sampling hypergraphs
Inputs: D, maximum size of hyperedges; N, number of nodes; K, number of communities; n, prior of the community memberships; p, affinity matrix
sample node memberships using Equation (1)
for d = 2, ..., D do
    for each count combination # do ▷ (i)
        compute N_# and π_# using Equation (16) ▷ (ii)
        sample X from Equation (17) ▷ (iii)
        if X > 0 then
            for a = 1, ..., K do
                sample X times #_a nodes ▷ (iv)
            end for
        end if
    end for
    delete repeated hyperedges
end for

differently than the left-hand side of the equation above. Nevertheless, in Appendix F.1 we prove that imposing the condition in Equation (18) does indeed imply a constant average degree. More precisely:

Proposition 1. Assuming Equation (18), the following holds:
• all the groups have the same expected degree;
• the fixed points for the messages read as in Equations (19)-(20).

We want to study the propagation of perturbations around the fixed points of Equations (19)-(20). We assume that the factor graph is locally tree-like, i.e., neighborhoods of nodes are approximately trees. We provide a visualization of this in Figure 2. Classically, it has been proven that for sparse graphs almost all nodes have local tree-like structures up to distances of order O(log N) [34]. We are not aware of similar statements for hypergraphs. While our empirical results show that these assumptions are reasonable and approximately valid, we leave the formalization of such an argument for future work.
Figure 2: The tree assumption for factor graphs. Here, a path from a leaf (light blue) to a root (orange) consists of steps alternating variable nodes and function nodes. These two representations coincide in the case of graphs. (c) The perturbations propagate up the tree via the messages. In graphs (a), they reach the root passing from nodes i_{r+1} to i_r (green). In hypergraph-induced factor graphs (b), perturbations spread from a node i_{r+1}, at depth r + 1, to its neighboring function nodes f_{r+1} (red), and up to node i_r at depth r (blue) in an alternating fashion.

Referring to Figure 2(b), one can see that between every leaf and the root there is a single connecting path. Thus, perturbations on the leaves propagate through a tree to the root, and transmit via the transition matrix in Equation (21),
where i_r, f_r are respectively the r-th variable node and function node in the path. In words, this is the dependency of a message on the message one level below in the path. In Appendix F.2 we show that, to leading order in N, the transition matrix evaluates to the expression in Equation (22). A related expression was previously obtained for the transition matrix on graphs [6]; hence, we can compactly write T_ab as the dyadic transition matrix multiplied by a higher-order prefactor. This connection highlights an important difference between the two cases: hyperedges induce a higher-order prefactor with a "dispersion" effect. The larger the hyperedge, the lower the magnitude of this transition. Instead, if the hyperedge is a pair, this prefactor reduces to one, and we recover the result on graphs. A perturbation ε^{k_d}_{t_d} of a leaf node k_d influences the perturbation ε^{k_0}_{t_0} on the root t_0 as in Equation (24). We can also express this connection in matrix form as in Equation (25), where T is the matrix with entries T_ab (as in Equation (24), raised to the power of d), and ε^{k_d} the array of ε^{k_d}_{t_d} values. Now, similarly to Decelle et al. [6], we consider paths of length d → +∞. In such a case, the r-dependent prefactor in Equation (24) converges almost surely to the limit in Equation (26), where the expectation is taken with respect to randomly drawn hyperedges f ∈ E. If λ is the leading eigenvalue of T, then the propagated perturbation is asymptotically governed by λ^d, as in Equation (27). Aggregating over the leaves, and since the perturbations have an expected value of zero, we obtain the variance in Equation (28), where d_0 is the average node degree and F̄ the average hyperedge size. The expression in Equation (28) yields the stability criterion in Equation (29), the key result of our derivations. This generalizes the seminal result cλ^2 < 1 of Decelle et al. [6] to hypergraphs. When Equation (29) holds, the influence of the leaves on the root decays when propagating up the tree in Figure 2(b). Conversely, if Equation (29) is not satisfied, it grows exponentially.
To obtain more interpretable bounds, we focus on a benchmark scenario where the affinity matrix contains all equal on- and off-diagonal elements, i.e., c_aa = c_in for all a ∈ [K] and c_ab = c_out for all a ≠ b. In this case, the condition in Equation (18) becomes c_in + (K − 1)c_out = Kc, the leading eigenvalue of T is λ = (c_in − c_out)/Kc, and the stability condition in Equation (29) reads as in Equation (30). When hypergraphs only contain dyadic interactions, Equation (30) reduces to the bound |c_in − c_out| > K√c previously derived for graphs [6], also known as the Kesten-Stigum bound [47,48].
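The dyadic special case just stated is simple enough to check programmatically; the following sketch evaluates the Kesten-Stigum condition under the constraint c_in + (K − 1)c_out = Kc (the function name is ours):

```python
import math

def detectable_graph_ks(c_in, c_out, K, c):
    """Dyadic (D = 2) special case of the stability condition (30):
    communities are detectable when |c_in - c_out| > K * sqrt(c),
    under the constraint c_in + (K - 1) * c_out = K * c from Equation (18)."""
    assert abs(c_in + (K - 1) * c_out - K * c) < 1e-9, "degree constraint violated"
    return abs(c_in - c_out) > K * math.sqrt(c)

# setup of the experiments below: K = 4 communities, average degree c = 10
K, c = 4, 10
assert detectable_graph_ks(K * c, 0.0, K, c)   # fully assortative: detectable
assert not detectable_graph_ks(c, c, K, c)     # perfect mixing: undetectable
```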

Phase transition in hypergraphs
We test the bound obtained in Equation (30) by running MP on synthetic hypergraphs generated via the sampling algorithm of Section 3.4. In our experiments, we fix K = 4 and sample hypergraphs with N = 10^4 nodes. We also fix c = 10 and change the ratio c_out/c_in. In this setup, for graphs, one expects a continuous phase transition between two regimes where the system is undetectable and detectable [6]. In the former, where the inequality yielded by the Kesten-Stigum bound does not hold, the graph does not carry sufficient information about the community assignments, and community detection is impossible. In the latter, communities can be efficiently recovered by MP. In Figure 3 we plot the overlap between the inferred and planted communities. Our results are in agreement with the theoretical predictions: the overlap is low in the undetectable region, high in the detectable region, and we observe a continuous phase transition at the Kesten-Stigum bound for graphs, i.e., when D = 2.
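For reference, the overlap used in this kind of experiment can be computed as in the sketch below. We assume the standard definition of Decelle et al. [6] (accuracy over the best label permutation, rescaled so that guessing the largest community scores 0); whether Figure 3 uses exactly this normalization is our assumption.

```python
import numpy as np
from itertools import permutations

def overlap(pred, truth, K):
    """Overlap between inferred and planted assignments:
    (max_perm accuracy - max_a n_a) / (1 - max_a n_a).
    Random guessing of the largest group scores 0; perfect recovery scores 1."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    n_max = np.bincount(truth, minlength=K).max() / len(truth)
    best = max(
        np.mean(np.array(perm)[pred] == truth) for perm in permutations(range(K))
    )
    return (best - n_max) / (1.0 - n_max)
```

The maximization over permutations accounts for the label-switching symmetry of the likelihood; for large K one would replace the exhaustive search with a linear assignment solver.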
We expect the presence of higher-order interactions to improve detectability, as it yields greater overlap for any c_out/c_in and shifts the theoretical transition to larger values. We empirically validate this prediction by evaluating Equation (30) for hyperedges up to size D = 50 and performing MP inference in Figure 3. Diverging convergence times for larger c_out/c_in, i.e., when the free energy landscape gets progressively more rugged, further demonstrate this behavior, as shown in Appendix F.3.

The impact of higher-order interactions on detectability
As mentioned above, the transition matrix in Equation (22) reduces to the classic T_ab of [6] when only dyadic interactions are present. In fact, the additional prefactor 2/(|f_r|(|f_r| − 1)) is equal to one for 2-dimensional hyperedges. However, when hyperedges of larger sizes are present, this prefactor is strictly smaller than one. This dampens the perturbations ε^{k_0} when they propagate up the tree in Figure 2(b). It is unclear a priori whether this higher-order effect aids or hurts detectability, as it could prevent signal from being propagated, but also noise from accumulating at the root.
With this in mind, we investigate the impact of higher-order interactions on detectability by disentangling the effects that K, c and, most importantly, D have on the detectability bound set by Equation (29). To this end, we rewrite Equation (30) as in Equation (31). Here, we utilized c_in/Kc = ρ_in ∈ [0, 1], a degree-independent rescaling of c_in, where we normalize by its maximum possible value Kc, as per Equation (18). The term Φ(K, c, D) is the value of the theoretical bound at the r.h.s. of Equation (30), normalized by Kc as well. This way, we get the decomposition Φ(K, c, D) = α(K)β(c)γ(D) as a product of three independent terms, defined in Equations (32)-(34). In our experiments we choose the value of κ_d discussed in Appendix A, which conveniently returns C = 2H_{D−1}, with H_{D−1} being the (D − 1)-th harmonic number. However, our theory holds true for any κ_d yielding sparse hypergraphs.
The classic effects of α(K) and β(c) are summarized in Figure 4(a), where the maximum hyperedge size is fixed to D = 2, hence γ(D) = 1. Here, we observe that the undetectability gap shrinks when increasing c. Graphs with higher average degrees are more detectable even when there is a larger inter-community mixing. The effect of larger K is that of skewing the detectability phase transition. This is because edges contributing to c_out are spread over K − 1 communities, while those accounted for by c_in concentrate in a single one. Intuitively, increasing K allows for more in-out edges, and detectability is still possible because of the dominating c_in term. The limit value ρ_in = 1/K constitutes the perfect mixing case c_in = c_out = c, where detectability is unfeasible for any K and finite degree c. One should notice that, while the bounds drawn in Figure 4 hold theoretically, for large K it may be exponentially hard to retrieve communities even in the detectable region [6,49].
The higher-order effects on detectability are shown in Figure 4(b)-(c). The presence of hyperedges with D > 2 enters in Equation (34) as the product of two separate contributions, γ(D) = γ_1(D)γ_2(D), defined in Equations (35)-(36). These two terms have contrasting effects that multiply to produce the overall trend of γ(D): γ_1(D) is monotonically increasing, while γ_2(D) is monotonically decreasing. If we were to consider only the "dispersion" contribution γ_1, we would enlarge the detectability gap by increasing Φ. However, the γ_2 term factors in the increasing number of interactions observed with larger hyperedges. The result is the overall higher-order contribution to detectability γ(D) = γ_1(D)γ_2(D), where the value of γ_2 dominates over γ_1, giving rise to the non-trivial, monotonically decreasing profile of Figure 4(b).
The overall effect of higher-order terms is illustrated by plotting the relative difference ∆Φ(K, c, D) = (Φ(K, c, D) − Φ(K, c, 2))/Φ(K, c, 2) for a range of c and D values, with K = 4, as shown in Figure 4(c). We observe how higher-order interactions lead to better detectability for all c, especially in sparse regimes, where c is small and pairwise information is not sufficient for the recovery of the communities.

Entropy and higher-order information
Hypergraphs are often compared against their clique decomposition, i.e., the graph obtained by projecting all hyperedges onto their pairwise connections, as a baseline network structure [50][51][52].
The clique decomposition yields highly dense graphs. For this reason, most theoretical results on sparse graphs are not directly applicable, algorithmic implementations become heavier (many times unfeasible), and storage in memory is suboptimal. Previous work also showed that algorithms developed for hypergraphs tend to work better in many practical scenarios [16]. Intuitively, hypergraphs "are more informative" than graphs [53], as there exists only one clique decomposition induced by a given hypergraph, but possibly many hypergraphs corresponding to a given clique decomposition. Here we give a theoretical basis to this common intuition and find that, within our framework, we can quantify the extra information carried by higher-order interactions.
For a given hypergraph H = (V, E), edge (i, j) ∈ V × V and hyperedge e ∈ E, we define the joint probability distribution in Equation (37). This distribution represents the joint probability of drawing a hyperedge uniformly at random among the E possible ones in the hypergraph, and a dyadic interaction {i, j} out of the C(|e|, 2) possible ones within the hyperedge e. From Equation (37) we can derive the marginal distributions in Equations (38)-(39), for all e ∈ E and pairs of nodes i ≠ j. The distribution p_E is a uniform random draw of hyperedges. The distribution p_C represents the probability of drawing a weighted interaction {i, j} in the clique decomposition of H.
With Equations (37)-(39) at hand, it is possible to rewrite γ_1(D) in Equation (35) as Equation (40), where H(· | ·) is the conditional entropy. This entropy is minimized when p_C({i, j}) is very different from p_H({i, j}|f), i.e., when conditioning a pair {i, j} to be in f brings additional information with respect to the interaction {i, j} alone. This happens when {i, j} appears in several hyperedges and it is difficult to reconstruct the hypergraph from its clique decomposition. As lower values of γ_1 imply easier recovery, Equation (40) suggests that recovery is favored in hypergraphs whose hyperedges overlap substantially and that cannot be easily distinguished from their clique decomposition.
We obtain a similar result by rewriting Equation (40) as Equation (41), the ratio of two exponentiated entropies. In information theory, PP is referred to as perplexity [54], and it is an effective measure of the number of possible outcomes in a probability distribution [55]. Once we fix the number of hyperedges E (and therefore PP(p_E)), the number of effective outcomes is given by the number of likely drawn {i, j} pairs. This number is minimized when there is high overlap between hyperedges, thus confirming the interpretation of Equation (40).
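The perplexity argument can be made concrete on toy hypergraphs. The sketch below builds the pair marginal p_C from a hyperedge list (function names are ours) and shows that overlapping hyperedges yield fewer effective pair outcomes than disjoint ones:

```python
import numpy as np
from itertools import combinations
from collections import Counter

def clique_pair_distribution(hyperedges):
    """Marginal p_C of Equation (37): draw a hyperedge uniformly among the E
    observed ones, then a pair {i, j} uniformly among its C(|e|, 2) pairs."""
    E = len(hyperedges)
    p_C = Counter()
    for e in hyperedges:
        n_pairs = len(e) * (len(e) - 1) // 2
        for pair in combinations(sorted(e), 2):
            p_C[pair] += 1.0 / (E * n_pairs)
    return dict(p_C)

def perplexity(p):
    """PP(p) = exp(H(p)): the effective number of outcomes of p."""
    probs = np.array(list(p.values()))
    return float(np.exp(-(probs * np.log(probs)).sum()))

# overlapping hyperedges share pairs, shrinking the effective pair outcomes
pp_overlap = perplexity(clique_pair_distribution([(0, 1, 2), (0, 1, 2)]))
pp_disjoint = perplexity(clique_pair_distribution([(0, 1, 2), (3, 4, 5)]))
```

Here the two identical triangles give 3 effective pairs, while the disjoint triangles give 6, consistent with the claim that overlap minimizes the number of likely {i, j} outcomes.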
Finally, we set a different focus by rewriting γ_1, where KL is the Kullback-Leibler divergence and ⊗ the product probability distribution.
Here we pose the question: given a fixed clique decomposition and number of hyperedges, what is the hypergraph attaining the highest detectability? From the equation, such a hypergraph is the one with the highest KL(p_H || p_C ⊗ p_E) = I({i, j}, f). In this case, the KL-divergence between a joint distribution and its marginals, also called the mutual information I [56] of the two random variables, describes the information shared between pairwise interactions and single hyperedges. Hypergraphs with high KL-divergence, i.e., high information about a given {i, j} in a single hyperedge f, will yield better detectability. In other words, it is preferable to choose hypergraphs that, while still producing the observed clique decomposition (thus achieving low entropy H(p_H)), have largely overlapping hyperedges. The results discussed in this section provide theoretical guidance for the construction of hypergraphs that explain an observed graph made of only pairwise interactions [57], a problem relevant in datasets where higher-order interactions are not explicitly tracked.
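These information-theoretic quantities can be computed directly on small hypergraphs. The following is a minimal sketch (function and variable names are ours), assuming the uniform joint distribution described above: a hyperedge e is drawn uniformly among the E hyperedges, then a pair {i, j} uniformly among the C(|e|, 2) pairs in e.

```python
import math
from collections import Counter
from itertools import combinations

def hypergraph_pair_statistics(hyperedges):
    """Entropy-based quantities for the joint distribution p_H({i,j}, e):
    draw e uniformly among the E hyperedges, then a pair {i,j} uniformly
    among the C(|e|,2) pairs inside e."""
    E = len(hyperedges)
    # p_C({i,j}): marginal over pairs (the weighted clique decomposition)
    pair_mass = Counter()
    for e in hyperedges:
        m = math.comb(len(e), 2)
        for pair in combinations(sorted(e), 2):
            pair_mass[pair] += 1.0 / (E * m)
    H_pairs = -sum(p * math.log(p) for p in pair_mass.values())
    # H({i,j} | e) = (1/E) sum_e log C(|e|,2), since pairs are uniform in e
    H_cond = sum(math.log(math.comb(len(e), 2)) for e in hyperedges) / E
    # mutual information I({i,j}; e) = H(p_C) - H({i,j} | e)
    return {"H_pairs": H_pairs, "H_cond": H_cond,
            "MI": H_pairs - H_cond, "perplexity_pairs": math.exp(H_pairs)}
```

As a sanity check, two identical hyperedges carry no information about which hyperedge a pair came from (MI = 0), while two disjoint hyperedges identify it exactly (MI = log 2).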

Experiments on real data
Our model leads to a natural algorithmic implementation to learn communities in hypergraphs. In fact, alternating MP and EM rounds, our algorithm outputs marginal probabilities q_i(t_i) for a node i to belong to a community t_i, as well as the community ratios n and the affinity matrix p. We illustrate an application of this procedure on a dataset of interactions between high school students (High School) [58]. Here, nodes are students and hyperedges represent whether a group of students was observed in close proximity, as recorded by wearable devices. The hypergraph contains N = 327 nodes and E = 7818 hyperedges. In Figure 5(a) we show the communities inferred on the dataset where only hyperedges up to size D = 2, 3, 4 are kept. We observe a clear progression in how the nodes are gradually allocated into different groups when higher-order interactions are progressively taken into account. This suggests that interactions beyond pairs carry information that would get lost if only edges were to be observed.
To get a qualitative interpretation, we compare the communities inferred with the nine classes attended by the students, an attribute available with the dataset. We illustrate the hypergraph of student interactions, coloring each node according to its class, in Figure 5(b). Previous studies have shown that in this dataset a number of interactions happen with stronger prevalence within students of the same class [58]. In Figure 5(c), we compare the communities inferred with different maximum hyperedge sizes D with the classes, and observe that there is a stronger alignment between them when larger hyperedges are utilized for inference. In Figure 5(d) we show, at D = 2, 3, 4, the Normalized Mutual Information (NMI) between inferred communities and class attributes, the AUC with respect to the full dataset, and the fraction ρ_D of hyperedges with size equal to D. In addition, our algorithm detects connection patterns that were previously observed between the different student classes, as captured by the affinity matrix p; see Appendix G.2 for details.
A feature that sets MP apart from other inference methods is the possibility to approximately compute the evidence Z = p(A | p, n) of the whole dataset or, equivalently, the free energy F = − log Z. In Appendix G we discuss how to make the free energy computations feasible by exploiting classical cavity arguments, as well as a dynamic program similar to that employed for MP. We present the results of these estimates on the High School dataset in Figure 5(e). Here we take the values of n and p inferred by cutting the dataset at maximum hyperedge sizes D = 2, 3, 4. Then, we compute the free energy on the full dataset (D = 5) in the simplex of n, p parameters outlined by the three vertices. We notice that interactions of size D = 5 seem to be less informative and lead to suboptimal inference, see Appendix G.3. Similarly to what is observed on graphs [6], the energy landscape appears rugged and complex. EM converges to solutions that are local attraction points, i.e., valleys of low-energy configurations. Moreover, the free energy of the p, n parameters inferred with only pairwise interactions (i.e., D = 2, lower-right) is higher than that inferred for D = 3 (upper-left), which is in turn higher than that for D = 4 (bottom-left).

Conclusion
We developed a probabilistic generative model and a message-passing-based inference procedure that lead to several results advancing community detection on hypergraphs. In particular, we obtained closed-form bounds for the detectability of community configurations, extending the seminal results of Decelle et al. [6] to higher-order interactions. Experimental validation of such bounds shows the emergence of a detectability phase transition when spanning from disassortative to assortative community structures. With these theoretical bounds at hand, we investigate the relationship between hypergraphs and graphs from an information-theoretical perspective. Characterizing the entropy and perplexity of pairs of nodes in hyperedges, we find that hypergraphs with many overlapping hyperedges are easier to detect. Besides these theoretical advancements, we develop two relevant algorithmic ones. First, we derive an efficient and scalable Message-Passing algorithm to learn communities and model parameters. Second, we propose an exact and efficient sampling routine that generates synthetic data with desired community structure according to our probabilistic model in a matter of seconds. Both of these implementations are released open source [42].
The mathematical tools we propose here to obtain our results are valid for standard hypergraphs. We foresee that they could be generalized to dynamic hypergraphs, where interactions change in time, using intuitions derived for dynamic graphs [10]. Similarly, it would be interesting to see how detectability bounds change when accounting for node attributes, as results on networks have shown that adding extra information can boost community detection [59][60][61]. Finally, from an empirical perspective, it would be interesting to see how our theoretical insights in terms of entropy of hypergraphs and clique expansions match measures that relate hypergraphs to simplicial complexes [62].

Appendix A. Expected degree and choice of κ d
As we commented in Section 2, the choice of the normalizing constant κ_d, for d = 2, . . ., D, controls the Bernoulli probabilities for all hyperedges e ∈ Ω. Our theoretical analysis and results hold for general choices of κ_d, as long as these respect the following conditions. First, for any choice of a symmetric 0 ≤ p_ab ≤ 1, we need valid probabilities 0 ≤ π_e/κ_|e| ≤ 1. Second, we want the ensemble to consist of sparse hypergraphs, in expectation. A good proxy for such a requirement is the average degree, which we can compute explicitly. We assume c_ab = O(1), i.e. to be in a sparse regime. Thus, the expected degree's scale is governed by C and, in turn, by the choice of κ_d. Additionally, but not necessarily, we wish our model to extend the classical SBM, which imposes the additional condition κ_2 = 1. There exist many choices of κ_d obeying the constraints just discussed. A natural one is the minimum value satisfying Equation (A.1), i.e. κ_d = d(d − 1)/2. This, however, yields a value of C too high to produce sparse hypergraphs. Notice that, in practice, we rarely use D = N. However, such considerations are useful to evaluate how different κ_d values reflect on the properties of the hypergraph ensembles of the model. A more interesting choice is given by Equation (A.3). This corresponds to taking the average among the d(d − 1)/2 interactions that yield π_e, with the binomial coefficient C(N − 2, d − 2) acting as a normalization: once an interaction between two nodes i, j is observed, the remaining d − 2 nodes are chosen at random. The resulting C is proportional to the (D − 1)-th harmonic number, hence growing mildly at leading order as C = O(log D). Aside from having an interpretation in terms of null modeling, the value in Equation (A.3), which we utilize experimentally, was shown to be a sensible choice in many real-life scenarios [17, 41].
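As an illustration, assuming the closed form κ_d = (d(d − 1)/2)·C(N − 2, d − 2) for the choice in Equation (A.3) (this explicit expression is our reading of the normalization described above), one can check that it reduces to the SBM condition κ_2 = 1:

```python
import math

def kappa(d, N):
    """Normalization of Equation (A.3), assuming the closed form
    kappa_d = d(d-1)/2 * C(N-2, d-2): the average over the d(d-1)/2
    pairwise interactions in pi_e, times the number of ways to complete
    an observed pair with d-2 of the remaining N-2 nodes."""
    return d * (d - 1) // 2 * math.comb(N - 2, d - 2)
```

For d = 2 the binomial factor is C(N − 2, 0) = 1 and κ_2 = 1, recovering the classical SBM normalization.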

Appendix B. Message-Passing derivations
MP equations have been developed in the case of general factor graphs, see for example Murphy et al. [35], Section 22.2.3.2. We consider approximate messages from hyperedges e to nodes i, denoted q_{e→i}(t_i), and vice versa, q_{i→e}(t_i). The messages, for any e ∈ F, i ∈ ∂e, satisfy the general updates in Equation (B.1), and the marginal beliefs are given by Equation (B.2).

Message updates
First, we distinguish the values of messages for function nodes e such that A_e = 0 or A_e = 1, i.e. whether the hyperedge e is observed in the data.
If A_e = 1, i.e. e ∈ E, then

q_{e→i}(t_i) ∝ Σ_{t_j : j∈∂e∖i} (π_e/κ_e) Π_{j∈∂e∖i} q_{j→e}(t_j) ∝ Σ_{t_j : j∈∂e∖i} π_e Π_{j∈∂e∖i} q_{j→e}(t_j).   (B.3)
If A_e = 0, then e ∈ Ω \ E. We start by computing q_{e→i}(t_i), and indicate with Ẑ_{e→i}(t_i) the convenient non-normalized rewriting of q_{e→i}(t_i) in Equation (B.4). Therefore, we find Equation (B.7), where from Equation (B.5) to Equation (B.6) we used the Ẑ_{e→i}(t_i) introduced in Equation (B.4). We evaluate the expression in Equation (B.7) in the limit N → +∞, which gives the node-to-hyperedge messages for e ∈ Ω \ E as in Equation (B.8): the nodes approximately (to leading order in O(1/N)) share their marginal belief with hyperedges that are not observed in the data. Using Equation (B.8), we can also approximate Equation (B.4) as Equation (B.9). In the assumed sparsity regime, the term of order O(1/N) in Equation (B.9) is close to zero. Since for x ≈ 0 the approximation 1 − x ≈ e^{−x} is sufficiently accurate, we obtain Equation (B.10). We can put the hyperedge-to-node updates together using the two results in Equation (B.3) and Equation (B.10). Specifically, we derive the expression for the message q_{i→e}(t_i), where e ∈ E, in Equations (B.11)-(B.13). In Equation (B.11), we used the approximation introduced in Equation (B.10). In Equation (B.12) we passed from summing over Ω \ E to Ω. This approximation is sensible as long as the expected degree of the nodes grows at most as N, which is satisfied in the assumed sparse regime, as discussed in Appendix A. Finally, in Equation (B.13) we introduced the node-dependent external field h_i(t_i), whose definition naturally follows from the argument of the exponential in Equation (B.12).

Appendix B.2. External field updates
We simplify the external field to remove the node dependency of h_i(a). The node-dependent external field reads as in Equation (B.14). The sum in parentheses in Equation (B.14) can be simplified as in Equation (B.15). Plugging Equation (B.15) into Equation (B.14) we get, ignoring constants and κ_d, Equation (B.16), where we included i in the node summation. Since Equation (B.16) does not depend on i, we define the node-independent external field accordingly.
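To make the structure of this quantity concrete, here is a heavily hedged sketch of a node-independent external field. It assumes (as in the SBM message-passing literature) the generic shape h(a) = (1/N) Σ_j Σ_b c_ab q_j(b); the exact hypergraph prefactors involving κ_d from Equation (B.17) are deliberately omitted and absorbed into a scale argument, and all names are ours.

```python
import numpy as np

def external_field(C, marginals, scale=1.0):
    """Sketch of a node-independent external field:
    h(a) = (scale / N) * sum_j sum_b c_ab q_j(b).
    The kappa_d-dependent prefactor of Equation (B.17) is NOT reproduced
    here; `scale` stands in for it as an assumption."""
    Q = np.asarray(marginals, dtype=float)   # (N, K) marginal beliefs q_j
    N = Q.shape[0]
    return (scale / N) * (C @ Q.sum(axis=0))
```

The key point the sketch illustrates is that the field aggregates all marginals once, in O(NK + K²), rather than per node.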

Appendix B.3. Marginal beliefs updates
Notice that, in passing from Equation (B.11) to Equation (B.13) and then in Equation (B.17), we have shown an approximate simplification of the messages. We use the same argument to treat the general expression of the marginal beliefs in Equation (B.2).

Appendix B.4. Summary: approximate Message-Passing updates
Putting all derivations together, we obtain the final MP equations: the hyperedge-to-node, node-to-observed-hyperedge, marginal belief, and external field updates. Notice that the MP updates cannot be naively implemented as presented. In fact, the update in Equation (B.18) for q_{e→i}(t_i) has cost O(K^{|e|−1}), which does not scale with the hyperedge size. In Appendix D we present a dynamic programming approach to perform this computation exactly with cost O(K²|e|), and comment on further algorithmic details to implement the MP updates in practice.
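The reduction from O(K^{|e|−1}) to O(K²|e|) can be illustrated under the assumption (consistent with the model) that π_e = Σ_{k<l∈e} c_{t_k t_l}/N: by linearity, the sum over all community assignments of ∂e∖i collapses into pairwise expectations, which aggregate in O(K²|e|). The sketch below (names are ours) implements this identity; the exact dynamic program of Appendix D may differ in its details.

```python
import numpy as np

def hyperedge_to_node_message(C, messages, i):
    """Sketch of the O(K^2 |e|) update for q_{e->i}(t_i), assuming
    pi_e is a sum of pairwise affinities c_{t_k t_l} over pairs in e.
    C: (K, K) affinity matrix; messages: dict node -> q_{j->e} (length-K
    array), including i's own entry, which is excluded below."""
    others = [np.asarray(q, dtype=float) for j, q in messages.items() if j != i]
    Q = np.sum(others, axis=0)                       # aggregated neighbor beliefs
    # expected affinity over pairs not involving i:
    # 1/2 (Q^T C Q - sum_j q_j^T C q_j) removes self-pair terms
    s_rest = 0.5 * (Q @ C @ Q - sum(q @ C @ q for q in others))
    # pairs involving i contribute (C Q)_a as a function of t_i = a
    msg = s_rest + C @ Q
    return msg / msg.sum()                           # normalize over communities
```

Each term costs O(K²) per neighbor, so the whole message is linear in |e|, as claimed.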

Appendix C. Expectation-Maximization inference
Updates of the community priors n. We take the derivative of the log-likelihood in Equation (4). By imposing the constraint Σ_{a=1}^K n_a = 1, we obtain the update in Equation (13). Updates of the affinity matrix p. We show here the updates in terms of c. These easily translate to those in terms of the affinity matrix p, as the expression we derive below in Equation (C.6) is invariant with respect to the substitution c = N p. Let x_e = Σ_{i<j∈e} c_{t_i t_j}/(N κ_e). Then, ignoring additive constants, the log-likelihood reads as in Equation (C.1), where we use the linearization log(1 − x) ≈ −x around x = 0, which is valid at leading order O(1/N). We now take a variational approach to find a lower bound L of the log-likelihood, valid for any distribution ρ^e_ij such that Σ_{i<j∈e} ρ^e_ij = 1. In Equation (C.2), we utilized Jensen's inequality. The lower bound is exact when ρ^e_ij takes the value in Equation (C.3). We compute the derivative of the variational lower bound and approximate to leading terms in N. Notice that the approximations in Equation (C.4) and Equation (C.5) remain valid only when considering c_ab in the expressions, as by assumption c = O(1). Now, by setting Equation (C.5) equal to zero, and substituting ρ^e_ij from Equation (C.3), we obtain the update in Equation (C.6), where #^e_ab = Σ_{i<j∈e} δ_{t_i a} δ_{t_j b}.

Algorithm 3 Inferring model parameters (EM)
Inputs: convergence threshold ϵ_em, maximum iterations iter_em; randomly initialize c, n. When applying the updates in Equations (9)-(12), in practice we proceed in batches. In fact, we find that applying completely parallel updates, i.e. applying Equation (9) for all i, e pairs, successively Equation (10) for all i, e pairs, and then Equation (11) for all nodes i ∈ V, results in fast convergence to degenerate fixed points where all nodes are assigned to the same community. For this reason, we apply dropout. Given a fraction α ∈ (0, 1], we select a random fraction α of all possible i, e pairs, and apply the update in Equation (9) only for the selected pairs. We perform a new random draw, and update according to Equation (10), and similarly for Equation (11). Finally, we update the external field in Equation (12). Empirically, we find that a value of α = 0.25 works for synthetic data, where inference is simpler; values below that work as well. For real data, where we alternate MP and EM and learning is less stable, we find that substantially lowering α yields more stable inference, and we utilize α = 0.01. In practice, we also set a patience parameter, and only stop MP once a given number of consecutive iterations falls below the threshold ϵ_mp in Algorithm 2. For real datasets, we set the patience to 50 consecutive steps, and the maximum number of iterations iter_mp = 2000.
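The dropout schedule just described can be sketched as follows; this is a minimal illustration with hypothetical names, and the released implementation [42] may differ.

```python
import random

def dropout_sweep(pairs, update_fn, alpha=0.25, rng=None):
    """One batched update round: apply update_fn to a random fraction
    alpha of all (node, hyperedge) pairs. The batch is redrawn at every
    sweep, so each of Equations (9)-(11) sees an independent selection."""
    rng = rng or random.Random(0)
    k = max(1, int(alpha * len(pairs)))
    batch = rng.sample(pairs, k)        # sample without replacement
    for i, e in batch:
        update_fn(i, e)
    return batch
```

A full round would call this three times (once per message type) with fresh draws, then update the external field for all nodes at once.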
Appendix E. Sampling from the generative model

Appendix E.1. Computational cost

The cost of sampling the hyperedges in step (iv) in Section 3.4 can be precisely quantified. Every d-dimensional hyperedge is sampled with a computational cost of d, since it amounts to the extraction of d nodes from V, and there are ω_d such hyperedges. Calling Ω_d the space of all d-dimensional hyperedges, the average computational cost is given by Equation (E.1). Given the large size of Ω_d, the cost in Equation (E.1) tightly concentrates around its expected value. In sparse regimes, the term K^d/d! dominates as the number of hyperedges ω_d is low, while both terms contribute to the cost when E[ω_d] grows.
Precisely, we quantify the cost in Equation (E.1) in terms of asymptotic complexity. The first summand converges as discussed below. Similarly to the reasoning presented for Equation (A.3), choosing the maximum possible cost, given by D = N (which is higher than most practical use cases), bounds the sum. Finally, we remark that, since sampling from Equation (17) is computationally costly, we approximate the binomial with a Gaussian distribution [63], or with a Poisson when N_# is large and π_#/κ_d is small [64]. We use a Ramanujan approximation for the large log-factorials appearing in the calculations [65].
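The binomial approximation step can be sketched as follows; the regime threshold used for switching is our own illustrative choice, not a value fixed in the text.

```python
import numpy as np

def approx_binomial_draw(n, p, rng=None):
    """Draw from Binomial(n, p) for huge n without enumerating outcomes:
    a Poisson(n p) when p is small [64], a Gaussian otherwise [63].
    The switching threshold (p < 1e-6) is an illustrative assumption."""
    rng = rng or np.random.default_rng(0)
    if p < 1e-6:
        return int(rng.poisson(n * p))
    mean, std = n * p, np.sqrt(n * p * (1 - p))
    return max(0, int(round(rng.normal(mean, std))))  # clip at zero
```

Both approximations avoid materializing the binomial over astronomically many candidate hyperedges, which is what makes exact sampling from Equation (17) feasible in practice.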

Appendix E.2. Experiments
We employ the sampling algorithm to generate the hypergraphs used to study the phase transition of Section 4.2. Here, we set the affinity matrix to have all equal in-degrees c_aa = c_in and out-degrees c_ab = c_out, so that Equation (18) becomes c_in + (K − 1)c_out = Kc for some K and c. In our experiments, we sample hypergraphs with N = 10^4 nodes by fixing c = 10 and K = 4; we span across 65 values of c_out in [0, 500], and compute the corresponding c_in = c_in(c_out; K, c). For each experimental configuration c_in, c_out, we draw 5 hypergraphs from different random seeds. This gives a total of 325 hypergraphs.
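Solving the constraint above for c_in gives a one-line helper (a trivial sketch, with our own naming):

```python
def c_in_from_c_out(c_out, K=4, c=10):
    """Invert the constant-degree constraint of Equation (18):
    c_in + (K - 1) * c_out = K * c."""
    return K * c - (K - 1) * c_out
```

With the experimental values K = 4, c = 10, the purely assortative endpoint c_out = 0 gives c_in = 40, and the line crosses the uninformative point c_in = c_out = c at c_out = 10.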
We use the expected number of d-dimensional hyperedges E[ω_d] in Equation (E.1) and the average degree d̄_0 in Equation (A.2) to perform a sanity check between our sampling algorithm and our theoretical derivations. For constant in- and out-degrees, these two metrics can be evaluated in closed form. The results in Figure E1 show excellent agreement between theory and experiments. We also highlight that the sampling method is extremely fast, with an average sampling time of t = 32.7 ± 2.7 s on the experimental setup considered here.

Appendix F. Phase transition: complementary derivations and additional results
Appendix F.1. Proof of Proposition 1

First, we want to prove that all communities have the same expected degree. To do so, we start by computing the expected degree d_{0i} of a given node i ∈ V. Following similar derivations to those for d̄_0 in Appendix A, we find an expression independent of the specific choice of group b, from which we conclude that all groups yield equal expected degrees.
Second, we wish to demonstrate that MP's fixed points are as in Equations (19)-(20). Notice that, in the derivations below, when convenient we interchange equivalent summations over function nodes' neighbors ∂e and hyperedges e. By treating all quantities that are independent of t_i in q_{i→e}(t_i), q_{e→i}(t_i) as constants, we evaluate Equation (7). Since the messages q_{e→i}(t_i) are normalized to have unitary sum, Equation (F.1) implies that q_{e→i}(t_i) = 1/K. Substituting this result into Equation (8), one also finds that q_i(t_i) = n_{t_i}. The variable-to-function node messages are updated with Equation (9), which includes Equation (12) for the external field h(t_i). The external field evaluated at the fixed points is also constant; in fact, the result of Equation (F.2) implies that the messages in Equation (9) read exactly as Equation (20).
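The uniformity of this fixed point hinges on the constant-group-degree condition of Equation (18), i.e. Σ_b c_ab n_b being independent of the group a. A quick numerical check of that condition (our own sketch):

```python
import numpy as np

def has_constant_group_degree(C, n, tol=1e-10):
    """Check Equation (18): the expected group degree sum_b c_ab n_b must be
    the same for every group a. Under this condition, the factorized point
    q_{e->i}(t_i) = 1/K, q_i(t_i) = n_{t_i} is a fixed point of MP."""
    row = np.asarray(C, dtype=float) @ np.asarray(n, dtype=float)
    return bool(np.allclose(row, row[0], atol=tol))
```

For the experimental parametrization c_aa = c_in, c_ab = c_out with uniform n, the condition reduces to c_in + (K − 1)c_out = Kc, matching Appendix E.2.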

Appendix F.2. Transition matrix formula
In this section, we derive the expression for the transition matrix T^{ab}_r in Equation (22). To simplify the notation, we indicate the (variable node, function node) pairs at level r as (i_r, f_r) = (i, e) and, similarly, at level r + 1 we use (i_{r+1}, f_{r+1}) = (j, f). Hence, the transition matrix becomes T^{ab}_r = ∂q_{i→e}(a)/∂q_{j→f}(b). In order to find a closed-form expression of T^{ab}_r, we claim that the two following Lemmas hold.

Lemma 1. Under the constant group degree assumption in Equation (18): (i) for any hyperedge e and nodes i ∈ e, Equation (F.3) holds; (ii) for any hyperedge e and nodes i, j ∈ e, Equation (F.4) holds.

Lemma 2 (Employing Lemma 1). Under the constant group degree assumption in Equation (18): (i) the derivative ∂ exp(−h(a))/∂q_{i→e}(b) is negligible to leading order in N; (ii) the external field is constant, h(t_i) = const.; (iii) calling Z_{i→e} the normalizing constant of q_{i→e}, Equations (F.5) and (F.6) hold.

The claims allow us to derive the transition matrix. Particularly, we make explicit all derivatives and variable-to-function node messages as in Equation (9). By also ignoring all terms relative to h(t_i) thanks to Lemma 2, we obtain an expression where the terms involving Z_{i→e} are given in Lemma 2 (Equation (F.5) and Equation (F.6)), while the expressions in parentheses are given in Lemma 1 (Equation (F.3) and Equation (F.4)). By performing all the substitutions, we recover exactly the expression in Equation (22). What is left to complete all derivations is to prove Lemma 1 and Lemma 2, which is done next.
3. As just proved, we can ignore the external field in the expression of Z_{i→e}, and find Equation (F.9). Utilizing result Equation (F.3) in Lemma 1, Equation (F.9) simplifies to the form in Equation (F.5), as desired. Similarly, to compute the derivative ∂Z_{i→e}/∂q_{j→f}(b), we can ignore all appearing ∂ exp(−h(a))/∂q_{j→f}(b) and h(t_i) terms, thanks to the Lemma's first two points (just proved).

Appendix G. Free energy computations

Assuming that MP has converged, all messages q_{j→e}(t_j) are available. Notice, however, that naive computations of the f_i and f_e addends are unfeasible, due to the exploding sums over t_j : j ∈ ∂e. In the following, we show how such computations can be performed efficiently.
(i) Calculations of f i .
As one can observe from Equation (B.1) and Equation (B.2), the f_i terms are the log-normalizing constants of q_i, and can therefore be computed similarly. In particular, ignoring constants, by Equation (B.13) a simplification holds: the single terms indexed by e ∈ E, i.e., the values Σ_{t_j : j∈∂e∖i} π_e Π_{j∈∂e∖i} q_{j→e}(t_j), are equivalent to the unnormalized messages q_{e→i}(t_i). For this reason, they can be computed with the same dynamic program presented in Appendix D.1.
(ii) Calculations of f_e. While the f_i terms in Equation (G.1) are computed singularly, we take a different approach and calculate the whole sum Σ_{e∈Ω} (|e| − 1) f_e without computing the single f_e terms, as this would be impossible due to their exploding number. First, we separate the terms over Ω in Equation (G.1).

We build the simplex of convex combinations of the parameters inferred at D = 2, 3, 4, with coefficients 0 ≤ λ_i ≤ 1 and Σ_{i=2,3,4} λ_i = 1. For any value of p_simplex, n_simplex, we compute the free energy on the whole High School dataset, i.e., taking all hyperedges. The free energy approximations following Equation (G.1) require the messages, marginals and external field, which can be inferred via MP and in turn depend on p_simplex, n_simplex. For every point in the simplex, we fix p_simplex, n_simplex and infer all the remaining quantities via MP, to then compute the free energy displayed in Figure 5.

We expand on the community patterns detected on the High School data for D = 4, which are represented in Figure 5. The nine classes observed in the data are named after their subjects of focus: MP, MP*1, MP*2 (mathematics and physics), PC, PC* (physics and chemistry), PSI* (engineering), and 2BIO1, 2BIO2, 2BIO3 (biology) [58].
We compare the edge density patterns computed on the data in Mastandrea et al. [58], shown in Figure G1(a), with the affinity matrix p inferred on the High School dataset fixing D = 4, shown in Figure G1(b). Additionally, in Figure G1(c), we plot the partition of the nodes into communities together with their labeling in classes. We observe that classes inferred in the same community also tend to have a larger number of external interactions with the other classes in that same inferred community. For instance, the BIO classes belong to two communities that are disjoint from all others, see Figure G1(c). Within the BIO classes, 2BIO2 and

Figure 2 :
Figure 2: Local tree assumption. (a) The classical local tree assumption for graphs. Here, it is assumed that the neighborhoods of nodes are approximately trees. (b) The tree assumption for factor graphs. Here, a path from a leaf (light blue) to a root (orange) consists of steps alternating variable nodes and function nodes. These two representations coincide in the case of graphs. (c) The perturbations propagate up the tree via the messages. In graphs (a), they reach the root passing from nodes i_{r+1} to i_r (green). In hypergraph-induced factor graphs, perturbations spread from a node i_{r+1}, at depth r + 1, to its neighboring function nodes f_{r+1} (red), and up to node i_r at depth r (blue), in an alternating fashion.

Figure 3 :
Figure 3: Phase transition. The overlap between ground-truth and inferred communities varies for different c_out/c_in ratios. The values attained are positive in the detectable region (left of the dotted theoretical bounds) and continuously drop to zero as the phase transition boundary approaches. Values for hyperedges up to size D = 50 (orange) always yield higher overlap compared to D = 2 (light blue). Shaded areas are standard deviations over 5 random initializations of MP.

Figure 4 :
Figure 4: Theoretical phase transition. Due to the decomposition of our bound in Equations (32)-(34), it is possible to separately describe the effects of K, c and D on the predicted phase transition. (a) Detectability bounds for networks (D = 2). Increasing c yields a broader range of detectable configurations (colored areas) for ρ_in. The number of communities skews detectability: while for K = 2 communities can be detected in extremely disassortative regimes (ρ_in close to zero), when more communities are present, only assortative networks are detectable. (b) Effect of the maximum hyperedge size D. The term γ(D) in Equation (34) can be split into the product γ_1(D)γ_2(D), as defined in Equations (35)-(36). The non-trivial decrease of γ(D) results from the interplay of γ_1(D) and γ_2(D), which have opposite monotonicity. (c) The percentage decrease ∆Φ(K, c, D) = (Φ(K, c, D) − Φ(K, c, 2))/Φ(K, c, 2) in detectability for different c, D values shows that higher-order interactions steadily improve detection, especially in sparse regimes.

Figure 5 :
Figure 5: Experiments on the High School dataset. We infer the communities via MP and EM on the High School dataset. In all cases, we run inference with K = 10 communities. (a) Inferred communities on the High School dataset, only utilizing hyperedges up to a maximum size D. Taking into account higher-order information, up to D = 4, results in more granular partitions. (b) Graphical representation of the students' partition into classes. We draw only hyperedges of size D. (c) We compare the inferred partitions with the "attended class" covariate of the nodes, i.e., the classes students participate in. We comment further on this comparison in Appendix G.2. (d) A quantitative measurement complementing that of panel (b): the Normalized Mutual Information (NMI) between inferred communities and attended classes, the AUC on the full dataset, as well as the ratio ρ_D of hyperedges of size equal to D. (e) Free energy landscape. We consider the parameters (p_2, n_2), (p_3, n_3) and (p_4, n_4) inferred from the dataset with, respectively, D = 2, 3, 4. With these, we build the simplex of convex combinations p = Σ_{i∈{2,3,4}} λ_i p_i, where Σ_{i∈{2,3,4}} λ_i = 1 and 0 ≤ λ_i ≤ 1 (similarly for n). For every point in the simplex, we compute the free energy on the full dataset, i.e., with D = 5. More details on these computations are provided in Appendix G.1.

The first summand, Σ_{d=2}^D K^d/d!, converges to a constant for diverging D, and contributes to the complexity only as a constant, relevant in sparse regimes. Defining a_d = K^d/d!, we can use the ratio test to assess convergence. Substituting the value of κ_d = (d(d − 1)/2) C(N − 2, d − 2) that we utilize in our experiments, it is also possible to quantify the second addend: the sum Σ_{d=2}^D 1/(d − 1) grows like O(log N), therefore Σ_{d=2}^D d E[ω_d] = O(N log N), which yields an asymptotic bound on the total sampling complexity.

Figure E1 :
Figure E1: Sampling experiments. The expected number of |f|-dimensional hyperedges returned by our experiments (blue) is in great accordance with the theoretical prediction E[ω_|f|] (black). Similarly, the experimental expected degrees distribute around the analytical d̄_0. Shaded areas are standard deviations over 5 random hypergraph extractions, at each |f|.

Figure F1 :
Figure F1: Elapsed time for MP. For both D = 2 and D = 5, the elapsed times plateau due to the threshold imposed on MP's maximum number of iterations. Shaded areas are standard deviations over 5 random initializations of MP. Vertical dotted lines are the theoretical detectability bounds derived from Equation (29).


This allows us to compute the last two addends separately. Focusing on the second addend, and proceeding similarly as for the external field calculations that led to Equation (B.19), we get

log Σ_{e∈Ω∖E} (1/κ_e) Σ_{k<m∈e} Σ_{t_k, t_m} c_{t_k t_m} q_k(t_k) q_m(t_m) = (C'''/N) Σ_{k<m∈V} Σ_{t_k, t_m} c_{t_k t_m} q_k(t_k) q_m(t_m).

Particularly, we define the parameters of the simplex vertices as p_simplex = λ_2 p_2 + λ_3 p_3 + λ_4 p_4 and n_simplex = λ_2 n_2 + λ_3 n_3 + λ_4 n_4.

Appendix G.2.
Figure G1: Affinity patterns on the High School dataset. Colors of the matrices' entries correspond to their log values, properly normalized to ease readability. (a) Edge density on the clique decomposition of the High School dataset. As in Mastandrea et al. [58], the edge density between two classes X and Y corresponds to the number of observed edges between nodes of the classes, normalized with respect to the total number of possible edges between X and Y. (b) Affinity matrix p inferred by the EM-MP scheme with D = 4. The method detects 5 classes, whose affinity values are as in the matrix's entries. Colors of classes follow the color coding of Figure 5(c). (c) Inferred communities of nodes and the partition of students into classes. The panel is identical to Figure 5(c).