A faster horse on a safer trail: generalized inference for the efficient reconstruction of weighted networks

Due to the interconnectedness of financial entities, estimating certain key properties of a complex financial system, including the implied level of systemic risk, requires detailed information about the structure of the underlying network of dependencies. However, since data about financial linkages are typically subject to confidentiality, network reconstruction techniques become necessary to infer both the presence of connections and their intensity. Recently, several ‘horse races’ have been conducted to compare the performance of the available financial network reconstruction methods. These comparisons were based on arbitrarily chosen metrics of similarity between the real network and its reconstructed versions. Here we establish a generalized maximum-likelihood approach to rigorously define and compare weighted reconstruction methods. Our generalization uses the maximization of a certain conditional entropy to solve the problem represented by the fact that the density-dependent constraints required to reliably reconstruct the network are typically unobserved and, therefore, cannot enter directly, as sufficient statistics, in the likelihood function. The resulting approach admits as input any reconstruction method for the purely binary topology and, conditionally on the latter, exploits the available partial information to infer link weights. We find that the most reliable method is obtained by ‘dressing’ the best-performing binary method with an exponential distribution of link weights having a properly density-corrected and link-specific mean value, and we propose two safe (i.e. unbiased in the sense of maximum conditional entropy) variants of it. While the one named CReM_A is perfectly general (as a particular case, it can place optimal weights on a network if the bare topology is known), the one named CReM_B is recommended both in case of full uncertainty about the network topology and if the existence of some links is certain. In these cases, the CReM_B is faster and reproduces empirical networks with the highest generalized likelihood among the competing models considered.

The knowledge of the structure of a financial network gives valuable information for the estimation of systemic risk and other important properties. However, since financial data are typically subject to confidentiality, network reconstruction techniques become necessary to infer both the presence of connections (i.e. the topology) and their intensity (i.e. the link weights) from partial information. Recently, various horse races have been conducted to compare the performance of these methods. These comparisons were based on arbitrarily chosen metrics of network similarity. Here we establish a generalized likelihood approach to the rigorous definition, estimation and comparison of methods of reconstruction of weighted networks. The crucial ingredient is the possibility of separately constraining the topology and the link weights to follow certain "tight" empirical relationships that connect them to the observed marginals. We find that a statistically rigorous estimation of the probability of each link weight should incorporate the whole distribution of reconstructed network topologies, and we provide a general procedure to do this. Our results indicate that the best method is obtained by "dressing" the best-performing method available to date with a certain exponential distribution of link weights. The method is fast, unbiased and accurate and reproduces the empirical networks with the highest generalized likelihood.
Network reconstruction is an active field of research within the broader field of complex networks [20]. Addressing the network reconstruction problem means facing the double challenge of estimating both the topology and the link weights. The task at hand consists in determining both the binary and the weighted ensemble distributions, and in understanding the interplay between them. Among the methods proposed so far, some assume that the binary and weighted constraints jointly determine the final configuration in terms of both topology and weights, while others attribute weights to the binary configuration using a completely separate methodology [17,21]. Among the former, a special mention is deserved by the Enhanced Configuration Model (ECM) [15]. This model is defined by simultaneously constraining the degrees and the strengths of nodes, which jointly affect the estimation of the two sets of quantities, i.e. the linkage probabilities and the weight estimates. Since these are jointly determined on the basis of the same information (i.e. the constraints), it is impossible to include purely topological additional information. Examples of algorithms belonging to the second group are those iteratively adjusting the link weights (e.g. via the RAS recipe [19]) on top of some previously-determined topological structure, in such a way as to satisfy the constraints concerning the strengths a posteriori. This approach has encountered critiques in [16]. It is important to notice that this kind of procedure assigns weights deterministically; therefore, assuming continuous weights, the likelihood of observing any real matrix is exactly zero. There also exist two-step algorithms [6] that attempt to overcome the lack of binary information for the topological estimation. However, they have an elevated complexity (the second step requires the solution of the ECM) and the two-step procedure is only heuristically motivated.
Here we develop a theoretical framework that provides an analytical, unbiased procedure to estimate the weighted structure of a network once its topology has been determined, thus extending the Exponential Random Graph framework to deal with cases that appeared tractable only via heuristic approaches. The prior information on the topological structure (either already available or obtained using a method of choice) and a number of weighted constraints are taken as input. Subsequently, we derive the unbiased link-weight probability. The latter is determined by the maximization of the key quantity of our approach, i.e. the conditional entropy of the weighted distribution given the binary one. As will be proven, once the weights are treated as continuous random variables, their distribution, conditional on the existence of a link, is exponential. This consideration also allows us to determine confidence intervals for the weight estimates. Another desirable property of the model, in its second specification, is its computational simplicity, as it does not require the numerical solution of several coupled non-linear equations. We indicate the binary adjacency matrix of the network as $A$, and we assume that it is a realization of a random variable $\mathbf{A}$. Analogously, the weighted adjacency matrix associated with the graph, $W$, comes from a random variable $\mathbf{W}$. The probability mass function of the event $\{\mathbf{A}=A\}$ is denoted by $P(A)$, while $Q(W|A)$ indicates the conditional multivariate probability density function of $\mathbf{W}$ belonging to a neighbourhood of $W$, given $\mathbf{A}=A$. We denote the entries of the weighted adjacency matrix $W$ as $w_{ij}$ and their binary counterparts as $a_{ij}=\Theta(w_{ij})$, where $\Theta(\cdot)$ represents the Heaviside step function.
Conditional Reconstruction Method (CReM)

Input. The procedure takes as input $P(A)$, the distribution over the space of binary configurations. This can be available as prior (probabilistic) information or computed using any available method. In the Appendix we discuss a few options for the derivation of $P(A)$, but it is important to notice that the following results are valid for any choice of $P(A)$. Moreover, the CReM requires a set of weighted constraints $\vec{C}(W)$ to be imposed when deriving the weighted distribution. The unconditional ensemble average of these quantities will correspond to the desired target value, which can be chosen to be an observed quantity or any other value of choice.
Output. The goal of this second step is to derive the distribution of the weighted configurations conditional on the binary one. To achieve this, we maximize over $Q(W|A)$ the conditional entropy [3]

$$S_{W|A} = -\sum_{A} P(A) \int_{\mathbb{W}_A} Q(W|A)\,\ln Q(W|A)\,dW. \quad (1)$$

We assume that the weights are continuous. We stress that the distribution $Q(W|A)$ determined by this second step is relative only to the $W$ compatible with $A$, that is, such that $\Theta(W)=A$; we will use the compact notation $\mathbb{W}_A \equiv \{W : \Theta(W)=A\}$ for this set. Our goal is to maximize, with respect to $Q(W|A)$, the expression (1) under the following set of constraints:

1. $\int_{\mathbb{W}_A} Q(W|A)\,dW = 1, \quad \forall A$;
2. $\sum_A P(A) \int_{\mathbb{W}_A} Q(W|A)\,C_\alpha(W)\,dW = \langle C_\alpha\rangle, \quad \forall \alpha$.

The first equation imposes the normalization of the conditional probabilities: since $P(A)$ is also normalized, notice that condition 1. implies $\int_{\mathbb{W}} Q(W)\,dW = 1$, where $Q(W) \equiv P(\Theta(W))\,Q(W|\Theta(W))$ and $\mathbb{W}$ is the whole space of weighted configurations. In the second equation, the $C_\alpha(W)$ represent the desired set of constraints taken as input. The problem Lagrangian can be written as

$$\mathcal{L} = S_{W|A} + \sum_A \mu_A \left[1 - \int_{\mathbb{W}_A} Q(W|A)\,dW\right] + \sum_\alpha \lambda_\alpha \left[\langle C_\alpha\rangle - \sum_A P(A)\int_{\mathbb{W}_A} Q(W|A)\,C_\alpha(W)\,dW\right]. \quad (2)$$

Differentiating this expression with respect to $Q(W|A)$ and equating it to zero, we obtain

$$Q(W|A) = \frac{e^{-H(W)}}{Z_A}, \quad (3)$$

where $H(W) = \sum_\alpha \lambda_\alpha C_\alpha(W)$ is the Hamiltonian and the normalization constant $Z_A = \int_{\mathbb{W}_A} e^{-H(W)}\,dW$ (absorbing the multiplier $\mu_A$) guarantees that condition 1. is satisfied. The final form of the conditional distribution of the weighted configuration is obtained by choosing which constraints we wish to impose.
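As a concrete illustration of the conditional structure of (3): given a binary configuration $A$ and a set of per-link exponential rates (the form derived below for both CReM specifications), a compatible weighted matrix can be drawn by placing exponential weights only on the existing links. The following sketch is our own illustration (function name and interface are ours), assuming numpy is available:

```python
import numpy as np

def sample_weights_given_topology(a, rates, rng):
    """Given a binary adjacency matrix `a` and a matrix of positive exponential
    rates (one per potential link), draw a weighted matrix W with Theta(W) = a:
    weights are exponential where a_ij = 1 and exactly zero elsewhere."""
    draws = rng.exponential(scale=1.0 / rates)  # candidate exponential weights
    return np.where(a == 1, draws, 0.0)         # enforce compatibility with a

rng = np.random.default_rng(42)
a = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
rates = np.full((3, 3), 2.0)                    # mean weight 1/2 on every link
w = sample_weights_given_topology(a, rates, rng)
```

By construction, the support of the sampled matrix coincides with the prescribed topology, which is exactly the compatibility requirement $\Theta(W)=A$.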
Functional form of the likelihood function

In alignment with previous results on maximum-entropy methods [1,9,14], we wish to formulate our problem in terms of maximum-likelihood estimation. That is, we want to re-state the problem in a form that allows us to determine the values of the Lagrange multipliers $\vec\lambda$ of (2) by solving a maximum-likelihood problem. See the Appendix for the standard case.
Let us consider the integral term of the conditional entropy defined in (1). Using (3) we can write

$$S_{W|A} = \sum_A P(A)\left[\langle H(W)\rangle_A + \ln Z_A\right], \quad (4)$$

where $\langle\cdot\rangle_A$ denotes the average over $Q(W|A)$. Let us now define $\vec\lambda^*$ as the value of the parameters for which the constraints are realized, that is,

$$\langle C_\alpha\rangle_{\vec\lambda^*} = C_\alpha^*, \quad \forall\alpha. \quad (5)$$

When we evaluate the conditional entropy for such a parameter choice, since for both method specifications $H$ is linear in the $w_{ij}$, we obtain

$$S_{W|A}(\vec\lambda^*) = \sum_A P(A)\left[H(\langle W\rangle^*) + \ln Z_A\right] = -\sum_A P(A)\,\ln Q(\langle W\rangle^*|A), \quad (6)$$

where $\langle W\rangle^*$ indicates the unconditional ensemble average of $\mathbf{W}$ when the desired constraints are satisfied. The last equation conveys meaningful information: we started from the expression of the conditional entropy, which averages both with respect to the binary configuration and with respect to the weighted one. Imposing certain constraints to be satisfied on average, we determine the value of the vector of parameters $\vec\lambda^*$. When (1) is evaluated at $\vec\lambda^*$, it no longer contains the averaging with respect to the weighted configurations, but instead a single term, $\ln Q(\langle W\rangle^*|A)$, which is then averaged over the space of binary configurations. Therefore, the functional form of the likelihood function that preserves the dual relation between entropy and likelihood of the standard case is given by the generalized likelihood

$$\mathcal{G}(\vec\lambda) = \sum_A P(A)\,\ln Q(W|A). \quad (7)$$

This correspondence between entropy and likelihood has another important consequence: the entropy, when evaluated at $\vec\lambda^*$, itself becomes a measure of the goodness of fit of the model. Let us observe that the estimation of parameters is a more general problem, arising for instance in the case of parameter estimation for the generative model of a known system, as treated in [4].
We are now going to derive the full specification of the model for two different constraint choices: the strength sequences (CReM_A) and the conditional weights (CReM_B).

CReM_A

For each node $i$, the in- and out-degrees and the in- and out-strengths can be expressed as

$$k_i^{out} = \sum_{j\neq i} a_{ij}, \quad k_i^{in} = \sum_{j\neq i} a_{ji}, \quad s_i^{out} = \sum_{j\neq i} w_{ij}, \quad s_i^{in} = \sum_{j\neq i} w_{ji}. \quad (8)$$

Imposing the preservation of the strength sequences results in a Hamiltonian function of the form

$$H(W) = \sum_{i\neq j} \left(\beta_i^{out} + \beta_j^{in}\right) w_{ij}. \quad (9)$$

The partition function can be expressed as

$$Z_A = \prod_{i\neq j:\,a_{ij}=1} \int_0^{+\infty} e^{-(\beta_i^{out}+\beta_j^{in})\,w}\,dw = \prod_{i\neq j:\,a_{ij}=1} \frac{1}{\beta_i^{out}+\beta_j^{in}}. \quad (10)$$

Finally, using expression (3), we can write

$$Q(W|A) = \prod_{i\neq j:\,a_{ij}=1} \left(\beta_i^{out}+\beta_j^{in}\right) e^{-(\beta_i^{out}+\beta_j^{in})\,w_{ij}}, \quad (11)$$

which shows that the weight distribution, conditional on the existence of a link, $q_{ij}(w|a_{ij}=1)$, follows an exponential distribution of parameter $\beta_i^{out}+\beta_j^{in}$. In order to determine the values of the vectors of parameters $\vec\beta^{out}$ and $\vec\beta^{in}$, we maximize the expression of the conditional likelihood which, substituting the results just derived into (7), reads as follows:

$$\mathcal{G}(\vec\beta) = \sum_{i\neq j} \left[f_{ij}\,\ln\!\left(\beta_i^{out}+\beta_j^{in}\right) - \left(\beta_i^{out}+\beta_j^{in}\right) w_{ij}\right]. \quad (12)$$

The expression $f_{ij} = \sum_A P(A)\,a_{ij}$ represents the value of $a_{ij}$ averaged over the ensemble of binary configurations. This general formulation implies no assumption on the structure of link interdependencies given by $P(A)$. For instance, in the case of the micro-canonical ensemble, where the degrees are constrained sharply and not on average over the ensemble, $P(A)$ is constant on all graphs with the same degree sequence. When $P(A)$ factorizes, $f_{ij}$ corresponds to the quantity denoted by $p_{ij}$ in the literature [20], i.e. the linkage probability that is independent from link to link. Differentiating (12) with respect to $\beta_i^{out}$ and $\beta_i^{in}$, $\forall i$, yields the system of $2N$ coupled equations

$$s_i^{out} = \sum_{j\neq i} \frac{f_{ij}}{\beta_i^{out}+\beta_j^{in}}, \quad s_i^{in} = \sum_{j\neq i} \frac{f_{ji}}{\beta_j^{out}+\beta_i^{in}}, \quad \forall i, \quad (13)$$

where the $f_{ij}$ are taken as given and therefore excluded from the estimation procedure.

CReM_B

In Figure 1, the blue circles show the comparison between the weight estimates delivered by the CReM_A model and the real weights. The red crosses represent the same comparison, but for the weight estimates taken from the degree-corrected Gravity Model [13],

$$w_{ij} = \frac{s_i^{out}\,s_j^{in}}{W_{tot}}, \quad (14)$$

where $W_{tot} = \sum_i s_i^{out} = \sum_i s_i^{in}$ is the total weight of the matrix.
We can see that the latter estimates show a better agreement with the data. Therefore, we wish to find a different model specification that allows us to introduce such a functional form for the weight estimates. To do so, in the conditional entropy maximization we formally constrain the values of the individual weights. The Hamiltonian reads $H(W) = \sum_{i\neq j}\beta_{ij}\,w_{ij}$. The derivation is then analogous to the previous case. Given (3), the conditional weight distribution will then be

$$q_{ij}(w|a_{ij}=1) = \beta_{ij}\,e^{-\beta_{ij}\,w}. \quad (15)$$

As a consequence, also in this case the conditional weight distribution follows an exponential distribution, of parameter $\beta_{ij}$. The generalized likelihood function, given (7), can be expressed as

$$\mathcal{G}(\beta) = \sum_{i\neq j}\left[f_{ij}\,\ln\beta_{ij} - \beta_{ij}\,w_{ij}\right]. \quad (16)$$

Differentiating with respect to $\beta_{ij}$ leads to the equation

$$\frac{f_{ij}}{\beta_{ij}} = w_{ij}^*. \quad (17)$$

Clearly, we cannot really observe the link weights (or there would be no need for reconstruction), but we formally imposed such constraints in order to be able to use the weight estimates from (14), which show good agreement with the data. Such a choice uses as input only the strength sequences of the graph. As a consequence, the sufficient statistics for CReM_A and CReM_B are the same: the strength sequences. If we substitute the values $w_{ij}^*$ in (17) with the expression from (14), to determine the matrix of coefficients $\beta$ we only have to compute

$$\beta_{ij} = \frac{f_{ij}\,W_{tot}}{s_i^{out}\,s_j^{in}}. \quad (18)$$

Directed Enhanced Configuration Model

In order to compare the results of the reconstruction models, and to underline the different assumptions regarding the network-formation process, we outline the characteristics of a model belonging to the class of algorithms jointly determining the topological and the weighted structure, i.e. the Directed Enhanced Configuration Model (DECM) [2]. For consistency, we consider the continuous version of this model, whose probability distribution reads

$$Q(W) = \frac{e^{-H(W)}}{Z}, \qquad H(W) = \sum_{i\neq j}\left[\left(\alpha_i^{out}+\alpha_j^{in}\right)a_{ij} + \left(\beta_i^{out}+\beta_j^{in}\right)w_{ij}\right], \quad (19)$$

where the $\vec\alpha$'s and $\vec\beta$'s are the Lagrange multipliers associated to the degree and strength constraints, respectively, and $x_i \equiv e^{-\alpha_i}$.
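Returning to CReM_B: since recipe (18) is available in closed form, the parameter estimation reduces to elementwise array operations. The sketch below is our own (function name and interface are ours), assuming strictly positive strengths and a matrix `f` of marginal link probabilities:

```python
import numpy as np

def crem_b_rates(f, s_out, s_in):
    """CReM_B exponential rates: beta_ij = f_ij * W_tot / (s_i^out * s_j^in)."""
    w_tot = s_out.sum()                      # total weight (= sum of s_in)
    beta = f * w_tot / np.outer(s_out, s_in)
    np.fill_diagonal(beta, np.inf)           # no self-loops: weight 0 a.s.
    return beta
```

Note that the implied unconditional expected weight, $f_{ij}/\beta_{ij} = s_i^{out}s_j^{in}/W_{tot}$, reproduces the dcGM estimate (14) by construction, for any choice of $f_{ij}>0$.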
This implies

$$q_{ij}(w) = \begin{cases} 1-p_{ij}, & w=0,\\[2pt] p_{ij}\left(\beta_i^{out}+\beta_j^{in}\right)e^{-(\beta_i^{out}+\beta_j^{in})\,w}, & w>0,\end{cases} \qquad p_{ij} = \frac{x_i^{out}x_j^{in}}{x_i^{out}x_j^{in}+\beta_i^{out}+\beta_j^{in}}, \quad (20)$$

where $q_{ij}(w>0)$ is the probability (density) that there is a link of positive weight $w$ between nodes $i$ and $j$, while $p_{ij}$ is the probability that a link of any weight exists. Observe that we retained the literature notation $p_{ij}$ for the linkage probability, referring to the special case in which $P(A)$ factorizes and, as a consequence, the linkage probabilities are independent from each other. Finally, the likelihood function can be expressed as

$$\mathcal{L}(\vec\alpha,\vec\beta) = \sum_{i\neq j}\left[a_{ij}\,\ln p_{ij} + (1-a_{ij})\,\ln(1-p_{ij}) + a_{ij}\,\ln\!\left(\beta_i^{out}+\beta_j^{in}\right) - \left(\beta_i^{out}+\beta_j^{in}\right)w_{ij}\right]. \quad (21)$$

All the details and derivations of the DECM model can be found in the Appendix.
Let us now come to a procedure that will enable us to compare different reconstruction models.
Likelihood-based comparison

The first possible approach is to compare the models through their (generalized) likelihood $\mathcal{G}$. Given any observed strength sequences (or strength and degree sequences for the DECM), we can solve the constrained entropy maximization. In particular, we solve the likelihood equations characterizing the two models: for model A we determine the vectors $\vec\beta^{out}$ and $\vec\beta^{in}$ by solving the system of equations (13), while for model B we compute the quantities (18). For both models A and B, we used as prior binary distribution the linkage probabilities from the dcGM model (see (25) in the Appendix for details), since methods using such a probability form have won several independent horse races [10][11][12].
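For model A, the system (13) must be solved numerically. A possible sketch (our own, not the authors' released code) uses a bounded least-squares solver, which also copes with the gauge freedom $\beta_i^{out}\to\beta_i^{out}+c$, $\beta_i^{in}\to\beta_i^{in}-c$ that leaves every sum $\beta_i^{out}+\beta_j^{in}$ unchanged:

```python
import numpy as np
from scipy.optimize import least_squares

def solve_crem_a(f, s_out, s_in):
    """Solve s_i^out = sum_{j!=i} f_ij/(b_i^out + b_j^in) and the analogous
    in-strength equations for the 2N Lagrange multipliers of CReM_A."""
    n = len(s_out)
    mask = ~np.eye(n, dtype=bool)            # exclude self-loops

    def residuals(beta):
        b_out, b_in = beta[:n], beta[n:]
        mean_w = np.where(mask, f / (b_out[:, None] + b_in[None, :]), 0.0)
        return np.concatenate([mean_w.sum(axis=1) - s_out,
                               mean_w.sum(axis=0) - s_in])

    sol = least_squares(residuals, x0=np.ones(2 * n), bounds=(1e-12, np.inf))
    return sol.x[:n], sol.x[n:]
```

The choice of `least_squares` rather than a plain root-finder is deliberate: the Jacobian of the system is rank-deficient along the gauge direction, and trust-region least squares handles this degeneracy gracefully while keeping all multipliers positive.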
Once we have obtained those parameters, we can compute the likelihood of observing the original network when reconstructing it through the different models. In Figure 2 we can see the comparison of the values of the (generalized) likelihood relative to the models CReM_A, CReM_B and DECM, on several snapshots of two different real-world datasets, the World Trade Web (WTW) and e-MID [8]. Since $\mathcal{G}$ is the (weighted average of the) logarithm of a probability, it takes negative values. The closer its value is to zero, the better the model explains the data. Therefore, when using the likelihood to compare two models, we shall prefer the model with the higher value of $\mathcal{G}$. As we can see from the figure, CReM_B performs better on the WTW dataset, while the performances are extremely similar on the e-MID one.
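Once the rates are known, the generalized likelihood itself is a one-line computation. The sketch below (helper name and interface are ours) implements the CReM_B form (16); for CReM_A one would pass $\beta_{ij}=\beta_i^{out}+\beta_j^{in}$. We assume $f_{ij}>0$ off the diagonal so that the logarithm is well defined:

```python
import numpy as np

def generalized_likelihood(beta, f, w_obs):
    """G = sum_{i!=j} [ f_ij * ln(beta_ij) - beta_ij * w_ij ] for an
    exponential conditional weight model with marginal link probabilities f."""
    n = beta.shape[0]
    mask = ~np.eye(n, dtype=bool)            # self-loops excluded
    return float(np.sum(f[mask] * np.log(beta[mask]) - beta[mask] * w_obs[mask]))
```

Being (a weighted average of) a log-probability, the returned value is negative for any realistic input, and larger (closer to zero) values indicate a better fit.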
Generalized AIC test

A very common test for model selection is the Akaike Information Criterion (AIC) [5]. For a given set of data and a generic model $m$, the standard AIC recipe can be generalized as

$$\mathrm{AIC}_m = 2k_m - 2\mathcal{G}_m,$$

where $k_m$ is the number of estimated parameters of the model and $\mathcal{G}_m$ the value of the model's generalized likelihood computed on the observed data. The number of parameters estimated by each model is $k_{DECM}=4N$, $k_{CReM_A}=2N+1$ and $k_{CReM_B}=2N+1$.
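The resulting comparison can be scripted directly; the likelihood values below are hypothetical, for illustration only:

```python
def generalized_aic(g, k):
    """Generalized AIC: 2*k - 2*G, with G the generalized log-likelihood."""
    return 2 * k - 2 * g

n = 100                                  # hypothetical network size
models = {                               # hypothetical likelihood values
    'DECM':   {'g': -1250.0, 'k': 4 * n},
    'CReM_A': {'g': -1180.0, 'k': 2 * n + 1},
    'CReM_B': {'g': -1100.0, 'k': 2 * n + 1},
}
ranking = sorted(models, key=lambda m: generalized_aic(models[m]['g'],
                                                       models[m]['k']))
# lower AIC is better, so ranking[0] is the preferred model
```

With these illustrative numbers the ranking reproduces the pattern discussed in the text: CReM_B is preferred, having both the highest likelihood and the lowest parameter count.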
The additional parameter comes from constraining the link density, as in the dcGM recipe (see the Appendix for details). For both CReM_A and CReM_B the sufficient statistics are the vectors of in- and out-strengths and the total density of links. Given this observation, the AIC test yields the same model ranking as the pure likelihood comparison: the best likelihood performance is achieved by CReM_B, followed by CReM_A and finally by the DECM, and since the ranking in terms of complexity is coincident, the AIC comparison again ranks CReM_B first, followed by CReM_A and finally by the DECM. Given the good performance of model B and its low complexity, we recommend this choice and we will soon release a code to make its application fast and straightforward.

Discussion

The method introduced in this paper aims at filling a methodological gap. Several proposed methods for network reconstruction combine methodologically different steps to carry out the estimation of the topology and of the weights, thus introducing biases in the whole procedure. A first source of bias is encountered when a probabilistic recipe for topological reconstruction is forced to produce a single outcome instead of considering the entire ensemble of admissible configurations. This choice implies a null likelihood of reproducing the actual network. A second source of bias is encountered when the weight structure is deterministically imposed via a recipe like the RAS one. Again, this recipe ensures a zero likelihood of reproducing the real underlying network: the probability of correctly "guessing" all the link weights is null. Here we reconcile the two aspects by providing a recipe that clarifies how weights should be determined once an algorithm for determining the topology of a given network is implemented (be it probabilistic or deterministic). Notice that the key concept of our approach, i.e.
conditional entropy, generalizes traditional approaches which, instead, aim at jointly determining a given network structure: this is immediately seen by rewriting the probability distribution coming from Shannon entropy maximization as in (3). However, although it is clear that the topology of a network should represent a constraint for the weighted configuration, it is not necessarily true that the weighted configuration has an impact on the topology as well. The CReM takes a different approach: it assumes only the first dependency, allowing the network topology to be unaware of the weighted configuration. On a more practical level, the solution of both models A and B requires a lower complexity with respect to the DECM. The latter, in fact, requires the solution of a system of $4N$ equations coupling four sets of parameters, while model A only involves a system of $2N$ equations coupling two sets of parameters. Model B, on the other hand, despite being formally similar in derivation, does not require solving any system of equations, since the involved parameters are computed via the recipe (18). In addition to this, model B shows better or comparable performance with respect to the alternative methods, both in terms of likelihood and of confidence interval-based comparisons.
The Conditional Reconstruction Method works for any choice of the procedure used to determine $P(A)$. The overall goodness of fit will also depend on the correctness of the chosen binary model. Here we propose two possible methods from the literature, to be selected depending on the available information. One possibility is to derive $P(A)$ by maximizing the Shannon entropy

$$S = -\sum_A P(A)\,\ln P(A) \quad (22)$$

while constraining the in- and out-degree sequences

$$\langle k_i^{out}\rangle = k_i^{out}(A^*), \quad \langle k_i^{in}\rangle = k_i^{in}(A^*), \quad \forall i. \quad (23)$$

This step corresponds to solving the DBCM model [20].
The resulting probability takes the form

$$p_{ij} = \frac{x_i^{out}\,x_j^{in}}{1+x_i^{out}\,x_j^{in}}, \quad (24)$$

where $x_i \equiv e^{-\alpha_i}$ and the $\alpha$'s are the Lagrange multipliers associated with the degrees. Assuming not to know the degree sequences, and using the empirically-observed correlation between the degree-induced Lagrange multipliers and the strengths [13], we can replace $x_i^{out}x_j^{in}$ with $z\,s_i^{out}s_j^{in}$, obtaining

$$p_{ij} = \frac{z\,s_i^{out}\,s_j^{in}}{1+z\,s_i^{out}\,s_j^{in}}, \quad (25)$$

where the free parameter $z$ is calibrated so that the expected number of links matches the desired link density. This corresponds to using the linkage probabilities defining the density-corrected Gravity Model (dcGM) [13]. We have just shown some possible choices for the functional form of $p_{ij}$; other possibilities are discussed in [20].
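Calibrating $z$ in (25) amounts to a one-dimensional root-finding problem, since the expected number of links is monotone in $z$. A sketch under the assumption of strictly positive strengths (function names are ours):

```python
import numpy as np
from scipy.optimize import brentq

def dcgm_link_probabilities(s_out, s_in, n_links):
    """dcGM: p_ij = z*s_i^out*s_j^in / (1 + z*s_i^out*s_j^in), with z chosen
    so that the expected number of links equals n_links."""
    n = len(s_out)
    mask = ~np.eye(n, dtype=bool)            # no self-loops
    prod = np.outer(s_out, s_in)

    def link_excess(z):
        p = z * prod / (1.0 + z * prod)
        return p[mask].sum() - n_links

    z = brentq(link_excess, 1e-15, 1e15)     # monotone, so bracketing works
    p = z * prod / (1.0 + z * prod)
    p[~mask] = 0.0
    return p
```

Because the expected link count grows monotonically from 0 towards $N(N-1)$ as $z$ increases, any target density strictly between these extremes is bracketed and the root is unique.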

Relation between Entropy and Likelihood
Let us consider the case of the maximization of the Shannon entropy

$$S = -\sum_{G\in\mathcal{G}} P(G)\,\ln P(G) \quad (26)$$

under a set of constraints $\vec{C}$, where $P$ is the distribution we wish to determine over the space of graphs $G\in\mathcal{G}$. It can be shown that $P$ has the functional form given by the Exponential Random Graph model, $P(G) = \frac{e^{-H(G)}}{Z}$, where $H(G) = \sum_\alpha \lambda_\alpha C_\alpha(G)$ is a linear combination of the constraints and $Z = \sum_{G} e^{-H(G)}$ is the partition function. We define $\vec\lambda^*$ as the value of the parameters such that the constraints are satisfied, that is, such that

$$\langle C_\alpha\rangle_{\vec\lambda^*} = C_\alpha(G^*), \quad \forall\alpha. \quad (27)$$

Now, we can rewrite (26) as

$$S = \langle H(G)\rangle + \ln Z. \quad (28)$$

When we evaluate this at $\vec\lambda \equiv \vec\lambda^*$, we obtain

$$S(\vec\lambda^*) = \langle H(G)\rangle_{\vec\lambda^*} + \ln Z(\vec\lambda^*). \quad (29)$$

Now, using (27) and the linearity of $H$ in the constraints, we have

$$\langle H(G)\rangle_{\vec\lambda^*} = \sum_\alpha \lambda_\alpha^* \langle C_\alpha\rangle_{\vec\lambda^*} = \sum_\alpha \lambda_\alpha^* C_\alpha(G^*) = H(G^*).$$

Therefore, (29) becomes $S(\vec\lambda^*) = H(G^*) + \ln Z(\vec\lambda^*) = -\ln P(G^*|\vec\lambda^*)$.
Since the expression $\mathcal{L}(\vec\lambda^*) = \ln P(G^*|\vec\lambda^*)$ is the log-likelihood associated with the parameter choice that ensures that the chosen constraints are realized on average over the ensemble, the last equation tells us that, when evaluated at $\vec\lambda^*$, the entropy itself, $S(\vec\lambda^*) = -\mathcal{L}(\vec\lambda^*)$, can be used to assess the goodness of fit of a model.

The DECM model for continuous weights
The Directed Enhanced Configuration Model with continuous weights is obtained by maximizing the Shannon entropy relative to the probability distribution over the space of weighted graph configurations with a given number of nodes. The constraints derive from keeping constant, on average, the in- and out-degree and strength sequences:

$$\langle k_i^{out}\rangle = k_i^{out}(W^*), \quad \langle k_i^{in}\rangle = k_i^{in}(W^*), \quad \langle s_i^{out}\rangle = s_i^{out}(W^*), \quad \langle s_i^{in}\rangle = s_i^{in}(W^*), \quad \forall i.$$

The probability distribution, as in the general Exponential Random Graph framework [18], takes the functional form

$$Q(W) = \frac{e^{-H(W)}}{Z}, \qquad H(W) = \sum_{i\neq j}\left[\left(\alpha_i^{out}+\alpha_j^{in}\right)a_{ij} + \left(\beta_i^{out}+\beta_j^{in}\right)w_{ij}\right],$$

where $Z = \int_{\mathbb{W}} e^{-H(W)}\,dW$ is the partition function, which implies the pairwise factorization

$$Q(W) = \prod_{i\neq j} q_{ij}(w_{ij}), \qquad q_{ij}(w) = \begin{cases} 1-p_{ij}, & w=0,\\[2pt] p_{ij}\left(\beta_i^{out}+\beta_j^{in}\right)e^{-(\beta_i^{out}+\beta_j^{in})\,w}, & w>0,\end{cases}$$

with $p_{ij} = \frac{x_i^{out}x_j^{in}}{x_i^{out}x_j^{in}+\beta_i^{out}+\beta_j^{in}}$ and $x_i \equiv e^{-\alpha_i}$. The likelihood function to maximize in order to estimate the model parameters reads

$$\mathcal{L}(\vec\alpha,\vec\beta) = \ln Q(W^*) = \sum_{i\neq j}\left[a_{ij}^*\,\ln p_{ij} + (1-a_{ij}^*)\,\ln(1-p_{ij}) + a_{ij}^*\,\ln\!\left(\beta_i^{out}+\beta_j^{in}\right) - \left(\beta_i^{out}+\beta_j^{in}\right)w_{ij}^*\right].$$
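The factorized form above makes per-pair sampling straightforward: each entry is zero with probability $1-p_{ij}$ and exponential otherwise. A minimal sketch (our own illustration, assuming the multipliers are already estimated):

```python
import numpy as np

def sample_decm_pair(x_out_i, x_in_j, b_out_i, b_in_j, rng):
    """Draw one weight w_ij from the continuous DECM pair distribution:
    P(w = 0) = 1 - p_ij, and w | w > 0 ~ Exp(b_i^out + b_j^in), with
    p_ij = x_i^out * x_j^in / (x_i^out * x_j^in + b_i^out + b_j^in)."""
    beta = b_out_i + b_in_j
    p_link = x_out_i * x_in_j / (x_out_i * x_in_j + beta)
    if rng.random() >= p_link:
        return 0.0                       # no link between i and j
    return rng.exponential(1.0 / beta)   # exponential weight on the link
```

For instance, with all four multipliers equal to 1 the link probability is $1/3$ and the conditional mean weight is $1/2$, which a large sample reproduces to within statistical error.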

Confidence Intervals-based comparison
In order to introduce the second comparison method, we need to make some observations about the conditional weight distribution $Q(W|A)$. For all the methods, this distribution factorizes into the product of the individual terms $q_{ij}(w|a_{ij})$. For CReM_A, this term reads

$$q_{ij}(w|a_{ij}=1) = \left(\beta_i^{out}+\beta_j^{in}\right)e^{-(\beta_i^{out}+\beta_j^{in})\,w};$$

given the compatibility requirement, we have $q_{ij}(w|a_{ij}=0) = \delta(w)$. The weight distribution, conditional on the existence of a link, thus follows an exponential distribution of parameter $\beta_i^{out}+\beta_j^{in}$. The same observation holds for CReM_B, where the exponential parameter is now given by $\beta_{ij}$. We then observe that both models show the same functional form of the conditional weight distribution as the DECM (see (20)); however, the numerical values of the parameters characterizing the distribution are different.
Let us denote the parameter of the exponential distribution as $\lambda_{ij}$, so as to keep the reasoning general. Given the knowledge of the weight distribution, we can build confidence intervals around the expected value of the weight. Figure 3 gives an illustration of the procedure. The conditional expected value of the weight estimate is indicated as $\bar{w} = 1/\lambda_{ij}$. The blue curve represents the probability density function of the exponential distribution of parameter $\lambda_{ij}$. Our goal is to determine the extremes of the confidence interval, $w^-$ and $w^+$, in such a way that the area under the pdf comprised between $w^-$ and the expected weight is equal to the desired confidence level $q^-$, and analogously for the upper part of the interval. This is achieved by solving

$$\int_{w^-}^{\bar{w}} \lambda_{ij}\,e^{-\lambda_{ij}w}\,dw = q^-, \quad\text{leading to}\quad w^- = -\frac{1}{\lambda_{ij}}\ln\!\left(q^- + e^{-1}\right),$$

and

$$\int_{\bar{w}}^{w^+} \lambda_{ij}\,e^{-\lambda_{ij}w}\,dw = q^+, \quad\text{leading to}\quad w^+ = -\frac{1}{\lambda_{ij}}\ln\!\left(e^{-1} - q^+\right),$$

where we used $e^{-\lambda_{ij}\bar{w}} = e^{-1}$. In this way, upon deciding on the desired confidence levels $q^-$ and $q^+$, we are able to define a confidence interval for the conditional expected weights. Notice that such a confidence interval is not symmetric, given the peculiar form of the underlying probability distribution (i.e. exponential). Finally, given the confidence intervals for each weight estimate, we can compute the proportion of real weights that fall into the confidence interval relative to their estimate. We can do this for all the methods, using the same $q^+$ and $q^-$ and the different $\lambda_{ij}$. The results are shown in Figure 4. As for the likelihood-based comparison, model B shows a comparable or better performance than method A, thus confirming our previous considerations. Notice that both the ECM [15] and the DECM can provide an error estimation as well; its computation, however, is much easier within the novel, continuous framework considered here.
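The closed-form interval bounds can be implemented and checked directly. A sketch (function name is ours; valid for $q^+ < e^{-1}$, otherwise the upper bound is unbounded):

```python
import numpy as np

def exponential_ci(lam, q_minus, q_plus):
    """Asymmetric confidence interval around the conditional mean 1/lam of an
    exponential weight: probability mass q_minus lies between w_minus and the
    mean, and mass q_plus between the mean and w_plus (requires q_plus < 1/e)."""
    lam = np.asarray(lam, dtype=float)
    e_mean = np.exp(-1.0)                  # exp(-lam * mean), since mean = 1/lam
    w_minus = -np.log(q_minus + e_mean) / lam
    w_plus = -np.log(e_mean - q_plus) / lam
    return w_minus, w_plus
```

Because `lam` may be an array, the same call produces the bounds for every link at once; the asymmetry of the interval around the mean is apparent from the two different logarithms.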