Detangling the multilayer structure from an aggregated network

Multiplex interactions are common and essential in real-world systems. In many cases, we can only obtain aggregated networks without detailed information regarding the types of links contained within. Such single-layer networks oversimplify the structural information and lead to misunderstandings of some properties of real systems. In this context, network splitting, which aims to correctly separate an aggregated network into a multilayer network, is a meaningful problem to address. To this end, we propose a simulated-annealing-like algorithm based on the link clustering coefficient. We verify the validity of this algorithm with several synthetic networks. Inter-similarities of layers are also taken into consideration, and we find that the proposed method is valid even if there is a certain proportion of overlapping links between layers. Finally, we apply the algorithm to real international trading networks, which results in accurate splits of different layers.


Introduction
Networks can be used to characterize the structure of many complex systems and provide powerful representations of the interactions between agents. The relations among social individuals [1], biological interactions [2], traffic routes, and many other circumstances [3,4] can be represented as networks, which reveal properties of these systems through their structural information. Such networks consist of only a single type of entity. However, networks that neglect the types of relations oversimplify the complexity of the systems' structural properties. In most social and natural systems, the same set of entities can be connected through multiple types of interactions [5][6][7]. Therefore, it is important to use multilayer networks to represent such complex systems with multiple types of connections or subsystems. In social networks, the relationships between people can be composed of many interactive relationships, such as families, friends, and co-workers [1,8]. A brain network that consists of structural and functional layers is potentially more informative than either single-layer network on its own [9,10]. The results of previous studies have demonstrated the differences in robustness between networks of networks and aggregated networks under attack [11,12]. Ignoring the multiplex nature of networks results in information loss and leads to inaccurate interpretations.
Currently, the study of the complexity of multilayer networks is an essential research direction, and intense efforts have been made to investigate multiplex systems [13]. Multilayer networks can disclose more attributes of complex systems than single-layer networks [14,15]. Research has demonstrated the various results obtained when analyzing a system from a multilayered perspective [16]. The results showed that neglecting multiple layers can result in incorrect identification [17]. Many studies have focused on multilayer networks. For example, frameworks for calculating node centrality have been proposed to find the central roles when considering different types of relations in different layers [18]. Various types of dynamic processes in multilayer networks have also been discussed, and one such study provided a useful method for improving the understanding of the complexity in a system. In addition, the existence of multiple layers has a drastic impact on the dynamics of the overall system [19][20][21][22]. Some of the interactions in multilayer network representations are considered to be redundant and uninformative [23]. In this situation, some layers can be aggregated [24] while maximizing the distinguishability of the remaining layers based on quantum theory and retaining as much information as possible [17]. Another framework for reduction aggregates the system layers by using eigenvector-based centrality [25].
Figure 1. Illustration of network separation. Network C is composed of 10 nodes and the links among them are of two types. The problem we want to solve is to separate C into a two-layer network without knowing the real relations in advance. The best result distinguishes the interactions perfectly, with one type of link in each network.
For link prediction in multilayer networks, the helpful information contained in the link from one layer to another gives us a principled way to define interdependence between layers and compress redundant information [26].
The studies referenced above assumed that the interactions in each layer were known, but in many real systems we may not have complete information about the relations. Therefore, there is still an important problem that has not been studied well. Sometimes, we can only obtain an aggregated network without knowing the types of relationships within. Such networks, which neglect the existence of different interactions between individuals, may lose much information. Taking social networks as an example, the relationships between people may include friendship, family relationships, etc. These relationships are naturally modeled by a multilayer network in which each type of relationship constitutes a network layer. Therefore, it is essential to find a method to distinguish between different types of relations. In this paper, we propose a framework for splitting an aggregated network into a multilayer network.
In this paper, we aim to address the question of how to develop a universal framework for splitting aggregated networks with no extra supporting structural information. To solve this problem, we propose a simulated-annealing-like algorithm based on the link clustering coefficient to split one aggregated network into a two-layer network with different types of relations in each layer separately. The method is validated with both artificial networks and real networks (e.g. international trading networks). The results show that our method can perform well for networks with high clustering coefficients. We also consider the overlap ratio of the two layers and demonstrate the validity of our method for networks that do not highly overlap. Experimental results with respect to two applications on real systems corroborate the validity of the proposed approach.

Method
In this paper, our aim, as shown in figure 1, is to split an observed network C into two networks A and B, which represent the separated layers, each containing a single type of relationship. To solve this problem, we define an objective function (see equation (1)), which is the minimum of the link clustering coefficients of networks A and B. Note that the link clustering coefficient of link ij is defined as $N_{ij}/(k_i k_j)$, where $N_{ij}$ is the number of common neighbors of nodes i and j, and $k_i$ and $k_j$ are the degrees of nodes i and j, respectively. The basic idea is that if the edges are split correctly, the link clustering coefficient in each layer must be higher than that under a wrong split. The link clustering coefficient can be regarded as a measure of the similarity between nodes. If the similarity between two nodes is higher, there is a larger probability of these two nodes being connected in this layer. We adopt this definition because it indicates whether an edge should exist in a layer by measuring the similarity of its end nodes. Such an objective function can avoid the situation where one network is very dense while the other is almost empty. The goal is to maximize the objective function.
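The link clustering coefficient and the min-based objective described above can be sketched in a few lines of Python. This is only an illustrative reading of the text: the function names are hypothetical, and aggregating the per-link coefficients of a layer by their mean is an assumption, since equation (1) is not reproduced here.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Return an undirected adjacency map {node: set of neighbours}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def link_clustering(adj, i, j):
    """N_ij / (k_i * k_j) for link (i, j), computed inside one layer."""
    common = len(adj[i] & adj[j])
    return common / (len(adj[i]) * len(adj[j]))

def layer_score(edges):
    """Mean link clustering coefficient of a layer (the mean is an assumption)."""
    if not edges:
        return 0.0
    adj = build_adjacency(edges)
    return sum(link_clustering(adj, u, v) for u, v in edges) / len(edges)

def objective(edges_a, edges_b):
    """Minimum of the two layer scores; this is the quantity to maximize."""
    return min(layer_score(edges_a), layer_score(edges_b))

triangle = [(0, 1), (1, 2), (0, 2)]   # every link has one common neighbour
path = [(3, 4), (4, 5)]               # no link has a common neighbour
score_good = objective(triangle, triangle)
score_bad = objective(triangle, path)
```

Because the objective takes the minimum, a split in which one layer has no clustered links scores zero regardless of how well the other layer is clustered.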
Then we use a simulated-annealing-like algorithm to optimize the objective function. Initially, C is randomly split into two networks A and B with an equal number of links. During each step, we randomly select one network (A or B) and choose one of its links; this link is then placed in the other network. For example, if A is selected, we choose a link from A randomly and move it to network B. This modification is accepted if either of two conditions holds: (1) the objective function is improved; (2) a randomly generated number in [0, 1] is smaller than q, a value that decreases as the number of steps increases, where q = 1 at first and q = q * 0.999 after each step. At the beginning, q is large, so the changes in links are quite random. As the number of iterations increases, q decreases, and the modifications depend more on condition (1) than on (2). This technique can help the optimization process jump out of local optima. In order to avoid a situation where many nodes in one layer have degree zero, we introduce a penalty term, $(1 - n_0)/N$, where $n_0$ is the number of nodes whose degree is zero, to penalize the objective function in such situations. Eventually, we hope to obtain networks A and B that each contain a single type of interaction. The proposed method splits the aggregated network based on the clustering of links and the assumption of high levels of clustering in real networks.
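The optimization loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `anneal_split`, `steps`, and `decay` are hypothetical, the layer score is taken to be the mean link clustering coefficient, and the zero-degree penalty term is omitted for brevity.

```python
import random
from collections import defaultdict

def layer_score(edges):
    """Mean of N_ij / (k_i * k_j) over the links of one layer (assumed aggregation)."""
    if not edges:
        return 0.0
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = sum(len(adj[u] & adj[v]) / (len(adj[u]) * len(adj[v])) for u, v in edges)
    return total / len(edges)

def objective(layer_a, layer_b):
    return min(layer_score(layer_a), layer_score(layer_b))

def anneal_split(edges, steps=5000, decay=0.999, seed=0):
    """Split `edges` into two layers by single-link moves with a decaying acceptance rate q."""
    rng = random.Random(seed)
    pool = list(edges)
    rng.shuffle(pool)
    half = len(pool) // 2
    layers = [set(pool[:half]), set(pool[half:])]   # random initial split, equal sizes
    current = objective(*layers)
    q = 1.0
    for _ in range(steps):
        src = rng.randrange(2)                      # pick layer A or B at random
        if len(layers[src]) > 1:                    # never empty a layer completely
            link = rng.choice(sorted(layers[src]))
            layers[src].remove(link)
            layers[1 - src].add(link)
            new = objective(*layers)
            if new > current or rng.random() < q:   # condition (1) or condition (2)
                current = new
            else:                                   # reject: undo the move
                layers[1 - src].remove(link)
                layers[src].add(link)
        q *= decay                                  # q = q * 0.999 after each step
    return layers[0], layers[1], current
```

With the default decay of 0.999, q falls below 0.01 after roughly 4600 steps, so later steps are accepted almost exclusively through condition (1). The sketch recomputes the objective from scratch at every step, which is simple but slow; an incremental update would be preferable for large networks.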

Simulation in synthetic networks
To begin our analysis, we consider regular networks (nearest-neighbor networks) in which each node connects to the k nodes with the closest IDs to its own. To obtain an aggregated network with two types of relations, we generate two networks (with N = 100 and k = 4), and then combine them into one network as the observed network C. Note that, when we combine A and B to obtain the mixed network C, the IDs of the nodes in both A and B are randomly reordered. This process can be regarded as adding the links in network B to network A. As the IDs of the nodes in these two networks are randomly reordered, a short-range link in B may connect two very distant nodes in A when the links of B are added to A. To simplify the problem, we remove all overlapping links. The results are shown in figures 2(a) and (b). We calculate the number of error links to demonstrate the validity of this algorithm. We can see that the error decreases as the number of steps increases and finally almost reaches zero. We then consider a denser network (with N = 100 and k = 8) and a more complicated case where A and B have different densities (with $k_A = 4$ and $k_B = 8$). The results remain robust in these situations.
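The construction of the synthetic benchmark above can be sketched as follows. The helper names are hypothetical, and three details are assumptions rather than statements from the text: k is taken to be even with k/2 neighbours on each side, overlapping links are removed from both layers, and the error count is taken as the minimum over the two possible layer labelings, since the layers are interchangeable.

```python
import random

def ring_lattice(n, k):
    """Each node links to its k nearest neighbours by ID (k/2 on each side, k even)."""
    return {tuple(sorted((i, (i + d) % n)))
            for i in range(n) for d in range(1, k // 2 + 1)}

def relabel(edges, n, rng):
    """Apply a random permutation to the node IDs of a layer."""
    perm = list(range(n))
    rng.shuffle(perm)
    return {tuple(sorted((perm[u], perm[v]))) for u, v in edges}

def aggregate(n=100, k_a=4, k_b=4, seed=0):
    """Build two reordered regular layers, drop overlapping links, return (A, B, C)."""
    rng = random.Random(seed)
    a = relabel(ring_lattice(n, k_a), n, rng)
    b = relabel(ring_lattice(n, k_b), n, rng)
    overlap = a & b
    a, b = a - overlap, b - overlap
    return a, b, a | b

def error_links(true_a, true_b, found_a, found_b):
    """Misplaced links, taking the better of the two layer labelings (assumed convention)."""
    direct = len(found_a - true_a) + len(found_b - true_b)
    swapped = len(found_a - true_b) + len(found_b - true_a)
    return min(direct, swapped)

layer_a, layer_b, observed = aggregate(n=100, k_a=4, k_b=8, seed=42)
```

The set `observed` is what a splitting algorithm would receive as input, while `layer_a` and `layer_b` serve only as ground truth for `error_links`.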
Next we apply this method to small-world (SW) networks with different rewiring probabilities p [27]. Network layers A and B are both SW networks with N = 100, k = 4, and a tunable rewiring probability p. Then we obtain the observed mixed network C = A + B. Note that, when we combine A and B to obtain C, the IDs of the nodes in both A and B are randomly reordered. The results are shown in figure 3. The error decreases as the number of steps increases when p is small. If p > 0.1 (see figure 3(c)), the accuracy starts to decrease, because the clustering property is destroyed when p is too large (see more details in section S1 (https://stacks.iop.org/NJP/23/073046/mmedia) in the supplementary materials). We also use the true split as the initial configuration, and in this case the error does not fluctuate greatly with the number of steps (see figures 3(d) and (e)). This is because, if the objective function is effective, the true split of the aggregated network should correspond to the optimal value of the objective function. This also indicates that the link clustering coefficient is a suitable objective function.
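The loss of clustering at large p mentioned above can be illustrated with a small self-contained sketch. The generator below is a simple Watts-Strogatz-style variant written for illustration (one endpoint of each lattice link is rewired with probability p); it is an assumption that this matches the generator used for the experiments, and the function names are hypothetical.

```python
import random

def small_world(n, k, p, seed=None):
    """Ring lattice with k neighbours per node; each link is rewired with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    for d in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + d) % n
            if rng.random() < p:
                # New endpoint must avoid self-loops and duplicate links.
                candidates = [w for w in range(n) if w != i and w not in adj[i]]
                if not candidates:
                    continue
                w = rng.choice(candidates)
                adj[i].discard(j)
                adj[j].discard(i)
                adj[i].add(w)
                adj[w].add(i)
    return adj

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

clustering_by_p = {p: average_clustering(small_world(100, 4, p, seed=1))
                   for p in (0.0, 0.05, 0.1, 0.3, 1.0)}
```

At p = 0 the network is the regular lattice, whose clustering coefficient for k = 4 is exactly 0.5; as p grows, the triangles that the link clustering coefficient relies on are progressively destroyed.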
To further support our findings, we apply our method to SW networks with different values of p in figure 4. The error increases with p, and p = 0.1 is a critical value. The method performs well when p < 0.1. A high clustering coefficient is the basic characteristic of SW networks, so we believe that it is an important factor that influences the validity of the proposed method. The results verify this relationship (see figures 4(c) and (d)). Three of the real cases are also consistent with the conclusion in section S2 of the supplementary materials. (We disorder the nodes to obtain a new network B and consider the original network as A.) According to figure 4, the trends of the two initial configurations (random split and true split) are similar. We want to know whether we can use the true split to prove the validity of the simulated-annealing-like algorithm. We therefore contrast the results of the two initial configurations in figure 4 and find no obvious difference.
We extend this method to a three-layer network and calculate the number of wrong edges and the value of the objective function. The three-layer network consists of three regular networks. The aggregated network is obtained by combining these three networks after randomly reordering the IDs of the nodes in each layer. We find that the three layers can be well identified by our method. The results are shown in figure 5.

Layer inter-similarity
In reality, one node may interact with others in layers A and B simultaneously. In this section, we perform the simulation under different layer inter-similarities between the two layers. To this end, we construct networks with different layer inter-similarities. Considering a network with N nodes, we choose a certain ratio of nodes and maintain the interactions and structure among these nodes, and we define this ratio Q as the index for measuring the layer inter-similarity. Then, we disorder the nodes that are not chosen, so the relations among them are quite different from those in the original network. Our proposed method cannot deal with overlapped links, and there are more overlapped links as Q increases. Therefore, we place them into the two networks randomly. Compared with the method that removes all the overlapped links, the random method has less influence on the densities of the networks if the ratio Q is high. After the above process is completed, we obtain two networks A and B and aggregate them into one network C. The results of applying the algorithm to such networks with different values of Q are as follows. Figure 6(a) is an example of regular networks (N = 100 and k = 4) with Q = 0.1, and we can see that the error decreases to a low value as the number of steps increases. To explore this phenomenon further, we examine the influence of Q in figure 6(b). Larger values of Q result in higher numbers of errors. We also study the validity of the algorithm in SW networks in figures 6(c) and (d) (random split and true split as the initial configurations, respectively). The results show that our algorithm can separate the network even with a relatively high layer inter-similarity. We also consider other methods to deal with the overlapping links in section S3 in the supplementary materials.
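The construction of layers with a tunable inter-similarity Q can be sketched as follows. This is one possible reading of the procedure in this section, with hypothetical function names: a fraction Q of the nodes keep their IDs, the remaining IDs are permuted among themselves, and each overlapping link is then assigned to exactly one of the two layers with equal probability (the equal probability is an assumption).

```python
import random

def ring_lattice(n, k):
    """Regular nearest-neighbour network with k/2 neighbours on each side."""
    return {tuple(sorted((i, (i + d) % n)))
            for i in range(n) for d in range(1, k // 2 + 1)}

def make_similar_layer(edges_a, n, q, rng):
    """Copy layer A, keeping a fraction q of node IDs fixed and shuffling the rest."""
    kept = set(rng.sample(range(n), round(q * n)))
    free = [v for v in range(n) if v not in kept]
    shuffled = free[:]
    rng.shuffle(shuffled)
    mapping = {v: v for v in kept}
    mapping.update(zip(free, shuffled))
    return {tuple(sorted((mapping[u], mapping[v]))) for u, v in edges_a}

def assign_overlaps(layer_a, layer_b, rng):
    """Keep each overlapping link in exactly one layer, chosen at random."""
    a, b = set(layer_a), set(layer_b)
    for link in sorted(a & b):
        if rng.random() < 0.5:
            a.discard(link)
        else:
            b.discard(link)
    return a, b

rng = random.Random(3)
base = ring_lattice(100, 4)
similar = make_similar_layer(base, 100, 0.1, rng)
layer_a, layer_b = assign_overlaps(base, similar, rng)
observed = layer_a | layer_b
```

Since the relabeling is a bijection on the node set, the second layer always has the same number of links as the first before overlaps are resolved; the number of overlaps, and hence the difficulty of the split, grows with Q.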

Application in real networks
To verify the algorithm further, we apply it to real-world systems, namely international trading networks [28] (figure 7). For the international trading networks, we can separate almost 70% of the links, which is much better than the result of a random split. The two layers are the export relations of different kinds of products, namely 'vegetables & fruits' and 'cultural & artware products' such as books and art. We calculate the revealed comparative advantage (RCA) index values [29] of all countries and products according to the export volumes and retain the export information when the RCA is above 1. The layer of one product is established according to the remaining export relationships. We try to distinguish the relationships of different products. In figure 7(a), we show the performance of the algorithm when we split the aggregated network (international trading network) into two networks, with each color representing one type of export product. Network [...] optimum. According to the results, the proportion of error links can be reduced to nearly 30% relative to the random split. This means that we can separate almost 70% of the relations between each pair of products by the proposed algorithm. We also apply our method to a social network consisting of two layers. The layers are, respectively, the friend relationship on Facebook and the work relationship [30] obtained from questionnaires. The two layers are merged into an aggregated network. One can also see that the proportion of error links can be reduced to nearly 36% (see figure S9 in SM).
Table 1. We apply the algorithm to different networks for other products such as 'fossil fuels' and 'fabrics materials'. The splitting results in terms of the proportions of error links and final objective functions are shown in the table. We also compare these with the objective function of the true split.
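The RCA filtering step used to build the trade layers can be sketched as follows. The sketch uses the standard Balassa form of the index, RCA = (x_cp / X_c) / (X_p / X), where x_cp is the export value of product p by country c, X_c and X_p are the country and product totals, and X is the world total; that this exact form was used is an assumption. The function names and the toy data (countries "A" and "B") are hypothetical.

```python
from collections import defaultdict

def rca_index(exports):
    """exports: {(country, product): export value}. Returns {(country, product): RCA}."""
    country_total = defaultdict(float)
    product_total = defaultdict(float)
    for (country, product), value in exports.items():
        country_total[country] += value
        product_total[product] += value
    world_total = sum(country_total.values())
    return {
        (country, product):
            (value / country_total[country]) / (product_total[product] / world_total)
        for (country, product), value in exports.items() if value > 0
    }

def advantaged(exports, threshold=1.0):
    """Country-product pairs whose RCA exceeds the threshold (RCA above 1 in the text)."""
    return {key for key, value in rca_index(exports).items() if value > threshold}

# Toy data, purely illustrative.
toy_exports = {
    ("A", "fruit"): 80.0, ("A", "books"): 20.0,
    ("B", "fruit"): 20.0, ("B", "books"): 80.0,
}
toy_rca = rca_index(toy_exports)
retained = advantaged(toy_exports)
```

In this toy example each country devotes 80% of its exports to a product that makes up 50% of world exports, so the corresponding RCA is 0.8 / 0.5 = 1.6 and only those two country-product pairs are retained.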

Discussion
The separation of an aggregated network is a very important problem from both theoretical and practical points of view. Currently, larger and more detailed data sets than ever before are being generated at an increasingly fast rate, and they are considered to be essential in many aspects of society. The increasing supply of information provides abundant material for analysis, and we can therefore obtain a deep understanding of the structures and mechanisms of different complex systems. There are studies that discuss the reducibility of multilayer networks to reduce the amount of unnecessary information. The problem we are concerned with in this paper is that sometimes we can obtain only one aggregated network but not the detailed relationships among the agents contained within. For example, in some social application networks, the only information we obtain is the relationships of 'following' and 'followed', but the real interactions (friends, family or co-workers) among the users are not shown. We believe that separating such networks is useful for targeted recommendation and spreading control. Hence, we propose a method to split such aggregated networks.
In this paper, we focus on such problems and propose a simulated-annealing-like algorithm. The basic idea is to maximize the link clustering coefficient of the two separated networks by moving links between them. We apply the proposed method to several synthetic networks, such as regular and SW networks. The results demonstrate the feasibility of the algorithm. We find that the validity is influenced by the clustering coefficient of the networks and that the algorithm can only perform well in networks with high clustering coefficients. The layer inter-similarity Q is also considered. We can use the method to solve problems as long as Q is not too high. The results in the real-world systems also show the credibility of the proposed algorithm.
We mainly consider multilayer networks with two layers in this paper. The proposed method cannot perform well in all networks. Networks with higher clustering coefficients lead to better performance. Moreover, there are various structural properties that affect the performance of our method. Since our method relies only on structural information to split a network into different layers, it has several limitations, especially when facing some specific structures. Firstly, the method cannot work in a very sparse network, as such a network cannot provide enough structural information (link clustering) for the method to decide which layer a link should belong to. Secondly, the method requires a relatively low overlap of the links in different layers, as shown in figure 6. Finally, degree heterogeneity may decrease the accuracy of the method. In this case, some links of the hubs may be placed in the wrong layers. Although our approach is not exempt from limitations, it highlights the importance of detangling the multilayer structure from observed aggregated networks. It opens a new research direction for analyzing and understanding real networks. We hope to improve its applicability in the future and apply this method to increasingly complicated systems.