Entropy-based economic complexity of China trade flows

The growing interest in the study of economic complexity has been motivated by the intrinsic features of statistical physics. Entropy is one of the essential concepts in statistical physics and a particularly useful concept for characterizing the complexity of a system. Motivated by this, we propose a new method based on a function of the entropy to quantify the economic complexity of China trade flows. We focus on regional economic activities and collect China trade flows from the database of national export and import items. Compared with classic economic complexity measurements, including diversity and ubiquity, the eigenvector-based complexity index, the fitness and complexity index, and the Hirschman-Herfindahl index, the proposed entropy-based method can generate a proper ranking list when we take the USA as a benchmark. Furthermore, the comparison of the main economic complexity statistics reveals that several of them show moderate correlations. The stability of the ranking positions over the 8-year evolution of economic complexity shows that the complexity index that depends on the eigenvector of the product network space is extremely sensitive. The results suggest that attention should be paid to the relationships between economic complexity methods when analysing the structural characteristics of economic entity activities.


Introduction
Over the past decades, physicists have employed the apparatus of physics to analyze complex phenomena and to model the interactions of actors in economics and sociology [1]. Because links naturally connect interacting pairs of agents, complex network theory has been applied with great success to all sorts of complex and diverse systems [2]. Complex networks make it possible to represent the flows of international trade in a natural way [3]-[5]. A critical benefit of this approach is its ability to capture the sophisticated structure of interactions among numerous economic agents, which can help solve various problems in observation, modeling, and forecasting [6]-[9]. Since the structural attributes of world trade data are closely linked to the economic progress of each nation, they have been analyzed in recent research from interdisciplinary physics [10], [11]. Economic growth is accompanied by the upgrading of export products. The activities of economic entities are conditioned by the network space of products built from the national export database. As a result, the connection between econometrics and network

Materials and Methods
This section describes the datasets and methodologies employed in this paper.

Methodology
The bipartite Country-Product network
Revealed Comparative Advantage (RCA) [21] is an empirical measure of whether a country can be regarded as an essential exporter/importer of a product. RCA indicates whether the share of a product in a country's trade with China is greater or smaller than that product's share in the overall China market. Mathematically,
$$RCA_{cp} = \frac{x_{cp} / \sum_{p'} x_{cp'}}{\sum_{c'} x_{c'p} / \sum_{c',p'} x_{c'p'}},$$
where $x_{cp}$ corresponds to country $c$'s export dollars for product $p$. The cut-off for RCA is generally taken as 1: when $RCA_{cp} \ge 1$, country $c$ has a revealed advantage in product $p$, i.e., the product's share in the country's trade is equal to or greater than that product's share in the entire China market. Thus, a binary country-product matrix can be constructed from the RCA matrix by setting $M_{cp} = 1$ if $RCA_{cp} \ge 1$ and $M_{cp} = 0$ otherwise. Applying knowledge from network science, we treat countries and products as two types of nodes, and a link connects country $c$ and product $p$ whenever $M_{cp} = 1$. There are no links between two countries or between two products.
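A minimal Python sketch of the RCA computation and the binarization step above, assuming the trade values are stored in a NumPy array `x` with countries as rows and products as columns, and that every row and column has a non-zero total. The function names `rca_matrix` and `binary_matrix` are hypothetical, not taken from the paper's code.

```python
import numpy as np


def rca_matrix(x: np.ndarray) -> np.ndarray:
    """Balassa RCA_cp for a country-by-product matrix of trade values x[c, p].

    Assumes every country (row) and every product (column) has a non-zero total.
    """
    x = np.asarray(x, dtype=float)
    country_share = x / x.sum(axis=1, keepdims=True)  # x_cp / sum_p' x_cp'
    market_share = x.sum(axis=0) / x.sum()            # sum_c' x_c'p / sum_c'p' x_c'p'
    return country_share / market_share


def binary_matrix(x: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """Binary country-product matrix: M_cp = 1 if RCA_cp >= threshold, else 0."""
    return (rca_matrix(x) >= threshold).astype(int)


# Hypothetical trade values for 3 countries x 3 products.
x = np.array([[10, 0, 5],
              [2, 8, 1],
              [1, 1, 9]])
M = binary_matrix(x)
print(M)
```

In this toy example each country ends up with a revealed advantage in exactly one product, so `M` is the 3x3 identity matrix; the threshold parameter simply exposes the cut-off of 1 used in the text.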
Diversity and ubiquity
A rough estimate of the capabilities available in a country, or required by a product, corresponds to the notions of diversity and ubiquity, respectively. Diversity is the number of products a country exports with RCA, and ubiquity is the number of countries that export a product with RCA. Thus, we can measure diversity (KCI) and ubiquity (KPI) simply as the row sums and column sums of the matrix $M_{cp}$. Formally,
$$k_{c,0} = \sum_{p} M_{cp}, \qquad k_{p,0} = \sum_{c} M_{cp}.$$
Eigenvector-based complexity index [8]
To obtain a more precise measure of the capabilities present in a country, or required by a product, the information carried by diversity and ubiquity needs to be corrected using additional information [8]. For country $c$, this requires calculating the average ubiquity of the products it exports, the average diversity of the countries whose exports are similar to its own, and so forth. For product $p$, this requires calculating the average diversity of the countries exporting it and the average ubiquity of the other products exported by those countries. First, countries $c$ and $c'$ are linked through their common exports by the matrix
$$\tilde{M}_{cc'} = \sum_{p} \frac{M_{cp} M_{c'p}}{k_{c,0}\, k_{p,0}}.$$
Equivalently, products $p$ and $p'$ are linked by the matrix
$$\tilde{M}_{pp'} = \sum_{c} \frac{M_{cp} M_{cp'}}{k_{p,0}\, k_{c,0}}.$$
The ECI is defined as
$$ECI_c = \frac{K_c - \langle K \rangle}{\mathrm{stdev}(K)},$$
where $\langle \cdot \rangle$ denotes the average, $\mathrm{stdev}$ is the standard deviation, and $K$ is the eigenvector of $\tilde{M}_{cc'}$ associated with the second largest eigenvalue.
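The diversity/ubiquity sums and the ECI definition can be sketched in Python as follows. This is a sketch under stated assumptions: `M` is a binary NumPy matrix with no all-zero rows or columns, and the sign of the eigenvector, which an eigen-decomposition leaves arbitrary, is fixed by a convention assumed here (ECI positively correlated with diversity). Instead of calling a general eigen-solver on the non-symmetric $\tilde{M}_{cc'}$, the sketch diagonalizes an equivalent symmetric matrix, which has the same eigenvalues and gives real results; the helper names are hypothetical.

```python
import numpy as np


def diversity_ubiquity(M):
    """Diversity k_c0 (row sums) and ubiquity k_p0 (column sums) of a binary matrix."""
    M = np.asarray(M, dtype=float)
    return M.sum(axis=1), M.sum(axis=0)


def eci(M):
    """Eigenvector-based complexity index of the countries (rows) of M."""
    M = np.asarray(M, dtype=float)
    kc, kp = diversity_ubiquity(M)
    # A_cc' = sum_p M_cp M_c'p / k_p0 is symmetric, and M~_cc' = A_cc' / k_c0.
    A = (M / kp[None, :]) @ M.T
    # M~ = D_c^{-1} A is similar to S = D_c^{-1/2} A D_c^{-1/2} (symmetric),
    # so eigh gives real eigenvalues in ascending order.
    s = 1.0 / np.sqrt(kc)
    S = s[:, None] * A * s[None, :]
    _, U = np.linalg.eigh(S)
    K = s * U[:, -2]                      # right eigenvector of M~ for the 2nd largest eigenvalue
    index = (K - K.mean()) / K.std()      # (K - <K>) / stdev(K)
    # Assumed sign convention: ECI should correlate positively with diversity.
    if np.corrcoef(index, kc)[0, 1] < 0:
        index = -index
    return index


# Hypothetical perfectly nested matrix: 4 countries x 4 products.
M = np.array([[1, 1, 1, 1],
              [1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
kc, kp = diversity_ubiquity(M)
print(kc, kp)
print(eci(M))
```

The product-side index can be obtained in the same way from $\tilde{M}_{pp'}$ by passing `M.T`, with the sign aligned to ubiquity instead of diversity if desired.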
Fitness and complexity index [12]
This index is calculated through a non-linear iteration that computes the fitness $F_c$ of a country and the complexity $Q_p$ of a product until they reach fixed-point values. The fitness $F_c$ is positively related to the sum of exports weighted by complexity, whereas the complexity $Q_p$ is negatively correlated with the number of countries exporting the product. Each iteration consists of two steps. First, the intermediate variables are calculated,
$$\tilde{F}_c^{(n)} = \sum_{p} M_{cp} Q_p^{(n-1)}, \qquad \tilde{Q}_p^{(n)} = \frac{1}{\sum_{c} M_{cp} / F_c^{(n-1)}},$$
and then they are normalized,
$$F_c^{(n)} = \frac{\tilde{F}_c^{(n)}}{\langle \tilde{F}_c^{(n)} \rangle_c}, \qquad Q_p^{(n)} = \frac{\tilde{Q}_p^{(n)}}{\langle \tilde{Q}_p^{(n)} \rangle_p}.$$
The initial conditions are $\tilde{F}_c^{(0)} = 1$ for any country $c$ and $\tilde{Q}_p^{(0)} = 1$ for any product $p$.
Hirschman-Herfindahl Index
The Hirschman-Herfindahl Index (HHI) is frequently used to measure concentration and is defined as
$$HHI_c = \sum_{p} s_{cp}^2, \qquad s_{cp} = \frac{x_{cp}}{\sum_{p'} x_{cp'}}.$$
HHI ranges between $1/N$ and 1, where $N$ is the number of products. The lower the HHI, the more balanced and less concentrated the sectors are.
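The two-step fitness-complexity iteration and the HHI can be sketched as below. Assumptions: `M` is a binary NumPy matrix with no all-zero rows or columns, both updates use the values from the previous iteration, and the loop runs for a fixed number of steps (`n_iter`, a hypothetical parameter) instead of testing convergence. For strongly nested matrices some fitness values can shrink toward zero as the iteration proceeds, so a moderate `n_iter` or an explicit convergence check is advisable. The HHI here is computed per country from raw trade values `x`; all names are illustrative.

```python
import numpy as np


def fitness_complexity(M, n_iter=20):
    """Non-linear fitness-complexity iteration on a binary country-product matrix M."""
    M = np.asarray(M, dtype=float)
    F = np.ones(M.shape[0])  # initial condition: all fitnesses equal to 1
    Q = np.ones(M.shape[1])  # initial condition: all complexities equal to 1
    for _ in range(n_iter):
        # Step 1: intermediate variables from the previous iteration.
        F_tilde = M @ Q                    # sum_p M_cp Q_p
        Q_tilde = 1.0 / (M.T @ (1.0 / F))  # 1 / sum_c (M_cp / F_c)
        # Step 2: normalize by the mean over countries / products.
        F = F_tilde / F_tilde.mean()
        Q = Q_tilde / Q_tilde.mean()
    return F, Q


def hhi(x):
    """HHI_c = sum_p s_cp^2 with shares s_cp = x_cp / sum_p' x_cp' (rows = countries)."""
    x = np.asarray(x, dtype=float)
    s = x / x.sum(axis=1, keepdims=True)
    return (s ** 2).sum(axis=1)


# Hypothetical perfectly nested matrix: 4 countries x 4 products.
M = np.array([[1, 1, 1, 1],
              [1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
F, Q = fitness_complexity(M)
print(F, Q)
print(hhi([[1, 1, 1, 1], [4, 0, 0, 0]]))  # evenly spread -> 0.25, fully concentrated -> 1.0
```

Because each country's basket in this toy matrix contains the baskets of the less diversified ones, the resulting fitness values decrease strictly down the rows.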

Entropy-based economic complexity
Economic growth is accompanied by the upgrading of export products, and the activities of economic entities are conditioned by the network space of products built from the national export database. Structural properties of the trade network can explain differences in economic development across countries. In interdisciplinary and statistical physics, the complex states of systems are frequently captured by entropy. A large entropy value indicates a relatively complex system with many different kinds of states, whereas a small entropy value indicates a comparatively simple system with fewer distinct states. Motivated by this, we quantify economic complexity as a function of entropy, following the concept of entropy in statistical physics; mathematically, the proposed economic complexity index can be defined as, where is that degree.
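As a hedged illustration of the principle that a larger entropy reflects more, and more evenly populated, states, the sketch below computes a plain Shannon entropy $H = -\sum_i p_i \ln p_i$ of a country's trade values across products. Treating normalized trade values as the probability distribution is an assumption made only for illustration; this is a generic Shannon entropy, not the proposed index, and the function name is hypothetical.

```python
import numpy as np


def shannon_entropy(weights):
    """Shannon entropy sum_i p_i ln(1/p_i) of non-negative weights normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    p = w / w.sum()
    p = p[p > 0]  # convention: 0 * ln 0 = 0
    return float(np.sum(p * np.log(1.0 / p)))


# Hypothetical trade values of two countries across four products.
diversified = [5, 5, 5, 5]
concentrated = [20, 0, 0, 0]
print(shannon_entropy(diversified))   # ln 4 ≈ 1.386, the maximum for four products
print(shannon_entropy(concentrated))  # 0.0, all trade in a single product
```

The evenly spread basket attains the maximum entropy $\ln N$, while a single-product basket has zero entropy, matching the qualitative reading of large and small entropy values in the text.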

Structure of the country-product bipartite network
We first use the trade datasets to construct the country-product bipartite network and analyse its attributes. Figure 1a/b illustrates the adjacency matrix $M_{cp}$ of the export/import networks in 2015, respectively. The rows and columns of these matrices are sorted in descending order of node degree. The apparent triangular pattern is characteristic of the adjacency matrix of the trade network. This triangular pattern tells us that countries capable of exporting more complex products also export products of lower complexity and have more diversified exports, while countries capable only of exporting less complex products have less diversified exports. A recent study in Physics Reports investigated the bipartite network between products and countries and associated it with a nested pattern [22]-[24]. Nestedness analysis has been used to characterize bipartite networks involving two potentially interdependent groups of species [25], [26]. A nested organization is a network composed of generalist nodes and sets of specialist nodes: the specialists interact with only a small subset of nodes, while the generalists interact with (almost) all the others in the network. Nestedness is a statistical attribute of bipartite interaction data expressed in matrix form, and it can be quite sensitive to the size (number of rows and columns) and fill (number of non-zero entries) of the input matrix. Thus, a nested structure would be of great significance if it also existed in economics-related bipartite networks. Here, we employ a simple and widely used metric, the Nested Overlap and Decreasing Fill (NODF) [27], [28]. NODF ranges from 0 to 1, where 1 denotes a fully nested structure. 
NODF is calculated as follows. After sorting the rows and columns of the adjacency matrix in descending order of the degrees $k_c$ and $k_p$, respectively, the nestedness of a pair of nodes $i$ and $j$ from the same layer, with $i$ placed above $j$, is defined as
$$N_{ij} = \begin{cases} O_{ij}/k_j, & k_i > k_j, \\ 0, & \text{otherwise}, \end{cases}$$
where $O_{ij}$ is the number of nodes connected to both node $i$ and node $j$. The nestedness of the matrix is then the average of $N_{ij}$ over all pairs of rows and all pairs of columns. Figure 1c shows the nestedness of China's export and import matrices. From 2008 to 2015, the average NODF nestedness is 0.46 for exports and 0.32 for imports, so exports are more nested than imports. The nestedness of imports is also more stable over time than that of exports.
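The NODF procedure above can be sketched in Python as follows, assuming a binary NumPy matrix with at least two rows or two columns. Pairs with equal degree contribute zero (the "decreasing fill" condition), pairs whose lower node has degree zero are also scored zero to avoid division by zero, and the result is normalized to [0, 1] as in the text. The function names are hypothetical.

```python
import numpy as np


def _layer_nestedness(A):
    """Sum of N_ij over all pairs of rows of A, and the number of row pairs."""
    k = A.sum(axis=1)
    order = np.argsort(-k, kind="stable")  # sort rows by descending degree
    A, k = A[order], k[order]
    total, n_pairs = 0.0, 0
    for i in range(len(k)):
        for j in range(i + 1, len(k)):
            n_pairs += 1
            if k[i] > k[j] and k[j] > 0:        # decreasing fill; skip empty rows
                overlap = np.sum(A[i] & A[j])   # O_ij: shared neighbours
                total += overlap / k[j]
    return total, n_pairs


def nodf(M):
    """NODF nestedness in [0, 1] of a binary country-product matrix M."""
    M = np.asarray(M, dtype=int)
    row_total, row_pairs = _layer_nestedness(M)    # pairs of countries
    col_total, col_pairs = _layer_nestedness(M.T)  # pairs of products
    return (row_total + col_total) / (row_pairs + col_pairs)


triangular = np.array([[1, 1, 1, 1],
                       [1, 1, 1, 0],
                       [1, 1, 0, 0],
                       [1, 0, 0, 0]])
print(nodf(triangular))             # 1.0: perfectly nested
print(nodf(np.eye(3, dtype=int)))   # 0.0: no overlap, equal degrees
```

The double loop is quadratic in the number of nodes per layer, which is adequate for matrices of a few hundred countries and products.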

Economic complexity analysis of China trade flows for countries
We compare six country ranking lists for 2015, obtained from the export value list, diversity and ubiquity, the Eigenvector-based complexity index, the fitness and complexity index, the Hirschman-Herfindahl index, and the proposed entropy-based method. As shown in figure 2, we take the USA as an illustrative benchmark in the export trade flow. Although the trade relationship between China and the USA involves a complex mix of cooperation and competition, the USA has been one of China's closest trade partners, so it should rank as high as possible. The USA is in the top 5 of the ranking lists from the export value list, the Hirschman-Herfindahl index, the fitness and complexity index, and our proposed method, whereas in the ranking lists from diversity and ubiquity and from the Eigenvector-based complexity index it is ranked far from the top. We then analyse the USA's ranking position in the import trade flow. The USA ranks near the top in all six rankings: first in the Hirschman-Herfindahl index, second in the import value list, fourth in diversity and ubiquity and in the proposed method, fifth in the fitness and complexity index, and thirteenth in the Eigenvector-based complexity index. We further analyse how the relationship between each pair of country indicators changes over time. Six indicators yield 15 pairs. As shown in figure 4, the export and import situations differ somewhat, indicating that these relationships depend on the data. In exports, the pairs KCI-ECI and FCI-ECI fluctuate over time, while the others are relatively stable; in imports, all pairs are relatively stable. 
Regarding the correlations of these indicator pairs in the export data, the strong correlations are HHCI-DCI, EXP-DCI, EXP-HHCI, and KCI-FCI; the weak correlations are EXP-FCI, FCI-HHCI, FCI-DCI, and FCI-ECI; pairs such as EXP-KCI, ECI-DCI, EXP-ECI, ECI-HHCI, and KCI-DCI are almost uncorrelated; and the pairs KCI-ECI and KCI-HHCI are even weakly negatively correlated. In the import data, the strong correlations are HHCI-DCI, IMP-ECI, IMP-HHCI, IMP-DCI, and KCI-FCI; the weak correlations are FCI-DCI, IMP-KCI, IMP-FCI, KCI-HHCI, FCI-HHCI, and ECI-HHCI; pairs such as KCI-ECI and KCI-DCI are almost uncorrelated; and ECI-DCI and FCI-ECI are weakly negatively correlated. Therefore, the relationships among the economic complexity indicators depend on the data and show some sensitivity.
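Enumerating the 15 indicator pairs and their Pearson correlations can be sketched as follows, assuming each indicator's scores for one year are stored as equally ordered arrays in a dictionary keyed by the indicator abbreviations used above. The function name and the toy scores are hypothetical; repeating the call for each year would give the time series of each pair's correlation.

```python
from itertools import combinations

import numpy as np


def pairwise_pearson(scores):
    """Pearson correlation for every unordered pair of indicators.

    scores: dict mapping indicator name -> 1-D array of scores for the same countries.
    """
    names = sorted(scores)
    return {
        (a, b): float(np.corrcoef(scores[a], scores[b])[0, 1])
        for a, b in combinations(names, 2)
    }


# Hypothetical scores of five countries under six indicators.
rng = np.random.default_rng(0)
names = ["EXP", "KCI", "ECI", "FCI", "HHCI", "DCI"]
scores = {name: rng.random(5) for name in names}
corr = pairwise_pearson(scores)
print(len(corr))  # 15 = C(6, 2)
```

Sorting the names makes each pair key deterministic, so the same pair can be tracked across years.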

Economic complexity analysis of China trade flows for products
In table 1, we list the top 10 products according to the proposed entropy-based method. The indicators still differ considerably, and the rankings of the same products in the export and import data are also inconsistent. Although it is not certain which method is more accurate, the KPI, FPI, and EPI indicators differ markedly.
Table 1. Top 10 products ranked by the proposed entropy-based method
1  1  92  91  13  1
1  1  19  57  58  2
2  2  58  44  59  2
2  2  96  29  81  6
3  3  18  14  15  3
3  4  30  25  11  3
4  5  14  12  17  4
4  6  53  6  3  4
5  4  14  15  28  5
5  24  1  95  1  22
6  11  16  19  78  7
6  10  4  75  5  5
7  6  66  62  40  12
7  3  62  1  2  1
8  13  31  27  53  9
8  7  46  69  90  10
9  10  24  24  37  8
9  12  3  96  96  18
10  8  1  2  60  6
10  8  79  4  55  8
From the perspective of product complexity, we also analyze the correlations and distributions of six estimators of product complexity in 2015. As revealed in figure 5, in the export trade four pairs show strong regression relationships (EXP-DPI, EXP-HHPI, HHPI-DPI, KPI-FPI). In the import trade, strong regression relationships also appear in four pairs (IMP-HHPI, IMP-DPI, HHPI-DPI, FPI-KPI). Furthermore, we analyze how the relationship between each pair of product indicators changes over time; the six indicators again yield 15 pairs. As shown in figure 6, exports and imports differ somewhat, indicating that these relationships depend on the data. In exports, the pairs KPI-EPI and FPI-EPI fluctuate over time, while the others are relatively stable; in imports, all pairs are relatively stable. Regarding the correlations of these indicator pairs in the export data, the strong correlations are HHPI-DPI, EXP-DPI, and EXP-HHPI, while pairs such as KPI-HHPI, KPI-EPI, EXP-EPI, FPI-DPI, EXP-KPI, EXP-FPI, FPI-HHPI, and KPI-DPI are almost uncorrelated. The weak negative relationships are FPI-EPI, EPI-DPI, and EPI-HHPI. 
The strong negative relationship is KPI-FPI. In imports, the strong correlations are IMP-HHPI and IMP-DPI, and the weak relationships are IMP-EPI, EPI-HHPI, KPI-EPI, FPI-EPI, and HHPI-DPI. Pairs such as KPI-HHPI, KPI-DPI, EPI-DPI, and IMP-KPI are almost uncorrelated. The weaker negative relationships are FPI-HHPI, FPI-DPI, and IMP-FPI, and the strong negative relationship is again KPI-FPI. Therefore, the relationships between product indicators in economic complexity measurement differ from those between country indicators, and the comparison of the main economic complexity statistics shows that several of them have moderate correlations.

Relative stability of economic complexity index in 8 years
Finally, we analyse the stability of the ranking positions produced by the six aforementioned methods over 8 years. We define relative stability as the average and standard deviation of the Pearson correlations between two ranking lists obtained by the same method from the data sets of two different years. As shown in figure 7, for the trade value list, the Hirschman-Herfindahl index, and the proposed entropy method, the average Pearson correlation is very close to 1 and the deviation is around 0. For diversity and ubiquity and for the fitness and complexity index, the average Pearson correlation is around 0.8 and the deviation is also small. By contrast, for the Eigenvector-based complexity index in the export case, the average Pearson correlation is around 0.5 and the deviation is large, although in the import case the average Pearson correlation is close to 1 and the deviation is small. We therefore suggest that the Eigenvector-based complexity index is sensitive when used to calculate economic complexity.
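The relative-stability measure can be sketched as below. Instead of drawing random pairs of years, this sketch assumes that all pairs of years are used (28 pairs for 8 years); it converts each year's scores to ranking positions before computing the Pearson correlation and assumes the same entities appear in the same order every year. The function names and the toy data are hypothetical.

```python
from itertools import combinations

import numpy as np


def to_ranks(values):
    """Ranking positions (0 = highest score); ties are broken by order of appearance."""
    order = np.argsort(-np.asarray(values, dtype=float), kind="stable")
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(len(order))
    return ranks


def relative_stability(yearly_scores):
    """Mean and standard deviation of Pearson correlations between the ranking
    lists of every pair of years.

    yearly_scores: dict year -> 1-D array of one method's scores for the same entities.
    """
    corrs = [
        np.corrcoef(to_ranks(yearly_scores[y1]), to_ranks(yearly_scores[y2]))[0, 1]
        for y1, y2 in combinations(sorted(yearly_scores), 2)
    ]
    return float(np.mean(corrs)), float(np.std(corrs))


# Hypothetical scores of a stable method (small yearly noise) over 2008-2015.
rng = np.random.default_rng(1)
base = rng.random(20)
stable = {year: base + 0.01 * rng.standard_normal(20) for year in range(2008, 2016)}
print(relative_stability(stable))
```

A method whose rankings barely change between years yields a mean close to 1 and a deviation close to 0, which is the pattern the text reports for the trade value list, the HHI, and the entropy method.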

Conclusions and discussions
Socio-economic systems can be complicated: they exhibit highly nonlinear interactions and abundant emergent properties, and they are extremely challenging to predict. Our understanding of the structure and dynamics of socio-economic systems has deepened remarkably owing to the rapid development of interdisciplinary physics and the incorporation of concepts and methods from statistical physics into sociology, which are often used to advance efficient forecasting methods for these systems. In this paper, we focused on regional economic activities and compared the proposed method with classic economic complexity measurements. We used China trade data sets from the General Administration of Customs of the People's Republic of China; the long-term trade datasets extend to 2015. Based on the concept of entropy in statistical physics, we studied economic activities from the perspective of economic complexity and designed a new entropy-based measure of economic complexity. First, we analysed China's export and import matrices from 2008 to 2015 and found that the export matrices were more evidently nested than the import matrices. Further, the nestedness of both trade matrices remained relatively steady over time, and that of the import matrices was more stable. Then, we applied the trade value list, diversity and ubiquity, the Eigenvector-based complexity index, the fitness and complexity index, the Hirschman-Herfindahl index, and the proposed entropy method to measure the economic complexity of China trade flows from 2008 to 2015. Compared with current economic complexity measurements, the proposed method generated a proper ranking list when we took the USA as a benchmark. Finally, we analysed the correlations and distributions of six estimators of country complexity from 2008 to 2015. The comparison of the six main economic complexity statistics reveals that several of them show moderate correlations. 
More specifically, there were strong correlations within pairs such as diversity and ubiquity versus the fitness and complexity index, and the Hirschman-Herfindahl index versus the proposed entropy-based method, while the correlations between the Eigenvector-based complexity index and the other methods fluctuated wildly over time. Diversity and ubiquity, the fitness and complexity index, the Hirschman-Herfindahl index, and the proposed entropy method performed consistently when used to measure country fitness and product complexity, whereas the Eigenvector-based complexity index was extremely sensitive when used to calculate them. Although we used recent authoritative data to study the economic complexity of China, it is still necessary to incorporate other data on economic activity, or longer-term data involving more countries and products. In addition, we limited the research target to China in this paper and will expand it to all other countries over a long time span. This paper contributes to the development of interdisciplinary physics by using entropy from statistical physics to construct a new measure of economic complexity. Meanwhile, some interesting and important problems remain. For example, we know that the triangular matrix exists, but we do not know its structure in economics. We can ask how a country's row position changes over time, how a product's column position changes over time, how we can model these changes, and what we can predict. These open questions call for future study.