Equivalence between the Fitness-Complexity and the Sinkhorn-Knopp algorithms

We uncover the connection between the Fitness-Complexity algorithm, developed in the economic complexity field, and the Sinkhorn–Knopp algorithm, widely used in diverse domains ranging from computer science and mathematics to economics. Despite minor formal differences between the two methods, both converge to the same fixed-point solution up to normalization. The discovered connection allows us to derive a rigorous interpretation of the Fitness and the Complexity metrics as the potentials of a suitable energy function. Under this interpretation, high-energy products are unfeasible for low-fitness countries, which explains why the algorithm is effective at displaying nested patterns in bipartite networks. We also show that the proposed interpretation reveals the scale invariance of the Fitness-Complexity algorithm, which has practical implications for the algorithm’s implementation in different datasets. Further, analysis of empirical trade data under the new perspective reveals three categories of countries that might benefit from different development strategies.


Introduction
The Fitness and Complexity algorithm (FC) was first introduced in [1], motivated by empirical regularities of international trade. Specifically, the most developed countries tend to produce and export most products, whereas developing nations export only a nested subset of such products [1]. Therefore, the simple intuition behind the algorithm is that diversification is the signature of the intangible capabilities necessary for the production of goods [2]. Under this interpretation, a country cannot produce the commodities for which it does not own all the needed capabilities, and more complex products need higher levels of capabilities [2]. The set of capabilities owned by countries could be measured, in principle, by collecting extensive data about their industrial systems and economies. Yet in practice, it is challenging to gather, harmonize and combine all the necessary information, and use it to operationalize diverse capabilities.
Export data have allowed researchers to bypass this issue. One can indeed infer the presence of the capabilities from a country's export basket, and capture the competitiveness of the country's productive system ("fitness") within a single variable [2]. Indeed, under the assumption that countries produce and export most of the products they can within their capabilities, and that the products require different levels of capabilities, a natural hierarchy is detectable from the simple information of who produces and exports what. Building on this idea, FC measures the Fitness of countries and the Complexity of products by only taking as input the binary country-product matrix $M_{cp}$, which accounts for the ability of country $c$ to export the product $p$.
Fitness and Complexity has been used to accurately predict the future development of countries [3,4], and it plays a central role in the Economic Complexity framework to study and forecast macroeconomic systems [5]. Compared to the Method of Reflections [6] and the equivalent Economic Complexity Index (ECI) ‡, the Fitness-Complexity algorithm better quantifies countries' and products' structural importance in the country-product bipartite network [7]; it substantially improves GDP growth predictability [8]; and it correctly identifies high-growth countries such as China and India, which are far from the top-ranked countries by ECI §. Because of its effectiveness, the FC algorithm itself has seen modifications [7,9,10] and applications in other domains [11,12,13,14]. Compared to its variants, such as the minimal extremal metric [9] and the generalized Fitness-Complexity algorithm [7], the original Fitness-Complexity algorithm is substantially more robust with respect to noisy input data, which makes it better suited for the analysis of world trade data [7,9].
Despite the central role of the Fitness-Complexity algorithm in the Economic Complexity field, several properties of the FC algorithm and its outputs are still unexplained. In particular, it remains unclear (1) why FC is highly effective at rearranging $M_{cp}$ in such a way that a clear frontier separates a populated and an empty region of the matrix (see Results); (2) whether it is possible to derive the algorithm from optimization arguments; (3) why, for forecasting applications, it is more suitable to logarithmically transform the Fitness scores [2,5]; (4) how to rescale the algorithm to compare Fitness and Complexity scores across different datasets, e.g., international trade data across multiple years.
Here we deepen our understanding of the Fitness and Complexity algorithm by connecting it with the long-known Sinkhorn-Knopp (SK) algorithm [15]. Originally introduced in the context of doubly stochastic matrices and matrix scaling [15], the SK algorithm has been applied to pattern matching in computer vision [16] and combinatorial matrix analysis [17]. Equivalent algorithms and variants have been applied in many fields under different names, such as Iterative Fitting Procedure and RAS method, among others; see [18] for an extensive review. We show that while the FC and SK algorithms were introduced to solve different problems, their mathematical structure is the same. The main objective of this paper is to show the equivalence between the two algorithms, already hinted at by [19], to discuss how it sheds light on the above-mentioned unexplained properties of the FC algorithm, and to discuss additional implications for the Economic Complexity field.

Fitness and Complexity algorithm
The FC algorithm was developed by [1] and is based on the definition of the Fitness of countries ($F_c$) and of the Complexity of products ($Q_p$) through the formulas:

$$\tilde{F}_c^{(n)} = \sum_p M_{cp}\, Q_p^{(n-1)} \tag{1a}$$

$$\tilde{Q}_p^{(n)} = \frac{1}{\sum_c M_{cp}\, \dfrac{1}{F_c^{(n-1)}}} \tag{1b}$$

with a normalization step at the end of each iteration:

$$F_c^{(n)} = \frac{\tilde{F}_c^{(n)}}{\langle \tilde{F}^{(n)} \rangle_c}, \qquad Q_p^{(n)} = \frac{\tilde{Q}_p^{(n)}}{\langle \tilde{Q}^{(n)} \rangle_p} \tag{2}$$

where $\langle \cdot \rangle$ denotes the average over countries or products. The matrix $M_{cp}$ is the binary representation of the bipartite network linking exporter country $c$ to exported product $p$. The algorithm is iterative and converges to a fixed point of the coupled equations 1 under some rather general conditions [20,9]; researchers have introduced robust criteria to establish the convergence of the iterative equations [20,10].
The standard interpretation of eq. 1 is straightforward. The Fitness of a country is defined as its diversification ($d_c = \sum_p M_{cp}$) weighted by the average Complexity of its exported products, such that a country exporting highly complex products obtains a large Fitness. In parallel, the Complexity of each product gets larger contributions from the countries with lower Fitness values, thanks to the harmonic sum structure of eq. 1b.
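The iteration of eqs. 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration and not the reference implementation of [1]: the function name `fitness_complexity`, the fixed iteration count `n_iter`, and the toy matrix are assumptions made here, and none of the convergence criteria of [20,10] is implemented. The sketch also assumes that every country exports at least one product and every product has at least one exporter.

```python
import numpy as np

def fitness_complexity(M, n_iter=100):
    """Run n_iter steps of the Fitness-Complexity map on a binary matrix M.

    Rows are countries, columns are products (hypothetical toy interface).
    """
    M = np.asarray(M, dtype=float)
    F = np.ones(M.shape[0])  # initial Fitness of each country
    Q = np.ones(M.shape[1])  # initial Complexity of each product
    for _ in range(n_iter):
        # Both updates use the values of the previous step (parallel update).
        F_tilde = M @ Q                    # eq. 1a: sum_p M_cp Q_p
        Q_tilde = 1.0 / (M.T @ (1.0 / F))  # eq. 1b: harmonic-type sum over exporters
        # eq. 2: normalize by the mean over countries / products.
        F = F_tilde / F_tilde.mean()
        Q = Q_tilde / Q_tilde.mean()
    return F, Q

# Toy nested matrix: country 0 exports everything, country 3 only product 0.
M_toy = np.array([[1, 1, 1, 1],
                  [1, 1, 1, 0],
                  [1, 1, 0, 0],
                  [1, 0, 0, 0]])
F_toy, Q_toy = fitness_complexity(M_toy)
```

On this perfectly nested toy matrix, Fitness decreases with the row index and Complexity increases with the column index, in line with the interpretation given above.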

Sinkhorn-Knopp algorithm
The Sinkhorn-Knopp (SK) algorithm [15] (also called Bregman projection [21]) is used to find the solution of the following problem: given a matrix $A \in \mathbb{R}_+^{n \times m}$ with non-negative entries, and two vectors $r \in \mathbb{R}_+^{n}$ and $c \in \mathbb{R}_+^{m}$ with non-negative entries, find the two vectors $u$ and $v$ such that

$$\sum_j u_i A_{ij} v_j = r_i, \qquad \sum_i u_i A_{ij} v_j = c_j \tag{3}$$

In other words, we want to transform the matrix $A$ such that its sums over rows and columns correspond to the given vectors $(r, c)$. In computer science, this is referred to as the matrix scaling problem [18]. Of course, the constraint vectors must be balanced and sum to the same value; otherwise, a solution cannot be found. The SK algorithm identifies $u$ and $v$ iteratively by computing

$$u_i^{(n)} = \frac{r_i}{\sum_j A_{ij}\, v_j^{(n-1)}}, \qquad v_j^{(n)} = \frac{c_j}{\sum_i A_{ij}\, u_i^{(n)}} \tag{4}$$

The convergence of the algorithm is extensively discussed in the literature; see for example [22,16] and references therein. In general, the matrix $A$ has to be non-negative with total support ∥ [23,24]. Notably, in the original work [15], the second equation in 4 was written using $u^{(n-1)}$ instead of its $n$-th iterate, but a later work [25] found that the present formulation converges faster.
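The alternating updates of eq. 4 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `sinkhorn_knopp`, the fixed iteration count, and the small test matrix are choices made here and do not come from [15]; the matrix is picked because it has total support, so a doubly stochastic scaling exists.

```python
import numpy as np

def sinkhorn_knopp(A, r, c, n_iter=500):
    """Alternate the two updates of eq. 4 for a fixed number of iterations."""
    A = np.asarray(A, dtype=float)
    u = np.ones(A.shape[0])
    v = np.ones(A.shape[1])
    for _ in range(n_iter):
        u = r / (A @ v)    # match the row sums, using the current v
        v = c / (A.T @ u)  # match the column sums, using the freshly updated u
    return u, v

# Small square matrix with total support; targets r = c = 1 are balanced.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
r = np.ones(3)
c = np.ones(3)
u, v = sinkhorn_knopp(A, r, c)

# Scaled matrix P_ij = u_i A_ij v_j, whose row and column sums should match r and c.
P = u[:, None] * A * v[None, :]
```

The zero pattern of `A` is preserved in `P`, since the scaling only multiplies each entry by positive factors.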

Logarithmic barrier function
The matrix scaling problem can be tackled by means of the logarithmic barrier function [26]

$$g(x, y) = \sum_{i,j} A_{ij}\, x_i\, y_j - \sum_i r_i \ln x_i - \sum_j c_j \ln y_j \tag{5}$$

where $x$ and $y$ must be element-wise positive vectors, and the logarithm keeps them inside the feasible domain. Any stationary point of $g$ satisfies $\sum_j A_{ij} x_i y_j = r_i$ and $\sum_i A_{ij} x_i y_j = c_j$; it therefore solves the matrix scaling problem with $u = x$ and $v = y$, and is locally marginally stable, as shown in Appendix A. Conversely, any matrix scaling solution gives a stationary point of $g$. This description highlights the role of the vectors $u$ and $v$ as logarithmic potentials. By comparing SK and FC, we can say that $F$ and $Q$ are equivalent to the vectors $u$, $v$ of some logarithmic barrier function. This will be crucial in our new interpretation of the Fitness and Complexity indexes. It explains why $F$ and $Q$ are exponentially distributed, and it will help us to interpret them as 'potentials' of some form of energy; see Results and discussion.
∥ A square matrix has total support if every positive element $A_{ij} > 0$ lies on a positive diagonal, i.e., there exists a permutation $\pi$ with $\pi(i) = j$ such that $A_{k\pi(k)} > 0$ for all $k = 1, ..., N$.
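The link between stationary points of the barrier function and matrix scaling can be checked numerically. The sketch below assumes the standard form $g(x, y) = \sum_{ij} A_{ij} x_i y_j - \sum_i r_i \ln x_i - \sum_j c_j \ln y_j$, with $r$ indexing rows and $c$ indexing columns; the helper names `barrier` and `barrier_grad` and the test matrix are hypothetical. It runs the alternating rescaling and then evaluates the gradient of $g$, which should vanish at the scaling solution.

```python
import numpy as np

def barrier(A, r, c, x, y):
    """Assumed form: g = sum_ij A_ij x_i y_j - sum_i r_i ln x_i - sum_j c_j ln y_j."""
    return x @ A @ y - r @ np.log(x) - c @ np.log(y)

def barrier_grad(A, r, c, x, y):
    """Partial derivatives of g with respect to x and to y."""
    return A @ y - r / x, A.T @ x - c / y

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
r = np.ones(3)
c = np.ones(3)

# Alternating row/column rescaling (Sinkhorn-Knopp-type updates).
x = np.ones(3)
y = np.ones(3)
for _ in range(500):
    x = r / (A @ y)
    y = c / (A.T @ x)

grad_x, grad_y = barrier_grad(A, r, c, x, y)
g_value = barrier(A, r, c, x, y)
```

Setting each partial derivative to zero and multiplying by $x_i$ (or $y_j$) recovers the row and column sum constraints, which is why both gradient blocks are numerically zero here.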

Correlation between algorithms
The similarity between eq. 1 and eq. 4 becomes evident once one writes a symmetric version of Fitness and Complexity by transforming the Fitness via $X_c = 1/F_c$. In fact, using $(X, Q)$ the two equations have exactly the same structure, although a few differences persist that could potentially lead to discrepancies.
First, $F$ and $Q$ in eq. 1 at stage $n$ are both updated using the values at step $n - 1$. With this prescription, FC generates two different series of values, one for even and one for odd $n$, both converging to the same fixed point. SK, on the other hand, updates the two vectors sequentially, leading to a single chain of convergent values. Second, FC does not consider the vectors $(r, c)$, implicitly setting their entries to unity. However, $r_i = 1$ and $c_j = 1$ are not feasible in SK, because their sums must be equal in order to balance the constraints, and this is not possible unless the matrix $A$ is square. Third, the normalization of eq. 2 is used by FC in order to guarantee finite values and, more importantly, to re-balance $F$ and $Q$, while SK does not need it.
Given these differences, one may wonder whether the two algorithms lead to the same result upon setting $r$ and $c$ to constant values (such that their sums are equal). We test this hypothesis by applying SK and FC to the export data, a dataset widely considered in economic complexity [6,1]. While the outputs are not numerically identical, the rankings of $F$ and $u$ (once $u$ is mapped back through $X_c = 1/F_c$) match perfectly, and their Pearson correlation coefficient is 1 within the numerical precision of a float ($10^{-8}$). Therefore, the two approaches give the same solution, regardless of the actual algorithm implemented. Balancing the equations on one side and implementing the normalization on the other are both effective ways to find the fixed point.
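The comparison described above can be reproduced on a toy example. The sketch below is an illustration under assumptions and does not use the export data: the small rectangular matrix is invented, the row targets are set to 1 and the column targets to n/m so that their sums balance, and $F$ is compared with $1/u$ following the transformation $X_c = 1/F_c$.

```python
import numpy as np

# Invented 4 x 5 country-product matrix (not trade data).
M = np.array([[1, 1, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 1, 1, 0, 0],
              [1, 1, 0, 0, 1]], dtype=float)
n, m = M.shape

# Fitness-Complexity: parallel updates followed by mean normalization.
F, Q = np.ones(n), np.ones(m)
for _ in range(2000):
    F_new = M @ Q
    Q_new = 1.0 / (M.T @ (1.0 / F))
    F, Q = F_new / F_new.mean(), Q_new / Q_new.mean()

# Sinkhorn-Knopp: sequential updates with balanced constant targets.
r = np.ones(n)         # row targets, summing to n
c = np.full(m, n / m)  # column targets, also summing to n
u, v = np.ones(n), np.ones(m)
for _ in range(2000):
    u = r / (M @ v)
    v = c / (M.T @ u)

# Map the SK output back to Fitness-like and Complexity-like scores.
F_sk = 1.0 / u
F_sk = F_sk / F_sk.mean()
Q_sk = v / v.mean()

pearson_F = np.corrcoef(F, F_sk)[0, 1]
same_ranking = np.array_equal(np.argsort(F), np.argsort(F_sk))
```

After fixing a common normalization, the two outputs agree on this example, which is consistent with the statement that the fixed points coincide up to normalization.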

The potential interpretation of Fitness and Complexity
The reinterpretation of the FC in terms of potentials of the logarithmic barrier function sheds light on some properties of F and Q that were previously unclear.
Fitness' logarithmic scale. The function $g$ highlights that the natural scale of the Fitness score is the logarithmic scale, as often adopted in many diagrams (for example, the Income-Fitness diagram in [1]), because the problem corresponds to a constrained optimization problem with a logarithmic constraint barrier. In fact, for relevant applications (see for example [3,4]), a logarithmic transformation is required in order to have well-separated values of Fitness and Complexity, as for example in the visualization of countries' trajectories (see below).
Potentials. Further, a well-known property of FC is its ability to reorder the rows and columns of $M_{cp}$ in order to obtain an upper-left triangular matrix, which highlights its nestedness [27], as shown in figure 1. This can be obtained by simply reordering the rows and columns of the matrix by the ranking of $F$ and $Q$, and this feature can be explained in terms of potential functions (also called Kantorovich potentials). The Fitness of a country measures its industrial system's capability, while the Complexity is a measure of how hard it is to produce each commodity. Thus, the region of the matrix $M_{cp}$ with high $Q$ and low $F$ should not be populated, by construction, because that would mean low-Fitness countries producing highly complex products. Indeed, the two logarithmic potentials in eq. 5 define a barrier that cannot be overcome: there is a value of $\ln x + \ln y$ whose constant-value line defines the separation between the feasible and forbidden regions of the matrix. Translated in terms of Fitness and Complexity, we can find a value of $Q/F$ that defines a barrier line in the reordered matrix that meets the same requirements with striking precision; see fig. 1, where we plot the isometric line with the lowest value among all the lines that define a completely empty region under the curve. Operationally, the red line in the figure is found by trying different values of $Q/F$ and checking whether any non-zero entries of the matrix fall below the curve.
This explains why, in many real systems, rearranging the matrix's rows and columns by FC yields populated and empty regions separated by a continuous border line.
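The search for the barrier line can be sketched compactly. Instead of scanning candidate values of $Q/F$ as described above, the sketch takes the largest ratio $Q_p/F_c$ among the non-zero entries of the matrix, on the assumption that this is the lowest constant-ratio line leaving no non-zero entry beyond it. The function name `barrier_ratio` and the toy values of $F$ and $Q$ are invented for illustration and are not outputs of FC.

```python
import numpy as np

def barrier_ratio(M, F, Q):
    """Largest Q_p / F_c over the non-zero entries of M (assumed barrier value)."""
    M = np.asarray(M)
    F = np.asarray(F, dtype=float)
    Q = np.asarray(Q, dtype=float)
    ratio = Q[None, :] / F[:, None]  # ratio[c, p] = Q_p / F_c
    return ratio[M > 0].max()

# Toy example: the low-Fitness country does not export the complex product.
M_toy = np.array([[1, 1],
                  [1, 0]])
F_toy = np.array([2.0, 0.5])
Q_toy = np.array([0.5, 2.0])

threshold = barrier_ratio(M_toy, F_toy, Q_toy)
ratio = Q_toy[None, :] / F_toy[:, None]
# No exported product should lie beyond the barrier.
empty_beyond = not np.any((ratio > threshold) & (M_toy > 0))
```

In this toy case the only entry with a ratio above the threshold is the missing one, i.e., the complex product that the low-Fitness country does not export.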
Taking our interpretation further, the sum $\sum_p M_{cp} Q_p / F_c$ corresponds to the constant vector $r_c = 1$, up to a scale (see below), as assumed in FC. Let us now interpret $Q_p / F_c$ as the amount of energy associated with the production of $p$ by country $c$. Under this interpretation, the constraint $r_c$ represents the total energy possessed by country $c$, and FC assumes (albeit unintentionally) fair availability of energy resources among nations. The fairness assumption is enforced by the condition that $r_c$ is homogeneous across countries, which makes the FC algorithm independent of country-level properties other than the composition of their export basket. Every country allocates resources into its productive system by distributing them among the production of different products; this information is encoded in $M_{cp}$. On the other hand, the ability of different countries to produce a product is not fairly distributed: at different levels of complexity $Q$, the energy cost differs. This means that each given resource unit can be exploited with different efficiency by countries with different Fitness $F$, and the energy cost changes according to the ratio of the two metrics.
It is valuable to define the matrix element $E_{cp} = \ln(Q_p / F_c)$, which captures the "energy cost" of element $M_{cp}$. It should be clear to the reader that this is a potential energy, hence it is defined relative to an offset. A complexity $Q = 1$ does not mean that the corresponding product can be exported effortlessly. To give a numerical example, a product of complexity $Q = 1.4$ would cost $E = 2.64$ for a low-Fitness country with $F = 0.1$ and just $E = -1.46$ for a high-Fitness country with $F = 6$. Therefore, the forbidden region of $M_{cp}$, displayed in fig. 1, simply corresponds to products that are too expensive in terms of resource allocation. The heterogeneity of resources owned by countries can be naturally introduced in this framework in further studies; see Conclusions.
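The numerical example above can be reproduced directly. The sketch assumes that the logarithm in the definition of $E_{cp}$ is the natural logarithm, which is the reading consistent with the quoted values; the function name `energy_matrix` is hypothetical.

```python
import numpy as np

def energy_matrix(F, Q):
    """E_cp = ln(Q_p / F_c); rows index countries, columns index products."""
    F = np.asarray(F, dtype=float)
    Q = np.asarray(Q, dtype=float)
    return np.log(Q[None, :] / F[:, None])

# One product with Q = 1.4, seen by a low-Fitness and a high-Fitness country.
E = energy_matrix(F=[0.1, 6.0], Q=[1.4])
E_low = round(float(E[0, 0]), 2)   # country with F = 0.1
E_high = round(float(E[1, 0]), 2)  # country with F = 6
```

Because only the ratio $Q_p/F_c$ enters, multiplying all Fitness and Complexity values by the same factor leaves the energy matrix unchanged.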
Scale invariance. Another feature that FC inherits from the logarithmic barrier function is that $g$ has a symmetry and, as a consequence, the stationary solutions of equation 5 are not unique. The transformation $(x, y) \to (x/\alpha, y\alpha)$ with $\alpha \in \mathbb{R}_+$ leaves the value of $g$ invariant, which allows a rescaling of the fixed points. At the level of the numerical implementation of the algorithm, the rescaling helps to increase the numerical precision and consequently the stability of the code [21]. On conceptual grounds, instead, the rescaling symmetry implies that the fixed-point solutions are not uniquely defined; rather, a scale must be chosen to break the symmetry. The scale becomes particularly relevant when we are interested in comparing the Fitness scores derived from different matrices, such as when comparing the Fitness scores of the same countries in two different years.
There are different ways to set a scale. For example, it is possible to select a single country or a product as a reference, as proposed by [28]. Another possibility to define a scale, which we introduce here, is to add a dummy country able to produce all the products, thus adding a full row of ones to the $M_{cp}$ matrix. The dummy country's Fitness at the fixed point sets the scale, by requiring that its value is fixed to a constant (e.g., 1) at any time. Setting the scale provides a formal method to compare different matrices, and it affects only the Fitness scores and trajectories, not the ranking.
Using the international trade export data (see Data), we verify that the trajectories in the Income-Fitness diagram can be substantially affected by the choice of the Fitness scale. The right panel of Figure 2 shows that when we fix the Fitness score of the dummy country, the Fitness of Western countries (blue lines) does not evolve appreciably over more than four decades, whereas many Asian countries (purple lines), with China foremost (red line), are catching up with the wealthy and developed nations. Instead, when the scale is fixed by the normalization condition as in the original paper [1], we observe that the Fitness of wealthy countries decreases in time, as shown in the left panel. The same effect can be observed in the Income variable: removing the global income average makes most of the wealthy countries stationary. The main takeaway from these results is that while the scale invariance of the potential $g$ allows the researcher to set the scale, this choice should be made cautiously, as it affects the resulting Fitness temporal trajectories and their interpretation.
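The invariance can be verified in one line, assuming the barrier function has the standard form $g(x, y) = \sum_{ij} A_{ij} x_i y_j - \sum_i r_i \ln x_i - \sum_j c_j \ln y_j$ with balanced constraints $\sum_i r_i = \sum_j c_j$:

```latex
\begin{aligned}
g\!\left(\tfrac{x}{\alpha}, \alpha y\right)
  &= \sum_{i,j} A_{ij}\, \frac{x_i}{\alpha}\, \alpha y_j
     - \sum_i r_i \ln\frac{x_i}{\alpha}
     - \sum_j c_j \ln(\alpha y_j) \\
  &= g(x, y) + \ln\alpha \left( \sum_i r_i - \sum_j c_j \right)
   = g(x, y).
\end{aligned}
```

The bilinear term is untouched by the rescaling, and the two logarithmic terms shift by opposite amounts that cancel exactly when the constraint vectors are balanced.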
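One possible way to implement the dummy-country scale is sketched below. It is an illustration under assumptions rather than the procedure used for the figures: the function names, the fixed iteration count, and the choice to rescale only the Fitness scores after the iteration (leaving the Complexity scores as returned by the mean normalization) are all choices made here.

```python
import numpy as np

def fitness_complexity(M, n_iter=100):
    """Fitness-Complexity iteration with mean normalization (toy version)."""
    M = np.asarray(M, dtype=float)
    F = np.ones(M.shape[0])
    Q = np.ones(M.shape[1])
    for _ in range(n_iter):
        F_tilde = M @ Q
        Q_tilde = 1.0 / (M.T @ (1.0 / F))
        F = F_tilde / F_tilde.mean()
        Q = Q_tilde / Q_tilde.mean()
    return F, Q

def fitness_with_dummy(M, n_iter=100, reference=1.0):
    """Append a dummy country exporting everything and use it to fix the scale."""
    M = np.asarray(M, dtype=float)
    dummy_row = np.ones((1, M.shape[1]))  # the dummy exports all products
    F, Q = fitness_complexity(np.vstack([M, dummy_row]), n_iter)
    scale = reference / F[-1]             # force the dummy's Fitness to `reference`
    # Return the real countries only; Q is left on its mean-normalized scale.
    return F[:-1] * scale, Q

M_toy = np.array([[1, 1, 1, 0],
                  [1, 1, 0, 0],
                  [1, 0, 0, 0]])
F_scaled, Q_out = fitness_with_dummy(M_toy)
```

Since the dummy country exports every product, its Fitness is the largest in the extended matrix, so the rescaled Fitness of every real country that misses at least one product lies below the reference value.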
Possible development pathways. The existence of a potential barrier also makes it possible to identify Fitness as the maximum sophistication achievable by the products exported from each country, which is represented by the isometric line of the $M_{cp}$ matrix. However, most of the developed countries do not export products lying on the isometric line (see Fig. 1), and exhibit a lower density of exported products near the isometric line, which indicates a productivity gap with respect to the maximum product Complexity achievable given their Fitness. This pattern suggests that different nations might benefit from pursuing different strategies to increase their capabilities and ultimately their wealth. We conjecture below that different countries might benefit from three different classes of new product development strategies:
• Learner pathway, which might be suited for low-Fitness countries that are at their maximum level of capabilities and have already reached the isometric line. A clear example is Somalia in Fig. 3. These countries might benefit from enhancing their industrial system by increasing the relevant capabilities before entering new markets.
• Exploiter pathway, which might be suited for medium-Fitness countries, such as Namibia and Ukraine, whose spectra of exported products [29] lie far from the isometric line or exhibit a low density of exports near the line. These countries may have unexpressed capabilities, and could benefit from investing in the production of goods already allowed by their capabilities.
• Explorer pathway, which might be suited for high-Fitness countries that present a dense spectrum along the entire axis (see USA in the figure). These countries might already be exploiting their capabilities efficiently. Therefore, their best strategy might be to move from the exploitation of existing capabilities to the exploration of new, and thus not yet classified, activities and goods.
The productivity gap is clearly related to the Fitness and to the top exported product (by Complexity), as shown in Fig. 4. The left panel shows an intriguing U-shaped relationship between countries' Fitness and productivity gaps: it is not the highest-Fitness or the lowest-Fitness countries that exhibit the largest productivity gaps, but the countries with intermediate Fitness values. Intermediate-Fitness countries are likely those with unexpressed capabilities, which might benefit the most from the exploiter pathway described above. The three categories of countries can be loosely identified in Fig. 4, right panel. The countries that might benefit the most from the learner and explorer pathways are close to the forbidden region of $M_{cp}$, whereas those that may benefit the most from the exploiter pathway exhibit middle top-product Complexity, intermediate Fitness values, and a very large productivity gap. In appendix ?? we show the entire list of countries with their productivity gap and top exported product.
Clearly, Fitness is a single parameter and cannot fully explain the productivity gaps. A finer description of export baskets is needed to identify opportunities for new products and optimal development strategies, which we leave for future studies. For example, our simplified classification could be integrated with network-based approaches [30] and machine learning approaches [31] to better predict product development probabilities, which might suggest new future development pathways for countries.

Conclusions
We showed the deep connection between the Fitness and Complexity algorithm, developed in Economic Complexity, and the Sinkhorn-Knopp algorithm, devised for matrix scaling. Indeed, despite minor differences in the algorithms' structure, the fixed points of the two algorithms coincide. This connection allowed us to better understand some properties of FC and to interpret Fitness and Complexity as potentials of a suitable energy function of mutualistic systems such as the international trade web. This description indicates how countries deploy their units of resources, it allows the definition of a barrier that delimits the inaccessible region of the country-product matrix, and it helps us to classify countries by their exploited potential, which suggests different strategies of investment and development. This classification is intended as a broad description of the global competition in international trade. Further studies will test the significance of these categories by looking in detail at countries' productive systems and their evolution through time.
The use of a dummy country to set a common scale across different $M_{cp}$ matrices finds immediate application in all numerical uses of Fitness and Complexity, from GDP forecasting [5] to studies of product relatedness [31]. In principle, all studies that analyse the dynamics of these indicators should adopt a way to fix the scale before comparing datasets of different years, sizes or sources.
Arguably, the most interesting follow-up of this work is the connection it establishes with Optimal Transport theory. The SK algorithm is an efficient way to solve the Optimal Transport problem [32]. Thus, with this work, we open a door to a vast and rich theory that can be used to extend, interpret and enhance the economic complexity framework.
For example, differences in resources and market demands can be accounted for by framing a resource allocation problem equivalent to a classic transport optimization.

Data
Export data are from COMTRADE ¶. For figure 2 only, we use data based on the SITC v2 classification, which starts in 1976, to have the long time series needed to show the regularization effect on trajectories. For all other results and plots we use the Harmonized System (HS-2012), reconciled by the procedure described in [5], starting in 2012 and ending in 2021. Although the SITC classification covers a longer time span, the quality of the results is lower for two main reasons. On the one hand, SITC v2 is based on a rigid classification defined in the 1970s, thus missing many products that did not exist at the time. On the other hand, SITC defines roughly 200 product codes, while HS has more than 5000 at the 6-digit depth considered in this work.

Figure 1. Matrix representing the bipartite network of the export flow in 2016, reordered using the ranks of Fitness and Complexity. The red line represents the constant-value line of Complexity over Fitness at the border. The network is based on the International Trade data using the Harmonized System classification HS-2012, the gold standard for the year 2016.

Figure 2. Trajectories of different countries in the Income-Fitness plane. We compare the standard Fitness (left panel) and the Fitness evaluated by adding the dummy country (right panel). The insets show the average Fitness, indicating that there is a general trend. The Fitness scores are computed using the SITC v2 classification, which dates back to 1976.

Figure 3. Spectra of exports in 2016 of some representative countries: USA, Ukraine (UKR), Namibia (NAM) and Somalia (SOM). Each blue line indicates that the country is actively producing the associated product. The products are arranged on the horizontal axis, ordered by Complexity rank. The red line indicates the position of the border line of figure 1.

Figure 4. Left panel: scatter plot of Fitness versus productivity gap. Right panel: classification of development pathways in the plane of productivity gap and top exported product.

FM was partially supported by AFOSR (Grant No. FA9550-21-1-0236). MSM was supported by the URPP Social Networks of the University of Zurich and the Swiss National Science Foundation (Grant No. 100013-207888).