Economic complexity and the sustainability transition: a review of data, methods, and literature

Economic Complexity (EC) methods have gained increasing popularity across fields and disciplines. In particular, the EC toolbox has proved promising in the study of complex and interrelated phenomena, such as the transition towards a more sustainable economy. Using the EC approach, scholars have been investigating the relationship between EC and sustainability, identifying the distinguishing characteristics of green activities, and assessing the readiness of productive and technological structures for the sustainability transition. This article reviews and summarizes the data, methods, and empirical literature that are relevant to the study of the sustainability transition from an EC perspective. We review three distinct but connected blocks of literature on EC and environmental sustainability. First, we survey the evidence linking measures of EC to indicators related to environmental sustainability. Second, we review articles that strive to assess the green competitiveness of productive systems. Third, we examine evidence on green technological development and its connection to non-green knowledge bases. Finally, we summarize the findings for each block, while identifying critical issues and avenues for further research in this recent and growing body of empirical literature.


Introduction
arXiv:2308.07172v1 [econ.GN] 14 Aug 2023
The notion of Economic Complexity (EC) has been widely used to encompass a set of methods that characterize the productive and technological composition of economies (countries, regions, cities) relying on complex systems science (Hidalgo and Hausmann, 2009; Tacchella et al., 2012). EC methods have proved to be particularly effective in predicting future patterns of economic growth using information on the export basket of countries (Tacchella et al., 2018). The key intuition behind the EC approach is that economic development and growth are the result of the specialization and diversification patterns of economies, which emerge from underlying hidden interactions between elements in the society (Balland et al., 2022; Pugliese et al., 2019a). More specifically, EC focuses on the role played by the accumulation of (unobserved) productive and technological capabilities in driving economic diversification and growth (Pugliese et al., 2017; Sbardella et al., 2018b). By preserving information on what economies produce, rather than merely how much, the EC literature is able to "describe and compare economies in a manner that eschews aggregation". The EC approach yields a complementary perspective on several domains of economically relevant human activity (e.g. trade, technical innovation, scientific research) with respect to conventional (i.e. aggregate) indicators on productive inputs or performance. This emphasis on the content of the activity baskets of countries or regions resonates with other approaches in economics: those that understand economic growth through the lens of the sectoral allocation of productive factors, such as the structuralist literature (Prebisch, 1950; Lewis, 1954; Hirschman, 1958; Lin et al., 2011), and those that identify capabilities as the main factor driving innovation, as in the evolutionary economics literature (Nelson and Winter, 1982; Dosi and Nelson, 1994; Teece and Pisano, 1994; Cimoli and Dosi, 1995).
The quality and diversity of an economy's productive and technological portfolio have broader implications than simply economic growth. For instance, the composition of economic and technological specialization has strong implications for the environment, as the footprint of different products and technologies can differ substantially. By the same token, the accumulation of technological capabilities may put countries on trajectories that can mitigate or exacerbate the current climate crisis. In this respect, EC methods can prove particularly useful to understand and guide the sustainable transition.
The transition towards a more sustainable and low-carbon socio-economic system is a top policy priority, with the EU Green Deal aiming at climate neutrality by 2050,¹ the Inflation Reduction Act in the United States (Bistline et al., 2023), and the ambitious renewable energy targets in China (Lo, 2014). However, decarbonizing the economy by phasing out polluting and energy-intensive industries will require radical transformations and profound structural change at the core of socio-economic systems. Moreover, this will have to account for the heterogeneous capacity of geographical areas and industries to achieve climate neutrality along with the possible long-lasting effects on income, spatial and environmental inequalities.
To inform policy on how to address such a complex transformation, wherein geographical, structural and institutional elements interact, in the last few years an approach has emerged in the literature (see among others Barbieri et al., 2020, 2022; Montresor and Quatraro, 2020; Santoalha and Boschma, 2021) that draws from sustainability studies, evolutionary economic geography (Frenken, 2006, 2018) and EC. Embracing a complexity perspective may in fact be more effective than traditional approaches in accounting for the interconnected nature of this process of change (Common and Stagl, 2005), providing policy-relevant, data-driven and granular evidence to embrace socio-economic complexity at different geographical scales.
While in the literature environmental goods and technologies have been mainly studied as homogeneous aggregated quantities, the capacity of EC methodologies to avoid aggregation and to provide feasibility diagnostics at the level of single products or technologies may prove extremely relevant in analyzing the potential directions of green development for each country or region. In fact, environmental goods and technologies are highly heterogeneous in terms of functions, across geographical areas, and across stages of their life cycle (Sbardella et al., 2018a; Barbieri et al., 2022); they encompass many different domains of know-how and may be linked in non-trivial ways to pre-existing knowledge as well as to productive and specialization patterns (Montresor and Quatraro, 2020).
Therefore, coupling the geographical distribution of productive and technological capabilities with environmental and socio-economic variables, various scholars have applied the economic complexity toolbox to try to answer three main broad questions:
• What is the relationship between complexity and sustainability?
• What are the properties of green products and technologies, and are these inherently different from non-green ones?
• How can we assess the readiness of national/regional knowledge bases and productive structures for the green transition?
¹ https://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1596443911913&uri=CELEX%3A52019DC0640#document2
Contributions addressing the first question investigate the empirical relationship between countries' environmental outputs, such as greenhouse gas (GHG) emissions, and the complexity of their productive structures, assessed by focusing on the export dimension through various EC metrics. However, these studies are far from unanimous, as they vary in their samples or estimation methods. This yields contrasting results on the effect of EC on the environment, ranging from linear positive relationships to non-linear ones à la Kuznets (1955) curve.
The second and third questions have roots in the evolutionary economic geography literature and are more varied in terms of techniques and scales of observation. They mainly examine the geography of green competitiveness of national/regional knowledge bases or productive structures by analysing specialization profiles in green products or technologies. These studies also look at different dimensions of relatedness between green and non-green technological or productive activities, inequality, and political support for environmental policy. As is common in the EC literature, these works especially stress the potential complementarity between pre-existing capabilities and future specialization in low-carbon technologies or products.
The EC framework plays a crucial role in understanding the sustainability transition and in helping to answer the above research questions. One of the characterizing elements of the EC approach lies in the emphasis it places on the role played by productive and technological capabilities in promoting advancement across domains of human activity. EC recognizes the intangible nature of capabilities, but it also finds ways to uncover information on capabilities from empirical patterns present in economic data. From the point of view of green goods and innovative activities, this implies that as countries and regions strive to move towards sustainable practices, EC offers a framework to analyze the composition and diversification of their economies. By doing so, the EC framework makes it possible to identify the set of capabilities that are most conducive to green productive and technological specialization. Moreover, EC recognizes that capabilities are not static but can be accumulated over time, allowing countries to progress along the sustainability ladder. By leveraging their existing capabilities and building upon them, countries can foster the innovation necessary for transitioning to more sustainable technologies and practices. As we show in this review, economic complexity provides a valuable lens through which policymakers and researchers can understand and navigate the intricacies of the sustainability transition.
This literature review aims to summarize the empirical evidence produced so far to answer the three aforementioned questions from an EC perspective. To this aim, we first review the data on products and technologies used to apply EC methods to the study of the sustainability transition, highlighting the drawbacks and advantages entailed by different types of data (Section 2.1). Second, we offer a methodological contribution by attempting to unify the methods used to estimate and analyse measures of economic complexity and relatedness (Section 2.2). Third, we canvass the literature that links complexity measures to the concept of environmental sustainability, in order to summarize the debate on the role of complexity in explaining environmental issues and in identifying viable avenues into the sustainability transition (Section 3). Finally, we attempt to identify limitations of the existing literature and methods, proposing further research avenues in the field (Section 4).

Data and Methods
The data employed in the EC-based analysis of the green economy are drawn from two main sources: patents and trade flows. This section presents in detail the most commonly used datasets and classifications of green activities, focusing first on patent data and then on trade data. These data are used in EC following a common empirical framework: a bipartite matrix is constructed by cross-tabulating products (or technologies) and geographical areas (countries, regions, cities, etc.). The bipartite matrix is then filtered, projected and ordered in different ways, depending on the specific purposes of the analysis. The methods used to process the information contained in the bipartite matrix are explained in Section 2.2.
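The cross-tabulation step described above can be sketched as follows. This is a minimal illustration in plain Python, not the procedure of any specific study: the records, the area and activity labels, and the function name `cross_tabulate` are all hypothetical, and the weights stand in for whatever quantity is being counted (e.g. export values or patent counts).

```python
from collections import defaultdict

# Hypothetical toy records: (geographical area, activity, weight).
# The weight could be an export value or a (fractional) patent count.
records = [
    ("A", "p1", 10.0),
    ("A", "p2", 5.0),
    ("B", "p1", 2.0),
    ("B", "p3", 8.0),
    ("A", "p1", 1.0),  # repeated (area, activity) pairs are summed
]


def cross_tabulate(records):
    """Return (areas, activities, matrix) where matrix[i][j] is the total
    weight linking areas[i] to activities[j]."""
    totals = defaultdict(float)
    for area, activity, weight in records:
        totals[(area, activity)] += weight
    areas = sorted({area for area, _, _ in records})
    activities = sorted({activity for _, activity, _ in records})
    matrix = [
        [totals.get((area, activity), 0.0) for activity in activities]
        for area in areas
    ]
    return areas, activities, matrix


areas, activities, matrix = cross_tabulate(records)
for area, row in zip(areas, matrix):
    print(area, row)
```

Rows index geographical areas and columns index activities, so the filtering and projection steps mentioned in the text would operate on `matrix`.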
2.1. Data
2.1.1. Patent data
The two main sources of patent data used in EC studies are PATSTAT (European Patent Office, 2021) and REGPAT.² PATSTAT is a comprehensive patent database covering patent applications filed in more than 70 national and international patent offices, including the most important ones (e.g. the European, US, and Japanese patent offices) in terms of the volume and relevance (as measured by received citations, a commonly used proxy for patent value) of the applications they process. PATSTAT has been published biannually by the European Patent Office (EPO) since 2007 and has grown substantially in coverage over time.
As of the latest editions, it records information on more than 100 million patent applications filed since the late eighteenth century, which are collected in over 50 million families. Whenever the information is available for a patent application, PATSTAT records, among other things, the receiving patent office, the filing date (i.e., when the patent office received the document), the technologies in which the patent innovates (encoded in standard technology codes), and the residence of the applicants and of the inventors at the time of filing. The geographical information is incomplete, with the coverage varying widely across patent offices. The most commonly available information in this sense is the country of residence of inventors and applicants. For some documents, more granular data is available (e.g., the full address), but it is not recorded in a structured format, which makes it challenging to process at scale.
REGPAT is another popular resource for patent data that is published annually by the OECD. It covers the subset of PATSTAT patent applications filed at the EPO since its inception in 1978. Since, for geographical reasons, the EPO attracts disproportionately more European patent applications, REGPAT does not provide a uniform geographical coverage beyond European borders. Furthermore, filing costs at the EPO are higher than in most national offices. This skews the sample towards "high-value" patents and, indirectly, leads to an over-representation of richer European countries with respect to more peripheral ones. However, REGPAT makes up for this shortcoming thanks to an accurate geocoding across over 5500 sub-national regions of the patent documents filed by applicants or inventors residing in one of over 40 countries belonging to the following set: OECD countries, the EU, the UK, Brazil, China, India, the Russian Federation, and South Africa.³
² Available upon request at https://www.oecd.org/sti/inno/intellectual-property-statistics-and-analysis.htm#ip-data
The technological fields in which the patents recorded in PATSTAT and REGPAT innovate follow two classifications: the International Patent Classification (IPC) and the Cooperative Patent Classification (CPC).⁴ Both classifications follow a similar hierarchical structure spanning from codes associated with a very detailed description (e.g. G02B 1/02: optical elements [...] made of crystals, e.g. rock-salt) to codes aggregating many detailed technologies under a broader common technological area (e.g. G02: optics; G: physics). For example, at 4 digits, the classifications have around 600 unique codes, while at 8 digits there are several thousand codes (7000 in the IPC and 10000 in the CPC). Despite the strong similarities between the two classifications, only the CPC has a section dedicated to tagging climate change mitigation and adaptation, the so-called 'Y02/Y04S' tagging scheme nested under section Y. Table 1 reports the 1-digit codes and the titles of the sections comprising the IPC and CPC classifications.
Figure 1 compares the coverage of PATSTAT and REGPAT in terms of number of patent documents and time span. The figure shows that the two databases are quite different in both respects. The large difference in the number of documents is due to the fact that REGPAT records patents filed only at the EPO, while PATSTAT collects information from most patent offices worldwide. This is a notable difference, which makes REGPAT less suitable for the analysis of, for instance, smaller countries. However, the EPO tends to receive high-quality applications, making data collected from it more reliable. Concerning the temporal coverage, the figure shows that REGPAT and PATSTAT differ substantially. However, it also suggests that both databases cover a long enough time window for all practical purposes. The coverage of REGPAT starts only in 1978, while PATSTAT records US and UK patents dating back as far as the late eighteenth century.
Nevertheless, large numbers of patents have been recorded only in recent decades. Moreover, both databases are updated to virtually the same date (for instance, as of Summer 2022, PATSTAT covers patent applications published up to April 2022, while REGPAT covers patent documents made public up to October 2021). It is worth noting that, due to the dynamics of patent offices and the regulations governing patent filing, there is an intrinsic lag of 12-18 months between when an application is received by a patent office and when the corresponding record appears in the database, at the end of the so-called search phase by the patent office. During the search phase, patent offices conduct a primary assessment of the originality of the invention and inform the applicants, who can decide either to withdraw the application, thus keeping it confidential, or to pursue the grant of a patent and disclose their application to the public. In some cases, notably the US patent office (USPTO) until the early 2000s, internal regulations used to allow the publication of granted patents only, implying a further lag caused by the longer procedure preceding the publication of the documents and a substantial backlog of applications awaiting examination. This resulted in a delay of up to 5 years during which database coverage remained incomplete. The USPTO has since changed its rules, which makes it possible to compute a complete and relatively timely patent count for disclosed patent applications. However, some lag in coverage is inevitable. Therefore, patent counts extracted from an edition of PATSTAT or REGPAT published in 2022 are not reliable beyond 2018 or 2019.
An invention can be submitted to different offices (e.g. to cover different geographical regions) by filing different applications at different points in time. The first patent application filed to protect an invention is called the priority application.
Subsequent applications that are related to the same invention name the same applications as priorities, which allows them to be grouped into the same patent family (there are over 50 million patent families in PATSTAT). Families are a useful way to group together documents referring to the same innovation and are thus frequently used as the basic unit of observation in empirical exercises.
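As a rough illustration of how applications can be grouped into families, the sketch below groups toy applications that claim the same set of priorities. It is a simplification under stated assumptions: the application identifiers and the function `group_into_families` are hypothetical, a first filing is treated as its own priority, and real family definitions deal with more complicated priority structures than this example does.

```python
from collections import defaultdict

# Hypothetical applications: id -> ids of the priority applications it claims.
# An empty tuple marks a first filing (a priority application).
applications = {
    "EP1": (),
    "US1": ("EP1",),
    "JP1": ("EP1",),
    "US2": (),
}


def group_into_families(applications):
    """Group applications sharing the same set of priorities.

    Assumption: a first filing counts as its own priority, so it lands in
    the same family as the later applications that claim it.
    """
    families = defaultdict(list)
    for app_id, priorities in applications.items():
        key = frozenset(priorities) if priorities else frozenset([app_id])
        families[key].append(app_id)
    return sorted(sorted(members) for members in families.values())


families = group_into_families(applications)
print(families)
```

Counting families rather than single applications then amounts to counting the groups returned by the function.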
As mentioned above, PATSTAT provides limited information about the geographical location of inventors and applicants for many documents, even though the database records patents filed by applicants and inventors located in more than 200 countries. REGPAT, instead, offers much more detailed information, albeit on a smaller set of countries, by associating patent documents to the Territorial Level (TL) (OECD, 2020) code of the region of residence of applicants and inventors. The TL classification is defined by the OECD and covers all its members plus a few important economic actors, such as Russia, China and India. Depending on the country, it follows a hierarchical structure that may go down to administrative regions, provinces and urban areas. For European countries the TL follows closely the hierarchical structure of the 2013 edition of the Nomenclature of Territorial Units for Statistics (NUTS),⁵ developed by Eurostat. In the US, instead, progressively finer levels of the classification identify states, economic areas and counties, which do not follow a nested structure.
To attribute a geographical location to patents one can leverage the inventors' as well as the applicants' residence information. Either can be more suitable depending on the research question. However, the former is often preferred as a proxy for the location of inventive capabilities because it is assumed that inventors (who are always physical persons) tend to live close to where they perform their duties. On the contrary, applicants (who are in many cases companies) may choose to assign to the corporate headquarters a patent that was developed in a subsidiary located in a different country or region for business-related reasons.
Multiple inventors or applicants can be linked to the same patent application or patent family. In such cases, when assigning patents to geographical areas, one may choose between counting patents fractionally and counting them in full. Full counting counts as one unit every pair of patent document and territorial unit hosting at least one inventor. As a consequence of this multiple counting, the weight attributed to a patent (and even more so to a family) depends on the number of inventors. Instead, in fractional counting, each patent (or each family of patents, if families are considered as basic units) sums to 1, and each territorial unit having an inventor gets a fractional weight inversely proportional to the total number of inventors. There is some debate in the literature concerning the best approach (Waltman, 2016). However, in economic complexity applications, fractional counting is generally the preferred approach.
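The difference between the two counting schemes can be made concrete with a small sketch. The patents, region labels and function names below are hypothetical, and the fractional scheme shown (each inventor carries a share of 1/n of the patent, where n is the number of listed inventors) is one plausible reading of the description above rather than a prescribed implementation.

```python
from collections import defaultdict

# Hypothetical patents: id -> region of residence of each listed inventor.
patents = {
    "P1": ["R1", "R1", "R2"],  # three inventors in two regions
    "P2": ["R2"],              # a single inventor
}


def full_count(patents):
    """Each (patent, region) pair with at least one inventor counts as 1."""
    counts = defaultdict(float)
    for regions in patents.values():
        for region in set(regions):
            counts[region] += 1.0
    return dict(counts)


def fractional_count(patents):
    """Each patent sums to 1; every inventor carries a share of 1/n."""
    counts = defaultdict(float)
    for regions in patents.values():
        share = 1.0 / len(regions)
        for region in regions:
            counts[region] += share
    return dict(counts)


full = full_count(patents)
frac = fractional_count(patents)
print(full)
print(frac)
```

With full counting the regional totals add up to more than the number of patents, whereas with fractional counting they add up exactly to it.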

The identification of environment-related technologies within patent classifications
The contributions about eco-innovation presented in this review rely mainly on the Y02/Y04S tagging scheme of the CPC and the Env-Tech classification based on a mixture of IPC and CPC codes, which cover a wide range of technologies related to sustainability objectives, including energy efficiency in buildings, energy generation from renewable sources, sustainable mobility, and smart grids. In response to the increasing attention and concerns about climate change mitigation and renewable energy generation, there has been a large increase in the number and scope of patent applications in environment-related domains in the recent past (European Patent Office, 2013). However, searching for environment-related patent documents was not straightforward at the beginning because a dedicated classification system for sustainable technologies was not available. In fact, before 2011, no specific branch of the IPC or of other national/regional technology classification systems covered environment-related inventions.
A first step in this direction was the creation in 2011 of the Y02 class by the EPO, in cooperation with the United Nations Environmental Program (UNEP) and the International Centre on Trade and Sustainable Development (ICTSD), to complement the IPC and the European Classification System.⁶ From the beginning, the purpose of the Y02 class was to tag patent documents on climate change mitigation technologies (CCMTs) by means of search strategies and algorithms that are implemented by expert examiners and can be re-run periodically to update the classes (Veefkind et al., 2012, p.1). The Y02 scheme initially covered only patent documents related to CCMTs in the energy sector and was later extended to other types of mitigation technologies. At the time, it provided an additional classification for patent documents next to the IPC and was searchable through PATSTAT or Espacenet. This effort by the EPO constituted a major advancement for the study of green innovation from both an academic and a policy perspective, as it has allowed non-specialists as well to easily identify CCMTs.
In 2013, the European Patent Office and the United States Patent and Trademark Office (USPTO) agreed to harmonize their patent classification practices and developed the Cooperative Patent Classification (CPC). Since then, the CPC has become increasingly popular as a classification standard and has been complementing or substituting the IPC in a growing number of patent offices worldwide. As illustrated in Table 1, two types of codes can be found in the CPC classification: codes starting with the letters A to H, which are similar to IPC codes and represent the traditional classification of technologies; and codes starting with Y, which are used to tag cross-sectional technologies that "do not fit in a single other section of the IPC", although "the tagging codes of this section do not in any way replace the classification or indexing codes of the other sections".⁷ Therefore, the Y classification scheme is used to tag patent documents that are already classified or indexed somewhere else in the classification. In addition to the Y02 class, the new subclass Y04S dedicated to smart grids was integrated in the CPC section Y. More in detail, as shown in Table 2, the Y02 class consists of more than 1000 tags related to sustainable technologies organized in 9 sub-classes.
⁶ The European Classification System (ECLA) is a former patent classification system maintained by the European Patent Office (EPO), which was replaced by the Cooperative Patent Classification (CPC) following its introduction in 2013.
⁷ USPTO-CPC Section Y. https://www.uspto.gov/web/patents/classification/cpc/html/cpc-Y.html
With the aim of maximizing the informative content on eco-innovation present in both the IPC and CPC classifications, in 2015 the OECD (Haščič and Migotto, 2015) developed Env-Tech, an expert-based catalogue of environment-related technologies based on the IPC classification and last updated in 2016 (OECD, 2016), which can be used to tag green patent documents in PATSTAT or other patent datasets. Env-Tech identifies 94 environment-related technology areas that group 4- to 16-digit IPC and CPC codes, building on the CPC Y02 class whenever possible. The catalogue relies on a keyword search strategy "that identifies the relevant patent documents using alphanumeric symbols of the IPC or CPC systems [...] which correspond to the target environmental technology field" (Haščič and Migotto, 2015, p.19) that is meaningful for policymakers. The process of selection is carried out with test searches on each class individually, or by reviewing the class description. When it is not possible to identify single IPC/CPC classes that alone portray the technological field of interest, it employs a combination of different patent classes. Env-Tech provides a categorization of green technologies that does not cover CCMTs exclusively and also encompasses environmental management and water-related adaptation technologies (classes 1 and 2). It does so by mixing codes at different aggregation levels from both the CPC and the IPC classification (which is still widely used by many patent offices around the world).
While widening the object of analysis is undoubtedly commendable, it unfortunately makes the catalogue very vulnerable to patent document reclassification. In fact, while the additional information may have been useful for researchers working with versions of PATSTAT prior or contemporaneous to 2016, the year up to which Env-Tech is updated, using a combination of different IPC and CPC codes may constitute a drawback when working with more recent versions of PATSTAT. This is because the classification of the codes comprising any technology, especially for finer-grained codes, changes over time, possibly reclassifying past inventions into new codes. Furthermore, being fixed in time, Env-Tech not only suffers from the reclassification of previous inventions, but also fails to consider newer patent applications. Patent classifications do not vary slowly like industry classifications, where sectors are well defined and changes in the classification take place approximately every decade. Instead, technology codes are revised at least once a year to take into account new technical advances, as well as to improve the search for the prior art that patent officers use to establish the application's degree of innovation. For instance, when a patent code becomes too populated it can be retroactively split into new codes. Let us consider a fictitious case in the domain of battery technologies, which are present in several Env-Tech sub-classes. Following new technical developments, this kind of patent could be split into several subcategories (e.g. lithium or nickel batteries), and new classes might be added when batteries that use new materials are introduced (e.g. graphite). If Env-Tech captured a moment in time in which only lithium and nickel batteries were present in the classifications and not graphite batteries, it would fail to capture the state of the art.
Moreover, the Y04S subclass covering smart grids, which at the end of 2016 comprised 54000 patent documents (Angelucci et al., 2018), is not included in the Env-Tech catalogue. This might lead to an underestimation of the inventive efforts in smart grids, despite the fact that Y04S codes heavily overlap with Y02 codes. By contrast, the Y02/Y04S codes are part of the CPC section Y and are therefore robust to changes in the CPC classification; they are also more user-friendly and readily usable, since the relevant data can be extracted directly from PATSTAT with no need for intermediate steps. Lastly, another limitation of Env-Tech is that, due to its limited granularity (94 categories at the most disaggregated level), it does not allow very detailed studies on green technology development and diffusion. This is an obstacle for economic complexity analyses, which hinge on the possibility of developing algorithmic and network tools able to capture relevant parameters on the technological potential of geographic areas at very fine-grained levels.
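Because the Y02/Y04S codes discussed in this section are tags attached on top of the regular classification, flagging a patent document as environment-related reduces to checking its CPC codes for those prefixes. The sketch below illustrates this under stated assumptions: the code strings are illustrative examples only, the function names are hypothetical, and the extraction of the codes from a patent database is left out.

```python
# Prefixes of the tagging scheme discussed in the text.
GREEN_PREFIXES = ("Y02", "Y04S")


def normalize(code):
    """Drop spaces and upper-case a CPC code string."""
    return code.replace(" ", "").upper()


def is_green(cpc_codes):
    """True if at least one code carries a Y02 or Y04S tag."""
    return any(normalize(code).startswith(GREEN_PREFIXES) for code in cpc_codes)


def split_codes(cpc_codes):
    """Separate section-Y tags from the codes in sections A to H.

    Tagged documents keep their regular codes, which is what allows green
    tags to be related to the non-green classification of the same patent.
    """
    tags = [c for c in cpc_codes if normalize(c).startswith("Y")]
    regular = [c for c in cpc_codes if not normalize(c).startswith("Y")]
    return tags, regular


# Illustrative code lists for two hypothetical patent documents.
doc_a = ["H01L 31/04", "Y02E 10/50"]
doc_b = ["G02B 1/02"]
print(is_green(doc_a), is_green(doc_b))
print(split_codes(doc_a))
```

A fixed catalogue such as Env-Tech would instead require a lookup table of codes, which is where the reclassification issues described above come into play.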

Trade data
Since The environmental goods and services industry manual for data collection and analysis was published by the OECD in 1999 (OECD, 1999), a wide array of lists and taxonomies of green products has been proposed.
Here we briefly discuss the lists that are most commonly used in Economic Complexity analyses, highlighting the main critical issues in classifying environmental goods. Readers should keep in mind that the EC literature has focused, especially in the earlier works, on establishing the level of complexity and growth potential of an economy by extracting information from its trade specialization profile. In order to do so, a harmonized global classification of products with a uniform interpretation worldwide has been crucial for developing the framework. The Harmonised System (HS) classification satisfies these requirements and has been the most widely used in the field. This is a standardized numerical method of classifying traded products. It is used by customs authorities to identify products, and it is maintained by the World Customs Organization (WCO), which updates it every five years. HS comprises more than 5,000 commodity subheadings, which are identified by a 6-digit code and arranged according to a nested structure going up to 96 2-digit Chapters and 21 Sections.
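The nested structure of HS codes means that trade flows recorded at the 6-digit level can be aggregated to coarser levels by truncating the code. The following sketch shows the idea; the trade values are made up, the function names are hypothetical, and the intermediate 4-digit level (HS headings) is included although it is not discussed in the text above. Sections cannot be obtained by truncation and would need a separate chapter-to-section lookup.

```python
def hs_levels(code):
    """Split a 6-digit HS subheading into its nested levels."""
    digits = code.replace(".", "").strip()
    if len(digits) != 6 or not digits.isdigit():
        raise ValueError(f"expected a 6-digit HS code, got {code!r}")
    return {
        "chapter": digits[:2],   # 2-digit Chapter
        "heading": digits[:4],   # 4-digit heading (intermediate level)
        "subheading": digits,    # 6-digit subheading
    }


def aggregate(flows, level):
    """Sum trade values of 6-digit codes at the requested level."""
    totals = {}
    for code, value in flows.items():
        key = hs_levels(code)[level]
        totals[key] = totals.get(key, 0.0) + value
    return totals


# Made-up export values keyed by 6-digit code, for illustration only.
flows = {"854140": 3.0, "854150": 1.0, "870380": 2.0}
by_chapter = aggregate(flows, "chapter")
by_heading = aggregate(flows, "heading")
print(by_chapter)
print(by_heading)
```

Lists of environmental goods are typically defined at the 6-digit level, so any aggregation of this kind mixes green and non-green subheadings within the same chapter.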
The first issue to take into account when constructing a comprehensive green product classification lies in the very nature of the HS classification system, which was not conceived to accommodate a green/non-green dichotomy. Monitoring the trade of environmental goods is a central objective in the global policy agenda (World Trade Organisation, 2001; Sauvage, 2014), and the introduction in the Harmonized System of several 6-digit subheadings covering new environmental goods was announced in 2020 (Steenblik, 2020). The World Customs Organization is slowly addressing this issue by updating the HS system with more codes for green products; however, the updated classification is not yet available. Therefore, a clear-cut identification of environmental goods within existing product classifications may at times prove to be a difficult task. For instance, up until recently, it was impossible to distinguish between combustion-engine and electric cars. Other classifications or surveys exist, which are more focused on the environmental aspects of products or their final use. Nevertheless, these classifications do not present the standardization and granularity properties required by EC techniques.
A second crucial point for a green classification is an agreed-upon definition of green products. This clearly depends on the specific needs it should address: for instance, different requirements need to be satisfied in order to fulfil regulatory or academic purposes, or to define incentives and tax reliefs aimed at promoting the use of renewable energy sources or environmental practices in buildings. While stressing a series of open questions, partially still unanswered, a 1999 report by the OECD (OECD, 1999) defined the environmental goods and services industry as "activities which produce goods and services to measure, prevent, limit, minimize or correct environmental damage to water, air and soil, as well as problems related to waste, noise and eco-systems. This includes cleaner technologies, products and services that reduce environmental risk and minimize pollution and resource use". In the report, a list of 121 environmental goods satisfying this definition was proposed. More recently, according to the IMF:⁸ "environmental goods include both goods connected to environmental protection -- such as goods related to pollution management and resource management -- and adapted goods -- which are goods that have been specifically modified to be more environmentally friendly or cleaner". In 2009 the World Trade Organization (WTO) published a list of green products (World Trade Organisation, 2009) which is broader than the OECD list, comprising 480 products. The aim of such a list was to agree upon a wide classification for policy implications, evaluation, and tariff regularization in international trade. While the list is comprehensive, many inaccuracies and biases have been pointed out by EUROSTAT in a report published in the same year (EUROSTAT, 2009). The WTO also published two shorter lists, 'friend' and 'core', composed respectively of 154 and 26 products.
Similar efforts were made by the OECD for the 2010 Toronto G20 summit, with the proposal of an updated list of 150 Plurilateral Environmental Goods and Services (PEGS). In 2012, the Asia-Pacific Economic Cooperation (APEC) put forward a list of 54 environmental goods subject to reduced tariffs (APEC, 2012). Finally, in 2014, the OECD published the most recent list of green products, the Combined List of Environmental Goods (CLEG) (Sauvage, 2014). CLEG comprises 248 products identified by combining and revising the APEC, the WTO 'friend' and the PEGS lists, which cover more than 150 products. However, CLEG focuses mainly on goods relevant to tackling climate change.
Regardless of the specific definition, two main pitfalls are shared by all green product classifications:
• Final use: for many products, it is not possible to know their actual final use. Many commodities labelled as green (e.g. filters, pumps and pipes) may also be used for non-environmental purposes. While statistics can be computed based on surveys, their reliability on a global scale is still unclear.
• Greener products: depending on the goal of the study/application one could consider products that are less raw material intensive, have longer life spans, are more energy efficient or are easier to dispose of. Clearly, these properties depend on the comparison with other products belonging to the same category. However, such comparisons are, even in principle, very difficult to carry out.
From these two issues, a clear trade-off between accuracy and comprehensiveness arises. Finally, no list of environmental products can be considered definitive: like the HS classification, it needs constant updates and revision.

Methods
In the Economic Complexity framework, economic performance is seen as the result of the accumulation of non-tradeable inputs and productive capabilities. Economic Complexity can be seen as an indirect measure of the capability endowment of a country. The notion of capabilities was developed at the firm level in the evolutionary economics literature to describe the dynamic know-how allowing firms to develop and introduce new products and services in the market (Penrose, 1959; Teece et al., 1997). In the EC literature, productive capabilities describe location-based attributes, which encompass further intangible aspects contributing to building and effectively exploiting productive efficiency, such as the institutional setups, education systems, policies, and infrastructures needed by a country to learn how to produce and competitively export more complex products (Hausmann and Rodrik, 2003; Hausmann et al., 2005; Sutton and Trefler, 2016). Information on capabilities is gathered from a binary network based on international trade data that connects countries to the products they export with comparative advantage.
In the following, we present the main metrics and tools available in the framework of Economic Complexity.

Complexity measures
The basic intuition of the EC approach is that specific activities, such as export production, labour sectors, patenting activity or scientific research, are important because they constitute different learning opportunities and development possibilities. EC explicitly builds on the heterogeneity and interactions between different economic actors, assuming that the level of technological and scientific knowledge of a geographical area cannot be reached at an intensive level: knowledge grows not by accumulating 'more' of the same, but by adding new and different elements to existing capacities.
In order to do so, analyses in Economic Complexity often start from the observation of empirical bipartite networks connecting geographical areas (be they countries, regions or cities) to different types of economically relevant activities, such as patenting (Sbardella et al., 2018a), goods production or scientific research, in which they are competitive. These bipartite networks are typically assessed by evaluating the comparative advantage of the geographical area in the selected activity, using Balassa's Revealed Comparative Advantage (RCA) index (Balassa, 1965). The index is interpreted as a proxy for above-average competitiveness in any given activity. RCA compares the share of an activity in the basket of a region with a reference distribution, typically the global share of that activity. In formula, denoting by $W_{ga}$ the volume of activity $a$ in geographic region $g$, the RCA can be written as follows:

$$\mathrm{RCA}_{ga} = \frac{W_{ga} \big/ \sum_{a'} W_{ga'}}{\sum_{g'} W_{g'a} \big/ \sum_{g'a'} W_{g'a'}} \qquad (1)$$

Despite the simplicity of the RCA as an index of competitiveness, and the criticism it has received in favor of other measures such as the Absolute Advantage, the RCA index and its refinements (see Bruno et al., 2023) remain the workhorse for binarizing the adjacency matrices of bipartite networks in the EC literature. The RCA indicator assigns a real and non-negative value to each combination of geographical area and activity. The resulting distribution of RCA values is skewed with large tails. Dealing with continuous values with a large variability can be impractical for applications that often only require the information on being competitive or not. Therefore, the RCA matrix is binarized by setting a threshold value, usually 1.
If the RCA value is at least 1, it is assumed that the region has a comparative advantage in the activity and a value of 1 is inserted in the corresponding matrix cell, and zero otherwise:

$$M_{ga} = \begin{cases} 1 & \text{if } \mathrm{RCA}_{ga} \geq 1 \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

Geographical area g has an RCA of 1 in activity a if the weight of a in the basket of activities of g is the same as the global weight of activity a relative to all activities. Though the unit threshold has a clear focal value, which has made it the standard in the literature, different values might be appropriate depending on the particular application.
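The construction of the binary matrix can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions: the function names `rca_matrix` and `binarize`, the toy volumes in `W_toy`, and the choice of treating a value exactly equal to the threshold as a comparative advantage are ours and do not come from any specific study.

```python
import numpy as np

def rca_matrix(W):
    """Balassa RCA for a (regions x activities) matrix of non-negative volumes.

    Assumes every region and every activity has a strictly positive total,
    otherwise the shares below are undefined.
    """
    W = np.asarray(W, dtype=float)
    region_totals = W.sum(axis=1, keepdims=True)     # size of each region's basket
    activity_totals = W.sum(axis=0, keepdims=True)   # global volume of each activity
    share_in_region = W / region_totals              # weight of a in the basket of g
    share_in_world = activity_totals / W.sum()       # global weight of a
    return share_in_region / share_in_world

def binarize(rca, threshold=1.0):
    """Binary matrix M: 1 where RCA reaches the threshold, 0 otherwise."""
    return (np.asarray(rca) >= threshold).astype(int)

# Hypothetical toy data: 3 regions (rows) and 3 activities (columns).
W_toy = np.array([[10.0, 0.0, 5.0],
                  [ 2.0, 8.0, 0.0],
                  [ 1.0, 1.0, 1.0]])
rca_toy = rca_matrix(W_toy)
M_toy = binarize(rca_toy)
print(M_toy)
```

In this toy example the third region, although small, obtains a comparative advantage in two activities, which shows that RCA captures the composition of the basket rather than its size.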
The first indicator developed to evaluate the effective, or generalized, diversification of a country is the Method of Reflections (Hidalgo and Hausmann, 2009), which proposed the iterative procedure

$$k_g^{(n)} = \frac{1}{k_g^{(0)}} \sum_a M_{ga}\, k_a^{(n-1)}, \qquad k_a^{(n)} = \frac{1}{k_a^{(0)}} \sum_g M_{ga}\, k_g^{(n-1)}. \qquad (3)$$

The terms $k_g^{(n)}$ and $k_a^{(n)}$ are the generalized diversifications of geographical areas and activities at the n-th iteration. The initial conditions are $k_g^{(0)} = \sum_a M_{ga}$ and $k_a^{(0)} = \sum_g M_{ga}$, corresponding to the usual definitions of diversification and ubiquity. The idea behind the method is that the complexity of a region, interpreted as its generalized diversification, is driven by the average complexity of the activities it performs, and that the complexity of an activity is driven by the average complexity of the regions with a comparative advantage in it. The resulting complexity is then standardized (i.e. by removing the mean and dividing by the standard deviation). Since the iterative model in equation (3) has a trivial solution (Kemp-Benedict, 2014), the authors settled on a more practical definition of the index, based on the standardization of the eigenvector associated with the second largest eigenvalue of the matrices derived from the Method of Reflections. The new index is called Economic Complexity Index (ECI) (Hausmann et al., 2014). Mealy et al. (2018) showed that ECI is equivalent to an approximate solution of a spectral clustering problem. Effectively, ECI divides countries into two groups with similar export baskets. However, ECI presents a conceptual drawback due to the fact that the complexities are defined by averages. ECI defines geographical complexity as the average product complexity of a geography's basket, and product complexity as the average complexity of the geographical units specialized in that product. Averaging, however, implies that the complexity of a region increases when it adds an activity of above-average complexity but decreases when it adds an activity of below-average complexity.
Taken to the extreme, therefore, two countries, one that produces everything and one that makes only one product of average complexity, could have the same complexity. This thought experiment shows that averages are not an effective tool for analyzing cumulative quantities such as capabilities. Furthermore, ECI is based on the idea that countries are diversified. However, there is more to the binary M matrices than just the diversification of the regions represented by their rows. In particular, the M matrices are usually nested across domains of activity (irrespective of whether the columns of M represent products, technologies or scientific fields) and scales of analysis (Pugliese et al., 2019b; Laudati et al., 2023). The concept of nestedness originates from ecology (Atmar and Patterson, 1993) and refers to the observation that, in response to evolutionary pressure, species co-existing within an ecosystem occupy different niches in a predictable manner whereby more generalist species occupy most or all niches, while progressively more specialized species tend to occupy a subset of the niches. This results in ecosystems that can be represented by a presence-absence matrix in which species and niches can be ordered in a triangular arrangement, with the ones clustering in the top left part and the zeros in the bottom right. 9 Therefore, nestedness can be viewed as the footprint of a complex evolutionary process and, as such, it can also be framed in the evolutionary economics tradition, which has more room for path-dependency and complexity as driving forces of economic dynamics (Dosi and Nelson, 1994; Nelson and Winter, 1982). In this context, nestedness means that less diversified regions typically have a comparative advantage in a subset of the activities pursued by more diversified regions, resulting in the emergence of a hierarchical model represented by triangular matrices (Cristelli et al., 2015).
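To make the eigenvector definition of ECI discussed above more concrete, the following Python sketch builds the region-to-region matrix obtained by chaining two steps of the Method of Reflections and extracts the standardized eigenvector of its second largest eigenvalue. It is a sketch under our own assumptions: the function names, the toy nested matrix and the sign convention (positive correlation with diversification) are illustrative choices, and the bipartite network is assumed to be connected with no empty rows or columns.

```python
import numpy as np

def reflection_matrix(M):
    """Region-region matrix sum_a M_ga M_g'a / (k_g k_a).

    Each row sums to one, so the constant vector is always an eigenvector
    with eigenvalue 1 (the trivial solution of the iteration).
    """
    M = np.asarray(M, dtype=float)
    diversification = M.sum(axis=1)   # k_g^(0)
    ubiquity = M.sum(axis=0)          # k_a^(0)
    return (M / diversification[:, None]) @ (M / ubiquity[None, :]).T

def eci(M):
    """Standardized eigenvector of the second largest eigenvalue."""
    M = np.asarray(M, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eig(reflection_matrix(M))
    order = np.argsort(eigenvalues.real)[::-1]        # descending eigenvalues
    second = eigenvectors[:, order[1]].real
    # Eigenvectors are defined up to a sign; as an assumption we orient the
    # index so that it correlates positively with diversification.
    if np.corrcoef(second, M.sum(axis=1))[0, 1] < 0:
        second = -second
    return (second - second.mean()) / second.std()

# Hypothetical perfectly nested matrix: 4 regions (rows), 4 activities (columns).
M_toy = np.array([[1, 1, 1, 1],
                  [1, 1, 1, 0],
                  [1, 1, 0, 0],
                  [1, 0, 0, 0]])
eci_toy = eci(M_toy)
print(eci_toy)
```

Because the output is standardized within a single matrix, the values are only meaningful relative to the other regions in the same snapshot.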
It should be noted that the presence of nestedness indicates a more correlated system, one that cannot be easily described through a simple linear model and averages. A more suitable model for the characterization of nested patterns is given by the Fitness and Complexity metrics (Tacchella et al., 2012). The Fitness of a region is the sum of the Complexities of all the activities it pursues, thus expanding the activity basket increases the Fitness proportionally. The extensive nature of the Fitness is complemented by the definition of the Complexity of an activity as driven mainly by the Fitness of the least competitive regions performing it. Operationally, the Fitness and Complexity algorithm finds the fixed point of

$$\begin{aligned} \tilde{F}_g^{(n)} &= \sum_a M_{ga}\, Q_a^{(n-1)}, & \tilde{Q}_a^{(n)} &= \frac{1}{\sum_g M_{ga} \big/ F_g^{(n-1)}}, \\ F_g^{(n)} &= \frac{\tilde{F}_g^{(n)}}{\langle \tilde{F}^{(n)} \rangle_g}, & Q_a^{(n)} &= \frac{\tilde{Q}_a^{(n)}}{\langle \tilde{Q}^{(n)} \rangle_a}, \end{aligned} \qquad (4)$$

where $\langle \cdot \rangle_g$ and $\langle \cdot \rangle_a$ denote averages over regions and activities, respectively. The normalization step is found to be unnecessary for the evaluation of the fixed point (Mazzilli et al., 2022), but it helps the numerical procedure and the stability of the code. The rationale of the algorithm is that the Fitness of the geographical areas under analysis and the Complexity of the activities in which they are specialized can be determined recursively by taking advantage of the information provided by the composition of their productive or technological portfolio. In particular, a geographical area with a more advanced set of capabilities will have a more diversified portfolio of activities, spanning from the most to the least complex ones, and will therefore have a higher Fitness score. In turn, complex activities are rare and appear almost exclusively in the portfolio of high-fitness territories. Consequently, a region with low Fitness has a smaller endowment of capabilities and thus operates exclusively in less complex (green and non-green) domains. Further, Fitness and Complexity are potential functions related to the bipartite network $M_{cp}$, which define a forbidden region of the matrix given by the requirement of efficiency in the allocation of resources of countries and products (Mazzilli et al., 2022).
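The recursion described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: we assume the usual form of the map (Fitness as the sum of the Complexities in the basket, Complexity as the inverse of the sum of inverse Fitness values of the regions active in it), uniform initial conditions, normalization by the mean at each step, and a fixed number of iterations instead of a convergence criterion. The function name and the toy matrix are ours.

```python
import numpy as np

def fitness_complexity(M, n_iter=50):
    """Iterate the Fitness-Complexity map on a binary (regions x activities) matrix.

    Assumes every region has at least one activity and every activity is
    performed by at least one region.
    """
    M = np.asarray(M, dtype=float)
    fitness = np.ones(M.shape[0])
    complexity = np.ones(M.shape[1])
    for _ in range(n_iter):
        # Both updates use the values of the previous iteration.
        fitness_new = M @ complexity
        complexity_new = 1.0 / (M.T @ (1.0 / fitness))
        # Normalization by the mean keeps the numbers in a stable range.
        fitness = fitness_new / fitness_new.mean()
        complexity = complexity_new / complexity_new.mean()
    return fitness, complexity

# Hypothetical perfectly nested matrix: region 0 does everything,
# activity 0 is done by everyone.
M_toy = np.array([[1, 1, 1, 1],
                  [1, 1, 1, 0],
                  [1, 1, 0, 0],
                  [1, 0, 0, 0]])
F_toy, Q_toy = fitness_complexity(M_toy)
print(F_toy, Q_toy)
```

On this nested toy matrix the most diversified region receives the highest Fitness, while the activity performed only by that region receives the highest Complexity, in line with the rationale given in the text.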
Fitness and ECI have in common that both are relative measures based on the bipartite network at a given snapshot of the economic system under analysis. Therefore, their magnitudes cannot be compared longitudinally across time since the networks change. This is an issue not often discussed in the literature but that is a rising concern when the metrics are used as variables for regressions. Indeed, it is possible to overcome this issue by setting a reference scale (Mazzilli et al., 2022), i.e. by assigning to a reference country a fixed value, or by adding a dummy country that is always specialised in all activities.
Another frequent issue in the Economic Complexity literature is the analysis of systems across different geographical scales. Often, the data at lower scales do not cover the whole system but only parts of it, and their quality is lower, since volumes are smaller and errors become more relevant. This is, for example, a general problem in the analysis of technological outputs. The available data often allow the evaluation of patenting activity at the level of regions or cities, but many cities do not produce more than a few patents per year. In order to evaluate the Fitness of regions, it is preferable to compute the Complexity of economic activities at the country level, where the data have a better signal-to-noise ratio, and to plug the results into the first term of equation (4) to compute the Fitness of regions. We refer to this measure as "exogenous fitness". In principle, the same philosophy can be implemented with ECI by applying the line of the Method of Reflections in equation (3).
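As an illustration of the exogenous approach, the sketch below takes activity Complexities that are assumed to have been computed beforehand at the country level and sums them over the activities in which each region has a comparative advantage. The function name, the toy regional matrix and the Complexity values are hypothetical, and no further normalization is applied, which is our simplifying assumption.

```python
import numpy as np

def exogenous_fitness(M_regions, complexity_country_level):
    """Regional Fitness from externally supplied activity Complexities.

    M_regions: binary (regions x activities) matrix.
    complexity_country_level: one Complexity value per activity, computed
    on country-level data (taken as given here).
    """
    M_regions = np.asarray(M_regions, dtype=float)
    complexity = np.asarray(complexity_country_level, dtype=float)
    return M_regions @ complexity

# Hypothetical inputs: 2 regions, 3 activities.
M_regions_toy = np.array([[1, 0, 1],
                          [0, 1, 0]])
Q_country_toy = np.array([0.5, 1.0, 1.5])
F_exo_toy = exogenous_fitness(M_regions_toy, Q_country_toy)
print(F_exo_toy)
```

Since the Complexities are fixed from outside, no iteration is needed at the regional level, which is what makes the measure robust to sparse regional data.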
In the literature reviewed in this work, the term Green Fitness is often used. This term refers to the Fitness computed on technological codes taken from the classification of CCMT technologies (Y02-Y04S, see section 2.1 on data). At the country level, the fixed point of equation (4) is calculated using only data on the above-mentioned technological codes. At the regional level, due to the high fragmentation of the data, it is necessary to resort to the exogenous fitness described above.

Relatedness
Another set of techniques from the Economic Complexity toolbox aims at estimating the relatedness between activities, e.g. between pairs of products (Hidalgo, 2009; Zaccaria et al., 2014), by using product-level export data to measure relatedness through statistically significant patterns of co-exporting in the international trade network. This relies on the assumption that countries that are able to successfully export a product have developed a set of capabilities that would enable them to diversify into related goods. This measure of relatedness, or proximity, between two products is based on the observation of their empirical co-occurrences in the export baskets of different countries and is connected to the probability that having a comparative advantage in the first product will also lead to a comparative advantage in the second. These analytical tools can be traced back to Jaffe (1986) and to the measure of corporate coherence introduced by Teece and Pisano (1994), and can also be used to estimate relatedness between technologies (Breschi et al., 2003; Napolitano et al., 2018), or between a technology and a product (Pugliese et al., 2019a; de Cunzo et al., 2022). See also Tacchella et al. (2021) for a multi-product, non-linear approach based on machine learning.
In more technical terms, relatedness is a measure of the pairwise similarity between activities. It is usually represented as a network of connected activities, where the links express the degree of similarity. There are several methods to evaluate the similarities between activities, resulting in different and heterogeneous networks that cannot always be easily compared. However, most of them present a core-periphery structure, with hubs and leaves associated with similar types of activities or arranged into meaningful communities.
The first approach to relatedness in the EC literature is the Product Space, developed by Hidalgo et al. (2007) and based on exported products. The similarity between products is evaluated using the proximity, a normalized measure of export co-occurrence between pairs of products, mathematically defined as (Hidalgo, 2009):

$$\phi_{pp'} = \frac{\sum_c M_{cp} M_{cp'}}{\max(u_p, u_{p'})}, \qquad u_p = \sum_c M_{cp}, \qquad (5)$$

where $M_{cp}$ is the binary country-product matrix and $u_p$ is the ubiquity of product $p$. Originally, proximity was interpreted as the "conditional probability of exporting good product p given that one exports good p'" (Hidalgo et al., 2007) 10. In the visualization of the Product Space, the network is typically obtained by pruning all the links with proximity below a certain value (in the original work the threshold was 0.55 (Hidalgo et al., 2007)). Intuitively, products that share similar inputs will be situated close to each other in the network and, once the empirical co-occurrences have been filtered with a null model, proximity indicates a relatively high probability of jumping from a product to a neighbouring one. Therefore, its observation allows us to trace the most profitable trajectories to enter a new production line on the basis of the pre-existing endowment of productive capabilities. A more statistically sound approach is given by the evaluation of the similarity between pairs of activities through the statistical validation of the assist matrix (Pugliese et al., 2019a; Sbardella et al., 2022). This framework introduces the idea of studying the co-occurrence within the same geographical area of exported products at different points in time. Indeed, when a country is competitive in a given activity, its available portfolio of capabilities may help it become competitive in a new activity after some time. The assist matrix estimates the probability of having a comparative advantage in one activity in a given year, conditional on having a comparative advantage in another activity in a previous year; typically a lag of 5 years is considered.
The formula of the Assist matrix is

$$B^{\alpha\beta}_{aa'}(y_1, y_2) = \frac{1}{u^{\alpha}_a(y_1)} \sum_g \frac{M^{\alpha}_{ga}(y_1)\, M^{\beta}_{ga'}(y_2)}{d^{\beta}_g(y_2)}, \qquad (6)$$

where $u^{\alpha}_a(y_1) = \sum_g M^{\alpha}_{ga}(y_1)$ is the ubiquity of activity $a$ in year $y_1$ and $d^{\beta}_g(y_2) = \sum_{a'} M^{\beta}_{ga'}(y_2)$ is the diversification of region $g$ in year $y_2$. The labels $a$ and $a'$ refer to activities, the label $g$ refers to a geographical region, and $y$ refers to a year. $B_{aa'}$ is a conditional probability, since $\sum_{a'} B_{aa'} = 1$ and $0 \leq B_{aa'} \leq 1$. The Greek labels α and β indicate the possibility that the two matrices are not related to the same set of activities. For instance, it is possible to consider the assist matrix between technologies and export goods, raising the analysis to the interaction among different layers of economic activities. However, the assist matrix, like any other similarity measure, suffers from the problem that the relatedness networks are based on nested networks where diversified actors are active in most of the activities. As a consequence, a co-occurrence is not informative per se: it may in fact be due to the ubiquity of a technological field across geographical areas or may take place in a highly diversified area, where almost all products are represented. To evaluate whether the probabilities contained in the Assist Matrix are significant indicators of a capability spillover, one needs a null model that discounts the fact that co-occurrences can be random and are more likely the more ubiquitous two activities are across areas (Cimini et al., 2019). In other words, according to the null hypothesis, the co-occurrences of above-average activity in the technology and labor layers are random and are determined only by the diversification of the geographical area and the ubiquity of the technological field.
10 Notably, proximity should not be confused with a conditional probability. It is true that each value of $\phi_{pp'}$ is real, positive and bounded by 1, but a conditional probability, i.e. $P(A|B)$, requires that $\sum_A P(A|B) = 1$. Instead, $\sum_{p'} \phi_{pp'}$ depends on the dimension of the network (the linear sizes of $M_{ga}$) and is typically larger than 1.
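The two relatedness measures discussed in this section can be compared with a short Python sketch. It assumes the standard forms of the two quantities: proximity as the number of co-occurrences divided by the larger of the two ubiquities, and the assist matrix as a co-occurrence across two years normalized by the ubiquity of the first activity and the diversification of each region in the second year. Function names and toy matrices are hypothetical, and all row and column sums are assumed to be strictly positive.

```python
import numpy as np

def proximity(M):
    """Symmetric proximity between the columns (activities) of a binary matrix."""
    M = np.asarray(M, dtype=float)
    cooccurrence = M.T @ M                       # regions active in both activities
    ubiquity = M.sum(axis=0)
    return cooccurrence / np.maximum.outer(ubiquity, ubiquity)

def assist_matrix(M_first, M_second):
    """Assist matrix from activities in M_first (earlier year) to M_second (later year).

    Both matrices must refer to the same regions, in the same row order.
    """
    A = np.asarray(M_first, dtype=float)
    B = np.asarray(M_second, dtype=float)
    ubiquity_first = A.sum(axis=0)               # u_a in the earlier year
    diversification_second = B.sum(axis=1)       # d_g in the later year
    return (A / ubiquity_first[None, :]).T @ (B / diversification_second[:, None])

# Hypothetical binary matrices for 3 regions and 3 activities in two years.
M_y1 = np.array([[1, 1, 0],
                 [1, 0, 0],
                 [0, 1, 1]])
M_y2 = np.array([[1, 1, 1],
                 [1, 1, 0],
                 [0, 1, 1]])
phi_toy = proximity(M_y1)
B_toy = assist_matrix(M_y1, M_y2)
print(phi_toy.sum(axis=1))   # row sums are not constrained to one
print(B_toy.sum(axis=1))     # each row sums to one
```

The contrast between the two printed row sums reflects the point made in the text: the assist matrix is normalized as a conditional probability, while proximity is not.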
A null model used in the literature to assess the statistical significance of the conditional probabilities $B_{aa'}(y_1, y_2)$ is the Bipartite Configuration Model (BiCM) (Saracco et al., 2015, 2017), a maximum-entropy approach for the randomization of bipartite networks. The BiCM makes it possible to generate $\tilde{M}^{\alpha}(y_1)$ and $\tilde{M}^{\beta}(y_2)$, randomized versions of the empirical bipartite matrices $M$ appearing in Equation (6), in which the elements are reshuffled randomly while preserving the degree distributions, i.e. the diversification of the geographical areas and the ubiquity of the technological fields.
By applying Equation (6) to a suitably large number of reshuffled matrices, it is possible to obtain a large set of realizations of the null-model assist matrices $\tilde{B}(y_1, y_2)$, which we use to test the null hypothesis on the empirical data. We then select the pairs $(a, a')$ that belong to the statistically validated network $\hat{B}$ at the level of significance considered.
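The validation logic can be sketched in Python as follows. Note that this example does not implement the BiCM: as a simpler stand-in, it randomizes each matrix through repeated checkerboard swaps, which preserve the diversification of every region and the ubiquity of every activity exactly, whereas the BiCM preserves them on average. The function names, the number of swaps, the number of null realizations and the significance level are all illustrative assumptions.

```python
import numpy as np

def assist(M_first, M_second):
    """Assist matrix between the activities of two binary matrices."""
    A = np.asarray(M_first, dtype=float)
    B = np.asarray(M_second, dtype=float)
    return (A / A.sum(axis=0)).T @ (B / B.sum(axis=1, keepdims=True))

def swap_randomize(M, n_swaps, rng):
    """Degree-preserving randomization through checkerboard swaps (not the BiCM)."""
    R = np.array(M, dtype=int)
    n_rows, n_cols = R.shape
    for _ in range(n_swaps):
        rows = rng.choice(n_rows, size=2, replace=False)
        cols = rng.choice(n_cols, size=2, replace=False)
        block = R[np.ix_(rows, cols)]
        # Only [[1, 0], [0, 1]] or [[0, 1], [1, 0]] blocks can be flipped
        # without changing any row or column sum.
        if (block[0, 0] == block[1, 1] and block[0, 1] == block[1, 0]
                and block[0, 0] != block[0, 1]):
            R[np.ix_(rows, cols)] = 1 - block
    return R

def validated_links(M_first, M_second, n_null=200, n_swaps=500, alpha=0.05, seed=0):
    """Boolean mask of assist-matrix entries that exceed the null expectation."""
    rng = np.random.default_rng(seed)
    observed = assist(M_first, M_second)
    exceed = np.zeros(observed.shape)
    for _ in range(n_null):
        null = assist(swap_randomize(M_first, n_swaps, rng),
                      swap_randomize(M_second, n_swaps, rng))
        exceed += null >= observed
    p_values = exceed / n_null      # empirical one-sided p-values
    return p_values < alpha

# Hypothetical binary matrix: 5 regions and 4 activities.
M_toy = np.array([[1, 1, 1, 0],
                  [1, 1, 0, 0],
                  [1, 0, 0, 1],
                  [0, 1, 1, 1],
                  [1, 0, 1, 0]])
R_toy = swap_randomize(M_toy, 300, np.random.default_rng(1))
mask_toy = validated_links(M_toy, M_toy, n_null=20, n_swaps=50)
```

In an actual application the number of null realizations would need to be much larger than in this toy run, and a correction for multiple hypothesis testing may be appropriate given the number of activity pairs tested.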

Review of the literature
The data and methods described in the previous section are at the core of the growing body of empirical literature that has studied the sustainability transition using the EC framework. In this section, we review this body of literature, which we have divided into three main blocks. The first block includes contributions that have explored the association between complexity metrics at the geographical level (mainly countries and regions) and variables related to environmental sustainability, such as aggregate CO2 and Greenhouse Gases (GHG) emissions. The second block focuses on the identification of green products, aiming to measure the potential for green diversification based on export patterns. The third block consists of empirical articles that use patent data to assess the readiness of regions for the green transition based on their existing technological capabilities, and to examine the complementarity between green and non-green technologies.

Economic Complexity and the environment
EC indices have been widely used by academics and policymakers to predict economic growth (Hidalgo and Hausmann, 2009;Tacchella et al., 2012;Cristelli et al., 2013;Tacchella et al., 2018). However, the pursuit of economic growth has been put under scrutiny by academics and society alike, due to the evidence that associates economic growth -measured in terms of GDP -to environmental degradation (Raworth, 2017;IPCC, 2022). Since the measurement of EC relies heavily on the nature of the products exported by countries or regions, which in turn impact the environment with different intensity (as they embody different levels of greenhouse gases, have different energy requirements, and produce more or less polluting byproducts), there has been growing interest in understanding the relationship between the complexity of countries' productive structures and environmental degradation. An ever-growing body of literature (Table 3) has investigated this relationship, looking especially at the export dimension of complexity and at country-level measures of environmental degradation.
One strand of this literature assumes a linear relationship between EC and environmental impacts, producing mixed evidence. In a rather comprehensive study, Romero and Gramkow (2021) analyse the relationship between the complexity levels of 67 countries and their CO2 emissions, measured by aggregate, per capita and product-specific emissions. The latter are proxied by the Product Emission Intensity Index (PEI), which averages the emissions of the countries exporting a product with comparative advantage, following a methodology similar to Hartmann et al. (2017). The study finds that lagged ECI is associated with a reduction of both emissions intensity and emissions per capita, and that lower emissions are associated with more interconnected (complex) products.
Looking exclusively at OECD countries, Dogan et al. (2021) also find a positive relationship between economic complexity and the reduction of CO2 emissions between 1990 and 2014. In addition, the authors show that the complexity of exports interacts positively with the consumption of renewable energy, contributing to a mitigation of environmental degradation in high-income countries.
However, studies on the relationship between EC and environmental indicators are quite heterogeneous in terms of the data and the analytical techniques they employ. As a consequence, comparing their results is not always straightforward. Boleti et al. (2021) rely on a measure of environmental performance (including emissions indicators for different pollutants, the effects of pollution on human health and environmental degradation, and the effectiveness of environmental policies) to show that increased complexity is associated with better environmental performance across 88 low- and high-income countries over a (short) period spanning 2002 to 2012. However, in the same period, higher complexity is also associated with worse air quality, measured in terms of exposure to CO2, PM2.5, methane, and nitrous oxide.
Neagu and Teodoru (2019) examine the heterogeneous effects of EC on greenhouse gases in European countries. Their work shows that higher EC is positively linked to the growth of greenhouse gas emissions, although this happens faster for countries with relatively lower levels of economic complexity. These results are consistent with other studies focusing exclusively on the most complex countries: using time-series estimation techniques, Martins et al. (2021) find that higher EC is positively associated, in a unidirectional way, with the levels of CO2 emissions in the top 7 countries in the EC ranking. Nevertheless, integration in international trade has mitigated such negative effects, allowing early industrializers to shift towards knowledge-intensive, less polluting tasks. Similarly, Rafique et al. (2022) resort to dynamic panel data estimation techniques to find that ECI, along with urbanization and export growth, is positively linked to the ecological footprint 11 in the top 10 complex countries.
11 The authors rely on an index, the Ecological Footprint Index, which includes factors such as area occupied by forests, cropland, grazing, built-up land, fishing, and CO2 emissions.
The mixed evidence on the relationship between EC and environmental sustainability could also be explained by the fact that such linkage may be nonlinear. It has been argued that countries increase their environmental impact as they industrialize, eventually reaching a peak in their CO2 emissions per capita. However, as they move towards more sophisticated activities and services, emissions per capita start to decrease, while GDP keeps growing. The inverted U-shape of the relationship between GDP and CO2 emissions has been named Environmental Kuznets Curve (EKC) (Grossman and Krueger, 1991; Selden and Song, 1994; Grossman and Krueger, 1995), building on the work by Kuznets (1955), who observed the same relationship between inequality and economic growth unfolding along with the process of structural transformation. Given the tight link between structural transformation and CO2 emissions, and the relevance of complexity metrics to describe the former as a process of continuous diversification towards more sophisticated products, a growing body of literature has attempted to identify a complexity-based EKC. The intuition behind an inverted U-shaped relationship between economic complexity and CO2 emissions lies in the fact that, as countries become 'fitter' and accumulate productive capabilities, they shift their specialization towards more knowledge-intensive goods, with the latter being greener. Moreover, as countries become more technologically advanced, they are able to improve their energy efficiency and introduce green technologies in the process of energy and material production.
An ECI-based EKC has been identified especially for high-income countries, which find themselves at a mature post-industrial stage, that is, beyond the peak of emissions that coincided with a higher intensity of employment in manufacturing industries. In particular, ECI-based EKCs have been empirically tested for France (Can and Gozgor, 2017) and the US (Pata, 2021), as well as for a sample of 25 European countries (Neagu, 2019) and for leading exporters (Zheng et al., 2021). However, the same cannot be said for emerging economies, as they are still intensive in manufacturing or extractive industries, which are associated with higher environmental impact. For instance, empirical tests of the EKC in China, both at the aggregate (Yilanci and Pata, 2020) and regional level (Akadiri et al., 2022), find no evidence of an inverted U-shaped relationship between complexity and CO2 emissions. On the contrary, higher complexity appears to be associated with higher environmental impact in both cases, despite the negative association between regional complexity and coal consumption in China (Dong et al., 2020). In the case of Brazilian regions (Swart and Brinkmann, 2020), the EKC hypothesis is tested using a quadratic fit between ECI and several indicators of environmental quality. The hypothesis holds for waste generation, but not for forest fires, deforestation and air pollution. However, higher regional complexity appears to be associated with the creation of green occupations (Dordmond et al., 2021).
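The logic of a quadratic test of the EKC hypothesis can be illustrated with a deliberately simplified Python sketch. The data below are synthetic and noise-free, generated by us only to show the mechanics, and the function name is hypothetical. The studies cited above rely on panel or time-series estimators with additional controls, which this sketch does not attempt to reproduce.

```python
import numpy as np

def ekc_quadratic_fit(complexity, emissions):
    """Fit emissions = b0 + b1 * complexity + b2 * complexity**2.

    An inverted U requires b2 < 0 and a turning point that falls inside
    the observed range of the complexity variable.
    """
    complexity = np.asarray(complexity, dtype=float)
    emissions = np.asarray(emissions, dtype=float)
    b2, b1, b0 = np.polyfit(complexity, emissions, deg=2)   # highest degree first
    turning_point = -b1 / (2.0 * b2)
    inverted_u = bool(b2 < 0 and complexity.min() < turning_point < complexity.max())
    return b0, b1, b2, turning_point, inverted_u

# Synthetic data (not taken from any study): a concave parabola peaking at 1.0.
x_toy = np.linspace(-2.0, 2.0, 41)
y_toy = 5.0 + 1.0 * x_toy - 0.5 * x_toy**2
b0, b1, b2, peak, is_inverted_u = ekc_quadratic_fit(x_toy, y_toy)
print(peak, is_inverted_u)
```

Checking that the turning point lies within the sample range matters because a negative quadratic coefficient alone is also compatible with a relationship that is monotonic over all observed values.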
While it could be argued that the linear correlation between EC and environmental degradation in emerging countries like China and Brazil is driven by the fact that they are going through a phase of industrialization, their future ability to decouple economic growth from emissions is yet to be proved. This casts doubt on whether we can expect low- and middle-income countries to follow the same path as currently high-income countries. Even if this turns out to be the case, it must be taken into account that the polluting industries in which countries were once specialized may simply be outsourced to other countries, lending support to the pollution haven hypothesis 12 (Cole, 2004). Moreover, the right part of the EKC, where the decoupling of economic growth and CO2 emissions is observed, is populated only by a few developed countries (Csereklyei et al., 2016). Even if the same relationship were to hold for developing countries in the future, waiting for all countries to move beyond the plateau may irremediably compromise the environment, as highlighted by the IPCC (2022). These arguments put into question the normative utility of the EKC (Stern, 2004; Savona and Ciarli, 2019), and highlight the importance of assessing the green potential of countries and regions, examining their ability to introduce technologies that can effectively mitigate the impact of production on the environment, while sustaining economic diversification and structural change. In fact, evidence suggests that the causal linkage between EC and CO2 goes in both directions (You et al., 2022). This indicates that countries with a comparative advantage in high-emissions technologies may have more incentives (and accumulated capabilities) to diversify towards greener ones.
Furthermore, while most of the empirical literature reviewed in this section relies on exported products data, the trade dimension is not the only relevant aspect that should be considered when examining the relationship between complexity and environmental sustainability. As shown by Stojkoski et al. (2023), the CO2 intensity across countries is explained more fully by a combination of trade, technological and research complexity, computed respectively using data on exported products, patent applications and scientific publications. Moreover, the authors show that not all complexity algorithms work equally well in predicting CO2 intensity, showing that the Fitness algorithm  outperforms competing measures.
12 The pollution haven hypothesis states that regulations in high-income countries aimed at reducing greenhouse gas emissions may lead firms to relocate to countries with looser environmental regulations, thus leading to a relocation of CO2 emissions rather than to their effective reduction worldwide. For a case linking EC to the pollution haven hypothesis, see Dong et al. (2020).
Finally, we would like to raise some methodological issues regarding the evidence canvassed in this section. Apart from rare exceptions (i.e. Can and Gozgor, 2017), the empirical papers that examine the relationship between EC and indices of environmental performance use a measure, the Economic Complexity Index (Hidalgo and Hausmann, 2009), which should only be treated as relative within each year. As the term of reference for the measure changes every year, and with it the scale on which the index is measured, changes in ECI over time do not have any longitudinal interpretation. In order to address this, the use of ECI (or other similar EC indicators) in longitudinal regressions should rely either on the projection of product complexity in a given year of the series onto countries' Revealed Comparative Advantages in every year (Sbardella et al., 2018a; Operti et al., 2018), or on a measure that maintains an invariance of scale over time, as suggested by Mazzilli et al. (2022) and explained in section 2.2.
One of the advantages offered by complexity methods in studying the sustainable transition is that they make it possible to observe productive structures and trajectories at a finely disaggregated level, for instance by identifying single products or technologies towards which diversification should be steered in order to facilitate the transition towards greener activities. The role of products and technologies is discussed in the following subsections.

Economic Complexity for assessing green productive capabilities
Product-level data are the most commonly used data in the EC framework. However, in studies on EC and the sustainable transition they are much less used than technology (patent) data. This is due to the problem of how to define a 'green product' described above. Few attempts have been made to assess the green complexity or green potential of national industrial systems. So far, the literature has mainly focused on the following research questions: Are green products more complex? How close is a given productive structure to new green products? Do the dynamics of the product space differ significantly for green products? Thus, on the one hand, scholars have tried to characterize green products as a particular subset of products in the international market by using standard complexity tools. On the other hand, the big question, 'how prepared is a country for a green transition?', is tackled mainly via network tools such as the product space. Again, no specific tools or frameworks have been developed for green commodities: most of the literature simply applies the standard EC methodology and restricts the analysis to the 'green subset' of products.
For instance, Fankhauser et al. (2013) assess the starting point of the green race for 8 countries: China, Germany, UK, USA, France, Italy, Japan and South Korea. They decompose the green transition into three factors: green conversion, estimated using the Green Innovation Index (GII), an index constructed using patent data; change in Revealed Comparative Advantage (RCA); and green production at the outset, which is assumed to be proportional to total production for lack of sufficient data. For the period 2005-2007, they find different areas of competitiveness among countries, detected from the scatter plot of GII and RCA. There is no actual correlation between GII and RCA, which could indicate a lack of green conversion policy in most of these countries. Hamwey et al. (2013) apply the product space technique to identify opportunities for countries in green production. They arbitrarily choose 11 environmental groups of products, in the SITC 4 classification at 4 digits, taken from the list compiled by the WTO Committee on Trade and Environment (CTE) in 2011. They focus their analysis on Brazil, finding a few selected products as potential diversification opportunities. They recognize the weakness of the definition, classification and selection of green products, as well as the limitation of neglecting the dynamics of the RCA time series used to construct the product space. Fraccascia et al. (2018) follow Hamwey et al. (2013) by applying the product space description and measuring the proximity of green products, as defined by "the environmental goods and service sector" classification of EUROSTAT (2009), to other products in which a country has an RCA. Using regression analysis, they find this proximity to be significant for the development of green product exports over a 4-year horizon.
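As an illustration of the product space logic used in these studies, the sketch below computes proximities between products and the density of a country's export basket around a green product. It assumes the standard definition of proximity as the minimum of the two conditional probabilities of co-specialisation, which equals the number of countries specialised in both products divided by the larger of the two ubiquities. The binary RCA matrix `M`, the product names and the function names are invented for illustration and are not taken from the reviewed papers.

```python
def proximity(M):
    """phi[p][q] = (# countries specialised in both p and q) / max(ubiquity_p, ubiquity_q)."""
    countries = list(M)
    products = list(M[countries[0]])
    ubiquity = {p: sum(M[c][p] for c in countries) for p in products}
    phi = {p: {} for p in products}
    for p in products:
        for q in products:
            if p == q:
                continue
            both = sum(M[c][p] * M[c][q] for c in countries)
            largest = max(ubiquity[p], ubiquity[q])
            phi[p][q] = both / largest if largest else 0.0
    return phi


def density(M, phi, country, target):
    """Share of the proximity mass around `target` covered by the country's current basket."""
    total = sum(phi[target].values())
    if total == 0:
        return 0.0
    covered = sum(weight for p, weight in phi[target].items() if M[country][p])
    return covered / total


# Invented binary RCA matrix: 1 means the country has RCA >= 1 in the product.
M = {
    "X": {"steel": 1, "turbine_blades": 1, "wind_turbines": 1, "textiles": 0},
    "Y": {"steel": 1, "turbine_blades": 1, "wind_turbines": 0, "textiles": 0},
    "Z": {"steel": 0, "turbine_blades": 0, "wind_turbines": 0, "textiles": 1},
}
phi = proximity(M)
density_Y = density(M, phi, "Y", "wind_turbines")
density_Z = density(M, phi, "Z", "wind_turbines")
```

In this toy example, country Y already exports every product that is proximate to wind turbines and therefore has a higher density around that green product than country Z, whose basket is unrelated to it.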
Mealy and Teytelboym (2022) define two lists of green products: a broader list of 293 green products, obtained by merging the WTO Core list, OECD lists, and the APEC list, and a shorter list of 57 renewable energy products. Using all the products available in COMTRADE, they first compute the complexity of all products following the ECI approach and construct a product space. Then, they focus on green products to derive the Green Complexity Index (GCI), the sum of green products' complexity, and the Green Complexity Potential (GCP), estimated from the proximity of green products in the product space. They find the complexity of green products to be significantly higher than that of the entire product basket. Notably, they find a negative correlation between GCI and CO2 emissions per capita, as well as a positive correlation between GCP and increases in exports of green products.
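A minimal sketch of how indices of this kind can be computed for a single country is given below. The exact formulas are assumptions made for illustration: GCI is taken as the sum of normalized product complexities over the green products the country exports with RCA, and GCP as the average, over all green products, of density times normalized complexity for the green products the country does not yet export with RCA. All numerical values, product names and function names are hypothetical.

```python
def green_complexity_index(has_rca, complexity, green_products):
    """Sum of normalized complexity over green products exported with RCA (assumed form)."""
    return sum(complexity[g] for g in green_products if has_rca[g])


def green_complexity_potential(has_rca, density, complexity, green_products):
    """Average density-weighted complexity of green products not yet exported with RCA."""
    total = sum(
        (1 - has_rca[g]) * density[g] * complexity[g] for g in green_products
    )
    return total / len(green_products)


green_products = ["solar_cells", "wind_turbines", "heat_pumps"]
# Hypothetical inputs for one country: complexities normalized to [0, 1],
# a binary RCA indicator, and product space densities around each green product.
complexity = {"solar_cells": 0.9, "wind_turbines": 0.6, "heat_pumps": 0.5}
has_rca = {"solar_cells": 1, "wind_turbines": 0, "heat_pumps": 0}
density = {"solar_cells": 0.8, "wind_turbines": 0.5, "heat_pumps": 0.2}

gci = green_complexity_index(has_rca, complexity, green_products)
gcp = green_complexity_potential(has_rca, density, complexity, green_products)
```

Under these assumptions, GCI describes current green export capabilities, while GCP rewards proximity to complex green products that are not yet in the country's basket.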
Pérez-Hernández et al. (2021) follow Mealy and Teytelboym (2022) and apply GCP and GCI at a sub-national level, studying the green potential of Mexican regions. They use an adapted version of the CLEG classification to characterize the green product space of regions. While they confirm the overall positive correlation between GCI and GCP, some interesting outliers arise from the analysis. For example, the state of Jalisco shows a high GCP and a comparative advantage in a few high-complexity products, but low diversification in green products. This may be explained by geographical or fine-grained factors, not captured by these indices, that are less important at larger geographical scales.

Economic Complexity for assessing green technological capabilities
There is broad consensus among academics and policy makers that accelerating the development of new, far-reaching green technologies and promoting their global application are important steps, albeit not the only ones (Parkinson, 2010; Sarewitz and Nelson, 2008), towards containing and preventing GHG emissions and implementing the sustainability transition (OECD, 2011; Popp et al., 2010; Stern and Stern, 2007). Following this momentum, in recent years the EC literature exploring the intricate relationship between green technological development, regional specialization dynamics, and policies has grown substantially. Drawing on evolutionary economic geography, the studies in this branch of literature shed light on the dynamics and drivers of green technology advancement, looking especially at the preparedness of European NUTS2 regions or national economic structures to enter the green technology race, and highlighting the potential complementarities between green and non-green knowledge bases and/or productive structures.
All the papers reviewed in this section use climate change mitigation and adaptation patent data as a proxy for low-carbon innovation, drawing mainly on the Env-Tech classification and the Y02/Y04S tagging scheme presented in Section 2.1. This is increasingly becoming the gold standard for measuring green innovative activities. While the limits of using patent data for the study of technology development should be acknowledged (see e.g. Arts et al., 2013; Griliches, 1998; Lanjouw et al., 1998), information on patents is widely available and provides an array of quantitative information on the nature of the invention and on its applicant or inventor, including their geographical location, which makes it easy to geo-localize patents at both country and local levels (Dechezleprêtre et al., 2011). Moreover, patent data can be disaggregated into increasingly fine-grained technological areas, allowing very specific green technologies to be identified (Haščič and Migotto, 2015). This granularity is particularly helpful in the use of EC techniques to study technological specialization (Boschma et al., 2015; Napolitano et al., 2018; Pugliese et al., 2019a). The application of EC approaches to reveal technological advantages in each technology field rests on the idea that the criteria for assigning patent applications to specific domains are based on the identifying characteristics of the expertise that is necessary to introduce successful inventions. As a matter of fact, complex technologies appear almost exclusively in the portfolios of high-complexity countries, while less diversified countries operate in less complex sectors.
As we show below, some of these contributions focus on the interplay between existing green capabilities and the maturity of green technologies in shaping the way in which the technological portfolios of regions or countries grow and evolve over time. Other contributions explore instead the relation between income inequality and innovative capacities (both green and non-green) at the country level, revealing the influence of socioeconomic and policy factors (e.g. smart specialization strategies, political support, digital literacy) on the success in developing green technologies.
Overall, the articles in this field provide a cohesive narrative that emphasizes the important role played by local capabilities, and their complex interplay with different socio-economic characteristics, in fostering new sustainable technological innovations that are closely related to technological or industrial fields already present in a country or region. These studies show how relatedness and EC can offer valuable guidance for policymakers seeking to promote sustainable models of economic development, as well as for researchers looking to advance knowledge on the importance of green technologies.

[Table fragment, one row of a summary table of the reviewed studies. Method: multi-layer based relatedness between pairs of green and non-green technologies à la Pugliese et al. (2019a), i.e. the relatedness of non-green technology z in which region i is specialised at time t-5 to green technologies in which i is already specialised at time t. Geographical scope: EU NUTS2 regions. Findings: non-green and green innovative capacity is heterogeneous across regions and stable over time, with a persistent dichotomy between Central and Eastern EU; there is complementarity between non-green and green technological capabilities, but green capabilities depend on the composition of the regional patent portfolio.]

Investigating the relationship between country characteristics and knowledge structures in the progression of green Env-Tech patenting, as well as national specialization or diversification patterns from the end of the 1970s, Perruchas et al. (2020) focus on the life cycle of green technologies and propose a "ladder of green technology development". Their evidence emphasizes that not only do countries diversify towards green technologies related to their existing competencies, but also that specialization in green innovation follows a cumulative path towards more mature technologies. On the one hand, technology maturity appears to be more relevant than a country's economic development; on the other hand, technology complexity, computed through the ECI algorithm applied to green patent data, does not prevent further specialization.
Furthermore, focusing on a panel of US states from the early 1980s, the same authors examine the role of related and unrelated variety in green technology development, and their possibly heterogeneous effect over the technology life cycle. Drawing on the call for caution by Boschma and Frenken (2006) against determinism in analysing the role of spatial contingencies for industrial development, especially at early stages, the authors observe that unrelated variety plays a positive role in fostering green innovation at early stages of the technology life cycle, while related variety becomes a more important driver as technology matures. Sbardella et al. (2018a) introduce Green Technological Fitness (GTF), a measure of green innovative potential based on green patenting, here identified within PATSTAT through the Env-Tech catalogue. Taking a geographical approach, the authors identify heterogeneous global patterns in green technological competitiveness, with the United States, France and Germany as stable leaders, Eastern and Southern European countries gradually gaining importance, and East Asian countries starting from the periphery and rapidly establishing themselves as key actors. Finally, by analysing the distribution of countries' innovation capacity across areas of specialization, they document that innovation in green technology has become more horizontal, with greater efforts being observed in cross-domain, or enabling, technologies.
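For readers unfamiliar with fitness-type measures, the sketch below implements the Fitness-Complexity iteration of Tacchella et al. (2012) on a toy binary matrix of countries and green technologies. It is a minimal illustration, not the GTF pipeline of the cited paper: the matrix, the number of iterations and the function name `fitness_complexity` are assumptions, and every country and technology is assumed to have at least one non-zero entry.

```python
def fitness_complexity(M, n_iter=20):
    """Iterate F_c = sum_t M_ct * Q_t and Q_t = 1 / sum_c (M_ct / F_c),
    rescaling both vectors by their mean after every step."""
    countries = list(M)
    technologies = list(M[countries[0]])
    F = {c: 1.0 for c in countries}
    Q = {t: 1.0 for t in technologies}
    for _ in range(n_iter):
        # Both updates use the values from the previous iteration.
        F_new = {c: sum(M[c][t] * Q[t] for t in technologies) for c in countries}
        Q_new = {
            t: 1.0 / sum(M[c][t] / F[c] for c in countries) for t in technologies
        }
        mean_F = sum(F_new.values()) / len(countries)
        mean_Q = sum(Q_new.values()) / len(technologies)
        F = {c: value / mean_F for c, value in F_new.items()}
        Q = {t: value / mean_Q for t, value in Q_new.items()}
    return F, Q


# Invented nested matrix: 1 means a revealed advantage of the country in the technology.
M = {
    "A": {"t1": 1, "t2": 1, "t3": 1},
    "B": {"t1": 1, "t2": 1, "t3": 0},
    "C": {"t1": 1, "t2": 0, "t3": 0},
}
fitness, complexity = fitness_complexity(M)
```

In this nested example the most diversified country obtains the highest fitness, and the technology mastered only by that country obtains the highest complexity, which is the non-linear logic that distinguishes this algorithm from simple averages.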
Different dimensions of complementarity between green technologies and pre-existing non-green innovative capacity in European NUTS2 regions are investigated by three works that use different approaches to assess relatedness. They focus on the connection between green technologies and future technological advantage in non-green (Bergamini and Zachmann, 2020) or key enabling technologies (Montresor and Quatraro, 2020), or look at the potential of each non-green technology to give rise to new green innovative activities by analysing the global green-non-green knowledge space and economic fitness (Sbardella et al., 2022).
In more detail, Bergamini and Zachmann (2020) use regional patent data for Europe sourced from the REGPAT database (Maraut et al., 2008) to predict the potential of European regions to acquire (or maintain) a competitive advantage in developing low-carbon technologies. To this end, the authors first leverage a network-based measure of relatedness between non-green technologies to estimate the potential technological advantage (RTA) of each region in green technologies. In a second stage, they identify a set of socio-economic variables that hold a statistically significant association with (observed or estimated) RTA and derive possible policy implications. Montresor and Quatraro (2020) explore the relationship between Env-Tech green patenting and smart specialization strategies in European regions, with a special focus on the role of key enabling technologies, six crucial technological domains identified by the European Commission: industrial biotechnology, nanotechnology, micro- and nanoelectronics, photonics, advanced materials, and advanced manufacturing technologies. With the aim of understanding whether the development or acquisition of new sustainable technologies is related to regional technological innovative capacity, the study shows that there is a strong connection between green technologies and both green and non-green pre-existing knowledge bases. Moreover, key enabling technologies support the shift towards green technologies, while also mitigating the impact of relatedness to pre-existing technologies. Barbieri et al. (2022) and Sbardella et al. (2022) investigate the nexus between non-green (A-H PATSTAT patents) and green (Y02-Y04S) innovative capacity and the green development potential of European regions. First, they compute measures of exogenous fitness, namely regional Non-Green Technology Fitness (NGTF) and Green Technology Fitness (GTF), i.e. they compute technological complexities at the national level and then input them into the fitness computation at the regional level, as explained in Section 2.2. Second, using a statistically validated network approach, they define a green potential metric quantifying the relatedness between non-green and green regional knowledge bases. They document a heterogeneous but stable distribution of non-green and green innovative competitiveness and observe complementarity between non-green and green knowledge capabilities, together with a positive and significant relationship between green potential and both NGTF and GTF, especially for regions that have not fully developed the entire set of Y02-Y04S technologies.
As we have seen, the contributions in this field agree in suggesting that relatedness is a driving force behind diversification in green technologies. However, this new field of analysis has not yet fully emphasized the role of the socio-economic fabric and institutional set-up in supporting the sustainable transition. The sustainable transition literature has paid greater attention to the policy and socio-economic dimensions. However, by putting forward an important but unsystematic collection of case studies, it has often failed to provide generalizable or scalable evidence on the role of environmental policies or local characteristics in shifting towards more sustainable economic activities and technologies (Hansen and Coenen, 2015; Santoalha et al.). The three following articles, instead, go in this direction and analyse how green technological competitiveness, diversification in green technologies and/or relatedness to non-green knowledge interact with a number of socio-economic characteristics of countries or regions, namely income inequality, the support for policies and politics for sustainability transitions, and digital literacy.
Looking at the regional dimension, and building on the fact that regional green technological development relies on pre-existing knowledge bases, Santoalha and Boschma (2021) examine the relationship between political support for environmental policies and the diversification of green technologies in European regions. They find that green diversification is more strongly related to related capabilities than to regional policies, and highlight the central role of regional differences in policy design and implementation; national political support, however, appears to mitigate the importance of capabilities. A further study operationalizes the notion of capabilities by studying whether and to what extent digital literacy, as a proxy for the competencies embedded in ICT infrastructures, fosters diversification in green and non-green technologies across European regions. The level of digital skills in the workforce has a positive impact on a region's ability to specialize in new technologies, especially in green domains, with e-skills moderating the effect of relatedness. Shifting to a country-level analysis, Napolitano et al. (2022) investigate whether and to what extent income inequality is a barrier to a country's environmental innovative capacity, proxied by a measure of green technological fitness based on Env-Tech technologies. To this end, unlike Sbardella et al. (2018a), they define a measure of sectoral green technological fitness, to provide a more realistic assessment of green technology complexity. First, they account for the full technological spectrum in computing the EFC algorithm. Then, they select only the complexities of green technologies in computing GTF. A negative and significant relationship between income inequality and GTF is observed; by contrast, no significant association is found when all technologies are considered.
However, while for high-income countries inequality does not appear to be a barrier, there is an income threshold below which a country is unlikely to develop a sufficient number of complex technologies to obtain high green fitness. Low inequality reduces this threshold, allowing middle-income countries to achieve greater green innovative capacity.
Finally, focusing only on the technological dimension and not looking at country or regional dynamics, de Cunzo et al. (2022) is the first work to explore the connection between green technological innovation capacity and productive capabilities. To do so, the authors define a statistically validated network connecting comparative advantages in Y02 green technologies (from REGPAT) to the contemporaneous or subsequent comparative advantages in exported products (from UN-COMTRADE). When looking at same-year co-occurrences of single products and technologies, their findings emphasize a large number of significant links between green technologies and raw materials, especially critical minerals such as cobalt. When selecting time-lagged green technology-product links, they suggest that, after a ten-year lag, green technology is better integrated into manufacturing, and that more complex spillover effects emerge, with additional links between products and technologies of higher complexity. The authors argue that their findings may provide support for short- and medium-term industrial policies by making it possible to target the green technology domains that are more likely to leave larger footprints in industrial production, or to mitigate the impact of polluting industries based on each country's green technological capabilities.
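The idea of a statistically validated technology-product link can be sketched as follows. The example counts the countries that hold an advantage in a green technology in one year and in a product in a later year, and compares this count with a hypergeometric null in which advantages are assigned to countries at random. This null model is a simple stand-in chosen for illustration and is not necessarily the one used by the authors; the data, the names and the function `link_pvalue` are invented.

```python
from math import comb


def link_pvalue(tech_adv, prod_adv, tech, product):
    """Return (observed co-occurrences, hypergeometric p-value of observing at least that many)."""
    countries = list(tech_adv)
    n_countries = len(countries)
    n_tech = sum(1 for c in countries if tech in tech_adv[c])
    n_prod = sum(1 for c in countries if product in prod_adv[c])
    observed = sum(
        1 for c in countries if tech in tech_adv[c] and product in prod_adv[c]
    )
    # Upper tail of the hypergeometric distribution: P(X >= observed).
    tail = sum(
        comb(n_tech, x) * comb(n_countries - n_tech, n_prod - x)
        for x in range(observed, min(n_tech, n_prod) + 1)
    )
    return observed, tail / comb(n_countries, n_prod)


# Invented advantages: technologies in year t, exported products in year t + lag.
tech_adv = {
    "c1": {"batteries"}, "c2": {"batteries"}, "c3": {"batteries"}, "c4": {"batteries"},
    "c5": set(), "c6": set(), "c7": set(), "c8": set(),
}
prod_adv = {
    "c1": {"cobalt", "textiles"}, "c2": {"cobalt"}, "c3": {"cobalt"}, "c4": {"cobalt"},
    "c5": {"textiles"}, "c6": {"textiles"}, "c7": {"textiles"}, "c8": set(),
}

k_cobalt, p_cobalt = link_pvalue(tech_adv, prod_adv, "batteries", "cobalt")
k_textiles, p_textiles = link_pvalue(tech_adv, prod_adv, "batteries", "textiles")
```

In this toy case the batteries-cobalt link would be retained at a 5% significance level while the batteries-textiles link would not; with many technology-product pairs, a correction for multiple testing would also be needed.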

Conclusions
In this article, we have provided an encompassing account of the empirical literature implementing Economic Complexity methods and metrics to understand the sustainable transition. First, the paper has summarized the most used product- and patent-level data sources to compute EC metrics, with a particular focus on green economic activities. Second, it has harmonized the most relevant methods adopted in the literature on Economic Complexity and environmental sustainability, in an attempt to unify them into a single framework. Third, it has reviewed the three main blocks of empirical literature applying EC methods and metrics, looking at i) the relationship between countries' and regions' Economic Complexity metrics and their environmental degradation, and the role of ii) green products and iii) green technologies in fostering the sustainable transition.
The growing literature on EC and the environment suggests that EC approaches can be a useful lens to better understand how productive structures and technological capabilities can be steered into the sustainable transition. The empirical literature reviewed here offers some evidence that can be summarized as follows. With respect to the first block of literature, which examines how trade-based economic complexity measures link to environmental outcomes such as GHG emissions, ecological footprint and environmental degradation, we conclude that the evidence produced is mixed. On the one hand, high-income countries, which are more specialized in knowledge-intensive production, seem to exhibit a positive association between the complexity of their export basket and their ability to preserve the environment. On the other hand, the same relationship cannot be verified in less mature, emerging economies, where increasing complexity is associated with higher emissions. An empirical test of the Environmental Kuznets Curve has shown that countries become able to decouple economic growth from carbon emissions only after passing the industrialization phase. Nonetheless, the extant literature on green economic complexity has some important limitations. First, the literature on country and regional level complexity and sustainability relies largely on aggregate measures of Economic Complexity. This approach can be limiting if confined to examining the linear relationship between complexity and environmental variables without taking advantage of the high level of disaggregation made possible by EC methods, as shown by the literature on green products and technologies.
Secondly, as we have argued in Section 3.1, the quest for empirical validation of the Environmental Kuznets Curve hypothesis can be of little normative utility, considering that it is not realistic to expect all countries to be able to emulate the diversification and specialisation patterns of today's high-income countries, some of which are specialized in knowledge-intensive activities. Moreover, as shown by the IPCC (IPCC, 2022), the current efforts to curb carbon emissions, even in high-income countries, may be insufficient to prevent a climate catastrophe.
Looking at the second block of literature, one of the main goals of these studies is to guide production systems toward a green, feasible and just transition. While the nature of the trade-off between these elements is left to more theoretical studies, EC has the potential to identify, characterize and measure potential paths of such a transformation. From the first preliminary attempts, it is clear that a greener production of commodities requires higher capabilities. Green products show higher complexity and are intertwined with non-green production. As stressed in the Data section of this paper, the studies relying on green classifications for products are subject to the problem of 'how to define a green product': different classifications based on different definitions may undermine the replicability of results and measures. A green sub-classification in the Harmonized System would be desirable for the upcoming updates, in order to have a single, broad and exhaustive database for green products. On the methodological front, most contributions in the EC literature studying green products apply the well-known product space description to identify the green potential of productive structures. While the product space is indeed useful to visualize the distances between the current status of production and green products, it has been shown that this approach has low accuracy in quantitative forecasting of future diversification (Tacchella et al., 2021).
The third block of studies on green technological development yields several key conclusions. Firstly, there is evidence of countries engaging in a dual strategy of diversification and specialization, entering green technologies aligned with existing competencies while specializing in mature technologies with accumulated experience. Geographically, there is a mix of stable leaders and emerging players in green technological competitiveness, with Eastern and Southern European countries gaining prominence. Furthermore, economic factors, such as income inequality, affect a country's environmental innovative capacity. For instance, lower income inequality in middle-income countries seems to lower barriers to the successful development of complex green technologies. Finally, the complementarity between non-green and green innovation capabilities, the influence of regional factors and policies, and the significance of digital literacy and e-skills in promoting green technology adoption are highlighted. These findings provide valuable insights into the complex dynamics of green technological development, emphasizing the interplay between capabilities, regional factors, policies, and socio-economic aspects.
Whilst the studies on technology-based EC can shed light on various aspects of green technological development, it is important to acknowledge their limitations. Firstly, the analyses often rely on patent data as a proxy for technological innovation, which may not capture the full spectrum of green technologies, or account for innovations that are not patented. This could lead to a potential under-representation of certain sectors or technologies. Secondly, the studies primarily focus on regional or national levels of analysis, which may overlook the dynamics at the firm or individual levels. The role of specific organizations or entrepreneurs in driving green innovation is not extensively explored, potentially limiting the understanding of micro-level factors influencing technological development. Lastly, the studies primarily analyze the relationships between different variables and identify correlations rather than establishing causal relationships. While the associations observed are informative, further research is needed to delve deeper into the underlying mechanisms and causality. These limitations highlight the need for further research that encompasses a broader range of data sources, considers multiple levels of analysis, and employs rigorous methodologies to better understand the complexities of green technological development and its implications for environmental sustainability.
The discussion of the empirical literature presented in Section 3 uncovers a number of areas that require further exploration. First, the productive and technological structures of countries are always accounted for separately in relation to their linkage with environmental technologies. However, there have been previous efforts linking the productive, technological and scientific capabilities of countries with each other (Pugliese et al., 2019a). Such efforts should be extended to the analysis of the sustainability transition, in order to understand which specific capabilities are the most conducive to green specialization across fields.
Second, and related to the previous point, scientific capabilities have not yet been included in the discourse on the sustainable transition from a complexity perspective. Scientific production remains a key dimension of the process of capability accumulation and innovation, and its relationship with the greening of productive structures should be examined in depth.
Third, the sustainable transition is a complex and interrelated phenomenon, deeply intertwined with structural change and the consequent reallocation of labor across sectors and occupations. Ensuring a fair, green transition will require considering all these elements together. Future research will have to assess the job creation and destruction potential of the green transition, and to identify the transversal skills required to ensure a seamless transition of workers across jobs. By the same token, not all territories and geographies are equally well equipped with the necessary knowledge base to diversify away from carbon-intensive production and technologies. It will be paramount to identify the most viable ways for regions and countries to enter green economic activities, bearing in mind the potentially negative (or positive) impact on local labor markets.
Finally, green technologies and products may create further pressure on the environment due to their dependency on natural commodities and minerals (Marín and Goya, 2021), such as rare earth elements, lithium, cobalt, and others. Specifically, the World Bank (Hund et al., 2023) estimates that meeting the 2°C scenario by 2050 for energy storage alone will require a 450% increase in the production of graphite, lithium and cobalt. Therefore, while implementing the green transition may contribute to reducing global dependence on fossil fuels, keeping up with current levels of energy demand will shift the pressure towards the production and trade of raw materials, neither of which is exempt from complications. The case of electric cars is noteworthy: according to the International Energy Agency (International Energy Agency, 2021), a standard electric car needs six times the mineral input of a conventional vehicle and, under the Sustainable Development Goals scenario, demand for lithium, nickel and graphite, all key inputs for electric vehicles, will grow up to almost 30 times relative to 2020 levels. For this reason, future research will need to examine the production processes associated with each green product, the CO2 production incorporated in each value chain, its raw material content, the safety (also concerning toxicity and pollutant exposure) and workplace conditions of the labor force employed in its production, and the environmental impact of production and use (e.g. life cycle emissions, energy content and waste management).