A meta-analysis of the relationship between companies’ greenhouse gas emissions and financial performance

We study how the business and economics literature investigates the relationship between companies' greenhouse gas (GHG) emissions and their financial performance. To this end, we undertake a meta-analysis that helps us gauge the role of the highly different constructs and measurement techniques employed in this literature. Our study includes 74 effect sizes from 34 studies, covering 107 605 observations for the period 1997–2019. We establish a significant association between corporate GHG emissions and financial performance: companies with lower emissions have better financial performance. We find that the type of emission or financial performance indicator does not significantly affect this relationship. The industry to which the firms in the sample studies belong does seem to matter slightly. We further establish that the relationship between GHG emissions and financial performance is especially pronounced for firms operating in countries with the most stringent carbon policies.


Introduction
It is well established that there is a relationship between firms' environmental performance (hereafter: CEP) and their financial performance (hereafter: CFP). Some studies show that CEP and CFP are negatively associated (e.g. Hassel et al 2005, Qi et al 2014, Misani and Pogutz 2015, Brouwers et al 2018), whereas others find a positive relationship (e.g. Hart and Ahuja 1996, Russo and Fouts 1997, King and Lenox 2001, Wang et al 2014, Makridou et al 2019). There are also studies that arrive at a neutral effect (e.g. Waddock and Graves 1997, Konar and Cohen 2001). The diverging findings seem to result from the wide range of methods employed and the variety of indicators used to measure both CEP and CFP (Guenther et al 2011, Albertini 2013, Dam and Scholtens 2015), as well as from moderating factors such as industry and country characteristics (Albertini 2013, Dixon-Fowler et al 2013, Endrikat et al 2014). We concentrate on how firms' greenhouse gas (GHG) emissions relate to their financial performance and investigate factors that might influence the GHG-CFP relationship. We focus on GHG emissions because their reduction is crucial to achieving the objectives of the 2015 Paris Agreement on mitigating climate change (Fujii et al 2013, Trinks et al 2018). We employ a meta-analysis of studies on the relationship between corporate GHG performance and CFP to summarize, evaluate, and analyze the empirical findings in this research field (Kirca and Yaprak 2010). Since the majority of studies included in our review do not establish the direction of causality, we cannot address this issue, as we are confined to the nature and scope of the studies included (Hunter et al 1982).
In this study, we first investigate the overall relationship between firms' GHG emissions and their financial performance. Second, we study whether the type of reporting (voluntary or mandatory) plays a role. Third, we compare the impact of absolute GHG measures with that of relative ones. Fourth, we compare accounting-based measures of financial performance with financial-market-based ones. Fifth, we investigate whether industry affiliation, in particular the generic GHG emission intensity per industry, matters for the association between firms' GHG and financial performance. Lastly, we study the effect of climate policy stringency.

Background and hypotheses
Numerous studies relate social and environmental performance to financial performance: Friede et al (2015) report that there are more than 2000 such studies. Given this volume, meta-analysis is useful, as it provides an integrated perspective on results obtained with various data sources, control variables, and estimation techniques (see table 1 for an overview).
Meta-analyses by Orlitzky et al (2003) and Allouche and Laroche (2005) document a significant and positive relationship between companies' social and environmental performance and their financial performance. However, they also observe that the research design employed significantly influences this relationship. Social and environmental performance (i.e. corporate social performance; hereafter CSP) is of a very broad and diffuse nature, which makes sound comparison and analysis difficult. Dixon-Fowler et al (2013) narrow the focus and perform a meta-analysis of the relationship between environmental and financial performance. They too find a significant and positive association, but the association is significantly weaker when CEP is measured by emissions than by other environmental performance measures. They report that contingencies (e.g. differences in firm size) and methodological issues (e.g. mandatory versus voluntary reporting) moderate the CEP-CFP relationship. Vishwanathan et al (2020) concentrate on the transmission mechanisms between CSP and CFP. They establish that CSP influences financial performance via firm reputation, stakeholder reciprocation, firm risk, and innovation capacity. Albertini (2013) finds that the CEP-CFP relationship is influenced by the constructs used for both environmental and financial performance, regional differences, industry, and the period studied. Endrikat et al (2014) investigate both the direction of causality and the multidimensionality of constructs. They find a positive relationship between CEP and CFP that appears to be partially bidirectional. Lewandowski (2015) and Busch and Lewandowski (2018) relate firms' total carbon dioxide emissions to their financial performance and find an inverse relationship between the two.
Climate change mitigation and adaptation receive increasing attention as governments, consumers, and financial market participants are increasingly concerned about global warming (Wang et al 2014, Trinks et al 2018). Carbon regulation has emerged in several countries and regions, and emissions have become a cost factor for business (Clarkson et al 2015, Trinks et al 2020). Our meta-analysis aims to contribute to this literature from a range of perspectives: it studies the post-Kyoto era, focuses on the corporate level, uses GHG emissions as a CEP measure, relies on a systematic selection and analysis of the sample studies, accounts for industry affiliation, and investigates whether climate policy stringency is a moderating factor. We regard the Kyoto Protocol as a breakpoint in climate policy, as it provides for internationally legally binding emission targets for industrialized countries that trickled down into targets for business (Böhringer 2003). Therefore, we concentrate on studies using sample periods from 1997 onwards. Next, we take the corporate perspective, as it is primarily businesses that emit GHGs in the production and distribution process (see World Bank 2019). Of the existing meta-studies, only Busch and Lewandowski (2018) explicitly focus on the relationship between firms' GHG emissions and their financial performance. However, they do not seem to use a particular algorithm to select studies, and their sample cannot be replicated. We deem replicability highly important and therefore select studies based on clear and transparent selection criteria. GHG emissions refer to the amount of carbon dioxide, methane, nitrous oxide, hydrofluorocarbons, perfluorocarbons, sulphur hexafluoride, and nitrogen trifluoride emissions (IPCC 2014). We define corporate GHG performance as the inverse of firms' GHG emissions. As such, low (high) amounts of GHG emissions correspond to high (low) GHG performance (see Misani and Pogutz 2015).
Studies using GHG emissions as a proxy for CEP collect their data from either mandatory or voluntary reporting schemes. We investigate whether this influences the results. Further, GHG emissions can be expressed in absolute or relative terms. From an economic perspective, it is relative terms that matter. From an environmental point of view, however, it is the absolute amount of emissions released into the atmosphere that is relevant. Therefore, we examine if and how scaling influences the results. Further, CFP can be measured using accounting-based or market-based indicators. Some studies argue that CEP is more strongly related to (contemporaneous and forward-looking) market-based measures of CFP than to (backward-looking) accounting-based indicators of CFP (Dixon-Fowler et al 2013). Previous studies have investigated both types of measures and provide conflicting results. Consequently, we investigate whether the way financial performance is measured matters for its relationship with GHG performance. Next, we study whether industry specifics play a role. Delmas et al (2011) argue that environmental regulations for the most polluting industries are stricter and that polluting firms employ different strategies for reducing their emissions. Our study explicitly accounts for industry affiliation: we investigate whether the relationship between corporate GHG performance and CFP differs for studies of firms in pollution-intensive industries compared with studies that do not differentiate in this regard. Lastly, Endrikat et al (2014) suggest that country-specific factors, such as differences in environmental regulatory systems, might play a role. Clarkson et al (2015) posit that carbon emissions affect firm valuation only to the extent that a firm's emissions exceed its carbon allowances under a cap-and-trade system and that it is unable to pass on carbon-related compliance costs to consumers and end users. Czerny and Letmathe (2017) find that GHG emissions were not reduced cost-effectively.
They argue that companies' intrinsic values prevail over the economic incentives for carbon reduction provided by the emissions trading system (ETS). Both Clarkson et al (2015) and Czerny and Letmathe (2017) relate to the European Union's ETS only. To investigate the role of climate policy, we examine the impact of ETS policy stringency. Kuo et al (2010) find a positive relationship between GHG and financial performance and attribute this to eco-efficiency. Eco-efficiency implies that productivity gains through reduced materials use, improvements in manufacturing processes, and the utilization of waste can improve the operational efficiency of firms (Kuo et al 2010). Improved efficiency via emission reduction and the utilization of by-products and waste can lead to both lower costs and more innovation, improving firms' comparative advantage (Orsato 2006, Kuo et al 2010). Institutional investors may also require companies to take responsibility and become more eco-efficient (Trinks et al 2018, 2020). Consumers may avoid buying products from companies with poor GHG performance; firms can then improve their financial performance by reaping the reputational benefits associated with cleaner production (Hart and Ahuja 1996). When GHG emission reduction requires significant up-front investment, the costs may outweigh the benefits and thereby weaken firms' financial performance (Brouwers et al 2018). Fujii et al (2013) argue that emission reduction may harm a company's competitive position, as resources are allocated to non-core business operations. Enkvist et al (2007) indicate that the costs of emission reduction can differ widely between specific types of technology and over time. Therefore, we first investigate how GHG performance is associated with CFP. In this regard, we test the following two competing hypotheses:

H1B:
The association between GHG and financial performance is negative.

GHG emissions are reported via voluntary or mandatory reporting schemes. Voluntary reporting schemes, like the Carbon Disclosure Project, collect their data mostly through questionnaires and surveys. Voluntary reporting might result in self-selection bias, allows for different methodologies, and usually lacks external verification (Perrault and Clark 2010, Chen and Gao 2012). In contrast, data collected via mandatory reporting are based on formal rules, which allows for comparison across industries and countries and over time (Perrault and Clark 2010). However, even data from mandatory reporting schemes can be biased, for example when firms can select the plants eligible for reporting, the emission factors, and the specific way to measure emissions (Sullivan and Gouldson 2012). Several studies find that greater consideration of the environmental impact of corporate activities and control of GHG emissions may help reduce costs (such as for waste management, energy, and water consumption) and yield benefits (improved reputation, higher revenues, greater competitiveness) (see Jiang and Bansal 2003). This may encourage firms to voluntarily disclose and reduce their GHG emissions (see Arimura et al 2008). However, Bansal and Roth (2000) and Lyon and Maxwell (2011) point out that greenwashing might also occur in this regard. Therefore, several jurisdictions (such as Norway, Singapore, and the UK) opt for mandatory disclosure in the hope that it will incentivize innovation and environmental performance (see Tang and Demeritt 2018). Raising awareness in this way may of course also affect corporate activities and environmental performance, but it leaves less scope for greenwashing. Nevertheless, companies in energy-intensive industries in particular will already have had emissions on their radar, whereas this might not have been the case elsewhere.
In those other industries, mandatory reporting may then have revealed new areas in which to manage costs and benefits. However, as energy will have played only a minor role in the industries that were not already focused on emissions, one may not expect a substantial impact on the relationship between emissions and financial performance. Therefore, it is unlikely that this relationship will be stronger under mandatory than under voluntary reporting (Tang and Demeritt 2018). Overall, we do not think it is possible to postulate in which of the two regimes the relationship between GHG emissions and financial performance is stronger.
Thus, it is not evident which type of reporting relates more closely to CFP. Hence, we test the following hypothesis:

H2: The type of reporting scheme influences the results in GHG and financial performance studies
Studies measure GHG emissions with either absolute or relative indicators (Slawinski et al 2017). Absolute emissions reflect the physical emissions of a firm in a given period of time. Relative emissions relate these emissions to key firm characteristics (e.g. number of employees, sales, revenues, costs) and are commonly labelled carbon intensity or carbon efficiency (Kuik and Mulder 2004, see Trinks et al 2020 for a critical reflection). We stress that the sample studies are not always clear about what exactly is used as the denominator, implying that the literature is subject to the homogeneity problem. Clarkson et al (2015) argue that absolute emissions have to be used to determine the costs for businesses, as the acquisition of emission rights is based on firms' overall emissions. Moreover, the absolute emissions of businesses directly inform about their contribution to climate change (Ekwurzel et al 2017). GHG performance measured by absolute indicators should therefore be more strongly related to CFP. In contrast, Olsthoorn et al (2001) argue that firms' emissions have to be judged relative to those of their peers to allow for comparison. This is because financial market participants take into account the extent to which a business model relates to GHG emissions, and they compare different prospects (Trinks et al 2018). As such, relative GHG performance would be more strongly related to CFP than absolute GHG performance. Therefore, we study whether the nature of the GHG performance measure influences the relationship with CFP and test the following two competing hypotheses:

H3A: GHG performance influences financial performance more strongly when it is measured by relative than by absolute emissions

H3B: GHG performance influences financial performance more strongly when it is measured by absolute than by relative emissions

Further, several measures are used to proxy for CFP.
Most studies use either accounting-based or market-based measures (Albertini 2013), but reputation, stakeholder reciprocation, firm risk, and innovation capacity are sometimes used too (Vishwanathan et al 2020). Accounting-based measures usually encompass indicators like return on assets (ROA), return on equity (ROE), or return on sales (ROS) (Danso et al 2019). These indicators reflect the firm's internal capabilities to generate value rather than external perceptions of performance (Orlitzky et al 2003). They are backward-looking, as information about their constituent elements becomes available with some time lag. In contrast, market-based measures are more contemporaneous and also include market expectations about future conduct and performance (Dam and Scholtens 2015). Examples are (excess) stock market returns, stock return volatility, the price-earnings ratio, price per share, and earnings per share (Dowell et al 2000, Orlitzky et al 2003). Albertini (2013) and Orlitzky et al (2003) find that accounting-based indicators are more closely related to CEP than market-based ones. Ambec and Lanoie (2008) reason that investments in GHG performance will be converted into better future accounting-based performance. In contrast, Dixon-Fowler et al (2013) find that CEP is more closely related to market-based performance. This would suggest that investors value carbon emissions and apply off-balance-sheet valuation discounts for GHG emissions (Griffin et al 2017). This might be the case if outstanding GHG performance reduces regulatory risk and becomes increasingly valuable as carbon regulation changes in the future (Albertini 2013). Therefore, we test:

H4A:
Corporate environmental performance is more strongly related to prior market-based than to prior accounting-based financial performance

H4B: Corporate environmental performance is more strongly related to prior accounting-based than to prior market-based financial performance

The relationship between CEP and CFP can differ owing to different combinations of production factor inputs and technology usage (Konar and Cohen 2001). Such combinations vary between firms and across industries. Hart and Ahuja (1996) find that the largest impacts on CFP accrue to 'high polluters', since they can make low-cost improvements; in less-polluting industries, investments in CEP tend to become increasingly expensive. Delmas et al (2011) find that this changes over time as additional emission reduction becomes increasingly costly. So far, CEP-CFP studies have focused primarily on industrial companies, as these are the ones most concerned with toxic and hazardous emissions (King and Lenox 2001). Some studies concentrate on particular subsectors (Van der Goot and Scholtens 2015) and find clear differences between them. Others rely on industry-wide data to arrive at generalizable results (Albertini 2013). Most of these studies suggest that the GHG intensity of the industry in which a company operates affects the results. Therefore, we test the following hypothesis:

H5: The relationship between GHG performance and CFP is strongest in the most polluting sectors.
An ETS puts a price on GHG emissions. In general, these systems consist of tradable emission permits and an overall emissions cap that decreases over time (Alkhurst et al 2003, Van der Goot and Scholtens 2015). An ETS leaves companies with three alternative strategies: reducing GHG emissions to meet the requirements, buying emission rights, or reducing emissions below the legal requirements and selling the excess emission rights (Sandoff and Schaad 2009). Since all three strategies affect the costs of emissions, policy stringency will influence the relationship between GHG performance and CFP (Czerny and Letmathe 2017). Stringency relates particularly to the proportion of the jurisdiction's GHG emissions covered, the number of participating industries, the price of emission rights, and the share of emission allowances distributed through free allocation rather than auctioning (World Bank 2019). Firms participating in more stringent ETSs face more carbon constraints (Joltreau and Sommerfeld 2018). A relatively stringent policy imposes more costs on firms, as they have to invest more than firms under less stringent regimes. A stringent policy also increases firms' monitoring and reporting costs. Deschenes (2018) argues that firms under a more stringent policy show worse financial performance and competitiveness than firms operating under less stringent regimes. Beyond the impact on the firm, it is important to realize that ETSs allocate the costs of externalities that would otherwise be fully borne by society. We hypothesize that the relationship between GHG performance and CFP is stronger (more positive) in jurisdictions with more stringent climate policy regimes.

H6:
The relationship between GHG performance and CFP is stronger for firms operating in countries with more stringent climate policy than for firms in countries with weak policy stringency.

Methodology
To test our hypotheses, we use meta-analysis to investigate the empirical findings on the relationship between GHG performance and CFP. A meta-analysis may yield a more precise estimate of the effect of a construct than any individual study contributing to the pooled analysis (Tavakol 2018). First, we present the way in which we sample studies. Second, we describe the effect sizes and coding procedures. Third, we reflect on the meta-analytical procedure.

Sampling
In a meta-analysis, the literature included has to be systematically selected (Stanley and Doucouliagos 2012). In this regard, we rely on the preferred reporting items for systematic reviews and meta-analyses (PRISMA) method, which consists of four stages of data collection: identification, screening, eligibility, and inclusion (Moher et al 2010). To incorporate all relevant studies, we conducted an extensive search with a broad set of keywords. We used the following search terms (and combinations thereof): corporate environmental performance, CEP, environmental performance, corporate financial performance, financial performance, CFP, does it pay to be green, when does it pay to be green, carbon performance, GHG performance, climate change, GHG emissions, CO2 emissions, environmental management, environmental regulation, and carbon pricing. The (electronic) search was conducted using EBSCO, ScienceDirect, JSTOR, Emerald, and Google Scholar, and we selected peer-reviewed studies. In contrast, other meta-analyses (e.g. Dixon-Fowler et al 2013, Endrikat et al 2014, Busch and Lewandowski 2018) also include papers found through the references of non-academic (i.e. non-peer-reviewed) papers, as well as conference presentations. As this might lead to systematic bias (Hunter et al 1982), we do not include such sources. We limit the study to peer-reviewed academic work; we also refrain from including our own studies in the meta-analysis.
Our keyword search yielded an initial sample of 73 studies. Next, we implemented four inclusion criteria. First, we include only GHG-CFP studies that rely on data from 1997 onwards, because the Kyoto Protocol marks the start of a new era of climate policy (Böhringer 2003). Because of the resulting shift in stakeholders' perceptions of the impact of climate change policy, papers including data from before 1997 might yield different results from more recent studies (see also Endrikat et al 2014). Second, since we are interested in effect sizes regarding GHG emissions, we only include studies that measure the relationship between GHG emissions and CFP. We point out that the sample studies may use different measures in this regard. Most GHG emissions are measured as CO2e scope 2 emissions, but more than half of the studies do not disclose this in a transparent manner. This is a problem in most of the literature, where business and economics scholars use metrics they are not very familiar with. The same lack of transparency, however, occurs with financial performance, especially accounting performance. Financial performance is measured via accounting and market data, and we investigate whether the findings differ depending on which of the two is used. We also point out that the multiplicity of data in the sample studies may lead to variability in the results of the meta-analysis (Tendal et al 2011). In fact, most studies do not include a detailed account of the sampling procedure regarding the selection of countries, industries, and firms or the period studied. This is problematic, as it does not allow for full replication of the results, and it calls for more discipline in this regard within the field of business and economics. Third, to allow for comparison, the studies have to report sample sizes and correlation coefficients or statistics that can be converted into these.
Finally, we only include results from studies with continuous variables, as it is in general not possible to compare results from binary regressions (e.g. probit and logit models) (see Hunter et al 1982). Likewise, we exclude event studies, as their methodology differs greatly from that of the other estimates (Stanley and Doucouliagos 2012). As a result, and as reported in table 2, our final sample consists of 34 articles.
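The screening step can be sketched as a simple filter over candidate records. This is a minimal illustration, not the authors' actual screening tool: the record fields (`peer_reviewed`, `first_sample_year`, and so on) and the example studies are hypothetical, and the sketch only mirrors the peer-review restriction of the search and the four inclusion criteria listed above.

```python
from dataclasses import dataclass


@dataclass
class CandidateStudy:
    """Hypothetical record for a study retrieved by the keyword search."""
    name: str
    peer_reviewed: bool       # search restriction: peer-reviewed work only
    first_sample_year: int    # earliest year covered by the study's data
    measures_ghg: bool        # relates GHG emissions (not another CEP proxy) to CFP
    convertible_effect: bool  # reports n plus r, or statistics convertible to r
    design: str               # "continuous", "binary" (probit/logit) or "event"


KYOTO_YEAR = 1997


def is_eligible(study: CandidateStudy) -> bool:
    """True when the study passes the search restriction and all four criteria."""
    return (
        study.peer_reviewed
        and study.first_sample_year >= KYOTO_YEAR  # 1: post-Kyoto data only
        and study.measures_ghg                      # 2: GHG-CFP relationship
        and study.convertible_effect                # 3: comparable effect sizes
        and study.design == "continuous"            # 4: no binary-outcome or event studies
    )


def screen(candidates):
    """Keep the candidates that satisfy every criterion."""
    return [s for s in candidates if is_eligible(s)]


candidates = [
    CandidateStudy("A", True, 2002, True, True, "continuous"),
    CandidateStudy("B", True, 1995, True, True, "continuous"),  # pre-Kyoto data
    CandidateStudy("C", True, 2005, True, True, "event"),        # event study
]
print([s.name for s in screen(candidates)])
```

Encoding each criterion as a separate boolean keeps the exclusion reasons auditable, which is the replicability concern raised above.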

Coding
The effect sizes of the individual studies are the main unit of our analysis. Effect sizes are gathered from two types of statistics: Pearson product-moment correlations and partial correlations. Pearson product-moment correlations are derived from the correlation tables in the empirical studies. For studies that do not report correlation tables, the effect sizes (r) are calculated from the reported t-statistics and the degrees of freedom; for studies that do not report the t-statistic, it is calculated backward from the standard errors, significance levels, or probability values. Studies often report more than one relationship because they use multiple constructs (Albertini 2013). Two approaches can then be used to deal with multiple measures within studies: treating them as independent effect sizes or representing each study by a single effect size. Using a single observation for each primary study leads to a loss of information, as averaging has to take place. Therefore, we include all observations from the reported CFP constructs (e.g. Tobin's Q, ROA, ROE) and GHG performance constructs (e.g. absolute, relative). In line with Stanley and Doucouliagos (2012), the result from the model with the highest adjusted R-squared is included. Accordingly, a total of 74 effect sizes (k = 74) are extracted from our 34 studies, with 107 605 observations (n = 107 605). Appendix    Table D1 in the Appendix.
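For studies that report only regression output, the conversion can be sketched as follows. This assumes the standard partial-correlation formula r = t / sqrt(t² + df) and a t-statistic backed out as coefficient divided by standard error; the function names and the example numbers are hypothetical.

```python
import math


def t_from_estimate(coefficient, standard_error):
    """Back out a t-statistic from a reported coefficient and its standard error."""
    return coefficient / standard_error


def partial_r_from_t(t, df):
    """Partial correlation implied by a t-statistic with `df` residual degrees of freedom."""
    return t / math.sqrt(t * t + df)


# Hypothetical regression output: coefficient 0.5, standard error 0.25, 96 residual df.
t = t_from_estimate(0.5, 0.25)   # 2.0
r = partial_r_from_t(t, 96)      # 2 / sqrt(4 + 96) = 0.2
print(r)
```

Where only a two-sided p-value is reported, the magnitude of t could first be recovered with, for example, `scipy.stats.t.isf(p / 2, df)`, taking the sign from the reported coefficient, before applying the same conversion.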
The CCPI tracks countries' efforts to address climate change. It covers 58 countries between 2005 and 2019. C3-I offers a dataset covering 172 countries for the period 1996-2008. Both indices capture overall performance scores as well as performance in terms of political behavior and emissions. Their methodologies are closely related: both evaluate the emission component based on trends and emission levels. The policy component is based on expert assessment in the CCPI but on observed behavior in C3-I. Both measure historical output and emission trends across a wide range of environmental policies and do not measure the future carbon constraints faced by companies (Bernauer and Böhmelt 2013). As their methodologies differ slightly and neither index covers the whole period of our study, we proceed as follows. The CCPI was extracted from the website accompanying Burck et al (2016); these data are available from 2005 onwards, and we used the 2016 data. The codebook and data for C3-I were provided by Böhmelt (2013). Reassuringly, for the overlapping years, both indices yield identical country rankings. Therefore, we use C3-I as our basis for ranking countries for the period 1997-2008 and the CCPI for 2009-2019. To separate studies based on the ETS stringency of the countries included, we construct 'study ranks' from the country ranks. For a study conducted in a particular year in a specific country, this rank is the median of the country's ranks in the year before the study, the study year, and the year after the study. By aggregating over a three-year window, we reduce the effect of one-off events, such as novel policy intentions of governments. Such events may initially improve a country's score but may not always persist (see Burck et al 2016).
For studies that collect their data in a single country over multiple years, we use the average of the country's median ranks over this period. For studies covering multiple countries over multiple years, each country's average median rank is weighted by the number of observations per country. The use of study ranks allows us to assess the climate policy stringency of the sample countries in each study and to compare it with that of other studies (Botta and Kozluk 2014). To this end, we divide the studies into four groups according to the climate policy stringency of their sample. Studies that do not provide information about the number of observations from individual countries are excluded from the ranking. This approach allows us to divide studies into four groups using the two indices, even though their scales and methodologies are not exactly the same.
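The study-rank construction can be sketched as follows. The sketch assumes a hypothetical lookup table `ranks[(country, year)]` holding CCPI or C3-I country ranks, and assumes that every country in a multi-country study is observed in the same sample years; all names and numbers are illustrative.

```python
from statistics import median


def three_year_rank(ranks, country, year):
    """Median of a country's ranks in the year before, the study year, and the year after."""
    return median([ranks[(country, y)] for y in (year - 1, year, year + 1)])


def country_period_rank(ranks, country, years):
    """Average of the three-year median ranks over a study's sample years."""
    values = [three_year_rank(ranks, country, y) for y in years]
    return sum(values) / len(values)


def study_rank(ranks, observations, years):
    """Observation-weighted study rank.

    `observations` maps country -> number of observations. Returns None when
    per-country counts are unavailable, mirroring the exclusion rule in the text.
    """
    if not observations:
        return None
    total = sum(observations.values())
    weighted = sum(n * country_period_rank(ranks, c, years) for c, n in observations.items())
    return weighted / total


# Hypothetical country ranks keyed by (country, year).
ranks = {("A", 2004): 3, ("A", 2005): 1, ("A", 2006): 2, ("A", 2007): 5,
         ("B", 2004): 10, ("B", 2005): 10, ("B", 2006): 10, ("B", 2007): 10}
print(study_rank(ranks, {"A": 30, "B": 10}, [2005, 2006]))  # (30 * 2 + 10 * 10) / 40 = 4.0
```

A single-country study is simply the special case with one entry in `observations`, so the same function covers both rules described above.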
In contrast to these two indices, the Climate Action Tracker (CAT) assesses and ranks the intentions and progress of governments towards reaching the globally agreed aim of holding global warming below 2 °C. Hence, it provides a more contemporaneous and forward-looking assessment of stringency. CAT scores are based on the effect of current policies on emissions, the impact of pledges and targets, and the fair share and comparability of effort. CAT ranks countries on a scale from critically insufficient to role models (New Climate; Climate Analytics 2011). Further, it accounts for regional effects, assuming that ETS stringency in a particular region will be higher when both the individual reduction targets of countries and their actions to achieve the Paris Agreement goals are more ambitious. Studies are grouped based on the CAT evaluation of the region in which they are performed: sufficient, medium, moderate, and insufficient (due to small subsamples, we combine medium and moderate). Appendix D1 highlights the key features of the three stringency indices used in this study. Appendix D2 relates the studies to the climate policy stringency groups.

Meta-analytical procedures
Previous meta-analytical reviews of the CEP-CFP relationship were based on two different approaches, namely the aggregation technique of Hunter et al (1982) (hereafter: HS) (e.g. Orlitzky et al 2003, Albertini 2013) and Hedges-Olkin-type meta-analysis (hereafter: HOMA) (e.g. Endrikat et al 2014, Busch and Lewandowski 2018). Johnson et al (1995) compare meta-analytical techniques and observe that HS does not correct biases in the effect sizes very effectively before deriving mean effect sizes. As we deem this of great importance for accuracy, we use HOMA and correct for individual study artefacts (e.g. overestimation of the population effect size in small-sample studies). As a robustness check, we also employ the HS method.
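To make the contrast between the two approaches concrete, here is a minimal sketch of both aggregation schemes. The HOMA part assumes a fixed-effect model in which each correlation is converted to Fisher's z and weighted by its inverse sampling variance, n - 3; the HS part is the 'bare-bones' sample-size-weighted mean of raw correlations, without corrections for measurement error or range restriction. The text does not state whether fixed or random effects are used, so this is an illustration under that assumption, and the example data are hypothetical.

```python
import math


def homa_fixed_effect(effects):
    """Pooled correlation and 95% CI from (r, n) pairs via Fisher's z with weights n - 3."""
    weights = [n - 3 for _, n in effects]
    zs = [math.atanh(r) for r, _ in effects]
    total_weight = sum(weights)
    z_bar = sum(w * z for w, z in zip(weights, zs)) / total_weight
    se = 1 / math.sqrt(total_weight)  # standard error of the pooled z
    lower, upper = z_bar - 1.96 * se, z_bar + 1.96 * se
    # Back-transform from Fisher's z to the correlation metric.
    return math.tanh(z_bar), (math.tanh(lower), math.tanh(upper))


def hs_bare_bones(effects):
    """Hunter-Schmidt 'bare-bones' mean: sample-size-weighted average of raw r."""
    total_n = sum(n for _, n in effects)
    return sum(r * n for r, n in effects) / total_n


effects = [(0.10, 103), (0.30, 53), (0.05, 403)]  # hypothetical (r, n) pairs
print(homa_fixed_effect(effects))
print(hs_bare_bones(effects))
```

Weighting by n - 3 gives small samples less influence on the pooled estimate; whether this matches the exact artefact corrections applied in this study is not stated in the text.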
Figure 1. Key characteristics of the study sample. Note: Figure 1 shows the key characteristics of our sample. Panel A gives information on the effect sizes included in this study. A total of 52 show a positive effect, 21 show a negative effect, and in one case no relationship was observed. A total of 54 effect sizes are gathered using Pearson product-moment correlations, and 20 are based on partial correlation coefficients. Panel B details the two CFP measures in the 34 sample studies. A total of 47 effect sizes are measured using accounting-based indicators for CFP, of which 32 indicate a positive, 14 a negative, and one no relationship. A total of 27 effect sizes are measured using market-based indicators for CFP, of which 20 indicate a positive and 7 a negative relationship. Panel C shows how GHG performance is measured in the sample studies.

To test the homogeneity of the effect size distribution, we calculate the Q-statistic. This is a nonparametric test to assess the significance of the differences between two matched samples. Parametric tests are only reliable when the sample follows a normal distribution (Hunter et al 1982). A parametric test may yield significant results for the differences between the constructed subgroups; however, since the effect sizes in a small sample are usually not normally distributed, a non-parametric test is more informative. In this regard, a significant Q indicates a heterogeneous distribution and suggests the presence of moderating variables (Tavakol 2018). In line with Hedges and Olkin (1985), we perform the Chi-square goodness-of-fit test with an alpha of 5% to test for the homogeneity of the distribution of the 74 effect sizes from the studies in table 2.
The highly significant p-value (pQ = 0.000; see first line in table 3) indicates that the effect sizes are heterogeneous and, therefore, that moderating effects are likely to be present (Çogaltay and Karadag 2015). We interpret the findings from subgroup analyses based on interaction tests with care, as the subgroups are not randomized groups of firms and are therefore prone to confounding (Sedgwick 2015).
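To make the homogeneity test concrete, the following minimal sketch computes Cochran's Q as the weighted sum of squared deviations of Fisher-z transformed correlations from their weighted mean and refers it to a chi-square distribution with k - 1 degrees of freedom. It assumes fixed-effect weights w_i = n_i - 3 (the inverse of the within-study variance 1/(n_i - 3)), and a between-subgroup statistic Q_between = Q_total - sum(Q_within) with g - 1 degrees of freedom for g subgroups, as used for the subgroup comparisons. The function names are hypothetical and the inputs are illustrative, not the effect sizes in table 2. The chi-square tail probability uses the standard series and continued-fraction expansions of the incomplete gamma function, so only the Python standard library is needed.

```python
import math


def _gamma_p_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series (x < a + 1)."""
    term = total = 1.0 / a
    n = 0
    while abs(term) > abs(total) * 1e-15:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _gamma_q_cf(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) via a continued fraction (x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    return 1.0 - _gamma_p_series(a, z) if z < a + 1 else _gamma_q_cf(a, z)


def cochran_q(rs, ns):
    """Return (Q, df, p) for correlations rs with sample sizes ns."""
    df = len(rs) - 1
    if df == 0:  # a single effect size carries no heterogeneity information
        return 0.0, 0, 1.0
    zs = [math.atanh(r) for r in rs]  # Fisher z transform
    ws = [n - 3 for n in ns]          # 1 / V_within with V_within = 1 / (n - 3)
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    return q, df, chi2_sf(q, df)


def q_between(groups):
    """Between-subgroup Q = Q_total - sum(Q_within); groups maps label -> (rs, ns)."""
    all_r = [r for rs, _ in groups.values() for r in rs]
    all_n = [n for _, ns in groups.values() for n in ns]
    q_total = cochran_q(all_r, all_n)[0]
    q_within = sum(cochran_q(rs, ns)[0] for rs, ns in groups.values())
    qb = q_total - q_within
    df = len(groups) - 1
    return qb, df, chi2_sf(qb, df)
```

For example, `cochran_q([0.0, 0.5], [103, 103])` gives Q = 50 * atanh(0.5)^2, about 15.09 with one degree of freedom, well above the 5% critical value of 3.84, so homogeneity would be rejected.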
In addition, we test for publication bias, as studies with significant results are more likely to be published than studies with insignificant results. Here, we rely on the failsafe-N test of Rosenthal (1979). This test calculates the number of insignificant studies that would have to be added to the sample for the aggregated effect size to become insignificant (see Stanley and Doucouliagos 2012).
In order to test our hypotheses, we construct several subgroups. We compare the subgroups to examine whether the criterion used for classification is indeed relevant to the heterogeneity in our sample (see also Hedges and Olkin 1985). To determine whether the heterogeneity between subgroups is statistically significant, we also calculate Cochran's Q score and the corresponding p-value using the Chi-square goodness of fit test. Because the sample size of the study is relatively small, it is important to realize that the Q statistic may provide a misleading measure of heterogeneity and should be interpreted with care (Sedgwick 2015, Tavakol 2018). To address this issue, and to test whether subgroups differ significantly from one another given that the effect sizes of subgroups are unpaired, we also perform the non-parametric Mann-Whitney-Wilcoxon test. This test does not assume normally distributed or paired data (Fay and Proschan 2010). Here, the effect sizes in the subgroups are not weighted, as differences in sample size would otherwise make the differences significant by definition.

Table 3 summarizes the results of the meta-analysis based on the Hedges and Olkin (1985) method. It first gives the overall aggregated relationship between corporate GHG performance and CFP. Next, it shows the results of the different subgroup analyses: the aggregated effect sizes for the different reporting types and for the indicator specification of the corporate GHG performance construct, the effect sizes for market- and accounting-based CFP indicators, and the effect sizes by industry carbon intensity. It also reports the results for the ETS stringency hypothesis using two different methods. For ETS stringency based on the C3-I and CCPI, the 'high' group consists of the seven studies conducted in the most stringent environments, the next seven studies, from subsequently less stringent ETS environments, form the group 'medium,' the following seven studies form the group 'medium-low,' and the studies conducted in the least stringent ETS regions form the group 'low.' The CAT ETS stringency measure results in three groups: 'sufficient' consists of studies from regions with policies sufficient for reaching the UN climate goals, 'moderate' consists of studies performed in countries with moderate policies, and 'inadequate' consists of studies performed in countries with inadequate policies. For the group 'global/no data available,' the study was conducted globally, or no information about the studied country was available.

Table 3 presents the results from the meta-analysis for the relationship between corporate GHG performance and CFP. Regarding the overall effect, the aggregation of the effect sizes indicates a statistically significant positive relationship between GHG performance and CFP (r = 0.05, Z = 3.47, p = 0.001), based on a total of 74 effect sizes and 107 605 observations. This suggests that GHG performance is positively related to CFP. Therefore, we accept hypothesis 1A ('The overall relationship between corporate GHG performance and corporate financial performance is positive'). The significant positive association supports the eco-efficiency and stakeholder perspectives and contradicts the view of a trade-off between the two constructs. It seems that companies can improve their financial performance through the efficiency benefits of reducing their GHG emissions, which apparently also satisfies the needs of their stakeholders (Hatakeda et al 2012, Trinks et al 2020). The Q score is highly significant and confirms the heterogeneity of the sample. Table 3 also reports the results for the analysis of the various subgroups. When emissions are measured using voluntary reporting, GHG performance is positively and significantly related to CFP (r = 0.07, p = 0.01); the same holds when mandatory reporting is used (r = 0.04, p = 0.01). The difference between these two subgroups is not significant (pQb = 0.498).
The Mann-Whitney-Wilcoxon test also indicates that the subgroups do not differ significantly from each other (p = 0.270). Therefore, we reject hypothesis 2 ('the type of reporting scheme used influences the results in the GHG and CFP literature').
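The unweighted subgroup comparison can be sketched as follows. This is a minimal sketch of a two-sided Mann-Whitney-Wilcoxon test using the normal approximation, average ranks for ties, and a tie-corrected variance; it omits the continuity correction and exact small-sample p-values that statistical packages may apply by default, so its p-values can differ slightly from those reported in table 3 or appendix B. The function name and the example inputs are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney-Wilcoxon test (normal approximation).

    Returns (U for sample x, z, p). Ties receive average ranks and the
    variance of U is tie-corrected; no continuity correction is applied.
    Effect sizes are passed unweighted, as in the subgroup comparisons.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, tagging each value with its group (0 = x, 1 = y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all values tied: no evidence of a difference
        return u1, 0.0, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, z, p
```

For instance, comparing the hypothetical effect sizes [0.01, 0.03, 0.05] with [0.08, 0.10, 0.12] yields U = 0 and a two-sided p-value of roughly 0.05, illustrating how little power the test has with very small subgroups.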

Results
Further, table 3 shows that GHG performance is significantly and positively related to CFP (r = 0.09, p = 0.01) when absolute GHG emissions are used. Relative GHG indicators are significant too (r = 0.04, p = 0.01). Here, pQb = 0.207, and the Mann-Whitney-Wilcoxon test also shows that the differences between these two subgroups are not statistically significant (p = 0.550). As such, hypothesis 3B ('GHG performance affects CFP more when it is measured using absolute emissions compared to relative ones') is rejected.
Although the relationship between CEP and CFP is positive for both accounting- and market-based indicators, it appears to be somewhat stronger when market-based measures are used (r = 0.07, p = 0.015) than with accounting-based measures (r = 0.04, p = 0.012). However, the difference between these two groups is insignificant (pQb = 0.458), and the Mann-Whitney-Wilcoxon test also suggests that the difference is not statistically significant (p = 0.755). Hence, hypothesis 4A ('GHG performance is more positively related to prior market-based than to prior accounting-based CFP') is rejected, as is its counterpart (4B).
Taking the industry perspective, table 3 shows that studies that only include pollution-intense industries report lower effect sizes (r = 0.04, p = 0.119) than those with multiple industries (r = 0.08, p = 0.00). However, the former is not significant; hence, GHG performance is significantly related to CFP only in the mixed-industry studies. Based on a pQbet of 0.138, the two subgroups do not seem to differ in a statistically significant way, but the Mann-Whitney-Wilcoxon test indicates that the differences between them are significant (p = 0.014). Based on the first test, we reject H5 ('industry carbon intensity moderates the relationship between GHG and CFP; the GHG-CFP relationship is stronger in more polluting industries'). However, the Mann-Whitney-Wilcoxon test suggests that the GHG-CFP relationship is significantly weaker for studies conducted in pollution-intense industries than for studies conducted in multiple industries. An explanation could be that, over the years, forced by gradually tighter regulation, pollution-intense industries have already picked the 'low-hanging fruit' (see also Delmas et al 2015).
For climate policy stringency, we first consider the measure based on the CCPI and C3-I indices. Here, the relationship between GHG performance and CFP appears strongest for studies performed in countries with the most stringent policy regime (r = 0.09, p = 0.00). The CEP-CFP relationship for countries with medium-high stringency is insignificant (r = 0.05, p = 0.18), as is the case for sample countries in the medium-low cohort (r = 0.02, p = 0.68). For studies of countries that score lowest on policy stringency, the relationship is also insignificant (r = 0.05, p = 0.23). The Mann-Whitney-Wilcoxon test (reported in appendix B) shows marginally significant differences between the subgroups high and medium-high (p = 0.089) and high and medium-low (p = 0.095), and a significant difference between high and low (p = 0.048). This suggests that the GHG-CFP relationship is stronger in the regions with the most stringent climate policy. Next, we discuss the results based on CAT information. Studies conducted in countries with policies qualified as sufficient show a clear positive and significant relationship between GHG and CFP (r = 0.09, p = 0.005). For the other subgroups, the relationship is not significant. The results from the Mann-Whitney-Wilcoxon tests (see appendix B) reveal that most subgroups are not significantly different from each other, with the exception of the group sufficient versus medium and insufficient combined. Therefore, hypothesis 6 ('the relationship between GHG performance and CFP is stronger for firms operating in countries with more stringent climate policy than for firms in countries with weak policy stringency') cannot be accepted on the basis of CAT information. Overall, the results suggest that the relationship between GHG and CFP is positive for all subgroups, but significantly stronger only in the environments with the most stringent climate policy.
This might be the case because initial phases of ETSs are characterized by low stringency, high bureaucracy, and little influence on innovation (Czerny and Letmathe 2017). These early phases are known for the free allocation of emission rights, low emission prices, and many industries being excluded (Abrell et al 2011).
In order to assess the reliability of the results of the meta-analysis, we perform two robustness tests: we use a different methodology, and we rely on an alternative calculation of effect sizes. In addition, we account for publication bias. First, we use the Hunter et al (1982) method to test the robustness of the HOMA analysis. This procedure is briefly explained in appendix C, and the results are in table C1 therein. The Hunter et al (1982) method yields results that are qualitatively highly similar to those of the HOMA method. The main difference is that it suggests a marginally significant difference between highly polluting industries and multiple industries, and between the different correlation coefficients. Second, in line with Hunter et al (1982), effect sizes were calculated for both correlations and estimated partial correlations (last row in table 3). The effect sizes based on correlation coefficients tend to be slightly higher (r = 0.08, p = 0.00) than the effect sizes estimated from partial correlations (r = 0.06, p = 0.004). According to the Q-statistic, these correlations are not significantly different from each other (pQbet = 0.156). However, the results from the MWW test hint at marginally significant differences (p = 0.068). We also account for the presence of publication bias (Rosenthal 1979). Here, the failsafe-N is calculated, which points to only a very moderate publication bias. In particular, we find that 5576 (Z-score of 14.37) additional null-effect studies are required to make the summary effect size insignificant. This result can be explained by the fact that this study only includes studies that investigate the relationship between GHG emissions and CFP, and the number of studies on the topic is growing but still limited (Chapple et al 2011).
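The bare-bones Hunter et al (1982) aggregation used in this robustness check can be sketched as below. It is a minimal sketch, assuming sample-size weights on untransformed correlations, the commonly used sampling-error variance (1 - r_bar^2)^2 / (N_bar - 1) with N_bar the average sample size, and truncation of a negative residual variance at zero. It applies no further artefact corrections (for example for measurement unreliability), consistent with the method's limited emphasis on correcting sources of bias described in appendix C. The function name and inputs are hypothetical.

```python
def hunter_schmidt(rs, ns):
    """Bare-bones Hunter et al (1982) aggregation of correlations.

    Returns (r_bar, var_r, var_e, var_rho): the sample-size-weighted mean
    correlation, the observed variance of the correlations, the expected
    sampling-error variance, and the residual (population) variance.
    """
    total_n = sum(ns)
    k = len(rs)
    # Weighted mean on the untransformed correlation scale
    r_bar = sum(n * r for n, r in zip(ns, rs)) / total_n
    # Sample-size-weighted observed variance of the correlations
    var_r = sum(n * (r - r_bar) ** 2 for n, r in zip(ns, rs)) / total_n
    # Variance expected from sampling error alone, using the average sample size
    n_bar = total_n / k
    var_e = (1 - r_bar ** 2) ** 2 / (n_bar - 1)
    # Residual variance attributed to the population, truncated at zero
    var_rho = max(0.0, var_r - var_e)
    return r_bar, var_r, var_e, var_rho
```

With two hypothetical studies of 100 firms each reporting r = 0.1 and r = 0.3, the weighted mean is 0.2, the observed variance 0.01, and the expected sampling-error variance about 0.0093, leaving a residual population variance of roughly 0.0007.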

Conclusion
We review the nascent literature on the relationship between companies' GHG emissions and financial performance. We employ a meta-analysis to examine whether there is a relationship between firms' GHG emissions and financial performance, what it looks like, and how sensitive it is to research design and measurement. We investigate the results of studies undertaken after the signing of the Kyoto Protocol, as we regard this as a breakpoint in international climate policy. Hence, we focus on international studies for the period 1997-2019. We select peer-reviewed published academic studies using PRISMA sampling and end up with 34 relevant studies, including 74 effect sizes covering 107 605 observations. We observe several drawbacks in the studies that relate physical and economic performance. In particular, the interaction mechanisms are not always described and motivated in a clear and coherent manner. Further, the measurement of both GHG emissions and financial performance is in many cases not transparent. In particular, not all studies clearly report how these emissions are calculated and whether scope 1, scope 2, or scope 3 emissions are used. The required homogeneity of samples does not seem to be fully satisfied, and there appears to be multiplicity. This potential multiplicity of data in the sample studies may lead to variability in the results. We observe that in many cases the sample studies do not clearly detail their procedure for selecting countries, industries, firms, or the period studied. This is problematic, as it does not allow for full replication of the results, and calls for more discipline in this regard within the field of business and economics.
Given these reflections and data limitations, the main finding of our study is that there is a significant positive relationship between companies' GHG performance and their financial performance, suggesting that companies with lower GHG emissions show superior financial performance. Although GHG emissions are a very different type of pollution from other pollutants, this finding is in line with studies on the generic corporate environmental-financial performance relationship (e.g. Albertini 2013, Dixon-Fowler et al 2013, Endrikat et al 2014), as well as with a related study on the association between firms' carbon emissions and their financial performance (Busch and Lewandowski 2018). There are several ways to capture both financial performance and GHG emissions. However, the choice of proxies for either hardly appears to influence the results. For example, we establish that there is no significant difference when voluntary or mandatory GHG reporting information is used, when absolute or relative GHG emission measures are used, or when market- or accounting-based financial indicators are employed. However, this conclusion is based on a sample of studies that are hampered by problematic homogeneity and multiplicity; further research is therefore needed to check its reliability. Further, although there is some evidence that firms in less polluting industries outperform, we do not find substantial evidence that industry affiliation per se is a defining vector in the relationship between GHG emissions and financial performance. Regarding climate policy stringency, only in countries with the most stringent ETS regime does the relationship between emissions performance and financial performance appear to be significantly more positive than elsewhere. We point out, though, that most sample studies focus on industrialized countries, and we suggest that emerging markets and low-income countries be studied too. Our findings appear to be quite robust.
This is also established by an alternative meta-analytical procedure. Furthermore, we find no substantial publication bias. Therefore, on the basis of this review, we conclude that there is a positive association between companies' GHG emission performance and their financial performance. In particular, companies with relatively low GHG emissions have relatively high financial performance.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).

Appendix C. Hunter et al meta-analytical method
In contrast to the HOMA method, the Hunter et al method does not emphasize isolating and correcting sources of error and bias (Stanley and Doucouliagos 2012). The method uses the untransformed effect-size estimates, and weights are based only on the sample size (Field 2003). Mean effect sizes are calculated as follows:

$$\bar{r} = \frac{\sum_{i=1}^{k} n_i r_i}{\sum_{i=1}^{k} n_i}$$

where $r_i$ is effect size $i$, $n_i$ is its sample size, and $k$ is the number of effect sizes. The variance across sample effect sizes consists of the variance of the effect sizes in the population and the sampling error. As such, the variance in population effect sizes is calculated using the sampling error. The following equation is used to calculate the variance of the sample effect sizes:

$$\sigma_r^2 = \frac{\sum_{i=1}^{k} n_i (r_i - \bar{r})^2}{\sum_{i=1}^{k} n_i}$$

The error variance of the sample is calculated as:

$$\sigma_e^2 = \frac{(1 - \bar{r}^2)^2}{\bar{N} - 1}$$

where $\bar{N}$ is the average sample size. The variance in population effect sizes is estimated by subtracting the sampling error variance from the variance of the sample effect sizes:

$$\sigma_\rho^2 = \sigma_r^2 - \sigma_e^2$$

Table C1 reports the results of the Hunter et al (1982) method to test the robustness of the Hedges and Olkin (1985) results. It first describes the overall aggregated relationship between corporate GHG performance and CFP. Next, the results of the different subgroup analyses are presented: the reporting type and the indicator specification of the corporate GHG performance construct, the CFP indicator specification and the industry carbon intensity, and the ETS stringency hypothesis using two different methods. For ETS stringency based on the C3-I and CCPI, the seven most stringent studies form the 'high' group, the next seven studies form the group 'medium,' the following seven studies form the group 'medium-low,' and the studies conducted in the least stringent ETS regions form the group 'low.' The CAT ETS stringency measure results in three groups; the group 'sufficient' consists of studies performed in countries with policies sufficient for reaching the UN climate goals.
'Moderate' consists of studies performed in countries with moderate policies for reaching the UN climate goals, and 'inadequate' consists of studies performed in poorly performing countries. For the group 'global/no data available,' the study was conducted globally or no information about the included countries was available. Differences based on the use of correlation coefficients or partial correlations are reported to test the robustness of the results.

Before calculating the summary effects, the effect sizes are transformed to a standard normal metric by Fisher's z transformation to address skewness (see Hedges and Olkin 1985), using the following formula:

$$z_r = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right)$$

where $z_r$ is the transformed (partial) correlation and $r$ is the correlation coefficient. In line with Hedges and Olkin (1985), the weight assigned to each individual effect size is based on a variance component that consists of both the between-study and the within-study variance. The within-study variance $V_{within}$ is:

$$V_{within,i} = \frac{1}{n_i - 3}$$

The between-study variance $V_{between}$ is:

$$V_{between} = \max\left(0, \frac{Q - (k - 1)}{c}\right), \qquad c = \sum_{i=1}^{k} w_i^{F} - \frac{\sum_{i=1}^{k} \left(w_i^{F}\right)^2}{\sum_{i=1}^{k} w_i^{F}}$$

where $w_i^{F} = 1/V_{within,i}$ and $Q$ is the homogeneity statistic. The random-effects aggregated effect size is calculated by using the sum of the between-study and the within-study variance, $V$ (Hedges and Olkin 1985):

$$V_i = V_{within,i} + V_{between}$$

In line with Hedges and Olkin (1985), we assign each effect size a weight equal to the inverse of the sum of the between- and within-study variance:

$$w_i = \frac{1}{V_i}$$

The mean effect size and the standard error of the mean effect size are calculated in line with Hedges and Olkin (1985) using the following equations:

$$\bar{z}_r = \frac{\sum_{i=1}^{k} w_i z_{r_i}}{\sum_{i=1}^{k} w_i}, \qquad SE(\bar{z}_r) = \sqrt{\frac{1}{\sum_{i=1}^{k} w_i}}$$

The confidence interval for the aggregated effect size is calculated by:

$$CI_{Upper} = \bar{z}_r + 1.96 \times SE(\bar{z}_r), \qquad CI_{Lower} = \bar{z}_r - 1.96 \times SE(\bar{z}_r)$$
Finally, all values are transformed back to correlation units using:

$$r = \frac{e^{2z} - 1}{e^{2z} + 1}$$
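The random-effects procedure of this appendix can be sketched as follows. It is a minimal sketch under stated assumptions: within-study variances of 1/(n_i - 3) on the Fisher-z scale, and a DerSimonian-Laird-type moment estimate of the between-study variance, max(0, (Q - (k - 1))/c), which is one common choice and not necessarily the exact estimator behind table 3. The function name and inputs are hypothetical. Results are transformed back to correlation units with `math.tanh`, the inverse of Fisher's z transformation (`math.atanh`).

```python
import math


def random_effects_meta(rs, ns, z_crit=1.96):
    """Random-effects mean correlation with a confidence interval.

    Works on the Fisher-z scale and returns back-transformed values,
    the z statistic, its two-sided p-value, and the between-study variance.
    """
    zs = [math.atanh(r) for r in rs]        # Fisher z transform
    v_within = [1.0 / (n - 3) for n in ns]  # within-study variance
    w_fixed = [1.0 / v for v in v_within]
    sw = sum(w_fixed)
    z_fixed = sum(w * z for w, z in zip(w_fixed, zs)) / sw
    q = sum(w * (z - z_fixed) ** 2 for w, z in zip(w_fixed, zs))
    c = sw - sum(w * w for w in w_fixed) / sw
    # Moment estimate of the between-study variance, truncated at zero
    tau2 = max(0.0, (q - (len(rs) - 1)) / c) if c > 0 else 0.0
    # Random-effects weights: inverse of within- plus between-study variance
    w = [1.0 / (v + tau2) for v in v_within]
    z_bar = sum(wi * z for wi, z in zip(w, zs)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    z_stat = z_bar / se
    return {
        "r": math.tanh(z_bar),  # back-transformed mean correlation
        "ci_lower": math.tanh(z_bar - z_crit * se),
        "ci_upper": math.tanh(z_bar + z_crit * se),
        "z": z_stat,
        "p": math.erfc(abs(z_stat) / math.sqrt(2)),  # two-sided p-value
        "tau2": tau2,
    }
```

For two hypothetical studies of 103 firms with r = 0.0 and r = 0.5, the sketch estimates a between-study variance of about 0.14 and a mean correlation of about 0.27, with a wide confidence interval because the between-study variance dominates the weights.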

Publication bias
To test for publication bias, we calculate the failsafe-N (see Rosenthal 1979). The failsafe-N test calculates the number of insignificant studies that would have to be added to the sample to make the aggregated effect size statistically insignificant (see Stanley and Doucouliagos 2012). The number of additional null results needed to make the aggregated effect size insignificant at the 5% level is calculated as follows:

$$N_{fs} = \frac{k Z_s^2}{Z_\alpha^2} - k$$

where $k$ is the number of effect sizes, $Z_\alpha$ is the critical upper-tail value of the standard normal distribution, and $Z_s$ is calculated as follows:

$$Z_s = \frac{\sum_{i=1}^{k} Z_i}{\sqrt{k}}$$

where $Z_i$ is the standard normal deviate of effect size $i$.
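Rosenthal's failsafe-N can be sketched in a few lines. This minimal sketch assumes a one-tailed 5% level (Z_alpha of about 1.645) and takes the per-effect-size Z_i values as given, since how they are derived from each study is not restated here; the function names are hypothetical. `statistics.NormalDist().inv_cdf` from the Python standard library supplies the critical value.

```python
import math
from statistics import NormalDist


def stouffer_z(z_values):
    """Combined standard normal deviate Z_s = sum(Z_i) / sqrt(k)."""
    return sum(z_values) / math.sqrt(len(z_values))


def failsafe_n(z_s, k, alpha=0.05):
    """Rosenthal's failsafe-N: null results needed to pull Z_s below Z_alpha.

    Z_alpha is the one-tailed critical value of the standard normal
    distribution; a negative result means the combined effect is already
    insignificant.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return k * z_s ** 2 / z_alpha ** 2 - k
```

Plugging in the rounded values reported in the robustness section (Z_s = 14.37, k = 74) gives roughly 5574, close to the reported 5576; the small gap is consistent with rounding of the Z-score.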