Rethinking climate agreements: self-enforcing strategies for emission reduction through sanctions

Although climate change is expected to cause significant negative impacts, climate treaties give hope for reducing greenhouse gas (GHG) emissions, potentially leading to less severe climate change. However, climate change is a public bad: If each country aims to maximize its own benefit, some countries might free ride and continue to emit GHGs. Consequently, climate treaties have seen limited success, and a central question is how future treaties can achieve GHG emission reduction and also be ‘self-enforcing’—where participating countries have no incentive to withdraw or modify their contributions. Here we examine a dynamic negotiation process involving multiple countries, each deciding whether and when to join the agreement and whether to penalize non-participants. In particular, we distinguish between (1) indirect punishment, in which countries invest less in emission reduction in response to non-compliance of other countries, and (2) direct punishment, in which countries impose sanctions, such as punitive tariffs, against countries that do not comply. We analyze the negotiation process using evolutionary game theory. We show that how the two types of punishment are implemented greatly affects the agreement’s outcome. In particular, an efficient combination of the punishments could lead to more desirable self-enforcing agreements. These findings indicate that integrating punitive measures with an effective negotiation framework could result in more desirable climate agreements.


Introduction
Climate change is expected to inflict significant damage on our planet [1,2]. Human actions significantly influence future climate trajectories: projections indicate that, by the century's end, if greenhouse gas (GHG) emissions continue at current rates, temperatures may increase by 3 °C to 6 °C compared to their levels at the beginning of the industrial revolution; however, with significant reductions in GHG emissions, warming could be limited to 1.5 °C to 2 °C [3,4]. Addressing climate change is a collective challenge because GHG emissions from any one country affect all others; it therefore requires the cooperation of multiple countries worldwide. Since the problem is international, enforcing a solution becomes difficult, and countries are limited to using economic incentives to foster cooperation [5-10]. As a result, despite five decades of international initiatives, GHG emissions have persistently increased [3,4]. Climate agreements like the Kyoto Protocol and the Paris Agreement have achieved some successes, but they are far from achieving the desired reduction in GHG emissions [8,9,11,12].
A major obstacle to effective climate agreements is the incentive for countries to free ride and underinvest in emission reductions [8,9,13-16]: a country that employs cheaper production methods may benefit economically while the climate-related costs are shared globally; namely, reduction of GHG emissions is a public good problem. A major question is, therefore, what is the most effective international environmental agreement (IEA) for climate change prevention that is also 'self-enforcing' (credible), in the sense that each country that participates also maximizes its own benefit by doing so [6,8,14,16]. Countries can discourage free riding by punishing free riders via tariffs on products that generate GHG emissions [7,9,12,16,17]. However, punishment often comes with a cost to the punisher, and therefore, the threat of punishment may not be credible in the sense that a profit-maximizing country will not choose to punish (the second-order free rider problem). Therefore, punishment is expected to be underprovided and does not alone resolve the free riding problem. Accordingly, the mainstream literature has focused on IEAs that are (a) 'self-enforcing,' where no country can increase its utility by unilaterally changing its strategy, and (b) 'renegotiation-proof,' namely, if a country opts out of the agreement or reduces its contribution, the other countries will renegotiate and adjust their contribution to maximize their utility under the assumption that non-signatories will not opt in [5,8,13,14].
Based on the assumption that agreements need to be renegotiation-proof to ensure self-enforcement, research suggests that self-interested countries struggle to reach optimal agreements [8,9,14]. Specifically, one type of model [8,13,14] has focused on participation in an IEA. In these models, the strategy of each country determines whether to opt in and participate in the agreement or opt out, where signatories that opt in will contribute to climate mitigation the amount that maximizes their benefit as a whole, while non-signatories will each act independently to maximize their own utility. The solutions are renegotiation-proof if, once a country opts out, the remaining signatories renegotiate immediately and adopt the strategy that maximizes the utility of the remaining signatories. Another type of model [16,18-21] examines competitive games in which each country decides how much it contributes, and there is no sharp distinction between signatories and non-signatories. But the renegotiation-proofness condition is similar: if a country defects and reduces its contribution, the remaining countries will assume that the defector will not return and immediately adopt the solution that maximizes their utility. The results of these models are generally rather pessimistic: only agreements with a small number of participants or very low contributions by each participant can be achieved.
Nevertheless, the renegotiation-proof condition may be overly restrictive and undermine punishment mechanisms, because direct punishment, such as punitive tariffs, could be employed to incentivize non-participants to opt in. In other words, perhaps one should not be too naïve and expect countries to always punish, but at the same time, perhaps one should not be too pessimistic in assuming that non-participants will never change their actions. To address this issue, several studies have attempted to relax the renegotiation-proofness assumption. In particular, some studies have suggested that an agreement only takes effect after the number of signatories exceeds a certain threshold [22-24] (similar to the Kyoto Protocol and the Paris Agreement, which came into effect only after countries accounting for at least 55% of the total carbon dioxide emissions of all industrialized countries had opted in). However, this idea introduces new questions and raises several difficulties. First, signatories can opt out after the agreement comes into effect, in which case it is unclear what mechanism could revive the agreement other than renegotiation. Second, it is unclear how to choose the optimal threshold number of participants. If the threshold is low, the agreement is still inefficient and includes only a few countries, whereas if the threshold is high, it could take a long time before enough countries opt in, and during this time, emissions will continue to be high. Third, it is unclear whether such a threshold strategy is self-enforcing (credible) at all: if it takes too long to secure volunteers, countries may renegotiate to a lower threshold value.
In this paper, we examine the following questions: (1) Can direct punishment through tariffs against non-signatories be credible, in the sense that, despite being costly, punishment ultimately benefits the punisher by increasing the future participation of other countries? (2) Can such punishment foster a more efficient agreement? And (3) how should negotiation frameworks, which determine the rules for when and how to apply each punishment mechanism (punitive tariffs and investing less in GHG reduction), be designed to lead to a more efficient agreement? To address these questions, we consider the agreement's participation as a dynamic process, in which signatories can wait for non-signatories to join and become signatories. During that waiting period, punishment may be imposed when it is credible (i.e. if it ultimately benefits the punisher). We investigate two versions of the model: (1) a symmetric version with N identical countries and (2) a more realistic version that involves the five countries with the largest GHG emissions, using real data for costs, benefits, and emissions parameters. In the symmetric model, we compare (1) a framework in which GHG emissions reductions start in the first year, simultaneously with the punishment via tariffs, and (2) a framework in which the emissions reductions begin only after punishment via tariffs stops.
We show that, if the cost of imposing punishment via tariffs is sufficiently low, then punishment could be credible in certain negotiation frameworks. We also show that, if punishment is credible, it might incentivize additional countries to join and lead to a more efficient agreement. In addition, we show that an efficient combination of direct and indirect punishments is a key factor for achieving an efficient climate agreement. This paper contributes to the literature on the dynamic formation of IEAs [20,23,25-28], which examines cases where countries join the coalition over time to become signatories. Specifically, our paper examines the addition of direct punishment via tariffs, where each signatory has the option to punish. Punishment in our model is not renegotiation-proof, but punishers can renegotiate and stop punishing. The paper also contributes to the literature on the evolution of costly punishment [29-31]: it examines cases in which punishment is not permanent, and punishers could renegotiate to stop the punishment throughout the game.

Framework
We consider a negotiation process in which each country can decide whether to opt in and become a signatory of the climate agreement. If a country does not opt in when the agreement is formed, it can still opt in during a later year. In turn, each signatory can decide whether it punishes non-signatories via tariffs. Punishment ends when the number of signatories reaches a certain threshold. This defines a general framework for the negotiation process, and we examine whether and under which conditions this framework can improve over negotiation frameworks without tariff-based punishment. We consider two models: one in which countries are identical, and one that includes more realistic cost and benefit functions of the five countries with the largest shares of GHG emissions. In both cases, we consider countries to be rational, profit-maximizing entities. Accordingly, the equilibrium solutions that we find are those where each country chooses the strategy that maximizes its own utility given the strategies of the other countries.

Identical countries (symmetric model)
We consider a dynamic negotiation process in which countries can opt in and ratify a climate agreement (become signatories) either initially or in subsequent years. Signatories can decide when to renegotiate, and each non-signatory can decide if and when to change its action and opt in to become a signatory. We begin by describing the symmetric version of the model that includes N identical countries. In the next subsection, we describe the more realistic, asymmetric version of the model, in which we consider the five countries with the largest GHG emissions, with realistic benefits and costs estimated from the literature. The detailed description is given in the appendix.
Initially, each country selects its type from three options: defector (D), renegotiator (R), or punisher (P) (figure 1). The Ds begin as non-signatories but may opt in later to avoid punishment; the Rs are signatories that never punish others; and the Ps are signatories that may directly punish non-signatories. Specifically, in the first year, Rs and Ps opt in to the agreement and become signatories, while Ds opt out. Thereafter, each year, any defector can choose to opt in, such that the number of signatories, n, may increase over time. In turn, punishers penalize non-signatories (e.g. by imposing trade sanctions) as long as the number of signatories remains below a certain target, n*, where the target number n* is determined by the proportion of punishers among the population of non-defectors. (The idea is that n* increases as the proportion of punishers among the signatories increases, because the renegotiators always want to renegotiate as soon as possible.) Once n reaches n*, the punishers cease to impose penalties, and therefore, the remaining non-signatories have no incentive to opt in, leaving the number of signatories fixed thereafter.
In turn, the contributions by the signatories and the non-signatories to GHG emission reduction in a given year are determined by a subgame that is similar to the classical model of Barrett [14] (see 'contribution to GHG reduction subgame' in figure 1 and in Methods). Specifically, the signatories act as a single unit and choose the contribution that maximizes their utility as a whole. Then, each non-signatory acts independently and chooses the strategy that maximizes its own utility. In turn, the utility of each country in that subgame is given by the benefit it accrues from the reduction of GHG (due to the combined contribution of all countries) minus the cost due to its own contribution. (Note that the benefits account for the discounted sum over all future times due to the decrease in the present pollution stock.) We consider benefit and cost functions similar to those considered in the classic paper of Barrett, 1994 [14] (appendix). This model has been examined extensively and was proven robust to many of its underlying assumptions [6]. In turn, the annual utility of each type (D, R, and P) is determined by the utility due to the contributions to GHG emission reduction and punishments. The countries' total utility or welfare is given by the sum of the utilities in all years, with future years discounted relative to the present (appendix).
In turn, we compare two frameworks (versions or formats) of the negotiation process. In Framework 1, the signatories start reducing GHG emissions immediately, concurrently with the punishment, even when n < n*, as demonstrated in figure 1. In Framework 2, however, the countries delay GHG emission reduction until n ⩾ n*. Namely, in both Frameworks 1 and 2, punishers impose penalties on non-signatories as long as n < n*, but in Framework 1, the 'contribution to GHG reduction subgame' begins in the first year, whereas in Framework 2, it begins only once n ⩾ n*. In both cases, the defectors' strategy regarding whether and when to opt in is equivalent to an extended version of the war of attrition game [32] (appendix).
To find the set of strategies that could be adopted by these agents, we aim to identify the game's evolutionary equilibria (EEs), each of which is guaranteed to be a Nash equilibrium of the game [26,28,31,33] (appendix). Also, the choice between D, R, and P is the first stage of the game, and since utilities and strategies for the rest of the game are determined using backward induction, the Nash equilibrium for the initial stage is part of the subgame-perfect Nash equilibrium of the entire game [34]. Namely, rational agents are expected to adopt the EE strategy, and in particular, if an EE includes punishers, it implies that punishment is credible.

Non-identical countries (asymmetric model)
In the asymmetric model, we consider the five countries (or unions) with the largest share of GHG emissions worldwide: China, the United States, India, the European Union, and Russia. These countries differ in their relative share of worldwide GHG emissions and in the damage they incur due to climate change, as summarized in supplementary table S1. In the asymmetric model, we consider the same steps as those in the symmetric model (figure 1). Initially, each country chooses its type: either defector (D), renegotiator (R), or punisher (P). We denote n as the share of the signatories in global carbon dioxide emissions and n* as the threshold share above which punishers stop punishing. Similar to the symmetric model, n* is determined by the share of the punishers in GHG emissions among all non-defectors, and the punishers punish the non-signatories as long as n < n*. The contribution of each country to GHG reduction is determined by an asymmetric version of the contribution to GHG reduction subgame, and depends on whether the country is a signatory, and on the country's relative GHG emissions and expected climate damage (appendix).
To analyze the asymmetric model, we compute for each country a five-dimensional matrix (3 × 3 × 3 × 3 × 3) characterizing the expected utility of that country for each combination of the countries' initial strategies (D, R, or P for each of the five countries). Utility calculations are based on the initial strategy (D, R, or P) of each country, as detailed in the appendix. We then find the Nash equilibria of the game for various parameter values characterizing the efficiency and strength of the direct punishment.
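The enumeration described above can be sketched as follows. This is a minimal illustration rather than the code used for the paper: it assumes that only pure-strategy Nash equilibria are sought, and the function name `pure_nash_equilibria` and the `utility(profile, i)` callback are hypothetical stand-ins for the five-dimensional utility matrices.

```python
from itertools import product

TYPES = ("D", "R", "P")  # defector, renegotiator, punisher


def pure_nash_equilibria(utility, num_players=5):
    """Return all pure-strategy profiles from which no player gains by deviating.

    `utility(profile, i)` is a hypothetical callback returning the total
    (discounted) utility of player i under the tuple `profile`; it plays the
    role of the 3 x 3 x 3 x 3 x 3 matrix computed for each country.
    """
    equilibria = []
    for profile in product(TYPES, repeat=num_players):  # 3**5 = 243 profiles
        stable = True
        for i in range(num_players):
            current = utility(profile, i)
            for alt in TYPES:
                if alt == profile[i]:
                    continue
                deviated = profile[:i] + (alt,) + profile[i + 1:]
                # A strictly profitable unilateral deviation rules the profile out.
                if utility(deviated, i) > current + 1e-12:
                    stable = False
                    break
            if not stable:
                break
        if stable:
            equilibria.append(profile)
    return equilibria


# Toy payoff (an assumption for illustration only): a player earns 1 as R, else 0.
def toy_utility(profile, i):
    return 1.0 if profile[i] == "R" else 0.0


toy_equilibria = pure_nash_equilibria(toy_utility)
```

With the toy payoff, the only profile that survives is the one in which all five players choose R; with the model's actual utilities, the same loop would be repeated for each value of the punishment parameters.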

Results
Figure 2 (caption, partially recovered). [...] Only a few countries opt in to the agreement (Rs), while the majority of countries opt out (Ds). (b) In other parameter values (smaller C_P/C_D), however, the renegotiation-proof IEA is no longer an EE, and instead, an EE in which most countries participate in the climate agreement emerges. This agreement is still not optimal, as some countries still defect, but it is more desirable than the renegotiation-proof IEA: the average utility is 1778 in the renegotiation-proof IEA solution and 4417 in the EE. (c) For intermediate values of C_P/C_D, both the renegotiation-proof IEA and another, more desirable solution are EEs: the average utility is 1778 in the renegotiation-proof IEA solution and 4412 in the EE. This implies that each of these solutions can be selected, and which equilibrium is ultimately selected depends on how the countries coordinate. (d) It may also be the case that the renegotiation-proof IEA is not an EE, and two alternative EEs emerge. Note that each of the two EEs results in a more desirable agreement than the renegotiation-proof IEA: the average utility is 15.0 in the renegotiation-proof IEA solution and 55.2 and 39.1 in the EEs.

Without punishers (Ps), our model aligns with classic self-enforcing IEAs, where stable strategies match renegotiation-proof strategies from the literature [8,14]. In such a scenario, the equilibrium number of signatories (n_IEA) is set where no signatory benefits from leaving, nor does a non-signatory from joining. Absent Ps, renegotiators (Rs) always opt in, defectors (Ds) opt out, aiming for an expected equilibrium where the number of Rs and Ds equal n_IEA and N − n_IEA, respectively. This dynamic, showing Rs' likelihood equal to n_IEA/N, is illustrated in figure 2: the lower line, which connects D and R, demonstrates the evolutionary dynamics of the game between Ds and Rs in the absence of Ps. As suggested by previous studies, this solution implies that only a few countries participate and is far from the optimal outcome [8,9,14]. In turn, in the asymmetric model, if the cost of being punished is very small (punishment is ineffective), the coalition only includes a single country (table 1).
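The condition that defines n_IEA (no signatory benefits from leaving, and no non-signatory benefits from joining) can be checked numerically. The sketch below is illustrative only: the function name and the two payoff callbacks are hypothetical, and the toy payoffs are not the model's utilities.

```python
def stable_coalition_sizes(signatory_payoff, nonsignatory_payoff, N):
    """Return every coalition size n (0..N) that is both internally and
    externally stable.

    signatory_payoff(n) and nonsignatory_payoff(n) are hypothetical callbacks
    giving the payoff of one signatory / one non-signatory when the coalition
    has n members.
    """
    stable = []
    for n in range(N + 1):
        # Internal stability: a signatory does not gain by leaving (n -> n - 1).
        internal = n == 0 or signatory_payoff(n) >= nonsignatory_payoff(n - 1)
        # External stability: a non-signatory does not gain by joining (n -> n + 1).
        external = n == N or nonsignatory_payoff(n) >= signatory_payoff(n + 1)
        if internal and external:
            stable.append(n)
    return stable


# Toy payoffs (assumptions for illustration): free riders gain one unit per
# signatory, while each signatory earns a flat 2.5.
toy_sizes = stable_coalition_sizes(lambda n: 2.5, lambda n: float(n), N=10)
```

In this toy case only a coalition of three is stable, which mirrors the qualitative finding that few countries participate when free riding is attractive.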
However, the ability to directly punish via tariffs fundamentally changes the solution. In the symmetric model, the self-enforcing, renegotiation-proof IEA solution may become unstable (figures 2(b) and (d)), and the EE strategy may include a much greater number of signatories and result in significantly higher utility for each country (e.g. about 250% higher with parameter values used in figure 2). This EE is Pareto superior to the renegotiation-proof EE and is therefore more socially desirable. Depending on the parameter values, several outcomes [...] 2(c)), which implies that selecting the more desirable solution may necessitate coordination among the countries [34]. Also, in other parameter regions, additional EEs may exist (figure 2(d)). Our sensitivity analysis shows that these results are robust to changes in the number of countries (supplementary figure S1) and the discount factor (supplementary figure S2).

A Lampert

Table 1. Solutions of the asymmetric model for various values of the cost of being punished (APC or C_D). For each APC, there could be one or more Nash equilibria. In each Nash equilibrium, each country may adopt the role of either a defector (D), renegotiator (R), or punisher (P). After choosing the initial strategy, Ps and Rs become signatories and start reducing their emissions (green background), while each D should either become a signatory to avoid punishment (light blue background) or remain a non-signatory forever (white background). For parameter values, see appendix and supplementary table S1.
Table 1 column headers (table body not recovered): Country; Global benefits and costs (present trillion USD).
Next, consider the case in which the signatories do not reduce their GHG emissions until the number of signatories reaches the target, n* (Framework 2). Similar to Framework 1, if C_P is high, the renegotiation-proof IEA is also the only EE (figures 3(a) and (c)), whereas if C_P is below a certain threshold, an EE that is more socially desirable and encompasses more signatories emerges (figures 3(b) and (d)). Nevertheless, Framework 2 leads to a significantly less desirable outcome than Framework 1. First, there is a wide range of parameters where the more desirable EEs only emerge if Framework 1 is adopted (the more desirable EEs are exhibited in figures 2(c) and (d) but not in figures 3(c) and (d), although the parameters are the same in both figures). Second, even if the more desirable EE exists, it includes fewer signatories and results in a lower utility compared to the EE that emerges in Framework 1 with identical parameter values (e.g. the EE exhibited in figure 2(b) includes more signatories than the one exhibited in figure 3(b)).
In the asymmetric model, as in the symmetric model, our results show that direct punishment significantly improves the agreements and leads to agreements with more participants. In particular, if the cost of being punished increases, the number of signatories increases, and so does the global benefit from the reduction of GHG emissions (table 1). For some parameter values, multiple Nash equilibria exist, differing in the identities of the signatories. In some solutions, some countries are defectors that start as non-signatories but become signatories to avoid being punished (table 1: rows where some Ds have a light blue background and the cost of punishment column has a positive value). In other solutions, no punishment is imposed because n ⩾ n* from year zero, yet the threat of punishment results in a larger number of participants.

Discussion
Our study proposes a general framework for negotiating climate agreements (figure 1). Each country initially decides whether to become a signatory and reduce GHG emissions, and whether to punish non-signatories. The framework also specifies when the punishment stops, which in turn affects how non-participants join the agreement over time. We compared two frameworks: one in which the signatories start reducing their GHG emissions immediately, and another where reductions start only after the number of signatories reaches the target n*. We found that the first framework yields a significantly better outcome (compare figures 2 and 3), which implies that the negotiation framework is critical for the agreement's success. In particular, our study indicates that reaching successful climate agreements necessitates: (1) enforcing punitive measures, such as tariffs, on non-signatories and (2) allocating adequate time to form the agreement, during which the participating countries reduce GHG emissions concurrently with the enforcement of the tariffs.
In particular, our results imply that punishing the non-signatories via tariffs to incentivize them to opt in during the first years can significantly improve the agreement's outcome. Without punishment, only a few countries are expected to join the agreement; however, if punishment is allowed, more countries may join the agreement, and the resulting agreement is more desirable. This occurs in both the symmetric model (figures 2 and 3) and the asymmetric model (table 1). Note that previous studies have also considered dynamic processes in which countries join IEAs over time [20,23,25-28] or change their contribution over time [16,35-37], but the restriction to renegotiation-proof solutions has limited the capacity of these models to exhibit more desirable outcomes. Here we showed that certain solutions, even if they are not renegotiation-proof and involve cost to the punisher, may still be credible and self-enforcing: if the cost incurred by the punisher for imposing tariffs on non-signatories is sufficiently low, the benefit from incentivizing these non-signatories to join may outweigh this cost, making punishment credible. Also, note that previous studies have shown that costly punishment could be evolutionarily stable [29-31]. Our study extends this idea to cases in which the punishers could renegotiate in the sense that punishment stops at some point in time, after sufficiently many countries have joined the agreement. The ability to renegotiate and stop punishing indeed reduces the solution efficiency in the sense that the equilibrium solution is not the grand coalition, but it is still more desirable than the solution that emerges in the absence of punishment.
While both imposing punitive measures and allocating sufficient time to form the agreement are crucial for successful negotiations, neither has been fully implemented in real-world climate agreements like the Kyoto Protocol and the Paris Agreement. The Kyoto Protocol required ratification by at least 55 UNFCCC parties, representing at least 55% of industrialized nations' carbon emissions, akin to our proposed target GHG emission share (n*) for concluding negotiations. However, there are three key differences between our proposed negotiation format and the Kyoto Protocol: (1) n* is not predetermined in our approach; (2) we incorporate direct punishment, such as tariffs on non-signatories, until n* is reached; and (3) our model assumes immediate cooperation in reducing GHG emissions, contrasting with Kyoto's delayed action until the 55% target was met. We compared this case (Framework 1, figure 2) to a case that is more similar to Kyoto's, where countries do not cooperate until n* is reached (Framework 2, figure 3), demonstrating that Framework 2 leads to a significantly less efficient solution. Therefore, our results align with observations that the Kyoto Protocol fell short of its intended objectives.
The negotiation framework we proposed also differs from the Paris Agreement. First, the Paris Agreement sets relatively modest objectives for the first decades, with the majority of GHG emissions reductions planned for after 2030 [38]. These modest objectives might explain the Paris Agreement's swift six-month period to take effect and the near-unanimous ratification by countries, in contrast to the Kyoto Protocol's seven-year duration to come into force [39]. In comparison, our model does not set predetermined GHG reduction targets, and a higher number of participants does not impede significant reductions by each signatory. Therefore, in our model, if all countries are signatories, the optimal solution is approached, whereas in the Paris Agreement, this is far from the case. Furthermore, the Paris Agreement lacks an enforcement mechanism (no penalties for non-compliant countries), although it does offer subsidies to some non-industrialized countries through the Green Climate Fund [39].
Our study has several limitations and assumptions that could be addressed in future studies. First, we considered a specific renegotiation criterion (n ⩾ n*), which may not be optimal. Future studies are needed to examine alternative renegotiation criteria. Second, we restricted attention to punishment through tariffs (direct punishment) and delaying GHG emission reduction (e.g. Framework 2's indirect punishment). Future studies could explore more complex punishment mechanisms, including other forms of punishment and counter-punishment by the defectors. Finally, future studies may include more detailed climate dynamics and explicitly incorporate the pollution stock [20,23,25,26].
The data used to estimate the benefits and costs of climate mitigation and the share of GHG emissions for the countries in the asymmetric model are publicly available [40-42]. The parameter values used for the asymmetric model are summarized in supplementary table S1. Parameter values used to generate the figures are detailed in the appendix. Any additional information needed for regenerating the results is given in the appendix. The code is available at the open-source repositories Dryad and Zenodo (DOI: https://doi.org/10.5061/dryad.3n5tb2rrk).

A.1. The model
A.1.1. Overview
This section provides a detailed description of the model and methods for complete reproducibility, building on the summary provided in the Framework section and figure 1. We consider one model of a symmetric game with N identical players (countries), and another model of an asymmetric game in which countries differ in their benefits and costs. In the first stage, each player chooses to become one of the following three types: defector (D), renegotiator (R), or punisher (P). Players retain their chosen type for the rest of the game. All R and P players are signatories of some IEA. Each D player is initially a non-signatory, but each year it has the option to become a signatory, in which case it remains a signatory permanently. Consequently, more countries may become signatories over time. In the symmetric model, we denote n as the number of signatories, and in the asymmetric model, we denote n as the share of the signatories in global GHG emissions.
Each year, two subgames are played. First, the players play the 'Contribution to greenhouse gas reduction' subgame, in which their identities as signatories or non-signatories determine their utilities (see subsection A.1.2). Second, if n is below a certain threshold, n*, the P players punish the non-signatories (e.g. by imposing trade sanctions). Then, the non-signatories decide whether to remain non-signatories or to become signatories. The threshold n* is set at the initial stage and is lower when there are more renegotiators among the signatories.
Specifically, in the symmetric model, we consider n* that is given by the product of N and the proportion of P players among the P and R players:
$$n^* = N \, \frac{|P|}{|P| + |R|},$$
where |P| and |R| are the numbers of P and R players, respectively. In the asymmetric model, we use the same formula, but where |P| and |R| are the shares of the punishers and the renegotiators in global GHG emissions, respectively, and the number of countries is N = 5. In turn, the annual cost to each non-signatory due to being punished is C_D, and the annual cost of punishing a single non-signatory is C_P. (Specifically, if country 1 imposes trade sanctions on country 2, then country 2 has a loss of C_D and country 1 has a loss of C_P; we assume that the constant C_P is the same for all countries that punish and the constant C_D is the same for all countries that are being punished.) In turn, punishment is carried out equally by all the P players, and therefore, each P player incurs a cost of C_P times the number of non-signatories divided by the number of P players, i.e. C_P (N − n) / |P|. Note that the incentive of a D player to remain a non-signatory is that it gets a higher payoff in the 'Contribution to greenhouse gas reduction' subgame, whereas its incentive to become a signatory is to avoid punishment. Ultimately, the utility of each player is given by the sum of its annual utilities over the years (due to both punishment and the contribution to GHG reduction subgame), subject to a discount factor 0 ⩽ β < 1.
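The quantities defined in this paragraph for the symmetric model (the threshold n* as N times the proportion of P players among the P and R players, the per-punisher cost C_P (N − n) / |P|, and the discounted sum of annual utilities) can be written out as a short sketch. The function names are hypothetical, and returning 0 when there are no signatories is an assumption made here only to avoid division by zero.

```python
def punishment_threshold(N, num_P, num_R):
    """n* = N * |P| / (|P| + |R|) for the symmetric model."""
    if num_P + num_R == 0:
        return 0.0  # assumption: no signatories, so nobody can punish
    return N * num_P / (num_P + num_R)


def is_punishing(n, n_star):
    """Punishers sanction non-signatories only while n < n*."""
    return n < n_star


def annual_punisher_cost(C_P, N, n, num_P):
    """Cost borne by each P player: C_P * (N - n) / |P|."""
    return C_P * (N - n) / num_P


def discounted_utility(annual_utilities, beta):
    """Sum of annual utilities with year t weighted by beta**t, 0 <= beta < 1."""
    return sum(u * beta ** t for t, u in enumerate(annual_utilities))


# Example values (illustrative only): 10 countries, 3 punishers, 1 renegotiator.
n_star = punishment_threshold(10, 3, 1)
cost_per_punisher = annual_punisher_cost(2.0, 10, 4, 3)
```

With these example values the threshold is 7.5, so punishment continues while fewer than eight countries are signatories, which illustrates how a larger share of punishers raises the target.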

A.1.2. Contribution to GHG reduction subgame
In the 'Contribution to greenhouse gas reduction' subgame, there are two types of players: signatories and non-signatories. In our model, the players' identities as signatories or non-signatories are determined before the subgame begins. Accordingly, the description here specifies how the utilities of signatories and non-signatories are determined. We assume that the benefits and costs in the 'Contribution to greenhouse gas reduction' subgame are similar to those considered by Barrett [14]. Specifically, the cost to country $i$ due to its own contribution to GHG reduction is given by $C_i(q_i)$, where $q_i \geq 0$ is the contribution of country $i$, $c > 0$ is the marginal cost of the contribution when the contribution is small, and $c_i > 0$ determines the rate at which the marginal cost increases as the contribution increases. In turn, each country has a benefit due to the total contribution of all countries, given by $B_i(Q)$, where $Q$ is the aggregate contribution, $b_i > 0$ is the marginal benefit from the aggregate contribution, and $b > 0$ determines the rate at which the marginal benefit diminishes as the aggregate contribution increases.
In turn, the contributions to GHG reduction ($q_i$) are determined by the following two-stage game [14]. First, the signatories collectively decide how much each signatory contributes to maximize the utility of the signatories as a whole. (If the game is symmetric, all signatories contribute the same amount.) Second, the signatories choose the set of contributions that maximizes the sum over $B_i - C_i$ for all signatories [14]. Finally, if the game is asymmetric, we consider side payments among the signatories. Specifically, although many allocation methods have been proposed for side payments in the literature [6], we consider a simple allocation in which the portion of the total cost to the signatories (sum over $C_i$) that each signatory ultimately pays is proportional to its own benefit ($b_i$).
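The proportional side-payment rule can be sketched as follows. This is a minimal illustration, assuming that each signatory's own cost $C_i$ has already been computed; the function name and the numbers are hypothetical.

```python
def allocate_costs(own_costs, b):
    """Split the signatories' total cost in proportion to each signatory's b_i.

    own_costs: list of C_i, the cost each signatory incurs from its own contribution
    b:         list of b_i, the benefit parameter of each signatory
    Returns (shares, transfers): shares[i] is the cost signatory i ultimately pays,
    transfers[i] = shares[i] - own_costs[i] is its net side payment
    (positive means it pays others, negative means it is compensated).
    """
    total_cost = sum(own_costs)
    total_b = sum(b)
    shares = [total_cost * b_i / total_b for b_i in b]
    transfers = [s - c for s, c in zip(shares, own_costs)]
    return shares, transfers


# Illustrative values only
shares, transfers = allocate_costs(own_costs=[10.0, 30.0, 20.0], b=[1.0, 2.0, 3.0])
print(shares, transfers)
```

The side payments sum to zero by construction, so the rule only redistributes the total cost among the signatories.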

A.2. Numerical methods: how we calculated the equilibria

A.2.1. General flow of the algorithm
In both the symmetric and the asymmetric cases, the calculation of the equilibrium follows two steps:
• Step 1: the algorithm calculates each player's utility for every possible set of initial strategies (D, R, or P).
• Step 2: the algorithm finds the equilibria; namely, those sets of strategies where each player maximizes its own utility given the strategies of the other players.
To find the utilities of the players and their strategies in the subgames that follow the initial choice of D, R, or P (Step 1), the algorithm uses backward induction. First, the algorithm calculates the utilities of signatories and non-signatories in the 'Contribution to greenhouse gas reduction' subgame for any value of $n$ (subsection A.1.2). Then, the algorithm finds the equilibrium strategy of the D players dictating whether and when to become signatories.
Step 1 is described in detail for the symmetric case in subsection A.2.2 and for the asymmetric case in subsection A.2.3.
To find the equilibrium strategies (Step 2) for the symmetric model, we used evolutionary game theory, whereas for the asymmetric model, the algorithm calculates the Nash equilibria directly from the five-dimensional (3 × 3 × 3 × 3 × 3) matrices calculated in Step 1. Step 2 is described in detail for the symmetric case in subsection A.2.4 and for the asymmetric case in subsection A.2.5.

A.2.2. Symmetric model-step 1: solving a single N-players game
If $n < n^*$ after the initial stage, the D players need to decide whether and when to become signatories, until $n$ reaches $n^*$. Eventually, once $n \geq n^*$, the D players that remain non-signatories receive an annual payoff of $u_n(n)$ (from the contribution to greenhouse gas reduction subgame), and the D players that become signatories receive an annual payoff of $u_s(n)$. Considering a discount factor $\beta < 1$ and $n \geq n^*$, the utility gain of a D player that remains a non-signatory relative to one that became a signatory is given by

$$V(n) = \frac{u_n(n) - u_s(n)}{1 - \beta}.$$

At the same time, as long as $n < n^*$, in every year in which a D player stays a non-signatory, it incurs a cost of $C_D$. Therefore, this subgame is equivalent to a generalized war of attrition game [32] among the D players, with a cost of $C_D$ per unit time for staying a non-signatory and a reward of $V$ for remaining among the last $N - n^*$ non-signatories. Also, note that the players have complete information about the types of the other players, such that all players know how many players are still non-signatories in any given year.
For simplicity, we consider a continuous-time version of the war of attrition game. To find the Nash equilibrium, consider first the case in which $n = \bar{n}$, where $\bar{n}$ is the integer that satisfies $n^* - 1 \leq \bar{n} < n^*$. (Namely, if $n = \bar{n}$, then exactly one more signatory is needed so that the number of signatories reaches $n^*$.) Then, only one non-signatory needs to become a signatory before $n$ reaches $n^*$. In that case, the symmetric Nash equilibrium (also the EE) is given by a solution where, for each of the $k = N - \bar{n}$ non-signatories, the probability density that the player quits at time $t$ is given by

$$P(t) = \frac{C_D}{(k-1)V} \exp\left(-\frac{C_D}{(k-1)V}\, t\right),$$

where $V = V(n)$. This is because each player is indifferent between quitting and staying if and only if the probability that the game ends despite staying (one of the other $k - 1$ players quits) during a period $dt$ is $(C_D/V)\,dt$ [32]: the cost of staying, $C_D\,dt$, then equals the expected gain, $V \cdot (C_D/V)\,dt$. This implies that the probability of each of the other players to quit during the period $dt$ is given by

$$\frac{C_D}{(k-1)V}\,dt.$$

Therefore, assuming that $dt$ is infinitesimally small, the probability that any of the $k$ players quits during that period is given by $k$ times that amount. In turn, since the game is time-invariant [32], the probability density of quitting at time $t$, $P(t)$, is given by the above-mentioned exponential distribution, i.e. the waiting time of a Poisson process (which extends the well-known two-player case in which $k = 2$). Note that, following this solution, the expected utility is zero [32]. Therefore, it follows that, in Nash equilibrium, if initially $n < \bar{n}$, there are $\bar{n} - n$ non-signatories that will become signatories immediately when the game begins.
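The indifference condition behind the mixed equilibrium can be checked numerically. The sketch below is illustrative only: it assumes that each of the $k$ non-signatories quits at the constant rate $C_D / ((k-1)V)$, which follows from balancing the cost of staying, $C_D\,dt$, against the expected gain, and it ignores discounting during the waiting period. The function names and parameter values are hypothetical.

```python
import math


def quit_rate(V, C_D, k):
    """Per-player quit rate in the symmetric mixed equilibrium (assumed form)."""
    return C_D / ((k - 1) * V)


def payoff_of_waiting_until(T, V, C_D, k, steps=20000):
    """Expected payoff of a player who plans to quit at time T while each of
    the other k-1 players quits at rate quit_rate(V, C_D, k).
    The integral is evaluated with the midpoint rule."""
    mu = (k - 1) * quit_rate(V, C_D, k)  # rate at which some other player quits
    dt = T / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * dt
        # another player quits at time t: receive V, having paid C_D * t
        total += mu * math.exp(-mu * t) * (V - C_D * t) * dt
    # nobody else quit before T: the player quits, having paid C_D * T
    total += math.exp(-mu * T) * (-C_D * T)
    return total


for T in (0.5, 2.0, 10.0):
    print(T, payoff_of_waiting_until(T, V=5.0, C_D=2.0, k=4))
```

Under these assumptions the expected payoff is numerically zero for every planned quitting time, which is the indifference property and matches the zero expected utility noted above.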

A.2.3. Asymmetric model-step 1: solving a single subgame
For the asymmetric game, a different approach is required in order to decide which players will become signatories if $n < n^*$. Since there are multiple Nash equilibria to the asymmetric war of attrition game, none of which is symmetric, we consider a plausible scenario in which the non-signatory player with the largest benefit from GHG emission reduction (largest $b_i$) is the first to attempt to opt in and become a signatory. However, this player becomes a signatory only if its utility as a signatory is higher than its utility as a non-signatory that is being punished (the utility loss due to becoming a signatory is smaller than the cost of being punished). If the country does not become a signatory, or if it becomes a signatory but still $n < n^*$, the country with the next largest $b_i$ has the turn and decides whether to opt in. This process concludes either when $n \geq n^*$ or when no other country can benefit from opting in. If $n$ remains below $n^*$, the punishers punish the non-signatories forever.
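The opt-in procedure can be sketched as a simple loop. This is one possible reading of the description, not the authors' implementation: the callables `u_sig` and `u_non` are hypothetical placeholders for the utilities from the contribution subgame, the comparison uses annual utilities for simplicity, $n$ is taken to be the number of signatories, and the loop repeats passes over the remaining non-signatories until nothing changes.

```python
def sequential_opt_in(b, signatories, n_star, u_sig, u_non, C_D):
    """Let non-signatories opt in, in decreasing order of b_i, while n < n*.

    b:           dict mapping country -> b_i
    signatories: iterable of the initial signatories
    u_sig(i, S): utility of country i as a signatory when the signatory set is S
    u_non(i, S): utility of country i as a non-signatory when the signatory set is S
    C_D:         cost of being punished
    """
    signatories = set(signatories)
    changed = True
    while len(signatories) < n_star and changed:
        changed = False
        candidates = sorted((i for i in b if i not in signatories),
                            key=lambda i: b[i], reverse=True)
        for i in candidates:
            if len(signatories) >= n_star:
                break
            # opt in only if signing beats staying out and being punished
            if u_sig(i, signatories | {i}) > u_non(i, signatories) - C_D:
                signatories.add(i)
                changed = True
    return signatories


# Toy utilities (illustrative only)
b = {"A": 3.0, "B": 2.0, "C": 1.0}
u_sig = lambda i, S: b[i] * len(S) - 2.0
u_non = lambda i, S: b[i] * len(S)
print(sequential_opt_in(b, set(), 2, u_sig, u_non, C_D=1.5))
```

If the loop ends with fewer than $n^*$ signatories, the remaining non-signatories are the ones that keep being punished.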

A.2.4. Symmetric model-step 2: finding the EE
In previous subsections, we described how we calculate the utility of each type for some given initial configuration, where a configuration is given by $|D|$, $|R|$, and $|P|$, the numbers of Ds, Rs, and Ps among the $N$ players, respectively. To determine how rational players would choose between D, R, and P at the initial stage, the next step is to find the EE, which would also be the subgame-perfect Nash equilibrium [34]. Specifically, the EE strategies of choosing among D, R, and P are determined via an evolutionary game-theoretical analysis of an $N$-player game. We emphasize, however, that we do not assume that the countries actually participate in such an evolutionary process; rather, like numerous previous studies [26,28], we use the evolutionary dynamics as a technical method for identifying the Nash equilibria.
We denote a strategy of a player as $P_D$, $P_R$, and $P_P$, the probabilities to choose the type D, R, and P, respectively. (The strategy of each player during the rest of the game is determined by its own type and the types of the other players.) (Note that, although the type of each player, D, R, or P, is determined in the initial stage, the probability of choosing each type may change during the evolutionary analysis, until it converges to the EE [31].) The first step for finding the EE is to calculate the utility of each type for any configuration of the game. Next, we calculate $F_D(P_D, P_R, P_P)$, $F_R(P_D, P_R, P_P)$, and $F_P(P_D, P_R, P_P)$, the expected utilities (or fitness) of the D, R, and P players, respectively, given as the weighted average over all possible configurations.
Specifically, the weights are given by the probability that a particular configuration occurs. From the point of view of a given player, there must be at least one player of its own type (self), and the probability for a given configuration of the other $N - 1$ individuals is determined by $P_D$, $P_R$, and $P_P$ and is given by the trinomial distribution

$$\Pr(d, r, p) = \frac{(N-1)!}{d!\, r!\, p!} P_D^{d} P_R^{r} P_P^{p} = B(d \mid N-1, P_D)\; B\!\left(r \,\middle|\, N-1-d, \frac{P_R}{P_R + P_P}\right),$$

where $B(m \mid N, q)$ is the probability that an event with probability $q$ per lottery occurs exactly $m$ times out of $N$ lotteries (binomial distribution), and $d$, $r$, and $p = N - 1 - d - r$ are the numbers of D, R, and P players among the $N - 1$ other (non-self) players. The configuration is then determined by $d$, $r$, $p$, and the identity of the player (e.g. for a D player, $|D| = d + 1$, $|R| = r$, and $|P| = p$).
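It follows that the fitness of each type $X \in \{D, R, P\}$ is given by

$$F_X(P_D, P_R, P_P) = \sum_{d + r + p = N - 1} \Pr(d, r, p)\, U_X(d, r, p),$$

where $\Pr(d, r, p)$ is the trinomial probability that the other $N - 1$ players comprise $d$ D players, $r$ R players, and $p$ P players, and $U_X(d, r, p)$ denotes the utility of a type-$X$ player in the resulting configuration (Step 1). Next, note that $P_D$, $P_R$, and $P_P$ vary continuously within the region where $P_D \geq 0$, $P_R \geq 0$, $P_P \geq 0$, and $P_D + P_R + P_P = 1$. Accordingly, we calculate the fitness for all values of $P_D$, $P_R$, and $P_P$ within that region, up to some fine resolution.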
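The weighted average over configurations can be sketched in a few lines of Python. The `utility` callable is a hypothetical stand-in for the Step 1 calculation, and the trinomial weights are written out directly; this is an illustration of the averaging, not the authors' code.

```python
from math import comb


def config_prob(d, r, p, P_D, P_R, P_P):
    """Trinomial probability that the N-1 = d+r+p other players split into d, r, p."""
    others = d + r + p
    return comb(others, d) * comb(others - d, r) * P_D**d * P_R**r * P_P**p


def fitness(own_type, N, P_D, P_R, P_P, utility):
    """Expected utility of a player of own_type ('D', 'R' or 'P').

    utility(own_type, n_D, n_R, n_P) returns the player's utility in the
    configuration with n_D, n_R, n_P players of each type (self included).
    """
    total = 0.0
    for d in range(N):
        for r in range(N - d):
            p = N - 1 - d - r
            counts = {"D": d, "R": r, "P": p}
            counts[own_type] += 1  # add the focal player itself
            total += config_prob(d, r, p, P_D, P_R, P_P) * utility(
                own_type, counts["D"], counts["R"], counts["P"])
    return total


# Toy utility: the number of D players in the configuration (illustrative only)
count_D = lambda own, n_D, n_R, n_P: n_D
print(fitness("D", 5, 0.2, 0.3, 0.5, count_D))
print(fitness("R", 5, 0.2, 0.3, 0.5, count_D))
```

With the toy utility, the result equals the expected number of D players seen by the focal player, which gives a quick sanity check of the weights.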
Finally, to find the EEs, we consider the continuous-time replicator equations and find their stable steady states. Note that the EEs do not depend on the particular choice of the dynamic equations, and they are uniquely determined by $F_D$, $F_R$, and $F_P$. Nevertheless, using the replicator equations is a convenient method for finding the EEs. Specifically, the equations are given by:

$$\frac{dP_X}{dt} = s\, P_X \left(F_X - \phi\right), \quad X \in \{D, R, P\},$$

where $s$ is a constant characterizing the speed of selection and $\phi$ is the average fitness in the whole population, given by $\phi = P_D F_D + P_R F_R + P_P F_P$.
The steady states and the dynamic trajectories of this equation are demonstrated in figure 2. Every EE is also guaranteed to be a Nash equilibrium as well as a subgame-perfect Nash equilibrium of the game [34].
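A trajectory of the replicator dynamics can be traced with a simple forward-Euler scheme. The sketch assumes the standard replicator form $dP_X/dt = s P_X (F_X - \phi)$; `fitness_fn` is a hypothetical callable returning $F_D$, $F_R$, and $F_P$ for the current probabilities, and the constant fitness values in the example are a toy case only.

```python
def replicator_step(P, F, s=1.0, dt=0.01):
    """One forward-Euler step of dP_X/dt = s * P_X * (F_X - phi)."""
    phi = sum(P[x] * F[x] for x in P)  # average fitness
    new = {x: P[x] + dt * s * P[x] * (F[x] - phi) for x in P}
    total = sum(new.values())          # renormalize against numerical drift
    return {x: v / total for x, v in new.items()}


def integrate(P0, fitness_fn, steps=20000, s=1.0, dt=0.01):
    """Follow a trajectory from P0; fitness_fn(P) returns a dict of F_X values."""
    P = dict(P0)
    for _ in range(steps):
        P = replicator_step(P, fitness_fn(P), s=s, dt=dt)
    return P


# Toy case: constant fitness values, so the fittest type (P) takes over
start = {"D": 1 / 3, "R": 1 / 3, "P": 1 / 3}
end = integrate(start, lambda P: {"D": 1.0, "R": 2.0, "P": 3.0})
print(end)
```

Starting the integration from a grid of initial points on the simplex and collecting the end points is one way to locate the stable steady states.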

A.2.5. Asymmetric model-step 2: finding the subgame-perfect Nash equilibria
For each player, we created a five-dimensional matrix (3 × 3 × 3 × 3 × 3) representing its utility for each set of strategies (its own strategy and the strategy of the other four countries, where each strategy can be either D, R, or P).Every cell in these matrices was generated by solving the outcome of the underlying dynamics resulting from the initial choice of the strategies (Step 1; subsection A.2.3).Upon generating these matrices, we found the Nash equilibria by examining all possible sets of strategies and identifying those where no player could benefit from unilaterally changing its strategy.Note that, in some cases, depending on the choice of parameters, multiple Nash equilibria exist.
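The exhaustive search over strategy sets can be sketched as follows. The `utility` callable is a hypothetical stand-in for a lookup in the five-dimensional utility matrices, and the toy payoffs in the example are illustrative only.

```python
from itertools import product

STRATEGIES = ("D", "R", "P")


def pure_nash_equilibria(utility, n_players=5):
    """Return all strategy profiles from which no player gains by deviating alone.

    utility(i, profile) returns the utility of player i under the given
    profile (a tuple with one strategy per player).
    """
    equilibria = []
    for profile in product(STRATEGIES, repeat=n_players):
        stable = True
        for i in range(n_players):
            current = utility(i, profile)
            for alt in STRATEGIES:
                if alt == profile[i]:
                    continue
                deviated = profile[:i] + (alt,) + profile[i + 1:]
                if utility(i, deviated) > current:  # strictly profitable deviation
                    stable = False
                    break
            if not stable:
                break
        if stable:
            equilibria.append(profile)
    return equilibria


# Toy payoffs: each player is rewarded for matching the other players' strategies
matching = lambda i, prof: sum(1 for j, s in enumerate(prof) if j != i and s == prof[i])
print(pure_nash_equilibria(matching))
```

With five players and three strategies, only 243 profiles need to be checked, so brute force is sufficient; the toy example also shows that several equilibria can coexist.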

A.3. Estimation of parameter values for the asymmetric model
We estimated the benefits and costs for the five countries or unions with the largest shares of GHG emissions worldwide: China, the United States, India, the European Union, and Russia. The shares of these countries in worldwide GHG emissions are obtained from the Emissions Database for Global Atmospheric Research (2023 report) [41] and are summarized in supplementary table S1. Based on these relative shares, we estimated the cost values $c_i$ for each country. To achieve correct scaling in our model, we set the $c_i$ values inversely proportional to the countries' relative shares. (To understand this scaling, notice that $m$ identical countries contributing $q$ each must have the same total cost as a union of the $m$ countries with a contribution of $mq$, and this is obtained only if the $c_i$ of the union equals the $c_i$ of the countries divided by $m$.) In turn, supplementary table S1 summarizes the benefit of each country due to maintaining the temperature at a 1.5 °C increase instead of allowing a 3.2 °C increase. Benefits as a percentage of each country's GDP are taken from [40], and to obtain the benefit in USD, we multiplied this relative benefit by the 2023 GDP of the country [42] (supplementary table S1). (For the European Union, we used Germany as the representative country for estimating the benefits relative to GDP, and we multiplied it by the total GDP of the European Union.) To parameterize our model accordingly, we assume that the maximum benefit that a country could gain from global GHG reduction is proportional to the GDP gain of that country due to maintaining the temperature at a 1.5 °C increase instead of letting it increase by 3.2 °C.
Specifically, $b_i$ is proportional to the maximum benefit for country $i$, and $b$ is determined such that the maximum benefit for all countries is obtained when $Q$ has its optimal level (achieved only if all countries are signatories). Finally, note that the cost of punishment to the non-signatory, $C_D$, and to the punisher, $C_P$, could vary across a wide range. Accordingly, we examine how the results depend on these parameters across a wide variety of values. For the results in table 1, we examine various values of $C_D$ (APC), while keeping $C_P$ constant at 2 trillion USD.
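The scaling argument for $c_i$ can be made explicit under an assumed cost form. The derivation below takes a quadratic cost, $C_i(q) = c\,q + c_i q^2$, which is an illustrative assumption consistent with the verbal description of $c$ and $c_i$ and not necessarily the exact function used in the model; any constant factor on the quadratic term leaves the conclusion unchanged. The symbol $c_U$ for the union's parameter is introduced here for illustration.

```latex
% Assumed cost form (illustrative): C_i(q) = c q + c_i q^2
% Total cost of m identical countries, each contributing q:
\[
  m \, C_i(q) = m c q + m c_i q^2 .
\]
% Cost of a union of the m countries contributing mq, with parameter c_U:
\[
  C_U(mq) = c \, (mq) + c_U \, (mq)^2 = m c q + c_U m^2 q^2 .
\]
% Requiring equal costs for every q gives
\[
  c_U m^2 = m c_i \quad \Longrightarrow \quad c_U = \frac{c_i}{m} .
\]
```

Since a union of $m$ identical countries has $m$ times the emission share of each member, this agrees with setting $c_i$ inversely proportional to the share.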

Figure 1. Flow diagram showing the various stages of the negotiation process. The framework is similar for both the symmetric and the asymmetric cases.

Figure 2. The evolutionary equilibria (EE) of the climate negotiation game may be more desirable than the renegotiation-proof, self-enforcing IEA solution. Demonstrated are the evolutionary dynamics of the climate negotiation game for four different sets of parameter values (Methods). The location on the phase plane represents a strategy adopted by each of $N = 17$ countries: locations closer to D imply more defectors, those closer to R imply more renegotiators, and those closer to P imply more punishers. Ultimately, all Rs and Ps participate in the climate agreement, and some Ds might participate as well if Ps are present (Methods). Each line represents a dynamic trajectory, leading to a dot that represents either an EE (black dot) or an unstable saddle-node (white dot). Note that an EE is also a Nash equilibrium of the climate negotiation game. Each panel demonstrates the solution for a different set of parameter values (Methods). (a) For certain parameter values (the cost to the punisher is large, i.e. large $C_P/C_D$), the renegotiation-proof IEA solution is also the unique EE. Specifically, all renegotiation-proof strategies are restricted to the line connecting D and R, on which the dynamic trajectories lead to an EE that characterizes the renegotiation-proof IEA: only a few countries opt in to the agreement (Rs), while the majority of countries opt out (Ds). (b) For other parameter values (smaller $C_P/C_D$), however, the renegotiation-proof IEA is no longer an EE, and instead, an EE in which most countries participate in the climate agreement emerges. This agreement is still not optimal, as some countries still defect, but it is more desirable than the renegotiation-proof IEA: the average utility is 1778 in the renegotiation-proof IEA solution and 4417 in the EE. (c) For intermediate values of $C_P/C_D$, both the renegotiation-proof IEA and another, more desirable solution are EEs: the average utility is 1778 in the renegotiation-proof IEA solution and 4412 in the EE. This implies that each of these solutions can be selected, and which equilibrium is ultimately selected depends on how the countries coordinate. (d) It may also be the case that the renegotiation-proof IEA is not an EE, and two alternative EEs emerge. Note that each of the two EEs results in a more desirable agreement than the renegotiation-proof IEA: the average utility is 15.0 in the renegotiation-proof IEA solution and 55.2 and 39.1 in the EEs.

Figure 3. The evolutionary equilibria (EE) of the climate negotiation game may exhibit a more efficient agreement than the IEA solution, but the efficiency depends on the negotiation framework. Demonstrated are the evolutionary dynamics of the climate negotiation game for the same four sets of parameter values as those used in figure 2. However, here we considered Framework 2, in which the countries do not reduce their GHG emissions as long as $n < n^*$ (whereas in figure 2, we considered Framework 1, in which the signatories reduce their GHG emissions already in year 1, concurrently with the punishment). (a) The IEA solution is also the only EE (similar to figure 2(a)). (b) The IEA is unstable and the EE is a more desirable solution in which several countries participate. However, the EE solution here is less desirable and includes fewer participants compared to the one demonstrated in figure 2(b). (c) As in panel (a), the IEA solution is also the only EE. This differs from the solution demonstrated in figure 2(c), in which a more desirable EE coexisted with the IEA solution. (d) One EE solution exists, whereas in figure 2(d), two EEs exist, including a more efficient one.

A.4. Parameter values used in the simulations-symmetric model
Figure 2.