Punishment in Public Goods games leads to meta-stable phase transitions and hysteresis

The evolution of cooperation has been a perennial problem in evolutionary biology because cooperation can be undermined by selfish cheaters who gain an advantage in the short run, while compromising the long-term viability of the population. Evolutionary game theory has shown that under certain conditions, cooperation nonetheless evolves stably, for example if players have the opportunity to punish cheaters that benefit from a public good yet refuse to pay into the common pool. However, punishment has remained enigmatic because it is costly and difficult to maintain. On the other hand, cooperation emerges naturally in the Public Goods game if the synergy of the public good (the factor multiplying the public good investment) is sufficiently high. In terms of this synergy parameter, the transition from defection to cooperation can be viewed as a phase transition with the synergy as the critical parameter. We show here that punishment reduces the critical value at which cooperation occurs, but also creates the possibility of meta-stable phase transitions, where populations can "tunnel" into the cooperating phase below the critical value. At the same time, cooperating populations are unstable even above the critical value, because a group of defectors that is large enough can "nucleate" such a transition. We study the mean-field theoretical predictions via agent-based simulations of finite populations using an evolutionary approach where the decisions to cooperate or to punish are encoded genetically in terms of evolvable probabilities. We recover the theoretical predictions and demonstrate that the population shows hysteresis, as expected in systems that exhibit super-heating and super-cooling. We conclude that punishment can stabilize populations of cooperators below the critical point, but it is a two-edged sword: it can also stabilize defectors above the critical point.


Introduction
When individuals maximize their self-interest by exploiting a public good, they often do so at the expense of their own (and others') long-term interest, creating a social dilemma termed the "tragedy of the commons" [1]. The tragedy of the commons is often discussed in environmental politics (for example, overgrazing and overfishing), as well as social science and politics (for example, vandalism and taxation) [1]. However, the tragedy of the commons also plays an important role in evolutionary biology [2]: rate-yield tradeoffs in bacterial metabolism [3], the evolution of virulence [4] and the manipulation of a host by a group of parasites [5] can be viewed as a social dilemma involving a public good. Social dilemmas [6] (such as the tragedy of the commons) can be studied within the framework of Evolutionary Game Theory (EGT) [7][8][9][10][11][12], which describes populations of agents engaging in pairwise (or group-wise) interactions, with defined payoffs for different strategies. The tragedy of the commons is usually described by a particular game form known as the "Public Goods" game.
The Public Goods game is a standard within the field of experimental economics [13][14][15]. In this game, players possess tokens that they can invest into a common pool (the public good). The total sum contributed by the players is then multiplied by a "synergy factor" (creating a positive yield). This amount (typically larger than the invested sum) is then equally distributed to the players in the pool, irrespective of whether they invested or not. A group of players maximizes their investment if all the players contribute (so as to take maximum advantage of the synergy). However, this behavior is vulnerable to "free-riders": individuals that share in the pool but do not invest themselves. It can easily be shown that the rational Nash equilibrium for this game is not to pay in, because this strategy clearly dominates all others regardless of their play [1].
Hardin originally suggested that the tragedy of the commons can only be avoided by punishing free riders [1]. Indeed, it has been shown that punishment can counteract defectors effectively [16][17][18][19][20][21][22][23][24][25][26], but punishment is difficult to maintain because it is costly and may reduce the mean payoffs of group members compared to groups in which punishment of free-riders is not possible [27]. It was previously thought that punishment cannot be maintained in well-mixed populations [28], so most of the literature has focused on the spatial version of the game, where analytical results are difficult (but not impossible) to obtain [29]. Here we study the well-mixed version, establish a number of theoretical results that suggest complex dynamics at the interface between the cooperating and defecting phases (as a function of the synergy factor), and clarify the role of punishment as a catalyst of cooperation in extensive agent-based simulations that agree with the theoretical results.
There are variations of the Public Goods game in which punishment can be maintained in other ways, for example by voluntary punishment [29][30][31][32][33] or by using pool (that is, institutionalized) punishment as opposed to peer punishment [33][34][35], but we do not study those here.

Mean field theory of Public Goods games
The Public Goods game emulates strategic decision making by groups, in which an individual must select between different decisions that affect the group as a whole. Each individual in a group of $k + 1$ players (the focal player and her $k$ co-players) can decide to cooperate by making a contribution of 1 unit to the public good, while defecting individuals do not contribute.
The sum of all contributions from cooperating players is multiplied by $r$ (the synergy factor) and divided among all players. If $N_C$ is the number of cooperators within the group (but not counting the focal player, i.e., $N_C \le k$) and $N_D$ is the number of defectors, then the cooperator obtains a payoff

$$P_C = \frac{r\,(N_C + 1)}{k + 1} - 1$$

compared to the defector's

$$P_D = \frac{r\,N_C}{k + 1}.$$

A dilemma exists if it is advantageous for the individual to defect, while mutual cooperation would be best for all. Clearly a defector does better if $P_D - P_C = 1 - r/(k+1) > 0$, so a dilemma exists only if $r < k + 1$. At the same time, the payoff for a cooperator playing within a group of cooperators should be larger than the payoff for a defector playing only with defectors, that is, $P_C(N_C = k) - P_D(N_C = 0) = r - 1 > 0$, which implies $r > 1$. Thus, a dilemma exists only for $1 < r < k + 1$ (see Fig. 1). Standard evolutionary game theory arguments imply that defection is the rational (and optimal) strategy for $r < k + 1$, while cooperation is selected for when $r > k + 1$. The synergy parameter $r$ can thus be viewed as a critical parameter, dialing a phase transition from defection to cooperation as $r$ is increased through $k + 1$. Note that the Public Goods game turns into the standard Prisoner's dilemma for $k = 1$, with a dilemma for $1 < r < 2$.
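The dilemma window can be checked numerically. The following is a minimal sketch, assuming the standard linear Public Goods payoffs $P_C = r(N_C+1)/(k+1) - 1$ and $P_D = rN_C/(k+1)$; the function names are illustrative and not taken from the simulation code used in this work.

```python
def payoff_cooperator(n_coop_others: int, k: int, r: float) -> float:
    """Payoff of a focal cooperator with n_coop_others cooperators among its k co-players."""
    return r * (n_coop_others + 1) / (k + 1) - 1.0


def payoff_defector(n_coop_others: int, k: int, r: float) -> float:
    """Payoff of a focal defector with n_coop_others cooperators among its k co-players."""
    return r * n_coop_others / (k + 1)


def is_dilemma(k: int, r: float) -> bool:
    """True if defecting pays individually while mutual cooperation beats mutual defection."""
    # Defection must pay for every possible composition of the other k players.
    defection_pays = all(
        payoff_defector(n, k, r) > payoff_cooperator(n, k, r) for n in range(k + 1)
    )
    # A group of pure cooperators must still outperform a group of pure defectors.
    cooperation_is_better = payoff_cooperator(k, k, r) > payoff_defector(0, k, r)
    return defection_pays and cooperation_is_better
```

For $k = 4$, `is_dilemma(4, 3.0)` holds, while values such as $r = 0.5$ or $r = 6$ fall outside the window $1 < r < k + 1$.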
Figure 1. The phase diagram of the Public Goods game with a synergy factor $r$. Below $r = 1$ defection is the strategy with the highest payoff and therefore favored by evolution. Conversely, above $r = k + 1$ cooperation is evolutionarily favored. A dilemma exists in the grey-shaded area between $r = 1$ and $r = k + 1$, where cooperation would be beneficial for a cooperating group as a whole, but defection is the Nash equilibrium point and thus evolutionarily favored.

How can this dilemma be solved? How can evolution achieve cooperation in the grey-shaded area in Fig. 1? The answer is: this is impossible unless additional mechanisms move the critical point below $r_c = k + 1$. One such mechanism investigated in the literature is giving players the option to punish players who do not contribute. Following the notation of Helbing et al. [26], defecting players suffer a fine $\beta/k$ levied by each punisher in the group, which costs each punisher a penalty $\gamma/k$. Let $N_M$ be the number of players that cooperate as well as punish (the "moralists") and $N_I$ the number of defectors that punish ("immoralists"). As before, $N_C$ and $N_D$ are the numbers of players that cooperate or defect, respectively, but do not punish. The payoffs for the four possible strategies then become

$$P_C = \frac{r\,(N_C + N_M + 1)}{k + 1} - 1,$$

$$P_M = P_C - \gamma\,\frac{N_D + N_I}{k},$$

$$P_D = \frac{r\,(N_C + N_M)}{k + 1} - \beta\,\frac{N_M + N_I}{k},$$

$$P_I = P_D - \gamma\,\frac{N_D + N_I}{k}.$$

Let us calculate the critical point for the game with punishment, assuming a well-mixed population so that each player encounters on average the same fraction of strategies.
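As an illustration of the four-strategy payoffs, the sketch below evaluates them for a single group. It assumes the payoff form of Helbing et al. [26] adapted to the convention used here (the focal player is not counted in $N_C$, $N_M$, $N_D$, $N_I$); the function name and the dictionary layout are hypothetical.

```python
def payoffs_with_punishment(n_c, n_m, n_d, n_i, r, beta, gamma):
    """Payoffs of a focal C, M, D or I player facing the given k co-players.

    n_c, n_m, n_d, n_i count cooperators, moralists, defectors and immoralists
    among the co-players (assumed convention: the focal player is excluded).
    """
    k = n_c + n_m + n_d + n_i
    contributors = n_c + n_m   # co-players who pay into the pool
    punishers = n_m + n_i      # co-players who fine defectors
    defectors = n_d + n_i      # co-players who can be fined
    p_c = r * (contributors + 1) / (k + 1) - 1.0
    p_d = r * contributors / (k + 1) - beta * punishers / k
    punishing_cost = gamma * defectors / k
    return {
        "C": p_c,
        "M": p_c - punishing_cost,
        "D": p_d,
        "I": p_d - punishing_cost,
    }
```

With one co-player of each type ($k = 4$), $r = 3$, $\beta = 0.8$ and $\gamma = 0.2$, cooperators and defectors earn the same payoff, which is the condition $r = (k+1)(1 - \beta\rho_P)$ at $\rho_P = 0.5$.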
Introducing the mean density of cooperators $\rho_C = (N_C + N_M)/k$ and the mean density of punishers $\rho_P = (N_M + N_I)/k$, along with $\rho_D = (N_D + N_I)/k$, we can write the average payoffs for each of the four strategies as

$$P_C = \frac{r\,(k\rho_C + 1)}{k + 1} - 1, \tag{9}$$

$$P_M = P_C - \gamma\rho_D, \tag{10}$$

$$P_D = \frac{r\,k\rho_C}{k + 1} - \beta\rho_P, \tag{11}$$

$$P_I = P_D - \gamma\rho_D. \tag{12}$$

Investigating $P_C - P_D = r/(k+1) - 1 + \beta\rho_P$ again, we notice that the area where the dilemma exists is now shifted by $\beta\rho_P$ (see also Fig. 2):

$$1 - \beta\rho_P < r < (k + 1)(1 - \beta\rho_P),$$

where the right boundary corresponds to the critical point $r_c = (k+1)(1 - \beta\rho_P)$ separating a cooperating and a defecting phase. Because we will be concerned with this critical point from now on, let us introduce the rescaled synergy parameter $\xi = r/(k + 1)$. The critical point is then $\xi_c = 1 - \beta\rho_P$. According to standard population genetics, a single cooperating individual cannot invade a population of defectors unless its fitness advantage $P_C - P_D$ is positive, which implies $\xi > 1 - \beta\rho_P$. However, if the entire population consists of defectors, punishment is expected to be absent because defectors do not punish each other (i.e., we assume here that immoralists do not matter in the long run, as was found in numerical simulations [26]).

Figure 2. The phase diagram of the Public Goods game with a synergy factor $r$ and punishment. Compared to Fig. 1, the area where a dilemma occurs is shifted towards lower $r$, implying that cooperation can occur for smaller $r$ (right boundary of dilemma area). A single defector cannot invade the population to the right of the critical point, and a single cooperator cannot invade to the left of $r_c$.

What happens if a group of cooperators (rather than a single individual) tries to invade the defectors (or a group of defectors tries to invade the cooperators)? Because the fitness of any group is frequency-dependent, we have to recalculate the mean fitness of a group as follows. Assume a population of strategies given by the mean densities $(\rho_C, \rho_P)$. Let us also assume that, in general, cooperators punish with a probability $\pi_C$, while defectors punish with a probability $\pi_D$. Then, $\rho_P = \pi_C\rho_C + \pi_D\rho_D$. The mean fitness of a group of cooperators is then given by

$$w_C = (1 - \pi_C)\,P_C + \pi_C\,P_M,$$

and that of a group of defectors by

$$w_D = (1 - \pi_D)\,P_D + \pi_D\,P_I,$$

using the payoffs (9)-(12), and the fitness advantage of the cooperating type with respect to the defectors is

$$w_C - w_D = \xi - 1 + \beta\,(\pi_C\rho_C + \pi_D\rho_D) - \gamma\rho_D\,(\pi_C - \pi_D).$$

We will see in the numerical results below that immoralists go extinct quickly (because they bear the double cost of meting out and receiving punishment). As a consequence, we set $\pi_D = 0$ (defectors don't punish), and write the cooperator's probability to punish as $\pi_C \equiv \pi$, so that

$$w_C - w_D = \xi - 1 + \pi\,(\beta\rho_C - \gamma\rho_D). \tag{16}$$

Eq. (16) implies that punishment enables a "premature" phase transition to cooperation as long as a "nucleus" of cooperators of sufficient density $\rho_C^{\mathrm{in}}$ exists, just as in supercritical phase transitions (see Fig. 3). Thus, a "fluctuation" of pure moralists ($\rho_C^{\mathrm{in}} = 1$, $\pi = 1$) is stable down to $\xi = 1 - \beta$, which can be significantly smaller than 1 if the effect of punishment is large. However, the opposite dynamics occur for groups of defectors: they can invade stable cooperators at $\xi > 1$ as long as the density of invading defectors $\rho_D^{\mathrm{in}}$ is large enough. As outlined in Fig. 3, a "fluctuation" into all defectors ($\rho_D^{\mathrm{in}} = 1$) is stable up to $\xi = 1 + \pi\gamma$, which can be substantially larger than 1 when the defectors displace perfect moralists ($\pi = 1$). Thus, punishment enables both cooperation and defection in the "supercritical" phase, away from the critical point $\xi = 1$. This supercritical behavior is due to alternative meta-stable states, and can result in hysteresis: a population that starts in the defecting phase will stay in the defecting phase past the critical point $\xi = 1$ as $\xi$ is raised adiabatically from low values, and a population that starts in the cooperating phase will remain in the cooperating phase past the critical point as $\xi$ is lowered adiabatically from high values. We will verify this behavior in the numerical simulations that follow.
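The nucleation condition can be made concrete with a short calculation. The sketch below assumes the immoralist-free fitness advantage $w_C - w_D = \xi - 1 + \pi(\beta\rho_C - \gamma\rho_D)$ with $\rho_D = 1 - \rho_C$; the function names are illustrative only.

```python
def fitness_advantage(xi, rho_c, pi, beta, gamma):
    """w_C - w_D for invading cooperator density rho_c (defectors do not punish)."""
    return xi - 1.0 + pi * (beta * rho_c - gamma * (1.0 - rho_c))


def critical_xi(rho_c, pi, beta, gamma):
    """Rescaled synergy at which cooperators at density rho_c break even."""
    return 1.0 - pi * (beta * rho_c - gamma * (1.0 - rho_c))


def critical_density(xi, pi, beta, gamma):
    """Cooperator density above which cooperators are favoured at synergy xi.

    Returns None when punishment has no effect (pi * (beta + gamma) == 0),
    because the advantage is then independent of density. Values above 1 mean
    that no nucleus is large enough; values below 0 mean any nucleus suffices.
    """
    slope = pi * (beta + gamma)
    if slope == 0:
        return None
    return (1.0 - xi + pi * gamma) / slope
```

For $\beta = 0.8$, $\gamma = 0.2$ and $\pi = 1$, pure moralists break even at $\xi = 0.2$ and pure defectors hold out up to $\xi = 1.2$, while at $\xi = 1$ a nucleus of about 20% cooperators is needed.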

Evolutionary simulation of Public Goods games
Figure 3. Invasion probabilities for a fixed density of cooperators $\rho_C^{\mathrm{in}}$, as a function of the critical parameter $\xi$. Cooperation is stable for $\xi < 1$ as long as the initial density of cooperators is large enough: by Eq. (16), the transition for a given $\rho_C^{\mathrm{in}}$ occurs at $\xi = 1 - \pi\,[\beta\rho_C^{\mathrm{in}} - \gamma(1 - \rho_C^{\mathrm{in}})]$, which may be larger or smaller than 1. For $\rho_C^{\mathrm{in}} = 0$, the critical point is actually to the right of the critical point in the absence of punishment, that is, punishment hinders the establishment of cooperation. We sketch invasion probabilities as continuous across the critical lines to indicate the effect of finite population size. Increasing the population size creates steeper transitions approaching a sudden transition.

In this section we test the predictions of the (infinite population size) mean-field theory using agent-based simulations with finite population size. The population consists of 1,024 individuals who each have four (randomly assigned) opponents, that is, we use $k = 4$ throughout in the results presented here (with some results for $k = 8$). For populations of this size, neutral drift is negligible and results do not change qualitatively if populations are larger. However, the steepness of the transition between defection and cooperation may depend on the population size in the standard manner expected from finite-size scaling arguments (see, e.g., [36, p. 441]).

Game dynamics and Genetic Algorithm
Since all opponents are also players, each individual plays $k + 1$ games per update. The actual play of each individual is determined by their probabilities to cooperate $p_C$ and to punish $p_P$, encoded as two genetic loci, which can be thought of as the outcome of a network of genes that encode this decision. When mutating strategies, instead of mutating the individual genes that make up the decision pathway, we simply replace the parental probability $p_C$ by a uniformly drawn random number in the offspring. We will call the locus encoding the probability $p_C$ simply the "C gene" and similarly for the punishment gene.
When every individual has played against its $k$ partners, 2 percent of the population is replaced using a Moran-like process [37] in a well-mixed fashion. The Moran-like process with a finite replacement rate interpolates between a true Moran process (replacement rate equal to the inverse population size) and a Wright-Fisher process, where the entire population is replaced every update. In our replacement scheme, the identity of the players in any group is unrelated to their ancestry so that, effectively, the members of a particular playing group are randomly selected from the population [38]. With a replacement rate of 2%, it takes on average 50 population updates until the entire population is replaced, that is, a single generation has elapsed. In our simulations, the fitness of each individual is cumulative, that is, the payoff obtained in the next update of the population is added to the payoff already obtained (until that player is removed). However, we have tested that zeroing out the fitness after each update does not alter the game dynamics. We also verified that varying the replacement rate does not change the dynamics of the population in this game, unlike in the case where strategies communicate [39]. If strategies make their play dependent on the last play, then replacing the opponent can introduce noise into the communication, resulting in different levels of cooperation.
We verified that the probability for a player to encounter cooperators is independent of whether that player is a cooperator or a defector, as is required for well-mixed populations [40]. The accumulated payoff (fitness) is used to calculate the probability that this player's strategy will be chosen to replicate and fill the spot of a player that was removed in the Moran process. In case payoffs (calculated according to the equations above) are negative, we add a constant payoff to each and every strategy so that the relative payoffs are unchanged (it is known that such an offset does not alter the population dynamics). While the spatial version of the game shows somewhat different dynamics than studied here, we study the well-mixed version because it is amenable to theoretical prediction (see section 2).
The two genes of every individual mutate with a probability $\mu$ when replicated. As mentioned earlier, mutating a probability replaces the probability with a uniformly distributed random number. While we used a fixed mutation probability ($\mu = 0.02$ per locus) in the results presented here, we have previously studied the effect of varying mutation rate in this game [41] and found only a weak dependence.
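The replacement and mutation step described in this section can be sketched as follows. This is an illustration under stated assumptions, not the code used for the results: players to be removed are assumed to be drawn uniformly at random, parents are assumed to be drawn from the pre-update population with probability proportional to (shifted) accumulated payoff, and the function name `moran_like_update` is hypothetical.

```python
import random


def moran_like_update(population, fitness, replace_fraction=0.02, mu=0.02, rng=None):
    """Replace a fraction of the players by (possibly mutated) copies of fit players.

    population: list of (p_C, p_P) tuples; fitness: accumulated payoffs.
    Returns a new population and a new fitness list (offspring start at zero).
    """
    rng = rng or random.Random()
    n = len(population)
    n_replace = max(1, round(replace_fraction * n))
    # Shift payoffs by a constant so that every selection weight is positive.
    low = min(fitness)
    offset = (1e-9 - low) if low <= 0 else 0.0
    weights = [f + offset for f in fitness]
    dying = rng.sample(range(n), n_replace)                     # assumed: random death
    parents = rng.choices(range(n), weights=weights, k=n_replace)
    new_population = list(population)
    new_fitness = list(fitness)
    for slot, parent in zip(dying, parents):
        p_c, p_p = population[parent]
        # A mutation replaces the probability by a uniform random number.
        if rng.random() < mu:
            p_c = rng.random()
        if rng.random() < mu:
            p_p = rng.random()
        new_population[slot] = (p_c, p_p)
        new_fitness[slot] = 0.0
    return new_population, new_fitness
```

With 1,024 players and a replacement fraction of 0.02, this sketch replaces 20 players per update, so roughly 50 updates make up one generation.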

Line of Descent
After 500,000 updates, the line of descent (LOD) of the population is reconstructed [42,43] by picking a random organism of the final population and following its ancestry all the way back to the starting organism. This is possible because no recombination occurs between genotypes: descent is entirely asexual. The LOD recapitulates the evolutionary dynamic of the population, because it contains the successive list of genotypes that have achieved fixation in the population. Because the population size is large, only a small fraction of mutations (on the order $1/N$, where $N$ is the population size) find themselves on the LOD by chance. Thus, the LOD reflects the selective pressures operating on the population, and the fixed point of the evolutionary trajectory faithfully characterizes these pressures. The ancestral genotype that anchors all lines of descent is given by the random strategy $p_C = 0.5$ and $p_P = 0.5$. Because there is only one species in these populations, the individual LODs of the population coalesce to a single LOD fairly rapidly (which is why it is sufficient to pick a random genotype for following the LOD). In other words, the most recent common ancestor of a population is invariably recent. To be certain that we deal with LODs that have coalesced when calculating strategy fixed points from the LOD, we routinely discard the last 50,000 updates (about 1,000 generations) from every run. When determining evolutionary fixed points for the trajectory, we also discard the first half, as the population trajectory may still be transient.
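Reconstructing a line of descent amounts to following parent pointers backwards. A minimal sketch, assuming every organism is stored with an identifier and the identifier of its parent (the dictionaries `parent_of` and `genotype_of` are hypothetical bookkeeping structures):

```python
def line_of_descent(parent_of, genotype_of, final_id):
    """Return the genotypes from the ancestor down to the organism final_id.

    parent_of maps an organism id to its parent's id (None for the ancestor);
    genotype_of maps an organism id to its (p_C, p_P) genotype.
    """
    lod = []
    current = final_id
    while current is not None:
        lod.append(genotype_of[current])
        current = parent_of[current]
    lod.reverse()  # oldest genotype first
    return lod
```

Because descent is asexual, each organism has exactly one parent and the walk back to the ancestor is unambiguous.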

Evolutionary trajectories and fixed points
As the strategies adapt to the environmental conditions (specified by the parameters that define the game, including the neighborhood size, the mutation rate, and the replacement rate), the probabilities change from their initial values $(p_C, p_P) = (0.5, 0.5)$ towards the selected "fixed point" strategy. In order to visualize the evolutionary trajectory of a population, we reconstruct the evolutionary line of descent of an experiment (LOD, see section 3.2), which tells the story of that adaptation, mutation by mutation. While the LOD in each particular run can show probabilities varying wildly, averaging many such LODs can tell us about the selective pressures the populations face. In particular, averaging the probabilities on the LODs after they have settled down can tell us the fixed point of evolutionary adaptation [39]. We determine this fixed point by discarding the first 250,000 updates of every run (the transient), along with the last 50,000 (in order to remove the dependence of the LOD on the randomly chosen anchor genotype), and averaging the remaining 200,000 updates. Note that this fixed point is a computational fixed point only: we do not mean to imply that the population's genotypes all end up on this exact point. Rather, due to the nature of the game and the selective pressures that change as the composition of the population changes, the evolutionary trajectories approach this point and then fluctuate around or near it. Thus, the fixed point reflects the mean successful strategy given the conditions of the game.
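The windowed average that defines the fixed point can be written compactly. The sketch below assumes that each LOD has been sampled once per update, so that it is a list of $(p_C, p_P)$ pairs indexed by update; the function name is illustrative.

```python
from statistics import mean


def lod_fixed_point(runs, n_transient, n_tail):
    """Average (p_C, p_P) over the retained window of every run, then over runs.

    runs: list of LODs, each a list of (p_C, p_P) pairs, one per update (assumed).
    n_transient: number of initial updates to discard.
    n_tail: number of final updates to discard.
    """
    per_run = []
    for lod in runs:
        window = lod[n_transient:len(lod) - n_tail]
        if not window:
            raise ValueError("no updates left after discarding transient and tail")
        per_run.append((mean(p[0] for p in window), mean(p[1] for p in window)))
    return (mean(p[0] for p in per_run), mean(p[1] for p in per_run))
```

For runs of 500,000 updates, the values used here correspond to `n_transient=250_000` and `n_tail=50_000`, which leaves 200,000 updates per run.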
Figure 4. Evolutionary trajectories in the $(p_C, p_P)$ plane. All trajectories originate at (0.5, 0.5). We show an average of the LOD of 10 runs each. Here, $\beta = 0.8$, $\gamma = 0.2$, and $\mu = 0.02$.

We show in Fig. 4 evolutionary trajectories for $r = 3$, 4, and 5, all anchored at the random strategy $(p_C, p_P) = (0.5, 0.5)$ that was used as the seed strategy for every evolutionary run. We can see that, depending on the synergy (and the values chosen for the cost and effect of punishment), populations evolve towards a cooperating or defecting fixed point, and take different trajectories to get there. For $r = 3$, synergy is too low to lead to cooperation, and the fixed point of that trajectory is $(p_C, p_P) = (0, 0)$, that is, defection. For $r = 4$, however, the population moves toward a fixed point centered around $(p_C, p_P) = (0.7, 0.2)$, that is, players cooperate most of the time. (The location of the endpoint of the trajectory does not depend on the starting point.) Note, however, that the players engage in punishment only sparingly. For $r = 5$, cooperation is almost fully established, while punishment occurs about 40% of the time on average. However, the average trajectory (average over ten independent runs) only tells part of the story, because at this level of cooperation there is very little difference between a punishing and a non-punishing player (given there are very few players to punish) and as a consequence the punishment gene has begun to drift. An unselected (and thus drifting) probability $p_P$ is a uniformly distributed random number, with mean 1/2 and variance 1/12. As $p_C \to 1$, the average $p_P$ and its variance approach precisely these numbers.
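The drift values quoted above follow directly from the uniform distribution on $[0, 1]$, since every mutation replaces the probability with a uniform random number. As a short worked check:

```latex
\begin{align}
\langle p_P \rangle &= \int_0^1 p \, dp = \frac{1}{2}, \\
\mathrm{Var}(p_P) &= \int_0^1 p^2 \, dp - \langle p_P \rangle^2
                   = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}.
\end{align}
```

The corresponding standard deviation is $1/\sqrt{12} \approx 0.29$, so a drifting punishment gene is expected to wander over most of the unit interval.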
When mapping the strategy fixed point (average strategy on the LOD over 20 independent runs, again discarding the transient and the last 50,000) as a function of the parameters $\beta$ (effectiveness) and $\gamma$ (cost) of punishment (defined in section 2), each in the range from 0.0 to 1.0 and at low synergy $r = 3.0$, we find that defection is the most prevalent strategy on the LOD (see Figure 5A), as was found previously [25,26]. When $\gamma = 0$ there is no cost associated with the punishment, which implies that the P gene is not under selection and drifts. Thus, for this value of synergy (and lower), we find that the strategy fixed point is defection without punishment, except at $\gamma = 0$, where punishment is random.
As the degree of synergy increases to $r = 3.5$, cooperation starts to appear even in this well-mixed population (see Fig. 5B), while it appears as early as $r = 2$ for sufficiently high $\beta$ and low $\gamma$ in the spatial (but deterministic) version of the game, see [25,26]. For $r = 4$ we find players cooperating ($p_C \approx 0.8$) at high $\beta$ and low $\gamma$, which indicates that under conditions where punishment is not very costly or even free, punishment pays off. In addition we notice that the probability to punish increases under the same conditions that allow cooperation (high $\beta$ and low $\gamma$, that is, high impact and low cost of punishment), indicating that punishment is indeed used to enforce cooperation (Fig. 5C). The mean punishment probability grows to 0.5, but at the same time the variance shows that this gene is not under selection (as long as $\gamma = 0$).
Increasing the synergy level even higher towards $r = 4.5$, we witness the emergence of dominance of cooperation ($p_C > 0.5$) for most of the range of punishment cost and effectiveness, see Figure 5D. At the same time the punishment probability reaches 0.5 for a larger range of parameters, but the mean punishment probability on the LOD never exceeds 0.5, implying that full persistent punishment is not stable, and probably not necessary. Note that, in an implementation where decisions are deterministic (such as in the implementation of Helbing et al. [26]), punishment may remain for a long time in the population even though it is not selected anymore. In that case, players that cooperate with and without punishment have exactly the same fitness, and one or the other strategy should only dominate by drifting to fixation neutrally, a process that can take a significant amount of time in large populations such as those studied in Ref. [26].

Critical dynamics and the role of punishment
Previously, a phase transition between cooperative and defective behavior in the Public Goods game as a function of the synergy $r$ was observed for the spatial version [25,28,29] of the game (but not the well-mixed version). We can study the critical point and its dependence on punishment in detail in the well-mixed version of the game, where analytical predictions (as outlined above) are available. We show in Fig. 6 the average probability to cooperate (solid line) and to punish (dashed line) as a function of synergy for our default values $\gamma = 0.2$ and $\beta = 0.8$. Cooperation sets in at $r = 4$ and becomes prevalent for synergies just exceeding that.
Figure 5. Strategy fixed points as a function of the cost $\gamma$ and the effect $\beta$ of punishment. Mutation rate is set to $\mu = 0.02$ per probability throughout. A: For $r = 3$, cooperation does not evolve except when punishment is free ($\gamma = 0$), and even then only if punishment is very effective ($\beta$ close to 1). At $\gamma = 0$, the punishment gene drifts neutrally. B: For $r = 3.5$ defection is still the predominant strategy except for very low $\gamma$ and high $\beta$. C: At $r = 4$, cooperation is fully established for low $\gamma$ and high $\beta$, but not for medium values. D: For $r = 4.5$ cooperation is the dominant strategy for all values of the cost $\gamma$, and for high effect ($\beta > 0.75$). Note that the average punishment probability $p_P$ never exceeds 0.5 (the value achieved when the gene drifts neutrally).

We will now study how punishment affects the critical point. The average probability of cooperation in Fig. 6 shows the typical behavior of an order parameter as a function of the critical parameter $r$. It is instructive to run a control of the experiment where punishment does not exist. If we force $p_P = 0$, cooperation does not set in until $r = 4.5$ (see inset in Fig. 6) and only becomes dominant at $r = 5$. Thus, although punishment is sporadic when it is possible, and drifts when cooperation is established, it is essential to lower the critical barrier for cooperation. The probability distribution of the punishment gene throughout the population (Fig. 7) shows that punishment is never prevalent: it is absent below the critical point, while the distribution is close to uniform (because of drift) above it. In a sense, punishment catalyzes the transition from defection to cooperation. Note also that the levels of cooperation achieved (at a given $r$) are significantly higher when punishment exists, even though punishment is only weakly selected for. Apparently, the possibility of punishment alone is sufficient to enforce higher levels of cooperation, but the mechanism for this enforcement is not immediately clear as punishment is rare above the critical point.
In section 2 we calculated approximately the point at which cooperation is favored in a mean-field approach that does not take mutations into account, by writing Eqs. (3) and (4) in terms of the density of cooperators $\rho_C$ encountered by players in a group, and found that cooperation was favored as long as

$$r > (k + 1)(1 - \beta\rho_P). \tag{17}$$

This equation (which also follows from a replicator equation approach) implies that the emergence of cooperation depends crucially on the density of punishers. In fact, the mean-field theory predicts that cooperation in the absence of punishment is favored only at $r = 5$. We see cooperation emerge quite a bit earlier than that in our simulations (see inset in Fig. 6), but $p_C$ crosses 0.5 very close to $r = 5$, as predicted by the mean-field theory. Of course, the departure from the mean-field theory results is a consequence of the finite population size of the simulations. We can test Eq. (17) explicitly by finding the critical $r$ at which $p_C$ crosses 0.5 for simulations in which the punishment probability is held fixed, so that $\rho_P \approx p_P$. To find the critical point, we performed 100 simulations each at fixed $r$ with small increments $\Delta r$ and interpolated the data within the steep portion of the transition to find the crossover point. The curves in Fig. 8 show that the steepness of the transition between cooperation and defection depends on the level of punishment, changing from a dependence reminiscent of second-order transitions (at vanishing punishment) towards a first-order-like transition at high punishment. We plot the critical line $r_c = (k + 1)(1 - \beta\rho_P)$ in Fig. 8 for $k = 4$ and $\beta = 0.8$ ($r_c = 5 - 4p_P$). The mean-field theory reproduces the simulated $r_c$ within errors. The prediction in fact works just as well for other parameter values: we tested $k = 8$ (each agent plays with eight random other agents) and readily observe that the critical value is given by $r_c = 10 - 8p_P$ (data not shown).
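The comparison between the mean-field critical line and a measured crossover can be sketched in a few lines. The interpolation below is plain linear interpolation between the two bracketing data points, which is an assumption about how the crossover was extracted; both function names are hypothetical.

```python
def critical_synergy(k, beta, rho_p):
    """Mean-field critical synergy r_c = (k + 1) * (1 - beta * rho_p)."""
    return (k + 1) * (1.0 - beta * rho_p)


def crossover_point(r_values, p_c_values, level=0.5):
    """Synergy at which the mean p_C first rises through `level`.

    r_values must be increasing; p_c_values are the measured mean cooperation
    probabilities. Returns None if the data never cross the level.
    """
    points = list(zip(r_values, p_c_values))
    for (r0, y0), (r1, y1) in zip(points, points[1:]):
        if y0 < level <= y1:
            # Linear interpolation inside the bracketing interval.
            return r0 + (r1 - r0) * (level - y0) / (y1 - y0)
    return None
```

For $k = 4$ and $\beta = 0.8$, `critical_synergy(4, 0.8, p)` reproduces the line $r_c = 5 - 4p_P$, for example $r_c = 3$ at $p_P = 0.5$.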
Because of the crucial importance of punishers in determining the synergy level at which cooperation emerges, the Public Goods game with a genetic basis (that is, with genes coding for probabilities of moves) implies curious dynamics close to the critical point. Below the critical point, defection is a stable strategy, and punishment is absent. When cooperation emerges as a possibility, punishment becomes more and more important, leading to a lowering of the critical synergy for cooperation via Eq. (17). At that point, cooperation emerges rapidly and decisively once a critical level of punishment has been achieved. Once cooperation is dominant and defectors are all but driven to extinction, punishment becomes irrelevant and the gene for punishment begins to drift. As this happens, the fraction of punishers drops, thus raising the critical synergy according to Eq. (17). As a consequence, a drifting punishment gene can lead to the sudden reemergence of defectors as stable states. Once those have taken over, the reverse dynamics begins to unfold. Given this dynamic, we should observe periods of cooperation and defection that follow each other closely when the synergy is near the critical point.
These dynamics are reminiscent of the phenomenon of supercooling and superheating in certain phase transitions observed in condensed matter physics, as predicted in section 2. If we imagine the synergy parameter $r$ as the critical parameter and the mean probability to cooperate as the order parameter, it is possible that when $r$ is slowly increased, the population remains in the defecting phase because a switch to cooperation requires a critical number of cooperators as a "seed". In such a situation, the defecting phase is unstable to fluctuations. If a critical number of cooperators emerges by chance, punishment immediately becomes effective against defectors, lowers the critical point as implied by Eq. (17), and the population could transition to cooperation very quickly.
A hallmark of such bi-stable systems that require nucleation events in order to transition is hysteresis, a phenomenon where the state of the system depends on its history. We can test whether hysteresis exists in the Public Goods game (and whether the strength of this effect depends on the probability to punish), by adiabatically changing the synergy parameter first from low to high (transitioning from defection to cooperation), and then adiabatically back from high to low. While we see evidence of hysteresis even when punishment is absent (Fig. 9A), the effect is much more pronounced when punishment is possible (Fig. 9B). The population moves from cooperation to defection at about the expected critical synergy $r_{\mathrm{crit}} \approx 4.15$ as $r$ is decreased, but stays in the defecting phase much beyond the critical point as $r$ is increased. The observed hysteresis effect implies that once cooperation is established, it can be maintained even when the expected synergy fluctuates below the critical point, but that cooperation is difficult to establish even when the synergy would be conducive for that establishment. It also explains why levels of cooperation are higher when punishment is possible, even if punishment is used sparingly.
In super-critical phase transitions, bubbles of the new phase increase in size exponentially if larger than a critical size, but shrink exponentially when smaller than the critical size [44]. Thus, if a group invades with $\rho_C^{\mathrm{in}} > \rho_{\mathrm{crit}}$, then $\rho_C \to 1$. This is different from the dynamics in the absence of punishment, where at the critical point all $\rho_C^{\mathrm{in}}$ have the same fitness, and the mean level of cooperation is 0.5, as is evident in Fig. 4 (inset) and Fig. 8A. Indeed, the critical point for $\rho_P = 0$ (no punishment) is neutral, while it is a repulsive fixed point when punishment is present. As a consequence, the phase transition as a function of $r$ becomes steeper and steeper as punishment increases, and higher levels of cooperation are achieved.
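The history dependence described above can be illustrated with a deliberately simple toy model. The sketch below is not the agent-based simulation used in this work and does not implement Eq. (17): it assumes a one-dimensional flow for the cooperator density with a repulsive interior fixed point, a hypothetical linear threshold `rho_crit(r)`, and a hypothetical `floor` that stands in for the supply of rare types by mutation. All function names and parameter values are illustrative assumptions.

```python
# Toy illustration of hysteresis from a repulsive fixed point.
# NOT the paper's model: the flow, the threshold, and the floor are assumptions.

def rho_crit(r, r_lo=3.0, r_hi=5.0):
    """Hypothetical critical cooperator density: 1 at r_lo, falling linearly to 0 at r_hi."""
    return min(1.0, max(0.0, (r_hi - r) / (r_hi - r_lo)))

def relax(rho, r, steps=2000, dt=0.05, floor=0.02):
    """Euler-integrate d(rho)/dt = rho (1 - rho) (rho - rho_crit(r)).

    Densities above rho_crit grow toward 1, densities below shrink toward 0.
    The floor keeps a small fraction of both types present (a stand-in for mutation).
    """
    for _ in range(steps):
        rho += dt * rho * (1.0 - rho) * (rho - rho_crit(r))
        rho = min(1.0 - floor, max(floor, rho))
    return rho

def sweep(r_values, rho_start=0.5):
    """Change r step by step, carrying the relaxed state over to the next value of r."""
    rho, trace = rho_start, []
    for r in r_values:
        rho = relax(rho, r)
        trace.append(rho)
    return trace

r_up = [3.0 + 0.1 * i for i in range(21)]   # synergy from 3.0 to 5.0
up = sweep(r_up)                             # low -> high
down = sweep(r_up[::-1])[::-1]               # high -> low, reordered to match r_up

for r, u, d in zip(r_up, up, down):
    print(f"r = {r:.1f}   increasing r: {u:.2f}   decreasing r: {d:.2f}")
```

In this toy, the two branches differ over most of the intermediate range of $r$: a population that starts in the defecting state stays there until the threshold drops below the floor, and a cooperating population persists until the threshold exceeds what the cooperators can supply. The width of the loop is set entirely by the assumed floor and threshold, so it should not be compared quantitatively with Fig. 9.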
This behavior is strongly reminiscent of phase transitions in ferromagnetic systems where the transition is second order in the absence of a magnetic field, but becomes first-order when magnetic fields are present. This suggests that a treatment of Public Goods games in terms of Ising-like models where punishment plays the role of a magnetic field forcing the alignment of spins should be possible, and we are currently pursuing such an approach [45].

Discussion
We studied the Public Goods game for well-mixed populations both theoretically and in agent-based simulations of Darwinian evolution of stochastic strategies, using genes that encode the probabilities for cooperation and punishment. It is known that punishment can drive the evolution of cooperation above a critical synergy level as long as there is a spatial structure in the environment [25,26]. It was also previously believed that in well-mixed populations cooperation via punishment can only become successful if additional factors like reputation [22] or the potential for abstaining from the public good [29,31] influence the evolution. Here we show that cooperation readily emerges in a well-mixed environment above a critical level of synergy. This critical level is influenced by several factors: the rate of punishment, because punishment favors cooperating groups, and spatial structure [25,28,29], because a single cooperator can nucleate a transition when its offspring cooperators are placed next to it, giving rise to a "bubble" of cooperators of sufficient size.
We conclude that in well-mixed populations cooperation can emerge if the synergy outweighs the defectors' reward, which is reduced by punishment. A punishment-dependent barrier to cooperation introduces an interesting dynamic near the critical synergy. Starting in the cooperative phase, as long as the mutation rate is low enough, the dearth of defectors in the cooperating phase makes punishment obsolete, that is, the selective pressure to punish disappears. As a consequence, the density of punishers decreases, thus increasing the critical point in turn. If the critical synergy has increased sufficiently, defectors can again gain a foothold. Such a shift, however, reinstates the selective pressure to punish, leading to a re-emergence of moralists that can drive defectors out once more. Thus, for synergy factors near the critical point, we can expect oscillations between cooperators and defectors, and no strategy is ever stable.
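The dependence of the critical point on the punisher density can be made concrete with the mean-field prediction quoted in the caption of Figure 8, $r_c = 5(1-\beta\rho_P)$ for $k = 4$. The following is a minimal sketch under that assumption only; the function names are hypothetical, the prefactor 5 is taken from the caption rather than derived here, and the chosen value $r = 4.15$ is simply the critical synergy reported above.

```python
# Sketch: how a drifting punisher density shifts the mean-field critical synergy.
# Assumes r_c = 5 * (1 - beta * rho_P), as quoted in the Figure 8 caption (k = 4).

def critical_synergy(rho_p, beta=0.8, prefactor=5.0):
    """Mean-field critical synergy for punisher density rho_p (caption formula)."""
    return prefactor * (1.0 - beta * rho_p)

def punisher_density_needed(r, beta=0.8, prefactor=5.0):
    """Invert the caption formula: punisher density at which synergy r is exactly critical."""
    return (1.0 - r / prefactor) / beta

r = 4.15  # fixed synergy, near the critical value reported in the text
for rho_p in (0.5, 0.3, 0.2, 0.1, 0.0):
    r_c = critical_synergy(rho_p)
    side = "r above r_c" if r > r_c else "r below r_c"
    print(f"rho_P = {rho_p:.2f}   r_c = {r_c:.2f}   {side}")

print(f"rho_P at which r = {r} is critical: {punisher_density_needed(r):.4f}")
```

Under this formula with $\beta = 0.8$, a synergy of $r = 4.15$ is critical at a punisher density of about 0.21, so a punishment gene that drifts below that density moves the population from above to below the critical point without any change in $r$.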
Finally, the observation of hysteresis implies the existence of metastable states and "supercritical" phases, and gives rise to a self-enforcing (or "self-aligning") dynamic where cooperation is stable even when punishment is never actually used. It is clear that super-critical dynamics can only occur in the shaded region in Fig. 3, which is set by the probability $\pi$ with which cooperators punish, and provides a mechanism to protect cooperating groups from defectors, as the defectors need to achieve a critical density in order to thrive. In a very real way, super-criticality raises the specter of punishment to maintain cooperation, even when it is not used.

Figure 5. Mean probabilities for cooperation $p_C$ and punishment $p_P$ at the evolutionary fixed point. These graphs show the fixed point (averaged over 20 LODs) as a function of the cost of punishment $\gamma$ and the effectiveness of punishment $\beta$, for different values of the synergy $r$. Left panel: probability to cooperate $p_C$, right panel: probability to punish $p_P$. Note the inversion of the $\beta$ and $\gamma$ scales for better visibility. Mutation rate is set to $\mu = 0.02$ per probability throughout. A: For $r = 3$, cooperation does not evolve except when punishment is free ($\gamma = 0$), and even then only if punishment is very effective ($\beta$ close to 1). At $\gamma = 0$, the punishment gene drifts neutrally. B: For $r = 3.5$, defection is still the predominant strategy except for very low $\gamma$ and high $\beta$. C: At $r = 4$, cooperation is fully established for low $\gamma$ and high $\beta$, but not for medium values. D: For $r = 4.5$, cooperation is the dominant strategy for all values of the cost $\gamma$, and for high effect ($\beta > 0.75$). Note that the average punishment probability $p_P$ never exceeds 0.5 (the value achieved when the gene drifts neutrally).

Figure 6. Mean probability of cooperation and punishment. Probability of cooperation $p_C$ (solid, left scale) and probability of punishment $p_P$ (dashed, right scale) with adaptive punishment at the evolutionary fixed point of the trajectory, as a function of the synergy $r$ ($\beta = 0.8$, $\gamma = 0.2$, $\mu = 0.02$, 100 replicates for each data point). The probability to cooperate when punishment is forced to zero ($p_P = 0$) is shown in the inset.

Figure 7. Histogram of the punishment probability distribution. Punishment probability distribution in a typical equilibrated population, just before the critical point ($r = 4$, black), at the critical point ($r = 4.15$, grey), and above $r_{\mathrm{crit}}$ ($r = 4.5$, white).

Figure 8. Critical point at fixed punishment for $k = 4$. A: Mean probability to cooperate averaged over 100 independent lines of descent (average over 200K updates, discarding the first 250K and the last 50K as described in section 3.2), as a function of synergy $r$ for fixed (unevolvable) probability of punishment $p_P$ = 0.0, 0.25, 0.5, 0.75, and 1.0. The dashed line indicates a mean probability to cooperate $p_C = 0.5$, which we use to extrapolate the critical value $r_c$. This critical value depends on the punishment levels as predicted by Eq. (17). B: Critical synergy $r_c$ as a function of punishment probability $p_P$ as deduced from panel A (points) by identifying the $r_{\mathrm{crit}}$ at which the cooperation probability $p_C = 0.5$. The dashed line indicates the prediction $r_c = 5(1-\beta\rho_P)$, assuming that the density of punishers $\rho_P \approx p_P$ (mean field), with $\beta = 0.8$ and $\gamma = 0.2$.

Figure 9. Hysteresis effect from punishment. Population fraction of cooperators (measured as the density of non-punishing cooperators plus the density of moralists) as a function of synergy $r$ when $r$ is adiabatically changed from low to high values (solid), and back from high values to low values (dashed). All population fractions are started at 0.5 (either at the high or low end of $r$). The lines show the average over 100 runs. Standard error is of the size of the fluctuations.