Competition between self- and other-regarding preferences in resolving social dilemmas

Evolutionary game theory assumes that individuals maximize their benefits when choosing strategies. However, an alternative perspective proposes that individuals seek to maximize the benefits of others. To explore the relationship between these perspectives, we develop a model where self- and other-regarding preferences compete in public goods games. We find that other-regarding preferences are more effective in promoting cooperation, even when self-regarding preferences are more productive. Cooperators with different preferences can coexist in a new phase where two classic solutions invade each other, resulting in a dynamical equilibrium. As a consequence, a lower productivity of self-regarding cooperation can provide a higher cooperation level. Our results, which are also valid in a well-mixed population, may explain why other-regarding preferences could be a viable and frequently observed attitude in human society.


Introduction
How can cooperation survive if defection promises a larger individual income? This is the central question of evolutionary game theory, which seeks to understand the subtle interactions among individuals with conflicting interests [1,2]. In a simplified situation, individuals face two choices: contributing to a common pool or not. The decision determines their individual payoff and, consequently, their success. This scenario forms the core of the public goods game, where individual and collective interests are in conflict within a group [3,4,5]. According to the principle of Darwinian selection, a successful strategy spreads more easily, leading defection to become the dominant strategy among selfish individuals [6]. The resulting evolutionary process would lead to the so-called "Tragedy of the Commons" [7], which frequently contradicts our real-life experience [8,9].
Over the last decades, several insightful approaches have been proposed to unravel this mystery. These solutions encompass sophisticated strategies [10,11], the presence of various mechanisms [12,13,14,15,16,17], other incentives to support cooperation [18,19,20,21], and specific conditions, such as the extensive application of network theory in structured populations [22,23,24,25,26,27,28]. It is almost impossible to list all of them in a short Introduction. Instead, readers seeking more comprehensive information are directed to review papers that cover these topics in detail [29,30]. The focus of the present work is on a significant, though somewhat overlooked, aspect: when individuals choose not to maximize their own payoff. Instead, the modified goal is to improve the collective income of co-players [31,32,33].
This other-regarding preference is frequently observed not only in human society but also across the animal kingdom [34,35,36,37]. This behavior has not only been studied by model calculations but also comprehensively reviewed from an economic perspective [38,39,40,41,42]. Apparently, if all participants adopt this approach, cooperators stand a better chance of survival. However, the critical question remains whether such a strategy-updating preference can emerge and persist as a result of evolutionary processes. To address this, we propose a model in which both traditional self-regarding players and the introduced other-regarding ones are present. Both defectors and cooperators can exhibit these preferences, leading to a four-profile model. We systematically explore the full parameter space of this modified public goods game model, where the collective productivity of these preferences varies.
We find that other-regarding preferences are more effective in promoting cooperation than self-regarding preferences at equal productivities. Importantly, in a specific parameter region where the productivities of the two preferences differ, a new phase arises where cooperators with both preferences coexist. When the system is in this phase, a lower productivity of self-regarding cooperators can result in a higher general cooperation level. To check the robustness of our observations, we also study well-mixed populations, where the conclusions remain consistent.
The organization of this paper is as follows. We first present the model in Sec. 2 and then proceed with our observations and their explanations in Sec. 3. Section 4 contains our conclusions and a discussion of their implications. Finally, Appendix A provides the details of the calculations in a well-mixed population.

Model
In the extended model, players are categorized based on the two unconditional strategies used in the classic model, cooperation (C) and defection (D), as well as two preferences in updating strategies: self-regarding (0) and other-regarding (1). In this way, there are four profiles: cooperation and self-regarding preference ($C_0$), cooperation and other-regarding preference ($C_1$), defection and self-regarding preference ($D_0$), and finally defection and other-regarding preference ($D_1$). Technically, each player's profile is denoted by a vector $s_i = (s_i^{C_0}, s_i^{C_1}, s_i^{D_0}, s_i^{D_1})$. The element corresponding to the player's profile is set to 1, while the remaining three components are 0. For example, if an agent adopts cooperation and self-regarding preference, its profile is represented as $s_i = (1, 0, 0, 0)$, whereas the profile of a cooperator with other-regarding preference is represented as $s_i = (0, 1, 0, 0)$, and so on.
In the spatial public goods game framework [43,44,45,46,47,48], we consider a square lattice of size $L \times L$ with population $N = L^2$, where each agent $i$ occupies a node and forms a group $\Omega_i$ of size $G = 5$, consisting of itself and its four nearest neighbors. Consequently, agent $i$ is also a member of the groups $\Omega_j$ of its neighbors $j \in \Omega_i \setminus \{i\}$. In the public goods game of the group centered on $j$, agent $i$ contributes a cost $c > 0$ if it cooperates and nothing if it defects. The contributions from all group members $k \in \Omega_j$ are enhanced by a synergy or productivity factor $r > 1$ and evenly distributed among all $G$ members. The payoff $\pi_i^{(0)}$ of agent $i$ is the average over the games played in the $G$ groups, given by

$$\pi_i^{(0)} = \frac{1}{G}\sum_{j\in\Omega_i}\left[\frac{c}{G}\sum_{k\in\Omega_j}\left(r_0\, s_k^{C_0} + r_1\, s_k^{C_1}\right) - c\left(s_i^{C_0} + s_i^{C_1}\right)\right]. \tag{1}$$

To distinguish between agents with self- and other-regarding preferences, we introduce a technical modification in Eq. (1): the different productivities of cooperation with self- and other-regarding learning are characterized by $r_0$ and $r_1$ (both $r_0 > 1$ and $r_1 > 1$), respectively. The average payoff of agent $i$'s nearest neighbors is denoted by $\pi_i^{(1)}$ and is calculated as

$$\pi_i^{(1)} = \frac{1}{G-1}\sum_{i' \in \Omega_i \setminus \{i\}} \pi_{i'}^{(0)}. \tag{2}$$

If agent $i$ adopts the other-regarding preference, it rescales the original payoff by weighing its own payoff against the payoff of its neighbors. The payoff of $i$'s neighbors is weighted by $0 \le u \le 1$, and the payoff of agent $i$ is weighted by $1 - u$, where $u$ is the other-regarding rate. Accordingly, the rescaled payoff is denoted by $\tilde{\pi}_i^{(1)}$ and is given by

$$\tilde{\pi}_i^{(1)} = (1-u)\,\pi_i^{(0)} + u\,\pi_i^{(1)}. \tag{3}$$
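The payoff bookkeeping of Eqs. (1)-(3) can be illustrated with a short Python sketch. It assumes periodic boundary conditions and stores profiles as the strings "C0", "C1", "D0", "D1" instead of the one-hot vectors $s_i$; the function names are hypothetical, and this is not the authors' simulation code. It recomputes every payoff from scratch, which keeps it readable but makes it far too slow for a $300 \times 300$ lattice.

```python
# Sketch of Eqs. (1)-(3) on an L x L square lattice.
# Assumptions not stated in the text: periodic boundaries, and profiles stored
# as strings "C0", "C1", "D0", "D1" instead of one-hot vectors s_i.
G = 5  # focal site plus its four nearest neighbours


def neighbours(x, y, L):
    """Four nearest neighbours of (x, y) with periodic boundaries."""
    return [((x + 1) % L, y), ((x - 1) % L, y), (x, (y + 1) % L), (x, (y - 1) % L)]


def group(x, y, L):
    """Group Omega centred on (x, y): the site itself and its neighbours."""
    return [(x, y)] + neighbours(x, y, L)


def self_payoff(lat, x, y, r0, r1, c=1.0):
    """Eq. (1): payoff of (x, y) averaged over the G groups it belongs to."""
    L = len(lat)
    cost = c if lat[x][y] in ("C0", "C1") else 0.0
    total = 0.0
    for jx, jy in group(x, y, L):          # groups centred on i and on each neighbour j
        pool = 0.0
        for kx, ky in group(jx, jy, L):    # contributions of all members k of Omega_j
            if lat[kx][ky] == "C0":
                pool += r0 * c
            elif lat[kx][ky] == "C1":
                pool += r1 * c
        total += pool / G - cost           # equal share of the enhanced pool minus own cost
    return total / G


def neighbour_payoff(lat, x, y, r0, r1, c=1.0):
    """Eq. (2): mean self-regarding payoff of the G - 1 nearest neighbours."""
    L = len(lat)
    return sum(self_payoff(lat, nx, ny, r0, r1, c) for nx, ny in neighbours(x, y, L)) / (G - 1)


def rescaled_payoff(lat, x, y, r0, r1, u, c=1.0):
    """Eq. (3): weight 1 - u on the own payoff and u on the neighbours' mean payoff."""
    return (1 - u) * self_payoff(lat, x, y, r0, r1, c) + u * neighbour_payoff(lat, x, y, r0, r1, c)


# Usage: a single C1 cooperator surrounded by D0 defectors on a 5 x 5 lattice.
lat = [["D0"] * 5 for _ in range(5)]
lat[2][2] = "C1"
print(self_payoff(lat, 2, 2, r0=4.0, r1=3.0))          # approx. -0.4
print(rescaled_payoff(lat, 2, 2, r0=4.0, r1=3.0, u=1))  # approx. 0.24
```

The lone cooperator loses money on its own account, yet its rescaled payoff at $u = 1$ is positive because each neighbor collects a share of its contribution in two groups.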
An agent $i$ pursues a higher $\tilde{\pi}_i^{(1)}$ when engaging in other-regarding learning, and a higher $\pi_i^{(0)}$ when engaging in self-regarding learning. This is captured by the following updating process [49]. At each elementary Monte Carlo step, a random agent $i$ and a random one of its neighbors $i' \in \Omega_i \setminus \{i\}$ are selected, and their payoffs are calculated. If agent $i$ has other-regarding preference (i.e., $s_i = (0, 1, 0, 0)$ or $s_i = (0, 0, 0, 1)$), then $i$ adopts the profile of $i'$ with probability

$$P_1(s_i \leftarrow s_{i'}) = \frac{1}{1 + \exp\left[\left(\tilde{\pi}_i^{(1)} - \tilde{\pi}_{i'}^{(1)}\right)/\kappa\right]}. \tag{4}$$

If agent $i$ has self-regarding preference (i.e., $s_i = (1, 0, 0, 0)$ or $s_i = (0, 0, 1, 0)$), then $i$ adopts the profile of $i'$ with probability

$$P_0(s_i \leftarrow s_{i'}) = \frac{1}{1 + \exp\left[\left(\pi_i^{(0)} - \pi_{i'}^{(0)}\right)/\kappa\right]}. \tag{5}$$

In Eq. (4) [Eq. (5)], a higher $\tilde{\pi}_{i'}^{(1)}$ [$\pi_{i'}^{(0)}$] makes it more likely for agent $i$ to imitate the profile of $i'$. To obtain results comparable with previous studies [43,50,51,52,53], we set $\kappa = 0.1$, the noise parameter in both probability functions. Importantly, the motivation of agents is complex: the update applies to the entire profile vector, $s_i \leftarrow s_{i'}$. If agent $i$ has other-regarding preference, it compares the rescaled payoffs of the reference neighbor and itself, in an attempt to learn the profile that brings others a higher payoff. This is independent of the reference profile: if "self-regarding preference" is found to bring others a higher payoff, then agent $i$ switches to "self-regarding preference" in pursuit of its current purpose of other-regarding preference. Similar reasoning holds if agent $i$ adopts self-regarding preference: when the reference neighbor $i'$ has a high personal payoff with other-regarding preference, agent $i$ adopts both the strategy and the other-regarding preference from $i'$ in order to achieve its purpose of a higher personal payoff.
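The update rule of Eqs. (4) and (5) can be sketched in Python as follows. The helper names and the convention of passing in precomputed payoffs (for example, obtained from Eqs. (1)-(3)) are assumptions made for illustration. The sketch encodes the two points stressed above: the focal agent's own preference selects which payoff pair is compared, and the whole profile, strategy and preference together, is copied.

```python
import math
import random

OTHER_REGARDING = {"C1", "D1"}


def fermi(own, other, kappa=0.1):
    """Fermi function 1 / (1 + exp[(own - other) / kappa]), evaluated without overflow."""
    z = (own - other) / kappa
    if z > 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def adoption_probability(profile_i, pi0_i, pi0_j, tpi_i, tpi_j, kappa=0.1):
    """Eq. (4) for other-regarding agents, Eq. (5) for self-regarding agents."""
    if profile_i in OTHER_REGARDING:
        return fermi(tpi_i, tpi_j, kappa)   # compares rescaled payoffs, Eq. (3)
    return fermi(pi0_i, pi0_j, kappa)       # compares own payoffs, Eq. (1)


def maybe_imitate(profiles, i, j, pi0, tpi, rng, kappa=0.1):
    """Hypothetical helper: i copies j's whole profile with the probability above."""
    p = adoption_probability(profiles[i], pi0[i], pi0[j], tpi[i], tpi[j], kappa)
    if rng.random() < p:
        profiles[i] = profiles[j]           # strategy and preference change together
    return profiles[i]
```

A self-regarding $C_0$ agent ignores the rescaled payoffs entirely, so a richer $D_1$ neighbor can convert it into an other-regarding defector, exactly as described in the text.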
The model reduces to the classic spatial public goods game when $r_0 = r_1 \equiv r$ in Eq. (1) and $u = 0$ in Eq. (3). If $r_0 = r_1$, there is no difference between the productivities of self- and other-regarding cooperation. If $u = 0$, we have $\tilde{\pi}_i^{(1)} = \pi_i^{(0)}$: the other-regarding preference is not truly other-regarding and becomes indistinguishable from the self-regarding preference. If $u = 1$, we have $\tilde{\pi}_i^{(1)} = \pi_i^{(1)}$: the modified preference is completely other-regarding, and these agents only pursue a higher payoff for their neighbors. Overall, investigating $0 < u < 1$ allows us to examine the gradual change of other-regarding agents from self-regarding to completely other-regarding preferences, and studying $r_0 \neq r_1$ provides additional insight by directly adjusting the relative advantage of each preference.
To keep the results comparable with previous studies, some parameters, including the group size $G = 5$, cost $c = 1$, and selection noise $\kappa = 0.1$, are fixed in the simulations. The typical population size is $L \times L = 300 \times 300$, which allows us to avoid finite-size effects. The evolution starts from a state where the four profiles are distributed randomly. The total running time is at least $10^5$ full Monte Carlo steps, and the stationary quantities are averaged over the last $2 \times 10^4$ full Monte Carlo steps. The measured quantities are the fraction of self-regarding cooperation, $\rho_{C_0} = \sum_{i=1}^{N} s_i^{C_0}/N$ (with $\rho_{C_1}$, $\rho_{D_0}$, and $\rho_{D_1}$ defined analogously), and the fraction of cooperation, $\rho_C = \sum_{i=1}^{N} \left(s_i^{C_0} + s_i^{C_1}\right)/N$. The presented stationary quantities are robust to changes in system size and simulation time. To close the description of our model, we note that it is possible to use an alternative setup where pure strategies and other player-specific tags change on different time scales [54,55]. However, in our present model, such an extension would add unnecessary complexity; hence, we keep the original setup focused on the direct competition of self- and other-regarding preferences.
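The measured fractions and their stationary averages can be sketched as below. The sketch assumes that profile snapshots are recorded at regular intervals during the averaging window; the sampling interval and the helper names are hypothetical.

```python
from collections import Counter

PROFILES = ("C0", "C1", "D0", "D1")


def fractions(profiles):
    """rho_X = (number of agents with profile X) / N, plus rho_C = rho_C0 + rho_C1."""
    n = len(profiles)
    counts = Counter(profiles)
    rho = {x: counts[x] / n for x in PROFILES}
    rho["C"] = rho["C0"] + rho["C1"]
    return rho


def stationary_average(history, window):
    """Average each fraction over the last `window` recorded snapshots."""
    tail = history[-window:]
    return {key: sum(snap[key] for snap in tail) / len(tail) for key in tail[0]}


# Usage: one snapshot of a four-agent toy population.
print(fractions(["C0", "C1", "D0", "D0"]))
```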

Equally strong preferences
As a natural entry point, we study the special case where the productivities of self- and other-regarding cooperation are equal, $r_0 = r_1 \equiv r$. According to Eqs. (1)-(3), $\tilde{\pi}_i^{(1)} \neq \pi_i^{(0)}$ in general when $u > 0$. Therefore, the learning probabilities $P_1(s_i \leftarrow s_{i'})$ (other-regarding) and $P_0(s_i \leftarrow s_{i'})$ (self-regarding), determined by Eqs. (4) and (5), are distinct. Here, we focus on the effect of the other-regarding rate $u$ on the general cooperation level $\rho_C$. To this end, we aggregate the fractions of cooperation, as well as defection, across both preferences. The resulting phase diagram on the parameter plane of productivity $r$ and other-regarding rate $u$ is shown in Fig. 1(a). We can distinguish three phases here. As expected, defectors dominate at low $r$ values, regardless of $u$. At high $r$ values, cooperators prevail. Between these solutions lies a mixed $C + D$ phase where both strategies coexist. The critical $r$ value at which cooperation becomes dominant decreases as $u$ increases, reinforcing the expectation that other-regarding preferences support cooperation. Cross-sections obtained at different $u$ values are shown in Fig. 1(b). These curves confirm that an increase in the other-regarding rate $u$ helps cooperation even in the mixed phase at constant $r$, which supports a previous conclusion that "standing in others' shoes promotes cooperation" [37]. Finally, at $u = 0$, we have $\tilde{\pi}_i^{(1)} = \pi_i^{(0)}$ according to Eqs. (1)-(3), which makes self- and other-regarding preferences indistinguishable and reduces the results to those obtained in the traditional spatial public goods game [43].

Unequally productive preferences
A question of interest is whether other-regarding preferences can emerge in the presence of self-regarding preferences and what solutions arise when both preferences compete. To provide a general answer, we remove the artificial constraint $r_0 = r_1$ and allow $r_0 \neq r_1$, so that the productivities differ between cooperators with different preferences. First, we use $u = 1$, where the potential impact of the other-regarding preference is maximal. In other words, players with profiles $(0, 1, 0, 0)$ or $(0, 0, 0, 1)$ completely ignore their individual income and are solely influenced by the average payoff of their neighbors. The resulting phase diagram is shown in Fig. 2, where dominant solutions are marked on the $r_0$-$r_1$ plane. These solutions are color-coded and explained in "Phase Notes." For largely different values of $r_0$ and $r_1$, the results resemble the patterns obtained in two-profile models. When the productivities of both preferences are low, the system evolves into a full defection state, marked as $(D_0)_{D_0+D_1}$. Here, the microscopic dynamics becomes a neutral drift once the last cooperator dies out, because every player's payoff is then zero. However, this voter-model-like coarsening is not fully symmetric, because a higher starting fraction of the $D_0$ profile ensures a higher fixation probability of $D_0$. This is why we denote this phase as "$D_0$" with a $D_0+D_1$ subscript. As $r_0$ or $r_1$ increases, the system evolves into the $C_0 + D_0$ or $C_1 + D_0$ phase, where self- or other-regarding cooperation coexists with self-regarding defection via spatial reciprocity. As $r_0$ or $r_1$ becomes sufficiently large, the system enters the full $C_0$ or full $C_1$ phase. It is noteworthy that the emergence of other-regarding cooperation $C_1$ requires a smaller productivity $r_1$ than the $r_0$ necessary for the emergence of self-regarding cooperation $C_0$. The reason is straightforward: other-regarding cooperators cannot easily transform into self-regarding defectors, as the latter do not benefit their neighbors.
We also note that the "other-regarding defection" profile cannot survive in evolution, according to the full phase diagram shown in Fig. 2. Importantly, individuals have no preconceived moral cognition about behaviors in evolutionary dynamics. "Defection" and "other-regarding" refer to the processes of playing games and updating profiles, respectively; they thus belong to independent dimensions and do not conflict. Yet, evolution does not favor this profile. This is because the other-regarding preference aims to maximize the payoff of neighbors, which defection cannot achieve. When faced with $C_0$ or $C_1$, an other-regarding defector $D_1$ easily transforms into $C_0$ or $C_1$, while the reverse process is difficult. When faced with $D_0$, the transformations follow a neutral drift, but before the cooperators die out, $C_0$ or $C_1$ players can beat most $D_1$ neighbors, leading to a larger initial fraction of $D_0$ and hence a higher fixation probability for $D_0$.
The phase diagram of Fig. 2 also shows that the $C_1 + D_0$ phase (light green) crosses the $r_0 = r_1$ diagonal, marked by a white dashed line. This means that other-regarding preferences can beat self-regarding ones even if the latter have a higher productivity. This remarkable result answers our original question: the other-regarding preference can emerge even in the presence of self-regarding players, because it better stabilizes the coexistence with defection, especially in the low-productivity regime.
The phase diagram also reveals a new phase near the $r_0 = r_1$ diagonal, where $r_0$ slightly exceeds $r_1$ and $C_0$, $C_1$, and $D_0$ coexist. To take a closer look at the phase transitions, we present some cross-sections of the phase diagram in Fig. 3. Here, the stationary $\rho_{C_0}$, $\rho_{C_1}$, and $\rho_{D_0}$ are shown as functions of $r_1$ or $r_0$. The top panels show two representative horizontal cross-sections. When the self-regarding productivity $r_0$ is high enough to allow the emergence of self-regarding cooperation, but not sufficient to exclude defection, two scenarios can be observed. Panel (a) shows the case when $r_0$ is relatively small ($r_0 = 3.8$). This productivity level allows self-regarding cooperation to coexist with defection. If we increase the other-regarding productivity $r_1$, then $C_1$ players replace $C_0$ players via a discontinuous phase transition. Notably, the fraction of $C_1$ players just beyond the transition point is significantly higher than the fraction of $C_0$ players just before it, which is of great importance, as we explain later. As we increase $r_1$ further in the $C_1 + D_0$ phase, the system behaves like the traditional two-profile model: the fraction of defection decreases gradually, and the system terminates in the $C_1$ phase via a continuous phase transition. Panel (b) illustrates an alternative scenario obtained at $r_0 = 4$, where the self-regarding productivity ensures a relatively high cooperation level in the $C_0 + D_0$ phase. Here, increasing the other-regarding productivity $r_1$ does not lead to a sudden switch to the $C_1 + D_0$ solution. Instead, a larger $r_1$ allows $C_1$ players to form a solution with the other two profiles. Further increasing $r_1$ supports $C_1$ even more, and the system terminates in the full $C_1$ phase via a continuous phase transition.

The bottom panels of Fig. 3 present vertical cross-sections of the phase diagram. We can see a seemingly counter-intuitive phenomenon: a higher synergy does not necessarily result in a higher cooperation level. For example, in Fig. 3(c), obtained at $r_1 = 3$, the fraction of defectors suddenly jumps as $r_0$ exceeds a critical value, and the system enters the $C_0 + D_0$ phase. Naturally, a further increase in $r_0$ within this two-profile phase results in a decay of $\rho_{D_0}$, and the system finally reaches the full $C_0$ phase via a continuous phase transition. Similarly, at $r_1 = 3.5$ [Fig. 3(d)], as $r_0$ increases, the system exits the $C_1$ phase and enters the $C_0 + C_1 + D_0$ phase, where the defection level gradually increases with $r_0$. Further increasing $r_0$ leads to the $C_0 + D_0$ phase, where the system behaves as discussed for panel (c). The explanation is as follows. A high self-regarding productivity prevents the reproduction of other-regarding cooperation with low productivity: self-regarding cooperators with high productivity bring higher payoffs to their neighbors, so other-regarding cooperators transform into them. However, other-regarding cooperation suppresses defection better: with a low other-regarding productivity, cooperation can flourish more than with a high self-regarding productivity. If self-regarding cooperation has a high productivity, it prevents other-regarding cooperation from utilizing this ability, thus reducing the fraction of cooperation in the system. Therefore, to promote cooperation, self-regarding cooperators should sometimes keep their productivity low to make way for other-regarding cooperation.

Next, we discuss the dynamics in the previously mentioned three-profile phase. Using the representative parameter combination $r_0 = 4.0$, $r_1 = 3.5$, we present the time evolution of the different profiles in Fig. 4(a). It indicates that $C_0$, $C_1$, and $D_0$ players form a stable solution after $D_1$ players die out. To stress the difference between the initial evolutionary state and the final stationary state, we use a semi-log plot in this panel. Panel (b) shows a representative snapshot of the pattern formation in the stationary state. Although it is taken at a relatively small system size, $L \times L = 100 \times 100$, it serves our goal of presenting the critical elements of the invasion process in detail. The profiles are color-coded as shown on the right-hand side. First, each of the $C_0 + D_0$ and $C_1 + D_0$ phases, marked by ellipses, would be a stable solution in the absence of the other at these parameter values. They are based on the network reciprocity mechanism observed by Nowak and May [22]. It is crucial, however, that the fractions of defection differ significantly in the domains controlled by self- and other-regarding cooperation. Since other-regarding cooperation suppresses defection better even with the low productivity $r_1 = 3.5$, only tiny "cracks" of defection can survive in the $C_1 + D_0$ regions. In contrast, in the $C_0 + D_0$ regions, where self-regarding cooperation has the relatively high productivity $r_0 = 4.0$, defectors can exploit $C_0$ players, opening larger "cracks." Other-regarding cooperators can enter these larger defection cracks and reduce them to the small cracks typical of $C_1 + D_0$ domains, which is how $C_1 + D_0$ regions invade $C_0 + D_0$ regions. This process is shown by white arrows in Fig. 4(b). We may say that the $D_0$ profile acts as a "Trojan horse" when $C_1 + D_0$ invades the area of $C_0 + D_0$. Conversely, the way $C_0 + D_0$ regions invade $C_1 + D_0$ is more straightforward: self-regarding cooperators $C_0$, with their higher productivity $r_0$, directly beat other-regarding $C_1$ players, whose productivity $r_1$ is lower. This process is shown by a black arrow in Fig. 4(b). After $C_0$ agents invade $C_1$, they are unable to suppress the small cracks of defection in the former $C_1 + D_0$ regions. The uncontrolled defection expands and finally reaches a balance with self-regarding cooperation, where the cracks of defection are large enough to, in turn, open up an opportunity for the invasion of $C_1$. The process is a continuous loop, forming a "dynamic equilibrium," as shown in the mini diagram below Fig. 4(b).
In the above explanation of how the three-profile solution emerges, a fundamental point is that self- and other-regarding cooperators are not equally successful in suppressing defectors. $C_1$ players do it better, and the difference between the fractions of $D_0$ players in these domains is the driving force behind the new three-profile solution. This argument can be easily verified if we reduce the effectiveness of the other-regarding preference and explore the phase diagram again. If we decrease the other-regarding rate to $u = 0.5$, the difference between self- and other-regarding preferences becomes smaller. In this case, the other-regarding preference is not completely other-regarding but rather a middle ground between completely self-regarding and completely other-regarding. The corresponding phase diagram on the $r_0$-$r_1$ plane is shown in Fig. 5. Compared to Fig. 2, the changes are clear and confirm our expectations. First, the critical $r_1$ separating the full defection and two-profile phases shifts to a higher value, indicating that $C_1$ becomes weaker against defection. As a result, the distribution of self- and other-regarding phases is more symmetric about the diagonal (white dashed line) in Fig. 5. The $C_1 + D_0$ phase (light green) still crosses the $r_0 = r_1$ diagonal, but not as extensively as for $u = 1$, which also agrees with our expectation. We also find that the region of the $C_0 + C_1 + D_0$ solution on the parameter plane is significantly reduced at $u = 0.5$. When self- and other-regarding preferences are less distinct, the new solution, which relies on their difference, has a smaller chance to emerge.

Well-mixed populations
General evolutionary game dynamics under arbitrary selection noise have been proved not to admit simple analytical solutions in structured populations [56], especially under endogenous, behavior-dependent learning rules such as ours. However, mathematical results are essential for exploring the robustness of our observations. To obtain them, we consider an infinite, well-mixed population, which is analytically tractable.
In a well-mixed population, an individual's co-players are randomly selected from the population each time. Individuals interact with these random co-players, playing games and updating profiles. The calculation of the self-regarding payoff is the same as in the traditional model. The other-regarding payoff follows our model setting and is obtained by averaging the self-regarding payoff that an individual brings to its co-players. Different profiles then transform into each other at rate $P_0$ or $P_1$, in the same way as in structured populations. Using the methods of replicator dynamics, we obtain the dynamical equations for the frequencies of the different profiles (Eq. (A.11) in Appendix A). Solving for the equilibrium points and analyzing their stability (Appendix A.4) yields theoretical phase diagrams on the parameter planes. We have proved that the phase boundaries are independent of the selection noise $\kappa$.
The phase diagram on the $r_0$-$r_1$ plane for an infinite, well-mixed population is shown in Fig. 6(a), where we set $u = 1$. The presented phases are derived from the parameter ranges of the different stable equilibrium points. In the $(D_0)_{D_0+D_1}$ phase, the parameters satisfy $r_0 < G$ and $r_1 < (1 - u/2)G$. The system may equilibrate in a state where $D_0$ and $D_1$ coexist, but a hypothetical appearance of $C_0$ or $C_1$ leads to the ultimate extinction of $D_1$ and a full $D_0$ state. In the $C_0$ phase, only $C_0$ survives, and the boundaries are $r_0 > G$ and $r_0 > r_1$. In the $C_1$ phase, only $C_1$ exists, and the phase boundaries are $r_1 > (1 - u/2)G$ and $r_1 > r_0$. In the $C_0 + C_1 + D_0$ phase, the three profiles $C_0$, $C_1$, and $D_0$ form a dynamic relation in the parameter range $r_0 < G$, $r_1 > (1 - u/2)G$, and $r_1 < r_0$. These analytical phase boundaries are also marked in Fig. 6(a), and the phase diagram is qualitatively consistent with the previous simulations in structured populations. The effectiveness of $C_1$ players against defection and the emergence of the three-profile solution along the diagonal of the $r_0$-$r_1$ parameter plane are thus robust and generally valid.
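The analytical boundaries listed above can be restated as a small Python classifier. This is only a transcription of the quoted inequalities, with $G = 5$ as a default, not a solution of the replicator dynamics (A.11); the function name is hypothetical, and parameter points lying exactly on a boundary are reported separately.

```python
def well_mixed_phase(r0, r1, u, G=5):
    """Stable phase implied by the analytical boundaries quoted in the text."""
    b1 = (1 - u / 2) * G                 # critical r1 for other-regarding cooperation
    if r0 < G and r1 < b1:
        return "(D0)_{D0+D1}"
    if r0 > G and r0 > r1:
        return "C0"
    if r1 > b1 and r1 > r0:
        return "C1"
    if r0 < G and b1 < r1 < r0:
        return "C0+C1+D0"
    return "boundary"                    # exactly on one of the phase boundaries


print(well_mixed_phase(4.0, 3.5, u=1))    # C0+C1+D0
print(well_mixed_phase(4.0, 3.5, u=0.5))  # (D0)_{D0+D1}
```

The same point $(r_0, r_1) = (4.0, 3.5)$ thus lies in the $C_0 + C_1 + D_0$ phase at $u = 1$ but in the defection phase at $u = 0.5$, because the boundary $(1 - u/2)G$ moves from 2.5 to 3.75.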
Figure 6(b) shows a typical time evolution of the profile frequencies $\rho_{C_0}$, $\rho_{C_1}$, $\rho_{D_0}$, and $\rho_{D_1}$ in the $C_0 + C_1 + D_0$ phase. As analyzed in Appendix A.4, none of the equilibrium points is stable in this phase, and the system state periodically oscillates around the $C_0 + C_1 + D_0$ equilibrium. In structured populations, we previously described this phase as a dynamic equilibrium; here, we provide a theoretical understanding of it in well-mixed populations. It is also worth noting that the time evolution varies with the selection noise $\kappa$, but the parameter region over which the periodic oscillation occurs (delineated by the phase boundaries) is still independent of $\kappa$.
To complete our study, we show the corresponding phase diagram at $u = 0.5$ in Fig. 6(c). The dynamic $C_0 + C_1 + D_0$ phase occupies a smaller region of the parameter space, similar to the phenomenon revealed in structured populations by Fig. 5. Here, the theoretical phase boundary $r_1 = (1 - u/2)G$ in well-mixed populations shifts from $r_1 = G/2$ (at $u = 1$) to $r_1 = 3G/4$ (at $u = 0.5$), thus shrinking the area of the $C_0 + C_1 + D_0$ phase.

Conclusion
Understanding the emergence of cooperation is a fundamental challenge across various scientific disciplines, from social sciences to biology. The most complex explanations involve individuals with cognitive abilities. In such scenarios, solutions may rely on terms and concepts emerging from a long social evolutionary process. Concepts like morals or reputation, which undoubtedly support cooperation, require learning by individuals. The preference to prioritize collective benefits over individual outcomes also represents a sophisticated concept resulting from social learning processes. The primary goal of this study is to demonstrate that such complex behaviors and preferences might evolve spontaneously without the need for additional assumptions.
We utilize a four-profile model, allowing players to choose their worldview. Specifically, when deciding on behavior changes, they can opt for a traditional self-regarding preference, focusing on individual payoff, or an other-regarding preference that aims to optimize the income of others. Unlike approaches that rely on reputation or morals, our model makes no assumptions about the inherent value of these options, avoiding direct support for cooperation or other-regarding preferences. Instead, we let these concepts compete across the diverse parameters of a public goods game. We define the combination of basic strategy and preference as a "profile" and track the evolution of these four profiles. Notably, our model does not confine itself to the unrealistic scenario where different preferences operate with identical productivity. By introducing two productivity factors, we explore an expanded model where either self- or other-regarding cooperation is more productive.
Our key findings indicate that other-regarding preferences can emerge and dominate spontaneously, even when self-regarding cooperation is more productive. This suggests that individual and collective interests are not inherently in conflict. An other-regarding player disregards personal income, focusing instead on the welfare of other group members. Yet, in the end, all participants, including the focal individual, achieve higher payoffs than they would under a self-regarding preference. In this way, our simple model is capable of explaining the real-life observations that strongly unselfish acts among competitors could be favored by evolution [38,39].
Another intriguing observation is the emergence of a new phase, where self- and other-regarding cooperation coexist with self-regarding defection. This occurs when the productivity of self-regarding cooperation marginally surpasses that of other-regarding cooperation, leading to a dynamic equilibrium where both classic solutions engage in ongoing conflict, resulting in stable fractions of the three profiles. This dynamic interaction relies on the higher fraction of defection in the domains of self-regarding cooperators, which allows other-regarding cooperation to exploit the prevalence of self-regarding defection at the interface. Conversely, owing to its greater productivity, the self-regarding cooperation profile directly outcompetes the other-regarding one, leading to the reverse process. The robustness of this phenomenon has been confirmed, even in well-mixed populations, through analytical calculations using replicator dynamics.
These results align with research avenues that do not presuppose cooperation-supporting incentives to address the foundational question posed at the start of this paper. Instead, granting individuals the freedom to choose their preferences not only offers a novel approach to the puzzle but also promises broader applicability.
The calculation of other-regarding payoffs is slightly more laborious. According to Sec. 2, the other-regarding payoff of a profile $X$ is the average self-regarding payoff that $X$ brings to its co-players. Similar to Eq. (2) in the main text, we can use the following equation to calculate the other-regarding payoff $\pi_X^{(1)}(g)$ of profile $X$ in a game with co-player configuration $g$:

$$\pi_X^{(1)}(g) = \frac{1}{G-1}\sum_{Y \in S} g_Y\, \pi_Y^{(0)}\!\left(g - e_Y + e_X\right), \tag{A.2}$$

where $g_Y$ is the number of co-players with profile $Y$ in $g$, and $g - e_Y + e_X$ denotes the configuration $g$ with one fewer $Y$ and one more $X$. Intuitively, in $X$'s co-player configuration $g$, the co-player configuration of a co-player $Y$ in the same group can be expressed by modifying $g$, counting one less $Y$ and one more $X$. Then, $\sum_{Y \in S} g_Y\, \pi_Y^{(0)}(g - e_Y + e_X)$ is the total self-regarding payoff of all co-players in $X$'s co-player configuration $g$, and the average is obtained by dividing by $G - 1$.
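Applying the general Eq. (A.2) to $X = C_0$, we can calculate the other-regarding payoff of profile $C_0$ in a single game:

$$\pi_{C_0}^{(1)}(g) = \frac{1}{G-1}\left[\left(g_{C_0} + g_{C_1}\right)\left(\frac{c\,[r_0(g_{C_0}+1) + r_1 g_{C_1}]}{G} - c\right) + \left(g_{D_0} + g_{D_1}\right)\frac{c\,[r_0(g_{C_0}+1) + r_1 g_{C_1}]}{G}\right] = \frac{c\,[r_0(g_{C_0}+1) + r_1 g_{C_1}]}{G} - \frac{c\,(g_{C_0} + g_{C_1})}{G-1}. \tag{A.3a}$$

In Eq. (A.3a), we have utilized the property $g_{C_0} + g_{C_1} + g_{D_0} + g_{D_1} = G - 1$. Similarly, we can calculate the other-regarding payoffs of $C_1$, $D_0$, and $D_1$ in a single game:

$$\pi_{C_1}^{(1)}(g) = \frac{c\,[r_0 g_{C_0} + r_1(g_{C_1}+1)]}{G} - \frac{c\,(g_{C_0} + g_{C_1})}{G-1},$$

$$\pi_{D_0}^{(1)}(g) = \pi_{D_1}^{(1)}(g) = \frac{c\,(r_0 g_{C_0} + r_1 g_{C_1})}{G} - \frac{c\,(g_{C_0} + g_{C_1})}{G-1}.$$

• Rescaled payoff considered by other-regarding preferences

The rescaled other-regarding payoff is the weighting between self- and other-regarding payoffs, as shown in Eq. (3). Applying Eqs. (A.1) and (A.3), the rescaled other-regarding payoff $\tilde{\pi}_X^{(1)}(g)$ for profile $X \in S$ in a single game with co-players $g$ can be written as

$$\tilde{\pi}_X^{(1)}(g) = (1-u)\,\pi_X^{(0)}(g) + u\,\pi_X^{(1)}(g).$$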
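The closed forms of the single-game other-regarding payoffs can be checked against the defining sum (A.2) by brute force. The Python sketch below assumes the single-game self-regarding payoff implied by Eq. (1), namely $c\,(r_0 n_{C_0} + r_1 n_{C_1})/G$ minus $c$ for cooperators, where $n_{C_0}$ and $n_{C_1}$ count the cooperators in the group including the focal player. The helper names are hypothetical, and exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import product

PROFILES = ("C0", "C1", "D0", "D1")


def self_payoff(X, g, r0, r1, c, G):
    """Single-game self-regarding payoff of X facing co-player counts g (assumed from Eq. (1))."""
    n_c0 = g["C0"] + (X == "C0")           # cooperators in the group, focal player included
    n_c1 = g["C1"] + (X == "C1")
    cost = c if X in ("C0", "C1") else 0
    return c * (r0 * n_c0 + r1 * n_c1) / G - cost


def other_payoff(X, g, r0, r1, c, G):
    """Eq. (A.2): average self-regarding payoff that X gives its G - 1 co-players."""
    total = 0
    for Y in PROFILES:
        if g[Y] == 0:
            continue
        h = dict(g)
        h[Y] -= 1                          # a co-player Y sees one fewer Y ...
        h[X] += 1                          # ... and the focal X in its place
        total += g[Y] * self_payoff(Y, h, r0, r1, c, G)
    return total / (G - 1)


def other_payoff_closed(X, g, r0, r1, c, G):
    """Closed forms (A.3): X's own pool share minus c (g_C0 + g_C1) / (G - 1)."""
    n_c0 = g["C0"] + (X == "C0")
    n_c1 = g["C1"] + (X == "C1")
    return c * (r0 * n_c0 + r1 * n_c1) / G - Fraction(c * (g["C0"] + g["C1"]), G - 1)


def rescaled_payoff(X, g, r0, r1, u, c, G):
    """(1 - u) * own payoff + u * other-regarding payoff, as in Eq. (3)."""
    return (1 - u) * self_payoff(X, g, r0, r1, c, G) + u * other_payoff(X, g, r0, r1, c, G)


def configurations(G):
    """All co-player count vectors g with g_C0 + g_C1 + g_D0 + g_D1 = G - 1."""
    for a, b, d in product(range(G), repeat=3):
        if a + b + d <= G - 1:
            yield {"C0": a, "C1": b, "D0": d, "D1": G - 1 - a - b - d}


def closed_forms_hold(r0, r1, c=1, G=5):
    """True if (A.2) and the closed forms agree for every profile and configuration."""
    r0, r1 = Fraction(r0), Fraction(r1)    # exact arithmetic, no rounding
    return all(other_payoff(X, g, r0, r1, c, G) == other_payoff_closed(X, g, r0, r1, c, G)
               for g in configurations(G) for X in PROFILES)


print(closed_forms_hold(4, Fraction(7, 2)))  # True
```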

Statistical mean payoff
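Given the payoff expressions in a single game, we can further calculate the statistical mean payoff of each profile, resulting from the multiple games in which a player participates each time. In a well-mixed population, the $G - 1$ co-players of a focal player are randomly selected from the population. Therefore, we introduce the following function, which calculates the statistical mean value of a function $f(g)$ over all possible configurations $g$ randomly drawn from an infinite, well-mixed population:

$$\langle f \rangle = \sum_{g:\ \sum_{Y \in S} g_Y = G-1} \frac{(G-1)!}{\prod_{Y \in S} g_Y!}\,\prod_{Y \in S} \rho_Y^{g_Y}\, f(g). \tag{A.6}$$

We can apply Eq. (A.6) to calculate the statistical mean self-regarding payoff of profile $C_0$:

$$\langle \pi_{C_0}^{(0)} \rangle = \frac{c\,\{r_0[(G-1)\rho_{C_0} + 1] + r_1 (G-1)\rho_{C_1}\}}{G} - c.$$

Similarly, we can apply Eqs. (A.1b) and (A.1c) to Eq. (A.6) to calculate the statistical mean self-regarding payoffs of $C_1$, $D_0$, and $D_1$:

$$\langle \pi_{C_1}^{(0)} \rangle = \frac{c\,\{r_0 (G-1)\rho_{C_0} + r_1[(G-1)\rho_{C_1} + 1]\}}{G} - c, \qquad \langle \pi_{D_0}^{(0)} \rangle = \langle \pi_{D_1}^{(0)} \rangle = \frac{c\,(G-1)\left(r_0\rho_{C_0} + r_1\rho_{C_1}\right)}{G}.$$

We can also use Eq. (A.6) to calculate the statistical mean payoff that a profile brings to its co-players. The ones that we need are the rescaled other-regarding payoffs.

points, we can compute the Jacobian matrix: (A.12)

If all eigenvalues of the Jacobian have negative real parts, the equilibrium point is (locally asymptotically) stable; if at least one eigenvalue has a positive real part, it is unstable. Below, we list and discuss each equilibrium after solving $\dot{\rho}_{C_0} = 0$, $\dot{\rho}_{C_1} = 0$, and $\dot{\rho}_{D_0} = 0$. For simplicity, after checking their existence, we only show the existing ones here.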
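Eq. (A.6) is a multinomial average, which can be sketched directly. The Python snippet below uses hypothetical helper names and exact rational arithmetic; it assumes the single-game $C_0$ payoff implied by Eq. (1), enumerates all co-player configurations for $G = 5$, and checks that, because single-game payoffs are linear in $g$, the mean reduces to evaluating them at $\langle g_Y \rangle = (G-1)\rho_Y$.

```python
from fractions import Fraction
from itertools import product
from math import factorial

PROFILES = ("C0", "C1", "D0", "D1")


def configurations(G):
    """All co-player count vectors g with g_C0 + g_C1 + g_D0 + g_D1 = G - 1."""
    for a, b, d in product(range(G), repeat=3):
        if a + b + d <= G - 1:
            yield {"C0": a, "C1": b, "D0": d, "D1": G - 1 - a - b - d}


def probability(g, rho):
    """Multinomial probability of drawing configuration g from frequencies rho."""
    weight = Fraction(factorial(sum(g.values())))
    for Y in PROFILES:
        weight *= Fraction(rho[Y]) ** g[Y] / factorial(g[Y])
    return weight


def mean(f, rho, G=5):
    """Eq. (A.6): expectation of f(g) over random co-player configurations."""
    return sum(probability(g, rho) * f(g) for g in configurations(G))


def pi0_C0(g, r0, r1, c=1, G=5):
    """Single-game self-regarding payoff of a C0 player (assumed from Eq. (1))."""
    return Fraction(c) * (r0 * (g["C0"] + 1) + r1 * g["C1"]) / G - c


rho = {"C0": Fraction(1, 5), "C1": Fraction(3, 10), "D0": Fraction(2, 5), "D1": Fraction(1, 10)}
r0, r1 = Fraction(4), Fraction(7, 2)
closed = (r0 * (4 * rho["C0"] + 1) + r1 * 4 * rho["C1"]) / 5 - 1   # G - 1 = 4, c = 1
print(mean(lambda g: pi0_C0(g, r0, r1), rho) == closed)           # True
```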
• The $C_0$ equilibrium: $\rho^* = (1, 0, 0, 0)$. One solution is $\rho_{C_0} = 1$, where profile $C_0$ dominates. Substituting $\rho^* = (1, 0, 0, 0)$ into Eq. (A.12) gives the Jacobian matrix at this equilibrium point.

Another solution is $\rho^* = (0, 0, \rho^*_{D_0}, 1 - \rho^*_{D_0})$, where $0 \le \rho^*_{D_0} \le 1$. This solution is not a point but a line: $D_0$ and $D_1$ can coexist in any proportion along this equilibrium line. To study this type of equilibrium, we treat the two variables on the equilibrium line as a whole, as in previous work [57]. That is, the system has three variables: $C_0$, $C_1$, and $D_0 + D_1$. Because of the constraint $\rho_{C_0} + \rho_{C_1} + (\rho_{D_0} + \rho_{D_1}) = 1$, the number of degrees of freedom reduces to 2. Therefore, the system can be described by $\dot{\rho}_{C_0}$ and $\dot{\rho}_{C_1}$ alone, as given by Eqs. (A.11a) and (A.11b), and the Jacobian matrix of this reduced system is evaluated at this equilibrium.
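The stability test for each equilibrium amounts to computing the Jacobian of the (reduced) dynamics and inspecting the real parts of its eigenvalues. Since the replicator equations (A.11) are not reproduced here, the Python sketch below uses toy two-variable vector fields as hypothetical stand-ins, and a central-difference Jacobian as an illustrative substitute for the analytical Eq. (A.12). The third toy field has a whole line of equilibria, which produces a zero eigenvalue along the line; linearization is then inconclusive in that direction, which is consistent with analyzing the $D_0$/$D_1$ line after merging the neutral variables into one.

```python
import numpy as np


def numerical_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of the vector field f at point x."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    jac = np.empty((fx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        jac[:, j] = (np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2.0 * h)
    return jac


def classify(jac, tol=1e-7):
    """Linear stability from the real parts of the Jacobian's eigenvalues."""
    re = np.linalg.eigvals(jac).real
    if np.all(re < -tol):
        return "stable"        # every small perturbation decays
    if np.any(re > tol):
        return "unstable"      # some small perturbation grows
    return "marginal"          # zero real part: linearization is inconclusive


# Toy 2-variable fields (NOT the paper's Eqs. (A.11)), each evaluated at the origin.
def sink(z):
    return np.array([-z[0] + z[0] * z[1], -2.0 * z[1]])


def saddle(z):
    return np.array([z[0], -z[1]])


def line_of_equilibria(z):
    # Every point (0, z1) is an equilibrium, loosely analogous to the D0/D1 line.
    return np.array([-z[0], 0.0])


if __name__ == "__main__":
    for field in (sink, saddle, line_of_equilibria):
        print(field.__name__, classify(numerical_jacobian(field, [0.0, 0.0])))
```

The eigenvalue criterion is used rather than a definiteness test because the Jacobian of replicator-type dynamics is generally not symmetric.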

Figure 1. Panel (a): phase diagram on the parameter plane of productivity factor $r_0 = r_1 \equiv r$ and other-regarding rate $u$. The full defection and full cooperation phases are separated by a mixed phase, conceptually similar to the traditional PGG model. Panel (b): the fraction of cooperators $\rho_C$ as a function of productivity $r_0 = r_1 \equiv r$ for different other-regarding rates $u$. Increasing the other-regarding rate $u$ promotes overall cooperation.

Figure 2. Phase diagram on the parameter plane of self-regarding productivity ($r_0$) and other-regarding productivity ($r_1$). The other-regarding rate is $u = 1$. The white dashed line marks $r_0 = r_1$. When $r_0$ and $r_1$ differ substantially, the pattern of simplified two-profile models is recovered. Interestingly, other-regarding cooperation can outcompete self-regarding cooperation even if $r_0 > r_1$. Furthermore, when $r_0$ slightly exceeds $r_1$, a new phase composed of $C_0$, $C_1$, and $D_0$ emerges. The nature of this solution is discussed in the main text.

Figure 3. Horizontal and vertical cross-sections of the phase diagram of Fig. 2, showing the stationary fractions of profiles. Panel (a): fractions as a function of $r_1$ at $r_0 = 3.8$. A discontinuous phase transition between the $C_0 + D_0$ and $C_1 + D_0$ phases is followed by a continuous phase transition to the $C_1$ phase. Panel (b): fractions as a function of $r_1$ at $r_0 = 4$. The $C_0 + D_0$ phase is replaced by the $C_0 + C_1 + D_0$ phase, which is followed by the full $C_1$ phase; both phase transitions are continuous. Panel (c): fractions as a function of $r_0$ at $r_1 = 3$. The transition between the $C_1 + D_0$ and $C_0 + D_0$ phases is discontinuous. Panel (d): fractions as a function of $r_0$ at $r_1 = 3.5$. The fraction of defectors gradually increases through the $C_0 + C_1 + D_0$ phase as $r_0$ increases.

Figure 4. Two solutions forming a new one. (a) When $D_1$ dies out, the remaining three profiles form a stationary solution, as their time evolution suggests. (b) A typical snapshot of the dynamic equilibrium in the $C_0 + C_1 + D_0$ phase on a $100 \times 100$ square lattice. The $C_0 + D_0$ and $C_1 + D_0$ solutions, marked by ellipses, are each stationary on their own. However, the two solutions permanently invade each other. The fraction of $D_0$ players differs between these solutions. $C_1$ suppresses $D_0$ more effectively even with lower productivity, and thus squeezes into the large defector-filled cracks of $C_0 + D_0$ regions, transforming $C_0 + D_0$ into $C_1 + D_0$, as shown by white arrows. Owing to its higher productivity, $C_0$ invades $C_1$, transforming $C_1 + D_0$ into $C_0 + D_0$, as shown by a black arrow at the top. A schematic diagram of these processes is shown at the bottom, where arrows indicate the direction of invasion. Parameters: $r_0 = 4.0$, $r_1 = 3.5$, $u = 1$.

Figure 5. Phase diagram on the $r_0$-$r_1$ parameter plane obtained at other-regarding rate $u = 0.5$. The $C_0 + C_1 + D_0$ phase still exists but is significantly smaller. The white dashed diagonal marks $r_0 = r_1$.

Figure 6. Panels (a) and (c): phase diagrams with respect to self-regarding productivity $r_0$ and other-regarding productivity $r_1$ in an infinite well-mixed population, valid for any selection noise $\kappa$. Panel (a): $u = 1$. Panel (c): $u = 0.5$. Panel (b): a typical time evolution of the system state in the $C_0 + C_1 + D_0$ phase, with parameters $u = 1$, $\kappa = 0.1$, $r_0 = 3.5$, $r_1 = 3.0$.