Evolution of cooperation driven by sampling reward

A social dilemma implies that individuals choose the defection strategy to maximize their individual gains. Reward is a powerful motivator for promoting the evolution of cooperation and thus addressing the social dilemma. Nevertheless, it is costly, since all participants in the game need to be monitored. Inspired by these observations, we here propose an inexpensive protocol, a so-called sampling reward mechanism, and apply it to social dilemmas, including the public goods game and the collective-risk social dilemma. More precisely, the actual usage of the reward depends on the portion of cooperators in the sample. We show that the average cooperation level can be effectively improved under a high reward threshold and a high reward intensity, albeit at the expense of a higher reward cost. Intriguingly, for the latter aspect there is a critical threshold beyond which further increases in reward intensity have no significant effect on improving the cooperation level. Moreover, we find that a small sample size favors the evolution of cooperation, while an intermediate sample size always results in a lower reward cost. We also demonstrate that our findings are robust and remain valid for both types of social dilemmas.

Yet from a theoretical perspective, the introduction of institutional incentives is a feasible means for promoting the evolution of cooperation. Sasaki and Unemi showed that institutional reward can help maintain cooperation [42]. Apart from this, the combination of institutional incentives and scale has drawn a lot of interest. Institutional punishment on both a local and a global scale was explored by Vasconcelos et al [43]. In particular, Xiao et al investigated the sampling punishment mechanism, which involves different scales of sanctioning for the population. They found that it can help improve the cooperation level under a low punishment threshold and a small sample size when the punishment intensity is considerable [44].
However, in addition to sampling punishment, there is also sampling reward in our daily life. For instance, the government may reward regions or enterprises for their efforts in biodiversity conservation, which is often measured through sampling [45]. Yet there is little research on sampling reward, and it is not clear how it affects the evolution of cooperation. Moreover, the differences between its effects and those of sampling punishment are also unclear. Therefore, it is of great theoretical significance to study the sampling reward mechanism.
Here, we introduce a sampling reward mechanism into the PGG and the CRD, respectively. We select a sample of individuals from the population after each round of game interaction. Each cooperator in the population will receive a reward depending on the number of cooperators in the sample. That is to say, if the portion of cooperators in the sample is below a certain threshold, players who contribute to the common pool will receive a reward. Otherwise, there are no rewards for cooperators. We then use a Markov chain to study the evolutionary dynamics of cooperation in a finite, well-mixed population. The results show that a high reward threshold, a small sample size, and a strong reward intensity are beneficial for improving the cooperation level. What is more, a high reward threshold and a small reward intensity cause a higher reward probability for the reward institution, but the sample size has no appreciable effect on the reward probability. Additionally, a high reward threshold and a strong reward intensity are always associated with a high incentive cost. More intriguingly, we find that the cooperation level is significantly influenced by the reward intensity, because the reward works effectively once its intensity exceeds a critical level. Above this level, further increasing the reward intensity is not worthwhile. We emphasize that the aforementioned findings can be observed both in the PGG and the CRD, which also demonstrates the robustness of our conclusions.

Sampling reward in PGG
We consider the PGG played in a finite, well-mixed population of Z individuals. At each time step, a group of N players is selected to participate in the PGG. Each player has the opportunity to act as a cooperator or a defector. A cooperator contributes a fixed cost c (c > 0) to the common pool while a defector contributes nothing. Then, all contributions to the common pool are multiplied by a synergy factor r (1 < r < N) and equally distributed among the N individuals in the group. In the absence of additional incentives, a rational player would choose to defect when r < N [7]. We would like to point out that r is inversely proportional to the dilemma strength in evolutionary games, which is an important parameter in social dilemmas [46]. Without loss of generality, we set r = 3.0 in this paper.
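As a numerical cross-check, the baseline group payoffs described above can be sketched as follows (a minimal illustration; the function name `pgg_payoffs` is ours, not from the paper):

```python
def pgg_payoffs(j_C, N, r, c):
    """Baseline PGG payoffs (without any reward) in a group that contains
    j_C contributing cooperators: the pot c*j_C is multiplied by the synergy
    factor r and split equally among the N group members; cooperators
    additionally bear the contribution cost c."""
    share = r * c * j_C / N          # equal share of the multiplied pot
    return share - c, share          # (cooperator payoff, defector payoff)
```

With r = 3.0, N = 5, and c = 1, a lone defector among four cooperators earns 2.4, more than the 2 earned by each member of a fully cooperative group, which is precisely the dilemma.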
We then introduce an institutional reward given by an external institution in the PGG. After each round of the game, the mentioned institution randomly selects M_0 (0 < M_0 ⩽ Z) individuals from the whole population. When n_C cooperators are found among them and n_C ⩽ M_0 α, each cooperator in the population will receive a reward of ω (ω > 0) from the institution; otherwise cooperators are not rewarded. Here, α (0 ⩽ α ⩽ 1) represents the reward threshold. In particular, when α = 1, all cooperators in the population will be rewarded regardless of the sampling result. Conversely, when α = 0, no cooperator will receive a reward since the reward threshold is too low to reward anyone. Thus, the payoffs of cooperators (C) and defectors (D) in a group with j_C cooperators can be written as

π_C(j_C) = r c j_C / N − c + ω ∆(i),
π_D(j_C) = r c j_C / N.

Here ∆(i) represents the reward probability when the population configuration is in the presence of i_C cooperators and i_D (i_D = Z − i_C) defectors, and it can be written as

∆(i) = Σ_{n_C = 0}^{M_0} [ C(i_C, n_C) C(Z − i_C, M_0 − n_C) / C(Z, M_0) ] Θ(M_0 α − n_C),

where C(n, k) denotes the binomial coefficient. In addition, Θ(k) is the Heaviside function, that is, Θ(k) = 0 if k < 0 and Θ(k) = 1 otherwise.
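The reward probability ∆(i) is a hypergeometric tail sum, and can be sketched as below (a minimal sketch; the function name `delta` is ours, and the threshold condition n_C ⩽ M_0 α is implemented literally as stated above):

```python
from math import comb

def delta(i_C, Z, M0, alpha):
    """Probability that a sample of M0 individuals, drawn without replacement
    from a population of Z players containing i_C cooperators, holds at most
    M0*alpha cooperators -- i.e. the probability that the reward is issued."""
    total = comb(Z, M0)
    prob = 0.0
    for n_C in range(min(i_C, M0) + 1):
        if n_C <= M0 * alpha:  # Heaviside condition Theta(M0*alpha - n_C)
            prob += comb(i_C, n_C) * comb(Z - i_C, M0 - n_C) / total
    return prob
```

For α = 1 every sampling outcome triggers the reward, so the sum collapses to the full hypergeometric mass and the function returns 1, consistent with the special case discussed above.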

Sampling reward in CRD
We consider a finite, well-mixed population of size Z, where N individuals are randomly chosen to participate in a CRD. Each player has an initial endowment b at the beginning of the game; cooperators contribute c to the common pool while defectors contribute nothing. In particular, a coordination threshold H (0 < H ⩽ N) is required to ensure the benefits of the common pool for everyone. Cooperators contribute to reaching the coordination target. If the number of cooperators reaches the coordination threshold, all participants will retain their endowments. Otherwise, everyone in the group will lose their remaining endowment with the same probability r (0 ⩽ r ⩽ 1), where r marks the risk value [10]. Additionally, when the portion of cooperators in the sample is lower than the reward threshold α, all cooperators in the population will receive a reward of ω. Otherwise, cooperators will not be rewarded.
Accordingly, we can obtain the payoffs of defectors and cooperators within a group of j_C cooperators as

π_D(j_C) = b { Θ(j_C − H) + (1 − r) [1 − Θ(j_C − H)] },
π_C(j_C) = π_D(j_C) − c + ω ∆(i).
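Based on the rules just described, the CRD payoffs can be sketched as follows; here `reward` stands for the expected sampling reward ω∆(i), passed in as a single number, and all names are our own:

```python
def theta(k):
    # Heaviside step: 0 if k < 0, 1 otherwise
    return 0 if k < 0 else 1

def crd_payoffs(j_C, b, c, r, H, reward):
    """Expected payoffs (cooperator, defector) in a CRD group with j_C
    cooperators: if the coordination threshold H is met everyone keeps the
    endowment b, otherwise it is lost with probability r; cooperators also
    pay c and may receive the expected sampling reward."""
    keep = theta(j_C - H)                      # 1 if the target is met
    pi_D = b * (keep + (1 - r) * (1 - keep))   # expected endowment kept
    pi_C = pi_D - c + reward                   # cooperators pay c, may be rewarded
    return pi_C, pi_D
```

With b = 1, c = 0.1, r = 0.1, and H = 2, a group that meets the target pays defectors 1 and cooperators 0.9 plus any reward, reproducing the structure of the equations above.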

Evolutionary dynamics
The evolutionary dynamics of the PGG and the CRD in finite well-mixed populations have already been studied earlier, but here we introduce the sampling reward mechanism. Therefore, the average payoffs of the aforementioned strategies under the configuration i = {i_C, i_D} can be written as

f_C(i) = Σ_{j_C = 0}^{N−1} [ C(i_C − 1, j_C) C(Z − i_C, N − 1 − j_C) / C(Z − 1, N − 1) ] π_C(j_C + 1),
f_D(i) = Σ_{j_C = 0}^{N−1} [ C(i_C, j_C) C(Z − i_C − 1, N − 1 − j_C) / C(Z − 1, N − 1) ] π_D(j_C),

where j = {j_C, j_D} characterizes the numbers of cooperators and defectors in a group of size N. Importantly, we note that the above equations should satisfy j_C ⩽ i_C and j_D ⩽ i_D, since the binomial coefficients satisfy C(n, k) = 0 for k > n. Here we consider a mutation-selection process to study how the strategy distribution evolves. At each time step, a player is randomly selected from the population. With probability µ, the player switches to a randomly selected strategy from the available strategies. With probability 1 − µ, the player selects a role model from the population. Suppose the focal player is X; the probability that X adopts the strategy of the role model Y equals φ = [1 + e^{(f_X − f_Y)/K}]^{−1}, whereas X maintains its strategy with probability 1 − φ [47]. The parameter K corresponds to the amplitude of noise. Without loss of generality, we set K = 0.2, which implies that players tend to choose the better performing strategy.
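The hypergeometric averaging over group compositions can be sketched as below (a sketch relying on the convention C(n, k) = 0 for k > n, which Python's `math.comb` follows; `payoff_C` and `payoff_D` are placeholders for the game-specific payoff functions):

```python
from math import comb

def avg_payoffs(i_C, Z, N, payoff_C, payoff_D):
    """Average payoffs of a cooperator and a defector when the N - 1
    co-players are sampled without replacement from the remaining Z - 1
    individuals of a population with i_C cooperators.  payoff_C(j_C) and
    payoff_D(j_C) give the focal player's payoff in a group containing j_C
    cooperators (including the focal cooperator for payoff_C)."""
    i_D = Z - i_C
    denom = comb(Z - 1, N - 1)
    f_C = sum(comb(i_C - 1, j) * comb(i_D, N - 1 - j) / denom * payoff_C(j + 1)
              for j in range(N)) if i_C > 0 else 0.0
    f_D = sum(comb(i_C, j) * comb(i_D - 1, N - 1 - j) / denom * payoff_D(j)
              for j in range(N)) if i_D > 0 else 0.0
    return f_C, f_D
```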
In this work, we focus on a two-strategy version, and the transition probabilities between C and D can be given as

T^+(i) = (i_D / Z) [ µ/2 + (1 − µ) (i_C / (Z − 1)) (1 + e^{(f_D(i) − f_C(i))/K})^{−1} ],
T^−(i) = (i_C / Z) [ µ/2 + (1 − µ) (i_D / (Z − 1)) (1 + e^{(f_C(i) − f_D(i))/K})^{−1} ].

In the presence of mutations, the population will not be fixed in the two monomorphic states that are full of cooperators or full of defectors. Hence, we use p_i(t) to analyze the prevalence of each configuration at time t, which evolves in time according to the following master equation:

p_i(t + τ) − p_i(t) = Σ_{i′} [ T_{i i′} p_{i′}(t) − T_{i′ i} p_i(t) ].

The equation allows us to compute p_i(t), where T_{i i′} denotes the probability of transferring from configuration i′ to configuration i per unit time τ. Furthermore, we can obtain the stationary distribution by setting the left side of the above equation equal to zero, which corresponds to the eigenvalue 1 of the transition probability matrix [48]. Here, we use p_i to represent the stationary distribution of configuration i.
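A sketch of the resulting birth-death transition matrix and its stationary distribution; we assume here that a mutating player picks each of the two strategies with probability 1/2 (hence the µ/2 term), which the description above leaves implicit, and `f` is a placeholder for the average-payoff function:

```python
import numpy as np

def stationary(Z, mu, K, f):
    """Stationary distribution of the birth-death chain over i_C = 0..Z.
    f(i_C) must return (f_C, f_D), the average payoffs in that configuration."""
    W = np.zeros((Z + 1, Z + 1))
    for i in range(Z + 1):
        fC, fD = f(i)
        phi_DtoC = 1.0 / (1.0 + np.exp((fD - fC) / K))  # D imitates C
        phi_CtoD = 1.0 / (1.0 + np.exp((fC - fD) / K))  # C imitates D
        iC, iD = i, Z - i
        # probability that the chosen player is a defector and turns cooperator
        T_plus = (iD / Z) * (mu / 2 + (1 - mu) * (iC / (Z - 1)) * phi_DtoC)
        # ... and the reverse move
        T_minus = (iC / Z) * (mu / 2 + (1 - mu) * (iD / (Z - 1)) * phi_CtoD)
        if i < Z:
            W[i + 1, i] = T_plus
        if i > 0:
            W[i - 1, i] = T_minus
        W[i, i] = 1 - T_plus - T_minus
    # stationary distribution = eigenvector associated with eigenvalue 1
    vals, vecs = np.linalg.eig(W)
    p = np.real(vecs[:, np.argmax(np.real(vals))])
    return p / p.sum()
```

Under neutral payoffs the chain is symmetric in i_C ↔ Z − i_C, so the stationary distribution must be symmetric as well, which provides a quick sanity check of the construction.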
To better explore the sampling reward mechanism, we also monitor three additional quantities, namely the average cooperation level (η_C), the average reward probability (P_ω), and the average reward cost (C_ω). We use multivariate hypergeometric sampling to get the average proportion of cooperators in the group (a_C(i)) under configuration i. After that, we can calculate the average cooperation level by using the stationary distribution, which can be written as

η_C = Σ_i a_C(i) p_i.

At the same time, by using ∆(i) combined with p_i, we can obtain the two mentioned quantities that characterize the effectiveness of the institutional incentive. In particular, the average reward probability P_ω is given by

P_ω = Σ_i ∆(i) p_i.

Subsequently, the reward cost is a quantity that measures the expense of the sampling reward method. In addition to p_i and ∆(i), we also consider the number of cooperators in configuration i and the reward intensity ω. Hence, C_ω can be calculated as

C_ω = Σ_i i_C ω ∆(i) p_i.

In the following we monitor the above-defined quantities with regard to the key parameters α, M_0, and ω. For a more comprehensive view we calculate them both for the PGG and the CRD.
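The three monitored quantities then follow directly from p_i and ∆(i); in this sketch we take a_C(i) = i_C/Z, the expected group fraction of cooperators under hypergeometric sampling, and the function names are ours:

```python
from math import comb

def quantities(p, Z, M0, alpha, omega):
    """Average cooperation level, reward probability, and reward cost computed
    from a stationary distribution p over configurations i_C = 0..Z."""
    def delta(i_C):  # probability that the sample triggers the reward
        return sum(comb(i_C, n) * comb(Z - i_C, M0 - n) / comb(Z, M0)
                   for n in range(min(i_C, M0) + 1) if n <= M0 * alpha)
    eta_C = sum(p[i] * i / Z for i in range(Z + 1))               # avg cooperation
    P_w = sum(p[i] * delta(i) for i in range(Z + 1))              # avg reward prob.
    C_w = sum(p[i] * delta(i) * i * omega for i in range(Z + 1))  # avg reward cost
    return eta_C, P_w, C_w
```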

Evolutionary dynamics in PGG with sampling reward
To visualize the impact of the sampling reward method on cooperation, it is instructive to compare the average cooperation level η_C in dependence on the reward threshold α, the sample size M_0, and the reward intensity ω, as shown in figure 1. We can see that the cooperation level increases with increasing α for three different values of the sample size. On the contrary, the cooperation level decreases with increasing M_0 for different values of α. Namely, adopting a higher reward threshold as well as a smaller sample size is beneficial to the evolution of cooperation. Moreover, we present how the average cooperation level changes with the reward intensity ω in figure 1(C). It can be seen that the cooperation level increases with increasing ω for three different reward threshold levels. Interestingly, we find that the function is step-like. In other words, the reward becomes efficient only if ω exceeds a critical level. Once ω is above this critical level, a further increase in ω results only in a slight improvement of the cooperation level; hence it is unnecessary to bear extra cost for a tiny improvement. This effect is independent of α, and we can see that the bigger the α, the higher the η_C. These results also suggest that the external institution should pay attention to the reward intensity in designing an optimal reward mechanism.
In order to explore the implementation of the reward more precisely, we show in figure 2 the average reward probability P_ω as a function of the reward threshold α, the sample size M_0, and the reward intensity ω, respectively. We see that the average reward probability always increases with the increasing value of α for different M_0 values in figure 2(A). For the majority of the range of α values, we discover that the change in P_ω is relatively small. However, it has a sharp increment when α approaches 1, which suggests that a big α favors the implementation of the sampling reward. Figure 2(B) presents how M_0 influences the average reward probability for different values of α. As the panel shows, the average reward probability tends to decrease first and then increase as M_0 increases. Notably, the vertical scale is magnified here to make the slight change visible. According to figure 2(C), irrespective of α, the average reward probability remains at 1 when ω is small and then declines rapidly. The reason is that when ω increases, the payoff difference between cooperators and defectors is balanced out and the proportion of cooperators gradually rises. Once the reward threshold is exceeded, the institution stops providing rewards for cooperators.
The reward cost is also an important factor in designing an efficient incentive mechanism. Thus, we plot the average reward cost C_ω in dependence on α, M_0, and ω. Figure 3(A) depicts that the average reward cost monotonically increases with α for different values of M_0. The result is intuitive, since the more generous the reward mechanism, the more likely the cooperators are to be rewarded, which in turn increases both the number of cooperators and the cost of the reward. After that, we present how the reward cost changes with M_0 in figure 3(B), where we find that C_ω first decreases and then slowly increases as M_0 increases. That is, an intermediate sample size is cost-saving. In figure 3(C), we see that C_ω increases monotonically with the increasing value of ω. Namely, a higher reward intensity is always associated with a higher reward cost.

Evolutionary dynamics in CRD with sampling reward
In this section, we also take into account the non-linear scenario of the CRD in order to demonstrate the robustness of our findings achieved for the PGG. We first show how the average cooperation level changes with the key parameters. It can be seen that the cooperation level increases monotonically as α increases, regardless of the value of M_0, which is similar to the behavior that we obtained in the PGG. Namely, a generous reward threshold is beneficial to the evolution of cooperation. Correspondingly, we observe that the effect of the sample size on the cooperation level is slight, since η_C declines more and more slowly as M_0 increases in figure 4(B). This behavior is also consistent with what we find in the PGG. Finally, we present how the average cooperation level depends on the reward intensity in figure 4(C). This trend is in line with that previously observed in the PGG, except that here the jump is gentler, probably attributable to the non-linear characteristics of the CRD.
Next, we consider the changes of the average reward probability. As shown in figure 5(A), the reward probability always increases with increasing α. In particular, when α is small, the increment of P_ω is greater than what is observed in the PGG. This phenomenon may be caused by the coordination threshold already involving a sort of punishment for defectors in the CRD; therefore the additional sampling reward can further improve the evolutionary advantages of cooperators. In figure 5(B), we show the average reward probability as a function of M_0 for different values of α. It can be observed that for each value of α, the reward probability first decreases and then increases with increasing M_0. This characteristic is similar to what we saw in figure 2(B). In addition, we present in figure 5(C) that the average reward probability monotonically declines as a function of ω. We can also see that the higher the α, the higher the P_ω, supporting the idea that a high reward threshold increases the chances of cooperators being rewarded, thereby making it possible for cooperation to flourish.
Lastly, we close our report by revealing how the average reward cost evolves in the CRD in dependence on the key parameters. Figure 6(A) depicts a monotonically increasing trend of the average reward cost as a function of α. This trend is similar to the case of the PGG. Namely, a generous sampling reward mechanism always results in a high reward cost. Regarding the impact of M_0 on the average reward cost, we can see that C_ω first decreases significantly and then changes only very slightly when M_0 exceeds 20 in figure 6(B). What is more, the increasing trend is somewhat subtle compared to the one we observed in figure 3(B), which may also be influenced by the non-linear characteristics of the CRD. Finally, in figure 6(C) we present the average reward cost in dependence on ω. It is interesting to note that, despite the similar trend observed in the PGG, there is a gentler increment rather than a jump as ω increases, especially for small α. As a matter of fact, here the collective risk can also facilitate the evolution of cooperation; hence its combined effect with rewards makes it less abrupt for cooperators to overtake defectors in terms of payoff.

Conclusion
Thus far, many previous theoretical works have revealed that the introduction of a reward can reduce the payoff difference between cooperation and defection, thereby promoting the evolution of public cooperation. However, it is costly to monitor all participants permanently during the games. Hence we consider taking a sample of the participants and then rewarding all cooperators in the whole population according to the sample result. We describe this as the sampling reward mechanism; there are presently only a few studies investigating it. In this paper, we applied the sampling reward mechanism to explore how it works for two representative social dilemmas, namely the PGG and the CRD. Correspondingly, the reward for cooperators is implemented if the fraction of cooperators in the sample is below the reward threshold.
In sum, we have shown that the evolution of cooperation can be promoted effectively under the sampling reward mechanism. Specifically, the cooperation level increases with the reward threshold and the reward intensity, whereas it decreases with the sample size. Indeed, a high reward threshold and a low reward intensity are always associated with a high reward probability, but the sample size has no significant effect on the reward probability. What is more, a high reward threshold and a high reward intensity always lead to high reward costs. An intermediate sample size, however, results in a low reward cost. Quite remarkably, the consequence of the reward intensity is heavily non-linear, since once it exceeds the critical threshold, an almost maximal cooperation level can be reached without further increment. What we need to emphasize is that the aforementioned conclusions remain valid both in the PGG and the CRD. Although a high reward threshold and a high reward intensity can improve the cooperation level, this comes at the expense of a higher reward cost. Additionally, both the sampling reward mechanism and the sampling punishment mechanism can promote the evolution of cooperation under some specific conditions. However, the sampling reward mechanism always leads to a higher incentive cost compared to the sampling punishment mechanism. That is to say, we may consider applying the sampling punishment mechanism first if we want to be economical with the incentive cost. It is worth noting here that there are alternative ways to optimize incentives. For example, one could also monitor how the cooperation level changes in time and react only to undesired changes [49]. This protocol, however, requires an additional effort because the institution needs to monitor the cooperation level permanently. From this aspect, the presently suggested protocol is simpler because we only need to compare the actual value to a fixed one.
Therefore, we can conclude that the sampling incentive mechanism is effective in promoting the evolution of cooperation. In future studies, it will be interesting to investigate whether the sampling incentive mechanism is applicable in other game scenarios, such as games on spatial structures or feedback-evolving games in which environment-dependent payoffs and strategies co-evolve [50][51][52]. Also, recent research shows that evolutionary branching can arise from group cooperation with incentive mechanisms, which may serve as inspiration for further research on the sampling incentive mechanism [53].

Figure 2. Panel (A) shows P_ω in dependence on α for different values of M_0. Panel (B) illustrates P_ω in dependence on M_0 for different values of α. Panel (C) denotes P_ω in dependence on ω for different values of α. The parameters are the same as in figure 1.

Figure 3. Panel (A) depicts C_ω in dependence on α for different values of M_0. Panel (B) shows C_ω in dependence on M_0 for different values of α. Panel (C) illustrates C_ω in dependence on ω for different values of α. The parameters are the same as in figure 1.

Figure 4. Panel (A) shows η_C in dependence on α for different values of M_0. Panel (B) depicts η_C in dependence on M_0 for different values of α, and panel (C) illustrates η_C in dependence on ω for different values of α. We adopt r = 0.1 here, which is consistent with the real world, given that risk perceptions are still low for collective-risk social dilemmas such as climate mitigation. Other parameters are Z = 100, N = 5, H = 2, b = 1, c = 0.1, µ = 0.01, K = 0.2, and ω = 0.3 (panels (A) and (B)), M_0 = 50 (panel (C)).

Figure 5. Panel (A) shows P_ω in dependence on α for different values of M_0. Panel (B) denotes P_ω in dependence on M_0 for different values of α. Panel (C) illustrates P_ω in dependence on ω for different values of α. The parameters are the same as in figure 4.