Abstract
According to the fundamental principle of evolutionary game theory, the more successful strategy in a population should spread. Hence, during a strategy imitation process a player compares its payoff value to the payoff value held by a competing strategy. But this information is not always accurate. To avoid ambiguity, a learner may therefore decide to collect more reliable statistics by averaging the payoff values of its opponents in the neighborhood, and make a decision afterwards. This simple alteration of the standard microscopic protocol significantly improves the cooperation level in a population. Furthermore, the positive impact can be strengthened by increasing the role of the environment and the size of the evaluation circle. The mechanism that explains this improvement is based on a self-organizing process which reveals the detrimental consequence of defector aggregation, a consequence that remains partly hidden during face-to-face comparisons. Notably, the reported phenomenon is not limited to lattice populations but remains valid for systems described by irregular interaction networks as well.
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
The presence or absence of cooperation has huge consequences in various fields of life; it is therefore of paramount importance to identify which conditions help, and which ones block, the spreading of altruistic behavior in a complex population [1, 2]. Interestingly, some universal mechanisms were identified in the last two decades which remain valid not only in microbiological systems, but also in human societies where interacting agents have significant cognitive skills to adjust their behavior for a higher individual income [3–7].
Without exaggeration, hundreds of research papers have been published by scientists with backgrounds in biology, economics, applied mathematics, or statistical physics, proposing different microscopic models to increase the general willingness of actors to cooperate with their partners [8–12]. In some cases the desired evolutionary outcome is expected, for example when defection is punished or cooperation is rewarded by individuals or by a governing institution [13–19]. In these cases, however, the proper question is how to avoid so-called second-order free-riding, when a cooperator player is reluctant to contribute to and maintain the mentioned cooperation-supporting institution or behavior [20–23]. Intellectually it is more challenging to identify those mechanisms or conditions which do not directly support one of the competing strategies. More precisely, in the latter cases it is not obvious in advance why they result in a higher cooperation level. A common feature of these so-called strategy-neutral alterations of the traditional models is that the cooperator-supporting effect emerges only as a secondary or indirect consequence of a fair and democratic rule. One of the very first and most celebrated examples was the observation that a heterogeneous population can be a cooperator-supporting environment [24]. The heterogeneity may originate from an irregular interaction network where some players have significantly more neighbors than others and can hence collect higher payoffs [25–27]. Diversity may also originate from different individual skills, like strategy teaching capacity or other social status, which can result in a similar effect [28–32]. The common feature behind these models is a kind of matching process in which players locally coordinate their strategies, which reveals the advantage of cooperators.
For completeness we note that coordination can also be reached directly via a sort of conformity attitude [33–35], but that is beyond the scope of our present work. A similar impact can also be reached when players treat their neighbors differently, via a weighted interaction graph, or support their neighbors in an unequal way [36–41]. Interestingly, an intervention into the microscopic dynamical process can also be helpful for cooperation. Introducing inertia into the decision making, or hindering overly fast individual strategy changes, can also be beneficial for cooperation [42–44]. The related model studies pointed out that the mentioned dynamical change has asymmetric consequences for the invasion processes of the different strategies. It does not relevantly modify the slow and balanced propagation of the cooperator state, which results in smooth interfaces separating competing domains in a spatial system. On the other hand, it significantly blocks the rapid progress of the defector state, which would otherwise lead to irregular interfaces and an easy individual victory of defection. We can also mention memory effects, where success accumulated from past interactions reveals that defection can only be successful in the short term, because neighbors who follow this behavior eliminate the potential prey of further exploitation [45–48].
In this work we consider an alternative 'strategy-neutral' modification of the traditional model where the positive consequence for the evolutionary outcome is not straightforward. In particular, we focus on the strategy imitation process in which a learner player analyzes the payoff value of the model player who represents a tempting strategy. Evidently, the decision whether or not to adopt an alternative strategy is based on the information the learner collects from the partner. But this information could be inaccurate [49–51]. We are not primarily talking about a perception error, which can be handled by a noise parameter introduced into the strategy learning probability. Instead, we focus on the deceptive behavior of the model player. Such deception is rather frequent in the animal kingdom, where it basically serves to avoid conflicts or to gain a mating advantage [52]. But of course, Homo sapiens is the best liar, and we can easily give examples where someone's dress or lifestyle suggests more than his or her actual success [53]. This experience makes a learner more careful: she may try to collect additional information to evaluate an alternative strategy more accurately. In this way the learner's decision is not based solely on the success of a particular player but on more reliable averaged statistics obtained from the neighborhood. Here the key question is how to weight the directly observed local payoff against the average payoff values obtained from the learner's environment. Naturally, the size of the environment from which the learner collects information could also be a crucial detail. To explore the possible consequences of extra information we study not only different sizes of the learner's perception environment, but also cases where the mentioned environment is not fixed and potential model players are chosen randomly from the population.
In the rest of this paper we propose a very simple model to explore how averaged payoff values change the learning process, and we reveal that this change has a significant cooperator-supporting consequence. We not only report this phenomenon, but also give a plausible explanation of what lies behind it. Furthermore, we emphasize that the simple extension we propose results in a universally valid effect that can be observed in populations characterized not only by regular, but also by irregular interaction graphs. We first define our extended model, and then proceed with the results and a discussion of their implications for a more sophisticated and effective learning process.
2. Evaluating the complete neighborhood
We start from the traditional version of the spatial prisoner's dilemma game, where players are distributed on a graph and interact with their neighbors. Each player follows either the cooperator (C) or the defector (D) strategy, and strategies are distributed randomly in the initial state. We first define our proposed model for a square grid, but the extension to other graphs is straightforward. For simplicity, but without jeopardizing the essence of the conflict of interests, we use the so-called weak prisoner's dilemma parametrization where the only control parameter is the temptation to defect T, which characterizes a defector's income against a cooperator partner. The cooperator gets nothing from this interaction, just as two defectors who meet get nothing. Finally, when two cooperators meet, both collect the reward R = 1.
According to the standard simulation protocol, in an elementary step a randomly chosen player x, who has strategy sx, plays the game with her neighbors and collects altogether Πx payoff from these interactions. Similarly, a neighboring player y, who has the opposite strategy sy, collects Πy payoff from the games played with her corresponding neighbors. In the usual strategy imitation rule the payoff difference Πy − Πx plays a crucial role in how likely player x adopts the strategy sy of the model player y. This likelihood is defined by the well-known Fermi function [54]

W(sy → sx) = 1 / {1 + exp[(Πx − Πy)/K]},     (1)
where K denotes the noise factor which collects different sources of errors, such as the possibility of a bad decision based on the available information. Our present work focuses on the reliability of the information that can be collected from other partners. Of course, there are several ways to deceive others for a particular reason. For instance, a player may try to display a different strategy to the neighborhood from the one she actually applies. But here we concentrate on the possibility that the payoff value we collect from a potential model actor is not accurate. Needless to say, such ambiguity can be frustrating for the learner player because her decision about a strategy change is based on this payoff value, as summarized by equation (1).
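To make the update rule concrete, the Fermi function of equation (1) can be sketched in a few lines of Python (the function name and the default K = 0.1 used in our simulations are illustrative choices, not taken from any published code):

```python
import math

def adoption_probability(payoff_x, payoff_y, K=0.1):
    """Probability that learner x adopts the model player's strategy,
    given by the Fermi function 1 / (1 + exp((Pi_x - Pi_y) / K))."""
    return 1.0 / (1.0 + math.exp((payoff_x - payoff_y) / K))
```

Equal payoffs give a probability of 1/2, while a large payoff advantage of the model player pushes the adoption probability toward 1; the noise K smooths this transition.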
To minimize the possible error of evaluating the competitor's payoff value, our learner player may want to collect alternative information about the potentially tempting strategy. More precisely, player x makes a survey in the available neighborhood and checks the payoff values of all players who practice the alternative strategy sy. If player x averages the related values then she has more reliable information about the general success of the strategy she wants to adopt. Here we have two fundamental aspects to consider. The first one is how strongly to weight the additional information collected from the neighborhood. This can be done by replacing Πy in equation (1) with a weighted value Πw, which is a combination of the original payoff Πy of model player y and the averaged value Πav obtained from akin players in the neighborhood:

Πw = (1 − q) Πy + q Πav.     (2)
Here q is the control parameter determining how strongly our learner player trusts the alternative source of information about the success of the tempting strategy. Accordingly, if q = 0 then we get back the traditional spatial prisoner's dilemma game, while in the q = 1 limit the adoption probability is based exclusively on the averaged value collected from the neighborhood. We must stress that the average is taken not over all payoff values detected in the neighborhood, but only over those achieved by players who follow the same strategy as model player y. This way of averaging is in stark contrast to previously applied averaging methods [55, 56], because our learner player does not want to explore the general wellness of the neighborhood, but focuses on the success of a specific strategy. We also note that the averaging process is restricted to the payoff values of the alternative strategy in the neighborhood because the learner's goal is to gain more accurate information for a potential strategy update. This protocol also differs from the general averaging applied in mean-field calculations and in some previous spatial models [57, 58]. Evidently, collecting additional information from the neighborhood requires higher cognitive skills from a learner, skills that have been detected in previous human experiments [59–62].
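A minimal sketch of how the strategy-restricted average and the weighting of equation (2) can be combined (the names are ours; falling back to Πy when no akin player exists in the neighborhood is our assumption, as the text does not discuss this edge case):

```python
def weighted_payoff(payoff_y, circle_payoffs, circle_strategies, s_y, q):
    """Return Pi_w = (1 - q) * Pi_y + q * Pi_av, where Pi_av averages only
    the payoffs of players in the evaluation circle whose strategy equals
    s_y, the strategy of model player y."""
    akin = [p for p, s in zip(circle_payoffs, circle_strategies) if s == s_y]
    if not akin:  # assumption: fall back to the directly observed payoff
        return payoff_y
    pi_av = sum(akin) / len(akin)
    return (1.0 - q) * payoff_y + q * pi_av
```

With q = 0 the direct payoff Πy is recovered, while q = 1 relies exclusively on the strategy-restricted average.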
The other main ingredient of our model is the definition of the neighborhood from which a learner player is able to collect more accurate information. A natural choice is to check all players who are within le steps of player x, hence within the evaluation circle. To clarify this, figure 1 presents a case where a dashed diamond-shaped line surrounds those players who are within the le = 2 evaluation circle of the learner player x. Naturally, the value of le can be increased gradually from 1 toward higher values, and we can monitor how information obtained from larger and larger sets influences the evolution of cooperation. Importantly, as we already stressed, the Πav value is calculated from the payoffs of those players who follow the same strategy as the model player y. In the case specified above they are marked by a yellow background in our figure.
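On the square grid the evaluation circle of radius le is the diamond-shaped set of sites within Manhattan distance le of the learner. A small helper, assuming periodic boundaries as is usual in lattice simulations, could enumerate it as follows (a reconstruction; the paper itself only illustrates the le = 2 case in figure 1):

```python
def evaluation_circle(x, y, le, L):
    """Sites within Manhattan distance le of (x, y) on an L x L grid with
    periodic boundaries, excluding the learner's own site."""
    sites = []
    for dx in range(-le, le + 1):
        for dy in range(-(le - abs(dx)), le - abs(dx) + 1):
            if (dx, dy) != (0, 0):
                sites.append(((x + dx) % L, (y + dy) % L))
    return sites
```

The circle contains 2·le·(le + 1) sites, i.e. 4 for le = 1 and 12 for le = 2, matching the diamond sketched in figure 1.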
In our simulations we studied populations containing up to N = 160 000 players. According to the standard protocol, in an elementary step a randomly chosen player has a chance to change her strategy by adopting the alternative strategy of a randomly chosen neighbor. Repeating this loop N times defines the natural unit of the simulation, called one Monte Carlo (MC) step. In this work we applied at most 50 000 MC relaxation steps to reach the stationary state, where the fraction of cooperators was measured and time-averaged over another 100 000 MC steps. The applied system size and the sufficiently long simulation time made it possible to obtain results which are independent of the system size, hence finite-size effects can be excluded. We used a K = 0.1 noise level to allow comparison with results of the traditional model, but we stress that our qualitative observations remain intact for other noise values K < 2. Of course, in the high-K parameter region the strategy imitation process becomes completely random and independent of the payoff difference. Besides the mentioned square grid topology we also used a random interaction network where the degree of every node remained k = 4. In this way we could check the consequence of an irregular topology exclusively, without introducing additional effects originating from the heterogeneity of players. Last, we mention that we also studied a case where the 'evaluation circle' was not selected from players around the learner, but its members were chosen randomly from the whole population. The details of this modified protocol are given in the next section, where we compare its consequences with the results of the originally defined model.
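Putting the pieces together, the protocol described above can be condensed into a toy Monte Carlo loop (a sketch of the stated dynamics, not the authors' code; the helper names and the small default parameters are ours):

```python
import math
import random

def simulate(L=50, T=1.1, q=0.2, le=2, K=0.1, mc_steps=100, seed=1):
    """Weak prisoner's dilemma on an L x L periodic square grid with the
    weighted-payoff imitation rule; returns the final cooperator fraction."""
    rng = random.Random(seed)
    strat = [[rng.choice('CD') for _ in range(L)] for _ in range(L)]

    def neighbors(i, j):  # von Neumann neighborhood
        return [((i + 1) % L, j), ((i - 1) % L, j),
                (i, (j + 1) % L), (i, (j - 1) % L)]

    def payoff(i, j):  # weak PD: R = 1, P = S = 0, temptation T
        s = strat[i][j]
        return sum(T if s == 'D' else 1.0
                   for a, b in neighbors(i, j) if strat[a][b] == 'C')

    def circle(i, j):  # sites within Manhattan distance le
        return [((i + dx) % L, (j + dy) % L)
                for dx in range(-le, le + 1)
                for dy in range(-(le - abs(dx)), le - abs(dx) + 1)
                if (dx, dy) != (0, 0)]

    for _ in range(mc_steps * L * L):  # one MC step = L*L elementary steps
        i, j = rng.randrange(L), rng.randrange(L)
        a, b = rng.choice(neighbors(i, j))
        if strat[i][j] == strat[a][b]:
            continue  # nothing to imitate
        akin = [payoff(u, v) for u, v in circle(i, j)
                if strat[u][v] == strat[a][b]]
        pi_w = ((1 - q) * payoff(a, b) + q * sum(akin) / len(akin)
                if akin else payoff(a, b))
        if rng.random() < 1.0 / (1.0 + math.exp((payoff(i, j) - pi_w) / K)):
            strat[i][j] = strat[a][b]

    return sum(row.count('C') for row in strat) / (L * L)
```

Note that quantitatively reliable results require the system sizes and relaxation times quoted above; the small defaults here are only meant to keep the sketch runnable.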
3. Results
Our first observations are summarized in figure 2, where we plot the stationary cooperation level as a function of the temptation value for different values of the weight factor q. In the presented case we used an le = 2 evaluation circle, but qualitatively similar behavior is found for other sizes of the neighborhood from which extra information is collected. As noted, q = 0 is equivalent to the traditional spatial model, which yields a critical temptation value Tc = 1.03576 for the applied K = 0.1 noise level [63]. But as we enlarge q, hence as learner players give larger credit to the alternative information obtained from the neighborhood, the chance of cooperators to survive improves significantly. Furthermore, when q is close to 1, hence the additional information dominates the decision making about the strategy change, only temptation values T > 1.5 produce a full defection state. It is worth noting that le is relatively small in the presented case, which practically means that typically not more than half a dozen other players are checked to gain the valuable extra information. Still, the improvement is remarkable.
Next we illustrate how the size of the neighborhood from which the extra information is gained influences the cooperation level. A representative plot is shown in figure 3, where the weight factor is fixed at q = 0.2. These curves highlight that the cooperation level can be improved if learners collect information from a larger neighborhood. This effect, however, cannot be enlarged endlessly, because after a certain level the enhancement saturates. For example, collecting data from the 180-member set at le = 9 gives almost as good information as a neighborhood of more than 3000 players obtained for le = 39. But the tendency is clear. One may note that the improvement of the critical temptation value characterizing the border of the mixed state is not very large. But this is a consequence of the relatively small weight factor q, which gives only modest credit to external information. We stress, however, that even at this q the threshold Tc can be doubled if the neighborhood size is large enough.
To understand the cooperator-supporting mechanism more deeply, in the following we present a comparison of the pattern formation obtained in the traditional and in the modified model. Figure 4 shows the significantly different evolutionary paths when we launch the simulations from the same initial state, shown in panel (a), where a red defector island is surrounded by a blue cooperator domain. This setup, where different players meet along two domain walls, helps us reveal the characteristic movement of the propagation fronts more easily. For a proper comparison we applied the same T = 1.1 temptation value in both cases. In the top row, containing panels (b) to (d), we show the evolution in the traditional model. Notably, this can be considered as the q = 0 extreme case of the modified model. Here the q = 0 weight factor ensures that a learner player x estimates the success of the alternative strategy based exclusively on the payoff value of her neighboring model player y. As a consequence, shown in panels (b) and (c), the originally straight front line starts roughening because a neighboring defector can collect a high individual payoff thanks to the relatively high temptation value. We note here that the threshold temptation value is well below the presently applied T value. The rough propagation front creates even more difficult circumstances for cooperators because it destroys their original phalanx, and network reciprocity can hardly work anymore. Only small islands of cooperators remain after the front passes; they are marked by white circles in panel (c). However, they are unable to survive long because of the high T value, and the system eventually evolves into the full defector state shown in panel (d).
As a comparison, in the bottom row we present the evolutionary path in the other extreme case, when the success of the alternative strategy is estimated from the information collected from the neighborhood. Despite the fact that we applied a relatively small radius le = 3, the trajectory is significantly different. Here the direction of the invasion is reversed while a not too noisy front line is maintained. Behind these lines, however, a small fraction of defector players remains alive, marked by the white ellipses. The reversed direction of propagation tells us that a bulk of defectors cannot provide a large average payoff to its members because there is no one to exploit. Furthermore, a pure cooperator domain provides a robust average payoff for a cooperating neighborhood, hence the adoption of the cooperator strategy by a defector player becomes a frequent process in the initial stage. But if the density of defectors becomes low, as in the case marked by the ellipse, then they can again collect a competitive average payoff and form a stable coexistence with their rivals. Evidently, a pure cooperator neighborhood offers a high average payoff, therefore the spreading of the mentioned mixed state is a slow process. For comparison, for L = 200 linear system size the traditional evolution terminates in the full defector state typically within 300 MC steps, while at least 1000 MC steps are needed to reach the stationary state in the modified case shown in panel (g).
Based on the argument described above we can also understand why the value of le influences the stationary concentration of defector players. The larger the value of le, the smaller the fraction of defectors who can survive permanently, as we observed in figure 3. If their concentration exceeds a threshold value then their average payoff becomes less attractive, which provides a feedback mechanism that maintains a significant cooperation level in spite of a relatively high temptation value. In this way the averaged information about the competing strategy maintains a self-organizing pattern of a mixed state where a compelling cooperation level can be reached even for a high temptation value. This mechanism also explains our observations summarized in figure 2, because the effect becomes stronger as we give higher credit to the neighborhood via a larger weight factor q.
One may ask what happens if the additional information is not collected from a local neighborhood, but originates from random sampling where the target could be the whole population. In this modified model a learner player x calculates the crucial average payoff Πav by selecting m other players randomly from the complete population. As previously, if the strategy of a selected player i agrees with the strategy of the model player y then we consider the related payoff Πi of player i when the mentioned average is calculated. Naturally, for a proper comparison with the previously defined model extension, the value of m should agree with the size of the neighborhood defined by the radius le. For example, if le = 1 then m should be 4, for le = 2 the corresponding value is m = 12, etc. The largest sampling set we used contains m = 3120 members, whose size is equivalent to the neighborhood around x for radius le = 39. Importantly, this m = 3120 sub-population contains randomly chosen players from the whole population who are not necessarily neighbors of each other.
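The quoted correspondence between le and m is simply the size of the Manhattan-distance ball on the square grid, which contains 2·le·(le + 1) sites besides the learner herself (the closed form is our observation; the specific pairs 4, 12, and 3120 are taken from the text):

```python
def sample_size(le):
    """Number of players inside the evaluation circle of radius le on a
    square grid, excluding the learner: 2 * le * (le + 1)."""
    return 2 * le * (le + 1)
```

This reproduces m = 4 for le = 1, m = 12 for le = 2, the 180-member set at le = 9, and m = 3120 for le = 39.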
As previously, we still have two parameters, q and m, but their roles are different from those we previously observed for q and le. A typical behavior is summarized in figure 5. The first conspicuous feature is that the size of the sampling set has no relevant role in the stationary value of the cooperation level. Roughly speaking, it is enough to collect additional information from a small random sample, because a learner gains no relevant advantage by bothering to check many more players about the expected success of the alternative strategy. Our second observation is the general improvement of the cooperation level compared to the case when fixed and connected neighbors are used as the source of additional information. This fact can easily be seen by checking figure 2, where even a higher q = 0.4 value is still unable to provide as high a portion of cooperators as we see in figure 5 for random sampling.
The above-mentioned superiority of random sampling is valid for all related parameter pairs. Next we offer some insight into its origin. To illustrate the difference between random sampling and collecting data from a compact neighborhood, in figure 6 we show how they drive the pattern formation under similar conditions. Importantly, we not only apply the same temptation value and weighting factor, but also use equal sizes for the sampling populations. Indeed, the latter has no decisive importance for random sampling, but it can be an essential factor for neighborhood-based sampling, as illustrated in figure 3. Accordingly, a radius le = 2 around the learner player is equivalent to checking m = 12 randomly chosen players.
In contrast to figure 4, here we use an alternative common initial state, shown in panel (a), where homogeneous cooperator and defector domains both meet a phase in which strategies are mixed randomly. In the top row, where the additional information is collected from the neighborhood, the fastest change can be observed in the mixed phase. This is a general phenomenon that can be seen even in spatially structured populations, because cooperators can only protect themselves if they are organized. In our case, despite the relatively high q = 0.5 weighting factor, network reciprocity alone is incapable of blocking the spreading of defection. This is because a mixed environment can always provide a decent payoff advantage to further defector players, too. At the same time a fully homogeneous and compact cooperator domain cannot really resist the invasion of defectors who are wrapped in a supporting cooperator shell. Interestingly, the homogeneous defector domain is not sensitive, and cooperator players never enter the lower-left quadrant. At such a q value the temptation is too high for defection to be replaced by cooperation.
We observe, however, a strikingly different evolutionary trajectory when a learner collects information from randomly selected players. In this case the mixed phase is stable, albeit the actual ratio of defectors and cooperators is adjusted to the values of T and q. On the other hand, the stability of the homogeneous domains proves to be the opposite of what we detected previously. First, the shrinking of the fully cooperative island is slower because the average payoff of defectors may not be tempting: it can easily happen that we sample defector players from deep inside the full-D domain where they get nothing. But our argument is also valid for the opposite case. In the bottom row the homogeneous defector island becomes unstable and disappears very fast. In this case the strategy of cooperators standing at the front may become attractive because their average payoff value may be increased significantly by the contributions of their akin fellows who sit safely in the middle of a fully cooperative patch.
We should stress, however, that both dynamical processes we discussed regarding the stability of homogeneous spots are only temporary, because the distant information collected by random sampling eventually drives the system toward a uniform state where the fractions of the C and D strategies are the same everywhere. In this stationary state the previously mentioned self-organizing mechanism still works, which prevents defectors from growing too large homogeneous spots. Admittedly, information gathering via random sampling also hinders cooperators from growing too large homogeneous domains, because they cannot fully utilize their high cooperation level when other C players contribute smaller values to Πav. Nevertheless, from the cooperators' viewpoint the situation is fine, because they can reach a decent fraction even at a high temptation value if q is large enough.
Finally, we briefly note that our observations about the positive consequence of considering additional information are not restricted to lattice-type populations, but remain valid when the interaction graph is not ordered. Having discussed the very positive consequence of random sampling, this fact is perhaps not surprising, because our argument did not utilize the translational invariance of the interaction graph. For completeness, however, we also checked our results on a random topology where players have the same degree distribution as on the square grid. In this way we can check the consequence of randomness exclusively, without introducing other effects due to a change of degree in the topology. The essence of our findings is summarized in figure 7, where we plot the results obtained for neighborhood-based and random-sampling-based additional information gathering.
In agreement with our earlier results, giving larger credit to averaged payoff values via an increasing q improves the cooperation level. The clear consequence of the random interaction topology, however, is that this effect becomes really strong, and cooperators may survive even at T = 2. We stress that the random interaction topology alone would not be enough to produce such an improvement, because in the traditional model the critical temptation value remains close to T = 1. Our other main observation is based on the comparison of the panels of figure 7, where similar curves are detected for neighborhood-based and random sampling. This agreement suggests that the original randomness of the topology already serves as an information-mixing tool. Therefore, in sharp contrast to lattice-type topologies like the square grid, collecting extra information via random distant sampling has no additional value in a randomized graph. But the positive consequence remains intact, and is even more pronounced for irregular topologies.
4. Conclusion
Making a decision about which behavior to follow is a crucial act not only on the personal but also on the collective level. It is easy to see that the dominance of hasty or careless adoption choices can drive the whole society toward an undesired destination. Therefore, huge intellectual efforts have been focused on this delicate task: to find methods which are in agreement with the fundamental Darwinian selection of the more successful strategy, but at the same time help to block the obvious advantage of defection. For example, recording and accumulating the previous success of a strategy, or introducing inertia that keeps a strategy more valuable the longer it has survived, can be cooperator-supporting modifications of the simplest 'imitate the more successful strategy' protocol. But of course there are alternative methods, and we refer the interested reader to the related review papers.
In our present work we suggested a very simple modification of the traditional model in which we considered the chance that a learner player is more careful and does not accept the information about the model player unreservedly. Instead, the former player tries to collect information about the success of the competing behavior from an alternative source. This could be the neighborhood of the learner player or a randomly selected group of other players from the whole population. No matter which source is used, a population whose members give higher credit to averaged information about the success of an alternative strategy can reach a higher cooperation level. The larger the weight of this additional information in the decision making, the more significant the improvement that can be achieved.
The main mechanism responsible for this positive consequence is based on the self-organizing pattern formation of the spatial population. More precisely, considering averaged information instead of unconditionally accepting the success of a particular player prevents the condensation of defectors, hence maintains an acceptable cooperation level even at high temptation values. This procedure works not only in populations with lattice-type interaction graphs, but also for irregular topologies.
It is worth noting that the observed cooperator-supporting mechanism fits nicely among those where an introduced strategy-neutral rule has a biased impact on the invasion of the competing strategies [42]; hence they provide an alternative way to understand the original enigma of why cooperation may prevail among selfish agents. This research direction could be promising for a broader application of evolutionary game theory beyond human societies. In such systems participants do not necessarily have cognitive skills, as in microbiological populations, therefore the related theories should not rely on additional assumptions about moral issues, like reputation [64, 65], or on a preliminary judgment about strategies, which is the source of punishing or rewarding mechanisms in advanced populations [66–70].
Acknowledgments
This research was supported by the Slovenian Research Agency (Grant Nos. P1-0403, J1-2457, and P5-0027).
Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).