Mobility restores the mechanism which supports cooperation in the voluntary prisoner's dilemma game

It is generally believed that in a situation where individual and collective interests are in conflict, the availability of optional participation is a key mechanism to maintain cooperation. Surprisingly, this effect is sensitive to the choice of microscopic dynamics and can easily be broken when agents make a fully rational decision during their strategy updates. In the framework of the celebrated prisoner's dilemma game, we show that this discrepancy can be fixed automatically if we leave the strict and frequently artificial condition of a fully occupied interaction graph, and allow agents to change not just their strategies but also their positions according to their success. In this way, a diluted graph where agents may move offers a natural and alternative way to handle artifacts arising from the application of specific and sometimes awkward microscopic rules.


Introduction
Despite extensive research efforts, the evolution of cooperation remains a puzzle in a wide range of domains [1,2]. In this context, two-strategy games such as the prisoner's dilemma (PD) game have been widely studied for many years from different perspectives, and mechanisms [3] such as group selection [4] and network reciprocity [5,6,7] have been investigated. Traditionally, the agents' interactions in those games are compulsory, i.e., the agent has to choose between cooperation and defection, where the dilemma arises because individual selfishness leads to a collective disaster [8,9]. However, in many real-world scenarios, the agents' participation in the game is voluntary (optional). Thus, in order to account for the concept of voluntary participation (abstention), researchers have been exploring the voluntary prisoner's dilemma (VPD) game, also known as the optional prisoner's dilemma game, which extends the PD to a three-strategy game where agents can also choose to abstain from playing the game [10,11,12]. In particular, abstention has attracted attention both for acting as a mechanism to support cooperation and for promoting cyclic behaviour [13,14,15,16,17,18]. The cyclic dominance behaviour is often studied within the bounds of the rock-paper-scissors game, which, unlike the VPD game, imposes the cyclic dominance in the payoff matrix [19,20,21,22,23,24].
In addition to the discussion about the game strategies, studies concerning agent mobility are also of interest because, in many real ecological systems, individuals are usually on the move to improve their performance [25]. In this sense, research has shown that in a spatial environment, mobility and percolation thresholds have a critical impact on the sustenance of biodiversity in nature [19,26,27,28,29,30,31]. Interestingly, despite a large number of papers discussing the effects of mobility in the prisoner's dilemma [32,33,34,35,36,37], the rock-paper-scissors [25,38,39] and the optional public goods games [40,41,42], the impact of mobility in the context of the VPD game is still almost unknown. Indeed, some effort has also been made to explore contingent movement strategies modelling the so-called "win-stay, lose-move" rule, which, as also argued by Szabó and Fáth [5], might capture the concept of abstention in the sense that agents abstain by moving away from their opponents [43,44,45,46]. Although this is a valid way to account for voluntary participation, we highlight that in many scenarios there must be a cost (payoff) associated with the act of not playing the game, i.e., abstention defined in terms of the set of game strategies rather than the movement strategies. In other words, defining abstention as a strategy rather than a movement ensures that all agents have the right to abstain from a game interaction, independently of having a way to walk away (space permitting) or not.
Despite the very recent introduction of the VPD game in a diluted network with a purely random mobility scenario [47], many questions regarding the impact of mobility, in both the sustenance of biodiversity and the potential for widespread cooperation, remain unanswered. For instance, given the recent advances in the understanding of coevolutionary models [48,49,50,51,52,53,54], what happens to the population when considering agent mobility in a coevolutionary fashion? Thus, without loss of generality, this research introduces the VPD game with a coevolutionary model where not only the agents' strategies but also their movement is subject to the evolutionary process, which provides a more realistic representation of mobility within the domain of voluntary/optional participation.
Furthermore, we investigate the foundations of the emergence of cyclic dominance for the VPD game in both the fully populated (without mobility) and diluted networks. We discuss that the emergence of the cyclic dominance behaviour, which is commonly associated with the VPD game, is very sensitive to the chosen imitation rule. Results show that when using other imitation rules, the cyclic dominance can be broken easily, but this difference diminishes when we use a more general diluted model where mobility can repair the missing chain that is necessary to support cyclic dominance.
The remainder of the paper is organised as follows. Section 2 describes the model and the experimental settings. Section 3 presents the results of the extensive Monte Carlo simulations, which allow us to unveil the reason why mobility and optionality favour cooperation and cyclic dominance. Finally, Section 4 outlines the main conclusions.

Methods
In order to account for the concept of voluntary participation (abstention) and agent mobility, we consider a set of N rational agents playing the voluntary prisoner's dilemma game (also known as the optional prisoner's dilemma game) on an M × M diluted square lattice network with von Neumann neighbourhood and periodic boundary conditions, i.e., a toroid where sites are either empty or occupied by an agent. To describe the lattice occupation, we define the lattice's density as $\rho = N/M^2$ ($0 < \rho \le 1$), where ρ = 1 means that the lattice is fully populated.
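The lattice described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the boolean occupancy grid and the `seed` parameter are assumptions introduced here.

```python
import random

def make_diluted_lattice(M, rho, seed=None):
    """Return an M x M grid of booleans with round(rho * M * M) occupied sites."""
    rng = random.Random(seed)
    n_agents = round(rho * M * M)
    sites = [True] * n_agents + [False] * (M * M - n_agents)
    rng.shuffle(sites)  # scatter the agents uniformly at random
    return [sites[row * M:(row + 1) * M] for row in range(M)]

def von_neumann_neighbours(i, j, M):
    """Four nearest sites of (i, j); the modulo wraps the lattice into a torus."""
    return [((i - 1) % M, j), ((i + 1) % M, j), (i, (j - 1) % M), (i, (j + 1) % M)]

def density(lattice):
    """Fraction of occupied sites, i.e. rho = N / M^2."""
    M = len(lattice)
    return sum(sum(row) for row in lattice) / (M * M)
```

For example, `von_neumann_neighbours(0, 0, 10)` wraps around the edges and returns sites in row 9 and column 9 as well as the two interior neighbours.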
In the voluntary prisoner's dilemma (VPD) game, agents can be designated as a cooperator (C), defector (D) or abstainer (A). Considering a pairwise interaction, the payoffs are defined as follows: mutual defection yields P = 0, mutual cooperation yields R = 1, a defector facing a cooperator gets T = b, and a cooperator facing a defector gets S = 0. Whenever one or both agents abstain, both agents get the loner's payoff L = σ, where R > L > P. Note that we adopt a weak version of the game, where T > R > L > P ≥ S maintains the nature of the dilemma [7,10].
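The payoff scheme above can be written as a small lookup function. This is a sketch for illustration: the function name, the single-letter strategy labels and the default values b = 1.4 and σ = 0.5 are assumptions chosen here, not part of the model definition.

```python
def vpd_payoff(own, other, b=1.4, sigma=0.5):
    """Payoff of an agent playing `own` against `other` in the weak VPD game.

    Strategies are the strings "C", "D" and "A" (an illustrative encoding).
    """
    if own == "A" or other == "A":
        return sigma              # loner's payoff L whenever anyone abstains
    if own == "C":
        return 1.0 if other == "C" else 0.0   # R = 1 or S = 0
    return b if other == "C" else 0.0         # T = b or P = 0
```

Because the loner's payoff is paid to both sides, `vpd_payoff("C", "A")`, `vpd_payoff("D", "A")` and `vpd_payoff("A", "A")` are all equal to σ.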
We consider a randomly initialized population in which N/3 agents of each strategy (C, D and A) are distributed at random in the network. Following the standard procedures of an asynchronous Monte Carlo (MC) simulation in this context [55,56], at each MC time step, each agent (x) is selected once on average to update its strategy and position immediately. Thus, in one MC step, N agents are randomly chosen to perform the subsequent procedures: if the agent x has no neighbours, it moves to one of the four nearest empty sites (von Neumann neighbourhood) at random; otherwise, the agent x accumulates the utility $U_x$ by playing the VPD game with all its nearest active (nonempty) neighbours ($\Omega_x$), selects one of them at random (i.e., the agent y, which also acquires its utility $U_y$), and considers copying its strategy with a probability given by the Fermi-Dirac distribution function:

$$W(U_x - U_y) = \frac{1}{1 + \exp[(U_x - U_y)/K]}, \tag{1}$$

where K = 0.1 characterizes the amplitude of the noise that allows irrational decisions [5,57].
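The imitation probability can be sketched as follows, assuming the rule takes the standard Fermi form W = 1/(1 + exp[(U_x − U_y)/K]), which is consistent with the later remark that W is close to 0.5 when the two utilities coincide. The function name is a label introduced here for illustration.

```python
import math

def imitation_probability(U_x, U_y, K=0.1):
    """Probability that agent x copies the strategy of neighbour y.

    Assumed Fermi form: close to 1 when y earns much more than x,
    close to 0 when y earns much less, and exactly 0.5 on a tie.
    """
    return 1.0 / (1.0 + math.exp((U_x - U_y) / K))
```

With K = 0.1 and utilities bounded by four interactions per agent, the exponent stays far below the overflow limit of `math.exp`; a much smaller K would call for a numerically safer formulation.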
In this research, we also consider the scenario in which agents do not make irrational choices in the strategy updating process (Equation 1), i.e., the agent x only considers copying y if $U_y > U_x$. After the agent x updates its strategy, $U_x$ is recalculated, and x considers moving to a random empty site (if any) in its neighbourhood with probability:

$$W(u_x - v_x) = \frac{1}{1 + \exp[(u_x - v_x)/K]}, \tag{2}$$

where K = 0.1, $u_x = U_x/k_x$ is the agent x's average utility, $k_x$ is the number of active neighbours in x's neighbourhood, and $v_x = (u_x + \sum_{y \in \Omega_x} u_y)/(k_x + 1)$ is the average utility of x's neighbourhood including itself. Thus, the agents that are performing worse (better) than their neighbours have more (less) incentive to move. Note that to make this research comparable with previous works, we consider the absolute payoff during the strategy imitation process (Equation 1). Nevertheless, it is noteworthy that our key results remain qualitatively unchanged if we apply a degree-normalized payoff in this function. However, in the case of mobility, the application of an absolute payoff in Equation 2 would cause an artifact. More precisely, it would result in the erosion of a cooperative cluster because agents at the periphery, who have fewer neighbours, would always be unsatisfied and move, i.e., the mentioned cluster would shrink gradually.
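The movement rule can be sketched in the same spirit. The code below assumes that the movement probability has the Fermi form W = 1/(1 + exp[(u_x − v_x)/K]), which matches the statement that agents doing worse than their neighbourhood are more inclined to move. The function name and the `(U_y, k_y)` input format are assumptions made for this illustration.

```python
import math

def move_probability(U_x, neighbours, K=0.1):
    """Probability that agent x tries to move to an empty adjacent site.

    `neighbours` holds one (U_y, k_y) pair per active neighbour y of x,
    where k_y >= 1 because x itself is a neighbour of y.
    Whether a move actually happens also requires an empty site nearby.
    """
    k_x = len(neighbours)
    if k_x == 0:
        return 1.0  # an isolated agent always moves
    u_x = U_x / k_x                                   # degree-normalized utility of x
    u_neighbours = [U_y / k_y for U_y, k_y in neighbours]
    v_x = (u_x + sum(u_neighbours)) / (k_x + 1)       # neighbourhood average incl. x
    return 1.0 / (1.0 + math.exp((u_x - v_x) / K))
```

Using average rather than absolute utilities means that an agent at the edge of a cluster is not penalized merely for having fewer neighbours, which is the artifact discussed above.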
In order to avoid finite size effects, results are obtained for different network sizes, ranging from M = 200 to M = 1000. Simulations are run for a sufficiently long relaxation time ($10^5$ or $10^6$ MC steps), where the final level of each strategy is obtained by averaging the last $10^4$ MC steps.

Results
In this section, we present some of the relevant experimental results obtained when simulating a population of agents playing the voluntary prisoner's dilemma (VPD) game on diluted square lattice networks, i.e., a coevolutionary model where not only the agents' strategies but also their positions evolve over time. Firstly, we consider the case in which the lattice is fully populated, i.e., density ρ = 1, and we demonstrate that the emergence of cyclic dominance in the VPD game is sensitive to the chosen dynamical rule because by using other imitation rules the cyclic dominance can be broken easily. Secondly, we investigate the case in which ρ < 1 (diluted network), where we show that mobility and dilution can repair the mechanisms necessary for supporting cyclic dominance. Thirdly, we further investigate the micro-level evolutionary dynamics for a diluted network both with and without mobility.

Fully populated network (ρ = 1): fragile cyclic dominance
In order to validate our coevolutionary model and provide grounds to explore the effects of mobility on a diluted square lattice, we start by investigating how the population evolves when there is no space for the agents to move. Figure 1 (upper panel) features the time course of the average frequency of each pure strategy, i.e., cooperation, defection and abstention, for a density ρ = 1, temptation to defect b = 1.4, and the loner's payoff σ = 0.5. The lower panel of Figure 1 shows the typical spatial patterns of the strategies at different Monte Carlo steps. Note that as ρ = 1, the model collapses to the traditional and well-known scenario in which only the strategies evolve. As expected, the results are qualitatively the same as those reported in previous studies [10,47].
In this case the three strategies coexist because of the emergence of cyclic dominance behaviour where defectors beat cooperators, cooperators beat abstainers, and abstainers beat defectors [19,20,58]. To gain deeper insights into the mechanisms which underlie the cyclic dominance behaviour in the context of a spatial voluntary prisoner's dilemma game, we perform the same experiments as above but for the case in which an agent (x) only considers copying the opponent's strategy if the opponent (y) is performing better than itself, i.e., the agent applies the Fermi-Dirac distribution function (Equation 1) if and only if the utility of y is greater than the utility of x, $U_y > U_x$. Interestingly, Figure 2 shows that when this simple modification in microscopic dynamics is imposed, the cyclic dominance behaviour is broken and the population converges to a frozen state where only defection and abstention are present, while the cooperator strategy becomes extinct. Note that the idea of employing different imitation rules such as Equation 1 for both rational and irrational decisions has been systematically investigated in previous studies for two-strategy games [36,59,60], and it is well-known that different imitation rules, as well as the adoption of different values of K (amplitude of noise) in the Fermi-Dirac rule, may affect the outcome [57]. However, there is an unexplored gap in the literature regarding the possible consequences of the adoption of the Fermi-Dirac rule in the context of the VPD game, and our results suggest that the cyclic behaviour commonly associated with the VPD game may be related to the use of this function, which also supports strategy change when the utility values are equal. Figure 3 depicts the average frequency of the three strategies (C, D and A) in the full b − σ plane when agents are allowed to make irrational (top panels) and rational (bottom panels) decisions.
Note that while cyclic dominance is maintained for almost any combination of b and σ values in the traditional case (top), the same does not occur when the imitation rule is slightly changed (bottom). Thus, contrary to previous observations, our results highlight that the use of noisy imitation, dictated by Equation 1, is an essential condition for promoting cyclic behaviour in the context of the VPD game. The reason for this discrepancy can be summarized as follows:
• Considering a random initial population (see the early MC steps in Figures 1 and 2), the typical trajectory predicts the advantage of defectors which is then followed by the rise of abstainers or both cases.
• Next, checking (or not) for the $U_y > U_x$ condition can be decisive to allow (or not) the subsequent rise of cooperators, which in turn supports the cyclic dominance phenomenon seen in Figure 1.
• At a micro level, if one cooperator/defector (x) is mostly surrounded by abstainers (y), its utility $U_x$ will be mostly equal to $U_y$. Remember that in the voluntary prisoner's dilemma game, if one or two agents abstain (A), both will get the same loner's payoff σ, i.e., for any pair of strategies CA, AC, DA, AD, AA both agents get an identical σ value.
Thus, if we impose the $U_y > U_x$ condition, as the utilities of x and y are the same, the population is not able to curb the spreading of abstainers, which consequently produces the pattern observed in Figure 2, i.e., a few isolated defectors stuck in a sea of abstainers. Otherwise, if Equation 1 is applied for any value of $U_x - U_y$, as the number of abstainers increases, W will be approximately equal to 0.5 for most agents, which is one of the main mechanisms to keep the three strategies alive as observed in Figure 1.
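As a worked instance of this tie, assume that Equation 1 takes the standard Fermi form $W = 1/(1 + \exp[(U_x - U_y)/K])$ and that the lattice is fully populated, so every agent has four neighbours. A defector x enclosed by abstainers and any one of its abstaining neighbours y then collect the same utility:

$$U_x = 4\sigma, \qquad U_y = 4\sigma \quad\Rightarrow\quad W(U_x - U_y) = \frac{1}{1 + e^{0}} = \frac{1}{2}.$$

Under the noisy rule, x therefore adopts abstention in about half of its attempts, whereas under the rational rule the condition $U_y > U_x$ is never met and x keeps its strategy for as long as the tie persists.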

Diluted network (ρ < 1): recovering cyclic dominance and promoting cooperation
As we already argued, a fully occupied interaction graph seems to be a specific rather than a generally valid real-life situation. Hence, this section discusses the coevolutionary cases for a diluted lattice network where not only the strategies but also the agents' positions evolve over time.
At a macro-level, we start by analysing the influence of the density ρ on the evolutionary process for the noisy Equation 1 (i.e., agents are allowed to make irrational decisions) after a sufficiently long relaxation time. In line with previous research for two-strategy games such as the prisoner's dilemma game [32,34,61], experiments with our coevolutionary model reveal that mobility and dilution also play a key role in promoting cooperation in the VPD game. Figure 4 shows the average frequency of the three strategies in the full b − σ plane for some representative densities. As compared to the traditional case (the ρ = 1.0 regime, i.e., Figure 3 top), we observe that the cyclic dominance behaviour still emerges for most b − σ settings for ρ ≥ 0.59. Interestingly, results show that scenarios of full cooperation arise monotonically when ρ < 0.59, i.e., the more diluted the network is, the easier it is for cooperators to dominate the population. However, when the density is too low (ρ < 0.10) the cooperators become too vulnerable to invasion by abstainers due to the increasing difficulty of forming clusters. Also, experiments show that 0.05 < ρ ≤ 0.10 quickly produces very unstable C + A states which either converge to full C or full A. Notably, this behaviour cannot be seen directly from the heat map because the average of full C and full A outcomes results in a frequency of around 0.5 for both strategies. The latter may also suggest a coexistence of these strategies, but as we stressed, this is not so in the present case because either C or A prevails at these global concentration values. Furthermore, when ρ ≤ 0.05 cooperators always die out and abstainers dominate in all scenarios. Note that the percolation threshold ($\rho_p$) for this square lattice network with von Neumann neighbourhood is approximately equal to 0.59 [62,63]. Thus, this result is of particular interest because cooperation is favoured when the density is below the percolation threshold, which is known to be an adverse situation for maintaining cooperation [27,29,30]. Moreover, results in Figure 4 also highlight the importance of exploring the outcomes of the VPD game across the whole loner's payoff (σ) spectrum, and not only for a specific σ = 0.3 value, as was used earlier [10,47].
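The difference between an ensemble average of 0.5 and genuine coexistence can be illustrated with a small sketch. The run outcomes below are hypothetical numbers chosen purely for illustration, not simulation results, and the function name is introduced here.

```python
def summarise_runs(final_c_frequencies):
    """Return (mean cooperator frequency, fraction of runs ending in full C or full A)."""
    n = len(final_c_frequencies)
    mean_c = sum(final_c_frequencies) / n
    absorbed = sum(f in (0.0, 1.0) for f in final_c_frequencies) / n
    return mean_c, absorbed

# Hypothetical final cooperator frequencies of ten independent runs:
# each run ends either in full C (1.0) or in full A (0.0).
runs = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0]
mean_c, absorbed = summarise_runs(runs)
```

Here the mean is 0.5 although every single run ended in a homogeneous state, which is why a heat map of averages alone cannot distinguish bistability from coexistence.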
Considering the discrepancy observed in Figure 3 for ρ = 1, we now repeat the same experiments as above but for the case where an agent only applies Equation 1 if the opponent is performing better than itself, i.e., the case of a fully rational imitation rule. Surprisingly, Figure 5 shows that the previously observed difference between the two imitation rules diminishes when we consider a diluted network (ρ < 1) with mobile agents. More importantly, results show that when $1 > \rho > \rho_p$ the mechanisms which support cyclic dominance in the traditional case (i.e., for the noisy Equation 1 and ρ = 1) are recovered for a wide range of b − σ scenarios. In fact, results for both imitation rules and ρ < 1 are qualitatively the same for most settings. However, as seen in Figures 4 and 5, when the density is below the percolation threshold ($\rho < \rho_p$), it is possible to observe a small shift of about 0.05 in ρ in the boundaries of the region in which full C occurs. For instance, results for ρ = 0.15 in Figure 4 are similar to those for ρ = 0.10 in Figure 5. Note that the bistable outcomes, where the population either converges to a full C or a full A state, observed for ρ ≈ 0.10 in the first case happen at ρ ≈ 0.05 in the latter case.

Micro-level analysis of the effects of dilution and mobility
In order to further explore the aforementioned phenomena, we extend our analysis of the evolutionary process to a micro perspective. Figure 6 shows the average time course of the three strategies for a fixed temptation to defect b = 1.65 and loner's payoff σ = 0.55, which is representative of the outcomes of other parameters as well. For this scenario, when ρ = 1, cyclic dominance is maintained for the traditional case with the noisy imitation rule, but it is easily shattered when considering a rational rule. However, the difference diminishes when ρ < 1.
Results show that the profiles of the curves for the initial $10^2$ MC steps are very similar to those of scenarios which support cyclic dominance, i.e., an initial drop followed by a quick recovery of the frequency of cooperators. This phenomenon has also been observed in previous work for dynamic networks [48,64], where it was discussed that defectors are quickly dominated by abstainers, allowing a few clusters of cooperators to remain in the population; then, with the lack of defectors, those cooperative clusters expand by invading the abstainers. Note that it also explains the reason that higher values of σ are more beneficial in promoting cooperation (as seen in Figures 3, 4 and 5), i.e., abstainers have to be strong enough to protect cooperators against invasion from defectors in the initial steps. Moreover, Figure 6 (right) shows a clear correlation between the density ρ and the speed of the initial inflation of abstention.
In order to distinguish between the impact of mobility and dilution on the emergence of cooperative behaviour and cyclic dominance, we have also investigated the case in which the agents are not allowed to move, that is, the same model described in Section 2 but without the movement updating process. As shown in Figure 7, when $\rho \le \rho_p$ the frequency with which the agents change their strategies is extremely low, i.e., the population quickly reaches a frozen pattern which is very dependent on the initial configuration. Also, in line with preceding research [29,30], we observe that when considering the traditional noisy imitation rule (Figure 7 top), dilution alone can improve the level of cooperation, where the optimal value of ρ is always above the percolation threshold ($1 > \rho > \rho_p$). From another perspective, the emergence of cyclic dominance behaviour is diminished when the agents do not move (e.g., compare the top panels of Figures 6 and 7).
Interestingly, different phenomena occur when we consider the fully rational imitation rule (Figure 7 bottom). Note that dilution alone is not able to fix the evolutionary mechanisms which support either the emergence of cyclic dominance or the evolution of cooperation. In other words, results show that mobility plays a key role in diminishing the difference in the outcomes of both imitation rules (as seen in Figure 6 for ρ < 1). Moreover, it is noteworthy that mobility allows for the full dominance of cooperation for lower values of ρ, as well as the robust emergence of cyclic dominance for a wider range of scenarios.
To advance the understanding of mobility and dilution in the context of the VPD game, we also analyse the spatio-temporal dynamics of the strategies for both the noisy and the rational imitation rules. Figure 8 provides an animation for a prepared initial state where the strategies are arranged in stripes. This prepared configuration allows us to separate cooperators from defectors, making it easier to observe the mechanisms which are responsible for breaking the cyclic chain where A beats D, D beats C and C beats A. In summary, results show that the key difference between the dynamical rules is that, when applying the fully rational rule, defectors in the middle of abstainers do not have the incentive to become abstainers. Hence, as discussed in Section 3.1, the rational rule produces frozen D + A states (as seen in Figure 2) which cannot be observed in the noisy Fermi-Dirac case. As a consequence, the isolated defectors trapped in the sea of abstainers inhibit the formation of larger cooperative clusters, which in turn breaks the cyclic chain. However, when mobility is introduced for ρ < 1, the D + A states are not a stable phase anymore. Here, there is a small stir which causes a random drift of defectors. Consequently, when two defectors meet they become vulnerable to invasion by abstainers. This process would lead to a homogeneous A phase, but the latter is sensitive to the attack of cooperators. In this way, abstainers are now able to support the emergence of cooperation, which in turn restores the mechanism to maintain the coexistence of all competing strategies. Furthermore, regarding the phenomenon of cyclic dominance observed when ρ < 1.0, although using a different scenario and methodology, our results are compatible with previous research concerning mobility in the rock-paper-scissors game, where it is discussed that mobility can jeopardise cyclic dominance [25,65].
However, in the context of the VPD game, the enhancement of cooperation for $\rho < \rho_p$ is counter-intuitive because it diminishes the cooperators' ability to form larger clusters [47]. Besides, results show that when the agents are allowed to abstain, the population of mobile agents will never converge to full defection. Finally, it is noteworthy that results also echo the findings of previous research concerning the PD and VPD games on weighted networks [48,50,51], i.e., a coevolutionary model in which the link weights are also subject to evolution. In parallel, the ability to avoid interactions, either by weakening the link weight or by moving to another position, acts as an important mechanism to strengthen cooperators against exploitation.

Discussion and Conclusions
This work investigates the role of mobility and dilution in a population of agents playing the voluntary prisoner's dilemma (VPD) game, also known as the optional prisoner's dilemma game, in a diluted square lattice network. We propose a coevolutionary model where both the agents' strategy and position are subject to evolution. In this model, in addition to the commonly applied imitation rules for the strategies [10], we also adopt a mobility rule in which agents who are performing worse (better) than their neighbours have more (less) chance to move. Thus, without loss of simplicity, this coevolutionary and asynchronous model is more realistic than the previous ones which consider random mobility with synchronous updating rules [47].
Research in this domain has claimed that the addition of abstention in the prisoner's dilemma game leads to a rock-paper-scissors type game, in which cooperation dominates abstention, abstention dominates defection, and defection, in turn, dominates cooperation, which describes the so-called cyclic dominance behaviour [19].
Interestingly, the present study shows that, in the context of the traditional VPD game for a fully populated network [10], the emergence of cyclic behaviour is biased by the use of the Fermi-Dirac distribution function (sigmoid) in the strategy adoption process. This sigmoid function is often employed to allow for irrational or unjustified decisions where agents occasionally copy the strategy of a worse or an equally performing neighbour [5,57,66,67,68]. We show that when agents make fully rational decisions such as only copying the strategy of better performing neighbours, the outcome changes drastically, making cyclic behaviour unsustainable in most cases. However, the present study shows that the mechanism that supports cyclic behaviour is fixed when agents are allowed to move due to a diluted interaction space.
In fact, the noisy strategy updating rule has been applied to avoid artificial or frozen outcomes. However, in the present study we show that it is also possible to avoid such frozen states in a more realistic way, where, for instance, agents are allowed to move and change their connections over time. Hence, a deterministic rule can be as efficient as the noisy Fermi-Dirac function if we assume a partly diluted system. Furthermore, by means of robust and systematic Monte Carlo simulations, results show that mobility plays a crucial role in promoting cooperation in the VPD game for a wide range of values of the temptation to defect b and the loner's payoff σ, including scenarios of high b and density below the percolation threshold ($\rho < \rho_p$), which are known to be adverse for maintaining cooperative behaviour [27,29,30].
To conclude, this paper aims to bridge the gap between agent mobility and the concept of voluntary/optional participation in social dilemmas. In addition, it provides a novel perspective for understanding the foundations of cyclic dominance behaviour in the context of the prisoner's dilemma game with voluntary participation (VPD game). We hope this work can serve as a basis for further research on the role of abstention to advance the understanding of the evolution of cooperation in coevolutionary spatial games.