No-boarding buses: Agents allowed to cooperate or defect

We study a bus system with a no-boarding policy, where a "slow" bus may disallow passengers from boarding if it meets some criteria. When the no-boarding policy is activated, people waiting to board at the bus stop are given the choices of \emph{cooperating} or \emph{defecting}. The people's heterogeneous behaviours are modelled by inductive reasoning and bounded rationality, inspired by the El Farol problem and the minority game. In defecting the no-boarding policy, instead of the minority group being the winning group, we investigate several scenarios where defectors win if the number of defectors does not exceed the maximum number of allowed defectors but lose otherwise. Contrary to the classical minority game which has $N$ agents repeatedly playing amongst themselves, many real-world situations like boarding a bus involve only a subset of agents who "play each round", with \emph{different subsets playing at different rounds}. We find that for such realistic situations, there is no phase transition and no herding behaviour when the usual control parameter $2^m/N$ is small. The absence of the herding behaviour assures feasible and sustainable implementation of the no-boarding policy with allowance for defections, without leading to bus bunching.


I. INTRODUCTION
Bus transit systems play a vital role in moving people efficiently. Left on their own, however, buses tend to bunch and form clusters of buses moving together. The formation of such clusters can be intuitively understood as follows: Suppose buses are initially distributed evenly. Due to stochasticity in the number of people waiting at a bus stop as well as traffic conditions, a bus may happen to be slightly delayed at a bus stop. Once it leaves the bus stop, the bus that is trailing it experiences a slightly shorter headway such that it needs to pick up slightly fewer people than the original bus. This speeds up the trailing bus further, allowing it to catch up even more, with fewer and fewer people for it to pick up at the subsequent bus stops due to the diminishing headway. Therefore, these two buses end up bunching together. Bunched buses reduce the efficacy of the system, because if a person misses a group of bunched buses, he essentially misses not one but multiple buses and has to wait longer for the next bus(es) to arrive. This problem has long been identified [1][2][3][4][5][6], with many studies conducted to propose possible rectifications. Some suggested methods include holding back buses so that they follow a prescribed schedule or to counteract the diverging headways between buses [7][8][9][10][11][12][13][14][15][16][17][18][19], stop-skipping to correct for the headways [14][20][21][22][23][24], deadheading (i.e. having an empty bus move directly to a designated bus stop) [21][24][25][26][27], carefully engineering the bus routes and locations of the bus stops [28], using buses with wide doors to speed up boarding/alighting [29][30][31], as well as a no-boarding policy where a "slow" bus only allows people to alight but disallows boarding [32].

A. No-boarding buses
In the no-boarding policy, the "slow" bus gets to speed up by saving the time it would otherwise spend at the bus stop picking up people, and allows the "fast" bus behind it (which would soon bunch into it, if no intervention is carried out) to pick up these people instead, effectively slowing it down. Ref. [32] has worked out analytically, backed up by extensive simulations based on a real bus system, that the bus system as a whole would experience significant improvement in terms of reducing the overall average waiting time of people at the bus stop for a bus to arrive. The global improvement comes with a minor local cost, however, as those denied boarding would doubtlessly have their own waiting times slightly extended. Nevertheless, the overall global gain far outweighs those minor local costs.
The purpose of this paper is to investigate the scenario where, unlike in Ref. [32] where no-boarding is mandatorily enforced by the bus system when a bus is deemed "too slow", the people waiting to board at the bus stop are given the option of whether to cooperate or defect the no-boarding policy. Such a social situation is of immense interest to policymakers and bus operators because some people urgently require service and are willing to pay a premium for it, provided of course that such an option is available in the first place. On top of that, certain weather conditions like thunderstorms, snow (in countries with winter) and blazing heat, amongst others, may make waiting at an open bus stop undesirable. Besides, it is arguably less of a pain point to be on board the bus, seated and enjoying the air-conditioner albeit slowly moving, compared to waiting at the bus stop with uncertainty on when a bus would actually arrive.
Whilst allowing defections out of goodwill and compassion for the needy is certainly worthwhile, without a mechanism for checks and balances this may be subject to abuse, since everybody would instinctively like to board immediately instead of spending extra time waiting for the next bus. But if too many people defect, the system as a whole would fail to maintain its optimal configuration of buses, with bus bunching being a repercussion, defeating the original intention of the no-boarding policy. Such a situation is an example of the tragedy of the commons [33].

B. Inductive reasoning and bounded rationality
To simulate people evaluating choices and making what they would individually perceive as their respective best action [34,35], we adopt the description of humans with inductive reasoning and bounded rationality presented by Ref. [36] in studying the El Farol Problem.
When a bus announces that the no-boarding policy is activated, in this paper each person who would normally be allowed to board, but who is denied boarding in Ref. [32], is given the option to cooperate (i.e. not board, and obediently wait for the next bus) or defect (i.e. defy the no-boarding rule and board anyway). The way they make their choices is determined as follows. We represent them as independent agents each endowed with s different strategies and memory m. These s strategies are ideally distinct for different agents, since different people would behave differently with their own beliefs and ideas.
Nevertheless, coincidentally similar strategies are allowed. In fact, if the number of agents $N$ is far greater than the number of available strategies (which is determined by m, i.e. $2^m$ strategies, see Refs. [38,39] for discussions on this), then some agents must be sharing at least one strategy. The memory records the m most recent past results of the winning choices, i.e. whether the cooperators are winners or the defectors are winners. Then, for each of the $2^m$ combinations of m binary historical outcomes, a strategy specifies the next action of whether to cooperate or defect. A strategy is thus a set of maps, one map for each of these $2^m$ possible combinations to an element of {cooperate, defect} to be made in the next round. A different strategy would map each of those $2^m$ combinations into a possibly different element of {cooperate, defect}. The performance of each of the s strategies is tracked based on their predictions and the actual outcomes, and a strategy gains a "virtual point" for correctly predicting the next outcome. These virtual points track how well each strategy is predicting the next outcome regardless of whether or not it is used in making the actual choice. The current best performing strategy (i.e. the one with the most virtual points) is used to make the actual next decision. If multiple strategies are tied on virtual points, the one to use is decided randomly. Thus, this adaptivity allows agents to learn which strategy amongst their s possible ones is the "current best" from their individual perspectives.
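The agent description above can be illustrated with a minimal Python sketch. This is not the simulation code of the paper: the class name `Agent`, the helper `next_history`, the encoding of the m most recent winning sides as an integer in the range 0 to $2^m - 1$, and the labels `COOPERATE = 0` and `DEFECT = 1` are all assumptions made for illustration.

```python
import random

COOPERATE, DEFECT = 0, 1  # hypothetical labels for the two actions


class Agent:
    """Agent with s lookup-table strategies over the last m winning sides."""

    def __init__(self, m, s, rng):
        self.rng = rng
        # A strategy maps each of the 2**m possible histories to an action.
        self.strategies = [
            [rng.choice((COOPERATE, DEFECT)) for _ in range(2 ** m)]
            for _ in range(s)
        ]
        self.virtual_points = [0] * s

    def choose(self, history):
        """Play the action of the best strategy; break ties at random."""
        best = max(self.virtual_points)
        tied = [i for i, p in enumerate(self.virtual_points) if p == best]
        return self.strategies[self.rng.choice(tied)][history]

    def update(self, history, winning_action):
        """Credit every strategy that predicted the winner, used or not."""
        for i, strategy in enumerate(self.strategies):
            if strategy[history] == winning_action:
                self.virtual_points[i] += 1


def next_history(history, winning_action, m):
    """Shift the m-bit history window and append the latest winning side."""
    return ((history << 1) | winning_action) % (2 ** m)
```

In this sketch, each round would call `choose` for every playing agent, determine the winning side by whatever criterion is in force, and then call `update` and `next_history` with that winning side.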
C. Winners are those in the minority group, collective cooperation, herding behaviour
Note that in Refs. [36,37], the winning group (cooperators or defectors) is determined by the minority rule, i.e. the side with fewer people wins. This rule comes with a natural feedback mechanism, so that the system never settles permanently into favouring either group. If cooperators are the winners right now, then more defectors would like to be cooperators. But this would turn cooperators into the majority and defectors would become the winning group.
Such a minority feedback has led physicists to draw comparisons with physical systems possessing quenched disorder and phase transitions [39][41][42][43][44][45][46][47][48]. For instance, the minority game has different phases, where in one such phase the best strategies for a significant number of agents are frozen, i.e. each of these agents always has the same best strategy and never switches to a different one. Ref. [43] argued that this is akin to symmetry breaking of a spin system when the control parameter exceeds a critical value, analogous to spontaneous magnetisation. Such correspondences with physical systems have fruitfully opened up the application of statistical mechanical techniques to the minority game [44][45][46], with applications even to financial markets [49][50][51][52] and other problems on resource allocation [47,48].
A key revelation from research in the minority game is that there is a regime where the entire community exhibits remarkable collectively cooperative behaviour even though each agent is only interested in self-gain. Apart from this surprising positive global behaviour, another astonishing result arises when the number of agents, $N$, is much greater than the number of available independent strategies, $2^m$: herding behaviour emerges, where many agents with similar strategies behave as a crowd and take the same action. This is highly undesirable as it would lead to a skewed outcome, with a small minority group containing few winners.
In terms of resource utilisation, this means that there are many people who could have benefited but missed out because they joined the "majority bandwagon". Quantitatively, the variance from optimal utilisation per number of agents varies as a power law with respect to the control parameter $2^m/N$.
The bus problem that we are considering is also a resource allocation optimisation problem, as the bus system strives to enhance the efficiency in serving commuters. The no-boarding policy essentially imposes a limit on the capacity, i.e. it is a bounded resource, which is being competed for by the waiting commuters at the bus stop.
D. Who are winners in cooperating or defecting the no-boarding bus?
For the bus system, on the other hand, there is no clear, obvious or unique generalised meaning for cooperators or defectors being in a "minority group". Does "fewer defectors compared to cooperators" imply that defectors "win"? Why should a naïve "fewer defectors compared to cooperators" allow defectors to declare victory? After all, a successful implementation of the no-boarding policy aims to minimise defections, i.e. "zero defections is ideal", from the point of view of the bus system. Hence, a key part of this paper is to define and formalise what it means for the cooperators or defectors to win, instead of just counting the numbers in each group and seeing which has fewer people. One important aspect of the original minority game was to optimise usage of the resource, i.e. whilst the minority group wins, the system as a whole would be considered "optimal" if the wastage or deviation from the ideal capacity is minimised. Once again, it is not immediately clear what this would be for the bus system, or if such a notion is even applicable here.
We note that Ref. [40] found that the actual historical outcomes of winners are not crucial in cultivating the emergence of collective cooperation. Instead, any exogenous piece of information is sufficient to generate that community-wide learning. Therefore, as long as the bus system systematically decides who the winners are and this information is made clear to all agents (for instance, defectors get away and thus "win" this time, or they are all punished with a fine for defecting and so cooperators are "winners" at another time), this would probably also be true for the no-boarding bus where people can choose whether to cooperate or defect. In other words, winners being decided by the minority rule can be replaced by other winning criteria, whilst maintaining the key feature of the community's collective cooperation. We will verify this in this paper.
E. More total agents than those who are actually playing each round
The classical minority game sets a fixed number of agents who repeatedly play amongst themselves. But in the real world, this is definitely not the case. Why should everybody always play each round? Some people may take a break or play only occasionally. In the original El Farol Problem [36], it is arguable that a realistic situation may be that there are say 200 people, but on average only about 100 of them would actually consider whether or not to go to the bar and compete for the 60 available places, with the remaining people taking a break from playing.
In fact, in the bus system, the actual number of people boarding a bus each time is not even fixed! The total number of people using the bus system is overwhelmingly more than the number of people boarding each time, with even fewer who are actually boarding when the no-boarding policy is activated. In view of this, the classical minority game needs to be extended to a situation where there are necessarily more agents in the overall pool of people than those who are actually playing each round, as opposed to a fixed number of agents who always play against each other every round.
The herding behaviour of the agents in the classical minority game is a major cause of concern for the bus system. If in some successive rounds a massive crowd decides to defect, this may slow down the bus too much and nullify the no-boarding policy, leading to bus bunching. But since the bus system is not quite like the classical minority game, we need to study it explicitly and find out what happens.
In the next section, we recapitulate the classical setup and features of the minority game, stating the key results as well as the behaviour of the agents in adapting to the optimal situation and how the efficiency of resource usage depends on factors like the number of agents and memory [37-39, 41, 42]. We also consider an "open" minority game, where there are 2N total agents with only N of them (randomly selected) actually playing each round. There are other models where not every agent in the pool plays every round, for instance some agents only play when they receive enough information in a financial market [50], or financial agents trade at different time scales [52]. Then in Section III, we investigate several situations for the bus system where a bus allows defectors to go through "victoriously" when there are few of them, but is capable of correcting the situation and "punishes" defectors when the defection level becomes high. Interestingly, agents with such inductive reasoning and bounded rationality are indeed capable of adapting to the rules and optimising according to different situations at a global scale.
II. CLASSICAL MINORITY GAME
4. As the memory m gets larger, each agent tracks more information and becomes more complicated. This increased complexity turns out to make them behave more randomly. Thus, the universal curve approaches the random choice game asymptotically as m → ∞, but the community is generally still performing better than the random choice game.
5. As shown in Fig. 1(b), the proportion of agents who stick to one of their two strategies shows a peak in the region where the entire community behaves most cooperatively. That proportion is substantial, with about half of the total number of agents experiencing a frozen strategy.
6. Fig. 1 shows the situation where each agent is endowed with s = 2 strategies. With larger s, the added complexity tends to diminish the region where the curve is below the random choice game in Fig. 1(a), and the universal curve approaches the random choice game as s increases.

1. The plots in Fig. 1 for the classical minority game are in log-log and semi-log scales, respectively, with the x-axis being with respect to $2^m/N$. The plots in Fig. 2 for the open minority game, on the other hand, are in the usual linear scale, with the x-axis being with respect to m (independent of $N$).
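The difference between the classical and the "open" minority game can be made concrete with a short Python sketch. It is not the authors' code, and it rests on assumptions that the text does not settle: the function name `simulate_minority_game` and its parameters are hypothetical, only the agents who play a given round update their virtual points, the variance is measured about half the number of players, and a tie between the two sides (possible only for an even number of players) is resolved in favour of side 0.

```python
import random


def simulate_minority_game(pool_size, n_players, m, s=2, rounds=2000, seed=0):
    """Return the mean squared deviation of side-1 attendance from
    n_players / 2, divided by n_players.

    pool_size == n_players gives the classical game; pool_size > n_players
    gives an "open" game in which a random subset plays each round.
    """
    rng = random.Random(seed)
    n_hist = 2 ** m
    # strategies[a][k][h]: action (0 or 1) of agent a, strategy k, history h
    strategies = [[[rng.randrange(2) for _ in range(n_hist)]
                   for _ in range(s)] for _ in range(pool_size)]
    points = [[0] * s for _ in range(pool_size)]
    history = rng.randrange(n_hist)
    total_sq_dev = 0.0
    for _ in range(rounds):
        players = rng.sample(range(pool_size), n_players)
        ones = 0
        for a in players:
            best = max(points[a])
            k = rng.choice([i for i in range(s) if points[a][i] == best])
            ones += strategies[a][k][history]
        # Minority rule: the side chosen by fewer players wins.
        winner = 1 if ones < n_players - ones else 0
        # Assumption: only agents who played this round update their points.
        for a in players:
            for k in range(s):
                if strategies[a][k][history] == winner:
                    points[a][k] += 1
        total_sq_dev += (ones - n_players / 2) ** 2
        history = ((history << 1) | winner) % n_hist
    return total_sq_dev / rounds / n_players
```

Calling the function with `pool_size=101, n_players=101` would correspond to a classical game with 101 agents, whilst `pool_size=202, n_players=101` would correspond to an open game with twice as many agents in the pool as those playing each round. Sweeping m in both cases would allow a comparison of the kind described for Figs. 1 and 2.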

RATIONALITY
Let us now move on to deal with the problem of interest, viz. a bus system with a no-boarding policy. Our simulation environment for a bus loop system is based on that developed in Ref. [32], with parameters tuned from a real university campus bus loop service [6,32]. A simplified setup with realistic parameters is to consider two buses serving one bus stop in a loop. We let both these buses move with a natural period of T = 12 minutes. As it is, the two buses would quickly bunch into one single unit. In this case, the average waiting time of people at the bus stop for a bus to arrive is about 6 minutes and 11 seconds.
To prevent this, a "slow" bus would implement the no-boarding policy, i.e. it only allows alighting and then leaves, if its phase difference as measured from the bus behind it (or the bus immediately behind it if there are more than two buses) becomes less than some critical $\theta_0$. Note that since the buses go in a loop, one can map this loop isometrically to the unit circle where the notion of a phase on the circle ($0^\circ$ to $360^\circ$) is well-defined, and we can speak of the phase difference between two buses on this unit circle. For our simulations in this paper, we set $\theta_0 = 120^\circ$ so that if the phase difference between two buses gets smaller than that, the leading bus (which is "too slow") would disallow boarding at the bus stop, leaving the people there to be picked up by the trailing bus (to slow it down, since it is "too fast").
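The activation criterion can be sketched in a few lines of Python. The function names `phase_difference` and `no_boarding_active` are hypothetical, and the sketch assumes that phases are given in degrees and that buses travel in the direction of increasing phase.

```python
def phase_difference(lead_deg, trail_deg):
    """Phase of the leading bus measured from the bus behind it, in [0, 360)."""
    return (lead_deg - trail_deg) % 360.0


def no_boarding_active(lead_deg, trail_deg, theta0_deg=120.0):
    """True when the leading bus is "too slow" and should refuse boarding."""
    return phase_difference(lead_deg, trail_deg) < theta0_deg
```

The modulo operation handles the wrap-around at $360^\circ$, so a leading bus at $10^\circ$ with a trailing bus at $350^\circ$ is treated as being $20^\circ$ ahead.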
Ref. [32] has established significant improvements due to the no-boarding policy in preventing bus bunching and dramatically reducing the waiting time of people at the bus stop for a bus to arrive. With this setup where $\theta_0 = 120^\circ$, the average waiting time is only about 3 minutes and 47 seconds, an improvement of almost 40% from the situation without the no-boarding policy. Instead of mandatorily enforcing the no-boarding policy when the phase difference drops below $\theta_0$, here we allow the people waiting to board to choose whether to cooperate or defect the no-boarding policy, when it is activated. Each person who would normally be allowed to board would decide for themselves, based on the inductive reasoning and bounded rationality described above. Then those who decide to defect would proceed to board as usual, whilst those who decide to cooperate would remain at the bus stop and wait for the next bus.
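As a quick check of the quoted improvement, the two average waiting times can be converted to seconds: 6 minutes and 11 seconds is 371 s, and 3 minutes and 47 seconds is 227 s. The relative reduction is then

```latex
\[
  \frac{371\,\mathrm{s} - 227\,\mathrm{s}}{371\,\mathrm{s}}
  = \frac{144}{371} \approx 0.388 ,
\]
```

i.e. about 39%, consistent with the "almost 40%" stated above.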
In this setup, the actual number of people who have to "play the no-boarding game" (i.e. who are faced with no-boarding, but given the choices to cooperate or defect) varies each time, from as low as 1 person up to occasionally nearly 30 people. The mean number is around 10 to 16 people (Table I).
For a criterion on determining the winning group, suppose the bus system allows a fixed number of defectors to board when the no-boarding policy is implemented during that stop.
This number is arbitrarily decided by the bus system: Perhaps it could set a larger limiting number during lull times when the pressure on bus bunching is weaker and a smaller limiting number during busy times when a bus would have to stop longer to serve more passengers. If the number of defectors is within this limit, then they get away without any punishment. In this sense, the defectors are deemed as winners, whilst the cooperators are losers since they apparently "wasted their time for nothing" when they obeyed and waited for the next bus.
On the other hand, if the number of defectors exceeds the prescribed limit, then all those who defected that round are punished and charged a (possibly hefty) fee whilst the cooperators of that round are given a rebate for obediently following the rule. This information on how many defectors are allowed each time, however, is not announced to the agents as it is only meant for the bus system to decide on the winning group. Therefore, each agent has to individually weigh the pros and cons of defecting the no-boarding rule, if it is activated.
Note also that by adapting to the limiting number of defectors set by the bus system, the entire community is capable of co-evolving their proportion of defection rates. For example with m = 2 (Table I)
down the "slow" bus and thus slightly increases the excess people who would face the no-boarding policy. This is why allowing more defectors would increase the average number of agents who "play each round".
seconds where no-boarding is mandatorily implemented with no option to defect. This is only a small cost, but gives people with urgency to board the option to do so.
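The winning criterion used in this section, where defectors win if their number stays within the limit set by the bus system and lose otherwise, can be sketched in Python as follows. The names `winning_action` and `deviation_from_limit` are hypothetical, and the second function is only one plausible way, assumed here for illustration, of measuring how far the number of defectors strays from the prescribed limit over many activations.

```python
COOPERATE, DEFECT = 0, 1  # hypothetical labels for the two actions


def winning_action(n_defectors, max_defectors):
    """Defectors win if they stay within the limit; otherwise cooperators win."""
    return DEFECT if n_defectors <= max_defectors else COOPERATE


def deviation_from_limit(defector_counts, max_defectors):
    """Mean squared deviation of the defector counts from the limit."""
    total = sum((d - max_defectors) ** 2 for d in defector_counts)
    return total / len(defector_counts)
```

Since the limit is not announced to the agents, only the result of `winning_action` would be fed back to them as the latest entry of their memory.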

IV. CONCLUDING REMARKS
The application of the inductive reasoning and bounded rationality description of agents to the no-boarding buses suggests that, with the no-boarding policy improving the efficacy of the bus system [32], allowing a fixed number of defectors each time the policy is activated is sustainable, since the agents are able to adapt to making use of the available resource without too much drawback on the bus system's performance. It is also a fitting example of an "open" system where only a subset of the overall pool of agents actually play the "game" in each round, hence there is no emergence of herding behaviour even though many agents may be sharing similar strategies. This is because in such an "open" system, each agent faces different groups of opponents each time and therefore continually co-evolves its best strategies. Furthermore, we see similar features to the open minority game presented in this paper, even though the winning criterion for the bus system is not determined by the group with a smaller number of people.
Crucially, the absence of the herding behaviour assures the feasibility of the no-boarding policy with allowance for defections. The variance from the prescribed limiting number of defectors for a group of agents with inductive reasoning and bounded rationality is smaller than that for a group of randomly behaving agents (Figs. 3 and 4), with no blowing up [unlike in the classical minority game when $N \gg 2^m$, Fig. 1(a)]. This implies that the number of defectors reasonably hovers near the prescribed limit and that huge crowds of people who decide to defect are unlikely to form. Hence, the bus system is protected from the situation where surges of people defecting slow down the "slow" bus to such an extent that bus bunching occurs, nullifying the intention of the no-boarding policy. This, however, may not be the case if it is the same group of agents who always play repeatedly with each other as in the classical minority game, which does possess the herding regime.
In terms of a physical system like a spin glass, the classical minority game corresponds to all atoms in the spin glass contributing to the overall state at each time step. In the open minority game that we presented here, the correspondence would be that only a subset of these atoms contribute to the overall state at any time step, with the rest somehow shielded off and momentarily not participating in the interaction. Although this is clearly not how a spin glass behaves, in other kinds of systems like a financial market, it is arguable that only a subset of the overall pool of market players are active, with the others going about other business and only being active at other times [50,52]. Thus, such a distinction between the classical and open minority games is important, as we have shown that the properties of these games are different.