Evolutionary dynamics of Bertrand duopoly

Duopolies are one of the simplest economic situations where interactions between firms determine market behavior. The standard model of a price-setting duopoly is the Bertrand model, which has the unique solution that both firms set their prices equal to their costs—a paradoxical result where both firms obtain zero profit, which is generally not observed in real market duopolies. Here we propose a new game theory model for a price-setting duopoly, which we show resolves the paradoxical behavior of the Bertrand model and provides a consistent general model for duopolies.

Economic systems are exemplars of complex systems, in which the properties of the whole system are determined by, but not simply reducible to, the behavior of numerous interacting sub-units [1][2][3][4]. The inherent complexity of economic systems has manifested itself in the analysis of markets, where most effort has been devoted to two extreme cases: on the one hand, a monopoly (consisting of a single firm) and on the other, a perfectly competitive market (consisting of a large number of firms). The rationale for this restriction is that the analysis of these two cases is especially simple because strategic interactions between the firms can be ignored: in a monopoly there are no strategic interactions by definition, and in perfect competition it is reasonable to assume that no single firm's actions can significantly affect the behavior of the large number of other firms in the market. Thus, the theoretically most interesting case, which is also of great real-world importance, is that of a market consisting of a small number of firms. In such a situation, the strategic interactions between the firms are the key determinant of the behavior of the market. The paradigm of such a case is a duopoly, in which there are two firms in the market. Bertrand introduced a celebrated duopoly model [5], in which the firms decide simultaneously the price at which they will sell a homogeneous good, where the amount of the product consumed is determined by the price. It is assumed that the firm that sets the lower price captures the whole market, while the firm that sets the higher price sells nothing. Given these assumptions the unique solution of the Bertrand model is that the firms should set a price that is equal to their costs of production, in which case they obtain zero profit [6][7][8][9][10]. Since real-world duopolies are generally profit-making, this outcome is referred to as the Bertrand paradox.
In this letter we propose a new game theory model for price competition which is more realistic and general than the Bertrand model. We study this model analytically using adaptive dynamics and also through agent-based simulations, and show that it resolves the Bertrand paradox.
To place the new duopoly model in context we first briefly review the Bertrand model. Consider two firms $i$ and $j$, selling a homogeneous good. The strategies of the firms are the unit prices $p_i$ and $p_j$ at which they sell their product (where $p_i, p_j \in \mathbb{R}_+ = [0, \infty)$). We denote the corresponding quantities sold by the firms by $q_i(p_i, p_j)$ and $q_j(p_i, p_j)$, respectively. In the Bertrand model it is assumed that the cost incurred by a firm is determined by the amount of the product that it produces and subsequently sells. Let $C(q(p_i, p_j))$ be the cost to a firm of producing a quantity $q(p_i, p_j)$. Here we consider the standard case $C(q(p_i, p_j)) = c\,q(p_i, p_j)$, in which the firms have the same constant marginal cost $c > 0$. The quantity $q$ of the product sold in the market at price $p$ is given by the demand function $q = d(p)$. Here we focus on the standard case of a linear demand function $d(p) = a - bp$, where $a, b > 0$. Since we will assume that $d(c) > 0$, it follows that $a > bc$.
Under these assumptions the revenue for firm $i$ is $p_i\, d(p_i)\, \Theta(p_j - p_i)$, where $\Theta$ is the Heaviside function, with the half-maximum convention (i.e., $\Theta(x) = 0$ if $x < 0$, $\Theta(x) = \tfrac{1}{2}$ if $x = 0$, and $\Theta(x) = 1$ if $x > 0$). Thus, the payoff to firm $i$ is given by
$$\Pi(p_i, p_j) = (p_i - c)\, d(p_i)\, \Theta(p_j - p_i). \tag{1}$$
The payoff to firm $j$ is correspondingly given by $\Pi(p_j, p_i)$. Therefore, for the case of a linear demand function the payoff to $i$ in the classical Bertrand game is given by
$$\Pi(p_i, p_j) = (p_i - c)(a - b p_i)\, \Theta(p_j - p_i). \tag{2}$$
Since there is no incentive for any firm to set a price that results in a negative payoff (as a firm may always obtain a profit of zero by not selling its product), the feasible strategy space for this game is $S = [c, m]$, where $m = a/b$ is the price at which demand vanishes. The unique Nash equilibrium of the Bertrand game with payoff (2) is that the firms set their prices equal to the marginal cost $c$ [6][7][8][9][10]. At this Nash equilibrium both firms obtain a payoff of 0. This is the Bertrand paradox. Moreover, the Bertrand paradox persists in an evolutionary analysis of the Bertrand game, since the long-run behavior is equivalent to perfect competition [11]. Approaches to resolving the Bertrand paradox include: capacity constraints [6], sluggish consumers [12], endogenous timing of pricing decisions [13], endogenous choice of production technologies [14], product differentiation [15,16], demand uncertainty [17,18], product quality uncertainty [8,19], and non-constant marginal costs [11,20,21,22].
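The discontinuity of the classical payoff can be made concrete with a minimal Python sketch. The function names and the parameter values ($a = 10$, $b = 1$, $c = 2$) are illustrative assumptions, not values taken from this letter.

```python
def heaviside_half(x):
    """Heaviside step with the half-maximum convention: Theta(0) = 1/2."""
    if x < 0:
        return 0.0
    if x > 0:
        return 1.0
    return 0.5


def bertrand_payoff(p_i, p_j, a=10.0, b=1.0, c=2.0):
    """Classical Bertrand payoff to firm i for linear demand d(p) = a - b*p.

    The defaults a, b, c are hypothetical illustration values.
    """
    return (p_i - c) * (a - b * p_i) * heaviside_half(p_j - p_i)


# Both firms charge 5: they split the market equally.
shared = bertrand_payoff(5.0, 5.0)
# Firm i undercuts by a small amount and captures the whole market.
undercut = bertrand_payoff(4.99, 5.0)
# Both firms price at marginal cost: nobody earns a profit.
at_cost = bertrand_payoff(2.0, 2.0)

print(shared, undercut, at_cost)
```

An arbitrarily small price cut roughly doubles the payoff in this example, which is the incentive that drives prices down to marginal cost in the classical game.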
A central assumption in the Bertrand duopoly model is that consumers buy purely from the firm with the lowest price. This results in the Bertrand game having a discontinuous payoff function, which in turn results in marginal pricing being the unique Nash equilibrium. Here we adopt the perspective advocated by Hotelling [15], who emphasized that this feature of the model is unrealistic since it assumes that even extremely small differences in the price of the two products will result in all consumers buying the marginally cheaper product and none buying the slightly more expensive one. In reality there are numerous factors that could result in firm $i$'s product sometimes being preferred to firm $j$'s product even if $p_j < p_i$. Examples include: incomplete price information, so that not all consumers are aware of the price ordering and some buy the higher-priced product; a non-uniform spatial distribution of the products, which makes it more convenient for some consumers to buy the more expensive product; and an effective advertising campaign that leads some consumers to prefer the more expensive product.
These considerations therefore suggest that a more realistic model would assume that the probability that firm $i$'s product is preferred over firm $j$'s product is a function of the difference in the prices of the products, $\varphi(p_j - p_i)$, where $\varphi$ is a smooth, monotonically increasing function, with $\lim_{x \to -\infty} \varphi(x) = 0$ and $\lim_{x \to \infty} \varphi(x) = 1$. In this letter we will study a new model of price competition based on this assumption, which we will refer to as the smooth Bertrand duopoly model. Given these assumptions, the expected revenue for firm $i$ is $p_i\, d(p_i)\, \varphi(p_j - p_i)$ and the expected cost to firm $i$ is $c\, d(p_i)\, \varphi(p_j - p_i)$. Thus, for a linear demand function, the expected payoff to $i$ in the smooth Bertrand game is given by
$$\pi(p_i, p_j) = (p_i - c)(a - b p_i)\, \varphi(p_j - p_i).$$
The expected payoff to $j$ is correspondingly given by $\pi(p_j, p_i)$. The payoff $\pi(p_i, p_j)$ is a smooth function, without the discontinuity associated with the classical Bertrand game, and thus the smooth Bertrand game can behave quite differently from the classical game. We will assume that consumers buy either firm $i$'s or firm $j$'s product and therefore $\varphi(p_j - p_i) + \varphi(p_i - p_j) = 1$, from which it follows that $\varphi(0) = \tfrac{1}{2}$ and $\varphi''(0) = 0$. We will let $\lambda = \varphi'(0)$, as this parameter will play an important role in the analysis of the smooth Bertrand game.
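As an illustration of the smooth payoff, the following Python sketch assumes a logistic form for $\varphi$ with steepness `k`. This is one admissible choice that satisfies the stated limits and the symmetry condition. The parameter values are hypothetical.

```python
import math


def phi(x, k=1.0):
    """Assumed logistic preference probability: smooth, increasing, 0 -> 1."""
    return 1.0 / (1.0 + math.exp(-k * x))


def smooth_payoff(p_i, p_j, a=10.0, b=1.0, c=2.0, k=1.0):
    """Expected payoff to firm i in the smooth Bertrand game (linear demand).

    The defaults a, b, c, k are hypothetical illustration values.
    """
    return (p_i - c) * (a - b * p_i) * phi(p_j - p_i, k)


# Symmetry: each consumer buys from exactly one of the two firms.
symmetry_gap = phi(0.3) + phi(-0.3) - 1.0

# Central-difference estimate of lambda = phi'(0); for the logistic it is k/4.
h = 1e-6
lam_estimate = (phi(h, 2.0) - phi(-h, 2.0)) / (2.0 * h)

# Undercutting by 0.01 now changes the payoff only slightly.
equal_prices = smooth_payoff(5.0, 5.0)
small_undercut = smooth_payoff(4.99, 5.0)

print(symmetry_gap, lam_estimate, equal_prices, small_undercut)
```

In contrast to the classical game, a small price cut here produces only a small change in payoff, so the incentive to undercut is finite and controlled by the slope of $\varphi$ at zero.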
It is common to study the evolutionary dynamics of important games [23][24][25][26][27][28][29][30], and here we conduct such an investigation of the smooth Bertrand game. We analyze the dynamics of a population of agents interacting pairwise through the smooth Bertrand game using the deterministic framework of adaptive dynamics [31][32][33][34][35], which we briefly review. Consider a monomorphic population in which every agent adopts the same strategy, $p$. It follows from replicator dynamics [24] that the invasion fitness of a rare mutant strategy, $p'$, in the resident population of $p$-strategists is $f_p(p') = \pi(p', p) - \pi(p, p)$, where $\pi(p', p)$ is the payoff to a $p'$-strategist interacting with a $p$-strategist [34]. The evolution of the strategy $p$ is then determined by the selection gradient $D(p) = \left.\partial f_p / \partial p'\right|_{p' = p}$, and the adaptive dynamics of $p$ is governed by the differential equation $\dot{p} = \alpha D(p)$, where $\alpha$ depends on the population size and on the mutational process at work [36]. For a constant population size, $\alpha$ simply scales the time variable, and we can, without any loss of generality, set $\alpha = 1$. Singular strategies $p^*$ are equilibrium points of the adaptive dynamics and are solutions of $D(p^*) = 0$. If no such solution exists, the strategy $p$ increases or decreases monotonically with time, according to the sign of $D(p)$. If a singular strategy $p^*$ does exist, then it is convergent stable, and hence an attractor for the adaptive dynamics, if $D'(p^*) = \left.dD/dp\right|_{p = p^*} < 0$. If, however, $D'(p^*) = \left.dD/dp\right|_{p = p^*} > 0$, then $p^*$ is an evolutionary repeller. Initially, the population will approach a convergent stable singular strategy $p^*$; however, the final evolutionary state depends on whether $p^*$ is a maximum or minimum of the invasion fitness $f_{p^*}(p')$. If $p^*$ is a maximum (i.e., if $\left.\partial^2 f_{p^*} / \partial p'^2\right|_{p' = p^*} < 0$), then $p^*$ is an evolutionarily stable strategy (ESS), and this represents an end state of the evolutionary process.
If, however, $p^*$ is a minimum then a population of $p^*$-strategists can be invaded by mutant strategies on either side of $p^*$, and $p^*$ is an evolutionary branching point. In this case the population splits into two distinct and diverging clusters of strategies.
We will now use these methods to study the evolutionary dynamics of the smooth Bertrand game. The invasion fitness of a rare mutant strategy, $p'$, in the resident population of $p$-strategists is
$$f_p(p') = (p' - c)(a - b p')\, \varphi(p - p') - \tfrac{1}{2}(p - c)(a - b p).$$
It follows, therefore, that the selection gradient is
$$D(p) = \tfrac{1}{2}(a + bc - 2bp) - \lambda (p - c)(a - bp).$$
The adaptive dynamics of $p$ is described by the differential equation
$$\dot{p} = \tfrac{1}{2}(a + bc - 2bp) - \lambda (p - c)(a - bp).$$
The phase space of this dynamical system describing the adaptive dynamics is the strategy space $S$. The singular strategies, which are equilibrium points of the adaptive dynamics, are solutions of $D(p) = 0$. There are two solutions to this equation, with the smaller root being given by
$$p^* = \frac{1 + \lambda(m + c) - \sqrt{1 + \lambda^2 (m - c)^2}}{2\lambda}$$
and the larger root being
$$\tilde{p} = \frac{1 + \lambda(m + c) + \sqrt{1 + \lambda^2 (m - c)^2}}{2\lambda},$$
where $m = a/b$. It is straightforward to show that $p^* \in (c, m)$ and $\tilde{p} > m$, for all $\lambda$, and that $p^* \to c^+$ and $\tilde{p} \to m^+$ as $\lambda \to \infty$. It follows from this that $p^*$ is the unique singular strategy in the strategy space $S$, or equivalently, that $p^*$ is the unique equilibrium point for the adaptive dynamics in the phase space $S$. It follows from the expressions for the selection gradient $D(p)$ and the singular strategies $p^*$, $\tilde{p}$ that
$$D'(p^*) = -b\sqrt{1 + \lambda^2 (m - c)^2} < 0$$
and
$$D'(\tilde{p}) = b\sqrt{1 + \lambda^2 (m - c)^2} > 0.$$
Thus, the singular strategy $p^*$ is convergent stable, or equivalently, $p^*$ is a stable equilibrium point of the adaptive dynamics, and therefore $p^*$ is an evolutionary attractor. Similarly, $\tilde{p}$ is an unstable equilibrium point of the adaptive dynamics and $\tilde{p}$ is therefore an evolutionary repeller. Moreover, since $D(p) > 0$ for all $p \in [c, p^*)$ and $D(p) < 0$ for all $p \in (p^*, m]$, it follows that $p^*$ is globally stable, and therefore, $p^*$ is the global attractor for the adaptive dynamics in $S$. The graph of the selection gradient $D(p)$ is shown in figure 1.
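The adaptive dynamics can be checked numerically with a short Python sketch. It assumes the closed forms obtained by differentiating the smooth payoff $(p_i - c)(a - b p_i)\varphi(p_j - p_i)$ with $\varphi(0) = \tfrac{1}{2}$ and $\varphi'(0) = \lambda$. The forward-Euler integrator, the step size, and the parameter values are illustrative assumptions.

```python
import math


def selection_gradient(p, lam, a=10.0, b=1.0, c=2.0):
    """D(p) = (a + b*c - 2*b*p)/2 - lam*(p - c)*(a - b*p)."""
    return 0.5 * (a + b * c - 2.0 * b * p) - lam * (p - c) * (a - b * p)


def singular_strategies(lam, a=10.0, b=1.0, c=2.0):
    """Return the two roots of D(p) = 0 as (smaller, larger)."""
    m = a / b  # price at which linear demand vanishes
    root = math.sqrt(1.0 + (lam * (m - c)) ** 2)
    base = 1.0 + lam * (m + c)
    return (base - root) / (2.0 * lam), (base + root) / (2.0 * lam)


def evolve(p0, lam, dt=1e-3, steps=20_000):
    """Integrate dp/dt = D(p) with forward Euler (a simple assumed scheme)."""
    p = p0
    for _ in range(steps):
        p += dt * selection_gradient(p, lam)
    return p


lam = 1.0  # hypothetical value of phi'(0)
p_star, p_tilde = singular_strategies(lam)
p_from_above = evolve(9.0, lam)  # start near the top of S = [2, 10]
p_from_below = evolve(2.0, lam)  # start at marginal cost

print(p_star, p_tilde, p_from_above, p_from_below)
```

Under these assumptions, trajectories started on either side of the smaller root should settle on it, while the larger root lies outside the strategy space.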
The final evolutionary fate of the system depends on whether $p^*$ is a maximum or minimum of the invasion fitness. Since
$$\left.\frac{\partial^2 f_{p^*}}{\partial p'^2}\right|_{p' = p^*} = -b\left[2\sqrt{1 + \lambda^2 (m - c)^2} - 1\right] < 0,$$
$p^*$ is a local maximum of the invasion fitness, and thus $p^*$ is a local ESS. It is straightforward to show that $p^*$ is in fact a global ESS, which therefore represents the final evolutionary state of the system. In this final state all agents use the strategy $p^*$, and since $p^* > c$, for all $\lambda$, the final evolutionary outcome is that all agents set a price $p^*$ which exceeds their marginal cost. In this final state the payoff to each agent is $\pi(p^*, p^*) = \tfrac{1}{2}(p^* - c)(a - b p^*) > 0$, for all $\lambda$. Therefore, in the evolutionary end state in which each firm sets its price to be $p^*$, each firm receives a positive payoff. This result shows that the smooth Bertrand game satisfactorily resolves the Bertrand paradox.
Since $p^*$ is a global ESS, this implies that $p^*$ is a Nash equilibrium [24]. Therefore, our evolutionary analysis of the smooth Bertrand game also establishes the rational solution of this game. We note that in the limit in which we let the function $\varphi$ tend to the Heaviside function $\Theta$ (in which $\lambda \to \infty$), the smooth Bertrand game converges to the classical Bertrand game. In this limit $p^* \to c^+$ and $\pi(p^*, p^*) \to 0^+$; and thus, the standard results for the classical Bertrand game are recovered from the limiting behavior of the smooth Bertrand game. We also note that in the limit $\lambda \to 0^+$, $p^* \to p_c = \tfrac{1}{2}(c + m)$, where $p_c$ is the cartel price (i.e., $p_c = \tfrac{1}{2}(c + m)$ is the value of $p$ that maximizes $\pi(p, p)$, and therefore, $p_c$ is the price that would be set by both firms if they colluded to maximize their combined profits). Both $p^*$ and $\pi(p^*, p^*)$ are monotonically decreasing in $\lambda$, with these quantities approaching the maximum cartel values as $\lambda \to 0^+$ and the minimum perfectly competitive values as $\lambda \to \infty$.
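The two limits can be explored numerically. The sketch below assumes the smaller root of $D(p) = 0$ for the linear-demand smooth payoff, written in the algebraically equivalent form $p^* = \tfrac{1}{2}(m + c) - \lambda (m - c)^2 / \bigl(2(1 + \sqrt{1 + \lambda^2 (m - c)^2})\bigr)$, which avoids cancellation for small $\lambda$. The parameter values are hypothetical.

```python
import math


def p_star(lam, a=10.0, b=1.0, c=2.0):
    """Singular strategy in a form that is numerically stable for small lam."""
    m = a / b
    r = math.sqrt(1.0 + (lam * (m - c)) ** 2)
    return 0.5 * (m + c) - lam * (m - c) ** 2 / (2.0 * (1.0 + r))


def equilibrium_payoff(lam, a=10.0, b=1.0, c=2.0):
    """Payoff pi(p*, p*) = (p* - c)(a - b p*)/2 when both firms play p*."""
    p = p_star(lam, a, b, c)
    return 0.5 * (p - c) * (a - b * p)


# Sweep lambda over several orders of magnitude (illustrative grid).
lams = [10.0 ** e for e in range(-4, 5)]
prices = [p_star(lam) for lam in lams]
profits = [equilibrium_payoff(lam) for lam in lams]

for lam, price, profit in zip(lams, prices, profits):
    print(f"lambda={lam:10.4f}  p*={price:.5f}  payoff={profit:.5f}")
```

With $c = 2$ and $m = 10$ the cartel price is 6, so the printed prices should fall from just below 6 toward 2 as $\lambda$ grows, with the payoffs falling alongside them.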
The analytical results we have obtained for the smooth Bertrand game can be corroborated using agent-based simulations. In these simulations we can relax some of the assumptions that were made in the analytical treatment, and we consider here polymorphic populations with stochastic evolutionary dynamics [34,35,37,38,39,40,41,42].
Consider a population $P$ of $N$ agents representing firms, labeled by $i = 1, \ldots, N$. The strategy of agent $i \in P$ is the price $p_i \in S$ at which the corresponding firm sells its product. The agents interact pairwise by playing the smooth Bertrand game. In order to compute the payoffs in the smooth Bertrand game we must specify a functional form for the probability $\varphi(p_j - p_i)$. In our simulations we take the logistic function
$$\varphi(x) = \frac{1}{1 + e^{-kx}},$$
where $k$ is a positive real parameter. We note that for this choice of the function $\varphi$, $\lambda = k/4$. The agent-based simulation consists of two stages. In the first stage we determine the payoffs the agents receive through interacting. We pick a pair $(i, j) \in P$ at random, and without replacement so this pair is not picked again, and compute the payoffs to $i$ and $j$. We repeat this process of picking pairs of agents and computing payoffs until all pairs of agents have been chosen and every agent in $P$ has received a payoff. In the second stage, we update the strategies in $P$ using the payoffs that have been computed in the first stage. We pick a pair $(i, j) \in P$ at random (without replacement, so this pair is not picked again): then $i$ adopts $j$'s strategy with a probability [38,41] that depends on the payoffs of $i$ and $j$, and $j$ adopts $i$'s strategy with the corresponding probability in which the roles of $i$ and $j$ are exchanged. The positive real parameter $\beta$ that enters these probabilities is the selection strength [42]. We repeat this procedure until all pairs of agents have been chosen. In addition, we occasionally introduce new strategies (i.e., mutations) into the population, in the following way [41]: in a situation in which $i$ would normally adopt $j$'s strategy, then with probability $\mu$, $i$ instead adopts a strategy which is randomly picked from a normal distribution (truncated to lie in $S$) with mean equal to $j$'s strategy and standard deviation $\sigma$. Performing both these stages once constitutes a single generation of the agent-based simulation. In the simulations reported here we have taken the selection strength to be $\beta = 1$.
However, we have repeated the simulations for a wide variety of values of $\beta$ and in all cases the results are unchanged from those shown here. The time-series for the average strategies and payoffs from the agent-based simulations for different values of $\lambda$ are shown in figure 2. The results from the agent-based simulations are in excellent agreement with the analytical results. The asymptotic values of the strategies and payoffs from the agent-based simulations for a larger set of $\lambda$ values are shown in figure 3. These results are again in very good agreement with the analytical results. It is apparent in figure 3 that both the strategies and payoffs converge to the cartel values as $\lambda \to 0^+$ and to the perfectly competitive values as $\lambda \to \infty$. Thus, the smooth Bertrand model allows outcomes that range across the full spectrum of possible behaviors, from prices and payoffs that approximate the maximum possible values that would be obtained by a cartel to those close to the minimum possible values that would be obtained in perfect competition.
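The two-stage procedure described above can be sketched in Python as follows. Several details are assumptions made for illustration and are not taken from this letter: agents are matched into disjoint random pairs in each stage (so `n` should be even), $\varphi$ is logistic with steepness `k`, the imitation probability is a Fermi function of the payoff difference, exactly one member of each pair imitates the other, and all parameter values are hypothetical.

```python
import math
import random

A, B, C = 10.0, 1.0, 2.0  # hypothetical demand and cost parameters
M = A / B                 # upper end of the strategy space S = [C, M]


def payoff(p_i, p_j, k):
    """Smooth Bertrand payoff with an assumed logistic phi of steepness k."""
    phi = 1.0 / (1.0 + math.exp(-k * (p_j - p_i)))
    return (p_i - C) * (A - B * p_i) * phi


def random_pairs(n, rng):
    """Match agents into disjoint random pairs (assumed pairing scheme)."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [(idx[t], idx[t + 1]) for t in range(0, n - 1, 2)]


def mutate(mean, sigma, rng):
    """Draw from a normal distribution truncated to S by rejection sampling."""
    while True:
        p = rng.gauss(mean, sigma)
        if C <= p <= M:
            return p


def generation(prices, k, beta, mu, sigma, rng):
    n = len(prices)
    pay = [0.0] * n
    for i, j in random_pairs(n, rng):  # stage 1: compute payoffs
        pay[i] = payoff(prices[i], prices[j], k)
        pay[j] = payoff(prices[j], prices[i], k)
    new = list(prices)
    for i, j in random_pairs(n, rng):  # stage 2: imitate or mutate
        # Assumed Fermi rule: probability that i adopts j's strategy.
        prob = 1.0 / (1.0 + math.exp(-beta * (pay[j] - pay[i])))
        learner, model = (i, j) if rng.random() < prob else (j, i)
        if rng.random() < mu:
            new[learner] = mutate(prices[model], sigma, rng)
        else:
            new[learner] = prices[model]
    return new


def simulate(n=100, generations=200, k=4.0, beta=1.0, mu=0.01,
             sigma=0.1, seed=0):
    """Return the final prices and the per-generation mean price."""
    rng = random.Random(seed)
    prices = [rng.uniform(C, M) for _ in range(n)]
    means = []
    for _ in range(generations):
        prices = generation(prices, k, beta, mu, sigma, rng)
        means.append(sum(prices) / n)
    return prices, means


final_prices, mean_prices = simulate(n=20, generations=10)
print(mean_prices[-1])
```

With a logistic $\varphi$, `k = 4` corresponds to $\lambda = 1$. Longer runs with larger populations would be the natural way to compare the mean price against the analytical attractor; the short run above only exercises the code.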
The results that we have found for the smooth Bertrand game may be understood intuitively in the following way. In the classical Bertrand game marginal pricing is the unique Nash equilibrium, with the consequence that both firms obtain zero profit. This is the Bertrand paradox: real-world duopolies are generally profit-making, so the zero-profit outcome is not observed in practice. The zero-profit state is stable in the classical game because firms that attempt to increase their profits by increasing their prices are punished by losing their revenue entirely. However, in the smooth Bertrand game firms are not punished so harshly, and the zero-profit state is no longer stable because when firms are faced with the choice between making no profit using marginal pricing and making a positive profit by increasing prices with some consequent decrease in revenues, the latter option is the more desirable.
Finally, it would be interesting to extend this work to an arbitrary number of firms, and thereby develop a new model of price-setting oligopolies. Such an extension could elucidate the results of [43].

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).