Random walkers with extreme value memory: modelling the peak-end rule

Motivated by the psychological literature on the "peak-end rule" for remembered experience, we analyse, within a random walk framework, a discrete choice model in which agents' future choices depend on the peak memory of their past experiences. In particular, we use this approach to investigate whether increased noise/disruption always leads to more switching between decisions. Extreme value theory reveals different classes of dynamics, indicating that the long-time behaviour depends on the scale used for reflection; this could have implications, for example, in questionnaire design.


Introduction
The use of stochastic processes in interdisciplinary modelling has a long history dating back at least to Bachelier's seminal work in finance [1] and encompassing applications to traffic flow [2], biological processes [3] and opinion dynamics [4], among others. Often such systems are treated with a Markovian (or memoryless) approximation which considerably simplifies the theoretical treatment. However, within the statistical mechanics community there has been much recent interest in characterizing the properties of non-Markovian models. There are many ways to incorporate memory effects including generalized Langevin or Fokker-Planck approaches [5,6,7], and the assumption of non-exponential waiting times in many-particle microscopic models [8,9,10]. At the random walk level, recent analytical studies in the physics literature have included the imaginatively named "elephant" random walker who remembers a property of the entire history [11], the "Alzheimer" random walker who recalls just the distant past [12,13], and "bold" and "timorous" random walkers who behave differently only when they are at the furthest point ever attained [14]. In fact, the elephant random walk can also be related to the older Pólya urn problem [15]; see [16] for a mathematical review of this and other random processes with reinforcement.
In real-life social and economic scenarios the dependence on memory is, of course, rather complicated. However, one psychological heuristic is the "peak-end rule" suggested by Kahneman et al. [17]. This asserts that the remembered utility (loosely speaking the pleasure or pain experienced) of a specific situation/episode is approximately given by the mean of the peak experience (best or worst) during the event and the final experience of that event. Notice in particular that this implies "duration neglect" [18] in the sense that the extreme and final snapshots are considerably more important in the memory than the overall length of the experience (even if it is an unpleasant one!). Empirical support for this peak-end approximation comes from situations ranging from the pain of medical procedures [19] to the pleasure of material goods [20]. Whilst other work paints a more complicated picture, particularly for extended events [21], it is clear that peak experiences play an important role and, to the best of our knowledge, such memory of extreme values is largely unexplored from the perspective of statistical physics. In this spirit, our contribution is to consider a random walk model where the probability of moving left or right depends on the maximum value of a random variable associated to each time step. As we will show, this can be thought of as a simple discrete choice model with a dependence on the "peak" of past experience. In particular, we use this framework to investigate whether increased noise in the model (corresponding perhaps to the "churn" of changing circumstances or some kind of disruption, cf. e.g., [22]) always leads to more switching between decisions. Using the mathematics of extreme values, we show that the answer to this question depends on the distribution of the random variable encoding the experience at each step.
Our work thus helps to shed light on real-world issues as well as contributing to building up general understanding of memory effects in statistical mechanics models.
The remainder of the paper is structured as follows. In section 2 we describe our random walk formalism and explain its significance as an opinion choice model as well as the manner in which it extends previous work on generalized Pólya urns. In section 3 we employ extreme value theory to develop a heuristic argument for different classes of long-time behaviour depending on the distribution of past experience, and compare our predictions with simulations. Finally, in section 4, we conclude with a discussion of implications and open questions.

Random walk set-up and interpretation as decision model
We consider a one-dimensional random walker who steps right or left in discrete time, denoting by $X^+_t$ the number of steps right up to time $t$ and by $X^-_t$ the corresponding number left. Note that $X^-_t = t - X^+_t$ by construction. For later convenience, we also define the corresponding time averages ("velocities") $V^+ \equiv X^+/t$ and $V^- \equiv X^-/t$, suppressing the notational dependence on $t$ where no confusion can arise.
In addition, with each time step $i$ we associate an independent identically distributed (i.i.d.) random variable $U_i$ drawn from some known distribution with cumulative distribution function (c.d.f.) $F(u)$. Crucially, the walker "remembers" the maximum value of $U_i$ over all rightward steps in its history and, separately, the maximum value of $U_i$ over all leftward steps. We denote these history-dependent random variables by $U^+_t$ and $U^-_t$ respectively so that formally we have
$$U^{\pm}_t = \max\{\, U_i : 1 \le i \le t,\ \text{step } i \text{ is to the right (left)} \,\}. \tag{1}$$
Memory is then built into the dynamics by letting the left and right hopping probabilities for the next step depend on the current values of $U^+$ and $U^-$. It is clear that the system is non-Markovian in position space although, of course, still Markovian in an enlarged state space including $U^+$ and $U^-$. The central idea is that this set-up is analogous to a single agent in a discrete decision model where $U$ is some kind of "utility" and the agent remembers its extreme value (corresponding to the "peak" part of Kahneman's peak-end rule) for each of two choices. Specifically, we fix the right and left stepping probabilities as functions of the random variables $U^+$ and $U^-$ to accord with the familiar "logit" choices of economic theory,
$$P^{\pm} = \frac{e^{U^{\pm}/T}}{e^{U^+/T} + e^{U^-/T}}, \tag{2}$$
where $T$ represents the level of noise in the decision. Note that, throughout the paper, we set $U^+_0 = U^-_0 = 0$ so that the two choices (step directions) are initially equally likely; as the system evolves the jump probabilities become asymmetric due to differing values of $U^+$ and $U^-$. In particular, note that $U^+$ and $U^-$ are monotonically non-decreasing with the number of steps right and left respectively.
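The dynamics just described can be sketched in a few lines of Python. This is a minimal illustration rather than the simulation code used for the figures: the function name `simulate_walker`, its parameters and the exponential sampler in the usage lines are assumptions made for the example. The logit rule is written in the equivalent form $P^+ = 1/(1 + e^{-(U^+ - U^-)/T})$, which avoids overflow.

```python
import math
import random


def simulate_walker(t_max, noise, draw_utility, rng):
    """Simulate one walker with peak memory; return V = (X+ - X-)/t_max.

    All names are illustrative assumptions. `draw_utility(rng)` returns one
    i.i.d. utility U_i and `noise` is the logit noise level T.
    """
    u_plus = u_minus = 0.0  # remembered peaks, U+_0 = U-_0 = 0
    x_plus = 0              # number of right steps so far
    for _ in range(t_max):
        # Logit rule P+ = 1 / (1 + exp(-(U+ - U-)/T)), arranged so that the
        # argument of exp is never positive.
        z = (u_plus - u_minus) / noise
        if z >= 0:
            p_plus = 1.0 / (1.0 + math.exp(-z))
        else:
            p_plus = math.exp(z) / (1.0 + math.exp(z))
        u = draw_utility(rng)  # utility experienced on this step
        if rng.random() < p_plus:
            x_plus += 1
            u_plus = max(u_plus, u)    # update peak memory for right steps
        else:
            u_minus = max(u_minus, u)  # update peak memory for left steps
    return (2 * x_plus - t_max) / t_max


if __name__ == "__main__":
    rng = random.Random(1)
    draw = lambda r: r.expovariate(1.0)  # exponential utilities, lambda = 1
    print(simulate_walker(1000, 4.0, draw, rng))
```

Passing the utility sampler as a function keeps the same loop usable for the exponential, Gaussian, Pareto and uniform cases discussed later.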
In passing, we note here that if $U^{\pm}$ were deterministic functions of the velocities $V^{\pm}$ the model would closely resemble the Pólya urn problem, familiar in the mathematics literature [15], where the probability of selecting a ball of a particular colour depends on the fraction of that colour chosen previously (in a similar manner, the elephant random walker of [11] steps left and right with probabilities depending on the relative number of such steps in his past). If, as here, the probability function is nonlinear, the urn model is known as a generalized Pólya process [16,23]. The crucial difference in our model is that $U^{\pm}$ fluctuate in a correlated way due to the statistics of the extreme values; we seek to determine the effect this has on long-time properties such as the average velocity of the random walker (or, equivalently, the proportion of time the agent makes each decision).
One might naively expect that, as for the Pólya models, in the large-$t$ limit our random walker approaches a fixed-point state where the relative probabilities (and hence the fraction of steps left and right) do not change. The symmetry of (2) suggests two specific types of fixed point: (i) $(V^+, V^-) = (1/2, 1/2)$ with $U^+$ and $U^-$ asymptotically equal and hence symmetric behaviour of the random walker, i.e., both choices equally likely in the long run; (ii) $(V^+, V^-) = (0, 1)$ or $(V^+, V^-) = (1, 0)$ with one of $U^+$ or $U^-$ negligibly small with respect to the other and hence an asymmetric random walker moving only in one direction, i.e., the agent frozen in one or other choice.
We shall demonstrate the existence of these fixed points more carefully later. For now, we remark that a pertinent question relates to their stability, in particular, whether the symmetric fixed point (1/2, 1/2) can be made stable by increasing the noise. This could be important for sharing the load between two different choices (e.g., two different routes or transport options). In the next section, we address this issue for different distributions of U before, in section 4, considering the added effect of the "end" part of the peak-end rule.

Outline of method
The behaviour of the model will obviously depend on the distribution of $U$. Our strategy is first to analyse the typical long-time dynamics by approximating $U^{\pm}$ in (2) by the so-called "characteristic largest value" of extreme value theory and then, where relevant, to consider the added effect of fluctuations about this.
The characteristic largest value after $X^{\pm}$ trials is defined for a given $F(u)$ as the value of $u$ at which $F(u) = 1 - 1/X^{\pm}$. It gives a straightforward way to obtain the scaling of the maximum value and is closely related to other properties of the full distribution [24], as we shall see for various cases in the following subsections. Our approach using the characteristic largest value leads to hopping probabilities depending on the number of left/right steps over the whole previous history and is thus in the spirit of the generalized Pólya urn models mentioned above or continuous-time analogues with current-dependent hopping rates [25]. One added subtlety here is that the resulting probabilities in our model depend directly on the number of steps left and right, $X^{\pm}$, not the fractions, $V^{\pm}$. Depending on the functional form of the characteristic largest value, this may introduce an explicit time dependence in the dynamics for $V^{\pm}$, as we shall see in some of the subsequent examples.
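As a concrete check of this definition, the sketch below solves $F(u) = 1 - 1/n$ numerically by bisection and compares the result with closed forms for four distributions. The helper name `characteristic_largest`, the bracketing intervals and the parameter values are assumptions chosen for illustration; the closed forms follow from inverting each c.d.f. by hand.

```python
import math
from statistics import NormalDist


def characteristic_largest(cdf, n, lo, hi):
    """Solve F(u) = 1 - 1/n for u by bisection on [lo, hi].

    Assumes `cdf` is non-decreasing and that the root lies inside [lo, hi].
    """
    target = 1.0 - 1.0 / n
    for _ in range(200):  # far more halvings than double precision needs
        mid = 0.5 * (lo + hi)
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative parameters (assumptions, not taken from the figures).
n = 1000
lam, u_m, alpha, l, r = 1.0, 0.5, 2.0, 0.0, 1.0

numeric = {
    "exponential": characteristic_largest(
        lambda u: 1.0 - math.exp(-lam * u), n, 0.0, 100.0),
    "pareto": characteristic_largest(
        lambda u: 1.0 - (u_m / u) ** alpha, n, u_m, 1e6),
    "uniform": characteristic_largest(
        lambda u: (u - l) / (r - l), n, l, r),
    "gaussian": characteristic_largest(NormalDist().cdf, n, -10.0, 10.0),
}

# Closed forms obtained by inverting each c.d.f. at 1 - 1/n.
closed_form = {
    "exponential": math.log(n) / lam,
    "pareto": u_m * n ** (1.0 / alpha),
    "uniform": r - (r - l) / n,
    "gaussian": NormalDist().inv_cdf(1.0 - 1.0 / n),
}

if __name__ == "__main__":
    for name in numeric:
        print(name, numeric[name], closed_form[name])
```

The three growth laws (logarithmic, power-law, saturating) are what distinguish the classes of long-time behaviour analysed in the following subsections.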
In fact, since $V^+ + V^- = 1$ by construction, this procedure enables us, for a given utility distribution $F(u)$, to write $P^{\pm}$ simply in terms of $V \equiv V^+ - V^-$ and possibly time $t$. Now, if the random variable $V$ takes value $v$, the mean distance moved in the next step is given by the corresponding value of $P^+ - P^-$, which we denote by $\Delta_t(v)$. Hence, on average, we expect a "typical trajectory" given by the discrete mapping
$$v_{t+1} = \frac{t\,v_t + \Delta_t(v_t)}{t+1}. \tag{3}$$
For cases where the function $\Delta_t(v)$ has no explicit dependence on $t$, it is immediately clear from (3) that fixed points $v^*$ should satisfy $v^* = \Delta(v^*)$, and a standard "cobweb"-type construction predicts that the stability of the fixed points is determined by the slope of the function $\Delta(v)$: a fixed point is stable if $\Delta'(v^*) < 1$ and unstable if $\Delta'(v^*) > 1$; see figure 1.

[Figure 1, caption fragment: The slope of $v_{t+1}$ as a function of $v_t$ is $(t + \Delta'(v_t))/(t+1)$, which approaches unity as $t$ increases (indicated by dashed lines), leading to slow decay/growth.]

Notice that, due to the time dependence of the mapping (3), the decay towards fixed points is expected to be power-law rather than exponential in nature; physically, this is because as the measurement time increases the last step has a smaller and smaller effect on the overall time average. In the following subsections we illustrate this approach for three qualitatively different scenarios corresponding to the three known families of extreme value theory. (In cases where $\Delta_t(v)$ itself depends on time we shall chiefly be interested in its behaviour as $t \to \infty$.) We then confront the predictions with simulation results and discuss how fluctuations in the extreme values modify the picture of typical behaviour given above.
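The typical-trajectory mapping is easy to iterate numerically. The sketch below is a minimal illustration in which the names `typical_trajectory` and `delta_exponential` are assumptions; as an example it uses $\Delta(v) = \tanh[\mathrm{artanh}(v)/(\lambda T)]$, which is what the logit rule gives when the peak memories are replaced by $(\ln X^{\pm})/\lambda$ (the exponential case treated in the next subsection).

```python
import math


def delta_exponential(v, lam_t):
    """Example Delta(v) for exponential utilities; lam_t is the product lambda*T."""
    if abs(v) >= 1.0:  # frozen walker: artanh would diverge
        return math.copysign(1.0, v)
    return math.tanh(math.atanh(v) / lam_t)


def typical_trajectory(delta, v_start, t_start, t_end):
    """Iterate v_{t+1} = (t*v_t + delta(t, v_t)) / (t + 1) for t_start <= t < t_end."""
    v = v_start
    for t in range(t_start, t_end):
        v = (t * v + delta(t, v)) / (t + 1)
    return v


if __name__ == "__main__":
    for lam_t, v0 in ((4.0, 0.5), (0.5, 0.1)):
        v_end = typical_trajectory(
            lambda t, v: delta_exponential(v, lam_t), v0, 1, 10_000)
        print(f"lambda*T = {lam_t}: v goes from {v0} to {v_end:.4f}")
```

With $\lambda T = 4$ the trajectory relaxes slowly towards $v = 0$, while with $\lambda T = 0.5$ it drifts towards $v = 1$, in line with the slope criterion.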

Exponential tails
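To demonstrate the method, we first look in detail at the case where the utility variable $U$ has an exponential distribution with c.d.f.
$$F(u) = 1 - e^{-\lambda u}, \qquad u \ge 0. \tag{4}$$
Here the characteristic largest value after $X^{\pm}$ steps is given by $(\ln X^{\pm})/\lambda$ so, substituting for $U^{\pm}$ in (2), we approximate $P^{\pm}$ in the long-time limit by
$$P^{\pm} = \frac{(X^{\pm})^{1/(\lambda T)}}{(X^+)^{1/(\lambda T)} + (X^-)^{1/(\lambda T)}} \tag{5}$$
or equivalently, in terms of the time averages,
$$P^{\pm} = \frac{(V^{\pm})^{1/(\lambda T)}}{(V^+)^{1/(\lambda T)} + (V^-)^{1/(\lambda T)}}. \tag{6}$$
In this particular case, the probabilities can be written in terms of $V^{\pm}$ without explicit time dependence, illustrating a direct connection to the class of elephant random walker and Pólya urn type problems.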
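To determine the fixed points we further write $P^{\pm}$ in terms of the net velocity by substituting $V^{\pm} = (1 \pm V)/2$ to obtain
$$P^{\pm} = \frac{(1 \pm V)^{1/(\lambda T)}}{(1+V)^{1/(\lambda T)} + (1-V)^{1/(\lambda T)}} \tag{7}$$
so the function specifying the mean displacement of the next step can be compactly written as
$$\Delta(v) = \tanh\left[\frac{1}{2\lambda T}\ln\left(\frac{1+v}{1-v}\right)\right] \tag{8}$$
where $v$ is the current value of the velocity. The fixed points satisfying $v = \Delta(v)$ are then seen by inspection to be $v^* = 0, \pm 1$ as predicted from symmetry arguments.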
Recall from the previous subsection that to determine which of these fixed points is stable we need to check the slope $\Delta'(v^*)$; here it is straightforward to show from (8) that
$$\Delta'(0) = \frac{1}{\lambda T}, \qquad \Delta'(\pm 1) = \begin{cases} 0 & \text{for } \lambda T < 1, \\ \infty & \text{for } \lambda T > 1. \end{cases} \tag{9}$$
Hence if $\lambda T > 1$, the mixed solution $(V^+, V^-) = (1/2, 1/2)$ is stable and the asymmetric frozen solutions $(V^+, V^-) = (1, 0)$ and $(V^+, V^-) = (0, 1)$ are correspondingly unstable. Similarly, for $\lambda T < 1$, the mixed solution is unstable and we predict that the random walker becomes frozen into ballistic motion in one of the two directions. To check this heuristic argument we appeal to Monte Carlo simulations: in figure 2 we show the empirical distribution of velocities at $t = 100$ for an exponential utility distribution (with $\lambda = 1$, i.e., unit mean) and values of noise predicted to correspond to the two different cases ($T = 0.8$ and $T = 4.0$). We see good qualitative agreement of the simulations with the prediction: in the low-noise case the trajectories are sharply peaked around the asymmetric fixed points, i.e., $V = V^+ - V^- = \pm 1$ (corresponding to each agent almost always making the same choice), whilst in the high-noise case the trajectories are clustered around the symmetric fixed point, i.e., $V = V^+ - V^- = 0$ (corresponding to each agent sampling the two choices approximately equally). However, there is a finite width of the distribution about the fixed point(s) even for $\lambda T$ significantly greater than unity; to investigate this more systematically, and reveal possible finite-time effects, we plot in figure 3 the standard deviation of the distribution as a function of $T$ for increasing measurement times. This quantifies how close the trajectories end to symmetric or asymmetric fixed points without making a distinction between the two asymmetric states (whose selection is expected to depend sensitively on the agent's first few choices). According to the analysis of typical behaviour given above, we expect the standard deviation to be unity for $\lambda T < 1$ and zero for $\lambda T > 1$.
In fact, although the simulations do show evidence of a transition around $\lambda T = 1$, the situation is somewhat more complicated; in particular, the standard deviation clearly converges to a finite value even for $\lambda T > 1$. These observed results for the standard deviation suggest that, even in the long-time limit, the properties of the model are sensitive to the full distribution of maximum values, not just the characteristic largest value. As further evidence of this, we remark that if $U^{\pm}$ were given deterministically by the characteristic largest value, the variance for $\lambda T > 1$ could only be due to decay towards the stable fixed point and fluctuation of individual trajectories about the typical behaviour. In this case, $V$ would be expected to obey a large deviation principle with "speed" $t^{\alpha}$ [25,26] and the variance would eventually converge to zero, as confirmed in Appendix A where, for comparative purposes, we present simulation results from an artificial model with $U^{\pm}$ at every time step set equal to $(\ln X^{\pm})/\lambda$. It is clear then that the limiting value of the variance in the full model is determined by fluctuations in the extreme values, leading to fluctuations of the typical trajectories themselves.
In the case of an exponential distribution it is, of course, well known that the limiting form of the rescaled maximum has a Gumbel distribution; here the c.d.f. of $U^{\pm}$ is asymptotically given by
$$\Pr(U^{\pm} \le u) \simeq \exp\left[-e^{-(u - a^{\pm})/b^{\pm}}\right] \tag{10}$$
where $a^{\pm} = (\ln X^{\pm})/\lambda$ and $b^{\pm} = 1/\lambda$ (see, e.g., [24,27] and references therein). The mode $a^{\pm}$ coincides with the characteristic largest value calculated earlier while the mean is $a^{\pm} + b^{\pm}\gamma$ (with $\gamma$ the Euler-Mascheroni constant) and so differs from it only by a constant amount. Taking account of the fluctuations, the maximum value random variables thus obey
$$U^+ - U^- = \frac{1}{\lambda}\ln\frac{X^+}{X^-} + \epsilon \tag{11}$$
where the distribution of $\epsilon$ is given by the difference of the two Gumbel distributions as a logistic distribution with mean zero and variance $\pi^2/(3\lambda^2)$. Substituting the form of (11) in the expression for $P^+ - P^-$ and repeating the calculations leading to (8), one finds that for a given, non-zero, value of $\epsilon$ the position of the "symmetric" fixed point is shifted from zero although both its stability and the position of the asymmetric fixed points remain unchanged. A crude estimate of the standard deviation in the position of the symmetric fixed point can be obtained by setting $\epsilon$ equal to its standard deviation $\pi/(\sqrt{3}\lambda)$, i.e., as the value of $v \in (-1, 1)$ which solves the transcendental equation
$$v = \tanh\left[\frac{1}{2\lambda T}\ln\left(\frac{1+v}{1-v}\right) + \frac{\pi}{2\sqrt{3}\,\lambda T}\right]. \tag{12}$$
As seen in figure 3 this leads to a loose upper bound on the observed standard deviation. The actual standard deviation is smaller because the value of $\epsilon$ and the corresponding fixed point change during the course of each trajectory. In Appendix B, we include this effect within a linear expansion to obtain an analytical expression for $\sigma_V$ which is a better approximation for $\lambda T$ large (see, again, figure 3). Notwithstanding the finite variance, the claim that one can control the long-time behaviour by increasing/decreasing the noise is well borne out by simulation. For example, in figure 4 we show the evolution of the standard deviation in a scenario where the noise level (and hence the stability of the fixed points) is abruptly changed after the first 500 time steps.
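The crude estimate can be evaluated numerically. The sketch below is an illustration under a stated assumption: it takes the transcendental equation to be the fixed-point condition $v = \tanh[\mathrm{artanh}(v)/(\lambda T) + \sigma_\epsilon/(2T)]$ with $\sigma_\epsilon = \pi/(\sqrt{3}\lambda)$, and solves it by bisection. The function name `shifted_fixed_point` is hypothetical.

```python
import math


def shifted_fixed_point(lam, noise):
    """Solve v = tanh(artanh(v)/(lam*noise) + sigma_eps/(2*noise)) on (0, 1).

    sigma_eps = pi/(sqrt(3)*lam) is the standard deviation of the logistic
    variable epsilon. Assumes lam*noise > 1 (stable symmetric fixed point).
    """
    if lam * noise <= 1.0:
        raise ValueError("requires lam * noise > 1")
    sigma_eps = math.pi / (math.sqrt(3.0) * lam)

    def residual(v):
        arg = math.atanh(v) / (lam * noise) + sigma_eps / (2.0 * noise)
        return math.tanh(arg) - v

    lo, hi = 0.0, 1.0 - 1e-12  # residual(lo) > 0 because sigma_eps > 0
    if residual(hi) >= 0.0:
        # Happens when lam*noise is so close to 1 that the root sits
        # within 1e-12 of v = 1; the simple bracket then fails.
        raise ValueError("bracket does not enclose the root")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for noise in (2.0, 4.0, 8.0):
        print(noise, shifted_fixed_point(1.0, noise))
```

The estimate shrinks as the noise grows, consistent with its role as a loose upper bound on the width of the velocity distribution in the high-noise regime.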
The cornerstone of extreme value theory, the Fisher-Tippett-Gnedenko theorem [28,29], asserts that the Gumbel distribution is universal for the rescaled maximum of i.i.d. random variables drawn from a distribution with exponential tails. However, the functional form of the scaling parameters depends on the distribution being considered. As a second example, we now make the arguably reasonable hypothesis that agents assign utilities according to a Gaussian with some mean $\mu$ and standard deviation $\varsigma$. The mode of the limiting distribution is again given by the characteristic largest value as
$$a^{\pm} = \mu + \varsigma\,\Phi^{-1}\left(1 - \frac{1}{X^{\pm}}\right) \tag{13}$$
where $\Phi$ is the c.d.f. of the standard normal distribution. We note that, in this case, $a^{\pm}$ retains a logarithmic dependence on $X^{\pm}$, growing like $\varsigma\sqrt{2\ln X^{\pm}}$ as $X^{\pm} \to \infty$. Ignoring the fluctuations about this value, an analogous argument to that given above then yields for large $t$
$$\Delta_t(v) \approx \tanh\left\{\frac{\varsigma}{\sqrt{2}\,T}\left[\sqrt{\ln\frac{t(1+v)}{2}} - \sqrt{\ln\frac{t(1-v)}{2}}\,\right]\right\}. \tag{14}$$
It is clear that $v^* = 0$ is a fixed point for all $t$ and its stability is controlled by the slope
$$\Delta'_t(0) \approx \frac{\varsigma}{T\sqrt{2\ln(t/2)}}. \tag{15}$$
As $t \to \infty$ the slope tends to zero and hence we predict that the symmetric fixed point is always stable in the long run. However, since the dependence is only logarithmic in $t$, one still expects to see a noise-controlled transition for large but finite times. This is supported by the simulation results for the standard deviation shown in figure 5. For comparison, we have set there the first two moments equal to those of the exponential distribution in figure 3, and the picture for the Gaussian case is qualitatively similar, with the transition between low- and high-noise regimes only weakly dependent on $t$. To complete the story, we can again consider fluctuations of the maximum values. In this case, the width of the Gumbel distribution is controlled by
$$b^{\pm} \simeq \frac{\varsigma}{\sqrt{2\ln X^{\pm}}} \tag{16}$$
which decays to zero as $X^{\pm} \to \infty$ (again see, e.g., [24,27]). Hence, in contrast to the exponential case, we do not expect a finite limiting velocity variance in the high-noise regime and indeed the relevant simulation data do seem to show a slow convergence towards zero.
A similar argument applies to other distributions with exponential tails: the characteristic largest value of $U^{\pm}$ converges to the mode of the corresponding Gumbel distribution and generically grows as $(\ln X^{\pm})^{\gamma}$, where the power $\gamma$ determines the long-time stability of the symmetric fixed point via
$$\Delta'_t(0) \propto \frac{(\ln t)^{\gamma - 1}}{T} \qquad \text{as } t \to \infty. \tag{17}$$
The exponential distribution (4) corresponds to the special case of $\gamma = 1$, while for $\gamma < 1$ we expect long-term stability of the symmetric fixed point ($\Delta'_t(0) < 1$) and for $\gamma > 1$ we expect long-term instability ($\Delta'_t(0) > 1$). For intermediate timescales, the system can be driven towards either the symmetric mixed state or the asymmetric frozen state by increasing or decreasing the noise, as we demonstrate for the Gaussian distribution in figure 6.
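To see how slowly the Gaussian slope decays, the following sketch evaluates $\Delta_t(v)$ with $U^{\pm}$ replaced by the exact characteristic largest values $\mu + \varsigma\,\Phi^{-1}(1 - 1/X^{\pm})$, estimates the slope at $v = 0$ by a central finite difference, and compares it with the large-$t$ estimate $\varsigma/(T\sqrt{2\ln(t/2)})$. The function names and the step size `h` are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def gaussian_delta(v, t, sigma, noise):
    """Delta_t(v) from characteristic largest values; the mean mu cancels.

    Uses X+- = t*(1 +- v)/2 and a+- = sigma * Phi^{-1}(1 - 1/X+-).
    """
    std_normal = NormalDist()
    a_plus = std_normal.inv_cdf(1.0 - 2.0 / (t * (1.0 + v)))
    a_minus = std_normal.inv_cdf(1.0 - 2.0 / (t * (1.0 - v)))
    return math.tanh(sigma * (a_plus - a_minus) / (2.0 * noise))


def gaussian_slope_numeric(t, sigma, noise, h=1e-4):
    """Central finite-difference estimate of the slope at v = 0."""
    up = gaussian_delta(h, t, sigma, noise)
    down = gaussian_delta(-h, t, sigma, noise)
    return (up - down) / (2.0 * h)


def gaussian_slope_asymptotic(t, sigma, noise):
    """Large-t estimate sigma / (T * sqrt(2 ln(t/2)))."""
    return sigma / (noise * math.sqrt(2.0 * math.log(t / 2.0)))


if __name__ == "__main__":
    for t in (1e2, 1e4, 1e6):
        print(t, gaussian_slope_numeric(t, 1.0, 1.0),
              gaussian_slope_asymptotic(t, 1.0, 1.0))
```

Increasing $t$ by four orders of magnitude only roughly halves the slope, which is why a noise-controlled transition remains visible at large but finite times.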

Power-law tails
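The second class of extreme value statistics corresponds to distributions with power-law tails as typified by the Pareto distribution with c.d.f.
$$F(u) = 1 - \left(\frac{u_m}{u}\right)^{\alpha}, \qquad u \ge u_m, \tag{18}$$
where $u_m$ is a lower bound and $\alpha > 0$. In this case one finds that the characteristic largest value after $X^{\pm}$ steps is given by $u_m (X^{\pm})^{1/\alpha}$, leading to the approximation
$$P^{\pm} = \frac{e^{u_m (V^{\pm} t)^{1/\alpha}/T}}{e^{u_m (V^+ t)^{1/\alpha}/T} + e^{u_m (V^- t)^{1/\alpha}/T}} \tag{19}$$
and hence, by the same method as previously,
$$\Delta_t(v) = \tanh\left\{\frac{u_m}{2T}\left(\frac{t}{2}\right)^{1/\alpha}\left[(1+v)^{1/\alpha} - (1-v)^{1/\alpha}\right]\right\}. \tag{20}$$
Again, for all $t$, we find a symmetric fixed point at $v^* = 0$ with stability determined by the slope
$$\Delta'_t(0) = \frac{u_m}{\alpha T}\left(\frac{t}{2}\right)^{1/\alpha} \tag{21}$$
which is greater than unity for $t > 2(T\alpha/u_m)^{\alpha}$. In fact, in the limit $t \to \infty$, $\Delta_t(v)$ approaches the step function $\mathrm{sgn}(v)$ with corresponding stable fixed points at $v^* = \pm 1$. In this Pareto case, it is straightforward to show that for large $X^{\pm}$ the maximum value $U^{\pm}$ has approximately a Fréchet c.d.f.
$$\Pr(U^{\pm} \le u) \simeq \exp\left[-\left(\frac{u}{s^{\pm}}\right)^{-\alpha}\right] \tag{22}$$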
where the scale parameter $s^{\pm}$ is given by $u_m (X^{\pm})^{1/\alpha}$. The mean of this distribution is only finite for $\alpha > 1$ but the mode and the median are both proportional to $s^{\pm}$ so, again, the trivially calculated characteristic largest value should give a good indication of the long-time behaviour. This is confirmed in figure 7 where the standard deviation of the velocity against noise strength is plotted for a case where the utility has a Pareto distribution with unit mean ($u_m = 0.5$, $\alpha = 2$). For all values of $T$, the velocity variance converges towards unity (corresponding to individual trajectories approaching the asymmetric fixed points at $v^* = \pm 1$). We have also checked that the convergence is faster for smaller values of $\alpha$ ("longer tails"), noting in particular that the distribution of $U$ has infinite mean for $\alpha \le 1$. More generally, the Fréchet distribution is the limiting form for the rescaled maximum of i.i.d. random variables drawn from any distribution with power-law tails [28,29]. In all such cases we expect that $\Delta'_t(0)$ increases as some power of $t$, leading each agent to ultimately become frozen in a pure state corresponding to one or other choice. We remark that this power-law dependence is stronger than the logarithmic form found in section 3.2; even by increasing the noise we only expect to be able to favour the mixed state for short timescales, e.g., up to the order of $(T\alpha/u_m)^{\alpha}$ for the Pareto distribution considered above.
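The crossover time for the Pareto case can be checked directly. The sketch below assumes $\Delta_t(v) = \tanh[(U^+ - U^-)/(2T)]$ with $U^{\pm}$ replaced by $u_m (X^{\pm})^{1/\alpha}$ and $X^{\pm} = t(1 \pm v)/2$; the function names are hypothetical and the parameters ($u_m = 0.5$, $\alpha = 2$, $T = 4$) are chosen only for illustration.

```python
import math


def pareto_delta(v, t, u_m, alpha, noise):
    """Delta_t(v) with U+- replaced by u_m * (t*(1 +- v)/2)**(1/alpha)."""
    scale = u_m * (t / 2.0) ** (1.0 / alpha) / (2.0 * noise)
    spread = (1.0 + v) ** (1.0 / alpha) - (1.0 - v) ** (1.0 / alpha)
    return math.tanh(scale * spread)


def pareto_slope_at_origin(t, u_m, alpha, noise):
    """Closed-form slope of Delta_t at v = 0."""
    return u_m * (t / 2.0) ** (1.0 / alpha) / (alpha * noise)


def pareto_crossover_time(u_m, alpha, noise):
    """Time beyond which the symmetric fixed point is unstable (slope > 1)."""
    return 2.0 * (alpha * noise / u_m) ** alpha


if __name__ == "__main__":
    u_m, alpha, noise = 0.5, 2.0, 4.0
    t_c = pareto_crossover_time(u_m, alpha, noise)
    print("crossover time:", t_c)
    for t in (0.1 * t_c, t_c, 10.0 * t_c):
        print(t, pareto_slope_at_origin(t, u_m, alpha, noise))
```

Because the crossover time grows only as a power of $T$, increasing the noise delays freezing but cannot prevent it.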

Bounded distributions
Finally, we consider distributions of $U$ with finite upper bound (as might be appropriate, for instance, if an agent's memory is based on some predetermined numerical scale with given minimum and maximum). The obvious example is a uniform distribution with c.d.f.
$$F(u) = \frac{u - l}{r - l}, \qquad l \le u \le r, \tag{23}$$
whose characteristic largest value after $X^{\pm}$ steps is given by $r - (r - l)/X^{\pm}$. Notice that, in contrast to the previous examples, this converges to a finite constant as $X^{\pm} \to \infty$, which is an elementary consequence of the upper bound on the underlying distribution and already gives a hint at the long-time behaviour. In this case, following our previous heuristic procedure we have
$$P^{\pm} = \frac{e^{-(r-l)/(T X^{\pm})}}{e^{-(r-l)/(T X^+)} + e^{-(r-l)/(T X^-)}} \tag{24}$$
and
$$\Delta_t(v) = \tanh\left[\frac{2(r-l)\,v}{T t\,(1 - v^2)}\right]. \tag{25}$$
The slope at the symmetric fixed point is given by
$$\Delta'_t(0) = \frac{2(r-l)}{T t} \tag{26}$$
which is less than unity for $t > 2(r - l)/T$ and tends to zero as $t \to \infty$. Hence we argue that the symmetric fixed point is always stable for long enough times (regardless of noise strength). This conclusion is supported by the simulation data in figure 8. The observed behaviour of the variance for very small $T$ can be explained by noting that, for this version of the model, the walker can become stuck for finite times in a metastable fixed point at $v^* = \pm 1$. To see this, we plot in figure 9 the function $\Delta_t(v)$ of (25) and examine its intersections with the line $v$, for fixed $T$ and increasing $t$. Notice that, in this case, for $t > 2(r - l)/T$ both symmetric and asymmetric fixed points are stable but separated by an unstable point whose position tends to $\pm 1$ as $t \to \infty$. The corresponding potential landscape has metastable states at $v^* = \pm 1$ and a trajectory can be trapped in such a state until fluctuations drive it over the barrier (whose height decreases with time) to the global minimum at $v^* = 0$. We emphasize that, since the fixed point at $v^* = 0$ is always stable except for very short times, the long-time behaviour of the system cannot be effectively controlled by altering the noise (confirmed by further simulations, not shown).

It is easy to show that, for large $X^{\pm}$, the maximum of i.i.d. uniform random variables has approximately the reversed (unit) Weibull distribution with scale parameter $s^{\pm} = (r - l)/X^{\pm}$ and mean coinciding with the characteristic largest value calculated above.
However, once again, the argument is more broadly applicable: for bounded distributions the limiting distribution of the rescaled maximum is generically reversed Weibull (also known as "Type III" extreme value) with mean and median typically approaching the upper bound as some inverse power of the number of trials [28,29]. In all such cases, $\Delta'_t(0) \to 0$ as $t \to \infty$, meaning each agent is expected to ultimately end up in the mixed state with both choices equally likely. Physically, it is clear that in the long-time limit the system approaches a standard memoryless diffusive model with $P^+$ and $P^-$ fixed and equal.
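The unstable fixed point separating the mixed and frozen states in the uniform case can be located numerically. The sketch below assumes $\Delta_t(v) = \tanh[2(r-l)v/(Tt(1-v^2))]$, which follows from inserting the characteristic largest value $r - (r-l)/X^{\pm}$ into the logit rule with $X^{\pm} = t(1 \pm v)/2$; the function names and the bracketing interval are assumptions.

```python
import math


def uniform_delta(v, t, width, noise):
    """Delta_t(v) = tanh(2*width*v / (T*t*(1 - v**2))), with width = r - l."""
    if abs(v) >= 1.0:
        return math.copysign(1.0, v)
    return math.tanh(2.0 * width * v / (noise * t * (1.0 - v * v)))


def unstable_fixed_point(t, width, noise):
    """Locate the root of Delta_t(v) = v in (0, 1) by bisection.

    Only meaningful once the symmetric point is stable, t > 2*width/noise.
    """
    if t <= 2.0 * width / noise:
        raise ValueError("symmetric fixed point is not yet stable")

    def residual(v):
        return uniform_delta(v, t, width, noise) - v

    lo, hi = 1e-9, 1.0 - 1e-12
    if not (residual(lo) < 0.0 < residual(hi)):
        raise ValueError("bracket does not enclose a root")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for t in (10, 100, 1000):
        print(t, unstable_fixed_point(t, 1.0, 1.0))
```

As $t$ grows the root moves towards $v = 1$, so the basin of attraction of the frozen states shrinks and trapped trajectories eventually escape to the mixed state.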

Discussion
In this paper we have performed a detailed analysis of random walkers with peak memory dependence. Commensurate with the original motivation of Kahneman's peak-end rule, we now make some comments on the effect of also including an explicit dependence on the final value of the utility $U$. To be precise, we consider the mean of peak and final experience so that the left/right hopping probabilities in (2) are replaced by
$$P^{\pm} = \frac{e^{(U^{\pm} + U^{\pm}_f)/(2T)}}{e^{(U^+ + U^+_f)/(2T)} + e^{(U^- + U^-_f)/(2T)}} \tag{27}$$
where $U^+_f$ ($U^-_f$) is the value of $U$ corresponding to the last step right (left). Note that the $U^{\pm}_f$ are much less strongly correlated than the $U^{\pm}$ and hence we might expect their effect to cancel out on average in the long-time limit. At the same time, the dependence on $U^{\pm}$ is here weakened in the sense that the values are now divided by $2T$ rather than $T$. The simulation results in figure 10, for the case of an exponential utility distribution, confirm that this modified model behaves very similarly to an increased-noise version of the original model with the replacement of $T$ by $2T$.
With the preceding paragraph in mind, we argue that our work on the peak memory model also has implications for the peak-end case. Specifically, we have found that the effect of noise/disruption in the model is dependent on the properties of the utility distribution. Using the characteristic largest value to cast the problem as an effective Pólya process provides direct information on the long-time dynamics (in particular the stability of fixed points in the system) but, in order to quantify the observed variance, one needs to consider the distribution of maximum values. The examples we have shown, together with general arguments rooted in extreme value theory, reveal three qualitatively different classes of behaviour:
• For utility distributions with heavy tails each random walker (agent) eventually becomes frozen in a state corresponding to one or other step direction (choice), regardless of the level of noise.
• For utility distributions with exponential tails the situation is more subtle: for an $e^{-\lambda u}$ decay we find a transition between frozen and mixed states at $\lambda T = 1$; in other cases there is a weak logarithmic dependence on the time. Furthermore, for the special case of $e^{-\lambda u}$ decay, even in the high-noise regime there is a finite variance around the mixed state which can be attributed to fluctuations in the maximum values.
• For utility distributions with a finite upper bound each random walker (agent) eventually ends up in the mixed state with both step directions (choices) equally likely, regardless of the level of noise.
Significantly, this implies that only for exponential-tailed utility distributions can one hope to increase the switching between decisions on intermediate/long timescales simply by increasing the noise. From a statistical physics point of view, it would be interesting in the exponential case to characterize the phase transition and scaling exponents at λT = 1, e.g., by calculating the correlation function. This latter is also relevant in the opinion dynamics context as it quantifies how sensitive the long-time behaviour is to the first step and thus the extent to which a particular one of the two asymmetric fixed points might be favoured by a small initial perturbation. Preliminary simulations suggest that the correlation function in the full model converges to zero for λT > 1 and, for λT ≪ 1, decreases more strongly with λT than in the artificial model of Appendix A (presumably due to the added fluctuations reducing the effect of the initial conditions). However, a more detailed analysis with finite-time scaling along the lines of [30] is deferred to a future publication.
Other extensions of the work might include considering coupled random walkers (modelling collective rather than individual memory) or peak effects in other opinion dynamics models, such as contact processes and voter models [31]. The peak-end rule itself can also be critiqued (see, e.g., the discussion in [21]) and realistic refinements such as the slow fading of peak memories in the distant past could be incorporated into the modelling. However, we believe that our current work represents an important first step beyond simply averaging over the whole past experience or just recalling the most recent history.
Although our analytical calculations thus far have been carried out in the framework of a specific toy model, they highlight more generally the possible role of (experienced or remembered) utility distributions in maintaining and controlling behaviour. In particular, if agents are encouraged to reflect on their own experiences with a view to possibly modifying future choices, then the outcome could be subtly dependent on any numerical scale offered for reflection, e.g., whether or not it has a fixed upper bound. There is much scope here for future interdisciplinary work linking with current understanding in psychology and economics.

[...] general grounds to depend on the slope $\Delta'(v^*)$ at the stable fixed point (cf. [25,32] for the continuous-time case). Specifically, as found in other models [11,33,34], one anticipates a dynamical phase transition at $\Delta'(v^*) = 1/2$ with diffusive fluctuations (i.e., $t^{-1/2}$ decay of the velocity standard deviation) for $\Delta'(v^*) < 1/2$ and superdiffusive behaviour (with $t^{\Delta'(v^*) - 1}$ decay) for $\Delta'(v^*) > 1/2$. Setting $\Delta'(v^*) = 1/(\lambda T)$, the resulting predictions indeed fit the simulation data very well (with logarithmic corrections expected at the dynamical phase transition itself). Note that this provides a quantitative explanation for the observed slow convergence close to the transition point ($\lambda T = 1$) which may also be relevant in the full model. Another important quantity for generalized Pólya processes such as this is the correlation function $C(t)$ between the direction of the first step and the $(t+1)$th step [30]. The long-time limit of this quantity plays the role of an order parameter and simulation results in the present case (not shown) confirm a continuous phase transition at $\lambda T = 1$ with $C(t)$ converging to zero for all $\lambda T > 1$.

Appendix B. Limiting variance in exponential case
To understand the fluctuations of the velocity, one needs to take account of the fact that each value of U ± persists for a number of time steps before being replaced by a larger one. In this appendix, we pursue such an approach with an exponential utility distribution to obtain an approximation for the long-time limit of the standard deviation in the high-noise regime.
We let $\tau$ be the time at which the last record occurred (i.e., the last change in either $U^+$ or $U^-$ up to the current time $t$) and note that, by definition, the value of $P^+ - P^-$ is unchanged for $\tau \le i \le t$. In the exponential case $U^+ - U^-$ is given by (11) and typical trajectories should then obey the stochastic mapping
$$v_{t+1} = \frac{\tau v_\tau + (t + 1 - \tau)\tanh\left[\frac{1}{2\lambda T}\ln\left(\frac{1+v_t}{1-v_t}\right) + \frac{\epsilon}{2T}\right]}{t+1} \tag{B.1}$$
where $\tau$ and $\epsilon$ are random variables and the second term in the numerator gives the expected displacement since the last record. In the case where $\tau = t$ and $\epsilon = 0$ for all time (i.e., $U^+$ or $U^-$ updated at every step with no fluctuations) we recover the deterministic mapping of (3) with (8). In slight abuse of notation, we denote a typical trajectory by $v_t$ even in the present stochastic case and argue that it is the fluctuations of this trajectory which lead to the long-time variance of $V$. For large times we can approximate $t + 1 \approx t$ and $v_{t+1} \approx v_t$ (the time-averaged velocity changes slowly), which allows us to write
$$v_t \approx \rho_\tau v_\tau + (1 - \rho_\tau)\tanh\left[\frac{1}{2\lambda T}\ln\left(\frac{1+v_t}{1-v_t}\right) + \frac{\epsilon}{2T}\right] \tag{B.2}$$
with $\rho_\tau$ the fraction $\tau/t$. To proceed further we then make three key assumptions:
(i) $|v_t|$ and $|\epsilon|$ are sufficiently small that we can approximate the tanh term by a linear function.
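(ii) The $X^+$ right steps (and therefore also the $X^-$ left steps) are uniformly distributed throughout the trajectory.
(iii) The random variables $\rho_\tau$, $v_\tau$ and $\epsilon$ are independent of one another.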
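All three of these assumptions are expected to fail as $\lambda T$ approaches 1 but they do facilitate analytical progress for $\lambda T \gg 1$. First, we use assumption (i) to make a linear expansion
$$v_t \approx \rho_\tau v_\tau + (1 - \rho_\tau)\left(\frac{v_t}{\lambda T} + \frac{\epsilon'}{2\lambda T}\right) \tag{B.3}$$
and trivially rearrange to obtain
$$v_t \approx \frac{\lambda T \rho_\tau\, v_\tau + \tfrac{1}{2}(1 - \rho_\tau)\,\epsilon'}{\lambda T - 1 + \rho_\tau} \tag{B.4}$$
where for convenience we have rescaled to $\epsilon' = \lambda\epsilon$, which has a standard logistic distribution with variance $\pi^2/3$. To obtain the distribution of $\rho_\tau$, we first consider separately the fractional times $\rho^+_\tau$ and $\rho^-_\tau$ for the last records corresponding to right and left steps respectively. Each of the $X^{\pm}$ previous steps is equally likely to have produced the maximum value so, with assumption (ii), we have $\rho^{\pm}_\tau \sim \mathrm{Uniform}[0, 1]$ in the long-time limit. Then, by straightforward calculation, $\rho_\tau = \max(\rho^+_\tau, \rho^-_\tau)$ is governed by a triangular distribution on $[0, 1]$ with mode 1, i.e., with density $2\rho$. Now, in the long-time limit, we expect the distribution of $v_\tau$ to be the same as that of $v_t$, both characterized by standard deviation $\sigma_V$. Hence using the independence assumption (iii), together with the symmetry of trajectories around zero, we obtain
$$\sigma_V^2 = p(\lambda T)\,\sigma_V^2 + q(\lambda T)\,\sigma_{\epsilon'}^2 \tag{B.5}$$
where the functions $p(x)$ and $q(x)$ are given by expectations with respect to the distribution of $\rho_\tau$:
$$p(x) = \int_0^1 2\rho\left(\frac{x\rho}{x - 1 + \rho}\right)^2 d\rho, \tag{B.6}$$
$$q(x) = \int_0^1 2\rho\left[\frac{1 - \rho}{2(x - 1 + \rho)}\right]^2 d\rho. \tag{B.7}$$
Rearranging and substituting in for $\sigma_{\epsilon'} = \pi/\sqrt{3}$ yields the final approximation
$$\sigma_V \approx \frac{\pi}{\sqrt{3}}\sqrt{\frac{q(\lambda T)}{1 - p(\lambda T)}}. \tag{B.8}$$
The expression in (B.8) clearly diverges at $\lambda T = 1$, where $p \to 1$, but we can estimate its range of applicability by considering assumption (i). Specifically, $x$ approximates $\tanh(x)$ to within 10% for $|x| \lesssim 0.55$ so, since $v_t$ and $\epsilon$ are strongly correlated, we require
$$\frac{\sigma_V}{\lambda T} + \frac{\sigma_{\epsilon'}}{2\lambda T} \lesssim 0.55 \tag{B.9}$$
which is satisfied for $\lambda T \gtrsim 2.3$. Indeed (B.8) is seen to provide a reasonable approximation to the simulation data in this regime (cf. figure 3). The small remaining discrepancy is probably mainly due to the failure of assumption (iii); in particular, $v_\tau$ is not strictly independent of $\epsilon$.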
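The linear-response estimate can be evaluated with elementary numerical integration. The sketch below is an illustration under stated assumptions: it takes the linearised mapping to be $v_t \approx [\lambda T \rho\, v_\tau + (1-\rho)\epsilon'/2]/(\lambda T - 1 + \rho)$ with $\rho$ drawn from the triangular density $2\rho$ on $[0, 1]$, so that $\sigma_V = (\pi/\sqrt{3})\sqrt{q/(1-p)}$ with $p$ and $q$ the mean squared coefficients of $v_\tau$ and $\epsilon'$. The function name `sigma_v_linear` and the midpoint-rule grid size are hypothetical choices.

```python
import math


def sigma_v_linear(lam_t, n_grid=20000):
    """Linear-response estimate of the limiting std of V for lam_t = lambda*T > 1.

    The expectations over rho (density 2*rho on [0, 1]) are computed with
    the midpoint rule on n_grid cells.
    """
    if lam_t <= 1.0:
        raise ValueError("estimate only defined for lambda*T > 1")
    p = q = 0.0
    for k in range(n_grid):
        rho = (k + 0.5) / n_grid
        weight = 2.0 * rho / n_grid  # triangular density times cell width
        denom = lam_t - 1.0 + rho
        p += weight * (lam_t * rho / denom) ** 2          # coefficient of v_tau
        q += weight * ((1.0 - rho) / (2.0 * denom)) ** 2  # coefficient of eps'
    sigma_eps = math.pi / math.sqrt(3.0)  # std of the standard logistic
    return sigma_eps * math.sqrt(q / (1.0 - p))


if __name__ == "__main__":
    for lam_t in (2.3, 4.0, 8.0):
        s = sigma_v_linear(lam_t)
        check = s / lam_t + (math.pi / math.sqrt(3.0)) / (2.0 * lam_t)
        print(lam_t, round(s, 4), round(check, 4))
```

The second printed column is the left-hand side of the applicability condition; under these assumptions it comes out close to 0.55 at $\lambda T = 2.3$, matching the threshold quoted above.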