Paper The following article is Open access

Graph-combinatorial approach for large deviations of Markov chains

, and

Published 4 July 2022 © 2022 The Author(s). Published by IOP Publishing Ltd
, , Citation Giorgio Carugno et al 2022 J. Phys. A: Math. Theor. 55 295001 DOI 10.1088/1751-8121/ac79e6

1751-8121/55/29/295001

Abstract

We consider discrete-time Markov chains and study large deviations of the pair empirical occupation measure, which is useful to compute fluctuations of pure-additive and jump-type observables. We provide an exact expression for the finite-time moment generating function, which is split in cycles and paths contributions, and scaled cumulant generating function of the pair empirical occupation measure via a graph-combinatorial approach. The expression obtained allows us to give a physical interpretation of interaction and entropic terms, and of the Lagrange multipliers, and may serve as a starting point for sub-leading asymptotics. We illustrate the use of the method for a simple two-state Markov chain.

Export citation and abstract BibTeX RIS

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Fluctuations for discrete-time Markov chains in the large deviations regime

Markov chains are widely-used stochastic models of in and out-of-equilibrium physical systems. We consider a discrete-time ergodic Markov chain $X={\left({X}_{\ell }\right)}_{\ell =1}^{n+1}=({X}_{1},{X}_{2},\dots ,{X}_{n+1})$ evolving in a finite discrete state space Γ of N states according to the (irreducible and aperiodic) transition matrix Π. The matrix Π characterises the probability of going from a state X = i at time to a state X+1 = j at time + 1. We will use the index to refer to time and the indices i and j to refer to general states of the state space.

In this setting, one and two-point observables having the general form

Equation (1)

where f is any function that may depend both on the starting and landing state, are of fundamental importance to characterise the typical and fluctuating behaviour of the associated physical systems. [Notice that by taking f(i, j) = g(i), Cn in (1) can also cover the case of purely time-additive observables 3 .] Just to give an example, the observable in (1) can represent the number of transitions over [1, n] in a particular subset of the state space [1], obtained by fixing f = 1Δ, with Δ the characteristic function of the subset. Furthermore, in certain contexts, Cn can also express heat [2], two-point correlation functions, activities [36], particle and energy currents [7], efficiency [812], entropy production [6, 13, 14], and many others.

To study fluctuations of Cn , the probabilistic theory of large deviations and, in particular, the Donsker–Varadhan approach may be used as it offers analytical and numerical methods to calculate the large deviation (or rate) function

Equation (2)

characterising the time-leading exponential behaviour—provided there is one—of the probability distribution $\mathbb{P}({C}_{n}=c)$ [1, 1520]. The rate function in (2) is always positive and measures the extent of the fluctuations of Cn around its typical value c*, which, for ergodic Markov chains, is the unique zero of I [1618]. The existence of the rate function I is referred to as the validity of a large deviation principle for the observable Cn and can be seen as an extension of the weak law of large numbers as it provides information on the speed—exponential in n—of convergence of Cn to c*.

In the context of Markov chains, there are several ways to compute the rate function I. It is known that, by means of spectral large deviation techniques (see, for instance, [5, 2125]), one could calculate the scaled cumulant generating function (SCGF)

Equation (3)

where s is the Lagrange (or tilting, in the large deviation jargon) parameter dual to Cn = c. The SCGF Ψ represents the leading exponential behaviour of the moment generating function, associated with the observable in (1). To obtain the rate function I in (2), it would then be enough to Legendre–Fenchel transform the SCGF, provided it be a differentiable function—a result known as Gärtner–Ellis theorem [1618]. Although these methods serve well to the scope, variational techniques can also be employed and one may derive the rate function I by solving a variational problem [3, 20, 26]. The advantage of employing variational methods is, at least, twofold. In case of non-analytically solvable problems, variational methods offer ways to bound the true rate function (see, for instance, [27, 28]) and, at the same time, alternative numerical techniques—inherited from the fields of optimization theory and PDEs—are available to compute it [29].

In the case considered here, it is known that all the information on the fluctuations of one and two-point observables can be obtained by studying the pair empirical occupation measure

Equation (4)

as the value of Cn can be deduced via the formula

Equation (5)

Interestingly, the long-time behaviour of (4), denoted by $\rho ={({\rho }_{ij})}_{i,j=1}^{N}$, can be interpreted as the amount of time that the Markov chain X spends transiting from a state i to a state j of Γ [1618].

The pair empirical occupation measure of (4) is known to satisfy a large deviation principle of the form

Equation (6)

with rate function

Equation (7)

where $\nu ={({\nu }_{ij})}_{i,j=1}^{N}$ belongs to the set of probability measures satisfying two constraints: the global balance on the state space, i.e., ∑j νij = ∑j νji , such that the sum of probability density currents flowing in and out of an arbitrary state i is conserved, and the normalisation ∑i,j νij = 1 (with ∑j νij = μi the occupation measure). The rate function H in (7) is known to be finite, continuous, and convex for densities ν that satisfy the global balance on the state space, featuring minimum and zero for ν = ρ [16]. Here, it is interesting to notice that the rate function I associated with Cn , can be obtained variationally by solving the following contraction 4 (minimisation) problem

Equation (8)

where the constraint appearing beneath the inf symbol is the formula (5), which selects c, the fluctuation of interest for the observable Cn in (1).

The functional H in (7) is thus a key ingredient for the variational study of fluctuations in discrete-time Markov chains and, as mentioned, it plays a pivotal role in statistical mechanics as many interesting dynamical observables arising in physics have the two-point form in (1).

The form in (7) has been derived with various methods. Among these, the exponential tilting procedure combined with the Radon–Nikodym change of measure [16] (see [26] for continuous-time processes) holds a leading position as it offers a simple and straightforward way to tackle the calculation, provided that the form of the rate function for the i.i.d. process (or any other useful process) is known. We will review and discuss this method in section 2.

Although simple and well suited to large deviation estimates, the exponential tilting procedure does not allow for the calculation of o(n)-exponential sub-leading terms in the probability distribution of the pair empirical occupation measure (4). In the probability and applied statistics literature, however, exact combinatorial derivations that work at finite time can be found. These may lead to the evaluation of sub-leading order terms that, although not significant in the large deviation regime, would be important if one wanted to study transient regimes. The first combinatorial result goes back to [30], later on reviewed in [31], and more recently recalled in [32]. Another graph-combinatorial derivation for the probability distribution of the pair empirical occupation measure was proposed in [33] and later on extended in [34]. More recently, [35] provided an explicit—although not fully rigorous—expression of subleading terms in (6), and constructed a Gauge theory for typical fluctuations of Cn around its expected value.

In the main section 3 of our paper we use similar arguments to provide an alternative, exact expression for the moment generating function of the pair empirical occupation measure. We make use of notation and terminology that are more familiar to the theoretical physics audience, and show—in line with previous literature [17]—that our expression for the SCGF, akin to a Helmoltz (canonical) free energy, allows us to give a straightforward physical interpretation of all the terms and of the Lagrange multipliers that fix the necessary constraints. Furthermore, we establish a direct link with spectral methods and show an alternative variational formulation of the so-called driven process [1, 20, 36, 37] (the Markov process responsible for the creation of fluctuations in the large-deviation regime). In section 4 we show explicitly in a general two-state model the equivalence of our approach and the standard spectral techniques to compute the moment generating function at finite time n.

2. Pair empirical measure rate functional

In this section we show how the rate functional H in (7) can be derived via the exponential-tilting method. We start by writing the path-probability definition

Equation (9)

Equation (10)

where tij represents the number of jumps that the Markov chain X makes between nodes i and j, and in (10) we make use of the Markov property. We also notice in (9) that we can interpret the set of tij s as the elements of a matrix T, which will be a central object in the rest of this work.

We now introduce a new i.i.d. process ${X}^{\prime }={\left({X}_{\ell }^{\prime }\right)}_{\ell =1}^{n+1}=({X}_{1}^{\prime },{X}_{2}^{\prime },\dots ,{X}_{n+1}^{\prime })$ based on the probability distribution $\zeta ={\left({\zeta }_{i}\right)}_{i=1}^{N}$ on the state space and with its own pair empirical occupation measure that, with abuse of notation, have the same form of (4). A large deviation principle for the pair empirical measure of X' is known to hold (see, for instance, chapter 9 of [38] or section II.2 of [16]) with rate functional

Equation (11)

Consequently, we multiply and divide in the summation of (10) by the path-probability ${\mathbb{P}}^{\prime }({L}_{n}^{(2)}=T/n)$ of this i.i.d. process and then introduce an exponential function as follows

Equation (12)

Equation (13)

The derivation continues by observing, in the exponential function, the equality

Equation (14)

obtained by using the definition of the pair empirical measure (7). Hence, we get

Equation (15)

Eventually, by taking minus the logarithm of the probability $\mathbb{P}$, dividing by n, and taking the limit n we get

Equation (16)

where the matrix ν is defined as

Equation (17)

In the derivation of (16), we make use of the fact that ${L}_{n}^{(2)}\to \nu $, and also that for the probability ${\mathbb{P}}^{\prime }({L}_{n}^{(2)}=\nu )$ a large deviation principle holds with rate functional (11). The last formula obtained in (16) is exactly (7). We remark that it is only because of the long-time limit that we can get rid of the boundary term $\mathbb{P}({X}_{1})/{\mathbb{P}}^{\prime }({X}_{1})$ in (16) and thus get the form of the rate functional for the pair empirical occupation measure of the Markov process X. We also notice that, although extremely useful, the use of an i.i.d. process with its pair empirical rate functional is not strictly necessary for the purpose of the proof. Indeed, if the asymptotics of the pair empirical probability of another process were known and easy to handle, we could have tilted the path probability measure of the Markov process in (12) with respect to it and we would have obtained the same result. For further details on this and on how to best use the tilting method we refer to [1].

The derivation presented in this section makes use of methods that are well known in the large deviation community. Nevertheless, for a more rigorous proof of the large deviation principle for the pair empirical measure (4) having rate functional (7)—which focuses on lower and upper bounds over closed and open sets—we refer the reader to [16, 18, 38].

The derivation presented in this section takes into consideration only leading order terms in n and, furthermore, lacks some physical interpretations of the form of the rate functional (7). The finite n behaviour, captured by subleading terms in (6), is in general much harder to study than the large deviations regime. For the continuous-time setting, in [39] the authors use matrix product states to study finite-time large fluctuations of one-dimensional lattice models. For discrete Markov chains, estimates and bounds for subleading terms in (6) are known in the literature [40, 41], and derived by using spectral methods. In [35] the author proposes a characterisation of subleading terms using graph-combinatorial arguments. Using a similar approach as [35], we provide an exact formula for the moment generating function valid for any finite n, a first step towards an alternative derivation of the subleading terms in (6).

3. Graph-combinatorial approach

In this section, we present an alternative derivation of the rate functional associated with the pair empirical occupation measure in (4). The proposed derivation moves the focus from the probability distribution $\mathbb{P}$ and rate functional H in (4) to the moment generating function ZN,n and SCGF

Equation (18)

where, with abuse of notation with respect to (3), $s={\left({s}_{ij}\right)}_{i,j=1}^{N}$ is now a set of Lagrange parameters. This paradigm shift is equivalent to a change of ensemble in statistical mechanics [1]. Instead of working with the probability distribution $\mathbb{P}({L}_{n}^{(2)}=T/n)$ at a fixed t, we introduce Lagrange parameters sij s that fix the tij s only on average, and thus work with a moment generating function. The equilibrium statistical mechanics analogue would be a change from the microcanonical ensemble, where the energy is fixed, to the canonical ensemble, where only the average energy is fixed by the Lagrange parameter β, the inverse temperature.

In this canonical framework, thanks to Markovianity and ergodicity, it is known [17, 18] that we can map the large deviation problem to a spectral one. This is because the SCGF can be calculated as the logarithm of the dominant eigenvalue of the so-called tilted matrix ${{\Pi}}_{s}={\left({{\Pi}}_{s}\right)}_{ij}\enspace \forall i,j\in {\Gamma}$, which has the form

Equation (19)

Noticeably, thanks to a graph-combinatorial mapping [33, 34], we can derive an exact expression for the moment generating function ZN,n at finite N and n. In principle, the exact form ZN,n allows one to evaluate sub-leading terms (in n) that cannot be calculated within a purely large deviation approach as that of section 2. Historically, graph-combinatorial arguments similar to those used in this work have been proposed for cyclic Markov chains by Dawson and Good in [33], and later on extended for general Markovian paths by Goodman in [34]. The derivation that follows explains in the details, with a theoretical-physics approach, a similar graph-combinatorial calculation but moves the focus onto the moment generating function ZN,n of the pair empirical occupation measure. This allows us to naturally give a physical interpretation of the interaction and entropic terms in the SCGF λN (18).

3.1. An alternative expression for the moment generating function

The graph-combinatorial approach is based on the representation of the state space connectivity as a graph G with associated adjacency matrix A—see figure 2(a). This has elements Aij = 1 if state j can directly be reached from i, and 0 otherwise. In this context, we will refer to states also as nodes or vertices. The transition matrix Π of the Markov chain X, in turn, embeds in its elements the connectivity of the state space as Πij = Aij pij with pij the jump probability between i and j.

The moment generating function of the probability $\mathbb{P}({L}_{n}^{(2)}=T/n)$ is

Equation (20)

where $\mathbb{P}({X}_{1})$ indicates the probability distribution of our process at initial time n = 1 and $s={({s}_{ij})}_{i,j=1}^{N}$ indicates the set of tilting parameters.

The specific form of the distribution $\mathbb{P}({X}_{1})$ will play a role for finite time behaviour or sub-leading asymptotics, but it will not matter in the large deviation regime—it only amounts to a boundary term—provided that the graph G is strongly connected. For convenience, we choose $\mathbb{P}({X}_{1})={\delta }_{{X}_{1},1}$, viz the starting node is fixed to be node 1.

The core idea of our work is to perform a change of variables: we transform the sum over all states X , for ∈ {1, ..., n + 1}, to a sum over variables tij ∈ {0, ..., n}, for nodes i, jG. The new variable tij , as in the previous section 2, is the number of times the Markov chain jumps from state i to state j, in particular tij can be different from zero only if there is an edge in G between nodes i and j.

For a matrix T to represent the number of jumps of a chain of states (X1, X2, ..., Xn+1) the following constraints must be satisfied: (i) the total number of jumps is equal to the total length of the chain minus one, ∑ij tij = n, (ii) all jumps can be temporally arranged like domino tiles (1, X2), (X2, X3), ..., (Xn , Xn+1) reflecting the fact that if at time the Markov chain jumps to state i, then at time + 1 it has to start from state i. Constraints (i) and (ii) do not make the change of variables one to one—there can be many instances of the Markov chain that correspond to the same set of tij s. In fact, the variables tij do not carry any information regarding the temporal order of the jumps. In other words, given an instance of T we have to count in how many ways we can order the jumps as (1, X2), (X2, X3), ..., (Xn , Xn+1) so that ${\sum }_{\ell =1}^{n}{\delta }_{{X}_{\ell },i}{\delta }_{{X}_{\ell +1},j}={t}_{ij}$ and {X1, X2, ..., Xn+1} realises a walk in G: we call this number ΘT . Hence, we can express ZN,n as

Equation (21)

We now face the problem of computing ΘT . We notice that for many instances of T, this number is simply zero: this is because the aforementioned domino-like constraint (ii) imposes stringent conditions on the form of T. First of all, the set of edges (i, j), for which tij > 0, together with the union of all their extremes i and j must form a connected graph. This is because the Markov chain starting from a node i can only hop to neighbours of i according to the connectivity of G. Mathematically, this condition is equivalent to requiring that the dimension of the kernel of the Laplacian L = DinT is 1 [42], where Din is a diagonal matrix with elements ${({D}_{\text{in}})}_{ii}={\sum }_{j}{t}_{ji}$. Second, the number of times a Markov chain jumps towards a state i have to be related to the number of jumps starting from that state i, a phenomenon analogous to the Kirchhoff law in electric circuits that encodes the global balance of the dynamics. We hereby distinguish two possible scenarios in which these conditions on T are satisfied. In the first one, for every state the incoming flux and outgoing flux are equal, that is ∑j tij = ∑j tji : this situation corresponds to a Markov chain starting and ending in the same node, and we will refer to this as the cycle scenario. In the second one, for all but two states the incoming and outgoing fluxes are equal. The two special states are the initial, that we set to 1 choosing $\mathbb{P}({X}_{1})={\delta }_{{X}_{1},1}$, and the final, F, for which one must have ∑j t1j = 1 + ∑j tj1 and 1 + ∑j tFj = ∑j tjF : we will refer to this as the path scenario. This leads to a natural way to express

Equation (22)

where ${{\Theta}}_{T}^{\text{path}}$ $({{\Theta}}_{T}^{\text{cycle}})$ is the number of distinct permutations of the set of tij s in the path (cycle) scenario which give a realisation of a walk in G and the deltas enforce Kirchhoff law. The factor $1-{\delta }_{{\sum }_{j=1}^{N}{t}_{1j},0}$ ensures that the cycle will pass at least once from node 1: this condition is required because the starting node is node 1. We will show that it is not necessary to enforce connectedness explicitly because the expressions for ${{\Theta}}_{T}^{\text{path}}$ and ${{\Theta}}_{T}^{\text{cycle}}$ are automatically zero when T is not connected.

We now note that we can interpret the matrix T as the adjacency matrix of a directed multi-graph MT with tij directed links, having unitary weight, between nodes i and node j—see figure 2(b). A directed multi-graph is a collection of nodes and directed links, in which multiple links between two nodes are permitted. We refer to the collection of links between two nodes as a multi-link. As a preliminary step in the computation of ${{\Theta}}_{T}^{\text{path}}$ $({{\Theta}}_{T}^{\text{cycle}})$, we consider a related combinatorial problem, that is counting how many paths there are on MT that start in 1 and end in F (cycles that start in 1) and pass through every link exactly once. We can interpret this as the number of non-distinct ways we can arrange the jumps like domino tiles that respect the matrix of jumps T. This number overestimates ${{\Theta}}_{T}^{\text{path}}$ (respectively ${{\Theta}}_{T}^{\text{cycle}}$). To see this, we can consider a multi-link in MT having at least two links l1 and l2. Given a path (or cycle) that passes through every link in MT , we can, for instance, generate another distinct one by swapping the order in which we visit l1 and l2. This new path (cycle) will not contribute to ${{\Theta}}_{T}^{\text{path}}$ $({{\Theta}}_{T}^{\text{cycle}})$, as the time-ordered jumps (1, X2), (X2, X3), ..., (Xn , Xn+1) are unaffected by the swap. Nonetheless, this calculation is a useful starting point as we can compute this number using results available in the literature [43], and we will show how to correct this overcounting later on.

3.2. Computation of ${{\Theta}}_{T}^{\text{path}}$ and ${{\Theta}}_{T}^{\text{cycle}}$

So far we have described the key steps that underlie our approach, which are also summarised in the flowchart in figure 1. In the following instead we will provide the details of the calculation. For this reason, some definitions will be useful and we collect them in this paragraph. An Eulerian multi-graph is a multi-graph for which, at every node i, in-degree and out-degree are the same, viz ${k}_{i}^{\mathrm{i}\mathrm{n}}={k}_{i}^{\text{out}}$. Noticing that in MT we have ${k}_{i}^{\mathrm{i}\mathrm{n}}={\sum }_{j}{t}_{ji}$ and ${k}_{i}^{\text{out}}={\sum }_{j}{t}_{ij}$, it follows from Kirchhoff law that MT is either an Eulerian multi-graph (for the cycle scenario) or close to an Eulerian multi-graph (for the path scenario), in the sense that only the initial and final nodes do not satisfy kin = kout. An Eulerian cycle (path) on a multi-graph MT , as already mentioned in section 2, is a cycle (path) that passes through every link exactly once. We denote the number of Eulerian cycles (paths) with ec(MT ) (ep(MT )). Furthermore, we will indicate by ec(MT |χ) (ep(MT |χ)) the number of Eulerian cycles (paths) given some specified condition χ, that in our case will be a combination of the starting node 1, the final node F, the starting edge e1 and the final edge eF.

Figure 1.

Figure 1. Flowchart that summarizes the computation of ΘT . From top-left, we start from an instance of the matrix of jumps T. We first check if the total number of jumps is n and if the multi-graph MT associated with T is connected. Then, we proceed by checking if Kirchhoff law is satisfied. There are two possible positive scenarios: (i) it is satisfied in every node—cycle scenario; (ii) it is satisfied in every node except node 1 and another node which we call F: in the former node there is one more outgoing link, while in the latter there is one more incoming link—path scenario. In either of these cases, we can relate ΘT to the number of T-Eulerian cycles (or paths) in WT , which can be computed knowing the number of Eulerian cycles (or paths) in MT .

Standard image High-resolution image

In the literature on the topic, a result is known for the number ec(MT |e1) of Eulerian cycles of an Eulerian multi-graph MT with a fixed starting edge. This goes by the name of BEST theorem [4345] and reads

Equation (23)

where Ωw (MT ) is the number of arborescences, i.e., spanning trees rooted in a node w such that there exists a unique path from every vertex of MT to w. We note that Ωw does not depend on the choice of root w when MT is an Eulerian multi-graph, so that Ωw (MT ) = Ω(MT ) [44, 46]. Similarly, the rhs of (23) does not show any explicit dependence on the starting edge e1 because of the inherent symmetry in MT . An explicit expression for Ω(MT ) is given by

Equation (24)

where det is the determinant operator and Lw is a submatrix of the Laplacian of the Eulerian multi-graph MT obtained by removing (any) wth row and column, a result known in the literature as Tutte's theorem or Matrix tree theorem.

In the following, we first consider the path scenario. In this case, the multi-graph MT is not Eulerian, but we can make it so simply by adding a link eF from F to 1. We refer to this modified graph as ${\tilde{\mathbf{M}}}_{T}$. Using BEST theorem we have

Equation (25)

where we use the fact that the in-degree of node 1 is ${\sum }_{j=1}^{N}{t}_{j1}+1$ in ${\tilde{\mathbf{M}}}_{T}$. The number of Eulerian cycles starting from node 1 is related to $ec({\tilde{\mathbf{M}}}_{T}\vert {e}_{1})$ by $ec({\tilde{\mathbf{M}}}_{T}\vert 1)=ec({\tilde{\mathbf{M}}}_{T}\vert {e}_{1})\left({\sum }_{j}\,{t}_{j1}+1\right)$. Furthermore, ep(MT |1, F) is equal to the number of Eulerian cycles in ${\tilde{\mathbf{M}}}_{T}$ starting in 1 and ending with the link we added to construct it, viz $ec({\tilde{\mathbf{M}}}_{T}\vert 1,{e}_{F})$. This number can be computed by considering an Eulerian cycle in MT as a collection of loops passing through node 1. The number of these loops is given by the in-degree of node 1, so that we have $ep({\mathbf{M}}_{T}\vert 1,F)=ec({\tilde{\mathbf{M}}}_{T}\vert 1)/({\sum }_{j}{t}_{j1}+1)$. All these considerations put together give

Equation (26)

where we used ${\Omega}({\tilde{\mathbf{M}}}_{T})=\mathrm{det}({L}_{1})={{\Omega}}_{1}({\mathbf{M}}_{T})$ with L1 the cofactor of the graph Laplacian L obtained by removing the first row and column. We note that while ${\Omega}({\tilde{\mathbf{M}}}_{T})$ does not depend on the choice of the root since ${\tilde{\mathbf{M}}}_{T}$ is Eulerian, Ω1(MT ) does, because MT is not Eulerian.

We now consider the cycle scenario. In this case, since MT is already an Eulerian graph, we can readily express ec(MT , 1) as

Equation (27)

As previously argued, ep(MT |1, F) (ec(MT |1)) overestimates ${{\Theta}}_{T}^{\text{path}}$ $({{\Theta}}_{T}^{\text{cycle}})$. To correct this, one must consider all the links belonging to a given multi-link as totally equivalent. This boils down to considering a weighted graph WT (see figure 2(c)) in place of the multi-graph MT . The weighted graph WT has adjacency matrix T and directed links (e.g., between nodes i and j) obtained by merging all the multi-links (between i and j) in MT together. In analogy with [45], we define the notion of T-Eulerian cycle (path) as a cycle (path) that passes through every link (i, j) a number tij of times. With an abuse of notation, we denote the number of T-Eulerian cycles (paths) by ec(WT |χ) (ep(WT |χ)), as it will be clear by the graph we are considering whether we are referring to Eulerian or T-Eulerian cycles (paths). Crucially, in the cycle scenario ${{\Theta}}_{T}^{\text{cycle}}$ is equal to the number of T-Eulerian cycles starting from node 1, i.e., ec(WT |1) in WT , whereas in the path scenario ${{\Theta}}_{T}^{\text{path}}$ is equal to the number of T-Eulerian paths from 1 to F, i.e., ep(WT |1, F) in WT . The combinatorial factor connecting ep(WT |1, F) (ec(WT |1)) to ep(MT |1, F) (ec(MT |1)) is simply the number of permutations of links in a multi-link for every multi-link in MT

Equation (28)

Figure 2.

Figure 2. (a) State-space connectivity G, un-directed and un-weighted, with adjacency matrix A; (b) directed un-weighted multi-graph MT , with adjacency matrix T, a multi-link from i to j is composed by tij links; (c) directed weighted graph WT , with adjacency matrix T, the boldness of links is proportional to the integer weights tij .

Standard image High-resolution image

This allows us to write explicit expressions for ${{\Theta}}_{T}^{\text{path}}$ and ${{\Theta}}_{T}^{\text{cycle}}$

Equation (29)

where we recall that L1 is—in both the cycle and the path scenarios—the submatrix of the graph Laplacian L obtained by removing the first row and column. We remark that, although the expressions for ${{\Theta}}_{T}^{\text{path}}$ and ${{\Theta}}_{T}^{\text{cycle}}$ in (29) are formally the same, the variable T is of different nature in the path and cycle scenario as it satisfies different sets of constraints. With this expression, we can write the moment generating function explicitly as

Equation (30)

We note that the factor det(L1) kills configurations of T that have the null-space dimension of the Laplacian greater than 1. This ensures that we only consider graphs MT —equivalently, WT —that are connected, as it is known in the literature that the dimension of the null-space of the graph Laplacian is the number of connected components of a graph [42]. Remarkably, in equation (30) the contributions for paths and cycles are split, giving an interesting physical perspective. In general, this difference is more pronounced when n is small, in particular when the walker has not explored the full state space. In the limit of large n, contributions relative to paths and cycles are comparable and share the same asymptotics, as we show in the next section.

Compared to the spectral method to compute the moment generating function [17], which requires the computation of all eigenvalues and eigenvectors of an N × N matrix, our formula is computationally favourable when n is small and N is large. If n is large, instead, the spectral method is numerically more efficient.

3.3. Long-time asymptotics

Expression (30) is valid for every finite n, and can be used to derive the large n limit and, in principle, finite n corrections. In the following, we focus on the large deviation regime, which corresponds to taking n to be much greater than the longest relaxation time of the system τ(N), nτ(N). In this limit it is useful to rescale time-additive variables with n as in (17), as we can approximate the sums over t11, ..., tNN with integrals

Equation (31)

where $\vert {E}_{{\mathbf{W}}_{T}}\vert $ is the number of directed edges in the weighted graph WT and νij s are defined as in (17). In the rhs of (31) and in the following, by ∑ij and ∏ij we mean sums and products over (i, j) such that (i, j) is a directed link in WT . To leading order in n we obtain the following asymptotic expressions

Equation (32)

Equation (33)

Equation (34)

Equation (35)

The Kirchhoff constraints tend to the same form for large n, giving explicitly

Equation (36)

Equation (37)

We also notice that

Equation (38)

where we use the fact that the determinant is multi-linear in the rows and that each element in L1 is proportional to n by construction, so that $\mathrm{Tr}\left[\mathrm{log}\left(\frac{{L}_{1}}{n}\right)\right]$ is finite for large n. Remarkably det(L1) becomes sub-leading in the large n limit, while it may be an interesting term to study the finite-time transient behaviour of the Markov chain. Finally, the term ${\delta }_{{\sum }_{j}\,{t}_{1j},0}$ present in the factor $1-{\delta }_{{\sum }_{j}\,{t}_{1j},0}$ becomes negligible for large n.

Putting all together, we obtain to exponential leading order in n

Equation (39)

We note that the integrand in (39) can be brought to the form ${\mathrm{e}}^{n{\lambda }_{N}[\nu ]}$, with the following definitions:

Equation (40)

Equation (41)

Equation (42)

Equation (43)

Equation (44)

where epsilon and ηi are Lagrange multipliers fixing the respective constraints. In (40) each term has a clear physical interpretation: λ1, in (41), is the geometric—viz related to the connectivity of the graph G—entropy of a random walk on a graph with nodes and links contributions, akin to the entropy of a free particle; λ2, in (42), is the entropy due to the dynamics, encoded in the transition matrix; λ3, in (43), is the tilting potential necessary to drive the system towards a fluctuation of the pair empirical occupation measure; finally, λ4 in (44), enforces the normalisation and Kirchhoff-law (global balance).

We can calculate the leading order in n of (39) via a saddle-point approximation, arriving at

Equation (45)

where ν* = argminν,epsilon,η λN [ν] and ν* are the minimisers of λN [ν] with respect to the set of νij s, ηi s and epsilon. From the Euler–Lagrange equations for critical points of (40), we find the following implicit expression for ${\nu }_{ij}^{\ast }$:

Equation (46)

where the tilted matrix introduced in (19) appears. From (46) we can write self-consistent conditions for epsilon and ηi as follows:

Equation (47)

Equation (48)

which reveal that eepsilon is an eigenvalue of the tilted matrix Πs with right eigenvector components ${r}_{i}={\text{e}}^{-{\eta }_{i}}$ and left eigenvector components ${l}_{j}={\sum }_{k}{\nu }_{jk}^{\ast }/{\text{e}}^{-{\eta }_{j}}$. Substituting (47) into (40) we get

Equation (49)

and, in particular, since λN [ν*] is a maximum, eepsilon is the dominant eigenvalue of (19). The same conclusion can be reached by noticing that the left and right eigenvector elements in (47) and (48) are all positive, which is true only for the dominant eigenvalue. These arguments provide a direct link with spectral methods. In particular, (49) provides an expression for the logarithm of the dominant eigenvalue of the tilted matrix.

Remarkably, this approach also provides an alternative expression for the so-called driven (or effective) process. This is a modified Markov chain that explains how specific fluctuations are created in time [1, 20, 36, 37]; under certain conditions, it is equivalent to the original Markov chain conditioned to visiting the fluctuation of interest. Useful spectral and variational expressions of the driven process already appeared in the papers just mentioned. Here, we offer another explicit variational representation valid for discrete-time Markov chains. In agreement with [20], the minimisers ${\nu }^{\ast }=\left\{{\nu }_{ij}^{\ast }\right\}$ of the action functional (40) characterise the driven process transition matrix with components

Equation (50)

This last expression offers an alternative way to physically study and simulate the appearance of fluctuations and rare events in discrete-time Markov chain models.

Concluding, in (40) we have obtained λN , the SCGF associated with the probability distribution of the pair empirical occupation measure in (4). To get the rate functional (2) we only need to Legendre–Fenchel transform the SCGF in (40), i.e.,

Equation (51)

where in the last step we recognise the pair empirical rate functional (with the necessary constraints—mentioned and understood in (2)–fixed by the Lagrange multipliers in λ4).

Assuming that one is interested in studying large fluctuations of an observable of the form (1), we remark that the associated SCGF can be obtained simply replacing λ3[ν] in (43) with

Equation (52)

where s is the tilting parameter conjugated to Cn . For instance, in physics applications, it is often of interest to consider the empirical current ${\mathbb{J}}_{n}(i,j)={L}_{n}^{(2)}(i,j)-{L}_{n}^{(2)}(j,i)$, viz the antisymmetric part of the pair-empirical occupation measure in (4), or again the occupation measure itself ${L}_{n}(i)={\sum }_{j=1}^{N}{L}_{n}^{(2)}(i,j)$. The empirical current is an important observable as it allows us to estimate how far a system lies from equilibrium, whereas the occupation measure gives an estimate of the time spent by the system in each state of the state space.

4. Two-state model

In this section, in order give a more pedagogical understanding of how one could use (30) to derive leading, i.e., the SCGF in (40), and finite n behaviour, we compare our method with the more standard spectral approach on a simple two-state Markov chain. We show that the two methods give equivalent results and propose a physical interpretation of all terms appearing in the SCGF. We consider a general two-state Markov chain, whose transition matrix Π reads

Equation (53)

with p and q between 0 and 1. We choose to observe the flux between node 1 and node 2, that is

Equation (54)

The long-time behaviour of Cn is given by ${\mathrm{lim}}_{n\to \infty }\,{C}_{n}=\frac{pq}{p+q}=:{c}^{\ast }$. Intuitively, when t12 is large, the Markov chain jumps frequently from 1 to 2 and from 2 to 1; instead, when t12 is small the chain spends most of the time jumping from 1 to 1 and/or from 2 to 2. This situation is reminiscent of a particle in a double well potential immersed in a thermal bath, where temperature—that is, the strength of noise—regulates the frequency of jumps between the two minima. In this two-state model, the tilting parameter s plays a role analogous to the temperature.

4.1. Spectral approach

The moment generating function can be computed using spectral methods. We start from (20) (restricted to the case of the observable (54)), i.e.,

Equation (55)

which can be cast in the form

Equation (56)

where $\langle {\mathbb{P}}_{1}\vert =(1,0)$ is the vector of initial probabilities, |1⟩ = (1, 1) and Πs is the tilted matrix, viz (19) restricted to the case at hand, which reads

Equation (57)

We can use the spectral decomposition of Πs to get

Equation (58)

where Λ± are the eigenvalues of Πs and l±, r± the corresponding left and right eigenvectors, respectively. We notice that—for the spectral decomposition of Πs to be valid—left and right eigenvectors have to be bi-orthonormal.

By computing the eigenvalues and eigenvectors of Πs explicitly, we arrive at

Equation (59)

4.2. Graph-combinatorial approach

The moment generating function can also be computed using (30). Remarkably, this other formulation highlights two different contributions coming from cycles and paths travelled starting from state 1 of the state space. These can explicitly be written as

Equation (60)

Equation (61)

where in ${Z}_{2,n}^{\text{cycles}}(s)$ we made explicit the cycle contribution coming from staying for n consecutive steps on state 1, and in ${Z}_{2,n}^{\text{paths}}(s)$ the counting needs to start from t12 = 1 because to have a meaningful path contribution the Markov chain needs to hop at least once from state 1 to state 2. We remark that in this simple model det(L1) = t12 if t12 ≠ 0 (in such a case the term is absorbed in (60) and (61) by the binomial coefficients), while when t12 = 0 the Laplacian is a 1 × 1 matrix: L1 is thus an empty matrix, and we take its determinant to be 1 for consistency.

We can find an explicit expression for (60) and (61) analytically. We replace the delta functions appearing by their contour integral representations

Equation (62)

After making the substitution, we notice that the integrands in (60) and (61) are analytic functions everywhere except in 0. This allows us to deform the integration contour to a circle of radius epsilon ≪ 1. The reason for this is to avoid spurious poles in the following steps.

We now let all the sums run up to . This procedure is allowed as higher order terms in the sums do not affect the residue in 0. The infinite sums can be explicitly evaluated and by doing so we get

Equation (63)

Equation (64)

where

Equation (65)

Equation (66)

Notice that ${z}_{1}^{\ast }$ and ${z}_{2}^{\ast }$ are exactly the inverse of the eigenvalues found with spectral methods. We remark that the integrands in (63) and (64) have acquired new singularities, in the form of simple poles at ${z}_{1}^{\ast }$, ${z}_{2}^{\ast }$ and 1/(1 − p): these poles are unphysical, in the sense that their residue should not be considered when computing the contour integrals.

We can express ${Z}_{2,n}^{\text{cycles}}(s)$ and ${Z}_{2,n}^{\text{paths}}(s)$ as

Equation (67)

Equation (68)

Computing the residues we find

Equation (69)

Equation (70)

By summing these two contributions and replacing ${z}_{1}^{\ast }$ and ${z}_{2}^{\ast }$ from (65) and (66), we obtain exactly (59).

In figure 3 we show the functions ${Z}_{2,n}^{\text{cycles}}$ and ${Z}_{2,n}^{\text{paths}}$ and compare them with the moment generating function previously obtained via spectral methods.

Figure 3.

Figure 3. From top left to bottom right: cycles and paths contributions to the moment generating function, their sum (Comb.) obtained using the graph-combinatorial approach, in comparison with the moment generating function obtained via the spectral decomposition (Spect.) for increasing values of n. Comb. and Spect. curves fully overlap and the greenish colour shown is obtained by the combination of blue (Comb.) and yellow (Spect.). Furthermore, as expected, the cycles contribution to the moment generating function is smaller for s > 0 and larger for s < 0 than the paths one. To generate these plots, we have used p = q = 1/2.

Standard image High-resolution image

Evidently, the moment generating function obtained by summing up cycles and paths contributions completely matches the moment generating function obtained with spectral methods, as the two curves are indistinguishable. An advantage of the graph-combinatorial approach with respect to the spectral calculation is the possibility to split the contributions coming from cycles and paths. As expected for the simple model investigated, cycles contribute less to the moment generating function for s > 0 with respect to paths, and viceversa for s < 0. The reason for this is that in the path scenario the Markov chain has to jump at least once from 1 to 2, contributing to Cn . The larger the n, the less pronounced is this effect. We also show in figure 4 the ratio ${Z}_{2,n}^{\text{cycles}}/{Z}_{2,n}^{\text{paths}}$ for a few fixed values of the tilting parameter s as a function of time n. Noticeably, the ratios become constant for n big enough, supporting the fact that both ${Z}_{2,n}^{\text{cycles}}$ and ${Z}_{2,n}^{\text{paths}}$ share the same asymptotics for large n and differ only by a constant prefactor that is a function of s.

Figure 4.

Figure 4. Ratios $\frac{{Z}_{2,n}^{\text{cycles}}}{{Z}_{2,n}^{\text{paths}}}$ as a function of n for five different values of s, which are, from top to bottom: cyan, s = −1.0; magenta, s = −0.5; black, s = 0; red, s = 0.5; orange, s = 1.0.

Standard image High-resolution image

4.3. Large deviation regime

We now investigate fluctuations in the large-n limit computing the SCGF λ(s). We compare the spectral and the variational formulae, to highlight the benefits of both approaches.

Using the spectral approach, the logarithm of the dominant eigenvalue is the SCGF (59) and reads

Equation (71)

We can arrive at the same result by minimising (40). Noticing that Kirchhoff law reduces to ν12 = ν21, the action functional reduces to

Equation (72)

The determinant of the system of equations satisfied by the minimum of (73), which is linear in ν, must be 0 to have non-trivial solutions. This condition gives an equation for eepsilon (epsilon is the Lagrange multiplier fixing the normalisation condition) whose solution gives—through (49)—expression (72). This last can be replaced in the form of the minimisers obtained by solving the linear system, which read

Equation (73)

Equation (74)

Equation (75)

to get their explicit form as a function of p, q, and the tilting parameter s.

In the top-left and bottom-left panels of figure 5 we plot the minimisers (74)–(76) as a function of the tilting parameter s. For the top-left case, we use p = q = 0.5, while for the bottom case p = 0.5 and q = 0.9. We notice that when p = q, as in the top-left panel, ${\nu }_{11}^{\ast }={\nu }_{22}^{\ast }$ identically. This reflects a permutation symmetry of the system: when p = q, switching states 1 and 2 does not affect the transition matrix. In this case, the Markov chain smoothly transitions between two regimes: for s ≪ 0, the chain spends half of the time in node 1 and half on 2; for s ≫ 0, the chain spends all the time jumping from state 1 to state 2 and back. When pq, instead, for s < 0 the system smoothly transitions to a localised state, where the Markov chain is mostly located on 1 (resp. 2) if p < q (resp. p > q). Interestingly, we notice that for p < q the maximum of ${\nu }_{22}^{\ast }$ occurs at a finite and negative value of s.

Figure 5.

Figure 5. Top-left panel: plot of the minimizers ${\nu }_{11}^{\ast }$, ${\nu }_{12}^{\ast }$, ${\nu }_{22}^{\ast }$ of the form (73)–(75) for p = q = 0.5 as a function of s. We notice that the curve of ${\nu }_{11}^{\ast }$ coincides with that of ${\nu }_{22}^{\ast }$, due to the symmetry of p and q. Top-right panel: plot of all the contributions to the SCGF as defined in (42)–(44), (76), (77) of the two-state model and their sum as a function of s for p = q = 0.5. For this choice of parameters, we notice that the curve of λ1,states coincides with that of λ2, and interestingly they do not depend on s. Bottom-left panel: plot of the minimizers ${\nu }_{11}^{\ast }$, ${\nu }_{12}^{\ast }$, ${\nu }_{22}^{\ast }$ of the form (73)–(75) for p = 0.5, q = 0.9 as a function of s. Bottom-right panel: plot of all the contributions to the SCGF as defined in (42)–(44), (76), (77) of the two-state model and their sum as a function of s for p = 0.5 and q = 0.9.

Standard image High-resolution image

In the top-right and bottom-right panels of figure 5 we plot each contribution to the SCGF obtained with our approach alongside their sum. The SCGF is in perfect agreement with the one obtained using spectral methods. Furthermore, our approach allows us to understand the magnitude of each physical term. We split λ1(s), as defined in (41), into two terms as follows:

Equation (76)

Equation (77)

and plot them separately. In the top-right panel we used p = q = 0.5, while in the bottom-right panel we used p = 0.5 and q = 0.9. In both cases, when |s| is large we notice that λ1,states and λ1,links balance each other, and their sum is close to zero. This is because in both cases, the Markov chain spends most of the time in just a fraction of the available links. For s ≫ 0, the dominant contribution in both cases is due to the tilting term λ3(s). For s ≪ 0, we notice a striking difference: when p = q, both λ1,states and λ1,links tend to a finite value. This is because the chain still visits both node 1 and node 2. Instead, in the case pq, λ1,states and λ1,links tend to 0. This is because of the aforementioned localisation behaviour. In both cases, since the tilting term λ3(s) becomes negligible, the SCGF λ is well approximated by λ2, the dynamical entropy.

5. Conclusion

In this work we propose a way to study the large deviation regime of fluctuations of two-point observables of a discrete-time Markov chain. Adopting graph-combinatorial arguments similar to those in [3335], we show how to calculate the finite-time moment generating function and the SCGF, objects that have a clear interpretation in the framework of statistical physics as they correspond, respectively, to the canonical partition function and Helmoltz free energy. In particular, all terms of the Helmoltz free energy have a clear physical meaning—see (40) and following discussion. We establish a direct and explicit link with spectral methods, as the Lagrange multipliers in (40) can be shown to be the dominant eigenvalue and right eigenvector of the tilted matrix—see (47). Furthermore, from the minimisers ν* we show how to compute in a simple way the occupation measure on the nodes and the driven process.

We illustrate the benefits of our method in a general two-state model, for which we can compute analytically both the moment generating function and the SCGF. We show plots where we highlight the new information accessible with our method: in particular, we compare the different contributions of paths and cycles to the moment generating function. For the large deviation regime, analysing the minimisers ν* as well as all the terms in our formula for the SCGF, we find an interesting localisation behaviour of the Markov chain when the two-state model is not symmetric.

Remarkably, the finite-time expression for the moment generating function could be used as the starting point for future investigations on the role of sub-leading terms in the fluctuations of observables, for which to our knowledge not much is known. A remarkable contribution in this direction is [39], where authors use matrix products states to characterise fluctuations at finite time. An interesting avenue for future research would be to try to apply our methods in the continuous-time setting.

Furthermore, once we fix the state-space connectivity and probability weights, it would be interesting to understand the interplay between the long-time limit and the large number of states limit. In the framework of Markov chains satisfying detailed balance, this approach could, in principle, be adopted to investigate transient behaviour and metastability in rough energy landscapes, a problem relevant to many areas in statistical physics [47, 48]. More generally, Markov chains that satisfy global balance but not detailed balance are a paradigmatic model for out of equilibrium phenomena. In this context, understanding finite-time behaviour is challenging—see [49] for applications to biology. Out of equilibrium steady states are directly accessible in the large deviations framework [19, 50] and are of interest to many communities.

Finally, we remark that the Helmoltz free energy associated with the pair empirical occupation measure is a powerful tool to investigate dynamical phase transitions in fluctuations of one and two-point observables. For instance, in [21, 24] the authors show evidence of a localisation phase transition in random walks on random graphs. In an upcoming work, we intend to investigate toy models where this phenomenon can be analytically characterised using the approach outlined in this paper.

Acknowledgments

GC and FC are thankful to Gianmichele Di Matteo for insightful discussions and to Mayank Shreshtha for having designed figure 2. FC is grateful to Hugo Touchette for pointing to interesting literature in the topic and for the hospitality in Stellenbosch (South Africa) during the writing stage of the manuscript. GC is supported by the EPSRC Centre for Doctoral Training in Cross-Disciplinary Approaches to Non-Equilibrium Systems (CANES, EP/L015854/1).

Data availability statement

No new data were created or analysed in this study.

Footnotes

  • This marginalisation only works for discrete-time processes.

  • This term is commonly used in the large deviation theory jargon when referring to the solution of a variational problem passing from a higher-up level in the large deviation hierarchy to a lower one.

Please wait… references are loading.