Graph-combinatorial approach for large deviations of Markov chains

Giorgio Carugno; Pierpaolo Vivo; Francesco Coghi

doi:10.1088/1751-8121/ac79e6

1. Fluctuations for discrete-time Markov chains in the large deviations regime

Markov chains are widely used stochastic models of physical systems, both in and out of equilibrium. We consider a discrete-time ergodic Markov chain $X={\left({X}_{\ell }\right)}_{\ell =1}^{n+1}=({X}_{1},{X}_{2},\dots ,{X}_{n+1})$ evolving in a finite discrete state space Γ of N states according to the (irreducible and aperiodic) transition matrix Π. The element Π_ij is the probability of going from a state X_ℓ = i at time ℓ to a state X_ℓ+1 = j at time ℓ + 1. We will use the index ℓ to refer to time and the indices i and j to refer to general states of the state space.

In this setting, one and two-point observables having the general form

$\begin{equation}{C}_{n}=\frac{1}{n}\sum\limits _{\ell =1}^{n}f({X}_{\ell },{X}_{\ell +1}),\end{equation} \tag{ 1 }$

where f is any function that may depend on both the starting and the landing state, are of fundamental importance to characterise the typical and fluctuating behaviour of the associated physical systems. [Notice that by taking f(i, j) = g(i), C_n in (1) can also cover the case of purely time-additive observables³.] To give an example, the observable in (1) can represent the fraction of transitions over [1, n] in a particular subset Δ of the state space [1], obtained by fixing f = 1_Δ, with 1_Δ the characteristic function of the subset. Furthermore, in certain contexts, C_n can also express heat [2], two-point correlation functions, activities [3–6], particle and energy currents [7], efficiency [8–12], entropy production [6, 13, 14], and many others.

To study fluctuations of C_n, the probabilistic theory of large deviations and, in particular, the Donsker–Varadhan approach may be used as it offers analytical and numerical methods to calculate the large deviation (or rate) function

$\begin{equation}I(c)=-\underset{n\to \infty }{\mathrm{lim}}\,\frac{1}{n}\,\mathrm{ln}\,\mathbb{P}({C}_{n}=c)\end{equation} \tag{ 2 }$

characterising the time-leading exponential behaviour (provided there is one) of the probability distribution $\mathbb{P}({C}_{n}=c)$ [1, 15–20]. The rate function in (2) is non-negative and measures the extent of the fluctuations of C_n around its typical value c*, which, for ergodic Markov chains, is the unique zero of I [16–18]. The existence of the rate function I is referred to as the validity of a large deviation principle for the observable C_n and can be seen as an extension of the weak law of large numbers, as it provides information on the speed (exponential in n) of convergence of C_n to c*.

In the context of Markov chains, there are several ways to compute the rate function I. It is known that, by means of spectral large deviation techniques (see, for instance, [5, 21–25]), one could calculate the scaled cumulant generating function (SCGF)

$\begin{equation}{{\Psi}}_{N}(s){:=}\underset{n\to \infty }{\mathrm{lim}}\,\frac{1}{n}\,\mathrm{ln}\,\mathbb{E}\left[{\mathrm{e}}^{ns{C}_{n}}\right],\end{equation} \tag{ 3 }$

where s is the Lagrange (or tilting, in the large deviation jargon) parameter dual to C_n = c. The SCGF Ψ_N represents the leading exponential behaviour of the moment generating function associated with the observable in (1). To obtain the rate function I in (2), it is then enough to take the Legendre–Fenchel transform of the SCGF, provided it is differentiable, a result known as the Gärtner–Ellis theorem [16–18]. Although these methods serve the purpose well, variational techniques can also be employed, and one may derive the rate function I by solving a variational problem [3, 20, 26]. The advantage of variational methods is at least twofold. For problems that cannot be solved analytically, they offer ways to bound the true rate function (see, for instance, [27, 28]) and, at the same time, alternative numerical techniques inherited from optimization theory and PDEs are available to compute it [29].
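As an illustration of the spectral route, the following minimal sketch evaluates the SCGF of a two-state chain and its Legendre–Fenchel transform on a grid. It assumes the standard spectral result (recalled in section 3) that the SCGF is the logarithm of the dominant eigenvalue of the matrix with entries Π_ij e^{s f(i,j)}. The transition matrix `P`, the observable `f`, the grid `s_grid` and all function names are hypothetical choices made for this example only.

```python
import numpy as np

# Hypothetical two-state chain and observable (illustrative values only).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
f = np.array([[0.0, 1.0],
              [1.0, 0.0]])  # f(i, j) = 1 when the chain changes state


def scgf(s):
    """ln of the dominant eigenvalue of the matrix P_ij * exp(s * f(i, j))."""
    tilted = P * np.exp(s * f)
    return np.log(np.max(np.abs(np.linalg.eigvals(tilted))))


def rate_function(c, s_grid):
    """Legendre-Fenchel transform sup_s [s * c - Psi(s)], restricted to a grid."""
    return max(s * c - scgf(s) for s in s_grid)


# Typical value c* = sum_ij pi_i P_ij f(i, j), with pi the stationary distribution.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()
c_star = float(np.sum(pi[:, None] * P * f))

s_grid = np.linspace(-5.0, 5.0, 2001)
I_star = rate_function(c_star, s_grid)  # should vanish up to numerical error
I_off = rate_function(0.8, s_grid)      # an atypical value has positive cost
```

The grid supremum is only a lower approximation of the true transform; a finer grid or a one-dimensional optimiser would be a natural refinement.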

In the case considered here, it is known that all the information on the fluctuations of one and two-point observables can be obtained by studying the pair empirical occupation measure

$\begin{equation}{L}_{n}^{(2)}(i,j)=\frac{1}{n}\sum\limits _{\ell =1}^{n}{\delta }_{{X}_{\ell },\;i}{\delta }_{{X}_{\ell +1},j}\quad \mathrm{\forall }\;i,j\in {\Gamma},\end{equation} \tag{ 4 }$

as the value of C_n can be deduced via the formula

$\begin{equation}{C}_{n}=\sum\limits _{i,j=1}^{N}f(i,j){L}_{n}^{(2)}(i,j).\end{equation} \tag{ 5 }$

Interestingly, the long-time limit of (4), denoted by $\rho ={({\rho }_{ij})}_{i,j=1}^{N}$ , can be interpreted as the fraction of time that the Markov chain X spends transiting from a state i to a state j of Γ [16–18].
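The relation between (1), (4) and (5) can be checked numerically. The sketch below samples a trajectory, builds the pair empirical occupation measure and compares the two ways of computing C_n. The three-state transition matrix `P`, the random function `f` and the helper names are assumptions introduced for illustration, and states are labelled from 0 rather than 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical three-state chain (two transitions are forbidden).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.6, 0.0, 0.4]])
N = P.shape[0]
n = 1000


def sample_chain(P, n, rng, start=0):
    """Return the states X_1, ..., X_{n+1} (0-based labels)."""
    X = [start]
    for _ in range(n):
        X.append(rng.choice(len(P), p=P[X[-1]]))
    return np.array(X)


def pair_empirical_measure(X, N):
    """L_n^(2)(i, j): fraction of the n jumps that go from i to j, as in (4)."""
    L2 = np.zeros((N, N))
    np.add.at(L2, (X[:-1], X[1:]), 1.0)
    return L2 / (len(X) - 1)


f = rng.normal(size=(N, N))  # arbitrary two-point function f(i, j)

X = sample_chain(P, n, rng)
L2 = pair_empirical_measure(X, N)

C_direct = np.mean(f[X[:-1], X[1:]])  # definition (1)
C_from_L2 = np.sum(f * L2)            # formula (5)
```

Both numbers agree up to floating-point error for any trajectory, since (5) is an exact rewriting of (1).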

The pair empirical occupation measure of (4) is known to satisfy a large deviation principle of the form

$\begin{equation}\mathbb{P}\left({L}_{n}^{(2)}=\nu \right)={\mathrm{e}}^{-nH[\nu ]+o(n)},\end{equation} \tag{ 6 }$

with rate function

$\begin{equation}H[\nu ]=\sum\limits _{i,j}{\nu }_{ij}\,\mathrm{ln}\left(\frac{{\nu }_{ij}}{{\mu }_{i}{{\Pi}}_{ij}}\right),\end{equation} \tag{ 7 }$

where $\nu ={({\nu }_{ij})}_{i,j=1}^{N}$ belongs to the set of probability measures satisfying two constraints: the global balance on the state space, i.e., ∑_j ν_ij = ∑_j ν_ji for every state i, so that the probability currents flowing into and out of an arbitrary state i balance, and the normalisation ∑_i,j ν_ij = 1 (with μ_i = ∑_j ν_ij the occupation measure). The rate function H in (7) is known to be finite, continuous, and convex for densities ν that satisfy the global balance on the state space, and it attains its minimum value, zero, at ν = ρ [16]. Here, it is interesting to notice that the rate function I associated with C_n can be obtained variationally by solving the following contraction⁴ (minimisation) problem

$\begin{equation}I(c)={\mathrm{inf}}_{\begin{subarray}{c}\nu :\\ c=\sum\limits _{i,j}f(i,j){\nu }_{ij}\end{subarray}}\,H[\nu ],\end{equation} \tag{ 8 }$

where the constraint appearing beneath the inf symbol is the formula (5), which selects c, the fluctuation of interest for the observable C_n in (1).
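A minimal numerical sketch of the functional (7) is given below. It assumes the standard identification ρ_ij = π_i Π_ij of the typical pair measure, with π the stationary distribution, and the convention 0 ln 0 = 0. The matrix `P`, the test measure `nu` and the function names are hypothetical.

```python
import numpy as np


def stationary_distribution(P):
    """Left eigenvector of P with eigenvalue 1, normalised to sum to 1."""
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return pi / pi.sum()


def rate_functional(nu, P):
    """H[nu] of (7), with mu_i = sum_j nu_ij and the convention 0 ln 0 = 0."""
    mu = nu.sum(axis=1)
    mask = nu > 0
    reference = (mu[:, None] * P)[mask]
    if np.any(reference == 0):  # nu puts mass on a forbidden transition
        return np.inf
    return float(np.sum(nu[mask] * np.log(nu[mask] / reference)))


P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
pi = stationary_distribution(P)
rho = pi[:, None] * P               # assumed form of the typical pair measure
H_rho = rate_functional(rho, P)     # expected to vanish

# Another normalised pair measure that satisfies global balance.
nu = np.array([[0.2, 0.3],
               [0.3, 0.2]])
H_nu = rate_functional(nu, P)       # expected to be strictly positive
```

Solving the contraction (8) would additionally require a constrained minimisation over balanced measures, which is not attempted here.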

The functional H in (7) is thus a key ingredient for the variational study of fluctuations in discrete-time Markov chains and, as mentioned, it plays a pivotal role in statistical mechanics as many interesting dynamical observables arising in physics have the two-point form in (1).

The form in (7) has been derived with various methods. Among these, the exponential tilting procedure combined with the Radon–Nikodym change of measure [16] (see [26] for continuous-time processes) holds a leading position as it offers a simple and straightforward way to tackle the calculation, provided that the form of the rate function for the i.i.d. process (or any other useful process) is known. We will review and discuss this method in section 2.

Although simple and well suited to large deviation estimates, the exponential tilting procedure does not allow for the calculation of the sub-leading terms, collected in the o(n) correction to the exponent, of the probability distribution of the pair empirical occupation measure (4). In the probability and applied statistics literature, however, exact combinatorial derivations that work at finite time can be found. These may lead to the evaluation of sub-leading order terms that, although not significant in the large deviation regime, would be important if one wanted to study transient regimes. The first combinatorial result goes back to [30], was later reviewed in [31], and was more recently recalled in [32]. Another graph-combinatorial derivation for the probability distribution of the pair empirical occupation measure was proposed in [33] and later extended in [34]. More recently, [35] provided an explicit, although not fully rigorous, expression of the sub-leading terms in (6), and constructed a gauge theory for typical fluctuations of C_n around its expected value.

In the main section 3 of our paper we use similar arguments to provide an alternative, exact expression for the moment generating function of the pair empirical occupation measure. We make use of notation and terminology that are more familiar to the theoretical physics audience, and show, in line with previous literature [17], that our expression for the SCGF, akin to a Helmholtz (canonical) free energy, allows us to give a straightforward physical interpretation of all the terms and of the Lagrange multipliers that fix the necessary constraints. Furthermore, we establish a direct link with spectral methods and show an alternative variational formulation of the so-called driven process [1, 20, 36, 37] (the Markov process responsible for the creation of fluctuations in the large-deviation regime). In section 4 we show explicitly in a general two-state model the equivalence of our approach and the standard spectral techniques to compute the moment generating function at finite time n.

2. Pair empirical measure rate functional

In this section we show how the rate functional H in (7) can be derived via the exponential-tilting method. We start by writing the path-probability definition

$\begin{align}\hfill \mathbb{P}\left({L}_{n}^{(2)}=\frac{T}{n}\right)& {:=}\mathbb{P}\left({L}_{n}^{(2)}(i,j)=\frac{{t}_{ij}}{n}\quad \forall \,i,j\in {\Gamma}\right)\hfill \\ \hfill & \,=\sum\limits _{{X}_{1},{X}_{2},\dots ,{X}_{n+1}}\mathbb{P}({X}_{1},{X}_{2},\dots ,{X}_{n+1}){\delta }_{{L}_{n}^{(2)},T/n}\hfill \end{align} \tag{ 9 }$

$\begin{equation}\quad \quad \quad \quad \qquad \;=\sum\limits _{{X}_{1},{X}_{2},\dots ,{X}_{n+1}}\mathbb{P}({X}_{1}){{\Pi}}_{{X}_{1},{X}_{2}}\dots {{\Pi}}_{{X}_{n},{X}_{n+1}}{\delta }_{{L}_{n}^{(2)},T/n},\end{equation} \tag{ 10 }$

where t_ij represents the number of jumps that the Markov chain X makes between nodes i and j, and in (10) we make use of the Markov property. We also notice in (9) that we can interpret the set of t_ijs as the elements of a matrix T, which will be a central object in the rest of this work.

We now introduce a new i.i.d. process ${X}^{\prime }={\left({X}_{\ell }^{\prime }\right)}_{\ell =1}^{n+1}=({X}_{1}^{\prime },{X}_{2}^{\prime },\dots ,{X}_{n+1}^{\prime })$ based on the probability distribution $\zeta ={\left({\zeta }_{i}\right)}_{i=1}^{N}$ on the state space and with its own pair empirical occupation measure which, with abuse of notation, has the same form as (4). A large deviation principle for the pair empirical measure of X' is known to hold (see, for instance, chapter 9 of [38] or section II.2 of [16]) with rate functional

$\begin{equation}{H}_{\text{i.i.d.}}[\nu ]=\sum\limits _{i,j}\,{\nu }_{ij}\,\mathrm{ln}\left(\frac{{\nu }_{ij}}{{\mu }_{i}{\zeta }_{j}}\right).\end{equation} \tag{ 11 }$

Consequently, we multiply and divide in the summation of (10) by the path-probability ${\mathbb{P}}^{\prime }({L}_{n}^{(2)}=T/n)$ of this i.i.d. process and then introduce an exponential function as follows

$\begin{align}\hfill \mathbb{P}\left({L}_{n}^{(2)}=\frac{T}{n}\right)& =\sum\limits _{{X}_{1},{X}_{2},\dots ,{X}_{n+1}}\mathbb{P}({X}_{1}){{\Pi}}_{{X}_{1},{X}_{2}}\dots {{\Pi}}_{{X}_{n},{X}_{n+1}}\hfill \\ \hfill & \quad \times \frac{{\mathbb{P}}^{\prime }({X}_{1}){\mathbb{P}}^{\prime }({X}_{2})\dots {\mathbb{P}}^{\prime }({X}_{n+1})}{{\mathbb{P}}^{\prime }({X}_{1}){\mathbb{P}}^{\prime }({X}_{2})\dots {\mathbb{P}}^{\prime }({X}_{n+1})}{\delta }_{{L}_{n}^{(2)},T/n}\hfill \end{align} \tag{ 12 }$

$\begin{align}\hfill & \hfill \qquad \quad =\sum\limits _{{X}_{1},{X}_{2},\dots ,{X}_{n+1}}\frac{\mathbb{P}({X}_{1})}{{\mathbb{P}}^{\prime }({X}_{1})}\,{\mathrm{e}}^{\sum\limits _{\ell =1}^{n}\left[\mathrm{ln}{{\Pi}}_{{X}_{\ell },{X}_{\ell +1}}-\mathrm{ln}{\zeta }_{{X}_{\ell +1}}\right]}\\ \hfill & \quad \qquad \quad \times {\mathbb{P}}^{\prime }({X}_{1}){\mathbb{P}}^{\prime }({X}_{2})\dots {\mathbb{P}}^{\prime }({X}_{n+1}){\delta }_{{L}_{n}^{(2)},T/n}.\hfill \end{align} \tag{ 13 }$

The derivation continues by observing, in the exponential function, the equality

$\begin{equation}\sum\limits _{\ell =1}^{n}\left[\mathrm{ln}\,{{\Pi}}_{{X}_{\ell },{X}_{\ell +1}}-\mathrm{ln}\,{\zeta }_{{X}_{\ell +1}}\right]=n\sum\limits _{i,j=1}^{N}{L}_{n}^{(2)}(i,j)\left[\mathrm{ln}\,{{\Pi}}_{ij}-\mathrm{ln}\,{\zeta }_{j}\right],\end{equation} \tag{ 14 }$

obtained by using the definition of the pair empirical measure (4). Hence, we get

$\begin{align}\hfill \mathbb{P}\left({L}_{n}^{(2)}=\frac{T}{n}\right)& =\sum\limits _{{X}_{1},{X}_{2},\dots ,{X}_{n+1}}\frac{\mathbb{P}({X}_{1})}{{\mathbb{P}}^{\prime }({X}_{1})}\,{\mathrm{e}}^{n\sum\limits _{i,j=1}^{N}{L}_{n}^{(2)}(i,j)\left[\mathrm{ln}{{\Pi}}_{ij}-\mathrm{ln}{\zeta }_{j}\right]}{\mathbb{P}}^{\prime }({X}_{1})\hfill \\ \hfill & \quad \times {\mathbb{P}}^{\prime }({X}_{2})\dots {\mathbb{P}}^{\prime }({X}_{n+1}){\delta }_{{L}_{n}^{(2)},T/n}.\hfill \end{align} \tag{ 15 }$

Eventually, by taking minus the logarithm of the probability $\mathbb{P}$ , dividing by n, and taking the limit n → ∞ we get

$\begin{align}-\underset{n\to \infty }{\mathrm{lim}}\,\frac{1}{n}\,\mathrm{ln}\,\mathbb{P}\left({L}_{n}^{(2)}=\nu \right)& =\sum\limits _{i,j=1}^{N}{\nu }_{ij}\,\mathrm{ln}\,\frac{{\zeta }_{j}}{{{\Pi}}_{ij}}\\ & \quad -\underset{n\to \infty }{\mathrm{lim}}\,\frac{1}{n}\,\mathrm{ln}\left(\sum\limits _{{X}_{1},{X}_{2},\dots ,{X}_{n+1}}\frac{\mathbb{P}({X}_{1})}{{\mathbb{P}}^{\prime }({X}_{1})}{\mathbb{P}}^{\prime }({X}_{1}){\mathbb{P}}^{\prime }({X}_{2})\dots {\mathbb{P}}^{\prime }({X}_{n+1}){\delta }_{{L}_{n}^{(2)},\nu }\right)\\ & =\sum\limits _{i,j=1}^{N}{\nu }_{ij}\left(\mathrm{ln}\,\frac{{\zeta }_{j}}{{{\Pi}}_{ij}}+\mathrm{ln}\,\frac{{\nu }_{ij}}{{\mu }_{i}{\zeta }_{j}}\right),\end{align} \tag{ 16 }$

where the matrix ν is defined as

$\begin{equation}\nu =\frac{T}{n}.\end{equation} \tag{ 17 }$

In the derivation of (16), we make use of the fact that ${L}_{n}^{(2)}\to \nu$ , and also that for the probability ${\mathbb{P}}^{\prime }({L}_{n}^{(2)}=\nu )$ a large deviation principle holds with rate functional (11). The last formula obtained in (16) is exactly (7). We remark that it is only because of the long-time limit that we can get rid of the boundary term $\mathbb{P}({X}_{1})/{\mathbb{P}}^{\prime }({X}_{1})$ in (16) and thus get the form of the rate functional for the pair empirical occupation measure of the Markov process X. We also notice that, although extremely useful, the use of an i.i.d. process with its pair empirical rate functional is not strictly necessary for the purpose of the proof. Indeed, if the asymptotics of the pair empirical probability of another process were known and easy to handle, we could have tilted the path probability measure of the Markov process in (12) with respect to it and we would have obtained the same result. For further details on this and on how to best use the tilting method we refer to [1].

The derivation presented in this section makes use of methods that are well known in the large deviation community. Nevertheless, for a more rigorous proof of the large deviation principle for the pair empirical measure (4) having rate functional (7)—which focuses on lower and upper bounds over closed and open sets—we refer the reader to [16, 18, 38].

The derivation presented in this section takes into consideration only leading order terms in n and, furthermore, lacks a physical interpretation of the form of the rate functional (7). The finite n behaviour, captured by subleading terms in (6), is in general much harder to study than the large deviations regime. For the continuous-time setting, in [39] the authors use matrix product states to study finite-time large fluctuations of one-dimensional lattice models. For discrete Markov chains, estimates and bounds for subleading terms in (6) are known in the literature [40, 41], and are derived by using spectral methods. In [35] the author proposes a characterisation of subleading terms using graph-combinatorial arguments. Using an approach similar to that of [35], we provide an exact formula for the moment generating function valid for any finite n, a first step towards an alternative derivation of the subleading terms in (6).

3. Graph-combinatorial approach

In this section, we present an alternative derivation of the rate functional associated with the pair empirical occupation measure in (4). The proposed derivation moves the focus from the probability distribution $\mathbb{P}$ in (6) and the rate functional H in (7) to the moment generating function Z_N,n and SCGF

$\begin{equation}{\lambda }_{N}[s]{:=}\underset{n\to \infty }{\mathrm{lim}}\,\frac{1}{n}\,\mathrm{ln}\,{Z}_{N,n}[s]=\underset{n\to \infty }{\mathrm{lim}}\,\frac{1}{n}\,\mathrm{ln}\,\mathbb{E}\left[{\mathrm{e}}^{ns\cdot {L}_{n}^{(2)}}\right],\end{equation} \tag{ 18 }$

where, with abuse of notation with respect to (3), $s={\left({s}_{ij}\right)}_{i,j=1}^{N}$ is now a set of Lagrange parameters. This paradigm shift is equivalent to a change of ensemble in statistical mechanics [1]. Instead of working with the probability distribution $\mathbb{P}({L}_{n}^{(2)}=T/n)$ at fixed T, we introduce Lagrange parameters s_ij that fix the t_ij only on average, and thus work with a moment generating function. The equilibrium statistical mechanics analogue would be a change from the microcanonical ensemble, where the energy is fixed, to the canonical ensemble, where only the average energy is fixed by the Lagrange parameter β, the inverse temperature.

In this canonical framework, thanks to Markovianity and ergodicity, it is known [17, 18] that we can map the large deviation problem to a spectral one. This is because the SCGF can be calculated as the logarithm of the dominant eigenvalue of the so-called tilted matrix ${{\Pi}}_{s}={\left({{\Pi}}_{s}\right)}_{ij}\enspace \forall i,j\in {\Gamma}$ , which has the form

$\begin{equation}{\left({{\Pi}}_{s}\right)}_{ij}={{\Pi}}_{ij}\,{\mathrm{e}}^{{s}_{ij}}.\end{equation} \tag{ 19 }$
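The spectral statement can be illustrated with a short numerical sketch that compares the logarithm of the dominant eigenvalue of the tilted matrix (19) with (1/n) ln of the moment generating function at finite n, the latter computed from matrix powers for a chain started in the first state. The matrix `P`, the tilting parameters `s` and the function names are hypothetical values chosen for the example.

```python
import numpy as np

# Hypothetical two-state chain and tilting parameters s_ij.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
s = np.array([[0.5, -0.2],
              [0.1, 0.3]])

tilted = P * np.exp(s)  # (Pi_s)_ij = Pi_ij * exp(s_ij), as in (19)
scgf = np.log(np.max(np.abs(np.linalg.eigvals(tilted))))


def log_mgf(n, start=0):
    """ln E[exp(n s . L_n^(2))] for a chain started in state `start`."""
    row = np.zeros(len(P))
    row[start] = 1.0
    return np.log(row @ np.linalg.matrix_power(tilted, n) @ np.ones(len(P)))


# (1/n) ln Z at a few finite times; the gap to `scgf` shrinks roughly as 1/n.
finite_n = {n: log_mgf(n) / n for n in (10, 100, 400)}
```

The residual gap at finite n is a boundary contribution that depends on the initial condition, which is the kind of sub-leading information the large deviation limit discards.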

Notably, thanks to a graph-combinatorial mapping [33, 34], we can derive an exact expression for the moment generating function Z_N,n at finite N and n. In principle, the exact form of Z_N,n allows one to evaluate sub-leading terms (in n) that cannot be calculated within a purely large deviation approach such as that of section 2. Historically, graph-combinatorial arguments similar to those used in this work have been proposed for cyclic Markov chains by Dawson and Good in [33], and later extended to general Markovian paths by Goodman in [34]. The derivation that follows explains in detail, with a theoretical-physics approach, a similar graph-combinatorial calculation but moves the focus onto the moment generating function Z_N,n of the pair empirical occupation measure. This allows us to naturally give a physical interpretation of the interaction and entropic terms in the SCGF λ_N (18).

3.1. An alternative expression for the moment generating function

The graph-combinatorial approach is based on the representation of the state space connectivity as a graph G with associated adjacency matrix A—see figure 2(a). This has elements A_ij = 1 if state j can directly be reached from i, and 0 otherwise. In this context, we will refer to states also as nodes or vertices. The transition matrix Π of the Markov chain X, in turn, embeds in its elements the connectivity of the state space as Π_ij = A_ij p_ij with p_ij the jump probability between i and j.

The moment generating function of the probability $\mathbb{P}({L}_{n}^{(2)}=T/n)$ is

$\begin{equation}{Z}_{N,n}(s)=\sum\limits _{{X}_{1},\dots ,{X}_{n+1}}\mathbb{P}({X}_{1})\prod\limits _{\ell =1}^{n}{{\Pi}}_{{X}_{\ell },{X}_{\ell +1}}\,{\mathrm{e}}^{\sum\limits _{ij}\,{s}_{ij}{\delta }_{{X}_{\ell },\,i}{\delta }_{{X}_{\ell +1},j}},\end{equation} \tag{ 20 }$

where $\mathbb{P}({X}_{1})$ indicates the probability distribution of our process at the initial time ℓ = 1 and $s={({s}_{ij})}_{i,j=1}^{N}$ indicates the set of tilting parameters.

The specific form of the distribution $\mathbb{P}({X}_{1})$ will play a role for finite time behaviour or sub-leading asymptotics, but it will not matter in the large deviation regime—it only amounts to a boundary term—provided that the graph G is strongly connected. For convenience, we choose $\mathbb{P}({X}_{1})={\delta }_{{X}_{1},1}$ , viz the starting node is fixed to be node 1.

The core idea of our work is to perform a change of variables: we transform the sum over all states X_ℓ, for ℓ ∈ {1, ..., n + 1}, to a sum over variables t_ij ∈ {0, ..., n}, for nodes i, j ∈ G. The new variable t_ij, as in the previous section 2, is the number of times the Markov chain jumps from state i to state j; in particular, t_ij can be different from zero only if there is an edge in G between nodes i and j.

For a matrix T to represent the number of jumps of a chain of states (X₁, X₂, ..., X_n+1) the following constraints must be satisfied: (i) the total number of jumps is equal to the total length of the chain minus one, ∑_ij t_ij = n, (ii) all jumps can be temporally arranged like domino tiles (1, X₂), (X₂, X₃), ..., (X_n, X_n+1) reflecting the fact that if at time ℓ the Markov chain jumps to state i, then at time ℓ + 1 it has to start from state i. Constraints (i) and (ii) do not make the change of variables one to one—there can be many instances of the Markov chain that correspond to the same set of t_ijs. In fact, the variables t_ij do not carry any information regarding the temporal order of the jumps. In other words, given an instance of T we have to count in how many ways we can order the jumps as (1, X₂), (X₂, X₃), ..., (X_n, X_n+1) so that ${\sum }_{\ell =1}^{n}{\delta }_{{X}_{\ell },i}{\delta }_{{X}_{\ell +1},j}={t}_{ij}$ and {X₁, X₂, ..., X_n+1} realises a walk in G: we call this number Θ_T. Hence, we can express Z_N,n as

$\begin{equation}{Z}_{N,n}(s)=\sum\limits _{{t}_{11}=0}^{n}\dots \sum\limits _{{t}_{ij}=0}^{n}\dots \sum\limits _{{t}_{NN}=0}^{n}{\delta }_{\sum\limits _{ij}{t}_{ij},n}{{\Theta}}_{T}\prod\limits _{i,j}\left({{\Pi}}_{ij}^{{t}_{ij}}\,{\mathrm{e}}^{{s}_{ij}{t}_{ij}}\right).\end{equation} \tag{ 21 }$
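The change of variables can be checked by brute force on a small example. The sketch below enumerates all walks of n jumps that start in node 1 (index 0 in the code), groups them by their jump matrix T to obtain Θ_T by direct counting, and compares the sum (21) with the path sum (20) evaluated through matrix powers. The transition matrix `P`, the random tilting parameters `s` and the variable names are assumptions made for this illustration.

```python
import itertools
from collections import Counter

import numpy as np

# Hypothetical three-state chain; zero entries are missing edges of G.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.6, 0.0, 0.4]])
N, n = 3, 4
rng = np.random.default_rng(1)
s = rng.normal(scale=0.3, size=(N, N))
weight = P * np.exp(s)  # Pi_ij * exp(s_ij)

# Theta_T by direct counting: number of walks from index 0 with jump matrix T.
theta = Counter()
for tail in itertools.product(range(N), repeat=n):
    X = (0,) + tail
    jumps = list(zip(X[:-1], X[1:]))
    if any(P[i, j] == 0 for i, j in jumps):
        continue  # the sequence is not a walk on G
    T = np.zeros((N, N), dtype=int)
    for i, j in jumps:
        T[i, j] += 1
    theta[tuple(T.ravel())] += 1

# Sum over jump matrices, as in (21).
Z_from_T = sum(count * np.prod(weight ** np.array(T).reshape(N, N))
               for T, count in theta.items())

# Sum over trajectories, as in (20), with the initial state fixed to index 0.
Z_from_paths = np.linalg.matrix_power(weight, n)[0].sum()
```

The enumeration scales as N^n and is only meant as a sanity check for very small N and n.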

We now face the problem of computing Θ_T. We notice that for many instances of T, this number is simply zero: this is because the aforementioned domino-like constraint (ii) imposes stringent conditions on the form of T. First of all, the set of edges (i, j) for which t_ij > 0, together with the union of all their endpoints i and j, must form a connected graph. This is because the Markov chain starting from a node i can only hop to neighbours of i according to the connectivity of G. Mathematically, this condition is equivalent to requiring that the dimension of the kernel of the Laplacian L = D_in − T is 1 [42], where D_in is a diagonal matrix with elements ${({D}_{\text{in}})}_{ii}={\sum }_{j}{t}_{ji}$ . Second, the number of times a Markov chain jumps towards a state i has to be related to the number of jumps starting from that state i, a phenomenon analogous to Kirchhoff's law in electric circuits that encodes the global balance of the dynamics. We hereby distinguish two possible scenarios in which these conditions on T are satisfied. In the first one, for every state the incoming flux and outgoing flux are equal, that is ∑_j t_ij = ∑_j t_ji: this situation corresponds to a Markov chain starting and ending in the same node, and we will refer to this as the cycle scenario. In the second one, for all but two states the incoming and outgoing fluxes are equal. The two special states are the initial one, which we set to 1 by choosing $\mathbb{P}({X}_{1})={\delta }_{{X}_{1},1}$ , and the final one, F, for which one must have ∑_j t_1j = 1 + ∑_j t_j1 and 1 + ∑_j t_Fj = ∑_j t_jF: we will refer to this as the path scenario. This leads to a natural way to express

$\begin{align}\hfill {{\Theta}}_{T}& ={{\Theta}}_{T}^{\text{path}}\sum\limits _{F\ne 1}\left(\prod\limits _{i\ne 1,F}^{N}{\delta }_{\sum\limits _{j=1}^{N}{t}_{ij},\,\sum\limits _{j=1}^{N}{t}_{ji}}\right){\delta }_{\sum\limits _{j=1}^{N}{t}_{1j},\,\sum\limits _{j=1}^{N}{t}_{j1}+1}{\delta }_{\sum\limits _{j=1}^{N}{t}_{Fj}+1,\,\sum\limits _{j=1}^{N}{t}_{jF}}\hfill \\ \hfill & \quad +{{\Theta}}_{T}^{\text{cycle}}\prod\limits _{i=1}^{N}{\delta }_{\sum\limits _{j=1}^{N}{t}_{ij},\,\sum\limits _{j=1}^{N}{t}_{ji}}\left(1-{\delta }_{\sum\limits _{j=1}^{N}{t}_{1j},0}\right),\hfill \end{align} \tag{ 22 }$

where ${{\Theta}}_{T}^{\text{path}}$ $({{\Theta}}_{T}^{\text{cycle}})$ is the number of distinct permutations of the set of t_ijs in the path (cycle) scenario which give a realisation of a walk in G and the deltas enforce Kirchhoff's law. The factor $1-{\delta }_{{\sum }_{j=1}^{N}{t}_{1j},0}$ ensures that the cycle passes at least once through node 1: this condition is required because the starting node is node 1. We will show that it is not necessary to enforce connectedness explicitly because the expressions for ${{\Theta}}_{T}^{\text{path}}$ and ${{\Theta}}_{T}^{\text{cycle}}$ are automatically zero when T is not connected.
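The flux conditions stated in the text (equal in- and out-flux everywhere for a cycle, and one extra outgoing jump at node 1 with one extra incoming jump at F for a path) can be turned into a small classifier. The function below is a hypothetical helper written for illustration; it uses index 0 for node 1 and checks only the balance conditions, not connectedness.

```python
import numpy as np


def classify_jump_matrix(T, start=0):
    """Return ('cycle', None), ('path', F) or ('invalid', None) for a jump matrix T.

    Only the flux-balance (Kirchhoff) conditions are checked, not connectedness.
    """
    T = np.asarray(T)
    excess = T.sum(axis=1) - T.sum(axis=0)  # out-flux minus in-flux at each node
    if not excess.any():
        # Balanced everywhere: a cycle, provided it leaves the starting node.
        return ("cycle", None) if T[start].sum() > 0 else ("invalid", None)
    sources = np.flatnonzero(excess == 1)
    sinks = np.flatnonzero(excess == -1)
    balanced = np.count_nonzero(excess == 0)
    if (len(sources) == 1 and sources[0] == start
            and len(sinks) == 1 and balanced == len(excess) - 2):
        return ("path", int(sinks[0]))
    return ("invalid", None)


cycle_case = classify_jump_matrix([[0, 1], [1, 0]])    # walk 1 -> 2 -> 1
path_case = classify_jump_matrix([[0, 2], [1, 0]])     # walk 1 -> 2 -> 1 -> 2
no_start = classify_jump_matrix([[0, 0], [0, 1]])      # never leaves node 1
unbalanced = classify_jump_matrix([[0, 2], [0, 0]])    # two jumps out, none back
```

A matrix classified as a cycle or a path may still correspond to no walk at all if its links are not connected, which is the case handled by the determinant factor discussed later.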

We now note that we can interpret the matrix T as the adjacency matrix of a directed multi-graph M_T with t_ij directed links, having unitary weight, between nodes i and node j—see figure 2(b). A directed multi-graph is a collection of nodes and directed links, in which multiple links between two nodes are permitted. We refer to the collection of links between two nodes as a multi-link. As a preliminary step in the computation of ${{\Theta}}_{T}^{\text{path}}$ $({{\Theta}}_{T}^{\text{cycle}})$ , we consider a related combinatorial problem, that is counting how many paths there are on M_T that start in 1 and end in F (cycles that start in 1) and pass through every link exactly once. We can interpret this as the number of non-distinct ways we can arrange the jumps like domino tiles that respect the matrix of jumps T. This number overestimates ${{\Theta}}_{T}^{\text{path}}$ (respectively ${{\Theta}}_{T}^{\text{cycle}}$ ). To see this, we can consider a multi-link in M_T having at least two links l₁ and l₂. Given a path (or cycle) that passes through every link in M_T, we can, for instance, generate another distinct one by swapping the order in which we visit l₁ and l₂. This new path (cycle) will not contribute to ${{\Theta}}_{T}^{\text{path}}$ $({{\Theta}}_{T}^{\text{cycle}})$ , as the time-ordered jumps (1, X₂), (X₂, X₃), ..., (X_n, X_n+1) are unaffected by the swap. Nonetheless, this calculation is a useful starting point as we can compute this number using results available in the literature [43], and we will show how to correct this overcounting later on.

3.2. Computation of ${{\Theta}}_{T}^{\text{path}}$ and ${{\Theta}}_{T}^{\text{cycle}}$

So far we have described the key steps that underlie our approach, which are also summarised in the flowchart in figure 1. In the following we provide the details of the calculation. For this purpose, some definitions will be useful and we collect them in this paragraph. An Eulerian multi-graph is a multi-graph for which, at every node i, in-degree and out-degree are the same, viz ${k}_{i}^{\text{in}}={k}_{i}^{\text{out}}$ . Noticing that in M_T we have ${k}_{i}^{\text{in}}={\sum }_{j}{t}_{ji}$ and ${k}_{i}^{\text{out}}={\sum }_{j}{t}_{ij}$ , it follows from Kirchhoff's law that M_T is either an Eulerian multi-graph (for the cycle scenario) or close to an Eulerian multi-graph (for the path scenario), in the sense that only the initial and final nodes do not satisfy ${k}_{i}^{\text{in}}={k}_{i}^{\text{out}}$ . An Eulerian cycle (path) on a multi-graph M_T, as already mentioned in section 3.1, is a cycle (path) that passes through every link exactly once. We denote the number of Eulerian cycles (paths) with ec(M_T) (ep(M_T)). Furthermore, we will indicate by ec(M_T|χ) (ep(M_T|χ)) the number of Eulerian cycles (paths) given some specified condition χ, which in our case will be a combination of the starting node 1, the final node F, the starting edge e₁ and the final edge e_F.

In the literature on the topic, a result is known for the number ec(M_T|e₁) of Eulerian cycles of an Eulerian multi-graph M_T with a fixed starting edge. This goes by the name of BEST theorem [43–45] and reads

$\begin{equation}ec({\mathbf{M}}_{T}\vert {e}_{1})={{\Omega}}_{w}({\mathbf{M}}_{T})\prod\limits _{i=1}^{N}({k}_{i}^{\text{in}}-1)!,\end{equation} \tag{ 23 }$

where Ω_w(M_T) is the number of arborescences, i.e., spanning trees rooted in a node w such that there exists a unique path from every vertex of M_T to w. We note that Ω_w does not depend on the choice of root w when M_T is an Eulerian multi-graph, so that Ω_w(M_T) = Ω(M_T) [44, 46]. Similarly, the rhs of (23) does not show any explicit dependence on the starting edge e₁ because of the inherent symmetry in M_T. An explicit expression for Ω(M_T) is given by

$\begin{equation}{\Omega}({\mathbf{M}}_{T})=\mathrm{det}({L}_{w}),\end{equation} \tag{ 24 }$

where det is the determinant operator and L_w is the submatrix of the Laplacian of the Eulerian multi-graph M_T obtained by removing the wth row and column (for any choice of w), a result known in the literature as Tutte's theorem or the matrix tree theorem.
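The BEST theorem (23), combined with the determinant formula (24), can be tested against an exhaustive count on small Eulerian multi-graphs. The sketch below uses the Laplacian L = D_in − T introduced in section 3.1, labels parallel links so that they are distinguishable, and fixes the first link of each cycle. The function names and the two example matrices are hypothetical, node 1 corresponds to index 0, and every node is assumed to have at least one incoming link.

```python
import math

import numpy as np


def count_arborescences(T, root=0):
    """det of the Laplacian L = D_in - T with the root row and column removed."""
    T = np.asarray(T)
    L = np.diag(T.sum(axis=0)) - T
    keep = [i for i in range(len(T)) if i != root]
    if not keep:
        return 1
    return int(round(float(np.linalg.det(L[np.ix_(keep, keep)]))))


def best_formula(T):
    """Right-hand side of (23) for an Eulerian multi-graph with adjacency matrix T."""
    k_in = np.asarray(T).sum(axis=0)
    return count_arborescences(T) * math.prod(math.factorial(int(k) - 1) for k in k_in)


def brute_force_cycles(T):
    """Count Eulerian cycles whose first link is one fixed link leaving index 0."""
    T = np.asarray(T)
    size = len(T)
    # Label individual links (i, j, copy) so that parallel links are distinguishable.
    links = [(i, j, c) for i in range(size) for j in range(size)
             for c in range(int(T[i, j]))]
    first = next(link for link in links if link[0] == 0)

    def extend(node, unused):
        if not unused:
            return 1 if node == 0 else 0
        return sum(extend(link[1], unused - {link})
                   for link in unused if link[0] == node)

    return extend(first[1], frozenset(links) - {first})


T_two = [[0, 2], [2, 0]]                     # two links in each direction
T_three = [[0, 2, 0], [1, 0, 1], [1, 0, 0]]  # Eulerian multi-graph on three nodes
```

For both examples the exhaustive count and the closed formula should coincide; the recursion is exponential in the number of links and is not meant for large graphs.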

In the following, we first consider the path scenario. In this case, the multi-graph M_T is not Eulerian, but we can make it so simply by adding a link e_F from F to 1. We refer to this modified graph as ${\tilde{\mathbf{M}}}_{T}$ . Using BEST theorem we have

$\begin{equation}ec({\tilde{\mathbf{M}}}_{T}\vert {e}_{1})={\Omega}({\tilde{\mathbf{M}}}_{T})\prod\limits _{i\ne 1}^{N}\left(\sum\limits _{j=1}^{N}{t}_{ji}-1\right)!\left(\sum\limits _{j=1}^{N}{t}_{j\,1}\right)!,\end{equation} \tag{ 25 }$

where we use the fact that the in-degree of node 1 is ${\sum }_{j=1}^{N}{t}_{j1}+1$ in ${\tilde{\mathbf{M}}}_{T}$ . The number of Eulerian cycles starting from node 1 is related to $ec({\tilde{\mathbf{M}}}_{T}\vert {e}_{1})$ by $ec({\tilde{\mathbf{M}}}_{T}\vert 1)=ec({\tilde{\mathbf{M}}}_{T}\vert {e}_{1})\left({\sum }_{j}\,{t}_{j1}+1\right)$ . Furthermore, ep(M_T|1, F) is equal to the number of Eulerian cycles in ${\tilde{\mathbf{M}}}_{T}$ starting in 1 and ending with the link we added to construct it, viz $ec({\tilde{\mathbf{M}}}_{T}\vert 1,{e}_{F})$ . This number can be computed by considering an Eulerian cycle in ${\tilde{\mathbf{M}}}_{T}$ as a collection of loops passing through node 1. The number of these loops is given by the in-degree of node 1, so that we have $ep({\mathbf{M}}_{T}\vert 1,F)=ec({\tilde{\mathbf{M}}}_{T}\vert 1)/({\sum }_{j}{t}_{j1}+1)$ . All these considerations put together give

$\begin{equation}ep({\mathbf{M}}_{T}\vert 1,F)={{\Omega}}_{1}({\mathbf{M}}_{T})\prod\limits _{i\ne 1}^{N}\left(\sum\limits _{j=1}^{N}{t}_{ji}-1\right)!\left(\sum\limits _{j=1}^{N}{t}_{j\,1}\right)!,\end{equation} \tag{ 26 }$

where we used ${\Omega}({\tilde{\mathbf{M}}}_{T})=\mathrm{det}({L}_{1})={{\Omega}}_{1}({\mathbf{M}}_{T})$ with L₁ the submatrix of the graph Laplacian L obtained by removing the first row and column. We note that while ${\Omega}({\tilde{\mathbf{M}}}_{T})$ does not depend on the choice of the root since ${\tilde{\mathbf{M}}}_{T}$ is Eulerian, Ω₁(M_T) does, because M_T is not Eulerian.

We now consider the cycle scenario. In this case, since M_T is already an Eulerian graph, we can readily express ec(M_T|1) as

$\begin{equation}ec({\mathbf{M}}_{T}\vert 1)=ec({\mathbf{M}}_{T}\vert {e}_{1})\sum\limits _{j=1}^{N}{t}_{j\,1}={\Omega}({\mathbf{M}}_{T})\prod\limits _{i\ne 1}^{N}\left(\sum\limits _{j=1}^{N}{t}_{ji}-1\right)!\left(\sum\limits _{j=1}^{N}{t}_{j\,1}\right)!.\end{equation} \tag{ 27 }$

As previously argued, ep(M_T|1, F) (ec(M_T|1)) overestimates ${{\Theta}}_{T}^{\text{path}}$ $({{\Theta}}_{T}^{\text{cycle}})$ . To correct this, one must consider all the links belonging to a given multi-link as totally equivalent. This boils down to considering a weighted graph W_T (see figure 2(c)) in place of the multi-graph M_T. The weighted graph W_T has adjacency matrix T and directed links (e.g., between nodes i and j) obtained by merging all the multi-links (between i and j) in M_T together. In analogy with [45], we define the notion of T-Eulerian cycle (path) as a cycle (path) that passes through every link (i, j) a number t_ij of times. With an abuse of notation, we denote the number of T-Eulerian cycles (paths) by ec(W_T|χ) (ep(W_T|χ)), as it will be clear by the graph we are considering whether we are referring to Eulerian or T-Eulerian cycles (paths). Crucially, in the cycle scenario ${{\Theta}}_{T}^{\text{cycle}}$ is equal to the number of T-Eulerian cycles starting from node 1, i.e., ec(W_T|1) in W_T, whereas in the path scenario ${{\Theta}}_{T}^{\text{path}}$ is equal to the number of T-Eulerian paths from 1 to F, i.e., ep(W_T|1, F) in W_T. The combinatorial factor connecting ep(W_T|1, F) (ec(W_T|1)) to ep(M_T|1, F) (ec(M_T|1)) is simply the number of permutations of links in a multi-link for every multi-link in M_T

$\begin{equation}\begin{array}{cccc}ep({\mathbf{M}}_{T}\vert 1,F)& =ep({\mathbf{W}}_{T}\vert 1,F)\prod\limits _{i,j=1}^{N}{t}_{ij}!\\ ec({\mathbf{M}}_{T}\vert 1)& =ec({\mathbf{W}}_{T}\vert 1)\prod\limits _{i,j=1}^{N}{t}_{ij}!.\end{array}\end{equation} \tag{ 28 }$

**Figure 2.** (a) State-space connectivity G, un-directed and un-weighted, with adjacency matrix A; (b) directed un-weighted multi-graph M_T, with adjacency matrix T, where a multi-link from i to j is composed of t_ij links; (c) directed weighted graph W_T, with adjacency matrix T, where the boldness of links is proportional to the integer weights t_ij.

This allows us to write explicit expressions for ${{\Theta}}_{T}^{\text{path}}$ and ${{\Theta}}_{T}^{\text{cycle}}$

$\begin{equation}{{\Theta}}_{T}^{\text{path}}={{\Theta}}_{T}^{\text{cycle}}=\mathrm{det}({L}_{1})\prod\limits _{i=1}^{N}\frac{\left({\sum }_{j=1}^{N}{t}_{ji}-1\right)!}{{\prod }_{j=1}^{N}{t}_{ij}!}\sum\limits _{k=1}^{N}{t}_{k1},\end{equation} \tag{ 29 }$

where we recall that L₁ is, in both the cycle and the path scenarios, the submatrix of the graph Laplacian L obtained by removing the first row and column. We remark that, although the expressions for ${{\Theta}}_{T}^{\text{path}}$ and ${{\Theta}}_{T}^{\text{cycle}}$ in (29) are formally the same, the variable T is of a different nature in the two scenarios, as it satisfies different sets of constraints. With this expression, we can write the moment generating function explicitly as

$\begin{equation}\begin{aligned}\hfill {Z}_{N,n}(s)& =\sum\limits _{{t}_{11}=0}^{n}\dots \sum\limits _{{t}_{ij}=0}^{n}\dots \sum\limits _{{t}_{NN}=0}^{n}{\delta }_{{\sum }_{ij}\,{t}_{ij},n}\prod\limits _{i,j}\left({{\Pi}}_{ij}^{{t}_{ij}}\,{\mathrm{e}}^{{s}_{ij}{t}_{ij}}\right)\mathrm{det}({L}_{1})\hfill \\ \hfill & \quad \times \sum\limits _{j=1}^{N}{t}_{j1}\prod\limits _{i=1}^{N}\frac{\left({\sum }_{j=1}^{N}{t}_{ji}-1\right)!}{{\prod }_{j=1}^{N}{t}_{ij}!}\left(\sum\limits _{F\ne 1}\left(\prod\limits _{i\ne 1,F}^{N}{\delta }_{\sum\limits _{j=1}^{N}{t}_{ij},\sum\limits _{j=1}^{N}{t}_{ji}}\right)\right.\hfill \\ \hfill & \left.\quad \times {\delta }_{\sum\limits _{j=1}^{N}{t}_{1j}+1,\sum\limits _{j=1}^{N}{t}_{j1}}{\delta }_{\sum\limits _{j=1}^{N}{t}_{Fj},\sum\limits _{j=1}^{N}{t}_{jF}+1}\right.\hfill \\ \hfill & \quad \left.+\left(\prod\limits _{i=1}^{N}{\delta }_{\sum\limits _{j=1}^{N}{t}_{ij},\,\sum\limits _{j=1}^{N}{t}_{ji}}\right)\left(1-{\delta }_{{\sum }_{j}\,{t}_{1j},0}\right)\right).\hfill \end{aligned}\end{equation} \tag{ 30 }$

We note that the factor det(L₁) vanishes for configurations of T whose Laplacian has a null space of dimension greater than 1. This ensures that we only consider graphs M_T (equivalently, W_T) that are connected, since the dimension of the null space of the graph Laplacian equals the number of connected components of the graph [42]. Remarkably, in equation (30) the contributions of paths and cycles are split, giving an interesting physical perspective. In general, the difference between the two contributions is more pronounced when n is small, in particular when the walker has not yet explored the full state space. In the limit of large n, the contributions of paths and cycles are comparable and share the same asymptotics, as we show in the next section.
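The counting formula (29) can be checked on small examples in the cycle scenario. The sketch below compares it with a brute-force enumeration of closed walks from node 1 that use every link (i, j) exactly t_ij times. It assumes that the Laplacian is L = diag(out-degree) − T restricted to the visited nodes (its definition is not repeated in this section), that T is balanced, and that node 1 (index 0 in the code) is visited; the function names are illustrative only.

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial


def determinant(matrix):
    """Exact determinant via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size, result = len(m), Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, size):
            factor = m[r][c] / m[c][c]
            for k in range(c, size):
                m[r][k] -= factor * m[c][k]
    return result  # an empty matrix gives 1


def cycles_formula(T):
    """Right-hand side of (29) for a balanced count matrix T (cycle scenario)."""
    N = len(T)
    in_deg = [sum(T[j][i] for j in range(N)) for i in range(N)]
    visited = [i for i in range(N) if in_deg[i] > 0]
    # Assumed Laplacian: diag(out-degree) - T, with row and column of node 1 removed.
    L1 = [[(sum(T[i]) if i == j else 0) - T[i][j] for j in visited if j != 0]
          for i in visited if i != 0]
    value = determinant(L1) * in_deg[0]
    for i in visited:
        value *= factorial(in_deg[i] - 1)
        for j in range(N):
            value /= factorial(T[i][j])
    return int(value)


def cycles_brute_force(T):
    """Count closed walks from node 1 that traverse each link (i, j) exactly t_ij times."""
    N = len(T)

    @lru_cache(maxsize=None)
    def walk(node, remaining):
        if sum(remaining) == 0:
            return 1 if node == 0 else 0
        total = 0
        for j in range(N):
            idx = node * N + j
            if remaining[idx] > 0:
                nxt = remaining[:idx] + (remaining[idx] - 1,) + remaining[idx + 1:]
                total += walk(j, nxt)
        return total

    return walk(0, tuple(x for row in T for x in row))


for T in ([[0, 1, 1], [1, 0, 1], [1, 1, 0]], [[2, 3], [3, 1]], [[1, 0], [0, 1]]):
    print(T, cycles_formula(T), cycles_brute_force(T))
```

The last example is a disconnected configuration (two self-loops with no link between the nodes), for which det(L₁) vanishes and no admissible cycle exists.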

Compared to the spectral method to compute the moment generating function [17], which requires the computation of all eigenvalues and eigenvectors of an N × N matrix, our formula is computationally favourable when n is small and N is large. If n is large, instead, the spectral method is numerically more efficient.

3.3. Long-time asymptotics

Expression (30) is valid for every finite n, and can be used to derive the large n limit and, in principle, finite n corrections. In the following, we focus on the large deviation regime, which corresponds to taking n to be much greater than the longest relaxation time of the system τ(N), n ≫ τ(N). In this limit it is useful to rescale time-additive variables with n as in (17), as we can approximate the sums over t₁₁, ..., t_NN with integrals

$\begin{equation}\frac{1}{{n}^{\vert {E}_{{\mathbf{W}}_{T}}\vert }}\sum\limits _{{t}_{11}}\dots \sum\limits _{{t}_{NN}}\to \prod\limits _{i,j}{\int }_{0}^{1}\phantom{\rule{0ex}{0ex}}\mathrm{d}{\nu }_{ij},\end{equation} \tag{ 31 }$

where $\vert {E}_{{\mathbf{W}}_{T}}\vert$ is the number of directed edges in the weighted graph W_T and the ν_ij are defined as in (17). On the right-hand side of (31) and in the following, by ∑_ij and ∏_ij we mean sums and products over the pairs (i, j) such that (i, j) is a directed link in W_T. To leading order in n we obtain the following asymptotic expressions

$\begin{equation}\prod\limits _{i,j}\left({{\Pi}}_{ij}^{{t}_{ij}}\,{\mathrm{e}}^{{s}_{ij}{t}_{ij}}\right)\to {\mathrm{e}}^{n\sum\limits _{i=1}^{N}\sum\limits _{j=1}^{N}\left({s}_{ij}+\mathrm{log}{{\Pi}}_{ij}\right){\nu }_{ij}}\end{equation} \tag{ 32 }$

$\begin{align}\hfill & \quad \left(\sum\limits _{j=1}^{N}{t}_{j1}\right)\left(\prod\limits _{i=1}^{N}\frac{\left({\sum }_{j=1}^{N}{t}_{ji}-1\right)!}{{\prod }_{j=1}^{N}{t}_{ij}!}\right)\to {\mathrm{e}}^{n\sum\limits _{i=1}^{N}\sum\limits _{j=1}^{N}{\nu }_{ij}\left(\mathrm{log}\left(\sum\limits _{k=1}^{N}{\nu }_{ik}\right)-\mathrm{log}({\nu }_{ij})\right)}\hfill \end{align} \tag{ 33 }$

$\begin{equation}{\delta }_{{\sum }_{ij}{t}_{ij},n}\to {\delta }_{{\sum }_{ij}{\nu }_{ij},1}\end{equation} \tag{ 34 }$

$\begin{equation}{n}^{\vert {E}_{{\mathbf{W}}_{T}}\vert }\to {\mathrm{e}}^{\vert {E}_{{\mathbf{W}}_{T}}\vert \mathrm{l}\mathrm{o}\mathrm{g}n}.\end{equation} \tag{ 35 }$

The Kirchhoff constraints tend to the same form for large n, giving explicitly

$\begin{align}\hfill & \quad \sum\limits _{F\ne 1}\left(\prod\limits _{i\ne 1,F}{\delta }_{\sum\limits _{j=1}^{N}{t}_{ji},\sum\limits _{j=1}^{N}{t}_{ij}}\right){\delta }_{\sum\limits _{j=1}^{N}{t}_{j1}+1,\sum\limits _{j=1}^{N}{t}_{1j}}{\delta }_{\sum\limits _{j=1}^{N}{t}_{jF},1+\sum\limits _{j=1}^{N}{t}_{Fj}}\hfill \\ \hfill & \to (N-1)\prod\limits _{i=1}^{N}{\delta }_{\sum\limits _{j=1}^{N}{\nu }_{ji},\sum\limits _{j=1}^{N}{\nu }_{ij}}\hfill \end{align} \tag{ 36 }$

$\begin{equation}\quad \prod\limits _{i=1}^{N}{\delta }_{\sum\limits _{j=1}^{N}{t}_{ji},\sum\limits _{j=1}^{N}{t}_{ij}}\to \prod\limits _{i=1}^{N}{\delta }_{\sum\limits _{j=1}^{N}{\nu }_{ji},\sum\limits _{j=1}^{N}{\nu }_{ij}}.\end{equation} \tag{ 37 }$

We also notice that

$\begin{equation}\mathrm{det}({L}_{1})={n}^{N-1}\,\mathrm{det}\left(\frac{{L}_{1}}{n}\right)={\mathrm{e}}^{(N-1)\mathrm{log}n+\mathrm{Tr}\left[\mathrm{log}\left(\frac{{L}_{1}}{n}\right)\right]}\to {\mathrm{e}}^{(N-1)\mathrm{log}n},\end{equation} \tag{ 38 }$

where we use the fact that the determinant is multi-linear in the rows and that each element of L₁ is proportional to n by construction, so that $\mathrm{Tr}\left[\mathrm{log}\left(\frac{{L}_{1}}{n}\right)\right]$ is finite for large n. Remarkably, det(L₁) becomes sub-leading in the large n limit, although it may be an interesting term for studying the finite-time transient behaviour of the Markov chain. Finally, the term ${\delta }_{{\sum }_{j}\,{t}_{1j},0}$ present in the factor $1-{\delta }_{{\sum }_{j}\,{t}_{1j},0}$ becomes negligible for large n.
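The exponent in (33) can be traced back to Stirling's formula. A brief sketch, assuming $\mathrm{log}\,m!\approx m\,\mathrm{log}\,m-m$ and ${t}_{ij}=n{\nu }_{ij}$ as in (17), and dropping all sub-exponential factors (including the prefactor ${\sum }_{j}{t}_{j1}$, which grows only linearly in n):

$\begin{align}\mathrm{log}\prod\limits _{i}\frac{\left({\sum }_{j}{t}_{ji}-1\right)!}{{\prod }_{j}{t}_{ij}!}& \approx n\sum\limits _{i}\left(\sum\limits _{j}{\nu }_{ji}\right)\left[\mathrm{log}\left(n\sum\limits _{j}{\nu }_{ji}\right)-1\right]-n\sum\limits _{i,j}{\nu }_{ij}\left[\mathrm{log}(n{\nu }_{ij})-1\right]\\ & =n\left[\sum\limits _{i}\left(\sum\limits _{j}{\nu }_{ji}\right)\mathrm{log}\left(\sum\limits _{j}{\nu }_{ji}\right)-\sum\limits _{i,j}{\nu }_{ij}\,\mathrm{log}\,{\nu }_{ij}\right].\end{align}$

The terms proportional to log n and the linear terms cancel because both double sums run over the same set of ν_ij. Using the Kirchhoff constraint (37), ${\sum }_{j}{\nu }_{ji}={\sum }_{k}{\nu }_{ik}$, the bracket becomes ${\sum }_{ij}{\nu }_{ij}\left(\mathrm{log}\left({\sum }_{k}{\nu }_{ik}\right)-\mathrm{log}\,{\nu }_{ij}\right)$, which is the exponent appearing in (33).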

Putting everything together, we obtain to leading exponential order in n

$\begin{align}{Z}_{N,n}(s)& \approx {\int }_{0}^{1}\dots {\int }_{0}^{1}\phantom{\rule{0ex}{0ex}}\left(\prod\limits _{i,j}\mathrm{d}{\nu }_{ij}\right){\mathrm{e}}^{n[{\sum }_{ij}\;{\nu }_{ij}(\mathrm{l}\mathrm{o}\mathrm{g}({\sum }_{k}{\nu }_{ik})-\mathrm{l}\mathrm{o}\mathrm{g}({\nu }_{ij}))+{\sum }_{ij}\;({s}_{ij}+\mathrm{l}\mathrm{o}\mathrm{g}{{\Pi}}_{ij})\;{\nu }_{ij}]}\\ & \quad \times \left(\prod\limits _{i}{\delta }_{\sum\limits _{j}{\nu }_{ji},\;\sum\limits _{j}{\nu }_{ij}}\right){\delta }_{{\sum }_{ij}{\nu }_{ij},1}.\end{align} \tag{ 39 }$

We note that the integrand in (39) can be brought to the form ${\mathrm{e}}^{n{\lambda }_{N}[\nu ]}$ , with the following definitions:

$\begin{equation}{\lambda }_{N}[\nu ]={\lambda }_{1}[\nu ]+{\lambda }_{2}[\nu ]+{\lambda }_{3}[\nu ]+{\lambda }_{4}[\nu ]\end{equation} \tag{ 40 }$

$\begin{equation}{\lambda }_{1}[\nu ]=\sum\limits _{i=1}^{N}\sum\limits _{j=1}^{N}{\nu }_{ij}\left(\mathrm{log}\left(\sum\limits _{k=1}^{N}{\nu }_{ik}\right)-\mathrm{log}({\nu }_{ij})\right)\end{equation} \tag{ 41 }$

$\begin{equation}{\lambda }_{2}[\nu ]=\sum\limits _{i=1}^{N}\sum\limits _{j=1}^{N}\mathrm{log}({{\Pi}}_{ij})\,{\nu }_{ij}\end{equation} \tag{ 42 }$

$\begin{equation}{\lambda }_{3}[\nu ]=\sum\limits _{i=1}^{N}\sum\limits _{j=1}^{N}{s}_{ij}\,{\nu }_{ij}\end{equation} \tag{ 43 }$

$\begin{equation}{\lambda }_{4}[\nu ]={\epsilon}\left(\sum\limits _{i=1}^{N}\sum\limits _{j=1}^{N}{\nu }_{ij}-1\right)+\sum\limits _{i=1}^{N}{\eta }_{i}\left(\sum\limits _{j=1}^{N}{\nu }_{ij}-\sum\limits _{j=1}^{N}{\nu }_{ji}\right),\end{equation} \tag{ 44 }$

where ϵ and η_i are Lagrange multipliers fixing the respective constraints. In (40) each term has a clear physical interpretation: λ₁, in (41), is the geometric entropy (viz. related to the connectivity of the graph G) of a random walk on a graph, with node and link contributions, akin to the entropy of a free particle; λ₂, in (42), is the entropy due to the dynamics, encoded in the transition matrix; λ₃, in (43), is the tilting potential necessary to drive the system towards a fluctuation of the pair empirical occupation measure; finally, λ₄, in (44), enforces the normalisation and the Kirchhoff law (global balance).

We can calculate the leading order in n of (39) via a saddle-point approximation, arriving at

$\begin{equation}{Z}_{N,n}(s)\approx {\mathrm{e}}^{n{\lambda }_{N}[{\nu }^{\ast }]},\end{equation} \tag{ 45 }$

where ν* = argmin_{ν,ϵ,η} λ_N[ν], i.e., ν* are the minimisers of λ_N[ν] with respect to the set of ν_ij, η_i and ϵ. From the Euler–Lagrange equations for critical points of (40), we find the following implicit expression for ${\nu }_{ij}^{\ast }$ :

$\begin{equation}{\nu }_{ij}^{\ast }={({{\Pi}}_{s})}_{ij}\left(\frac{{\text{e}}^{-{\eta }_{j}}}{{\mathrm{e}}^{-{\epsilon}}\;{\text{e}}^{-{\eta }_{i}}}\right)\sum\limits _{k=1}^{N}{\nu }_{ik}^{\ast },\end{equation} \tag{ 46 }$

where the tilted matrix introduced in (19) appears. From (46) we can write self-consistent conditions for ϵ and η_i as follows:

$\begin{equation}\sum\limits _{j}{\left({{\Pi}}_{s}\right)}_{ij}\,{\text{e}}^{-{\eta }_{j}}={\mathrm{e}}^{-{\epsilon}}\,{\text{e}}^{-{\eta }_{i}}\end{equation} \tag{ 47 }$

$\begin{equation}\sum\limits _{i}{\left({{\Pi}}_{s}\right)}_{ij}\frac{{\sum }_{k}\,{\nu }_{ik}^{\ast }}{{\text{e}}^{-{\eta }_{i}}}={\mathrm{e}}^{-{\epsilon}}\frac{{\sum }_{k}\,{\nu }_{jk}^{\ast }}{{\text{e}}^{-{\eta }_{j}}},\end{equation} \tag{ 48 }$

which reveal that ${\mathrm{e}}^{-{\epsilon}}$ is an eigenvalue of the tilted matrix Π_s with right eigenvector components ${r}_{i}={\text{e}}^{-{\eta }_{i}}$ and left eigenvector components ${l}_{j}={\sum }_{k}{\nu }_{jk}^{\ast }/{\text{e}}^{-{\eta }_{j}}$ . Substituting (47) into (40) we get

$\begin{equation}{\lambda }_{N}[{\nu }^{\ast }]=-{\epsilon},\end{equation} \tag{ 49 }$

and, in particular, since λ_N[ν*] is a maximum, ${\mathrm{e}}^{-{\epsilon}}$ is the dominant eigenvalue of (19). The same conclusion can be reached by noticing that the left and right eigenvector elements in (47) and (48) are all positive, which is true only for the dominant eigenvalue. These arguments provide a direct link with spectral methods. In particular, (49) provides an expression for the logarithm of the dominant eigenvalue of the tilted matrix.
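The link with spectral methods can be illustrated numerically. The sketch below assumes that the tilted matrix (19) has entries Π_ij e^{s_ij} and builds ν*_ij = l_i (Π_s)_ij r_j / Λ from the dominant eigenvalue Λ and the bi-orthonormal eigenvectors l and r. This closed form is our own rearrangement of (47), (48) and (50) rather than a formula quoted from the text, and the 3-state matrix and tilt are arbitrary illustrative choices. The checks are the normalisation, the Kirchhoff law, and the identity λ_N[ν*] = log Λ from (49).

```python
import numpy as np


def perron(matrix):
    """Dominant eigenvalue and positive eigenvector of an entrywise positive matrix."""
    values, vectors = np.linalg.eig(matrix)
    k = np.argmax(values.real)
    return values[k].real, np.abs(vectors[:, k].real)


def saddle_point(Pi, s):
    """Return the dominant eigenvalue and the candidate minimiser nu* of the action (40)."""
    Pi_s = Pi * np.exp(s)            # assumed form of the tilted matrix (19)
    Lam, r = perron(Pi_s)            # right eigenvector, r_i = exp(-eta_i)
    _, l = perron(Pi_s.T)            # left eigenvector
    l = l / (l @ r)                  # bi-orthonormality: sum_i l_i r_i = 1
    nu = l[:, None] * Pi_s * r[None, :] / Lam
    return Lam, nu


def action(Pi, s, nu):
    """lambda_1 + lambda_2 + lambda_3 of (41)-(43); lambda_4 vanishes on the constraints."""
    rows = nu.sum(axis=1, keepdims=True)
    return float(np.sum(nu * (np.log(rows) - np.log(nu) + np.log(Pi) + s)))


# Hypothetical example: a 3-state chain with strictly positive transition probabilities.
Pi = np.array([[0.2, 0.5, 0.3], [0.4, 0.1, 0.5], [0.3, 0.3, 0.4]])
s = np.array([[0.0, 0.7, -0.2], [0.1, 0.0, 0.4], [-0.5, 0.3, 0.0]])
Lam, nu = saddle_point(Pi, s)
print("normalisation:", nu.sum())
print("Kirchhoff residual:", np.abs(nu.sum(axis=0) - nu.sum(axis=1)).max())
print("action vs log(Lambda):", action(Pi, s, nu), np.log(Lam))
```

The sketch requires all entries of Π to be strictly positive so that the logarithms are finite; forbidden transitions would have to be masked out of the sums.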

Remarkably, this approach also provides an alternative expression for the so-called driven (or effective) process. This is a modified Markov chain that explains how specific fluctuations are created in time [1, 20, 36, 37]; under certain conditions, it is equivalent to the original Markov chain conditioned to visiting the fluctuation of interest. Useful spectral and variational expressions of the driven process already appeared in the papers just mentioned. Here, we offer another explicit variational representation valid for discrete-time Markov chains. In agreement with [20], the minimisers ${\nu }^{\ast }=\left\{{\nu }_{ij}^{\ast }\right\}$ of the action functional (40) characterise the driven process transition matrix with components

$\begin{equation}{\tilde{{\Pi}}}_{ij}=\frac{{\nu }_{ij}^{\ast }}{{\sum }_{k=1}^{N}{\nu }_{ik}^{\ast }}.\end{equation} \tag{ 50 }$

This last expression offers an alternative way to physically study and simulate the appearance of fluctuations and rare events in discrete-time Markov chain models.
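As an illustration of (50), the sketch below builds the driven transition matrix for a two-state chain whose 1 → 2 transitions are tilted. It assumes the tilted matrix has entries Π_ij e^{s_ij} and uses the equivalent spectral form Π_s,ij r_j / (Λ r_i), with r the dominant right eigenvector, which is our own rearrangement of (50). The function names are hypothetical. The stationary 1 → 2 flux of the driven chain is then compared with a finite-difference derivative of the logarithm of the dominant eigenvalue.

```python
import numpy as np


def driven_matrix(Pi, s):
    """Driven transition matrix and dominant eigenvalue of the tilted matrix."""
    Pi_s = Pi * np.exp(s)                        # assumed form of the tilted matrix (19)
    values, vectors = np.linalg.eig(Pi_s)
    k = np.argmax(values.real)
    Lam, r = values[k].real, np.abs(vectors[:, k].real)
    return Pi_s * r[None, :] / (Lam * r[:, None]), Lam


def two_state_driven(p, q, s):
    """Driven matrix, stationary 1->2 flux and SCGF for a two-state chain."""
    Pi = np.array([[1 - p, p], [q, 1 - q]])
    tilt = np.array([[0.0, s], [0.0, 0.0]])      # only the 1->2 transition is tilted
    D, Lam = driven_matrix(Pi, tilt)
    pi1 = D[1, 0] / (D[0, 1] + D[1, 0])          # stationary weight of state 1
    return D, pi1 * D[0, 1], np.log(Lam)


p, q, s, h = 0.3, 0.6, 0.5, 1e-5
D, flux, lam = two_state_driven(p, q, s)
slope = (two_state_driven(p, q, s + h)[2] - two_state_driven(p, q, s - h)[2]) / (2 * h)
print("driven matrix rows sum to:", D.sum(axis=1))
print("stationary flux:", flux, " d(lambda)/ds:", slope)
```

The driven matrix is a proper stochastic matrix, so it can be sampled with any standard Markov chain simulator to generate trajectories in which the chosen fluctuation is typical.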

Concluding, in (40) we have obtained λ_N, the SCGF associated with the probability distribution of the pair empirical occupation measure in (4). To get the rate functional (2) we only need to Legendre–Fenchel transform the SCGF in (40), i.e.,

$\begin{align}\underset{s}{\mathrm{s}\mathrm{u}\mathrm{p}}(\sum\limits _{i=1}^{N}\sum\limits _{j=1}^{N}{s}_{ij}{\nu }_{ij}^{\ast }-{\lambda }_{N}[{\nu }^{\ast }])& =\underset{s}{\mathrm{s}\mathrm{u}\mathrm{p}}({\lambda }_{3}[{\nu }^{\ast }]-{\lambda }_{N}[{\nu }^{\ast }])\\ & =-{\lambda }_{1}[{\nu }^{\ast }]-{\lambda }_{2}[{\nu }^{\ast }]-{\lambda }_{4}[{\nu }^{\ast }]=H[{\nu }^{\ast }],\end{align} \tag{ 51 }$

where in the last step we recognise the pair empirical rate functional (with the necessary constraints, mentioned and understood in (2), fixed by the Lagrange multipliers in λ₄).

Assuming that one is interested in studying large fluctuations of an observable of the form (1), we remark that the associated SCGF can be obtained by simply replacing λ₃[ν] in (43) with

$\begin{equation}{\lambda }_{3}[\nu ]=s\sum\limits _{i,j=1}^{N}f(i,j){\nu }_{ij},\end{equation} \tag{ 52 }$

where s is the tilting parameter conjugated to C_n. For instance, in physics applications, it is often of interest to consider the empirical current ${\mathbb{J}}_{n}(i,j)={L}_{n}^{(2)}(i,j)-{L}_{n}^{(2)}(j,i)$ , viz the antisymmetric part of the pair-empirical occupation measure in (4), or again the occupation measure itself ${L}_{n}(i)={\sum }_{j=1}^{N}{L}_{n}^{(2)}(i,j)$ . The empirical current is an important observable as it allows us to estimate how far a system lies from equilibrium, whereas the occupation measure gives an estimate of the time spent by the system in each state of the state space.
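These observables are simple to evaluate on a sampled trajectory. The sketch below assumes that the pair empirical occupation measure of (4) is the fraction of the n transitions that go from i to j (the definition is not repeated in this section) and labels states from 0; the function names are illustrative.

```python
import numpy as np


def pair_empirical_measure(trajectory, N):
    """Fraction of the n transitions of the trajectory that go from state i to state j."""
    x = np.asarray(trajectory, dtype=int)
    counts = np.zeros((N, N))
    np.add.at(counts, (x[:-1], x[1:]), 1)        # accumulate repeated (i, j) pairs
    return counts / (len(x) - 1)


def empirical_current(L2):
    """Antisymmetric part of the pair empirical measure."""
    return L2 - L2.T


def occupation_measure(L2):
    """Fraction of time spent in each state (marginal over the landing state)."""
    return L2.sum(axis=1)


def observable(L2, f):
    """C_n of (1) written as the contraction of f(i, j) with the pair empirical measure."""
    return float(np.sum(f * L2))


trajectory = [0, 1, 0, 0, 1, 1, 0]               # n = 6 transitions on two states
L2 = pair_empirical_measure(trajectory, 2)
print(L2)
print(empirical_current(L2))
print(occupation_measure(L2))
print(observable(L2, np.array([[0, 1], [0, 0]])))  # flux from state 0 to state 1
```

Because this example trajectory returns to its starting state, the empirical current vanishes identically, in line with the Kirchhoff constraint of the cycle scenario.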

4. Two-state model

In this section, in order to give a more pedagogical understanding of how one can use (30) to derive the leading behaviour, i.e., the SCGF in (40), and the finite-n behaviour, we compare our method with the more standard spectral approach on a simple two-state Markov chain. We show that the two methods give equivalent results and propose a physical interpretation of all terms appearing in the SCGF. We consider a general two-state Markov chain, whose transition matrix Π reads

$\begin{equation}{\Pi}=\left(\begin{matrix}\hfill 1-p\hfill & \hfill p\hfill \\ \hfill q\hfill & \hfill 1-q\hfill \end{matrix}\right),\end{equation} \tag{ 53 }$

with p and q between 0 and 1. We choose to observe the flux between node 1 and node 2, that is

$\begin{equation}{C}_{n}=\frac{1}{n}\sum\limits _{\ell =1}^{n}{\delta }_{{X}_{\ell },1}{\delta }_{{X}_{\ell +1},2}=\frac{{t}_{12}}{n}.\end{equation} \tag{ 54 }$

The long-time behaviour of C_n is given by ${\mathrm{lim}}_{n\to \infty }\,{C}_{n}=\frac{pq}{p+q}=:{c}^{\ast }$ . Intuitively, when t₁₂ is large, the Markov chain jumps frequently from 1 to 2 and from 2 to 1; instead, when t₁₂ is small the chain spends most of the time jumping from 1 to 1 and/or from 2 to 2. This situation is reminiscent of a particle in a double well potential immersed in a thermal bath, where temperature—that is, the strength of noise—regulates the frequency of jumps between the two minima. In this two-state model, the tilting parameter s plays a role analogous to the temperature.
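The value of c* follows from the stationary distribution π of Π. A short sketch, assuming 0 < p, q < 1 so that the chain is ergodic:

$\begin{equation}\pi {\Pi}=\pi \;\Rightarrow \;{\pi }_{1}\,p={\pi }_{2}\,q,\quad {\pi }_{1}+{\pi }_{2}=1\;\Rightarrow \;\pi =\left(\frac{q}{p+q},\frac{p}{p+q}\right),\quad {c}^{\ast }={\pi }_{1}\,{{\Pi}}_{12}=\frac{pq}{p+q}.\end{equation}$

Here c* is the stationary probability of observing a 1 → 2 transition, which by ergodicity coincides with the long-time limit of C_n.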

4.1. Spectral approach

The moment generating function can be computed using spectral methods. We start from (20) (restricted to the case of the observable (54)), i.e.,

$\begin{equation}{Z}_{N,n}(s)=\sum\limits _{{X}_{1},\dots ,{X}_{n+1}}\mathbb{P}({X}_{1})\prod\limits _{\ell =1}^{n}{{\Pi}}_{{X}_{\ell },{X}_{\ell +1}}\;{\mathrm{e}}^{s{\delta }_{{X}_{\ell },1}{\delta }_{{X}_{\ell +1},2}},\end{equation} \tag{ 55 }$

which can be cast in the form

$\begin{equation}{Z}_{N,n}(s)=\langle {\mathbb{P}}_{1}\vert {\left({{\Pi}}_{s}\right)}^{n}\vert 1\rangle ,\end{equation} \tag{ 56 }$

where $\langle {\mathbb{P}}_{1}\vert =(1,0)$ is the vector of initial probabilities, |1⟩ = (1, 1) and Π_s is the tilted matrix, viz (19) restricted to the case at hand, which reads

$\begin{equation}{{\Pi}}_{s}=\left(\begin{matrix}\hfill 1-p\hfill & \hfill p\,{\mathrm{e}}^{s}\hfill \\ \hfill q\hfill & \hfill 1-q\hfill \end{matrix}\right).\end{equation} \tag{ 57 }$

We can use the spectral decomposition of Π_s to get

$\begin{align}{Z}_{2,n}(s)& =\langle {\mathbb{P}}_{1}\vert (\vert {r}^{+}\rangle \langle {l}^{+}\vert {{\Lambda}}_{+}^{n}+\vert {r}^{-}\rangle \langle {l}^{-}\vert {{\Lambda}}_{-}^{n})\vert 1\rangle \\ & ={r}_{1}^{+}{{\Lambda}}_{+}^{n}({l}_{1}^{+}+{l}_{2}^{+})+{r}_{1}^{-}{{\Lambda}}_{-}^{n}({l}_{1}^{-}+{l}_{2}^{-}),\end{align} \tag{ 58 }$

where Λ_± are the eigenvalues of Π_s and l^±, r^± the corresponding left and right eigenvectors, respectively. We notice that—for the spectral decomposition of Π_s to be valid—left and right eigenvectors have to be bi-orthonormal.
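The two expressions (56) and (58) can be compared numerically. The sketch below evaluates the moment generating function once by a direct matrix power and once through the spectral decomposition, obtaining bi-orthonormal left eigenvectors as the rows of the inverse of the matrix of right eigenvectors. The parameter values are arbitrary and the function names are illustrative.

```python
import numpy as np


def tilted_matrix(p, q, s):
    """Tilted matrix (57) for the flux observable (54)."""
    return np.array([[1 - p, p * np.exp(s)], [q, 1 - q]])


def mgf_matrix_power(p, q, s, n):
    """Equation (56): the chain starts in state 1 and we sum over the final state."""
    return np.linalg.matrix_power(tilted_matrix(p, q, s), n)[0].sum()


def mgf_spectral(p, q, s, n):
    """Equation (58): sum over the two eigenvalues of the tilted matrix."""
    Lam, R = np.linalg.eig(tilted_matrix(p, q, s))   # columns of R: right eigenvectors
    Linv = np.linalg.inv(R)                          # rows: bi-orthonormal left eigenvectors
    return sum(R[0, k] * Lam[k] ** n * Linv[k].sum() for k in range(2)).real


p, q, s = 0.3, 0.6, 0.5
for n in (1, 5, 20):
    print(n, mgf_matrix_power(p, q, s, n), mgf_spectral(p, q, s, n))
```

Taking rows of the inverse eigenvector matrix guarantees bi-orthonormality automatically, which is the condition stated above for the decomposition to be valid.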

By computing the eigenvalues and eigenvectors of Π_s explicitly, we arrive at

$\begin{equation}\begin{aligned}\hfill {Z}_{2,n}(s)& =\frac{1}{{2}^{n+1}\sqrt{{(p-q)}^{2}+4pq\,{\mathrm{e}}^{s}}}\hfill \\ \hfill & \quad \times \left\{\left((1-2\,{\mathrm{e}}^{s})p-q\right)\left[{\left(2-p-q-\sqrt{{(p-q)}^{2}+4pq\,{\mathrm{e}}^{s}}\right)}^{n}\right.\right.\hfill \\ \hfill & \left.\left.\quad -{\left(2-p-q+\sqrt{{(p-q)}^{2}+4pq\,{\mathrm{e}}^{s}}\right)}^{n}\right]\right.\hfill \\ \hfill & \left.\quad +\sqrt{{(p-q)}^{2}+4pq\,{\mathrm{e}}^{s}}\left[{\left(2-p-q-\sqrt{{(p-q)}^{2}+4pq\,{\mathrm{e}}^{s}}\right)}^{n}\right.\right.\hfill \\ \hfill & \left.\left.\quad +{\left(2-p-q+\sqrt{{(p-q)}^{2}+4pq\,{\mathrm{e}}^{s}}\right)}^{n}\right]\right\}.\hfill \end{aligned}\end{equation} \tag{ 59 }$

4.2. Graph-combinatorial approach

The moment generating function can also be computed using (30). Remarkably, this other formulation highlights two different contributions coming from cycles and paths travelled starting from state 1 of the state space. These can explicitly be written as

$\begin{align}{Z}_{2,n}^{\text{cycles}}(s)& ={(1-p)}^{n}+\sum\limits _{{t}_{11}=0}^{n}\sum\limits _{{t}_{12}=1}^{n}\sum\limits _{{t}_{22}=0}^{n}{\delta }_{{t}_{11}+2{t}_{12}+{t}_{22},n}{(pq\;{\mathrm{e}}^{s})}^{{t}_{12}}{(1-p)}^{{t}_{11}}{(1-q)}^{{t}_{22}}\\ & \quad \times \left(\genfrac{}{}{0pt}{}{{t}_{11}+{t}_{12}}{{t}_{11}}\right)\left(\genfrac{}{}{0pt}{}{{t}_{22}+{t}_{12}-1}{{t}_{22}}\right)\end{align} \tag{ 60 }$

$\begin{align}\hfill {Z}_{2,n}^{\text{paths}}(s)& =\sum\limits _{{t}_{11}=0}^{n}\sum\limits _{{t}_{12}=1}^{n}\sum\limits _{{t}_{22}=0}^{n}{\delta }_{{t}_{11}+2{t}_{12}+{t}_{22}-1,n}\frac{{\left(pq\,{\mathrm{e}}^{s}\right)}^{{t}_{12}}}{q}{(1-p)}^{{t}_{11}}{(1-q)}^{{t}_{22}}\hfill \\ \hfill & \times \left(\genfrac{}{}{0pt}{}{{t}_{11}+{t}_{12}-1}{{t}_{11}}\right)\left(\genfrac{}{}{0pt}{}{{t}_{22}+{t}_{12}-1}{{t}_{22}}\right),\qquad \hfill \end{align} \tag{ 61 }$

where in ${Z}_{2,n}^{\text{cycles}}(s)$ we made explicit the cycle contribution coming from staying for n consecutive steps on state 1, and in ${Z}_{2,n}^{\text{paths}}(s)$ the counting needs to start from t₁₂ = 1 because to have a meaningful path contribution the Markov chain needs to hop at least once from state 1 to state 2. We remark that in this simple model det(L₁) = t₁₂ if t₁₂ ≠ 0 (in such a case the term is absorbed in (60) and (61) by the binomial coefficients), while when t₁₂ = 0 the Laplacian is a 1 × 1 matrix: L₁ is thus an empty matrix, and we take its determinant to be 1 for consistency.
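The finite sums (60) and (61) are easy to evaluate directly. The sketch below implements them with the Kronecker deltas resolved explicitly (t₂₂ is fixed by t₁₁, t₁₂ and n) and compares the results with the entries of the n-th power of the tilted matrix (57): the cycle part should match the (1, 1) entry and the path part the (1, 2) entry. Function names and parameter values are illustrative.

```python
from math import comb, exp

import numpy as np


def mgf_cycles(p, q, s, n):
    """Equation (60): walks of n steps that start and end in state 1."""
    total = (1 - p) ** n                       # n consecutive self-loops on state 1
    for t12 in range(1, n // 2 + 1):
        for t11 in range(0, n - 2 * t12 + 1):
            t22 = n - 2 * t12 - t11            # fixed by the delta in (60)
            total += ((p * q * exp(s)) ** t12 * (1 - p) ** t11 * (1 - q) ** t22
                      * comb(t11 + t12, t11) * comb(t22 + t12 - 1, t22))
    return total


def mgf_paths(p, q, s, n):
    """Equation (61): walks of n steps that start in state 1 and end in state 2."""
    total = 0.0
    for t12 in range(1, (n + 1) // 2 + 1):
        for t11 in range(0, n + 1 - 2 * t12 + 1):
            t22 = n + 1 - 2 * t12 - t11        # fixed by the delta in (61)
            total += ((p * q * exp(s)) ** t12 / q * (1 - p) ** t11 * (1 - q) ** t22
                      * comb(t11 + t12 - 1, t11) * comb(t22 + t12 - 1, t22))
    return total


p, q, s = 0.3, 0.6, 0.5
Pi_s = np.array([[1 - p, p * exp(s)], [q, 1 - q]])
for n in (1, 2, 8):
    power = np.linalg.matrix_power(Pi_s, n)
    print(n, mgf_cycles(p, q, s, n), power[0, 0], mgf_paths(p, q, s, n), power[0, 1])
```

The number of terms grows quadratically with n, so this direct evaluation is practical only for moderate n.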

We can find an explicit expression for (60) and (61) analytically. We replace the delta functions appearing by their contour integral representations

$\begin{equation}{\delta }_{i,j}=\frac{1}{2\pi i}{\oint }_{\vert z\vert =1}{z}^{i-j-1}\;\mathrm{d}z.\end{equation} \tag{ 62 }$

After making the substitution, we notice that the integrands in (60) and (61) are analytic functions everywhere except at 0. This allows us to deform the integration contour to a circle of radius ϵ ≪ 1, which avoids spurious poles in the following steps.

We now let all the sums run up to ∞. This is allowed because the higher-order terms in the sums do not affect the residue at 0. The infinite sums can be evaluated explicitly, and by doing so we get

$\begin{align}\hfill {Z}_{2,n}^{\text{cycles}}(s)& ={(1-p)}^{n}+\frac{1}{({\mathrm{e}}^{s}pq-pq+q+p-1)}\frac{1}{2\pi \mathrm{i}}\hfill \\ \hfill & \quad \times {\oint }_{\vert z\vert ={\epsilon}}\frac{1}{{z}^{n-1}}\frac{{\mathrm{e}}^{s}pq}{(z(1-p)-1)(z-{z}_{1}^{\ast })(z-{z}_{2}^{\ast })}\hfill \end{align} \tag{ 63 }$

$\begin{equation}{Z}_{2,n}^{\text{paths}}(s)=-\frac{1}{({\mathrm{e}}^{s}pq-pq+q+p-1)}\frac{1}{2\pi \mathrm{i}}{\oint }_{\vert z\vert ={\epsilon}}\frac{1}{{z}^{n}}\frac{{\mathrm{e}}^{s}p}{(z-{z}_{1}^{\ast })(z-{z}_{2}^{\ast })},\end{equation} \tag{ 64 }$

where

$\begin{equation}\qquad {z}_{1}^{\ast }=\frac{2}{2-p-q-\sqrt{{(p-q)}^{2}+4\,{\mathrm{e}}^{s}pq}}\end{equation} \tag{ 65 }$

$\begin{equation}{z}_{2}^{\ast }=\frac{2}{2-p-q+\sqrt{{(p-q)}^{2}+4\;{\mathrm{e}}^{s}pq}}.\end{equation} \tag{ 66 }$

Notice that ${z}_{1}^{\ast }$ and ${z}_{2}^{\ast }$ are exactly the inverses of the eigenvalues found with spectral methods. We remark that the integrands in (63) and (64) have acquired new singularities, in the form of simple poles at ${z}_{1}^{\ast }$ , ${z}_{2}^{\ast }$ and 1/(1 − p): these poles are unphysical, in the sense that their residues should not be considered when computing the contour integrals.

We can express ${Z}_{2,n}^{\text{cycles}}(s)$ and ${Z}_{2,n}^{\text{paths}}(s)$ as

$\begin{align}{Z}_{2,n}^{\text{cycles}}(s)& ={(1-p)}^{n}+\frac{1}{({\mathrm{e}}^{s}pq-pq+q+p-1)}{\mathrm{R}\mathrm{e}\mathrm{s}}_{z=0}\\ \hfill & \quad \times \left(\frac{1}{{z}^{n-1}}\frac{{\mathrm{e}}^{s}pq}{(z(1-p)-1)(z-{z}_{1}^{\ast })(z-{z}_{2}^{\ast })}\right)\hfill \end{align} \tag{ 67 }$

$\begin{equation}{Z}_{2,n}^{\text{paths}}(s)=-\frac{1}{({\mathrm{e}}^{s}pq-pq+q+p-1)}\,{\mathrm{R}\mathrm{e}\mathrm{s}}_{z=0}\left(\frac{1}{{z}^{n}}\frac{{\mathrm{e}}^{s}p}{(z-{z}_{1}^{\ast })(z-{z}_{2}^{\ast })}\right).\end{equation} \tag{ 68 }$
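The residue formulas (67) and (68) can be checked numerically before any closed form is derived. The sketch below approximates the contour integral over a small circle with the trapezoidal rule, which converges very quickly for integrands that are analytic on the contour, and compares the results with the entries of the n-th power of the tilted matrix (57). The radius 0.5 and the parameter values are assumptions chosen so that the three spurious poles lie outside the circle.

```python
import numpy as np


def residue_at_zero(g, radius=0.5, points=512):
    """(1 / (2 pi i)) times the contour integral of g over |z| = radius."""
    z = radius * np.exp(2j * np.pi * np.arange(points) / points)
    return np.mean(g(z) * z).real              # dz = i z d(theta)


def cycles_and_paths(p, q, s, n):
    """Evaluate (67) and (68) with a numerical residue at z = 0."""
    es = np.exp(s)
    root = np.sqrt((p - q) ** 2 + 4 * es * p * q)
    z1 = 2 / (2 - p - q - root)                # (65)
    z2 = 2 / (2 - p - q + root)                # (66)
    pref = 1 / (es * p * q - p * q + q + p - 1)
    cycles = (1 - p) ** n + pref * residue_at_zero(
        lambda z: es * p * q / (z ** (n - 1) * (z * (1 - p) - 1) * (z - z1) * (z - z2)))
    paths = -pref * residue_at_zero(
        lambda z: es * p / (z ** n * (z - z1) * (z - z2)))
    return cycles, paths


p, q, s = 0.3, 0.6, 0.5
Pi_s = np.array([[1 - p, p * np.exp(s)], [q, 1 - q]])
for n in (1, 4, 10):
    power = np.linalg.matrix_power(Pi_s, n)
    print(n, cycles_and_paths(p, q, s, n), (power[0, 0], power[0, 1]))
```

Only the pole at the origin is enclosed by the contour, which is the numerical counterpart of discarding the unphysical poles.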

Computing the residues we find

$\begin{align}\hfill {Z}_{2,n}^{\text{cycles}}(s)& ={(1-p)}^{n}\hfill \\ & \quad +\frac{({\mathrm{e}}^{s}pq{({z}_{2}^{\ast })}^{-n}({z}_{1}^{\ast }({({z}_{2}^{\ast }/{z}_{1}^{\ast })}^{n}-{({z}_{2}^{\ast }-p{z}_{2}^{\ast })}^{n})+{z}_{2}^{\ast }(-1+{({z}_{2}^{\ast }-p{z}_{2}^{\ast })}^{n}+(-1+p){z}_{1}^{\ast }(-1+{({z}_{2}^{\ast }/{z}_{1}^{\ast })}^{n}))))}{({\mathrm{e}}^{s}pq-pq+p+q-1)((1+(-1+p){z}_{1}^{\ast })({z}_{1}^{\ast }-{z}_{2}^{\ast })(1+(-1+p){z}_{2}^{\ast }))}\end{align} \tag{ 69 }$

$\begin{align}\hfill {Z}_{2,n}^{\text{paths}}(s)& =\frac{({\mathrm{e}}^{s}p{({z}_{2}^{\ast })}^{-n}(-1+{({z}_{2}^{\ast }/{z}_{1}^{\ast })}^{n}))}{({\mathrm{e}}^{s}pq-pq+q+p-1)({z}_{1}^{\ast }-{z}_{2}^{\ast })}.\hfill \end{align} \tag{ 70 }$

By summing these two contributions and replacing ${z}_{1}^{\ast }$ and ${z}_{2}^{\ast }$ from (65) and (66), we obtain exactly (59).

In figure 3 we show the functions ${Z}_{2,n}^{\text{cycles}}$ and ${Z}_{2,n}^{\text{paths}}$ and compare them with the moment generating function previously obtained via spectral methods.

Evidently, the moment generating function obtained by summing the cycle and path contributions matches the one obtained with spectral methods, as the two curves are indistinguishable. An advantage of the graph-combinatorial approach with respect to the spectral calculation is the possibility to split the contributions coming from cycles and paths. As expected for the simple model investigated, cycles contribute less than paths to the moment generating function for s > 0, and vice versa for s < 0. The reason for this is that in the path scenario the Markov chain has to jump at least once from 1 to 2, contributing to C_n. The larger n is, the less pronounced this effect becomes. We also show in figure 4 the ratio ${Z}_{2,n}^{\text{cycles}}/{Z}_{2,n}^{\text{paths}}$ for a few fixed values of the tilting parameter s as a function of time n. Noticeably, the ratios become constant for sufficiently large n, supporting the fact that both ${Z}_{2,n}^{\text{cycles}}$ and ${Z}_{2,n}^{\text{paths}}$ share the same asymptotics for large n and differ only by a constant prefactor that is a function of s.

**Figure 4.** Ratios $\frac{{Z}_{2,n}^{\text{cycles}}}{{Z}_{2,n}^{\text{paths}}}$ as a function of n for five different values of s, which are, from top to bottom: cyan, s = −1.0; magenta, s = −0.5; black, s = 0; red, s = 0.5; orange, s = 1.0.

4.3. Large deviation regime

We now investigate fluctuations in the large-n limit computing the SCGF λ(s). We compare the spectral and the variational formulae, to highlight the benefits of both approaches.

Using the spectral approach, the SCGF is the logarithm of the dominant eigenvalue appearing in (59) and reads

$\begin{equation}\lambda (s)=\mathrm{log}\,\frac{2-p-q+\sqrt{{(p-q)}^{2}+4pq\,{\mathrm{e}}^{s}}}{2}.\end{equation} \tag{ 71 }$

We can arrive at the same result by minimising (40). Noticing that the Kirchhoff law reduces to ν₁₂ = ν₂₁, the action functional reduces to

$\begin{align}\lambda [\nu ]& =({\nu }_{11}+{\nu }_{12})\mathrm{log}({\nu }_{11}+{\nu }_{12})+({\nu }_{12}+{\nu }_{22})\mathrm{log}({\nu }_{12}+{\nu }_{22})\\ & \quad -{\nu }_{11}\,\mathrm{log}\,{\nu }_{11}-2{\nu }_{12}\,\mathrm{log}\,{\nu }_{12}-{\nu }_{22}\,\mathrm{log}\,{\nu }_{22}+{\nu }_{11}\,\mathrm{log}(1-p)+{\nu }_{12}(\mathrm{log}\,p+\mathrm{log}\,q)\\ & \quad +{\nu }_{22}\,\mathrm{log}(1-q)+s{\nu }_{12}+{\epsilon}({\nu }_{11}+2{\nu }_{12}+{\nu }_{22}-1).\end{align} \tag{ 72 }$

The stationarity conditions of (72) form a system of equations that is linear in ν, whose determinant must vanish for non-trivial solutions to exist. This condition gives an equation for ${\mathrm{e}}^{-{\epsilon}}$ (ϵ is the Lagrange multiplier fixing the normalisation condition) whose solution gives, through (49), expression (71). The latter can be substituted into the minimisers obtained by solving the linear system, which read

$\begin{equation}{\nu }_{11}^{\ast }=\frac{{(1-q)}^{-1}\,{\mathrm{e}}^{-{\epsilon}}-1}{{(1-q)}^{-1}\,{\mathrm{e}}^{-{\epsilon}}+2({(1-p)}^{-1}\,{\mathrm{e}}^{-{\epsilon}}-1)({(1-q)}^{-1}\,{\mathrm{e}}^{-{\epsilon}}-1)+{(1-p)}^{-1}\,{\mathrm{e}}^{-{\epsilon}}-2}\end{equation} \tag{ 73 }$

$\begin{equation}{\nu }_{12}^{\ast }=\frac{({(1-p)}^{-1}\,{\mathrm{e}}^{-{\epsilon}}-1)({(1-q)}^{-1}\,{\mathrm{e}}^{-{\epsilon}}-1)}{{(1-q)}^{-1}\,{\mathrm{e}}^{-{\epsilon}}+2({(1-p)}^{-1}\,{\mathrm{e}}^{-{\epsilon}}-1)({(1-q)}^{-1}\,{\mathrm{e}}^{-{\epsilon}}-1)+{(1-p)}^{-1}\,{\mathrm{e}}^{-{\epsilon}}-2}\end{equation} \tag{ 74 }$

$\begin{equation}{\nu }_{22}^{\ast }=\frac{{(1-p)}^{-1}\;{\mathrm{e}}^{-{\epsilon}}-1}{{(1-q)}^{-1}\;{\mathrm{e}}^{-{\epsilon}}+2({(1-p)}^{-1}\;{\mathrm{e}}^{-{\epsilon}}-1)({(1-q)}^{-1}\;{\mathrm{e}}^{-{\epsilon}}-1)+{(1-p)}^{-1}\;{\mathrm{e}}^{-{\epsilon}}-2},\end{equation} \tag{ 75 }$

to get their explicit form as a function of p, q, and the tilting parameter s.
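The minimisers (73)–(75) can be evaluated directly once the exponential of minus the normalisation multiplier is identified with the dominant eigenvalue in (71). The sketch below does this and prints a few consistency checks: the normalisation ν₁₁ + 2ν₁₂ + ν₂₂ = 1, the untilted value ν₁₂ = pq/(p + q) at s = 0, and the localisation on state 1 for strongly negative s when p < q. The function name and parameter values are illustrative.

```python
import numpy as np


def minimisers(p, q, s):
    """Evaluate (73)-(75), identifying exp(-epsilon) with the dominant eigenvalue of (71)."""
    Lam = (2 - p - q + np.sqrt((p - q) ** 2 + 4 * p * q * np.exp(s))) / 2
    a = Lam / (1 - p) - 1
    b = Lam / (1 - q) - 1
    denom = a + b + 2 * a * b                  # common denominator of (73)-(75)
    return b / denom, a * b / denom, a / denom


for s in (-30.0, -1.0, 0.0, 1.0):
    nu11, nu12, nu22 = minimisers(0.5, 0.9, s)
    print(s, nu11, nu12, nu22, nu11 + 2 * nu12 + nu22)
```

At s = 0 the minimisers reduce to the stationary pair measure of the untilted chain, for instance ν₁₂ = pq/(p + q), which is the typical value c* of the flux.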

In the top-left and bottom-left panels of figure 5 we plot the minimisers (73)–(75) as a function of the tilting parameter s. For the top-left panel we use p = q = 0.5, while for the bottom-left panel p = 0.5 and q = 0.9. We notice that when p = q, as in the top-left panel, ${\nu }_{11}^{\ast }={\nu }_{22}^{\ast }$ identically. This reflects a permutation symmetry of the system: when p = q, switching states 1 and 2 does not affect the transition matrix. In this case, the Markov chain smoothly transitions between two regimes: for s ≪ 0, the chain spends half of the time on node 1 and half on node 2; for s ≫ 0, the chain spends all the time jumping from state 1 to state 2 and back. When p ≠ q, instead, for s < 0 the system smoothly transitions to a localised state, where the Markov chain is mostly located on 1 (resp. 2) if p < q (resp. p > q). Interestingly, we notice that for p < q the maximum of ${\nu }_{22}^{\ast }$ occurs at a finite and negative value of s.

**Figure 5.** Top-left panel: plot of the minimizers ${\nu }_{11}^{\ast }$ , ${\nu }_{12}^{\ast }$ , ${\nu }_{22}^{\ast }$ of the form (73)–(75) for p = q = 0.5 as a function of s. We notice that the curve of ${\nu }_{11}^{\ast }$ coincides with that of ${\nu }_{22}^{\ast }$ , due to the symmetry of p and q. Top-right panel: plot of all the contributions to the SCGF as defined in (42)–(44), (76), (77) of the two-state model and their sum as a function of s for p = q = 0.5. For this choice of parameters, we notice that the curve of λ_1,states coincides with that of λ₂, and interestingly they do not depend on s. Bottom-left panel: plot of the minimizers ${\nu }_{11}^{\ast }$ , ${\nu }_{12}^{\ast }$ , ${\nu }_{22}^{\ast }$ of the form (73)–(75) for p = 0.5, q = 0.9 as a function of s. Bottom-right panel: plot of all the contributions to the SCGF as defined in (42)–(44), (76), (77) of the two-state model and their sum as a function of s for p = 0.5 and q = 0.9.

In the top-right and bottom-right panels of figure 5 we plot each contribution to the SCGF obtained with our approach alongside their sum. The SCGF is in perfect agreement with the one obtained using spectral methods. Furthermore, our approach allows us to understand the magnitude of each physical term. We split λ₁(s), as defined in (41), into two terms as follows:

$\begin{equation}{\lambda }_{1,\text{states}}=\sum\limits _{i=1}^{2}\sum\limits _{j=1}^{2}{\nu }_{ij}\,\mathrm{log}\left(\sum\limits _{k=1}^{2}{\nu }_{ik}\right)\end{equation} \tag{ 76 }$

$\begin{equation}{\lambda }_{1,\text{links}}=-\sum\limits _{i=1}^{2}\sum\limits _{j=1}^{2}{\nu }_{ij}\;\mathrm{l}\mathrm{o}\mathrm{g}({\nu }_{ij}),\end{equation} \tag{ 77 }$

and plot them separately. In the top-right panel we used p = q = 0.5, while in the bottom-right panel we used p = 0.5 and q = 0.9. In both cases, when |s| is large we notice that λ_1,states and λ_1,links balance each other, and their sum is close to zero. This is because in both cases, the Markov chain spends most of the time in just a fraction of the available links. For s ≫ 0, the dominant contribution in both cases is due to the tilting term λ₃(s). For s ≪ 0, we notice a striking difference: when p = q, both λ_1,states and λ_1,links tend to a finite value. This is because the chain still visits both node 1 and node 2. Instead, in the case p ≠ q, λ_1,states and λ_1,links tend to 0. This is because of the aforementioned localisation behaviour. In both cases, since the tilting term λ₃(s) becomes negligible, the SCGF λ is well approximated by λ₂, the dynamical entropy.
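The decomposition discussed above can be reproduced with a few lines. The sketch below evaluates λ_1,states, λ_1,links, λ₂ and λ₃ on the two-state minimisers, with exp(−ϵ) identified with the dominant eigenvalue in (71), and prints their sum next to the SCGF. The dictionary keys are illustrative names, and λ₃ is taken as s ν₁₂, i.e., (52) with f selecting the 1 → 2 transition.

```python
import numpy as np


def contributions(p, q, s):
    """Terms of the SCGF on the two-state minimisers, together with log(Lambda)."""
    Lam = (2 - p - q + np.sqrt((p - q) ** 2 + 4 * p * q * np.exp(s))) / 2
    a, b = Lam / (1 - p) - 1, Lam / (1 - q) - 1
    denom = a + b + 2 * a * b
    nu = np.array([[b, a * b], [a * b, a]]) / denom      # (73)-(75), with nu_21 = nu_12
    Pi = np.array([[1 - p, p], [q, 1 - q]])
    rows = nu.sum(axis=1, keepdims=True)
    terms = {
        "lambda_1_states": float(np.sum(nu * np.log(rows))),   # (76)
        "lambda_1_links": float(-np.sum(nu * np.log(nu))),     # (77)
        "lambda_2": float(np.sum(nu * np.log(Pi))),            # (42)
        "lambda_3": float(s * nu[0, 1]),                       # (52), flux observable
    }
    return terms, float(np.log(Lam))


for s in (-2.0, 0.0, 2.0):
    terms, scgf = contributions(0.5, 0.5, s)
    print(s, terms, sum(terms.values()), scgf)
```

For p = q = 0.5 both λ_1,states and λ₂ evaluate to −log 2 for every s, since the occupation measure stays uniform and all transition probabilities equal 1/2.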

5. Conclusion

In this work we propose a way to study the large deviation regime of fluctuations of two-point observables of a discrete-time Markov chain. Adopting graph-combinatorial arguments similar to those in [33–35], we show how to calculate the finite-time moment generating function and the SCGF, objects that have a clear interpretation in the framework of statistical physics as they correspond, respectively, to the canonical partition function and the Helmholtz free energy. In particular, all terms of the Helmholtz free energy have a clear physical meaning, see (40) and the following discussion. We establish a direct and explicit link with spectral methods, as the Lagrange multipliers in (40) can be shown to be related to the dominant eigenvalue and right eigenvector of the tilted matrix, see (47). Furthermore, from the minimisers ν* we show how to compute in a simple way the occupation measure on the nodes and the driven process.

We illustrate the benefits of our method in a general two-state model, for which we can compute analytically both the moment generating function and the SCGF. We show plots where we highlight the new information accessible with our method: in particular, we compare the different contributions of paths and cycles to the moment generating function. For the large deviation regime, analysing the minimisers ν* as well as all the terms in our formula for the SCGF, we find an interesting localisation behaviour of the Markov chain when the two-state model is not symmetric.

The finite-time expression for the moment generating function could be used as the starting point for future investigations on the role of sub-leading terms in the fluctuations of observables, about which, to our knowledge, not much is known. A notable contribution in this direction is [39], where the authors use matrix product states to characterise fluctuations at finite time. An interesting avenue for future research would be to apply our methods in the continuous-time setting.

Furthermore, once we fix the state-space connectivity and probability weights, it would be interesting to understand the interplay between the long-time limit and the limit of a large number of states. In the framework of Markov chains satisfying detailed balance, this approach could, in principle, be adopted to investigate transient behaviour and metastability in rough energy landscapes, a problem relevant to many areas in statistical physics [47, 48]. More generally, Markov chains that satisfy global balance but not detailed balance are a paradigmatic model for out-of-equilibrium phenomena. In this context, understanding finite-time behaviour is challenging (see [49] for applications to biology). Out-of-equilibrium steady states are directly accessible in the large deviations framework [19, 50] and are of interest to many communities.

Finally, we remark that the Helmholtz free energy associated with the pair empirical occupation measure is a powerful tool to investigate dynamical phase transitions in fluctuations of one and two-point observables. For instance, in [21, 24] the authors show evidence of a localisation phase transition in random walks on random graphs. In an upcoming work, we intend to investigate toy models where this phenomenon can be analytically characterised using the approach outlined in this paper.

Acknowledgments

GC and FC are thankful to Gianmichele Di Matteo for insightful discussions and to Mayank Shreshtha for having designed figure 2. FC is grateful to Hugo Touchette for pointing out interesting literature on the topic and for the hospitality in Stellenbosch (South Africa) during the writing stage of the manuscript. GC is supported by the EPSRC Centre for Doctoral Training in Cross-Disciplinary Approaches to Non-Equilibrium Systems (CANES, EP/L015854/1).

Data availability statement

No new data were created or analysed in this study.
