Level 2 large deviation functionals for systems with and without detailed balance

Large deviation functions are an essential tool in the statistics of rare events. Often they can be obtained by contraction from a so-called level 2 large deviation {\em functional} characterizing the empirical density of the underlying stochastic process. For Langevin systems obeying detailed balance, the explicit form of this functional has been known ever since the mathematical work of Donsker and Varadhan. We rederive the Donsker-Varadhan result by using stochastic path-integrals and then generalize it to situations without detailed balance including non-equilibrium steady states. The proper incorporation of the empirical probability flux turns out to be crucial. We elucidate the relation between the large deviation functional and different notions of entropy production in stochastic thermodynamics and discuss some aspects of the ensuing contractions. Finally, we illustrate our findings with examples.


Introduction
Thermodynamic quantities of small systems fluctuate measurably. With the discovery of fluctuation theorems, the probability distributions of work, heat, and entropy became the focus of stochastic thermodynamics, for recent reviews see [1,2]. In particular, the tails of these distributions turn out to be important for the properties of small systems. Therefore, the statistics of rare events received an increase of interest in the past decades.
The mathematical framework to address the statistics of rare events is large deviation theory [3,4]. Of particular interest are quantities of the type sometimes called Brownian functionals [5]. Here x(t) denotes a d-dimensional stochastic process. If it is ergodic with some stationary probability measure p st (x), then, for large oberservation times T , a T will converge to its mean value, If the limit exists, the random variable a is said to satisfy a large deviation principle. The large deviation function J(a) contains the desired information about the statistics of a. First of all, consistency requires J( a st ) = 0 and J(a) ≥ 0 for all a. Taylor expansion around the minimum of J(a) up to second order yields a Gaussian fluctuation statistics and generalizes Einsteins theory of equilibrium fluctuations [6]. In addition, the complete function J(a) characterizes the statistics of exponenially rare events deviating from a st . Recent applications of large deviation functions to describe rare events in statistical mechanics can be found in [7,8,9,10,11,12,13].
Different Brownian functionals deriving from the same stochastic process x(t) have different large deviation functions. On the other hand, we may rewrite (1.1) as = dy A(y) T (y; x(t)) (1. 5) with the so-called empirical density T (y; x(·)) = 1 T T 0 dt δ(y − x(t)). (1.6) Due to its dependence on x(·) the empirical density is a random function. Clearly, lim T →∞ T (y; x(·)) = p st (y) . (1.7) If we assume that T by itself obeys a large deviation principle, To be more specific, let us consider as underlying stochastic process an overdamped Langevin dynamicṡ with a force f (x, t) and white noise ξ(t) with correlation ξ i (t)ξ j (t ) = 2δ ij δ(t − t ). The dynamics may also be defined by the Fokker-Planck equation where p(x, t) is the probability density and j(x, t) the probability current density for the state x at time t. Accordingly, we get for the stationary state with distribution p st and current j st Finding an explicit expression for I[ (·)] for Langevin processes may seem hopeless, but remarkably, for processes that satisfy detailed balance, j st (x) ≡ 0, Donsker and Varadhan showed quite some time ago [14,15,16,17] that the large deviation functional I[ (·)] is given by (1.15) Note I[ = p st ] = 0 and I[ (·)] ≥ 0 for all (·) as it should be. The Donsker-Varadhan result is limited to equilibrium situations. It is therefore natural to ask whether it can be generalized to non-equilibrium steady states (NESSs) where detailed balance is violated. NESSs come in a large variety since their stationary flux j st is, besides being divergenceless, largely arbitrary. We therefore expect that the large deviation functional I[ (·)] for a NESS will also depend on the stationary flux. We are hence motivated to introduce the empirical current in analogy to (1.7). In further analogy to (1.10), the supplemented large deviation functional for the joint probability distribution P T [ (·), µ(·)] is suitable to derive all large deviation functions J(b) of Brownian functionals of the type where we did not write down the constraints ∇·µ(y) = 0 and dy (y) = 1 for clarity of notation. Since entropy productions are typically of the form (1.19), the large deviation functional I[µ(·)] is of particular interest.
In the present paper we elucidate the relation between the large deviation functional for the empirical density in stationary states of Langevin systems, the structure of the corresponding Fokker-Planck equation, and the entropy production in NESSs. In section 2 we first rederive the Donsker-Varadhan result (1.15) using a path-integral formulation. Our derivation is a novel alternative to the mathematical treatment of Donsker and Varadhan [14,15,16,17] and is meant to extend the accessibility to a less mathematically minded readership. In section 3 we obtain the large deviation functional I[ (·), µ(·)] by generalizing the derivation to NESSs, i.e. to situations without detailed balance, and highlight the role of the probability current in a NESS. In section 4 we show that I[ (·), µ(·)] can be split into two contributions I a [µ(·)] and I na [ (·), µ(·)] linked to adiabatic and non-adiabatic entropy productionsṠ a andṠ na [18,19,20,21]. Moreover, we consider in some detail the contraction of I[ (·), µ(·)] to I[ (·)] and I[µ(·)] respectively and exemplarily discuss the contraction to J(s a ) where the adiabatic entropy production s a is a typical instance of b T in (1.19). Finally, in section 5, we illustrate our results by some examples.

Systems with Detailed Balance
For systems with detailed balance the external force derives from a potential, f (x) = −∇V (x), and the stationary current is zero, j st (x) ≡ 0. The stationary distribution is the equilibrium distribution p st (x) = 1/Z exp(−βV (x)) with partition sum Z. We can hence write the force as and may define the dynamics by fixing p st instead of f . To derive the Donsker-Varadhan result (1.15) for the large deviation functional I[ (·)], we write P T [ (·)] as the probability transformation of the probability density P [x(·)] for observing a trajectory x(·): (2.2) Using the integral representation of the δ-functional, P T [ (·)] may be written as with the cumulant generating function  For large T the integral is dominated by its saddle-point and we find Hence, −λ[−q(·)] is the Legendre-Fenchel transform of −I[− (·)] [4]. Our strategy will be to determine λ[q(·)] first and than solve the minimization problem (2.7) to obtain I[ (·)]. The probability density of a trajectory is given by where p(x 0 ) is the probability density of the initial point x 0 = x(0) and P [x(·)|x 0 ] the conditional probability density of the trajectory {x(·)}, Expanding the square in the exponent, the mixed term gives rise to a boundary term that, for large T , contributes to the o(T ) terms only. We may hence write Even for simple choices of V (x) this path-integral can not be solved for general q(·).
Using the Feynman-Kac formula we may, however, obtain information on its large T behavior from the differential equation for the time evolution of G q (x, t|x 0 , 0). The initial condition is G q (x, 0|x 0 , 0) = δ(x−x 0 ) and L q is the so-called tilted generator of the Fokker-Planck dynamics To determine λ[q(·)], we express the solution of (2.14) in terms of the eigenvectors and eigenfunctions of L q . We denote the right eigenvectors by φ ν q and the corresponding eigenvalues by λ ν q : Owing to the symmetry of L q , the left eigenvectors are simply given by φ ν q * : Assuming that L q has a complete set of left and right eigenvectors ‡, i.e.
we can formally write the solution of the differential equation (2.14) for arbitrary q(·) as Inserting the above expression for G q (x T , T |x 0 , 0) into (2.12) we get where λ 0 q is the eigenvalue with the largest real part. For simplicity, we denote in the following the right eigenvector corresponding to this eigenvalue by φ q . The two integrals in (2.20) contribute to the o(T ) term in (2.5) only. Hence we do not need to know their value and can carry on without actually knowing the eigenvectors φ ν q . Still, λ[q(·)] is hard to get for general q(·). However, we do not need an explicit expression. The Euler-Lagrange equation corresponding to the minimization problem (2.7) is of the form whereq(·) denotes the q(·) that minimizes the r.h.s. of (2.7). Now, from standard first order perturbation theory This leads to Since (x) is positive, φq is real and Lq has the same left and right eigenvectors.
To finally determine I[ (·)], we plug the minimizing φq = √ into (2.7) and find Fortunately, the so far undeterminedq(·) drops out and we arrive at the result (1.15) found by Donsker and Varadhan: (2.26) Being normalized, (x) tends to zero for |x| → ∞ and the boundary terms in the integration by parts do not contribute. By substituting from the definitions (1.12b) and (1.13b), we find for j st (x) ≡ 0 a compact form of the Donsker-Varadhan result, (2.28)

Systems without Detailed Balance
We now consider systems in which the force f (x) does not derive from a potential. Consequently, it is j st (x) = 0 and detailed balance is violated. Nevertheless, we can still relate the force to the stationary distribution p st and the stationary current j st by As it is a hard problem to find p st and j st from (1.13a) and (1.13b) for a given force f , it often is convenient to follow the reverse strategy and fix p st and j st and then determine the associated force f from the above equation. In this way, p st defines the conservative and j st the non-conservative part of f . In the following analysis we closely follow the lines in section 2. The essential difference to the case with detailed balance becomes apparent by inspecting the probability density of a trajectory, The mixed term T 0 dtẋf (x) is no longer negligible in the large T limit but may be of order T . Therefore, the simplification applied in (2.11) can not be used here. This entails a more complicated form of the tilted generator which in turn implies that there is no such simple relation between the right and left eigenvectors as (2.17).
The fact that we now have j st in addition to p st to define the dynamics motivates to include the empirical current µ defined in (1.16) into our considerations and to investigate the form of the joint probability density P T [ (·), µ(·)]. The corresponding large deviation form reads The cumulant generating function of P T [ (·), µ(·)] now depends on two functions and is given by (3.4) The corresponding large deviation functional λ[q(·), k(·)] is defined by At this point, it is convenient to substitute q(·) and k(·) by two new functions η(·) and γ(·) such that In analogy with (2.12) we have Here, we have used (3.2) and (3.4).
with the tilted generator We assume that L γ,η has a complete set of right eigenvectors φ ν γ,η and left eigenvectors ψ ν γ,η § : where the adjoint operator is given by 14) The completeness of the eigenvector set is expressed by As in the detailed balance case, we get as formal solution of (3.10) and identify where λ 0 γ,η is the eigenvalue with the largest real part. For simplicity of notation the corresponding eigenvectors will be denoted by φ γ,η and ψ γ,η in the following.
Analogously to the minimization in q(·) for the detailed balance case, the minimization in η(·) leads in the present case to For the optimization in γ(·) it is convenient to use the relation where We are now able to put everything together and get from (3.18) Unlike the detailed balance case, the dependence onγ andη does not vanish completely but survives in the last term involving ψ * γ,η . However, this term contributes to the o(T ) terms in (3.5) only. To prove this, we first integrate by parts: Next we recall the definition (1.16) of µ for an arbitrary trajectory {y(·)} and find We are thus left with (3.27) I[ (·), µ(·)] may be written in a number of alternative forms. Replacing f by using (3.1), we find (3.28) Upon repeated integrations by parts, the mixed term of the square can be shown to vanish, The integral over the boundary in the last line must vanishes due to the normalizability of the involved distributions. Thus, we are left with our final result (3.30) As for the Donsker-Varadhan result, by substituting the current j from (2.27) into (3.28), we obtain the compact form (3.31)

Discussion
The large deviation functionals derived in the previous section characterize the distributions of empirical density and empirical current. They acquire a more intuitive meaning due to their relation to different forms of entropy production. In the first part of the discussion we establish these connections. We then consider the question how to obtain the large deviation functionals I[ (·)] of the empirical density alone and I[µ(·)] of the empirical current alone for a non-equilibrium steady state. Finally, we discuss the large deviation function J(s a ) of the adiabatic entropy production s a [µ(·)].

Entropy production
We start with the simpler case of systems obeying detailed balance. The stationary current is zero, j st (x) ≡ 0, and therefore, the dynamics is uniquely defined by the stationary distribution p st alone. If the system is initially out of equilibrium in a state p = p st , it relaxes to equilibrium, and entropy is being produced with a rate [2] (4.1) Here, j p (x) denotes the current corresponding to p(x) according to Comparison with the Donsker-Varadhan result (2.28) reveals [22,23] i.e., the large deviation functional I[ (·)] is, up to a constant factor, nothing but the entropy production of the system when being in the state (·) instead of the equilibrium state p st . For large T , the entropy production hence quantifies how unlikely a deviation of the empirical density from the true equilibrium distribution is. For systems without detailed balance it has been shown [19,20,21,18] that the entropy production rateṠ i may be subdivided into two contributions, the so-called adiabatic and non-adiabatic parts,Ṡ i =Ṡ a +Ṡ na , wherė (4.5) The non-adiabatic partṠ na is the generalization of (4.1) and describes the entropic cost to relax to the stationary state. Correspondingly, it vanishes for j p (x) = j st (x) and p(x) = p st (x). The adiabatic partṠ a , on the other hand, remains non-zero even in the steady state and characterizes the dissipation necessary to maintain stationarity away from equilibrium. The two contributions therefore address the two basic mechanisms that commonly go along with non-equilibrium situations: driving and out of equilibrium boundary conditions respectively [19,20,21,18]. Using (2.27) we may write the large deviation functional (3.31) in the form which suggests to split it into the two contributions The second term in I[ (·), µ(·)], I na [ (·)], is hence (up to a constant factor) equal to the non-adiabatic part of the entropy production that accounts for deviations of from p st . Accordingly, it characterizes the difference between empirical and true stationary distribution and is independent of µ.
The correspondence between I a [ (·), µ(·)] andṠ a [p(·)] as defined in (4.4) is not quite as close. The reason is twofold. Firstly,Ṡ a [p(·)] is a functional of p alone and has no dependence on a current. Secondly, I a [ (·), µ(·)] must be zero for = p st and µ = j st , whereasṠ a [p(·)] has to remain non-zero even in the steady state. To connecṫ S a [p(·)] with I a [ (·), µ(·)], we have hence (in addition to the prefactor 1/4) to make the replacement j st /p st → j st /p st − µ/ . The first contribution to I[ (·), µ(·)], I a [ (·), µ(·)], hence measures the distance between the empirical and the true stationary current. Evaluated at µ = 0 it gives (up to a constant factor) the adiabatic entropy production in state .

Contractions
The large deviation functional I[ (·), µ(·)] characterizes the joint probability distribution P [ (·), µ(·)] for empirical density and empirical current. In many situations one is interested in the distribution of either the density or the current alone. Including the constraint with a Lagrange multiplier field κ(x) yields the Euler-Lagrange equation The optimal current can hence be determined from the equations The Euler-Lagrange equation for¯ reads where κ is the Lagrange parameter to be determined from the constraint ¯ (x; κ) dx = 1. Solving the above Euler-Lagrange equation analytically is barely possible and no progress seems possible.
However, as discussed below (1.19), we stress that I[µ(·)] is particularly relevant for contractions to large deviation functions of entropy productions. Consider for example the adiabatic entropy production s a [x(·)] on the trajectory level [2,24] Note that as opposed to the spatial averageṠ a in (4.4), s a [x(·)] is a fluctuating quantity and satisfies a fluctuation theorem, see [2,9]  and then substitute the optimal density¯ . The Euler-Lagrange equation of the above contraction reads where κ 1 (x) is the Lagrange function for the constraint ∇ · µ = 0 and κ 2 the Lagrange parameter for the constraint s a = s a [µ] from (4.20). The optimal current hence reads (4.24) The Lagrange function κ 1 and the Lagrange parameter κ 2 has to be determined from the equations for the respective constraints, , (4.25a) Substituting the empirical adiabatic entropy production rateṠ a [ (·)] from (4.4), we can rewrite (4.25b) as and find from substituting (4.23) and (4.24) where the second line follows after two integrations by parts. The remaining quantities to be determined are κ 1 (x; s a ) and κ 2 (s a ) from (4.25a) and (4.25b), and¯ (κ) from (4.17) where µ =μ(s a ) needs to be substituted from (4.23) and κ follows from the constraint ¯ (x; κ) dx = 1. Hence, by first contracting to I[ (·), s a ] and then to J(s a ), as opposed to first contract to I[µ(·)] and then to J(s a ), all necessary equations to determine J(s a ) are known and a numerical treatment is possible.

Illustration
To illustrate our findings, we first consider a simple example without detailed balance given by the force with the constants a > 0 and b. The stationary distribution and the stationary current are and we can thus also write The conservative force −∂ r Φ(r) points to the origin and keeps the dynamics bounded, the non-conservative force v st (r) generates a circular current and violates the detailed balance condition. For a numerical illustration, we fix the parameters to be a = 1, b = 2, T = 500, β = 2 and numerically solve the Langevin equation (1.11) with f (r) from (5.3) using 10 5 timesteps to obtain an ensemble of 10 3 trajectories {r(·)}. The initial values r(0) are drawn from the stationary distribution (5.2a) in order to save the relaxation time and have the system in the NESS the whole time. To each trajectory, we determine the empirical density (·) and current µ(·) from the definitions (1.6) and (1.16) and calculate from that the values of I a [ (·), µ(·)] and I na [ (·)] using (4.7a) and (4.7b).
In figure 1 we show a typical and a rare realization of , and in figure 2 a typical and a rare realization of µ. The typical realizations are selected by picking from the ensemble the and µ with the minimum value of I[ (·), µ(·)] respectively, and correspondingly, the rare realizations are qualified by the maximum value of I[ (·), µ(·)] within the ensemble. Note that in order to select the typical and rare realization of µ, we only need I a [ (·), µ(·)]. We see that the typical realization of is close to p st , whereas the rare realization clearly deviates from p st . In contrast, both realizations of µ are close to j st , implying that I[ (·), µ(·)] is much sharper with respect to µ than to , that is, the system predominantly fluctuates with regard to position and barely with regard to direction of movement. ensuring that ∇ · (p st (r)v st (r, ϕ)) = 0 still holds. To obtain an ensemble of and µ for this system, we repeat the simulation of the Langevin equation with the new force f (r, ϕ) according to (5.3) and v st (r, ϕ) from (5.5). In figure 3 and figure 4 we show the typical and rare realizations of and µ. Again, we see that the typical and µ are close to p st and j st respectively and the rare substantially deviates from p st , but now also the rare µ clearly deviates from j st . Hence, for this example, the system fluctuates in both position and direction of movement. It should furthermore be interesting to perform the contractions discussed in the previous section. Due to the non-linearity of the Euler-Lagrange equations, however, analytical contractions are, even for the simple example (5.3), barely possible. We regard a numerical treatment of the contractions in section 4.2 as beyond the scope of this paper. Instead, we use the parametrization to illustrate all findings of the previous section. The empirical density is chosen such that for σ = σ st ≡ 1/ √ βa it is (r) = p st (r), and the empirical current is chosen such that for c = b and σ = σ st it is µ(r) = j st and the constraint ∇ · µ(r) = 0 is always met.   For this functional choice of and µ, and p st and j st from (5.2a) and (5.2b), the rate function I = I a + I na can be calculated analytically: where in the second line we plugged inṠ a (σ) = 2βσ 2 b 2 from (4.4 As expected, it is J(s a = TṠ a ) = 0, and we see that p(s a ) has exponential tails.

Conclusion
We presented an instructive rederivation of the Donsker-Varadhan result for the level 2 large deviation functional I[ (·)] using a path-integral approach for Langevin systems that obey detailed balance. The derivation makes use of a simple relation between the left and right eigenfunctions of the tilted Fokker Planck operator. Due to the lack of such a relation in cases without detailed balance, a generalization of our derivation to NESSs is not immediate. Introducing the empirical current µ and considering I[ (·), µ(·)] instead of I[ (·)], however, provides the missing relation between the eigenfunctions and we arrive at a closed form of I[ (·), µ(·)] which was also found by Maes et al. [22] along other lines. In our derivation it was important to note that the definition of the empirical current implies a vanishing divergence, ∇µ = 0, except for the starting and end point of the underlying trajectory. This fact constitutes an argument why I[ (·), µ(·)] is only to be considered for divergence-less µ.
Just as the irreversible entropy production splits into an adiabatic and nonadiabatic part for systems violating detailed balance, the large deviation functional I[ (·), µ(·)] can be written as the sum of two non-negative contributions I a [ (·), µ(·)] and I na [ (·)], in direct correspondence to the respective entropy production rates. The empirical current µ accordingly enters the large deviation functional only by the adiabatic part I a [ (·), µ(·)].
Building on the contraction principle, the "master" functional I[ (·), µ(·)] is a starting point to derive all desirable large deviation functionals and functions. Contraction to I[ (·)] and making use of the detailed balance condition reproduces the Donsker-Varadhan result and demands a vanishing optimal currentμ for all . Large deviation functions of entropy productions follow from contractions of I[µ(·)], the large deviation functional for the empirical current hence is worthy of more attention than having received by the literature. The contraction to I[µ(·)], however, turns out to be quite involved, but a set of equations to determine by numerical means the large deviation function for the adiabatic entropy production, J(s a ), has been set up exemplarily. Carrying out the numerics for J(s a ) and analytical progress in the determination of I[µ(·)] are subjects for future efforts.
where the Jordan blocks J n are s n × s n matrices, with λ n on the diagonals and ones on the superdiagonals. For the exponential, this gives e tJ =   e tJ 1 . . . with φ (n,m) = f j (x T )y j,(n,m) and ψ (n,m+l) = z (n,m+l),j f * j (x 0 ). Hence, like in the case with N different eigenvalues, for large t, Q T is dominated by e λ 1 t , where λ 1 is the eigenvalue with the largest real part and the integrals over x T and x 0 are part of the o(T ) therm in (2.5) and (3.5) .