Cycle representatives for the coarse-graining of systems driven into a non-equilibrium steady state

A major current challenge in statistical mechanics is the systematic construction of coarse-grained models that are dynamically consistent, and, moreover, might be used for systems driven out of thermal equilibrium. Here we present a novel prescription that extends the Markov state modeling approach to driven systems. The first step is to construct a complex network of microstates from detailed atomistic simulations with transition rates that break detailed balance. The coarse-graining is then carried out in the cycle space of this network. To this end we introduce the concept of representatives, which stand for many cycles with similar properties. We show how to find these cycle communities using well-developed standard algorithms. Removing all cycles except for the representatives defines the coarse-grained model, which is mapped back onto a network with far fewer states and renormalized transition rates that, however, preserve the entropy production of the original network. Our approach is illustrated and validated for a single driven particle.


Introduction
The reduction of degrees of freedom is arguably the most crucial step in computational sciences.Numerical models can be formulated at several levels of detail: from ab initio including the electronic degrees of freedom, to classical force fields, to effective coarse-grained models, up to continuum models on the macroscopic scale.Limited computational capacities then imply that these levels correspond to increasing length and time scales that can be accessed at the expense of decreasing molecular detail.Indeed, for many applications details of the molecular interactions are irrelevant and models can be devised based on symmetry considerations and conservation laws alone, with the Navier-Stokes equations being a prominent example for a continuum model of fluid dynamics.
On intermediate levels, systematic approaches to structural coarse-graining have been developed [1,2] such as iterative Boltzmann inversion, which determines pair potentials for a reduced set of degrees of freedom from (partial) radial distribution functions obtained either from experiments or atomistic simulations.These relevant degrees of freedom have to be provided by the user and are typically informed by physical or chemical insights.One side effect of such a procedure is the loss of dynamical information, which is understood qualitatively (removed degrees of freedom act as a "bath" leading to stochastic dynamics [3]) but still hard to control.
Quite a different approach is followed through the construction of Markov state models [4][5][6], which aim to bridge the long time scales involved in, e.g., the folding of proteins from an initially disordered coil to the native state.These models are built by clustering microstates into a few mesoscopic states that are kinetically distinct, i.e., they correspond to basins that are separated by free energy barriers with a time-scale separation between fast intra-basin transitions and slow inter-basin transitions.In the case of a complete separation this can be exploited to endow the mesoscopic states with a Markovian dynamics for the slow transitions assuming quasi-equilibrium for the fast transitions.
One should note that a rigorous approach to dynamically consistent coarsegraining has been around for quite some time using projection techniques [7,8].By projecting the dynamics onto the subspace spanned by the reduced variables, a non-Markovian generalized Langevin equation for the time evolution in this subspace is obtained, which is formally exact but now takes on the form of an integro-differential equation.The complexity of the original problem is, therefore, preserved as ones trades the reduction of the degrees of freedom for a memory kernel.Further approximations are needed to obtain a numerically tractable problem.In the case of a clear separation of time scales one can employ the Markovian approximation [9].But even then the resulting equations are highly non-trivial both from a physical and mathematical point of view.Within this framework attempts have been made to derive dynamical density functional theory [10] and its extension to non-equilibrium situations [11,12].
Indeed, another layer of complexity is added when going away from thermal equilibrium to driven systems.
Stochastic thermodynamics has emerged as a comprehensive theoretical framework in particular for driven systems that are still in contact with a heat reservoir that itself remains in equilibrium [13].Here two classes of non-equilibrium situations can be distinguished: time-dependent processes with dynamics that still obey detailed balance and non-equilibrium steady states with dynamics that explicitly break detailed balance through non-conservative forces and/or flows [14].Fluctuating path functionals like work, heat, and entropy production obey certain symmetries called fluctuation theorems [13], which have been studied for (chemical) networks of discrete states [15][16][17].Hidden (unobservable) degrees of freedom modify the fluctuation theorems in various ways [18,19], and the consequences of coarse-graining have been discussed for removing fast states [20], "bridge" states [21], and the clustering of microstates [22,23].
So far, only relatively simple models have been studied appealing to stochastic thermodynamics including a network model of kinesin with six states [21,24] and a model for motor proteins [25,26].Building on these works, the purpose of the present paper is to extend the Markov state model approach to driven many-body systems and to construct coarse-grained models that are consistent with stochastic thermodynamics.The paper is organized as follows: in section 3 we briefly revisit the prerequisites, in particular how to obtain a network of discrete states from atomistic simulation data and introduce the relevant notions from stochastic thermodynamics.In particular cycles and the decomposition of networks into cycles [24,[27][28][29][30] will play a crucial role.The actual algorithm consists of two parts: the identification and clustering of cycles discussed in section 4 followed by the actual coarse-graining in cycle space described in section 5. Finally, we provide some critical remarks before concluding.

Motivation and basic idea
In order to maintain a non-equilibrium steady state, work has to be supplied constantly, which is dissipated as heat q into the environment.This non-vanishing entropy production allows for the transport of some quantity X, which might be the distance travelled by a particle, the number of molecules produced in a chemical reaction, etc.Although the mapping to Markovian dynamics proposed in this paper is not rigorous, there is evidence that the preservation of entropy production also implies the preservation of the major dynamical properties as well as macroscopic transport.Indeed, very recently Barato and Seifert have conjectured that the dissipation of a process leading to a squared relative uncertainty (δX) 2 ≡ (X − X ) 2 / X 2 is at least where t is the duration of the process [31].We employ dimensionless quantities throughout.In particular, entropies are measured in units of Boltzmann's constant k B and energies in units of k B T , where T is the temperature of the surrounding heat bath.The first step is to construct a discrete Markov state model and to determine the transition rates from particle-resolved simulation data.Although already this step is non-trivial it is not the focus of the present manuscript, in which we study a sufficiently simple model to employ a straightforward construction.For our purposes it is important to include many discrete states to approximate the entropy production of the particle-resolved model as closely as possible.Entropy production is related to cycles in the network of these discrete states.It seems quite clear that removing states will lead to "cutting open" some of these cycles and thus to a reduced entropy production [20].Our approach to this challenge is an additional step before the actual coarse-graining, which consists of identifying communities of cycles with similiar properties.The crucial step is then to identify a representative for each community, which will receive the entire entropy production of its community.All other cycles are then removed, and with them all states that are not visited by one of the representatives.This procedure preserves the entropy production rate, and Eq.(1) therefore implies that fluctuations of (measurable) transported quantities obey the same lower bound in the original and the coarse-grained system.

Model system: driven particle in a double well potential
We will illustrate our approach to coarse-graining with a specific example: a Brownian particle in a two-dimensional symmetric potential driven by a non-conservative force F nc .Working in two dimensions allows to directly visualize the configurations space as well as cycles, although the discussed algorithm is more generally applicable to many-body systems.
The equation of motion follows as where η(t) is a random force with correlations η(t)η(t ) = 2δ(t − t ) and is the potential energy.As non-conservative force we choose F nc (x) = ω (−y, x) T , where ω denotes the driving strength.Figure 1 shows the contour lines of the potential and an exemplary trajectory.Due to F nc , the particle trajectory does not obey the symmetry of the conservative potential anymore.

Building Markov state models, in a nutshell
We start by giving a short overview of how to create a Markov state model from, in general, molecular dynamics simulations, while here we employ Brownian dynamics (BD) propagating eq. ( 2).A detailed review can be found in [32].The first step in building a Markov state model is to approximate the continuous state space (only the positions are taken into account) by a set of discrete states that we will refer to as the microstates.The spatial discretization is typically obtained by cluster analyses, e.g.via the k-means algorithm.After a proper spatial discretization is found, the continuous d-dimensional state space is discretized into k partitions.Each of the k partitions is represented by its center, called centroid, allowing the original data points to be mapped onto the centroids by minimizing their Euclidean distance.
Once the full state space is discretized, the BD trajectory can be projected onto its centroids, storing its dynamical information as a simple sequence of centroid indices.The dynamical information can be extracted from this sequence of centroids by counting the number of transitions C i→j .Whenever centroid i is followed by j we increase C i→j by one.Finally, the transition matrix T is approximated by which is also the maximum probability estimator for the true transition operator [32].
If ∆t is chosen small enough, we can further expand with rate matrix W .It is important to note that for driven systems neither W nor T are symmetric in general.Hence, their eigenvalues are complex, making the identification of metastable sets difficult.The reason is that the real part of the spectrum does not necessarily offer a gap, an illustrative example for which can be found in ref. 33.If all transition rates w i j = W i→j are known ‡, the time evolution of the state probabilities is given by the master equation with normalization Here Φ i j ≡ w i j p i are the directional probability fluxes indicating the amount of probability flowing from state i to state j per unit time.The subsequent description requires that the graph spanned by the Markov model is (i) ergodic, i.e., every state can be reached by every other state in finite time, and (ii) the transition rates are reversible, i.e., if w i j > 0 then also w j i > 0. Hence, there is one edge for the forward and one edge for the backward transition between any two states that are connected through non-zero transition rates.
If the system is in a steady state the probability distribution, and hence probability fluxes, are time independent and we can drop the time argument, p i (t) = p i .The left hand side of equation ( 6) becomes zero, In the following we assume that the system resides in a (non-equilibrium) steady state.
A special steady state is identified as thermodynamic equilibrium.Here each summand of the right hand side of eq. ( 8) vanishes individually, i.e.Φ i,eq j − Φ j,eq i = 0, while the condition w i j p eq i = w j i p eq j is called detailed-balance stating that the amount of transported probability per unit time in the transition i → j equals the amount of the reverse transition j → i.If detailed-balance is broken a net probability current flows from i → j and eq.( 8) can be identified as Kirchhoff's current law stating that the same amount of probability flowing into state i is also flowing out.

Entropy production Following refs. 27,34, the total average entropy production rate Ṡtot reads
The first term is identified as the time derivative of the Gibbs-entropy S sys ≡ − i p i ln p i and thus describes the entropy change of the system itself.The second term represents the coupling of the system with its environment (medium) in such a way that the system cannot reach equilibrium.The coupling can be, for example, caused by a heat or particle flux flowing from the medium into the system.Only the sum of both entropy production rates is larger than zero in accordance with the second law.If the system is in a non-equilibrium steady state (NESS), the first term vanishes and consequently Ṡtot = Ṡmed balance each other.
A second important quantity related to entropy production are affinities, or generalized thermodynamic forces [27].The edge affinity between two states is defined as which can, again, be split into two parts.The first part σ i j is identified as the entropy produced in the medium [34] for every transition i → j.Interestingly, neither the affinities nor the dissipated heat depend on the time scales governing the dynamics of the system as in both expressions only the ratio of rates are taken into account.
Finally, the central concept in our approach to coarse-graining is cycles.A cycle is an ordered set of states (vertices), at the end of which the starting state is reached again.Cycles that differ only in their cyclic permutation of vertices are considered identical.For example, {1, 2, 3, 1} = {2, 3, 1, 2} = {3, 1, 2, 3} all denote the same cycle but {2, 1, 3, 2} is a different cycle.We further distinguish between trivial cycles (cycles with only two different states, i.e., {i, j, i}), and non-trivial cycles that contain at least three different states.
Two types of observables § can be distinguished for each cycle: (i) Current-like observables that are summed along the edges corresponding to each cycle.Consider an observable defined on the edges of a graph, that is given by a matrix X ∈ R |V |×|V | .The cycle observable is computed as The notation ij∈α denotes the summation over all directed edges i → j that are part of cycle α.(ii) State-like observables that are summed over the states forming a cycle.Here the observables are defined on the graph vertices given by a vector Y ∈ R |V | , and thus § Throughout, greek indices denote cycles.
An example for the latter is given by the average cycle period τ α = i∈α τ i (the average time that is spend in cycle α) with τ i = j w i j −1 being the average time spend in state i.

Cycle affinities
The most prominent example for a current-like observable is the cycle affinity It has two important properties: (i) The cycle affinity of trivial cycles is always zero since affinities are anti-symmetric, A i j = −A j i , and thus A {i,j,i} = A i j + A j i = A i j − A i j = 0. (ii) All cycle affinities are independent of the state probabilities and thus can be expressed by σ i j 's only.To prove (ii), Since the p i 's are state functions, going along a cycle they cancel pairwise.A remarkable property of a NESS is thus that the average entropy production can be calculated either by using the A i j 's or σ i j 's, which correspond to S tot and S med , respectively.

Cycle-flux decomposition
For convenience, we define the indicator and passage function depending on whether a vertex i or directed edge i → j is part of a cycle α, respectively.Following the ideas of refs.24, 35, 36, the cycle-flux decomposition expresses the flux field Φ through a linear combination of cycles, The cycle weight ϕ α quantifies how much probability flows through cycle α per unit time.As shown in refs.35, 36, ϕ α ≥ 0 is non-negative and the decomposition eq. ( 17) always exists if Φ satisfies the steady state condition (Kirchhoff's current law).However, this decomposition is not unique as discussed in the next section 4.2.
Using the cycle observables eq. ( 11) and inserting the cycle decomposition eq. ( 17), the average of a current-like observable can be expressed as a cycle average with weights ϕ α [24].In particular, the cycle average of affinities equals the total average entropy production with the definition of edge affinities eq. ( 10).Furthermore, we define the conditional average entropy production of a single cycle s α ≡ ϕ α A α so that the total average entropy production becomes the sum Ṡtot = α s α .

Algorithm
Several types of algorithms have been proposed in the literature to accomplish the decomposition eq. ( 17) [27,35,37].For example the "method of derived chain" introduced in ref. 37, which is stochastic in nature and has the advantage that the cycle weights ϕ α in eq. ( 17) are unique and that they correspond to the mean number of passages through cycle α.However, negative cycle entropies (s α < 0) may occur, which greatly complicates the identification of communities (as will become clear shortly).Moreover, the number of cycles used in the decomposition can be orders of magnitude larger than for the other approaches discussed below.Another important type of cycle decompositions, first mentioned by Schnakenberg [27], considers fundamental cycles that span a basis of the cycle space.Although the number of contributing cycles is as small as for the described algorithm below, the fundamental cycles are not unique and can have negative cycle entropies as well.
We employ a variant of the "cycle-flux" decomposition introduced by MacQueen [35].This is a deterministic algorithm that has a polynomial complexity in the number of vertices |V |, making it computationally affordable even for large graphs.Again, the decomposition (and thus the cycle weights ϕ α ) is not unique but rather depends on the initial cycle sequencing, where already a minor variation in the sequencing can lead to different cycle weights.In particular, some cycle weights become zero while others become non-zero.An example illustrating this arbitrariness is given in ref. 24, where the cycle-flux decomposition is applied to the totally asymmetric simple exclusion process (TASEP) model.
Let C = {α} denote the set of all possible cycles.Only the subset C ⊂ C of cycles with non-vanishing cycle weights contribute to averages, an upper bound |C | ≤ B of which is given by the Betti number B ≡ |E| − |V | + 1 [36].The problem is that one cannot know beforehand what the contributing cycles are, meaning that theoretically all possible cycles are needed to ensure a complete decomposition, although most of them will have zero weights especially for graphs with many vertices.Already the number of possible cycles of an undirected graph is bounded by B ≤ |C| ≤ 2 B [38].
We now describe the algorithm in more detail.To this end, we sort cycles,   22) of cycle entropy production of our model system.The first red line marks the number of cycles to recover 95% and the second to recover 99% of the total entropy production.
where first come the trivial cycles followed by the non-trivial cycles containing three or more states.The number of trivial cycles is given by half the number of edges, |E|/2.Initialize Φ ← Φ and take the first cycle α from C : (i) Take all fluxes Φ along cycle α and find the smallest value, which becomes the cycle weight ϕ α , (ii) Subtract ϕ α from all fluxes along α, Φi j → Φi j − ϕ α χ i j,α .The new flux field has at least one edge less.
(iii) Advance to the next cycle α in C and repeat.
(iv) The algorithm stops when the residuum || Φ|| max has become smaller than some threshold.
After the first |E|/2 iterations, all trivial cycles are subtracted and hence the flux field Φ at this stage contains only irreversible edges, i.e., if Φi j > 0 then Φj i = 0. Hence, the fluxes have become the (positive) currents of the original graph, Φ i j − Φ j i → Φi j , where Φi j > 0 since the algorithm subtracts the smaller value.The original affinities A i j along the remaining edges are thus positive, cf.eq. ( 10), and hence all cycle affinities for the remaining non-trivial cycles are positive.An illustrative example for a concrete graph is presented in appendix 7.1.Note that this implies s α > 0 for all non-trivial cycles.
In appendix 7.2 an algorithm is outlined how to obtain all contributing cycles C .Removing the trivial cycles, we can thus further reduce the set of cycles C ⊂ C to be considered in the following with |C| ≤ B − |E|/2.

Absence of dominant cycles
Markov state models that are constructed from molecular dynamics simulations contain thousands, or even millions, of cycles.This is in stark contrast to the semi-empirical models discussed so far containing a handful of -already coarse-grained -states [21,25].At this stage it is important to realize that there are no dominant cycles in terms of the entropy production.This is illustrated in figure 2  of the first n cycles for sorted cycle entropies s 0 ≥ s 1 ≥ . . .Already for our simple model system a large number of cycles contribute and no dominant cycles appear.To recover 95% of the total entropy production, almost 2000 cycles have to be included, although many of the non-trivial cycles do not contribute substantially to the overall entropy production.

Linking cycles
Our approach in the following is based on the idea that many cycles are similar in their length, traversed states (region in state phase), and cycle affinities.We will propose how to quantify this "similarity" of cycles and how to group these cycles together, forming cycle communities.
The cycles in C can again be interpreted as vertices of a graph G = (C, Ω).It is important to realize that we have a considerable freedom in defining edge weights (analogous to the original transition rates w i j ).Here we opt to link cycles α and β using which is symmetric as emphasized by the subscripts.Two cycles are linked only if they share at least one vertex on the original graph, while the connectivity strength 0 ≤ Ω αβ ≤ 1 depends on the cumulated probability of the shared states.Other proposals for linking cycles can be found in refs.24, 33.In practice, we found that the resulting graph is connected too strongly and that better results are achieved if Ω is modified using additional information.We thus cut a link if (i) two cycles have dissimilar cycle affinities and (ii) if their spatial extensions differ too much.For the later, the spatial extension of a cycle is defined as the maximal Euclidian distance of centroid positions where x i is the state-space vector representing centroid i.Hence, with ξ a , ξ d ∈ (0, 1).Specifically, for our model system we have set ξ a = ξ d = 1 2 .The resulting cycle graph is shown in figure 3.

Cycle communities
Having defined Ω to be symmetric and thus undirected, we can make use of well-developed community finding algorithms for undirected graphs.Formally, the communities are disjoint sets C l of the non-trivial cycles, A community is a group of vertices that share similar properties and are consequently stronger connected (higher link density) to vertices inside their community than with vertices from other communities.The number of communities is inherent to its graph and not preassigned.A review of the most important algorithms can be found in ref. 39.In general, a measure of how good community finding algorithms perform is the modularity function Q, which measures the link density inside communities as compared to other communities by assigning it a value between -1 and 1 [40].It is defined as with k β ≡ α Ω αβ and m ≡ 1 2 β k β .Specifically, we use a community finding algorithm [41] that has been shown to be fast and achieves good results also for a large number of cycles (vertices in the cycle space).Here we are interested in finding communities among all non-trivial cycles that were found by applying the cycle-flux decomposition algorithm described in the previous section 4.2.The communities found for our model system are displayed in figure 3 in cycle space and in figure 4 for the original state space.Three communities have been detected: The red and green colored communities are in agreement with the loci of highest probability, cf.figure 1, while the blue community contains cycles connecting these two loci.This result agrees well with the intuitive picture of entropy production due to the interplay between conservative and non-conservative forces for the particle trapped in a basin, with rare transitions between both basins.

Cycle representatives
We have now identified communities of cycles with similiar properties determined through the link strength Ω αβ .However, it is not possible to compute something like an average cycle because the cycle space does not have any physical metric.Furthermore, we are restricted to the original states and transition rates (or rather σ i j 's) as they have a physical meaning, i.e., position in state space and medium entropy production per transition, respectively.To overcome this problem, we determine one cycle out of each cycle community, which we will refer to as the (community) representative.This cycle is then rescaled in order to preserve the entropy production of the whole community, and the other cycles are discarded.
To this end, we rewrite the modularity function eq. ( 27) as the sum over all community modularities where l denotes the l-th community.Our task is to find the cycle γ for each cycle community l that maximizes Q l − Q l\γ , with Q l\γ being the modularity of the l-th community without cycle γ.In other words, we want to identify the cycle γ for each community that has the biggest impact on the community modularity.In particular, Q l\γ increases if cycle γ matches poorly and decreases when it provides many links to other cycles inside its community.After some algebra, the difference is given by For our model system, the three representatives found from maximizing the modularity difference are illustrated in figure 5.

Rescaling
After determining the representatives, the original fluxes Φ i j and transition rates w i j have to be rescaled.The physical constraints are: (i) The total entropy production Ṡtot is preserved.
(ii) The σ i j 's of all non-vanishing edges are preserved.(iii) The cycle affinities A α of all contributing cycles are preserved.
It is easy to check that (iii) is valid if (ii) is fulfilled, cf.eq. ( 14).For every cycle community, we define a community entropy production Because all non-trivial, and thus possibly entropy producing, cycles are partitioned into communities, the total entropy production of the original graph is the sum of all community entropy productions.In what follows, all rescaled quantities are labeled by hats, e.g., ϕ α → φα .Moreover, the subscript l to cycle quantities denotes the cycle representative of the l-th community, e.g., s l is the cycle entropy production.We now identify the new entropy production of each representative cycle ŝl with the entropy production of its community S l .By making use of (iii), we compute the new cycle weight φl as The crucial coarse-graining step consists of removing all other non-trivial cycles by setting their weights φα to zero.Because the entropy production of trivial cycles is always zero, Ṡtot is preserved and (i) fulfilled.All states that are not part of a community representative are thus removed with the remaining states V constituting the coarse-grained Markov state model, see figure 5.
All cycles in the coarse-grained model are either a representative or trivial.We now show that the weights of the remaining trivial cycles are always positive and the coarse-grained model thus still constitutes a valid cycle-flux decomposition for the rescaled fluxes Φ, cf.eq. ( 17).This can be achieved by demanding that all non-vanishing edge affinities A i j are preserved.By virtue of eq. ( 17), the new cycle weights φα can then be computed from which can be rearranged to We now pick one edge i → j for which the edge affinity A i j > 0 is positive.We then split the sum over all cycles into a sum over trivial and non-trivial cycles, = φβ (exp For every edge there is exactly one trivial cycle, here denoted β, for which χ i j,β = χ j i,β = 1.As explained in section 4.2, for positive affinity all non-trivial cycles are oriented to follow the net flow, which implies χ j i,α = 0.The remaining sum over all non-trivial cycles participating in edge i → j reduces to the weight of the representative cycles since we have set the weight of all other non-trivial cycles to zero.We have thus determined the remaining weights ϕ β > 0 of the trivial cycles, which clearly are positive.
The final step is to obtain new probabilities pi for the remaining states.We simply scale out the probability of the removed states implying p i /p j = pi /p j with normalization i∈V pi = 1.Moreover, this is sufficient to preserve the ratio of transition rates fulfilling condition (ii).Together with the rescaled fluxes we thus obtain transition rates ŵi j = Φi j /p i , which completes the formulation of the coarse-grained Markov state model.

Numerical test
To test the accuracy of the coarse-grained model, we compute the forward and backward rate of traversing between both basins for the continuous BD trajectory and our reduced Markov state model.The mesoscopic rates for the reduced Markov state model have been calculated using a kinetic Monte-Carlo scheme.Figure 6  the normalized histograms and the exponential fit (dotted blue line) obtained for the coarse-grained model as well as the exponential fit obtained from the original BD simulations (red line).Due to the symmetry of the forces, these rates should be equal, which is recovered by the coarse-grained model.Moreover, the numerical values of both rates are in excellent agreement with the BD results (see table 1), illustrating that our coarse-graining method indeed preserves the mesoscopic time scales of the BD simulation.

Critical remarks and conclusion
There are at least two steps within the approach described here that will require further clarification and investigation.The first issue is how to find optimal cycle communities.Detecting communities on a graph is, in principle, already a challenge on its own as the vast number of heuristic approaches have shown in the past.Also, it is still a matter of discussion whether the detected communities ought to fully or only partially partition the graph.The latter is also known as fuzzy partitioning, the advantage of which is that not all cycles are assigned to a specific community as some might be matching only poorly with any of the communities.To further improve the community detection it is possible to define the link strengths Ω αβ given in eq. ( 25) somewhat differently by assigning directions to the cycle edges.In any case, once obtained the cycle communities should be checked for consistency.Of course, it is not possible to visualize the cycles by plotting them in configuration space for more than two dimensions.One way would be to project the cycles onto suitable reaction coordinates.
The second issue is how to find suitable community representatives.Here we have formulated an optimization problem with an objective function, eq. ( 29), but it might be worthwhile to also explore other strategies.It seems to be of particular importance that the cycle affinities of the representatives are, in some way, representing the averaged community affinity and their internal time scales since they will govern the mesoscopic time scales of the coarse-grained model.Especially the latter is part of our ongoing investigation.
To conclude, we have presented a simple and scalable algorithm to construct a Markov state model in non-equilibrium that preserves entropy production and cycle affinities and, therefore, the characteristics of macroscopic non-equilibrium transport.

Contributing cycles
In this appendix, we explain how to effectively obtain the cycles that are used for the cycle-flux decomposition without first finding all possible cycles.We make use of a widely-used method in graph theory [27].For every connected graph G(V, E) a spanning tree T (G) can be constructed, which is the set V of all vertices connected by |V | − 1 edges.We further demand that the edge directions in T (G) are the same as in G.The ν edges of G that are not part of T (G) are called chords, with ν = |E|−|V |+1.Again, it is important to preserve the original direction of each chord.
After subtracting all trivial cycles, the new flux field Φ is obtained, which contains only irreversible edges.For this graph the spanning tree T ( Φ) is constructed.Adding one chord completes one non-trivial cycle.We determine all non-trivial cycles corresponding to the chords and again apply the cycle-flux decomposition as outlined in section 4.2, leading to a new flux field Φ.If fluxes remain, we repeat this procedure by which iteratively new cycles are added.
The complete algorithm reads as follows: (i) Apply cycle-flux decomposition for all trivial cycles.
(iii) Identify directed chords and find corresponding cycles.
(iv) Apply cycle-flux decomposition to the newly identified cycles.

Figure 1 .
Figure 1.Contour lines of the symmetric two-dimensional potential eq.(3) and exemplary trajectory in the presence of the non-conservative force (the arrows indicate the direction of the driving).The system is still invariant under inversion with the loci of highest probability being shifted from the minima of the potential.

3. 3 . Stochastic thermodynamics 3 . 3 . 1 .
Steady state Markov models The discrete states and their corresponding transition rates can be represented by a graph G(V, E), where vertices V represent the states and edges E the transitions connecting the states.The number of vertices in G is denoted by |V |, while the number of edges is |E|.

Figure 2 .
Figure 2. Normalized cumulative sum eq.(22) of cycle entropy production of our model system.The first red line marks the number of cycles to recover 95% and the second to recover 99% of the total entropy production.
for our specific model system.We have chosen |V | = 200 (this is an input to the k-means clustering algorithm) and found |E| = 10612 edges implying B = 10413 with non-trivial cycles |C| ≤ 5107.Shown is the normalized cumulative sum sn ≡ α≤n s α α s α (22)

Figure 3 .
Figure 3. Cycle graph constructed from our model system.Each vertex represents a non-trivial cycle.Colors indicate cycles belonging to the same cycle community.The vertices are arranged utilizing a force-directed graph drawing algorithm.

Figure 4 .
Figure 4. Cycle communities in state space, where edges are colored according to the community of their cycles, cf.figure3.Three communities with red, green, and blue have been identified by the community finding algorithm.

Figure 5 .
Figure 5. Illustration of the cycle representatives for each community of our model system.The centroids belonging to a representative are colored according to the communities in figure 4, whereas the cyan states highlight the crossings of representatives.The gray points indicate the centroids not belonging to a representative, which are thus absent in the coarse-grained model.

Figure 6 .
Figure 6.Rate computation for the transition between both minima.The normalized histograms show the distributions of the times needed for the specific transition, computed for the reduced Markov state model.The blue dotted line shows the exponential fit to obtain the mesoscopic transition rate, while the red line is the fit using the histogram computed by the BD trajectory.

Figure 7 .
Figure 7. Example for the cycle-flux decomposition.The numbers next to the arrows are the numerical values for the fluxes Φ along edges: (a) Initial graph with Φ = Φ, (b) after removing the trivial cycles, and (c) last remaining cycle.

Table 1 .
Mesoscopic transition rates for the model system for the original Brownian dynamics (BD) and the coarse-grained (CG) model.

Table 2 .
Cycles and cycle weights for the graph shown in figure7.