
Minimal entropy production rate of interacting systems

Published 13 November 2020 © 2020 The Author(s). Published by IOP Publishing Ltd on behalf of the Institute of Physics and Deutsche Physikalische Gesellschaft
Citation: David H Wolpert 2020 New J. Phys. 22 113013. DOI: 10.1088/1367-2630/abc5c6


Abstract

Many systems are composed of multiple, interacting subsystems, where the dynamics of each subsystem only depends on the states of a subset of the other subsystems, rather than on all of them. I analyze how such constraints on the dependencies of each subsystem's dynamics affect the thermodynamics of the overall, composite system. Specifically, I derive a strictly nonzero lower bound on the minimal achievable entropy production rate of the overall system in terms of these constraints. The bound is based on constructing counterfactual rate matrices, in which some subsystems are held fixed while the others are allowed to evolve. This bound is related to the 'learning rate' of stationary bipartite systems, and more generally to the 'information flow' in bipartite systems. It can be viewed as a strengthened form of the second law, applicable whenever there are constraints on which subsystem within an overall system can directly affect which other subsystem.


1. Introduction

Many systems are naturally modeled as composite systems, with two or more interacting subsystems. For example, a biological cell is naturally modeled as a composite system, composed of many separate organelles and biomolecule species. As another example, digital devices are naturally modeled as a set of separate, interacting logical gates. Recent research in stochastic thermodynamics [7, 26, 32, 38] has started to investigate such composite systems [1, 9, 11, 13, 20–22]. Most of this research has considered the special case of bipartite processes [1, 2, 9, 11, 13, 20–22, 27, 28], i.e., systems composed of two co-evolving subsystems, whose states fluctuate according to independent noise processes (e.g., since they are physically separated and so are connected to different parts of any shared thermodynamic reservoirs). However, given that many systems have more than just two interacting subsystems, research is starting to extend to fully multipartite processes [10, 13, 40].

The definition of any composite system specifies which subsystems directly affect the dynamics of which other subsystems. It is now known that just by itself, such a specification of which subsystem affects which other one can cause a strictly positive lower bound on the entropy production (EP) rate of the overall composite system [4, 36, 38]. In contrast, if all subsystems were allowed to interact with all other subsystems, the minimal EP would be zero. Accordingly, this minimal EP due to constraints on which subsystem can affect the dynamics of which other subsystems has sometimes been called 'Landauer loss' [36–38]. Landauer loss can be viewed as a strengthened form of the second law, applicable whenever there are constraints on which subsystem within an overall system can directly affect the dynamics of which other subsystem.

Previous analyses of this strengthened second law focused on scenarios where at every moment, each subsystem evolves in isolation, in a 'modular' fashion, without any direct coupling to the other subsystems. This is a severe limitation of those analyses. As an illustration of a simple scenario not covered by such analyses, consider a composite system with three subsystems A, B and C. B evolves independently of A and C. However, B is continually observed by C as well as A. Moreover, suppose that A is really two subsystems, 1 and 2. Only subsystem 2 directly observes B, whereas subsystem 1 observes subsystem 2, e.g., to record a running average of the values of subsystem 2 (see figure 1).


Figure 1. Four subsystems, {1, 2, 3, 4} interacting in a multipartite process. The red arrows indicate dependencies in the associated four rate matrices. B evolves autonomously, but is continually observed by A and C. (The implicit assumption that B is not affected by the back-action of the measurement holds for many real systems such as colloidal particles and macromolecules [24].) So the statistical coupling between A and C could grow with time, even though their rate matrices do not involve one another. The three overlapping sets indicated at the bottom of the figure specify the three units of a unit structure for this process.


Physically, such a scenario arises whenever any of the many stochastic thermodynamics models of one classical system observing another classical system without any back-action [12, 20, 23, 27, 33, 34] are 'chained together'. As an example, [3, 9] consider a tripartite system where receptors in the wall of a cell observe the concentration level of a ligand in a surrounding medium, with no back-action on that concentration level, while a memory observes the state of those receptors, again with no back-action. This is exactly the scenario considered in figure 1, just without subsystem 4; subsystem 3 is the concentration level in the medium, subsystem 2 is the set of receptors in a cell observing that concentration level, and subsystem 1 is the memory within the cell observing the state of the ligand receptors. To extend this scenario to the precise scenario presented in figure 1, we just need to introduce a second cell observing the same medium as the first cell; subsystem 4 is the state of the receptors of that second cell.

To investigate the second law in such composite systems, here I model them as multipartite processes, in which each subsystem evolves according to its own rate matrix [10]. So restrictions on the direct coupling of any subsystem i to the other subsystems are modeled as restrictions on the rate matrix of subsystem i, to only involve a limited set of other subsystems, called the 'unit' of i.

In this paper I derive a lower bound on the EP of composite systems, by deriving an exact equation for that minimal EP rate as a sum of non-negative expressions. One of those expressions is related to quantities that were earlier considered in the literature. It reduces to what has been called the 'learning rate' in the special case of stationary bipartite systems [1, 5, 9]. That expression is also related to what (in a different context) has been called the 'information flow' between a pair of subsystems [10, 11].

2. Rate matrix units

I write $\mathcal{N}$ for a particular set of N subsystems, with finite state spaces $\left\{{X}_{i}:i=1,\dots ,N\right\}$. x and x' both indicate a vector in X, the joint space of $\mathcal{N}$. For any $A\subset \mathcal{N}$, I write $-A{:=}\mathcal{N}{\backslash}A$. So for example ${x}_{-A}$ is the vector of all components of x other than those in A. A distribution over a set of values x at time t is written as ${p}^{X}\left(t\right)$, with its value for $x\in X$ written as ${p}_{x}^{X}\left(t\right)$, or just ${p}_{x}\left(t\right)$ for short. Similarly, ${p}_{x,y}^{X\vert Y}\left(t\right)$ is the conditional distribution of X given Y at time t, evaluated for the event X = x, Y = y (which I sometimes shorten to ${p}_{x\vert y}\left(t\right)$). I write Shannon entropy as $S\left({p}^{X}\left(t\right)\right)$, ${S}_{t}\left(X\right)$, or ${S}^{X}\left(t\right)$, as convenient. I also write the conditional entropy of X given Y at t as ${S}^{X\vert Y}\left(t\right)$. I write the Kronecker delta as both $\delta \left(a,b\right)$ and ${\delta }_{b}^{a}$.

Since the joint system evolves as a multipartite process, there is a set of time-varying stochastic rate matrices, $\left\{{K}_{x}^{{x}^{\prime }}\left(i;t\right):i=1,\dots ,N\right\}$, where x' refers to the current state of the system, while x refers to the next state; for all i, ${K}_{x}^{{x}^{\prime }}\left(i;t\right)=0$ if ${x}_{-i}^{\prime }\ne {x}_{-i}$; and the joint dynamics over X is governed by the master equation

$\frac{\mathrm{d}{p}_{x}\left(t\right)}{\mathrm{d}t}={\sum }_{{x}^{\prime }}{K}_{x}^{{x}^{\prime }}\left(t\right){p}_{{x}^{\prime }}\left(t\right)$     (1)

${K}_{x}^{{x}^{\prime }}\left(t\right){:=}{\sum }_{i=1}^{N}{K}_{x}^{{x}^{\prime }}\left(i;t\right)$     (2)

Note that each subsystem can be driven by its own external work reservoir, according to a time-varying protocol. For any $A\subseteq \mathcal{N}$ I define

${K}_{x}^{{x}^{\prime }}\left(A;t\right){:=}{\sum }_{i\in A}{K}_{x}^{{x}^{\prime }}\left(i;t\right)$     (3)
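To make these definitions concrete, the following is a minimal sketch (my own illustrative conventions, not code from the paper) of equations (1)-(3) for two binary subsystems. It assumes the column convention $\mathrm{d}p/\mathrm{d}t=Kp$, so that K[x, x'] is the rate for jumping from joint state x' to x, with diagonal entries fixed by normalization.

```python
import itertools
import numpy as np

states = list(itertools.product([0, 1], repeat=2))  # joint states x = (x_1, x_2)
idx = {x: n for n, x in enumerate(states)}

def subsystem_generator(i, flip_rate):
    """K(i; t) for a subsystem i that flips at a constant rate; the
    multipartite constraint K_x^{x'}(i; t) = 0 unless x'_{-i} = x_{-i}
    holds by construction."""
    K = np.zeros((len(states), len(states)))
    for xp in states:
        x = list(xp)
        x[i] ^= 1                                    # change only coordinate i
        K[idx[tuple(x)], idx[xp]] = flip_rate
    return K - np.diag(K.sum(axis=0))                # columns sum to zero

K1, K2 = subsystem_generator(0, 1.0), subsystem_generator(1, 2.0)
K = K1 + K2    # equation (2): the joint generator is the sum over subsystems
K_A = K1       # equation (3), windowed to A = {1}
assert np.allclose(K.sum(axis=0), 0.0)

# the multipartite constraint: K(1; t) never changes subsystem 2's state
for x, xp in itertools.product(states, states):
    if x != xp and x[1] != xp[1]:
        assert K1[idx[x], idx[xp]] == 0.0
```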

Each subsystem i's marginal distribution evolves as

Equation (4)

Equation (5)

due to the multipartite nature of the process [15]. Equation (5) shows that in general the marginal distribution ${p}_{{x}_{i}}$ will not evolve according to a continuous-time Markov chain (CTMC) over ${{\Delta}}_{{X}_{i}}$.

For each subsystem i, I write r(i; t) for any set of subsystems at time t that includes i and for which we can write

${K}_{x}^{{x}^{\prime }}\left(i;t\right)={K}_{{x}_{r\left(i;t\right)}}^{{x}_{r\left(i;t\right)}^{\prime }}\left(i;t\right)\delta \left({x}_{-r\left(i;t\right)}^{\prime },{x}_{-r\left(i;t\right)}\right)$     (6)

for an appropriate set of functions ${K}_{{x}_{r\left(i;t\right)}}^{{x}_{r\left(i;t\right)}^{\prime }}\left(i;t\right)$. In general, r(i; t) is not uniquely defined, since I make no requirement that it be minimal. (A minimal such r(i; t) is called a 'neighborhood' in [10].) I refer to the elements of r(i; t) as the leaders of i at time t. Note that the leader relation need not be symmetric. A unit ω at time t is a set of subsystems such that iω implies that r(i; t) ⊆ ω.

Any intersection of two units is a unit, as is any union of two units. Intuitively, a unit is any set of subsystems whose evolution is independent of the states of the subsystems outside the unit (although in general, the evolution of those external subsystems may depend on the states of subsystems in the unit). A specific set of units that covers $\mathcal{N}$ and is closed under intersections is a unit structure. Unless explicitly stated otherwise, any unit structure being discussed does not have $\mathcal{N}$ itself as a member.

As an example of these definitions, [1, 8, 9] investigate a special type of bipartite system, where the 'internal' subsystem B observes the 'external' subsystem A, but cannot affect the dynamics of that external subsystem. So A is its own unit, evolving independently of B, while B is not its own unit; its dynamics depends on the state of A as well as its own state. Another example of these definitions is illustrated in figure 1.
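These definitions are easy to experiment with numerically. Below is a toy check (my own, not from the paper) using leader sets chosen to match the dependencies described for figure 1: subsystem 3 is autonomous, 2 observes 3, 1 observes 2, and 4 observes 3. It confirms which subsets are units, and that units are closed under union and intersection.

```python
import itertools

# hypothetical leader sets r(i) for the figure 1 scenario
r = {1: {1, 2}, 2: {2, 3}, 3: {3}, 4: {3, 4}}

def is_unit(omega):
    """omega is a unit iff r(i) is contained in omega for every i in omega."""
    return all(r[i] <= set(omega) for i in omega)

units = [frozenset(w) for k in range(1, 5)
         for w in itertools.combinations(r, k) if is_unit(w)]
assert frozenset({3}) in units            # the three units of figure 1 ...
assert frozenset({3, 4}) in units
assert frozenset({1, 2, 3}) in units
assert not is_unit({1, 2})                # ... but {1, 2} is not a unit

# any union or intersection of units is a unit
for a, b in itertools.product(units, units):
    assert is_unit(a | b) and is_unit(a & b)
```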

For simplicity, from now on I assume that the set of units does not change with t. Accordingly I shorten r(i; t) to r(i). For any unit ω I write

${K}_{{x}_{\omega }}^{{x}_{\omega }^{\prime }}\left(\omega ;t\right){:=}{\sum }_{i\in \omega }{K}_{{x}_{r\left(i\right)}}^{{x}_{r\left(i\right)}^{\prime }}\left(i;t\right)\delta \left({x}_{\omega {\backslash}r\left(i\right)}^{\prime },{x}_{\omega {\backslash}r\left(i\right)}\right)$     (7)

So ${K}_{x}^{{x}^{\prime }}\left(\omega ;t\right)={K}_{{x}_{\omega }}^{{x}_{\omega }^{\prime }}\left(\omega ;t\right)\delta \left({x}_{-\omega }^{\prime },{x}_{-\omega }\right)$, by equations (3) and (6).

At any time t, for any unit ω, ${p}_{{x}_{\omega }}\left(t\right)$ evolves as a CTMC with rate matrix ${K}_{{x}_{\omega }}^{{x}_{\omega }^{\prime }}\left(\omega ;t\right)$:

$\frac{\mathrm{d}{p}_{{x}_{\omega }}\left(t\right)}{\mathrm{d}t}={\sum }_{{x}_{\omega }^{\prime }}{K}_{{x}_{\omega }}^{{x}_{\omega }^{\prime }}\left(\omega ;t\right){p}_{{x}_{\omega }^{\prime }}\left(t\right)$     (8)

(See appendix A.) So a unit evolves according to a self-contained CTMC, in contrast to the general case of a single subsystem (cf. equation (5)).
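As a numerical sanity check of equation (8), the sketch below (my own toy bipartite example, using scipy's matrix exponential and the same column convention as the earlier sketch) evolves a joint distribution in which subsystem 1 is a unit while subsystem 2's rates depend on subsystem 1; the marginal over subsystem 1 indeed follows its own self-contained CTMC.

```python
import itertools
import numpy as np
from scipy.linalg import expm

states = list(itertools.product([0, 1], repeat=2))
idx = {x: n for n, x in enumerate(states)}

K = np.zeros((4, 4))
for xp in states:
    # subsystem 1 flips at rate 1.0, regardless of subsystem 2: {1} is a unit
    K[idx[(xp[0] ^ 1, xp[1])], idx[xp]] += 1.0
    # subsystem 2 flips at a rate that depends on subsystem 1's state
    K[idx[(xp[0], xp[1] ^ 1)], idx[xp]] += 0.5 + 2.0 * xp[0]
K -= np.diag(K.sum(axis=0))

K_unit = np.array([[-1.0, 1.0],
                   [1.0, -1.0]])        # self-contained rate matrix of {1}

p0 = np.array([0.7, 0.1, 0.15, 0.05])   # an arbitrary joint initial state
t = 0.8
p_t = expm(K * t) @ p0                  # evolve the joint distribution

marg = lambda p: np.array([p[0] + p[1], p[2] + p[3]])       # marginal over x_1
assert np.allclose(marg(p_t), expm(K_unit * t) @ marg(p0))  # equation (8)
```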

I assume that each subsystem is attached to at most one thermal reservoir, and that all such reservoirs have the same temperature (or equivalently, that all subsystems are attached to separate, statistically independent parts of the same thermal reservoir [10]). Accordingly, the expected entropy flow (EF) rate of any unit $\omega \subseteq \mathcal{N}$ at time t is the sum of the EFs of the subsystems in ω:

$\langle {\dot {Q}}^{\omega ;K}\left(t\right)\rangle {:=}{\sum }_{i\in \omega }{\sum }_{x,{x}^{\prime }}{K}_{x}^{{x}^{\prime }}\left(i;t\right){p}_{{x}^{\prime }}\left(t\right)\mathrm{ln}\frac{{K}_{x}^{{x}^{\prime }}\left(i;t\right)}{{K}_{{x}^{\prime }}^{x}\left(i;t\right)}$     (9)

which I often shorten to $\langle {\dot {Q}}^{\omega }\left(t\right)\rangle $ [7, 32]. (Note that this is the EF from ω into the environment.) The associated expected EP rate of ω at time t is defined as

$\langle {\dot {\sigma }}^{\omega ;K}\left(t\right)\rangle {:=}\frac{\mathrm{d}{S}^{{X}_{\omega }}\left(t\right)}{\mathrm{d}t}+\langle {\dot {Q}}^{\omega ;K}\left(t\right)\rangle $     (10)

$={\sum }_{i\in \omega }{\sum }_{x,{x}^{\prime }}{K}_{x}^{{x}^{\prime }}\left(i;t\right){p}_{{x}^{\prime }}\left(t\right)\mathrm{ln}\frac{{K}_{x}^{{x}^{\prime }}\left(i;t\right){p}_{{x}_{\omega }^{\prime }}\left(t\right)}{{K}_{{x}^{\prime }}^{x}\left(i;t\right){p}_{{x}_{\omega }}\left(t\right)}$     (11)

which I often shorten to $\langle {\dot {\sigma }}^{\omega }\left(t\right)\rangle $.

I refer to $\langle {\dot {\sigma }}^{\omega }\left(t\right)\rangle $ as a local EP rate, and define the global EP rate as $\langle \dot {\sigma }\left(t\right)\rangle {:=}\langle {\dot {\sigma }}^{\mathcal{N}}\left(t\right)\rangle $. For any unit ω, $\langle {\dot {\sigma }}^{\omega }\left(t\right)\rangle {\geqslant}0$, since $\langle {\dot {\sigma }}^{\omega }\left(t\right)\rangle $ has the usual form of an EP rate of a single system. In addition, that lower bound of 0 is achievable, e.g., if ${K}_{{x}_{\omega }}^{{x}_{\omega }^{\prime }}\left(\omega ;t\right){p}_{{x}_{\omega }^{\prime }}\left(t\right)={K}_{{x}_{\omega }^{\prime }}^{{x}_{\omega }}\left(\omega ;t\right){p}_{{x}_{\omega }}\left(t\right)$ at time t for all xω, x'ω. Note also that while the EF of a unit is the sum of the EFs of the subsystems within the unit, in general that is not true for the EP.
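The following sketch spells out these definitions in the simplest case (my own assumptions: $k_{\mathrm{B}}=1$, natural logarithms, and a unit containing a single two-state subsystem, so the per-subsystem sums of equations (9)-(11) collapse to a single rate matrix). It also checks the entropy balance $\langle \dot{\sigma}\rangle = \mathrm{d}S/\mathrm{d}t + \langle \dot{Q}\rangle$ and that the EP rate vanishes in a state obeying detailed balance.

```python
import numpy as np

def ef_rate(K, p):
    """EF rate from the system into the environment,
    sum_{x != x'} K[x, x'] p[x'] ln( K[x, x'] / K[x', x] )."""
    q = 0.0
    for x in range(len(p)):
        for xp in range(len(p)):
            if x != xp and K[x, xp] > 0:
                q += K[x, xp] * p[xp] * np.log(K[x, xp] / K[xp, x])
    return q

def ep_rate(K, p):
    """EP rate, sum_{x != x'} K[x, x'] p[x'] ln( K[x, x'] p[x'] / (K[x', x] p[x]) );
    non-negative, and zero iff detailed balance holds at p."""
    s = 0.0
    for x in range(len(p)):
        for xp in range(len(p)):
            if x != xp and K[x, xp] > 0:
                s += K[x, xp] * p[xp] * np.log(K[x, xp] * p[xp] / (K[xp, x] * p[x]))
    return s

K = np.array([[-2.0, 1.0],
              [2.0, -1.0]])             # rate 2 for 0 -> 1, rate 1 for 1 -> 0
p = np.array([0.5, 0.5])
dS_dt = -np.sum((K @ p) * np.log(p))    # rate of change of Shannon entropy
assert np.isclose(ep_rate(K, p), dS_dt + ef_rate(K, p))   # entropy balance
assert ep_rate(K, p) >= 0.0

pi = np.array([1.0, 2.0]) / 3.0         # stationary state: K @ pi = 0
assert np.isclose(ep_rate(K, pi), 0.0)  # detailed balance at pi
```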

It is worth comparing the local EP rate to similar quantities that have been investigated in the literature. In contrast to $\langle {\dot {\sigma }}^{\omega }\left(t\right)\rangle $, the quantity 'σX' introduced in the analysis of (autonomous) bipartite systems in [28] is the EP of a single trajectory, integrated over time. More importantly, its expectation can be negative, unlike (the time-integration of) $\langle {\dot {\sigma }}^{\omega }\left(t\right)\rangle $. On the other hand, the quantity '${\dot {S}}_{i}^{X}$' considered in the analysis of bipartite systems in [11] is a proper expected EP rate, and so is non-negative. However, it (and its extension considered in [10]) is one term in a decomposition of the expected EP rate generated by a single unit. It does not concern the EP rate of an entire unit in a system with multiple units. In addition, the quantity 'σΩ' considered in [28] is also non-negative. However, it gives the total EP rate generated by a subset of all possible global state transitions, rather than the EP rate of a unit [16]. Finally, papers on the 'thermodynamics of information' considers scenarios in which two subsystems interact but the second one does not change its state [20, 23, 25]. Most of those papers use a variant of the usual definition of 'entropy production' whose expectation can be strictly negative, in contrast to $\langle {\dot {\sigma }}^{\omega }\left(t\right)\rangle $ [17].

3. EP bounds from counterfactual rate matrices

To analyze the minimal EP rate in multipartite processes, I need to introduce two more definitions. First, given any function $f:{{\Delta}}_{X}\to \mathbb{R}$ and any $A\subset \mathcal{N}$ (not necessarily a unit), define the A-(windowed) derivative of f(p(t)) under rate matrix K(t) as

Equation (12)

(See equation (3).) Intuitively, this is what the derivative of f(p(t)) would be if (counterfactually) only the subsystems in A were allowed to change their states.

In particular, the A-derivative of the conditional entropy of X given XA is

Equation (13)

which I sometimes write as just ${\mathrm{d}}^{A}{S}^{X\vert {X}_{A}}p\left(t\right)/\mathrm{d}t$. ${\mathrm{d}}^{A}{S}^{X\vert {X}_{A}}p\left(t\right)/\mathrm{d}t$ measures how quickly the statistical coupling between ${X}_{A}$ and ${X}_{-A}$ changes with time, if rather than evolving under the actual rate matrix, the system evolved under a counterfactual rate matrix, in which ${x}_{-A}$ is not allowed to change. In the special case of two subsystems, both leading each other, with A one of those subsystems, $-{\mathrm{d}}^{A}{S}^{X\vert {X}_{A}}p\left(t\right)/\mathrm{d}t$ is the same as the 'information flow' analyzed in [11]. (See equation (4) in [10] for a generalization of information flow to multiple subsystems that is similar to the general expression $-{\mathrm{d}}^{A}{S}^{X\vert {X}_{A}}p\left(t\right)/\mathrm{d}t$.)

${S}^{{X}_{-A}}\left(t\right)$ does not change under the counterfactual rate matrix K(A; t). Accordingly, $\frac{{\mathrm{d}}^{A}}{\mathrm{d}t}{S}^{X\vert {X}_{A}}\left(t\right)$ is the derivative of the negative mutual information between ${X}_{A}$ and ${X}_{-A}$, under that counterfactual rate matrix. In appendix C it is shown that in the special case that A is a unit, this expression is non-negative [18].
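This claim is easy to probe numerically. The finite-difference check below (my own toy example, same conventions as the earlier sketches) evolves a correlated bipartite distribution under the counterfactual rate matrix K(A; t) with A = {1} a unit, so that subsystem 2 is frozen, and confirms that S(X | X_A) does not decrease.

```python
import itertools
import numpy as np
from scipy.linalg import expm

states = list(itertools.product([0, 1], repeat=2))
idx = {x: n for n, x in enumerate(states)}

K_A = np.zeros((4, 4))       # A = {1} is a unit: its flip rate is constant
for xp in states:
    K_A[idx[(xp[0] ^ 1, xp[1])], idx[xp]] = 1.0
K_A -= np.diag(K_A.sum(axis=0))

def ent(q):
    q = q[q > 0]
    return -np.sum(q * np.log(q))

def S_given_A(p):            # S(X | X_A) = S(X) - S(X_A), with X_A = x_1
    return ent(p) - ent(np.array([p[0] + p[1], p[2] + p[3]]))

p = np.array([0.4, 0.1, 0.1, 0.4])   # x_1 and x_2 start out correlated
dt = 1e-5
dA = (S_given_A(expm(K_A * dt) @ p) - S_given_A(p)) / dt   # d^A S(X|X_A)/dt
assert dA >= 0.0             # the appendix C claim, checked numerically
```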

The second definition we need is a variant of $\langle {\dot {\sigma }}^{\omega ;K}\left(t\right)\rangle $, which will be indicated by using subscripts rather than superscripts. For any $A\subseteq B\subseteq \mathcal{N}$ where B is a unit (but A need not be),

Equation (14)

which I abbreviate as $\langle {\dot {\sigma }}_{K\left(A;t\right)}\left(t\right)\rangle $ when $B=\mathcal{N}$. $\langle {\dot {\sigma }}_{K\left(A;t\right)}\left(t\right)\rangle $ is a global EP rate, only evaluated under the counterfactual rate matrix K(A; t). Therefore it is non-negative. In contrast, $\langle {\dot {\sigma }}^{\omega ;K}\left(t\right)\rangle $ is a local EP rate. In the special case that A = ω is a unit, these two EP rates are related by $\langle {\dot {\sigma }}_{K\left(A;t\right)}\left(t\right)\rangle =\langle {\dot {\sigma }}^{A;K}\left(t\right)\rangle +\frac{{\mathrm{d}}^{A}}{\mathrm{d}t}{S}^{X\vert {X}_{A}}\left(t\right)$ (see equation (B3) in appendix B).

In appendix D it is shown that for any pair of units, ω and ω' ⊂ ω,

$\langle {\dot {\sigma }}^{\omega ;K}\left(t\right)\rangle =\langle {\dot {\sigma }}^{{\omega }^{\prime };K}\left(t\right)\rangle +\langle {\dot {\sigma }}_{K\left(\omega {\backslash}{\omega }^{\prime };t\right);\omega }\left(t\right)\rangle +\frac{{\mathrm{d}}^{{\omega }^{\prime }}}{\mathrm{d}t}{S}^{{X}_{\omega }\vert {X}_{{\omega }^{\prime }}}\left(t\right)$     (15)

(See figure 1 for an illustration of such a pair of units ω, ω' ⊂ ω.) The first term on the rhs is the EP rate arising from the subsystems within unit ω', and the second term is the 'left over' EP rate from the subsystems that are in ω but not in ω'. The third term is a time-derivative of the conditional entropy between those two sets of subsystems. All three of these terms are non-negative, so each of them provides a lower bound on the EP rate.

Equation (15) is the major result of this paper. It holds no matter what the scale of the full system, so long as that system can be modeled as a multipartite process. In particular, setting $\omega =\mathcal{N}$ and then consolidating notation by rewriting ω' as ω, equation (15) shows that for any unit ω,

$\langle \dot {\sigma }\left(t\right)\rangle =\langle {\dot {\sigma }}^{\omega ;K}\left(t\right)\rangle +\langle {\dot {\sigma }}_{K\left(\mathcal{N}{\backslash}\omega ;t\right)}\left(t\right)\rangle +\frac{{\mathrm{d}}^{\omega }}{\mathrm{d}t}{S}^{X\vert {X}_{\omega }}\left(t\right)$     (16)

$\langle \dot {\sigma }\left(t\right)\rangle {\geqslant}\frac{{\mathrm{d}}^{\omega }}{\mathrm{d}t}{S}^{X\vert {X}_{\omega }}\left(t\right)$     (17)

(where the shorthand notation has been used). This lower bound on the expected EP rate can be evaluated without knowing any of the detailed physics occurring within unit ω, only knowing how the statistical coupling between ω and the rest of the subsystems evolves with time.
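Here is a numerical check (my own toy bipartite model, with $k_{\mathrm{B}}=1$) of this bound, together with the relation $\langle {\dot {\sigma }}_{K\left(\omega ;t\right)}\left(t\right)\rangle =\langle {\dot {\sigma }}^{\omega ;K}\left(t\right)\rangle +\frac{{\mathrm{d}}^{\omega }}{\mathrm{d}t}{S}^{X\vert {X}_{\omega }}\left(t\right)$ stated at the end of section 3: subsystem 1 is a unit ω, the windowed derivative is computed by a finite difference, and the global EP rate indeed dominates it.

```python
import itertools
import numpy as np
from scipy.linalg import expm

states = list(itertools.product([0, 1], repeat=2))
idx = {x: n for n, x in enumerate(states)}

K1 = np.zeros((4, 4))   # subsystem 1 flips at rate 1, independently of 2
K2 = np.zeros((4, 4))   # subsystem 2's flip rate depends on subsystem 1
for xp in states:
    K1[idx[(xp[0] ^ 1, xp[1])], idx[xp]] = 1.0
    K2[idx[(xp[0], xp[1] ^ 1)], idx[xp]] = 0.2 + 1.8 * xp[0]
for M in (K1, K2):
    M -= np.diag(M.sum(axis=0))
K = K1 + K2

def ep_rate(K, q):
    s = 0.0
    for a in range(len(q)):
        for b in range(len(q)):
            if a != b and K[a, b] > 0:
                s += K[a, b] * q[b] * np.log(K[a, b] * q[b] / (K[b, a] * q[a]))
    return s

ent = lambda q: -np.sum(q[q > 0] * np.log(q[q > 0]))
marg = lambda p: np.array([p[0] + p[1], p[2] + p[3]])     # marginal over x_1
S_cond = lambda p: ent(p) - ent(marg(p))                  # S(X | X_omega)

p = np.array([0.4, 0.1, 0.1, 0.4])
dt = 1e-5
d_omega = (S_cond(expm(K1 * dt) @ p) - S_cond(p)) / dt    # d^omega S(X|X_omega)/dt

assert 0.0 <= d_omega <= ep_rate(K, p)                    # equation (17)
# the section 3 relation: global EP rate under K(omega; t) equals the local
# EP rate of the unit plus the windowed derivative
sigma_unit = ep_rate(np.array([[-1.0, 1.0], [1.0, -1.0]]), marg(p))
assert np.isclose(ep_rate(K1, p), sigma_unit + d_omega, atol=1e-3)
```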

As an example of equation (17), consider again the type of bipartite process analyzed in [1, 8, 9]. Suppose we set ω to contain only what in [8] is called the 'external' subsystem. Then if we also make the assumption of those papers that the full system is in a stationary state, $\mathrm{d}{S}^{X}/\mathrm{d}t=\mathrm{d}{S}^{{X}_{\omega }}/\mathrm{d}t=\mathrm{d}{S}^{{X}_{-\omega }}/\mathrm{d}t=0$. Moreover,

$\frac{\mathrm{d}}{\mathrm{d}t}{S}^{X\vert {X}_{\omega }}\left(t\right)=\frac{{\mathrm{d}}^{\omega }}{\mathrm{d}t}{S}^{X\vert {X}_{\omega }}\left(t\right)+\frac{{\mathrm{d}}^{-\omega }}{\mathrm{d}t}{S}^{X\vert {X}_{\omega }}\left(t\right)=0$     (18)

Therefore by equation (13),

$\frac{{\mathrm{d}}^{\omega }}{\mathrm{d}t}{S}^{X\vert {X}_{\omega }}\left(t\right)=-\frac{{\mathrm{d}}^{-\omega }}{\mathrm{d}t}{S}^{X\vert {X}_{\omega }}\left(t\right)$     (19)

(The rhs of equation (19) is called the 'learning rate' of the internal subsystem about the external subsystem—see equation (8) in [5], noting that the rate matrix is normalized.)

So in this scenario, equation (17) above reduces to equation (7) of [1], which lower-bounds the global EP rate by the learning rate. However, equation (17) lower-bounds the global EP rate even if the system is not in a stationary state, which must be the case for the learning rate bound to apply [19]. More generally, equation (16) applies to arbitrary multipartite processes, not just those with two subsystems, and is an exact equality rather than just a bound.

4. Extensions

In some situations we can get an even more refined decomposition of EP rate by substituting equation (15) into equation (16) to expand the first EP rate on the rhs of equation (16). This gives a larger lower bound on $\langle \dot {\sigma }\left(t\right)\rangle $ than the one in equation (17). For example, if ω and ω' ⊂ ω are both units under K(t), then

Equation (20)

Equation (21)

Both of the terms on the rhs in equation (21) are non-negative. In addition, both can be evaluated without knowing the detailed physics occurring within units ω or ω', only knowing how the statistical coupling between units evolves with time.

This can be illustrated with the scenario depicted in figure 1. Using the units ω and ω' specified there, equation (21) says that the global EP rate is lower-bounded by the sum of two terms. The first is the derivative of the negative mutual information between subsystem 4 and the first three subsystems, if subsystem 4 were held fixed. The second is the derivative of the negative mutual information between subsystem 3 and the first two subsystems, if those two subsystems were held fixed.
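The same kind of numerical experiment works for equation (21). The sketch below (my own three-subsystem chain, a stripped-down analogue of figure 1 with subsystem 4 removed) takes ω = {2, 3} and ω' = {3}, both units, computes the two windowed information terms by finite differences, and verifies that their sum lower-bounds the global EP rate.

```python
import itertools
import numpy as np
from scipy.linalg import expm

states = list(itertools.product([0, 1], repeat=3))   # x = (x_1, x_2, x_3)
idx = {x: n for n, x in enumerate(states)}
n = len(states)

def flip_generator(i, rate_fn):
    """Generator that flips coordinate i at rate rate_fn(current state)."""
    K = np.zeros((n, n))
    for xp in states:
        x = list(xp)
        x[i] ^= 1
        K[idx[tuple(x)], idx[xp]] = rate_fn(xp)
    return K - np.diag(K.sum(axis=0))

K3 = flip_generator(2, lambda x: 1.0)                # 3 evolves autonomously
K2 = flip_generator(1, lambda x: 0.3 + 1.4 * x[2])   # 2 observes 3
K1 = flip_generator(0, lambda x: 0.3 + 1.4 * x[1])   # 1 observes 2
K = K1 + K2 + K3

def ent(q):
    q = q[q > 1e-15]
    return -np.sum(q * np.log(q))

def marg(p, keep):
    out = {}
    for x, val in zip(states, p):
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + val
    return np.array([out[k] for k in sorted(out)])

def ep_rate(K, q):
    s = 0.0
    for a in range(len(q)):
        for b in range(len(q)):
            if a != b and K[a, b] > 0:
                s += K[a, b] * q[b] * np.log(K[a, b] * q[b] / (K[b, a] * q[a]))
    return s

p = np.array([0.3 if x[0] == x[1] == x[2] else 0.4 / 6 for x in states])
dt = 1e-5

# term 1: d^omega S(X | X_omega)/dt with omega = {2, 3} (subsystem 1 frozen)
S1 = lambda q: ent(q) - ent(marg(q, (1, 2)))
t1 = (S1(expm((K2 + K3) * dt) @ p) - S1(p)) / dt

# term 2: d^omega' S(X_omega | X_omega')/dt with omega' = {3}, computed on
# the (x_2, x_3) marginal; only x_3 moves, at its bare flip rate 1
p23 = marg(p, (1, 2))
K3_23 = np.kron(np.eye(2), np.array([[-1.0, 1.0], [1.0, -1.0]]))
S2 = lambda q: ent(q) - ent(np.array([q[0] + q[2], q[1] + q[3]]))
t2 = (S2(expm(K3_23 * dt) @ p23) - S2(p23)) / dt

assert t1 >= -1e-9 and t2 >= -1e-9           # both terms are non-negative
assert t1 + t2 <= ep_rate(K, p) + 1e-6       # the bound of equation (21)
```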

Alternatively, suppose that ω is a unit under K, and that some set of subsystems α is a unit under $K\left(\mathcal{N}{\backslash}\omega ;t\right)$. Then since the term $\langle {\dot {\sigma }}_{K\left(\mathcal{N}{\backslash}\omega ;t\right)}\left(t\right)\rangle $ in equation (16) is a global EP rate over $\mathcal{N}$ under rate matrix $K\left(\mathcal{N}{\backslash}\omega ;t\right)$, we can again feed equation (15) into equation (16), [this time to expand the second rather than first term on the rhs of equation (16)] to get

Equation (22)

Equation (23)

The rhs of equation (23) also exceeds the bound in equation (17), by the negative α-derivative of the mutual information between ${X}_{\mathcal{N}{\backslash}\alpha }$ and Xα, under the rate matrix $K\left(\mathcal{N}{\backslash}\omega ;t\right)$.

Note that depending on the full unit structure, we may be able to combine equations (15) and (22) into an even larger lower bound on the global EP rate than equation (23). An example of this is illustrated below, in section 5. (Indeed, the more subsystems the overall system contains, the more times one might be able to iterate this process, getting progressively larger and larger lower bounds.)

As a final comment, $\langle \dot {\sigma }\rangle $ is the standard expected EP rate considered in the literature [32], i.e., the rate of increase of entropy in the combination of the full system and all baths that are connected to it. As a result, the powerful results of stochastic thermodynamics concerning entropy production (fluctuation theorems, thermodynamic uncertainty relations, speed-limit theorems, etc) all hold when the EP is set to $\langle \dot {\sigma }\rangle $. (Those results do not all hold for some of the variants of entropy production discussed at the end of section 2.) As an illustration, write the EP of unit ω during some time interval as ${\Delta}{\sigma }^{\omega }=\int \mathrm{d}t\langle {\dot {\sigma }}^{\omega }\rangle $, and define ${\Delta}{\sigma }_{K\left(\mathcal{N}{\backslash}\omega \right)}$ similarly. Then plugging equation (17) into the usual integral fluctuation theorem gives

Equation (24)

(recall that the windowed derivative in the integrand in the exponential is not a proper time-derivative). Using the non-negativity of EP during the process, equation (24) gives

Equation (25)

This inequality characterizes how the mutual information between ω and the rest of the system can vary during the process, independent of all details of the physical process besides the fact that ω is a unit.

5. Example

To illustrate some of the results above, return to the physical scenario depicted in figure 1, in which there are two distinct cells in a medium, both observing the concentration level of a ligand in that medium. Recall that subsystem 3 is the concentration level, subsystem 2 is the set of receptors in the first cell observing that concentration level, and subsystem 1 is the memory within that first cell, which observes the state of that cell's ligand receptors. Subsystem 4 is the set of receptors in the second cell that are observing the same concentration level. (A simple variant would involve only a single cell, where subsystem 4 is a different set of receptors observing the concentration level.)

Take ω = {1, 2, 3} and α = {3, 4}, as indicated in figure 1. So ω is the joint state of the concentration level, the receptors in the first cell, and the memory in the first cell. α instead is the joint state of the same concentration level, along with the state of the receptors in the second cell.

Note that the four sets {1}, {2}, {3}, {3, 4} form a unit structure of

${K}_{x}^{{x}^{\prime }}\left(\mathcal{N}{\backslash}\omega ;t\right)={K}_{x}^{{x}^{\prime }}\left(\left\{4\right\};t\right)$     (26)

since in fact, under the rate matrix ${K}_{x}^{{x}^{\prime }}\left(\left\{4\right\};t\right)$, neither subsystem 1, 2 nor 3 changes its state. So α is a member of a unit structure of $K\left(\mathcal{N}{\backslash}\omega ;t\right)$, and we can apply equation (22).

The first term in equation (22), $\langle {\dot {\sigma }}^{\omega }\left(t\right)\rangle $, is the local EP rate that would be jointly generated by the set of three subsystems {1, 2, 3}, if they evolved in isolation from the other subsystem, under the self-contained rate matrix

Equation (27)

The third term in equation (22) is the local EP rate that would be jointly generated by the two subsystems {3, 4}, if they evolved in isolation from the other two subsystems, but rather than do so under the rate matrix K(α; t) = K({3, 4}; t), they did so under the rate matrix ${K}_{x}^{{x}^{\prime }}\left(\mathcal{N}{\backslash}\omega ;t\right)$ given in equation (26). (Note that ${K}_{x}^{{x}^{\prime }}\left(\mathcal{N}{\backslash}\omega ;t\right)=0$ if ${x}_{3}^{\prime }\ne {x}_{3}$, unlike ${K}_{x}^{{x}^{\prime }}\left(\left\{3,4\right\};t\right)$.) The fourth term in equation (22) is the global EP rate that would be generated by evolving all four subsystems under the rate matrix for the subsystems in $\left(\mathcal{N}{\backslash}\omega \right){\backslash}\alpha $. But there are no subsystems in that set. So this fourth term is zero.

Those first, third and fourth terms in equation (22) are all non-negative. The remaining two terms, the second and the fifth, are also non-negative. However, in contrast to the terms just discussed, these two depend only on derivatives of mutual informations. Specifically, the second term in equation (22) is the negative derivative of the mutual information between the joint random variable X1,2,3 and X4, under the rate matrix ${K}_{x}^{{x}^{\prime }}\left(\left\{1,2,3\right\};t\right)$. Next, since $\mathcal{N}{\backslash}\alpha =\left\{1,2\right\}$, the fifth term is the negative derivative of the mutual information between X1,2 and X3,4, under the rate matrix given by windowing α onto $K\left(\mathcal{N}{\backslash}\omega ;t\right)$, i.e., under the rate matrix ${K}_{x}^{{x}^{\prime }}\left(\left\{4\right\};t\right)$.

Recalling that ω := {1, 2, 3}, α := {3, 4} and defining γ := {4}, we can combine these results to express the global EP rate of the system illustrated in figure 1 in terms of the rate matrices of the four subsystems:

Equation (28)

All four terms on the rhs of equation (28) are non-negative. Translated to this scenario, previous results concerning learning rates consider the special case of a stationary state px(t), and only tell us that the global EP rate is bounded by the fourth term on the rhs of equation (28):

Equation (29)

Finally note that we also have a unit ω' = {3} which is a proper subset of both ω and α. So, for example, we can plug this ω' into equation (15) to expand the first term in equation (22), $\langle {\dot {\sigma }}^{\omega ;K\left(\omega ;t\right)}\left(t\right)\rangle $, replacing it with the sum of three terms. The first of these three new terms, $\langle {\dot {\sigma }}^{{\omega }^{\prime };K\left(\omega ;t\right)}\left(t\right)\rangle $, is the local EP rate generated by subsystem {3} evolving in isolation from all the other subsystems. The second of these new terms, $\langle {\dot {\sigma }}_{K\left(\omega {\backslash}{\omega }^{\prime };t\right);\omega }\left(t\right)\rangle $, is the EP rate that would be generated if the set of three subsystems {1, 2, 3} evolved in isolation from the remaining subsystem, 4, but under the rate matrix

Equation (30)

The third new term is the negative derivative of the mutual information between X1,2 and X3, under rate matrix K(ω'; t). All three of these new terms are non-negative.

6. Discussion

There are other decompositions of the global EP rate which are of interest, but do not always provide non-negative lower-bounds on the EP rate. One of them, discussed in appendix E, generalizes the results in [38] which relate 'subsystem Landauer loss' to multi-information. Future work involves combining these (and other) decompositions, to get even larger lower bounds.

Acknowledgments

I would like to thank Sosuke Ito, Artemy Kolchinsky, and Alec Boyd for stimulating discussion. This work was supported by the Santa Fe Institute, Grant No. CHE-1648973 from the US National Science Foundation and Grant No. FQXi-RFP-IPW-1912 from the FQXi foundation. The opinions expressed in this paper are those of the author and do not necessarily reflect the view of the National Science Foundation.

Appendix A.: Proof of equation (8)

Write

Equation (A1)

If $j\in \omega $, then a sum over all ${x}_{\omega }$ in particular runs over all ${x}_{j}$. Therefore we get

Equation (A2)

Using the fact that we have a multipartite process and then the fact that ω is a unit, we can expand this remaining expression as

Equation (A3)

To complete the proof plug in the definition of ${K}_{{x}_{\omega }}^{{x}_{\omega }^{\prime }}\left(\omega ;t\right)$.

Appendix B.: Expansions of EP rates in multipartite processes

This appendix derives an expansion of EP rates that is used in the main text and the other appendices.

Lemma 1. Suppose we have a multipartite process over a set of subsystems $\mathcal{N}$ defined by a set of rate matrices $\left\{{K}_{x}^{{x}^{\prime }}\left(i;t\right)\right\}$, and a subset $A\subseteq \mathcal{N}$. Then

Equation (B1)

Equation (B2)

If in addition A is a unit under K, then we can also write the quantity in equation (B1) as

Equation (B3)

Proof. Invoking the multipartite nature of the process allows us to write

Equation (B4)

Equation (B5)

Equation (B4) establishes equation (B1) and equation (B5) establishes equation (B2).

To establish equation (B3), use the hypothesis that A is a unit to expand

Equation (B6)

Equation (B7)

Equation (B8)

Equation (B9)

Appendix C.: Proof that if A is a unit, then $\frac{{\mathrm{d}}^{A}}{\mathrm{d}t}{S}^{X\vert {X}_{A}}\left(t\right){\geqslant}0$

First simplify notation by using P rather than p to indicate joint distributions that would evolve if K(t) were replaced by the counterfactual rate matrix K(A; t), starting from px(t). By definition,

Equation (C1)

However, since by hypothesis A is a unit,

Equation (C2)

Plugging this into equation (C1), summing both sides over xA(t + δt), and using the normalization of K(A; t) shows that to leading order in δt,

Equation (C3)

Equation (C3) in turn implies that to leading order in δt,

Equation (C4)

Equation (C5)

This formalizes the statement in the text that under the rate matrix K(A), ${x}_{-A}$ does not change its state.

Next, since A is a unit under K(A; t), we can expand further to get

Equation (C6)

So the full joint distribution is

Equation (C7)

I can use this form of the joint distribution to establish the following two equations

Equation (C8)

Equation (C9)

Applying the chain rule for entropy to decompose ${S}_{P}\left({X}_{-A}\left(t\right),{X}_{-A}\left(t+\delta t\right)\vert {X}_{A}\left(t+\delta t\right)\right)$ in two different ways, and plugging equations (C8) and (C9), respectively, into those two decompositions, we see that

Equation (C10)

Next, use equation (C10) to expand

Equation (C11)

Equation (C12)

Add and subtract ${S}_{P}\left({X}_{-A}\left(t\right)\right)$ in the numerator on the rhs to get

Equation (C13)

Since ${X}_{-A}\left(t\right)$ and ${X}_{A}\left(t+\delta t\right)$ are conditionally independent given ${X}_{A}\left(t\right)$, we have a Markov chain ${X}_{-A}\left(t\right){\leftrightarrow}{X}_{A}\left(t\right){\leftrightarrow}{X}_{A}\left(t+\delta t\right)$. So we can apply the data-processing inequality [6] to establish that the difference of mutual informations in the numerator on the rhs of equation (C13) is non-positive.

This completes the proof.

Appendix D.: Proof of equation (15)

For simplicity of the exposition, treat ω as though it were all of $\mathcal{N}$, i.e., suppress the ω index in xω and x'ω, suppress the ω argument of K(ω; t), and implicitly restrict sums over subsystems i to elements of ω. Then using the definition of K(ω'; t), we can expand

Equation (D1)

Equation (D2)

Since ω' is a unit, by equation (B3) we can rewrite the first sum on the rhs of equation (D2) as

Equation (D3)

Moreover, by equation (B2), even though ω \ ω' need not be a unit, the second sum in equation (D2) can be rewritten as

Equation (D4)

Equation (D5)

Combining completes the proof. In order to express that proof as in the main text, with the implicit ω once again made explicit, use the fact that windowing K(ω; t) to ω' ⊂ ω is the same as windowing K(t) to ω'.

Appendix E.: EP bounds from the inclusion-exclusion principle

For all n > 1, write ${\mathcal{N}}^{n}$ for the multiset of all intersections of n of the sets of subsystems ωi:

Equation (E1)

Equation (E2)

and so on, up to ${\mathcal{N}}^{\vert {\mathcal{N}}^{1}\vert }$. Any unit structure ${\mathcal{N}}^{1}$ specifies an associated set of sets,

Equation (E3)

Note that every element of ${\mathcal{N}}^{{\ast}}$ is itself a unit, since intersections of units are units.


Given any function $f:{\mathcal{N}}^{{\ast}}\to \mathbb{R}$, the associated inclusion–exclusion sum (or just 'in–ex sum') is

Equation (E6)

In particular, given any distribution px, there is an associated real-valued function mapping any $\omega \in {\mathcal{N}}^{{\ast}}$ to the marginal entropy of (the subsystems in) ω. So using ${S}^{{\mathcal{N}}^{{\ast}}}$ to indicate that function,

Equation (E7)

where ${S}^{\omega }$ is shorthand for ${S}^{{X}_{\omega }}$. I refer to ${{\Sigma}}^{S}-{S}^{\mathcal{N}}$ as the in–ex information. As an example, if ${\mathcal{N}}^{1}$ consists of two subsets, ${\omega }_{1}$, ${\omega }_{2}$, with no intersection, then the in–ex information is just the mutual information $I\left({X}_{{\omega }_{1}};{X}_{{\omega }_{2}}\right)$. As another example, if ${\mathcal{N}}^{1}$ consists of all singletons $i\in \mathcal{N}$, then the in–ex information is the multi-information, also called the 'total correlation', of the N separate random variables [31]. (Mutual information is the special case of multi-information where there are exactly two random variables [14, 31].)

The global EP rate is the negative derivative of the in–ex information, plus the in–ex sum of local EP rates:

Equation (E8)

Equation (E9)

Proof. To establish equation (E8), first plug in to the result in appendix B and use the normalization of the rate matrices to see that the EP rate of the full set of N coupled subsystems is

Equation (E10)

Now introduce the shorthand

Equation (E11)

Note that $\mathcal{N}$ itself is a unit; G is an additive function over subsets of $\mathcal{N}$; and $\langle \dot {\sigma }\left(t\right)\rangle =G\left(\mathcal{N}\right)$. Accordingly, we can apply the inclusion–exclusion principle to equation (E10) for the set of subsets ${\mathcal{N}}^{{\ast}}$ to get

Equation (E12)

Now use equation (B3) in lemma 1 to rewrite equation (E12) as

Equation (E13)

Next, use the same kind of reasoning that resulted in equation (E13) to show that the sum

Equation (E14)

can be written as ${\sum }_{x,{x}^{\prime }} {K}_{x}^{{x}^{\prime }}\left(t\right){p}_{{x}^{\prime }}\left(t\right)\mathrm{ln}\enspace {p}_{x}\left(t\right)=-\mathrm{d}{S}^{\mathcal{N}}\left(t\right)/\mathrm{d}t$. I can use this to rewrite equation (E13) as

Equation (E15)

This establishes the claim.

If we use equation (10) to expand each local EP term in equation (E9) and then compare to equation (E8), we see that the global expected EF rate equals the in–ex sum of the local expected EF rates:

Equation (E16)

Note as well that we can apply equation (E9) to itself, by using it to expand any of the local EP terms $\langle {\dot {\sigma }}^{\omega }\left(t\right)\rangle $ that occur in the in–ex sum $\hat{\sum _{\omega \in {\mathcal{N}}^{{\ast}}}}\langle {\dot {\sigma }}^{\omega }\left(t\right)\rangle $ on its own rhs.

Example 1. A particularly simple application of equation (E9) is to extend the results in [38] on 'subsystem Landauer loss'. Suppose that all units that are intersections of two or more other units don't ever change their state. That means that the unit EPs of such intersection units all equal 0, and that their drops in entropy during the process all equal 0. Accordingly, equation (E9) gives

Equation (E17)

So by non-negativity of expected unit EPs:

Equation (E18)

The analysis in [38] on 'subsystem Landauer loss' derives these results for the special case where there are in fact no intersection units; in that special case, the unit structure is just a partition of $\mathcal{N}$, so the rhs of equation (E18) is just the change in multi-information (sometimes called 'total correlation' [30, 31, 35]) of the system during the process.

Example 2. Equation (E9) can be particularly useful when combined with the fact that for any two units ω, ω' ⊂ ω, $\langle {\dot {\sigma }}^{{\omega }^{\prime }}\rangle {\leqslant}\langle {\dot {\sigma }}^{\omega }\rangle $ (see equation (15)). To illustrate this, return to the scenario of figure 1. There are three units in ${\mathcal{N}}^{1}$ (namely, {1, 2, 3}, {3}, {3, 4}), three in ${\mathcal{N}}^{2}$ (namely, three copies of {3}), and one in ${\mathcal{N}}^{3}$ (namely, {3}). Therefore,

Equation (E19)

Equation (E20)

Note that the derivatives in equation (E20) are conventional, non-windowed derivatives; none of the terms in equation (E20) involve counterfactual rate matrices.

All of the local EPs in equation (E19) can equal 0, in a quasi-static process. In addition, $\langle \dot {\sigma }\left(t\right)\rangle {\geqslant}0$. Therefore $\mathrm{d}{S}^{4\vert 1,2,3}\left(t\right)/\mathrm{d}t-\mathrm{d}{S}^{4\vert 3}\left(t\right)/\mathrm{d}t{\geqslant}0$, i.e.,

Equation (E21)

As a final comment, it is worth noting that in contrast to multi-information, in some situations the in–ex information can be negative. (In this it is just like some other extensions of mutual information to more than two variables [14, 31].) As an example, suppose N = 6, and label the subsystems as $\mathcal{N}=\left\{12,13,14,23,24,34\right\}$. Then take ${\mathcal{N}}^{1}$ to have four elements, {12, 13, 14}, {23, 24, 12}, {34, 13, 23} and {34, 24, 14}. (So the first element consists of all subsystems whose label involves a 1, the second consists of all subsystems whose label involves a 2, etc.). Also suppose that with probability 1, the state of every subsystem is the same. Then if the probability distribution of that identical state is p, the in-ex information is −S(p) + 4S(p) − 6S(p) = −3S(p) ⩽ 0.
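The arithmetic in this example is easy to verify mechanically. The snippet below (my own check, under the stated assumption that all six subsystems share one perfectly correlated state with distribution p) computes the in–ex sum of marginal entropies over all intersections of the four units and recovers −3S(p).

```python
import itertools
import numpy as np

units = [frozenset({'12', '13', '14'}), frozenset({'23', '24', '12'}),
         frozenset({'34', '13', '23'}), frozenset({'34', '24', '14'})]
p = np.array([0.3, 0.7])                  # distribution of the shared state
S = lambda q: -np.sum(q[q > 0] * np.log(q[q > 0]))

def subset_entropy(subset):
    # every subsystem copies the same underlying variable, so any non-empty
    # collection of subsystems has joint entropy S(p); the empty set has 0
    return S(p) if subset else 0.0

inex_sum = 0.0
for m in range(1, len(units) + 1):
    for combo in itertools.combinations(units, m):
        inex_sum += (-1) ** (m + 1) * subset_entropy(frozenset.intersection(*combo))

inex_info = inex_sum - S(p)               # subtract S^N; the full joint is S(p)
assert np.isclose(inex_info, -3 * S(p))   # = -S(p) + 4 S(p) - 6 S(p)
```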
