Entropic equality for worst-case work at any protocol speed

We derive an equality for non-equilibrium statistical mechanics in finite-dimensional quantum systems. The equality concerns the worst-case work output of a time-dependent Hamiltonian protocol in the presence of a Markovian heat bath. It has the form "worst-case work = penalty - optimum". The equality holds for all rates of changing the Hamiltonian and can be used to derive the optimum by setting the penalty to 0. The optimum term contains the max entropy of the initial state, rather than the von Neumann entropy, thus recovering recent results from single-shot statistical mechanics. Energy coherences can arise during the protocol but are assumed not to be present initially. We apply the equality to an electron box.

General Introduction-Average values of quantities are not always typical values: Outcomes may fluctuate significantly. In non-equilibrium nano and quantum systems this is often the case, with, for example, the work output of a protocol having a significant probability of deviating from the average. Hence, in these important systems, statements about averages have limited use when it comes to predicting what will happen in any given trial; the fluctuations need to be discussed explicitly.
Two key relations concerning fluctuations in work, Crooks' Theorem [1] and Jarzynski's Equality [2], have been studied extensively theoretically and experimentally. Amongst other things they can be used to determine free energies of equilibrium states from nonequilibrium experiments.
A recently developed alternative approach to nonequilibrium statistical mechanics is single-shot statistical mechanics [3][4][5][6][7][8][9][10][11][12], inspired by single-shot information theory [13,14]. The focus is on statements that are guaranteed to be true in every trial, rather than on average behaviors. For example, one can ask whether a process's work output is guaranteed to exceed some threshold value (such as an activation energy), or whether a process's work cost is guaranteed not to exceed some threshold value (beyond which the system may break from dissipating heat). These statements concern the worst-case work of a process. A key realisation is that the optimal worst-case work is determined not by the von Neumann/Shannon entropy of the initial state, but rather the max entropy, which is the logarithm of the number of non-zero eigenvalues of the density matrix. Thus, which entropy one should use in statements about optimal work depends on which property of the work probability distribution one is interested in.
Single-shot statistical mechanics began with almost no a priori relation to fluctuation theorems, but promising links were made in [6,15]. We shall use two realizations from [15], namely that (i) in the trajectories model for work extraction, both single-shot and fluctuation results apply; and (ii) Crooks' Theorem can be used to make a certain statement about worst-case work. A natural question that arose from these results is how to link Crooks' Theorem to the existing single-shot statements concerning optimal work in terms of the entropy of the initial state.
We here show that key expressions concerning optimal worst-case work from [3,5,6] follow from Crooks' Theorem plus some extra thought. We moreover generalise them by giving an equality for the worst-case work that holds for any protocol in the set-up, including fast protocols. The equality holds in every process in a general set-up that involves a time-varying Hamiltonian and a single Markovian heat bath, modelled using trajectories. It has the form "worst-case work = penalty - optimum", and the optimum can thus be derived by setting the penalty to zero. To make the link to physics clear, we apply the result to an electron box experiment [16][17][18].
We begin by defining the set-up.

One-shot relative entropies-The standard relative entropy is $D(\rho\|\sigma) := \mathrm{Tr}(\rho[\log\rho - \log\sigma])$ [19], where log in this paper means the natural logarithm, also known as ln. This is part of a wider class of relative entropies known as the Renyi relative entropies, which are parameterized by a real number α. We shall use two other members of that family: the (classical version of the) ∞ relative entropy $D_\infty(P\|Q) := \sup_x \log(p_x/q_x)$ and the 0 relative entropy $D_0(\rho\|\sigma) := -\log\mathrm{Tr}(\pi_\rho\sigma)$, wherein $\pi_\rho$ projects onto the support of ρ [20]. These are called one-shot relative entropies as they arise naturally in one-shot (also called single-shot) information theory [13,14,20].

arXiv:1504.05152v1 [quant-ph] 20 Apr 2015

Figure 1: A schematic of the setup. There is a working-medium system, a battery system to which work is given or from which it is taken, and a single heat bath. The battery system has the effect of altering the Hamiltonian of the working medium, depicted with the blue arrow shifting an energy level. The heat bath has the effect of hopping the system between energy levels, depicted by the red arrow.

Protocols, trajectory model of-We now describe the theoretical model, using the notation of [21]. The physical scenario we have in mind is depicted in Fig. 1. A protocol will be a sequence of elementary changes: (i) changes of the Hamiltonian and (ii) thermalizations. We shall initially assume there is a finite number of such steps (but later show that the continuum limit is well-defined and corresponds to a master equation, at least in the discrete-classical case). The Hamiltonian is parameterized by $\lambda_m$, where m is an integer that labels the step.
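As a concrete illustration of the three relative entropies, the following minimal sketch evaluates them for classical probability vectors (diagonal states). It assumes the standard definitions $D(P\|Q) = \sum_x p_x \log(p_x/q_x)$, $D_\infty(P\|Q) = \max_x \log(p_x/q_x)$ and $D_0(P\|Q) = -\log\sum_{x: p_x>0} q_x$, and it assumes that Q has full support. The function names and the example distribution are illustrative choices, not taken from the paper.

```python
import math

def relative_entropy(p, q):
    """D(P||Q) = sum_x p_x (ln p_x - ln q_x), with the convention 0 ln 0 = 0."""
    return sum(px * (math.log(px) - math.log(qx)) for px, qx in zip(p, q) if px > 0)

def d_infinity(p, q):
    """D_inf(P||Q) = max over the support of P of ln(p_x / q_x)."""
    return max(math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def d_zero(p, q):
    """D_0(P||Q) = -ln of the weight that Q assigns to the support of P."""
    return -math.log(sum(qx for px, qx in zip(p, q) if px > 0))

def max_entropy(p):
    """S_max = ln(number of non-zero probabilities)."""
    return math.log(sum(1 for px in p if px > 0))

# Hypothetical example: a state supported on 2 of 4 degenerate levels.
p = [0.9, 0.1, 0.0, 0.0]
uniform = [0.25] * 4

print(d_zero(p, uniform), relative_entropy(p, uniform), d_infinity(p, uniform))
```

For the uniform reference distribution, `d_zero` reduces to $\log d - S_{\max}$, which is the combination that appears in the optimum term discussed later.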
1. Hamiltonian changes map $\lambda_m$ to $\lambda_{m+1}$. We follow [21] in supposing there is an energy measurement in the instantaneous energy eigenbasis at the beginning and end of each Hamiltonian-changing step. In a given realisation the system then evolves from $|i_m, \lambda_m\rangle$ to $|i'_m, \lambda_{m+1}\rangle$, where $i_m$ and $i'_m$ label energy eigenstates. This costs work given by the energy difference: $w_m = E(|i'_m, \lambda_{m+1}\rangle) - E(|i_m, \lambda_m\rangle)$. An important special case is $i'_m = i_m$, which arises in the quasi-static (quantum adiabatic) limit, as well as if the energy eigenbasis is constant and only the energy eigenvalues change; this can be termed the discrete-classical case.
2. Thermalizations map $i'_m$ to $i_{m+1}$, cost no work, and preserve the Hamiltonian: $|i'_m, \lambda_{m+1}\rangle \to |i_{m+1}, \lambda_{m+1}\rangle$. For notational simplicity let us label this as $|i\rangle \to |j\rangle$ with energy $E_i \to E_j$. The hopping probabilities respect thermal detailed balance: $p(|i\rangle \to |j\rangle)/p(|j\rangle \to |i\rangle) = e^{-\beta(E_j - E_i)}$.

A trajectory is the time-sequence of energy eigenstates occupied: $|i_0, \lambda_0\rangle \to |i'_0, \lambda_1\rangle \to |i_1, \lambda_1\rangle \to \dots \to |i_f, \lambda_f\rangle$. The probability of a given trajectory is accordingly, assuming a Markovian heat bath, the product of the initial occupation probability and the probabilities of the elementary steps: $p(\mathrm{traj}) = p(|i_0, \lambda_0\rangle)\, p(|i_0, \lambda_0\rangle \to |i'_0, \lambda_1\rangle)\, p(|i'_0, \lambda_1\rangle \to |i_1, \lambda_1\rangle) \cdots$ (1).

A trajectory's inverse is the reverse of the sequence. The inverse corresponds, in the discrete-classical case, to the Hamiltonian changes running in reverse, from $\lambda_f$ to $\lambda_0$, and to the same thermalizations as in the forward protocol, with the sequence exactly inverted. This process is termed the reverse process. Beyond the discrete-classical case, the unitary associated with the inverse process Hamiltonian change is defined such that $p(|i'_m, \lambda_{m+1}\rangle \to |i_m, \lambda_m\rangle) = p(|i_m, \lambda_m\rangle \to |i'_m, \lambda_{m+1}\rangle)$. Our results will hold under that condition. There are at least two ways of satisfying that condition: (i) simply let the unitary of the corresponding elementary step in the reverse process be $U^{-1}$, where U is that of the forwards process; (ii) apply a suitable 'time-reversal' operator Θ to all states and operators involved, as in [21]. The reverse trajectory is then the reverse sequence of the time-reversed energy eigenstates: $\Theta|i_f, \lambda_f\rangle \dots \Theta|i_0, \lambda_0\rangle$, with the condition $p(|i'_m, \lambda_{m+1}\rangle \to |i_m, \lambda_m\rangle) = p(\Theta|i_m, \lambda_m\rangle \to \Theta|i'_m, \lambda_{m+1}\rangle)$ being satisfied, as time reversal implies taking the complex conjugate of the states, in a preferred basis, and the transpose of the time-evolution in the same basis: $U \to U^T$. The condition is thus satisfied as $\langle b|U|a\rangle = (\langle a|U^\dagger|b\rangle)^* = \langle a^*|U^T|b^*\rangle$.

A given trajectory has some work cost $w = \sum_m w_m$, in line with the definition of the Hamiltonian-changing steps. The inverse trajectory has work cost −w.
A given protocol on a given initial state induces some probability distribution over trajectories, with an associated probability distribution over work p(w). The forwards and reverse protocols give rise to $p_{\mathrm{fwd}}(w)$ and $p_{\mathrm{rev}}(-w)$ respectively.
If the initial density matrices of the forwards and reverse processes are both thermal, i.e. $e^{-\beta H(\lambda_0)}/Z_0$ and $e^{-\beta H(\lambda_f)}/Z_f$ respectively, Crooks' Theorem holds [21]: $p_{\mathrm{fwd}}(w)/p_{\mathrm{rev}}(-w) = (Z_f/Z_0)\, e^{\beta w}$ (2). (To derive it, take the ratio of Eq. 1 and the corresponding reverse trajectory expression. Apply thermal detailed balance and the equality of reverse hopping probabilities for the Hamiltonian-changing steps. Sum over trajectories with the same w, and note that the reverse of a trajectory has the same work up to a minus sign.)
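The derivation sketched above can be checked numerically in the discrete-classical case. The following minimal sketch enumerates all trajectories of a small protocol and verifies that the ratio of forward to reverse work probabilities equals $(Z_f/Z_0)e^{\beta w}$. The partial-thermalization rule in `hop` (resample from the Gibbs distribution with probability `a`, otherwise stay) is an assumption chosen only because it satisfies detailed balance. The level schedule `levels` and all function names are hypothetical.

```python
import itertools
import math
from collections import defaultdict

BETA = 1.0

def gibbs(energies):
    """Return the Gibbs distribution and partition function for the given levels."""
    weights = [math.exp(-BETA * e) for e in energies]
    z = sum(weights)
    return [wt / z for wt in weights], z

def hop(i, j, energies, a=0.3):
    """Hopping probability i -> j; obeys detailed balance at the current energies."""
    g, _ = gibbs(energies)
    return (1 - a) * (i == j) + a * g[j]

def work_distributions(levels):
    """levels[m][i] is the energy of level i at step m (eigenbasis fixed)."""
    steps = len(levels) - 1
    dim = len(levels[0])
    g0, z0 = gibbs(levels[0])
    gf, zf = gibbs(levels[-1])
    p_fwd, p_rev = defaultdict(float), defaultdict(float)
    for traj in itertools.product(range(dim), repeat=steps):
        # Work cost: sum of energy shifts of the occupied level.
        w = sum(levels[m + 1][traj[m]] - levels[m][traj[m]] for m in range(steps))
        fwd = math.prod(hop(traj[m], traj[m + 1], levels[m + 1]) for m in range(steps - 1))
        rev = math.prod(hop(traj[m + 1], traj[m], levels[m + 1]) for m in range(steps - 1))
        key = round(w, 9)
        p_fwd[key] += g0[traj[0]] * fwd
        p_rev[key] += gf[traj[-1]] * rev  # reverse trajectory has work -w
    return p_fwd, p_rev, z0, zf

# Hypothetical protocol: lift the upper of two levels from 0 to 1 in two steps.
levels = [[0.0, 0.0], [0.0, 0.5], [0.0, 1.0]]
p_fwd, p_rev, z0, zf = work_distributions(levels)
for w in sorted(p_fwd):
    print(w, p_fwd[w] / p_rev[w], zf / z0 * math.exp(BETA * w))
```

The two printed columns after each work value agree, which is the content of Eq. 2 for this protocol. Here `p_rev[w]` stores the reverse-process probability of the work value −w.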
Worst-case work-The central object of interest is the worst-case work $w^0 := \max\{w : p(w) > 0\}$, also known as the guaranteed work [7]. In practice this may be realised by some very unlikely trajectory, and it is then natural to consider the worst-case work of some subset of trajectories T: $w^0_T := \max\{w : p(w) > 0 \text{ and traj} \in T\}$.
Equality for worst-case work-Consider an initial state $\rho_0$, and a protocol of thermalizations and Hamiltonian changes with initial and final Hamiltonians $H(\lambda_0)$ and $H(\lambda_f)$ respectively. This induces a work probability distribution p(w) and an associated $w^0$. We shall derive an equality of the form $w^0$ = penalty - optimum. We consider initial states of the form $\rho_0 = \sum_i p_i |i_0, \lambda_0\rangle\langle i_0, \lambda_0|$, i.e. diagonal in the energy eigenbasis though not necessarily thermal (energy coherence may still arise during the protocol). We take $p_i \neq 0$, because we wish to avoid divergences from dividing by $p_i$. (See [22] for an alternative way of approaching this divergence problem.)
To apply Crooks' Theorem (Eq. 2) here, even though the initial state is not assumed to be thermal, our approach is as follows. A non-thermal state can have the same worst-case work as a thermal one. For example, for a degenerate two-level system the thermal state is $\gamma = \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1|$; if one instead had $\rho_0 = \frac{2}{3}|0\rangle\langle 0| + \frac{1}{3}|1\rangle\langle 1|$, this scenario would have the same worst-case work as γ. This follows because the set of trajectories with non-zero probability is the same in both cases, as can be seen from Eq. 1, which gives the probability of a trajectory. Given a $\rho_0$ we will then find a corresponding thermal state which has the same worst-case work and apply Crooks' Theorem to that.
An important practical consideration which makes this more subtle is that some $p_i$ may be negligible. It is then natural to exclude trajectories starting in those states when calculating the worst-case work. We therefore divide the initial energy eigenstates into two sets: the set of interest, $E_{\mathrm{IN}}$, and the rest, which we call $E_{\mathrm{OUT}}$, corresponding to those we shall exclude when calculating the worst-case work. The probability of being in $E_{\mathrm{OUT}}$ is given by $p(\mathrm{OUT}) = \sum_{i \in E_{\mathrm{OUT}}} p_i$. We define $T_{\mathrm{IN}}$ as the set of possible (meaning p > 0) trajectories beginning in $E_{\mathrm{IN}}$ and similarly $T_{\mathrm{OUT}}$ as the set of possible trajectories beginning in $E_{\mathrm{OUT}}$. Recall that each trajectory has some work value associated with it. We call the worst-case work of $T_{\mathrm{IN}}$ $w^0_{\mathrm{IN}}$; this cannot be worse than the worst case over all trajectories: $w^0_{\mathrm{IN}} \le w^0$.

Now we design an associated thermal state $\tilde\gamma$ to yield the same worst-case work as $\rho_0$, i.e. $w^0_{\mathrm{IN}}$, and later show this to indeed be the case under an additional mild assumption. We define it by changing the energies of $E_{\mathrm{OUT}}$ to new ones, $\tilde E_i$, such that $p_i = \exp(-\beta\tilde E_i)/\tilde Z$, and leaving the other energy levels the same. Our definition implies that $\tilde Z = \sum_{i \in E_{\mathrm{IN}}} e^{-\beta E_i} / (1 - p(\mathrm{OUT}))$. This partition function differs from that of the actual Hamiltonian $H(\lambda_0)$. Ignoring the $E_{\mathrm{OUT}}$ levels helps lower the calculated work cost, as can be seen in $\tilde Z$ being smaller.
In this scenario, with $\tilde\gamma$ as the initial state and the $E_{\mathrm{OUT}}$ levels lifted, the protocol is the same as in the actual scenario, except that initially the $E_{\mathrm{OUT}}$ levels are lowered down to the levels of the actual Hamiltonian of interest. The worst-case work of this scenario is called $\tilde w^0$. We show (see Methods), under the mild additional assumption that the worst-case work is bounded from below, that $w^0_{\mathrm{IN}} = \tilde w^0$ (4), as desired.
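The construction of the fictitious thermal state can be made concrete with a short sketch. It assumes the relations stated in the text: the OUT levels are assigned energies $\tilde E_i$ with $e^{-\beta\tilde E_i}/\tilde Z = p_i$, the IN levels keep their energies, and normalisation then fixes $\tilde Z = \sum_{i\in E_{\mathrm{IN}}} e^{-\beta E_i}/(1 - p(\mathrm{OUT}))$. The function name `lifted_levels` and the numerical values are hypothetical.

```python
import math

def lifted_levels(p, energies, out, beta=1.0):
    """Return (Z_tilde, E_tilde) for occupation probabilities p, energies, and OUT index set."""
    p_out = sum(p[i] for i in out)
    z_in = sum(math.exp(-beta * e) for i, e in enumerate(energies) if i not in out)
    z_tilde = z_in / (1.0 - p_out)
    e_tilde = [
        -math.log(p[i] * z_tilde) / beta if i in out else e
        for i, e in enumerate(energies)
    ]
    return z_tilde, e_tilde

# Hypothetical two-level example: level 1 is rarely occupied and placed in OUT.
p = [0.9, 0.1]
energies = [0.0, -0.4]
z_tilde, e_tilde = lifted_levels(p, energies, out={1})

# The Gibbs weights of the lifted Hamiltonian reproduce p on the OUT levels.
gibbs_tilde = [math.exp(-e) / z_tilde for e in e_tilde]
print(z_tilde, e_tilde, gibbs_tilde)
```

Note that only the OUT levels are guaranteed to have thermal weight $p_i$ in $\tilde\gamma$; the IN levels are weighted by their actual energies.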
To get $\tilde w^0$ from Crooks' Theorem (Eq. 2) we follow [15]. Take the initial state of the forwards process to be $\rho_0 = \tilde\gamma$, and the initial state of the reverse process to be $\gamma = e^{-\beta H(\lambda_f)}/Z_f$. Consider the equality of Crooks' Theorem (for values of w such that $p_{\mathrm{fwd}}(w) > 0$) and select the value of w which maximises the LHS (and thus the RHS) [15]: $\max_w p_{\mathrm{fwd}}(w)/p_{\mathrm{rev}}(-w) = \max_w (Z_f/\tilde Z)\, e^{\beta w}$. The RHS is monotonic in w, so maximizing the RHS over the support of $p_{\mathrm{fwd}}(w)$ leads to the maximum w-value $\tilde w^0$. Taking the logarithm and recalling the $D_\infty$ definition yields [15] $\beta\tilde w^0 = D_\infty(p_{\mathrm{fwd}}(w)\|p_{\mathrm{rev}}(-w)) - \log(Z_f/\tilde Z)$ (5).

Main result-Combining Eq. 5 and Eq. 4 we thus have $\beta w^0_{\mathrm{IN}} = D_\infty(p_{\mathrm{fwd}}(w)\|p_{\mathrm{rev}}(-w)) - \log(Z_f/\tilde Z)$ (6). Thus the worst-case work of the trajectories of interest, $w^0_{\mathrm{IN}}$, is equal to (kT times) a relative entropy minus the logarithm of the ratio of two partition functions, one of which encodes information about how many of the initial energy eigenstates have negligible occupation probability.

Discussion-Equation 6 has the form $\beta w^0$ = penalty - optimum.
The penalty is given by the difference between the forward and reverse distributions, quantified by $D_\infty$. The optimum one can hope for, with a given initial state and given initial and final Hamiltonian, is to set the penalty to 0 (as relative entropies are non-negative), which leaves $-\log(Z_f/\tilde Z)$. This term is made more negative the smaller the support of ρ is and the lower the final energies are relative to the initial ones. To illustrate the notation used, a very simple example of applying the formula is given in Fig. 2.
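We now consider the optimum term in two important special cases where the single-shot entropy of the initial state emerges: (i) If $p(\mathrm{OUT}) \to 0$ and $H(\lambda_0) = H(\lambda_f)$, then $\tilde Z$ is the sum of $e^{-\beta E_i}$ over the support of $\rho_0$, so that $\log(Z_f/\tilde Z) = -\log\mathrm{Tr}(\pi_{\rho_0}\gamma) = D_0(\rho_0\|\gamma)$ with $\gamma = e^{-\beta H(\lambda_0)}/Z_0$. (ii) If in addition the energy levels are fully degenerate, so that $\gamma = 1/d$, then $D_0(\rho_0\|\gamma) = \log d - S_{\max}(\rho_0)$. Thus in these cases the equality has the form $\beta w^0 = D_\infty - D_0(\rho_0\|\gamma)$ and $\beta w^0 = D_\infty - [\log d - S_{\max}(\rho_0)]$ respectively. This recovers the known results from [3, 5, 6] that these are optimal in the respective cases. (In the more general case where p(OUT) is finite one recovers the smooth relative entropy as the optimal quantity; see Methods.) The message is that it is the max entropy $S_{\max}$ which determines the optimal worst-case work, rather than the von Neumann entropy. If one defines thermodynamic entropy in terms of optimally extractable worst-case work, it is the max entropy which should be used.

Figure 2: The forwards protocol here is to lift the second level from −δE to 0. The reverse is to lower it back. Suppose for concreteness that $\rho_0 = 0.9|0, \lambda_0\rangle\langle 0, \lambda_0| + 0.1|1, \lambda_0\rangle\langle 1, \lambda_0|$. This table describes the two possible trajectories, their work costs and probabilities (reverse trajectories in brackets). Table columns: traj, traj set, work w, prob. Thus the equality is in this case: 0 = log[2(0.9)] − log[2(0.9)].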
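The numbers quoted for the two-level example of Fig. 2 can be reproduced with a few lines. This is a sketch under stated assumptions: level 0 sits at zero energy and forms $E_{\mathrm{IN}}$, level 1 forms $E_{\mathrm{OUT}}$, there is no thermalization step (so there are exactly two trajectories), and β = 1. The variable names and the value of `delta_e` are hypothetical; `delta_e` cancels from the result.

```python
import math

beta = 1.0
delta_e = 0.4                    # arbitrary: the initial depth of level 1
p0 = [0.9, 0.1]                  # rho_0 from the Fig. 2 caption

# Fictitious thermal state: lift the OUT level so its Gibbs weight equals p0[1].
z_tilde = 1.0 / (1.0 - p0[1])    # the only IN level has energy 0
e1_tilde = -math.log(p0[1] * z_tilde) / beta
z_f = 2.0                        # both final levels at energy 0

# Work cost of the two trajectories in the ~-scenario:
# level 1 is first lowered from e1_tilde to -delta_e, then lifted to 0.
work = [0.0, (-delta_e - e1_tilde) + (0.0 - (-delta_e))]
p_fwd = [1.0 / z_tilde, math.exp(-beta * e1_tilde) / z_tilde]
p_rev = [0.5, 0.5]               # reverse process starts in the final thermal state

d_inf = max(math.log(f / r) for f, r in zip(p_fwd, p_rev))
optimum = math.log(z_f / z_tilde)
worst_case = max(work)

print(beta * worst_case, d_inf - optimum)
```

Both printed numbers vanish (up to rounding), with `d_inf` and `optimum` each equal to log(1.8) = log[2(0.9)], in line with the equality stated in the caption.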
To make the connection to physics clear, we apply the results to a recent realization of a Szilard engine with an electron box [16][17][18]. A great advantage of using the trajectories model from the fluctuation theorem approach is that it allows the application of single-shot results to such experiments. We describe the set-up in Figure 3, and in the Methods we analyse what controls the penalty term $D_\infty$ in this scenario.
As described in the trajectories section, these results also apply if the evolution includes unitaries that create energy coherences. One might think that coherences will always worsen the worst-case work or its probability. As a counter-example according to this trajectory model, suppose $H(\lambda_0) = 0$, $\rho_0 = \frac{1}{3}|0\rangle\langle 0| + \frac{2}{3}|1\rangle\langle 1|$, and $H(\lambda_f) = \delta E|i\rangle\langle i|$. If the energy eigenstates stay the same throughout, such that $|i\rangle = |1\rangle$, the worst-case work is δE and it has probability 2/3 (even if the shift is done quickly). If instead the Hamiltonian eigenstates change such that $|0\rangle \to |+\rangle$ and $|1\rangle \to |i\rangle = |-\rangle$, then the worst-case work is still δE, corresponding to outcome $|-\rangle$ of the final energy measurement. However the probability of this can be as low as 1/2 (if H is changed suddenly, $p(|-\rangle) = \mathrm{Tr}(\rho_0|-\rangle\langle -|) = 1/2$). This shows that the probability of the worst case can actually be improved (lowered) by coherence due to suddenly changing the Hamiltonian, though at the cost of randomising the work distribution.

Figure 3: A schematic of an "electron box" (D) coupled to a metallic electrode (R) via tunnelling (with rate Γ) and the capacitor with capacitance $C_J$, and to the gate electrode via the capacitor with capacitance $C_g$. The gate voltage $V_g$ controls the number of excess electrons on the electron box, which at low temperatures is restricted to two possible values and serves as a logical basis $|0\rangle$ and $|1\rangle$ for a qubit. Namely, it tunes the relative energy by $H \propto -C_g V_g |1\rangle\langle 1|$. The electrode R plays the role of a heat bath, where the tunnelling in/out of the box D corresponds to thermal excitation/relaxation. Experimentally, the work and heat can be measured by probing the charge on D in real time with a single-electron transistor next to D (not shown in the figure) as demonstrated in Refs. [16][17][18]. In the Szilard engine protocol, In the Methods we derive a master equation for the characteristic function $Z(\xi) = \langle e^{\xi w}\rangle$ of the work distribution function P(w). This new master equation allows us to calculate efficiently the characteristic function, the work distribution itself (Figure 4), and a bound for $D_\infty$ (see the Methods).
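The two probabilities in the coherence counter-example can be checked directly. The sketch below assumes the sudden-change limit, in which the state is unchanged by the Hamiltonian switch and the final energy measurement is simply a projective measurement of $\rho_0$ in the new eigenbasis. The helper name `expectation` is hypothetical.

```python
import math

# rho_0 = (1/3)|0><0| + (2/3)|1><1| in the computational basis.
rho0 = [[1.0 / 3.0, 0.0], [0.0, 2.0 / 3.0]]

# |-> = (|0> - |1>)/sqrt(2): the excited final eigenstate in the rotated case.
minus = [1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0)]

def expectation(rho, ket):
    """<ket|rho|ket> for a real-valued ket."""
    n = len(ket)
    return sum(ket[a] * rho[a][b] * ket[b] for a in range(n) for b in range(n))

# Fixed eigenbasis: the worst case (cost delta_E) occurs when starting in |1>.
p_worst_fixed_basis = rho0[1][1]

# Rotated eigenbasis, sudden change: the worst case is the outcome |->.
p_worst_sudden = expectation(rho0, minus)

print(p_worst_fixed_basis, p_worst_sudden)
```

The worst-case work value is δE in both cases; only its probability changes, from 2/3 to 1/2.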
In the Methods we go further and consider a smaller subset of trajectories, cutting away also trajectories that start in a likely initial state but nevertheless have low probability. We describe the continuous time limit, and the electron box scenario in detail.

Summary and outlook-We showed that in any protocol with a time-varying Hamiltonian and thermalizations, the worst-case work takes the form "penalty - optimum". The model we used could be generalised in various ways, including non-Markovian baths and baths that decohere in other bases than the energy basis. It is also important to find more bounds for the penalty term in terms of controllable parameters.

Note added: Similar results were obtained independently by Salek and Wiesner, using a different set-up and different starting assumptions, in: Fluctuations in

For a given initial state $\rho = \sum_i p_i|i\rangle\langle i|$ and initial energy eigenvalues $E_i$, the associated thermal state is defined as $\tilde\gamma = \sum_i (e^{-\beta\tilde E_i}/\tilde Z)\,|i\rangle\langle i|$, where $\tilde E_i = E_i$ for $|i\rangle \in E_{\mathrm{IN}}$, but for $|i\rangle \in E_{\mathrm{OUT}}$, $\tilde E_i$ is chosen such that $e^{-\beta\tilde E_i}/\tilde Z = p_i$. Physically, this implies replacing the energy levels with small occupation probability $p_i$ by much higher energy levels such that their thermal occupation probability is as small as $p_i$. The Hamiltonian associated with $\tilde\gamma$ is accordingly $\tilde H := \sum_{\mathrm{IN}} E_i|i\rangle\langle i| + \sum_{\mathrm{OUT}} \tilde E_i|i\rangle\langle i|$. The normalising factor is $\tilde Z = \sum_i e^{-\beta\tilde E_i}$. These definitions imply that $\tilde Z = \sum_{i \in E_{\mathrm{IN}}} e^{-\beta E_i}/(1 - p(\mathrm{OUT}))$ (A1).

Apart from the given actual protocol, we also design a ∼-protocol such that it gives the same worst-case work in the case of $\tilde\gamma$ as the initial state. We define the ∼-protocol as beginning with $\tilde H$, then lowering the OUT levels back to $E_i$, i.e. setting $\tilde H \to H$. After that it is the same as the actual protocol. We call the ∼-protocol applied to $\tilde\gamma$ "the ∼-scenario." In the ∼-scenario we similarly have $\tilde T_{\mathrm{IN}}$ and $\tilde T_{\mathrm{OUT}}$, and $\tilde w^0_{\mathrm{IN}}$. The following holds: $\tilde w^0_{\mathrm{IN}} = w^0_{\mathrm{IN}}$ (A2), i.e. the worst-case work is the same in the ∼-scenario as in the actual scenario, for the $T_{\mathrm{IN}}$ subset of trajectories. This is because the protocol is defined such that the added initial step in the ∼-scenario only involves the OUT levels. The sets of possible work values are the same in $T_{\mathrm{IN}}$ and $\tilde T_{\mathrm{IN}}$.

We now make the following mild restriction on the protocols allowed: $\tilde w^0_{\mathrm{OUT}} \le \tilde w^0_{\mathrm{IN}}$ (A3). We say this is mild, because the trajectories in $\tilde T_{\mathrm{OUT}}$ have an extra work gain relative to their sister trajectories in $T_{\mathrm{OUT}}$, following from their initial lowering. This gain tends to infinity as $p(\mathrm{OUT}) \to 0$. The restriction of equation (A3) then means that the negative infinite work from a $\tilde T_{\mathrm{OUT}}$ trajectory is not a worse work cost than that from a $\tilde T_{\mathrm{IN}}$ trajectory. Combining Eqs. A2 and A3 gives the desired expression used in the main body: $w^0_{\mathrm{IN}} = \tilde w^0$.

Appendix B: Smooth relative entropy
As noted in the main body, the optimum term reduces to a relative entropy in a special case. If $H(\lambda_0) = H(\lambda_f)$ and $p(\mathrm{OUT}) \to 0$, then $\log(Z_f/\tilde Z) = D_0(\rho_0\|\gamma)$, which for fully degenerate energy levels equals $\log d - S_{\max}(\rho_0)$ (noting γ = 1/d). This recovers the known results from [3, 5, 6] that these are optimal in the respective cases. If p(OUT) defined above is not necessarily zero, this optimal term depends on which levels are chosen to be in $E_{\mathrm{OUT}}$. If one chooses the best cut between IN and OUT, in the sense of minimising $\tilde Z$ and thus the worst-case work, the optimal term becomes in those cases $D_0^\epsilon(\rho_0\|\gamma) := \max D_0(\rho'\|\gamma)$ such that $d(\rho_0, \rho') \le \epsilon$, where d is the trace distance (this is called the smooth relative entropy). The interpretation is that the optimal worst-case work allowing for an error tolerance of $\epsilon = p(\mathrm{OUT})$ is $kT\, D_0^\epsilon(\rho_0\|\gamma)$, consistent with [3, 5, 6].
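To illustrate what "choosing the best cut" involves, the following sketch searches by brute force over all candidate OUT sets whose total probability does not exceed a tolerance ε, and returns the one that minimises $\tilde Z = \sum_{i\in E_{\mathrm{IN}}} e^{-\beta E_i}/(1 - p(\mathrm{OUT}))$. The exhaustive search, the function name `best_cut` and the example numbers are illustrative assumptions; the search is only practical for small level numbers.

```python
import itertools
import math

def best_cut(p, energies, eps, beta=1.0):
    """Return (Z_tilde, OUT set) minimising Z_tilde subject to p(OUT) <= eps."""
    n = len(p)
    best = None
    for size in range(n):  # never cut away every level
        for out in itertools.combinations(range(n), size):
            p_out = sum(p[i] for i in out)
            if p_out > eps:
                continue
            z_in = sum(math.exp(-beta * energies[i]) for i in range(n) if i not in out)
            z_tilde = z_in / (1.0 - p_out)
            if best is None or z_tilde < best[0]:
                best = (z_tilde, set(out))
    return best

# Hypothetical example: four degenerate levels, two of them rarely occupied.
p = [0.6, 0.3, 0.07, 0.03]
energies = [0.0, 0.0, 0.0, 0.0]
z_tilde, out = best_cut(p, energies, eps=0.11)
print(z_tilde, out)
```

In this example the search discards the two least likely levels, which shrinks the effective support from four levels to two.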
Appendix C: Cutting the work-tail, as well as the state-tail

There can actually be (sets of) trajectories which are unlikely even if the initial state of the trajectory is likely, as the hopping probability may be low. For example, if one lifts one level towards a very high value whilst thermalizing, there is one trajectory corresponding to staying in that level throughout, which would then be the one that gives the worst-case work. However, if this is very unlikely one would wish to ignore such a trajectory when stating the worst-case work. In this section we show a way to do that, by cutting off not only a part of the initial state as previously, but also a part of the work distribution. This gives a different penalty term (lower in general) in the equality for the worst-case work.
Proof overview-We shall again take the initial density matrix to have the form $\rho_0 = \sum_{i=1}^{d} p_i |i_0, \lambda_0\rangle\langle i_0, \lambda_0|$, not necessarily a thermal state. Then a sequence of Hamiltonian changes and thermalizations as described above is applied. This induces some work probability distribution and some worst-case work for the trajectories of interest. The argument is split in two. First, we define a set of trajectories of interest: some trajectories are unlikely enough to be ignorable. We derive the worst-case work for that set. Next, we consider the probability that some trajectory is in that set. Combining these two parts gives our new equality for worst-case work.
The set of trajectories of interest-We wish to ignore unlikely trajectories. We identify a set of trajectories of interest, defined as excluding trajectories of two types:
1. $\rho_0$-tail trajectories: These are those which are called $T_{\mathrm{OUT}}$ above, i.e. trajectories which start in $E_{\mathrm{OUT}}$. We now call them $\rho_0$-tail trajectories, as using the IN/OUT labels alone risks generating confusion because of the second type of cut we shall make on the set of trajectories.
2. Work-tail trajectories: We also ignore trajectories associated with the worst work values, if those values are sufficiently improbable. This ignoring amounts to cutting off the worst-case tail of the work probability distribution. To simplify the proof, we define this tail in terms of the work probability distribution of the fictional thermal state $\tilde\gamma$. By "the work-tail," we mean the set of trajectories associated with the following work values w: If the initial state is $\tilde\gamma$, there is an associated work probability distribution $\tilde p_{\mathrm{fwd}}(w)$ for the given protocol, and an associated worst-case work $w^\epsilon$ (the work guaranteed, up to probability ε, not to be exceeded). The work-tail trajectories are by definition those with work cost $w > w^\epsilon$. Since the actual initial state $\rho_0$ may differ from $\tilde\gamma$, the probability that some trajectory begins in the work tail does not necessarily equal ε.
These sets are depicted in Fig. 5. We shall call the worst-case work in the set of interest $w^0_{\mathrm{IN,IN}}$.
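The worst-case work in that set-We now derive the worst-case work in that set of trajectories, i.e. we maximise the work cost w over that set of interest. We shall for the first part draw inspiration from an argument in [15] concerning scenarios where Crooks' Theorem holds. Take the initial state of the forwards process to be $\rho_0 = \tilde\gamma$, and the initial state of the reverse process to be $\gamma = e^{-\beta H(\lambda_f)}/Z_f$. Maximize Crooks' Theorem over the support of $\tilde p_{\mathrm{fwd}}(w)$ [15]: $\max_w \tilde p_{\mathrm{fwd}}(w)/p_{\mathrm{rev}}(-w) = \max_w (Z_f/\tilde Z)\, e^{\beta w}$. The RHS is monotonic in w, so maximizing the RHS over the support of $\tilde p_{\mathrm{fwd}}(w)$ leads to the maximum w-value $\tilde w^0$. Taking the logarithm and recalling the $D_\infty$ definition yields [15] $\beta\tilde w^0 = D_\infty(\tilde p_{\mathrm{fwd}}(w)\|p_{\mathrm{rev}}(-w)) - \log(Z_f/\tilde Z)$.

Now, we cut off the work tail by defining a cut-off probability distribution: $\tilde p^\epsilon_{\mathrm{fwd}}(w) := 0$ if $w > w^\epsilon$, and $\tilde p^\epsilon_{\mathrm{fwd}}(w) := \tilde p_{\mathrm{fwd}}(w)/(1-\epsilon)$ otherwise, wherein $w^\epsilon$ denotes the work guaranteed up to probability ε if $\tilde\gamma$ is the initial state. [Dividing by (1 − ε) normalizes the distribution.] For work values outside the work tail, Crooks' Theorem can be reformulated as $\tilde p^\epsilon_{\mathrm{fwd}}(w)/p_{\mathrm{rev}}(-w) = (Z_f/\tilde Z)\, e^{\beta w}/(1-\epsilon)$. Since the RHS is monotonic, $\max_w \tilde p^\epsilon_{\mathrm{fwd}}(w)/p_{\mathrm{rev}}(-w) = (Z_f/\tilde Z)\, e^{\beta w^0_{\mathrm{IN,IN}}}/(1-\epsilon)$, wherein the maximization is over the support of $\tilde p^\epsilon_{\mathrm{fwd}}$. Taking the logarithm and rearranging yields $\beta w^0_{\mathrm{IN,IN}} = D_\infty(\tilde p^\epsilon_{\mathrm{fwd}}(w)\|p_{\mathrm{rev}}(-w)) - \log(Z_f/\tilde Z) + \log(1-\epsilon)$. The LHS is the worst-case work in the set of trajectories of interest.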
Probability that a trajectory is in the set of interest-The trajectories of interest are effectively the possible trajectories. To make precise what is meant by "effective," we bound the probability of not being in that set.
Consider a trajectory followed by a system initialized to $\rho_0$. The probability that the trajectory lies outside the set of interest is bounded by $p(\rho_0\text{-tail}) + p(\text{work-tail})$, as shown in Fig. 5. $p(\rho_0\text{-tail})$, defined via $\rho_0$ and the choice of effective support, is specified by input parameters. $p(\text{work-tail})$ denotes the probability that the trajectory is in the set associated with a worse work cost than $w^\epsilon$ (the work guaranteed, up to probability ε, not to be exceeded if the initial state is $\tilde\gamma$). $p(\text{work-tail})$ does not necessarily equal ε for an arbitrary $\rho_0$. As $p(\text{work-tail})$ is not an input parameter, we wish to bound it with input parameters.
Let us drop the subscript "fwd" and refer simply to p(w). The weight $p(w > x)$ in the actual work tail with $\rho_0$ cannot differ arbitrarily from the weight $\tilde p(w > x)$ in the work tail associated with $\tilde\gamma$: $|p(w > x) - \tilde p(w > x)| \le d(p(w), \tilde p(w))$. This bound follows from the definition of the variation distance d, which equals the trace distance between diagonal states. The variation distance d is contractive under stochastic matrices, because the trace distance is contractive under completely positive trace-preserving (CPTP) maps. We note that the work distribution is the result of a stochastic matrix acting on the probability distribution over initial energy eigenstates. Let us now in this paragraph for convenience use Dirac notation for classical probability vectors, representing a probability distribution p(w) as $\langle w|p\rangle$. The work distribution comes from the stochastic matrix $\sum_j |p_j\rangle\langle j|$ mapping a state $|\rho_0\rangle$ to a work distribution, wherein j labels projectors onto $H(\lambda_0)$ eigenstates, $|p_j\rangle$ labels the work distribution when starting with an initial state $|j\rangle$ (i.e. $p_j(w) = \langle w|p_j\rangle$), and $|\rho_0\rangle = \sum_j q_j|j\rangle$. For example, if there are two possible eigenstates, we can write $|\rho_0\rangle = q_1|1\rangle + q_2|2\rangle = (q_1, q_2)^T$, and the resulting work distribution is $p(w) = (\langle w|p_1\rangle\langle 1| + \langle w|p_2\rangle\langle 2|)|\rho_0\rangle = q_1 p_1(w) + q_2 p_2(w)$. Thus, $d(p(w), \tilde p(w)) \le d(\rho_0, \tilde\gamma)$. For some $x = x^\epsilon$, by definition, $\tilde p(w > x^\epsilon) = \tilde p(\text{work-tail}) = \epsilon$, and $p(\text{work-tail}) := p(w > x^\epsilon)$. Thus $p(\text{work-tail}) \le \epsilon + d(\rho_0, \tilde\gamma)$.

Main result, also cutting work tail-We conclude that the worst-case work from the trajectories of interest obeys $\beta w^0_{\mathrm{IN,IN}} = D_\infty(\tilde p^\epsilon_{\mathrm{fwd}}(w)\|p_{\mathrm{rev}}(-w)) - \log(Z_f/\tilde Z) + \log(1-\epsilon)$. The probability that the trajectory is not in the set of interest is upper bounded by $p(\rho\text{-tail}) + p(\text{work-tail}) \le p(\rho\text{-tail}) + d(p_i, \tilde\gamma) + \epsilon$.
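The contractivity argument used above can be illustrated numerically. The sketch below builds two work distributions from the same conditional distributions $p_j(w)$ but different initial occupation probabilities, and checks that the difference in tail weight is bounded by the variation distance between the work distributions, which is in turn bounded by that between the initial distributions. All numerical values and function names are hypothetical.

```python
def variation_distance(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def work_distribution(initial, conditional):
    """conditional[j][k] = probability of work value k given initial eigenstate j."""
    n_work = len(conditional[0])
    return [
        sum(initial[j] * conditional[j][k] for j in range(len(initial)))
        for k in range(n_work)
    ]

def tail_weight(dist, work_values, x):
    """Probability that the work exceeds x."""
    return sum(pw for pw, w in zip(dist, work_values) if w > x)

work_values = [0.0, 0.5, 1.0]
conditional = [[0.8, 0.2, 0.0], [0.1, 0.3, 0.6]]
q_actual = [0.9, 0.1]     # plays the role of rho_0
q_thermal = [0.7, 0.3]    # plays the role of the fictitious thermal state

p_actual = work_distribution(q_actual, conditional)
p_thermal = work_distribution(q_thermal, conditional)

gap = abs(tail_weight(p_actual, work_values, 0.5) - tail_weight(p_thermal, work_values, 0.5))
print(gap, variation_distance(p_actual, p_thermal), variation_distance(q_actual, q_thermal))
```

The three printed numbers are non-decreasing from left to right, which is the chain of inequalities used to bound p(work-tail).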

From discrete to continuous
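We consider a discrete sequence of times, $t_m = t_0 + m\,dt$ (m = 0, 1, 2, ...), and the sequence $\lambda_m \equiv \lambda(t_m)$ of values of the external parameter. As the waiting time decreases (dt → 0), the transition probability $p(|i, \lambda(t), t\rangle \to |j, \lambda(t+dt), t+dt\rangle)$ due to thermalization should vanish for j ≠ i. To first order, it behaves as $p(|i, \lambda(t), t\rangle \to |j, \lambda(t+dt), t+dt\rangle) = \delta_{ij} + \Gamma_{i\to j}(t)\,dt$. The transition rate $\Gamma_{i\to j}(t)$ is a possibly complicated function of the instantaneous energy levels $E(|i, \lambda(t), t\rangle)$. However, the transition rates inherit the condition $\Gamma_{i\to j}(t)/\Gamma_{j\to i}(t) = e^{-\beta(E_j - E_i)}$ from detailed balance and the condition $\sum_j \Gamma_{i\to j}(t) = 0$ (D3) from probability conservation. The occupation probability is $p_j(t+dt) = \sum_i p_i(t)\, p(|i, \lambda(t), t\rangle \to |j, \lambda(t+dt), t+dt\rangle)$. If the occupation probability is a smooth function of time, the master equation $dp_j/dt = \sum_i \Gamma_{i\to j}(t)\, p_i(t)$ follows. The equivalence is further illustrated in Appendix E in the example of an electron box.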
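The discrete update rule and its continuum limit can be illustrated for a two-level system. The sketch below assumes a particular choice of rates that satisfies detailed balance and the probability-conservation condition (D3); the rate form in `rate_matrix`, its overall scale `g`, and the numerical values are hypothetical. At fixed level spacing, repeated first-order steps relax the occupation probabilities to the Gibbs distribution.

```python
import math

def rate_matrix(gap, beta=1.0, g=1.0):
    """rates[i][j] = Gamma_{i->j} for two levels with spacing `gap`; rows sum to zero."""
    up = g * math.exp(-beta * gap) / (1.0 + math.exp(-beta * gap))
    down = g / (1.0 + math.exp(-beta * gap))
    return [[-up, up], [down, -down]]

def step(p, rates, dt):
    """One discrete step: p_j(t+dt) = sum_i p_i(t) (delta_ij + Gamma_{i->j} dt)."""
    n = len(p)
    return [
        sum(p[i] * ((1.0 if i == j else 0.0) + rates[i][j] * dt) for i in range(n))
        for j in range(n)
    ]

beta, gap, dt = 1.0, 0.8, 0.01
rates = rate_matrix(gap, beta)
p = [0.2, 0.8]                      # arbitrary non-thermal starting point
for _ in range(5000):
    p = step(p, rates, dt)

gibbs_upper = math.exp(-beta * gap) / (1.0 + math.exp(-beta * gap))
print(p, gibbs_upper)
```

Probability is conserved at every step because each row of the rate matrix sums to zero, and the fixed point is thermal because the rates obey detailed balance.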

From continuous to discrete
Going in the other direction, we now show explicitly how the discrete-time model can be derived from a physical master equation. Consider a two-level system that has a state $|0\rangle$, kept at zero energy, and a state $|1\rangle$ whose energy ω(t) changes. The Hamiltonian is $H(t) = \omega(t)|1\rangle\langle 1|$, and the system interacts with a temperature-T heat bath. In [23], a master equation for the density matrix ρ(t) was derived for such a system. In the present case, the master equation for $\dot\rho(t)$ is Eq. (D5). The heat bath's thermal photon number $n_{\mathrm{th}}(\omega) = (e^{\beta\omega} - 1)^{-1}$ depends on time because the upper level shifts. d(ω) is the dimensionless heat-bath density of states; Γ denotes a rate assumed to be constant; $\sigma_- = |0\rangle\langle 1|$ denotes the usual lowering operator; and $\sigma_+ = \sigma_-^\dagger$. Equation (D5) has the form of the usual Lindblad master equation, but the Lindblad operator depends on time. The dependence arises only from the level spacing's time dependence. The Hamiltonian part contains the Lamb shift.
In the derivation of Eq. (D5) one assumes, as usual, weak coupling to the heat bath, the Markovian approximation, and the rotating-wave approximation. One also assumes that the adiabatic approximation holds, i.e. the system always remains in its time-local energy eigenstates when the interaction with the heat bath is ignored. This condition is always fulfilled under the assumption, made in this section, of vanishing energy coherences at all times. Indeed, the part of (D5) pertaining to the diagonal elements of ρ(t) can be derived without the adiabatic assumption [24].
We now consider discrete times t n := n∆t, n = 0, . . . , N , with ω(t) constant during the time intervals ∆t, ω n := ω(t n ). Restricting ourselves to changes of the Hamiltonian that only involve its spectrum, H(t) and L(t) are constant during a given time interval.
Consider first the Hamiltonian changes. Heisenberg's equation of motion for the system-and-bath composite implies that $\dot\rho(t)$ has a finite jump when the Hamiltonian has a finite jump. Therefore, ρ(t) is continuous when the Hamiltonian has a finite jump. Hence for finite Hamiltonian changes during a time δt, the system-and-bath composite's density matrix is unchanged in the limit as δt → 0. Hence the system's reduced density matrix is unchanged during the instantaneous shift of energy levels.

As for the relaxation process, the initial thermal state is described in terms of occupation probabilities $p_n$ for the n-th level. The evolution during the relaxation process is given by $p(t) = e^{Tt}p(0)$, where T is a matrix that connects the diagonal matrix elements of ρ in the master equation (D5), $\dot\rho_{nn} = \sum_m T_{nm}\rho_{mm}$. The transition rates $T_{nm}$ inherit detailed balance from the rates appearing in the master equation, i.e. $T_{ij} = e^{-\beta(E_i - E_j)}\, T_{ji}$, with $E_i$ the energy of level i. Expanding $e^{Tt}$ into a power series, one realizes that for each power $T^k$ of T detailed balance holds, i.e. $(T^k)_{ij} = e^{-\beta(E_i - E_j)}\, (T^k)_{ji}$ for all $k \in \mathbb{N}$, and therefore also for $e^{Tt}$. We thus have derived, from a physical model of a system that is coupled to a heat bath and whose energy levels are piecewise constant, the discrete-time model considered in the paper.
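The claim that detailed balance carries over from the rate matrix T to the finite-time propagator $e^{Tt}$ can be checked numerically. The sketch below builds a three-level rate matrix whose off-diagonal elements satisfy $T_{ij} = e^{-\beta(E_i - E_j)}T_{ji}$, evaluates the matrix exponential by a truncated power series, and compares the two sides of the detailed-balance relation. The energies, the symmetric couplings `sym`, the elapsed time, and the truncation order are hypothetical choices.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm(a, terms=40):
    """Matrix exponential via a truncated power series (adequate for small norms)."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

beta = 1.0
energies = [0.0, 0.5, 1.3]
sym = [[0.0, 0.4, 0.2], [0.4, 0.0, 0.7], [0.2, 0.7, 0.0]]
n = len(energies)

# Off-diagonal rates obeying detailed balance; T[i][j] is the rate from j to i.
T = [[sym[i][j] * math.exp(-beta * (energies[i] - energies[j]) / 2.0) for j in range(n)]
     for i in range(n)]
for j in range(n):
    T[j][j] = -sum(T[i][j] for i in range(n) if i != j)  # columns sum to zero

t = 2.0
P = expm([[x * t for x in row] for row in T])
print(P[0][1], math.exp(-beta * (energies[0] - energies[1])) * P[1][0])
```

The two printed numbers agree, and the same holds for every pair of levels, so the hopping probabilities over a finite relaxation interval satisfy the detailed-balance condition assumed in the discrete-time model.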
Appendix E: Application to solid-state system: Electron box

To demonstrate the physical relevance of our results, we take a realistic example, the so-called electron box, and apply our results to it. We first derive a time-local master equation for the level-occupation probabilities in Appendix E 1. As shown in Appendix D, it is equivalent to the discrete-time trajectory model discussed in the main text. Then the work distribution functions are analyzed numerically in Appendix E 2 and analytically in Appendix E 3. Finally, we provide an upper bound of the penalty term $D_\infty$, which reveals the direct physical relevance of our results.

Theoretical model and its justification
We consider the type of system in [16][17][18]. Following a semiclassical theory (known as "the orthodox theory") such as in [25], we derive a master equation and illustrate the work fluctuations. While a more complete quantum description is possible [e.g., 24], the semiclassical approach is useful for interpreting and identifying work and heat, which are often ambiguous.
The system (Fig. 6) consists of a large metallic electrode R that serves as a charge reservoir, a small metallic island (or quantum dot) D, and a gate electrode. The island D is coupled only capacitively to the gate electrode but couples to the reservoir R both capacitively and via tunnelling. The Hamiltonian has four parts: $H = H_R + H_D + H_C + H_T$. The first two terms, $H_R = \sum_k \varepsilon_k c_k^\dagger c_k$ and $H_D = \sum_q \varepsilon_q d_q^\dagger d_q$, describe the non-interacting parts of the electrode R and the island D. Here, $c_k^\dagger$ ($d_q^\dagger$) creates an electron with momentum $k$ ($q$) and energy $\varepsilon_k$ ($\varepsilon_q$). The single-particle dispersions $\varepsilon_k$ and $\varepsilon_q$ form continua of energy levels. $H_C$ signifies the Coulomb interaction among electrons confined in the island. Describing it within the capacitor model is sufficient: [...] wherein $C_J$ and $C_g$ denote the junction and gate capacitances, and $Q_J$ and $Q_g$ are the equilibrium charges stored on them. One can find that [...] wherein $C := C_g C_J/(C_g + C_J)$ is the system's effective capacitance and $N = \sum_k d_k^\dagger d_k$ is the number of excess electrons on the island D. $H_C$ can thus be rewritten as [...] wherein $E_C := e^2/[2(C_g + C_J)]$ is the single-electron charging energy, one of the largest energy scales of the system. Finally, the tunnelling of electrons between R and D is described by $H_T = \sum_{k,q} (t_{kq} c_k^\dagger d_q + \mathrm{h.c.})$, wherein $t_{kq}$ is the tunnelling amplitude. For common metals, which have wide conduction bands, $t_{kq} = t_d$ is independent of the momenta (or energy). We are primarily interested in the macroscopic variable $N$ rather than in the microscopic degrees of freedom $c_k$ and $d_q$, whose dynamics are typically much faster. One can thus integrate out $c_k$ and $d_q$ to get an effective Hamiltonian expressed only in terms of $N$. In the semiclassical approach, this can be achieved by considering the energy that an electron gains by tunnelling.
Suppose that an electron tunnels into the island D from the reservoir R. This changes the charge $Q_J \to Q_J - e$ and the excess number of electrons $N \to N + 1$. The new charge configuration, right after the tunnelling, is quickly redistributed to a new equilibrium configuration by the gate voltage source. The voltage source moves the amount of charge [...] through the transmission line from the junction interface to the gate capacitor by doing the amount of work [...] on the system. Therefore, the electron's overall energy gain $\Delta E$ is given by the work $W$ minus the change in the electrostatic energy: [...] As this energy gain comes from the transition $N \to N + 1$, the effective Hamiltonian for the macroscopic variable $N$ can be regarded as [...] wherein $N_g := C_g V_g/e$. Recall that the second term comes from the work done on the system by the voltage source. The remaining effect of the microscopic degrees of freedom that have been removed from the macroscopic effective model is to make $N$ fluctuate randomly. As the transition $N \to N \pm 1$ is associated with tunnelling of an electron into or out of the island, the transition rate can be obtained from Fermi's Golden Rule: [...] wherein $\rho_R$ and $\rho_D$ are the densities of states of R and D, respectively, and [...]. Finally, at sufficiently low temperatures ($\beta E_C \gg 1$), higher charging levels play no role, and considering the two lowest levels $N = 0$ and $N = 1$ is sufficient for $N_g \in [0, 1]$. Together with Eqs. (E10) and (E11), this two-level approximation leads to the master equation $\dot{p} =$ [...] wherein the transition rates are $\Gamma_\pm(t) := \Gamma(\pm\epsilon(t))$ and $\Gamma(\epsilon) :=$ [...]. Here, $\varepsilon_c$ is the bath's high-frequency cutoff (i.e., $\hbar/\varepsilon_c$ is the correlation time), and $\Gamma_0$ is a constant that characterizes the strength of the coupling to the bath. $\Gamma_0/\varepsilon_c$ is related to the material properties by $\Gamma_0/\varepsilon_c = 2\pi |t_d|^2 \rho_R \rho_D/\hbar$. Note that the transition rates satisfy the detailed-balance relation [...]. The time-local master equation (E13) is equivalent to the discrete-time trajectory model (see Appendix D).
Therefore, the electron box is a realistic prototype system to which our results can apply.
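As an illustration of how a two-level rate equation with detailed-balanced rates relaxes to the Gibbs occupation, consider the following minimal sketch. It rests on explicit assumptions: the master equation is taken to have the form $\dot{p}_1 = \Gamma_+ p_0 - \Gamma_- p_1$, with $\Gamma_+$ the rate for $N = 0 \to 1$, and the function `rate` is a hypothetical stand-in, $\Gamma(\epsilon) = \Gamma_0\, \epsilon/(e^{\beta\epsilon} - 1)$, chosen only because it satisfies $\Gamma(\epsilon)/\Gamma(-\epsilon) = e^{-\beta\epsilon}$. It is not the rate of the electron box, which also involves the cutoff $\varepsilon_c$.

```python
import math

def rate(energy, beta, gamma0=1.0):
    """Hypothetical rate Gamma(energy) for a jump that costs `energy`.

    Satisfies rate(e) / rate(-e) = exp(-beta * e). Not the paper's rate.
    """
    x = beta * energy
    if abs(x) < 1e-12:
        return gamma0 / beta  # limit of energy / (exp(beta * energy) - 1)
    return gamma0 * energy / math.expm1(x)

def relax(p1_initial, splitting, beta, dt, n_steps):
    """Euler steps p1 <- p1 + dt * (Gamma_+ p0 - Gamma_- p1) at fixed splitting."""
    gamma_up = rate(splitting, beta)     # assumed rate for 0 -> 1
    gamma_down = rate(-splitting, beta)  # assumed rate for 1 -> 0
    p1 = p1_initial
    for _ in range(n_steps):
        p1 += dt * (gamma_up * (1.0 - p1) - gamma_down * p1)
    return p1

# Illustrative parameters in units where kT = 1 (assumptions).
beta, splitting = 1.0, 1.0
p1_final = relax(p1_initial=0.5, splitting=splitting, beta=beta,
                 dt=0.01, n_steps=2000)
p1_gibbs = 1.0 / (1.0 + math.exp(beta * splitting))
```

With these choices `p1_final` should agree with `p1_gibbs` to within rounding, since the fixed point of the Euler map coincides with the stationary state of the rate equation.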

Monte Carlo simulation of the Electron Box
We performed a Monte Carlo simulation of an erasure protocol in the electron box set-up. Our simulation discretizes the protocol into time steps $\delta t$ that are small enough to justify the linear approximation that the population of level $i$ evolves from time $t$ to $t + \delta t$ according to $p_i(t + \delta t) = p_i(t) + \delta t\, \dot{p}_i(t)$. Using Eq. (E13), we can write a stochastic matrix acting on the probabilities: [...] For a two-level system which does not build up quantum coherences, a stochastic thermalizing matrix (which by its definition evolves all states towards the Gibbs state) has only one degree of freedom remaining once the Gibbs state has been chosen: the speed of the thermalization. This means that all models of two-level thermalizations for a given Gibbs state are equivalent. For our simulation we pick the conceptually straightforward partial swap, in which with some probability $p_{\rm sw}$ the current state of the system is exchanged with the Gibbs state, and otherwise it is unchanged: $M_{\rm swap} = (1 - p_{\rm sw})\mathbb{1} + p_{\rm sw} |{\rm Gibbs}\rangle\langle{\rm ones}|$, where $|{\rm ones}\rangle$ means the vector of 1's. For a Gibbs state associated with an energy level splitting $\epsilon$, we can write this explicitly as: [...] Equating Eq. (E16) with Eq. (E17), we can find the partial swap probability in terms of the physical parameters of the electron box: [...] where we have written the swap probability $p_{\rm sw}(t)$ and the energy level splitting $\epsilon(t)$ as functions of time, to stress that this swap probability changes as the protocol evolves. Note that the probability changes only as a function of an external parameter, the splitting (as opposed to, e.g., the current state), and so Crooks' Theorem is still applicable to thermalizations of this type. In our Monte Carlo simulation in Fig. 7, we randomly generate trajectories by picking a random initial microstate according to the initial state probability distribution, and then evolve the system by small steps, testing at each step if a swap should occur (with probability $p_{\rm sw}$). If it does, we replace the state with a new microstate randomly chosen from the Gibbs state associated with the current Hamiltonian. By recording which microstate is occupied when the energy level is raised, we calculate the work cost associated with a particular trajectory. Repeated runs of the simulation allow us to build up a work distribution, to which the results in this paper can be applied.
Figure 7: Work guaranteed to be extracted from a Szilárd engine up to probability $\epsilon$: $w^\epsilon$. A Monte Carlo simulation was used to predict the work from the single-electron box. $w^\epsilon$ approaches $kT \ln 2$ as a function of the protocol's speed. For smaller $\epsilon$, $w^\epsilon$ approaches from below; and for higher, from above.
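The trajectory sampling described above can be sketched as follows. This is a simplified illustration, not the simulation used for Fig. 7: the swap probability is held constant instead of depending on the splitting, the splitting is raised in a linear ramp from zero to a hypothetical `e_max`, and all function names and parameter values are assumptions. As a consistency check, the sampled average of $e^{-\beta W}$ is compared with the ratio of final to initial partition functions, which Jarzynski's equality predicts for a protocol starting in the Gibbs state.

```python
import math
import random

def gibbs_excited_probability(splitting, beta):
    """Gibbs occupation of the upper level for a given level splitting."""
    return 1.0 / (1.0 + math.exp(beta * splitting))

def sample_work(rng, e_max, n_steps, p_swap, beta):
    """Work cost of one trajectory of a linear ramp with partial swaps."""
    # Initial microstate drawn from the Gibbs state at zero splitting.
    state = 1 if rng.random() < gibbs_excited_probability(0.0, beta) else 0
    d_eps = e_max / n_steps
    work = 0.0
    for step in range(1, n_steps + 1):
        # Raising the upper level costs work only if that level is occupied.
        if state == 1:
            work += d_eps
        # Partial swap: with probability p_swap, redraw the microstate from
        # the Gibbs state of the current Hamiltonian.
        if rng.random() < p_swap:
            p_up = gibbs_excited_probability(step * d_eps, beta)
            state = 1 if rng.random() < p_up else 0
    return work

def work_distribution(n_traj, e_max, n_steps, p_swap, beta, seed=0):
    """Sampled work costs from independent trajectories."""
    rng = random.Random(seed)
    return [sample_work(rng, e_max, n_steps, p_swap, beta) for _ in range(n_traj)]

# Illustrative parameters in units where kT = 1 (assumptions).
beta, e_max = 1.0, 3.0
works = work_distribution(n_traj=5000, e_max=e_max, n_steps=100,
                          p_swap=0.2, beta=beta)
jarzynski_average = sum(math.exp(-beta * w) for w in works) / len(works)
expected_average = (1.0 + math.exp(-beta * e_max)) / 2.0  # Z_final / Z_initial
```

A histogram of `works` then plays the role of the sampled work distribution. Making `p_swap` a function of the current splitting, as in the text, only requires replacing the constant inside the loop.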
The total work distribution function can be written as a series, $P(W) = p_0 e^{-S_0(0)} \delta(W) + p_1 e^{-S_0(1)} \delta(W - W_c) + \sum_{J=1}^{\infty} \sum_{\sigma_0} p_{\sigma_0} P_J(W; \sigma_0)$ (E24). $P_J(W)$ has a factor of $(\Gamma_0^2 e^{-\beta\epsilon})^J$, and at low temperatures $P_J$ is rapidly suppressed as $J$ increases. The expression (E24) for the work distribution is essentially a perturbative expansion in $\Gamma_0^2$ and converges very quickly for small $\Gamma_0$. For large $\Gamma_0$, however, its slow convergence makes it impractical for actual calculations. It is therefore useful to devise a more general method, and we examine the characteristic function $Z(\xi) = \langle e^{\xi w} \rangle$ of the work distribution function $P(W)$. We first consider the characteristic function $Z_\sigma(\xi) = \langle e^{\xi w} \rangle_\sigma$, conditioned on all trajectories starting from a definite initial state $\sigma_0$. Regarded as a function of the operation time $\tau$, $Z_\sigma(\xi; \tau)$ satisfies the master equation [...] and the initial condition $Z_\sigma(\xi; 0) = e^{\xi \epsilon_\sigma(0)}$.
Compared with the original master equation (E13) for the level occupation probability, the new master equation (E25) for the characteristic function contains additional diagonal terms. The full characteristic function is then given by [...] Recall that $Z(\xi)$ contains the same information as $P(W)$. Indeed, one can calculate $P(W)$ itself and, as shown in Section E 4 below, a bound for $D_\infty(P_{\rm fwd}(W) \| P_{\rm rev}(-W))$.
Let us now show that the work distribution in Eq. (E24) satisfies the Crooks fluctuation theorem: