A measure of majorization emerging from single-shot statistical mechanics

The use of the von Neumann entropy in formulating the laws of thermodynamics has recently been challenged. It is associated with the average work, whereas the work guaranteed to be extracted in any single run of an experiment is in general the more relevant quantity. We show that an expression that quantifies majorization determines the optimal guaranteed work. We argue it should therefore be the central quantity of statistical mechanics, rather than the von Neumann entropy. In the limit of many identical and independent subsystems (asymptotic i.i.d.) the von Neumann entropy expressions are recovered, but in the non-equilibrium regime the optimal guaranteed work can be radically different from the optimal average. Moreover, our measure of majorization governs which evolutions can be realized via thermal interactions, whereas the non-decrease of the von Neumann entropy is not sufficiently restrictive. Our results are inspired by single-shot information theory.

General introduction.-The relationship between information, quantified by entropy, and work has been at the centre of much intriguing and arguably very productive debate, cf. Maxwell's demon [1][2][3], Szilard's engine [4], Landauer's erasure [5], and Bennett's reversible measurements [6]. To our knowledge, connecting smooth entropies [7,8] with work was first considered in [9]. Particularly important for [9] and our considerations here is the notion of a Szilard engine [4] and Bennett's extensions thereof [6]. A Szilard engine extracts work from a heat bath at the cost of using up knowledge about the microstate of the working medium. It was shown in [9] that one should not in general use the Shannon/von Neumann entropy to quantify the extractable work in the Szilard engine, but rather the smooth entropies. This has been followed by several results. In [10] it was shown how to interpret negative conditional entropy in these settings ([9] does not deal with conditional entropy). Very recently, in [11] and independently [12], the non-conditional case was considered in a significantly more sophisticated and general manner than in [9]. Taken together, these articles indicate that a neat and greatly generalised statistical mechanics, tentatively dubbed single-shot statistical mechanics, is emerging. As mentioned above, a key advantage of this approach is that one can answer questions such as "how much work can I extract in any given go (single-shot extraction) with a probability x of success?". In standard thermodynamics one would instead ask what the potential average work output is. To see that in general these are two very different questions, consider a scenario where a threshold needs to be exceeded. One may, e.g., need to lift a weight onto a table of given height, or get an electron into the conduction band. If there is a significant variance around the average, as is common in nano-scale scenarios, one cannot infer the probability of reaching the threshold from the average alone. It is even possible for the average to be above the threshold whilst, with high probability, the threshold is not exceeded.
In this Letter we take this approach much further. We give an expression for the extractable work in a single extraction which reduces to the expressions of [9,11,12] in the appropriate limits. We take the system constituting the working medium of the generalised Szilard engine to have a given but arbitrary set of energy levels before the extraction. Moreover, there is an arbitrary probability distribution over these levels, representing our knowledge thereof. Similarly, we take the post-extraction energy and probability distributions to be fixed but arbitrary. (In [11,12] the initial and final energy levels are taken to be the same, and one of the two states is taken to be the Gibbs state.) Our finding is that the maximal work that can be extracted given these initial and final conditions is determined by a, to our knowledge, new measure of how much one distribution majorises another. The expression reduces to the standard von Neumann entropy result only in very specific limits, implying that the associated standard laws of thermodynamics need to be modified in general. We use our results to propose new laws of thermodynamics. In particular, the second law requires a significant tightening.
Single-shot statistical mechanics-relevant key result.-We begin by briefly reviewing key results that we shall later recover as special cases of our expression. (This is thus not an exhaustive list of all previous results.) The details of the models of work extraction in the different papers are not a priori identical, but we shall recover the same expressions within the model used here.
[arXiv:1207.0434v2 [quant-ph], 11 Jan 2013]
In [9] an n-cylinder Szilard engine was considered and the following expression derived:

W^ε = [n − H^ε_max(ρ)] kT ln 2.    (1)

Here W^ε is the work that can be extracted in a process with maximum probability of failure ε. H^ε_max is the smooth max entropy of the density matrix representing the work-extracting agent's initial knowledge about the state of the working medium. It is defined as H^ε_max(ρ) = log(rank_ε(ρ)), with rank_ε(ρ) the number of non-zero eigenvalues minimised over all states within ε trace distance of ρ. (There is also an alternative definition, but the two are known to coincide up to an additive log(1/ε) term, so for simplicity we mention only one definition here.) T is the temperature of the heat bath, and k Boltzmann's constant. H^ε_max(ρ) reduces to the von Neumann entropy in what we shall here call the von Neumann regime, where ρ = τ^⊗n, n → ∞ and ε → 0.
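For diagonal (classical) states this quantity can be sketched in a few lines. The following is a minimal illustration, assuming the common simplification in which rank_ε(ρ) is the smallest number of largest eigenvalues carrying probability weight at least 1 − ε; the rigorous definition optimises over all states within trace distance ε.

```python
import math

def h_max_eps(probs, eps):
    """Sketch of the smooth max entropy of a diagonal state: log2 of the
    smallest number of (largest) eigenvalues whose total weight reaches
    1 - eps.  A simplification of the trace-distance optimisation."""
    p = sorted(probs, reverse=True)
    total, count = 0.0, 0
    for x in p:
        total += x
        count += 1
        if total >= 1 - eps - 1e-12:
            break
    return math.log2(count)

# A working medium whose state is almost known:
probs = [0.97, 0.01, 0.01, 0.01]
print(h_max_eps(probs, 0.0))   # 2.0: all four non-zero eigenvalues count
print(h_max_eps(probs, 0.05))  # 0.0: smoothing discards the small tail
```

For such peaked distributions the drop from 2 to 0 is what makes the ε-smoothed guaranteed work in Eq. 1 much larger than the worst-case (ε = 0) work.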
A key result obtained independently in the more recent papers [11,12] is that given an initial state ρ and a final thermal state ρ_T over the same energy levels, the work that can be extracted with up to ε failure probability is

W^ε = kT ln(2) D^ε_0(ρ||ρ_T),    (2)

where ρ is taken to be diagonal in the energy eigenbasis, ρ_T is the corresponding thermal state on the same energy levels, and D^ε_0(ρ||ρ_T) is the ε-smooth relative entropy of order 0 (see [13]). This reduces to W = kT ln(2) D(ρ||ρ_T) for the standard relative entropy in the von Neumann regime. That latter expression is well-established, see e.g. [14]. Eq. 2 reduces to Eq. 1 in the case of degenerate energy levels, as shown in [11]. In [12] an expression is also given for the inverse process of taking a thermal state to any diagonal state with the same energy spectrum.
The work extraction game.-Our work extraction model can be thought of as a game with simple but minimal rules. (It will nevertheless not be trivial to analyse, as there is a multitude of different strategies one may choose.) There are three systems and a work-extraction agent. One system is the working medium, another is a heat bath of temperature T, and the last is the work reservoir. The agent wishes to transfer as much energy as possible into the work reservoir in a single extraction. We shall be concerned with quantifying how much energy can be transferred with a maximum failure probability ε, calling this the work, W^ε.
The initial energy spectrum {E} of the working medium is arbitrary. The initial density matrix ρ of the system is diagonal in the energy basis. The final energy spectrum {F} and diagonal density matrix σ are also arbitrary. The agent has a few elementary processes it can combine in any way it chooses: (i) it may couple the working medium to the heat bath. This has the effect of changing the probabilities (not the energy levels) in such a way that they approach, via a stochastic matrix, the Gibbs thermal state for the given energy spectrum; (ii) it may change the energy levels (without altering the probabilities) by external intervention, taking {E}_j to {E}_{j+1}, where j labels the time step. Here the energy must be accounted for by being taken from or given to the work reservoir. In a given realisation the system is in one, possibly unknown, energy eigenstate, and only changes to that particular eigenstate cost or yield work. The combination of these elementary processes the agent chooses is called its strategy. Energy conservation is assumed throughout: any energy leaving or entering the system must enter or leave the heat bath and/or the work reservoir.
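The two elementary processes can be sketched as follows (a minimal sketch with hypothetical function names; the convex move towards the Gibbs distribution in `thermalize` is one particular stochastic matrix leaving the Gibbs state invariant, standing in for the general class the rules allow).

```python
import math

kT = 1.0  # energies measured in units of kT

def thermalize(probs, energies, coupling=1.0):
    """Elementary process (i): move the occupation probabilities towards
    the Gibbs distribution of the current energy levels.  coupling in
    [0, 1] models the time the medium is coupled to the bath; this convex
    mixing is one simple stochastic map leaving the Gibbs state invariant."""
    Z = sum(math.exp(-E / kT) for E in energies)
    gibbs = [math.exp(-E / kT) / Z for E in energies]
    return [(1 - coupling) * p + coupling * g for p, g in zip(probs, gibbs)]

def shift_level(probs, energies, i, dE):
    """Elementary process (ii): change level i by dE, probabilities
    untouched.  The work reservoir pays/receives dE only if the system
    actually occupies level i, which happens with probability probs[i]."""
    new_energies = list(energies)
    new_energies[i] += dE
    return probs, new_energies, probs[i]  # occupation prob. of shifted level

# One unknown bit (two degenerate levels) is already thermal, so coupling
# fully to the bath leaves it unchanged:
print(thermalize([0.5, 0.5], [0.0, 0.0]))  # [0.5, 0.5]
```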
Relative mixedness.-We now introduce a measure of how much more mixed one state ρ is than another, σ, calling this the relative mixedness M(ρ||σ). The definition of M(ρ||σ) will later be justified by the operational statements we make.
Definition 1 (Relative mixedness). The relative mixedness of two states ρ and σ, with compact support and descending-ordered spectra f(x) and g(x) respectively, is given by

M(ρ||σ) := sup{ m > 0 : ∫_0^x g(x')dx' ≥ ∫_0^{mx} f(x')dx'  for all x > 0 }.

For states with discrete spectra {λ_i} one evaluates M for the associated step function, where the i-th 'block' has constant height λ_i and all blocks have width 1.

If and only if M(ρ||σ) ≥ 1 is ρ majorised by σ, i.e. at least as mixed as σ. The actual number M can be viewed as a measure of by how much.
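A numerical sketch of this definition, under our reconstructed form of the condition, with each spectrum represented as a descending list of (height, width) blocks:

```python
def cum(blocks):
    """Integral from 0 to x of a descending step function given as
    (height, width) blocks."""
    def F(x):
        total, pos = 0.0, 0.0
        for h, w in blocks:
            if x <= pos + w:
                return total + h * (x - pos)
            total += h * w
            pos += w
        return total
    return F

def relative_mixedness(f_blocks, g_blocks, eps=0.0):
    """M(rho||sigma) as in Definition 1 (grid-and-bisection sketch): the
    largest m with G(x)/(1-eps) >= F(m*x) for all x, where F and G are
    the integrated descending spectra of rho and sigma respectively."""
    F, G = cum(f_blocks), cum(g_blocks)
    span = sum(w for _, w in f_blocks) + sum(w for _, w in g_blocks)
    xs = [span * i / 4000 for i in range(1, 4001)]
    ok = lambda m: all(min(G(x) / (1 - eps), 1.0) >= F(m * x) - 1e-9 for x in xs)
    lo, hi = 0.0, 1.0
    while ok(hi) and hi < 1e6:   # grow until the condition fails
        lo, hi = hi, 2 * hi
    for _ in range(60):          # then bisect
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if ok(mid) else (lo, mid)
    return lo

pure = [(1.0, 1)]                # one eigenvalue 1
mixed = [(0.5, 1), (0.5, 1)]     # maximally mixed bit
print(round(relative_mixedness(mixed, pure), 3))  # 2.0: twice as mixed
```

For a maximally mixed bit against a pure state the measure comes out as 2, matching the intuition that the former is "twice as mixed".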
Gibbs-rescaling.-We shall make use of a powerful insight from [15][16][17]; it is also used in [12]. The insight bridges a particular gap between information theory and statistical mechanics: the fact that the former does not care about energy. In information theory the Shannon/von Neumann entropy of a state, −Σ_i λ_i log λ_i, is independent of the energies of the states involved. As the extractable work is expected to depend on the energy levels involved, it follows that it is not expected to be uniquely determined by an entropy.
A key way in which energy enters statistical mechanics is that in a Gibbs state the probability of any given energy eigenstate with energy E is p_T(E) = exp(−E/kT)/Z, where Z is the partition function. The insight we adapt from [15][16][17] is that we can take this bias into account by what essentially amounts to rescaling the density matrix's eigenvalue distribution by p_T(E). After the rescaling, the occupation probabilities will turn out to uniquely determine our expression for the extractable work. More specifically, we shall be employing an operation we term Gibbs-rescaling on the eigenvalue spectrum. Consider states with discrete spectra {λ_i}. We firstly transform the spectrum into the associated step function. Then we take each block, rescale its height as λ_i → λ_i/exp(−E_i/kT), and its width as 1 → exp(−E_i/kT), such that the area of the new block is λ_i as before. We write this operation applied to a density matrix ρ as G_T(ρ).
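A minimal sketch of the rescaling for diagonal states, which also shows the key fact used below: a thermal state rescales to a uniform distribution of height 1/Z.

```python
import math

kT = 1.0

def gibbs_rescale(probs, energies):
    """Gibbs-rescaling G_T of a diagonal state: the block of eigenvalue
    lambda_i (height lambda_i, width 1) becomes a block of width
    exp(-E_i/kT) and height lambda_i*exp(E_i/kT); its area stays lambda_i."""
    blocks = [(p * math.exp(E / kT), math.exp(-E / kT))
              for p, E in zip(probs, energies)]
    return sorted(blocks, key=lambda b: -b[0])  # descending heights

E = [0.0, 1.0]
Z = sum(math.exp(-e / kT) for e in E)
thermal = [math.exp(-e / kT) / Z for e in E]
blocks = gibbs_rescale(thermal, E)
print(blocks)  # two blocks of equal height 1/Z, total width Z, total area 1
```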
The main theorem.-Having defined the relative mixedness M(.||.) and the Gibbs-rescaling G_T(.) we can now give the main result.
FIG. 1. The Gibbs-rescaling takes a Gibbs state with partition function Z_p (Z_q) to a uniform distribution p (q) of width Z_p (Z_q) and height 1/Z_p (1/Z_q) (upper graph). The integrals of p and q (lower graph) are used to evaluate the relative mixedness. The m in the relative mixedness definition must satisfy ∫_0^l p(x)dx/(1−ε) ≥ ∫_0^{ml} q(x)dx for all l, and we see this holds for m ≤ Z_q/(Z_p(1−ε)), implying that W^ε = kT ln(Z_q/Z_p) + kT ln(1/(1−ε)). A special case of the above corresponds to a generalisation of Landauer's erasure principle, as we can take the initial state to be two thermalised degenerate levels at energy 0 (i.e. one unknown bit) and the final state to be thermalised but with one energy level at 0 and the other extremely high (i.e. a bit taking one value only). Then we see W^ε = −kT ln 2 + kT ln(1/(1−ε)). (The reverse direction is also possible, corresponding to a single-qubit Szilard engine.)
Theorem 1. In the work extraction game defined above, consider an initial density matrix ρ = Σ_i λ_i |e_i⟩⟨e_i| and final density matrix σ = Σ_j ν_j |f_j⟩⟨f_j|, with {|e_i⟩}, {|f_j⟩} the respective energy eigenstates and both ρ and σ having finite rank. Let the work one can extract, except with a probability ε of failure, using strategy S be denoted W^ε_S(ρ → σ). For any strategy this respects

W^ε_S(ρ → σ) ≤ kT ln M^ε(G_T(σ)||G_T(ρ)),

where M^ε denotes the relative mixedness evaluated with the integrated spectrum of the initial (second) argument relaxed by the failure probability, i.e. with ∫_0^x g(x')dx' replaced by min(∫_0^x g(x')dx'/(1−ε), 1). One may, under a certain proviso, construct a strategy that saturates the bound in the sense that W^ε(ρ → σ) is achieved except with probability ε; the proviso is that the agent can access a single extra two-level system which is fixed to be in one of its energy eigenstates |ξ⟩⟨ξ| both initially and finally, i.e. ρ = ... ⊗ |ξ⟩⟨ξ| and σ = ... ⊗ |ξ⟩⟨ξ|.
The proofs are given in the technical supplement. A rough intuition for the bound is that the Gibbs-rescaling is needed to take into account the bias imposed on the energy levels by the Gibbs statistics, and that up to this bias only the amount of majorisation matters, as the work extraction process lowers the relative mixedness. See Figure 1 for a simple example of how to apply the theorem.
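The thermal-to-thermal case of Fig. 1 can be checked directly, reading off the closed form derived in the caption (a sketch; energies in units of kT):

```python
import math

kT = 1.0

def thermal_to_thermal_work(Zp, Zq, eps):
    """Optimal guaranteed work for taking a thermal state with partition
    function Zp to one with partition function Zq, read off from the
    relative mixedness of the two uniform Gibbs-rescaled distributions:
    M = Zq / (Zp * (1 - eps)), so W = kT ln M."""
    return kT * math.log(Zq / Zp) + kT * math.log(1 / (1 - eps))

# Landauer erasure: two degenerate levels at energy 0 (Zp = 2) -> one
# effective level (Zq = 1).  Deterministically this *costs* kT ln 2:
print(thermal_to_thermal_work(2.0, 1.0, 0.0))  # = -kT ln 2 ≈ -0.6931
# Allowing a 50% failure probability makes erasure appear free:
print(thermal_to_thermal_work(2.0, 1.0, 0.5))  # 0.0
```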
We now discuss the implications of the statement and develop the argumentation further. We focus on the laws of thermodynamics. There are two related reasons why some of the standard laws need modification: (i) they are about averages whereas we are making single-shot statements, (ii) they are based on the von Neumann entropy, which we found is not the appropriate quantity away from the von Neumann regime.
0th and 1st laws of thermodynamics.-The 0th law can be stated as: there exists for every thermodynamic system in equilibrium a property called temperature; equality of temperature is a necessary and sufficient condition for thermal equilibrium. This also holds after our generalisation. In particular, we are still assuming heat baths that take the working medium closer to a Gibbs thermal state upon interaction.
The first law can be viewed as both asserting the conservation of energy and stating that energy changes can be divided into two parts, work and heat, which are normally defined in the description accompanying the first law equation: dU = dQ − dW. Here U = tr(ρH) is the expected internal energy of the working medium with Hamiltonian H, Q is 'heat' and W 'work'. The associated physical setting is that there is a system, the working medium, which can exchange energy either with another system in a thermal state, dubbed a heat bath, or with a work reservoir system, normally implicitly assumed to be in some energy eigenstate of its own Hamiltonian. Exchanges of energy with the heat bath are dubbed heat, and those with the work reservoir work. This essentially carries over into our approach, but with some important subtleties. We assume energy conservation (in every single extraction), as well as allowing for interactions with a heat bath and a work reservoir. Thus the following is respected: dE_sys = −dE_bath − dE_reservoir. We, more subtly, break dE_reservoir into two parts: dE_reservoir = dW^ε_S + dE_extra. There is the energy transfer which is predictable (up to ε probability of failure), in that it corresponds to dW^ε_S(ρ → σ) for the infinitesimal state change ρ → σ using strategy S.
We view anything beyond that, given by dE_extra, as heat. The idea behind this is that only predicted energy transfer should count as work. One may for example imagine buckets lifting water out of a mine up to a certain height (or, as a quantum example, an electron excited into the conduction band). The height at which the buckets are tipped into a reservoir is specified in advance. If they go higher than this, the extra potential energy will be transferred to other degrees of freedom associated with the reservoir system, e.g. into movement of the water or heating of the semiconductor. We may express the following first law for this approach: in any given extraction, with probability p ≥ 1 − ε,

dU = dQ − dW^ε_S,  with dQ := −(dE_bath + dE_extra).

Second law.-Consider next the so-called Kelvin statement of the second law: no process is possible in which the sole result is the absorption of heat from a reservoir and its complete conversion into work. This does not say anything about processes with a non-zero probability of failure. We show in the appendix that for given states of the working medium A and B respectively, W^ε(A → B) + W^ε(B → A) ≤ W^{2ε}(A → A). We call this the 'triangle inequality'. Together with the main theorem it implies that all strategies in our game respect the following generalisation of Kelvin's second law: for any cyclic sequence of states ρ_1 → ρ_2 → ... → ρ_{n+1} = ρ_1,

Σ_i W^{ε_i}_{S_i}(ρ_i → ρ_{i+1}) ≤ kT ln(1/(1 − Σ_i ε_i)),

where S_i is the choice of strategy in the i-th step of the cycle. Note that W^0(A → A) = 0 (see main theorem), implying that deterministically no work can be extracted in such a cycle. One may still gain work in a single cycle at the cost of having ε > 0 for one or more of the steps.
The second law is also closely related to entropy increasing with time, and one may wonder what the corresponding generalisation of that statement is. A particular standard expression is that

∆S − β∆⟨E⟩ ≥ 0,    (6)

where S and ⟨E⟩ are the von Neumann entropy and expected energy of a system interacting with a heat bath with inverse temperature β. (∆ indicates the change in these values during the interaction.) This actually still holds in our more general model; we show this in the technical supplement. However, crucially, it is not sufficient to guarantee that an evolution is possible. Instead it should be replaced by the statement that a state change ρ → ρ' due to a thermalisation with a heat bath at temperature T is possible if and only if

M(G_T(ρ')||G_T(ρ)) ≥ 1,    (7)

i.e. if and only if G_T(ρ) majorises G_T(ρ'). This is significant as there are processes that respect Eq. 6 but violate Eq. 7. A simple example is to consider degenerate energy levels, so that ∆⟨E⟩ = 0, and three levels with probabilities (1/2, 1/2, 0)^T → (2/3, 1/6, 1/6)^T. Then ∆S ≈ 0.25 but W^0 is negative.
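The three-level example can be checked in a few lines (a sketch, with probabilities as plain lists and degenerate levels so that the Gibbs-rescaling is trivial):

```python
import math

def entropy_bits(p):
    """Shannon/von Neumann entropy of a diagonal state, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def majorizes(p, q):
    """p majorizes q iff every partial sum of p's descending-ordered
    entries is at least the corresponding partial sum for q."""
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    sp = sq = 0.0
    for a, b in zip(ps, qs):
        sp += a
        sq += b
        if sp < sq - 1e-12:
            return False
    return True

before = [1/2, 1/2, 0]
after = [2/3, 1/6, 1/6]
print(entropy_bits(after) - entropy_bits(before))  # ΔS ≈ 0.25 bits: entropy increases
print(majorizes(before, after))                    # False: the evolution is impossible
```

So the entropy increases, yet the initial distribution does not majorize the final one; no thermalisation (bistochastic map, in the degenerate case) can realise this transition.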
Strikingly, such evolutions would enable the deterministic violation of Kelvin's second law (see technical supplement). The inequivalence of entropy and majorisation has been noted previously in the context of the second law [15,16]. Presumably this has not received more attention to date because in the von Neumann regime the inequivalence disappears. More precisely, if we consider a tensor product of n identical states, each with von Neumann entropy S, and let n → ∞, then with asymptotically small error we may approximate the spectrum as a uniform distribution of value 2^{−nS} on the range [0, 2^{nS}]. For such distributions the partial orders induced by S and by majorisation coincide.
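The flattening of the spectrum in the von Neumann regime can be illustrated numerically for τ = diag(p, 1 − p) (a sketch; parameters chosen only for illustration):

```python
import math

def typical_weight(p=0.9, n=200, delta=0.15):
    """Weight of the spectrum of tau^{⊗n}, tau = diag(p, 1-p), carried by
    eigenvalues within [2^{-n(S+delta)}, 2^{-n(S-delta)}] of the typical
    value 2^{-nS}.  There are C(n, k) eigenvalues equal to p^k (1-p)^(n-k)."""
    S = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    weight = 0.0
    for k in range(n + 1):
        lam = p ** k * (1 - p) ** (n - k)
        if 2 ** (-n * (S + delta)) <= lam <= 2 ** (-n * (S - delta)):
            weight += math.comb(n, k) * lam
    return weight

print(typical_weight())  # most of the weight sits near 2^{-nS}
```

As n grows (or delta widens) the captured weight tends to 1, which is why the spectrum may be approximated by a uniform distribution of value 2^{−nS}.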
Recovering existing results.-Eq. 2 above from [11,12], and accordingly Eq. 1 from [9], are special cases of our main result-see the technical supplement. Eq. 2 corresponds to the case where the final state ρ_T is demanded to have the same energy spectrum and to be a Gibbs state. (It was also shown in Fig. 1 that a generalisation of Landauer's principle is a special case.)
Outlook.-The results could be made more general still by considering small heat baths, along the lines of [18]. We also note the striking similarities between what is discussed above and the question of quantifying entanglement in the non-asymptotic regime. It was shown in a seminal paper by Nielsen [19] that majorisation is the central quantity there. We anticipate that many of the results from our work can also be applied in that context.

Appendix A: The upper bound on W^ε

In this section we define the setting more carefully and derive the upper bound on W^ε. We first give certain definitions and lemmas which will be needed. We now move towards defining the work extraction game. The essential idea is that the work-extracting agent wants to extract work under the restriction that the pre- and post-extraction states are given, both in terms of the energy spectra and the occupation probabilities of the energy levels. The agent must choose a strategy before the extraction and then not intervene further during the extraction. A strategy is defined by the action the agent takes at each step, with the steps labelled by j. The actions it can choose from, which will be defined in detail later, are of two types: (i) thermalisations (changing occupation probabilities towards the Gibbs state) or (ii) work extractions (changing a specified energy level but not any occupation probabilities). We will say there are m work extractions in total (this may tend to infinity).
In a given realisation the system will be in a certain energy level to start with (though it is in general not known to the agent which one). It may hop between energy levels during thermalisations. When a level is changed, it costs energy only if the system occupies that level.
Definition 3 (Notation).
s ∈ {0,1}^m: a vector with one entry for each of the m work extractions (subsequently called "steps"); s_j = 1: the system is in the chosen levels for work extraction at step j; s_j = 0: it is not.
ŝ_j: the complement of s_j: s_j = 1 ⇔ ŝ_j = 0 and s_j = 0 ⇔ ŝ_j = 1.
w^j_s: the logarithmic work (kT ln(w^j_s) = W^j_s) one extracts in step j on the path s.
w^j: the logarithmic work one extracts in step j if the specified level is occupied.
w: the total logarithmic work demanded in order to call the total extraction successful.
G: the set of the successful paths, i.e. those yielding as much work as demanded, G = {s : Π_j w^j_s ≥ w}. η^j_s: the probability of doing step j on the path s. P_S: the total probability of success, P_S = Σ_{s∈G} Π_j η^j_s.
φ^j_s: the state of the system after step j if the previous evolution of the system is given by the path s. λ^j_s: the eigenvalues of the state φ^j_s. p^j_s = G(φ^j_s): the Gibbs-rescaled probability distribution after step j (before thermalizing), conditioned on the previous steps on path s. p^j_{s,t}: the Gibbs-rescaled probability distribution after step j (after thermalizing), conditioned on the previous steps on path s. q: the final Gibbs-rescaled probability distribution, conditioned on successful work extraction. B^j: the bistochastic matrix one chooses after step j by thermalizing the system (this has to be the same for all paths). E^j(x): the energy of the level labelled by x after step j. Θ_U(x): the step function on U, equal to 1 for x ∈ U and 0 otherwise. For a < b the interval (a, b] is said to be a block corresponding to a level k if p^j_s is constant on this interval for all s.
We now turn to how interactions with the heat bath, thermalizations, act on the state of the system. Roughly speaking these take the density matrix closer to the associated Gibbs state. As already mentioned, the thermalization is taken to change only occupation probabilities and not energy eigenvalues. We take the thermalisation to act as a stochastic process on the energy eigenstates, in that the probability of occupying a given energy eigenstate, P(i), becomes P'(i) = Σ_j P(j → i)P(j), where the summation is over all eigenstates, P(j → i) is a transition probability, and P(j) an occupation probability (before the interaction with the heat bath). This can equivalently be written as P' = BP, where B is a stochastic matrix (entries are probabilities and columns sum to 1).
Not every stochastic matrix B is allowed, however. The Gibbs state (associated with temperature T) is taken to be invariant under a thermalisation. Consider the implications firstly for the fully degenerate case of all energies being the same. In this case the Gibbs state is the uniform distribution. The only stochastic matrices that leave the uniform distribution invariant are bistochastic ones (rows also sum to 1). Thus in the fully degenerate case B must be bistochastic. We see no reason to impose further restrictions, so any such B is allowed. (Choosing different allowed B's is taken to correspond physically to choosing different times of coupling to the heat bath.) Consider secondly the non-degenerate case. Here it is again convenient to use the Gibbs-rescaled distribution. Note that the Gibbs state becomes uniform after the Gibbs-rescaling. Thus one may hope to model a thermalisation as a bistochastic matrix on the Gibbs-rescaled distribution.
Consider dividing the Gibbs-rescaled distribution into fine blocks such that all fine blocks have the same width w. Let N be the number of fine blocks. (As the total support is given by the partition function Z, we have w = Z/N.) Let N_k be the number of fine-grained blocks associated with level k, such that Σ_{k=1}^n N_k = N. Each energy level is associated with one block only, labelled by k. Each l-th fine block is associated with a level k_l.
Fine blocks associated with the same energy level k must all have the same height, given by P(k_l)/e(k_l). Let f contain the N heights of the fine blocks, with P(k_l)/e(k_l) = P(k_l)N/(Z N_{k_l}) as its l-th entry. Now when the occupation probabilities transform under B, f undergoes an associated transform. We will argue it is given by a matrix F whose entry in the l-th row and m-th column is

F_{lm} = B_{k_l k_m}/N_{k_l}.

To see this, note firstly that P'_i = Σ_j B_{ij} P_j, and recall that f_l = P_{k_l} N/(Z N_{k_l}); collecting the N_j equal fine blocks of each level j then gives f'_l = Σ_m F_{lm} f_m. As the B_{ij} and N are nonnegative real numbers, F has nonnegative real entries only. To see that the columns sum to 1, so that F is a stochastic matrix, note that the column sums are the same as for B, which is stochastic. Moreover, as B must leave the Gibbs state invariant, and this is a uniform distribution after the Gibbs-rescaling, F must leave the uniform distribution (or anything proportional to it) invariant. Then for any row i: Σ_j F_{ij}(1/N) = 1/N, so each row of F must sum to 1. Therefore F is a bistochastic matrix. Note that F is additionally restricted, through being defined via B, to keep the heights of fine blocks the same whenever these are associated with the same level. Accordingly we define interactions with the heat bath, thermalizations, to act in the following way on the system.
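This construction is easy to verify numerically. The entry formula F_{lm} = B_{k_l k_m}/N_{k_l} is our reconstruction of the lost display equation; the sketch below builds F from a level-stochastic B that preserves the Gibbs state and checks that F is bistochastic.

```python
def fine_grained_map(B, fine_counts):
    """Build the matrix F acting on the heights of the N fine blocks from
    the level-stochastic matrix B, via F[l][m] = B[k_l][k_m] / N_{k_l}
    (reconstructed entry formula), where k_l is the level owning fine
    block l and N_k the number of fine blocks of level k."""
    levels = [k for k, n in enumerate(fine_counts) for _ in range(n)]
    N = len(levels)
    return [[B[levels[l]][levels[m]] / fine_counts[levels[l]]
             for m in range(N)] for l in range(N)]

# Two levels with fine counts (2, 1); B leaves the Gibbs state, here the
# distribution proportional to (2, 1), invariant:
B = [[5/6, 1/3],
     [1/6, 2/3]]
F = fine_grained_map(B, [2, 1])
col_sums = [sum(F[l][m] for l in range(3)) for m in range(3)]
row_sums = [sum(F[l][m] for m in range(3)) for l in range(3)]
print(col_sums, row_sums)  # all ≈ 1: F is bistochastic
```

The column sums equal 1 because B is stochastic; the row sums equal 1 precisely because B preserves the Gibbs state, as argued in the text.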
Definition 5 (Thermalization). If after step j one chooses to do a thermalization, the energy levels are invariant. The occupation probabilities transform under a stochastic matrix B^j. B^j leaves the Gibbs state invariant. This implies, as shown above, that any fine-grained Gibbs-rescaled distribution where all blocks have the same width transforms under the bistochastic matrix F defined above. F cannot introduce differences between the heights of fine blocks associated with the same energy level.
The only assumptions we have used to arrive at the above definition are:
1. Only the occupation probabilities of the levels change.
2. The occupation probabilities of the energy levels transform according to a stochastic matrix.
3. The Gibbs state (of temperature T) is invariant (when the heat bath has temperature T).
Note that a permutation matrix is bistochastic, so one may permute the levels in the thermalization step. We use this to simplify the notation of the work extraction definition. We take the levels lowered or raised to form the first l levels.
Definition 6 (Work extraction). For x ∈ (0, 1], the eigenvalues of the levels after step j, conditioned on the previous state, are given as follows. In the case s_j = 1 (the state of the system is found to be in the levels corresponding to (0, a]), the chosen levels are shifted and the logarithmic work w^j is delivered to the work reservoir. In the case s_j = 0 (the state of the system is not in the levels corresponding to (0, a]) there is no change in the work reservoir: w^j_{s|s_j=0} = 1.

Definition 7 (Our work extraction game).
There are three systems and a work-extraction agent.One system is the working medium, another is a heat bath of temperature T, and the last is the work reservoir.
The initial energy spectrum {E} of the working medium is arbitrary but given. The initial density matrix ρ of the working medium is diagonal in the energy basis. The final energy spectrum {F} and diagonal density matrix σ are also arbitrary but given.
The agent can combine thermalization (defined above) and work extraction (also defined above) in any sequence. This sequence, together with the specifications for each step, is called its strategy. We shall be interested in bounding W given ε and the initial and final conditions. We break the calculation into several lemmas, which will later be combined to prove the main theorem.
We next consider how much a work extraction flattens the Gibbs-rescaled distribution.
Lemma 2. For x ∈ (0, Z_j] (with Z_j the partition function after step j), a work extraction in step j yields two possible Gibbs-rescaled probability distributions after the step, conditioned on the previous steps on path s: one for the case s_j = 1 and one for the case s_j = 0. In the former case one easily sees that p^j(x) = 0 for x ≥ aw. The proof for the case s_j = 0 is analogous.
The next lemma is in contrast concerned with how much wider the Gibbs rescaled distribution is after the work extraction (recall that the distribution has support on (0, Z] where Z is the partition function).
Lemma 3. The partition function Z_j immediately after step j is given by Z_j = Z_{j−1} − a + a w^j.
Proof. Let (0, a] be an interval consisting of blocks corresponding to the levels {1, . . ., l}, and let n be such that it can split the interval (a, Z_j] into n − l blocks.
We now combine the two previous lemmas to gain another relation between the Gibbs-rescaled distribution and steps j and j − 1. We shall use this later in an iterative manner to relate the very first and final Gibbs-rescaled distributions.
Lemma 4. The Gibbs-rescaled probability distributions at steps j and j − 1 respectively satisfy the relation

p^{j−1}_s(x) = Σ_{k∈{0,1}} w^j_{s|s_j=k} η^j_{s|s_j=k} p^j_{s|s_j=k}(x w^j_{s|s_j=k} + c^j_{s|s_j=k}),

with constants c^j_{s|s_j=1} = 0 and c^j_{s|s_j=0} = a w^j − a.
Here τ reorders Σ_{s∈{0,1}^j} p_1 in descending order in (0, aw] and Σ_{s∈{0,1}^j} p_0 in (aw, Z_j]. This is possible since p_1 and p_0 have disjoint support, also for different s, since a in Definition 6 has to be chosen independently of the path (see Lemma 2). l_1 ∈ (0, min(a, l)] is a value which maximizes the right-hand side of the last line. Using the same argument backwards after changing variables, we get the reverse relation. Applying any bistochastic matrix B to the probabilities p_0 and p_1 and reordering in descending order with τ^{j+1}_t afterwards, the inequality follows from the relation Bp ≺ p for any bistochastic matrix B and vector p, which is proved in [20].
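The relation Bp ≺ p invoked from [20] is easy to test numerically (a sketch; by Birkhoff's theorem, averaging random permutation matrices as below produces bistochastic matrices):

```python
import random

def majorizes(p, q):
    """p majorizes q: every partial sum of the descending-ordered entries
    of p dominates the corresponding partial sum of q."""
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    sp = sq = 0.0
    for a, b in zip(ps, qs):
        sp += a
        sq += b
        if sp < sq - 1e-12:
            return False
    return True

def random_bistochastic(n, mixes=50, rng=random):
    """Average of random permutation matrices; any such convex combination
    is bistochastic (and by Birkhoff's theorem every bistochastic matrix
    arises this way)."""
    M = [[0.0] * n for _ in range(n)]
    for _ in range(mixes):
        perm = list(range(n))
        rng.shuffle(perm)
        for i, j in enumerate(perm):
            M[i][j] += 1.0 / mixes
    return M

rng = random.Random(0)
n = 5
p = [rng.random() for _ in range(n)]
s = sum(p)
p = [x / s for x in p]
B = random_bistochastic(n, rng=rng)
Bp = [sum(B[i][j] * p[j] for j in range(n)) for i in range(n)]
print(majorizes(p, Bp))  # True: B p is always majorized by p
```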
Theorem (First part of Theorem 1 in the main body, giving the bound). In the work extraction game defined above, if one is given an initial density matrix ρ = Σ_i λ_i |e_i⟩⟨e_i| and final density matrix σ = Σ_j ν_j |f_j⟩⟨f_j|, with {|e_i⟩}, {|f_j⟩} the respective energy eigenstates and both ρ and σ having finite rank, then the work W^ε one can extract with certainty, except with ε probability, respects

W^ε(ρ → σ) ≤ kT ln M^ε(G_T(σ)||G_T(ρ)).

Proof. Define p^0_s = p. W.l.o.g. s = {0, . . ., 0} (the first probability distribution is independent of the path afterwards). Inductively using Lemma 5 one obtains the chain of relations connecting the initial and final Gibbs-rescaled distributions. Therefore (with P_S = 1 − ε) the bound follows. This proves the first part of the main theorem.
Appendix B: W^ε is achievable
This section concerns the second statement of the main theorem (Theorem 1). We specify a protocol that achieves the bound given in Theorem 1, i.e. it extracts W^ε of work with a failure probability no greater than ε. The protocol is within the rules of the game (defined in Appendix A). The protocol works for the initial (ρ) and final (σ) states taking the form ρ = ... ⊗ |ξ⟩⟨ξ| and σ = ... ⊗ |ξ⟩⟨ξ|, where |ξ⟩ is one of the energy eigenstates of a system with two energy eigenstates in total. This is a small restriction. It amounts to allowing the agent an extra two-level system in a known state, working as a catalyst in the sense that it aids the process but is ultimately unchanged by it.
Before giving the general protocol it is instructive to consider an example. We begin with a state φ with eigenvalues λ_i(j), energy eigenvalues E_i(j), and A_i defined by A_i(j) = exp(−E_i(j)/kT). The final state we want to reach is specified in the same terms. With a risk ε = 1/2 the work for this game is limited by W = kT ln(4/3). In this example we show how this amount of work can be extracted.
We first want to raise as many levels as we can to infinite energy, such that if we succeed, we start with a more known state.Unfortunately the sum of the occupation probabilities of the lowest levels will never yield exactly ε, so we need to change this first.
We start by raising the empty level to infinite energy, such that even if one mixes it completely with any other level it will stay empty. Then we lower the energy of the empty level while constantly mixing this level with the first one. At the same time we raise the energy of the first level, such that in total the energy of the work reservoir is unchanged with probability 1 (the details of this action can be found below in Definition 8 and the following lemma). The lowest two eigenvalues now sum up to ε. We raise the energy of these two levels by doing a work extraction changing the energy of their states by ∞. With probability 1 − ε = 1/2 we get the work 0 and a state which in this case is pure (the state would not have been pure if we had chosen ε smaller than 1/3). With probability 1/2 we get the work −∞, so the above is the only branch to be considered. Now we extract the work W = kT ln(4/3) on all the levels. This succeeds with probability 1. The state afterwards has eigenvalues λ = (1, 0, 0). Again we need two levels where we only have one. Acting again as defined in Definition 8 on the first two levels, we can split the occupation between them. The energy of the second level is now too high and we need to lower it by kT ln(2); the system is then in the state (1, 0, 0) with probability 1/2 and (0, 1, 0) with probability 1/2. The work extracted in this step is in both cases at least 0.
So by measuring whether the energy in the work reservoir has increased by at least W = kT ln(4/3), we get a "yes" and the wanted final state with probability 1/2. As seen in the above example, we need an algorithm which allows us to shift some probability from one level to another when they are in thermal equilibrium. We only want to change these two levels (say j, k), so the sum of their eigenvalues remains constant (λ_j + λ_k = const). We also hope to do this without needing to do any work, so we keep our total knowledge of these levels constant. To achieve this it seems a good idea to keep p_j + p_k = const while constantly remaining in thermal equilibrium. This is the guiding idea for the following algorithm.
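The game above asks for the work guaranteed up to failure probability ε, not for the average work. As a side illustration with made-up numbers (not those of the example above; the function name is ours), the ε-guaranteed work of a protocol is the largest threshold reached with probability at least 1 − ε, which can differ sharply from the mean:

```python
def guaranteed_work(outcomes, eps):
    """Largest w such that P(work >= w) >= 1 - eps.

    `outcomes` is a list of (work, probability) pairs summing to 1.
    """
    best = float("-inf")
    for w, _ in outcomes:
        tail = sum(p for v, p in outcomes if v >= w)
        if tail >= 1 - eps:
            best = max(best, w)
    return best

# hypothetical work distribution of some protocol (illustrative numbers)
outcomes = [(2.0, 0.4), (0.5, 0.4), (0.0, 0.2)]
mean = sum(w * p for w, p in outcomes)   # average work: 1.0
w_10 = guaranteed_work(outcomes, 0.10)   # 0.0: only w = 0 is nearly certain
w_50 = guaranteed_work(outcomes, 0.50)   # 0.5: P(work >= 0.5) = 0.8
```

Here the average work is 1.0, yet at risk ε = 0.1 nothing can be guaranteed: exactly the threshold phenomenon the single-shot quantities capture.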

Definition 8 (Isothermal shift of boundary). Let A(j) = exp(−E_j/kT), where E_j is the energy eigenvalue of the j'th level. Let the levels j, k = j + 1 have the same Gibbs-rescaled probability. We call the limit n → ∞ of the following process an isothermal shift of the boundary between j and k by w ∈ [−A(j)/(A(j)+A(k)), A(k)/(A(j)+A(k))] in direction k: 1. Do a permutation which brings the level j in front and the level k second.
2. Do a work extraction on (0, A(j)] by w_1 = 1 + (w/n)(A(j)+A(k))/A(j). 3. Do a permutation which brings the level k in front and the level j second.
4. Do a work extraction on (0, A(k)] by w_2 = 1 − (w/n)(A(j)+A(k))/A(k). 5. Do a thermalization totally mixing the two levels j and k and leaving all others untouched (i.e. the stochastic matrix with entries 1/2 at positions (1,1), (1,2), (2,1) and (2,2) and δ_{m,l} everywhere else, where the first entry of the vector it acts on is the probability of the level j after the work extraction and the second is the probability of the level k).
6. Restart with 1. n times in total, redefining A(j) and A(k) as above for the probabilities after this process.
7. Do a permutation which brings the levels j and k = j + 1 back to their positions at the beginning (we show below that this is possible).
Instead of the first four actions we could simply have said that we extract the work w_1 on the level j and the work w_2 on the level k. Then we would have had to do the total mixing between these levels as well (instead of at the first and second position of the matrix), and so on. What we mean here by doing a work extraction on the level j is the action: do a permutation bringing the block (a, b] corresponding to the level j in front, extract work, permute the level back. This yields the same result as if we had defined the work extraction generally on arbitrary blocks (a, b] (i.e. squeezing of the block by w, keeping the corresponding area constant) instead of only on blocks (0, c] (the proof is trivial). In later definitions we will make use of this. Here we do not, since the algebra would get slightly more complicated.
The following Lemma shows that the above process costs no work with probability 1 and that it can indeed be seen as a shift of the separation between the levels.

Lemma 6 (Action of the isothermal shift of boundary).
Let A(j) = exp(−E(j)/kT), where E(j) is the energy eigenvalue of the j'th level. Let the levels j, k = j + 1 have the same Gibbs-rescaled probability. After an isothermal shift of the boundary between j and k by w ∈ [−A(j)/(A(j)+A(k)), A(k)/(A(j)+A(k))] in direction k:
1. (a) The energy eigenvalues of all levels but j and k remain constant.
(b) At the end A_f(j) = exp(−E_f(j)/kT) = A(j) + w(A(j)+A(k)) (where E_f(j) is the energy eigenvalue of level j after the shift).
2. With probability 1 − (λ(j) + λ(k)) the eigenvalues of the final state are given by λ(l)/(1 − (λ(j) + λ(k))) for l ≠ j, k and 0 for l = j, k.
we see that after l passes through the algorithm one ends up with: In order to derive 2. and 3. we need a closer look at how the eigenvalues change in each of the n passes through the algorithm. The eigenvalues are given by the Gibbs-rescaled probabilities multiplied by the corresponding A(l). Let q be the Gibbs-rescaled probability distribution after step 1. of the i'th pass through the algorithm in definition 8. After step 2. we have: with probability η(q_j), and q(x − A_j w_1 + A_j)Θ_{(A_j,Z(q)]}(x)/(1 − η(q_j)) with probability 1 − η(q_j), where η(q_j) = ∫_0^{A_j} q(x) dx and Z(q) is the partition function of q. After step 4. we thus have: Noting that q(x) = q(x/w_2) for x ∈ (0, A_k] and similarly for x ∈ (A_j, A_k + A_j], and that x − A_j w_1 − A_k w_2 + A_k + A_j = x, we can rewrite this as: This means that after step 5. we get: For 2. note that with probability 1 − (λ(j) + λ(k)) we get, after the first pass through the algorithm, q_j = q_k = 0 (which just means that the state is measured to be orthogonal to j and k), and therefore in the subsequent passes we have η(q_j) = η(q_k) = 0. So with probability 1 − (λ(j) + λ(k)) we get the final probability distribution: Since the energy eigenvalues of these levels are unchanged, the eigenvalues are λ(l)/(1 − (λ(j) + λ(k))) for l ≠ j, k and 0 for l = j, k, which proves 2. The final Gibbs-rescaled probabilities of the levels j and k have the same value (since we completely mix them in step 5.). Their integral ∫_0^{A_j+A_k} q(x) dx after the first pass through the algorithm remains 1 (with probability λ(j) + λ(k)). As noticed before, A(j) + A(k) is conserved. Thus we get that with probability λ(j) + λ(k) the eigenvalues of the levels are given by A_f(l)/(A(j) + A(k)) for l = j, k and 0 else, which proves 3.
Suppose in the first pass through the algorithm the state is orthogonal to the levels j, k: then the energy in the work reservoir is unchanged throughout all n passes, and for this case 4. follows trivially. We now look at the other case (where the state is projected onto the levels j, k in the first pass). Let s ∈ {1, 2}^n. Define σ(2) = 1 and σ(1) = −1. Define α_1 = A(j)/(A(j) + A(k)) and α_2 = 1 − α_1. In the l'th pass through the algorithm one either gets the logarithmic work w_l(1) or the similarly derivable value w_l(2) (A_l is defined in the proof of 1.(b)). Thus we can write: In total we get the logarithmic work: with probability (given that the state is projected onto the levels j, k in the first pass): The expectation value of w_tot can be computed as follows (for n < ∞): We now look at how much the work W = ln(w_tot) changes if in step l one replaces s_l by ŝ_l (remember that s_l = 1 ⇔ ŝ_l = 2 and vice versa): Using McDiarmid's inequality [21] we get that the probability that W differs from its expectation value by more than δ is bounded by: which tends to 0 for any δ > 0. Therefore the work in this process is given by 0 with probability 1, which proves 4.
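The concentration statement 4. can be illustrated numerically. The following Monte Carlo sketch is an illustrative discretisation of definition 8 (our own simplification, not the paper's exact construction): it tracks A(j), A(k), the thermal occupation within the pair after each total mixing, and the work kT ln w_i transferred to the reservoir whenever the occupied level's energy is changed. For large n the total work concentrates at 0, as the McDiarmid argument predicts:

```python
import math, random

def isothermal_shift_work(Aj, Ak, w, n, rng):
    """Work (units of kT) in one run of an n-pass isothermal shift between
    two levels with Gibbs weights Aj, Ak and equal Gibbs-rescaled
    probability, conditioned on the state occupying the pair."""
    total = Aj + Ak                       # conserved by the shift
    in_j = rng.random() < Aj / total      # thermal occupation within the pair
    work = 0.0
    for _ in range(n):
        Aj_new = Aj + (w / n) * total
        Ak_new = total - Aj_new
        w1, w2 = Aj_new / Aj, Ak_new / Ak
        # changing the occupied level's energy by -kT*ln(w_i) transfers
        # kT*ln(w_i) to the work reservoir
        work += math.log(w1 if in_j else w2)
        Aj, Ak = Aj_new, Ak_new
        in_j = rng.random() < Aj / total  # step 5: total mixing, rethermalise
    return work

rng = random.Random(0)
runs = [isothermal_shift_work(1.0, 1.0, 0.3, 2000, rng) for _ in range(400)]
mean = sum(runs) / len(runs)
spread = max(abs(x - mean) for x in runs)  # shrinks as n grows
```

The per-pass expected work vanishes to first order, and the run-to-run spread scales like 1/√n, in line with the lemma's claim that the work is 0 with probability 1 in the limit.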
Next we need an algorithm which makes it possible to obtain the end state σ from the initial state ρ if p ≻ p_f (the generalization of the step 4 → 5 in the example). To write down the algorithm we first need a definition and a technical lemma simplifying the notation: (Note that the above definition reduces to the usual sum if c ∈ N.) Lemma 7. Let p and p_f be Gibbs-rescaled probability distributions of two states ρ and σ with the same number of levels n. Let Z = Z_f, where Z is the partition function of ρ and likewise Z_f for σ. Let p ≻ p_f. Then: and:

Let A(j) = exp(−E(j)/kT), where E(j) is the energy eigenvalue of the level j of the state ρ, and likewise A_f(j) for the end state σ.
We first prove that such a_k exists. For k = 1 we have: Since this is continuous in a_1, it follows that there is an a_1 for which the expression equals A_f(1).
And again, by continuity, there is an a_k for which: and, reproducing the arguments above, we see that we can choose a_k ≥ b_k.
Having this technical lemma, we can now define the algorithm which makes it possible to obtain the end state σ from the initial state ρ if p ≻ p_f. Definition 10 (Assimilation of ρ to σ). Let p and p_f be Gibbs-rescaled probability distributions of two states ρ and σ with the same number of levels 2n + 1. Let ρ, σ have at least n + 1 levels with eigenvalues λ_e(i) = λ(i + n) = 0 and energy eigenvalues E = ∞. Let p ≻ p_f. We call the following algorithm an assimilation of ρ to σ: 1. Do a work extraction on the n + 1'th empty level such that Z = Z_f.
2. Choose a_i and b_i for the 2n + 1 levels as in the above lemma. Lemma 8 (Action of the assimilation of ρ to σ). Let p and p_f be Gibbs-rescaled probability distributions of two states ρ and σ with the same number of levels 2n + 1.
Let ρ, σ have at least n + 1 levels with eigenvalues λ_e = 0 and energy eigenvalues E_e = ∞. Let p ≻ p_f. After an assimilation of ρ to σ the final state is given by σ with probability 1. The extracted work is 0.
Proof. Note that Z < Z_f since p ≻ p_f. Therefore one can lower the energy of one empty level such that we get Z = Z_f. This costs no work and we still have p ≻ p_f afterwards. Thus after step 1. we can apply lemma 7.
and likewise: (see lemma 6). After steps 3.(a),(b): this is trivially true for k = 1. With this assumption we have (the sum of all involved A's is conserved in isothermal shifts, see the first part of the proof of lemma 6). After step 3.(c) we get for any level j in {b_{k−1}, . . ., a_k}: We can now define an algorithm which is more physical. We assume here that we have at least n/2 levels with 0 probability, but make sure that in the end these levels again have 0 probability (note that this does not change the upper bound for the work). We assume that the levels are ordered in descending order of their Gibbs-rescaled probability.
Definition 11 (Work extraction algorithm). Let p and p_f be Gibbs-rescaled probability distributions of two states ρ and σ with the same number of levels 2n + 1. Let ρ, σ have at least n + 1 levels with eigenvalues λ_e = 0 and energy eigenvalues E_e = ∞. Define W = kT ln(M_ε(p, p_f)).
1. First we lift all the empty levels to infinite energy, which costs no work.
2. If there is no k for which 1 − ε = Σ_{i=1}^k λ(i): Take an empty level e (the level n + 1) and a level k for which: Make an isothermal shift of the boundary between e and k by: 5. Make an assimilation of the obtained state to the final state.
6. Permute the levels of the obtained state such that one gets the final state.
Theorem 9 (Bound can be achieved (second part of main theorem)).Let p and p f be Gibbs rescaled probability distributions of two states ρ and σ, with the same number of levels 2n + 1.
The work extraction algorithm on ρ yields the work W with probability 1 − ε.If the work extraction is successful, the final state is given by σ with probability 1.
Proof. After step 2. the sum of the eigenvalues Σ_{i=k+1}^{n+1} λ(i) = ε (see lemma 6). Therefore the work extraction in step 3. succeeds with probability 1 − ε, and if it succeeds it yields work 0 (else −∞). After step 3. the eigenvalues are given by λ_3(i) = λ(i)/(1 − ε) for i = 1, . . ., k and λ_3(i) = 0 else (if the work extraction succeeds). After step 4., by the definition of W, we have p_4 ≻ p_f and the extracted work is W. Therefore one can make an assimilation of the obtained state to the final state, and one gets the final state σ (up to permutation) with probability 1 (see lemma 8). After the permutation (relevant if the levels have some special physical meaning) we get the final state σ with probability 1. In total we get the final state σ with probability 1 if the work extraction succeeds, and the extracted work is W with probability 1 − ε.
For the proof of ii) assume that: Now we have all we need to prove the theorem above. Proof. Let p^ε be a probability function with the smallest possible support such that δ(p, p^ε) ≤ ε, and define d_ε as in lemma 11. For l ≤ d_ε the requirement for maximal work extraction reads (using the lemma): The above is an equality in the case l = d_ε, which shows that the maximal w as defined in theorem 1 is given by: Eq. 2 is a special case of the above theorem, recovered when the final state is a Gibbs state with the same energy eigenvalues as the initial one.
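For the special case of fully degenerate energy levels, d_ε is simply the smallest number of levels that can carry at least 1 − ε of the probability, and (as in the von Neumann-limit expression used later for Lemma 15) the guaranteed work takes the form W_ε = kT[ln d − ln d_ε]. A sketch under this degenerate-levels assumption (function names are ours):

```python
import math

def d_eps(p, eps):
    """Smallest support after smoothing: the fewest levels whose total
    probability is at least 1 - eps."""
    acc, k = 0.0, 0
    for q in sorted(p, reverse=True):
        acc += q
        k += 1
        if acc >= 1 - eps:
            return k
    return len(p)

def guaranteed_work_flat(p, eps, kT=1.0):
    # hypothesised single-shot expression for degenerate energy levels:
    # W_eps = kT * (ln d - ln d_eps)
    return kT * (math.log(len(p)) - math.log(d_eps(p, eps)))

p = [0.5, 0.25, 0.25]
w = guaranteed_work_flat(p, 0.25)   # d_eps = 2, so W = kT * ln(3/2)
```

At ε = 0 the full support d_ε = 3 is needed and no work is guaranteed from this state; accepting ε = 0.25 shrinks the effective support to 2 levels.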
Corollary. Let ρ be a diagonal state with energy eigenvalues E_i and σ_T be the Gibbs state with the same energy eigenvalues E_i at the bath temperature T. Then the maximal extractable work at risk ε is given by: Proof. Let p be the Gibbs-rescaled probability function corresponding to ρ and P(j) the eigenvalues of ρ. Let a be the flat energy probability function corresponding to σ_T, where E(j) are the energy eigenvalues of ρ and σ_T and Z is the corresponding partition function. This means by definition that: and likewise a(x) = 1/Z (both defined for x ∈ (0, Z]).
From the above theorem we get:

Appendix E: Entropy increase law

Consider the interaction of the working-medium system with the heat bath. Let S be the von Neumann entropy of the system, β the inverse temperature associated with the bath, and ⟨E⟩ = Σ_i λ_i E_i the expected internal energy of the system. This section compares the standard law for entropy increase: with the one we propose should replace it:

1. Our model respects the standard expression

Lemma 13. In the model for thermalisation used here, Eq. E1 is always respected.
Proof.We firstly recall the model and define certain notation.
Recall that the thermalisation model states that when two levels, 1 and 2, are coupled to the heat bath, their ratio λ_1/λ_2 gets closer to exp(−β(E_1 − E_2)), while the other λ's are untouched. In our model one may concatenate several such interactions to implement any allowed multi-level interaction with the bath. It therefore suffices to show that Eq. E1 holds for a single two-level interaction with the heat bath.
For notational convenience let the probability of being in level 1 or 2 be called λ_12 := λ_1 + λ_2. This is constant during the given two-level interaction with the bath. In the extreme case of the two levels interacting with the bath for an arbitrarily long time we have λ_1 = λ_1^T and λ_2 = λ_2^T (T reminds us of the temperature dependence). These values must obey the relation: We also assume without loss of generality that E_2 ≤ E_1. This implies that λ_1^T ≤ 0.5 λ_12. Now we begin to prove the statement. Firstly we simplify ∆S by noting that only two levels change their probabilities. We write: We see that in any two-level interaction ∆S = ∆S_12. (E4) It is helpful to re-express S_12 in terms of a normalised entropy S̄_12, so that we can use known properties of entropies to make statements about S_12. We let λ̄_1 := λ_1/λ_12 and λ̄_2 := λ_2/λ_12, such that λ̄_1 + λ̄_2 = 1, and define S̄_12 := −λ̄_1 log λ̄_1 − λ̄_2 log λ̄_2.
One can then see in a few lines of algebra that S_12 = λ_12 S̄_12 − λ_12 log λ_12.
Note now that ⟨E⟩ may, similarly to the entropy, be written as: such that ∆⟨E⟩ = ∆⟨E⟩_12 = (∆λ_1)(E_1 − E_2), with ∆λ_1 the change in λ_1. So ⟨E⟩(λ_1) is a line with gradient given by: Similarly: Comparing this with the gradient of the tangential line to S̄_12 in Eq. E6, we see that (1/λ_12)β⟨E⟩_12 has the same gradient as the tangential line. We therefore only need to show that the change in the tangential line is upper bounded by the change in the entropy curve, as this is equivalent to showing that ∆S̄_12 ≥ (1/λ_12)β∆⟨E⟩_12. This must hold for all possible initial and final values of λ_1 and all possible values of λ_1^T (recall that we assumed without loss of generality that λ̄_1^T ≤ 0.5). These can be grouped into three cases.
This implies the lemma.
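Lemma 13 can be spot-checked numerically. The sketch below models a single two-level interaction as a convex mixture toward the pair's thermal distribution (which moves λ_1/λ_2 toward exp(−β(E_1 − E_2)) as the model requires; the interpolation parameter t is our own device) and verifies ∆S ≥ β∆⟨E⟩ on random instances:

```python
import math, random

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

def two_level_thermalise(lam, E, beta, t):
    """Move levels 0 and 1 a fraction t of the way toward their joint
    thermal distribution; all other levels are untouched."""
    lam12 = lam[0] + lam[1]
    g1 = math.exp(-beta * E[0])
    share1 = g1 / (g1 + math.exp(-beta * E[1]))  # thermal share of level 0
    new = list(lam)
    new[0] = (1 - t) * lam[0] + t * lam12 * share1
    new[1] = lam12 - new[0]
    return new

rng = random.Random(1)
min_slack = float("inf")
for _ in range(1000):
    lam = [rng.random() for _ in range(3)]
    s = sum(lam)
    lam = [q / s for q in lam]
    E = [rng.uniform(0.0, 3.0) for _ in range(3)]
    beta = rng.uniform(0.1, 2.0)
    new = two_level_thermalise(lam, E, beta, rng.random())
    dS = entropy(new) - entropy(lam)
    dE = sum(q * e for q, e in zip(new, E)) - sum(q * e for q, e in zip(lam, E))
    min_slack = min(min_slack, dS - beta * dE)   # Eq. E1 says this is >= 0
```

The check works because the free energy ⟨E⟩ − S/β is convex and minimised at the pair's thermal distribution, so it decreases monotonically along the mixing path.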

2. Evolutions respecting the standard expression may violate Kelvin's second law
Recall that our condition on thermalising evolutions was stronger than Eq. E1. There are, as mentioned in the main body, examples of evolutions that respect Eq. E1 but violate our condition, Eq. E2. In this subsection we consider whether these evolutions may violate Kelvin's second law: No process is possible in which the sole result is the absorption of heat from a reservoir and its complete conversion into work.
We use standard results concerning majorisation, as well as our main theorem. For simplicity we consider degenerate energy levels, so that Eq. E1 reduces to ∆S ≥ 0. We assume the evolution corresponds to a stochastic matrix. Lemma 14. Any stochastic matrix A which for some state violates Eq. E2 but respects the entropy condition ∆S ≥ 0 will, for some input state, namely the uniform distribution, violate ∆S ≥ 0.
Proof. (i) Eq. E2 is respected iff the matrix is bistochastic. Thus A is NOT bistochastic.
(ii) The uniform distribution is invariant under a stochastic matrix iff the matrix is bistochastic. Thus A does NOT preserve the uniform distribution. Now the uniform distribution is the unique state of maximal von Neumann entropy. Thus ∆S ≥ 0 is violated if the input state is the uniform distribution. Lemma 15. An evolution A of a state to another which violates Eq. E2 but respects the entropy condition ∆S ≥ 0 would allow for the violation of Kelvin's second law within our game: deterministic work extraction would be possible from a cycle where the system is in the thermal state both initially and finally.
Proof. Recall that for simplicity we are considering degenerate energy levels in this subsection. The thermal state is then the uniform distribution. Apply A to it (at no work cost, as it represents an interaction with the heat bath). We now have a state σ other than the uniform distribution, so it must strictly majorise the uniform distribution.
To see that this implies deterministic work extraction, we first show that W_0 > 0 for some process using A and allowed operations within the game. Consider taking n copies of σ and going to the von Neumann limit by taking n to infinity as well as taking the risk of failure ε to 0. To evaluate W_ε in this limit it is convenient to use Theorem 10, which re-expresses W_ε. Recall that in the von Neumann limit the smooth max-entropy reduces to the von Neumann entropy S. We therefore have, for the case of degenerate levels: Recall secondly the subtlety that W_ε(σ → σ′) was proven to be achievable within the game when there is access to a catalyst system. Consider extracting work from n copies of σ ⊗ |ξ⟩⟨ξ|, which will be reset to n copies of 1/d ⊗ |ξ⟩⟨ξ| at the end. Now H_max(1/d ⊗ |ξ⟩⟨ξ|) − S(σ ⊗ |ξ⟩⟨ξ|) > 0, as neither entropy of a state is changed by adding a pure system in this way. Thus including the catalyst system does not change the statement that W_0 > 0 for the above procedure in the von Neumann limit. Accordingly this process violates Kelvin's law.
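To make Lemmas 14 and 15 concrete, here is a small numerical sketch (the matrix A is an arbitrary illustrative choice, not one from the paper): a stochastic but non-bistochastic A can increase entropy on some inputs, yet it necessarily decreases it on the uniform distribution, and the resulting non-uniform state σ gives a positive deterministic work rate kT[ln d − S(σ)] in the von Neumann limit:

```python
import math

def entropy(p):
    """Shannon entropy in nats, ignoring zero entries."""
    return -sum(q * math.log(q) for q in p if q > 0)

def apply_stochastic(A, p):
    """Apply a column-stochastic matrix A (columns sum to 1) to p."""
    return [sum(A[i][j] * p[j] for j in range(len(p))) for i in range(len(A))]

# an arbitrary stochastic but NOT bistochastic matrix (row sums 1.5 and 0.5)
A = [[1.0, 0.5],
     [0.0, 0.5]]

# On some inputs A increases entropy ...
dS_pure = entropy(apply_stochastic(A, [0.0, 1.0])) - entropy([0.0, 1.0])
# ... but on the uniform (thermal) state it must decrease it (Lemma 14)
sigma = apply_stochastic(A, [0.5, 0.5])
dS_unif = entropy(sigma) - entropy([0.5, 0.5])

# Lemma 15: per-copy deterministic work in the von Neumann limit,
# W/n -> kT * (ln d - S(sigma)) > 0, violating Kelvin's law (kT = 1 here)
work_rate = math.log(len(sigma)) - entropy(sigma)
```

Note that the extractable work rate is exactly minus the entropy drop on the uniform input: the bath interaction "pays" in entropy what the cycle then converts to work.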

Figure 1: Case of initial and final states being Gibbs states. The Gibbs rescaling takes a Gibbs state with partition function Z_p (Z_q) to a uniform distribution p (q) of width Z_p (Z_q) and height 1/Z_p (1/Z_q) (upper graph). The integrals of p and q (lower graph) are used to evaluate the relative mixedness. The m defined in the relative mixedness definition must satisfy:

Appendix A: Upper bounding W_ε

Definition 2 (Gibbs rescaling). Consider a density matrix ρ = Σ_{i=1}^n λ_i |e_i⟩⟨e_i| with eigenvalues {λ_i}_{i=1}^n and take the energy eigenstates of the system to be {|e_i⟩}_{i=1}^n with energies {E_i}_{i=1}^n respectively. There is an associated step function for the spectrum, λ(xn) = λ_⌈xn⌉ where x ∈ (0, 1]. Similarly there is an energy step function E(xn) = E_⌈xn⌉ where x ∈ (0, 1]. The Gibbs rescaling associated with temperature T combines λ(xn) and E(xn) into a new function G_T(y) given by: It follows that G_T(y) is defined on (0, Z], with Z = Σ_{j=1}^n exp(−E_j/kT) the partition function. Moreover G_T(y) is a probability distribution satisfying ∫_0^Z G_T(y) dy = 1.
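As an illustration of Definition 2 for a finite spectrum (function and variable names are ours), each level i becomes a block of width A_i = exp(−E_i/kT) and height λ_i/A_i; the blocks tile (0, Z] and the total area is 1:

```python
import math

def gibbs_rescale(lams, energies, kT=1.0):
    """Represent G_T as (width, height) blocks: block i has width
    A_i = exp(-E_i/kT) and height lam_i / A_i, so its area is lam_i."""
    blocks = []
    for lam, E in zip(lams, energies):
        A = math.exp(-E / kT)
        blocks.append((A, lam / A))
    return blocks

blocks = gibbs_rescale([0.5, 0.3, 0.2], [0.0, 1.0, 2.0])
Z = sum(width for width, _ in blocks)                   # partition function
area = sum(width * height for width, height in blocks)  # integral of G_T
```

For a Gibbs state all block heights equal 1/Z, recovering the uniform distribution of Figure 1.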

Definition 6 (Work extraction). To do a work extraction in step j one first defines an interval (0, a] = ∪_{k=1}^l (a_{k−1}, a_k], where the (a_{k−1}, a_k] correspond to levels {1, . . ., l}, on which one wants to change the energy by ∆E = −kT ln w_j. The remaining levels are untouched.
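The block picture of Definition 6 (and the remark after Definition 8) can be sketched as follows (our naming; a minimal illustration): a work extraction by w on the first l blocks rescales their widths by w and heights by 1/w, preserving each area λ_i, while the energy of the affected levels changes by ∆E = −kT ln w:

```python
import math

def work_extract(blocks, l, w, kT=1.0):
    """Squeeze the first l (width, height) blocks by w, keeping each area
    fixed; the energy of those levels changes by dE = -kT*ln(w)."""
    new = [(width * w, height / w) if i < l else (width, height)
           for i, (width, height) in enumerate(blocks)]
    return new, -kT * math.log(w)

blocks = [(1.0, 0.5), (0.5, 0.6)]       # illustrative (width, height) blocks
new, dE = work_extract(blocks, 1, 1.5)  # act on the first block with w = 1.5
```

Keeping the area of each block constant is what makes the operation probability-preserving: only the energies, and hence the partition function, change.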

A given work extraction will transfer some energy ν to the work reservoir. Before the extraction the agent must specify W. If ν ≥ W the work extraction is termed successful. The probability of success is called 1 − ε.

Z_j = Z_{j−1} + aw_1 − a, out of which the lemma follows.

Figure 2: Isothermal shift: The isothermal shift of the boundary between the levels 2 and 3 in direction 3 leaves p, λ_2 + λ_3 and A_2 + A_3 invariant, while it increases λ_2 and A_2. The work cost is 0.

3. With probability λ(j) + λ(k), the eigenvalues of the final state are given by A_f(l)/(A(j) + A(k)) for l = j, k and 0 else.
4. With probability 1 the energy in the work reservoir is changed by W = 0.
Proof. 1.(a) follows directly from the algorithm, since we did not do any work extraction on any of the other levels and this is the only way energies can change in our game. For 1.(b) we need to look at how the energy eigenvalues of the j'th and k'th level change in each of the n passes through the algorithm in definition 8. Directly from the algorithm we get that in the first pass A(j) changes to A_1(j) = exp((−E(j) + kT ln(w_1))/kT), and so A_1(j) = w_1 A(j) = A(j) + (w/n)(A(j) + A(k)), and by the same argument A_1(k) = A(k) − (w/n)(A(j) + A(k)). With c = wσ(s_l) (and therefore wσ(ŝ_l) = −c), a = α_{s_l} (and α_{ŝ_l} = 1 − a), x = a + cl/n and y = 1 − a − cl/n we get:

3. For k = 1 to n:
(a) Do an isothermal shift on the level e_k in direction a_k by a_k − ⌊a_k⌋.
(b) Totally mix e_k with a_k and do an isothermal shift on the level a_k in direction e_k by 1.
(c) Totally mix the levels b_{k−1}, . . ., a_k.
(d) Do an isothermal shift on the level e_k in direction b_k by b_k − ⌊b_k⌋.

Figure 3: Assimilation: In the k'th step one mixes as many levels (or parts of levels, using isothermal shifts) as needed so that the probability function p at the position k equals the final one. Then one cuts the levels (using an isothermal shift) such that the eigenvalue of the former empty level e_k equals the wanted eigenvalue λ_f(k).
After step 3.(d) of the (k − 1)'th pass through the algorithm, the eigenvalue λ_{k−1,d}(b_{k−1}) of the level b_{k−1} (the index stands for the state after step (d) of the (k − 1)'th pass) is given by:

Figure 4: Work extraction algorithm: We choose the last levels such that the sum of their eigenvalues equals ε, then we lift them to infinity, which succeeds with probability 1 − ε (steps 1 to 3). Afterwards we extract the work W_ε and get a state which still majorises the wanted final one (step 4). Thus we can reach the wanted state by doing an assimilation (step 5, see lemma 8).

3. Do a work extraction on the levels k + 1, . . ., n, e by −∞.
4. Make a work extraction on all levels by W.