Inadequacy of von Neumann entropy for characterizing extractable work

The lack of knowledge that an observer has about a system limits the amount of work it can extract. This lack of knowledge is normally quantified using the Gibbs/von Neumann entropy. We show that this standard approach is, surprisingly, only correct in very specific circumstances. In general, one should use the recently developed smooth entropy approach. For many common physical situations, including large but internally correlated systems, the resulting values for the extractable work can deviate arbitrarily from those suggested by the standard approach.

Introduction-The relation between work and information has been the cause of great debate since the beginnings of statistical mechanics. Focal points of this debate include Maxwell's demon, Szilard's engine, Landauer's erasure and Bennett's reversible measurements [1,2,3,4].
That there should be such a relation can be seen intuitively by noting that harnessing motion, e.g. wind, for one's benefit requires knowing its direction. In thermodynamical work extraction from the pressure of a gas one uses the knowledge that the particles are confined and will only push the piston from one known direction. The simplest example of such extraction is perhaps Szilard's seminal engine, described in Figure 1.
In the context of Szilard's engine, previous efforts to quantify the relation between work and information yielded expressions of the type $W = (n - S)kT \ln 2$, where $W$ is the work extracted, $n$ the number of particles, $S$ the lack of information (entropy) about the positions of individual particles, and $nkT \ln 2$ the amount of work that would be gained if there were no uncertainty [3,5]. Feynman argued that this expression defines the entropy $S$ [5].
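To give a sense of scale, the following minimal sketch evaluates $W = (n - S)kT \ln 2$ numerically. The temperature of 300 K and the values $n = 10$, $S = 4$ are illustrative assumptions and do not come from the text; the function name is likewise hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def szilard_work(n, entropy_bits, temperature):
    """Work in joules from W = (n - S) k T ln 2, with S measured in bits."""
    return (n - entropy_bits) * K_BOLTZMANN * temperature * math.log(2)

# Illustrative values (assumed): one fully known box, then 10 boxes
# with 4 bits of uncertainty, both at 300 K.
work_per_bit = szilard_work(1, 0, 300.0)
work_total = szilard_work(10, 4, 300.0)
print(f"kT ln 2 at 300 K: {work_per_bit:.3e} J")
print(f"W for n=10, S=4:  {work_total:.3e} J")
```

Each fully known box is thus worth a few zeptojoules at room temperature, and every bit of uncertainty removes one such unit.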
In this Letter we revisit this relation in the light of the recently developed smooth entropy approach [6,7]. This approach has enabled the extension of Shannon theory in a simple yet accurate manner so that it is also valid for finite-sized and correlated bit strings, and it is intriguing to ask whether something analogous can be achieved for statistical mechanics.
* Electronic address: dahlsten@itp.phys.ethz.ch
We suggest as part of our approach that work extraction should be treated as a game in which an agent uses its information to extract work by guessing aspects of the microstate of the system. The agent uses information-compressing unitaries as part of this process. The agent also has to choose a trade-off between the risk of failure and the work extracted if successful. The work value of information therefore also depends on the risk tolerance. To recover a simple theory, we focus on the extreme cases of effectively no risk tolerance and arbitrary risk tolerance respectively, as these cases envelop all others. We derive the two respective work values of information, and discuss the consequences. We recover the standard result in the appropriate limit, but show that the work value(s) can in general be very different from that. The results hold universally for quantum systems and classical systems in the same way that information compression bounds do.
The presentation proceeds as follows. We first summarize the smooth entropy approach. We then describe existing ideas on how to use information to extract work, in particular the idea of using information compression in quantum systems. We go on to define the work extraction game within which to quantify the work value of information. We derive the two statements concerning the work value of information, and then discuss the implications.
Smooth entropies-Given a probability distribution $P$ with entries $p_i$, or equivalently a density matrix with eigenvalues $\lambda_i$, there are numerous ways to assign to it a number quantifying the associated ignorance, i.e., entropy. A commonly used function is the Shannon entropy $H(P) := -\sum_i p_i \log p_i$. Two further entropies are needed here: the max entropy $H_{\max}(P) := \log|\mathrm{supp}(P)|$ and the min entropy $H_{\min}(P) := -\log(\max_i p_i)$. An operational meaning of $H_{\max}$ is that it answers the question of how many bits (two-level systems) a memory would need in order to store a message from the distribution. $H_{\min}$, on the other hand, bounds how many of the $n$ bits are unbiased, in the sense that the marginal distribution on them is uniform. The marginal distribution on any number of bits will always have an entry which is at least of size $p_{\max} := \max_i p_i$. Bearing in mind that the marginal distribution is normalised, one can accordingly not find a marginal distribution that is uniform and has more than $1/p_{\max}$ events. Thus no more than $\log(1/p_{\max}) = H_{\min}$ bits can be uniformly distributed. One can moreover say (up to a small term) that $H_{\min}$ bits are completely unknown, as will later be shown in the proof of Theorem II.
In practical applications one will normally not care about extremely unlikely events. This motivated the recently suggested [7] modified versions of the two entropies. The modified versions are called the smooth min and max entropy respectively, since they typically do not vary much under small changes in the probability distribution. They are defined in the following manner:
$$H^{\epsilon}_{\max}(P) := \min_{P'} H_{\max}(P'), \qquad H^{\epsilon}_{\min}(P) := \max_{P'} H_{\min}(P').$$
The minimum/maximum is taken over all $P'$ such that the statistical distance $d(P, P') \le \epsilon$ (the trace distance in the quantum case). The parameter $\epsilon$ can be interpreted as the maximum probability of events one is prepared to ignore in the analysis and is normally taken to be very small, but non-zero.
In line with the definition, with probability $p > 1 - \epsilon$, a memory of size $H^{\epsilon}_{\max}$ will be enough to store a string from the distribution correctly. For example, for the distribution $P_{ex} = [0.5, 0.49998, 0.00001, 0.00001, 0]$, a memory of size $H^{0.00002}_{\max} = \log(2) = 1$ bit would suffice with $p > 1 - 0.00002$.
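The entropies above can be checked numerically for the example distribution $P_{ex} = [0.5, 0.49998, 0.00001, 0.00001, 0]$. The following is a minimal sketch for the classical case only; it assumes the statistical distance is half the L1 distance, and the function names are illustrative rather than taken from any library.

```python
import math

def shannon_entropy(p):
    """H_S = -sum p_i log2 p_i, ignoring zero-probability events."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def max_entropy(p):
    """H_max = log2 of the support size."""
    return math.log2(sum(1 for x in p if x > 0))

def min_entropy(p):
    """H_min = -log2 of the peak probability."""
    return -math.log2(max(p))

def smooth_max_entropy(p, eps, tol=1e-12):
    """Smallest H_max over distributions within statistical distance eps.

    Greedy sketch: drop the least likely events while their total mass
    stays <= eps. Moving that mass onto the kept events costs exactly
    that much statistical distance (half the L1 distance, assumed here).
    """
    removed = 0.0
    support = sorted(x for x in p if x > 0)  # ascending probabilities
    kept = len(support)
    for x in support:
        if removed + x > eps + tol or kept == 1:
            break
        removed += x
        kept -= 1
    return math.log2(kept)

p_ex = [0.5, 0.49998, 0.00001, 0.00001, 0.0]
print(min_entropy(p_ex), shannon_entropy(p_ex), max_entropy(p_ex))
print(smooth_max_entropy(p_ex, 0.00002))
```

The two events of probability 0.00001 account for a full bit of $H_{\max}$, and smoothing with $\epsilon = 0.00002$ removes exactly that bit.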
By the asymptotic equipartition theorem, both smooth entropies per particle converge to the single-particle Shannon entropy for $n$ i.i.d. particles as $n \to \infty$ and $\epsilon \to 0$, see e.g. [8]. This is only true for the smooth versions.
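This convergence can be illustrated numerically for $n$ uncorrelated bits. The sketch below is a rough estimate under stated assumptions, not a definitive calculation: it smooths only in units of whole type classes (all strings with the same number $k$ of particles on the right), it ignores renormalisation of the smoothed distribution, and it assumes $p(L) > 1/2$. The values $p(L) = 0.7$ and $\epsilon = 0.01$ are illustrative choices, and all function names are hypothetical.

```python
import math

def string_log2_prob(n, k, p_left):
    """log2 probability of one n-bit string with k particles on the right."""
    return (n - k) * math.log2(p_left) + k * math.log2(1.0 - p_left)

def class_mass(n, k, p_left):
    """Total probability of all strings with exactly k particles on the right."""
    log2_mass = math.log2(math.comb(n, k)) + string_log2_prob(n, k, p_left)
    return math.exp(log2_mass * math.log(2.0))  # exp underflows safely to 0.0

def shannon_rate(p_left):
    """Shannon entropy of a single bit."""
    return -(p_left * math.log2(p_left) + (1 - p_left) * math.log2(1 - p_left))

def smooth_max_rate(n, p_left, eps):
    """Per-bit smooth max entropy estimate: discard least likely classes first."""
    removed, support = 0.0, 2 ** n
    for k in range(n, 0, -1):  # large k is least likely when p_left > 1/2
        mass = class_mass(n, k, p_left)
        if removed + mass > eps:
            break
        removed += mass
        support -= math.comb(n, k)
    return math.log2(support) / n

def smooth_min_rate(n, p_left, eps):
    """Per-bit smooth min entropy estimate: discard most likely classes first."""
    removed = 0.0
    for k in range(n):
        mass = class_mass(n, k, p_left)
        if removed + mass > eps:
            return -string_log2_prob(n, k, p_left) / n
        removed += mass
    return -string_log2_prob(n, n, p_left) / n

for n in (10, 100, 1000):
    print(n, round(smooth_min_rate(n, 0.7, 0.01), 3),
          round(shannon_rate(0.7), 3),
          round(smooth_max_rate(n, 0.7, 0.01), 3))
```

Within this approximation the two estimates bracket the Shannon rate, and the gap between them narrows as $n$ grows.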
Readers familiar with the smooth entropy literature can note that what we call $H^{\epsilon}_{\max}$ here is the smooth Rényi entropy of order 0 and not that of order 1/2, but these differ by at most $\log(1/\epsilon)$ [7].
Szilard's engine and Bennett's development-In this work we will consider the work value of information in a quite general work extraction scenario. To understand why we chose this scenario it is instructive to recall certain specific examples existing in the literature. Bennett, in particular, considered $n$ Szilard engines (as in Figure 1) together extracting work from a heat bath [3]. The experimenter's knowledge is encoded in the probability distribution on particle positions $\{L, R\}^n$. Each box has a work value $c = kT \ln 2$ associated with knowing $L$ or $R$ perfectly. Thus if all boxes are either fully known or completely unknown, $W = (n - n_u)kT \ln 2$, where $n_u$ is the number of completely unknown boxes. Bennett notes, crucially, that correlations can be exploited too, even if the marginal distributions on the bits are uniformly random. The experimenter can implement a reversible interaction between the boxes to compress the total randomness/information (which is constrained by the correlations) onto individual bits, so that the others can then be used for work extraction. As a simple example, let $n = 2$ and $p(LL) = p(RR) = 1/2$. Then performing the reversible so-called controlled-not interaction [9] would yield $p(LL) = p(RL) = 1/2$, so that the second box, now known to be $L$ with certainty, could be used to extract $c$ work [5].
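The two-box example can be traced step by step. The sketch below represents the distribution as a dictionary over strings such as "LL" and assumes the first box is the control of the controlled-not and the second box is the target, which is the convention that reproduces $p(LL) = p(RL) = 1/2$; the helper names are illustrative.

```python
from collections import defaultdict

def cnot(state):
    """Controlled-not: flip the second box if and only if the first is 'R'."""
    first, second = state
    if first == "R":
        second = "L" if second == "R" else "R"
    return first + second

def apply_reversible(dist, gate):
    """Push a distribution over box configurations through a reversible map."""
    out = defaultdict(float)
    for state, prob in dist.items():
        out[gate(state)] += prob
    return dict(out)

def marginal(dist, index):
    """Marginal distribution of the box at position index."""
    out = defaultdict(float)
    for state, prob in dist.items():
        out[state[index]] += prob
    return dict(out)

before = {"LL": 0.5, "RR": 0.5}
after = apply_reversible(before, cnot)

print(marginal(before, 1))  # second box is uniformly random beforehand
print(after)
print(marginal(after, 1))   # second box is certain afterwards
```

Before the interaction both marginals are uniform, so neither box alone yields work; afterwards all the randomness sits in the first box and the second is fully known.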
The considerations, like many other information theoretical arguments, apply also to quantum systems [10]. In fact, the nano and quantum regimes should be the focus for implementations of these ideas, due to the probably unavoidable presence of friction in macroscopic systems. We then replace the $n$ bits with $n$ qubits, and the distribution on $\{L, R\}^n$ with a density matrix $\rho$ representing the state of the $n$ qubits [10]. The reversible interactions compressing the information are then unitary, as discussed e.g. in [11] and in a closely related setting in [12,13]. We may assume $\rho$ is diagonal, i.e. a classical distribution, because the experimenter would otherwise apply a unitary to diagonalise it to minimise the uncertainty in the basis in question.
FIG. 2: Unitary interaction compressing the information such that some bits are made fully known, some are biased and some uniformly random. Darker colour indicates higher probability density. Each bit represents a box such as in Figure 1. The agent can use this process to remove or minimise fluctuations in the work output since known bits will not yield any fluctuations. Note that it is not always obvious whether a box should be coupled to the weight or not. If the box is only biased but not certain, then a risk-averse agent would not use it, but another agent may.
Our work extraction scenario-We now construct a work extraction game from the above examples, wherein an agent tries to extract as much work as possible given its information. We say the agent succeeds in extracting work if and only if it lifts a weight (or does work against an analogous counter-force) to a predetermined level. We assume, as is common, that the relative thermal fluctuations in the piston are negligible because all particles are working on the same piston; see [14] for more discussion. The game is defined so that the agent:
• Is given $n$ bits/qubits of work value $c$ and a distribution on $\{L, R\}^n$ / $\{|L\rangle, |R\rangle\}^n$.
• Presets a unitary to be applied to the particles.
• Presets which of the boxes to use for work extraction after the unitary.
• Presets the weight to be lifted.
• Then interferes no more in the extraction.
The work value of information-The above is a well-defined, physically concrete, and quite general scenario within which to quantify the work value of information.
The agent has several choices to make. It is natural to assume the experimenter chooses the most information-compressing unitary. Using uncertain bits and choosing a heavier weight both increase the possible yield and the probability of failure. To simplify the situation we focus on the extreme cases of an agent accepting effectively no risk of failure, and another accepting effectively arbitrary risk of failure. The following Theorems give the work value of information in the two respective cases and hold for single realizations.
Theorem I: Except with $p < \epsilon$, the agent can be certain to extract $W = (n - H^{\epsilon}_{\max})c$ work, and no more.
Theorem II: Except with $p < 2\epsilon$, for an agent willing to risk failing to extract work, $W \le (n - H^{\epsilon}_{\min} + \log\frac{1}{\epsilon})c$.
We proceed to outline the proofs of the Theorems, omitting some tedious but straightforward calculations for clarity.
Theorem I follows from the following argument. By a standard smooth entropy result, no fewer than $H^{\epsilon}_{\max}$ bits can be uncertain (except with $p < \epsilon$), so an agent unwilling to use any uncertain bits cannot extract more than $(n - H^{\epsilon}_{\max})c$ work. That the agent can in fact extract that amount of work with certainty follows from noting that the agent can apply the unitary which takes the initial distribution to a state $[p_1, \dots, p_k, 0, \dots, 0]$, where $k$ is the size of the support of the post-smoothing distribution. Then only $H^{\epsilon}_{\max} = \log k$ of the bits are uncertain, and the agent can use the remaining ones to extract work. That concludes the proof of Theorem I.
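The Theorem I work value can be evaluated directly for small classical distributions. The sketch below is a minimal illustration under stated assumptions: the distribution is a list of $2^n$ probabilities, the statistical distance is half the L1 distance, smoothing is done greedily by dropping the least likely events, and the result is reported in units of $c = kT \ln 2$. The three-bit example distributions and the function names are illustrative and do not come from the text.

```python
import math

def smooth_max_entropy(p, eps, tol=1e-12):
    """log2 of the smallest support reachable by dropping total mass <= eps."""
    removed = 0.0
    support = sorted(x for x in p if x > 0)  # ascending probabilities
    kept = len(support)
    for x in support:
        if removed + x > eps + tol or kept == 1:
            break
        removed += x
        kept -= 1
    return math.log2(kept)

def risk_averse_work(p, eps):
    """Theorem I work value n - H^eps_max, in units of c = kT ln 2."""
    n = math.log2(len(p))  # assumes len(p) is a power of two
    return n - smooth_max_entropy(p, eps)

# Three boxes (n = 3), so each distribution has 8 entries.
correlated = [0.5, 0, 0, 0, 0, 0, 0, 0.5]              # all L or all R
uniform = [1 / 8] * 8                                  # nothing known
noisy = [0.5, 0.49998, 0.00001, 0.00001, 0, 0, 0, 0]   # two rare events

print(risk_averse_work(correlated, 0.0))
print(risk_averse_work(uniform, 0.0))
print(risk_averse_work(noisy, 0.0), risk_averse_work(noisy, 0.00002))
```

For the third distribution, tolerating a failure probability of 0.00002 raises the guaranteed work by one unit of $c$, which is the effect smoothing is meant to capture.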
We now proceed to prove Theorem II. We prove, crucially, that for the agent to guess all the bits it is using successfully with $p > \epsilon$, it has to desist from using at least $H_{\min} + \log \epsilon$ bits. To see this, suppose the agent uses a subset of $n'$ bits, and note that $p_{\max} \ge 2^{(n'-n)} p'_{\max}$, where $p'_{\max}$ is the peak probability of the marginal distribution on the subset (each marginal entry is a sum of $2^{n-n'}$ entries of the full distribution, none of which exceeds $p_{\max}$). Since $p'_{\max} > \epsilon$ and $p_{\max} = 2^{-H_{\min}}$, $n' \le n - (H_{\min} + \log \epsilon)$. Thus at least $H_{\min} + \log \epsilon$ bits have to be traced out to get a $p > \epsilon$ chance of guessing the remaining bits correctly, in which case $W \le (n - H_{\min} + \log\frac{1}{\epsilon})c$. It can moreover be shown that the agent cannot exceed this by using an even larger set of bits, nor by varying the counterweight (under the restriction $p > \epsilon$). Finally, to recover the smooth version, we go through the same derivation for an $\epsilon$-close distribution, yielding Theorem II, which accordingly holds except with $p < 2\epsilon$.
Discussion-Four illustrative examples of distributions are discussed in Table 1.
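As a numerical companion to the proof of Theorem II, the sketch below evaluates the unsmoothed bound $W \le (n - H_{\min} + \log\frac{1}{\epsilon})c$ in units of $c$. It additionally caps the result at $n$, on the assumption that $n$ boxes can never yield more than $nc$. The ten-bit example distributions and $\epsilon = 1/8$ are illustrative choices and are not the entries of Table 1.

```python
import math

def min_entropy(p):
    """H_min = -log2 of the peak probability."""
    return -math.log2(max(p))

def risk_tolerant_work_bound(p, eps):
    """Upper bound n - H_min + log2(1/eps), in units of c = kT ln 2."""
    n = math.log2(len(p))  # assumes len(p) is a power of two
    bound = n - min_entropy(p) + math.log2(1.0 / eps)
    return min(bound, n)   # assumed cap: at most c per box

n_bits = 10
uniform = [1.0 / 2 ** n_bits] * 2 ** n_bits  # nothing known about any box
correlated = [0.0] * 2 ** n_bits             # all L or all R, each with p = 1/2
correlated[0] = correlated[-1] = 0.5

print(risk_tolerant_work_bound(uniform, 1 / 8))
print(risk_tolerant_work_bound(correlated, 1 / 8))
```

For the uniform distribution a risk-averse agent can extract nothing, while an agent content with a success probability of about 1/8 is bounded by three units of $c$, corresponding to guessing three bits.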
It is moreover interesting that the standard thermodynamical heat engine work extraction scheme is a strategy amongst those considered here, corresponding to the unitary being the identity and all bits being selected. In general this is a suboptimal strategy (it could for example not extract any work given example four), although in the thermodynamical limit (of the first example above) it is, interestingly, optimal. This is because the standard thermodynamical work from $P(L)n$ ($P(R)n$) particles on the left (right) of the divider can be shown to be given by $W = n(1 - H_S)kT \ln 2$. This quantity is identical to the extractable work if one also allows the information-compressing unitary; see the first example in Table 1.
FIG. 3: Three entropies, $H_{\min}$, $H_S$ and $H_{\max}$, are evaluated for $n$ uncorrelated bits with $p(L) = 0.7$ (a choice motivated by the experiment [15]). One sees that in the $n \to \infty$ limit the entropies coincide. This is because all bits are in this limit (after an appropriate unitary) uniformly random or fully known. Accordingly the first example in Table 1 has the same amount of 'min' and 'max' work. The figure also shows that $n$ needs to be large in general for the entropies to coincide, which is why the second example in Table 1 shows that the type I and II extractable work are significantly different for $n = 1000$. For more details on this figure, see [16].
Whilst this type of work extraction is experimentally more challenging than standard thermodynamical work extraction, it is nevertheless significantly easier than full quantum computation. One will in general not require a universal set of unitaries nor a high accuracy in order to demonstrate non-trivial work extraction (or 'resetting', which is the inverse of that [5]) via information compression. It seems likely that at least some of the multitude of methods being developed for performing quantum gates will be suitable for this purpose, and that they will be sufficiently good for this application significantly earlier than they can be used for quantum computing [17]. Similar experiments have already been performed in the context of NMR algorithmic cooling [13,18].
We finally stress that two agents with differing knowledge about the same system and/or differing risk tolerance can extract different amounts of work; the extractable work must be seen as subjective in that sense.
The results of this work will be presented in detail in forthcoming publications.

Conclusion-We have quantified the relation between work and information, employing the smooth entropy approach. We suggested that work extraction is a guessing game where the amount of work an agent can extract depends both on its knowledge and its risk tolerance. We nevertheless recovered a simple relation between work and information by noting that all risk tolerances lie between zero and arbitrary risk, and deriving the work value of information for those two cases. The results point the way to achieving for statistical mechanics what smooth entropies accomplished for information theory more generally. A natural next step is to apply our approach to related work/information scenarios, including NMR algorithmic cooling.

The Shannon entropy is $H(P) = -\sum_i p_i \log(p_i)$ (we shall also denote it by $H_S$). The reader is less likely to be familiar with the max and min entropies, which shall both be needed here. They are called this because $H_{\min} \le H \le H_{\max}$. They are defined by $H_{\max}(P) := \log|\mathrm{supp}(P)|$ and $H_{\min}(P) := -\log(\max_i p_i)$ respectively, where $|\mathrm{supp}(P)|$ is the size of the support of the distribution, $\max_i p_i$ is the peak value and the logarithm is to base 2. For the distribution $P_{ex} = [0.5, 0.49998, 0.00001, 0.00001, 0]$, for example, $H_{\max}(P_{ex}) = \log(4) = 2$ and $H_{\min}(P_{ex}) = -\log(0.5) = 1$.

TABLE I: Examples of distributions and their work values. We set the work value of a box $c = kT \ln 2$. The notation $[(\cdot)]^{\otimes n}$ means that the distribution is combined with itself independently $n$ times. The first two examples are discussed more in Figure 3. The third example shows that the statements are both needed as they do not in general approximate one another. The fourth distribution is an example where access to the information-compressing unitary leads to an almost maximal amount of work being extractable.