Sequential measurements and entropy

We sketch applications of the so-called J-equation to quantum information theory concerning fundamental properties of the von Neumann entropy. The J-equation has recently been proposed as a sort of progenitor of the various versions of the Jarzynski equation. It has been derived within a general framework of sequential measurements that is slightly generalised here.


I. INTRODUCTION
Entropy and its increase in closed systems, the so-called 2nd law, historically arose in classical thermodynamics and the statistical physics of many-body systems. The status of these concepts in quantum theory is not completely clear despite the vast amount of literature on this subject. The time-honoured definition of the von Neumann entropy $S(\rho) = -\mathrm{Tr}(\rho \log \rho)$ has the following properties: It remains constant under unitary time evolution and increases (always understood in the sense of including the case of remaining constant) during projective measurement [1]:
$$S(\rho) \le S\Big(\sum_n P_n\,\rho\,P_n\Big), \tag{1}$$
with self-explanatory notation. These properties suggest relating the 2nd law and, more generally, the basic concepts of quantum thermodynamics to sequential measurements, real or hypothetical ones. A successful approach following these lines of thought has led to the Jarzynski equation, which is the most famous representative of a class of similar fluctuation theorems [2] - [14]. The Jarzynski equation is an exact statement about the expectation value of a non-linear function of the work $w$, viewed as the energy difference of two sequential energy measurements. Between the two measurements an arbitrary unitary time evolution takes place. Although "work" is not an observable in the traditional sense of a self-adjoint operator [15], it can be understood as an example of the generalised observable concept described by a positive operator valued measure, see [16] - [18]. We have performed an analysis [19] of (2) and more general variants of the Jarzynski equation with the result that it can be derived from an equation, called the J-equation, that concerns the statistics of sequential measurements and is initially independent of any realisation in quantum theory. From the J-equation one derives in the usual way, via Jensen's inequality, an inequality that resembles the 2nd law.
However, some caution is advised: The J-equation contains an undetermined probability distribution that can be chosen so as to lead either to the Jarzynski equations or, alternatively, to an approach initially considered by W. Pauli [20] in the context of time-dependent perturbation theory ("Golden Rule"). Only the second choice gives a proper account of the 2nd law. A more general version of this approach was published three years later by O. Klein [21] and has since been known as "Klein's inequality", see [22], although detached from the thermodynamic context. We will see in Section III B that it can also be derived from the J-equation. Thus we come across the finding that the 2nd law has some aspects that can be viewed as statements about sequential measurements and are independent of many-body physics. One could object to this viewpoint that measurements are only possible by interactions with a macroscopic measuring device, which in turn brings many-particle aspects into play. Without conclusively clarifying these issues, we note that applications of the J-equation result in a domain that could be seen as a pre-theory of quantum information theory and is, e. g., covered by chapter 11 of [22]. These applications are the subject of the present work.
However, we have to slightly extend the mathematical framework presented in [19] in order to include, for example, statistical operators with the eigenvalue 0. This is done in Section II. The next Section III deals with realisations of the statistical model of sequential measurements in quantum theory, first in general, see Section III A, and then in a special form tailored for current purposes, see Section III B. In addition to the mentioned Klein's inequality we also prove the statement (1) within our framework. One may ask: What is the purpose of proving familiar propositions anew? The obvious rationale is to uncover unexpected relationships between seemingly disjoint domains such as non-equilibrium quantum statistics and quantum information theory. These relationships could, hopefully, also be used to obtain new results or simplified proofs of known ones, which is, however, beyond the scope of the present article. In the summary and outlook Section IV we will briefly hint at these possibilities.

II. STATISTICAL MODEL OF SEQUENTIAL MEASUREMENTS
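We consider two sequential measurements on the same physical system at times $t_0 < t_1$ with respective outcome sets $\mathcal{I}$ and $\mathcal{J}$. These sets are assumed to be finite or countably infinite. Hence the joint outcome of the two measurements can be represented by the pair $(i,j) \in \mathcal{I} \times \mathcal{J}$. We define
$$\mathcal{E} \equiv \mathcal{I} \times \mathcal{J}$$
as the set of "elementary events". The probability of elementary events will be obtained by means of some auxiliary functions $\Pi$, $x$, $\tilde{x}$ that have no direct statistical meaning. We assume the existence of the functions
$$\Pi : \mathcal{E} \to \mathbb{R}_{\ge}, \qquad x : \mathcal{I} \to \mathbb{R}_{\ge}, \qquad \tilde{x} : \mathcal{J} \to \mathbb{R}_{\ge},$$
where $\mathbb{R}_{\ge}$ denotes the set of non-negative reals. $\Pi$ will be called the "conditional matrix" and its entries are written as $\Pi(j|i)$. Further, the $x(i)$ and $\tilde{x}(j)$ will be called "abstract eigenvalues" of the first and second kind for reasons that will become clear in the next Section. The marginal sums of $\Pi$ will be denoted by
$$d(i) \equiv \sum_{j\in\mathcal{J}} \Pi(j|i), \qquad \tilde{d}(j) \equiv \sum_{i\in\mathcal{I}} \Pi(j|i),$$
and may assume values in $\mathbb{R}_{\ge} \cup \{\infty\}$. We define
$$P(i,j) \equiv \Pi(j|i)\,x(i), \tag{9}$$
$$\tilde{P}(j,i) \equiv \Pi(j|i)\,\tilde{x}(j), \tag{10}$$
and postulate our central axiom as
Assumption 1
$$\sum_{(i,j)\in\mathcal{E}} P(i,j) = 1 \tag{11}$$
and
$$\sum_{(i,j)\in\mathcal{E}} \tilde{P}(j,i) = 1 . \tag{12}$$
Eq. (11) especially means that $x(i) = 0$ if $d(i) = \infty$; analogously, Eq. (12) has to be understood as $\tilde{x}(j) = 0$ if $\tilde{d}(j) = \infty$. Both functions $P(i,j)$ and $\tilde{P}(j,i)$ can be used to describe probabilities of elementary events. Correspondingly, we obtain the following four marginal probabilities:
$$p(i) \equiv \sum_{j\in\mathcal{J}} P(i,j) = d(i)\,x(i), \tag{13}$$
$$q(j) \equiv \sum_{i\in\mathcal{I}} P(i,j), \tag{14}$$
$$\tilde{p}(j) \equiv \sum_{i\in\mathcal{I}} \tilde{P}(j,i) \overset{(10)}{=} \sum_{i\in\mathcal{I}} \Pi(j|i)\,\tilde{x}(j) = \tilde{d}(j)\,\tilde{x}(j), \tag{15}$$
$$\tilde{q}(i) \equiv \sum_{j\in\mathcal{J}} \tilde{P}(j,i), \tag{16}$$
where $p(i) = d(i)\,x(i)$ has to be set to 0 if $d(i) = \infty$, analogously for $\tilde{p}(j) = \tilde{d}(j)\,\tilde{x}(j)$. According to Assumption 1 all four marginal probabilities sum to unity. It may be instructive to calculate the conditional probability belonging to $P(i,j)$, where we preliminarily restrict ourselves to the case $p(i) > 0$ for all $i \in \mathcal{I}$:
$$\pi(j|i) \equiv \frac{P(i,j)}{p(i)} = \frac{\Pi(j|i)}{d(i)} \tag{17}$$
for all $j \in \mathcal{J}$. It satisfies a kind of modified double stochasticity, namely
$$\sum_{i\in\mathcal{I}} \pi(j|i)\,d(i) \overset{(17)}{=} \sum_{i\in\mathcal{I}} \Pi(j|i) = \tilde{d}(j) .$$
In accordance with the usual nomenclature of probability theory, functions $X : \mathcal{E} \to \mathbb{R}$ are also called "random variables". Their expectation value is defined as
$$\langle X \rangle \equiv \sum_{(i,j)\in\mathcal{E}} P(i,j)\,X(i,j) \tag{19}$$
if the series converges. In a slightly sloppy notation the expectation value will sometimes also be written as $\langle X(i,j) \rangle$ if no misunderstanding is likely to occur.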
For the applications we have in mind it is necessary to calculate the expectation value $\langle X(i,j) \rangle$ also if, for some points, $X(i,j)$ diverges and the probability $P(i,j)$ vanishes. It is not sufficient to simply exclude these points from the calculation of $\langle X(i,j) \rangle$. It seems that these mathematical difficulties are connected with the rare events sampling problem discussed in the literature, see, e. g., [23]. For our purposes we need only consider the points $(i,j) \in \mathcal{E}$ where $x(i) = 0$ and the $X(i,j)$ are of the form $X(i,j) = c(i,j)/x(i)$ with some finite numbers $c(i,j)$. Since $P(i,j) = \Pi(j|i)\,x(i)$ according to (9), the obvious regularisation of the otherwise undefined expectation value will be to cancel $x(i)$ and to set the contribution of $(i,j) \in \mathcal{E}$ to the expectation value to $P(i,j)\,X(i,j) = c(i,j)\,\Pi(j|i)$. Also the above considerations on the conditional probability would have to be reformulated by using this regularisation. This lends additional meaning to the auxiliary concept of the conditional matrix $\Pi(j|i)$ that has already been introduced in [19] in order to obtain a more symmetric formulation of the framework for sequential measurements.
Taking into account this regularisation procedure we have the following result:
Proposition 1 Under the preceding conditions the following holds:
$$\left\langle \frac{\tilde{x}(j)}{x(i)} \right\rangle = 1 . \tag{20}$$
The proof is elementary, see Appendix A.
We will call Eq. (20) the "J-equation" since we think that it contains the probabilistic core of the Jarzynski equation but should be distinguished from the latter for the sake of clarity. This claim has been further explained in [19]. Due to the symmetry of our assumptions a reciprocal J-equation could be proven using the second probability distribution $\tilde{P}(j,i)$, but this will not be needed in what follows.
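The role of the regularisation in the J-equation can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the conditional matrix `Pi` and the raw values `x_raw`, `xt_raw` are hypothetical toy numbers (not taken from [19]), rescaled so that both normalisation conditions of the model hold, and the J-equation is read as the statement that the expectation value of the ratio of the abstract eigenvalue of the second kind to that of the first kind equals 1. One abstract eigenvalue of the first kind is set to zero on purpose.

```python
# Hypothetical toy model (all numbers are illustrative, not taken from the paper).
# Pi[i][j] stands for the conditional matrix entry Pi(j|i).
Pi = [[0.5, 1.5],
      [1.0, 0.0],
      [2.0, 1.0]]
x_raw = [0.3, 0.0, 0.1]   # x(1) = 0 on purpose, to exercise the regularisation
xt_raw = [0.2, 0.7]       # unnormalised abstract eigenvalues of the second kind

I = range(len(Pi))
J = range(len(Pi[0]))

# Rescale x and x~ so that sum Pi(j|i) x(i) = 1 and sum Pi(j|i) x~(j) = 1.
norm_x = sum(Pi[i][j] * x_raw[i] for i in I for j in J)
norm_xt = sum(Pi[i][j] * xt_raw[j] for i in I for j in J)
x = [v / norm_x for v in x_raw]
xt = [v / norm_xt for v in xt_raw]


def regularised_expectation():
    """<x~(j)/x(i)> with x(i) cancelled first: each point contributes Pi(j|i) x~(j)."""
    return sum(Pi[i][j] * xt[j] for i in I for j in J)


def naive_expectation():
    """Same sum, but points with x(i) = 0 are simply dropped."""
    return sum(Pi[i][j] * x[i] * (xt[j] / x[i]) for i in I for j in J if x[i] > 0)


j_value = regularised_expectation()
naive_value = naive_expectation()
print("regularised:", j_value)
print("naive      :", naive_value)
```

In this toy model the regularised sum reproduces the value 1, whereas dropping the points with vanishing probability yields a strictly smaller number, which is why such points cannot simply be excluded.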
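The probability distributions $q(j)$ and $\tilde{p}(j)$ defined in (14) and (15) are completely independent. A possible specialisation of the model for sequential measurements is given by the choice of $\tilde{x}(j)$ that results in $\tilde{p}(j) = q(j)$ for all $j \in \mathcal{J}$, namely
$$\tilde{x}(j) = \frac{q(j)}{\tilde{d}(j)} \tag{21}$$
for $\tilde{d}(j) < \infty$ and $\tilde{x}(j) = 0$ else. This will be called the "minimal case" for reasons to be explained below. In contrast to [22] we will always denote by "log" the natural logarithm. Since it is a concave function, Jensen's inequality yields
$$\langle \log X \rangle \le \log \langle X \rangle \tag{22}$$
for any random variable $X : \mathcal{E} \to \mathbb{R}_{\ge}$. We will define the "modified Shannon entropy", see [24], by
$$H(p) \equiv -\sum_{i\in\mathcal{I}} p(i) \log \frac{p(i)}{d(i)}, \tag{23}$$
and analogously $H(q)$ with $\tilde{d}(j)$ in place of $d(i)$, and obtain:
Proposition 2
$$H(p) \le H(q) \le -\sum_{j\in\mathcal{J}} q(j) \log \frac{\tilde{p}(j)}{\tilde{d}(j)} . \tag{24}$$
The proof can be found in Appendix A. Obviously, $H(q)$ minimises the right hand side of (24), thereby justifying the designation of the choice (21) resulting in $\tilde{p}(j) = q(j)$ as the minimal case.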
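The 2nd-law-like inequality can be checked on a small example. This is a hedged sketch, not the authors' computation: it assumes that the modified Shannon entropy of a distribution is minus the sum of p log(p/d), with d the corresponding marginal sums of the conditional matrix, and that the inequality chain reads H(p) ≤ H(q) ≤ −Σ q(j) log(p̃(j)/d̃(j)). The matrix `Pi` and the raw eigenvalues are hypothetical toy values.

```python
import math

# Hypothetical toy data; Pi[i][j] stands for Pi(j|i).
Pi = [[0.5, 1.5],
      [1.0, 0.5],
      [2.0, 1.0]]
x_raw = [0.3, 0.2, 0.1]
xt_raw = [0.2, 0.7]
I = range(len(Pi))
J = range(len(Pi[0]))

# Marginal sums of the conditional matrix.
d = [sum(Pi[i][j] for j in J) for i in I]
dt = [sum(Pi[i][j] for i in I) for j in J]

# Normalise the abstract eigenvalues so that both probability functions sum to 1.
x = [v / sum(d[i] * x_raw[i] for i in I) for v in x_raw]
xt = [v / sum(dt[j] * xt_raw[j] for j in J) for v in xt_raw]

p = [d[i] * x[i] for i in I]                          # first-measurement marginal
q = [sum(Pi[i][j] * x[i] for i in I) for j in J]      # second-measurement marginal
pt = [dt[j] * xt[j] for j in J]                       # independent distribution p~


def weighted_log_sum(weights, probs, degs):
    """-sum_k weights(k) * log(probs(k)/degs(k)); terms with zero weight are skipped."""
    return -sum(w * math.log(pr / dg)
                for w, pr, dg in zip(weights, probs, degs) if w > 0)


H_p = weighted_log_sum(p, p, d)      # modified Shannon entropy of p (assumed form)
H_q = weighted_log_sum(q, q, dt)     # modified Shannon entropy of q (assumed form)
rhs = weighted_log_sum(q, pt, dt)    # bound involving the arbitrary distribution p~
print(H_p, H_q, rhs)
```

Replacing `xt_raw` by other positive values changes only `rhs`; under the stated assumptions it never drops below `H_q`, in line with the characterisation of the minimal case.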

A. Sequential measurements in quantum theory
We consider a quantum system with a Hilbert space $\mathcal{H}$ and a finite number of mutually commuting self-adjoint operators $E_1, \ldots, E_L$ defined on (suitable domains of) $\mathcal{H}$. They are assumed to have a pure point spectrum and hence a family of common eigenprojections $(P_i)_{i\in\mathcal{I}}$ such that
$$E_\lambda = \sum_{i\in\mathcal{I}} e_i^{(\lambda)}\,P_i, \qquad \lambda = 1, \ldots, L .$$
Here $\mathcal{I}$ is a finite or countably infinite index set to be identified with the outcome set of the first measurement according to Section II. In general, the $P_i$ may be of infinite degeneracy; hence we define
$$d(i) \equiv \mathrm{Tr}\,P_i \in \{1, 2, \ldots\} \cup \{\infty\} .$$
Further, the $P_i$ are chosen as maximal projections in the sense that, for $i \neq i'$, the eigenvalues satisfy $e_i^{(\lambda)} \neq e_{i'}^{(\lambda)}$ for at least one $\lambda = 1, \ldots, L$. Note the completeness relation
$$\sum_{i\in\mathcal{I}} P_i = \mathbb{1} .$$
Physically, the $E_1, \ldots, E_L$ correspond to observables that can be jointly measured. We assume a (mixed) state of the system before the time $t = t_0$ described by a density operator $\rho_0$ and perform a joint Lüders measurement, cf. [16] (10.22), of $E_1, \ldots, E_L$ at the time $t = t_0$. The probability of the outcome $i \in \mathcal{I}$ will be
$$p(i) = \mathrm{Tr}(\rho_0\,P_i) .$$
In accordance with the remarks after (16) we will make Assumption 2. After the first measurement of the $E_1, \ldots, E_L$ the system is subject to a further time evolution and a second measurement of (possibly) other observables. Thus the primary preparation together with the first measurement may be considered as another preparation of a certain state $\rho$, in general different from the initial state $\rho_0$. If a selection according to a particular outcome $i \in \mathcal{I}'$ is involved this state will be, according to the assumption of a Lüders measurement, cf. [16] (10.22),
$$\rho_i = \frac{P_i\,\rho_0\,P_i}{\mathrm{Tr}(\rho_0\,P_i)} .$$
If no selection according to a particular outcome is involved the state resulting after the first measurement will rather be the mixed state
$$\rho = \sum_{i\in\mathcal{I}} P_i\,\rho_0\,P_i .$$
In order to apply the results of the preceding section we will make the following crucial assumption:
Assumption 3
$$P_i\,\rho_0\,P_i = \frac{p(i)}{d(i)}\,P_i \quad \text{for all } i \in \mathcal{I}' . \tag{33}$$
If $P_i$ is a one-dimensional projection, i. e., if $d(i) = 1$, the assumption (33) will be automatically satisfied. In the case of $d(i) > 1$ this assumption means that $\rho_0$ is diagonal w. r. t. any common eigenbasis of the $E_1, \ldots, E_L$. An important case where (33) holds is given if $\rho_0$ is a function of the operators $E_1, \ldots, E_L$, say,
$$\rho_0 = G(E_1, \ldots, E_L) .$$
For example, the choice of $G$ as the Boltzmann distribution leads to a Jarzynski equation of the form (2) for $L = 1$.
Next we consider a second set of observables described by the mutually commuting self-adjoint operators $F_1, \ldots, F_L$ subject to analogous assumptions. Hence the following holds:
$$F_\mu = \sum_{j\in\mathcal{J}} f_j^{(\mu)}\,Q_j, \qquad \sum_{j\in\mathcal{J}} Q_j = \mathbb{1}, \qquad \tilde{d}(j) \equiv \mathrm{Tr}\,Q_j,$$
together with Assumption 4, in which $\tilde{p}(j)$ denotes an arbitrary probability distribution. We have chosen another index set $\mathcal{J}$ for the second set of observables in order to stress that no natural identification between both index sets is required in what follows. Obviously, $\mathcal{J}$ has to be identified with the second outcome set introduced in Section II. In general the $E_\lambda$ will not commute with the $F_\mu$. We assume that a second measurement of the $F_1, \ldots, F_L$ will be performed at the time $t = t_1 > t_0$, not necessarily of Lüders type. Between the two measurements, in the time interval $(t_0, t_1)$, the evolution of the system can be quite arbitrary and will be described by a unitary evolution operator $U = U(t_1, t_0)$.
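In order to apply the results of the last section we will define the quantities $\Pi$, $x$, $\tilde{x}$ and show that Assumption 1 will be satisfied in the quantum case. Moreover, we will show that the probability function $P(i,j)$ has its usual meaning here.
We set
$$\Pi(j|i) = \mathrm{Tr}\big(Q_j\,U\,P_i\,U^*\big) \quad \text{for all } i \in \mathcal{I},\ j \in \mathcal{J}, \tag{39}$$
$$x(i) = \frac{p(i)}{d(i)} \ \text{ if } d(i) < \infty, \qquad x(i) = 0 \ \text{ else}, \tag{40}$$
$$\tilde{x}(j) = \frac{\tilde{p}(j)}{\tilde{d}(j)} \ \text{ if } \tilde{d}(j) < \infty, \qquad \tilde{x}(j) = 0 \ \text{ else}. \tag{41}$$
It follows that the marginal sums of $\Pi(j|i)$ agree with the degeneracies $d(i)$ and $\tilde{d}(j)$ defined above. Moreover,
$$P(i,j) = \Pi(j|i)\,x(i) = \mathrm{Tr}\big(Q_j\,U\,P_i\,\rho_0\,P_i\,U^*\big)$$
is the correct probability of the outcome $(i,j)$ according to the rules of quantum theory. Moreover, the following holds: Proposition 3 If the above Assumptions 2, 3, and 4 are satisfied then the quintuple $(\mathcal{I}, \mathcal{J}, \Pi, x, \tilde{x})$ defined in (39-41) also satisfies Assumption 1 and hence represents a model of sequential measurements.
The proof can be found in Appendix A. In particular, the J-equation (20) holds in quantum theory, as does the 2nd-law-like inequality (24).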
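The quantum realisation can be made concrete for a single qubit. The sketch below rests on several assumptions that are not taken from the text: all projections are one-dimensional, so the conditional matrix reduces to squared transition amplitudes between the eigenbasis of the first observable and that of the second after the unitary evolution, and the abstract eigenvalues coincide with the outcome probabilities. The helper `rotation` and all angles and probabilities are hypothetical illustrative choices.

```python
import math


def rotation(angle, phase):
    """A 2x2 unitary matrix; angle and phase are free illustrative parameters."""
    c, s = math.cos(angle), math.sin(angle)
    ph = complex(math.cos(phase), math.sin(phase))
    return [[c, -s * ph.conjugate()],
            [s * ph, c]]


def matvec(M, v):
    """Matrix-vector product for small dense matrices stored as nested lists."""
    return [sum(M[r][k] * v[k] for k in range(len(v))) for r in range(len(M))]


def inner(u, v):
    """Inner product <u|v>, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


E_BASIS = [[1.0, 0.0], [0.0, 1.0]]                  # eigenvectors of the first observable
V = rotation(0.4, 0.3)
F_BASIS = [[V[0][j], V[1][j]] for j in range(2)]    # eigenvectors of the second observable
U = rotation(1.1, 0.7)                              # evolution between the measurements

# Conditional matrix for rank-one projections: Pi(j|i) = |<f_j| U e_i>|^2.
Pi = [[abs(inner(F_BASIS[j], matvec(U, E_BASIS[i]))) ** 2 for j in range(2)]
      for i in range(2)]

p = [0.8, 0.2]     # initial state assumed diagonal in the first eigenbasis
pt = [0.5, 0.5]    # arbitrary second distribution
x, xt = p, pt      # all degeneracies equal 1 in this example

row_sums = [sum(Pi[i]) for i in range(2)]
col_sums = [Pi[0][j] + Pi[1][j] for j in range(2)]
P = [[Pi[i][j] * x[i] for j in range(2)] for i in range(2)]
total = sum(sum(row) for row in P)
j_value = sum(P[i][j] * xt[j] / x[i] for i in range(2) for j in range(2))
print(row_sums, col_sums, total, j_value)
```

Both marginal sums of `Pi` come out as 1, matching the degeneracies in this non-degenerate case, and the joint probabilities as well as the ratio expectation sum to 1.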

B. Results on the von Neumann entropy
Next we will prove some well-known results connected with the von Neumann entropy using the framework of sequential measurement sketched in Section II. Recall the definition of the von Neumann entropy
$$S(\rho) \equiv -\mathrm{Tr}(\rho \log \rho)$$
for arbitrary statistical operators $\rho$. As usual, the limit $\lim_{x\downarrow 0} x \log x = 0$ is tacitly understood for vanishing eigenvalues of $\rho$. For this subsection we will slightly specialise the definitions of the preceding subsection III A. We note that the eigenvalues of the operators $E_\lambda$ and $F_\lambda$ to be measured do not enter into the scheme of sequential measurement but only the corresponding eigenprojections. We use this freedom of choosing the eigenvalues in the following way. Let $\rho$ and $\sigma$ be two statistical operators with respective spectral decompositions and traces
$$\rho = \sum_{i\in\mathcal{I}} r_i\,P_i,$$
$$\sigma = \sum_{j\in\mathcal{J}} s_j\,Q_j, \tag{49}$$
$$\mathrm{Tr}\,\rho = \sum_{i\in\mathcal{I}} r_i\,d(i) = 1, \qquad \mathrm{Tr}\,\sigma = \sum_{j\in\mathcal{J}} s_j\,\tilde{d}(j) = 1 .$$
Since $\rho$ and $\sigma$ are Hermitian operators they can also be viewed as observables. Thus we choose the initial state $\rho_0 = \rho$ and perform a first Lüders measurement of $\rho$ at time $t = t_0$ with outcome $i \in \mathcal{I}$. The state after this measurement without selection is obviously again $\rho$. It follows that the condition (33) will be automatically satisfied. Between $t = t_0$ and $t = t_1 > t_0$ no interaction takes place, i. e., $U(t_1, t_0) = \mathbb{1}$. Then at time $t = t_1$ a second measurement of $\sigma$ is performed with outcome $j \in \mathcal{J}$. The assumption $U(t_1, t_0) = \mathbb{1}$ does not imply any loss of generality since $\sigma$ is completely arbitrary. The conditional matrix (39) of the sequential measurement will assume the simplified form
$$\Pi(j|i) = \mathrm{Tr}(Q_j\,P_i) .$$
Moreover, the abstract eigenvalues can be identified with the actual eigenvalues of $\rho$ and $\sigma$, i. e.,
$$x(i) = r_i \ \text{ and } \ \tilde{x}(j) = s_j \quad \text{for all } i \in \mathcal{I} \text{ and } j \in \mathcal{J}. \tag{52}$$
For arbitrary statistical operators $\rho$, $\sigma$ the "relative entropy" is defined as
$$S(\rho\|\sigma) \equiv \mathrm{Tr}(\rho \log \rho) - \mathrm{Tr}(\rho \log \sigma),$$
compare [22], (11.50). The relative entropy may diverge for certain choices of $\rho$ and $\sigma$, see [25]. It is never negative according to

Proposition 4 (Klein's inequality)
$$S(\rho\|\sigma) \ge 0,$$
compare [21] and, for a more recent reference, [22], Theorem 11.7. Our alternative proof using sequential measurements can be found in Appendix A.
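Klein's inequality can be probed numerically in the language of sequential measurements. The following is a minimal sketch for two qubit states, assuming the relative entropy is Tr(ρ log ρ) − Tr(ρ log σ) with the natural logarithm and that the two eigenbases differ by a real rotation through an angle `theta`, so that the conditional matrix consists of cos² and sin² entries. The function name and the parameter grid are illustrative choices.

```python
import math


def relative_entropy(r0, s0, theta):
    """S(rho||sigma) for two qubit states, natural logarithm.

    rho has eigenvalues (r0, 1 - r0), sigma has eigenvalues (s0, 1 - s0) with
    0 < s0 < 1, and the eigenbases differ by a real rotation through theta.
    """
    r = [r0, 1.0 - r0]
    s = [s0, 1.0 - s0]
    c2, s2 = math.cos(theta) ** 2, math.sin(theta) ** 2
    Pi = [[c2, s2], [s2, c2]]    # Pi(j|i) = Tr(Q_j P_i) for rank-one projections
    tr_rho_log_rho = sum(ri * math.log(ri) for ri in r if ri > 0)
    tr_rho_log_sigma = sum(Pi[i][j] * r[i] * math.log(s[j])
                           for i in range(2) for j in range(2))
    return tr_rho_log_rho - tr_rho_log_sigma


# Scan a coarse grid of spectra and relative orientations.
grid = [0.05 * k for k in range(1, 20)]
angles = [0.1 * k for k in range(0, 16)]
smallest = min(relative_entropy(r0, s0, th)
               for r0 in grid for s0 in grid for th in angles)
same_state = relative_entropy(0.7, 0.7, 0.0)
print(smallest, same_state)
```

On this grid the smallest value is zero up to rounding and is attained for coinciding states, consistent with the inequality.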
For the remainder of this subsection we will concentrate on the case where the statistical operator $\sigma$ is chosen in such a way that the "minimal case" according to (21) is obtained. More precisely, we define
Definition 1 The pair $(\rho, \sigma)$ of statistical operators will be called "minimal" iff
$$\mathrm{Tr}(\rho\,Q_j) = \mathrm{Tr}(\sigma\,Q_j)$$
for all $j \in \mathcal{J}$, where the $Q_j$ are the eigenprojections of $\sigma$ according to (49).
Proposition 5 If the pair $(\rho, \sigma)$ is minimal, then
$$S(\rho\|\sigma) = S(\sigma) - S(\rho) .$$
The proof can be found in Appendix A.
It is worthwhile noting that the converse of this Proposition does not hold. We will present a counter-example where $S(\rho\|\sigma) = S(\sigma) - S(\rho)$ without $(\rho, \sigma)$ being minimal. The Hilbert space of this counter-example will be $\mathcal{H} = \mathbb{C}^4 \cong \mathbb{C}^2 \otimes \mathbb{C}^2$ and $\rho$ will be the projector onto an entangled state $\phi$. Obviously, the $q(j)$, the diagonal entries of $\rho$, are different from the $\tilde{p}(j)$, the diagonal entries of $\sigma$, and hence the pair $(\rho, \sigma)$ is not minimal.
Next we turn to the problem of how the von Neumann entropy of a state changes during a quantum measurement. Obviously, this problem depends on the theoretical description of state changes during measurements and hence leads to the notions of operations and instruments, see [16]. The simplest case is that of a Lüders measurement, see [16], (10.22),
$$I_n(\rho) \equiv P_n\,\rho\,P_n, \tag{60}$$
where $(P_n)_{n\in\mathbb{N}}$ is a complete system of orthogonal projections, not necessarily finite-dimensional ones. The state change without any selection will be the trace preserving map
$$\rho \mapsto \sigma \equiv \sum_{n\in\mathbb{N}} P_n\,\rho\,P_n .$$
It is well-known, see [1] and [22], Theorem 11.9, that Lüders measurements increase entropy: Proposition 6 With the preceding definitions the following holds:
$$S(\rho) \le S(\sigma) = S\Big(\sum_{n\in\mathbb{N}} P_n\,\rho\,P_n\Big) . \tag{62}$$
A proof using the framework of sequential measurement can be found in Appendix A. We stress that the hypothetical sequential measurement used in this proof and the original Lüders measurement (60) are different, although related. This will be underscored by the following remarks:
• The first measurement of $\rho$ used in the proof with outcome $i \in \mathcal{I}$ is not part of the Lüders measurement.
• The projections $Q_j$ of the second hypothetical measurement used in the proof are finite-dimensional in contrast to the $P_n$; they rather represent a refinement of the family $(P_n)_{n\in\mathbb{N}}$.
• Moreover, the second hypothetical measurement used in the proof need not be of Lüders type.
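The entropy increase under a non-selective Lüders measurement can be illustrated with a qubit. This is a sketch under stated assumptions: the state is parametrised by a hypothetical Bloch vector `(bx, by, bz)`, the measurement uses the two rank-one projections onto the computational basis, and the non-selective state change therefore simply deletes the off-diagonal entries of the density matrix.

```python
import math


def entropy_from_eigenvalues(eigs):
    """von Neumann entropy (natural logarithm); zero eigenvalues contribute nothing."""
    return -sum(v * math.log(v) for v in eigs if v > 1e-15)


def qubit_eigenvalues(rho):
    """Eigenvalues of a 2x2 Hermitian matrix with unit trace, via its determinant."""
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    root = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    return [(1.0 + root) / 2.0, (1.0 - root) / 2.0]


def luders_dephasing(rho):
    """Non-selective map rho -> P_0 rho P_0 + P_1 rho P_1 with P_n = |n><n|."""
    return [[rho[0][0], 0.0],
            [0.0, rho[1][1]]]


# Illustrative Bloch vector with length below 1, i.e. a mixed state.
bx, by, bz = 0.5, 0.3, 0.4
rho = [[(1 + bz) / 2, complex(bx, -by) / 2],
       [complex(bx, by) / 2, (1 - bz) / 2]]

S_before = entropy_from_eigenvalues(qubit_eigenvalues(rho))
S_after = entropy_from_eigenvalues(qubit_eigenvalues(luders_dephasing(rho)))
print(S_before, S_after)
```

For a qubit the effect is transparent: the measurement shortens the Bloch vector to its component along the measurement axis, and the entropy is a decreasing function of the Bloch vector length.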
In the case of a more general instrument than that of Lüders type it is well-known that a statement analogous to (62) may fail, i. e., a generalised measurement can decrease entropy, see [22], exercise 11.15. This may sound paradoxical at first sight but can be understood by considering the "measurement dilation" of a general instrument, see [16], chapter 7.7. This means that the object system with Hilbert space $\mathcal{H}_1$ is coupled to a second system ("measuring device") with Hilbert space $\mathcal{H}_2$ and, after some interaction of the total system described by a unitary evolution operator $U$, a Lüders measurement with projectors $P_n$ is performed on the measuring device. The final step of the state change consists of a partial trace $\mathrm{Tr}_2$ that yields a mixed state $\sigma$ of the object system. We thus obtain for the state change without selection the following expression
$$\rho \mapsto \sigma = \mathrm{Tr}_2 \Big( \sum_n (\mathbb{1} \otimes P_n)\,U\,(\rho \otimes P_\phi)\,U^*\,(\mathbb{1} \otimes P_n) \Big),$$
where we have used that the initial state of the measuring device can be chosen as a pure state $P_\phi$, see [16], chapter 7.7. Without the partial trace $\mathrm{Tr}_2$ this can be understood as a Lüders measurement of the total system with projectors $\mathbb{1} \otimes P_n$ and the initial state $\rho \otimes P_\phi$. It follows from Proposition 6 that the total entropy does not decrease, i. e.,
$$S_1 \le S_3,$$
where $S_1 \equiv S(\rho \otimes P_\phi) = S(\rho)$, $S_3$ denotes the entropy of the total system after the measurement, and we have used the fact that the entropy of a tensor product is additive and the entropy of a pure state vanishes. By forming partial traces the total entropy further increases, see [22], chapter 11.3.4, and thus
$$S_1 \le S_3 \le S_{31} + S_{32},$$
where $S_{31}$ and $S_{32}$ denote the entropies of the reduced states of the object system and the measuring device, respectively. Hence the total entropy during a generalised measurement does not decrease if the possible entropy increase of the measuring device is taken into account. The mentioned counter-examples of a decreasing entropy of the object system occur if $S_{31} < S_1$, which is quite possible in spite of $S_1 \le S_{31} + S_{32}$. This means that the decrease of the entropy of the object system must be (over)compensated by an increase of the measuring device's entropy, see the corresponding discussion in [25] and [26], chapter III.5.

IV. SUMMARY AND OUTLOOK
A general framework for sequential measurement including the J-equation has been recently formulated and shown to be realised in quantum theory [19]. The J-equation comprises the various variants of the famous Jarzynski equation. In the present paper we have slightly generalised this framework in order to cope with the problem of vanishing probabilities. A standard application of the J-equation results from Jensen's inequality and the fact that the logarithm is a concave function. The resulting inequality has been shown to be essentially equivalent to Klein's inequality, already derived in 1931 in the context of quantum thermodynamics. This opens an unexpected connection between non-equilibrium quantum statistical mechanics and general entropy theory, mainly used in the context of quantum information, see [22]. The concept of sequential measurements is proving fruitful not only in its direct application to sequential measurements but also as a mathematical tool. Thus the new proofs of well-known laws like Proposition 6 open up a new perspective in this field, insofar as the various 2nd-law-like statements can be viewed as consequences of an underlying J-equation. It would be desirable to use these tools to simplify the involved proofs of, say, the strong subadditivity of the von Neumann entropy, see [22], chapter 11.4, but this is definitely outside the scope of the present paper.
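APPENDIX A: PROOFS
Proof of Proposition 1:
$$\left\langle \frac{\tilde{x}(j)}{x(i)} \right\rangle \overset{(19)}{=} \sum_{(i,j)\in\mathcal{E}} P(i,j)\,\frac{\tilde{x}(j)}{x(i)} \tag{A1}$$
$$\overset{(9)}{=} \sum_{(i,j)\in\mathcal{E}} \Pi(j|i)\,x(i)\,\frac{\tilde{x}(j)}{x(i)} = \sum_{(i,j)\in\mathcal{E}} \Pi(j|i)\,\tilde{x}(j) \overset{(10)}{=} \sum_{(i,j)\in\mathcal{E}} \tilde{P}(j,i) \overset{(12)}{=} 1 .$$
We would like to point out that the summations in (A1) - (A3) have to be extended over the whole domain $\mathcal{E}$ including those points where $x(i)$ vanishes. The contribution to the expectation value from these points has to be calculated according to the regularisation procedure explained in the text following (19).
Proof of Proposition 2:
$$0 = \log 1 \overset{(20)}{=} \log \left\langle \frac{\tilde{x}(j)}{x(i)} \right\rangle \overset{(22)}{\ge} \left\langle \log \frac{\tilde{x}(j)}{x(i)} \right\rangle$$
$$\overset{(13,14)}{=} \sum_{j\in\mathcal{J}} q(j) \log \tilde{x}(j) - \sum_{i\in\mathcal{I}} p(i) \log x(i) \tag{A8}$$
$$\overset{(15,23)}{=} \sum_{j\in\mathcal{J}} q(j) \log \frac{\tilde{p}(j)}{\tilde{d}(j)} + H(p) .$$
Hence
$$H(p) \le -\sum_{j\in\mathcal{J}} q(j) \log \frac{\tilde{p}(j)}{\tilde{d}(j)},$$
and especially for the minimal case
$$H(p) \le H(q) .$$
It remains to show the second inequality in (24). The log function is bounded by its tangent at $x = 1$:
$$\log x \le x - 1,$$
and thus
$$\sum_{j\in\mathcal{J}} q(j) \log \frac{\tilde{p}(j)}{q(j)} \le \sum_{j\in\mathcal{J}} q(j) \left( \frac{\tilde{p}(j)}{q(j)} - 1 \right) \le 0 .$$
This entails
$$H(q) \le -\sum_{j\in\mathcal{J}} q(j) \log \frac{\tilde{p}(j)}{\tilde{d}(j)}, \tag{A16}$$
thereby completing the proof of Proposition 2.
Proof of Proposition 3: We have to prove (11) and (12). To this end we consider
$$\sum_{(i,j)\in\mathcal{E}} P(i,j) = \sum_{i\in\mathcal{I}} d(i)\,x(i) = \sum_{i\in\mathcal{I}'} p(i) = 1 .$$
Proof of Proposition 4: With the definitions of Section III B we obtain
$$S(\rho) = -\mathrm{Tr}(\rho \log \rho) = -\sum_{i\in\mathcal{I}} d(i)\,r_i \log r_i = H(p)$$
and further
$$\mathrm{Tr}(\rho \log \sigma) = \sum_{(i,j)\in\mathcal{E}} x(i) \log(\tilde{x}(j))\,\Pi(j|i) \tag{A24}$$
$$\overset{(14)}{=} \sum_{j\in\mathcal{J}} q(j) \log(\tilde{x}(j)) . \tag{A25}$$
Since $\tilde{x}(j) = \tilde{p}(j)/\tilde{d}(j)$, the inequality (24) yields $S(\rho\|\sigma) = -S(\rho) - \mathrm{Tr}(\rho \log \sigma) \ge 0$.
Proof of Proposition 5: For a minimal pair $(\rho, \sigma)$ we have $\tilde{p}(j) = q(j)$ for all $j \in \mathcal{J}$ and hence
$$-S(\sigma) = \mathrm{Tr}(\sigma \log \sigma) = \sum_{j\in\mathcal{J}} \tilde{p}(j) \log s_j = \sum_{j\in\mathcal{J}} q(j) \log s_j \tag{A32}$$
$$\overset{(14)}{=} \sum_{j\in\mathcal{J}} \sum_{i\in\mathcal{I}} P(i,j) \log s_j \tag{A33}$$
$$\overset{(9)}{=} \sum_{j\in\mathcal{J}} \sum_{i\in\mathcal{I}} \Pi(j|i)\,x(i) \log s_j \tag{A34}$$
$$= \mathrm{Tr}(\rho \log \sigma) .$$
Consequently, $S(\rho\|\sigma) = -S(\rho) - \mathrm{Tr}(\rho \log \sigma) = S(\sigma) - S(\rho)$.
Proof of Proposition 6: The state $\sigma = \sum_{n\in\mathbb{N}} P_n\,\rho\,P_n$ commutes with all $P_n$. If $\sigma = \sum_{j\in\mathcal{J}} s_j\,Q_j$ is the spectral decomposition of $\sigma$ it follows that all $Q_j$ commute with all $P_n$ for $j \in \mathcal{J}$ and $n \in \mathbb{N}$. It follows that
$$\tilde{p}(j) = \mathrm{Tr}(\sigma\,Q_j) \tag{A38}$$
$$= \sum_{n\in\mathbb{N}} \mathrm{Tr}(P_n\,\rho\,P_n\,Q_j) \tag{A39}$$
$$= \sum_{n\in\mathbb{N}} \mathrm{Tr}(P_n\,\rho\,Q_j\,P_n) \tag{A40}$$
$$= \mathrm{Tr}\Big(\rho\,Q_j \sum_{n\in\mathbb{N}} P_n\Big) \tag{A41}$$
$$= \mathrm{Tr}(\rho\,Q_j) = q(j), \tag{A42}$$
where we have used $[Q_j, P_n] = 0$ in (A40) and $\sum_{n\in\mathbb{N}} P_n = \mathbb{1}$ in (A42). Hence Definition 1 of a minimal pair $(\rho, \sigma)$ is satisfied and the proof of Proposition 6 is complete.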