Entanglement and thermodynamics in general probabilistic theories

Entanglement is one of the most striking features of quantum mechanics, and yet it is not specifically quantum. More specific to quantum mechanics is the connection between entanglement and thermodynamics, which leads to an identification between entropies and measures of pure state entanglement. Here we search for the roots of this connection, investigating the relation between entanglement and thermodynamics in the framework of general probabilistic theories. We first address the question whether an entangled state can be transformed into another by means of local operations and classical communication. Under two operational requirements, we prove a general version of the Lo–Popescu theorem, which lies at the foundations of the theory of pure-state entanglement. We then consider a resource theory of purity where free operations are random reversible transformations, modelling the scenario where an agent has limited control over the dynamics of a closed system. Our key result is a duality between the resource theory of entanglement and the resource theory of purity, valid for every physical theory where all processes arise from pure states and reversible interactions at the fundamental level. As an application of the main result, we establish a one-to-one correspondence between entropies and measures of pure bipartite entanglement. The correspondence is then used to define entanglement measures in the general probabilistic framework. Finally, we show a duality between the task of information erasure and the task of entanglement generation, whereby the existence of entropy sinks (systems that can absorb arbitrary amounts of information) becomes equivalent to the existence of entanglement sources (correlated systems from which arbitrary amounts of entanglement can be extracted).


I. INTRODUCTION
The discovery of quantum entanglement [1,2] introduced the revolutionary idea that a composite system can be in a pure state while its components are in mixed states. In Schrödinger's words: "maximal knowledge of a total system does not necessarily imply maximal knowledge of all its parts" [2]. This new possibility, in radical contrast with the paradigm of classical physics, is at the root of quantum non-locality [3][4][5][6] in all its counterintuitive manifestations [7][8][9][10][11][12][13]. With the advent of quantum information, it quickly became clear that entanglement was not only a source of foundational puzzles, but also a resource [14]. Harnessing this resource has been the key to the invention of seminal protocols such as quantum teleportation [15], dense coding [16], and secure quantum key distribution [17,18], whose implications deeply impacted physics and computer science [19,20].
The key to understanding entanglement as a resource is to consider distributed scenarios where spatially separated parties perform local operations (LO) in their laboratories and exchange classical communication (CC) from one laboratory to another [21][22][23]. The protocols that can be implemented in this scenario, known as LOCC protocols, provide a means to characterize entangled states and to compare their degree of entanglement. Precisely, a state is i) entangled if it cannot be generated by an LOCC protocol, and ii) more entangled than another if there exists an LOCC protocol that transforms the former into the latter.
Comparing the degree of entanglement of two quantum states is generally a hard problem [24][25][26][27][28][29][30]. Nevertheless, the solution is simple for pure bipartite states, where the majorization criterion [31] provides a necessary and sufficient condition for LOCC convertibility. Essentially, the criterion identifies the degree of entanglement of a bipartite system with the degree of mixedness of its parts: the more entangled a pure bipartite state, the more mixed its marginals. Mixed states are compared here according to their spectra, with a state being more mixed than another if the spectrum of the latter majorizes the spectrum of the former [32][33][34][35].
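As an illustration of the majorization criterion in the quantum case, the following minimal Python sketch compares the spectra of the marginals of two pure bipartite states. The function names `majorizes` and `locc_convertible` are hypothetical, chosen only for this example, and the spectra are assumed to be given as lists of non-negative numbers with the same total weight.

```python
from itertools import accumulate


def majorizes(p, q, tol=1e-12):
    """Return True if the spectrum p majorizes the spectrum q."""
    n = max(len(p), len(q))
    # Pad with zeros so that both spectra have the same length,
    # then sort in non-increasing order.
    p_sorted = sorted(list(p) + [0.0] * (n - len(p)), reverse=True)
    q_sorted = sorted(list(q) + [0.0] * (n - len(q)), reverse=True)
    p_cum = list(accumulate(p_sorted))
    q_cum = list(accumulate(q_sorted))
    if abs(p_cum[-1] - q_cum[-1]) > tol:
        raise ValueError("spectra must have the same total weight")
    # p majorizes q iff every partial sum of p dominates that of q.
    return all(a >= b - tol for a, b in zip(p_cum, q_cum))


def locc_convertible(source_spectrum, target_spectrum):
    """Quantum majorization criterion for pure bipartite states:
    the source can be converted into the target by LOCC iff the
    marginal spectrum of the target majorizes that of the source."""
    return majorizes(target_spectrum, source_spectrum)


# A maximally entangled two-qubit state can reach a less entangled one,
# but not the other way round.
print(locc_convertible([0.5, 0.5], [0.8, 0.2]))
print(locc_convertible([0.8, 0.2], [0.5, 0.5]))
```

Majorization is only a preorder: spectra such as (0.5, 0.25, 0.25) and (0.4, 0.4, 0.2) are incomparable, so neither of the corresponding pure states can be converted into the other by LOCC.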
The majorization criterion shows that for pure bipartite states the notion of entanglement as a resource beautifully matches Schrödinger's idea of entanglement as non-maximal knowledge about the parts of a composite system. Moreover, it establishes an intriguing duality between entanglement and thermodynamics [36][37][38][39][40], where the reduction of entanglement caused by LOCC protocols becomes dual to the increase of mixedness (and therefore entropy [41]) caused by thermodynamic transformations. This duality has far-reaching consequences, such as the existence of a unique measure of entanglement in the asymptotic limit [36,37,42]. In addition, it has provided guidance for the development of entanglement theory beyond the simple case of pure bipartite states [39].
The duality between entanglement and thermodynamics is a profound and fundamental fact. As such, one might expect it to follow directly from basic principles. However, it is not a priori clear what these principles are: up to now, the relation between entanglement and thermodynamics has been addressed in a way that is heavily dependent on the Hilbert space framework, using technical results that lack an operational interpretation (such as the singular value decomposition). It is then natural to search for a derivation of the entanglement-thermodynamics duality that uses only high-level quantum features, such as the impossibility of instantaneous signalling or the no-cloning theorem. In the same vein, one can ask whether the duality holds for physical theories other than quantum mechanics, adopting the broad framework of general probabilistic theories (GPTs) [43][44][45][46][47][48][49][50][51]. In the landscape of GPTs, entanglement is a generic feature [44,52], which provides powerful advantages for a variety of information-theoretic tasks [53][54][55][56][57][58]. But what about its relation with thermodynamics? Is it also generic?
In this paper we investigate the relation between entanglement and thermodynamics in an operational, theory-independent way. Motivated by the quantum case, we focus on the entanglement of pure bipartite states and ask which conversions are possible under LOCC protocols. We will restrict ourselves to causal theories [47]. Our first result is a generalized Lo-Popescu theorem [23], proving that under suitable assumptions every LOCC protocol acting on a pure bipartite state can be simulated by a protocol using only one round of classical communication. The assumptions of our result are satisfied by quantum mechanics, and also by all bipartite extreme no-signaling correlations studied in the literature [59][60][61][62]. In order to connect with thermodynamics, we then introduce an ordering between mixed states. Specifically, we define a state to be more mixed than another if the former can be obtained from the latter via a random reversible evolution. This definition has a number of appealing features. First of all, in the quantum case it coincides with the usual definition based on majorization [31,63]. Moreover, it is completely operational, and applies also to theories that do not possess an analogue of the spectral theorem. Finally, it fits into the paradigm of resource theories [64], providing a natural generalization of the quantum resource theory of purity [65]. In this generalization, the random reversible evolutions are regarded as free operations, and states that are purer (i.e. less mixed) are interpreted as more useful resources.
Once the resource theories of entanglement and purity are put into place, we search for a duality between them. To make the connection, we consider physical theories that admit a fundamental level where all states are pure and all the interactions are reversible. For these theories we establish the desired duality between entanglement and thermodynamics, showing that the degree of entanglement of a pure bipartite system coincides with the degree of mixedness of its parts. As a consequence, finding a measure of entanglement of pure bipartite states is equivalent to finding a measure of mixedness of single-system states. However, if we require that this measure has a reasonable behavior, we must add another axiom: the No Entropy Sinks requirement, stating that it is not possible to have free erasure of information, i.e. to restore a pure state from a mixed state (a process similar to Landauer's erasure [66]), even with the assistance of a catalyst.
Not only is the No Entropy Sinks requirement important for a sensible theory of thermodynamics, but it also has consequences as far as purification is concerned. Indeed, when supplemented by Purification, Purity Preservation and Local Exchangeability, it ensures that every state has a symmetric purification, as happens in quantum theory.
The paper is organized as follows. In section II we introduce the framework adopted in the paper. The entanglement ordering between bipartite states is introduced in section III. Then we prove an operational version of the Lo-Popescu theorem in section IV, making use of two operational requirements. In section V we formulate an operational resource theory of dynamical control, which gives rise to a canonical resource theory of purity under some conditions. Dually, this enables us to define an ordering of single-system states according to their mixedness. In section VI we prove the duality between entanglement and thermodynamics for theories admitting a fundamental level where all processes are pure and reversible. Its consequences and the No Entropy Sinks requirement are discussed in section VII. Finally, in section VIII we address the issue of Symmetric Purification, showing that it is implied by the present set of axioms, including No Entropy Sinks. The conclusions are drawn in section IX.
Processes can be combined either in sequence or in parallel, giving rise to circuits like the following

[circuit diagram not reproduced]

Here, $A$, $A'$, $A''$, $B$, $B'$ are systems, ρ is a bipartite state, $\mathcal{A}$, $\mathcal{A}'$ and $\mathcal{B}$ are transformations, a and b are effects. The two transformations $\mathcal{A}$ and $\mathcal{A}'$ are composed in sequence, while the transformations $\mathcal{A}$ and $\mathcal{B}$ and the effects a and b are composed in parallel. In the example, the circuit has no external wires: circuits of this form represent probabilities. We denote as [notation list missing in the source].

Let us introduce a few basic notions. A test from A to B is a collection of transformations $\{\mathcal{C}_i\}_{i\in X}$ from A to B, which can occur in an experiment. A transformation is called deterministic if it belongs to a test with a single outcome. We will often refer to deterministic transformations as channels, following the terminology of quantum information. Among all possible channels, reversible channels are particularly important: a channel $\mathcal{U}$ from A to B is called reversible if there exists a channel $\mathcal{U}^{-1}$ from B to A such that $\mathcal{U}^{-1}\mathcal{U} = \mathcal{I}_A$ and $\mathcal{U}\mathcal{U}^{-1} = \mathcal{I}_B$, where $\mathcal{I}_X$ is the identity channel on system X. If there exists a reversible channel transforming A into B we say that A and B are operationally equivalent, denoted by $A \simeq B$. The composition of systems is required to be symmetric [68][69][70][71], meaning that $A \otimes B \simeq B \otimes A$. The reversible channel that implements the equivalence is the swap channel, SWAP, which satisfies the condition

$$\mathrm{SWAP}_{A',B'}\,(\mathcal{A}\otimes\mathcal{B}) = (\mathcal{B}\otimes\mathcal{A})\,\mathrm{SWAP}_{A,B}$$

for every pair of transformations $\mathcal{A}$ (from $A$ to $A'$) and $\mathcal{B}$ (from $B$ to $B'$) and for generic systems $A$, $A'$, $B$, $B'$.
In this paper we restrict our attention only to causal theories [47], where the choice of future measurement settings does not influence the outcome probability of present experiments. Mathematically, causality is equivalent to the fact that for every system A there is only one deterministic effect, which we denote here by $\operatorname{Tr}_A$, in analogy with the partial trace in quantum mechanics. In a causal theory, one can define the norm of a state ρ as $\|\rho\| := \operatorname{Tr}\rho$.
The set of normalized states of A will be denoted by

$$\mathrm{St}_1(A) := \{\rho \in \mathrm{St}(A) : \|\rho\| = 1\}.$$

In a causal theory, every state is proportional to a normalized state [47]. In quantum mechanics, $\mathrm{St}_1(A)$ is the set of normalized density matrices of system A, while $\mathrm{St}(A)$ is the set of all sub-normalized density matrices.
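In a causal theory channels admit a simple characterization, which will be useful later in the paper:

Proposition 1 Let $\mathcal{C} \in \mathrm{Transf}(A,B)$. Then $\mathcal{C}$ is a channel if and only if $\operatorname{Tr}_B\,\mathcal{C} = \operatorname{Tr}_A$.

The proof can be found in lemma 5 of Ref. [47].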

A. Pure states and transformations
In every probabilistic theory one can define pure states, and, more generally, pure transformations. Both concepts are based on the notion of coarse-graining, i.e. the operation of joining two or more outcomes of a test. More precisely, a test $\{\mathcal{C}_i\}_{i\in X}$ is a coarse-graining of the test $\{\mathcal{D}_j\}_{j\in Y}$ if there is a partition $\{Y_i\}_{i\in X}$ of $Y$ such that $\mathcal{C}_i = \sum_{j\in Y_i} \mathcal{D}_j$ for every $i \in X$. In this case, we say that $\{\mathcal{D}_j\}_{j\in Y}$ is a refinement of $\{\mathcal{C}_i\}_{i\in X}$. The refinement of a given transformation is defined via the refinement of a test: if $\{\mathcal{D}_j\}_{j\in Y}$ is a refinement of $\{\mathcal{C}_i\}_{i\in X}$, then the transformations $\{\mathcal{D}_j\}_{j\in Y_i}$ are a refinement of the transformation $\mathcal{C}_i$.
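To make the notion of coarse-graining concrete, here is a minimal Python sketch for a classical toy model. It assumes (this is our illustration, not part of the framework above) that transformations of a classical system are represented by sub-stochastic matrices, with entry `[r][c]` giving the probability of output `r` and of the given outcome on input `c`. The helper name `coarse_grain` is hypothetical.

```python
def coarse_grain(test, partition):
    """Coarse-grain a test: C_i is the sum of D_j over the block Y_i.

    `test` is a list of matrices (lists of rows) of equal shape,
    `partition` is a list of blocks, each a list of outcome indices j.
    """
    rows, cols = len(test[0]), len(test[0][0])
    coarse = []
    for block in partition:
        total = [[0.0] * cols for _ in range(rows)]
        for j in block:
            for r in range(rows):
                for c in range(cols):
                    total[r][c] += test[j][r][c]
        coarse.append(total)
    return coarse


# A three-outcome test on a classical bit (toy numbers).
D = [
    [[0.5, 0.0], [0.0, 0.0]],
    [[0.0, 0.25], [0.5, 0.0]],
    [[0.0, 0.25], [0.0, 0.5]],
]

# Join outcomes 1 and 2: a two-outcome coarse-graining.
C = coarse_grain(D, [[0], [1, 2]])

# Join all outcomes: the result is deterministic, i.e. a channel,
# so each column sums to one.
(channel,) = coarse_grain(D, [[0, 1, 2]])
print(C[1], channel)
```

In this toy model the condition that a deterministic transformation preserves the unique deterministic effect reduces to the statement that every column of the fully coarse-grained matrix sums to one.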
A transformation is called pure if it has only trivial refinements:

Definition 1 The transformation $\mathcal{C} \in \mathrm{Transf}(A,B)$ is pure if for every refinement $\{\mathcal{D}_j\}$ one has $\mathcal{D}_j = p_j\,\mathcal{C}$, where $\{p_j\}$ is a probability distribution.
Pure transformations are those for which the experimenter has maximal information about the evolution of the system. In the special case of states (transformations with no input), the above definition coincides with the usual definition of pure state. We denote the set of pure states of system A as PurSt (A). As usual, non-pure states are called mixed.
Pure states will play a key role in this paper. An elementary property of pure states is that they are preserved by reversible transformations. The proof is standard and is reported in Appendix A for the convenience of the reader.

III. ENTANGLEMENT

A. The resource theory of entanglement
The resource theory of quantum entanglement [14] is based on the notion of LOCC protocols, that is, protocols in which distant parties are allowed to communicate classically to one another and to perform local operations in their laboratories [21,22]. This is an operational notion and, as such, can be easily exported to arbitrary theories.
In this paper we will consider protocols involving two parties, Alice and Bob. A generic LOCC protocol consists of a sequence of tests, performed by Alice and Bob, with the property that the choice of the test at a given step can depend on the outcomes produced at the previous steps. For example, consider a two-way protocol where [protocol steps missing in the source]. Every instance of the protocol will correspond to a sequence of outcomes, which can be represented by a circuit of the form

[circuit diagram not reproduced]

where the dashed lines represent classical communication. By coarse-graining over all possible outcomes, one obtains a channel, given by [equation missing in the source].

Entangled states are those states that cannot be generated using an LOCC protocol. Equivalently, they can be characterized as those states that are not separable, i.e. they are not of the form

$$\rho = \sum_i p_i\, \alpha^{(i)} \otimes \beta^{(i)},$$

where $\{p_i\}$ is a probability distribution allowed by the theory, $\alpha^{(i)}$ is a state of A, and $\beta^{(i)}$ is a state of B. Like in quantum theory, LOCC protocols can be used to compare entangled states.
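Definition 2 Given two states $\rho \in \mathrm{St}(A \otimes B)$ and $\rho' \in \mathrm{St}(A' \otimes B')$, we say that $\rho$ is more entangled than $\rho'$, denoted by $\rho \succeq_{\mathrm{ent}} \rho'$, if there exists an LOCC protocol that transforms $\rho$ into $\rho'$, i.e. if $\rho' = \mathcal{L}\rho$ for some LOCC channel $\mathcal{L}$.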
Mathematically, the relation $\succeq_{\mathrm{ent}}$ is a preorder, i.e. it is reflexive and transitive. Moreover, it is stable under tensor products, namely $\rho \otimes \sigma \succeq_{\mathrm{ent}} \rho' \otimes \sigma'$ whenever $\rho \succeq_{\mathrm{ent}} \rho'$ and $\sigma \succeq_{\mathrm{ent}} \sigma'$. In other words, the relation $\succeq_{\mathrm{ent}}$ turns the set of all bipartite states into a preordered monoid, a mathematical structure arising in all resource theories [64]. The resource theory of entanglement here fits completely into the framework of Ref. [64], with the LOCC channels as free operations. The states which can be prepared by LOCC (i.e. the separable states) are free by definition, and all the other states represent resources. If a state can be converted into another by free operations, then it constitutes a more valuable resource.

If $\rho \succeq_{\mathrm{ent}} \rho'$ and $\rho' \succeq_{\mathrm{ent}} \rho$, then we say that $\rho$ and $\rho'$ are equally entangled, denoted by $\rho \simeq_{\mathrm{ent}} \rho'$. Note that $\rho \simeq_{\mathrm{ent}} \rho'$ does not imply that $\rho$ and $\rho'$ are equal: for example, every two separable states are equally entangled. Similarly, every two states that differ by a local reversible channel are equally entangled: this is the case, for example, of two pure states $\Psi$ and $\Psi'$ such that $\Psi' = (\mathcal{U} \otimes \mathcal{I}_B)\Psi$ for some reversible channel $\mathcal{U}$ on A [recall, indeed, that reversible transformations send pure states into pure states, cf. lemma 2].

IV. AN OPERATIONAL LO-POPESCU THEOREM
Given two bipartite states, it is natural to ask whether one is more entangled than the other. A priori, answering the question requires one to check all possible LOCC protocols, which in general is a hard task. However, the situation is much simpler when the initial state is pure. Now we prove that in this case every LOCC protocol can be replaced without loss of generality by a protocol involving only one round of classical communication, i.e. a one-way LOCC protocol. Our argument provides an operational version of the Lo-Popescu theorem [23], which lays the foundation for the quantum theory of pure-state entanglement.

A. Two operational requirements
Our derivation of the operational Lo-Popescu theorem is based on two physical requirements. The first is

Axiom 1 (Purity Preservation [45,48,72]) The sequential and parallel composition of two pure transformations yield a pure transformation.
As a special case, Purity Preservation implies that the product of two pure states is a pure state:

$$\alpha \in \mathrm{PurSt}(A),\ \beta \in \mathrm{PurSt}(B) \ \Longrightarrow\ \alpha \otimes \beta \in \mathrm{PurSt}(A \otimes B).$$

Note that this conclusion could also be obtained from the Local Tomography axiom [47]. Nevertheless, there exist examples of theories that satisfy Purity Preservation and violate Local Tomography. An example is quantum theory on real vector spaces [73,74]. In general, we regard Purity Preservation as a fundamental requirement, even more fundamental than Local Tomography. Considering the theory as an algorithm to make deductions about physical processes, Purity Preservation ensures that, when presented with maximal information about two processes, the algorithm outputs maximal information about their composition.
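Our second requirement imposes a symmetry on pure bipartite states.

Axiom 2 (Local Exchangeability) For every pure bipartite state $\Psi \in \mathrm{PurSt}(A \otimes B)$ there exist two channels $\mathcal{C}: A \to B$ and $\mathcal{D}: B \to A$ such that

$$(\mathcal{C} \otimes \mathcal{D})\,\Psi = \mathrm{SWAP}\,\Psi. \qquad (2)$$

This requirement is trivially satisfied by classical probability theory, where all pure states are of the product form. Less trivially, it is satisfied by quantum theory, on both complex and real Hilbert spaces.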
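Example 1 Suppose that A and B are quantum systems, and let $\mathcal{H}_A$ and $\mathcal{H}_B$ be the corresponding Hilbert spaces. By the Schmidt decomposition, every pure state in the tensor product Hilbert space can be written as

$$|\Psi\rangle = \sum_{i=1}^{r} \sqrt{p_i}\; |\alpha_i\rangle \otimes |\beta_i\rangle,$$

where $\{|\alpha_i\rangle\}_{i=1}^r \subset \mathcal{H}_A$ and $\{|\beta_i\rangle\}_{i=1}^r \subset \mathcal{H}_B$ are orthonormal sets, and the partial isometries $C$ and $D$ are $C := \sum_{i=1}^r |\beta_i\rangle\langle\alpha_i|$ and $D := \sum_{i=1}^r |\alpha_i\rangle\langle\beta_i|$. From the partial isometries $C$ and $D$ it is immediate to construct the desired channels $\mathcal{C}$ and $\mathcal{D}$, which can be defined as [equation missing in the source], where ρ and σ are generic input states of systems A and B, respectively. With this definition, one has

$$(\mathcal{C} \otimes \mathcal{D})\bigl(|\Psi\rangle\langle\Psi|\bigr) = \mathrm{SWAP}\,|\Psi\rangle\langle\Psi|\,\mathrm{SWAP},$$

which is the Hilbert space version of the local exchangeability condition of Eq. (2).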
Note that Local Exchangeability does not violate the no-signalling principle. In order to exchange their systems, Alice and Bob need to know in advance what pure state they share, because the two channels $\mathcal{C}$ and $\mathcal{D}$ depend on the state Ψ.
Example 2 Consider a scenario where two space-like separated parties, Alice and Bob, perform measurements on a pair of systems, A and B, respectively. Let x (resp. y) be the index labeling Alice's (resp. Bob's) measurement setting and let a (resp. b) be the index labeling the outcome of the measurement done by Alice (resp. Bob). In the theory known as box world [44,75] all no-signaling probability distributions $p_{ab|xy}$ are physically realizable and represent states of the composite system A ⊗ B. Such probability distributions form a convex polytope [61] and the extreme points are the pure states of the theory.
For x, y, a, b ∈ {0, 1} the systems A and B are operationally equivalent, and we will denote by $\mathcal{I}$ the reversible transformation that converts A into B. The extreme nonlocal correlations have been characterized in [59] and are known to be equal to the standard PR-box correlation [60]

$$p_{ab|xy} = \begin{cases} \tfrac{1}{2} & \text{if } a \oplus b = xy \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

up to exchange of 0 with 1 in the local settings of Alice and Bob and in the outcomes of their measurements. In the circuit picture, these operations are described by local reversible transformations: denoting by Φ the standard PR-box state, one has that every other pure state Ψ ∈ PurSt (A ⊗ B) is of the form $\Psi = (\mathcal{U} \otimes \mathcal{V})\,\Phi$, where $\mathcal{U}$ and $\mathcal{V}$ are reversible transformations.
To see that the Local Exchangeability property holds, note that swapping systems A and B is equivalent to swapping x with y and a with b. Now, the standard PR-box correlation of Eq. (3) is invariant under the swappings x ↔ y and a ↔ b, meaning that one has $\mathrm{SWAP}\,\Phi = \Phi$ (up to the identification of A with B through $\mathcal{I}$). Then it is clear that every pure state of A ⊗ B can be swapped by local operations: defining $\mathcal{C} := \mathcal{V}\mathcal{U}^{-1}$ and $\mathcal{D} := \mathcal{U}\mathcal{V}^{-1}$, one immediately obtains

$$(\mathcal{C} \otimes \mathcal{D})\,\Psi = \mathrm{SWAP}\,\Psi, \qquad (4)$$

where [remainder of the sentence missing in the source].

Finally, the last category of extreme non-local correlations characterized in the literature corresponds to the case of an arbitrary number of settings and to 2-outcome measurements. In this case, the extreme correlations are characterized explicitly in Ref. [62]. Up to local reversible transformations, the pure states are invariant under swap. Hence, the same argument used in Eq. (4) shows that Local Exchangeability holds.
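The swap invariance used in Example 2 can be checked by brute force. The sketch below assumes the standard textbook form of the PR-box correlation, p(ab|xy) = 1/2 when a XOR b equals x AND y and 0 otherwise; the function names are hypothetical and introduced only for this illustration.

```python
from itertools import product

BITS = (0, 1)


def pr_box(a, b, x, y):
    """Standard PR-box correlation (assumed form, see lead-in)."""
    return 0.5 if (a ^ b) == (x & y) else 0.0


def is_swap_invariant(p):
    """Check invariance under the joint exchange x <-> y, a <-> b."""
    return all(p(a, b, x, y) == p(b, a, y, x)
               for a, b, x, y in product(BITS, repeat=4))


def is_no_signalling(p):
    """Check that each party's marginal ignores the other's setting."""
    for a, x in product(BITS, repeat=2):
        if len({sum(p(a, b, x, y) for b in BITS) for y in BITS}) != 1:
            return False
    for b, y in product(BITS, repeat=2):
        if len({sum(p(a, b, x, y) for a in BITS) for x in BITS}) != 1:
            return False
    return True


def relabelled(a, b, x, y):
    """A local relabelling of Alice's setting (x -> 1 - x)."""
    return pr_box(a, b, x ^ 1, y)


print(is_swap_invariant(pr_box), is_no_signalling(pr_box))
print(is_swap_invariant(relabelled), is_no_signalling(relabelled))
```

The relabelled box is still no-signalling but is no longer swap-invariant by itself, which is why the argument in the text compensates the swap with the local reversible transformations relating the state to the standard PR-box.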

B. Inverting the direction of classical communication
Purity Preservation and Local Exchangeability have an important consequence. For one-way protocols acting on a pure input state, the direction of classical communication is irrelevant: every one-way LOCC protocol with communication from Alice to Bob can be replaced by a one-way LOCC protocol with communication from Bob to Alice, as shown by the following.
Lemma 1 (Inverting CC) Let Ψ be a pure state of $A \otimes B$ and let ρ be a (possibly mixed) state of $A' \otimes B'$. Under the validity of axioms 1 and 2, the following are equivalent:
1. Ψ can be transformed into ρ by a one-way LOCC protocol with communication from Alice to Bob.
2. Ψ can be transformed into ρ by a one-way LOCC protocol with communication from Bob to Alice.
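Proof. Suppose that Ψ can be transformed into ρ by a one-way LOCC protocol with communication from Alice to Bob, namely

$$\rho = \sum_{i \in X} \bigl(\mathcal{A}_i \otimes \mathcal{B}^{(i)}\bigr)\,\Psi, \qquad (6)$$

where $\{\mathcal{A}_i\}_{i\in X}$ is a test, and, for every outcome $i \in X$, $\mathcal{B}^{(i)}$ is a channel. Note that one can assume every transformation $\mathcal{A}_i$ to be pure without loss of generality. For every fixed $i \in X$, one has

$$\bigl(\mathcal{A}_i \otimes \mathcal{B}^{(i)}\bigr)\Psi = \bigl(\mathcal{I}_{A'} \otimes \mathcal{B}^{(i)}\bigr)\,\mathrm{SWAP}_{B,A'}\,\bigl(\mathcal{I}_B \otimes \mathcal{A}_i\bigr)\,\mathrm{SWAP}_{A,B}\,\Psi. \qquad (7)$$

By Local Exchangeability, the first swap can be realized by two local channels $\mathcal{C}: A \to B$ and $\mathcal{D}: B \to A$. Moreover, since $\mathcal{A}_i$ is pure, Purity Preservation implies that the (unnormalized) state $(\mathcal{A}_i \otimes \mathcal{I}_B)\Psi$ is pure. Hence, also the second swap in Eq. (7) can be realized by two local channels $\mathcal{C}^{(i)}: A' \to B$ and $\mathcal{D}^{(i)}: B \to A'$. Substituting into Eq. (7) one obtains

$$\bigl(\mathcal{A}_i \otimes \mathcal{B}^{(i)}\bigr)\Psi = \bigl(\mathcal{D}^{(i)}\mathcal{C} \otimes \mathcal{B}^{(i)}\mathcal{C}^{(i)}\mathcal{A}_i\mathcal{D}\bigr)\Psi$$

and, therefore,

$$\bigl(\mathcal{A}_i \otimes \mathcal{B}^{(i)}\bigr)\Psi = \bigl(\mathcal{A}^{(i)} \otimes \mathcal{B}_i\bigr)\Psi, \qquad (8)$$

having defined $\mathcal{A}^{(i)} := \mathcal{D}^{(i)}\mathcal{C}$ and $\mathcal{B}_i := \mathcal{B}^{(i)}\mathcal{C}^{(i)}\mathcal{A}_i\mathcal{D}$. By construction $\{\mathcal{B}_i\}_{i\in X}$ is a test, because it can be realized by performing the test $\{\mathcal{A}_i\}$ after the channel $\mathcal{D}$ and subsequently applying the channel $\mathcal{B}^{(i)}\mathcal{C}^{(i)}$, depending on the outcome [47]. On the other hand, $\mathcal{A}^{(i)}$ is a channel for every $i \in X$. Hence, we constructed a one-way LOCC protocol with communication from Bob to Alice. Combining Eqs. (6) and (8) we obtain

$$\rho = \sum_{i \in X} \bigl(\mathcal{A}^{(i)} \otimes \mathcal{B}_i\bigr)\,\Psi,$$

meaning that Ψ can be transformed into ρ by a one-way protocol with communication from Bob to Alice. Clearly, the same argument can be applied to prove the converse direction.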
Note that the target state ρ need not be pure: the fact that the direction of classical communication can be exchanged is based only on the purity of the input state Ψ.

C. Reduction to one-way protocols
We are now ready to derive our operational Lo-Popescu theorem. The theorem states that the action of an arbitrary LOCC protocol on a pure state can be simulated by a one-way LOCC protocol:

Theorem 1 (Operational Lo-Popescu theorem) Let Ψ be a pure state of $A \otimes B$ and ρ be a (possibly mixed) state of $A' \otimes B'$. Under the validity of axioms 1 and 2, the following are equivalent:
1. Ψ can be transformed into ρ by an LOCC protocol.
2. Ψ can be transformed into ρ by a one-way LOCC protocol.
Proof. The non-trivial implication is 1 =⇒ 2. Suppose that Ψ can be transformed into ρ by an LOCC protocol with N rounds of classical communication. Without loss of generality, we assume that Alice starts the protocol and that all transformations occurring in the first N − 1 rounds are pure.
Let $s = (i_1, i_2, \ldots, i_{N-1})$ be the sequence of all classical outcomes obtained by Alice and Bob up to step N − 1, $p_s$ be the probability of the sequence s, and $\Psi_s$ be the pure state after step N − 1 conditional on the occurrence of s. For concreteness, suppose that the outcome $i_{N-1}$ has been generated on Alice's side. Then, the rest of the protocol consists of a test $\{\mathcal{B}^{(s)}_{i_N}\}$, performed on Bob's side, followed by a channel $\mathcal{A}^{(s,i_N)}$ performed on Alice's side. By definition, one has

$$\rho = \sum_s p_s \sum_{i_N} \bigl(\mathcal{A}^{(s,i_N)} \otimes \mathcal{B}^{(s)}_{i_N}\bigr)\,\Psi_s.$$

Now, using lemma 1 one can invert the direction of the classical communication in the last round, obtaining

$$\rho = \sum_s p_s \sum_{i_N} \bigl(\mathcal{A}^{(s)}_{i_N} \otimes \mathcal{B}^{(s,i_N)}\bigr)\,\Psi_s$$

for a suitable test $\{\mathcal{A}^{(s)}_{i_N}\}$ and suitable channels $\mathcal{B}^{(s,i_N)}$. Now, since both the (N − 1)-th and the N-th tests are performed by Alice, these two tests can be merged into a single test, thus reducing the original LOCC protocol to an LOCC protocol with N − 1 rounds. Iterating this argument N − 1 times one finally obtains a one-way protocol.
In quantum theory, the Lo-Popescu theorem provides the foundation for the resource theory of pure state entanglement. Having the operational version of this result will be crucial for our study of the relation between entanglement and thermodynamics.

V. PURITY

A. A resource theory of dynamical control
Consider the scenario where a closed system A undergoes a reversible dynamics governed by some parameters which can be controlled by an experimenter. For example, system A could be a charged particle moving in an electric field, whose intensity and direction can be engineered in order to obtain a desired trajectory. In general, the experimenter may not have full control and the actual values of the control parameters may fluctuate randomly. As a result, the evolution of the system will be described by a Random Reversible (RaRe) channel, that is a channel $\mathcal{R}$ of the form

$$\mathcal{R} = \sum_{i \in X} p_i\, \mathcal{U}^{(i)},$$

where $\{\mathcal{U}^{(i)} \mid i \in X\}$ is a set of reversible transformations and $\{p_i\}_{i\in X}$ is their probability distribution. Assuming that the system remains closed during the whole evolution, RaRe channels are the most general transformations the experimenter can implement. Letting system A evolve jointly with another system B is not an allowed evolution here, because it would generally require an interaction, which by definition is not possible for closed systems.
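As a minimal illustration of a RaRe channel, consider classical probability theory, where the reversible channels of a d-level system are the permutations of its d levels. The Python sketch below applies a random mixture of permutations to a probability vector; the function names are hypothetical and the numbers are toy values.

```python
def apply_permutation(perm, state):
    """Reversible classical channel: level k is moved to perm[k]."""
    out = [0.0] * len(state)
    for k, weight in enumerate(state):
        out[perm[k]] += weight
    return out


def rare_channel(probs, perms, state):
    """Random reversible channel: sum_i p_i U^(i) applied to `state`."""
    out = [0.0] * len(state)
    for p, perm in zip(probs, perms):
        moved = apply_permutation(perm, state)
        out = [o + p * m for o, m in zip(out, moved)]
    return out


# The three cyclic shifts of a three-level system.
shifts = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]

# A pure (deterministic) state is driven to a mixed one.
print(rare_channel([0.5, 0.25, 0.25], shifts, [1.0, 0.0, 0.0]))

# The uniform distribution is left unchanged by every RaRe channel.
print(rare_channel([0.5, 0.25, 0.25], shifts, [0.25, 0.25, 0.25, ][:3]))
```

In this classical special case the RaRe channels are exactly the random permutations, so the reachable targets from a given input are the convex hull of its permuted copies.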
An important question in all problems of control is whether a given input state can be driven to a target state using the allowed dynamics. With respect to this task, an input state is more valuable than another if the set of target states that can be reached from the former contains the set of target states that can be reached from the latter. In our model, this idea leads to the following definition.
Definition 3 (More controllable states) Given two states $\rho$ and $\rho'$ of system A, we say that $\rho$ is more controllable than $\rho'$, denoted by $\rho \succeq \rho'$, if $\rho'$ can be obtained from $\rho$ via a RaRe channel.
This definition fits into the general framework of resource theories [64], with the RaRe channels playing the role of free operations. Note that at this level of generality there are no free states: since the ability of the experimenter is limited to controlling the evolution, every state is regarded as a resource. Physically, this is in agreement with the fact that the input state in a control problem is not chosen by the experimenter: for example, it can be a thermal state or the ground state of an unperturbed Hamiltonian.
As is always the case in resource theories, the relation $\succeq$ is clearly reflexive and transitive, i.e. it is a preorder. Moreover, since the tensor product of two RaRe channels is a RaRe channel, the relation $\succeq$ is stable under tensor products, namely $\rho \otimes \sigma \succeq \rho' \otimes \sigma'$ whenever $\rho \succeq \rho'$ and $\sigma \succeq \sigma'$.
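The stability under tensor products can be spelled out in one line. The following worked equation is a sketch in our own notation: $\mathcal{R}$ and $\mathcal{S}$ denote two RaRe channels with weights $p_i$, $q_j$ and reversible channels $\mathcal{U}^{(i)}$, $\mathcal{V}^{(j)}$, all of which are symbols introduced here for illustration.

```latex
\[
  \mathcal{R} \otimes \mathcal{S}
  = \Bigl( \sum_i p_i \, \mathcal{U}^{(i)} \Bigr) \otimes
    \Bigl( \sum_j q_j \, \mathcal{V}^{(j)} \Bigr)
  = \sum_{i,j} p_i q_j \; \mathcal{U}^{(i)} \otimes \mathcal{V}^{(j)} .
\]
% Each U^{(i)} \otimes V^{(j)} is reversible, with inverse
% (U^{(i)})^{-1} \otimes (V^{(j)})^{-1}, and \{p_i q_j\} is a
% probability distribution, so R \otimes S is again a RaRe channel.
\[
  \rho' = \mathcal{R}\rho , \quad \sigma' = \mathcal{S}\sigma
  \;\Longrightarrow\;
  \rho' \otimes \sigma' = (\mathcal{R} \otimes \mathcal{S})(\rho \otimes \sigma) .
\]
```

The first line uses only the bilinearity of parallel composition with respect to coarse-graining; the second line is then the statement that $\rho \otimes \sigma$ is more controllable than $\rho' \otimes \sigma'$.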

B. From dynamical control to purity
There is a close relation between the controllability of a state and its purity. For example, if a state is more controllable than a pure state, then it must be pure.
Proposition 3 If ψ ∈ St (A) is a pure state and ρ is more controllable than ψ, then ρ must be pure. Specifically, $\rho = \mathcal{U}\psi$ for some reversible channel $\mathcal{U}$.
Proof. Since ψ is pure, the condition $\sum_i p_i\, \mathcal{U}^{(i)}\rho = \psi$ implies that $\mathcal{U}^{(i)}\rho = \psi$ for every i, meaning that $\rho = \mathcal{V}^{(i)}\psi$, where $\mathcal{V}^{(i)}$ is the inverse of $\mathcal{U}^{(i)}$. Proposition 2 then guarantees that ρ is pure.
In other words, pure states can be reached only from pure states. A natural question is whether every state can be reached from some pure state. The answer is positive in quantum theory and in a large class of GPTs, but there are nevertheless counterexamples, which prevent an easy identification of the theory of dynamical control with a "theory of purity". Three different examples are illustrated in Fig. 1.
Example 3 Consider first a system with a state space like in Fig. 1a. In this case, there are only two reversible transformations, namely the identity and the reflection around the vertical symmetry axis. As a consequence, there is no way to obtain the mixed states on the two vertical sides by applying a RaRe channel to a pure state. These states represent a valuable resource, even though they are not pure. Hence, in this case the resource theory of dynamical control cannot be thought of as a resource theory of purity.
As a second example, consider instead a system whose state space is a half-disk, like in Fig. 1b. Also in this case there are only two reversible transformations (the identity and the reflection around the vertical axis). However, now every mixed state can be generated from some pure state through a RaRe channel. The state space can be foliated into horizontal segments generated by pure states under the action of RaRe channels. As a result, the pure states are the most useful resources and one can interpret the relation $\succeq$ as a way to compare the degree of purity of different states. Nevertheless, pure states on different segments are inequivalent resources: in this "resource theory of purity" there are different, inequivalent classes of pure states.
Finally, consider a system with a square state space, like in Fig. 1c, and suppose that all the symmetry transformations in the dihedral group $D_4$ are allowed reversible transformations. In this case, all the pure states are equivalent under reversible transformations and every mixed state can be obtained by applying a RaRe channel to a fixed pure state. Here, the resource theory of dynamical control becomes a full-fledged resource theory of purity.
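The claim for the square state space can be made explicit with a short Python sketch. It assumes (our parametrization, not taken from the text) that normalized states are the points of the square with vertices (±1, ±1), that the pure states are the four vertices, and it uses only the four rotations of $D_4$, which already suffice. The names `rare_decomposition` and `apply_rare` are hypothetical.

```python
# The four rotations in D_4, as 2x2 integer matrices (rows).
ROTATIONS = [
    ((1, 0), (0, 1)),    # identity
    ((0, -1), (1, 0)),   # rotation by 90 degrees
    ((-1, 0), (0, -1)),  # rotation by 180 degrees
    ((0, 1), (-1, 0)),   # rotation by 270 degrees
]


def act(matrix, point):
    """Apply a 2x2 matrix to a point of the plane."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


def rare_decomposition(target, pure=(1, 1)):
    """Return pairs (p_g, g) such that sum_g p_g g(pure) = target."""
    x, y = target
    if abs(pure[0]) != 1 or abs(pure[1]) != 1:
        raise ValueError("`pure` must be a vertex of the square")
    if max(abs(x), abs(y)) > 1:
        raise ValueError("`target` must lie in the square")
    terms = []
    for g in ROTATIONS:
        vx, vy = act(g, pure)  # the vertex reached from `pure` via g
        terms.append(((1 + vx * x) * (1 + vy * y) / 4, g))
    return terms


def apply_rare(terms, state):
    """Apply the RaRe channel sum_g p_g g to `state`."""
    images = [(p, act(g, state)) for p, g in terms]
    return (sum(p * v[0] for p, v in images),
            sum(p * v[1] for p, v in images))


terms = rare_decomposition((0.5, -0.25))
print([p for p, _ in terms])
print(apply_rare(terms, (1, 1)))
```

The weights are products of two one-dimensional barycentric coordinates, so they are non-negative and sum to one for every target inside the square; the centre of the square is obtained with uniform weights.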
The above examples show that not every GPT supports a sensible resource theory of purity. Motivated by the examples, we put forward the following definition:

Definition 4 A theory of purity is a resource theory of dynamical control where every state ρ can be compared with at least one pure state. The theory is called canonical if every pure state is more controllable than every other state.
In this paper we will focus on canonical theories of purity. Canonical theories can be characterized as those where all pure states are comparable to one another:

Proposition 4 The following are equivalent:
1. the theory is a canonical theory of purity;
2. for every system A, the group of reversible channels acts transitively on the set of pure states;
3. for every system A, there exists at least one state that is more controllable than every state.
Proof. 1 =⇒ 2. Suppose that the theory is canonical. By definition, this means that every pure state ψ ∈ PurSt (A) is more controllable than every other state. In particular, ψ must be more controllable than any other pure state ϕ, i.e. $\varphi = \sum_i p_i\, \mathcal{U}^{(i)}\psi$ for some probabilities $\{p_i\}$ and some reversible channels $\mathcal{U}^{(i)}$. Since ϕ is pure, this relation implies $\varphi = \mathcal{U}^{(i)}\psi$ for every i, which in turn implies that ψ is pure and $\psi = \mathcal{V}^{(i)}\varphi$, with $\mathcal{V}^{(i)}$ being the inverse of $\mathcal{U}^{(i)}$. Hence, ψ and ϕ are connected by a reversible channel. We conclude that the group of reversible transformations acts transitively on the set of pure states.
2 =⇒ 3. Every state ρ can be expressed as a convex combination of the form $\rho = \sum_i p_i\, \varphi_i$, where $\{p_i\}$ is a probability distribution allowed by the theory and the $\varphi_i$ are pure states. Now, suppose that ψ is a pure state. For every i, by picking a reversible channel $\mathcal{U}^{(i)}$ such that $\mathcal{U}^{(i)}\psi = \varphi_i$, one obtains the relation $\rho = \sum_i p_i\, \mathcal{U}^{(i)}\psi$, meaning that ψ is more controllable than ρ. Since ρ is generic, we conclude that ψ is more controllable than every state.
3 =⇒ 1. Suppose that there exists a state ρ that is more controllable than every state. In particular, ρ must be more controllable than every pure state ψ. By proposition 3, ρ must be pure and there exists a reversible transformation $\mathcal{U}$ such that $\rho = \mathcal{U}^{-1}\psi$. Hence, the pure state $\psi = \mathcal{U}\rho$ is more controllable than every state. Since ψ is generic, this proves that every pure state is more controllable than every other state, i.e. the theory is canonical.
Starting from Hardy's work [43], the transitivity of the reversible channels on pure states has featured in a number of axiomatizations of quantum theory, either directly as an axiom [76][77][78] or indirectly as a consequence of an axiom [48]. Proposition 4 provides a new angle on this axiom, as a necessary and sufficient condition for a well-behaved theory of purity and, ultimately, for a well-behaved thermodynamics.

C. Maximally mixed states
In a canonical theory of purity, we say that ρ is purer than ρ' iff $\rho \succeq \rho'$, and we adopt the notation $\rho \succeq_{\rm pur} \rho'$. In this case, we also say that ρ' is more mixed than ρ, denoted by $\rho' \succeq_{\rm mix} \rho$. When $\rho \succeq_{\rm mix} \rho'$ and $\rho' \succeq_{\rm mix} \rho$ we say that ρ and ρ' are equally mixed, denoted by $\rho \simeq_{\rm mix} \rho'$. Clearly, every two states that differ by a reversible channel are equally mixed. We say that a state χ ∈ St (A) is maximally mixed iff it satisfies the property

$$\forall \rho \in \mathsf{St}(A): \qquad \rho \succeq_{\rm mix} \chi \implies \rho = \chi .$$

Maximally mixed states can be characterized as the states that are invariant under all reversible channels:

Proposition 5 A state χ ∈ St (A) is maximally mixed if and only if it is invariant, i.e. if and only if $\chi = \mathcal{U} \chi$ for every reversible channel $\mathcal{U}$ : A → A.
The proof is straightforward. Note that maximally mixed states do not exist in every theory: for example, for infinite-dimensional quantum systems there is no maximally mixed density operator, i.e. no trace-class operator that is invariant under the action of the full unitary group. For finite-dimensional canonical theories, however, the maximally mixed state exists and is unique under the standard assumption of compactness of the state space [47,77]. In this case, the state χ is not only a maximal element, but also the maximum of the relation $\succeq_{\rm mix}$, that is,

$$\chi \succeq_{\rm mix} \rho \qquad \forall \rho \in \mathsf{St}(A). \qquad (10)$$

This is in analogy with the quantum case, where the maximally mixed state is given by the density matrix χ = I/d, where I is the identity operator on the system's Hilbert space and d is the Hilbert space dimension. Another example of a finite-dimensional canonical theory is provided by the square bit:

Example 4 Consider a system whose state space is a square, as in Fig. 1c, and pick a generic (mixed) state ρ. The states that are more mixed than ρ are obtained by applying all possible reversible transformations to ρ (i.e. all the elements of the dihedral group D_4) and taking the convex hull of the orbit. The set of all states that are more mixed than ρ forms an octagon, depicted in blue in Fig. 2. All the vertices of the octagon are equally mixed. The centre of the square is the maximally mixed state χ, the unique invariant state of the system.
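As a small numerical companion to Example 4, the following sketch checks the octagon picture directly. It rests on assumptions that are not in the text: the square is placed with vertices at (±1, ±1), so that D_4 acts by sign flips and coordinate swaps, and the function names `d4_orbit` and `more_mixed_square` are hypothetical. Membership in the convex hull of the orbit is tested with the two inequalities that bound the octagon.

```python
def d4_orbit(state):
    """Orbit of a point of the square under the 8 symmetries of D_4.

    Assumes the square has vertices (+-1, +-1), so that D_4 acts by
    sign flips and by swapping the two coordinates.
    """
    x, y = state
    return {(sx * u, sy * v)
            for (u, v) in ((x, y), (y, x))
            for sx in (1, -1)
            for sy in (1, -1)}


def more_mixed_square(candidate, rho, tol=1e-12):
    """True if `candidate` lies in the convex hull of the D_4 orbit of `rho`.

    The hull is the octagon |x| <= a, |y| <= a, |x| + |y| <= a + b,
    where a >= b are the absolute coordinates of rho.
    """
    a, b = sorted((abs(rho[0]), abs(rho[1])), reverse=True)
    u, v = sorted((abs(candidate[0]), abs(candidate[1])), reverse=True)
    return u <= a + tol and u + v <= a + b + tol


rho = (0.6, 0.2)                      # a generic mixed state
orbit = d4_orbit(rho)                 # the 8 vertices of the octagon
centre = (sum(p[0] for p in orbit) / len(orbit),
          sum(p[1] for p in orbit) / len(orbit))

print(len(orbit))                               # number of octagon vertices
print(more_mixed_square((0.0, 0.0), rho))       # the centre is more mixed than rho
print(more_mixed_square((1.0, 1.0), rho))       # a pure state is not
```

The uniform average over the orbit is the centre of the square, which is the invariant state χ in this parametrization; the two inequalities in `more_mixed_square` play the role that majorization plays for classical and quantum states.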

VI. ENTANGLEMENT-THERMODYNAMICS DUALITY
In quantum theory, it is well known that the ordering of pure bipartite states according to the degree of entanglement is equivalent to the ordering of their marginals according to the degree of mixedness [31][32][33][34][63]. In this section we will prove the validity of this equivalence based only on first principles. The key axiom to prove this equivalence is the purification principle.

Figure 2: Mixedness relation for the state space of a square bit: vertices of the octagon represent the states that can be reached from a given state ρ via reversible transformations. Their convex hull is the set of states that are more mixed than ρ. Note that it contains the invariant state χ, which can be characterized as the maximally mixed state.

A. Purification
In order to establish the entanglement-thermodynamics duality, we consider theories that satisfy Purification [47,48], which expresses a strengthened version of the principle of conservation of information [72]. Purification characterizes all the physical theories admitting a description where all processes are pure and reversible at a fundamental level [72,79]. In a causal theory, one can define marginal states.

Definition 5 The marginal state of a bipartite state $\rho_{AB}$ on system A is the state $\rho_A := \operatorname{Tr}_B\, \rho_{AB}$ obtained by applying the deterministic effect on B.
Purification ensures that every state arises as a marginal state of a pure state of a larger system:

Axiom 3 (Purification [47,48]) Every state has a purification, unique up to reversible channels on the purifying system, namely for every two pure states Ψ, Ψ' ∈ PurSt (A ⊗ B) such that $\operatorname{Tr}_B \Psi = \operatorname{Tr}_B \Psi'$ one has

$$\Psi' = (\mathcal{I}_A \otimes \mathcal{U})\, \Psi \qquad (11)$$

for some reversible transformation $\mathcal{U}$ : B → B.
Purification has a lot of important consequences [47]. First of all, purification implies that every two pure states are connected to one another by a reversible transformation:

Proposition 6 For every system B and every pair of pure states ψ, ψ' ∈ PurSt (B) there is a reversible channel $\mathcal{U}$ : B → B such that $\psi' = \mathcal{U} \psi$.

The existence of a reversible transformation connecting ψ and ψ' is a consequence of the uniqueness of purification [Eq. (11)], in the special case where A is the trivial system [47] (no wire corresponding to system A). Since all pure states are equivalent under reversible transformations, every theory with purification gives rise to a canonical theory of purity, in the sense of subsection V B.
Another important consequence of Purification is the steering property, stating that every ensemble decomposition of a given state can be induced by a measurement on its purifying system:

Proposition 7 (Steering) Let ρ be a state of system A and let Ψ ∈ PurSt (A ⊗ B) be a purification of ρ. For every ensemble of states $\{\rho_i\}_{i\in X}$ such that $\sum_i \rho_i = \rho$, there exists a measurement $\{b_i\}_{i\in X}$ on the purifying system B such that

$$\rho_i = (\mathcal{I}_A \otimes b_i)\, \Psi \qquad \forall i \in X .$$

See theorem 6 of Ref. [47] for the proof. The steering property will turn out to be essential in establishing the duality between entanglement and thermodynamics.

B. One-way protocols transforming pure states into pure states
The operational Lo-Popescu theorem guarantees that every LOCC protocol acting on a pure bipartite input state can be simulated by a one-way protocol. We now consider one-way protocols where both the input and the output states are pure. In this case, Purification guarantees that every such protocol can be simulated by a one-way protocol where all conditional operations are reversible.

Lemma 2 Let Ψ and Ψ' be two pure states of A ⊗ B. Under the validity of Purification and Purity Preservation, every one-way protocol transforming Ψ into Ψ' can be simulated by a one-way protocol where all conditional operations are reversible.
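Proof. Suppose that Ψ can be transformed into Ψ' via a one-way protocol where Alice performs a test $\{\mathcal{A}_i\}_{i\in X}$ and Bob performs a channel $\mathcal{B}^{(i)}$ conditional on outcome i. By definition, we have

$$\sum_{i\in X} (\mathcal{A}_i \otimes \mathcal{B}^{(i)})\, \Psi = \Psi' .$$

Since Ψ' is pure, this implies that there exists a probability distribution {p_i} such that

$$(\mathcal{A}_i \otimes \mathcal{B}^{(i)})\, \Psi = p_i\, \Psi' \qquad (12)$$

for every outcome i. Now, without loss of generality each transformation $\mathcal{A}_i$ can be assumed to be pure (if not, one can always decompose it into pure transformations). Then, Purity Preservation guarantees that the normalized state $\Psi_i$ defined by

$$(\mathcal{A}_i \otimes \mathcal{I}_B)\, \Psi = p_i\, \Psi_i \qquad (13)$$

is pure. With this definition, Eq. (12) becomes

$$(\mathcal{I}_A \otimes \mathcal{B}^{(i)})\, \Psi_i = \Psi' .$$

Tracing out system B on both sides one obtains $\operatorname{Tr}_B \Psi' = \operatorname{Tr}_B [(\mathcal{I}_A \otimes \mathcal{B}^{(i)}) \Psi_i] = \operatorname{Tr}_B \Psi_i$, the second equality coming from the normalization of the channel $\mathcal{B}^{(i)}$ (cf. proposition 1). Hence, the pure states $\Psi_i$ and Ψ' have the same marginal on A. By the uniqueness of Purification, they must differ by a reversible channel $\mathcal{U}^{(i)}$ on the purifying system B, namely

$$\Psi' = (\mathcal{I}_A \otimes \mathcal{U}^{(i)})\, \Psi_i . \qquad (14)$$

In conclusion, we obtained

$$\Psi' = \sum_i (\mathcal{A}_i \otimes \mathcal{B}^{(i)}) \Psi = \sum_i p_i\, \Psi' = \sum_i p_i\, (\mathcal{I}_A \otimes \mathcal{U}^{(i)}) \Psi_i = \sum_i (\mathcal{A}_i \otimes \mathcal{U}^{(i)})\, \Psi ,$$

where we used Eqs. (12), (14), and (13). In other words, the initial protocol can be simulated by a protocol where Alice performs the test $\{\mathcal{A}_i\}$ and Bob performs the reversible transformation $\mathcal{U}^{(i)}$ conditionally on the outcome i.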
The reduction to one-way protocols with reversible operations is the key to connecting the resource theory of entanglement with the resource theory of dynamical control, defined earlier in subsection V A.

C. The more entangled a pure state, the more mixed its marginals
Lemma 3 Let Ψ and Ψ' be two pure states of system A ⊗ B and let ρ, ρ' and σ, σ' be their marginals on system A and B, respectively. Under the validity of Purification, Purity Preservation, and Local Exchangeability, if Ψ is more entangled than Ψ', then ρ (resp. σ) is more mixed than ρ' (resp. σ').
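Proof. By the operational Lo-Popescu theorem, we know that there exists a one-way protocol transforming Ψ into Ψ'. Moreover, thanks to Purification, the conditional operations in the protocol can be chosen to be reversible (lemma 2). Let us choose a protocol with classical communication from Alice to Bob, in which Alice performs the test $\{\mathcal{A}_i\}_{i\in X}$ and Bob performs the reversible transformation $\mathcal{U}^{(i)}$ conditional on outcome i. Since Ψ' is pure, we must have

$$(\mathcal{A}_i \otimes \mathcal{U}^{(i)})\, \Psi = p_i\, \Psi' ,$$

where {p_i} is a suitable probability distribution. Denoting by $\mathcal{V}^{(i)}$ the inverse of $\mathcal{U}^{(i)}$ and applying it on both sides of the equation, we obtain

$$(\mathcal{A}_i \otimes \mathcal{I}_B)\, \Psi = p_i\, (\mathcal{I}_A \otimes \mathcal{V}^{(i)})\, \Psi' .$$

Summing over all outcomes the equality becomes

$$(\mathcal{A} \otimes \mathcal{I}_B)\, \Psi = (\mathcal{I}_A \otimes \mathcal{R})\, \Psi' \qquad (15)$$

with $\mathcal{A} := \sum_i \mathcal{A}_i$ and $\mathcal{R} := \sum_i p_i\, \mathcal{V}^{(i)}$. Finally, we obtain

$$\sigma = \operatorname{Tr}_A \Psi = \operatorname{Tr}_A [(\mathcal{A} \otimes \mathcal{I}_B) \Psi] = \operatorname{Tr}_A [(\mathcal{I}_A \otimes \mathcal{R}) \Psi'] = \mathcal{R}\, \sigma' ,$$

where we used the normalization of channel $\mathcal{A}$ in the second equality and Eq. (15) in the third. Since $\mathcal{R}$ is a RaRe channel by construction, we proved that σ is more mixed than σ'. The fact that ρ is more mixed than ρ' can be proven by the same argument, starting from a one-way protocol with classical communication from Bob to Alice.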
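The argument of lemma 3 can be checked by hand in the simplest quantum instance. The sketch below is an illustration under assumptions that are not part of the text: two qubits with real amplitudes, the maximally entangled input (|00⟩ + |11⟩)/√2, the target cos θ |00⟩ + sin θ |11⟩, and a specific two-outcome test for Alice; all function and variable names are hypothetical. It verifies that each branch of the one-way protocol outputs the target state, and that Bob's marginal of the input is obtained from his marginal of the output by the random-reversible channel built from the inverses of his corrections.

```python
import math


def matmul(x, y):
    """Product of two small matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


def apply_local(a_op, b_op, m):
    """Apply (a_op tensor b_op) to the state with real amplitudes m[a][b]."""
    return matmul(matmul(a_op, m), transpose(b_op))


def marginal_on_bob(m):
    """Bob's marginal sigma = m^T m (valid for real amplitudes)."""
    return matmul(transpose(m), m)


theta = math.pi / 6
c, s = math.cos(theta), math.sin(theta)
identity = [[1.0, 0.0], [0.0, 1.0]]
flip = [[0.0, 1.0], [1.0, 0.0]]

psi = [[1 / math.sqrt(2), 0.0], [0.0, 1 / math.sqrt(2)]]   # input state
psi_target = [[c, 0.0], [0.0, s]]                          # output state

# Alice's test (two Kraus operators) and Bob's conditional corrections.
alice_test = [[[c, 0.0], [0.0, s]], matmul(flip, [[s, 0.0], [0.0, c]])]
bob_corrections = [identity, flip]

probs, outputs = [], []
for a_op, u_op in zip(alice_test, bob_corrections):
    branch = apply_local(a_op, u_op, psi)
    p = sum(v * v for row in branch for v in row)          # outcome probability
    probs.append(p)
    outputs.append([[v / math.sqrt(p) for v in row] for row in branch])

# RaRe channel of lemma 3: sigma = sum_i p_i V_i sigma' V_i^T, V_i = U_i^{-1}.
sigma = marginal_on_bob(psi)
sigma_target = marginal_on_bob(psi_target)
mixed = [[0.0, 0.0], [0.0, 0.0]]
for p, u_op in zip(probs, bob_corrections):
    v_op = transpose(u_op)                                 # inverse of an orthogonal matrix
    term = matmul(matmul(v_op, sigma_target), transpose(v_op))
    mixed = [[mixed[i][j] + p * term[i][j] for j in range(2)] for i in range(2)]

print(probs)
print(mixed)
```

Both branches occur with probability 1/2, and the mixture of the identity and the bit flip applied to σ' reproduces the maximally mixed marginal σ of the input, as the proof requires.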
The relation between degree of entanglement of a pure state and degree of mixedness of its marginals holds not only for bipartite states, but also for multipartite states. Indeed, suppose that Ψ and Ψ' are two pure states of system $A_1 \otimes A_2 \otimes \cdots \otimes A_N$ and that Ψ is more entangled than Ψ', in the sense that there exists a (multipartite) LOCC protocol converting Ψ into Ψ'. For every subset S ⊂ {1, ..., N} one can define $A := \bigotimes_{n \in S} A_n$ and $B := \bigotimes_{n \notin S} A_n$ and apply lemma 3. As a result, one obtains that the marginals of Ψ are more mixed than the marginals of Ψ' on every subsystem.

D. The more mixed a state, the more entangled its purification
We now prove the converse direction of the entanglement-thermodynamics duality: if a state is more mixed than another, then its purification is more entangled. Remarkably, the proof of this fact requires only the validity of Purification.

Lemma 4 Let ρ and ρ' be two states of system A and let Ψ, Ψ' ∈ PurSt (A ⊗ B) be two purifications of ρ, ρ', respectively. Under the validity of Purification, if ρ is more mixed than ρ', then Ψ is more entangled than Ψ'.

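Proof. By hypothesis, one has

$$\rho = \mathcal{R}\, \rho'$$

for some RaRe channel $\mathcal{R} := \sum_i p_i\, \mathcal{U}^{(i)}$. Let us define the bipartite state Θ as

$$\Theta := (\mathcal{R} \otimes \mathcal{I}_B)\, \Psi' = \sum_{i \in X} p_i\, (\mathcal{U}^{(i)} \otimes \mathcal{I}_B)\, \Psi' . \qquad (16)$$

By construction, Θ is an extension of ρ: indeed, one has $\operatorname{Tr}_B \Theta = \mathcal{R}\, \operatorname{Tr}_B \Psi' = \mathcal{R}\, \rho' = \rho$. Let us take a purification of Θ, say Γ ∈ PurSt (A ⊗ B ⊗ C). Clearly, Γ is a purification of ρ, since one has $\operatorname{Tr}_{BC}\, \Gamma = \operatorname{Tr}_B \Theta = \rho$. Then, the uniqueness of purification implies that Γ must be of the form

$$\Gamma = (\mathcal{I}_A \otimes \mathcal{U})\, (\Psi \otimes \gamma) \qquad (17)$$

for some reversible transformation $\mathcal{U}$ and some pure state γ. In other words, Ψ can be transformed into Γ by local operations on Bob's side. Now, Eq. (16) implies that the states $\{ p_i\, (\mathcal{U}^{(i)} \otimes \mathcal{I}_B) \Psi' \}_{i \in X}$ are an ensemble decomposition of Θ. Hence, the steering property (proposition 7) implies that there exists a measurement $\{c_i\}_{i \in X}$ such that

$$(\mathcal{I}_A \otimes \mathcal{I}_B \otimes c_i)\, \Gamma = p_i\, (\mathcal{U}^{(i)} \otimes \mathcal{I}_B)\, \Psi' . \qquad (18)$$

Combining Eqs. (17) and (18), and applying the inverse $\mathcal{V}^{(i)}$ of $\mathcal{U}^{(i)}$ on Alice's side, we obtain the desired result

$$\sum_{i \in X} (\mathcal{V}^{(i)} \otimes \mathcal{B}_i)\, \Psi = \Psi' ,$$

where $\{\mathcal{B}_i\}_{i \in X}$ is the test defined by $\mathcal{B}_i := (\mathcal{I}_B \otimes c_i)\, \mathcal{U}\, (\mathcal{I}_B \otimes \gamma)$. In conclusion, if the marginal state of Ψ is more mixed than the marginal state of Ψ', then Ψ can be converted into Ψ' by a (one-way) LOCC protocol.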

E. The duality
Combining lemmas 3 and 4 we identify the degree of entanglement of a pure bipartite state with the degree of mixedness of its marginals:

Theorem 2 (Entanglement-thermodynamics duality) Let Ψ and Ψ' be two pure states of system A ⊗ B and let ρ, ρ' and σ, σ' be their marginals on system A and B, respectively. Under the validity of Purification, Purity Preservation, and Local Exchangeability, the following statements are equivalent:
1. Ψ is more entangled than Ψ'
2. ρ is more mixed than ρ'
3. σ is more mixed than σ'.

Here the map implementing the duality is (a choice of) the purification. Such a map cannot be realized as a physical operation [47]. Instead, it corresponds to the theoretical operation of modelling mixed states as marginals of pure states.
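In the quantum special case the duality reduces to a familiar computation: by Nielsen's theorem, a pure bipartite state can be converted into another by LOCC exactly when its squared Schmidt coefficients are majorized by those of the target, and the squared Schmidt coefficients are the spectra of the marginals. The following sketch illustrates this quantum instance only, not the general probabilistic proof; the function names are hypothetical, and states are specified directly by their Schmidt coefficients to keep the example free of linear-algebra dependencies.

```python
import math


def majorizes(p, q, tol=1e-12):
    """Return True if the probability vector p majorizes q."""
    n = max(len(p), len(q))
    ps = sorted(list(p) + [0.0] * (n - len(p)), reverse=True)
    qs = sorted(list(q) + [0.0] * (n - len(q)), reverse=True)
    sum_p = sum_q = 0.0
    for a, b in zip(ps, qs):
        sum_p += a
        sum_q += b
        if sum_p < sum_q - tol:       # a partial sum of p falls below that of q
            return False
    return abs(sum_p - sum_q) <= tol  # both vectors must have the same total


def more_mixed(spectrum, spectrum_prime):
    """rho is more mixed than rho' iff the spectrum of rho' majorizes that of rho."""
    return majorizes(spectrum_prime, spectrum)


def more_entangled(schmidt, schmidt_prime):
    """Quantum duality: compare pure states through the spectra of their marginals."""
    return more_mixed([c * c for c in schmidt], [c * c for c in schmidt_prime])


bell = [1 / math.sqrt(2), 1 / math.sqrt(2)]
partial = [math.sqrt(0.8), math.sqrt(0.2)]
product = [1.0, 0.0]

print(more_entangled(bell, partial))      # Bell state versus a partially entangled state
print(more_entangled(partial, bell))      # the converse conversion
print(more_mixed([0.5, 0.4, 0.1], [0.6, 0.2, 0.2]),
      more_mixed([0.6, 0.2, 0.2], [0.5, 0.4, 0.1]))  # an incomparable pair
```

The last line shows that the ordering is only partial: neither of the two three-level spectra majorizes the other, so the corresponding pure states cannot be converted into one another by LOCC in either direction.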

VII. CONSEQUENCES OF THE DUALITY
In this section we discuss the simplest consequences of the entanglement-thermodynamics duality, including the relation between maximally mixed and maximally entangled states, as well as a link between information erasure and generation of entanglement. From now on, the axioms used to derive the duality will be treated as standing assumptions and will not be written explicitly in the statement of the results.

A. Maximally entangled states
As a consequence of the duality, there exists a correspondence between maximally mixed and "maximally entangled" states, the latter being defined as follows:

Definition 6 A pure state Φ of system A ⊗ B is maximally entangled if no other pure state of A ⊗ B is more entangled than Φ, except for the states that are equivalent to Φ under local reversible transformations, i.e. if for every Ψ ∈ PurSt (A ⊗ B) one has

$$\Psi \succeq_{\rm ent} \Phi \implies \Psi = (\mathcal{U} \otimes \mathcal{V})\, \Phi$$

for some reversible transformations $\mathcal{U}$ : A → A and $\mathcal{V}$ : B → B.
Theorem 2 directly implies the following.

Corollary 1 The purification of a maximally mixed state is maximally entangled.

Proof. Suppose that Ψ ∈ PurSt (A ⊗ B) is more entangled than Φ, where Φ is a purification of the maximally mixed state χ of system A (assuming such a state exists for system A). By theorem 2, the marginal of Ψ on system A, denoted by ρ, must satisfy $\rho \succeq_{\rm mix} \chi$. Since χ is maximally mixed, this implies ρ = χ. The uniqueness of purification then implies the condition $\Psi = (\mathcal{I}_A \otimes \mathcal{V}_B)\, \Phi$ for some reversible transformation $\mathcal{V}_B$ on B.
Under standard assumptions [47,77], the maximally mixed state is not only a maximal element of the mixedness relation, but also the maximum, i.e. it is more mixed than every state [cf. Eq. (10)]. Under the same standard assumptions, it is immediate to obtain that the purification of a maximally mixed state is more entangled than every state, namely

$$\Phi \succeq_{\rm ent} \Sigma \qquad \forall \Sigma \in \mathsf{St}(A \otimes B) .$$

The relation follows directly from theorem 2 when Σ is a pure state, and in the general case can be proved by convexity, using the fact that the set of LOCC channels is closed under convex combinations.

B. Duality for states on different systems
Theorem 2 concerns the convertibility of states of the same system. To generalize it to arbitrary systems, it is enough to observe that the tensor product with local pure states does not change the degree of entanglement: for arbitrary pure states Ψ, α', and β' of systems A ⊗ B, A', and B' one has

$$\Psi \otimes \alpha' \otimes \beta' \simeq_{\rm ent} \Psi$$

relative to the bipartition (A ⊗ A') ⊗ (B ⊗ B'). As a consequence, one has the equivalence

$$\Psi \succeq_{\rm ent} \Psi' \iff \Psi \otimes \alpha' \otimes \beta' \succeq_{\rm ent} \alpha \otimes \beta \otimes \Psi'$$

for arbitrary pure states α, α', β, β' of A, A', B, B', respectively. This fact leads directly to the generalization of the duality to states of different systems:

Corollary 2 Let Ψ and Ψ' be two pure states of systems A ⊗ B and A' ⊗ B', respectively, and let ρ, ρ', σ and σ' be their marginals on system A, A', B and B', respectively. Under the validity of Purification, Purity Preservation, and Local Exchangeability, the following statements are equivalent:
1. Ψ is more entangled than Ψ'
2. ρ ⊗ α' is more mixed than α ⊗ ρ' for every pair of pure states α ∈ PurSt (A) and α' ∈ PurSt (A').
The duality is now implemented by the operation of discarding systems and preparing pure states, as illustrated by the commutative diagrams [diagrams not reproduced].

At this point, a cautionary remark is in order. Inspired by Eq. (20) one may be tempted to compare the degree of mixedness of states of different systems, by postulating the relation

$$\rho \otimes \alpha' \simeq_{\rm mix} \rho \qquad (21)$$

for arbitrary states ρ and arbitrary pure states α'. The appeal of this choice is that the duality would maintain the simple form even for states of different systems. However, Eq. (21) would trivialize the resource theory of purity: as a special case, it would imply the relation $1 \simeq_{\rm mix} \alpha'$ for a generic pure state of a generic system, meaning that pure states can be freely generated from nothing and can be freely erased. Since in a canonical theory of purity the pure states are the most resourceful ones, having pure states for free would mean having every state for free.

C. Information erasure and entanglement generation
The entanglement-thermodynamics duality establishes a link between the two tasks of erasing information and generating entanglement. By "erasing information" we mean resetting a mixed state to a pure state of the same system (cf. [66]). Clearly, erasure is a costly operation in the resource theory of purity: there is no way to reset a mixed state to a pure state using only RaRe channels (cf. proposition 3). The corresponding operation in the resource theory of entanglement is the generation of entangled states from product states. By the duality, the impossibility of erasing information by RaRe channels and the impossibility of generating entanglement by LOCC are one and the same thing.
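The impossibility of erasure by RaRe channels can be seen numerically in the simplest example of a canonical theory of purity. The sketch below makes assumptions that go beyond the text: it uses classical probability theory, where states are probability vectors and reversible channels are permutations of the outcomes, and it uses the Shannon entropy as one particular measure of mixedness; all names are hypothetical. A random mixture of permutations never lowers the entropy of the input, so it cannot reset a mixed distribution to a deterministic one.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def apply_permutation(perm, p):
    """Reversible classical channel: outcome i is relabelled as perm[i]."""
    out = [0.0] * len(p)
    for i, x in enumerate(p):
        out[perm[i]] += x
    return out


def rare_channel(weighted_perms, p):
    """Random reversible channel: apply each permutation with its weight."""
    out = [0.0] * len(p)
    for weight, perm in weighted_perms:
        for j, x in enumerate(apply_permutation(perm, p)):
            out[j] += weight * x
    return out


p = [0.7, 0.2, 0.1]
channel = [(0.5, (0, 1, 2)), (0.3, (1, 2, 0)), (0.2, (2, 0, 1))]
q = rare_channel(channel, p)

print(q)
print(shannon_entropy(p), shannon_entropy(q))
```

The output distribution is flatter than the input: its largest entry is no larger than that of `p`, and its entropy is at least as large, in line with the statement that RaRe channels can only make states more mixed.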
Let us consider now the task of erasure assisted by a catalyst, represented by a system C whose state remains unaffected by the erasure operation. In this case, the operation of erasure transforms the product state ρ ⊗ γ ∈ St (A ⊗ C) into the state α_0 ⊗ γ for some pure state α_0 ∈ PurSt (A). By duality, it is immediate to see that catalyst-assisted erasure is equivalent to catalyst-assisted entanglement generation:

Corollary 3 Let Ψ and Γ be two pure states of systems A ⊗ B and C ⊗ D, respectively, and let ρ and γ be their marginals on systems A and C, respectively. Then, the following are equivalent:
1. ρ can be erased by a RaRe channel using γ as a catalyst
2. Ψ can be generated by a LOCC channel using Γ as a catalyst.

Now, it is well known that in quantum mechanics no LOCC protocol can generate entanglement from nothing, even in the presence of a catalyst. Equivalently, no random-unitary channel can erase information, even in the presence of a catalyst. Quite surprisingly, however, these impossibilities do not seem to follow from the axioms used to derive the duality: in principle, there may exist theories that satisfy Purification, Purity Preservation, and Local Exchangeability and nevertheless allow one to catalytically erase information and generate entanglement for free. In such theories, the catalysts would behave like "entropy sinks", which can absorb mixed states without becoming more mixed, or like "entanglement reservoirs", from which entanglement can be borrowed indefinitely. This scenario can be excluded at the level of first principles, by postulating the following:

Axiom 4 (No Entropy Sinks) Random reversible dynamics cannot erase information, even with the assistance of a catalyst.

Axiom 4 is a necessary requirement for a quantitative theory of entanglement. Indeed, suppose that one is trying to define a measure of entanglement M that i) assigns a non-zero value to every entangled state and ii) is additive (or super-additive) on products of entangled states, namely

$$M(\Psi \otimes \Psi') \geq M(\Psi) + M(\Psi') .$$

Clearly, such a measure can exist only in theories satisfying Axiom 4. A similar consideration holds for measures of mixedness, which ultimately provide the foundation of the notion of entropy and of a quantitative theory of thermodynamics. Axiom 4 will also lead to an elegant reformulation of our results, which sheds light on the physical picture underlying the entanglement-thermodynamics duality.

VIII. SYMMETRIC PURIFICATION
Here we show that the axioms of Purification, Local Exchangeability, and No Entropy Sinks lead to a new property, namely the existence of a symmetric purification for every state. By a symmetric purification of the state ρ ∈ St (A) we mean a purification Ψ ∈ PurSt (A ⊗ A) whose marginals on both systems are equal to ρ. By adding the requirement that the purifications of a state should be defined uniquely up to local reversible transformations, we obtain the following for some reversible channel $\mathcal{U}$. Since all purifications of ρ are equivalent to Ψ under local operations and since Ψ is locally swappable, we conclude that every purification of ρ is locally exchangeable [by the same argument used in Eq. (4)]. Therefore Symmetric Purification is a stronger assumption than Local Exchangeability. However, in the present setting we need not assume Symmetric Purification, because it is a consequence of the other axioms, as shown in the following.

Proof. Let ρ be a state of system A, and let Ψ ∈ PurSt (A ⊗ B) be one of its purifications. By Local Exchangeability there exist two channels $\mathcal{C}$ and $\mathcal{D}$ such that $\mathrm{SWAP}\, \Psi = (\mathcal{C} \otimes \mathcal{D})\, \Psi$. Now, in a theory satisfying Purification, every channel can be realized through a reversible transformation acting on the system and on an environment, initially in a pure state and finally discarded [47]. In particular, channel $\mathcal{C}$ can be realized as $\mathcal{C} = \operatorname{Tr}_{E'}\, \mathcal{U}\, (\,\cdot \otimes \eta)$, where E and E' are suitable systems, $\mathcal{U}$ is a reversible transformation, and η is a pure state. Similarly, channel $\mathcal{D}$ can be realized as or, equivalently, Discarding system E one obtains for some suitable state Σ. Since the l.h.s. is a pure state, Σ must be a pure state. Now discard the second system A and system F. We have, recalling Eq. (23), Therefore Σ is a symmetric purification of ρ. By the uniqueness of the purification, all symmetric purifications of ρ differ by local reversible transformations. Since ρ is arbitrary, we conclude that every state has a symmetric purification, unique up to local reversible transformations.

IX. CONCLUSIONS
In this paper we explored the links between entanglement and thermodynamics in general probabilistic theories. Specifically, we focussed on the relation between LOCC transformations of pure bipartite states and random reversible transformations of mixed single-system states. We associated random-reversible transformations to a resource theory of dynamical controllability, which can be used to order single-system states. The resource theory of controllability can be interpreted as a resource theory of purity when the set of reversible transformations acts transitively on the set of pure states of the system. Using these notions, one can formulate a general duality between bipartite entanglement and single-system purity, whereby a pure bipartite state is more entangled than another if and only if the marginal states of the latter are purer than the marginal states of the former.
After formulating the duality, we set out to prove it. Our proof follows from a set of operational axioms, including Causality, Purity Preservation, and Purification, plus a new axiom, which we named Local Exchangeability. As the first step in our proof, we established a Lo-Popescu theorem for general probabilistic theories, showing that every LOCC protocol with a pure input state can be replaced by a one-way protocol producing the same output state. Interestingly, this result does not need Purification.
Finally, we used the duality to explore the task of erasing information. We found out that our four axioms are not enough to guarantee that information cannot be erased for free: a priori, a theory satisfying the axioms could have entropy sinks, which can absorb mixed states without becoming more mixed themselves. To exclude this possibility, we formulated the absence of entropy sinks as an axiom. Surprisingly, in the context of our four axioms, the No Entropy Sinks requirement turns out to be equivalent to a stronger version of Purification, in which every state admits a purification where the purified and purifying system are identical. This fact strikes us as an additional piece of evidence that Purification plays an essential role in the axiomatic definition of a sensible theory of thermodynamics.
• St (A) the set of states of system A
• Eff (A) the set of effects on A
• Transf (A, B) the set of transformations from A to B
• A ⊗ B the composition of systems A and B.

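$\{|\beta_i\rangle\}_{i=1}^{r} \subset \mathcal{H}_B$ are orthonormal vectors. This implies the relation $\mathrm{SWAP}\,|\Psi\rangle = (C \otimes D)\,|\Psi\rangle$, where C and D are the partial isometries $C := \sum_{i=1}^{r} |\beta_i\rangle\langle\alpha_i|$ and $D := \sum_{i=1}^{r} |\alpha_i\rangle\langle\beta_i|$.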

This proves the Local Exchangeability property for all pure bipartite states in the 2-setting/2-outcome scenario. The situation is analogous in the case of 2 settings and arbitrary number of outcomes. If x, y ∈ {0, 1}, and a can take d_A values and b can take d_B values, all extreme non-local correlations are characterized in Ref. [61]. Up to local reversible transformations, they are labeled by a parameter k ∈ {2, ..., min {d_A, d_B}} and they are such that

$$p_{ab|xy} = \begin{cases} \dfrac{1}{k} & b - a \equiv xy \mod k \\ 0 & \text{otherwise.} \end{cases} \qquad (5)$$

Thanks to the local equivalence, it is enough to prove the validity of Local Exchangeability for correlations in the standard form of Eq. (5). We distinguish between the two cases xy = 0 and xy = 1. For xy = 0, swapping x with y and a with b has no effect on $p_{ab|xy}$. For xy = 1, by swapping x with y and a with b, one obtains the probability distribution

$$p'_{ab|xy} = \begin{cases} \dfrac{1}{k} & a - b \equiv 1 \mod k \\ 0 & \text{otherwise.} \end{cases}$$

This probability distribution can be obtained from the original one by relabeling the outputs as a' := k − a and b' := k − b. Such relabeling corresponds to local reversible operations on A and B. In other words, Local Exchangeability holds.
(a) Example of state space in a theory of dynamical control which is not a theory of purity (b) Example of state space in a theory of purity, which is not canonical.

Proof. The implications 1 ⟹ 2 and 1 ⟹ 3 follow from lemma 3 and require the validity of all the three axioms. The implications 2 ⟹ 1 and 3 ⟹ 1 follow from lemma 4 and require only the validity of Purification. The duality can be illustrated by the commutative diagrams operationally by discarding one of the component systems. Another illustration of the duality is via the diagram

Axiom 5 (Symmetric Purification) Every state has a symmetric purification. All the purifications of a given state are equivalent up to local reversible transformations.

Clearly, symmetric purifications are locally exchangeable: indeed, if Ψ is a symmetric purification one has Ψ

Proposition 8 Every theory satisfying Purification, Purity Preservation, Local Exchangeability, and No Entropy Sinks satisfies Symmetric Purification.

Inserting the realizations of C and D in the local exchangeability condition, we obtain for some pure state Γ. The above equation shows that the state Γ can be generated by LOCC using Ψ as a catalyst. By the No Entropy Sinks requirement, we have that Γ must be a product state, i.e. Γ = η ⊗ ϕ for two pure states η and ϕ. Hence, the local exchangeability condition becomes

Proposition 2 Let $\mathcal{U}$ ∈ Transf (A, B) be a reversible channel. Then a state ψ ∈ St (A) is pure if and only if the state $\mathcal{U}\psi$ ∈ St (B) is pure.