Minimising the heat dissipation of quantum information erasure

M Hamed Mohammady; Masoud Mohseni; Yasser Omar

doi:10.1088/1367-2630/18/1/015011

1. Introduction

1.1. Information erasure and thermodynamics

In his attempt to exorcise Maxwell's demon [1, 2], Leo Szilard conceived of an engine [3] composed of a box that is in thermal contact with a reservoir at temperature T, and contains a single gas particle. By placing a partition in the middle of the box and determining on which side of it the particle is located, the Maxwellian demon can attach to said partition a weight-and-pulley system so that, as the gas expands, the weight is elevated. By ensuring that the partition moves without friction, and continuously adjusting the weight to make the process quasi-static, one may fully convert ${k}_{B}T\mathrm{log}(2)$ units of heat energy from the gas into work. Here, ${k}_{B}$ is Boltzmann's constant and $\mathrm{log}(\cdot )$ is the natural logarithm. In order to save the second law of thermodynamics, the engine must dissipate at least ${k}_{B}T\mathrm{log}(2)$ units of energy to the thermal reservoir as heat. While it was initially believed that this heat dissipation is due to the act of measurement by the Maxwellian demon, following the work of Landauer, Penrose, and Bennett [4–7] the responsible process was identified as the erasure of information in the demon's memory, that is, the logically irreversible process of assigning a prescribed value to the memory, irrespective of its prior state. That the minimum heat dissipation required to erase one bit of information cannot be any smaller than ${k}_{B}T\mathrm{log}(2)$ is commonly known as Landauer's principle, and said minimum quantity as Landauer's limit. In general, Landauer's principle may be encapsulated by the Clausius inequality

$\begin{eqnarray}&&{\rm{\Delta }}Q\geqslant {k}_{B}T{\rm{\Delta }}S,\end{eqnarray} \tag{ 1.1 }$

where ${\rm{\Delta }}Q$ is the heat dissipation to the thermal reservoir and ${\rm{\Delta }}S$ is the entropy reduction in the object of information erasure.
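
To give a sense of scale, Landauer's limit can be evaluated numerically. The following is a minimal sketch; the function name `landauer_limit` and the choice of 300 K as the temperature are illustrative assumptions and do not come from the article.

```python
import math

# Boltzmann constant in J/K (exact value in the 2019 SI definition).
K_B = 1.380649e-23


def landauer_limit(temperature_kelvin, bits=1.0):
    """Lower bound k_B * T * log(2) per bit on the heat dissipated by erasure, in joules.

    `landauer_limit` is a hypothetical helper name used only for this sketch.
    """
    return K_B * temperature_kelvin * math.log(2) * bits


# Illustrative temperature (an assumption): 300 K.
q_room = landauer_limit(300.0)
print(f"Landauer limit at 300 K: {q_room:.3e} J per bit")
```

The result is of the order of $10^{-21}$ joules per bit at this temperature.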

1.2. Thermodynamics in the quantum regime

Recent years have witnessed a growing interest in thermodynamics and statistical mechanics in the quantum regime (see [8, 9] for a review). This has led to a lively debate regarding the definition of two central concepts in thermodynamics, work and heat, within the framework of quantum theory. In classical physics, the work done during a process is defined as the increase in useful, ordered energy. Conversely, the heat dissipated during a process is the increase in unusable, disordered energy. In Szilard's engine, for example, work is characterised as the (deterministic) elevation of a weight, and hence the increase of its gravitational potential energy. The heat dissipated, on the other hand, would be stored as kinetic energy in the random motion of the atoms that constitute Szilard's engine, as well as the environment. This clear distinction fails in quantum mechanics, which is an inherently probabilistic theory.

Broadly speaking, work may be characterised in two different ways: (i) $\epsilon$-deterministic work [10, 11]; and (ii) average work [12, 13]. In either case, one may include the work storage device (a quantum analogue of the elevated weight in Szilard's engine) explicitly in the formalism, as in [14, 15]. This is not always done, and one may directly examine the energy change in the system under consideration. In the $\epsilon$-deterministic framework, the work of a process is defined as the difference in energy measurement outcomes on the system (or work storage device), observed prior and posterior to the process. The $\epsilon$-deterministic work is then the maximum value of work, thus defined, which occurs with a probability of at least $1-\epsilon$. Meanwhile, average work is given as either the difference in expectation values of energy, or the difference in the free energies, of the system (or work storage device) observed prior and posterior to the process. The difference in average energy can be converted to the difference in free energy by subtracting the von Neumann entropy of the system, multiplied by the temperature, from its average energy.

Definitions of heat can similarly be broadly classified into two categories: (i) where the thermal reservoir is treated extrinsically [12, 16]; and (ii) where the thermal reservoir is treated intrinsically [17, 18]. If the thermal reservoir is treated extrinsically, whereby it does not explicitly appear in the framework as a quantum system susceptible to change and examination, heat is a property of the system of interest. One may therefore define heat after having determined work; that is to say, given the change in total energy of the system, ${\rm{\Delta }}E$, and the work, ${\rm{\Delta }}W$, the heat ${\rm{\Delta }}Q$ is given by the first law of thermodynamics as ${\rm{\Delta }}Q={\rm{\Delta }}E-{\rm{\Delta }}W$. Alternatively, Landauer's principle may be invoked to get a lower bound on the heat dissipation, given that the system has undergone an entropy change of ${\rm{\Delta }}S$. If the thermal reservoir is treated intrinsically, on the other hand, heat can be defined as the average energy change of the reservoir itself. In other words, heat is average work pertaining to the thermal reservoir. A thermal reservoir, considered intrinsically, is a system that is initially uncorrelated with every other system considered, and is prepared in a Gibbs state. We note that, from this perspective, treating the thermal reservoir with the Born–Markov approximation would render it extrinsic; this is because the state of the reservoir, in the coarse-grained picture, is assumed to never change. As such, defining heat dissipation during a process as the average energy increase of the reservoir would lead one to conclude that no heat is dissipated at all. Indeed, the physical justification for the Born–Markov approximation is that, at time-scales much shorter than that at which the system changes, the reservoir relaxes to its equilibrium state by interacting with an unseen, and hence extrinsic, environment. 
If this environment is explicitly accounted for quantum mechanically, then the total system will again evolve unitarily, and the energy increase of this environment has to also be accounted for.

In this article, we shall adopt the view that work is the change in average energy of the system. Moreover, whenever a thermal reservoir is mentioned, we will consider it intrinsically and include it as part of the system under investigation. The work storage device, however, is considered extrinsically: by the first law of thermodynamics we take as a priori the notion that the change in average energy of the system—including the reservoir if it is present—must come from an external energy source. This total change in average energy is defined as the work done by the extrinsic work storage device. If the total system is composed of an object and thermal reservoir, each with a well-defined Hamiltonian, then the portion of this work that is taken up by the object is called the work done on the object, and the portion taken by the reservoir is called the heat dissipated to the reservoir. If the total system is thermal, then the entirety of the work done by the extrinsic work storage device is defined as heat.

1.3. A quantum mechanical Landauer's principle

The surge of interest in quantum thermodynamics has included attempts to consider Landauer's principle quantum mechanically [18–24]. Most notable among such efforts is that of Reeb and Wolf [25], who provide a fully quantum statistical mechanical derivation of Landauer's principle by considering the process of reducing the entropy of a quantum object by its joint unitary evolution with a thermal reservoir. Here, they consider heat dissipation as the average energy increase of the reservoir, which is initially in a Gibbs state and is not correlated with the object. For a reservoir with a Hilbert space of finite dimension ${d}_{{ \mathcal R }}$ , they arrive at an equality form of Landauer's principle

$\begin{eqnarray}&&{\rm{\Delta }}Q={k}_{B}T\left({\rm{\Delta }}S+I{({ \mathcal O }:{ \mathcal R })}_{{\rho }^{\prime }}+S({\rho }_{{ \mathcal R }}^{\prime }\parallel {\rho }_{{ \mathcal R }}(\beta ))\right),\end{eqnarray} \tag{ 1.2 }$

where $I{({ \mathcal O }:{ \mathcal R })}_{{\rho }^{\prime }}$ is the mutual information between object and reservoir after the joint evolution, and $S({\rho }_{{ \mathcal R }}^{\prime }\parallel {\rho }_{{ \mathcal R }}(\beta ))$ is the relative entropy between the post-evolution state of the reservoir and its initial state at thermal equilibrium. As the mutual information and relative entropy terms are non-negative, this implies Landauer's principle. While equation (1.2) always yields the exact heat dissipation, it involves terms that are cumbersome to calculate and, perhaps more importantly, it is not a function of ${\rm{\Delta }}S$ alone. As such, Reeb and Wolf provide an inequality form of Landauer's principle

$\begin{eqnarray}&&{\rm{\Delta }}Q\geqslant {k}_{B}T({\rm{\Delta }}S+M({\rm{\Delta }}S,{d}_{{ \mathcal R }})),\end{eqnarray} \tag{ 1.3 }$

where $M({\rm{\Delta }}S,{d}_{{ \mathcal R }})$ is a non-negative correction term that vanishes in the limit as ${d}_{{ \mathcal R }}$ tends to infinity.

1.4. The need for a context-dependent Landauer's principle

The study in [25] provides a lower bound on the energy transferred to the thermal reservoir as heat dissipation, given that the object's entropy decreases by ${\rm{\Delta }}S$ and that the reservoir's Hilbert space dimension is ${d}_{{ \mathcal R }}$. The crucial point, however, is that this lower bound can be attained in some physical contexts, but not in all of them. By physical context, we mean the tuple $({{ \mathcal H }}_{{ \mathcal O }},{\rho }_{{ \mathcal O }},{{ \mathcal H }}_{{ \mathcal R }},{H}_{{ \mathcal R }},T)$. Here ${{ \mathcal H }}_{{ \mathcal O }}$ and ${\rho }_{{ \mathcal O }}$ are respectively the Hilbert space and state of the object, while ${{ \mathcal H }}_{{ \mathcal R }}$, ${H}_{{ \mathcal R }}$, and T are respectively the Hilbert space, Hamiltonian, and temperature of the reservoir. For example, one way to achieve the lower bound of equation (1.3) is for the object and reservoir to have the same Hilbert space dimension, allowing us to perform a swap map between them; this will take the mutual information term in equation (1.2) to zero. The next step of the optimisation would be to pick a specific ${\rho }_{{ \mathcal O }}$, ${H}_{{ \mathcal R }}$ and T so as to minimise the relative entropy term. Conversely, for a given physical context such inequalities may prove less instructive. Indeed, if it is impossible to achieve the lower bound of equation (1.3) in a given experimental setup, in what sense can we consider this as the lowest possible heat dissipation due to information erasure? In this study, therefore, we aim to approach the problem of information erasure from the dual perspective: given a physical context, what is the minimum heat that must be dissipated in order to achieve a certain level of information erasure? This context-dependent Landauer's principle will be characterised by the equivalence class of unitary operators that achieve our task. 
Of course, this first requires a re-examination of what exactly we mean by information erasure.

1.5. Information erasure: pure state preparation and entropy reduction

In this article, we take information erasure to be synonymous with pure state preparation; just as in classical mechanics erasure (in the Landauer sense) involves the many-to-one mapping on the information bearing degrees of freedom, then in quantum mechanics this translates naturally as the irreversible process of preparing the object in a pure state. Probabilistic information erasure, then, refers to the case where the probability of preparing the object in the desired pure state is lower than unity. Although erasing the information of an object as presently defined leads to a reduction of its entropy, the two processes are not quantitatively the same. If we wish to maximise the largest eigenvalue in the object's probability spectrum, thereby maximising the probability of preparing it in a given pure state, in general we need not minimise its entropy to do so; the only cases where maximising the probability of information erasure leads to minimising the entropy are when the object has a two-dimensional Hilbert space, or where we are able to fully purify the object and thereby take its entropy to zero. In general, then, a given probability of information erasure is compatible with many different values of entropy reduction. By choosing the smallest entropy reduction, one would expect that we may minimise the consequent heat dissipation, as per equation (1.2). Consequently, our desired task can be stated as the minimisation of heat dissipation given probabilistic information erasure—that is to say, of minimising the amount of energy transferred to the thermal reservoir as heat if we require that the probability of preparing the object in a specific pure state $\left|{\varphi }_{1}\right.\rangle$ be no smaller than ${p}_{{\varphi }_{1}}^{\mathrm{max}}-\delta$ . Here ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ is the maximum probability of information erasure that is permissible by the physical context, and $\delta \geqslant 0$ the error. 
We will refer to the equivalence class of unitary operators that achieve this as $[{U}_{\mathrm{opt}}(\delta )]$ . If the object also has a non-trivial Hamiltonian, then to further reduce the total work cost of information erasure, conditional on first minimising the heat dissipation, we may further optimise the unitary operators within the equivalence class $[{U}_{\mathrm{opt}}(\delta )]$ so that the state of the object is made to be passive [26, 27], and with as small an expected energy value as possible. This reduced equivalence class is referred to as $[{U}_{\mathrm{opt}}^{{\rm{p}}}(\delta )]$ .

1.6. Information erasure and information processing

Reducing the heat dissipation due to information erasure is important for both classical and quantum information processing devices. As recent studies suggest [28], heat dissipation is a major limiting factor on the continual growth in the computational density of modern CMOS transistors. Meanwhile for quantum computation in the circuit-based model, error correction requires a constant supply of ancillary qubits, in pure states, for syndrome measurements. Indeed, the authors in [29] show that in the absence of such a constant supply the number of steps in which the computation can be performed fault tolerantly will be limited. Given a finite supply of ancillary qubits, we must constantly purify them during the execution of the algorithm. If the resulting heat dissipation leads to the intensification of thermal noise beyond the threshold for fault tolerance [30], then the computation will fail. A context-dependent Landauer's principle will thus prove especially important for information processing devices, in both classical and quantum architectures, where the structure of the reservoir Hamiltonian will usually be fixed. Furthermore, our work may be useful for certain high-performance, probabilistic (classical) information processing devices, that would operate at or near the quantum regime. Although the current state of the art in information processing devices dissipates heat orders of magnitude in excess of Landauer's limit, our ever increasing ability to control microscopic devices will mean that achieving such theoretical limits may be possible in the not-too-distant future. Indeed, experiments already exist, both in classical [31] and quantum [32] systems, which have achieved heat dissipation very close to Landauer's limit.

1.7. Layout of article

In section 2 we shall characterise the equivalence class of unitary operators acting on the composite system of object and reservoir, as a result of which the object undergoes probabilistic information erasure and, given this, the reservoir gains the minimal quantity of heat. If the object also has a non-trivial Hamiltonian, the unitary operators can be further optimised so as to reduce the energy gained by the object. Here, we operate within Landauer's framework—the object and reservoir are initially uncorrelated and the composite system evolves unitarily. We demonstrate, using a sequential swap algorithm introduced in section 2.5, the tradeoff between probability of information erasure and minimal heat dissipation; an increase in probability of preparing the object in a defined pure state is accompanied by an increase in the minimal heat that must be dissipated to the thermal reservoir. In section 3 we apply the general results to the case of erasing a maximally mixed qubit with the greatest allowed probability of success. Two reservoir classes will be considered: (i) a d-dimensional ladder system, where the energy gap between consecutive eigenstates is uniformly ω; and (ii) a spin chain with nearest-neighbour interactions, that is under a local magnetic field gradient. For both models, we shall also inquire into the effect of energy conserving, pure dephasing channels on the erasure process. In section 3.3, we determine the minimum quantity of heat that must be dissipated given full information erasure of a general qudit prepared in a maximally mixed state, in the limit of utilising an infinite-dimensional ladder system, which is a harmonic oscillator. In section 4 we shall address how information erasure can be achieved at a lower heat cost than Landauer's limit, by operating outside of Landauer's framework, but in such a way that terms like heat and temperature would continue to have referents in the mathematical description. 
In appendix A we provide a brief overview of certain key results from majorisation theory that will be used throughout the article. In appendix B we explain what an equivalence class of unitary operators constitutes. Finally, in appendix C we provide proofs for the main results.

2. Information erasure within Landauer's framework

2.1. The setup

We consider a system composed of an object, ${ \mathcal O }$ , with Hilbert space ${{ \mathcal H }}_{{ \mathcal O }}\simeq {{\mathbb{C}}}^{{d}_{{ \mathcal O }}}$ and reservoir, ${ \mathcal R }$ , with Hilbert space ${{ \mathcal H }}_{{ \mathcal R }}\simeq {{\mathbb{C}}}^{{d}_{{ \mathcal R }}}$ . Let the Hamiltonian of the reservoir be the self-adjoint operator ${H}_{{ \mathcal R }}={\displaystyle \sum }_{m=1}^{{d}_{{ \mathcal R }}}{\lambda }_{m}^{\uparrow }| {\xi }_{m}\rangle \langle {\xi }_{m}|$ , where ${{\boldsymbol{\lambda }}}^{\uparrow }:= {\{{\lambda }_{m}^{\uparrow }\}}_{m}$ is a non-decreasing vector of energy eigenvalues. This means that ${\lambda }_{i}^{\uparrow }\leqslant {\lambda }_{j}^{\uparrow }$ for any $i\lt j$ . Similarly, the object Hamiltonian is denoted ${H}_{{ \mathcal O }}$ . The compound system is initially in the uncorrelated state $\rho ={\rho }_{{ \mathcal O }}\otimes {\rho }_{{ \mathcal R }}(\beta )$ , where ${\rho }_{{ \mathcal O }}:= {\displaystyle \sum }_{l=1}^{{d}_{{ \mathcal O }}}{o}_{l}^{\downarrow }| {\varphi }_{l}\rangle \langle {\varphi }_{l}|$ is the initial state of the object, such that ${{\boldsymbol{o}}}^{\downarrow }:= {\{{o}_{l}^{\downarrow }\}}_{l}$ is a non-increasing vector of probabilities. This means that ${o}_{i}^{\downarrow }\geqslant {o}_{j}^{\downarrow }$ for any $i\lt j$ . Additionally, the reservoir is initially in the Gibbs state ${\rho }_{{ \mathcal R }}(\beta ):= {{\rm{e}}}^{-\beta {H}_{{ \mathcal R }}}/\mathrm{tr}[{{\rm{e}}}^{-\beta {H}_{{ \mathcal R }}}]$ at inverse temperature $\beta := {({k}_{B}T)}^{-1}\in (0,\infty )$ . Figure 1 represents the setup diagrammatically. 
Because of the ordering on the energy eigenvalues, we may represent this state as ${\rho }_{{ \mathcal R }}(\beta ):= {\displaystyle \sum }_{m=1}^{{d}_{{ \mathcal R }}}{r}_{m}^{\downarrow }| {\xi }_{m}\rangle \langle {\xi }_{m}|$ , such that ${{\boldsymbol{r}}}^{\downarrow }:= {\{{r}_{m}^{\downarrow }\}}_{m}$ is a non-increasing vector of probabilities. For simplicity, we write the initial state ρ in the equivalent form

$\begin{eqnarray}&&\rho =\displaystyle \sum _{l=1}^{{d}_{{ \mathcal O }}}\displaystyle \sum _{m=1}^{{d}_{{ \mathcal R }}}{o}_{l}^{\downarrow }{r}_{m}^{\downarrow }| {\varphi }_{l}\rangle \langle {\varphi }_{l}| \otimes | {\xi }_{m}\rangle \langle {\xi }_{m}| \equiv \displaystyle \sum _{n=1}^{{d}_{{ \mathcal O }}{d}_{{ \mathcal R }}}{p}_{n}^{\downarrow }| {\psi }_{n}\rangle \langle {\psi }_{n}| ,\end{eqnarray} \tag{ 2.1 }$

where the non-increasing vector ${{\boldsymbol{p}}}^{\downarrow }:= {\{{p}_{n}^{\downarrow }\}}_{n}$ is the ordered permutation of ${\{{o}_{l}^{\downarrow }{r}_{m}^{\downarrow }\}}_{l,m}$ , and ${\{\left|{\psi }_{n}\right.\rangle \in {{ \mathcal H }}_{{ \mathcal O }}\otimes {{ \mathcal H }}_{{ \mathcal R }}\}}_{n}$ the associated permutation of ${\{\left|{\varphi }_{l}\right.\rangle \otimes \left|{\xi }_{m}\right.\rangle \}}_{l,m}$ . We note that this state representation is unique if and only if there are no degeneracies in the probability distribution ${{\boldsymbol{p}}}^{\downarrow }$ . We assume that the total system is thermally isolated, so that the process of information erasure will be characterised by a unitary operator U. The state of the system after the process is complete is therefore

$\begin{eqnarray}&&{\rho }^{\prime }:= U\rho {U}^{\dagger }=\displaystyle \sum _{n=1}^{{d}_{{ \mathcal O }}{d}_{{ \mathcal R }}}{p}_{n}^{\downarrow }U| {\psi }_{n}\rangle \langle {\psi }_{n}| {U}^{\dagger }.\end{eqnarray} \tag{ 2.2 }$

The marginal states of ${\rho }^{\prime }$ are ${\rho }_{{ \mathcal O }}^{\prime }:= {\mathrm{tr}}_{{ \mathcal R }}[{\rho }^{\prime }]$ and ${\rho }_{{ \mathcal R }}^{\prime }:= {\mathrm{tr}}_{{ \mathcal O }}[{\rho }^{\prime }]$ , where ${\mathrm{tr}}_{A}[\cdot ]$ represents the partial trace, of a composite system A + B, over the system A.
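
As an illustration of the setup, the ordered joint spectrum ${{\boldsymbol{p}}}^{\downarrow }$ of equation (2.1) can be built numerically from the object and reservoir spectra. The sketch below assumes a maximally mixed qubit and a three-level reservoir with unit energy spacing at $\beta =1$; these values, and the helper names `gibbs_populations` and `joint_spectrum`, are illustrative assumptions rather than part of the article.

```python
import numpy as np


def gibbs_populations(energies, beta):
    """Non-increasing Gibbs populations r_m for the given energy eigenvalues."""
    energies = np.sort(np.asarray(energies, dtype=float))  # non-decreasing energies
    weights = np.exp(-beta * (energies - energies[0]))     # shift for numerical stability
    return weights / weights.sum()


def joint_spectrum(o, r):
    """Return p (products o_l * r_m sorted non-increasingly) and the (l, m) label of each entry."""
    o = np.sort(np.asarray(o, dtype=float))[::-1]
    r = np.sort(np.asarray(r, dtype=float))[::-1]
    products = np.outer(o, r).ravel()                # flat index n = l * d_R + m (0-based)
    order = np.argsort(-products, kind="stable")     # largest product first
    labels = [(int(n // len(r)) + 1, int(n % len(r)) + 1) for n in order]  # 1-based (l, m)
    return products[order], labels


# Assumed example: maximally mixed qubit, three-level ladder reservoir.
o = [0.5, 0.5]
r = gibbs_populations([0.0, 1.0, 2.0], beta=1.0)
p, labels = joint_spectrum(o, r)
for prob, (l, m) in zip(p, labels):
    print(f"p = {prob:.4f}  <->  |phi_{l}> (x) |xi_{m}>")
```

Because the object spectrum in this example is degenerate, several entries of ${{\boldsymbol{p}}}^{\downarrow }$ coincide, so the ordering of the vectors $\left|{\psi }_{n}\right.\rangle$ is not unique, in line with the remark following equation (2.1).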

As the pure state we wish to prepare the object in is arbitrary up to local unitary operations, for simplicity we choose this to be $\left|{\varphi }_{1}\right.\rangle ;$ this is the eigenstate of ${\rho }_{{ \mathcal O }}$ with the largest eigenvalue, i.e., ${o}_{1}^{\downarrow }$ . The probability of preparing ${\rho }_{{ \mathcal O }}^{\prime }$ in the state $\left|{\varphi }_{1}\right.\rangle$ is defined as

$\begin{eqnarray}p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime }):= \langle {\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime }| {\varphi }_{1}\rangle & = & \displaystyle \sum _{n=1}^{{d}_{{ \mathcal O }}{d}_{{ \mathcal R }}}{p}_{n}^{\downarrow }\langle {\psi }_{n}| {U}^{\dagger }(| {\varphi }_{1}\rangle \langle {\varphi }_{1}| \otimes {{\mathbb{1}}}_{{ \mathcal R }})U| {\psi }_{n}\rangle ,\\ & = & \displaystyle \sum _{n=1}^{{d}_{{ \mathcal O }}{d}_{{ \mathcal R }}}{p}_{n}^{\downarrow }{g}_{n}(U)\equiv {{\boldsymbol{p}}}^{\downarrow }\cdot {\boldsymbol{g}}({\boldsymbol{U}}),\end{eqnarray} \tag{ 2.3 }$

where ${\boldsymbol{g}}({\boldsymbol{U}})$ is a vector of non-negative numbers ${g}_{n}(U):= \langle {\psi }_{n}| {U}^{\dagger }(| {\varphi }_{1}\rangle \langle {\varphi }_{1}| \otimes {{\mathbb{1}}}_{{ \mathcal R }})U| {\psi }_{n}\rangle$ such that ${\displaystyle \sum }_{n}{g}_{n}(U)={d}_{{ \mathcal R }}$. In general, we wish to achieve $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })\geqslant {p}_{{\varphi }_{1}}^{\mathrm{max}}-\delta$, where ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ is the maximum probability of information erasure permissible by the physical context, and $\delta \in [0,{p}_{{\varphi }_{1}}^{\mathrm{max}}-{o}_{1}^{\downarrow }]$ is the error. As we want the process to produce a larger $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })$ than ${o}_{1}^{\downarrow }$, this will lead to a decrease in the von Neumann entropy of ${ \mathcal O }$. The von Neumann entropy of a state ρ is $S(\rho ):= -\mathrm{tr}[\rho \mathrm{log}(\rho )]$. We define the reduction in entropy of ${ \mathcal O }$ as ${\rm{\Delta }}S:= S({\rho }_{{ \mathcal O }})-S({\rho }_{{ \mathcal O }}^{\prime })$.

The process is also assumed to be cyclic, meaning that the total Hamiltonian at the start of the process is identical with that at the end. As such, the total average energy consumption of the erasure protocol will be

$\begin{eqnarray}{\rm{\Delta }}E:= \mathrm{tr}[({H}_{{ \mathcal O }}+{H}_{{ \mathcal R }})({\rho }^{\prime }-\rho )] & = & \mathrm{tr}[{H}_{{ \mathcal O }}({\rho }_{{ \mathcal O }}^{\prime }-{\rho }_{{ \mathcal O }})]+\mathrm{tr}[{H}_{{ \mathcal R }}({\rho }_{{ \mathcal R }}^{\prime }-{\rho }_{{ \mathcal R }}(\beta ))],\\ & = & {\rm{\Delta }}W+{\rm{\Delta }}Q.\end{eqnarray} \tag{ 2.4 }$

A positive ${\rm{\Delta }}E$ implies that the process requires energy from an external work storage device. Conversely, a negative ${\rm{\Delta }}E$ implies that the process produces energy that can, in turn, be stored in the work storage device. Here, ${\rm{\Delta }}W$ is the energy change in the object, which we call work done on the object, and ${\rm{\Delta }}Q$ the energy change in the reservoir, or the heat dissipated to the reservoir. As shown in [25, 33], these terms can also be written as

$\begin{eqnarray}&&\beta {\rm{\Delta }}W=S({\rho }_{{ \mathcal O }}^{\prime }\parallel {\rho }_{{ \mathcal O }}(\beta ))-S({\rho }_{{ \mathcal O }}\parallel {\rho }_{{ \mathcal O }}(\beta ))-{\rm{\Delta }}S,\end{eqnarray} \tag{ 2.5 }$

$\begin{eqnarray}&&\beta {\rm{\Delta }}Q={\rm{\Delta }}S+I{({ \mathcal O }:{ \mathcal R })}_{{\rho }^{\prime }}+S({\rho }_{{ \mathcal R }}^{\prime }\parallel {\rho }_{{ \mathcal R }}(\beta )),\end{eqnarray} \tag{ 2.6 }$

where $S(\rho \parallel \sigma ):= \mathrm{tr}[\rho (\mathrm{log}(\rho )-\mathrm{log}(\sigma ))]$ is the entropy of ρ relative to σ, and $I{(A:B)}_{\rho }:= S({\rho }_{A})+S({\rho }_{B})-S(\rho )$ is the mutual information of a state ρ of a bipartite system A + B. As we are only interested in cases where ${\rm{\Delta }}S$ is positive, we can infer from the non-negativity of the relative entropy and mutual information that ${\rm{\Delta }}Q$ is always positive for information erasure, even though ${\rm{\Delta }}W$ may be negative.
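
Equations (2.5) and (2.6) hold for any joint unitary, which makes them easy to check numerically. The sketch below draws a random unitary and compares both sides of each equation. The dimensions, Hamiltonians, initial object state, random seed and helper names (`entropy`, `rel_entropy`, `gibbs`) are all illustrative assumptions, and the Hamiltonians are taken to be diagonal purely for convenience.

```python
import numpy as np


def entropy(rho):
    """Von Neumann entropy S(rho) = -tr[rho log rho], natural logarithm."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]  # drop numerically vanishing eigenvalues
    return float(-np.sum(ev * np.log(ev)))


def rel_entropy(rho, sigma):
    """S(rho || sigma) = tr[rho (log rho - log sigma)], assuming sigma has full rank."""
    ev, vec = np.linalg.eigh(sigma)
    log_sigma = (vec * np.log(ev)) @ vec.conj().T
    return float(-entropy(rho) - np.trace(rho @ log_sigma).real)


def gibbs(H, beta):
    """Gibbs state of a Hamiltonian that is assumed diagonal in this sketch."""
    w = np.exp(-beta * np.diag(H).real)
    return np.diag(w / w.sum())


rng = np.random.default_rng(0)
d_o, d_r, beta = 2, 3, 1.0            # assumed dimensions and inverse temperature
H_o = np.diag([0.0, 0.7])             # assumed object Hamiltonian
H_r = np.diag([0.0, 1.0, 2.0])        # assumed reservoir Hamiltonian
rho_o = np.diag([0.6, 0.4])           # assumed initial object state
rho_r = gibbs(H_r, beta)
rho = np.kron(rho_o, rho_r)           # uncorrelated initial state

# Random unitary from the QR decomposition of a complex Gaussian matrix.
A = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
U, _ = np.linalg.qr(A)
rho_p = U @ rho @ U.conj().T

# Marginal states via partial traces.
t = rho_p.reshape(d_o, d_r, d_o, d_r)
rho_o_p = np.einsum("ijkj->ik", t)    # trace over the reservoir
rho_r_p = np.einsum("ijil->jl", t)    # trace over the object

dW = np.trace(H_o @ (rho_o_p - rho_o)).real
dQ = np.trace(H_r @ (rho_r_p - rho_r)).real
dS = entropy(rho_o) - entropy(rho_o_p)
mutual = entropy(rho_o_p) + entropy(rho_r_p) - entropy(rho_p)

rhs_Q = dS + mutual + rel_entropy(rho_r_p, rho_r)                      # equation (2.6)
rhs_W = (rel_entropy(rho_o_p, gibbs(H_o, beta))
         - rel_entropy(rho_o, gibbs(H_o, beta)) - dS)                  # equation (2.5)
print(beta * dQ, rhs_Q)
print(beta * dW, rhs_W)
```

A random unitary will typically not reduce the entropy of the object, so ${\rm{\Delta }}S$ may be negative here; the two identities hold regardless.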

We wish to make the physical interpretation that ${\rm{\Delta }}Q$ is energy that is irreversibly lost during the information erasure process, and is hence qualitatively different in nature from ${\rm{\Delta }}W$ . For this to be true, it must be impossible to extract work from the reservoir, after the process is complete, by means of a cyclic unitary process involving the reservoir alone. This is satisfied if ${\rho }_{{ \mathcal R }}^{\prime }$ is passive, i.e., ${\rho }_{{ \mathcal R }}^{\prime }={\displaystyle \sum }_{m}{{r}_{m}^{\prime }}^{\downarrow }| {\xi }_{m}\rangle \langle {\xi }_{m}| ;$ that is to say, if ${\rho }_{{ \mathcal R }}^{\prime }$ is diagonal in the Hamiltonian eigenbasis, and its eigenvalues are non-increasing with respect to energy. If ${\rho }_{{ \mathcal R }}^{\prime }$ is not passive, as shown by [34] it is possible to extract a maximum amount of work, given as

$\begin{eqnarray}&&{\rm{\Delta }}{W}^{\mathrm{max}}:= \mathrm{tr}[{H}_{{ \mathcal R }}({\rho }_{{ \mathcal R }}^{\prime }-{\rho }_{{ \mathcal R }}^{\mathrm{passive}})],\end{eqnarray} \tag{ 2.7 }$

where ${\rho }_{{ \mathcal R }}^{\mathrm{passive}}$ has the same spectrum as ${\rho }_{{ \mathcal R }}^{\prime }$ , but is passive. As will be shown in the following sections, not only is it possible for ${\rho }_{{ \mathcal R }}^{\prime }$ to be passive, but this is always satisfied in the case of minimal heat dissipation. However, if the dimension of ${{ \mathcal H }}_{{ \mathcal R }}$ is at least three, and we have access to N copies of ${\rho }_{{ \mathcal R }}^{\prime }$ , it may be possible, for a sufficiently large N, to have the compound state ${{\rho }_{{ \mathcal R }}^{\prime }}^{\otimes N}$ be non-passive. This is called activation. Consequently, by keeping the reservoir systems after their utility in the erasure protocol, and then acting globally on this collection, we may be able to retrieve some energy. The only passive state which cannot be activated, no matter how many copies we have access to, is the Gibbs state [26]. However, ${\rho }_{{ \mathcal R }}^{\prime }$ will not in general be in a Gibbs state. To ensure that ${\rm{\Delta }}Q$ is truly lost, irrespective of what reservoir is used, we must impose an additional structure. The simplest method is to impose the condition that the reservoir system is irrevocably lost after the process is complete. For example, if the reservoir system ${ \mathcal R }$ is randomly chosen from an infinite collection of identical systems, but we do not know which particular system was used, then the probability of picking this system again at random, after the erasure protocol, will be vanishingly small.
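
Equation (2.7) can be evaluated directly once the spectrum of ${\rho }_{{ \mathcal R }}^{\prime }$ is known: the passive state pairs the largest eigenvalues with the lowest energies. The following is a minimal sketch for a Hamiltonian that is diagonal in the computational basis; the function name `max_extractable_work` and the example states are illustrative assumptions.

```python
import numpy as np


def max_extractable_work(rho, energies):
    """Equation (2.7): tr[H (rho - rho_passive)] for H = diag(energies).

    rho_passive has the same spectrum as rho, with its eigenvalues arranged
    non-increasingly with respect to energy.
    """
    energies = np.asarray(energies, dtype=float)
    sorted_energies = np.sort(energies)                   # non-decreasing energies
    spectrum = np.sort(np.linalg.eigvalsh(rho))[::-1]     # non-increasing eigenvalues
    passive_energy = float(np.dot(spectrum, sorted_energies))
    energy = float(np.trace(rho @ np.diag(energies)).real)
    return energy - passive_energy


energies = [0.0, 1.0, 2.0]                                # assumed ladder spectrum
work_active = max_extractable_work(np.diag([0.2, 0.5, 0.3]), energies)
work_passive = max_extractable_work(np.diag([0.5, 0.3, 0.2]), energies)
print(work_active, work_passive)
```

In this example the first state is diagonal but its populations are not ordered with energy, so some work can be extracted; the second state is already passive and yields zero.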

2.2. Maximising the probability of information erasure

In appendix C.1 we prove that the maximum probability of information erasure is

$\begin{eqnarray}&&{p}_{{\varphi }_{1}}^{\mathrm{max}}:= \displaystyle \sum _{m=1}^{{d}_{{ \mathcal R }}}{p}_{m}^{\downarrow },\end{eqnarray} \tag{ 2.8 }$

and the equivalence class of unitary operators that achieve this, denoted $[{U}_{\mathrm{maj}}^{g}]$ , is characterised by the rule

$\begin{eqnarray}&&\mathrm{for}\ \mathrm{all}\ m\in \{1,...,{d}_{{ \mathcal R }}\},{U}_{\mathrm{maj}}^{g}\left|{\psi }_{m}\right.\rangle =\left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{m}^{\prime }\right.\rangle ,\end{eqnarray} \tag{ 2.9 }$

where ${\{\left|{\xi }_{m}^{\prime }\right.\rangle \}}_{m}$ is an arbitrary orthonormal basis in ${{ \mathcal H }}_{{ \mathcal R }}$ . To see what we mean by an equivalence class of unitary operators, refer to appendix B. In other words, to maximise the probability of information erasure the unitary operator must take the ${d}_{{ \mathcal R }}$ vectors $\left|{\psi }_{m}\right.\rangle$ , that have the largest probabilities associated with them in the spectral decomposition of ρ, to the product vectors $\left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{m}^{\prime }\right.\rangle$ . Similar results, leading to the conclusion that ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ in general cannot be brought to unity, have been reported in [25, 35–37].

A necessary and sufficient condition for ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ to be greater than the largest eigenvalue of the object's initial state, i.e., $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}):= {o}_{1}^{\downarrow }$, is that ${o}_{2}^{\downarrow }{r}_{1}^{\downarrow }$ be greater than ${o}_{1}^{\downarrow }{r}_{{d}_{{ \mathcal R }}}^{\downarrow }$. If this were not the case, the ${d}_{{ \mathcal R }}$ largest probabilities ${p}_{m}^{\downarrow }$ would be the set ${\{{o}_{1}^{\downarrow }{r}_{m}^{\downarrow }\}}_{m}$. Recall that the maximum probability of information erasure is given by summing over this set, which gives ${o}_{1}^{\downarrow }$. That is to say, ${p}_{{\varphi }_{1}}^{\mathrm{max}}:= {\displaystyle \sum }_{m=1}^{{d}_{{ \mathcal R }}}{p}_{m}^{\downarrow }\equiv {\displaystyle \sum }_{m=1}^{{d}_{{ \mathcal R }}}{o}_{1}^{\downarrow }{r}_{m}^{\downarrow }={o}_{1}^{\downarrow }$. This implies that for a non-trivial erasure process, whereby the probability of preparing the object in the state $\left|{\varphi }_{1}\right.\rangle$ is increased, we require that

$\begin{eqnarray}&&\displaystyle \frac{{o}_{1}^{\downarrow }}{{o}_{2}^{\downarrow }}\lt \displaystyle \frac{{r}_{1}^{\downarrow }}{{r}_{{d}_{{ \mathcal R }}}^{\downarrow }}={{\rm{e}}}^{\beta ({\lambda }_{{d}_{{ \mathcal R }}}^{\uparrow }-{\lambda }_{1}^{\uparrow })},\end{eqnarray} \tag{ 2.10 }$

where the equality is a consequence of ${r}_{m}^{\downarrow }:= {{\rm{e}}}^{-\beta {\lambda }_{m}^{\uparrow }}/\mathrm{tr}[{{\rm{e}}}^{-\beta {H}_{{ \mathcal R }}}]$ . Similar arguments were made in [25], although there the focus was on providing a bound on the smallest eigenvalue of ${\rho }_{{ \mathcal O }}^{\prime }$ that could be obtained.
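
Equations (2.8) and (2.10) translate into a few lines of code: ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ is the sum of the ${d}_{{ \mathcal R }}$ largest products ${o}_{l}^{\downarrow }{r}_{m}^{\downarrow }$, and erasure is non-trivial exactly when ${o}_{2}^{\downarrow }{r}_{1}^{\downarrow }\gt {o}_{1}^{\downarrow }{r}_{{d}_{{ \mathcal R }}}^{\downarrow }$. The sketch below uses an assumed three-level ladder reservoir with unit spacing at $\beta =1$; the function names and the two object spectra are hypothetical examples.

```python
import numpy as np


def max_erasure_probability(o, r):
    """Equation (2.8): sum of the d_R largest entries of the joint spectrum."""
    products = np.sort(np.outer(o, r).ravel())[::-1]
    return float(products[: len(r)].sum())


def erasure_is_nontrivial(o, r):
    """Condition (2.10), written as o_2 * r_1 > o_1 * r_{d_R}."""
    o = np.sort(np.asarray(o, dtype=float))[::-1]
    r = np.sort(np.asarray(r, dtype=float))[::-1]
    return bool(o[1] * r[0] > o[0] * r[-1])


# Assumed reservoir: three-level ladder with unit spacing at beta = 1.
weights = np.exp(-1.0 * np.array([0.0, 1.0, 2.0]))
r = weights / weights.sum()

p_max_mixed = max_erasure_probability([0.5, 0.5], r)     # maximally mixed qubit
p_max_biased = max_erasure_probability([0.99, 0.01], r)  # strongly biased qubit
print(p_max_mixed, erasure_is_nontrivial([0.5, 0.5], r))
print(p_max_biased, erasure_is_nontrivial([0.99, 0.01], r))
```

For the biased qubit the ratio ${o}_{1}^{\downarrow }/{o}_{2}^{\downarrow }=99$ exceeds ${{\rm{e}}}^{2}$, so condition (2.10) fails and ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ remains equal to ${o}_{1}^{\downarrow }$.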

2.3. Minimising the heat dissipation

As the initial state of the reservoir is fixed, the heat dissipation is minimised by lowering the expected energy of the post-transformation marginal state of the reservoir, $\mathrm{tr}[{H}_{{ \mathcal R }}{\rho }_{{ \mathcal R }}^{\prime }]$ . In appendix C.2 we prove that ${\rm{\Delta }}Q$ is minimised by the equivalence class of unitary operators $[{U}_{\mathrm{maj}}^{f}]$ characterised by the rule

$\begin{eqnarray}&&\mathrm{for}\ \mathrm{all}\ m\in \{1,...,{d}_{{ \mathcal R }}\}\ \mathrm{and}\ n\in \{(m-1){d}_{{ \mathcal O }}+1,...,{{md}}_{{ \mathcal O }}\},{U}_{\mathrm{maj}}^{f}\left|{\psi }_{n}\right.\rangle =\left|{\varphi }_{l}^{m}\right.\rangle \otimes \left|{\xi }_{m}\right.\rangle ,\end{eqnarray} \tag{ 2.11 }$

with the set $\{\left|{\varphi }_{l}^{m}\right.\rangle | l\in \{1,...,{d}_{{ \mathcal O }}\}\}$ forming an orthonormal basis in ${{ \mathcal H }}_{{ \mathcal O }}$ for each m. A unitary operator from this equivalence class will ensure that ${\rho }_{{ \mathcal R }}^{\prime }$ is passive, and that it majorises any other passive state that could have been prepared. This is done by first maximising the probability of preparing the reservoir in the ground state $\left|{\xi }_{1}\right.\rangle$ , by taking the ${d}_{{ \mathcal O }}$ vectors $\left|{\psi }_{n}\right.\rangle$ , that have the largest probabilities associated with them in the spectral decomposition of ρ, to the product vectors $\left|{\varphi }_{l}^{1}\right.\rangle \otimes \left|{\xi }_{1}\right.\rangle$ . After this, the probability of preparing the reservoir in the next energy state $\left|{\xi }_{2}\right.\rangle$ is maximised in a similar fashion, and so on for all other energy eigenstates.
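
As a rough illustration of rule (2.11), the sketch below sorts the eigenvalues of ρ, assigns consecutive blocks of ${d}_{{ \mathcal O }}$ of them to the reservoir levels in order of increasing energy, and evaluates the resulting heat. The function name `min_heat` and the example parameters are assumptions made for illustration. Since the identity is itself a unitary with ${\rm{\Delta }}Q=0$ , this unconstrained minimum can never be positive.

```python
import math

def min_heat(o, lam, beta):
    """Minimal heat for rho = diag(o) (x) Gibbs(lam, beta), following rule (2.11)."""
    lam = sorted(lam)                                  # energies in increasing order
    weights = [math.exp(-beta * e) for e in lam]
    z = sum(weights)
    r = [w / z for w in weights]                       # initial reservoir populations
    p = sorted((oi * rj for oi in o for rj in r), reverse=True)
    d_o = len(o)
    # Block m of d_O consecutive eigenvalues is sent to reservoir level xi_m.
    r_new = [sum(p[m * d_o:(m + 1) * d_o]) for m in range(len(lam))]
    heat = sum(e * (rn - ri) for e, rn, ri in zip(lam, r_new, r))
    return heat, r_new

# Assumed example: a strongly biased qubit and a three-level ladder.
heat, r_new = min_heat([0.9, 0.1], [0.0, 1.0, 2.0], beta=1.0)
print(heat, r_new)
```

For a maximally mixed object the blocks reproduce the original reservoir populations, so the same function returns zero heat.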

2.4. Minimal heat dissipation conditional on maximising the probability of information erasure

If we compare the rule that maximises the probability of information erasure, given by equation (2.9), and the rule that minimises the heat dissipation, given by equation (2.11), we notice that they are incompatible. As such, no unitary operator simultaneously exists in both equivalence classes: $[{U}_{\mathrm{maj}}^{g}]\cap [{U}_{\mathrm{maj}}^{f}]=\varnothing$ . The two tasks are in some sense complementary, and there will be a tradeoff between them. Here, we shall prioritise; a unitary operator will be chosen such that it maximises the probability of information erasure and, given this constraint, minimises the heat dissipation. In other words, we find the equivalence class of unitary operators $[{U}_{\mathrm{opt}}(0)]\subset [{U}_{\mathrm{maj}}^{g}]$ that minimise ${\rm{\Delta }}Q$ . The zero in parentheses indicates that the error in probability of information erasure, δ, is zero. To this end we first divide the vector of probabilities ${{\boldsymbol{p}}}^{\downarrow }$ to form the non-increasing vector of cardinality ${d}_{{ \mathcal R }}$ , denoted ${{\rm{\Pi }}}_{0}^{\downarrow }$ , and the non-increasing vectors of cardinality ${d}_{{ \mathcal O }}-1$ , denoted $\{{{\rm{\Pi }}}_{m}^{\downarrow }| m\in \{1,...,{d}_{{ \mathcal R }}\}\}$ , defined as

$\begin{eqnarray}{{\rm{\Pi }}}_{0}^{\downarrow } & := & \{{p}_{m}^{\downarrow }| m\in \{1,...,{d}_{{ \mathcal R }}\}\},\\ {{\rm{\Pi }}}_{m\geqslant 1}^{\downarrow } & := & \{{p}_{{d}_{{ \mathcal R }}+(m-1)({d}_{{ \mathcal O }}-1)+l}^{\downarrow }| l\in \{1,...,{d}_{{ \mathcal O }}-1\}\}.\end{eqnarray} \tag{ 2.12 }$

We refer to the mth element of ${{\rm{\Pi }}}_{0}^{\downarrow }$ as ${{\rm{\Pi }}}_{0}^{\downarrow }(m)$ , and the lth element of ${{\rm{\Pi }}}_{m\geqslant 1}^{\downarrow }$ as ${{\rm{\Pi }}}_{m\geqslant 1}^{\downarrow }(l)$ .

In appendix C.3 we prove that the equivalence class of unitary operators that maximise the probability of information erasure and, given this constraint, minimise the heat dissipation, is characterised by the rules

$\begin{eqnarray}{U}_{\mathrm{opt}}(0)\;:\left\{\begin{array}{cc}\left|{\psi }_{n}\right.\rangle \mapsto \left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{m}\right.\rangle & \mathrm{if}\ p({\psi }_{n}| \rho )={{\rm{\Pi }}}_{0}^{\downarrow }(m),\\ \left|{\psi }_{n}\right.\rangle \mapsto \left|{\varphi }_{l}^{m}\right.\rangle \otimes \left|{\xi }_{m}\right.\rangle & \mathrm{if}\ p({\psi }_{n}| \rho )={{\rm{\Pi }}}_{m}^{\downarrow }(l)\ \mathrm{and}\ m\geqslant 1,\end{array}\right.\end{eqnarray} \tag{ 2.13 }$

where, for all m, each member of the orthonormal set ${\{\left|{\varphi }_{l}^{m}\right.\rangle \}}_{l}$ is orthogonal to $\left|{\varphi }_{1}\right.\rangle$ .

Effectively, the first line of equation (2.13) conforms with equation (2.9) and hence maximises the probability of information erasure. The orthonormal vectors ${\{\left|{\xi }_{m}^{\prime }\right.\rangle \}}_{m}$ are chosen to be the eigenvectors of the reservoir Hamiltonian, however, in order to minimise the contribution to heat from this line. The second line is an altered version of equation (2.11), thereby minimising the heat dissipation given the constraint posed by the first line. We now make the following observations:

(a)
If we choose $\left|{\varphi }_{l}^{m}\right.\rangle =\left|{\varphi }_{l+1}\right.\rangle$ for all m, and such that ${\{\left|{\varphi }_{l}\right.\rangle \}}_{l}$ are the eigenvectors of the object Hamiltonian ${H}_{{ \mathcal O }}$ in increasing order of energy, then ${U}_{\mathrm{opt}}(0)$ would also ensure that erasure to the ground state $\left|{\varphi }_{1}\right.\rangle$ would be done in such a way that $p({\varphi }_{i}| {\rho }_{{ \mathcal O }}^{\prime })\geqslant p({\varphi }_{j}| {\rho }_{{ \mathcal O }}^{\prime })$ for all $i\lt j;$ the object is brought to a passive state, although this state will in general not be thermal [26]. We refer to this as passive information erasure, and the resultant equivalence class of unitary operators as $[{U}_{\mathrm{opt}}^{{\rm{p}}}(0)]\subset [{U}_{\mathrm{opt}}(0)]$ . These unitary operators will result in the smallest possible ${\rm{\Delta }}E$ , conditional on first maximising the probability of information erasure, and then minimising the heat dissipation; that is to say, $[{U}_{\mathrm{opt}}^{{\rm{p}}}(0)]$ minimises ${\rm{\Delta }}W$ for all unitary operators in the equivalence class $[{U}_{\mathrm{opt}}(0)]$ . Figure 2 shows the matrix representation of ${\rho }^{\prime }={U}_{\mathrm{opt}}^{{\rm{p}}}(0)\rho {U}_{\mathrm{opt}}^{{\rm{p}}}{(0)}^{\dagger }$ .
(b)
Since the desired task is the maximisation of $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })$ , we need not in general maximise ${\rm{\Delta }}S$ because this will lead to a greater amount of heat dissipation than necessary, as per equation (2.6). The only cases where maximisation of $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })$ necessarily leads to the maximisation of ${\rm{\Delta }}S$ are when: (i) ${p}_{{\varphi }_{1}}^{\mathrm{max}}=1;$ and (ii) where ${{ \mathcal H }}_{{ \mathcal O }}\simeq {{\mathbb{C}}}^{2}$ . In case (i) the entropy of the object is brought to zero, so ${\rm{\Delta }}S$ is trivially maximised. In case (ii), we note that if ${{\boldsymbol{o}}}_{1}^{\downarrow }\succ {{\boldsymbol{o}}}_{2}^{\downarrow }$ , where ${{\boldsymbol{o}}}_{1}^{\downarrow }$ and ${{\boldsymbol{o}}}_{2}^{\downarrow }$ are the spectra of ${\rho }_{{ \mathcal O }}^{1}$ and ${\rho }_{{ \mathcal O }}^{2}$ respectively, then $S({\rho }_{{ \mathcal O }}^{1})\leqslant S({\rho }_{{ \mathcal O }}^{2})$ . If we maximise $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })$ in the case of ${ \mathcal O }$ being a two-level system, this will necessarily minimise $p({\varphi }_{2}| {\rho }_{{ \mathcal O }}^{\prime })$ , which in turn will result in the spectrum of ${\rho }_{{ \mathcal O }}^{\prime }$ majorising all possible spectra. Consequently, $S({\rho }_{{ \mathcal O }}^{\prime })$ will be minimised, and hence ${\rm{\Delta }}S$ will be maximised. However, one can always say that maximising the probability of information erasure requires that we minimise the min-entropy, S_min, defined as
$\begin{eqnarray}&&{S}_{\mathrm{min}}(\rho ):= \underset{i}{\mathrm{min}}\{-\mathrm{log}({p}_{i})\},\end{eqnarray} \tag{ 2.14 }$
where ${\{{p}_{i}\}}_{i}$ is the spectrum of ρ [38]. The min-entropy is clearly given by the largest value in the spectrum. To minimise the min-entropy, therefore, we must maximise the largest value in the spectrum; this is the definition of maximising the probability of information erasure.
(c)
The only instance where ${{ \mathcal H }}_{{ \mathcal O }}\simeq {{ \mathcal H }}_{{ \mathcal R }}\simeq {{\mathbb{C}}}^{d}$ , and ${U}_{\mathrm{opt}}^{{\rm{p}}}(0)$ for passive, maximally probable information erasure is a swap operation, is when d = 2. For larger dimensions, this is no longer the case.
(d)
It is evident that ${\rho }_{{ \mathcal R }}^{\prime }$ is diagonal with respect to the eigenbasis of ${H}_{{ \mathcal R }}$ , and that the spectrum of ${\rho }_{{ \mathcal R }}^{\prime }$ is non-increasing with respect to the eigenvalues of ${H}_{{ \mathcal R }}$ . In other words, ${\rho }_{{ \mathcal R }}^{\prime }$ is a passive state. However, its spectrum is majorised by that of ${\rho }_{{ \mathcal R }}(\beta )$ . As such, by corollary A.1, ${\rm{\Delta }}Q\geqslant 0$ . This conforms with Landauer's principle that information erasure must dissipate heat.
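
The partition of equation (2.12) and the rules of equation (2.13) can be sketched as follows. This is an illustrative calculation under the assumption of a diagonal product state with object spectrum `o` and a thermal reservoir with energies `lam`; the name `optimal_erasure` is hypothetical. It returns ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ , the heat ${\rm{\Delta }}Q$ , and the post-transformation reservoir populations.

```python
import math

def optimal_erasure(o, lam, beta):
    """p_max and heat for U_opt(0), using partition (2.12) and rule (2.13)."""
    lam = sorted(lam)                                  # energies in increasing order
    weights = [math.exp(-beta * e) for e in lam]
    z = sum(weights)
    r = [w / z for w in weights]
    d_o, d_r = len(o), len(r)
    p = sorted((oi * rj for oi in o for rj in r), reverse=True)
    pi_0 = p[:d_r]                                     # sent to phi_1 (x) xi_m
    rest = p[d_r:]
    # Pi_m holds the next d_O - 1 eigenvalues, sent to phi_l^m (x) xi_m.
    pi = [rest[m * (d_o - 1):(m + 1) * (d_o - 1)] for m in range(d_r)]
    r_new = [pi_0[m] + sum(pi[m]) for m in range(d_r)]
    heat = sum(e * (rn - ri) for e, rn, ri in zip(lam, r_new, r))
    return sum(pi_0), heat, r_new

# Assumed example: a fully mixed qubit and a two-level reservoir, where the
# operation reduces to a swap, as in observation (c).
print(optimal_erasure([0.5, 0.5], [0.0, 1.0], beta=1.0))
```

In this two-level example the object inherits the reservoir's ground-state population and the reservoir is left maximally mixed, which is what a swap would produce; the heat is positive, in line with observation (d).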

**Figure 1.** The object ${ \mathcal O }$ with Hilbert space ${{ \mathcal H }}_{{ \mathcal O }}\simeq {{\mathbb{C}}}^{{d}_{{ \mathcal O }}}$ and thermal reservoir ${ \mathcal R }$ with Hilbert space ${{ \mathcal H }}_{{ \mathcal R }}\simeq {{\mathbb{C}}}^{{d}_{{ \mathcal R }}}$ . The eigenbasis of the reservoir Hamiltonian ${H}_{{ \mathcal R }}$ is ${\{\left|{\xi }_{m}\right.\rangle \}}_{m}$ , with the vector numbering being in order of increasing energy. The eigenbasis with respect to which the object is initially diagonal is ${\{\left|{\varphi }_{n}\right.\rangle \}}_{n}$ .

**Figure 2.** (a) The partitioning of ${{\boldsymbol{p}}}^{\downarrow }$ , the vector of eigenvalues of ρ arranged in a non-increasing order, into the vectors ${{\rm{\Pi }}}_{0}^{\downarrow }$ and ${{\rm{\Pi }}}_{m\geqslant 1}^{\downarrow }$ . (b) The density operator ${\rho }^{\prime }:= {U}_{\mathrm{opt}}^{{\rm{p}}}(0)\rho {U}_{\mathrm{opt}}^{{\rm{p}}}{(0)}^{\dagger }$ , in matrix representation, where ${U}_{\mathrm{opt}}^{{\rm{p}}}(0)$ is the optimal unitary operator for passive, maximally probable information erasure. The post-transformation marginal state of the object, ${\rho }_{{ \mathcal O }}^{\prime }$ , is passive. It is also the least energetic passive state that is possible to prepare, given the constraints: (i) $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })={p}_{{\varphi }_{1}}^{\mathrm{max}};$ and (ii) ${\rm{\Delta }}Q$ is minimal given (i).

2.5. The tradeoff between probability of information erasure and minimal heat dissipation

We would now like to relax the condition of maximising the probability of information erasure, and allow the error δ to take non-zero values. The question we would now like to ask is: How will the minimal achievable ${\rm{\Delta }}Q$ be affected by varying δ, and how may we then characterise the equivalence class of optimal unitary operators $[{U}_{\mathrm{opt}}^{{\rm{p}}}(\delta )]$ ? The answer for the extremal cases is trivial; we have already addressed the case of $\delta =0$ in section 2.4, and when $\delta ={p}_{{\varphi }_{1}}^{\mathrm{max}}-{o}_{1}^{\downarrow }$ , then $[{U}_{\mathrm{opt}}^{{\rm{p}}}(\delta )]={\mathbb{1}}$ and ${\rm{\Delta }}Q=0$ . In appendix C.4 we prove that the algorithm of sequential swaps, shown in figure 3, will result in a non-increasing sequence of errors, ${{\boldsymbol{\delta }}}^{\downarrow }:= {\{{\delta }_{j}^{\downarrow }\}}_{j}$ , commensurate with a non-decreasing sequence of heat, ${\boldsymbol{\Delta }}{{\boldsymbol{Q}}}^{\uparrow }:= {\{{\rm{\Delta }}{Q}_{j}^{\uparrow }\}}_{j}$ . For each error ${\delta }_{j}^{\downarrow }$ , the associated value of heat ${\rm{\Delta }}{Q}_{j}^{\uparrow }$ will be minimal. Furthermore, the marginal state of the object, ${\rho }_{{ \mathcal O }}^{\prime }$ , will always be passive. Each swap operation acts on a subspace spanned by $\{\left|{\varphi }_{i}\right.\rangle ,\left|{\varphi }_{j}\right.\rangle \}\otimes \{\left|{\xi }_{k}\right.\rangle ,\left|{\xi }_{l}\right.\rangle \}$ . As the state is initially diagonal with respect to the basis ${\{\left|{\varphi }_{l}\right.\rangle \otimes \left|{\xi }_{m}\right.\rangle \}}_{l,m}$ , and swap operations only permute the probabilities in the state's spectrum, the composite system will always be diagonal with respect to this basis at every stage of the algorithm.

**Figure 3.** A sequence of swap operations that results in a non-increasing sequence of errors, ${{\boldsymbol{\delta }}}^{\downarrow }$ , commensurate with a non-decreasing sequence of minimal heat ${\boldsymbol{\Delta }}{{\boldsymbol{Q}}}^{\uparrow }$ . At each stage of the algorithm, the probability associated with the vector $\left|{\varphi }_{i}\right.\rangle \otimes \left|{\xi }_{j}\right.\rangle$ is denoted as ${p}_{i,j}$ .

Figure 4 depicts this process for the case where ${{ \mathcal H }}_{{ \mathcal O }}\simeq {{ \mathcal H }}_{{ \mathcal R }}\simeq {{\mathbb{C}}}^{3}$ , with ${\rho }_{{ \mathcal O }}=\frac{1}{3}{{\mathbb{1}}}_{{ \mathcal O }}$ . Here the diagonal entries of the density operator ${\rho }^{\prime }$ are shown in each column, with the first column from the right representing the initial state, and the final column representing the case of passive, maximally probable information erasure. The algorithm for reducing error by increasing heat moves from right to left, as shown by the arrows. The elements surrounded by dashed circles, and coloured in red, are those which must be swapped to decrease δ, with the resultant diagonal elements of the new state shown to the left.

**Figure 4.** The diagonal elements of ${\rho }^{\prime }:= {U}_{\mathrm{opt}}^{{\rm{p}}}(\delta )\rho {U}_{\mathrm{opt}}^{{\rm{p}}}{(\delta )}^{\dagger }$ , for $\rho =\displaystyle \frac{1}{3}{{\mathbb{1}}}_{{ \mathcal O }}\otimes {\rho }_{{ \mathcal R }}(\beta )$ , resulting in $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })={p}_{{\varphi }_{1}}^{\mathrm{max}}-\delta$ . ${\rm{\Delta }}Q$ is minimised and ${\rho }_{{ \mathcal O }}^{\prime }$ is passive with the smallest average energy possible given this constraint. Here ${{ \mathcal H }}_{{ \mathcal O }}\simeq {{ \mathcal H }}_{{ \mathcal R }}\simeq {{\mathbb{C}}}^{3}$ , and ${\{{\delta }_{j}^{\downarrow }\}}_{j}$ is a non-increasing sequence of errors. The elements inside a dashed circle (red online) are those which must be swapped to move from ${\delta }_{j}^{\downarrow }$ to ${\delta }_{j+1}^{\downarrow }$ .

To allow for a continuous change in δ, we need to generalise the swap operation to an entangling swap. That is to say, for the vectors $\left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{i}\right.\rangle$ and $\left|{\varphi }_{l}\right.\rangle \otimes \left|{\xi }_{m}\right.\rangle$ , and the real number $\gamma \in [0,1]$ , we define

$\begin{eqnarray}&&{\mathrm{SW}}_{\gamma }\;:\left\{\begin{array}{c}\left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{i}\right.\rangle \mapsto \sqrt{1-\gamma }\left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{i}\right.\rangle +\sqrt{\gamma }\left|{\varphi }_{l}\right.\rangle \otimes \left|{\xi }_{m}\right.\rangle ,\\ \left|{\varphi }_{l}\right.\rangle \otimes \left|{\xi }_{m}\right.\rangle \mapsto \sqrt{\gamma }\left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{i}\right.\rangle -\sqrt{1-\gamma }\left|{\varphi }_{l}\right.\rangle \otimes \left|{\xi }_{m}\right.\rangle .\end{array}\right.\end{eqnarray} \tag{ 2.15 }$

Therefore, ${\mathrm{SW}}_{0}={\mathbb{1}}$ and as $\gamma \to 1$ , ${\mathrm{SW}}_{\gamma }$ converges to the swap operation. Hence, for any error $\delta \in ({\delta }_{j+1}^{\downarrow },{\delta }_{j}^{\downarrow })$ , the optimal unitary operator ${U}_{\mathrm{opt}}^{{\rm{p}}}(\delta )$ would be given by following the algorithm for discrete errors up to ${\delta }_{j}^{\downarrow }$ , and then replacing the swap operation which would give the error ${\delta }_{j+1}^{\downarrow }$ with the entangling swap operation defined above, with an appropriate choice of γ. This will ensure a continuous decrease in δ and a continuous increase in ${\rm{\Delta }}Q$ .
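
To see why the entangling swap interpolates continuously, note that on a state diagonal in the two vectors of equation (2.15) only the squared moduli of the matrix elements of ${\mathrm{SW}}_{\gamma }$ affect the populations. The sketch below is illustrative; the function name and the sample populations are assumptions.

```python
import math

def entangling_swap_populations(p_a, p_b, gamma):
    """Populations of |a> and |b> after SW_gamma acts on a state diagonal in {|a>, |b>}.

    |a> = phi_1 (x) xi_i and |b> = phi_l (x) xi_m, as in equation (2.15).
    """
    # Columns are the images of |a> and |b> under SW_gamma.
    u = [[math.sqrt(1 - gamma), math.sqrt(gamma)],
         [math.sqrt(gamma), -math.sqrt(1 - gamma)]]
    new_a = u[0][0] ** 2 * p_a + u[0][1] ** 2 * p_b
    new_b = u[1][0] ** 2 * p_a + u[1][1] ** 2 * p_b
    return new_a, new_b

# Assumed populations: the erased-state vector starts with the smaller weight.
for gamma in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(gamma, entangling_swap_populations(0.1, 0.3, gamma))
```

The population of $\left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{i}\right.\rangle$ moves linearly in γ from its initial value to the fully swapped value, while the sum of the two populations is conserved.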

3. Examples: erasing a fully mixed qubit with maximal probability of success

We shall now consider the erasure of a qubit, with Hilbert space ${{ \mathcal H }}_{{ \mathcal O }}\simeq {{\mathbb{C}}}^{2}$ . We are also interested in examining the scenario where no a priori information about the state of the object is known; the probabilities ${o}_{1}^{\downarrow }$ and ${o}_{2}^{\downarrow }$ are both one-half. For simplicity, we make the substitution ${d}_{{ \mathcal R }}\equiv d$ for the dimension of the reservoir's Hilbert space. The action of the optimal unitary operator for passive, maximally probable information erasure, would therefore be such that the diagonal elements of ${\rho }^{\prime }$ , as depicted in figure 2(b) and from top to bottom in decreasing order, are the probabilities ${{\boldsymbol{p}}}^{\downarrow }\equiv \{\frac{{r}_{1}^{\downarrow }}{2},\frac{{r}_{1}^{\downarrow }}{2},...,\frac{{r}_{d}^{\downarrow }}{2},\frac{{r}_{d}^{\downarrow }}{2}\}$ . We will consider two models for the reservoir:

(a)
A ladder system. The ground state of the system has an energy of zero, and for every m,
$\begin{eqnarray}&&\langle {\xi }_{m+1}| {H}_{{ \mathcal R }}| {\xi }_{m+1}\rangle -\langle {\xi }_{m}| {H}_{{ \mathcal R }}| {\xi }_{m}\rangle =\omega .\end{eqnarray} \tag{ 3.1 }$
The mth energy of such a system, in increasing order, is given as ${\lambda }_{m}^{\uparrow }=\omega (m-1)$ . The Hamiltonian has the operator norm $\parallel {H}_{{ \mathcal R }}\parallel ={\lambda }_{d}^{\uparrow }=\omega (d-1)$ which grows with d. In the limit as d tends to infinity, this system will be a harmonic oscillator of frequency ω, with a spectrum bounded from below by zero, and unbounded from above.
(b)
A chain of spin-half systems, with nearest-neighbour interactions, that are under a linear magnetic field gradient. Here, the reservoir has the Hilbert space ${{ \mathcal H }}_{{ \mathcal R }}={\displaystyle \otimes }_{k=1}^{N}{{ \mathcal H }}_{k}$ , with ${{ \mathcal H }}_{k}\simeq {{\mathbb{C}}}^{2}$ for all k. The Hamiltonian is
$\begin{eqnarray}&&{H}_{{ \mathcal R }}=\displaystyle \sum _{k=1}^{N}(k{\rm{\Theta }}){\sigma }_{z}^{k}+J\displaystyle \sum _{k=1}^{N-1}\displaystyle \sum _{a\in \{x,y,z\}}{\sigma }_{a}^{k}\otimes {\sigma }_{a}^{k+1},\end{eqnarray} \tag{ 3.2 }$
where $\{{\sigma }_{a}| a\in \{x,y,z\}\}$ are the Pauli operators. The operator ${\sigma }_{a}^{k}$ acts nontrivially only on Hilbert space ${{ \mathcal H }}_{k}$ . The parameters ${\rm{\Theta }}\in {{\mathbb{R}}}^{+}$ and $J\in {{\mathbb{R}}}^{+}$ represent, respectively, an effective magnetic field gradient in the z-axis and the nearest-neighbour spin-spin coupling strength. This Hamiltonian conserves the total magnetisation, ${\sum }_{k}{\sigma }_{z}^{k}$ .

For each reservoir, we wish to determine how much heat is dissipated in excess of the improved lower bound of Landauer's inequality, determined in [25], given as

$\begin{eqnarray}&&{\rm{\Delta }}L:= {\rm{\Delta }}Q-\displaystyle \frac{1}{\beta }\left({\rm{\Delta }}S+\displaystyle \frac{2{({\rm{\Delta }}S)}^{2}}{{\mathrm{log}}^{2}(d-1)+4}\right).\end{eqnarray} \tag{ 3.3 }$

We use the simple form of this lower bound, which is not tight.
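
For the ladder reservoir, ${\rm{\Delta }}L$ can be evaluated directly from the sorted spectrum. The following is a minimal sketch for a fully mixed qubit, assuming $d\geqslant 2$ and the passive choice of object basis, so that the final object spectrum is $\{{p}_{{\varphi }_{1}}^{\mathrm{max}},1-{p}_{{\varphi }_{1}}^{\mathrm{max}}\}$ ; the function names are illustrative.

```python
import math

def binary_entropy(p):
    """Von Neumann entropy (natural log) of a qubit with spectrum {p, 1 - p}."""
    return -sum(x * math.log(x) for x in (p, 1 - p) if x > 0)

def excess_heat_ladder(d, beta, omega=1.0):
    """Delta L of equation (3.3) for a fully mixed qubit and a d-level ladder (d >= 2)."""
    lam = [omega * m for m in range(d)]                # lambda_m = omega * (m - 1)
    weights = [math.exp(-beta * e) for e in lam]
    z = sum(weights)
    r = [w / z for w in weights]
    p = sorted([x / 2 for x in r] * 2, reverse=True)   # spectrum of rho, non-increasing
    p_max = sum(p[:d])
    # For d_O = 2 each Pi_m has a single element, so r'_m = p_m + p_{d+m}.
    r_new = [p[m] + p[d + m] for m in range(d)]
    heat = sum(e * (rn - ri) for e, rn, ri in zip(lam, r_new, r))
    delta_s = math.log(2) - binary_entropy(p_max)
    bound = (delta_s + 2 * delta_s ** 2 / (math.log(d - 1) ** 2 + 4)) / beta
    return heat - bound, p_max, heat

for d in (2, 4, 8):
    print(d, excess_heat_ladder(d, beta=1.0))
```

At large β the same function gives a success probability close to one and a heat close to $\omega /2$ , since only the two lowest reservoir levels are then involved.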

3.1. Comparison of reservoirs given unitary evolution

Figure 5(a) demonstrates the dependence of ${\rm{\Delta }}L$ and ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ on β and d, when the reservoir is a ladder system with a fixed frequency $\omega =1$ . Figures 5(b)–(d) depict the dependence of ${\rm{\Delta }}L$ and ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ on Θ, J, and β when the reservoir is a spin chain of length N. When varying any of these, the other two are left constant at the value of one. We now make the following observations:

(a)
When the reservoir is a ladder system, an increase in d increases ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ and also, generally, ${\rm{\Delta }}L$ , for all finite temperatures. In the limit as β tends to infinity, ${\rho }_{{ \mathcal R }}(\beta )=| {\xi }_{1}\rangle \langle {\xi }_{1}|$ and ${U}_{\mathrm{opt}}^{{\rm{p}}}(0)$ effects a swap map in the subspace $\{\left|{\varphi }_{1}\right.\rangle ,\left|{\varphi }_{2}\right.\rangle \}\otimes \{\left|{\xi }_{1}\right.\rangle ,\left|{\xi }_{2}\right.\rangle \}$ . As such, in this limit ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ and ${\rm{\Delta }}Q$ tend to unity and $\omega /2$ respectively.
(b)
When the reservoir is a spin chain, as N increases, so does ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ . This can be explained by noting that ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ grows with ${\lambda }_{d}^{\uparrow }-{\lambda }_{1}^{\uparrow }$ , which is always greater than or equal to $2{\rm{\Theta }}{\displaystyle \sum }_{k=1}^{N}k$ .
(c)
For spin chains of even length, in the limit of large J, ${\rm{\Delta }}L$ quickly diverges. However, there is some critical value of J for odd-length chains such that an increase in J beyond this drastically reduces the rate at which ${\rm{\Delta }}L$ increases.
(d)
The best case scenario is when the reservoir is a long chain, with ${\rm{\Theta }}\lt J,\beta$ and $J\sim \beta$ . For example, for a chain of eleven spins, with ${\rm{\Theta }}=0.25$ , and $J=\beta =1$ , we get ${p}_{{\varphi }_{1}}^{\mathrm{max}}\approx 1$ while ${\rm{\Delta }}L\approx 0.12$ . Compare this with the case where the reservoir is given by a ladder system of dimension $d={2}^{11}$ and $\beta =1$ . Here, in order to achieve the same value of ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ , realised when $\omega \approx 0.1$ , we get ${\rm{\Delta }}L\approx 0.29$ .

3.2. Comparison of reservoirs under energy-conserving, Markovian dephasing channels

Before this juncture, we have considered the active element of erasure—the unitary operator—as a bijection between two orthonormal basis sets. To consider this as a bona fide dynamical process we must conceive of the time-ordered sequence $\{{H}_{k}| k\in \{1,...,N\}\}$ , where H_k is the Hamiltonian of the composite system ${ \mathcal O }+{ \mathcal R }$ in the time period $t\in ({t}_{k-1},{t}_{k})$ . If the system is thermally isolated, then this will be accompanied by the time-ordered sequence of unitary operators $\{{U}_{{\rm{\Delta }}{t}_{k}}={{\rm{e}}}^{-{\rm{i}}{\rm{\Delta }}{t}_{k}{H}_{k}}| k\in \{1,...,N\}\}$ where the time duration is defined as ${\rm{\Delta }}{t}_{k}:= {t}_{k}-{t}_{k-1}$ . The time-ordered application of these results in the unitary operator U_τ, where $\tau ={t}_{N}-{t}_{0}$ , which determines the total evolution of the system. If we identify ${H}_{0}:= {H}_{{ \mathcal O }}+{H}_{{ \mathcal R }}$ as the Hamiltonian of the system at times prior to t₀ and posterior to t_N, whereby the new sequence of Hamiltonians can be aptly called a Hamiltonian cycle, then

$\begin{eqnarray}&&{\rm{\Delta }}Q=\mathrm{tr}[{H}_{{ \mathcal R }}({\mathrm{tr}}_{{ \mathcal O }}[{U}_{\tau }\rho {U}_{\tau }^{\dagger }]-{\rho }_{{ \mathcal R }}(\beta ))],\end{eqnarray} \tag{ 3.4 }$

will refer to the amount by which the average energy of the reservoir, at times $t\gt {t}_{N}$ , will be greater than that at times $t\lt {t}_{0}$ , and will have the same meaning as the heat term in equation (2.4). Implicit in this framework is the notion that changing the Hamiltonian acting on the system will take energy from, or put energy into, a work storage device which we do not account for explicitly. If a non-unitary evolution is effected, however, we cannot in general make such an identification. This is because a general completely positive, trace preserving map can always be conceived, via Stinespring's dilation theorem [39], as resulting from a unitary evolution on the system coupled with an environment. Indeed, the energy consumption in such a case will be determined by the total Hamiltonian of the system plus the environment. If energy is allowed to flow between the system and environment, then the energy increase of ${ \mathcal R }$ (plus the energy increase in ${ \mathcal O }$ ) will not be identical to the energy consumed from the work storage device; ${\rm{\Delta }}Q$ may be less or greater than the energy lost.

The only exception to this rule is when the unitary evolution between system and environment conserves the energy of the two individually, whereby no energy is transferred amongst them. This will result in the system undergoing pure dephasing with respect to the (time-local) Hamiltonian eigenbasis; we refer to such a generalised evolution as an energy conserving one. The simplest realisation of such a scenario would require us to consider the sequence of Hamiltonians to be accompanied by the time-ordered sequence of super-operators $\{{{\rm{e}}}^{{\rm{\Delta }}{t}_{k}{{\mathcal{L}}}_{k}}| k\in \{1,...,N\}\}$ , with the Liouville super-operators ${{\mathcal{L}}}_{k}$ defined as

$\begin{eqnarray}&&{{\mathcal{L}}}_{k}\;:\rho \mapsto {\rm{i}}{[\rho ,{H}_{k}]}_{-}+{\rm{\Gamma }}\displaystyle \sum _{n=1}^{{d}_{{ \mathcal O }}d}\left(| {\phi }_{n}^{k}\rangle \langle {\phi }_{n}^{k}| \rho | {\phi }_{n}^{k}\rangle \langle {\phi }_{n}^{k}| -\displaystyle \frac{1}{2}{[\rho ,| {\phi }_{n}^{k}\rangle \langle {\phi }_{n}^{k}| ]}_{+}\right),\end{eqnarray} \tag{ 3.5 }$

where ${\{\left|{\phi }_{n}^{k}\right.\rangle \}}_{n}$ is the eigenbasis of H_k, while ${[\cdot ,\cdot ]}_{-}$ and ${[\cdot ,\cdot ]}_{+}$ are the commutator and anti-commutator respectively, and ${\rm{\Gamma }}\in [0,\infty )$ is the dephasing rate. In each time period $t\in ({t}_{k-1},{t}_{k})$ the system evolves as $\rho \mapsto ({{\rm{e}}}^{{\rm{\Delta }}{t}_{k}{{\mathcal{L}}}_{k}})(\rho )$ while conserving H_k; the system evolves by energy conserving, Markovian dephasing channels. As such channels are unital, they will cause the consequent heat dissipation to increase in proportion to the entropy reduction in the object; energy conserving, Markovian dephasing will be detrimental to the erasure process [25].
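
Written in the basis ${\{\left|{\phi }_{n}^{k}\right.\rangle \}}_{n}$ , the generator of equation (3.5) leaves the diagonal entries of ρ unchanged and multiplies each off-diagonal entry by a phase and a decay factor ${{\rm{e}}}^{-{\rm{\Gamma }}t}$ . The sketch below applies this closed form to a density matrix supplied in that basis; the function name and the example state are assumptions made for illustration.

```python
import cmath

def dephase(rho, energies, gamma, t):
    """Apply exp(t * L_k) of equation (3.5) to rho, given in the eigenbasis of H_k."""
    n = len(energies)
    out = [[0j] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a == b:
                out[a][b] = rho[a][b]                  # populations are conserved
            else:
                # Unitary phase from i[rho, H_k] plus decay at rate gamma.
                phase = -1j * (energies[a] - energies[b]) * t
                out[a][b] = rho[a][b] * cmath.exp(phase - gamma * t)
    return out

# Assumed example: an equal superposition of two energy eigenstates.
rho = [[0.5, 0.5], [0.5, 0.5]]
print(dephase(rho, [0.0, 1.0], gamma=1.0, t=1.0))
```

Because the populations in the eigenbasis of H_k are untouched, the expectation value of H_k is conserved, while the coherences that the erasure unitary relies upon are suppressed.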

For our two models, we will consider the simplest Hamiltonian cycle where the sequence of Hamiltonians sandwiched by H₀ is the singleton $\{{H}_{1}\}$ . Furthermore, we set ${U}_{\tau }={{\rm{e}}}^{-{\rm{i}}\tau {H}_{1}}$ to equal ${U}_{\mathrm{opt}}^{{\rm{p}}}(0)$ , as determined by the sequential swap algorithm given in section 2.5, when $\tau =1$ . Now, we let the system evolve instead as $\rho \mapsto ({{\rm{e}}}^{\tau {{\mathcal{L}}}_{1}})(\rho )$ . By again evolving the system for a period of $\tau =1$ , we may ascertain how such an environmental interaction affects both the probability of qubit erasure, and the heat dissipation.

Figure 6(a) shows the effect of dephasing on the erasure process, when the reservoir is a spin chain of length $N\in \{2,3,4,5\}$ , with ${\rm{\Theta }}=J=\beta =1$ . Similarly, figure 6(b) shows the effect of dephasing on the erasure process, when the reservoir is a ladder system with $\omega =1$ and $\beta =5$ . In both instances, an increase in Γ results in a decrease in $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })$ and, with the exception of N = 2 and $d\leqslant 4$ , an increase in ${\rm{\Delta }}Q$ . However, not all ladder dimensions d, or spin chain lengths N, are affected the same way.

**Figure 6.** (a) and (b) show the effect of dephasing rate Γ on ${\rm{\Delta }}Q$ and $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })$ . The system is evolved for time $\tau =1$ . (a) The reservoir is given by a spin chain of length N and where all the parameters are set to one. (b) The reservoir is given by a ladder system, with energy spacing $\omega =1$ , and inverse temperature $\beta =5$ . (c) and (d) show, respectively, the effect of ladder system dimension d on ${\rm{\Delta }}Q$ and $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })$ , at a constant value of ${\rm{\Gamma }}=1$ . It appears that for dimensions $d={2}^{n}$ , with $n\in {\mathbb{N}}$ and $n\geqslant 2$ , ${\rm{\Delta }}Q$ is smaller than that of all larger dimension values, while $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })$ takes the largest global values. In other words, the ladder system is most robust to energy conserving, Markovian dephasing, when it is dimensionally equivalent to a spin chain.

We note that when the two reservoirs are dimensionally equivalent, i.e., when the ladder system has dimensions $d\in \{{2}^{2},{2}^{3},{2}^{4},{2}^{5}\}$ , commensurate with spin chains of length $N\in \{2,3,4,5\}$ , they display the same behaviour under energy conserving, Markovian dephasing channels. This is because the generator of their evolution, the Liouville super-operator ${{\mathcal{L}}}_{1}$ , is the same in such cases. In both instances, an increase in dimension leads to an increase in ${\rm{\Delta }}Q$ , while the probability of qubit erasure increases as we move from $d={2}^{2}$ to $d={2}^{3}$ , decreasing again as we increase further still to $d={2}^{4}$ and $d={2}^{5}$ .

What is most striking, however, is that the ladder system seems to perform the best precisely when it is dimensionally equivalent to a spin chain. Consider figures 6(c) and (d). Here, ${\rm{\Delta }}Q$ and $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })$ are calculated for dimensions $d\in \{2,...,32\}$ , while keeping all other parameters constant. For dimensions $d\in \{{2}^{2},{2}^{3},{2}^{4},{2}^{5}\}$ , we observe that ${\rm{\Delta }}Q$ is smaller than that for all larger d, while $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })$ attains the largest global values. As such, we make the following conjecture:

Conjecture 3.1. Let the reservoir be given by a d-dimensional ladder system with a constant energy spacing ω. In the presence of energy conserving, Markovian dephasing, reservoirs with $d={2}^{n}$ , with $n\in {\mathbb{N}}$ and $n\geqslant 2$ , allow for the largest global probabilities of qubit erasure while, at the same time, dissipating less heat than all such reservoirs of larger dimension.

3.3. Full erasure of a qudit with a harmonic oscillator

Here, we expound on the example of using a ladder system as a reservoir, but consider what happens as we take the limit of infinitely large d. In this limit we may call the ladder system a harmonic oscillator. Let us first consider the case where the object is a qudit, with Hilbert space ${{ \mathcal H }}_{{ \mathcal O }}\simeq {{\mathbb{C}}}^{{d}_{{ \mathcal O }}}$ , prepared in the maximally mixed state

$\begin{eqnarray}&&{\rho }_{{ \mathcal O }}=\displaystyle \frac{1}{{d}_{{ \mathcal O }}}\displaystyle \sum _{l=1}^{{d}_{{ \mathcal O }}}| {\varphi }_{l}\rangle \langle {\varphi }_{l}| .\end{eqnarray} \tag{ 3.6 }$

In appendix C.5 we show that the heat dissipation when the reservoir is a harmonic oscillator is

$\begin{eqnarray}&&\underset{d\to \infty }{\mathrm{lim}}{\rm{\Delta }}Q=\displaystyle \frac{\omega ({d}_{{ \mathcal O }}-1)}{2}\mathrm{coth}\left(\displaystyle \frac{\beta \omega }{2}\right)\gt \displaystyle \frac{({d}_{{ \mathcal O }}-1)}{\beta }.\end{eqnarray} \tag{ 3.7 }$

${\rm{\Delta }}Q$ approaches $({d}_{{ \mathcal O }}-1){k}_{B}T$ in the limit as ω becomes vanishingly small, whereby the spectrum of ${H}_{{ \mathcal R }}$ will be approximately continuous.
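As a quick numerical illustration of equation (3.7), the following minimal sketch evaluates the right-hand side for a few energy spacings and compares it with $({d}_{{ \mathcal O }}-1)/\beta$ . The function name `heat_oscillator` and the parameter values are illustrative assumptions, and units with ${k}_{B}=1$ are used.

```python
import numpy as np

def heat_oscillator(d_obj, omega, beta):
    """Right-hand side of equation (3.7): (omega (d_O - 1) / 2) coth(beta omega / 2)."""
    return 0.5 * omega * (d_obj - 1) / np.tanh(0.5 * beta * omega)

beta = 5.0                    # inverse temperature (illustrative value)
d_obj = 3                     # object dimension (illustrative value)
bound = (d_obj - 1) / beta    # (d_O - 1) k_B T with k_B = 1

# The heat stays above the bound and approaches it as omega shrinks.
for omega in (1.0, 0.1, 0.001):
    print(omega, heat_oscillator(d_obj, omega, beta), bound)
```

The strict inequality reflects $\mathrm{coth}(x)\gt 1/x$ for $x\gt 0$ , so the bound is only approached, never attained, at finite ω.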

Now let us focus on the case where the object is a qubit, but with an initial bias in its spectrum:

$\begin{eqnarray}&&{\rho }_{{ \mathcal O }}=q| {\varphi }_{1}\rangle \langle {\varphi }_{1}| +(1-q)| {\varphi }_{2}\rangle \langle {\varphi }_{2}| ,q\in \left[\displaystyle \frac{1}{2},1\right).\end{eqnarray} \tag{ 3.8 }$

In appendix C.5.1 we show that, in the limit as ω tends to zero, ${\rm{\Delta }}Q$ will be

$\begin{eqnarray}&&{\rm{\Delta }}Q=\displaystyle \frac{2q(1-q)\mathrm{log}(\displaystyle \frac{q}{1-q})}{\beta (2q-1)}.\end{eqnarray} \tag{ 3.9 }$

In the limit as q tends to one-half, ${\rm{\Delta }}Q$ approaches ${k}_{B}T$ as in our previous analysis. The concomitant entropy reduction is, of course, always

$\begin{eqnarray}&&{\rm{\Delta }}S=q\mathrm{log}\left(\displaystyle \frac{1}{q}\right)+(1-q)\mathrm{log}\left(\displaystyle \frac{1}{1-q}\right).\end{eqnarray} \tag{ 3.10 }$

By defining the function

$\begin{eqnarray}&&\underline{\underline{{\rm{\Delta }}}}:= \beta {\rm{\Delta }}Q-{\rm{\Delta }}S,\end{eqnarray} \tag{ 3.11 }$

as shown in figure 7, it is evident that, except for the trivial case of q = 1, commensurate with ${\rm{\Delta }}S={\rm{\Delta }}Q=0$ , the heat dissipation will exceed Landauer's limit.
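The sign of the function in equation (3.11) can also be checked numerically. The sketch below evaluates $\beta {\rm{\Delta }}Q$ from equation (3.9) and ${\rm{\Delta }}S$ from equation (3.10) on a grid of biases q; the helper names and the grid are assumptions made for illustration only.

```python
import numpy as np

def beta_delta_q(q):
    """beta * Delta Q from equation (3.9), for 1/2 < q < 1."""
    return 2 * q * (1 - q) * np.log(q / (1 - q)) / (2 * q - 1)

def delta_s(q):
    """Entropy reduction from equation (3.10), natural logarithm."""
    return -q * np.log(q) - (1 - q) * np.log(1 - q)

def excess(q):
    """The function of equation (3.11): beta * Delta Q - Delta S."""
    return beta_delta_q(q) - delta_s(q)

qs = np.linspace(0.501, 0.999, 499)   # open interval, avoiding q = 1/2 and q = 1
gaps = excess(qs)
print(gaps.min(), gaps.max())
```

On this grid the gap remains strictly positive and shrinks as q approaches one, where both ${\rm{\Delta }}Q$ and ${\rm{\Delta }}S$ vanish.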

**Figure 7.** The difference between $\beta {\rm{\Delta }}Q$ and ${\rm{\Delta }}S$ , denoted $\underline{\underline{{\rm{\Delta }}}}$ , as a function of the initial bias in the qubit state, q. The two coincide only in the trivial case of q = 1, commensurate with ${\rm{\Delta }}S={\rm{\Delta }}Q=0$ .

**Figure 8.** The augmentation of the basic setup by the inclusion of a third, auxiliary system ${ \mathcal A }$ with Hilbert space ${{ \mathcal H }}_{{ \mathcal A }}\simeq {{\mathbb{C}}}^{{d}_{{ \mathcal A }}}$ . As before, the reservoir is initially in a thermal state and uncorrelated from the rest of the system. The initial state of the object and auxiliary, however, may or may not be correlated.

4. Information erasure beyond Landauer's framework

In section 2 the setup for information erasure had the compound system of object and thermal reservoir—our system of interest—as a thermally isolated quantum system whose constituent parts are initially uncorrelated. The system then undergoes a cyclic process described by a unitary operator, and the average energy increase of the reservoir is defined as heat. Indeed, these are the basic assumptions under which Landauer's principle holds. To achieve heat dissipation lower than that discussed in section 2 we must operate outside of Landauer's framework by abandoning some of these assumptions. However, dissipating less heat than Landauer's limit will become meaningless if there is no referent of heat or temperature in the mathematical model. As such, if we wish to avoid making category errors, there are restrictions on the ways in which we may change our assumptions. That is to say, the model must continue to involve a system that is initially prepared in a Gibbs state that is uncorrelated from any other system considered. This way, the system has a well-defined temperature, and we may continue to consider its energy increase as heat. In addition, the process must still be cyclic, i.e., the Hamiltonian of the total system—in particular the thermal system—must be the same at the end of the process, as it was at the beginning. If this condition is not satisfied, we may observe any value of heat we desire by appropriately changing the final Hamiltonian.

One option available is to move beyond unitary evolution. This can be achieved by introducing an auxiliary system to the setup introduced in section 2 so that the unitary evolution of the totality results in the object and reservoir evolving non-unitarily; the auxiliary system must also have a trivial Hamiltonian, proportional to the identity, for the resultant decrease in ${\rm{\Delta }}Q$ to always translate to a decrease in energy consumption. Although the reservoir must always be uncorrelated from the other subsystems for it to be thermalised relative to them [40], the auxiliary system and object may have initial correlations. Unless these correlations are classical, the resulting dynamics of the object-plus-reservoir subsystem would cease to be described by completely positive maps [41, 42].

The other option available is to first consider a system that is in a thermal state and, therefore, has a temperature. Subsequently, the system may be (conceptually) partitioned into two correlated subsystems, with one of them taking the role of the object. The energy generation due to information erasure of the object, of course, must then be determined over the total system itself. This is because the subsystems do not have well defined Hamiltonians. Although there is technically no thermal reservoir to speak of, since the total system was initially thermal, the average energy change thereof may still be called heat in a consistent manner as before.

4.1. Information erasure with the aid of an auxiliary system

Consider a system composed of: the object, ${ \mathcal O }$ , with Hilbert space ${{ \mathcal H }}_{{ \mathcal O }}\simeq {{\mathbb{C}}}^{{d}_{{ \mathcal O }}};$ the auxiliary system, ${ \mathcal A }$ , with Hilbert space ${{ \mathcal H }}_{{ \mathcal A }}\simeq {{\mathbb{C}}}^{{d}_{{ \mathcal A }}};$ and the thermal reservoir, ${ \mathcal R }$ , with Hilbert space ${{ \mathcal H }}_{{ \mathcal R }}\simeq {{\mathbb{C}}}^{{d}_{{ \mathcal R }}}$ . Let the initial state of the system be $\rho \otimes {\rho }_{{ \mathcal R }}(\beta )$ , with ρ the state of ${ \mathcal O }\;+\;{ \mathcal A }$ and ${\rho }_{{ \mathcal R }}(\beta )$ the Gibbs state of the thermal reservoir. This setup is represented diagrammatically in figure 8. We may (probabilistically) prepare the object in a pure state by conducting a cyclic process on the total system, characterised by a unitary operator, as before. By letting the Hamiltonian of the auxiliary system, ${H}_{{ \mathcal A }}$ , be proportional to the identity, we may ensure that the total energy consumption due to this process would be accounted for by the energy change of the object and thermal reservoir alone. As before, the energy change of the thermal reservoir, ${\rm{\Delta }}Q$ , is heat.

In the extreme case, we may consider that the unitary operator acts non-trivially only on the object plus auxiliary subsystem; the thermal reservoir will thus not be involved, and no heat will be dissipated. We would like to know what the necessary and sufficient conditions for fully erasing the object would be in this case. The mapping ${\rho }_{{ \mathcal O }}\mapsto {\rho }_{{ \mathcal O }}^{\prime }:= {\mathrm{tr}}_{{ \mathcal A }}[U\rho {U}^{\dagger }]$ , with U acting on ${{ \mathcal H }}_{{ \mathcal O }}\otimes {{ \mathcal H }}_{{ \mathcal A }}$ , will fully erase ${ \mathcal O }$ into the pure state $\left|{\varphi }_{1}\right.\rangle$ if and only if

$\begin{eqnarray}&&U\rho {U}^{\dagger }=| {\varphi }_{1}\rangle \langle {\varphi }_{1}| \otimes \displaystyle \sum _{n=1}^{{R}_{{ \mathcal A }}\leqslant {d}_{{ \mathcal A }}}{q}_{n}^{\downarrow }| {\phi }_{n}\rangle \langle {\phi }_{n}| ,\end{eqnarray} \tag{ 4.1 }$

where ${R}_{{ \mathcal A }}$ is the rank of ${\rho }_{{ \mathcal A }}^{\prime }$ and, hence, the rank of ρ. The class of states that allow for such a transformation can, without loss of generality, be represented as

$\begin{eqnarray}&&\rho =\displaystyle \sum _{n=1}^{{R}_{{ \mathcal A }}\leqslant {d}_{{ \mathcal A }}}{q}_{n}^{\downarrow }{U}^{\dagger }(| {\varphi }_{1}\rangle \langle {\varphi }_{1}| \otimes | {\phi }_{n}\rangle \langle {\phi }_{n}| )U.\end{eqnarray} \tag{ 4.2 }$

Therefore, a necessary and sufficient condition for full information erasure by unitary evolution, without using the thermal reservoir, is for the rank of ρ to be less than, or equal to, ${d}_{{ \mathcal A }}$ . To see how correlations between ${ \mathcal O }$ and ${ \mathcal A }$ come into play, consider the simple case where ${d}_{{ \mathcal O }}={d}_{{ \mathcal A }}=2$ and, for $\lambda \in (1/2,1)$ , the following states:

$\begin{eqnarray}{\rho }_{{\rm{u}}.{\rm{c}}.} & = & (\lambda | {\varphi }_{1}\rangle \langle {\varphi }_{1}| +(1-\lambda )| {\varphi }_{2}\rangle \langle {\varphi }_{2}| )\otimes | {\phi }_{1}\rangle \langle {\phi }_{1}| ,\\ {\rho }_{{\rm{c}}.{\rm{c}}.} & = & \lambda | {\varphi }_{1}\rangle \langle {\varphi }_{1}| \otimes | {\phi }_{1}\rangle \langle {\phi }_{1}| +(1-\lambda )| {\varphi }_{2}\rangle \langle {\varphi }_{2}| \otimes | {\phi }_{2}\rangle \langle {\phi }_{2}| ,\\ {\rho }_{{\rm{q}}.{\rm{d}}.} & = & \lambda | {\varphi }_{1}\rangle \langle {\varphi }_{1}| \otimes | {\phi }_{1}\rangle \langle {\phi }_{1}| +(1-\lambda )| {\varphi }_{2}\rangle \langle {\varphi }_{2}| \otimes | {\phi }_{+}\rangle \langle {\phi }_{+}| ,\ \ \left|{\phi }_{\pm }\right.\rangle := \displaystyle \frac{1}{\sqrt{2}}(\left|{\phi }_{1}\right.\rangle \pm \left|{\phi }_{2}\right.\rangle ),\\ {\rho }_{{\rm{p}}.{\rm{e}}.} & = & | \psi \rangle \langle \psi | ,\ \ \left|\psi \right.\rangle =\sqrt{\lambda }\left|{\varphi }_{1}\right.\rangle \otimes \left|{\phi }_{1}\right.\rangle +\sqrt{1-\lambda }\left|{\varphi }_{2}\right.\rangle \otimes \left|{\phi }_{2}\right.\rangle .\end{eqnarray} \tag{ 4.3 }$

All of these have a rank of at most 2, and the reduced state ${\rho }_{{ \mathcal O }}=\lambda | {\varphi }_{1}\rangle \langle {\varphi }_{1}| +(1-\lambda )| {\varphi }_{2}\rangle \langle {\varphi }_{2}|$ which, with an appropriate unitary operator, can be fully erased to $| {\varphi }_{1}\rangle \langle {\varphi }_{1}|$ . Each state, however, falls under a different class of correlations: ${\rho }_{{\rm{u}}.{\rm{c}}.}$ is uncorrelated, ${\rho }_{{\rm{c}}.{\rm{c}}.}$ is classically correlated, ${\rho }_{{\rm{q}}.{\rm{d}}.}$ has quantum discord, and ${\rho }_{{\rm{p}}.{\rm{e}}.}$ is a pure entangled state. The only case where the state of ${ \mathcal A }$ is also left intact, however, is when the two systems are classically correlated. Notwithstanding, this cannot be seen as allowing for ${ \mathcal A }$ to act as a catalyst for information erasure. For ${ \mathcal A }$ to be utilised in the information erasure of another object system, with the same unitary operator, the two must first be correlated; this process will have a thermodynamic cost itself [43]. In the case where ${ \mathcal O }$ and ${ \mathcal A }$ are in a pure entangled state, the unitary operator which prepares ${ \mathcal O }$ in a pure state will also prepare ${ \mathcal A }$ in a pure state. As discussed in [22], this will allow for ${ \mathcal R }$ to be cooled by transferring entropy from it to ${ \mathcal A }$ , resulting in a negative ${\rm{\Delta }}Q$ .
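The rank condition can be illustrated numerically for the four states of equation (4.3). The following is a minimal sketch, assuming $\lambda =0.7$ and a product basis ordered as object ⊗ auxiliary; the helper names (`proj`, `k`, `erase`) are hypothetical. The unitary is built by rotating the eigenbasis of ρ, ordered by non-increasing eigenvalue, onto the product basis whose first ${d}_{{ \mathcal A }}$ elements have the object in $\left|{\varphi }_{1}\right.\rangle$ , which is one representative of the transformation in equation (4.1).

```python
import numpy as np

lam = 0.7                      # illustrative value in (1/2, 1)
e1, e2 = np.eye(2)             # basis vectors, reused for object and auxiliary
plus = (e1 + e2) / np.sqrt(2)  # |phi_+> of the auxiliary

def proj(v):
    """Projector |v><v|."""
    return np.outer(v, v.conj())

def k(a, b):
    """Tensor product, object ⊗ auxiliary."""
    return np.kron(a, b)

psi = np.sqrt(lam) * k(e1, e1) + np.sqrt(1 - lam) * k(e2, e2)
states = {
    "u.c.": k(lam * proj(e1) + (1 - lam) * proj(e2), proj(e1)),
    "c.c.": lam * proj(k(e1, e1)) + (1 - lam) * proj(k(e2, e2)),
    "q.d.": lam * proj(k(e1, e1)) + (1 - lam) * proj(k(e2, plus)),
    "p.e.": proj(psi),
}

def erase(rho):
    """Map the eigenvectors of rho (largest eigenvalue first) onto the
    product basis |1,1>, |1,2>, |2,1>, |2,2>; return the reduced object state."""
    vals, vecs = np.linalg.eigh(rho)
    order = np.argsort(vals)[::-1]
    U = vecs[:, order].conj().T
    out = U @ rho @ U.conj().T
    # Partial trace over the auxiliary (second tensor factor).
    return np.einsum("ijkj->ik", out.reshape(2, 2, 2, 2))

results = {}
for name, rho in states.items():
    rank = np.linalg.matrix_rank(rho, tol=1e-10)
    p_erase = erase(rho)[0, 0].real    # probability of finding the object in |varphi_1>
    results[name] = (rank, p_erase)
    print(name, rank, round(p_erase, 12))
```

This construction only checks that full erasure is possible for each state; it does not track what happens to the auxiliary system, which differs between the four cases as discussed above.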

In either scenario, the initial state ρ on the composite system of ${ \mathcal O }+{ \mathcal A }$ , which has a rank smaller than ${d}_{{ \mathcal A }}$ , can be seen as a thermodynamic resource. This is because it is a system that is highly out of equilibrium. Recall that the Hamiltonian of ${ \mathcal A }$ is considered to be trivial, being proportional to the identity. As such, if this system was also at thermal equilibrium with the inverse temperature β, then we would have $\rho ={\rho }_{{ \mathcal A }}(\beta )\otimes {\rho }_{{ \mathcal O }}$ $=\displaystyle \frac{1}{{d}_{{ \mathcal A }}}{{\mathbb{1}}}_{{ \mathcal A }}\otimes {\rho }_{{ \mathcal O }}$ . Any unitary operator acting on such a system would not be able to increase the largest eigenvalue of ${\rho }_{{ \mathcal O }}$ . As such, information erasure would not be possible.

In the case where the rank of ρ is greater than ${d}_{{ \mathcal A }}$ , but smaller than ${d}_{{ \mathcal O }}{d}_{{ \mathcal A }}$ , the reservoir may be used to facilitate information erasure using similar arguments as in section 2. This will allow for a larger ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ , and a smaller consequent ${\rm{\Delta }}Q$ , than if ${ \mathcal A }$ was not present.

4.2. Object as a component of a thermal system

Consider a system composed of an object, ${ \mathcal O }$ , with Hilbert space ${{ \mathcal H }}_{{ \mathcal O }}\simeq {{\mathbb{C}}}^{{d}_{{ \mathcal O }}}$ , and some other system, ${ \mathcal K }$ , with Hilbert space ${{ \mathcal H }}_{{ \mathcal K }}\simeq {{\mathbb{C}}}^{{d}_{{ \mathcal K }}}$ . The composite system has the Hamiltonian

$\begin{eqnarray}&&H=\displaystyle \sum _{n=1}^{{d}_{{ \mathcal O }}{d}_{{ \mathcal K }}}{\lambda }_{n}^{\uparrow }| {\xi }_{n}\rangle \langle {\xi }_{n}| .\end{eqnarray} \tag{ 4.4 }$

Let the initial state of the system be in the thermal state $\rho (\beta )={{\rm{e}}}^{-\beta H}/\mathrm{tr}[{{\rm{e}}}^{-\beta H}]\equiv {\displaystyle \sum }_{n}{p}_{n}^{\downarrow }| {\xi }_{n}\rangle \langle {\xi }_{n}|$ with the non-increasing vector of probabilities ${{\boldsymbol{p}}}^{\downarrow }:= {\{{p}_{n}^{\downarrow }\}}_{n}$ . We wish to prepare the subsystem ${ \mathcal O }$ in the pure state $\left|{\rm{\Psi }}\right.\rangle$ by some cyclic process characterised by a unitary operator U acting on ${{ \mathcal H }}_{{ \mathcal O }}\otimes {{ \mathcal H }}_{{ \mathcal K }}$ . By lemma C.1 the maximal probability of achieving this is accomplished by choosing U from an equivalence class of unitary operators $[{U}_{\mathrm{maj}}]$ characterised by the rule

$\begin{eqnarray}{U}_{\mathrm{maj}}\;:\left\{\begin{array}{cc}\left|{\xi }_{n}\right.\rangle \mapsto \left|{\rm{\Psi }}\right.\rangle \otimes \left|{\phi }_{j}\right.\rangle & \ \mathrm{if}\ n\in \{1,...,{d}_{{ \mathcal K }}\},\\ \left|{\xi }_{n}\right.\rangle \mapsto \left|{\nu }_{k}\right.\rangle & \ \mathrm{if}\ n\notin \{1,...,{d}_{{ \mathcal K }}\}.\end{array}\right.\end{eqnarray} \tag{ 4.5 }$

Each of the vectors in ${\{\left|{\nu }_{k}\right.\rangle \in {{ \mathcal H }}_{{ \mathcal O }}\otimes {{ \mathcal H }}_{{ \mathcal K }}\}}_{k}$ is orthogonal to those in ${\{\left|{\rm{\Psi }}\right.\rangle \otimes \left|{\phi }_{j}\right.\rangle \}}_{j}$ , so that the union thereof forms an orthonormal basis that spans ${{ \mathcal H }}_{{ \mathcal O }}\otimes {{ \mathcal H }}_{{ \mathcal K }}$ . As the system was initially thermal, the gain in its average energy is heat, which obeys the identity

$\begin{eqnarray}&&{\rm{\Delta }}Q:= \mathrm{tr}[H({\rho }^{\prime }-\rho (\beta ))]=\displaystyle \frac{S({\rho }^{\prime }\parallel \rho (\beta ))+S({\rho }^{\prime })-S(\rho (\beta ))}{\beta }=\displaystyle \frac{1}{\beta }S({\rho }^{\prime }\parallel \rho (\beta )).\end{eqnarray} \tag{ 4.6 }$

Here, we make the substitution ${\rho }^{\prime }:= U\rho (\beta ){U}^{\dagger }$ . As unitary evolution does not alter the von Neumann entropy, this energy production is a function of the relative entropy alone; ${\rm{\Delta }}Q$ is therefore nonnegative and independent of ${\rm{\Delta }}S:= S({\rho }_{{ \mathcal O }})-S({\rho }_{{ \mathcal O }}^{\prime })$ . To determine how ${\rm{\Delta }}Q$ can be minimised, we write $S({\rho }^{\prime }\parallel \rho (\beta ))$ in the alternative way
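The identity in equation (4.6) can be checked numerically. The sketch below draws a random Hamiltonian and a random unitary (both hypothetical, chosen only for the test) and compares the energy change with the relative entropy divided by β.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, dim = 1.3, 4            # illustrative values

# Random Hermitian Hamiltonian and its Gibbs state.
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
H = (A + A.conj().T) / 2
energies, vecs = np.linalg.eigh(H)
p = np.exp(-beta * energies)
p /= p.sum()
rho = (vecs * p) @ vecs.conj().T

# Random unitary obtained from a QR decomposition.
U, _ = np.linalg.qr(rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)))
rho_p = U @ rho @ U.conj().T

def herm_log(M):
    """Matrix logarithm of a positive-definite Hermitian matrix."""
    w, v = np.linalg.eigh(M)
    return (v * np.log(w)) @ v.conj().T

delta_q = np.trace(H @ (rho_p - rho)).real
rel_ent = np.trace(rho_p @ (herm_log(rho_p) - herm_log(rho))).real
print(delta_q, rel_ent / beta)
```

The two printed numbers should agree up to numerical precision, and both are nonnegative because the relative entropy is.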

$\begin{eqnarray}S({\rho }^{\prime }\parallel \rho (\beta )) & = & \displaystyle \sum _{n=1}^{{d}_{{ \mathcal O }}{d}_{{ \mathcal K }}}{q}_{n}^{U}\mathrm{log}\left(\displaystyle \frac{1}{{p}_{n}^{\downarrow }}\right)-S({\rho }^{\prime }),\\ & = & {{\boldsymbol{q}}}^{{\boldsymbol{U}}}\cdot {{\bf{log}}}_{{\boldsymbol{p}}}^{\uparrow }-S(\rho (\beta )),\end{eqnarray} \tag{ 4.7 }$

where ${{\boldsymbol{q}}}^{{\boldsymbol{U}}}$ is a vector of real numbers

$\begin{eqnarray}&&{q}_{n}^{U}:= \displaystyle \sum _{m=1}^{{d}_{{ \mathcal O }}{d}_{{ \mathcal K }}}{p}_{m}^{\downarrow }{| \langle {\xi }_{n}| U| {\xi }_{m}\rangle | }^{2},\end{eqnarray} \tag{ 4.8 }$

and ${{\bf{log}}}_{{\boldsymbol{p}}}^{\uparrow }\;:= \;{\{\mathrm{log}(1/{p}_{n}^{\downarrow })\}}_{n}$ a non-decreasing vector. In appendix C.6 we prove that to minimise ${\rm{\Delta }}Q$ , after having maximised the probability of preparing ${ \mathcal O }$ in the pure state $\left|{\rm{\Psi }}\right.\rangle$ , we must choose the unitary operator $U\in [{U}_{\mathrm{maj}}^{1}]\subset [{U}_{\mathrm{maj}}]$ so that ${{\boldsymbol{q}}}^{{\boldsymbol{U}}}$ is non-increasing with respect to the energy eigenvalues of H, and that it majorises all such possible vectors. If we are free to choose what Hamiltonian to construct for the system, then the heat dissipation may be further minimised by choosing the eigenvectors of H, that have support on ${\{\left|{\rm{\Psi }}\right.\rangle \otimes \left|{\phi }_{j}\right.\rangle \}}_{j}$ , to be chosen from this set itself. In other words, the optimal value of ${\rm{\Delta }}Q$ is achieved when the eigenvectors of H are uncorrelated with respect to the ${ \mathcal O }\;:{ \mathcal K }$ partition.
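Equations (4.7) and (4.8) can be evaluated directly when all operators are written in the eigenbasis of H. The following is a minimal sketch under that assumption, with an illustrative equally spaced spectrum and a random unitary; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 0.8
energies = np.array([0.0, 1.0, 2.0, 3.0])   # non-decreasing eigenvalues of H
p = np.exp(-beta * energies)
p /= p.sum()                                # non-increasing Gibbs probabilities

# Random unitary, expressed in the basis {|xi_n>}.
U, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))

T = np.abs(U) ** 2                 # T[n, m] = |<xi_n|U|xi_m>|^2
q_U = T @ p                        # equation (4.8)
log_p = np.log(1.0 / p)            # non-decreasing vector
entropy = -(p * np.log(p)).sum()   # S(rho(beta)), equal to S(rho')

rel_ent = q_U @ log_p - entropy    # second line of equation (4.7)
delta_q = energies @ (q_U - p)     # tr[H(rho' - rho(beta))]
print(rel_ent / beta, delta_q)
```

Since the matrix `T` is doubly stochastic, ${{\boldsymbol{q}}}^{{\boldsymbol{U}}}$ is a probability vector, and the inner product with the non-decreasing vector of logarithms is what the choice of U can make smaller or larger.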

Let us consider a simple example, where ${d}_{{ \mathcal O }}={d}_{{ \mathcal K }}=2$ , and the eigenvectors of H are given as

$\begin{eqnarray}\left|{\xi }_{1}\right.\rangle & = & \sqrt{{\gamma }_{+}}\left|{\varphi }_{1}\right.\rangle \otimes \left|{\phi }_{1}\right.\rangle +\sqrt{1-{\gamma }_{+}}\left|{\varphi }_{2}\right.\rangle \otimes \left|{\phi }_{2}\right.\rangle ,\\ \left|{\xi }_{2}\right.\rangle & = & \sqrt{1-{\gamma }_{+}}\left|{\varphi }_{1}\right.\rangle \otimes \left|{\phi }_{1}\right.\rangle -\sqrt{{\gamma }_{+}}\left|{\varphi }_{2}\right.\rangle \otimes \left|{\phi }_{2}\right.\rangle ,\\ \left|{\xi }_{3}\right.\rangle & = & \sqrt{{\gamma }_{-}}\left|{\varphi }_{1}\right.\rangle \otimes \left|{\phi }_{2}\right.\rangle +\sqrt{1-{\gamma }_{-}}\left|{\varphi }_{2}\right.\rangle \otimes \left|{\phi }_{1}\right.\rangle ,\\ \left|{\xi }_{4}\right.\rangle & = & \sqrt{1-{\gamma }_{-}}\left|{\varphi }_{1}\right.\rangle \otimes \left|{\phi }_{2}\right.\rangle -\sqrt{{\gamma }_{-}}\left|{\varphi }_{2}\right.\rangle \otimes \left|{\phi }_{1}\right.\rangle .\end{eqnarray} \tag{ 4.9 }$

Moreover, let ${\lambda }_{1}^{\uparrow }=0$ and ${\lambda }_{n+1}^{\uparrow }-{\lambda }_{n}^{\uparrow }=1$ for all n. Conforming with equation (4.5), the unitary operator

$\begin{eqnarray}&&U\;:\left\{\begin{array}{c}\left|{\xi }_{1}\right.\rangle \mapsto \left|{\rm{\Psi }}\right.\rangle \otimes \left|{\phi }_{1}^{\prime }\right.\rangle ,\\ \left|{\xi }_{2}\right.\rangle \mapsto \left|{\rm{\Psi }}\right.\rangle \otimes \left|{\phi }_{2}^{\prime }\right.\rangle ,\\ \left|{\xi }_{3}\right.\rangle \mapsto \left|{{\rm{\Psi }}}^{\perp }\right.\rangle \otimes \left|\phi ^{\prime} {^{\prime} }_{1}\right.\rangle ,\\ \left|{\xi }_{4}\right.\rangle \mapsto \left|{{\rm{\Psi }}}^{\perp }\right.\rangle \otimes \left|\phi ^{\prime} {^{\prime} }_{2}\right.\rangle ,\end{array}\right.\end{eqnarray} \tag{ 4.10 }$

will then prepare ${ \mathcal O }$ in the state $\left|{\rm{\Psi }}\right.\rangle$ , with ${p}_{{\rm{\Psi }}}^{\mathrm{max}}={p}_{1}^{\downarrow }+{p}_{2}^{\downarrow }$ . We may then minimise ${\rm{\Delta }}Q$ by choosing U from the equivalence class $[{U}_{\mathrm{maj}}^{1}]$ . This can be achieved if we choose the vectors $\left|{\rm{\Psi }}\right.\rangle$ , $\left|{\phi }_{1}^{\prime} \right.\rangle$ , and $\left|{\phi }_{1}^{\prime \prime} \right.\rangle$ respectively from the sets $\{\left|{\varphi }_{1}\right.\rangle ,\left|{\varphi }_{2}\right.\rangle \}$ , $\{\left|{\phi }_{1}\right.\rangle ,\left|{\phi }_{2}\right.\rangle \}$ , and $\{\left|{\phi }_{1}\right.\rangle ,\left|{\phi }_{2}\right.\rangle \};$ which particular choice achieves this depends on the temperature and the values of ${\gamma }_{\pm }$ . In the special case of ${\gamma }_{+}={\gamma }_{-}=\gamma$ , for example, we find that irrespective of the temperature, when $\gamma \gt 1/2$ this is achieved when $\left|{\rm{\Psi }}\right.\rangle =\left|{\varphi }_{1}\right.\rangle$ , $\left|{\phi }_{1}^{\prime }\right.\rangle =\left|{\phi }_{1}\right.\rangle$ , and $\left|{\phi }_{1}^{\prime \prime} \right.\rangle =\left|{\phi }_{2}\right.\rangle$ . Conversely when $\gamma \lt 1/2$ this is realised when $\left|{\rm{\Psi }}\right.\rangle =\left|{\varphi }_{2}\right.\rangle$ , $\left|{\phi }_{1}^{\prime }\right.\rangle =\left|{\phi }_{2}\right.\rangle$ , and $\left|{\phi }_{1}^{\prime \prime} \right.\rangle =\left|{\phi }_{1}\right.\rangle$ .
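The special case ${\gamma }_{+}={\gamma }_{-}=\gamma$ can be explored by brute force. The sketch below scans only the eight product-basis choices of $\left|{\rm{\Psi }}\right.\rangle$ , $\left|{\phi }_{1}^{\prime }\right.\rangle$ and $\left|{\phi }_{1}^{\prime \prime }\right.\rangle$ in the map (4.10), computes ${\rm{\Delta }}Q$ for each, and reports the minimiser. The function names are hypothetical, and indices are zero-based, so that 0 stands for the subscript 1 and 1 for the subscript 2.

```python
import itertools
import numpy as np

def ket(i, j):
    """Product vector |varphi_i> ⊗ |phi_j>, with zero-based i, j in {0, 1}."""
    v = np.zeros(4)
    v[2 * i + j] = 1.0
    return v

def delta_q(gamma, beta, s, a, b):
    """Heat for the map (4.10) with Psi = varphi_s, phi'_1 = phi_a, phi''_1 = phi_b."""
    g, h = np.sqrt(gamma), np.sqrt(1 - gamma)
    xi = [g * ket(0, 0) + h * ket(1, 1),     # eigenvectors of equation (4.9)
          h * ket(0, 0) - g * ket(1, 1),
          g * ket(0, 1) + h * ket(1, 0),
          h * ket(0, 1) - g * ket(1, 0)]
    lam = np.arange(4.0)                      # eigenvalues 0, 1, 2, 3
    H = sum(l * np.outer(v, v) for l, v in zip(lam, xi))
    p = np.exp(-beta * lam)
    p /= p.sum()
    targets = [ket(s, a), ket(s, 1 - a), ket(1 - s, b), ket(1 - s, 1 - b)]
    rho_new = sum(pn * np.outer(t, t) for pn, t in zip(p, targets))
    return np.trace(H @ rho_new) - p @ lam

def best_choice(gamma, beta):
    """Choice (s, a, b) with the smallest heat among the eight product options."""
    choices = itertools.product((0, 1), repeat=3)
    return min(choices, key=lambda c: delta_q(gamma, beta, *c))

print(best_choice(0.75, 1.0), best_choice(0.25, 1.0))
```

For the values tried here the minimiser agrees with the assignment stated above for $\gamma \gt 1/2$ and $\gamma \lt 1/2$ , at more than one temperature.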

Figure 9 shows the dependence of both ${\rm{\Delta }}S$ and ${\rm{\Delta }}Q-{\rm{\Delta }}S/\beta$ on the entanglement of the Hamiltonian eigenvectors ${\{\left|{\xi }_{n}\right.\rangle \}}_{n}$ , with ${\gamma }_{+}={\gamma }_{-}=\gamma \in \{1/2,3/4,1\}$ . The system is always evolved using $U\in [{U}_{\mathrm{maj}}^{1}]$ . As γ tends to one, thereby resulting in uncorrelated Hamiltonian eigenvectors, both ${\rm{\Delta }}Q$ and ${\rm{\Delta }}S$ decrease, vanishing in the limit as β tends to infinity. However, for intermediate temperatures, ${\rm{\Delta }}Q$ becomes so low that it 'violates' Landauer's limit. This is similar to the possibility of extracting work from the correlations between a quantum system and its environment, which are initially in a thermal state [44].

5. Conclusions

In this article, we have developed a context-dependent, dynamical variant of Landauer's principle. We used techniques from majorisation theory to characterise the equivalence class of unitary operators that bring the probability of information erasure to a desired value and minimise the consequent heat dissipation to the thermal reservoir. By constructing a sequential swap algorithm, we demonstrated that there is a tradeoff between the probability of information erasure and the minimal heat dissipation. Furthermore, we showed that except for the cases where the object is a two-level system, or when we are able to fully erase the object's information, we may maximise the probability of information erasure without also minimising the object's entropy; this allows for a more energy-efficient procedure for probabilistic information erasure.

We also investigated methods of reducing heat dissipation due to information erasure by operating outside of Landauer's framework. However, we wanted this departure to preserve the referent of heat and temperature in our mathematical description; dissipating less heat than Landauer's limit becomes meaningless when there is no temperature or heat to speak of. Therefore, we arrived at two alterations to Landauer's framework which would not result in a category error with respect to heat and temperature. The first avenue was to introduce an auxiliary system to the object and reservoir, while the second was to consider the object as a subpart of a system in thermal equilibrium. In the first instance, the figure of merit was identified as the rank of the system in the object-plus-auxiliary subspace; if the rank of this state is less than the dimension of the auxiliary Hilbert space, then full information erasure is possible with at most zero heat dissipation to the reservoir. In the second instance, information erasure can be achieved with possibly less heat than Landauer's limit when the eigenvectors of the system Hamiltonian, that have support on the pure state we wish to prepare the object in, are product states.

The primary question we have not addressed in this study, and shall leave for future work, is the inclusion of time-dynamics into what we consider as the physical context; the optimal unitary operator for information erasure is considered here as a bijection between orthonormal basis sets. In most realistic settings, however, one is restricted in the Hamiltonians they can establish between the object and reservoir. As such, the optimal unitary operator may not always be reachable, resulting in a smaller maximal probability of information erasure, a larger minimal heat dissipation, or both. Furthermore, an interesting question to address is the number of times that we must switch between the Hamiltonians, that generate the unitary group, in order to obtain the optimal unitary operator, and how this would scale with the reservoir's dimension. This would provide a link between the present work and the third law of thermodynamics [45] from a control-theoretic [46] viewpoint.

Acknowledgments

The authors wish to thank Ali Rezakhani for his useful comments on the manuscript. MHM and YO thank the support from Fundação para a Ciência e a Tecnologia (Portugal), namely through programmes PTDC/POPH and projects UID/Multi/00491/2013, UID/EEA/50008/2013, IT/QuSim and CRUP-CPU/CQVibes, partially funded by EU FEDER, and from the EU FP7 projects LANDAUER (GA 318287) and PAPETS (GA 323901). Furthermore MHM acknowledges the support from the EU FP7 Marie Curie Fellowship (GA 628912).

Appendix A.: Majorisation theory

Here we shall introduce some useful concepts from the theory of majorisation [47]. Given a vector of real numbers ${\boldsymbol{a}}:= {\{{a}_{i}\}}_{i}\in {{\mathbb{R}}}^{N}$ , where i runs over the index set $I:= \{1,...,N\}$ , we may construct the ordered vectors ${{\boldsymbol{a}}}^{\uparrow }:= {\{{a}_{i}^{\uparrow }\}}_{i}$ and ${{\boldsymbol{a}}}^{\downarrow }:= {\{{a}_{i}^{\downarrow }\}}_{i}$ by permuting the elements in ${\boldsymbol{a}}$ . The non-decreasing vector ${{\boldsymbol{a}}}^{\uparrow }$ is defined such that for all $i,j\in I$ where $i\lt j$ , we have ${a}_{i}^{\uparrow }\leqslant {a}_{j}^{\uparrow }$ . Conversely the non-increasing vector ${{\boldsymbol{a}}}^{\downarrow }$ is defined such that for all $i,j\in I$ where $i\lt j$ , we have ${a}_{i}^{\downarrow }\geqslant {a}_{j}^{\downarrow }$ . The vector ${\boldsymbol{a}}$ is said to be weakly majorised by ${\boldsymbol{b}}$ from below, denoted ${\boldsymbol{a}}\;{\prec }_{w}{\boldsymbol{b}}$ , if and only if for every $k\in I$ , ${\displaystyle \sum }_{i=1}^{k}{b}_{i}^{\downarrow }\geqslant {\displaystyle \sum }_{i=1}^{k}{a}_{i}^{\downarrow }$ . Conversely, ${\boldsymbol{a}}$ is said to be weakly majorised by ${\boldsymbol{b}}$ from above, denoted ${\boldsymbol{a}}{\prec }^{w}{\boldsymbol{b}}$ , if and only if for every $k\in I$ , ${\displaystyle \sum }_{i=1}^{k}{a}_{i}^{\uparrow }\geqslant {\displaystyle \sum }_{i=1}^{k}{b}_{i}^{\uparrow }$ . The stronger condition of ${\boldsymbol{a}}$ being majorised by ${\boldsymbol{b}}$ , denoted ${\boldsymbol{a}}\;\prec \;{\boldsymbol{b}}$ , is satisfied if both ${\boldsymbol{a}}\;{\prec }_{w}{\boldsymbol{b}}$ and ${\boldsymbol{a}}\;{\prec }^{w}\;{\boldsymbol{b}}$ (or alternatively, if one of these conditions is met together with ${\displaystyle \sum }_{i}{a}_{i}={\displaystyle \sum }_{i}{b}_{i}$ ). A sufficient condition for ${\boldsymbol{a}}\;{\prec }_{w}{\boldsymbol{b}}$ is if for all $i\in I$ , ${a}_{i}^{\downarrow }\leqslant {b}_{i}^{\downarrow }$ . Similarly, a sufficient condition for ${\boldsymbol{a}}\;{\prec }^{w}\;{\boldsymbol{b}}$ is if for all $i\in I$ , ${a}_{i}^{\uparrow }\geqslant {b}_{i}^{\uparrow }$ . We now introduce a theorem that will be central to many results in this article.

Theorem A.1. For two vectors ${\boldsymbol{a}}$ and ${\boldsymbol{b}}$ , in ${{\mathbb{R}}}^{N}$ , their inner product obeys the relation

$\begin{eqnarray*}&&{{\boldsymbol{a}}}^{\downarrow }\cdot {{\boldsymbol{b}}}^{\uparrow }\leqslant {\boldsymbol{a}}\cdot {\boldsymbol{b}}\leqslant {{\boldsymbol{a}}}^{\downarrow }\cdot {{\boldsymbol{b}}}^{\downarrow }.\end{eqnarray*}$

For a proof we refer to theorem II.4.2 in [48]. This leads to the simple corollary:

Corollary A.1. Consider the pairs of vectors $\{{{\boldsymbol{a}}}_{{\bf{1}}},{{\boldsymbol{a}}}_{{\bf{2}}}\}$ and $\{{{\boldsymbol{b}}}_{{\bf{1}}},{{\boldsymbol{b}}}_{{\bf{2}}}\}$ , such that ${{\boldsymbol{a}}}_{{\bf{1}}}\;\prec \;{{\boldsymbol{a}}}_{{\bf{2}}}$ and ${{\boldsymbol{b}}}_{{\bf{1}}}\;\prec \;{{\boldsymbol{b}}}_{{\bf{2}}}$ . It follows from theorem A.1 that ${{\boldsymbol{a}}}_{{\bf{1}}}^{\downarrow }\cdot {{\boldsymbol{b}}}_{{\bf{1}}}^{\downarrow }\leqslant {{\boldsymbol{a}}}_{{\bf{2}}}^{\downarrow }\cdot {{\boldsymbol{b}}}_{{\bf{2}}}^{\downarrow }$ , and ${{\boldsymbol{a}}}_{{\bf{2}}}^{\downarrow }\cdot {{\boldsymbol{b}}}_{{\bf{2}}}^{\uparrow }\leqslant {{\boldsymbol{a}}}_{{\bf{1}}}^{\downarrow }\cdot {{\boldsymbol{b}}}_{{\bf{1}}}^{\uparrow }$ .
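The definitions of this appendix and the bounds of theorem A.1 are straightforward to test numerically. The following is a minimal sketch; the function names and the small tolerance `tol`, added to absorb floating-point rounding, are assumptions for illustration.

```python
import numpy as np

def weakly_majorised_below(a, b, tol=1e-12):
    """a ≺_w b: partial sums of the k largest entries of b dominate those of a."""
    a_down, b_down = np.sort(a)[::-1], np.sort(b)[::-1]
    return bool(np.all(np.cumsum(b_down) >= np.cumsum(a_down) - tol))

def weakly_majorised_above(a, b, tol=1e-12):
    """a ≺^w b: partial sums of the k smallest entries of a dominate those of b."""
    a_up, b_up = np.sort(a), np.sort(b)
    return bool(np.all(np.cumsum(a_up) >= np.cumsum(b_up) - tol))

def majorised(a, b):
    """a ≺ b: both weak majorisation conditions hold."""
    return weakly_majorised_below(a, b) and weakly_majorised_above(a, b)

# Theorem A.1 on a random pair of vectors.
rng = np.random.default_rng(2)
a, b = rng.normal(size=6), rng.normal(size=6)
a_down, b_up, b_down = np.sort(a)[::-1], np.sort(b), np.sort(b)[::-1]
lower, middle, upper = a_down @ b_up, a @ b, a_down @ b_down
print(lower <= middle <= upper)

# The uniform distribution is majorised by any other probability vector.
uniform = np.full(4, 0.25)
biased = np.array([0.7, 0.2, 0.1, 0.0])
print(majorised(uniform, biased), majorised(biased, uniform))
```

The last line illustrates the ordering used throughout the article: more sharply peaked probability vectors majorise flatter ones.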

Appendix B.: An equivalence class of unitary operators

Here we expound on the sense in which equation (2.9) characterises an equivalence class of unitary operators, instead of just one unique unitary operator. The arguments here translate to the other equivalence classes of unitary operators mentioned in the article. Here, two unitary operators U and V are said to be equivalent so far as the probability of information erasure is concerned, if and only if

$\begin{eqnarray}&&\mathrm{tr}[(| {\varphi }_{1}\rangle \langle {\varphi }_{1}| \otimes {\mathbb{1}})U\rho {U}^{\dagger }]=\mathrm{tr}[(| {\varphi }_{1}\rangle \langle {\varphi }_{1}| \otimes {\mathbb{1}})V\rho {V}^{\dagger }].\end{eqnarray} \tag{ B.1 }$

First of all, a degeneracy in the probability distribution ${{\boldsymbol{p}}}^{\downarrow }$ will mean that the representation of ρ, as given in equation (2.1), will not be unique. If, for example, we have ${p}_{i}^{\downarrow }={p}_{j}^{\downarrow }$ , then $\left|{\psi }_{i}\right.\rangle$ and $\left|{\psi }_{j}\right.\rangle$ in equation (2.1) can be replaced by any orthonormal pair of vectors $\{\left|\psi \right.\rangle ,\left|{\psi }^{\perp }\right.\rangle \}$ that span the same subspace. As such, replacing ${U}_{\mathrm{maj}}^{g}\left|{\psi }_{i}\right.\rangle =\left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{i}^{\prime }\right.\rangle$ with ${U}_{\mathrm{maj}}^{g}\left|\psi \right.\rangle =\left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{i}^{\prime }\right.\rangle$ , and similarly ${U}_{\mathrm{maj}}^{g}\left|{\psi }_{j}\right.\rangle =\left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{j}^{\prime }\right.\rangle$ with ${U}_{\mathrm{maj}}^{g}\left|{\psi }^{\perp }\right.\rangle =\left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{j}^{\prime }\right.\rangle$ , would give a different unitary operator, but the same probability of information erasure. As such, both unitary operators belong in the same equivalence class with respect to the probability of information erasure, denoted $[{U}_{\mathrm{maj}}]$ . Additionally, as the probability of information erasure is unaffected by the orthonormal basis $\{\left|{\xi }_{m}^{\prime }\right.\rangle \}$ in the transformation rules of (2.9), then any choice of this basis will define a different unitary operator that, nonetheless, belongs to the same equivalence class $[{U}_{\mathrm{maj}}]$ .

Appendix C.: Technical proofs

C.1. Maximising the probability of information erasure

Recall that the probability of preparing ${\rho }_{{ \mathcal O }}^{\prime }$ in the state $\left|{\varphi }_{1}\right.\rangle$ is defined as

$\begin{eqnarray}p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime }):= \langle {\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime }| {\varphi }_{1}\rangle & = & \displaystyle \sum _{n=1}^{{d}_{{ \mathcal O }}{d}_{{ \mathcal R }}}{p}_{n}^{\downarrow }\langle {\psi }_{n}| {U}^{\dagger }(| {\varphi }_{1}\rangle \langle {\varphi }_{1}| \otimes {{\mathbb{1}}}_{{ \mathcal R }})U| {\psi }_{n}\rangle ,\\ & = & \displaystyle \sum _{n=1}^{{d}_{{ \mathcal O }}{d}_{{ \mathcal R }}}{p}_{n}^{\downarrow }{g}_{n}(U)\equiv {{\boldsymbol{p}}}^{\downarrow }\cdot {\boldsymbol{g}}({\boldsymbol{U}}),\end{eqnarray} \tag{ C.1 }$

where ${\boldsymbol{g}}({\boldsymbol{U}})$ is a vector of non-negative numbers ${g}_{n}(U):= \langle {\psi }_{n}| {U}^{\dagger }(| {\varphi }_{1}\rangle \langle {\varphi }_{1}| \;\otimes \;{{\mathbb{1}}}_{{ \mathcal R }})U| {\psi }_{n}\rangle$ such that ${\displaystyle \sum }_{n}{g}_{n}(U)={d}_{{ \mathcal R }}$ .

Lemma C.1. The maximum probability of information erasure is ${p}_{{\varphi }_{1}}^{\mathrm{max}}={\displaystyle \sum }_{m=1}^{{d}_{{ \mathcal R }}}{p}_{m}^{\downarrow }$ . The equivalence class of unitary operators that achieve this, denoted $[{U}_{\mathrm{maj}}^{g}]$ , is characterised by the rule

$\begin{eqnarray}&&\mathrm{for}\ \mathrm{all}\ m\in \{1,...,{d}_{{ \mathcal R }}\},{U}_{\mathrm{maj}}^{g}\left|{\psi }_{m}\right.\rangle =\left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{m}^{\prime }\right.\rangle ,\end{eqnarray} \tag{ C.2 }$

where ${\{\left|{\xi }_{m}^{\prime }\right.\rangle \}}_{m}$ is an arbitrary orthonormal basis in ${{ \mathcal H }}_{{ \mathcal R }}$ .

Proof. By theorem A.1 we know that ${{\boldsymbol{p}}}^{\downarrow }\cdot {\boldsymbol{g}}({\boldsymbol{U}})\leqslant {{\boldsymbol{p}}}^{\downarrow }\cdot {{\boldsymbol{g}}}^{\downarrow }({\boldsymbol{U}})$ . Let ${U}_{\mathrm{maj}}^{g}$ be a member of an equivalence class of unitary operators such that ${\boldsymbol{g}}({{\boldsymbol{U}}}_{{\bf{maj}}}^{{\boldsymbol{g}}})={{\boldsymbol{g}}}^{\downarrow }({{\boldsymbol{U}}}_{{\bf{maj}}}^{g})$ and ${{\boldsymbol{g}}}^{\downarrow }({\boldsymbol{U}})\prec {{\boldsymbol{g}}}^{\downarrow }({{\boldsymbol{U}}}_{{\bf{maj}}}^{g})$ for all U acting on ${{ \mathcal H }}_{{ \mathcal O }}\otimes {{ \mathcal H }}_{{ \mathcal R }}$ . Therefore, by corollary A.1 we get ${{\boldsymbol{p}}}^{\downarrow }\cdot {{\boldsymbol{g}}}^{\downarrow }({\boldsymbol{U}})\leqslant {{\boldsymbol{p}}}^{\downarrow }\cdot {{\boldsymbol{g}}}^{\downarrow }({{\boldsymbol{U}}}_{\mathrm{maj}}^{{\boldsymbol{g}}})$ , and hence $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })$ is maximised by ${U}_{\mathrm{maj}}^{g}$ . Because ${g}_{n}(U)\in [0,1]$ for all n, and ${\displaystyle \sum }_{n}{g}_{n}(U)={d}_{{ \mathcal R }}$ , the first ${d}_{{ \mathcal R }}$ elements in ${{\boldsymbol{g}}}^{\downarrow }({{\boldsymbol{U}}}_{\mathrm{maj}}^{{\boldsymbol{g}}})$ must be one, and the rest zero. This is precisely what the rule (C.2) achieves.

□
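Lemma C.1 can be spot-checked numerically. The following is a minimal sketch, assuming illustrative dimensions ${d}_{{ \mathcal O }}=2$ and ${d}_{{ \mathcal R }}=3$, a randomly drawn state ρ, and $\left|{\varphi }_{1}\right.\rangle$ taken as the first basis vector of ${{ \mathcal H }}_{{ \mathcal O }}$; the names `u_maj`, `u_alt` and `erasure_probability` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_O, d_R = 2, 3          # illustrative dimensions (assumption)
d = d_O * d_R

def random_unitary(n):
    # QR decomposition of a complex Gaussian matrix yields a unitary Q
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

# Random full-rank density matrix on H_O (x) H_R
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = a @ a.conj().T
rho /= np.trace(rho).real

# Spectral decomposition with eigenvalues in non-increasing order
p, psi = np.linalg.eigh(rho)
p_down, psi = p[::-1], psi[:, ::-1]

# Projector |phi_1><phi_1| (x) 1_R, with phi_1 the first basis vector of H_O
proj = np.kron(np.diag([1.0] + [0.0] * (d_O - 1)), np.eye(d_R))

def erasure_probability(u):
    return np.trace(proj @ u @ rho @ u.conj().T).real

# One member of the class: psi_n -> n-th product basis vector, so the first
# d_R eigenvectors land on phi_1 (x) xi'_m with xi'_m the standard basis of H_R
u_maj = psi.conj().T
# Another member: the same map followed by an arbitrary change of reservoir basis
u_alt = np.kron(np.eye(d_O), random_unitary(d_R)) @ u_maj

p_max = p_down[:d_R].sum()
p_random = max(erasure_probability(random_unitary(d)) for _ in range(200))
print(erasure_probability(u_maj), erasure_probability(u_alt), p_max, p_random)
```

Under these assumptions, both `u_maj` and `u_alt` should reproduce the sum of the ${d}_{{ \mathcal R }}$ largest eigenvalues, while randomly sampled unitaries should not exceed it. This is a sanity check on a single random instance, not a substitute for the proof.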

C.2. Minimising the heat dissipation

We may always write the post-transformation marginal state of the reservoir as

$\begin{eqnarray}&&{\rho }_{{ \mathcal R }}^{\prime }=\displaystyle \sum _{m=1}^{{d}_{{ \mathcal R }}}{{r}_{m}^{\prime }}^{\downarrow }(U)| {\xi }_{m}^{\prime }\rangle \langle {\xi }_{m}^{\prime }| ,\end{eqnarray} \tag{ C.3 }$

with ${{{\boldsymbol{r}}}^{\prime }}^{\downarrow }({\boldsymbol{U}}):= {\{{{r}_{m}^{\prime }}^{\downarrow }(U)\}}_{m}$ a non-increasing vector of probabilities and ${\{\left|{\xi }_{m}^{\prime }\right.\rangle \}}_{m}$ an arbitrary orthonormal basis in ${{ \mathcal H }}_{{ \mathcal R }}$ . Because ${\rho }_{{ \mathcal R }}(\beta )$ is fixed, minimising ${\rm{\Delta }}Q$ is achieved by minimising the average energy of this state, given as

$\begin{eqnarray}&&\mathrm{tr}[{H}_{{ \mathcal R }}{\rho }_{{ \mathcal R }}^{\prime }]=\displaystyle \sum _{m=1}^{{d}_{{ \mathcal R }}}{{r}_{m}^{\prime }}^{\downarrow }(U)\langle {\xi }_{m}^{\prime }| {H}_{{ \mathcal R }}| {\xi }_{m}^{\prime }\rangle \equiv {{{\boldsymbol{r}}}^{\prime }}^{\downarrow }({\boldsymbol{U}})\cdot {{\boldsymbol{\lambda }}}^{\prime },\end{eqnarray} \tag{ C.4 }$

where ${{\boldsymbol{\lambda }}}^{\prime }$ is a vector of real numbers ${\lambda }_{m}^{\prime }:= \langle {\xi }_{m}^{\prime }| {H}_{{ \mathcal R }}| {\xi }_{m}^{\prime }\rangle$ . To determine how ${\rm{\Delta }}Q$ can be minimised, we first provide a recursive proof of the Ky Fan principle [48] to show that the set of eigenvalues ${\boldsymbol{\lambda }}$ majorises all possible ${{\boldsymbol{\lambda }}}^{\prime }$ .

Lemma C.2. ${{\boldsymbol{\lambda }}}^{\prime }\;\prec \;{\boldsymbol{\lambda }}$ for all orthonormal bases ${\{\left|{\xi }_{m}^{\prime }\right.\rangle \in {{ \mathcal H }}_{{ \mathcal R }}\}}_{m}$ .

Proof. To show this, it is sufficient to show that ${\displaystyle \sum }_{m}{\lambda }_{m}={\displaystyle \sum }_{m}{\lambda }_{m}^{\prime }$ and ${\lambda }^{\prime }\;{\prec }^{w}\;\lambda$ for all ${\{\left|{\xi }_{m}^{\prime }\right.\rangle \}}_{m}$ . The first condition is trivial, as ${\displaystyle \sum }_{m}{\lambda }_{m}^{\prime }=\mathrm{tr}[{H}_{{ \mathcal R }}]$ and is independent of ${\{\left|{\xi }_{m}^{\prime }\right.\rangle \}}_{m}$ . To show that ${\lambda }^{\prime }\;{\prec }^{w}\lambda$ , it is sufficient to prove that for all m and ${\{\left|{\xi }_{m}^{\prime }\right.\rangle \}}_{m}$ , ${\lambda }_{m}^{\uparrow }\leqslant {{\lambda }_{m}^{\prime }}^{\uparrow }$ . This can be done by showing that the minimal value attainable by ${{\lambda }_{1}^{\prime }}^{\uparrow }$ is ${\lambda }_{1}^{\uparrow }$ and, given this constraint, the minimal value attainable by ${{\lambda }_{2}^{\prime }}^{\uparrow }$ is ${\lambda }_{2}^{\uparrow }$ , and so on. One may always write $\left|{\xi }_{m}^{\prime }\right.\rangle ={\alpha }_{m}\left|{\xi }_{m}\right.\rangle +{\beta }_{m}\left|{\xi }_{m}^{\perp }\right.\rangle$ where $\left|{\xi }_{m}^{\perp }\right.\rangle$ is a unit vector in the orthogonal complement of $\left|{\xi }_{m}\right.\rangle$ in ${{ \mathcal H }}_{{ \mathcal R }}$ . Because $\left|{\xi }_{m}\right.\rangle$ is an eigenvector of ${H}_{{ \mathcal R }}$ , the cross terms vanish and we have ${{\lambda }_{m}^{\prime }}^{\uparrow }={| {\alpha }_{m}| }^{2}\langle {\xi }_{m}| {H}_{{ \mathcal R }}| {\xi }_{m}\rangle +{| {\beta }_{m}| }^{2}\langle {\xi }_{m}^{\perp }| {H}_{{ \mathcal R }}| {\xi }_{m}^{\perp }\rangle$ . It is evident that $\langle {\xi }_{1}^{\perp }| {H}_{{ \mathcal R }}| {\xi }_{1}^{\perp }\rangle \geqslant \langle {\xi }_{1}| {H}_{{ \mathcal R }}| {\xi }_{1}\rangle =: {\lambda }_{1}^{\uparrow }$ . Therefore we know that ${{\lambda }_{1}^{\prime }}^{\uparrow }$ is minimised by setting ${| {\alpha }_{1}| }^{2}=1$ . In the next step, the fact that $\langle {\xi }_{1}^{\prime }| {\xi }_{2}^{\prime }\rangle =0$ and that our previous step sets $\left|{\xi }_{1}^{\prime }\right.\rangle =\left|{\xi }_{1}\right.\rangle$ implies that $\langle {\xi }_{1}| {\xi }_{2}^{\perp }\rangle =0$ . This in turn implies that $\langle {\xi }_{2}^{\perp }| {H}_{{ \mathcal R }}| {\xi }_{2}^{\perp }\rangle \geqslant \langle {\xi }_{2}| {H}_{{ \mathcal R }}| {\xi }_{2}\rangle =: {\lambda }_{2}^{\uparrow }$ , so that $\langle {\xi }_{2}^{\prime }| {H}_{{ \mathcal R }}| {\xi }_{2}^{\prime }\rangle$ is minimised by setting ${| {\alpha }_{2}| }^{2}=1$ . This argument can be made recursively for all m.

□
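The statement of lemma C.2 lends itself to a quick numerical check. The sketch below assumes a five-level ladder Hamiltonian with unit spacing and samples random orthonormal bases via a QR decomposition; the helper `is_majorised_by` and all parameter values are illustrative choices, not part of the original argument.

```python
import numpy as np

rng = np.random.default_rng(1)

def is_majorised_by(x, y, tol=1e-10):
    """Return True if x is majorised by y: equal totals, and the ordered
    partial sums of x never exceed those of y."""
    xs, ys = np.sort(x)[::-1], np.sort(y)[::-1]
    sums_ok = np.all(np.cumsum(xs) <= np.cumsum(ys) + tol)
    return bool(sums_ok and abs(xs.sum() - ys.sum()) < tol)

d_R, omega = 5, 1.0                         # illustrative ladder reservoir (assumption)
lam = omega * np.arange(d_R)                # eigenvalues of H_R, non-decreasing
H_R = np.diag(lam)
r_down = np.sort(rng.random(d_R))[::-1]
r_down /= r_down.sum()                      # an arbitrary non-increasing probability vector

checks = []
for _ in range(500):
    z = rng.normal(size=(d_R, d_R)) + 1j * rng.normal(size=(d_R, d_R))
    xi_prime, _ = np.linalg.qr(z)           # columns form a random orthonormal basis
    # lambda'_m = <xi'_m| H_R |xi'_m>
    lam_prime = np.real(np.diag(xi_prime.conj().T @ H_R @ xi_prime))
    checks.append(is_majorised_by(lam_prime, lam)
                  and r_down @ lam <= r_down @ lam_prime + 1e-10)
print(all(checks))
```

Each sample tests both the majorisation relation and its consequence used in lemma C.3, namely that pairing a non-increasing probability vector with the non-decreasing eigenvalues gives the smallest average energy.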

Now we are able to characterise the equivalence class of unitary operators that minimise ${\rm{\Delta }}Q$ .

Lemma C.3. ${\rm{\Delta }}Q$ is minimised by the equivalence class of unitary operators $[{U}_{\mathrm{maj}}^{f}]$ characterised by the rule

$\begin{eqnarray*}&&\mathrm{for}\ \mathrm{all}\ m\in \{1,...,{d}_{{ \mathcal R }}\}\ \mathrm{and}\ n\in \{(m-1){d}_{{ \mathcal O }}+1,...,{{md}}_{{ \mathcal O }}\},{U}_{\mathrm{maj}}^{f}\left|{\psi }_{n}\right.\rangle =\left|{\varphi }_{l}^{m}\right.\rangle \otimes \left|{\xi }_{m}\right.\rangle ,\end{eqnarray*}$

with the set $\{\left|{\varphi }_{l}^{m}\right.\rangle | l\in \{1,...,{d}_{{ \mathcal O }}\}\}$ forming an orthonormal basis in ${{ \mathcal H }}_{{ \mathcal O }}$ for each m.

Proof. By corollary A.1 and lemma C.2, ${{{\boldsymbol{r}}}^{\prime }}^{\downarrow }({\boldsymbol{U}})\cdot {{\boldsymbol{\lambda }}}^{\uparrow }\leqslant {{{\boldsymbol{r}}}^{\prime }}^{\downarrow }({\boldsymbol{U}})\cdot {{\boldsymbol{\lambda }}}^{\prime }$ . Therefore $\mathrm{tr}[{H}_{{ \mathcal R }}{\rho }_{{ \mathcal R }}^{\prime }]$ is minimal when for all m, $\left|{\xi }_{m}^{\prime }\right.\rangle =\left|{\xi }_{m}\right.\rangle$ . In such a case, we have

$\begin{eqnarray}{{r}_{m}^{\prime }}^{\downarrow }(U):= \langle {\xi }_{m}| {\rho }_{{ \mathcal R }}^{\prime }| {\xi }_{m}\rangle & = & \displaystyle \sum _{n=1}^{{d}_{{ \mathcal O }}{d}_{{ \mathcal R }}}{p}_{n}^{\downarrow }\langle {\psi }_{n}| {U}^{\dagger }({{\mathbb{1}}}_{{ \mathcal O }}\otimes | {\xi }_{m}\rangle \langle {\xi }_{m}| )U| {\psi }_{n}\rangle ,\\ & = & \displaystyle \sum _{n=1}^{{d}_{{ \mathcal O }}{d}_{{ \mathcal R }}}{p}_{n}^{\downarrow }{f}_{n}(U,m)={{\boldsymbol{p}}}^{\downarrow }\cdot {\boldsymbol{f}}({\boldsymbol{U}},{\boldsymbol{m}}),\end{eqnarray} \tag{ C.5 }$

where ${\boldsymbol{f}}({\boldsymbol{U}},{\boldsymbol{m}})$ is a vector of non-negative numbers ${f}_{n}(U,m):= \langle {\psi }_{n}| {U}^{\dagger }({{\mathbb{1}}}_{{ \mathcal O }}\otimes | {\xi }_{m}\rangle \langle {\xi }_{m}| )U| {\psi }_{n}\rangle$ such that ${\displaystyle \sum }_{n}{f}_{n}(U,m)={d}_{{ \mathcal O }}$ for all m. Let ${U}_{\mathrm{maj}}^{f}$ be a member of the equivalence class of unitary operators such that ${{{\boldsymbol{r}}}^{\prime }}^{\downarrow }({\boldsymbol{U}})\prec {{{\boldsymbol{r}}}^{\prime }}^{\downarrow }({{\boldsymbol{U}}}_{{\bf{maj}}}^{{\boldsymbol{f}}})$ for all U acting on ${{ \mathcal H }}_{{ \mathcal O }}\otimes {{ \mathcal H }}_{{ \mathcal R }}$ . By corollary A.1 it would then follow that ${{{\boldsymbol{r}}}^{\prime }}^{\downarrow }({{\boldsymbol{U}}}_{{\bf{maj}}}^{{\boldsymbol{f}}})\cdot {{\boldsymbol{\lambda }}}^{\uparrow }\leqslant {{{\boldsymbol{r}}}^{\prime }}^{\downarrow }({\boldsymbol{U}})\cdot {{\boldsymbol{\lambda }}}^{\uparrow }$ , resulting in the minimisation of $\mathrm{tr}[{H}_{{ \mathcal R }}{\rho }_{{ \mathcal R }}^{\prime }]$ and hence ${\rm{\Delta }}Q$ . To find ${{{\boldsymbol{r}}}^{\prime }}^{\downarrow }({{\boldsymbol{U}}}_{{\bf{maj}}}^{{\boldsymbol{f}}})$ , we first need to maximise ${{r}_{1}^{\prime }}^{\downarrow }(U)$ and then, given this constraint, maximise ${{r}_{2}^{\prime }}^{\downarrow }(U)$ , and so on. This, in turn, is achieved by choosing ${U}_{\mathrm{maj}}^{f}$ so that ${\boldsymbol{f}}({{\boldsymbol{U}}}_{{\bf{maj}}}^{{\boldsymbol{f}}},1)={{\boldsymbol{f}}}^{\downarrow }({{\boldsymbol{U}}}_{{\bf{maj}}}^{{\boldsymbol{f}}},1)$ and ${{\boldsymbol{f}}}^{\downarrow }({{\boldsymbol{U}}}_{{\bf{maj}}}^{{\boldsymbol{f}}},1)\succ {{\boldsymbol{f}}}^{\downarrow }({\boldsymbol{U}},1)$ for all U. Note that for each m, ${f}_{n}(U,m)\in [0,1]$ for all n, and ${\displaystyle \sum }_{n}{f}_{n}(U,m)={d}_{{ \mathcal O }}$ . Hence, the first ${d}_{{ \mathcal O }}$ entries of ${{\boldsymbol{f}}}^{\downarrow }({{\boldsymbol{U}}}_{{\bf{maj}}}^{{\boldsymbol{f}}},1)$ are taken to one and the rest to zero. Because of the constraint posed by the orthogonality of the vectors ${\{U\left|{\psi }_{n}\right.\rangle \}}_{n}$ , however, the first ${d}_{{ \mathcal O }}$ elements of ${\boldsymbol{f}}({{\boldsymbol{U}}}_{{\bf{maj}}}^{{\boldsymbol{f}}},2)$ must be zero, and to maximise ${{r}_{2}^{\prime }}^{\downarrow }(U)$ the best we can do is to only take the second ${d}_{{ \mathcal O }}$ entries of ${\boldsymbol{f}}({{\boldsymbol{U}}}_{{\bf{maj}}}^{{\boldsymbol{f}}},2)$ to one, with the rest being zero. This argument is then made recursively for all m.

□

C.3. Minimal heat dissipation conditional on maximising the probability of information erasure

Let us divide the vector of probabilities ${{\boldsymbol{p}}}^{\downarrow }$ to form the non-increasing vector of cardinality ${d}_{{ \mathcal R }}$ , denoted ${{\rm{\Pi }}}_{0}^{\downarrow }$ , and the non-increasing vectors of cardinality ${d}_{{ \mathcal O }}-1$ , denoted $\{{{\rm{\Pi }}}_{m}^{\downarrow }| m\in \{1,...,{d}_{{ \mathcal R }}\}\}$ , defined as

$\begin{eqnarray}{{\rm{\Pi }}}_{0}^{\downarrow } & := & \{{p}_{m}^{\downarrow }| m\in \{1,...,{d}_{{ \mathcal R }}\}\},\\ {{\rm{\Pi }}}_{m\geqslant 1}^{\downarrow } & := & \{{p}_{{d}_{{ \mathcal R }}+(m-1)({d}_{{ \mathcal O }}-1)+l}^{\downarrow }| l\in \{1,...,{d}_{{ \mathcal O }}-1\}\}.\end{eqnarray} \tag{ C.6 }$

We refer to the mth element of ${{\rm{\Pi }}}_{0}^{\downarrow }$ as ${{\rm{\Pi }}}_{0}^{\downarrow }(m)$ , and the lth element of ${{\rm{\Pi }}}_{m\geqslant 1}^{\downarrow }$ as ${{\rm{\Pi }}}_{m\geqslant 1}^{\downarrow }(l)$ .

Theorem C.1. The equivalence class of unitary operators that maximise the probability of information erasure and, given this constraint, minimise the heat dissipation, is denoted as $[{U}_{\mathrm{opt}}(0)]$ . This is characterised by the rules

$\begin{eqnarray}{U}_{\mathrm{opt}}(0)\;:\left\{\begin{array}{ll}\left|{\psi }_{n}\right.\rangle \mapsto \left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{m}\right.\rangle & \mathrm{if}\ p({\psi }_{n}| \rho )={{\rm{\Pi }}}_{0}^{\downarrow }(m),\\ \left|{\psi }_{n}\right.\rangle \mapsto \left|{\varphi }_{l}^{m}\right.\rangle \otimes \left|{\xi }_{m}\right.\rangle & \mathrm{if}\ p({\psi }_{n}| \rho )={{\rm{\Pi }}}_{m}^{\downarrow }(l)\ \mathrm{and}\ m\geqslant 1,\end{array}\right.\end{eqnarray} \tag{ C.7 }$

where, for all m, each member of the orthonormal set ${\{\left|{\varphi }_{l}^{m}\right.\rangle \}}_{l}$ is orthogonal to $\left|{\varphi }_{1}\right.\rangle$ .

Proof. The first line conforms with the conditions imposed by lemma C.1 and, as such, results in $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })={p}_{{\varphi }_{1}}^{\mathrm{max}}$ . However, here we are restricted to the case $\left|{\xi }_{m}^{\prime }\right.\rangle =\left|{\xi }_{m}\right.\rangle$ for all m, thereby minimising the contribution to heat dissipation by corollary A.1 and lemma C.2. The second line, by virtue of not affecting $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })$ , is evidently allowed for a unitary operator in the equivalence class $[{U}_{\mathrm{maj}}^{g}]$ . This rule takes the ${d}_{{ \mathcal O }}-1$ largest remaining probabilities to states $\left|{\varphi }_{l}^{1}\right.\rangle \otimes \left|{\xi }_{1}\right.\rangle$ , thereby maximising the probability associated with $\left|{\xi }_{1}\right.\rangle$ , and so on for the other states $\left|{\xi }_{m}\right.\rangle$ . By the same line of reasoning as in lemma C.3, therefore, the contribution to heat dissipation from this line is minimal.

□
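The assignment rules (C.6) and (C.7) can be illustrated on a small instance. The sketch below assumes a maximally mixed qubit and a three-level ladder reservoir with $\beta =\omega =1$; the function `u_opt0_outcome` is a hypothetical helper that only tracks populations, and the brute-force comparison is restricted to permutations of ${{\boldsymbol{p}}}^{\downarrow }$ on the product basis, so it is a consistency check rather than a proof of optimality over all unitaries.

```python
import numpy as np
from itertools import permutations

def u_opt0_outcome(p_down, d_O, d_R):
    """Erasure probability and reservoir populations r'_m produced by the rules (C.6)-(C.7)."""
    pi_0 = p_down[:d_R]                              # Pi_0: sent to phi_1 (x) xi_m
    pi_m = p_down[d_R:].reshape(d_R, d_O - 1)        # row m-1 is Pi_m: sent to phi_l^m (x) xi_m
    return pi_0.sum(), pi_0 + pi_m.sum(axis=1)

# Illustrative parameters (assumptions): maximally mixed qubit, three-level ladder reservoir
d_O, d_R, beta, omega = 2, 3, 1.0, 1.0
lam = omega * np.arange(d_R)                         # non-decreasing reservoir energies
r = np.exp(-beta * lam)
r /= r.sum()                                         # thermal populations, non-increasing
p_down = np.sort(np.kron(np.full(d_O, 1 / d_O), r))[::-1]

p_erase, r_after = u_opt0_outcome(p_down, d_O, d_R)
heat = r_after @ lam - r @ lam

# Brute force over all permutations of p_down on the product basis phi_l (x) xi_m
# (slot index l * d_R + m), keeping only those that reach the maximal erasure probability
slot_energy = np.tile(lam, d_O)
heat_brute = min(
    float(np.dot(q, slot_energy)) - r @ lam
    for q in map(np.array, permutations(p_down))
    if q[:d_R].sum() > p_erase - 1e-12
)
print(p_erase, heat, heat_brute)
```

For this instance the heat obtained from the rules should coincide with the brute-force minimum among permutations that achieve the maximal erasure probability.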

C.4. The tradeoff between probability of information erasure and minimal heat dissipation

Let us make the following observations:

(a) For any value of ${\rm{\Delta }}Q$ , $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })$ is maximised when the eigenvectors of ${\rho }_{{ \mathcal O }}^{\prime }$ that have support on $\left|{\varphi }_{1}\right.\rangle$ are given by the set ${\{\left|{\varphi }_{l}\right.\rangle \}}_{l}$ . This follows from corollary A.1, which implies that $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })={\displaystyle \sum }_{l}{o}_{l}^{{}^{\prime }\downarrow }| {\langle {\varphi }_{1}| {\varphi }_{l}^{\prime }\rangle | }^{2}\leqslant {o}_{1}^{{}^{\prime }\downarrow }$ , where ${\rho }_{{ \mathcal O }}^{\prime }={\displaystyle \sum }_{l}{o}_{l}^{{}^{\prime }\downarrow }| {\varphi }_{l}^{\prime }\rangle \langle {\varphi }_{l}^{\prime }|$ .
(b) For any value of $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })$ , ${\rm{\Delta }}Q$ is minimised when the eigenvectors of ${\rho }_{{ \mathcal R }}^{\prime }$ are given by the set ${\{\left|{\xi }_{m}\right.\rangle \}}_{m}$ . This follows from lemma C.2.

Observations (a) and (b), together, show that in general the optimal case will require that, for all n, $U\left|{\psi }_{n}\right.\rangle ={\displaystyle \sum }_{l}\sqrt{{\gamma }_{l}^{n}}\left|{\varphi }_{l}\right.\rangle \otimes \left|{\xi }_{l}^{n}\right.\rangle$ . Here ${\gamma }_{l}^{n}\geqslant 0$ are the Schmidt coefficients, and $\left|{\xi }_{l}^{n}\right.\rangle ={{\rm{e}}}^{{\mathfrak{i}}{\phi }_{l}^{n}}{\sigma }_{n}\left|{\xi }_{l}\right.\rangle$ with ${\sigma }_{n}$ a permutation on the set ${\{\left|{\xi }_{l}\right.\rangle \}}_{l}$ and ${\phi }_{l}^{n}\in [0,2\pi )$ a phase.

Consider now the algorithm for sequential swaps between two-dimensional subspaces of ${{ \mathcal H }}_{{ \mathcal O }}$ and ${{ \mathcal H }}_{{ \mathcal R }}$ shown in figure C1.

During each step of the algorithm, we denote the (updated) probability $p({\varphi }_{l}\otimes {\xi }_{m}| {U}_{\mathrm{step}}\rho {U}_{\mathrm{step}}^{\dagger })$ as ${p}_{l,m}$ . Here, U_step represents the unitary operator that results from conducting the algorithm up to some particular step.

Lemma C.4. The sequential swap algorithm produces a non-increasing sequence of errors, ${{\boldsymbol{\delta }}}^{\downarrow }:= {\{{\delta }_{j}^{\downarrow }\}}_{j}$ , commensurate with a non-decreasing sequence of heat, ${\boldsymbol{\Delta }}{{\boldsymbol{Q}}}^{\uparrow }:= {\{{\rm{\Delta }}{Q}_{j}^{\uparrow }\}}_{j}$ , such that the resultant state ${\rho }_{{ \mathcal O }}^{\prime }$ is always passive.

Proof. For every iteration of step (2), each swap operation increases $p({\varphi }_{1}| {\rho }_{{ \mathcal O }}^{\prime })$ , so we obtain the non-increasing sequence of errors ${{\boldsymbol{\delta }}}^{\downarrow }$ by construction. Furthermore, each swap increases $p({\xi }_{i}| {\rho }_{{ \mathcal R }}^{\prime })$ , while decreasing $p({\xi }_{m}| {\rho }_{{ \mathcal R }}^{\prime })$ . To show that this always leads to an increase in heat by corollary A.1, we must show that, for each swap, $i\gt m$ . Every swap in each iteration of step (2) effects a permutation on the set $\{{p}_{1,i},{p}_{2,m},...,{p}_{{d}_{{ \mathcal O }},m}\}$ . Initially, ${p}_{1,i}={o}_{1}^{\downarrow }{r}_{i}^{\downarrow }$ . We note that if ${o}_{1}^{\downarrow }{r}_{i}^{\downarrow }\lt {o}_{l}^{\downarrow }{r}_{m}^{\downarrow }$ with $l\geqslant 2$ , then by necessity $i\gt m$ . As such, the swaps for the first iteration of step (2), that involve state $\left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{2}\right.\rangle$ and lead to a permutation in $\{{p}_{\mathrm{1,2}},{p}_{\mathrm{2,1}},...,{p}_{{d}_{{ \mathcal O }},1}\}$ , result in a decrease in $p({\xi }_{1}| {\rho }_{{ \mathcal R }}^{\prime })$ and an increase in $p({\xi }_{2}| {\rho }_{{ \mathcal R }}^{\prime })$ , which indeed leads to a non-decreasing sequence of heat. And so on recursively for all i. To show that ${\rho }_{{ \mathcal O }}^{\prime }$ is always passive, we need to show that after each swap, $p({\varphi }_{i}| {U}_{\mathrm{step}}\rho {U}_{\mathrm{step}}^{\dagger })={\displaystyle \sum }_{m}{p}_{i,m}\geqslant p({\varphi }_{j}| {U}_{\mathrm{step}}\rho {U}_{\mathrm{step}}^{\dagger })={\displaystyle \sum }_{m}{p}_{j,m}$ for all $i\lt j$ . This follows from the fact that ${\{{p}_{i,m}\}}_{i}$ are always in non-increasing order, and that every element in ${\{{p}_{i,m}\}}_{i\geqslant 2}$ is greater than or equal to all those in ${\{{p}_{i,{m}^{\prime }}\}}_{i\geqslant 2}$ if $m\lt {m}^{\prime }$ .

□

Now, we wish to show that the non-decreasing sequence of heat ${{\boldsymbol{\Delta Q}}}^{\uparrow }$ is optimal for the associated non-increasing sequence of errors ${{\boldsymbol{\delta }}}^{\downarrow }$ .

Theorem C.2. If an error δ can be achieved using the sequential swap algorithm, the consequent heat dissipation will be optimal. Achieving the same δ with the presence of entanglement in the vectors ${\{{U}_{\mathrm{step}}\left|{\psi }_{n}\right.\rangle \}}_{n}$ will either increase ${\rm{\Delta }}Q$ , ${\rm{\Delta }}W$ , or both.

Proof. By corollary A.1, lemmas C.2 and C.4, the heat dissipation due to the sequential swap algorithm is minimal if we are restricted to swap operations. If we are not restricted to performing swap operations, we could also achieve the same error δ by allowing for entanglement in the vectors ${\{{U}_{\mathrm{step}}\left|{\psi }_{n}\right.\rangle \}}_{n}$ . To show that this will result in a greater amount of heat dissipation, it is sufficient to show that doing so would increase ${p}_{i,m}$ and decrease ${p}_{i,{m}^{\prime }}$ , for some i, such that $m\gt {m}^{\prime }$ . Likewise, we may show that this would increase the average energy of ${\rho }_{{ \mathcal O }}^{\prime }$ , and hence increase ${\rm{\Delta }}W$ , by demonstrating that the process would increase ${p}_{i,m}$ and decrease ${p}_{j,m}$ , for some m, such that $i\gt j$ .

Here is a sketch of the proof. First start with ${\rho }^{\prime }={U}_{\mathrm{opt}}^{p}(0)\rho {U}_{\mathrm{opt}}^{p}{(0)}^{\dagger }$ , which coincides with the smallest error ${\delta }_{{j}_{\mathrm{max}}}^{\downarrow }=0$ , where j_max represents the final step in the swap algorithm. Here, we have ${p}_{1,{d}_{{ \mathcal R }}}^{{j}_{\mathrm{max}}}={p}_{{d}_{{ \mathcal R }}}^{\downarrow }$ and ${p}_{2,1}^{{j}_{\mathrm{max}}}={p}_{{d}_{{ \mathcal R }}+1}^{\downarrow }$ . The first step of the sequential swap algorithm, run backwards, gives us ${p}_{1,{d}_{{ \mathcal R }}}^{{j}_{\mathrm{max}}-1}={p}_{{d}_{{ \mathcal R }}+1}^{\downarrow }$ and ${p}_{2,1}^{{j}_{\mathrm{max}}-1}={p}_{{d}_{{ \mathcal R }}}^{\downarrow }$ , with ${\delta }_{{j}_{\mathrm{max}}-1}^{\downarrow }={p}_{{d}_{{ \mathcal R }}}^{\downarrow }-{p}_{{d}_{{ \mathcal R }}+1}^{\downarrow }$ . All other values are the same as before. Now, instead of the swap operation, consider

$\begin{eqnarray}&&{U}_{{j}_{\mathrm{max}}-1}\left|{\psi }_{{d}_{{ \mathcal R }}}\right.\rangle =\sqrt{\gamma }\left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{{d}_{{ \mathcal R }}}\right.\rangle +\sqrt{1-\gamma }\left|{\varphi }_{i}\right.\rangle \otimes \left|{\xi }_{m}\right.\rangle ,\end{eqnarray} \tag{ C.8 }$

and

$\begin{eqnarray}&&{U}_{{j}_{\mathrm{max}}-1}\left|{\psi }_{{d}_{{ \mathcal R }}+(m-1)({d}_{{ \mathcal O }}-1)+i}\right.\rangle =\sqrt{1-\gamma }\left|{\varphi }_{1}\right.\rangle \otimes \left|{\xi }_{{d}_{{ \mathcal R }}}\right.\rangle -\sqrt{\gamma }\left|{\varphi }_{i}\right.\rangle \otimes \left|{\xi }_{m}\right.\rangle ,\end{eqnarray} \tag{ C.9 }$

with all other ${U}_{{j}_{\mathrm{max}}-1}\left|{\psi }_{n}\right.\rangle$ defined by ${U}_{\mathrm{opt}}^{p}(0)$ . With some choice of $\gamma ,i,m$ , we can again obtain

$\begin{eqnarray}&&{p}_{1,{d}_{{ \mathcal R }}}^{{j}_{\mathrm{max}}-1}=\gamma {p}_{{d}_{{ \mathcal R }}}^{\downarrow }+(1-\gamma ){p}_{{d}_{{ \mathcal R }}+(m-1)({d}_{{ \mathcal O }}-1)+i}^{\downarrow }={p}_{{d}_{{ \mathcal R }}+1}^{\downarrow },\end{eqnarray} \tag{ C.10 }$

and hence the same value of ${\delta }_{{j}_{\mathrm{max}}-1}^{\downarrow }$ as before. This, however, will lead to

$\begin{eqnarray}&&{p}_{2,1}^{{j}_{\mathrm{max}}-1}={p}_{{d}_{{ \mathcal R }}+1}^{\downarrow }\leqslant {p}_{{d}_{{ \mathcal R }}}^{\downarrow },\end{eqnarray} \tag{ C.11 }$

and

$\begin{eqnarray}&&{p}_{i,m}^{{j}_{\mathrm{max}}-1}=(1-\gamma ){p}_{{d}_{{ \mathcal R }}}^{\downarrow }+\gamma {p}_{{d}_{{ \mathcal R }}+(m-1)({d}_{{ \mathcal O }}-1)+i}^{\downarrow }\geqslant {p}_{{d}_{{ \mathcal R }}+(m-1)({d}_{{ \mathcal O }}-1)+i}^{\downarrow }.\end{eqnarray} \tag{ C.12 }$

In other words, using the new entangling unitary operator instead of the sequential swap algorithm will cause ${p}_{\mathrm{2,1}}$ to decrease and ${p}_{i,m}$ to increase. If i = 2 and $m\geqslant 2$ , this will result in a larger ${\rm{\Delta }}{Q}_{{j}_{\mathrm{max}}-1}^{\uparrow }$ . Conversely, if m = 1 and $i\geqslant 3$ , this will increase the average energy of the object, and thereby increase ${\rm{\Delta }}W$ . If both $i\geqslant 3$ and $m\geqslant 2$ , then both ${\rm{\Delta }}Q$ and ${\rm{\Delta }}W$ will be larger. The same line of reasoning would apply for entanglement of higher Schmidt-rank.

□

C.5. Full erasure of a maximally mixed qudit with a harmonic oscillator

Here, we expound on the example of using a ladder system as a reservoir, but consider what happens as we take the limit of infinitely large d. In this limit we may call the ladder system a harmonic oscillator. Furthermore, we consider the object as a qudit, with Hilbert space ${{ \mathcal H }}_{{ \mathcal O }}\simeq {{\mathbb{C}}}^{{d}_{{ \mathcal O }}}$ , prepared in the maximally mixed state

$\begin{eqnarray}&&{\rho }_{{ \mathcal O }}=\displaystyle \frac{1}{{d}_{{ \mathcal O }}}\displaystyle \sum _{l=1}^{{d}_{{ \mathcal O }}}| {\varphi }_{l}\rangle \langle {\varphi }_{l}| .\end{eqnarray} \tag{ C.13 }$

Consider a harmonic oscillator of frequency ω, with the ground state energy, ${\lambda }_{1}^{\uparrow }$ , defined as zero. As such, the mth smallest energy is ${\lambda }_{m}^{\uparrow }=\omega (m-1)$ . Given a fixed and finite ω, in the limit as d tends to infinity there will be infinitely many eigenvalues of ${H}_{{ \mathcal R }}$ that become formally infinite, and hence infinitely many probabilities ${r}_{m}^{\downarrow }$ vanish. As such, we have

$\begin{eqnarray}&&\underset{d\to \infty }{\mathrm{lim}}{\rho }^{\prime }=| {\varphi }_{1}\rangle \langle {\varphi }_{1}| \otimes {\rho }_{{ \mathcal R }}^{\prime },\end{eqnarray} \tag{ C.14 }$

whereby ${p}_{{\varphi }_{1}}^{\mathrm{max}}=1$ . In addition

$\begin{eqnarray}&&\underset{d\to \infty }{\mathrm{lim}}{\rho }_{{ \mathcal R }}^{\prime }=\displaystyle \sum _{m=1}^{\infty }\displaystyle \frac{{r}_{m}^{\downarrow }}{{d}_{{ \mathcal O }}}\displaystyle \sum _{j=0}^{{d}_{{ \mathcal O }}-1}\left(| {\xi }_{{d}_{{ \mathcal O }}m-j}\rangle \langle {\xi }_{{d}_{{ \mathcal O }}m-j}| \right),\end{eqnarray} \tag{ C.15 }$

and a resulting heat dissipation of

$\begin{eqnarray}\underset{d\to \infty }{\mathrm{lim}}{\rm{\Delta }}Q & = & \displaystyle \sum _{m=1}^{\infty }\displaystyle \frac{{r}_{m}^{\downarrow }}{{d}_{{ \mathcal O }}}\displaystyle \sum _{j=0}^{{d}_{{ \mathcal O }}-1}({\lambda }_{{d}_{{ \mathcal O }}m-j}^{\uparrow })-\displaystyle \sum _{m=1}^{\infty }{r}_{m}^{\downarrow }{\lambda }_{m}^{\uparrow },\\ & = & \displaystyle \frac{\omega ({d}_{{ \mathcal O }}-1)}{2}\mathrm{coth}\left(\displaystyle \frac{\beta \omega }{2}\right)\gt \displaystyle \frac{({d}_{{ \mathcal O }}-1)}{\beta }.\end{eqnarray} \tag{ C.16 }$

${\rm{\Delta }}Q$ approaches $({d}_{{ \mathcal O }}-1){k}_{B}T$ in the limit as ω becomes vanishingly small, and hence the optimal case is achieved when we take the double limit of d going to infinity while ω goes to zero.
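The closed form in (C.16) can be compared against a direct evaluation of the sums built from (C.15). The following sketch truncates the oscillator at a large but finite number of levels, which is an assumption made purely for numerical convenience; the function names and the parameter values ${d}_{{ \mathcal O }}=3$, $\beta =1$, $\omega =0.5$ are illustrative.

```python
import numpy as np

def heat_full_erasure(d_O, beta, omega, levels=4000):
    """Direct evaluation of the sums in (C.16), truncated at `levels` thermal levels."""
    m = np.arange(1, levels + 1)
    r = np.exp(-beta * omega * (m - 1))
    r /= r.sum()                                   # thermal populations r_m
    j = np.arange(d_O)
    # Energies lambda_{d_O m - j} = omega (d_O m - j - 1) for j = 0, ..., d_O - 1
    lam_after = omega * (d_O * m[:, None] - j[None, :] - 1)
    energy_after = np.sum(r[:, None] / d_O * lam_after)
    energy_before = np.sum(r * omega * (m - 1))
    return energy_after - energy_before

def closed_form(d_O, beta, omega):
    """omega (d_O - 1) / 2 * coth(beta omega / 2), the second line of (C.16)."""
    return 0.5 * omega * (d_O - 1) / np.tanh(0.5 * beta * omega)

d_O, beta, omega = 3, 1.0, 0.5
print(heat_full_erasure(d_O, beta, omega), closed_form(d_O, beta, omega), (d_O - 1) / beta)
```

With these values the truncated sum and the closed form should agree to numerical precision, both lying above $({d}_{{ \mathcal O }}-1)/\beta$ and approaching it as ω is made smaller.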

**Figure C1.** The sequential swap algorithm.

Of course, the 'rate' at which we take the limit $d\to \infty$ must be greater than that at which ω approaches zero. As shown in figure C2 (a), for the case of ${d}_{{ \mathcal O }}=2$ , if we increase d while decreasing ω in such a way so as to keep $\parallel {H}_{{ \mathcal R }}\parallel$ constant, both the probability of qubit erasure and the heat dissipation decrease. Precisely, this may be achieved if we define the frequency as $\omega := \parallel {H}_{{ \mathcal R }}\parallel /(d-1)$ . In the limit as d tends to infinity and ω vanishes, the spectra of ${H}_{{ \mathcal R }}$ and ${\rho }_{{ \mathcal R }}(\beta )$ become continuous. That is to say, ${\lambda }_{m+1}^{\uparrow }-{\lambda }_{m}^{\uparrow }\to 0$ and ${r}_{m}^{\downarrow }-{r}_{m+1}^{\downarrow }\to 0$ , for all m. We may therefore simplify our calculations by replacing sums with Riemann integrals. First, we note that in this case the maximum probability of qudit erasure is

$\begin{eqnarray}&&\underset{\displaystyle \genfrac{}{}{0em}{}{\omega \to 0}{d\to \infty }}{\mathrm{lim}}{p}_{{\varphi }_{1}}^{\mathrm{max}}=\displaystyle \frac{{\displaystyle \int }_{0}^{\frac{\parallel {H}_{{ \mathcal R }}\parallel }{{d}_{{ \mathcal O }}}}{{\rm{e}}}^{-\beta x}{\rm{d}}x}{{\displaystyle \int }_{0}^{\parallel {H}_{{ \mathcal R }}\parallel }{{\rm{e}}}^{-\beta x}{\rm{d}}x}=\displaystyle \frac{1}{{\displaystyle \sum }_{j=0}^{{d}_{{ \mathcal O }}-1}{{\rm{e}}}^{-\frac{\beta j\parallel {H}_{{ \mathcal R }}\parallel }{{d}_{{ \mathcal O }}}}}.\end{eqnarray} \tag{ C.17 }$

Moreover the heat dissipation is

$\begin{eqnarray}\underset{\displaystyle \genfrac{}{}{0em}{}{\omega \to 0}{d\to \infty }}{\mathrm{lim}}{\rm{\Delta }}Q & = & \underset{\displaystyle \genfrac{}{}{0em}{}{\omega \to 0}{d\to \infty }}{\mathrm{lim}}\displaystyle \frac{1}{{d}_{{ \mathcal O }}}\displaystyle \sum _{m=1}^{d/{d}_{{ \mathcal O }}}\left(\displaystyle \sum _{j=0}^{{d}_{{ \mathcal O }}-1}{r}_{m+{jd}/{d}_{{ \mathcal O }}}^{\downarrow }\right)\left(\displaystyle \sum _{j=0}^{{d}_{{ \mathcal O }}-1}{\lambda }_{{d}_{{ \mathcal O }}m-j}^{\uparrow }\right)-\underset{\displaystyle \genfrac{}{}{0em}{}{\omega \to 0}{d\to \infty }}{\mathrm{lim}}\displaystyle \sum _{m=1}^{d}{r}_{m}^{\downarrow }{\lambda }_{m}^{\uparrow },\\ & = & \displaystyle \frac{\displaystyle \frac{1}{{d}_{{ \mathcal O }}}{\displaystyle \int }_{0}^{\parallel {H}_{{ \mathcal R }}\parallel }\left(\displaystyle \sum _{j=0}^{{d}_{{ \mathcal O }}-1}{{\rm{e}}}^{-\beta (\frac{x+j\parallel {H}_{{ \mathcal R }}\parallel }{{d}_{{ \mathcal O }}})}\right)x\ {\rm{d}}x}{{\displaystyle \int }_{0}^{\parallel {H}_{{ \mathcal R }}\parallel }{{\rm{e}}}^{-\beta x}\ {\rm{d}}x}-\displaystyle \frac{{\displaystyle \int }_{0}^{\parallel {H}_{{ \mathcal R }}\parallel }{{\rm{e}}}^{-\beta x}x\ {\rm{d}}x}{{\displaystyle \int }_{0}^{\parallel {H}_{{ \mathcal R }}\parallel }{{\rm{e}}}^{-\beta x}\ {\rm{d}}x},\\ & = & \displaystyle \frac{{d}_{{ \mathcal O }}-1}{\beta }+\displaystyle \frac{\parallel {H}_{{ \mathcal R }}\parallel }{2}\left[\mathrm{coth}\left(\displaystyle \frac{\beta \parallel {H}_{{ \mathcal R }}\parallel }{2}\right)-\mathrm{coth}\left(\displaystyle \frac{\beta \parallel {H}_{{ \mathcal R }}\parallel }{2{d}_{{ \mathcal O }}}\right)\right].\end{eqnarray} \tag{ C.18 }$

These functions take the values of 1 and $({d}_{{ \mathcal O }}-1){k}_{B}T$ , respectively, precisely when $\parallel {H}_{{ \mathcal R }}\parallel$ is infinitely large. Therefore, if ω and d decrease and increase, respectively, in such a way so that $\parallel {H}_{{ \mathcal R }}\parallel$ also increases, then in this limit we achieve the optimal case of full information erasure with the minimal heat dissipation of $({d}_{{ \mathcal O }}-1){k}_{B}T$ . One way of ensuring this, as shown in figure C2(b), is to define the dimension of the reservoir as $d={2}^{n}+1$ , where n is a natural number, while defining the frequency as $\omega =\bar{\omega }/n$ . The Hamiltonian norm will be

$\begin{eqnarray}&&\parallel {H}_{{ \mathcal R }}\parallel =\bar{\omega }\displaystyle \frac{{2}^{n}}{n},\end{eqnarray} \tag{ C.19 }$

which, in the limit as n tends to infinity, becomes infinitely large.
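As a rough numerical illustration of (C.17) and (C.18), one can evaluate the finite-d sums with $\omega =\parallel {H}_{{ \mathcal R }}\parallel /(d-1)$ and compare them with the closed forms. The sketch below assumes that d is a multiple of ${d}_{{ \mathcal O }}$ and uses ${d}_{{ \mathcal O }}=2$, $\beta =1$ and $\parallel {H}_{{ \mathcal R }}\parallel =5$ as illustrative values; all function names are hypothetical.

```python
import numpy as np

def discrete_limits(d_O, d, beta, h_norm):
    """Finite-d p_max and heat (first line of (C.18)) for omega = h_norm / (d - 1)."""
    assert d % d_O == 0, "this sketch assumes d is a multiple of d_O"
    omega = h_norm / (d - 1)
    lam = omega * np.arange(d)                     # lambda_m = omega (m - 1)
    r = np.exp(-beta * lam)
    r /= r.sum()
    block = d // d_O
    p_max = r[:block].sum()                        # maximally mixed object: sum of first d/d_O r_m
    # (1/d_O) sum_j r_{m + j d/d_O}: population of each level d_O m - j, j = 0, ..., d_O - 1
    pop = r.reshape(d_O, block).sum(axis=0) / d_O
    r_after = np.repeat(pop, d_O)
    return p_max, r_after @ lam - r @ lam

def p_max_closed(d_O, beta, h_norm):
    """Right-hand side of (C.17)."""
    return 1.0 / np.sum(np.exp(-beta * np.arange(d_O) * h_norm / d_O))

def heat_closed(d_O, beta, h_norm):
    """Last line of (C.18)."""
    coth = lambda x: 1.0 / np.tanh(x)
    return (d_O - 1) / beta + 0.5 * h_norm * (
        coth(0.5 * beta * h_norm) - coth(0.5 * beta * h_norm / d_O))

p_num, q_num = discrete_limits(2, 20000, 1.0, 5.0)
print(p_num, p_max_closed(2, 1.0, 5.0))
print(q_num, heat_closed(2, 1.0, 5.0))
```

The finite-d values are Riemann-sum approximations, so agreement is expected only up to corrections of order ω. Evaluating the closed forms at a large $\parallel {H}_{{ \mathcal R }}\parallel$ should return values close to 1 and $({d}_{{ \mathcal O }}-1)/\beta$ respectively.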

**Figure C2.** Comparison between two different methods of taking the double limit of $d\to \infty ,\omega \to 0$ , and their effect on ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ and ${\rm{\Delta }}L$ , when ${d}_{{ \mathcal O }}=2$ . (a) Here the frequencies are $\omega =1/(d-1)$ . Therefore the Hamiltonian norm is 1 for all d. For any given β, as d grows larger, thereby making ω smaller, both ${\rm{\Delta }}L$ and ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ decrease. (b) Here $d={2}^{n}+1$ for frequencies $\omega =1/n$ with $n\in {\mathbb{N}}$ . Therefore the Hamiltonian norm is ${2}^{n}/n$ . For a sufficiently large β, as n grows larger, thereby making ω smaller, ${\rm{\Delta }}L$ decreases while ${p}_{{\varphi }_{1}}^{\mathrm{max}}$ increases.

C.5.1. Full erasure of a qubit with an initial bias

We have shown that when the whole harmonic oscillator is used as a reservoir we can fully purify a qubit in a maximally mixed state, where the entropy reduction is ${\rm{\Delta }}S=\mathrm{log}(2)$ , with a heat cost of ${\rm{\Delta }}Q\gt {k}_{B}T$ . Here we wish to evaluate the optimal ${\rm{\Delta }}Q$ for arbitrary initial states of the qubit and, hence, arbitrary entropy changes ${\rm{\Delta }}S$ . To this end, define the initial state of the object as

$\begin{eqnarray}&&{\rho }_{{ \mathcal O }}=q| {\varphi }_{1}\rangle \langle {\varphi }_{1}| +(1-q)| {\varphi }_{2}\rangle \langle {\varphi }_{2}| ,q\in \left[\displaystyle \frac{1}{2},1\right).\end{eqnarray} \tag{ C.20 }$

The non-increasing vector of probabilities ${{\boldsymbol{p}}}^{\downarrow }$ can therefore be written as

$\begin{eqnarray}&&{{\boldsymbol{p}}}^{\downarrow }=\{{{qr}}_{1}^{\downarrow },...,{{qr}}_{k}^{\downarrow },(1-q){r}_{1}^{\downarrow },...,(1-q){r}_{k}^{\downarrow },...\},\end{eqnarray} \tag{ C.21 }$

where the ordering implies that

$\begin{eqnarray}&&\displaystyle \frac{q}{1-q}\geqslant \displaystyle \frac{{r}_{1}^{\downarrow }}{{r}_{k}^{\downarrow }}={{\rm{e}}}^{\beta \omega (k-1)}.\end{eqnarray} \tag{ C.22 }$
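
As a hedged illustration of the ordering condition (C.22), the following sketch computes the largest k compatible with it and checks the leading entries of the sorted joint spectrum. It assumes a truncated reservoir with equally spaced levels and generic parameter values (no exact ties); the helper names are hypothetical.

```python
import math


def block_length(q: float, beta: float, omega: float) -> int:
    """Largest k satisfying q/(1-q) >= exp(beta*omega*(k-1)), as in (C.22)."""
    return math.floor(math.log(q / (1 - q)) / (beta * omega)) + 1


def sorted_joint_spectrum(q: float, beta: float, omega: float, levels: int):
    """Products q*r_j and (1-q)*r_j sorted in non-increasing order.

    Each entry is (probability, object weight label, reservoir index j),
    with j starting from 1 to match the notation of (C.21).
    """
    weights = [math.exp(-beta * omega * j) for j in range(levels)]
    partition = sum(weights)
    r = [w / partition for w in weights]
    entries = [(q * rj, "q", j + 1) for j, rj in enumerate(r)]
    entries += [((1 - q) * rj, "1-q", j + 1) for j, rj in enumerate(r)]
    return sorted(entries, key=lambda entry: entry[0], reverse=True)


if __name__ == "__main__":
    q, beta, omega = 0.8, 1.0, 0.5
    k = block_length(q, beta, omega)
    spectrum = sorted_joint_spectrum(q, beta, omega, levels=20)
    # The first k entries carry the weight q; entry k+1 is (1-q)*r_1.
    print(k, [(label, j) for _, label, j in spectrum[: k + 1]])
```

For a finite spacing ω only this leading block is guaranteed by (C.22); later entries need not group into blocks of the same length.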

After the joint evolution with an infinite-dimensional reservoir, the above sequence ${{\boldsymbol{p}}}^{\downarrow }$ describes the spectrum of ${\rho }_{{ \mathcal R }}^{\prime }$ , with the first entry associated with eigenvector $\left|{\xi }_{1}\right.\rangle$ , and so on. In the limit of infinitesimally small ω, the energy spectrum of the reservoir and, hence, the probabilities ${{\boldsymbol{r}}}^{\downarrow }$ can be approximated as a continuum. We may therefore evaluate ${\rm{\Delta }}Q$ by

$\begin{eqnarray}{\rm{\Delta }}Q & = & \displaystyle \frac{{\displaystyle \sum }_{n=1}^{\infty }Q(n)-{\displaystyle \int }_{0}^{\infty }x{{\rm{e}}}^{-\beta x}{\rm{d}}x}{{\displaystyle \int }_{0}^{\infty }{{\rm{e}}}^{-\beta x}\ {\rm{d}}x}=\displaystyle \frac{2q(1-q)\mathrm{log}(\displaystyle \frac{q}{1-q})}{\beta (2q-1)},\\ Q(n) & = & q{\displaystyle \int }_{(2n-2){\rm{\Omega }}}^{(2n-1){\rm{\Omega }}}x{{\rm{e}}}^{-\beta (x-(n-1){\rm{\Omega }})}{\rm{d}}x+(1-q){\displaystyle \int }_{(2n-1){\rm{\Omega }}}^{2n{\rm{\Omega }}}x{{\rm{e}}}^{-\beta (x-n{\rm{\Omega }})}{\rm{d}}x,\end{eqnarray} \tag{ C.23 }$

where Ω is the energy 'width' which satisfies $q/(1-q)={{\rm{e}}}^{\beta {\rm{\Omega }}}$ . In the limit as q tends to one-half, ${\rm{\Delta }}Q$ approaches $1/\beta$ as in our previous analysis.
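
The closed form in (C.23) can be checked numerically. The sketch below evaluates the integrals $Q(n)$ with the antiderivative of $x{{\rm{e}}}^{-\beta x}$ , truncates the sum at a finite number of terms (an assumption that is harmless for q away from one-half, where the terms decay geometrically), and compares the result with the closed-form expression. Function names are illustrative.

```python
import math


def closed_form_heat(q: float, beta: float) -> float:
    """Right-hand side of (C.23); valid for 1/2 < q < 1."""
    return 2 * q * (1 - q) * math.log(q / (1 - q)) / (beta * (2 * q - 1))


def _segment(lo: float, hi: float, shift: float, beta: float) -> float:
    """Integral of x*exp(-beta*(x - shift)) over [lo, hi]."""
    def antiderivative(x: float) -> float:
        return -(x / beta + 1 / beta ** 2) * math.exp(-beta * (x - shift))
    return antiderivative(hi) - antiderivative(lo)


def summed_heat(q: float, beta: float, terms: int = 2000) -> float:
    """Left-hand side of (C.23) with the sum over n truncated at `terms`."""
    width = math.log(q / (1 - q)) / beta  # Omega, from q/(1-q) = exp(beta*Omega)
    total = 0.0
    for n in range(1, terms + 1):
        total += q * _segment((2 * n - 2) * width, (2 * n - 1) * width,
                              (n - 1) * width, beta)
        total += (1 - q) * _segment((2 * n - 1) * width, 2 * n * width,
                                    n * width, beta)
    initial_energy = 1 / beta ** 2   # integral of x*exp(-beta*x) over [0, inf)
    normalisation = 1 / beta         # integral of exp(-beta*x) over [0, inf)
    return (total - initial_energy) / normalisation


if __name__ == "__main__":
    for q in (0.6, 0.7, 0.9):
        print(q, summed_heat(q, 1.0), closed_form_heat(q, 1.0))
    # Approaching q = 1/2 the heat tends to 1/beta.
    print(closed_form_heat(0.500001, 2.0))
```

The last line illustrates the limit quoted above: for q close to one-half the closed form approaches $1/\beta$ .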

C.6. Object as a component of a thermal system

In this case, the heat dissipation is

$\begin{eqnarray}{\rm{\Delta }}Q:= \mathrm{tr}[H(U\rho (\beta ){U}^{\dagger }-\rho (\beta ))] & = & \displaystyle \frac{1}{\beta }S(U\rho (\beta ){U}^{\dagger }\parallel \rho (\beta )),\\ & = & \displaystyle \frac{1}{\beta }\left(\displaystyle \sum _{n=1}^{{d}_{{ \mathcal O }}{d}_{{ \mathcal K }}}{q}_{n}^{U}\mathrm{log}\left(\displaystyle \frac{1}{{p}_{n}^{\downarrow }}\right)-S({\rho }^{\prime })\right),\\ & = & \displaystyle \frac{1}{\beta }\left({{\boldsymbol{q}}}^{{\boldsymbol{U}}}\cdot {{\bf{log}}}_{{\boldsymbol{p}}}^{\uparrow }-S(\rho (\beta ))\right),\end{eqnarray} \tag{ C.24 }$

where ${{\boldsymbol{q}}}^{{\boldsymbol{U}}}$ is a vector of real numbers

$\begin{eqnarray}&&{q}_{n}^{U}:= \displaystyle \sum _{m=1}^{{d}_{{ \mathcal O }}{d}_{{ \mathcal K }}}{p}_{m}^{\downarrow }| \langle {\xi }_{n}| U| {\xi }_{m}\rangle {| }^{2},\end{eqnarray} \tag{ C.25 }$

and ${{\bf{log}}}_{{\boldsymbol{p}}}^{\uparrow }:= {\{\mathrm{log}(1/{p}_{n}^{\downarrow })\}}_{n}$ a non-decreasing vector. We now determine the properties that $U\in [{U}_{\mathrm{maj}}]$ must satisfy so as to minimise ${\rm{\Delta }}Q$ conditional on maximising $p({\rm{\Psi }}| {\rho }_{{ \mathcal O }}^{\prime })$ .

Proposition C.1. For a fixed Hamiltonian, ${\rm{\Delta }}Q$ given maximally probable information erasure is minimised by choosing U from an equivalence class of unitary operators $[{U}_{\mathrm{maj}}^{1}]\subset [{U}_{\mathrm{maj}}]$ such that ${{{\boldsymbol{q}}}^{{{\boldsymbol{U}}}_{{\bf{maj}}}^{1}}}^{\downarrow }={{\boldsymbol{q}}}^{{{\boldsymbol{U}}}_{{\bf{maj}}}^{1}}$ and ${{{\boldsymbol{q}}}^{{{\boldsymbol{U}}}_{{\bf{maj}}}^{1}}}^{\downarrow }\succ {{\boldsymbol{q}}}^{{\boldsymbol{U}}}{}^{\downarrow }$ for all $U\in [{U}_{\mathrm{maj}}]$ .

Proof. As $S({\rho }^{\prime })=S(\rho (\beta ))$ and ${{\bf{log}}}_{{\boldsymbol{p}}}^{\uparrow }$ are fixed by the initial conditions, ${\rm{\Delta }}Q$ is minimised by minimising ${{\boldsymbol{q}}}^{{\boldsymbol{U}}}\cdot {{\bf{log}}}_{{\boldsymbol{p}}}^{\uparrow }$ . This is achieved by ${U}_{\mathrm{maj}}^{1}$ as a consequence of theorem A.1 and corollary A.1.

□
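
A toy check of (C.24) and of the ordering argument used in the proof can be sketched as follows. It is restricted to permutation unitaries acting on a small thermal state with a diagonal Hamiltonian, and it ignores the erasure constraint $U\in [{U}_{\mathrm{maj}}]$ , so it only illustrates that pairing a non-increasing ${{\boldsymbol{q}}}^{{\boldsymbol{U}}}$ with the non-decreasing ${{\bf{log}}}_{{\boldsymbol{p}}}^{\uparrow }$ minimises their scalar product. The energies and function names are hypothetical.

```python
import itertools
import math


def gibbs_probabilities(energies, beta):
    """Thermal populations for energies listed in non-decreasing order."""
    weights = [math.exp(-beta * e) for e in energies]
    partition = sum(weights)
    return [w / partition for w in weights]


def permuted_populations(p, perm):
    """q_n = p[perm[n]]: the permutation moves population perm[n] to level n."""
    return [p[m] for m in perm]


def heat_direct(energies, p, perm):
    """tr[H (U rho U^dagger - rho)] for a permutation unitary."""
    q = permuted_populations(p, perm)
    return sum(e * (qn - pn) for e, qn, pn in zip(energies, q, p))


def heat_via_c24(p, perm, beta):
    """Last line of (C.24): (q . log(1/p) - S(rho)) / beta."""
    q = permuted_populations(p, perm)
    entropy = -sum(x * math.log(x) for x in p)
    overlap = sum(qn * math.log(1.0 / pn) for qn, pn in zip(q, p))
    return (overlap - entropy) / beta


def cheapest_permutation(energies, beta):
    """Permutation with the smallest heat, found by exhaustive search."""
    p = gibbs_probabilities(energies, beta)
    candidates = itertools.permutations(range(len(energies)))
    return min(candidates, key=lambda perm: heat_direct(energies, p, perm))


if __name__ == "__main__":
    energies, beta = [0.0, 0.3, 0.9, 1.4], 1.5
    print(cheapest_permutation(energies, beta))
```

Without the erasure constraint the minimum is attained trivially by leaving the populations in non-increasing order; the proposition concerns the analogous minimum within $[{U}_{\mathrm{maj}}]$ , which this sketch does not model.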

Of course, we may also minimise ${\rm{\Delta }}Q$ by engineering the Hamiltonian itself.

Proposition C.2. ${\rm{\Delta }}Q$ given maximally probable information erasure will be minimised if all $\left|{\xi }_{n}\right.\rangle$ that have support on ${\{\left|{\rm{\Psi }}\right.\rangle \otimes \left|{\phi }_{j}\right.\rangle \}}_{j}$ are taken from the set ${\{\left|{\rm{\Psi }}\right.\rangle \otimes \left|{\phi }_{j}\right.\rangle \}}_{j}$ .

Proof. As shown during the proof of the Klein inequality in [49], given a constant spectrum of ρ and σ, $S(\rho \parallel \sigma )$ is minimised when ρ commutes with σ. Since ${\rm{\Delta }}Q$ takes its smallest value by minimising $S(U\rho (\beta ){U}^{\dagger }\parallel \rho (\beta ))$ , to achieve this $U\rho (\beta ){U}^{\dagger }$ must commute with $\rho (\beta )$ . By construction, ${U}_{\mathrm{maj}}^{1}\left|{\xi }_{n}\right.\rangle =\left|{\rm{\Psi }}\right.\rangle \otimes \left|{\phi }_{j}\right.\rangle$ for all $n\in \{1,...,{d}_{{ \mathcal K }}\}$ . So, if $| \langle {\xi }_{m}| {U}_{\mathrm{maj}}^{1}| {\xi }_{n}\rangle | \gt 0$ , to minimise ${\rm{\Delta }}Q$ we must have $\left|{\xi }_{m}\right.\rangle \in {\{\left|{\rm{\Psi }}\right.\rangle \otimes \left|{\phi }_{j}\right.\rangle \}}_{j}$ .

□
