Biochemical Szilard engines for memory-limited inference

By designing and leveraging an explicit molecular realisation of a measurement-and-feedback-powered Szilard engine, we investigate the extraction of work from complex environments by minimal machines with finite capacity for memory and decision-making. Living systems perform inference to exploit complex structure, or correlations, in their environment, but the physical limits and underlying cost/benefit trade-offs involved in doing so remain unclear. To probe these questions, we consider a minimal model for a structured environment—a correlated sequence of molecules—and explore mechanisms based on extended Szilard engines for extracting the work stored in these non-equilibrium correlations. We consider systems limited to a single bit of memory making binary ‘choices’ at each step. We demonstrate that increasingly complex environments allow increasingly sophisticated inference strategies to extract more free energy than simpler alternatives, and argue that optimal design of such machines should also consider the free energy reserves required to ensure robustness against fluctuations due to mistakes.


I. INTRODUCTION
Living and human-made systems exploit out-of-equilibrium fuel supplies to do useful work. For example, if glucose is present in the environment at higher than equilibrium concentrations relative to carbon dioxide and water, bacteria can power themselves through respiration. Similarly, internal combustion engines use an out-of-equilibrium concentration of their fuel, i.e. petrol, and are powered by the conversion of fuel and oxygen to carbon dioxide and water.
The amount of work that can be done using the fuel is bounded by the non-equilibrium free energy of the fuel [1]. This free energy contains both energetic and entropic terms. As one would expect, if the fuel contains more energy, then, in general, the amount of work that can be done is higher. However, fuels are also more useful if they are in well-defined initial states, with limited microscopic uncertainty. This uncertainty is quantified by the entropy, which is why the entropy contributes to the free energy.
The idea of using high energy fuel is intuitive. If the fuel initially has greater energy than at equilibrium, then that extra energy can be transferred elsewhere to do useful work as the fuel equilibrates. It is less obvious how to exploit low entropy fuel; nonetheless, entropy is an important component of the free energy stored in biochemical fuel molecules and cellular membrane potentials. For example, the free energy released by converting an ATP molecule to an ADP molecule in a cell is approximately 1.5 times the standard free energy difference between an ATP and an ADP molecule [2].
Spurred by a desire to understand the fundamental physics of computation and information processing, there has been significant recent interest in the exploitation of purely entropic resources [3-9]. Data arrays are physical systems, and the Shannon entropy of the data contributes to the overall physical entropy of the system. The data itself is therefore a potential resource, and manipulating data has thermodynamic consequences due to changes in the entropy of the data array [10]. A data array can have a simple statistical bias towards 1 or 0, and several authors have discussed how such a bias, which implies a low entropy register, might be exploited to perform work [3,4]. A more subtle and equally fundamental possibility is exploiting structure across multiple bits in the array: its entropy can be low due to correlations within the data, rather than an overall bias at the level of individual bits [5-9]. However, the principles of designing devices to optimally exploit correlations in general settings remain unclear [9].
Although inspired by the physics of computation, the question of how to exploit correlations is also of fundamental biological relevance. If organisms existed in a homogeneous non-equilibrium environment, there would be no need to develop sophisticated information-processing machinery to survive. However, from the chemotaxis system of E. coli to the brains of humans, complex molecular and cellular networks have evolved to exploit the fact that the environment exhibits correlated fluctuations. These systems rely on the fact that what is sensed at a certain point in space and time contains information about nearby points [8]. They have evolved even though they are costly to maintain, and despite the fact that the information obtained is limited by features such as the memory and processing power available [11]. However, the fundamental trade-offs that determine the sophistication of these systems have not been fully explored.
In this paper we take steps towards unifying these two perspectives on the exploitation of correlations. We first present a molecular design for a measurement-and-feedback device (a Szilard engine [12]) in which the mechanics of the feedback is explicit within the molecular system. We then leverage this construct to propose biomolecular machines that make repeated binary choices about how to act based on measurements of their environment (an array of 'molecular bits'). These machines use their single bit of memory to extract chemical work from correlated arrays, demonstrating that it is possible to design minimal biophysical systems that exploit minimal structured environments.
No memory at all is needed to extract all of the available work from an input consisting of an array of uncorrelated subsystems, and simple schemes with one-bit memories can extract all of the stored free energy from Markovian environments. If we increase the complexity of the environment further, by making it a hidden Markov process, 100% efficiency becomes impossible with a single-bit memory and some implicit inference of the hidden state is required. In this setting, schemes that perform batch averaging to obtain a better estimate of the hidden Markov state can become more efficient than the most direct approaches, at the expense of increased biochemical complexity. We are thus able to construct a minimal thermodynamic setting in which increasingly complex information-processing machinery becomes advantageous in increasingly complex environments.
We first, in section II A, give the relevant assumptions and underlying statistical mechanics. Then, in section II B, we discuss the previous work on devices to extract work from series of bits and introduce our own model. In section II C we demonstrate how work can be extracted from a single molecule in a non-equilibrium state by our setup. Next, in section III A, we discuss how to make a biochemical version of the Szilard engine, which forms the basis of our machines to extract work from correlations. Subsequently we find the maximum amount of work that a device with a persistent memory can extract from a series of correlated bits (section III B 1). We discuss a device based on the biomolecular Szilard engine that reaches this limit and can extract all of the work available from a Markovian input in section III B 2. In sections III B 3 and III B 4, we discuss the limitations of this machine when acting on an input produced by a hidden Markov model. We propose a different machine, in section III B 5, that averages over a batch of multiple input molecules that can extract more work in some cases. Finally, in section III B 6, we discuss the robustness of such devices to fluctuations in the input.

II. MATERIALS AND METHODS
A. Non-equilibrium free energies and information as a resource

In this paper, all physical systems are assumed to be well-described by discrete macrostates of molecules in dilute solution. Each of these states has an associated energy, and all systems are in contact with a single heat bath at temperature T [13]. We are concerned with small, fluctuating systems, so the state is characterised by a random variable X. For any probability distribution over the states of the system P(X = x) = p(x), there is an expected energy E(X) = Σ_x p(x) ε(x), where ε(x) is the energy of state x; the Shannon entropy of the system (in nats) is H(X) = −Σ_x p(x) ln p(x); and the non-equilibrium free energy of the system is [14]

F(X) = E(X) − k_B T H(X).
The free energy is minimised by the equilibrium distribution to which the system eventually converges. Now consider a system consisting of two subsystems; the overall state of the system is the joint random variable (X, Y), where X and Y are the random variables that describe the individual subsystems. If we assume that the subsystems are not energetically coupled, so that it is possible to write the energy of any joint state as the sum of the energies of the states of the subsystems, then the free energy can be written [1]

F_joint(X, Y) = F_X(X) + F_Y(Y) + k_B T I(X; Y), (4)

where I(X; Y) is the mutual information between the two random variables:

I(X; Y) = Σ_{x,y} p(x, y) ln [p(x, y) / (p(x) p(y))].
The mutual information is a measure of how much knowledge of the state of one random variable reduces uncertainty about the state of the other random variable [15]. Eq. 4 shows that there is a real contribution of information to the free energy of a physical system. Fundamentally, correlation between two non-interacting subsystems means that the uncertainty in the state of the joint system is low without a compensating reduction in the energy; work is therefore available.
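The decomposition in equation 4 can be checked numerically. The sketch below uses a hypothetical joint distribution over two correlated two-state subsystems, with all state energies set to zero (so free energies are purely entropic) and k_B T = 1; the specific numbers are illustrative only.

```python
import numpy as np

kT = 1.0  # work in units of k_B T

# Hypothetical joint distribution over two correlated two-state
# subsystems X and Y (rows: states of X, columns: states of Y).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)  # marginal of X
p_y = p_xy.sum(axis=0)  # marginal of Y

def shannon(p):
    """Shannon entropy in nats, H = -sum p ln p."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# With all energies zero, F = -k_B T * H for each (sub)system.
F_joint = -kT * shannon(p_xy)
F_x, F_y = -kT * shannon(p_x), -kT * shannon(p_y)

# Mutual information in nats: I(X;Y) = H(X) + H(Y) - H(X,Y).
I = shannon(p_x) + shannon(p_y) - shannon(p_xy)

# Eq. 4: the joint free energy exceeds the sum of the marginal free
# energies by k_B T * I(X;Y) -- the two printed numbers agree.
print(F_joint, F_x + F_y + kT * I)
```

The positive mutual information raises F_joint above F_X + F_Y: the correlation itself stores extractable free energy, as the text argues.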
In terms of the non-equilibrium free energy, the second law of thermodynamics states that the free energy of an isolated system Z can never increase [14]:

F(Z, t + τ) ≤ F(Z, t).

Let Z consist of two non-interacting subsystems X and Y, as in equation 4, and assume that the mutual information between the subsystems is zero at time t. Then for any process between times t and t + τ that leaves X and Y non-interacting in the final state,

∆F_X + ∆F_Y ≤ 0.

A reduction in the free energy of X can thus be used to increase the free energy of Y, by an amount up to the magnitude of the change in free energy of X. In this paper we will refer to this increase in the free energy of Y as 'work' being performed on the physical system, with work being a shorthand for the more formal term 'chemical work' [16,17]. Therefore, in a process that changes the free energy of a subsystem X by ∆F_X < 0, a work of up to W = −∆F_X can be done on another subsystem.
B. Model systems

1. Prior models
In this paper we consider machines designed to extract work from a non-equilibrium series of bits with both the machines and the bits rendered as biomolecules. These devices exploit pre-existing information within the input via a series of measurement and feedback operations implemented through a 1-bit memory. We now summarise prior work on microscopic machines designed to extract work from a tape of bits to put this study into context.
The first detailed analysis of such a machine was performed by Mandal and Jarzynski [3]. The authors considered a three-state device that couples to each bit in an input sequence for a period of time before being moved to the next bit. The machine changes state stochastically and couples the changing of the state of the input bit to the raising and lowering of a mass in a gravitational field. Although the authors pointed out that correlations within the tape could store free energy, their actual design could only exploit the overall bias of the input bits towards either 0 or 1. The device is powered by an increase in the entropy of its input, rather than a change in its energy, but the fundamental principle is not dissimilar to a device that exploits the difference in pressure between two volumes of ideal gases, which is also entropic in nature. The analogy is particularly vivid if one assigns a '0' to gas particles arriving from the left of a piston, and '1' to particles arriving from the right.
This model was extended to allow the device to step stochastically along its tape, and furnished with a chemical realisation, by Barato and Seifert [4]. In neither case is information in the environment-in the sense of structure induced by correlations-exploited, and there is no feedback from the state of the tape to the operation of the device.
Horowitz et al. discussed a device that interacts with a series of two-state systems via a process of measurement and feedback [18]. The input was an equilibrium system, however, without correlations between successive subsystems. Hence the mechanism of measurement and feedback, which was implicit, must necessarily consume at least as much work as could be extracted in the exploitation step.
Boyd et al. have sought to develop machines that extract work from 'temporal' correlations between successive bits [5-7]. The authors consider, in a similar fashion to the previous models, a machine with a number of discrete states coupled to successive bits in a long string of inputs. These machines are intended to extract work from tapes that have no overall bias towards one state or the other, but contain correlations between the states of bits. As has been highlighted by Stopnitzky et al., however, the machines in these works were designed without 'reversibly embeddable' dynamics, a necessity if the machines are to operate without external control, as was assumed [9]. Stopnitzky et al. did present systems with reversibly embeddable dynamics that extract positive work from a perfect sequence of alternating 1s and 0s, but the efficiency was very low. The extraction of work from perfectly correlated systems has also been analysed in a quantum mechanical setting [19].
A biochemical machine for exploiting correlated pairs of molecules was presented by McGrath et al. [8]. Although information is indeed exploited in this work, the nature of the correlations, which are much simpler than those in a string of bits encountered one after the other, allows a particularly straightforward, memory-free approach. In effect, the pairs of molecules could be described as a single 4-state non-equilibrium system, and processed in isolation from other pairs.
The lack of a concrete physical rendering in some of these models [3, 5-7, 9, 19] makes the machines mysterious and increases the scope for error, as discussed in [9]. If the inputs are simply described as an abstract string of bits without any explanation of their physical instantiation, the low entropy of the data is made to seem like a new and almost non-physical source of work. Measurement-and-feedback-driven devices in which the feedback mechanism is implicit can also ignore some of the costs of the process, downplay the challenges of inducing feedback-driven behaviour, and, for those unfamiliar with the field, provide misleading intuition as to how the measurement must be stored, as we will discuss.

2. Molecular implementation of a measurement-and-feedback machine
We now present a general description of the devices considered in this work. We render our machine, and the input bits, as biomolecules, to address the issues of concreteness mentioned above, and to connect to inference in living systems. The detailed set-up is shown in figure 1(a). The model consists of an input, a reaction volume, a series of chemical buffers, and a molecular 'hook' that can bind to the input molecules independently of their state [20,21]. The input is a series of small boxes, each containing a single input molecule. This molecule can be in one of two strongly metastable states, X and X*, so these input molecules represent a string of bits. This input is a minimal analogue of a fluctuating chemical environment, as experienced by single-celled organisms [22-25].
The rest of the system is our machine, a minimal analogue of an organism exploiting its environment. The machine functions by transferring molecules to and from its reaction volume via the molecular hook. Once in the reaction volume, input molecules undergo reactions with molecules that are internal to the system, for example a molecule M encoding the memory. These reactions are coupled to large fuel buffers that collectively allow the machine to store the work extracted from the environment, similar to Refs. [4,8]. The buffers are the molecular analogue of a weight in a gravitational field that can be lifted by the system [3, 5-7].
Details of how a molecular hook might operate are given in appendix A. Such a mechanism can transfer molecules to and from the reaction volume with no net expenditure of work, provided that the hook is controlled by a particular quasistatic protocol. Similarly, to reach maximal efficiency, we shall assume that buffer concentrations are also changed quasistatically by a well-defined protocol [13,20,26,27], as illustrated in figure 1(b). These quasistatic protocols will not follow directly from the dynamics of the explicitly modelled degrees of freedom; they are essentially externally imposed. Our system is therefore non-autonomous. Crucially, however, the protocols we consider require no decision-making intelligence; the same series of manipulations is applied repeatedly, without feedback from the state of the system. All 'decisions' and feedback strategies must be made by the molecules that are explicitly represented.
By avoiding protocols that require external decision-making dependent on the state of the system, we avoid the implicit costs that have caused much of the confusion in the thermodynamics of computation since the original thought experiment of Maxwell [28]. In principle, the protocols we invoke could be applied in parallel to an arbitrarily large number of replicas (as shown in figure 1(a)), rendering the marginal cost per machine of the external protocol negligible. Indeed, this is the assumption usually made in macroscopic thermodynamics. By contrast, if separate decisions had to be made for each replica, this economy of scale would not exist. Although the use of external control makes the individual devices a weaker analogy for single, autonomous organisms, the combined set-up of many devices and their controller is then an analogy for a single, albeit more complex, organism. We note in passing that the need for a quasistatic protocol to control the hook is equivalent to the need for a quasistatic protocol to deterministically advance the tape with which a machine interacts, as has previously been assumed in many bit-driven machines [3, 5-7, 9]; our physical instantiation makes this need clearer.

C. Example system and calculation: Extracting work from a biased environment
We illustrate the operation and analysis of the set-up outlined in section II B 2 by demonstrating it in the simplest possible context. We consider the reversible extraction of work from a low entropy input by increasing the free energy of a chemical buffer. In this setting, the input array consists of input molecules each initially in the state X* with 100% probability. The X and X* states of the input molecules have equal intrinsic free energy, so in equilibrium a single input molecule is equally likely to be in either state. It is therefore possible to extract a work of k_B T ln 2 per input molecule from the environment.
Each input molecule is transferred to and from the reaction volume by a hook with no net work expenditure, as outlined in appendix A. When the input is in the reaction volume, we extract work by increasing the free energy of a bath of fuel molecules F and F* with chemical potentials µ_F and µ_F*. To do this we need a chemical reaction that couples the interconversion of X and X* to the interconversion of F and F*:

X + F* ⇌ X* + F. (8)

The interconversion of X and X*, or F and F*, is assumed to be infinitely slow except via this reaction. No other molecules, such as those representing a memory, are necessary in this simple context. It is possible to extract some work by connecting the X* molecule to a single bath of F and F* molecules with a high concentration of F*, so that µ_F* > µ_F. Both the input and the bath are individually out of equilibrium, and tend to drive the reaction in equation 8 in opposite directions. In this case, the drive from the input is stronger and the reaction in equation 8 proceeds from right to left, with the input doing work on the bath. Over time, the bias of the input will decrease until the two driving forces cancel; although the bath and the input are individually still out of equilibrium and store free energy, the input has reached a bias which is in equilibrium with the driving force of the bath. At this point, the input will be in state X with probability 1/(1 + e^{−β∆G}) and in state X* with probability e^{−β∆G}/(1 + e^{−β∆G}), where β = 1/(k_B T) and ∆G = µ_F − µ_F* < 0. During this relaxation to equilibrium, 1/(1 + e^{−β∆G}) molecules of F are converted to F* on average. Therefore, the free energy of the bath changes by −∆G/(1 + e^{−β∆G}); this is the work extracted per input molecule.
Different choices of ∆G lead to different values of the work; however, the maximum over ∆G is ≈ 0.28 k_B T, which is less than k_B T ln 2. This protocol has not extracted all of the available work; indeed, the input molecule has not even reached its equilibrium distribution, so it is still a store of free energy. The input molecule could thus be put in contact with a second bath with a lower concentration of F* molecules, but still with an excess of F* above the equilibrium concentration, and some more work could be extracted.
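The single-bath optimum quoted above can be checked by maximising the extracted work over ∆G numerically. This is a minimal sketch in units of k_B T; the grid bounds and resolution are arbitrary choices, not part of the model.

```python
import numpy as np

beta = 1.0  # work in units of k_B T

def work_single_bath(dG):
    """Average work delivered to a single F/F* bath with chemical
    potential difference dG = mu_F - mu_F* < 0, when an input starting
    in X* relaxes to equilibrium with the bath. On average
    1/(1 + exp(-beta*dG)) F molecules are converted to F*, each raising
    the bath free energy by -dG."""
    return -dG / (1.0 + np.exp(-beta * dG))

# Scan dG over a grid of candidate values and locate the maximum.
dG = np.linspace(-10.0, 0.0, 100001)
W = work_single_bath(dG)
i = np.argmax(W)

# Maximum work ~0.28 k_B T, attained near dG ~ -1.28 k_B T,
# well below the k_B T ln 2 ~ 0.693 available in the input.
print(W[i], dG[i])
```

At the optimum the stationarity condition 1 + e^{−β∆G} = −β∆G e^{−β∆G} holds, which the grid search reproduces to the grid resolution.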
If the input molecule is connected to two successive baths with a non-infinitesimal difference in fuel concentrations, then the input molecule undergoes a thermodynamically irreversible relaxation, with some fraction of the free energy being wasted. However, if we take this idea of connecting the input molecule to successive baths with lower ∆G = µ_F − µ_F* to the limit of a continuous change in ∆G, we obtain a quasistatic process with no irreversible relaxations: the system is at equilibrium with the bath(s) at all points in time. This protocol is achieved by connecting the reaction volume to a large number of baths in succession, each for long enough to reach equilibrium with that bath, as shown in figure 1(b). There is only a small change in the concentration of fuel molecules between successive baths. Therefore, in the limit of infinitely many baths and infinitesimal changes in concentration, the reaction volume experiences a quasistatic change in the concentrations of the fuel molecules.
The specific protocol of fuel molecule concentrations, illustrated in figure 1(b), is as follows. Initially [F] = [F*] = 0, so reaction 8 cannot occur. Then [F*] is slowly increased up to an appreciable value that we name f*. The concentration must be increased slowly so that fuel molecules are not irreversibly transferred between different buffers via the reaction volume. The reaction in equation 8 still cannot occur, since only X* and F* are present. Then, [F] is slowly increased. Now reaction 8 can occur; although, initially, the rate of converting X* to X is much slower than the reverse, so the input molecule remains in state X* with high probability. [F] is increased up to f, the concentration at which the free energy change of reaction 8 is ∆G = 0, so that the X and X* states are equally likely.
To calculate the average work extracted in this quasistatic process, we consider the increase in free energy of the F/F* baths. Let the probability of the input molecule occupying state X*, when equilibrated with a buffer with a chemical potential difference ∆G, be p_∆G(X*). A change in the chemical potential difference from ∆G − δ∆G to ∆G is then associated with a probability change of p_∆G(X*) − p_{∆G−δ∆G}(X*) ≈ [dp_∆G(X*)/d∆G] δ∆G. This change is also equal to the average number of F* molecules converted to F molecules when the reaction volume is exposed to the new buffer. Therefore, the free energy of the bath increases by ∆G [dp_∆G(X*)/d∆G] δ∆G on average. Taking the limit of infinitely many baths, the total work done is

W = ∫_{−∞}^{0} ∆G [dp_∆G(X*)/d∆G] d∆G = k_B T ln 2,

where we integrate by parts and recall that p_∆G(X*) = e^{−β∆G}/(1 + e^{−β∆G}). The quasistatic protocol is therefore able to recover all of the free energy stored in the initial low entropy state, k_B T ln 2, as work. In performing this calculation, we have ignored external costs associated with generating the quasistatic protocol, for the reasons outlined in section II B 2. The transfer of molecules between adjacent buffers, mediated by the reaction volume, has a cost that tends to zero as the concentration difference between buffers tends to zero. With the basic approach to setting up and analysing our machines explained, we can discuss specific measurement and feedback processes.

III. RESULTS

A. A biochemical Szilard engine
Before analysing structured environments, we first present a measurement-and-feedback device that acts on a single binary input. This simpler setting allows us to illustrate the explicit measurement-and-feedback cycle that will underlie all the devices in this work. In particular, we demonstrate a mechanism by which the input is first able to influence the state of a memory, and subsequently the influence is reversed so that the state of the memory affects how work is extracted from the input.
Our device is an exact and explicit biochemical formulation of the Szilard engine [12]. Szilard used this thought experiment to argue against the possibility of an observer violating the second law by measuring a system's equilibrium fluctuations and subsequently using feedback to exploit them. Szilard explained that any exploitation required an 'ominous coupling' between the measured system and the system that performs the feedback-a correlation that persists beyond the physical decoupling of the two degrees of freedom. He argued that such a 'measurement' cannot be performed without a 'compensation' that preserves the second law. Although Szilard was able to analyse explicit mechanisms for both the measurement and exploitation separately, he did not analyse a full cycle of an explicit device.
The biochemical Szilard engine consists of an input molecule, a memory molecule, and chemical fuel buffers that are used to supply or recover chemical work. The input molecule is in one of two states: X or X*. For simplicity, we assume the states have equivalent intrinsic free energy, and that the system is in equilibrium: the molecule is then found in each state with probability 1/2. The memory molecule also has two states with equivalent intrinsic free energy, and is initially in state M_0 with probability 1/2 and in state M_1 with probability 1/2. To 'measure' the state of the input means to set the state of the memory to M_0 if the input is X or to M_1 if the input is X*: we correlate the states. This step follows the optimal copy protocol in [20] and can be done using the chemical reactions

X + M_1 + F_1 ⇌ X + M_0 + F*_1,
X* + M_0 + F_2 ⇌ X* + M_1 + F*_2, (10)

where F_1, F*_1, F_2 and F*_2 are fuel molecules that are present in excess, and X and X* act as catalysts for the transformation of M between its states. Interconversions other than via the catalytic reactions in equation 10 are assumed to be so slow as to be negligible.
FIG. 1. (a) One input molecule is moved from its box to the reaction volume by the hook. The reaction volume is then connected to a series of buffers in succession containing different concentrations of fuel molecules, as in (b), in order to measure and extract work from the input molecule. The hook then moves the input molecule from the reaction volume back to its original box, and the next input molecule can be moved to the reaction volume. The dashed line separates the input/environment from the machine. The control can operate multiple replicas of the system simultaneously. (b) Fuel manipulation protocol to extract maximal work for a known input state. The concentration of the fuel molecules in the reaction volume is set by connecting the reaction volume to a chemical buffer. The concentrations can be gradually changed by connecting the reaction volume to a series of buffers with a small change in concentration between adjacent buffers. (c) Fuel manipulation protocol for the biochemical implementation of the Szilard engine. In the first stage the concentrations of fuel molecules are changed to set the memory molecule to M_0 if the input molecule is X and M_1 if the input molecule is X*. In the second stage work is extracted from the correlation between the input molecule and the memory molecule.

The selective catalysis in equation 10 is an approximation of the behaviour demonstrated by bi-functional kinases in cell signalling networks [29], and can also be engineered from nucleic acid networks (see appendix E for details). The free energy changes of the reactions and the reaction rates can be controlled by the concentrations of the fuel molecules, as in the simple example in section II C. It would be possible to set the memory molecule M to the correct state by directly coupling to a buffer with a large chemical potential difference favouring the target state. As in section II C, however, the associated process would be thermodynamically irreversible, wasting the ability of the fuel buffer to do useful work. We therefore change the fuel concentrations quasistatically, as illustrated in figure 1(c), gradually forcing the memory to the M_0 state when in the presence of X, and to the M_1 state in the presence of X*. Initially, [F_1], [F*_1], [F_2] and [F*_2] are all set to zero. The reactions in equation 10 therefore cannot occur. Then, the concentrations are simultaneously increased at a fixed ratio that maintains an overall free energy change of zero for the reactions in equation 10. One of these interconversions (determined by whether an X or an X* is present) now occurs at an appreciable rate, but forward reactions exactly balance reverse reactions, so there is no overall change in the probabilities of observing M_0 and M_1.
Next, the fuel concentrations are slowly decreased to zero, with the ratios adjusted so that the memory is quasistatically driven towards M_0 in the presence of X and towards M_1 in the presence of X*. Once the fuel concentrations reach zero, the reactions in equation 10 again cannot occur, so the memory molecule is fixed: M_0 if the input is X and M_1 if the input is X*.
In this correlated state the entropy of the combined (X, M) system is k_B ln 2, because there are two equally likely states: (X, M) = (X*, M_1) or (X, M_0). Prior to measurement the entropy was k_B ln 4, because all four combinations of X and M were equally likely. Thus the entropy of the system has decreased by k_B ln 2, and so the free energy of the system has increased by k_B T ln 2.
The increase in free energy of (X, M) is compensated by a decrease in the free energy of the buffers. This decrease can be calculated as in section II C, except with the limits on the integral reversed and considering two equally likely possibilities: either the input molecule was X and the concentrations of F_1 and F*_1 are changed by the first reaction in equation 10, or the input molecule was X* and the concentrations of F_2 and F*_2 are changed by the second reaction in equation 10. The result is that the free energy change of the buffers is −k_B T ln 2, which exactly cancels the free energy increase of (X, M), as it must, because the process is thermodynamically reversible (see appendix B 1 for more details on this calculation). This reduction in the free energy of the buffers is the 'cost' of measurement that was recognised by Szilard as the resolution of the Maxwell's demon paradox [12].
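The bookkeeping for the measurement step can be summarised in a few lines. In this sketch k_B = T = 1; the −k_B T ln 2 buffer cost is inserted by hand from the quasistatic calculation in the text (appendix B 1), not recomputed.

```python
import numpy as np

kT = 1.0  # k_B T = 1

def H(p):
    """Shannon entropy in nats of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Before measurement: X and M are independent and unbiased, so the
# four joint states are equally likely.
H_before = H([0.25, 0.25, 0.25, 0.25])  # = ln 4

# After measurement: perfect correlation leaves two equally likely
# joint states, (X, M_0) and (X*, M_1).
H_after = H([0.5, 0.5])                 # = ln 2

# Intrinsic free energies are equal, so dF is purely entropic.
dF_system = -kT * (H_after - H_before)  # free energy of (X, M) rises
dF_buffers = -kT * np.log(2)            # quasistatic measurement cost

# +ln 2 and -ln 2: the changes cancel, as required for a
# thermodynamically reversible measurement.
print(dF_system, dF_buffers)
```

The same ledger run in reverse describes the feedback step, which is why the full cycle on an equilibrium input nets zero work.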
We now consider the feedback step. The device extracts chemical work from the correlated state by allowing the input molecule to evolve in a manner that reflects the outcome of the measurement. The machine uses the reactions

M_0 + X + F_3 ⇌ M_0 + X* + F*_3,
M_1 + X* + F_4 ⇌ M_1 + X + F*_4,

in which F_3, F*_3, F_4 and F*_4 are further fuel molecules. Now, M_0 and M_1 act as catalysts for the transformation of X between its states; non-catalysed reactions are again assumed to be impossible. M and X must therefore be mutual bifunctional catalysts, which can be effectively switched on and off by modulating fuel concentrations. This explicit rendering demonstrates the complexity necessary in even a minimal measurement-and-feedback device such as Szilard's engine, in which the memory and input must reverse their roles as the determinants of the dynamics. A design based on DNA strand displacement [30-32] is presented in appendix E.
As in the measurement step, the reaction rates are slowly manipulated by coupling to buffers with different concentrations of fuel molecules. As with the measurement, we can calculate the change in free energy of the input-and-memory-molecule system and the chemical work done by the chemical fuel buffers. This extraction step is essentially the reverse of the measurement step, so the free energy of the input-and-memory-molecule system decreases by k_B T ln 2 while the free energy of the buffers simultaneously increases by k_B T ln 2 (see appendix B 2 for more details on this calculation).
At the end of the cycle, both the memory molecule and the input have been returned to unbiased and statistically uncorrelated states. Chemical free energy has been transferred from buffers 1 and 2 to buffers 3 and 4. The net chemical work extracted is zero, since the k_B T ln 2 cost of measurement balances the work extracted. This is, of course, expected: extracting work from the initially equilibrated input should be impossible. However, the basic design will underpin that of devices intended to exploit structured environments and recover net positive work.
We note, in passing, two instructive features of our explicitly-described biochemical Szilard engine. Firstly, there is no need for an 'erase' step to reset the memory to a specific state [33]. Whilst it would be possible to include such a reset, it is not necessary, either for efficient operation or to preserve the second law of thermodynamics. The second law is preserved simply by the 'ominous' nature of the non-equilibrium correlations originally identified by Szilard. Secondly, the measurement is simply the act of setting the engine into the correct state to exploit the input (setting the memory to M_0 or M_1). There is no need for any other system, intelligent or otherwise, to record or be aware of the outcome of the measurement. In the context of the typical one-particle-gas description of Szilard's engine [12], the measurement is simply the correlation of the pulley and particle positions. Any additional recording of the particle position (for example in the brain of an intelligent being) corresponds to a useless extra correlation or measurement, with associated costs that must be carefully recovered at a later time to reach 100% efficiency.

B. Exploiting a series of correlated bits
Although the Szilard engine cannot extract useful work from its equilibrium input, it forms the basis of a device for exploiting a series of identical biochemical bits labelled with the index i, whose correlated states, described by the random variables {X i }, are generated by a stationary stochastic process. The random variable X i has the possible outcomes of X or X * . We consider the series to be infinite in both directions. As with the Szilard engine in section III A, both states of the input bits are assumed to be equally intrinsically stable, and separate bits do not interact (they are in different boxes in the language of figure 1(a)). The equilibrium distribution of the inputs is, then, for each molecule to be independently distributed uniformly between its two states.
Free energy is stored in the input array if either an initial bias towards X or X* is present, and/or correlations exist between X_i and X_j for i ≠ j. Since designing a system to exploit an intrinsic bias is simple, and requires no measurement or inference (see section II C), we focus exclusively on the case in which the marginalised probability of each bit occupying either state is 1/2.

Bounds on work extraction
The free energy per bit stored in such an array, and hence the available work per bit, is determined by the difference between the equilibrium Shannon entropy per bit of ln 2 and the entropy rate h [5], where h = lim_{n→∞} H(X_n | X_{n−1}, …, X_1). The available work per bit is then

W_available = k_B T (ln 2 − h).   (12)

An array of N bits has a state space of size 2^N. For an array with arbitrary correlations, an operation must be 'globally integrated' across all N bits to fully extract W_available [34]. Even if a system were able to achieve this integration by coupling to all bits in an array simultaneously, extracting the full available work would be highly non-trivial. In practice, the protocol would need to be tuned to the expected initial occupancy of each of the 2^N states to avoid losses.
The opposite limit to a device that interacts with the entire input at once is a device that interacts with each bit separately and independently. However, such a device can only extract the free energy stored in the state X_i, F_x(X_i), having marginalised over all other X_{j≠i}. In our setting F_x(X_i) = F_x^eq, and thus no work can be extracted. The correlations are wasted, and a 'modularity cost' is incurred because before the work extraction there is mutual information between X_i and later input states, but after the work extraction that mutual information is zero [34].
Let us consider a simple extension to the independent-bit device that is interpretable and offers the potential of extracting at least some of the stored work whilst retaining limited complexity. We still manipulate input bits individually, but allow for a memory that maintains its state when the device moves to the next subsystem. This memory permits some of the free energy stored in correlations between successive inputs to be exploited. We now derive a bound on work extraction by this method.
Consider two adjacent input bits labelled i and i+1, and the memory system. The initial state of the ith bit is the random variable X_i, which can take two values: X or X*. During the interaction of the memory system with the ith bit, X_i is both measured and recorded in the memory as the state M_i, and work is extracted from the ith bit as it relaxes to a state X_i^final. We are now concerned with the work that can subsequently be extracted from the (i+1)th bit following the same procedure, given the correlations between M_i and X_{i+1} induced by the measurement.
Let F_joint(X_{i+1}, M_i) be the free energy of the joint system consisting of the (i+1)th bit and the memory system when in states X_{i+1} and M_i respectively. Before and after the coupling of the (i+1)th bit and the memory system there is no direct interaction between the two subsystems, and hence the free energy can be written as the sum of individual contributions, calculated using marginalised probabilities, and an informational term arising from the correlation between X_{i+1} and M_i, as in equation 4 [1]. Prior to measurement, we have

F_joint(X_{i+1}, M_i) = F_x(X_{i+1}) + F_m(M_i) + k_B T I(X_{i+1}; M_i).

After the interaction window, we have

F_joint(X_{i+1}^final, M_{i+1}) = F_x(X_{i+1}^final) + F_m(M_{i+1}) + k_B T I(X_{i+1}^final; M_{i+1}).

The work extracted by any process operating between these start and end points is bounded by

W ≤ F_joint(X_{i+1}, M_i) − F_joint(X_{i+1}^final, M_{i+1}).

If the process that produces the inputs is stationary and the measurement protocol is the same each time, F_m(M_i) = F_m(M_{i+1}), and these terms cancel. Invoking the positivity of the mutual information [15], we see that the available work is maximal when X_{i+1}^final also follows an equilibrium distribution, and the extraction process fully decorrelates the input from the memory (I(X_{i+1}^final; M_{i+1}) = 0). Thus the work extracted per input bit is bounded by

W ≤ k_B T I(X_{i+1}; M_i).   (17)

A system that does not make use of a memory, such as the setup for directly exploiting biased inputs discussed in section II C, would therefore extract no work.
The value of the mutual information in equation 17 depends on the details of the measurement process. The state of the memory system, M_i, only depends on the state of the next input, X_{i+1}, through the previous input state, X_i, so by the data processing inequality the maximum work that can be extracted is

W_single^max = k_B T I(X_i; X_{i+1}).   (18)

This work is, of course, not greater than the available work in the input. The input is stationary, so it is possible to write the entropy rate as [15]

h = H(X_{i+1} | X_i, X_{i−1}, …),   (19)

and if we use the fact that the conditional entropy is not increased when conditioning on additional variables, then

W_single^max = k_B T [ln 2 − H(X_{i+1} | X_i)] ≤ k_B T (ln 2 − h) = W_available.   (20)

These results are a special case of the 'modularity cost' outlined in [34].
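The data processing step above can be checked numerically. As a small sanity check (not part of the original derivation), the sketch below models the memory as a noisy copy of X_i with a hypothetical error probability e, and confirms that I(X_{i+1}; M_i) never exceeds I(X_{i+1}; X_i):

```python
import math

def h_bin(p):
    """Binary Shannon entropy in nats."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log(p) - (1 - p) * math.log(1 - p)

def chain_informations(q, e):
    """For an unbiased binary Markov chain whose state flips with
    probability q, and a memory M_i that is a copy of X_i corrupted with
    a (hypothetical) error probability e, return the pair
    (I(X_{i+1}; X_i), I(X_{i+1}; M_i)) in nats.
    All marginals are unbiased, so each I = ln 2 minus a conditional entropy."""
    i_xx = math.log(2) - h_bin(q)
    # M_i -> X_i -> X_{i+1} composes two binary symmetric channels,
    # giving an effective flip probability e(1-q) + (1-e)q.
    i_xm = math.log(2) - h_bin(e * (1 - q) + (1 - e) * q)
    return i_xx, i_xm
```

For any q and e the second value is no larger than the first, with equality only for a perfect measurement (e = 0), consistent with equation 18.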
Single-bit memory devices are therefore constrained by the amount of information they carry forward to the next bit in the chain. Note that carrying this information forward is not sufficient; it must also be used effectively during the interaction window. One might assume that there is an inherent trade-off between updating the memory to be the best possible predictor of the next bit, and using the memory to make the extraction of work from the current bit as efficient as possible. We will now explore this potential trade-off, and these bounds on work extraction more generally, in the context of two distinct devices in two different types of environment.

Exploiting a Markovian input
We first consider the case in which the binary input is Markovian. That is, the probability distribution of the state of each input molecule depends only on the state of the previous molecule. Since we consider processes with no bias towards either state, this is a one-parameter model, specified by the probability of the state changing from one input to the next. The entropy of a series of n Markovian random variables is

H(X_1, …, X_n) = H(X_1) + Σ_{i=2}^{n} H(X_i | X_{i−1}),   (21)

in which we have first used the chain rule for conditional entropies [15] and second the Markov property. Therefore, from equation 12, the available work if the Markov chain is stationary is

W_available^Markov = k_B T [ln 2 − H(X_{i+1} | X_i)] = k_B T I(X_i; X_{i+1}).   (22)

Comparing equations 18 and 22, we see that the maximum available work for a single-bit memory is equal to the full available work in a Markovian environment: W_single^max = W_available^Markov. We now outline a device that extracts all of this work, both achieving the required measurement accuracy H(X_i | M_i) = 0 and using this measurement to extract all of W_single^max for each bit. We first note that any update of the memory from M_{i−1} to M_i must occur before the ith bit is allowed to evolve. Thermodynamically efficient manipulation of the ith bit requires that any protocol is quasistatic, with the X ⇌ X* reactions reaching equilibrium with respect to the control faster than the control is updated. Thus, as soon as X ⇌ X* transitions are allowed by the control, all memory of the previous state is necessarily forgotten, and subsequent updates of the memory using the initial value of X_i are impossible.
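For concreteness, the available work in equation 22 can be evaluated as a function of the model's single parameter; a minimal sketch in units of k_B T (function names are our own):

```python
import math

def h_bin(p):
    """Binary Shannon entropy in nats."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log(p) - (1 - p) * math.log(1 - p)

def markov_available_work(q):
    """W_available per input, in units of k_B T, for an unbiased binary
    Markov chain whose state changes with probability q between inputs.
    Here H(X_{i+1}|X_i) = h_bin(q), so W = ln 2 - h_bin(q) (equation 22)."""
    return math.log(2) - h_bin(q)
```

A perfectly predictable chain (q = 0 or 1) stores the full k_B T ln 2 per bit, while an uncorrelated chain (q = 1/2) stores nothing.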
At first glance it might then seem impossible to extract all the work stored in this setting. We must apparently pay to update the memory from M i−1 to M i using input X i to carry information forward, before we are able to use the memory to exploit X i . The recent result of [35] does not preclude the possibility of extracting all the stored work, but leaves open the possibility that a single additional 'hidden' state might be required to circumvent this apparent problem (see appendix C).
In fact, no additional states are required. The solution is to use the information carried forward from the measurement of the previous input, I(X_i; M_{i−1}), to make an extremely cheap and faithful measurement of X_i (recorded as M_i), and then to use that measurement to extract k_B T ln 2 of work from the relaxation of the ith bit, exactly as in the Szilard engine of section III A. An overview of this process is shown in figure 2.
FIG. 2. Schematic diagram of an efficient measurement and feedback process. The machine receives an input molecule characterised by the random variable X_i; the memory molecule is initially characterised by the random variable M_{i−1}. Then the memory variable is set to M_i, which is a measurement of X_i. The state M_{i−1} does not affect the state M_i, but the fact that M_{i−1} and X_i are correlated means that the measurement can be taken cheaply. Next, the correlation between M_i and X_i is used to extract work when changing the state of the ith input molecule from X_i to X_i^final, which is in an equilibrium distribution. Then that input molecule is returned to its box, the next input molecule, with state X_{i+1}, is moved to the reaction volume, and the process repeats.
First the new input is copied to the memory. This is done using the same chemical reactions as in the measurement step of the biochemical Szilard engine of section III A (equation 10). The only difference from the biochemical Szilard engine is that the initial state of the molecules is now different. It is still the case that the input and memory molecules are each, when treated in isolation, equally likely to be in either of their states. Now, however, the states of the two molecules are correlated, since the memory molecule has been set using the state of the previous input molecule, X_{i−1}. A different measurement protocol is therefore needed to make an optimal (reversible) measurement. Instead of starting from a chemical potential difference ∆G_1 = 0 for the fuels, we must start with either F_1 or F_1* in excess, so that the equilibrium distribution dictated by this buffer matches the actual biased probability distribution of the memory molecule given that the input molecule is X. Similarly, either F_2 or F_2* must be in excess, so that the equilibrium distribution dictated by this buffer matches the biased probability distribution of the memory molecule given that the input molecule is X*.
The ideal protocol therefore proceeds as follows. Initially, as in the biochemical Szilard engine, the reactions catalysed by whichever of X or X* is present occur at an appreciable rate, but forward reactions exactly balance reverse reactions, so there is no overall change in the probability of observing M_0 and M_1. If there is no overall bias towards X or X*, then ∆G_1 = −∆G_2 = ∆G_offset by symmetry. We have used the term '∆G_offset' because the chemical potential difference has been 'offset' from zero, which is the value it would take if successive input molecules were uncorrelated. The rest of the protocol is the same as for the measurement step of the biochemical Szilard engine. The work done by the chemical fuel baths to make this measurement is once more calculated as in section II C, but with different limits on the integral due to the different ∆G_offset. As shown in appendix D 1, the work done is exactly k_B T H(M_{i−1}|X_i) = k_B T H(X_{i−1}|X_i), as expected from the change in entropy of the joint input-and-memory-molecule system. Now that the state of the memory molecule has been updated so that M_i perfectly reflects X_i, k_B T ln 2 of work is extracted in exactly the same way as in the biochemical Szilard engine. The net work extracted per input molecule is thus

W = k_B T [ln 2 − H(X_{i−1}|X_i)] = k_B T I(X_{i−1}; X_i) = k_B T I(X_i; X_{i+1}),

which is all the available work in a stationary Markovian input, as in equation 22. This machine has 100% efficiency and there is no irreversible dissipation.
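The bookkeeping above can be checked by simulation. The sketch below (illustrative only, units of k_B T, names are ours) estimates H(X_{i−1}|X_i) from the empirical pair frequencies of a simulated chain, so that ln 2 minus the estimate can be compared with the analytic mutual information:

```python
import math
import random

def estimated_net_work(q, n_steps=200_000, seed=1):
    """Monte-Carlo estimate of k_B T [ln 2 - H(X_{i-1}|X_i)] (units of
    k_B T) for an unbiased binary Markov chain with flip probability q,
    using empirical pair frequencies."""
    rng = random.Random(seed)
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    x = 0
    for _ in range(n_steps):
        x_next = x ^ (rng.random() < q)  # flip with probability q
        counts[(x, x_next)] += 1
        x = x_next
    h_cond = 0.0  # empirical H(X_{i-1} | X_i)
    for b in (0, 1):
        p_b = (counts[(0, b)] + counts[(1, b)]) / n_steps
        for a in (0, 1):
            p_ab = counts[(a, b)] / n_steps
            if p_ab > 0:
                h_cond -= p_ab * math.log(p_ab / p_b)
    return math.log(2) - h_cond
```

For q = 0.2 this agrees with the analytic value k_B T [ln 2 − H_b(0.2)] to within sampling error, and it vanishes for an uncorrelated chain (q = 0.5).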
It is therefore possible for a machine with a two-state memory that is well calibrated for this Markovian environment, with the correct initial ∆G_offset in chemical potentials to reflect the nearest-neighbour correlations in the Markov chain, to extract all of the available work. How a machine might obtain this optimal offset parameter, either via design or some form of evolution (to effectively infer the one parameter specifying the Markov process), is beyond the scope of this paper. We note that such a machine faces no trade-off between exploiting and measuring X_i; the exact measurement of X_i both carries the maximal information forward and enables its full exploitation.

Exploiting a non-Markovian input
In a Markovian environment, if a machine measures the state of an input molecule it knows everything it could about the distribution of the next input. A more complex environment might have correlations that are not fully-described by those of adjacent inputs. In particular, we might imagine an environment with a hidden state S i that influences the probability of X i ; as the hidden state changes, the device moves between regions in which the apparent environmental bias is different. The machine's challenge then becomes a more obvious inference task: to infer the overall state of the environment, and to accordingly exploit the inputs.
Specifically, we will consider a hidden state S_i ∈ {0, 1}. When moving from one input molecule to the next, the hidden state has a probability k of changing. Conditioned on the hidden state, each input molecule is an independent Bernoulli random variable: the probability of an X* molecule is α if S_i = 0 and 1 − α if S_i = 1. Some example sequences produced by this process are shown in figure 3(a). Due to the overall symmetry of the process, X_i = X and X_i = X* are equally likely having marginalised over all inputs j ≠ i. Thus, as in section III B 2, no free energy is stored in the state of single molecules, only in the correlations between different molecules. The available work that can be extracted per input molecule is plotted against the parameters k and α in figure 3(b). Hidden states that either reliably persist (k → 0) or reliably switch (k → 1), and which produce a predictable output (α → 0, 1), lead to the most free energy stored in the environment.
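The environment just described is straightforward to simulate; a minimal sketch (function name and encoding are our own) that generates input sequences like those of figure 3(a):

```python
import random

def hmm_inputs(n, k, alpha, seed=0):
    """Sample n input states from the hidden-Markov environment: the
    hidden bit s flips with probability k between inputs, and each input
    is X* with probability alpha if s == 0, else 1 - alpha.
    Returns a list of 0/1 values (1 encoding X*).  Illustrative sketch."""
    rng = random.Random(seed)
    s = rng.randint(0, 1)
    seq = []
    for _ in range(n):
        p_star = alpha if s == 0 else 1.0 - alpha
        seq.append(1 if rng.random() < p_star else 0)
        s ^= rng.random() < k  # hidden state flips with probability k
    return seq
```

By symmetry, the marginal frequency of X* in a long run is 1/2 regardless of k and α, so single-molecule statistics alone reveal no exploitable structure.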
Given the history of the inputs {X_{j<i}}, the optimal statistical prediction of the next input X_i can be made via the forward algorithm [36]. A machine capable of both iterating the forward algorithm at each step, and using the previous value to optimally exploit the current input, would be able to extract the full W_available. However, implementing the forward algorithm is impossible for our machine with a single bit of memory that can make only a binary 'decision' during its feedback. For a hidden Markov process, the conditional probability distribution of the next input molecule given the entire history of the input is different for every possible state of the history. Equivalently, the process {X_i} cannot be described by a finite-state machine [37], and thus the forward algorithm requires a memory that is a real number, and the exploitation step would need to depend continuously on this real number.
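For reference, the forward algorithm for this two-hidden-state environment takes only a few lines; its sufficient statistic, the belief b, is a real number, which is exactly what a single-bit memory cannot store. A sketch with our own naming:

```python
def forward_predict(seq, k, alpha):
    """Forward algorithm for the two-hidden-state environment: returns,
    for each position i, P(X_i = X* | X_0 .. X_{i-1}) given hidden-state
    flip probability k and emission parameter alpha.  The belief
    b = P(S_i = 0 | history) is a real number.  Illustrative sketch."""
    b = 0.5                      # uniform prior over the hidden state
    preds = []
    for x in seq:
        # predict the next input from the current belief
        preds.append(b * alpha + (1 - b) * (1 - alpha))
        # update: condition on the observed input ...
        like0 = alpha if x == 1 else 1 - alpha
        like1 = (1 - alpha) if x == 1 else alpha
        b = b * like0 / (b * like0 + (1 - b) * like1)
        # ... then propagate the hidden-state dynamics
        b = b * (1 - k) + (1 - b) * k
    return preds
```

With α = 0.5 the inputs carry no information about the hidden state and every prediction stays at 1/2, while with α = 0 and k = 0 a single observation pins the hidden state down exactly.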
It might be tempting to think that a simpler alternative to the forward algorithm, in which the memory variable M_i is set based on both the current input variable X_i and its previous value M_{i−1}, would give better predictions by allowing the machine to take in more historical information at each step. Such an approach would represent a trade-off, with maximal information carried forward, I(X_{i+1}; M_i), being obtained only at the expense of an increased uncertainty H(X_i | M_i) in the state of the current input after the measurement. Whether or not the reduced measurement cost could compensate for the reduction in work obtained during the extraction step is moot, however, since such a strategy is impossible, at least in the quasistatic setting. One cannot update the memory from M_{i−1} to M_i quasistatically in a way such that I(M_i; M_{i−1}) ≠ 0 without access to additional hidden memory states [35, 38]. All information on initial conditions is necessarily lost immediately when a degree of freedom evolves under a quasistatic process. Thus, in the quasistatic setting at least, our single-bit memory cannot trade off the accuracy of measurement of the current input against the information carried forward.

Markov machines in non-Markovian environments
With the above limitations in mind, we first ask how well the Markov machines considered in section III B 2, which are limited to interacting with one bit at a time and carry only one bit of memory forward, function in the non-Markovian environment specified. For a perfect measurement of each bit, such that H(X_i|M_i) = 0, the expected work extracted per molecule for a quasistatically-operated device still follows from equation 18, but now W_single^max < W_available, since there is additional information in long-range correlations that is not captured by the information between nearest neighbours. The machine therefore has efficiency η = W_single^max / W_available < 1 and irreversibly generates entropy

∆S = (W_available − W_single^max) / T

per input molecule. This efficiency, η, of Markov machines acting on a hidden Markov model input is plotted in figure 4(a). In making these plots, we first identify the optimal Markov machine offset parameter at each α and k, and then calculate the efficiency of that device, once again assuming that the machine's parameters are optimised to the statistical properties of its environment (perhaps through evolution). It is notable that the Markov machines perform reasonably well in these environments, except when k → 0 or 1 while α is away from 0 and 1. In these environments the hidden state behaves predictably, so correlations are long-ranged, but X_i fluctuates considerably within a given hidden state, effectively fooling the Markov machine, which can only predict X_i based on X_{i−1}.
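The numerator of this efficiency, W_single^max = k_B T I(X_i; X_{i+1}), can be computed exactly for the hidden-Markov environment by enumerating the hidden-state transitions. A sketch in units of k_B T (function names are ours):

```python
import math

def h_bin(p):
    """Binary Shannon entropy in nats."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log(p) - (1 - p) * math.log(1 - p)

def markov_machine_bound(k, alpha):
    """k_B T I(X_i; X_{i+1}) for the hidden-Markov environment:
    the hidden bit flips with probability k; P(X = X* | S=0) = alpha and
    P(X = X* | S=1) = 1 - alpha.  Both marginals are unbiased and the
    flip probability is the same from either input state, so
    I = ln 2 - h_bin(P(X_{i+1} != X_i))."""
    p_x = {0: alpha, 1: 1.0 - alpha}  # P(X = X* | S)
    p_flip = 0.0
    for s in (0, 1):
        for s_next in (0, 1):
            p_ss = 0.5 * (k if s_next != s else 1.0 - k)
            a, b = p_x[s], p_x[s_next]
            p_flip += p_ss * (a * (1 - b) + (1 - a) * b)
    return math.log(2) - h_bin(p_flip)
```

The bound vanishes for α = 0.5 (pure noise) and approaches k_B T ln 2 for a persistent, deterministic environment (k → 0, α → 0 or 1), matching the qualitative behaviour described above.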
The behaviour of the Markov machine can be related to that of the Kalman filter [39], an algorithm for making real-time predictions of the state of a noisy dynamical system with noisy measurements of the system's state. The relative weight put on previous measurements versus the most recent input is a parameter that can be adjusted, and it is well known that high intrinsic noise implies that the current measurement should be weighted strongly, whereas high measurement noise calls for greater emphasis on the previous measurements. The Markov machine is effectively constrained to put all of its emphasis on the most recent measurement; it therefore functions better when the 'intrinsic' noise of the hidden state is relatively high (k ∼ 0.5 and α ∼ 0, 1), and worse when the 'measurement' noise of the inputs is relatively large (k ∼ 0, 1, α ∼ 0.5).

Batch averaging machines in non-Markovian environments
We now ask whether a more sophisticated strategy, still involving only a single memory molecule and a single binary decision, can overcome this weakness of the Markov machine. If we consider the region where k ≈ 0, it is likely that a run of multiple input molecules will be produced by the same hidden state. Inspired by our analogy with the Kalman filter, we look for a mechanism that considers multiple input molecules together to provide more reliable information about the hidden state, allowing more efficient work extraction. Indeed, in the context of cellular sensing of the concentration of external ligands [11,40], it has been observed that averaging approaches can be beneficial when correlation times in the environment are long [41].
We therefore introduce the batch machine, illustrated in figure 3(c), which is similar to the Markov machine except that it interacts with (i.e. measures and exploits) a batch of multiple molecules simultaneously, rather than just one. An N -batch machine operates by: (a) transferring N inputs to the reaction volume (with no work cost-see appendix A); (b) performing an operation to set a memory based on these N inputs (for a low work cost because the state of the batch is correlated with the state of the memory, which is set based on the state of the previous batch); (c) exploiting the N inputs simultaneously using the memory; and (d) transferring the N inputs back to their array in a random order.
We will first consider a 'binary' machine that, like the Markov machine, has only two measurement reactions and two work extraction reactions. Let J i be the random variable representing whether the number of X * molecules in batch i is greater than half the batch-size, N/2 (J i = 1 if true, 0 otherwise). The machine performs measurement of batch i by setting the memory molecule to M i = M 0 if J i = 0, and to M i = M 1 if J i = 1; we note that other binary measurement choices are possible, but this simple one serves to illustrate the possibilities of a more complex inference strategy. The machine then exploits the imbalance of inputs in the same way that the Markov machine exploits a measured X i = X or X i = X * , by allowing the inputs to relax to an unbiased distribution whilst transferring free energy to chemical buffers. For N = 1, the binary batch machine is identical to the Markov machine of section III B 2; for N > 1 the initial measurement essentially performs an average over N inputs to set its memory. In the limit N → ∞, the batch machine interacts with all molecules at once. However, with only two possible measurement states (and hence two possible work extraction strategies), this limit is generally inefficient.
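To see what this majority measurement actually learns, one can compute the probabilities p̄_0 and p̄_1 that a molecule in the batch is X*, conditioned on the outcome J. The sketch below (our own naming) makes the simplifying assumption that the hidden state is constant across the batch, so the X* count is Binomial(N, p):

```python
from math import comb

def batch_conditionals(n_batch, p_star):
    """Conditional probabilities (pbar_0, pbar_1) that a molecule in an
    N-molecule batch is X*, given the majority outcome J (J = 1 iff the
    X* count m exceeds N/2).  Simplifying assumption: the hidden state
    is fixed across the batch, so m ~ Binomial(N, p_star)."""
    p_j = [0.0, 0.0]      # P(J = j)
    mean_m = [0.0, 0.0]   # E[m ; J = j]
    for m in range(n_batch + 1):
        p_m = comb(n_batch, m) * p_star**m * (1 - p_star)**(n_batch - m)
        j = 1 if m > n_batch / 2 else 0
        p_j[j] += p_m
        mean_m[j] += m * p_m
    return tuple(mean_m[j] / (p_j[j] * n_batch) if p_j[j] > 0 else None
                 for j in (0, 1))
```

For N = 1 this returns (0.0, 1.0), reproducing the Markov machine's exact measurement of a single molecule; for N = 3 and p = 0.3 it gives roughly (0.19, 0.71), reflecting the fact that the binary outcome J no longer pins down the batch exactly.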
The measurement can be done with the reactions of equation 27 when N is odd and of equation 28 when N is even. There is an N/2 in one of the reactions in equation 28 and an N/2 + 1 in the other because, for even N, there is an arbitrary choice as to whether to assign the state with exactly N/2 molecules of X* in the batch to M_0 or M_1. We have chosen to assign the state with N/2 X* molecules to M_0. Clearly, if there are N molecules of X and X* in total, then only one of the reactions in equations 27 and 28 can occur at once. We immediately see the price of a more complicated strategy: our reactions now require ∼ N/2 molecules to act as combined catalysts, rather than just a single input molecule (see appendix G for a DNA strand displacement design for these reactions).
In exactly the same way as in the biochemical Szilard engine and in the Markov machine, depending on whether there are more X or X* molecules in the batch, one of the measurement reactions can now occur at an appreciable rate. The initial offsets ∆G_1 and ∆G_2 allow information between batches to be exploited, and are exactly analogous to the constant offsets introduced in section III B 2. As in the Markov machine, if N is odd then ∆G_1 = −∆G_2 = ∆G_offset by symmetry. If N is even then ∆G_1 ≠ −∆G_2, because P(J_i = 0) ≠ P(J_i = 1). The chemical potential differences are then quasistatically decreased to zero. Now the reactions in equations 27 and 28 again cannot occur, and the memory molecule has been set to state M_0 if the batch contains more X molecules than X* (or equal numbers of X and X*), and to state M_1 if the batch contains more X* molecules than X. The cost of making the measurement is calculated in exactly the same way as for the Markov machine (see appendix F 1), and gives

W_meas = −k_B T H(J_{i−1} | J_i).

The negative sign represents negative work extraction. Subsequently, work is extracted from the correlated state of the measurement molecule and the batch. The binary batch machine uses the same reactions as the biochemical Szilard engine and the Markov machine to extract work (equation 31). However, the protocol must be modified, because the state of the memory molecule does not report perfectly on the state of the inputs: any number of molecules in state X* greater than N/2 corresponds to J_i = 1 and hence M_i = M_1. The extraction therefore begins from fuel chemical potential differences matched to the conditional distributions (equation 32), where p̄_0 is the probability that an input molecule in the batch is in the state X*, conditioned on J_i = 0, and p̄_1 is the corresponding probability conditioned on J_i = 1. Once the chemical potential differences have been returned to values at which the reactions in equation 31 cannot occur, the batch finally reaches an unbiased equilibrium, and during this process the free energy of the buffers is increased.
The work extracted in this step is simply N times the work extracted from one input molecule with a bias of p̄_0 if J_i = 0 or p̄_1 if J_i = 1. On average it is therefore

⟨W_extract⟩ = N k_B T [ln 2 − P(J_i = 0) H_b(p̄_0) − P(J_i = 1) H_b(p̄_1)],

where H_b is the binary Shannon entropy, and the net work extracted by the binary batch machine from one batch is ⟨W_extract⟩ + W_meas. As with the Markov machine, we can ask how the optimal batch machine (with N and the free-energy offsets of the fuel baths optimally tuned to the environmental parameters k and α) would perform. Note that, since the binary batch machine with N = 1 is a Markov machine, the optimal binary batch machine must perform at least as well as the optimal Markov machine.
The efficiency of the optimal binary batch machine is plotted for different values of α and k in figure 4(b), showing higher efficiency than the optimal Markov machine for some values of k and α as k → 0. To make this comparison clearer, in figure 4(c) we have plotted the work extracted per molecule by the binary batch machine divided by the work extracted per molecule by the Markov machine; there are two regions in which the binary batch machine extracts more work. In figure 4(d) we have plotted the optimal batch size for the binary batch machine for the different values of the parameters. For k > 0.08 the optimum batch size is always 1, so the Markov machine and the binary batch machine coincide, but for smaller values of k larger batches are frequently favoured. The optimum batch size is always odd. This is because the extraction reactions of the binary batch machine cannot extract work from a batch with equal numbers of X and X* molecules, so even batch sizes are disfavoured.
The binary batch machine delivers, at least in part, on the prospect of improving work extraction from an environment with more complexity. It is unsurprising that a long hidden state life time, k → 0, is necessary for this advantage to be manifest: the averaging strategy will clearly fare poorly when the hidden state switches rapidly. When α is close to 0 or 1 the state of the input molecule reflects the hidden state with a high probability so the string of input molecules is approximately Markovian, preventing the batch machine from finding a competitive advantage. The most subtle question, however, is why the binary batch machine does not extract more work than the Markov machine when α ≈ 0.5 and k → 0. Naïvely, this regime would seem to be ideal for the batch machine to extract work from weak, but long-lived biases towards either X or X * . From the perspective of the analogy with Kalman filters, this regime should favour the approach that considers a wide range of inputs, rather than just the most recent. To understand why this intuition fails, we consider where the thermodynamic losses occur during the operation of the binary batch machine.
Several stages of the operation of the optimal binary batch machine are thermodynamically irreversible, resulting in efficiencies η < 1. They include the point at which the memory is updated without taking into account correlations between non-neighbour batches; the point at which the batch of N input molecules is mixed within the reaction volume (figure 3(c)); and the point at which the work extraction begins using the measurement molecule. In the first process, a modularity cost is incurred. In the second, mixing causes the positional order within a batch to be lost, reducing our ability to extract work from the sequence of molecules within the batch. All that remains is a non-equilibrium distribution of the number of molecules in each state. In the third process, this non-equilibrium distribution relaxes further to a binomial distribution for the number of X* molecules, with parameter p̄_0 if J_i = 0 or p̄_1 if J_i = 1, with no work extracted on average during this relaxation, as shown in figure 5(a).
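The free energy wasted in this third, irreversible step can be quantified as k_B T times the Kullback-Leibler divergence between the conditional count distribution and the binomial it relaxes to, averaged over the measurement outcome. A sketch under the same fixed-hidden-state simplification as above (names are ours):

```python
import math
from math import comb

def binom_pmf(n, m, p):
    """Binomial probability mass function."""
    return comb(n, m) * p**m * (1 - p)**(n - m)

def relaxation_loss(n_batch, p_star):
    """Average free energy (units of k_B T) dissipated when the
    post-measurement count distribution P(m | J) relaxes irreversibly
    to Binomial(N, pbar_J) with the matched mean pbar_J:
    loss = sum_j P(J=j) * D_KL( P(m|j) || Binomial(N, pbar_j) ).
    Assumes the hidden state is fixed across the batch."""
    loss = 0.0
    for j in (0, 1):
        ms = [m for m in range(n_batch + 1) if (m > n_batch / 2) == (j == 1)]
        p_j = sum(binom_pmf(n_batch, m, p_star) for m in ms)
        if p_j == 0.0:
            continue
        cond = {m: binom_pmf(n_batch, m, p_star) / p_j for m in ms}
        p_bar = sum(m * q for m, q in cond.items()) / n_batch
        d_kl = sum(q * math.log(q / binom_pmf(n_batch, m, p_bar))
                   for m, q in cond.items() if q > 0.0)
        loss += p_j * d_kl
    return loss
```

For N = 1 the conditional distribution is already a (degenerate) binomial and the loss vanishes, consistent with the Markov machine's reversibility; for N > 1 the loss is strictly positive, which is the waste discussed below.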
We can imagine a putative 'full batch machine' that could extract all of the work available from the unordered batch, after the initial mixing and measurement. Such a machine would require additional extraction processes to which the memory could couple in each state. The optimal batch size for this full batch machine is plotted in figure 5(b). For this machine it is not the case that the optimal batch size is 1 when α ≈ 0.5. The contour plot for this machine is more in line with expectations: as k → 0, the optimal batch size increases for all values of α. We have also plotted the ratio between the work extracted by the full batch machine and the Markov machine in figure 5(c), and see that the full batch machine extracts more work than the Markov machine when k is close to 0 and α ≈ 0.5. Thus the reason that the binary batch machine fails to provide an improvement in the vicinity of α = 0.5 is, at least in part, that the free energy wasted during the simple binary work extraction mechanism is too large compared to the relatively small amount of work available (as seen in figure 3(b)).
FIG. 5. (a) Initially, there is a distribution produced by the hidden Markov model (in this case with k = 0.8 and α = 0.01). Then, whether or not there are more than N/2 X* molecules in the batch is measured. At the start of the work extraction stage of the protocol, the reaction volume is put in contact with a buffer of fuel molecules with chemical potential differences as defined in equation 32. No work is done on the buffer, but the batch of input molecules irreversibly relaxes into equilibrium with the buffer. Finally, work is extracted by shifting the probability distribution to the equilibrium distribution. A putative 'full' batch machine that avoids the loss of this irreversible relaxation can out-perform the binary batch machine, as shown in (b)-(d).
(b) The greatest expected work as a function of the input parameters for the full batch machine does not have an optimum batch size of 1 for α ≈ 0.5. The optimum N has been found by numerically calculating the work extracted for the values of N up to N = 9. (c) The ratio between the work extracted by the full batch machine and the work extracted by the Markov machine. In contrast to the binary batch machine this full batch machine can extract more work than the Markov machine when k is close to 0 and α is close to 0.5. (d) The efficiency η of the full batch machine.
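The work available to such a full batch machine can be computed directly: the free energy of an unordered batch is k_B T times the Kullback-Leibler divergence between the distribution over the number of X* molecules and the equilibrium binomial distribution (the quantity evaluated in Appendix H). A sketch with an assumed, illustrative input distribution:

```python
import math

def batch_free_energy(p_in):
    """Free energy (units of kB*T) stored in an unordered batch whose number of
    X* molecules is distributed as p_in, relative to the equilibrium binomial
    distribution with parameter 1/2; equals the KL divergence D(p_in || p_eq)."""
    N = len(p_in) - 1
    total = 0.0
    for n, p in enumerate(p_in):
        if p > 0.0:
            p_eq = math.comb(N, n) / 2 ** N  # equilibrium binomial weight
            total += p * math.log(p / p_eq)
    return total

# Illustrative input (not from the paper): a batch of N = 5 molecules whose
# X* count is binomial with parameter 0.9, i.e. strongly biased towards X*.
N, q = 5, 0.9
p_in = [math.comb(N, n) * q ** n * (1 - q) ** (N - n) for n in range(N + 1)]
print(batch_free_energy(p_in))  # the work available to a full batch machine
```

At equilibrium (q = 1/2) the stored free energy vanishes, as it must.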

Robustness
On average, all of the machines extract a positive amount of work from each input molecule or batch of input molecules. However, in a single realisation of the input produced by the stochastic process it is possible for the machines to extract a negative amount of work, i.e. to lose free energy, since the prediction of the upcoming state is only probabilistic even in the best case.
Thus, the total work extracted by any machine is a biased random walk. If the machine is unlucky, a fluctuation in the input can produce many negative steps and few positive ones. If the machine needs to harvest enough work to power its decision-making, like a biological organism, a fluctuation in its environment in which it loses all of its stored free energy would be disastrous. We therefore also consider fluctuations in the work extracted by the machines: if one protocol has a higher expected work extraction but a larger variance, it might not be truly better.
The expected worst-case energy loss (the infimum of the work extracted) can be thought of as the starting larder size, or fuel reserve, that such a reasoning machine requires. It also gives a minimum timescale for which any machine would need to run before it could create a replica that is also robust to environmental fluctuations. This infimum of the total work extracted by the machines in a trajectory, averaged over many simulated trajectories, is plotted against the parameters of the input process in figure 6. When k ≈ 0.5 or α ≈ 0.5, the work that is extracted by the machines is small, so the size of the negative fluctuations is also small for both machines. Comparing figures 6(a) and (b) shows that the binary batch machine exhibits reduced fluctuations in the regions where k is close to 0 and α is not close to 0, 0.5 or 1, where a batch size greater than 1 is favoured by the average work extracted. This fact is perhaps unsurprising, given that averaging over many inputs is inherently conservative.

Figure 6. The mean infimum of work of the binary batch machine in a run of 100 molecules, averaged over 100000 trajectories. In the region where the optimal batch size is greater than 1 (shown in figure 4(d)) the magnitude of the negative fluctuations is decreased compared to the Markov machine.
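The role of the infimum can be illustrated with a toy simulation of the biased random walk of extracted work; the step statistics below are hypothetical, not those of the actual machines:

```python
import random

def mean_infimum(step_work, n_steps=100, n_traj=10000, seed=1):
    """Mean over trajectories of the infimum (lowest point) of the cumulative
    work extracted; step_work(rng) draws the work of a single step."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_traj):
        w = inf_w = 0.0
        for _ in range(n_steps):
            w += step_work(rng)
            inf_w = min(inf_w, w)  # track the lowest point reached so far
        total += inf_w
    return total / n_traj

# Hypothetical step statistics (not those of the machines in the text): gain
# 0.3 kB*T with probability 0.8, lose 0.7 kB*T otherwise (mean drift +0.1 kB*T).
step = lambda rng: 0.3 if rng.random() < 0.8 else -0.7
print(mean_infimum(step))  # the typical fuel reserve such a machine must start with
```

Even with a positive mean drift, the mean infimum is strictly negative, which is the quantity plotted in figure 6.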

IV. DISCUSSION
We have considered the question of how minimal molecular devices might be designed to exploit the free energy stored in simple non-equilibrium environments. Having outlined a concrete design for a biomolecular Szilard engine, we have shown how such a device can form the basis of machines for exploiting a correlated series of molecular bits, expanding on previous work that has only considered environments with a very particular structure [8,9,19].
Although our devices require externally-applied protocols to operate, all information-processing is performed by degrees of freedom that are explicitly represented as biomolecules undergoing reactions in dilute solution; there are no concealed degrees of freedom. As a result, the complexity of implementing minimal systems that exhibit efficient measurement and feedback is made clear, and ambiguities are eliminated. This clarity applies not only to the extended correlation-exploiting devices, but also to our representation of the canonical Szilard engine itself. The continuing debate surrounding such devices (see references in [1,13]) shows the importance of a concrete physical representation.
For an environment with no structure, without correlations between successively encountered molecular bits, there is no need to process information and all of the available free energy can be extracted as work without use of a memory or any decision-making. For a Markovian array with non-zero correlations between consecutive bits, we show that a simple two-state memory that can select one of two work extraction protocols can extract all of the stored free energy in the environment. The two-state memory is sufficient to carry all of the available information about the future of the environment forward, and we identified a protocol that is simultaneously optimal for updating the memory according to the current input and for exploiting said input. For a more complex environment, involving a hidden variable that can only be inferred by the machine through noisy measurements, we argue that a machine with a finite memory cannot extract all of the available free energy as work. We demonstrate that in such a setting, a more complex strategy involving effectively averaging over a batch of molecules can be advantageous if correlations are long-ranged but noise is substantial. This is similar to the result in [42] that a more complex predictive model is advantageous in a more complex environment, but, in this paper, we give an explicit physical model for how our machines measure and exploit the environment. In our design, the complexity of the mechanism involved the ability to couple to multiple inputs simultaneously; we predict that alternatives (such as systems with larger memories and more possible decisions) would also show the potential for improved performance. A real living system must not only extract enough resources from its environment on average, but also over short intervals. In any fluctuating environment, an unlucky sequence of events might lead to starvation and death.
We probe this situation in our minimal setting by considering the typical infimum (lowest point) of the work extracted by our devices, which represents the typical scale of negative fluctuations. We find that the more sophisticated inference strategy considered here also has smaller negative fluctuations when it is favourable on average, suggesting that it truly can be advantageous. In a minimal living system, reduced negative fluctuations would correspond to the need for a smaller reserve of energy, and the ability to produce viable offspring more quickly, since each offspring would need to be provided with the reserves to deal with typical negative fluctuations for a large fraction to survive.
The minimal devices we consider are clearly unnatural, and constitute only a first step towards understanding the physics of living or life-like systems that make simple decisions. Future work will focus on constructing minimal models in which the systems are autonomous, requiring no external control, and power their own information-processing tasks by the free energy harvested. More realistic environments of fluctuating chemical concentrations will also be considered. A deeper question is whether we can design concrete systems that actually learn the statistics of their environment, evolving the parameters of their decision-making process towards an optimal strategy, rather than simply imposing optimal behaviour as in this work.
Despite the simplicity of our current approach, however, we believe that concrete lessons can be drawn for the physics of living or life-like systems making simple decisions. In our physical model, successively more complex, and potentially costly, information-processing architectures perform better in successively more complex environments. We would expect that the information-processing carried out by living organisms reflects a similar trade-off: more complex decision-making strategies are more worthwhile in environments that exhibit statistical structure over time scales that are long compared to the decision-making time, and large fluctuations that must not be misinterpreted. We also expect that true evolved strategies will not optimize exploitation of the environment on average in isolation; strategies should also be designed to hedge against the risk of negative short-term fluctuations, to a degree that depends upon the cost of storing resources that compensate for these fluctuations.

V. DATA AVAILABILITY
The code and data to produce the figures in this paper can be found at https://doi.org/10.5281/zenodo.1976933.

VI. ACKNOWLEDGEMENTS
T. E. O. acknowledges support from a Royal Society University Research Fellowship and R. A. B. acknowledges support from an Imperial College London AMMP studentship.
, where ∆G_1^0 depends on the intrinsic nature of the F_1 and F_1* molecules and the reaction volume but not on their concentrations, is quasistatically changed from 0 to ∞, and ∆G_2 = µ_F2 − µ_F2* is quasistatically changed from 0 to −∞.

Firstly, let us assume that the input molecule is in state X, which occurs with probability 1/2. In this case, only the first reaction in equation B1 can occur. At any point in the process there is a probability p(M_1) that the memory molecule is in state M_1. This probability only changes with a corresponding change in the number of F_1 and F_1* in the buffer. If p(M_1) changes by a small amount dp(M_1), then dp(M_1) F_1* are converted into F_1, so a work of dp(M_1) ∆G_1 is done on the buffer connected to the reaction volume. Therefore, over the whole process a work

W = ∫ ∆G_1 dp(M_1) (B2)

is done on the buffers. Because the change in concentration of the fuels is quasistatic, at all times in the process the memory molecule is in equilibrium with the fuel buffer to which the reaction volume is connected. Therefore,

p(M_1) = 1/(1 + e^{β∆G_1}), (B3)

where β = 1/(k_B T). The fact that the only dependence p(M_1) has on time is through ∆G_1 means that equation B2 can be converted into an integral over ∆G_1 instead. Because the change is quasistatic, the particular function of time that ∆G_1 follows does not matter; only the change in ∆G_1 matters. Therefore,

W = ∫_0^∞ ∆G_1 (∂p(M_1)/∂∆G_1) d∆G_1. (B4)

Now, to get the work we simply have to use equation B3 and evaluate the integral. It is convenient to first integrate by parts to get

W = [∆G_1 p(M_1)]_0^∞ − ∫_0^∞ p(M_1) d∆G_1 (B5)

and then exploit equation B3 to get

W = lim_{∆G_1→∞} ∆G_1/(1 + e^{β∆G_1}) − (1/β) ln 2 = −(1/β) ln 2, (B6)

using l'Hôpital's rule for the ∆G_1 → ∞ limit in the first line. A negative work corresponds to a decrease in free energy of the buffers.

Alternatively, there is a probability of 1/2 that the input molecule is X*, so only the second reaction in equation B1 can occur. In this case

p(M_1) = 1/(1 + e^{β∆G_2}), (B7)

where ∆G_2 = µ_F2 − µ_F2* = ∆G_2^0 + ln([F_2]/[F_2*]), and the work done on the buffers is

W = ∫_0^{−∞} ∆G_2 (∂p(M_1)/∂∆G_2) d∆G_2. (B8)

To evaluate the upper limit after integrating by parts, it is convenient to substitute in equation B7, which again yields

W = −(1/β) ln 2. (B9)

Each of these possibilities is equally likely, so the expected work is

⟨W⟩ = −(1/β) ln 2. (B10)
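The closed-form result can be checked numerically; the sketch below assumes the equilibrium form p(M_1) = 1/(1 + e^{β∆G_1}) of equation B3 and evaluates the work integral by the midpoint rule:

```python
import math

def quasistatic_work(beta=1.0, upper=60.0, n=400000):
    """Numerical evaluation of W = ∫_0^∞ ∆G_1 (dp(M_1)/d∆G_1) d∆G_1, assuming
    p(M_1) = 1/(1 + exp(beta*∆G_1)); the integration by parts in the text gives
    W = -(1/beta) ln 2. Midpoint rule, truncated at ∆G_1 = upper (the integrand
    decays exponentially, so the truncation error is negligible)."""
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        e = math.exp(beta * x)
        total += x * (-beta * e / (1.0 + e) ** 2) * h  # ∆G_1 * dp/d∆G_1 * d∆G_1
    return total

print(quasistatic_work(), -math.log(2))  # both ≈ -0.6931
```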

Extraction
Now the system is either in state (X, M_0) or (X*, M_1). Work is extracted from this high-free-energy state using the reactions of equation B11. If the system is in state (X, M_0) then only the first reaction in equation B11 can occur. In this case the probability of the input molecule being in the X* state is

p(X*) = 1/(1 + e^{β∆G_3}),

where ∆G_3 = µ_F3 − µ_F3* = ∆G_3^0 + ln([F_3]/[F_3*]). As ∆G_3 is changed from ∞ to 0, we obtain

W = (1/β) ln 2.

This is exactly the same calculation as equation B6. The sign is positive because the free energy of the fuel molecule buffers is now increased.
If the system is in state (X*, M_1) then only the second reaction in equation B11 can occur. In this case the probability of the input molecule being in the X* state is

p(X*) = 1/(1 + e^{β∆G_4}),

where ∆G_4 = µ_F4* − µ_F4 = ∆G_4^0 + ln([F_4*]/[F_4]). As ∆G_4 is changed from −∞ to 0, we obtain

W = (1/β) ln 2.

This is exactly the same calculation as equations B8 and B9. Each of these possibilities is equally likely, so the expected work is

⟨W⟩ = (1/β) ln 2.

Therefore, in a measure-and-extract cycle the net work done by the fuel molecule buffers is zero.
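The zero-net-work bookkeeping of the measure-and-extract cycle can be confirmed numerically; the sketch below assumes the same two-state equilibrium form, p = 1/(1 + e^{β∆G}), for both legs:

```python
import math

def leg_work(increasing, beta=1.0, upper=60.0, n=200000):
    """Work done on the fuel buffers along one quasistatic leg of the cycle,
    computed after integration by parts as -∫ p d∆G (measurement leg, ∆G from
    0 to infinity) or +∫ p d∆G (extraction leg, ∆G from infinity back to 0),
    with the equilibrium occupancy p = 1/(1 + exp(beta*∆G))."""
    h = upper / n
    integral = sum(h / (1.0 + math.exp(beta * (i + 0.5) * h)) for i in range(n))
    return -integral if increasing else integral

w_measure = leg_work(True)    # measurement leg, ≈ -(1/beta) ln 2
w_extract = leg_work(False)   # extraction leg, ≈ +(1/beta) ln 2
print(w_measure + w_extract)  # net work over the cycle ≈ 0
```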
Appendix C: Quasistatic embeddability of Markov machine

Any transformation of a probability distribution over discrete states can be represented by a stochastic matrix.
A quasistatic embedding is a non-homogeneous continuous-time Markov chain that produces such a transformation with no entropy production [35]. It is not possible to find such an embedding for all stochastic matrices. For some stochastic matrices the state space must be extended with additional 'hidden' states before a quasistatic embedding can be found. Owen et al. [35] have found bounds on the number of hidden states required.
We can apply the results of [35] to the Markov machine. The joint system of the input molecule and memory molecule has four states. We order them (XM_0, XM_1, X*M_0, X*M_1). The transformation that copies the state of the input molecule into the state of the memory molecule and takes the input molecule to its equilibrium distribution is then the row-stochastic matrix

P = ( 1/2  0  1/2  0 )
    ( 1/2  0  1/2  0 )
    (  0  1/2  0  1/2 )
    (  0  1/2  0  1/2 ),

where P_ij is the probability of moving from joint state i to joint state j. The determinant of P is zero, so according to [35] the lower bound on the number of additional hidden states required for a quasistatic embedding is zero.
The upper bound on the number of hidden states required is r_+(P) − 1, where r_+(P) is the nonnegative rank of P. For an n × n stochastic matrix M, the nonnegative rank is the smallest m such that M can be written as M = RS, where R is an n × m stochastic matrix and S is an m × n stochastic matrix.
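Such a factorisation can be checked mechanically. The snippet below writes down P for the Markov machine in this row-stochastic convention, with states ordered (XM_0, XM_1, X*M_0, X*M_1), together with one candidate pair (R, S); the specific factors are an assumption of this sketch:

```python
# P[i][j]: probability that the transformation takes joint state i to state j,
# states ordered (X M0, X M1, X* M0, X* M1): the memory records which input
# state arrived, while the input relaxes to its 50/50 equilibrium.
P = [[0.5, 0.0, 0.5, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0, 0.5]]

# R: map each joint state to which input class (X or X*) was present (4x2);
# S: write that bit into the memory and re-equilibrate the input (2x4).
R = [[1.0, 0.0],
     [1.0, 0.0],
     [0.0, 1.0],
     [0.0, 1.0]]
S = [[0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5]]

RS = [[sum(R[i][k] * S[k][j] for k in range(2)) for j in range(4)] for i in range(4)]
assert RS == P  # so r+(P) <= 2: at most one additional hidden state is needed
print("P = RS verified")
```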
P can be written as P = RS, with

R = ( 1 0 )      S = ( 1/2  0  1/2  0 )
    ( 1 0 )          (  0  1/2  0  1/2 )
    ( 0 1 )
    ( 0 1 ),

so r_+(P) ≤ 2 and at most one additional hidden state is required.

Appendix D: Work calculation for Markov machine

Measurement

∆G_1 must be initially set to ∆G_1 = ∆G_offset such that the memory molecule starts in equilibrium with the buffers, p(M_1|X) = 1/(1 + e^{β∆G_offset}), and then quasistatically changed to ∆G_1 = ∞. Therefore, the work done on the buffers is

W = (1/β) [p(M_1|X) ln p(M_1|X) + p(M_0|X) ln p(M_0|X)].

Similarly, ∆G_2 must be initially set to ∆G_2 = −∆G_offset such that p(M_1|X*) = 1/(1 + e^{−β∆G_offset}), and then quasistatically changed to ∆G_2 = −∞. Therefore, the work done is

W = (1/β) [p(M_1|X*) ln p(M_1|X*) + p(M_0|X*) ln p(M_0|X*)].

The first case occurs with probability p(X) and the second occurs with probability p(X*), so the expected work is

W = p(X) (1/β) [p(M_1|X) ln p(M_1|X) + p(M_0|X) ln p(M_0|X)] + p(X*) (1/β) [p(M_1|X*) ln p(M_1|X*) + p(M_0|X*) ln p(M_0|X*)] = −(1/β) H[M_i | X_{i+1}].

The random variable M_i is an exact copy of X_i, so H[M_i | X_{i+1}] = H[X_i | X_{i+1}]. The input process is stationary, so H[X_i | X_{i+1}] = H[X_{i+1} | X_i] ≤ ln 2; the fact that the memory molecule and input molecule are initially correlated means that the measurement requires less work to be done on the system by the fuel molecule buffers.
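The final claim can be made concrete for a symmetric two-state input chain that flips state with probability k between steps, so that the conditional entropy is the binary entropy of k (an assumption of this sketch, with β = 1):

```python
import math

def measurement_work_markov(k):
    """Expected measurement work done on the fuel buffers by the Markov machine,
    W = -H[X_i | X_{i+1}] in units of kB*T, assuming a symmetric two-state input
    chain with flip probability k (conditional entropy = binary entropy of k)."""
    entropy = -sum(q * math.log(q) for q in (k, 1.0 - k) if q > 0.0)
    return -entropy

print(measurement_work_markov(0.5))  # uncorrelated memory: -ln 2, the Szilard cost
print(measurement_work_markov(0.1))  # correlated memory: the buffers pay less
```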

Extraction
The extraction process is exactly the same as for the biochemical Szilard engine. Therefore, the work done on the fuel molecule buffers is k_B T ln 2, so the net work per input molecule is

W = (1/β) (ln 2 − H[X_{i+1} | X_i]). (D9)

Appendix E: DNA strand displacement design for the biochemical Szilard engine and Markov machine

In this section we present a domain-level DNA-based design to implement the measurement and work extraction reactions of the biochemical Szilard engine and Markov machine using DNA strand displacement. The design is shown in figure 7. Our designs leverage the general construction of [30].
The nature of DNA strand displacement reactions means that additional auxiliary strands, labelled A 1 to A 12 , are required. We assume that these strands are always present in the reaction volume in excess.
Appendix F: Work calculation for binary batch machine

when N is even.
Let J_i be the random variable representing whether the number of X* molecules in batch i is greater than N/2 (J_i = 1 if true, 0 otherwise).

Measurement

The measurement process is exactly the same as for the Markov machine, except that the chemical potential differences, ∆G_1 = µ_F1 − µ_F1* = ∆G_1^0 + ln([F_1]/[F_1*]) and ∆G_2 = µ_F2 − µ_F2* = ∆G_2^0 + ln([F_2]/[F_2*]), are initially set to offset values ∆G_1 = ∆G_1,offset and ∆G_2 = ∆G_2,offset.

Extraction

However, the protocol of the chemical potential differences must be different. In the biochemical Szilard engine and Markov machine, if the memory molecule was in the state M_0 then the input molecule was certainly in the state X. In the binary batch machine, by contrast, if the memory molecule is in the state M_0 then there is a non-zero probability that some of the input molecules in the batch are in state X*. The relevant chemical potential differences are ∆G_3 = µ_F3 − µ_F3* = ∆G_3^0 + ln([F_3]/[F_3*]) and ∆G_4 = µ_F4* − µ_F4 = ∆G_4^0 + ln([F_4*]/[F_4]).

First, there is an irreversible relaxation of the batch from the initial input distribution, which depends on the input stochastic process, to a binomial distribution over the number of X* molecules with a mean of N p̂_0 or N p̂_1. If the memory molecule is in state M_0, the work extracted in this relaxation depends on ⟨X*⟩_initial,M_0, the expected number of X* in the batch initially; if the memory molecule is in state M_1, it depends instead on the corresponding expectation ⟨X*⟩_initial,M_1. Then, ∆G_3 and ∆G_4 are quasistatically changed to zero. Combining the irreversible relaxation and quasistatic steps, if the memory molecule is in state M_0 the total work that is done is

N (1/β) [ln 2 + (⟨X*⟩_initial,M_0/N) ln p̂_0 + (1 − ⟨X*⟩_initial,M_0/N) ln(1 − p̂_0)]. (F14)

This is maximised if p̂_0 = ⟨X*⟩_initial,M_0/N. Similarly, if the memory molecule is in state M_1 the work is maximised by setting p̂_1 = ⟨X*⟩_initial,M_1/N.
That is, p̂_0 is the probability that an input molecule in the batch is X* if J_i = 0, and p̂_1 is the probability that an input molecule in the batch is X* if J_i = 1. This means that initially no work is done on the fuel molecule buffers during the irreversible relaxation, because on average there is no net change in the number of X* in the batch. Therefore, the expected work done in the extraction step is given by the quasistatic contribution alone.

Appendix G: DNA strand displacement design for the binary batch machine

In this section we present a domain-level DNA-based design to implement the measurement reactions of the batch machine using DNA strand displacement. The design is shown in figure 8 for the case when N = 5. Our designs leverage the general construction of [30]. This design is the same as the measurement reactions for the Biochemical Szilard engine and Markov machine except that the gates are extended so that three X or X* strands must bind for the reaction to occur. In principle, the mechanism could be generalised to an arbitrary number of inputs, although this may prove challenging in practice.
The nature of DNA strand displacement reactions means that additional auxiliary strands, labelled A 1 to A 16 , are required. We assume that these strands are always present in the reaction volume in excess.
Appendix H: Work calculation for full batch machine

Measurement
The measurement is exactly the same as for the binary batch machine, so the work is unchanged. (H1)

Extraction
In this section we will not give an explicit chemical scheme to extract all of the work from an unordered batch of input molecules. We will simply calculate the available work. In equilibrium the number of X* molecules in the batch, n(X*), is described by a random variable B_eq, which is distributed as

p(B_eq = n(X*)) = (1/2^N) N!/(n(X*)! (N − n(X*))!). (H2)

If we define the free energy of each state of the unordered batch as F(n(X*)) = −k_B T ln p(B_eq = n(X*)), then the equilibrium free energy is zero. Initially the number of X* molecules in the batch is described by a random variable B_initial with distribution p_in(n(X*)). The free energy of the batch is initially

Σ_{n(X*)} p_in(n(X*)) F(n(X*)) + k_B T Σ_{n(X*)} p_in(n(X*)) ln p_in(n(X*)), (H4)

where p_in(n(X*)) is the initial distribution over the number of X* in the batch. Therefore, using equation 4, the free energy of the joint system of the batch and the memory molecule is