Thermodynamics of deterministic finite automata operating locally and periodically

Real-world computers have operational constraints that cause nonzero entropy production (EP). In particular, almost all real-world computers are ‘periodic’, iteratively undergoing the same physical process; and ‘local’, in that subsystems evolve whilst physically decoupled from the rest of the computer. These constraints are so universal because decomposing a complex computation into small, iterative calculations is what makes computers so powerful. We first derive the nonzero EP caused by the locality and periodicity constraints for deterministic finite automata (DFA), a foundational system of computer science theory. We then relate this minimal EP to the computational characteristics of the DFA. We thus divide the languages recognised by DFA into two classes: those that can be recognised with zero EP, and those that necessarily have non-zero EP. We also demonstrate the thermodynamic advantages of implementing a DFA with a physical process that is agnostic about the inputs that it processes.

These analyses use minimal physical descriptions of the computations performed by the abstract constructs of computer science theory [20,21]. Some recent work has instead probed the thermodynamics of certain types of hardware, such as CMOS-based electronic circuits [22,23]. However, there exist practical constraints on physical computation that are not specified by the overall computation performed, but which are nonetheless relevant beyond a particular type of hardware. The thermodynamic costs of these constraints are not resolved by either of the approaches above, although their consequences can be significant [24]. Accordingly, we ask: which kinds of thermodynamic costs necessarily arise when implementing a computation using a physical system solely due to constraints that seem to be shared by all real-world physical systems that implement digital computation? To begin to investigate this issue, here we consider the minimal entropy production (EP) that arises due to two ubiquitous constraints on real-world digital computers. First, the vast majority of modern physical computers are periodic: they implement the same physical process at each iteration (or clock cycle) of the computation. Second, all modern physical systems that perform digital computation are "local", i.e., not all physical variables that are statistically coupled are also physically coupled when the system's state updates. Ultimately, the reason that this constraint is imposed in both abstract models of computation and real-world computers is that it allows us to break down complex computations into simple, iterative logical steps.
In this work we explore how and when operating under these constraints imposes lower bounds on the EP of a computation modeled as a CTMC, regardless of any other details about how the computation is performed (equivalent results apply even in a quantum setting [9]). Taken together, the constraints impose necessary EP through mismatch cost [8][9][10][11] of two types: "modularity" cost [6,8,12,13], and what we call "marginal" mismatch cost. Both types of mismatch cost have been identified in the literature as possibly causing EP in any given physical process; here we argue that they are in fact inescapable in complex computations. In particular, we demonstrate their effects for one of the simplest nontrivial types of computer, deterministic finite automata (DFA).
DFA have important applications in the design of modern compilers, as well as text searching and editing tools [25]. They are also foundational in computer science theory, at the foot of the Chomsky hierarchy [26,27], below push-down automata [21] and Turing machines [20,28,29]. These properties make DFA particularly well-suited for an initial study of the consequences of locality and periodicity in computational systems. We thus take the first step towards investigating the thermodynamic consequences of locality and periodicity in all the computational machines of computer science theory.
We next introduce our modelling approach and key definitions. We subsequently outline the general consequences of locality and periodicity for arbitrary computations, in the form of a strengthened second law. Having discussed these strengthened second laws, we then derive specific expressions for constraint-driven EP in DFA, and explore how DFA could be designed to minimize the expected and worst-case costs that result. Next, we analyse how this EP relates to the underlying computation performed; surprisingly, the most compact DFA for a given language is generally neither especially thermodynamically efficient nor inefficient. Finally, we consider regular languages, i.e., the sets of strings such that every string in the set can be recognized by some DFA. We show that such languages can be divided into a class that is thermodynamically costly for a DFA to recognise, and a class that is inherently low-cost.

A. Deterministic Finite Automata
A DFA [6,26,27] is a 5-tuple (R, Λ, r_∅, r_A, ρ), where: R is a finite set of (computational) states; Λ is a finite alphabet of input symbols; ρ is a deterministic update function, ρ : R × Λ → R, specifying how the current DFA state is updated to a new one based on the next input symbol; r_∅ ∈ R is a unique initial state; and r_A ⊂ R is a set of accepting states. An example is shown in Fig. 1. The set of all finite input strings is denoted Λ*.
The DFA starts in state r_∅ and an input string λ ∈ Λ* is selected. The selected input string's first symbol, λ_1, is then used to change the DFA's state to ρ(λ_1, r_∅). The computation proceeds iteratively, with each successive component of the vector λ used as input to ρ alongside the then-current DFA state to produce the next state. We write λ_{−i} for the entire vector λ except for the i'th component.
We write the DFA's computational state just before iteration i as r_{i−1}, and we use r_i for the state after the update. The update in iteration i is then the map

  (λ_i, r_{i−1}) → (λ_i, ρ(λ_i, r_{i−1})) = (λ_i, r_i).

We refer to this map as the local dynamics, and define the set of local states as

  Z = Λ × R,

with elements z ∈ Z. z_i^0 is the local state just before update i: z_i^0 = (λ_i, r_{i−1}), and z_i^f = (λ_i, r_i) is the local state after update i. Note that z_i^f ≠ z_{i+1}^0 in general, since z_{i+1}^0 involves λ_{i+1}, not λ_i. The local update function fixes the full update function of the entire state space, since λ_{−i} is unchanged during an update.

FIG. 1: Example DFA with states R = {0, 1, 2, 3}, alphabet Λ = {a, b}, initial state r_∅ = 0 and accepting set r_A = {0, 1, 2}. The update function ρ is illustrated in (a); the current computational state and the current input symbol specify the next computational state. This DFA accepts input strings that do not contain three or more consecutive bs. (b) shows the evolution of the local state through three iterations; the input string is read from left to right.
A DFA accepts λ if its state is contained in r_A after processing the final symbol. The language accepted by a DFA is the set of all input strings it accepts. Many DFA accept the same language L; the minimal DFA for L has the smallest set of computational states R of all DFA that accept L [26,27].
Fig. 1(a) shows a DFA with four computational states that processes words built from the two-symbol alphabet {a, b}. This DFA accepts all strings without three or more consecutive bs. Three iterations of this DFA when fed the input (a, b, b) are shown in Fig. 1(b).
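As a concrete illustration, the DFA of Fig. 1 can be sketched in a few lines of Python. The transition table below (states 0-2 counting the current run of bs, with state 3 an absorbing reject state) is an assumption consistent with the figure's description, not code from the paper.

```python
# Sketch of the Fig. 1 DFA: accepts strings with no run of three or more bs.
R_INIT = 0                 # initial state r_emptyset
R_ACCEPT = {0, 1, 2}       # accepting states; state 3 is an absorbing reject

def rho(symbol, r):
    """Deterministic update function rho: Lambda x R -> R (assumed table)."""
    if r == 3:             # once rejected, stay rejected
        return 3
    if symbol == "a":      # an 'a' resets the count of consecutive bs
        return 0
    return r + 1           # a 'b' extends the run; three bs reach state 3

def accepts(word):
    """Iterate the local update over the input string and test acceptance."""
    r = R_INIT
    for symbol in word:
        r = rho(symbol, r)
    return r in R_ACCEPT

print(accepts("abb"))    # True: no three consecutive bs
print(accepts("abbba"))  # False: contains bbb
```

Running the update symbol-by-symbol, as above, is exactly the iterative, local structure whose thermodynamic consequences the paper analyses.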
DFA can be divided into those with an invertible local map ρ, and those with a non-invertible ρ. The map ρ defines islands in the local state space: an island of ρ is the set of all inputs to ρ that map to the same output (i.e., it is the pre-image of an output of ρ). If the local dynamics defined by ρ is invertible, all local states are islands of size 1; otherwise Z is partitioned by ρ^{−1} into non-intersecting islands, some of which contain multiple elements. We write c_i for the island that contains z_i^0. The DFA in Fig. 1 is non-invertible, since z_i^f = (a, 0) could have arisen from z_i^0 = (a, 0), (a, 1) or (a, 2), which together comprise an island.
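The island decomposition can be computed mechanically by grouping local states according to their image under the local map. The snippet below is a sketch using the assumed transition table for the Fig. 1 DFA.

```python
from collections import defaultdict

def rho(symbol, r):
    """Assumed transition table for the Fig. 1 DFA (run-of-bs counter)."""
    return 3 if r == 3 else (0 if symbol == "a" else r + 1)

def islands(states, alphabet):
    """Partition the local state space Z = Lambda x R into islands:
    the pre-images of the local map (s, r) -> (s, rho(s, r))."""
    pre = defaultdict(list)
    for s in sorted(alphabet):
        for r in sorted(states):
            pre[(s, rho(s, r))].append((s, r))
    return list(pre.values())

parts = islands({0, 1, 2, 3}, {"a", "b"})
nontrivial = [c for c in parts if len(c) > 1]
print(nontrivial)
# [[('a', 0), ('a', 1), ('a', 2)], [('b', 2), ('b', 3)]]
```

The two multi-element islands recovered here are exactly the ones discussed in the text; their existence makes this DFA non-invertible.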

B. Thermodynamic description of DFA
Details of our thermodynamic modelling of DFA are given in the Methods. In short, we assume that the logical states of the device are instantiated as well-defined, discrete physical states. At each iteration, a control protocol µ(t) is applied that drives a deterministic update of the DFA's state according to the logical rules of the computation.
Although the overall update is deterministic, we assume that the input word is sampled from a distribution p(λ), representing the possible computations that the DFA may be required to perform. We use λ to represent the random variable corresponding to the input word. The randomness of λ means that the computational state after update i, the local state before and after update i, and the island occupied during iteration i are also random variables. To represent these random variables we use r_i, z_i^0, z_i^f and c_i, respectively. As outlined in the Methods, when a time-dependent control protocol µ(t) is applied to a thermodynamic system X with a finite set of states X = {x_1, x_2, ...}, the mismatch cost [6,8,9]

  σ_µ(p) = D(p || q_µ) − D(Pp || Pq_µ)

is a lower bound on EP. Here, the time-dependent protocol µ(t) drives an evolution from p(x) to p′(x′) = Σ_x P(x′|x) p(x), or p′ = Pp. The distribution q_µ is known as the prior distribution [6,15,30], and is specific to the applied protocol µ(t). D(p || q_µ) is the Kullback-Leibler (KL) divergence between p and q_µ; the mismatch cost is then the drop in KL divergence due to the matrix P. σ_µ is zero if p = q_µ, and non-negative by the data processing inequality. Intuitively, the mismatch cost is the contribution to the EP of the misalignment between the actual input distribution p(x) and an optimal distribution q_µ(x) specified by the physical process µ(t). If the input distribution is well-matched to the protocol applied, p(x) = q_µ(x), EP is minimised.
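To make the drop-in-KL form of the mismatch cost concrete, the following sketch evaluates D(p || q_µ) − D(Pp || Pq_µ) for a simple two-state merging map; the distributions and the matrix P are invented for illustration.

```python
from math import log

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def push(P, p):
    """Apply transition matrix P (P[x2][x1] = P(x2 | x1)) to distribution p."""
    return [sum(P[x2][x1] * p[x1] for x1 in range(len(p)))
            for x2 in range(len(P))]

def mismatch_cost(P, p, q):
    """Drop in KL divergence under P: D(p || q) - D(Pp || Pq)."""
    return kl(p, q) - kl(push(P, p), push(P, q))

# Invented example: a map that merges both states into state 0.
P = [[1.0, 1.0],
     [0.0, 0.0]]
p = [0.9, 0.1]   # actual input distribution
q = [0.5, 0.5]   # prior baked into the protocol
print(mismatch_cost(P, p, q))  # positive, since p != q and P merges states
```

For this merging map the output distributions coincide, so the whole initial divergence D(p || q) is paid as EP; an invertible P would instead give zero mismatch cost.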
In the Methods, we outline how the EP of two co-evolving subsystems X_a and X_b that are not physically coupled during the period of evolution can be split into EP for the two subsystems in isolation, plus a term related to the change in mutual information between the two. In the special case where X_a evolves during the time period in question and X_b = X_{−a} is static, the dynamics of X_a under µ(t) = µ_a(t) is said to be solitary [6,8,13]. In this case, the mismatch cost is [6,8,12,13]

  σ_µ(p) = [D(p_a || q_µ) − D(P_a p_a || P_a q_µ)] − ΔI(X_a ; X_{−a}).   (4)

Here, p_a(x_a) is the initial marginal distribution for subsystem a, and ΔI is the change in the mutual information between X_a and X_{−a} over the period in question. The first term in Eq. (4) is the non-negative mismatch cost generated by X_a running in isolation, having marginalised over the other degrees of freedom. We call this the marginal mismatch cost, σ_mar. Like any other mismatch cost, it is non-negative. The second term is the reduction in mutual information between X_a and X_{−a} [7,13], which we call the modularity mismatch cost, σ_mod, after Ref. [12]. By the data processing inequality [31], σ_mod ≥ 0. Intuitively, this term reflects the fact that information about the statistical coupling between X_a and X_{−a} is a store of non-equilibrium free energy, and that information is reduced in a solitary process.
To analyse the minimal thermodynamic costs of operating DFA under local and periodic constraints, we consider the effect of these constraints on the overall mismatch cost at each iteration. As discussed in the Methods, any additional entropy production can, in principle, be taken to zero.

Locality
In principle, one could build a DFA that physically couples the entire input word, λ, to the local subsystem z_i during update i. However, this coupling is not required by the computational logic, which is local to z_i. Moreover, it would be extremely challenging to implement in practice; modern computers do not physically couple bits that do not need to be coupled by the logical operation in question. Accordingly, we assume that the evolution of the local state z_i is solitary. As a result, the global mismatch cost splits into two non-negative components: a marginal mismatch cost, associated with the evolution of the local state in isolation; and a modularity mismatch cost, associated with non-conserved information between the local state and the rest of the system.

Periodicity
The marginal mismatch cost for iteration i will depend on the similarity of p(z_i^0), the initial distribution over local states, and q_{µ_i}(z_i^0), the prior distribution for the protocol µ_i(t) implemented at iteration i. Typically, p(z_i^0) will vary with i. In theory, one could design µ_i(t) to match these variations, ensuring q_{µ_i}(z_i^0) = p(z_i^0) at each update and thereby eliminating σ_mar. However, designing such a protocol would require knowledge of p(z_i^0), which in turn would require running a computation emulating the DFA before running the DFA itself, gaining nothing. Moreover, one of the major strengths of computing paradigms such as DFA, Turing machines and real-world digital computers is that their logical updates are not iteration-dependent. It is therefore natural to impose a second constraint: the protocol µ_i(t), like the logical update ρ, is identical at each update i (µ_i(t) = µ(t)). Formally, we define a local, periodic DFA (LPDFA) as any process that implements a DFA via a repetitive, solitary process on the local state z_i.

D. General consequences of local and periodic constraints
We briefly consider the consequences of locality and periodicity in general, before re-focussing on DFA. The mismatch and modularity costs introduced in Section II B are well established. However, systems that perform nontrivial computations by iterating simpler logical steps on subsystems are exposed to these costs in a way that simpler operations, like erasing a bit, are not. The need to operate iteratively on an input that is evolving from iteration to iteration makes the mismatch cost unavoidable. Additionally, modularity-cost-inducing statistical correlations result from the need to carry information between iterations, which will not be required in simpler systems.
Consider a physical realisation of an arbitrary computation that is local and periodic in a way that reflects the locality and periodicity of the computational logic. Then the marginal and modularity mismatch costs set a lower bound on EP, regardless of any further details about how the computation is implemented. Specifically, let X be the computational system and X_i the local subsystem that is updated at iteration i. Then over the course of N iterations, the system will experience a total marginal mismatch cost

  σ_mar = Σ_{i=1}^{N} [ D(p(x_i) || q_µ(x_i)) − D(P p(x_i) || P q_µ(x_i)) ],   (5)

where P is the update matrix, p(x_i) is the initial distribution of the local state and q_µ(x_i) is the prior built in to the actual protocol µ(t).
Eq. 5 depends on the details of µ(t) beyond the locality and periodicity constraints. However, some choice of q_µ (and hence µ(t)) will minimize σ_mar, setting a lower bound on EP that is independent of these details:

  σ_mar ≥ min_q Σ_{i=1}^{N} [ D(p(x_i) || q(x_i)) − D(P p(x_i) || P q(x_i)) ].   (6)
Unless p(x_i) is identical for all i, or P is a simple permutation, it is not generally possible to choose a single q_µ that eliminates σ_mar at every iteration i. In this case, Eq. 6 provides a strictly positive periodicity-induced lower bound on the EP that depends purely on the logic of the computation performed.
Similarly, the accumulated modularity cost follows directly as

  σ_mod = −Σ_{i=1}^{N} ΔI(X_i ; X_{−i}),   (7)

where ΔI(X_i ; X_{−i}) is the change in mutual information between X_i and X_{−i} due to update i. As with Eq. 6, this contribution to EP is entirely determined by the computational paradigm used and the distribution of inputs; it is independent of the details of the implementation, given the assumption of locality and periodicity. Taken together, σ_mar and σ_mod from Eq. 6 and Eq. 7 constitute a strengthened second law for periodic, local computations that depends only on the logic of the computation, not the details of its implementation. These implementation-independent lower bounds, alongside the qualitative observation that computing systems are particularly vulnerable to modularity and mismatch costs, are the first main result of this work. These results apply to any computational system implemented using a periodic, local process. For the rest of the paper, we will focus on DFA. Doing so allows us to illustrate the consequences of the local and periodic restrictions in a concrete computational model.

E. Entropy production for LPDFA
Under our assumptions, the EP when applying a solitary dynamics µ(t) to an initial distribution p(z_i^0, λ_{−i}) at the update stage of iteration i of a DFA is

  σ_i^µ(p(z_i^0, λ_{−i})) = σ_i^mar + σ_i^mod,   (8)

where

  σ_i^mar = D(p(z_i^0) || q_µ(z_i^0)) − D(p(z_i^f) || q_µ(z_i^f))   (9)

is the marginal mismatch cost of update i, and

  σ_i^mod = −ΔI(z_i ; λ_{−i})   (10)

is the modularity mismatch cost of update i. A variant of the modularity cost in Eq. (10) was considered in isolation in Ref. [32], for the special case of DFA operating in steady state. Henceforth, for simplicity, we suppress the dependence of σ_i on µ, since µ is constant over all iterations. The KL divergences in Eq. 9, giving σ_mar, can be simplified for LPDFA. Since each update in an LPDFA deterministically collapses all probability within an island to one state, p(z_i^f | c_i) = q(z_i^f | c_i). As shown in Section 2 of the Supplementary Information, this simplification implies that

  σ_i^mar = D(p(z_i^0 | c_i) || q_µ(z_i^0 | c_i)),   (11)

which is the second main result of this work. σ_i^mar is therefore the divergence between the initial and prior distributions, conditioned on the island of the initial state.
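The island-conditional form of the marginal mismatch cost (Eq. 11) can be evaluated directly from a distribution over local states and the island partition. The sketch below uses invented distributions, with islands given as lists of local-state labels.

```python
from math import log

def sigma_mar(p_local, q_prior, islands):
    """Island-conditional marginal mismatch cost (sketch of Eq. 11):
    sum over islands c of p(c) * D( p(z|c) || q(z|c) )."""
    total = 0.0
    for c in islands:
        pc = sum(p_local[z] for z in c)   # probability of island c
        qc = sum(q_prior[z] for z in c)   # prior probability of island c
        for z in c:
            if p_local[z] > 0:
                total += p_local[z] * log((p_local[z] / pc) /
                                          (q_prior[z] / qc))
    return total

# Invented example: one two-state island and one singleton island.
islands = [["x", "y"], ["z"]]
p = {"x": 0.5, "y": 0.0, "z": 0.5}   # all in-island probability on "x"
q = {"x": 1/3, "y": 1/3, "z": 1/3}   # uniform prior
print(sigma_mar(p, q, islands))      # 0.5 * ln 2, from the first island
```

Only the multi-element island contributes: within it the actual conditional distribution is a point mass while the prior is uniform, and that divergence is weighted by the island's probability.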
In Fig. 2, we explore the properties of σ_i^mar for the DFA shown in Fig. 1. The four sub-figures show σ_i^mar for four distinct distributions p(λ), and a fixed (uniform) prior q_µ. We immediately see that σ_i^mar is strongly dependent on both the distribution of input words and the iteration, with σ_i^mar non-monotonic in i in all four cases. σ_i^mar is determined by a combination of how well tuned the prior is to the input distribution within a given island, and the probability of that island at each iteration. At the start of iteration 1, particularly for subfigure (b), there is a high probability of the system being in the island {(a, 0); (a, 1); (a, 2)}, and the uniform prior is poorly aligned with the actual initial condition within this island (all probability in (a, 0)). At larger i, this cost drops both because the probability of being in that island drops, and because the conditional distribution within the island becomes more uniform.

FIG. 2: EP in a simple system shows non-trivial dependence on iteration and input word distribution. We plot total EP σ_i, and its decomposition into σ_i^mar and σ_i^mod, for the DFA in Fig. 1(a), which accepts all words that do not contain three or more consecutive bs. In all cases we use a uniform prior q_µ(z_i^0 | c_i) within each island, and consider a distribution of input words with fixed length N = 15, but vary the distribution of input words p(λ).
For iterations i ≥ 3, the system has a non-zero probability of being in the other non-trivial island, {(b, 2); (b, 3)}. The uniform prior is initially poorly matched to the conditional distribution within this island (at the start of iteration i = 3, the system cannot be in (b, 3)). Additionally, the probability of the system being in this island is quite low for subfigures (a) and (b), but much higher for (c) and (d), explaining the jumps in those traces.
The third main result of this work is a simple expression for the modularity mismatch cost for DFA. As we show in Section 3 of the Supplementary Information,

  σ_i^mod = H(z_i^0 | c_i).   (12)

Surprisingly, σ_i^mod, a global quantity, is given by the entropy of the local state at the beginning of the update, conditioned on the island occupied at the start of iteration i. This result holds regardless of the distribution of input strings or the DFA's complexity.
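Since Eq. 12 is simply a conditional entropy, it can be computed in a few lines; the island structure and distribution below are invented for illustration.

```python
from math import log

def sigma_mod(p_local, islands):
    """Modularity mismatch cost (sketch of Eq. 12): H(z^0 | c), the entropy
    of the local state conditioned on the island that contains it."""
    total = 0.0
    for c in islands:
        pc = sum(p_local[z] for z in c)   # probability of island c
        for z in c:
            if p_local[z] > 0:
                total -= p_local[z] * log(p_local[z] / pc)
    return total

# Invented example: an island of two equally likely states contributes its
# probability times ln 2; singleton islands contribute nothing.
islands = [["x", "y"], ["z"]]
p = {"x": 0.25, "y": 0.25, "z": 0.5}
print(sigma_mod(p, islands))   # 0.5 * ln 2
```

Note that only uncertainty *within* an island is penalised: if all in-island probability concentrates on one state (as when trajectories are absorbed into a single computational state), the cost vanishes, matching the behaviour described for Fig. 2.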
To understand Eq. 12 intuitively, we note that z_i^0 in general contains information about λ_{−i}. After the update, any information provided by λ_i alone is retained, since the input symbol is not updated by the DFA. Moreover, for islands of size 1, the combined values of λ_i and r_i are just as informative about λ_{−i} as λ_i and r_{i−1} were. However, for non-trivial islands, the extra information provided by r_{i−1} on top of λ_i is lost, yielding Eq. 12. We see from our example system in Fig. 2 that modularity costs behave very differently from marginal mismatch costs. In general, σ_i^mod tends to zero as the probability of being absorbed into state 3 increases: in this case, there is no entropy of z_i^0. Modularity costs stay high for system (b), in which bbb substrings are infrequent.
Modularity costs are relatively low in Fig. 2(d), in which symbols of the input word are correlated. Naïvely, one might have assumed that the larger I(z_i^0 ; λ_{−i}) generated by a correlated input word would be more susceptible to large modularity costs. We explore this question in more detail in Fig. 3, for both the DFA illustrated in Fig. 1(a) and a second DFA that accepts words that are concatenations of bb and baa substrings (Fig. 3(a)).
In Fig. 3(b) we plot the total modularity cost, Σ_{i=1}^{N} σ_i^mod, for both DFA processing a Markovian input, as a function of the degree of correlation, P(λ_{i+1} = λ_i). We see that in both cases, uncorrelated input words with P(λ_{i+1} = λ_i) = 0.5 have relatively high (though not maximal) modularity cost, and fully correlated strings have σ_mod = 0.
To understand why, consider Fig. 3(c), in which we plot the mutual information between the local state and the rest of the input word before (I_0 = I(z_i^0 ; λ_{−i})) and after (I_f = I(z_i^f ; λ_{−i})) the update of iteration i, for the original DFA in Fig. 1(a). We consider uncorrelated input words (P(λ_{i+1} = λ_i) = 0.5) and moderately correlated input words (P(λ_{i+1} = λ_i) = 0.8). At early iterations, I_0 is larger for the correlated input, as would be expected (at later times, the DFA with correlated input is more likely to be absorbed into state 3, reducing I_0). More importantly, the system with correlated inputs retains more of its information in the final state. Because λ_{−i} is correlated with the current symbol λ_i, it is a better predictor of the final state of the update. In the limit of P(λ_{i+1} = λ_i) = 1 or 0, there is no modularity cost, as z_i^f is perfectly predictable from λ_{−i}. Combining Eqs. (11) and (12) gives

  σ_i = H(p(z_i^0 | c_i), q_µ(z_i^0 | c_i)),   (13)

where

  H(p(z_i^0 | c_i), q_µ(z_i^0 | c_i)) = −Σ_{c_i} p(c_i) Σ_{z_i^0 ∈ c_i} p(z_i^0 | c_i) ln q_µ(z_i^0 | c_i)   (14)

is the cross entropy between q_µ(z_i^0 | c_i) and p(z_i^0 | c_i). This total entropy production is also shown for the example DFA of Fig. 1(a) in Fig. 2.

FIG. 3: (b) Modularity cost is plotted as a function of the probability that subsequent symbols in the word have the same value. (c) Mutual information between the local state and the rest of the input word before (I_0) and after (I_f) the update of iteration i, for the DFA in Fig. 1(a). Data is plotted for P(λ_{i+1} = λ_i) = 0.8 (correlated) and P(λ_{i+1} = λ_i) = 0.5 (independent).

FIG. 4: Results are plotted for different values of q_µ((a, 0) | c⋆), where c⋆ = {(a, 0), (a, 1), (a, 2)} is the island containing (a, 0). q_µ(z_i^0 | c_i) is otherwise unbiased, and q_µ((a, 0) | c⋆) = 1/3 corresponds to a totally unbiased prior. (b) Equivalent to (a), but for input p(a) = 0.2, p(b) = 0.8, and applying a bias to q_µ((b, 3) | c⋆⋆), where c⋆⋆ = {(b, 2), (b, 3)} is the other non-trivial island for this DFA. q_µ(z_i^0 | c_i) is otherwise unbiased, and q_µ((b, 3) | c⋆⋆) = 1/2 corresponds to a totally unbiased prior.
F. Reducing the marginal mismatch cost through choice of priors

Applying a bias to the prior
It is natural to ask how q_µ(z_i^0 | c_i) might be chosen to minimize EP for a given p(λ) and a given DFA. One might hope that q_µ(z_i^0 | c_i) could be tuned to p(λ) alone, without any reference to the operation of the DFA. Unfortunately, however, such an approach will fail. The states within each island all have the same value of λ, because the update map (λ_i, r_{i−1}) → (λ_i, r_i) does not update the input symbol. Applying a prior that is a function of λ alone therefore results in a uniform q_µ(z_i^0 | c_i). Reducing the mismatch cost through the choice of prior thus requires some understanding of the computational state, not just the inputs. For example, for the DFA in Fig. 1(a), the computation starts in the state r_∅ = 0. Biasing q_µ(z_i^0 | c_i) towards states with r = 0, as we show in Fig. 4(a), can reduce the marginal mismatch cost of the first step. If the bias is too strong, then increased costs at later iterations overwhelm the initial reduction. It is possible, however, to reduce the total EP with a moderate bias of q_µ(z_i^0 | c_i) towards states with r = 0. Alternatively, one could bias q_µ(z_i^0 | c_i) towards states with r = 3, since most trajectories will eventually be absorbed. As shown in Fig. 4(b), doing so incurs an extra cost at short times, particularly at iteration i = 3. At the start of the third iteration, the DFA is moderately likely to be in computational state r = 2, but cannot be in computational state r = 3, so the biased prior is a poor match for p(z_i^0 | c_i). At later iterations, however, the biased prior performs better. Again, a moderate bias performs best overall.

Advantages of a uniform prior
Section II F 1 shows that it is possible to reduce EP by applying biased priors. However, we also saw that strongly biased priors can lead to very high EP. As noted in Ref. [33], in which a result similar to Eq. 11 was derived in the absence of distinct islands, σ_i^mar penalizes an overconfident prior q_µ(z_i^0 | c_i). If q_µ(z_i^0 | c_i) = 0 for a given state but p(z_i^0 | c_i) ≠ 0, Eq. 13 implies σ_i^mar → ∞. The authors of Ref. [33] therefore hypothesised that a uniform q_µ(z_i^0 | c_i) may be optimal. As a fourth main result of this work, we present three important properties of a q_µ(z_i^0 | c_i) that is uniform for each c_i, i.e., a prior q_µ(z_i^0 | c_i) = 1/L_{c_i}, with L_c the size of island c. First, for such a prior, Eq. 13 becomes

  σ_i = Σ_c p(c_i = c) ln L_c ≤ ln L_{c_max}.   (15)

Here, L_{c_max} is the size of the largest island of ρ. Eq. 15 gives a finite upper bound on the EP of an LPDFA employing a uniform prior distribution q_µ(z_i^0 | c_i) = 1/L_{c_i}, set by the size of the largest island.
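With a uniform within-island prior, the per-iteration EP in Eq. 15 is just the expected log island size, capped by ln L_cmax. A numerical sketch with an invented island structure:

```python
from math import log

# Invented island partition and local-state distribution.
islands = [["x", "y", "z"], ["u", "v"], ["w"]]
p = {"x": 0.2, "y": 0.1, "z": 0.1, "u": 0.3, "v": 0.2, "w": 0.1}

# Eq. 15 sketch: EP with the uniform prior q(z|c) = 1/L_c is the expected
# log island size, and is bounded by the log of the largest island size.
ep = sum(sum(p[z] for z in c) * log(len(c)) for c in islands)
bound = log(max(len(c) for c in islands))   # ln L_cmax

print(ep)      # 0.4*ln 3 + 0.5*ln 2, about 0.786
print(bound)   # ln 3, about 1.099
```

The bound is saturated only when all probability sits in islands of maximal size, which is why it is finite and implementation-independent.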
Second, for any protocol, the worst-case EP is at least ln L_{c_max}. A uniform prior distribution q_µ(z_i^0 | c_i) = 1/L_{c_i} therefore minimizes the worst-case EP. To verify this claim, consider the input distribution p(z_i^0) = δ_{z_i^0, z_min}, where z_min is a state that minimizes q_µ(z_i^0 | c_i) within the largest island. For such a distribution, Eq. 13 reduces to

  σ_i = −ln q_µ(z_min | c_max) ≥ ln L_{c_max},   (16)

where the final inequality follows from q_µ(z_min | c_max) ≤ 1/L_{c_max}. Finally, the uniform prior distribution q_µ(z_i^0 | c_i) = 1/L_{c_i} minimizes the predicted average EP when a designer is maximally uncertain about p(z_i^0, λ_i). A designer may not know the input distribution p_i(z_i^0 | c_i) at iteration i, either because p(λ), or the DFA's dynamics on p(λ), are unknown. Thus the choice of protocol µ(t), and hence q_µ(z_i^0 | c_i), is performed under uncertainty over not just the input state, but also the distribution from which that state is drawn.
Let the designer's belief about the distributions p(c_i) and p(z_i^0 | c_i) be represented by a distribution π(v, w) over an (arbitrary) discrete set of possible distributions indexed by v and w: p_v(c_i), p_w(z_i^0 | c_i). The designer's best estimate of the expected EP at iteration i is then (see Section 4 of the Supplementary Information)

  ⟨σ_i⟩ = H(z_i^0 | c_i, v, w) + I(z_i^0 ; w | c_i, v) + D(p(z_i^0 | c_i, v) || q_µ(z_i^0 | c_i)).   (17)

Here, H(z_i^0 | c_i, v, w) and I(z_i^0 ; w | c_i, v) are defined with respect to the estimated joint distribution p(v, w, z_i^0, c_i), and p(z_i^0 | c_i, v) = Σ_w π(w | v) p_w(z_i^0 | c_i) is the designer's estimate for the probability distribution within an island, having averaged over the uncertainty quantified by π(w | v).
All three terms in Eq. 17 are non-negative. The first is σ_i^mod averaged over v and w. The third is the marginal mismatch cost between p(z_i^0 | c_i, v) and q_µ(z_i^0 | c_i). However, even if q_µ(z_i^0 | c_i) matches the average estimated distribution within an island, p(z_i^0 | c_i, v) = q_µ(z_i^0 | c_i), the best estimate of σ_i^mar is non-zero. The second term, I(z_i^0 ; w | c_i, v), quantifies how much uncertainty in w is actually manifest as uncertainty in the input distribution; variability about p(z_i^0 | c_i, v) gives positive expected EP. An equivalent term was previously identified in Ref. [34] for arbitrary processes with a single island.

H(z_i^0 | c_i, v, w) and I(z_i^0 ; w | c_i, v) are protocol-independent and cannot be changed for a given computation. D(p(z_i^0 | c_i, v) || q_µ(z_i^0 | c_i)), however, can be minimized by choosing q_µ(z_i^0 | c_i) = p(z_i^0 | c_i, v). Given maximal uncertainty, the designer's best estimate will be uniform: p(z_i^0 | c_i, v) = 1/L_{c_i}. In this case, a uniform q_µ(z_i^0 | c_i) = 1/L_{c_i} minimizes the estimated average EP.
The results hitherto apply to LPDFA, but do not reflect the actual computation performed. The results for σ_i^mar (the optimality of a uniform protocol) apply to any deterministic process; the LPDFA's restrictions simply justify why q_{µ_i}(z_i^0 | c_i) cannot be tuned to p(z_i^0 | c_i) at each i. The results for σ_i^mod are more specific, relying on a solitary process using a single symbol λ_i from an unchanging "input string", and a device whose state after the update is unambiguously specified by λ_{i≤j}. Nonetheless, σ_i^mod in Eq. 12 is not directly related to the computational task. We now explore how EP is related to ρ, and to the language accepted by the DFA.

G. Relating EP to computational tasks

The EP in Eq. 13 is positive iff q_µ(z_i^0 | c_i) ≠ 1 for some z_i^0, c_i for which p(z_i^0 | c_i) ≠ 0 and p(c_i) ≠ 0. This condition is met whenever an island c_i with p(c_i) > 0 has at least two elements z^0 with p(z_i^0 | c_i) > 0. There are two ways to avoid this EP. One is if all islands have a single element, i.e., the local update function ρ is invertible (this observation was made for σ_mod alone in Ref. [32]). The second is if the distribution of input strings p(λ) is such that, for every island c_i with at least two elements, all but one of those elements always have p(z_i^0 | c_i) = 0. In that case, however, q_µ(z_i^0 | c_i) must be finely tuned to match this condition when the physical system implementing the computation is constructed. As discussed in Sections II F 1 and II F 2, this strategy risks high costs for overconfidence.
We now focus on the former way of achieving zero EP, asking what determines whether ρ is invertible. Since ρ preserves the input symbol λ_i, it can only be non-invertible if it maps two distinct computational states to the same output for the same symbol λ_i. If we illustrate ρ by a series of directed graphs, one for each value of λ_i, then a non-invertible DFA will have at least one state with at least two incoming transitions for at least one value of λ_i. We label states with more than one incoming transition for a given λ_i as conflict states; conflict states for the DFA in Fig. 1(a) are shown in Fig. 5.
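Conflict states can be found mechanically by counting incoming transitions per symbol. The sketch below re-uses the assumed transition table for the Fig. 1 DFA; the states it flags are the targets of the two non-trivial islands.

```python
from collections import defaultdict

def rho(symbol, r):
    """Assumed transition table for the Fig. 1 DFA (run-of-bs counter)."""
    return 3 if r == 3 else (0 if symbol == "a" else r + 1)

def conflict_states(states, alphabet):
    """States with two or more incoming transitions for some symbol;
    any such state makes the local dynamics non-invertible."""
    incoming = defaultdict(int)
    for s in alphabet:
        for r in states:
            incoming[(s, rho(s, r))] += 1
    return {target for (s, target), n in incoming.items() if n > 1}

print(conflict_states({0, 1, 2, 3}, {"a", "b"}))  # {0, 3}
```

State 0 is a conflict state because every 'a' transition converges on it, and state 3 because both (b, 2) and (b, 3) map to it.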

The minimal DFA for a given language does not generally minimize or maximize EP
The minimal DFA for a language L has the smallest set of computational states R of all DFA that accept L. This minimal DFA has just enough memory to sort parsed substrings into classes of equivalent strings, so that information can be passed forward to complete the computation [26,27,35]. More formally, define input strings λ and µ to be equivalent with respect to language L iff λν ∈ L ⟺ µν ∈ L for any string of input symbols ν, where λν is the concatenation of ν after λ. The Myhill-Nerode theorem states that the number of states of the minimal DFA for L is the number of equivalence classes of this equivalence relation [26,27,35]. Perhaps surprisingly, minimal LPDFA do not in general either maximise or minimise EP. This claim is our fifth main result; to illustrate it, first consider the two DFA in Fig. 6, which both have Λ = {a, b} and accept input strings with an even number of bs. Fig. 6(a) is the minimal DFA for this language. It is invertible, and so has zero EP. The larger DFA in Fig. 6(b) is non-invertible, and so σ_i(p(z_i^0, λ_{−i})) > 0 in general. For example, EP is positive if the sequences (λ_{i−2}, λ_{i−1}, λ_i) = (a or b, b, a) and (λ_{i−2}, λ_{i−1}, λ_i) = (b, a, a) both have non-zero probability. One might conclude from this example that the minimal LPDFA never has higher EP than a larger DFA, and often has lower EP. Now consider, however, the two DFA in Fig. 7. Both accept any input string constructed from Λ = {a, b} with no b symbols, and Fig. 7(a) is the minimal DFA for this language. Neither DFA is invertible, so EP is generally non-zero for both. However, the non-minimal LPDFA in Fig. 7(b) delays entropy production by a single iteration relative to Fig. 7(a). As outlined in Section 5 of the Supplementary Information, this delay ensures that the overall EP for the larger LPDFA is always less than or equal to the EP for the minimal LPDFA.

H. Languages are divided into costly and low-cost classes by the structure of their minimal DFA
The DFA in Fig. 7(b) can be extended further, delaying nonzero EP for longer. However, a finite number of additional states cannot prevent EP for arbitrary-length inputs, and DFA are by definition finite. Indeed, the sixth main result of our work, proven in detail in Section 6 of the Supplementary Information, is that if a minimal DFA is non-invertible, any DFA that accepts the same language must also be non-invertible. One cannot eliminate conflict states without disrupting the sorting of strings into equivalence classes. Thus, if the minimal DFA for a regular language L is non-invertible, recognising that language is inherently costly. Conversely, if the minimal DFA that accepts L is invertible, recognising that language is low-cost.
As an example, consider a DFA that takes as input an integer written in base n, and accepts the integer y if y is divisible by m. As we show in Section 7 of the Supplementary Information, the minimal DFA for such a computation is invertible iff n and m have no common factors. It is therefore inherently costly to decide whether a number is divisible by 9 if the number is expressed in base 3, but not if the number is expressed in base 2, showing that even conceptually similar computations can have very different thermodynamic consequences.
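The invertibility criterion can be verified by brute force. The sketch below builds the residue-tracking update described in Section 7 of the Supplementary Information (states r = y_i mod m, update r → (digit + n·r) mod m) and checks that each symbol's update is a bijection exactly when gcd(n, m) = 1:

```python
from math import gcd

def divisibility_dfa_invertible(n, m):
    """The residue DFA for 'y (written in base n) is divisible by m' has
    states r = y_i mod m and per-digit update r -> (digit + n*r) % m.
    It is invertible iff every digit's update is a bijection on residues."""
    return all(len({(d + n * r) % m for r in range(m)}) == m
               for d in range(n))

for n, m in [(2, 9), (3, 9), (10, 7), (10, 5)]:
    print(n, m, divisibility_dfa_invertible(n, m), gcd(n, m) == 1)
```

In each case the direct check agrees with the coprimality condition: divisibility by 9 is invertible in base 2 but not in base 3.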

III. DISCUSSION
Breaking down complex computations into simple periodic updates, involving small parts of the computational system, is at the heart of both theoretical computer science and real-world computing devices. It is natural that physical systems designed to implement computations involve physical processes that are also local and periodic; that is how "synchronous, clocked" digital computers are designed.
However, physical systems that implement periodic, local computations are particularly vulnerable to stronger lower bounds on EP than the zero bound of the second law. Any physical operation, including computation, can in principle be performed in a thermodynamically reversible way, with a sufficiently well-designed protocol [36]. The nature of non-trivial computations, however, means that such a protocol would need to reflect not just the distribution of possible inputs to the computer, but also how those inputs are processed, and the subtle statistical coupling that is generated as the computation proceeds.
We have illustrated how these challenges manifest as marginal and modularity mismatch costs in DFA with non-invertible local update maps. Interestingly, the overall computation performed by a DFA, mapping the input word and starting computational state to the same input word and a final computational state, is always invertible. The logical properties of the overall computation are therefore not helpful in understanding the necessary EP of a local, periodic device.
We have only a qualitative, system-specific understanding of why the curves in Fig. 2 and Fig. 4 have the forms they do. Additionally, although similar results will hold for quantum-mechanical or finite-heat-bath treatments of DFA thermodynamics, additional subtleties will arise. More generally, DFA are just the simplest machine in the Chomsky hierarchy, and it is unknown how marginal and modularity mismatch costs behave for other paradigms. The constraints of locality and periodicity will also apply to (physical systems implementing) other machines in the hierarchy, such as push-down automata, RAM machines, or Turing machines. We would expect that variants of the results concerning σ_mod and σ_mar presented here also apply to those systems. However, there will also be important differences. For example, the overwriting of input and/or memory that occurs in machines more powerful than DFA will affect σ_mod in ways not considered in this paper. Moreover, Turing machines and push-down automata have access to an infinite memory. DFA, by definition, do not; indeed, it is this restriction that divides regular languages into low- and high-cost classes.
Finally, it is interesting to consider how the consequences of locality and periodicity relate to other resource costs. Recent work on transducers, computational machines that generate an output corresponding to a hidden Markov model, has shown that a quantum implementation has an advantage over a classical one if and only if the machine is not locally invertible [37]; it is unclear whether a similar result holds for DFA. The role of the input distribution in determining the thermodynamic costs in our work is also reminiscent of the way computational complexity depends on the distribution over inputs.
Consider a system X with a finite set of states X = {x_1, x_2, ...}. There is a distribution p(x) over X at some initial time, and that distribution evolves according to a (potentially time-dependent) Markov process µ(t). We assume that the system is attached to a single heat bath during this process, choosing units so that the bath's temperature equals 1/k_B. We also assume that µ(t) obeys local detailed balance with respect to that bath and the system's (potentially time-evolving) Hamiltonian [4]. Although we will not need to specify whether the Markov process is discrete-time or continuous-time, to fix the reader's intuition (and accord with real-world digital computers) we can assume that it is continuous-time.
Suppose that the process runs for some pre-fixed time. The distribution over X at the end of that time is a linear function of the initial distribution, which we write as p′(x′) = Σ_x P(x′|x)p(x), or just p′ = Pp for short, where P is implicitly fixed by the stochastic process µ(t). A given P will partition X into islands. Two states x and x′ are within the same island if and only if P(x′′|x) ≠ 0 and P(x′′|x′) ≠ 0 for some state x′′ (taking the transitive closure of this relation).
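The island partition can be computed with a small union-find pass over the nonzero entries of P. The sketch below is an illustration under the definition above, with P represented as a dict of output distributions:

```python
def islands(P, X):
    """Partition the set of states X into islands of the stochastic map P
    (P[x] is a dict x'' -> probability): x and x' share an island if some
    x'' is reachable from both, with the relation closed transitively."""
    parent = {x: x for x in X}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    seen = {}                    # output state -> a state that reaches it
    for x in X:
        for x2, pr in P[x].items():
            if pr > 0:
                if x2 in seen:
                    parent[find(x)] = find(seen[x2])
                else:
                    seen[x2] = x
    groups = {}
    for x in X:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())

# a map collapsing states 1 and 2 to 0, leaving 3 alone:
print(islands({1: {0: 1.0}, 2: {0: 1.0}, 3: {3: 1.0}}, [1, 2, 3]))
# -> [[1, 2], [3]]
```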
Let q^c_µ(x) be the initial probability distribution that minimizes the entropy production under µ(t) among distributions with support restricted to the island c. This optimal distribution is unique within each island.
No matter what the actual initial distribution p is, and regardless of the specific details of the process µ(t) that implements P, so long as each q^c_µ has full support within island c, the EP when the process is run with the initial distribution p will be [8,9,11]

σ(p) = D(p ∥ q_µ) − D(Pp ∥ Pq_µ) + Σ_c p(c) σ̂_µ(c).    (18)

Here, D(· ∥ ·) is the KL divergence, the index c runs over the islands of the process, p(c) = Σ_{x∈c} p(x), σ̂_µ(c) is the minimal EP attainable within island c, and q_µ(x) = Σ_c q_µ(c) q^c_µ(x) is called the prior distribution [6,15].
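The state-dependent part of this bound (the first two terms of Eq. 18, as reconstructed here) is straightforward to evaluate numerically. A minimal sketch, with P stored as a column-stochastic matrix P[x'][x]:

```python
import math

def kl(p, q):
    """KL divergence D(p || q) for distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def apply_map(P, p):
    """p'(x') = sum_x P(x'|x) p(x); P[xp][x] is column-stochastic."""
    return [sum(P[xp][x] * p[x] for x in range(len(p)))
            for xp in range(len(P))]

def mismatch_cost(P, p, q_prior):
    """The drop-in-KL part of the EP: D(p || q) - D(Pp || Pq)."""
    return kl(p, q_prior) - kl(apply_map(P, p), apply_map(P, q_prior))

# a two-to-one map: both states are sent to state 0
P = [[1.0, 1.0], [0.0, 0.0]]
print(mismatch_cost(P, [0.9, 0.1], [0.5, 0.5]))  # positive: p != prior
print(mismatch_cost(P, [0.5, 0.5], [0.5, 0.5]))  # zero: p equals the prior
```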
Note that the distribution over islands, q_µ(c), is arbitrary. Any distribution q_µ(x) that is a mixture of the set of optimal distributions {q^c_µ(x)} could be used with the same results. In practice, the existence of many possible q_µ does not affect our analysis; we shall simply use a convenient q_µ with q_µ(c) ≠ 0 for all c.
The first two terms in Eq. 18 are the mismatch cost [6,8,9] of the process. The final term in Eq. 18 is the residual entropy production. Unlike the statistical mismatch cost, the residual EP depends on the physical details of the process implementing µ(t). Each term in the sum is non-negative, but can be reduced to zero using a quasi-static process [6,8,9].

Marginal and modularity mismatch costs
Let X_a and X_b be two co-evolving systems that are physically separated from one another during a time period [0, 1], though they may have been coupled in the past. Due to this separation, we may consider separate protocols µ_a(t) and µ_b(t). Moreover, the prior for the overall process must be a product distribution, q_µ(x) = q_{µ_a}(x_a) q_{µ_b}(x_b). Taking p(x_a) and p(x_b) as the marginal distributions of the initial joint distribution p(x_a, x_b), the drop in KL divergence during [0, 1] is

D(p(x_a, x_b) ∥ q_{µ_a} q_{µ_b}) − D(p′(x_a, x_b) ∥ (P_a q_{µ_a})(P_b q_{µ_b})),

where P_a, P_b are the two matrices corresponding to the conditional distributions of ending states given initial states. This drop equals

∆H(X_a, X_b) − ∆H(p(x_a) ∥ q_{µ_a}) − ∆H(p(x_b) ∥ q_{µ_b}),

where H is the entropy, H(· ∥ ·) is cross-entropy, and ∆ means the change from beginning to end of the evolution under P. Adding and subtracting marginal entropies, this form can be re-expressed as

[∆H(X_a, X_b) − ∆H(X_a) − ∆H(X_b)] + Σ_{j∈{a,b}} [∆H(X_j) − ∆H(p(x_j) ∥ q_{µ_j})].

By the definition of the change of mutual information between X_a and X_b, ∆I, we obtain

−∆I + Σ_{j∈{a,b}} [D(p(x_j) ∥ q_{µ_j}) − D(P_j p(x_j) ∥ P_j q_{µ_j})].

We may thus write for the EP during [0, 1]

σ(p) = −∆I + Σ_{j∈{a,b}} [D(p(x_j) ∥ q_{µ_j}) − D(P_j p(x_j) ∥ P_j q_{µ_j}) + Σ_{c_j} p(c_j) σ̂_{µ_j}(c_j)],

which simplifies to Eq. (4) if X_b = X_{-a} and X_{-a} is static.
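The key identity in this derivation, that the joint drop in KL divergence equals the sum of the marginal drops minus the change in mutual information, can be checked numerically. A sketch for two binary subsystems; the particular matrices and distributions are arbitrary illustrations, not values from the paper:

```python
import math
from itertools import product

def kl(p, q):
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0)

# Single-subsystem maps Pa[(x', x)], Pb[(x', x)] (column-stochastic, all
# entries positive), a product prior, and a correlated initial joint
# distribution.  All numbers are arbitrary illustrations.
Pa = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}
Pb = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.4, (1, 1): 0.6}
qa, qb = {0: 0.5, 1: 0.5}, {0: 0.3, 1: 0.7}
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def evolve1(P, d):
    return {x2: sum(P[(x2, x)] * d[x] for x in (0, 1)) for x2 in (0, 1)}

def evolve2(pj):  # decoupled joint dynamics: P = Pa (x) Pb
    return {(a2, b2): sum(Pa[(a2, a)] * Pb[(b2, b)] * pj[(a, b)]
                          for (a, b) in pj)
            for a2, b2 in product((0, 1), (0, 1))}

def marg(pj, i):
    out = {0: 0.0, 1: 0.0}
    for x, pr in pj.items():
        out[x[i]] += pr
    return out

def mutual_info(pj):
    pa, pb = marg(pj, 0), marg(pj, 1)
    return sum(pr * math.log(pr / (pa[x[0]] * pb[x[1]]))
               for x, pr in pj.items() if pr > 0)

q_joint = {(a, b): qa[a] * qb[b] for (a, b) in p}
drop_joint = kl(p, q_joint) - kl(evolve2(p), evolve2(q_joint))
drop_a = kl(marg(p, 0), qa) - kl(evolve1(Pa, marg(p, 0)), evolve1(Pa, qa))
drop_b = kl(marg(p, 1), qb) - kl(evolve1(Pb, marg(p, 1)), evolve1(Pb, qb))
dI = mutual_info(evolve2(p)) - mutual_info(p)
print(math.isclose(drop_joint, drop_a + drop_b - dI))  # True
```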
Eq. (4) may, at first glance, seem inconsistent with the general discussion in Ref. [8], which used a more general Bayes net formalism. In fact there is no inconsistency. In the language of Ref. [8], the variables in z^0_i are the "parents" of r_i, resulting in the same marginal and modularity mismatch costs as derived here.

B. Physical model of DFA
In order to apply stochastic thermodynamics to the computational model of DFA, it is necessary to make assumptions about how the logic is instantiated in a physical system. We assume that all the possible logical states of the system, defined by the set R × Λ* × Z^+ (combining the possible computational states, input words and iteration steps), correspond to well-defined, discrete physical states [4,38]. For example, the DFA could be a molecular assembly processing a copolymer tape [14]. Metastable configurations of the assembly would represent the computational state, the sequence of the copolymer the state of the input word, and the position of the polymer the iteration. We also assume that, where necessary to implement ρ, the DFA has access to ancillary hidden states, which with probability 1 are unoccupied at the start and end of any update [36].
Computation will, in general, involve an externally applied control protocol that varies the physical conditions of the system over time; in the case of the molecular computer, we would use time-varying concentrations of molecular fuel [14]. This protocol defines the dynamics µ(t) discussed in Section IV A. Although the dynamics will, strictly speaking, be stochastic, we assume that µ(t) biases trajectories sufficiently to obtain effectively deterministic computation by the end of each update. More formally, we are interested in the limits of stochastic protocols under which they approximate deterministic dynamics to arbitrary accuracy [33]. We abuse notation, using µ(t) to refer to both the external protocol and the dynamics it induces over the system's states.
We take the input word λ to be a random variable sampled from a distribution p(λ). We use r_i, z^0_i, z^f_i and c_i to represent the random variables corresponding to the computational state of the DFA after update i, the local state before and after update i, and the island occupied during iteration i, respectively.
We will consider a distribution p(λ) in which all words have the same finite length N. Within this setup, a distribution of input words with lengths less than or equal to N could be simulated by adding to the alphabet an extra null symbol that induces no computational transitions. Processing these null input symbols would have no thermodynamic cost under the assumptions considered here. For simplicity, we do not include these null symbols in our examples.
C. Thermodynamic costs of DFA

Different measures of cost
In this paper we focus on entropy production as the fundamental thermodynamic cost of running DFA. EP represents the lost ability to extract work from a system, and is a metric for thermodynamic irreversibility. In certain contexts, the work required to perform a process, or the heat transferred to the environment in doing so, are also used to quantify the thermodynamic cost of a process.
The operation of a DFA does not increase the entropy of the computational degrees of freedom of the system, since the map from (r_0 = r_∅, λ) to (r_N, λ) is one-to-one when the full input word is taken into account. If the computational states all have the same energy and intrinsic entropy [4,38], as is typically assumed, the energy and entropy change of the system will thus be zero. Any EP is then equal to the heat transferred to the environment, which must be exactly compensated by the work done on the system. All three measures of thermodynamic cost are therefore identical.

Costs considered in analysing the model
We do not consider further the residual EP, nor the costs of incrementing i (both can, in principle, be made arbitrarily small). We also neglect costs associated with generating µ(t) itself, as discussed in Ref. [14]. Given these assumptions, whenever we use the term "(minimal) EP", we refer to the (minimal) EP due to the mismatch cost (and its decomposition into marginal and modularity mismatch costs).

Decomposition of EP generated at each iteration
In general, when applying the mismatch cost formula to a computation there are multiple choices for the times of the beginning and end of the underlying process. This choice matters, because the mismatch cost contribution to EP is not additive over time. For example, the drop in KL divergence for a two-timestep computation will generally differ from the sum of the drops in KL divergence for each of those timesteps.
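This non-additivity is easy to exhibit numerically. In the sketch below, each of two iterations is assigned the same prior q, as a periodic protocol would force, and the sum of the per-step KL drops differs from the single drop evaluated over the whole two-step process; the matrices and distributions are arbitrary illustrations:

```python
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def evolve(P, p):   # P[x'][x] is column-stochastic
    return [sum(P[x2][x] * p[x] for x in range(len(p)))
            for x2 in range(len(P))]

# Two successive single-iteration maps sharing the same prior q, as a
# periodic protocol would; all numbers are arbitrary illustrations.
P1 = [[0.9, 0.2], [0.1, 0.8]]
P2 = [[0.6, 0.5], [0.4, 0.5]]
p, q = [0.99, 0.01], [0.5, 0.5]

# per-iteration mismatch costs, each evaluated with the same prior q
drop_step1 = kl(p, q) - kl(evolve(P1, p), evolve(P1, q))
p1 = evolve(P1, p)
drop_step2 = kl(p1, q) - kl(evolve(P2, p1), evolve(P2, q))

# mismatch cost of the two iterations treated as one overall process
p2 = evolve(P2, p1)
q2 = evolve(P2, evolve(P1, q))
drop_whole = kl(p, q) - kl(p2, q2)

print(drop_step1 + drop_step2, drop_whole)  # the two values differ
```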
One could consider a single mismatch cost evaluated over the entire computation. Under this choice, none of the details of how the conditional distribution P of the overall computation arises by iterating the conditional distributions of each step are resolved by the mismatch cost. All that matters is the drop in KL divergence between the initial distribution, when the computer is initialized, and the ending distribution, when the output of the computation is determined. This approach has been used to analyze Turing machines [7,15] as well as DFA [39].
An alternative choice is to focus on the EP generated at each iteration of the DFA, with the total EP of the entire computation being the sum of those iteration-specific EPs. Doing so allows us to manifest restrictions on the applied protocol inherent to the iterative process in the mismatch cost, rather than burying them in the residual entropy production of the computation as a whole. Given that we focus on costs arising from the iterative nature of the computation, it is natural to focus on the EP at each iteration of the DFA. We first explicitly write the divergences as a sum over the islands and then a sum over states within islands:

σ_i = Σ_{c_i} p(c_i) [ H(p(z^0_i|c_i) ∥ q(z^0_i|c_i)) − H(p(z^0_i|c_i)) − H(p(z^f_i|c_i) ∥ q(z^f_i|c_i)) + H(p(z^f_i|c_i)) ].    (1)

Since the update deterministically collapses all probability within an island to a Kronecker delta, p(z^f_i|c_i) = q(z^f_i|c_i). Thus the final two terms in Eq. 1 cancel and we obtain

σ_i = Σ_{c_i} p(c_i) D(p(z^0_i|c_i) ∥ q(z^0_i|c_i)).    (2)

2. SIMPLIFICATION OF σ^i_mod FOR LPDFA.
To calculate the modularity mismatch cost in an LPDFA, it is helpful to separate λ_{j>i}, the input string variables for j > i, from λ_{j<i}, the variables for j < i. Making that separation and then using the chain rule for mutual information, we obtain an expanded expression. Next, if we express the local state variable z_i in terms of the DFA's state variable and the current input symbol's state variable, apply the chain rule again and cancel terms, we obtain Eq. 4. Due to the deterministic and sequential operation of a DFA, both r_i and r_{i−1} are unambiguously determined by the first i variables in the input string, λ_{j≤i}. As a result, the two conditional information terms in the final line of Eq. 4 are both zero. Applying the chain rule for mutual information twice to the remaining terms and simplifying, and again using the fact that the first i variables in the input string, λ_{j≤i}, unambiguously specify both r_i and r_{i−1}, we have I(r_{i−1}; λ_{j≤i}) = H(r_{i−1}) and I(r_i; λ_{j≤i}) = H(r_i). Thus, using the definitions of conditional entropy and mutual information, and adding and subtracting H(c_i), we reach the penultimate form. Finally, since the deterministic collapse of all inputs to a single output within an island ensures H(z^f_i|c_i) = 0, we can further reduce the modularity mismatch cost. This result establishes the claim made in the main text.

ESTIMATING ENTROPY PRODUCTION FOR AN UNCERTAIN INPUT DISTRIBUTION.
The designer's best estimate for the entropy production is obtained by averaging Eq. 13 of the main text over π(v) and π(w|v). Expanding the cross-entropy yields an expression in which H(z^0_i | c_i, v, w) and I(z^0_i; w | c_i, v) are defined with respect to the joint distribution estimated by the designer, p(v, w, z^0_i, c_i) = π(v)π(w|v)p_v(c_i)p_w(z^0_i|c_i), and p(z^0_i | c_i, v) = Σ_w π(w|v)p_w(z^0_i|c_i) is the designer's estimate for the probability distribution within an island, having averaged over π(w|v).

We claim that, for any distribution of inputs and choice of iterated protocol for the LPDFA in Fig. 7(a) of the main text, it is possible to choose a protocol for the LPDFA in Fig. 7(b) that results in EP less than or equal to the EP of the LPDFA in Fig. 7(a). To prove this claim, note that the EP at iteration i for the minimal DFA in Fig. 7(a) is given by considering only the island defined by {(b, 0), (b, 1)}. Thus

σ_i(p(z^0_i, λ_{−i})) = p_i(b, 0) ln q_µ(b, 0 | b, 0 or 1) + p_i(b, 1) ln q_µ(b, 1 | b, 0 or 1).    (10)
For the larger LPDFA in Fig. 7(b), the EP at iteration i is entirely due to the two islands defined by {(b, 0), (b, 1)} and {(a, 0), (a, 1)}. Thus

σ′_i(p′(z^0_i, λ_{−i})) = p′_i(b, 0) ln q′_µ(b, 0 | b, 0 or 1) + p′_i(b, 1) ln q′_µ(b, 1 | b, 0 or 1) + p′_i(a, 0) ln q′_µ(a, 0 | a, 0 or 1) + p′_i(a, 1) ln q′_µ(a, 1 | a, 0 or 1),    (11)

with primed quantities referring to the larger DFA for clarity. Given the well-defined starting state of the LPDFA, none of these states is occupied at the first step: p′_1(b, 0) = p′_1(a, 0) = p′_1(b, 1) = p′_1(a, 1) = 0. Moreover, assuming the same distribution of input strings to both devices, the related structure of the two devices implies p_i(b, 0) = p′_{i+1}(b, 0) + p′_{i+1}(a, 0) and p_i(b, 1) = p′_{i+1}(b, 1) + p′_{i+1}(a, 1). If we then choose protocols for the larger DFA so that q′_µ(b, 0 | b, 0 or 1) = q′_µ(a, 0 | a, 0 or 1) = q_µ(b, 0 | b, 0 or 1) and q′_µ(b, 1 | b, 0 or 1) = q′_µ(a, 1 | a, 0 or 1) = q_µ(b, 1 | b, 0 or 1), we obtain σ′_i(p′(z^0_i, λ_{−i})) = σ_{i−1}(p(z^0_{i−1}, λ_{−(i−1)})) for i > 1, and σ′_i(p′(z^0_i, λ_{−i})) = 0 for i = 1. As a result, for any finite number of iterations N,

Σ_{i=1}^{N} σ′_i(p′(z^0_i, λ_{−i})) ≤ Σ_{i=1}^{N} σ_i(p(z^0_i, λ_{−i})).

To prove the claim, recall that a non-invertible DFA has at least one "conflict state" to which multiple input computational states are mapped by the same input symbol under ρ (see Fig. 5 of the main text). We consider the network ρ_λ, defined by the mapping between computational states for an input symbol λ corresponding to such a conflict state in a minimal DFA D_L that accepts the language L. If D_L has M computational states, there are exactly M directed edges in ρ_λ. Thus the existence of a conflict state with more than one inward edge implies at least one state with zero inward edges. The existence of such a state r† in ρ_λ implies that there are no transitions into the equivalence class represented by r† due to the symbol λ.
The states in any other DFA D′_L that accepts L can be partitioned into non-overlapping sets, each of which corresponds to an equivalence class of L (one of the states of D_L; see Refs. [26,27] of the main text). The transitions between these non-overlapping sets must exactly match the transitions defined by ρ in D_L, otherwise D′_L would fail to sort input strings into the equivalence classes of L. Therefore, if r† has no inward edges in the network ρ_λ defined by D_L, none of the states in the set corresponding to the equivalence class represented by r† can have inward edges in the network ρ′_λ defined by D′_L. The existence of at least one state in the network ρ′_λ with zero inward edges implies the existence of at least one conflict state with two or more inward edges in ρ′_λ, since the total number of edges is equal to the total number of states. Therefore any D′_L that accepts the same language as a non-invertible minimal DFA D_L must exhibit conflict states, and must itself be non-invertible.

In the context of these divisibility-checking DFA, it is helpful to refer to the alphabet using numerical indices. We assume that the integer y is written on the tape in base n so that its most significant digit is λ_1, its second most significant digit is λ_2, and so on.
Let y_i be the integer represented by the first i entries in the input word. The DFA will be in the absorbing state r_A after iteration i if and only if y_i mod m = 0. Moreover, after the next iteration, the system will be in r_A if and only if

y_{i+1} mod m = (λ_{i+1} + n(y_i mod m)) mod m = 0.    (16)

The value of y_i mod m is thus sufficient to specify the equivalence class of the word fragment y_i, since it is the only information needed from the first i digits to determine whether the full word is divisible by m. We note, however, that knowledge of y_i mod m is not necessary to specify the equivalence class: in general, words with distinct values of y_i mod m can belong to the same equivalence class. Nonetheless, the equivalence class corresponding to the absorbing state necessarily contains only word fragments with y_i mod m = 0.
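The residue recurrence of Eq. 16 can be traced directly, showing that the DFA needs only m states to track divisibility while reading digits most-significant first:

```python
def residues(digits, n, m):
    """Track y_i mod m while reading base-n digits most-significant
    first, using the update of Eq. 16: r -> (digit + n*r) mod m."""
    r, trace = 0, []
    for d in digits:
        r = (d + n * r) % m
        trace.append(r)
    return trace

# 1101 in base 2 is 13; residues mod 5 after each digit:
print(residues([1, 1, 0, 1], 2, 5))  # [1, 3, 1, 3]; 13 % 5 == 3
```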

m and n have no common factors
Due to the arguments in Section 5 of the Supplementary Information, it is sufficient to show that a DFA that accepts this language is invertible. We may therefore consider the DFA with m states, each corresponding to a single value of y_i mod m. Let us assume, for the sake of contradiction, that this DFA is non-invertible. For this to be true, two distinct values of y_i mod m, which would lead to different computational states after iteration i, must result in the same value of y_{i+1} mod m for a given λ_{i+1}. Using the expression for y_{i+1} mod m in Eq. 16,

(λ_{i+1} + nk) mod m = (λ_{i+1} + nl) mod m,    (17)

where l, k are two distinct integers between 0 and m − 1. We will assume k > l without loss of generality.
Using the properties of modular arithmetic, we may rewrite Eq. 17 as

(n(k − l)) mod m = 0.    (18)
Since l ≠ k, the difference k − l is nonzero, and Eq. 18 requires that m divides n(k − l). If n and m had no common factors, m would have to divide k − l itself; this is impossible, since 0 < k − l < m. Therefore n and m must share at least one prime factor, violating the initial assumption and proving the claim by contradiction.

m and n have at least one common factor
We now prove that the minimal DFA that accepts words written in base n that are divisible by m is non-invertible if n and m have at least one common factor. To do so, it is sufficient to show that at least one non-zero value of y_i mod m results in y_{i+1} mod m = 0 for λ_{i+1} = 0, since this corresponds to a non-accepting state being mapped to r_A by λ_{i+1} = 0, while r_A is also mapped to r_A by λ_{i+1} = 0. In other words, we require

(0 + nk) mod m = 0    (19)

for some integer k with 0 < k < m. For any n, m that share a common factor g > 1, there is always such a k, namely k = m/g, for which Eq. 19 holds. Thus any DFA that accepts words written in base n that are divisible by m will be non-invertible if n and m have at least one common factor.
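Both directions of the argument can be confirmed by brute force over small n and m, including the explicit witness k = m/g used in Eq. 19:

```python
from math import gcd

def collision_exists(n, m):
    """Is there a nonzero difference d = k - l (0 < d < m) with
    (n*d) % m == 0, i.e. a collision of residues under some digit?
    The arguments above say: yes iff n and m share a factor."""
    return any((n * d) % m == 0 for d in range(1, m))

# brute-force check of both directions of the proof
for n in range(2, 12):
    for m in range(2, 12):
        assert collision_exists(n, m) == (gcd(n, m) > 1)
        if gcd(n, m) > 1:
            k = m // gcd(n, m)       # the witness used in Eq. 19
            assert (n * k) % m == 0 and 0 < k < m
print("verified for 2 <= n, m < 12")
```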
FIG. 2: EP in a simple system shows non-trivial dependence on iteration and input word distribution. We plot the total EP σ_i, and its decomposition into σ^i_mar and σ^i_mod, for the DFA in Fig. 1(a), which accepts all words that do not contain three or more consecutive bs. In all cases we use a uniform prior q_µ(z^0_i|c_i) within each island and consider a distribution of input words with fixed length N = 15, but vary the distribution of input words p(λ). (a) Input words have independent and identically distributed (IID) symbols with p(a) = p(b) = 0.5. (b) Input words have IID symbols with p(a) = 0.8 and p(b) = 0.2. (c) Input words have IID symbols with p(a) = 0.2 and p(b) = 0.8. (d) Input words are Markov chains: the first symbol is a or b with equal probability, and subsequently P(λ_{i+1} = λ_i) = 0.8.
FIG. 3: Correlated input words do not generate high modularity costs. (a) A 4-state DFA that processes words formed from a two-symbol alphabet, accepting those formed by concatenating bb and baa substrings. (b) Total modularity cost Σ_{i=1}^N σ^i_mod for the DFA in (a) and the DFA in Fig. 1(a), when processing words of length N = 15 that are generated using a Markov chain. The modularity cost is plotted as a function of the probability that subsequent symbols in the word have the same value. (c) Mutual information between the local state and the rest of the input word before (I_0) and after (I_f) the update of iteration i, for the DFA in Fig. 1(a). Data are plotted for P(λ_{i+1} = λ_i) = 0.8 (correlated) and P(λ_{i+1} = λ_i) = 0.5 (independent).

FIG. 5: Decomposition of the DFA in Fig. 1(a) into networks of transitions for each input symbol, ρ_λ. (a) Network for λ_i = a, where the state r = 0 is a conflict state. (b) Network for λ_i = b, where the state r = 3 is a conflict state.

FIG. 6: Two DFA that accept input strings with an even number of bs built from Λ = {a, b}. (a) The minimal DFA for this language; it is invertible. (b) A larger DFA that accepts the same language but is non-invertible; state 0 is a conflict state for ρ_b and state 2 is a conflict state for ρ_a.

FIG. 7:

4. MINIMAL DFA ARE NOT NECESSARILY MORE THERMODYNAMICALLY EFFICIENT THAN LARGER DFA.

6. THE INVERTIBILITY OF DFAS THAT ACCEPT WORDS IN BASE n THAT ARE DIVISIBLE BY m