Memory-efficient tracking of complex temporal and symbolic dynamics with quantum simulators

Tracking the behaviour of stochastic systems is a crucial task in the statistical sciences. It has recently been shown that quantum models can faithfully simulate such processes whilst retaining less information about the past behaviour of the system than the optimal classical models. We extend these results to general temporal and symbolic dynamics. Our systematic protocol for quantum model construction relies only on an elementary description of the dynamics of the process. This circumvents restrictions on corresponding classical construction protocols, and allows for a broader range of processes to be modelled efficiently. We illustrate our method with an example exhibiting an apparent unbounded memory advantage of the quantum model compared to its optimal classical counterpart.


I. INTRODUCTION
Continuous-time stochastic processes are omnipresent across the sciences. They are used to model a rich and diverse range of systems [1], such as speech recognition [2,3], financial time series [4], neuronal spike trains [5,6], gene recognition [7], Internet traffic [8], and geophysical processes [9]. Given this broad applicability, our ability to study, simulate, and make predictions using such models is of great import. However, simulations of such models can become highly resource-intensive, in part due to their continuous nature. The information that must be tracked about the past of the process typically diverges with increased precision [6,10,11].
Computational mechanics [12,13] provides a toolset that may be employed in optimising the use of certain resources. Stemming from notions of structural complexity in stochastic processes [14-16], it prescribes a framework for obtaining minimal-memory predictive models of a process. This has been applied to a panoply of discrete-time processes [6,17-24], but only recently have similar studies been made for continuous-time processes, in restricted settings [11,25].
In parallel, the field of quantum computational mechanics has emerged [26-38]. A central result arising from these works is that quantum models of stochastic processes can operate whilst tracking less information about the past than even the optimal classical models [26]. This quantum advantage can be immense, and the gap between quantum and optimal classical memory requirements can grow unbounded [32-34]. As with classical computational mechanics, the focus has largely been on discrete-time processes, and hitherto the quantum computational mechanics of continuous-time processes has been restricted to tracking only a limited set of temporal dynamics (renewal processes), where the times between consecutive emissions are all drawn from the same distribution [34]. Nevertheless, it was found that quantum models of such processes can exhibit unbounded advantages, requiring only finite memory to predict processes that classically need infinite memory.

II. FRAMEWORK
We consider continuous-time, discrete event stochastic processes. Such processes are characterised by a sequence $(x_n, t_n)$ detailing what is observed, and when. The emitted symbols $x_n \in \mathcal{A}$ denote the events, while $t_n$ records the time elapsed between events $n-1$ and $n$. Sequences occur with probabilities drawn from $P(X_n, T_n)$ [39] (upper case denotes random variables, and lower case their corresponding realisations). We use the shorthand notation $\boldsymbol{x}_n = (x_n, t_n)$, and denote contiguous strings of observations by the concatenation $\boldsymbol{x}_{l:m} = \boldsymbol{x}_l \boldsymbol{x}_{l+1} \ldots \boldsymbol{x}_{m-1}$. We restrict our attention to stationary processes, wherein $P(\boldsymbol{X}_{0:L}) = P(\boldsymbol{X}_{s:s+L})\ \forall s, L \in \mathbb{Z}$. This framework accommodates emissions that take place either as instantaneous events separated by times $t_n$, or as continuous emissions with dwell time $t_n$. We focus primarily on the former, and later discuss how our results may be modified for the latter. Further, though our primary focus will be on continuous-time processes, many of the results can be applied to temporal discretisations of such processes, and we will provide an explicit construction for these coarse-grained analogues.
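The observation sequence can be partitioned into past and future. We take 0 as the current emission step, such that $x_0$ is the next symbol to be emitted, and define $t_{\overleftarrow{0}}$ ($t_{\overrightarrow{0}}$) as the time since the last (until the next) emission, such that $t_0 = t_{\overleftarrow{0}} + t_{\overrightarrow{0}}$. We delineate the past as $\overleftarrow{\boldsymbol{x}} = \boldsymbol{x}_{-\infty:0}(\emptyset, t_{\overleftarrow{0}})$ ($\emptyset$ signifies that the event $x_0$ is currently unknown), and the future as $\overrightarrow{\boldsymbol{x}} = (x_0, t_{\overrightarrow{0}})\boldsymbol{x}_{1:\infty}$ [25].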
Such processes can be represented by edge-emitting hidden semi-Markov models (eeHSMMs) [25]. These are defined by a set of hidden modes $\{g\}$, an emission alphabet $\mathcal{A}$, and a transition dynamic $T^x_{kj}\phi^x_{kj}(t)$. The transition dynamic describes the probability density that the system, upon transitioning to mode $j$, will subsequently reside in this mode for a time $t$, at which point it will transition to mode $k$ while emitting symbol $x$. The $\phi^x_{kj}(t)$ are normalised, such that $T^x_{kj}$ describes the total probability that the system transitions from $j$ to $k$ while emitting $x$, without reference to the time. We can represent such models diagrammatically [Fig. 1(a)]. Semi-Markov [1] refers to the property that the transition dynamic depends only on the current mode and dwell time, such that the causal pair $(g, t_{\overleftarrow{0}})$ gives the fullest possible description for predicting the future of the process that may be obtained from past observations. We desire models of a process that can faithfully reproduce future statistics given a particular past, and that are causal, i.e. contain no information about the future that is not obtainable from the past [37]. Computational mechanics [12,13] provides a pathway for determining the optimal (storing minimal information) classical models.

FIG. 1: Diagrammatic representation of continuous-time processes. Continuous-time, discrete event stochastic processes may be drawn as edge-emitting hidden semi-Markov models, where (a) the system transitions between a set of hidden modes while emitting symbols $x$, with dynamics defined by $T^x_{kj}\phi^x_{kj}(t)$. (b) The temporal dynamics can be tracked by their unpacking from the modes into hidden causal states (thick line) that store relevant information about the mode and time since last emission.

Computational mechanics is built on causal states $\mathcal{S}$: equivalence classes agglomerating pasts with identical future predictions [16]. Two pasts $\overleftarrow{\boldsymbol{x}}$ and $\overleftarrow{\boldsymbol{x}}'$ belong to the same causal state (are causally equivalent, $\sim_e$) iff they have identical conditional future probabilities:

$$\overleftarrow{\boldsymbol{x}} \sim_e \overleftarrow{\boldsymbol{x}}' \iff P(\overrightarrow{\boldsymbol{X}}|\overleftarrow{\boldsymbol{X}} = \overleftarrow{\boldsymbol{x}}) = P(\overrightarrow{\boldsymbol{X}}|\overleftarrow{\boldsymbol{X}} = \overleftarrow{\boldsymbol{x}}'). \tag{1}$$

It has been proven that using these causal states as the hidden states of a model provides the optimal classical predictive representation [12]. For discrete-time processes, these form edge-emitting hidden Markov models, while for continuous-time processes one has a continuum of hidden states the system traverses along, jumping into a hidden 'start' state (mode of an eeHSMM) upon emission [Fig. 1(b)]. The optimal classical predictive models are called ε-machines, and are unifilar: given knowledge of a prior causal state and the observation sequence since, the present causal state is known with certainty [12]. The information required by the ε-machine to track the process is known as the statistical complexity $C_\mu$. This is given by the Shannon entropy (in bits) of the steady-state distribution $P(\mathcal{S})$ over causal states [12,16]:

$$C_\mu = -\sum_{s \in \mathcal{S}} P(s) \log_2 P(s). \tag{2}$$

This quantity is generically larger than the information shared between the past and future of the process [12], indicating that even these optimal models must store redundant information about the past. Indeed, ε-machines are wasteful whenever there is stochasticity in the transition dynamic of the causal states.

Though much of the focus of computational mechanics has been on discrete-time symbolic processes, analogous results have recently emerged for continuous-time processes tracking purely temporal dynamics [11], and both symbolic and temporal dynamics together [25]. While the optimality proofs hold for such processes, systematic construction protocols for finding causal states are known only for a limited set of processes. The first such class are renewal processes. These describe purely temporal dynamics where all emission symbols are identical, and the times between each consecutive pair of emissions are independent and identically distributed according to a common 'waiting time' function [11]. Recently [25], the causal architectures of more general processes with complex symbolic and temporal dynamics have been uncovered, albeit only when they satisfy certain (quite stringent) restrictions. Assuming the modes $\{g\}$ are already expressed in the minimal unifilar representation, the process must satisfy:

i) Unifilarity (i.e. synchronisability) of the modes with regards to the observed symbol sequence alone. That is, the mode the system transitions into on the next emission must depend only on the current mode and emitted symbol, and not on the time at which the emission takes place;

ii) Emission-time distributions that depend only on the current mode. That is, the times at which emissions occur are independent of the symbol emitted, given the mode;

iii) Transition dynamics $T^x_{kj}\phi_j(t)$ such that pairs $(g_0, t_{\overleftarrow{0}})$ are not only sufficient for future prediction, but are also minimal. That is, processes where different pairs $(j, t_{\overleftarrow{0}})$ and $(k, t'_{\overleftarrow{0}})$ can become causally equivalent are forbidden. This prohibits processes where two modes, conditioned on times since last emission, lead to identical future statistics. This also rules out processes where the conditional emission-time distribution of a mode becomes periodic after a given time since last emission.
These restrictions are illustrated in Fig. 2. They also apply to construction protocols for models tracking both temporal and symbolic dynamics together in discrete-time (to our knowledge, no prior work has explicitly covered this latter scenario).
Our quantum models will not be subject to these restrictions. Moreover, unlike these classical works, we shall not employ differential entropies for continuous-time processes, instead preserving the operational meaning of statistical complexity as the information the model must store about the past. A consequence of this is that whenever the temporal dynamic is not wholly memoryless (i.e. the $\phi^x_{kj}(t)$ are not all exponential distributions, as generated by Poisson processes), $C_\mu$ will diverge, as the ε-machine is ultimately storing a continuous parameter [10,11,25,32,34].
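This divergence can be seen numerically in a minimal sketch. The example assumes a hypothetical single-mode, single-symbol process whose dwell times are uniform on $[0, T]$ (an illustrative choice, not a process taken from the text), discretised into $N = T/\delta t$ timesteps, so that each number of timesteps since the last emission is a distinct causal state with steady-state weight proportional to the survival probability. The function names are illustrative.

```python
import numpy as np

def shannon_entropy_bits(p):
    """Shannon entropy (in bits) of a probability vector, ignoring zero entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def discretised_c_mu(num_steps):
    """C_mu for dwell times uniform on [0, T], using N = T/dt timesteps.

    The causal state 'n timesteps since last emission' has steady-state
    weight proportional to the survival probability Phi(n dt) = 1 - n/N.
    """
    n = np.arange(num_steps)
    survival = 1.0 - n / num_steps
    return shannon_entropy_bits(survival / survival.sum())

for N in (256, 512, 1024, 2048):
    print(N, round(discretised_c_mu(N), 3))
```

Under these assumptions, each halving of $\delta t$ adds roughly one bit to the stored information, which is the logarithmic growth expected when a continuous parameter is stored at increasing precision.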

III. QUANTUM MODELS FOR TRACKING COMPLEX TEMPORAL AND SYMBOLIC DYNAMICS
While causal states eliminate redundancy in storing information that distinguishes pasts with identical future statistics, they provide no savings when two pasts have similar, yet non-identical futures. This is because states must be either identical or fully distinguishable in classical information theory. In contrast, quantum information [40] can be encoded into states that are only partly distinguishable, and this may be used to reduce the past information that must be retained [26].
Specifically, causal states can be encoded into quantum states whose overlap increases with the overlap between their corresponding futures. Labelling the information tracked by such quantum models ('q-machines') as $M_q$, we have that generically $M_q \leq C_\mu$, with equality only for processes with no stochasticity in the hidden states of the model [26] (that is, whenever the ε-machine stores redundant information, a q-machine can mitigate some of this redundancy). This reduced entropy bears operational advantages when considering storage or communication of the states of ensembles of simulators of a process (in some cases the advantage may also manifest in the single-shot case [37]; we do not explore this regime here). The use of $M_q$ rather than $C_q$ denotes that the q-machine may not be optimal, and hence $M_q$ is not necessarily the quantum statistical complexity [31] (but provides an upper bound for it).
We now provide a systematic protocol that constructs q-machines for simultaneously tracking complex temporal and symbolic dynamics in continuous-time processes. In particular, we show that in general the resultant models: (C1) Produce accurate future predictions given the past; (C2) Can be operated in a continuous manner; (C3) Can be synchronised from the past (are causal); (C4) Automatically satisfy causal equivalence relations; (C5) Store less information than any classical model.
As with the classical case, we assume the minimal unifilar modes $\{g\}$ of the process' eeHSMM have been determined. This can be achieved using techniques adapted from discrete-time computational mechanics [16,41], taking the dual $\boldsymbol{x}_n$ as effective emitted symbols. Given these minimal modes, a sufficient, causal set of parameters is the pair $(g_0, t_{\overleftarrow{0}})$. A model based on these parameters then need only identify (with correct probabilities) whether emission occurs in the next infinitesimal interval $dt$, what the emission and subsequent mode are, and update to be in the corresponding state. The causal states correspond to groupings of such pairs with identical future predictions. Specifically,

$$(g_0, t_{\overleftarrow{0}}) \sim_e (g_0', t'_{\overleftarrow{0}}) \iff P(\overrightarrow{\boldsymbol{X}}|G_0 = g_0, T_{\overleftarrow{0}} = t_{\overleftarrow{0}}) = P(\overrightarrow{\boldsymbol{X}}|G_0 = g_0', T_{\overleftarrow{0}} = t'_{\overleftarrow{0}}). \tag{3}$$

We introduce several quantities to characterise the processes. First, we define

$$P^x_{kj}(t) = T^x_{kj}\phi^x_{kj}(t) \tag{4}$$

as shorthand for the transition dynamic. Next, the modal steady-state distribution $\pi_j$ is defined as the (unique [42]) eigenvector of $\sum_x T^x_{kj}$ with unit eigenvalue, normalised such that $\sum_j \pi_j = 1$. These $\pi_j$ are the steady-state probabilities that the system is in mode $j$ immediately after an emission. We further define the mode survival probability (the probability that the dwell time in mode $j$ is at least $t$):

$$\Phi_j(t) = \int_t^\infty \sum_{x,k} P^x_{kj}(t')\,dt'. \tag{5}$$

We also define the mode lifetime (the average dwell time for mode $j$):

$$\tau_j = \int_0^\infty t \sum_{x,k} P^x_{kj}(t)\,dt, \tag{6}$$

and the average emission lifetime $\tau = \sum_j \pi_j \tau_j$. Finally, the modal and mean firing rates are given by reciprocals of the respective lifetimes: $\mu_j = 1/\tau_j$ and $\mu = 1/\tau$. With these definitions, we can express

$$P(T_{\overrightarrow{0}} = t', X_0 = x, G_1 = k|G_0 = j, T_{\overleftarrow{0}} = t) = \frac{P^x_{kj}(t + t')}{\Phi_j(t)}. \tag{7}$$

From these, we define the associated quantum memory states (QMS) for each pair $(g_0, t_{\overleftarrow{0}}) = (j, t)$:

$$|\varsigma_j(t)\rangle = \sum_{x,k} \int_0^\infty \sqrt{\frac{P^x_{kj}(t + t')}{\Phi_j(t)}}\,|t'\rangle|x\rangle|k\rangle\,dt', \tag{8}$$

where the function inside the integral is the square root of the conditional probability for the future [Eq. (7)]. The QMS belong to a tripartite composite Hilbert space. The first of these spaces is continuous and encodes the statistics of the remaining dwell time, while the others are discrete, and correspond to tracking the emitted symbol statistics and subsequent mode respectively.

Measurement of the first two spaces in the $|t'\rangle|x\rangle$ basis yields outcomes $t'$ and $x$ with probability density $P(T_{\overrightarrow{0}} = t', X_0 = x|G_0 = j, T_{\overleftarrow{0}} = t)$, and leaves the final subspace in $|k\rangle$, flagging the mode $k$ to which the system transitions after such an emission. Mapping state $|k\rangle$ (with appropriate blank ancillae) to $|\varsigma_k(0)\rangle$ sets the model in the appropriate post-emission QMS. A measurement sweep of the time subspace over the range $[0, \delta t)$ yields a non-emission result with the correct probability, and for such a non-emission, with a relabelling $t' \to t' - \delta t$, will produce the QMS $|\varsigma_j(t + \delta t)\rangle$. Thus, such measurement sweeps can be used to emulate the passage of time in such models. By performing these measurement sweeps and mappings of flag states for the subsequent post-emission modes, the QMS can model the corresponding stochastic process, fulfilling (C1) and (C2).
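The survival probability, lifetime and QMS amplitudes can be checked numerically on a time grid. The sketch below is a minimal illustration, assuming a single mode with a single outgoing symbol and an exponential dwell-time density with mean `tau_true` (both assumptions made only for this example, so that the symbol and mode registers are trivial); the helper names are hypothetical.

```python
import numpy as np

def survival_probability(density, dt):
    """Phi(t_i): right-tail Riemann sum of the emission density from t_i onwards."""
    return np.cumsum(density[::-1])[::-1] * dt

def mean_lifetime(density, t, dt):
    """tau: mean dwell time, i.e. the integral of t * density."""
    return float(np.sum(t * density) * dt)

def qms_amplitudes(density, survival, i):
    """Amplitudes over the remaining dwell time t' for time-since-emission t_i,
    i.e. sqrt(density(t_i + t') / Phi(t_i))."""
    return np.sqrt(density[i:] / survival[i])

dt = 1e-3
t = np.arange(0.0, 40.0, dt)
tau_true = 2.0                                  # assumed mean dwell time
density = np.exp(-t / tau_true) / tau_true      # assumed exponential dwell density

phi = survival_probability(density, dt)
tau = mean_lifetime(density, t, dt)
amps = qms_amplitudes(density, phi, i=500)
norm = float(np.sum(amps**2) * dt)              # squared amplitudes integrate to 1

print(round(tau, 3), round(phi[0], 3), round(norm, 6))
```

The squared amplitudes integrate to one by construction, reflecting that they form the conditional density of the remaining dwell time given that the mode has already survived for the elapsed time.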
As QMS are clearly well-defined for each pair $(g_0, t_{\overleftarrow{0}})$, the q-machine is causal, satisfying (C3). Further, as the QMS depend only on the conditional probabilities that define causal equivalence, they satisfy (C4):

$$P(\overrightarrow{\boldsymbol{X}}|G_0 = j, T_{\overleftarrow{0}} = t) = P(\overrightarrow{\boldsymbol{X}}|G_0 = k, T_{\overleftarrow{0}} = t') \Rightarrow |\varsigma_j(t)\rangle = |\varsigma_k(t')\rangle. \tag{9}$$

Thus, QMS corresponding to pasts in the same causal state are identical: they automatically adopt the causal architecture of the process without the need to explicitly apply the causal equivalence relation. Further, QMS corresponding to pasts in different causal states generally have non-zero overlap, given by the fidelity of the corresponding conditional probability distributions [Eq. (7)]. Whenever these overlaps are not all either zero or unity (wherein the ε-machine exhibits no stochasticity in its hidden states, and stores no redundant information [26]) the QMS steady-state distribution has lower entropy than that of the classical causal states, and the q-machine stores less information than the corresponding ε-machine, satisfying (C5).

The information stored in the q-machine may be calculated by determining the Shannon entropy of the spectrum of the steady-state density matrix

$$\rho_q = \sum_j \int_0^\infty P(j, t)\,|\varsigma_j(t)\rangle\langle\varsigma_j(t)|\,dt, \tag{10}$$

where $P(j, t) = \mu\pi_j\Phi_j(t)$ is the steady-state distribution of the QMS (see Appendix A). Using a Gram matrix approach [42], we can construct a characteristic equation for the spectrum expressed in terms of the steady-state distribution and overlaps of QMS (see Appendix B):

$$\sum_k \int_0^\infty \sqrt{P(j, t)P(k, t')}\,\langle\varsigma_j(t)|\varsigma_k(t')\rangle f_n(k, t')\,dt' = \lambda_n f_n(j, t), \tag{11}$$

with eigenfunctions $f_n$ and eigenvalues $\lambda_n$. The memory required by the q-machine is then given by

$$M_q = -\sum_n \lambda_n \log_2 \lambda_n. \tag{12}$$

These $\lambda_n$ can be determined by solving the integral equation Eq. (11).
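For a finite set of states, the Gram matrix approach reduces to a standard linear-algebra computation. The sketch below is illustrative only: it takes a hypothetical ensemble of two equiprobable, partially overlapping qubit states (not states from the text), builds the Gram matrix $G_{ab} = \sqrt{p_a p_b}\langle\psi_a|\psi_b\rangle$, and compares the entropy of its spectrum with the Shannon entropy of the classical labels.

```python
import numpy as np

def gram_spectrum(probs, states):
    """Eigenvalues of the Gram matrix G_ab = sqrt(p_a p_b) <psi_a|psi_b>.

    `states` holds one normalised state vector per row. The non-zero
    eigenvalues of G coincide with those of rho = sum_a p_a |psi_a><psi_a|.
    """
    probs = np.asarray(probs, dtype=float)
    states = np.asarray(states, dtype=complex)
    overlaps = states.conj() @ states.T
    gram = np.sqrt(np.outer(probs, probs)) * overlaps
    return np.clip(np.linalg.eigvalsh(gram), 0.0, None)

def entropy_bits(spectrum):
    """Entropy (in bits) of a spectrum, ignoring numerically zero entries."""
    lam = np.asarray(spectrum, dtype=float)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

theta = np.pi / 8                                 # assumed angle between the two states
states = np.array([[1.0, 0.0], [np.cos(theta), np.sin(theta)]])
probs = [0.5, 0.5]

m_q = entropy_bits(gram_spectrum(probs, states))  # memory of the overlapping encoding
c_mu = entropy_bits(probs)                        # memory of fully distinguishable labels
print(round(m_q, 4), round(c_mu, 4))
```

Working with the Gram matrix rather than the density matrix is convenient because its dimension is set by the number of states in the ensemble, not by the dimension of the Hilbert space they live in.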
The QMS can be adapted to model coarse-grained versions of a process, wherein continuous time is replaced by small, discrete timesteps. Analogous to classical coarse-graining, in which all states within a given timestep $\delta t$ are merged into a single state prior to applying the causal equivalence relation, we likewise merge the QMS in a particular timestep to form coarse-grained QMS, with probabilities defined by integrals of probability densities over timesteps. The corresponding coarse-grained QMS are given by

$$|\varsigma_j(m)\rangle = \sum_{x,k} \sum_{n=0}^\infty \sqrt{\frac{\psi^x_{kj}(m + n)}{\Phi_j(m\delta t)}}\,|n\rangle|x\rangle|k\rangle, \tag{13}$$

where $\psi^x_{kj}(n) = \int_{n\delta t}^{(n+1)\delta t} T^x_{kj}\phi^x_{kj}(t)\,dt$. Outcome $n$ from measurement of the first subspace corresponds to $t_{\overrightarrow{0}} = n\delta t$. As the coarse-grained states will generally not be mutually orthogonal, a quantum advantage remains. The continuous-time case is recovered in the limit $\delta t \to 0$.
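A coarse-grained comparison of classical and quantum memory can be sketched for the simplest case. The code below assumes a single-mode, single-symbol (renewal-type) process with dwell times uniform over $N$ timesteps, a hypothetical example rather than the process studied in this paper. It takes the coarse-grained state after $n$ timesteps to have amplitudes $\sqrt{\psi(n+m)/\Phi(n\delta t)}$ on $|m\rangle$, where $\psi(m)$ is the probability that the dwell time falls in timestep $m$, and weights each state in proportion to the survival probability.

```python
import numpy as np

def entropy_bits(p):
    """Entropy (in bits) of a probability vector or spectrum."""
    p = np.asarray(p, dtype=float)
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log2(p)))

def coarse_grained_memories(psi):
    """Return (C_mu, M_q) in bits for a single-mode, single-symbol process.

    psi[n] is the probability that the dwell time falls in timestep n
    (psi[-1] must be non-zero). C_mu is computed assuming no two timesteps
    are causally equivalent, which holds for the uniform example below.
    """
    psi = np.asarray(psi, dtype=float)
    N = len(psi)
    survival = np.cumsum(psi[::-1])[::-1]        # Phi(n dt) = sum_{m >= n} psi[m]
    steady = survival / survival.sum()           # P(n) proportional to Phi(n dt)
    states = np.zeros((N, N))
    for n in range(N):
        states[n, : N - n] = np.sqrt(psi[n:] / survival[n])
    gram = np.sqrt(np.outer(steady, steady)) * (states @ states.T)
    spectrum = np.clip(np.linalg.eigvalsh(gram), 0.0, None)
    return entropy_bits(steady), entropy_bits(spectrum)

for N in (32, 64, 128):
    c_mu, m_q = coarse_grained_memories(np.full(N, 1.0 / N))
    print(N, round(c_mu, 3), round(m_q, 3))
```

In this toy case the classical entropy grows by about one bit per doubling of $N$, while the quantum entropy changes only slightly, because neighbouring coarse-grained states overlap strongly.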
For continuously-emitting processes, the emitted symbols are determined immediately after transition events, rather than immediately before. The current emission is then known throughout the current step. An appropriate modification to the protocol is to make a measurement of the symbol subspace in the new QMS immediately after the previous transition event occurs, rather than on the old QMS. The QMS are now also labelled by the current symbol, and are defined in terms of $\delta^x_{kj}$, which is 1 if $T^x_{kj}$ is non-zero (and zero otherwise), and $\Phi^x_j(t) = \sum_k \int_t^\infty \delta^x_{kj}\phi^x_{kj}(t')\,dt'$. As with the discrete event case, one can calculate the steady-state density matrix of these states, and from its spectrum determine the amount of information tracked by the q-machine.

IV. EXAMPLE PROCESS WITH COMPLEX TEMPORAL AND SYMBOLIC DYNAMICS EXHIBITING EXTREME QUANTUM ADVANTAGE
To illustrate our results, we employ our q-machine construction protocol to study an example process that violates the restrictions on current systematic ε-machine construction protocols. Consider the following conceptual scenario allegorising a process: Charlie owns a device that exhibits a constant breaking probability within any fixed interval of time. Upon breakage, he takes the device to either Alice or Bob for repair, after which the device is returned to Charlie. The time taken by Alice and Bob to fix the device is a random variable; Alice's fixing time distribution is different to Bob's, but has the same average. Emissions herald when the device changes hands, and indicate the new holder.
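The scenario can be sampled directly as a sequence of (symbol, dwell time) pairs. The sketch below is a hedged illustration: the specific repair-time distributions (Alice uniform on $[0, T_{\rm Fix}]$, Bob uniform on $[T_{\rm Fix}/4, 3T_{\rm Fix}/4]$) are assumptions chosen only to be different while sharing the same mean, and the function name and default parameters are hypothetical.

```python
import random

def simulate_device(num_emissions, t_brk=1.0, t_fix=2.0, seed=0):
    """Sample (symbol, dwell_time) pairs for the Charlie/Alice/Bob scenario.

    Each emitted symbol names the new holder of the device. Assumed for
    illustration: Alice's repair time is uniform on [0, t_fix] and Bob's is
    uniform on [t_fix/4, 3*t_fix/4], so both have mean t_fix/2.
    """
    rng = random.Random(seed)
    holder = "C"
    sequence = []
    for _ in range(num_emissions):
        if holder == "C":
            dwell = rng.expovariate(1.0 / t_brk)    # constant breaking rate
            holder = rng.choice("AB")               # equal chance of Alice or Bob
        else:
            if holder == "A":
                dwell = rng.uniform(0.0, t_fix)
            else:
                dwell = rng.uniform(t_fix / 4, 3 * t_fix / 4)
            holder = "C"                            # device returns to Charlie
        sequence.append((holder, dwell))
    return sequence

print(simulate_device(6))
```

Starting with Charlie holding the device, the emitted symbols alternate between a repairer (A or B) and C, with the dwell time before each emission drawn from the distribution of the mode being left.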
Mathematically, this process can be represented by three modes, forming the eeHSMM depicted in Fig. 3(a). Modes $g_A$ and $g_B$ each emit symbol $C$ at times drawn from $\phi_A(t)$ and $\phi_B(t)$ respectively, upon which the system transitions to mode $g_C$. Mode $g_C$ emits symbols $A$ or $B$ with equal probability at times drawn from the exponential (Poissonian) distribution $\phi_C(t)$, and transitions into the corresponding mode $g_A$ or $g_B$. The distributions $\phi_A(t)$, $\phi_B(t)$ and $\phi_C(t)$ are defined in Eq. (15).

We see that this process exhibits causal equivalence between certain of its causal pairs, and as such falls outside the class for which systematic ε-machine construction methods are currently known. Specifically, because $\phi_C(t)$ is memoryless, all dwell times within this mode have identical conditional futures, and so only a single causal state is needed to describe occupation of this mode [10,11,34]:

$$(g_C, t) \sim_e (g_C, t') \quad \forall t, t' \geq 0. \tag{16}$$

Further, we see that the conditional futures for modes $g_A$ and $g_B$ can become identical for certain combinations of dwell times; specifically, we have

$$(g_A, t + T_{\rm Fix}/2) \sim_e (g_B, t + T_{\rm Fix}/4) \quad \forall t \geq 0. \tag{17}$$

FIG. 4: Memory requirements for example process. (a) The information stored within the ε-machine and q-machine at increasing levels of precision, showing a quantum advantage that appears to become unbounded in the continuum limit (plot shown for $T_{\rm Fix} = 2T_{\rm Brk}$, with data points taken at values $T_{\rm Fix}/4\delta t \in \mathbb{Z}^+$ for calculational simplicity). (b) Inspection of the steady-state spectrum for increasingly fine discretisation ($N = T_{\rm Fix}/\delta t$) of the q-machine indicates that the eigenvalues appear to fall off with a $1/n^2$ dependence. Here $\beta$ is a normalisation constant chosen with reference to the eigenvalues $\lambda_n$ from $n = 1001$ for the $N = 2^{14}$ case (eigenvalues ranked largest to smallest).
These both exemplify violations of restriction iii. The conceptual scenario can be easily extended to describe a process in violation of restriction ii: suppose Charlie's device has two possible faults, with different failure rates, and the choice between Alice and Bob is decided by which of the faults occurred. Restriction i would be violated if Charlie were to merely announce that the device has broken (instead of who will be fixing it) and the choice between Alice and Bob is determined by how long the device took to fail.
The relative simplicity of this example allows us to perform the appropriate application of the equivalence relations to merge causal pairs into causal states on an ad hoc basis, as is displayed in Fig. 3(b). Notably, however, we can instead exploit the self-merging nature of the QMS construction, and blindly assign QMS for each causal pair. These QMS will automatically satisfy the causal equivalence relations, and will not incur any penalty in the information the q-machine must track. This is depicted in Fig. 3(c).
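The self-merging behaviour can be checked on a discretised grid. The sketch below is illustrative and rests on assumed dwell-time distributions (Alice uniform on $[0, T_{\rm Fix})$, Bob uniform on $[T_{\rm Fix}/4, 3T_{\rm Fix}/4)$, and a geometric, i.e. discretised memoryless, distribution for Charlie), chosen only so that they reproduce the stated merging of pairs $(g_A, t + T_{\rm Fix}/2)$ and $(g_B, t + T_{\rm Fix}/4)$. Only the time register is modelled: within each comparison the symbol and next-mode registers carry identical states, so they do not change the overlap.

```python
import numpy as np

def qms(psi, n):
    """Coarse-grained QMS (time register only) after n timesteps in a mode.

    psi[m] is the probability that the dwell time falls in timestep m; the
    amplitudes are sqrt(psi[n + m] / sum(psi[n:])), padded with zeros.
    """
    tail = psi[n:]
    amps = np.zeros(len(psi))
    amps[: len(tail)] = np.sqrt(tail / tail.sum())
    return amps

N = 400                                   # timesteps per T_Fix (assumed grid)
steps = np.arange(2 * N)
# Assumed distributions, chosen only to reproduce the stated merging.
psi_A = np.where(steps < N, 1.0 / N, 0.0)
psi_B = np.where((steps >= N // 4) & (steps < 3 * N // 4), 2.0 / N, 0.0)
psi_C = 0.05 * 0.95 ** steps              # discretised memoryless dwell time

same_mode = qms(psi_C, 0) @ qms(psi_C, 50)                    # two dwell times in g_C
merged = qms(psi_A, 10 + N // 2) @ qms(psi_B, 10 + N // 4)    # causally equivalent pair
unmerged = qms(psi_A, 0) @ qms(psi_B, 0)                      # distinct causal states
print(same_mode, merged, unmerged)
```

The blindly assigned states coincide (unit overlap) wherever the pairs are causally equivalent, while pairs in different causal states retain an overlap strictly between zero and one.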
As described in Appendix C, we can calculate the relevant modal and mean lifetimes and firing rates of the process, the coarse-grained causal states and QMS, and their respective steady-state probabilities. Solving the coarse-grained analogue of Eq. (11), we can determine the steady-state density matrix spectrum, and hence calculate $M_q$ for the particular level of coarse-graining. We can further calculate $C_\mu$ at the same coarse-graining for comparison. In Fig. 4(a) we show that with increasingly fine coarse-graining $C_\mu$ diverges, while $M_q$ appears to converge towards a finite value. This suggests an unbounded advantage of the q-machine over the ε-machine, similar to that seen for renewal processes [34]. Inspection of the steady-state density matrix spectrum indicates that it falls off approximately as $1/n^2$ [Fig. 4(b)]. When the spectrum has this dependence, the associated entropy is finite [34], providing further support that $M_q$ is bounded in the continuum limit.

V. DISCUSSION
We have proposed a systematic construction protocol for quantum models of general complex continuous-time stochastic processes that automatically adopt the process' causal architecture, and exhibit an entropic memory advantage over their optimal classical counterparts. Moreover, our model construction protocol can be applied more generally than corresponding protocols for optimal classical models, without any restrictions on the combined symbolic and temporal dynamics. This allows our models to be utilised to study a much broader range of processes, such as the aforementioned examples in our introduction [1-9].
While the information cost of a quantum model constructed by our protocol is lower than that of any classical model, there is no claim of optimality over all quantum models. By accounting for longer-range temporal correlations [27,28,35], it has been found that discrete-time q-machines can reduce their information cost, and analogous constructs may be possible in the continuous-time case, providing further memory savings. Other reductions may be possible by exploiting the possibility of using complex amplitudes in the QMS. Even so, our current models provide an upper bound on the information cost for the optimal quantum model, and a lower bound is given by the mutual information between the past and future of the process [12,26].
It is interesting to consider how to implement the q-machine as a simulator. Our earlier work on q-machines for continuous-time renewal processes [34] discussed how they might be realised by using the position of a particle as the continuous variable tracking time, and the motion of the particle towards a detector as the measurement sweep. One can envisage that the more general multi-mode, multi-symbol processes here might be implemented similarly, using internal states of the particle for the mode and symbol subspaces of the QMS. We leave specific details of particular experimental implementations as an open question for future work.
Finally, let us remark on the apparent unbounded advantage in our example. In earlier work we have postulated that this is likely a typical feature of continuous-time q-machines [34]. We recapitulate the argument [32,34] here. Consider a coarse-graining with timesteps $\delta t$. With further refinement, the new QMS will typically have large overlap with the existing temporally-adjacent QMS. At very fine coarse-graining, the new states will be almost identical to those existing, and so the increase in information cost will be ever-decreasing, resulting in the observed convergence of $M_q$. In contrast, the mutual orthogonality of the classical states leads to a logarithmic divergence in information cost. An enticing problem for future work is to develop methods of coarse-graining that exploit such quantum features. We expect that such a quantum coarse-graining could be used for near-exact simulation with extreme memory advantages even in the single-shot regime. It would also be interesting to consider the extension to input-output processes [43,44] operating in continuous time.

The resulting operator $\rho_G$ is called the Gram matrix and, as noted above, has the same spectrum as $\rho_q$. Its corresponding characteristic equation has eigenfunctions $f_n$ and spectrum $\lambda_n$. Expanding this in terms of the QMS and steady-state distributions, we obtain the characteristic equation Eq. (11), which may be solved to obtain the spectrum $\lambda_n$, and hence calculate the information $M_q$ stored by the q-machine.
Appendix C: Further calculational details for the example process

Here we provide the detailed derivations of the properties of the example in the main text. Recall the definition of the process, as depicted in Fig. 3(a), whereby two modes $g_A$ and $g_B$ each emit symbol $C$ while transitioning to mode $g_C$ at a time drawn from $\phi_A(t)$ or $\phi_B(t)$ respectively, while mode $g_C$ emits $A$ or $B$ and transitions to the corresponding mode with equal probability, at a time drawn from $\phi_C(t)$. The respective distributions [Eqs. (C1)] are illustrated in Fig. 5.
From these distributions, we can straightforwardly calculate the modal lifetimes and steady-state distributions using the definitions in the main text. The average lifetime is $\tau = (T_{\rm Fix} + 2T_{\rm Brk})/4$. We can also calculate the mode survival probabilities and the corresponding steady-state probabilities [Eqs. (C4)].

While no general prescription is presently known for implementing the causal equivalence relations for continuous-time processes in classical systems [25], their relative simplicity in this example allows for them to be applied on an ad hoc basis, as displayed in Fig. 3(b). We can merge all times within mode $g_C$ into a single causal state, as well as all pairs of states $(g_A, t + T_{\rm Fix}/2)$ and $(g_B, t + T_{\rm Fix}/4)$ for $t \geq 0$. For clarity, in the following we label states by a specific mode only for the unmerged states, and use $g_M$ to represent states in the merge between the $g_A$ and $g_B$ pairs. The corresponding causal states have steady-state probabilities obtained by summing those of the causal pairs they merge.

As discussed in the main text, the continuous nature of the distributions leads to the classical statistical complexity $C_\mu$ being infinite. However, we can still investigate how the complexity grows with increasingly fine discretisation, and compare this to the quantum case. For calculational simplicity, we use timesteps $\delta t$ such that $T_{\rm Fix}/4\delta t$ is an integer, which ensures that no timestep spans a border where the behaviour of the probability distributions changes (i.e. there are no 'partially merged' timesteps). When discretising, we note that there is a small $O(\delta t)$ correction to the effective lifetimes and firing rates [10], which can be appropriately accounted for by inclusion of a normalisation factor $\mathcal{N}$ such that $\mathcal{N}\sum_{n=0}^\infty \Phi_j(n\delta t) = \mu\pi_j\tau_j$. Calculating the Shannon entropy of the discretised distributions, we obtain $C_\mu$ at each level of precision, as plotted in Fig. 4(a) for the case $T_{\rm Fix} = 2T_{\rm Brk}$.
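The modal steady-state distribution and the quoted average lifetime can be cross-checked numerically. The sketch below builds the symbol-summed transition matrix directly from the description of the process; the individual mode lifetimes ($T_{\rm Fix}/2$ for $g_A$ and $g_B$, $T_{\rm Brk}$ for $g_C$) are an assumption made here for illustration, chosen to be consistent with the average lifetime stated above, and the numerical values of `t_fix` and `t_brk` are arbitrary.

```python
import numpy as np

# Mode order (g_A, g_B, g_C); entry [k, j] is sum_x T^x_kj, the probability
# of moving from mode j to mode k on the next emission.
transition = np.array([
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5],
    [1.0, 1.0, 0.0],
])

eigvals, eigvecs = np.linalg.eig(transition)
unit = np.argmin(np.abs(eigvals - 1.0))          # pick the unit eigenvalue
pi = np.real(eigvecs[:, unit])
pi = pi / pi.sum()                               # normalise so the entries sum to 1

t_fix, t_brk = 2.0, 1.0                          # illustrative values
# Assumption: tau_A = tau_B = t_fix / 2 and tau_C = t_brk.
mode_lifetimes = np.array([t_fix / 2, t_fix / 2, t_brk])
tau = float(pi @ mode_lifetimes)                 # average emission lifetime
mu = 1.0 / tau                                   # mean firing rate
print(pi, tau, mu)
```

With these inputs the steady-state weights are 1/4 for each of $g_A$ and $g_B$ and 1/2 for $g_C$, and the weighted lifetime reproduces $(T_{\rm Fix} + 2T_{\rm Brk})/4$.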
For the q-machine, we must first construct the discretised QMS, as prescribed in the main text [Eq. (C6)].
Due to the self-merging nature of the QMS, this blind construction protocol is automatically consistent with the causal equivalence relations satisfied by the classical causal states. By constructing a density matrix of these QMS, with weighting given by the appropriate steady-state probabilities Eqs. (C4) (or rather, discretised analogues thereof), we can find its spectrum, and the associated Shannon entropy provides the quantum memory requirement $M_q$. These are plotted for increasingly fine levels of discretisation in Fig. 4(a) alongside the corresponding classical memory requirements, where it is clear that the q-machine requires less information than the optimal classical machine, and appears to converge to a bounded value, much as was previously found for renewal processes [34].

FIG. 2: Processes without systematic ε-machine construction protocols. There are currently no systematic construction protocols for the ε-machines of processes where (i) the current mode cannot be synchronised from symbolic dynamics alone; (ii) the modes have symbol-dependent emission-time distributions; or (iii) two modes have identical future dynamics after sufficiently long occupation.

FIG. 3: Causal structure of example process. (a) Example eeHSMM with distributions as given in Eq. (15). The temporal dynamics of modes $g_A$ and $g_B$ provide equivalent futures for sufficiently long dwell times, as do all dwell times in mode $g_C$ (see main text). (b) Classically, equivalence relations must be applied manually to appropriately merge the hidden states tracking the dynamics, while (c) quantum models have automatically merged states by construction.

FIG. 5: Emission distributions for example. Dwell times of Eqs. (C1) describing the temporal dynamics of the example.