Operator Spreading in the Memory Matrix Formalism

The spread and scrambling of quantum information is a topic of considerable current interest. Numerous studies suggest that quantum information evolves according to hydrodynamical equations of motion, even though it is a starkly different quantity to better-known hydrodynamical variables such as charge and energy. In this work we show that the well-known memory matrix formalism for traditional hydrodynamics can be applied, with relatively little modification, to the question of operator growth in many-body quantum systems. On a conceptual level, this shores up the connection between information scrambling and hydrodynamics. At a practical level, it provides a framework for calculating quantities related to operator growth like the butterfly velocity and front diffusion constant, and for understanding how these quantities are constrained by microscopic symmetries. We apply this formalism to calculate operator-hydrodynamical coefficients perturbatively in a family of Floquet models. Our formalism allows us to identify the processes affecting information transport that arise from the spatiotemporal symmetries of the model.


Introduction
Under time evolution a wide class of many-body quantum systems tend towards equilibrium, where the final state is well described by a relatively small number of parameters such as temperature or pressure. At this point, information about the local conditions of the initial state is 'scrambled': it can no longer be determined by simple local measurements but is instead encoded in increasingly delicate and complicated observables. The process of scrambling has been the focus of intense study in the fields of black hole physics and holography [1][2][3][4][5][6][7][8][9][10][11][12], integrable systems [13][14][15][16][17], random unitary circuits [18][19][20][21][22][23][24], quantum field theories [25][26][27][28][29][30][31][32], and in the setting of chaotic spin-chains [33][34][35][36][37][38][39][40][41][42][43]. A principal reason for the flurry of interest in scrambling is the striking universality observed in the information spreading dynamics of apparently disparate models. One universal feature of scrambling in ergodic systems is the ballistic growth of operators (with 'butterfly velocity' v B ), a feature connected to the universally observed linear growth of quantum entanglement. This result has been demonstrated for random unitary circuits [19]; in 1D a simple picture emerges where operators grow according to biased diffusion [19,20]. This picture is altered in the presence of additional symmetries which give rise to power-law tails in the distributions of operator right end-points [21,22]. In all of these cases, it appears that in ergodic systems quantum information obeys a sort of hydrodynamics, with an unusual conservation law, which we call 'information conservation'. The purpose of the present work is to show that the connection to hydrodynamics is not just an analogy and that a standard hydrodynamical tool-the so-called memory matrix formalism (MMF) [44]-can be applied with only a few technical (but consequential) modifications. Once we postulate a suitable slow manifold (a concept we explain), the ballistic growth of information is inevitable; in the same way that diffusion is inevitable in high temperature systems when the local conserved density is identified as the sole slow variable. Moreover, the formalism provides a framework for the perturbative calculation of information transport coefficients (e.g., the butterfly velocity) in concrete models. This is useful because operator spreading tends to require working at infinite temperature, where quantum field theoretic methods become harder to control. As such, the MMF is one of the only tools available for the analytical calculation of operator transport coefficients (although a similar effective membrane theory has been independently suggested for the related issue of entanglement growth [45]).
This work is organized as follows. In section 2 we extend the MMF to include a slow mode associated with the conservation of quantum information in the setting of translation invariant one-dimensional Hamiltonian systems and in section 3 we do the same for translation invariant one-dimensional Floquet models. Our formalism yields a succinct expression for the butterfly velocity, and shows that the biased diffusion of operator fronts observed in ergodic systems is arguably the simplest scenario consistent with the conservation laws. We also explain how microscopic symmetries constrain the butterfly velocity and front diffusion constants (D). We demonstrate the usefulness of the MMF in section 4 where we consider a translation invariant Floquet circuit with no additional symmetries and give results for the circuit averaged butterfly velocity and operator front diffusion constant in the limit of large local Hilbert space dimension q. In section 6.2, we calculate the O(1/q 2 ) corrections to the butterfly velocity and attribute various contributions to the discrete time translation symmetry and to the spatial translation symmetry. Finally, inspired by this calculation we give predictions for v B and D for a family of Floquet models to order O(1/q 2 ). This calculation also serves as a consistency check on our formalism, confirming that the slow manifold we have proposed is sufficiently complete to perform hydrodynamical calculations.

Memory matrix formalism: operator spreading as a slow mode
The MMF is a method for predicting the hydrodynamical properties of many-body systems [44,[46][47][48]. The input for the method is a Hamiltonian (or some other dynamics), as well as a guess as to what the likely slow modes are in the system (the local densities of conserved quantities are natural candidates). Under the assumption that all slow modes have been included and that all remaining fast modes decay sufficiently rapidly, the formalism yields predictions for the long-distance behavior of correlation functions involving the slow modes.
In this section, we adapt the formalism to include the slow mode associated with information conservation in the setting of one-dimensional quantum systems with local dynamics, i.e., local Floquet circuits, or systems with local Hamiltonians. In this paper, the 'conservation of quantum information' is equivalent to the statement that Heisenberg evolved operators have a conserved Hilbert-Schmidt norm Tr(O † (t)O(t)) = const., a direct consequence of unitarity.
By averaging over choices of initial operator with the same right endpoint, we express the distribution of operator right endpoints (or the 'operator right density' as we refer to it in this paper), as an autocorrelation function of elements in the space of operators on two replicas of the Hilbert space. Using the MMF, we investigate the pole structure of the corresponding spectral function and give Kubo-like formula for the butterfly velocity v B and an expression for diffusion constant D in terms of the memory matrix.

Quantifying operator spreading
We consider a one-dimensional lattice of N sites with single site Hilbert space H local = C q . The space of operators on the full Hilbert space H is denoted B(H). A convenient basis for single site operators are the generalised Pauli matrices 1 {σ μ }, a set of unitary matrices satisfying the orthogonality relation Tr(σ μ † σ ν )/q = δ μ,ν . A time evolved operator O(t) can be expressed as a linear combination of strings of generalised Pauli operators, The time evolution is generated by a Hamiltonian H, U(t) = exp(−itH). In the second equality we have introduced the Liouvillian L(·) = [H, ·], 2 and in the last equality only the coefficients C O μ (t) depend on time. We have normalised O such that the Hilbert-Schmidt norm gives 1 The generalised Pauli matrices are generated by the shift and clock matrices X and Z, σ μ = X μ (1) Z μ (2) . Where X q = Z q = and ZX = e 2πi q XZ. 2 In the context of this paper, a Liouvillian L is a generator of Hamiltonian evolution (in operator space), as opposed to a generator of Markovian dynamics in open quantum systems, as the name often refers. O 2 HS = dim(H) = q N , this ensures μ |C O μ | 2 = 1. Following the work of [19,20], we define the right density ρ R (x, t) (the probability of an operator having right endpoint at position x at time t) by where Rhs(μ) denotes the rightmost site on which the Pauli string has non-trivial support. Summing over the site positions in equation (2) gives us a conserved quantity,

Operator right density as a correlation function
The starting point for the MMF is a temporal correlation function of slow variables, typically local charge operators associated to a conserved quantity such as energy or charge. Analogous to the local operators associated with energy or charge, we are able to associate pseudo-local operatorsŴ x with the right density ρ R . These operators, which we call the right density operators, will play the role of slow variables in the MMF. One complication is that, rather than being an operator on the original Hilbert space like charge/energy,Ŵ x is an operator on operators-a super-operator (or equivalently an operator on two replicas of the original Hilbert space). Thus, in our application of MMF, we will be studying the dynamics of super-operators like W x , eventually writing ρ R as a temporal correlation function of 'vectorised' right density operators |W x . This serves as the starting point for the MMF.Ŵ x is implicitly defined in the equation for ρ R (x, t) below, where the inner-product a|b is the infinite temperature operator inner-product suitable for B(H). An explicit definition ofŴ x is given by (5) whereΛ + is the identity super-operator,Λ − is the projector onto the identity operator andΛ 0 is the projector onto the space of non-identity operators. Algebraically,Λ ± are given bŷ By writing equtaion (4) in the language of tensor diagrams, we make the following manipulations to write the density ρ R (x, t) as an overlap of two 'vectors', We view the final expression as the inner-product of two vectors in the vector space W ≡ B(H) B(H) of operators acting on two replicas of the Hilbert space (we have used a boxed tensor product to represent the tensor product between two copies of the operator space). In particular, this allows us to write the density ρ R as a temporal correlation function of the form A(t)|B W as shown below, Elements of W can be written as linear combinations of vectors |A B , which has the following diagrammatic representation, where the legs 1, 1, 2, 2, represent the indices of the operators A and B which act on the first and second replica respectively. An inner-product of two states has the obvious meaning of connecting legs, One also needs to include a rule for moving a symbol around a bend to ensure consistency, The manipulation of equation (7) has the following consequences for theΛ ± andΛ 0 , where we have chosen to normalise the vectors |+ and |− , ±| |± = 1 (this introduces factors of q that differ from equation (6)). The vectors |± take a simple diagrammatic form, The vectorised right density super-operators |W x are then given by (12) where d >x is the dimensions of the subsystem of sites to the right of site x. A closely related super-operator is the purity super-operatorF x , given by (13) where d x is the dimension of the subsystem of sites r x. Using this super-operator, we are able to represent the purity γ x (t) of the system (partitioned into a subsystem left (inclusive) and right of x) as a matrix element, where ρ(t) is the density matrix for the state of the system. Up to an overall constant, the purity operators F x and the integrated right density operators, y x W y , are equivalent. Therefore, the W x can be written as a difference of (re-scaled) F x 's, Time evolution in the doubled operator space W is generated by a doubled Liouvillian L ≡ L + L, which evolves both replicas of the operator space independently. The time . We can use this to write down a continuity equation for W x , where J x = −iL(F x )/d >x is the current associated with the operator right density. The currents J x are pseudo-local super-operators; they look, locally, like Λ + everywhere to the left the cut {x, x + 1} and Λ − everywhere to the right. Local unitary evolution acts trivially in the + and − domains. We show this by considering the action of a (single-site) Liouvillian L i at site i in the + domain, By moving the conjugated Hamiltonians around the 'bend' from the barred legs onto the unbarred legs, we find that the first and fourth terms cancel, as do the second and third. This can be easily generalised to l-local interactions. By swapping legs 1 ↔ 2 in equation (17), we the find the action of L in the − domain. The region separating the + and − domains can only grow via the local evolution at its edges. In this sense, one can consider equation (16) to be an equation of local conservation of operator right density.

Operator averaging
Rather than consider the evolution of the right density for a particular operator O, we average over the all operators with right endpoint x. Doing this in the basis of generalised Pauli matrices, the averaged density ρ R (x, y, t) is given by where we have pulled the time evolution out from the operators O(t). The factor before the sum is normalisation for the average (the reciprocal of the number of linearly independent operators with right endpoint x). This is in fact just a temporal-correlation function between right density super-operators, The W x are orthogonal but not normalised with respect to the trace inner-product. The result of this is that the Fourier transformed right-densities W k are not orthogonal. This lead us to an unusual inner-product ·|· with respect to which the W x are orthonormal, Where Φ is given by Q is the projector onto the fast subspace Q, the orthogonal complement 3 of the slow subspace P ≡ Span{W x }. A proof that ·|· satisfies the axioms of an inner-product is found in appendix A. We must be careful with expression equation (19) as L is not self-adjoint with respect to this inner-product. Making the operator average implicit, the right density becomes The position dependent re-scaling of the right densities W x by Φ reflects the strong entropic bias for operators to grow, i.e., for y > x we find Operators are exponentially more likely to grow than to shrink 4 .

Spectral and memory function
Having expressed the right density as a correlation function, we now investigate its late time behavior. A helpful diagnostic in the long time behaviour of a correlation function is the pole structure of the corresponding spectral function, Fourier transforming in space will allow us to express the poles at small z in terms of a wave-number k. In particular this allows us to characterise the long-time and long-wavelength behaviour of the averaged density ρ R (x, y, t). The density super-operators in k-space are given by We choose to center the lattice at x = 0. In the k-space, the spectral function is given by By taking P ≡ Span{W k } as the slow space in the MMF, we use some formal manipulations of resolvent operators [44] to express the spectral function as where Ω k,p ≡ W k |L| W p and Σ(z) k,p is the memory matrix, By It is simple to check using equation (13) and (11) that F x | l H |F y = F x | r H |F y = q −|y−x| Tr(H) (and equivalently for multiplication by H in the second replica). This gives F x | L |F y = 0 and, by linearity, Ω k,p = 0.
For translationally invariant systems (in the thermodynamic limit N → ∞), the memory matrix is diagonal, Σ(z) k,p = Σ (k, z) δ k,p and the inverse of z + iΣ (k, z) is readily calculated, The long-time and long-wavelength pole structure is found by expanding the memory function Σ (z, k), for small k and z. The k-space representation of the continuity equation equation (16) is then given by, where Putting this back into the memory matrix yields This will give the pole structure Provided the analyticity of v(z) and b(z) as −iz → 0 + , this will be precisely the pole structure associated with a biased diffusion equation. This analyticity condition is met so long as we can assume that the fast variables J k and LΦ(W k ) have rapidly decaying correlations (faster than 1/t).

Butterfly velocity and diffusion constant
In this section we provide formal expression for the butterfly velocity v B and diffusion constant D and sufficient conditions for the symmetry of the operator growth light-cone, v L = v R and D L = D R .
2.5.1. Formal expressions for v B and D. Using equation (32), the butterfly velocity v B is given by We introduce the proxy σ(k, z), defined as Using L(W k ) ∼ k, we conclude that lim k→0 σ/k = lim k→0 Σ/k, provided that we take the k → 0 limit before taking z → i0 + . Using this, we give a Kubo-like formula for v B , where J ≡ J k=0 and W ≡ W k=0 . Converting to the usual trace inner-product, we have W |L| J(−t) = LΦ(W)| |J(−t) . Importantly, Φ and L do not in general commute 5 . If they were to commute, we would find L(W k=0 ) = 0 and hence v B = 0, which would lead to the incorrect conclusion that information propagates diffusively rather than ballistically. Using the biased diffusion ansatz for the pole location, the diffusion constant is given by 2.5.2. v R = v L and the operator growth light-cone. Generically, the right and left butterfly velocities, v R and v L , are not equal [42,49,50]. In this section we relate v R and v L in a way that gives rise to a light-cone structure. We then determine sufficient conditions for a symmetric light-cone, v R = v L and symmetric fronts D L = D R . We need to adapt our notation when talking about both left and left and right density distributions at once, to do this we denote W x R as the familiar right density super-operators and W x L to be the left density super-operator. The altered inner-product is also right/left dependent, with ·|· R ( ·|· L ) corresponding to the inner-productive suitable for right (left) endpoint calculations. The right and left density distributions are given by The equations for v L and D L are found to be Where we have introduced a superscript H to label which Hamiltonian the systems is being evolved with. In order to convert between these distributions let us introduce the involution I, the spatial inversion symmetry operation, x → −x. It has the following action on the density super-operators and the Liouvillian, where H I is the spatially inversion of the (translationally invariant) Hamiltonian H. Using this involution on ρ L gives the following where We have added labels to make clear under which Hamiltonian the system is being evolved. This relation implies the following, Using this and expressions for v R/L and D R/L (equations (34), (38) and (40)), we find the relationship between v L and v R and D L and D R to be This implies a physically obvious result: in an inversion symmetric system the left and right velocities are equal. A somewhat less obvious result is the following: The second of these sufficient conditions applies for systems with a time-reversal/antiunitary symmetry, provided that the symmetry transformation can be achieved with single site transformations. Before we prove this result, we will apply it to several models: • An example where both sufficient conditions are met is the spin-1/2 model Alternatively, if we had chosen R to be a product of Pauli X operators on every even site and on every odd site, we find R † HR = H * , (a time-reversal symmetry). Despite this model lacking inversion symmetry, it has a symmetric operator light-cone, v L = v R and D L = D R . • An example where the second of the sufficient conditions is met is given by another spin-1/2 model with two-body interactions and no external field coupling, terms containing a single Y are sent to their negatives, every other term remains unchanged. The terms that flipped sign make up the imaginary part of H, hence H → H * , so that yet again v L = v R .
• Taking the integrable and non-integrable Hamiltonians of [42], in the special case λ = 0 the Hamiltonians lack inversion symmetry but satisfy both of the sufficient conditions above, giving v L = v R in agreement with the results in [42].
To prove this result we make use of the symmetry operation S = (1 ↔ 2), which swaps the legs 1 and 2 on every site. S has the property S |W x R = |W x L and is a symmetry of L. Therefore, where we have used the reality of the density. Using this gives This implies the existence of a light cone structure, where the future light cone is a π rotation of the past light cone. If, in the final equality of equation (45), we had brought the complex conjugation onto each term, we would have instead found We can generalise this by considering any transformation which performs on-site basis rotations, such a transformation is an isometry of the density super-operators, |W x . Letting R = x R x be a product of unitary rotations, we can freely conjugate H by R at any point in equation (45) (on-site rotation is an isometry of W x ) and maintain equality. This gives

MMF for Floquet models
With only minor changes, the MMF generalises to Floquet models. In this section we repeat a number of steps taken in the continuous time case, finding Floquet analogues of the memory matrix and Ω. The main results of this section are equation (56) for the Floquet analogue of Σ and Ω and formal expressions equation (58) for v B and D (and a Kubo-like formula for v B in equation (61)). Readers only interested in main results should look at these equations and skip the rest of the section. Let U be a Floquet unitary, then the Heisenberg evolution of an operator O is given by Writing U ad = l U r U −1 , where l U (r U ) is left (right) multiplication by U, allows us to rewrite the operator evolution as Evolution of elements in the doubled operator space W is given by Denoting U = U ad U ad , the evolution of an element X ∈ W is given compactly by Using equation (52), we define the discrete time derivative Δ t X(t) ≡ X(t) − X(t − 1) = −LX(t), where L = U − . The Floquet analogue of the continuous time continuity equation equation (16) is then given below, where we have analogously defined a current J, Following the steps in section 2.1, the operator averaged right density is given by For translationally invariant Floquet models, the spectral function ρ(z, k) is given by equation (55), in analogy to equation (29) in the Hamiltonian case, Ω and the memory matrix Σ are given, in analogy to equation (28), by Unlike in the Hamiltonian case, Ω does not in general vanish, as L ≡ U − is not a (doubled) Liouvillian. The pole of the spectral function encodes the long time/distance hydrodynamical limit, and is given by equation (57), Using the biased diffusion ansatz equation (37), the butterfly velocity and operator front diffusion constants are formally given by As in the continuous time case, the memory function is difficult to calculate. Once again, it is convenient to define an auxiliary quantity σ, defined as (59) By using this equivalence and by splitting the current J it up into its slow and fast components J k P = P(J k ) and J k Q = Q(J k ), we find the Kubo-like formula for v B , analogous to equation (36) found in the continuous time case, where J = J k=0 and W = W k=0 . Although a slightly different object to the Liouvillian used in the continuous time case, it remains true that L does not commute with Φ. Leading again, inevitably, to the ballistic spreading of operators.

A minimal Floquet model
As a test of the formalism, we investigate a translationally invariant Floquet model with on-site random unitary scramblers. We take the single site Hilbert space to be dimension q and choose the Floquet unitary to be composed of a layer of nearest neighbour two-site unitary gates, followed by a layer of on-site Haar random scrambling unitary V applied to every site (figure 1), this ensures that the model has no additional conservation laws beyond those guaranteed by unitarity, The coupling unitary can be written as a product of commuting two-site unitaries e −iεH = Likewise, the conjugate of the gate e iεZ x Z x+1 is given by We write the Floquet unitary for the doubled operator space as U = U Z V, where U Z = e −iεH ⊗ e iεH ⊗ e −iεH ⊗ e iεH contains the two-local gates and V is the on-site scrambling unitary (appropriate for the four copies of state space). As in the case with a single replica, we split U Z up into a product bricks U x,x+1 , given by U This replicated gate has the diagrammatic representation (66) On each leg (labelled by the replica index 1, 1, 2, 2 introduced in equation (9) and by site position), the brick has the option of carrying either a Z or a . If a leg is carrying a non-identity factor A, we say that the leg is decorated and that the factor A is the decoration. In this spirit, and using equations (63) and (65), we find the decoration expansion of the brick to be given by (67) Multiplying F x | by a layer of two-site gates yields the following, We have only depicted the sites x and x + 1 either side of the cut (the domain wall between the + and − wiring configurations). Every brick U r,r+1 that does not straddle the cut is 'absorbed' into the state using the property To see this, we notice that the state +| ⊗ +| connects the replica 1 with 2 and 2 with 1 (see equation (11)), so that the two copies of U r,r+1 each find a copy of U † r,r+1 to yield +| ⊗ +| U r,r+1 = +| ⊗ +|. The isometry of U r,r+1 , S = (1 ↔ 2), relates +| and −| through ±| S = ∓|. Using this isometry, the first equation in equation (70) implies the second.
The calculation of Ω(k) is then straight forward. By utilising translational symmetry and definition equation (25) for the momentum space density super-operators W k , we write where When the cuts are misaligned (x = 0), equation (70) can be used . Ω(k) is then succinctly given by In section 6.2 we will see that the circuit averaged 6 memory matrix is O(1/q 2 ). Therefore we can use Ω alone in equation (58) to obtain a leading order expression for A straightforward calculation shows that brick-work circuits with commuting even and odd bricks (within one Floquet layer) have a strict light-cone of v LC = 1 (as opposed to v LC = 2 in brick-work circuits with non-commuting even and odd layers). Notably, the O(1) expression equation (73) for D vanishes when there is either no ballistic spreading (v B = 0) or when operators spread at the light-cone velocity (v B = 1), although it should be noted that v 0 can never approach this limit in this particular model, 0 v 0 1/2. This is reassuring as an operators spreading at the geometric light-cone velocity must have a front with zero width. The operator spreading dynamics of this model occupies a markedly different regime than that of holographic/SYK physics which exhibit a sharp front and of random unitary circuits [19,20], where the operator front diffusion constant is strongly suppressed at large q and v B is close to the maximum velocity allowed by any (two-local) brick-work circuit (recent work [43] investigates the crossover from holographic to random circuit behaviour of the front). In the large q limit of our Floquet circuit, the operator front diffusion constant is roughly the same size as v B , which itself is far from the light-cone velocity.

Corrections from memory effects: a summary
We have so far calculated the contributions to information transport (the butterfly constant v B and operator front diffusion constant D) coming from slow processes only. In the remainder of this paper we perturbatively calculate the corrections to the butterfly velocity coming from the fast processes packaged in the memory matrix. The parameter that controls this perturbative expansion is 1/q, and the corrections are a sum of real-time Feynman-like diagrams. This perturbative expansion encounters technical subtleties at large times where q → ∞ asymptotic methods are insufficient for circuit averaging. Fortunately, we are able to resolve this subtlety by conjecturing that processes contributing to the memory matrix possess a natural exponential decay timescale τ (ε) ∼ 1/|g(ε)| 7 for the (circuit averaged) real-time memory matrix. We provide analytical evidence backing this conjecture. This allows us to argue that the problematic large-time contributions are in-fact subleading, enabling us to safely compute corrections from memory effects to O(1/q 2 ).
In section 6.1, we present the tools for the computation of memory effects at O(1/q 2 ). These tools are q → ∞ scaling results for the Haar average of correlators and products of correlators. For proofs of these results see [51]. In section 6.2, we use these large q results to compute Σ(t), finding that the leading order contributions decay exponentially with a timescale τ (ε) ∼ 1/|g(ε)|. Although only valid for times t q, this result motivates our conjecture that the exponential decay of Σ(t) persists to arbitrarily late times. A detailed discussion of late-time correlation functions and limitation of the large q results of section 6.1 is given in section 7.1.
Before going through the calculation of memory corrections in detail, we present the result of section 6.2, in the form of a circuit averaged butterfly velocity, where the memory corrections δv S (ε) and δv F (ε) are given at with s(ε) = sin(ε) 2 cos(ε) 2 It turns out that the contribution δv S (ε) only arises because of the spatial (S) translation symmetry of the model: the scramblers in any particular Floquet layer are identical. On the other hand, δv F (ε) arises due to both the spatial and Floquet (F) symmetries in the problem. To see this, we analyse variations of the model without spatial translation and Floquet symmetry (see appendix E). In figure 2 we plot the different contributions to δv B ≡ v B − v 0 . Spatial disorder is often associated with localization, and a suppression of transport. On the other hand the spatiotemporal randomness in random circuits promotes ergodicity, and the rapid growth of operators. Our results for v B are more in agreement with the former intuition-translational symmetry enhances transport-because the butterfly velocity receives an enhancement δv S > 0 when spatial translation symmetry is present. Even less obvious is the role of Floquet symmetry; like translational symmetry, we find an enhancement δv F > 0 to v B . It will be interesting to understand whether these effects are robust at higher orders in perturbation theory, and hold for more general time evolutions.
We now briefly discuss the difficulties with small ε. The ε → 0 limit represents an obvious sanity check on our results, but also represents a challenging limit in a memory matrix calculation. As ε → 0, the memory time must diverge, limiting our ability to truncate the memory effects. This is demonstrated in the failure of equation (74) for v B to vanish for ε = 0 where sites decouple and the butterfly velocity is zero. As the memory time reaches t ∼ q, or equivalently once ε ∼ 1/q, our perturbative expansion in 1/q breaks down. Therefore, the corrections δv B as given in (74) are valid only for ε > O(1/q), below which δv B must quickly go to zero as shown in figure 2. We discuss this in detail in section 7.2.

Calculating corrections from memory effects
In this section we first present the large q scaling results that form the backbone of the perturbative expansion of Σ(t), before presenting a intricate booking keeping of the O(1/q 2 ) contributions in section 6.2.

Averaging n-point correlation functions and their moments
In this section we will present several useful theorems concerning the Haar averages of correlators and products of correlators with random unitary dynamics. The reader is directed to [51] for the proofs of the theorems below. We need only consider correlation functions involving Z's: where t = (t 1 , . . . , t n ) and . We call correlators of form equation (77) arbitrary-time-ordered (ATO) n-point correlation functions as the times t i are not forced to be ordered. A correlator is trivial if Z(t) = . For the purposes of this section, we assume that none of the correlators are trivial. Before we present the theorems, let us introduce what we call the decoration delta constraint, δ A,B . This is zero unless the two operators A = Z(a 1 ), . . . , Z(a n ) and B = Z(b 1 ), . . . , Z(b n ) are equal, AB −1 = , for all scrambling unitaries V. If the strings a = (a 1 , . . . , a n ) and b = (b 1 , . . . , b n ) do not have repeated consecutive times, then the delta constraint simply checks that a = b. Otherwise, we need to introduce a concept called the minimal form of a operator.

Definition 1 (Minimal form).
With Z(t) = Z(t 1 ) . . . Z(t n ) as a product of scrambled Pauli Z operators and Z(t ) = Z(t 1 ) . . . Z(t n ) as the form reached after exhaustively using the property Z(t) 2 = , we define the minimal form of Z(t) as Min(Z(t)) ≡ Z(t ). We also define the minimal form of the string t as Min(t) ≡ t .
We can then give a more general definition of the delta constraint.

Definition 2 (Decoration delta constraint). The delta constraint δ A,B on operators
(78) Theorem 1. The Haar average of a product of p ATO correlators has the following scaling behaviour, (79)

Theorem 2. The Haar average of a product of two ATO correlators is given by
where we have used the decoration delta constraint and where the sum over τ allows for t and t to differ by a global shift in time. The symmetry factor S(t) counts the degree of cyclic symmetry of the list of times t, if there exists n cyclic permutations α such that α(t) = t, then S(t) = n. We will often study a special subset of ATO correlators, which we dub physical OTOCs. These take the form

Theorem 3. The Haar average of a physical OTOC is given by
where we have again used the decoration delta constraint. This result relies on theorem 4 of [51] and is obtained in equation (B.6) of appendix B.

Computing Σ at O(1/q 2 )
We now compute the memory matrix at leading order in 1/q, enabling us to calculate the butterfly velocity to O(1/q 2 ). As discussed in section 3, we calculate the proxy σ(k, z) instead of Σ(k, z). We will find that the O(1/q 2 ) contributions to σ(k, t) decays exponentially fast with a decay-rate γ(ε) ≈ 2|g(ε)| set by the interaction strength ε.
It is convenient to express σ(k, z) as where Corrections to v B from fast processes (i.e., processes counted by Σ) are then given by We will often work with the real time version of D, We separate the scrambling part of each Floquet layer from the two-local bricks as before; where Using this and [V, Q] = 0 and simplifying the notation L Z → L and U Z → U, D(x, T) can be written as Using the decoration expansion for each brick equation (67), we express D(x, T) as a sum over decorations Γ, Γ = ⊗ r Γ r is a product of decorations on every site. The decoration on some site r is given by , a product of decorations on each leg of the site) and where for a binary string s r,i = (s r,i 1 , . . . , s r,i T−1 ). We will use the decoration expansion, and a carefully chosen decomposition of the initial states to express contributions as products of ATO'. We will use the theorems 1-3 to see what kinds of decorations can give rise to O(1/q 2 ) corrections to the circuit averaged σ, and then evaluate those. It turns out that only certain values of x are relevant, x = 0, 1, 2 at this order, as we will see.
Using the inversion symmetry of the Floquet unitary, we find We conclude that the contributions from x < 0 are at least a factor 1/q 2 smaller than the x > 0 contributions. In the remainder of this section we will see that the contributions for x 0 are no larger than O(1/q 2 ), and that therefore D(x < 0, T) = O(q −4 ). We will therefore only consider x 0 in the following sections.
To evaluate D Γ , we first project L(T) |F x onto the fast space. (90) where we have suppressed sites r < x (r > x + 1) and carefully chosen an orthogonal decomposition of QL(T) |F x in terms of four fast states, numbered from 1 to 4. We have used the shorthand Z ⊗2 and K to represent the following, Z(T) ⊗2 and K(T) are defined identically but with Z(T) in place of Z. Finally, |φ + (T) and |φ − (T) are given by where |⊥ = |− − 1 q |+ . The initial state is easily found using F 0 LQ = QL F 0 T . The four states numbered in equation (90) obey a useful set of identities, which allow us to identify and discard many lower order diagrams and significantly simplify the memory matrix calculation.
Using S, we can then write We next consider what happens when the wires of ±| are decorated. Let Γ r be a decoration on the four legs of some site r, i.e., Γ r = Γ r for some bit-string s i . This is shown graphically below.
The overlaps between |φ − (T) and a decorated +| state is then given by (95) Similar identities hold for the overlaps +| |φ − (T) and φ − | |± . These identities can be summarised as follows In general, Γ r will insert operators on each of the four legs of the input state. However, sometimes one can utilize the wirings between each of the legs to simplify the resulting expression, an example for +| Γ r is shown below, We say Γ r decorates the state if this simplification process cannot be used to remove all four components of Γ r . In the example above we were able to remove all of the non-identity operators from the (2, 1) wiring but not from the (1, 2) wiring. Notice that in every case in equation (96) and in (95), if either of the wirings in the +/− states carry a non-identity operator, the overlap with the φ ± states vanishes 9 .
Assuming that the decorations non-trivially decorate both wirings of the +/− states, these overlaps can be summarised as follows Where rather than give the full expressions we have simply presented the types of contributions (i.e., OTOCs, products of non-trivial correlators or terms that are manifestly O(1/q 2 )). This is often enough to identify diagrams that contribute to D(x, T) at O(1/q 3 ). In cases that require a more careful analysis we refer to equations (95) and (96).
These are useful identities because the diagrams contributing to the memory kernel tend to involve products of terms of this form. We will now see how, using our OTOC identities from section 5, this result allows us to pinpoint which diagrams are able to contribute at leading order in O(1/q 2 ). a,b (x, T). Casting our attention back to the orthogonal decomposition of the projected vector QL(T) |F x in equation (90) where labelled each of four orthogonal states from 1 to 4, we now use a short hand {|1, x, T , . . . , |4, x, T } to denote each of these states. This is also done for F 0 LQ. Using this, we define the following quantity,
In what follows, we examine D a,b for all possible pairs a, b; some calculations are carried out in appendices C and D. All contributions are O(1/q 3 ), except for the (2, 1) and (4,4) terms as summarised in 6.3.
• x > 0: Using the overlap identities equations (93), (95) and (96), we see that the contribution from site 0 and x either vanish or have the form Every other site may contribute either trivial or non-trivial correlators to the product. Therefore, after Haar averaging, theorem 1 of section 6.1 gives • x = 0: On site x = 0, we have The final three terms are manifestly O (1/q 4 ). The fifth and sixth terms are of the form Corr × Corr /q 2 , where these correlators are non-trivial. Therefore, using theorem 1, the Haar average of these terms (possibly multiplied by additional non-trivial correlators from other sites) is O (1/q 4 ). The second, third and fourth terms all have pre-factors of 1/q 2 ; if they are to contribute at this order, the accompanying correlators must be trivial. Using decoration delta constraints, this fact (a consequence of theorem 1) is written below All together, in the context of a Haar average, the following replacement is valid up to O(1/q 2 ).
We say that a decoration Γ r leaves a site r undecorated if it contributes only trivial correlators, . In the present case keeping only O(1/q 2 ) contributions forces all sites r = 0 to be left undecorated. This allows us to take the Haar average of equation (104) directly, using theorem 2 for the Haar average of a product of two correlators. This gives, The analysis of x < 0 would replicate that of x > 0, but with an additional factor of q −2|x| . The for all x, dVD 1,1 Γ r 2 Γ r † 2 = 2 (for r < 0 simply switch − ↔ + and 1 ↔ 2 in these equations). Choosing only Γ r that leave a site r < 0 (r > 1) undecorated (contributing only trivial correlators) is equivalent to the decoration delta constraint δ . The implementation of these decoration delta constraints is discussed in appendix B. The result of which is that for sites r < 0 we sandwiching each decoration layer Γ(t) by +| and |+ and by −| and |− for sites r < 0.
Writing the definition of K in equation (91) as K = Z 1 − Z 2 , where the index refers to which leg the Z decorates, we write the following, terms with more than two non-trivial correlators. (108) Because we are selecting only decorations which leave sites r = 0, 1 undecorated, we are able to take the Haar average of this expression in isolation. Terms with more than two non-trivial correlators are O(1/q 3 ) or smaller and are therefore discarded, leaving only the Haar average of the first term. Using theorem 2, this is given by These delta constraints are implemented by sandwiching the decoration layers Γ(t) with the appropriate wirings. The four different wirings configurations for sites 0 and 1 are shown below.

Table of results and summary
We summarise the contributions dVD a,b (k = 0, T) in the table below, highlighting the only contributions at O(1/q 2 ). We calculate the (a, b) = (2, 1) contribution in appendices C and in D we find that the remaining pairs (a, b) contribute at O(1/q 3 ) or smaller. Using equation (84), we are only required to know dVD a,b (k = 0, z = 0) to determine the butterfly velocity 10 . For (4, 4) this is given below, with the re-parameterisation s(ε) = sin(ε) 2 cos(ε) 2 , dVD (4,4) This re-parameterisation is motivated by the following observations about the dependence of v B and D on ε. Firstly, under the variable shift ε → π/2 + ε, the full coupling unitary then transforms as e −iεH → (−i) N−1 e −iεH . The operator dynamics is blind to global phases, meaning that ε → π/2 + ε is a symmetry of v B (ε) and D(ε). Secondly, by globally swapping leg 1 with 1 and 2 with 2, we find Σ V,ε (k, z) = Σ V * ,−ε (k, z). and Ω ε (k) = Ω −ε (k), where we have labelled Σ with the scrambler and coupling strength and labelled Ω with the coupling strength used (Ω is independent of V ). By integrating over V, we find another symmetry of the circuit averaged butterfly velocity and diffusion constant v B (ε) and D(ε) , namely ε → −ε. Using these symmetries, we determine that v B is a function of s(ε) only.
For (a, b) = (2, 1), we find dVD (2,1) where s(1 − 2s))] −1 and where f (ε) is found by numerically evaluating a sum involving a 5 × 5 transfer matrix in appendix C and given to good approximation (figure C2) by where a = 6.8 and b = 16.1. The factors 1 7 s(1 − 4s) 2 is obtained analytically by diagonalising the transfer matrix at small s and around the point s = 1/4. Altogether, we then find so that the correction to the circuit averaged butterfly velocity is where As discussed in section 5, the correction δv S (ε) only arises because of the spatial translation symmetry of the model. Whereas δv F (ε) arises due to both the spatial and Floquet symmetries. To illustrate this point, we carry out a similar calculation for a version of the model with independently distributed scramblers V x,t for each site x and at each layer of unitaries U t . This is done in appendix E where we find the corrections to the circuit averaged butterfly velocity δv ∼ O(1/q 3 ), i.e., v S,F both vanish. Turning to the variant of the model with spatial translation symmetry but no time translation symmetry, we find δv B = δv S (ε) + O(1/q 3 ). Comparing these result to equation (74) allows us to identify the corrections to v B with the symmetries of the model. Away from the weak coupling limit, the averaged butterfly velocity is given by equation (74). The weak coupling limit is discussed in section 7.2.

Discussion
In this section we discuss the late time behaviour of the memory function, addressing the limitations of the Haar averaged ATO' results of section 6.1. We also discuss the break down of the perturbative scheme used in section 6.2 as ε → 0.

Late times
We have presented a Kubo-like formula for v B in equation (61), involving the time integral of a correlator of fast operators, analogous to the J-J correlation functions in Kubo formulae for more conventional transport. For ε > 0 (more precisely g(ε) = 0), we find that the leading order correlation functions decay exponentially in time at leading order in 1/q. Given that the correlation functions are between fast variables, it is plausible that this exponential decay is not simply an artifact of the large q limit, i.e., the correlation functions decay over some time scale which is bounded below by a q-independent quantity τ (ε), which is positive except at exceptional point where the sites decouple i.e., when ε = nπ/2 (we discuss the decoupling limit later). Assuming that t ε (q) is a (cutoff ) time up to which we can safely use the large q correlator theorems of section 6.1 to identify the O(1/q 2 ) contributions to Σ, the error incurred by truncating the infinite time evolution is O(exp[−t ε (q)/τ (ε)]). We will now argue that t ε (q) increases linearly with q. As a result, the error due to truncation decays exponentially in q, which is much smaller than the O(1/q 2 ) corrections we calculate. This demonstrates the validity of our expansion in q −1 .
We now give some justification for the existence of a cutoff t ε (q) = cq for a q-independent constant c. We do this in two parts: (1) we argue that the expressions in section 6.1 for the Haar average of few correlators are valid for times t < cq; (2) we argue that this is enough to guarantee that contributions involving many correlators (more than two) are O(1/q 3 ) (for times t < cq).
In order to do this, we must go beyond the q → ∞ scaling results of section 6.1. In the absence of an analytical understanding of the corrections at finite q, we took a numerically Haar average of an assortment of ATO correlators and products of correlators, finding good evidence that the expression for the Haar average of a product of two correlators (theorem 2) and of OTOCs (theorem 3) experience only O(1/q 3 ) errors for times t < cq. It turns out that this, along with the use of a Holder's inequality and the monotonicity of the p-norm, is enough to bound the Haar average of arbitrarily many correlators as O(1/q 3 ).
To show this, we start by bound the expectation of a product of many (more than two) non-trivial correlators as shown below, Then, using the fact | Z(t i ) | 1, we can bound the average of a product of many correlators by an average over only a few. We choose to highlight three correlators Using a generalised Holder's inequality and monotonicity of the p-norm, we further bound equation (124) by Then, for times t < cq and using theorem 2 for the second moment of a correlator, this bound takes the form for an O(1) constant C. The correlators contributing to Σ have 1 S(t) 2, so that the right hand-side of equation (126) then simplifies to C /q 3 for an O(1) constant C . For times t < cq, any contribution to Σ with three of more non-trivial correlators is O (1/q 3 ), and the O(1/q 2 ) contributions are counted precisely as we have done in section 6.2 and the appendix C.

The ε → 0 limit
In the ε → 0 limit, the Floquet unitary is given by a product of single-site unitaries, so that there is no operator growth dynamics at all (v B = 0). However, the ε → 0 limit of the expression equation (74) yields v B = 1/q 2 . The reason for this failure to predict the correct operator dynamics as ε → 0 lies in the fact that our 1/q perturbative scheme breaks down once ε is as small as ε ∼ 1/q. Diagrams that we previously dismissed as O(1/q 3 ) at strong coupling, can in fact contribute at O(1/q 2 ) due to appearance of factors of ε ∼ 1/q in the denominator (after summing over time). We do not have access to the exact expressions for the Haar average of ATO correlators in the Floquet model, instead relying on O(1/q 2 ) results. However, in appendix E, we study a variant of the model with independently distributed scramblers V t between time-step for which exact Haar averaged results are possible for certain diagrams. If we were to naively identify the 1/q 2 contributions before taking a sum over time, as we did in section 6.2, we would find the same, incorrect (a, b) = (4, 4) contribution and incorrect behaviour of v B as ε → 0. In appendix E.1, we consider the same family of diagrams, those involving only two nontrivial correlators (whose contribution we dub D 4,4 2 ), but now take the Haar average exactly. In doing so, we find that the previously troublesome (4, 4) contribution now vanishes as ε → 0 equation (E.23), We suggest then, that there is a region of width O(1/q) in figure 2, where v B rapidly approaches zero.

Conclusion
In this paper, we re-purposed a hydrodynamic formalism (the MMF) for information transport calculations by identifying the conservation of the right density of a Heisenberg time evolved operator as a pseudo-local conservation law. A number of modifications to the existing MMF are necessary: in particular, we are led to use an unusual inner product on our space of observables (equation (20)), which leads to the prediction of ballistic operator growth assuming we have identified a sufficiently complete space of slow operators. We use this new formalism to produce a Kubo formula for the butterfly velocity (equation (36)) and also find symmetry constraints on the operator growth light-cone (equation (48)).
In section 4 we used this formalism to investigate a family of translationally invariant Floquet models, finding leading order expressions for the circuit averaged butterfly velocity v B and operator front diffusion constant D. By leveraging large q random unitary dynamics [51], we found that a simple hierarchy of contributions to the averaged memory matrix Σ emerges, organised by the number of non-trivial correlators contributed. This enabled us to select only the O(1/q 2 ) contributions, associated with processes that explore a manageable sub-region of Q in which only the sites directly either side of the cut (the +/− domain wall) are decorated by non-identity operators. We have then counted all these processes and found that at O(1/q 2 ), the memory matrix decays exponentially fast, with an O(1) decay rate. We used this to calculate corrections to v B , and were able to distinguish the effects of spatiotemporal symmetry on information transport by identifying which processes arise as a consequence of spatial translation and Floquet symmetry.

Further work/outlook
More exotic information hydrodynamics than biased diffusion is possible in the presence of additional conserved charges. With a U(1) charge, the diffusive conserved components acts as a source of non-conserved operators, giving rise to power-law tails in the spatial distributions of operator weight [21,22]. It will be interesting to incorporate additional symmetries, such as a U(1) or fracton symmetry, into the MMF and perform a mode coupling analysis to confirm and perhaps extend existing results. Even in the presence of conservation laws, the diffusive broadening of the operator front appears ubiquitous in chaotic and interacting integrable systems in 1D [15,52]. What makes this diffusive broadening so universal? Perhaps, by examining MMF expressions for the front diffusion constant, we can say something about the nature or number of additional slows required to find a hydrodynamical equation other than biased diffusion. Other avenues to explore include the formulation of an information mode MMF at finite temperature/chemical potential and, with only minor modifications, calculating purity. Another potentially fruitful application of formalism is in the setting of perturbed dual unitary circuits, which may serve as a testing ground for perturbative MMF calculations.
where the super-operator Φ is given by Q is the Hermitian projector onto Q. We must show that this constitutes a bona fide inner product by checking each of the inner product axioms. where we have used the fact that Φ is Hermitian, this is because χ x is real and the projector Q is Hermitian. (b) Linearity in second argument: A|βB + γC = β A|B + γ A|C for scalars β, γ, where we have used linearity in the second argument of the inner product ·| |· . (c) Positive definiteness: A|A > 0 for A = 0, where we have used the fact that Φ is a positive definite matrix. This is easily seen by noticing that all χ 2 x are real and positive. This confirms that A|B is indeed an inner-product and allows us to consider a new innerproduct space in which the weight operators are orthonormal.

Appendix B. Imposing the decorations delta constraints
In this section we revisit the Haar average identities for ATO correlators and their moments, specifically theorems 2 and 3. The final results (at leading order in 1/q) involved 'delta constraints' (as defined in definition 2) which are zero/one depending on whether or not the decorations were equal (for all scramblers V ). We have also seen these delta constraints whenever we have demanded that a correlator be trivial (see equation (104)). In this section, we show how these delta constraints can be imposed by placing the decorations on a contour and inserting projectors at every time step; the insertion of projectors has an appealingly simple graphical interpretation, which facilitates our calculation of σ in the main text.
The simplest example of a decoration delta constraint to consider is δ Γ 1 ,Γ 1 , where the two decorations are time-ordered, i.e., Γ 1 = Z(1) a 1 Z(2) a 2 . . . Z(n) a n and Γ 1 = Z(1) b 1 Z(2) b 2 . . . Z(n) b n for binary strings a and b. In this case, the delta constraint checks that a i = b i for all i. This is equivalent to putting Γ 1 and Γ 1 on a wiring with a single forward and backward contour Figure B1. A decoration Γ on the four legs 1, 1, 2 and 2 with n decoration layers.
(labelled 1 and 1) and then placing a projector between each of the decoration layers are shown below. (B.1) The right-hand side checks that at each decoration layer t, a t = b t , this is precisely the same as the decoration delta constraint. A site r 0 with decoration Γ r contributes +| Γ r |+ in the decoration expansion, suppose that we demanded that this contribution contained only trivial correlators ( +| Γ r |+ = 2 ). The decorations that meet this condition are those that satisfy The projector insertion technique described above selects precisely these relevant decorations as follows For sites r > x + 1, where the contributions take the form −| Γ r |− , the decorations that contribute trivial correlators are identified in the same way but by projecting with |− −|.
As in the previous case, we will obtain a prescription for rewiring the legs of a contour at every time step, this prescription will identify decorations Γ that satisfy the constraint. The delta constraint δ , appears whenever we demand that the decoration Γ r on some site r, with 1 < r < x, contributes only a trivial correlator, q −| Γ r |+ = = 1. Finding the Γ which contribute non-trivial correlators is equivalent to finding Γ which satisfy the delta constraint.
we will examine last. Let t i , 0 < t i n, be the first decoration layer in from the left that nontrivially decorates the −| wiring. Likewise, let t f be the first decoration layer in from the right that non-trivially decorates the |+ wiring. If t i = t f , the resulting correlator is certainly nontrivial and therefore X = (the delta constraint is not satisfied). Otherwise, if t i = t f , we have −| Γ |+ = −| Γ(t i ) |+ . In order for this to be a trivial correlator, Γ(t i ) must decorate −| such that −| Γ(t i ) = −| Z(t i ) ⊗2 (see equation (91) for definition of Z ⊗2 ). We can select this case by sandwiching every decoration layer t < t i by −| and |− , every layer t > t i by +| and |+ and the layer t i by −| on the left and on right by q |0 . Finally, the cases where no decoration layer decorates the −| (an obvious example where X = ) wiring can be selected by sandwiching every layer with −| and |− . Therefore, the decoration delta constraint can be rewritten as Or diagrammatically as

B.3. Connection to the OTOC Haar average
In this section we use the results of [51] to re-express the Haar average of a physical OTOC in terms of decoration delta constraints, arriving at the form of this theorem presented in theorem 3 of section 6.1. The Haar average of an OTOC given in [51] is quoted below Comparing this to equation (B.3), we have the following, (B.6)
Sites 0 and 1 each contribute a product of non-trivial correlation functions plus terms of size 1/q 2 . The Haar average of this is O (1/q 4 ).
Splitting the decoration site by site, we find The contribution from site 1 is given in full below, Every term is either an OTOC, a product of two non-trivial correlators, or is manifestly O(1/q 2 ). The decorations on each site r = 1 may result in contributions that are either: (1) a trivial correlator; (2) a single non-trivial correlator; (3) a product of two non-trivial correlators. Note that none of these non-trivial correlators are OTOCs because they live on a contour with only a single forward and backward segments. Therefore, if any decoration on sites r = 1 does anything other than contribute trivial correlators, we have dVD 2,1 . Keeping only O(1/q 2 ) contributions forces every site r = 1 to contribute trivial correlators only. This allows us to take the Haar average of equation (C.4) in isolation. To do this we find it useful to write the follows results (consequences of theorem 1), Where all Γ are products Z(1) α 1 . . . Z(T − 1) α T−1 for some binary string α = (α i 1 , . . . , α i T−1 ). Using theorem 2 we find the useful result Using these results and theorem 3 for the Haar average of a physical OTOC, we find that at O(1/q 2 ), every term in (C.4) cancels, and we find dVD 2, Sites 1 and x each contribute factors of form OTOC + Corr × Corr + O(1/q 2 ) (see equation (98)). Our theorem for the Haar average of a product of correlators (theorem 1) implies that if any other site contributes a non-trivial correlator, the Haar average of the total contribution will be O(1/q 3 ) or smaller. Thus, working to O(1/q 2 ), we will look for contributions where sites r = 1, x give only trivial correlators. Moreover, theorem 1 also implies that the leading order contribution comes from the OTOC 1 OTOC x cross term In summary, in evaluating the contributions for x 2, we need only consider those terms in the decoration expansion corresponding to OTOCs on site 1, x, and trivial correlators on all other sites. As in the (a, b) = (4, 4) calculation, we select decorations that leave the contours on sites r > x (r < 1) undecorated by inserting the projector |+ +| (|− −|) between every Floquet layer. For sites 1 < r < x the non-decoration condition is more delicate. For these sites, the input wiring configuration is of − type and the output wiring configuration is of + type giving an OTO type contour. The requirement that the OTO contour is undecorated (i.e., a trivial correlator) is equivalent to the decoration delta constraint δ , . In appendix B we show that this decoration delta constraint can be rewritten as The final term in equation (C.9) selects the decorations Γ r that never decorate the initial state −|, so that at each time step the −| wirings never carry any non-identity operators. With these |− −| projectors in place, let us sum over all decorations Γ with the coefficients C Γ of equation (88), in doing so we replace each of the decoration layers with the full unitary layers U(t). This is pictured below, where we have highlighted site x, with its terminating state |φ + (T).

(C.10)
Where, crucially, the brick property of equation (70) can be used to remove every brick to the right of site r (below site r in the diagram above). This yields the right hand-side of the equation above. A consequence of which is that the terminating states |φ + (T) on site x is contracted directly with −|. Using equation (93), we see that this diagram vanishes. Therefore, the decorations selected by the final term of equation (C.9) cannot contribute to D 2,1 (x > 2, T).
In what follows, we will consider only the decorations selected by the sum in equation (C.9).
As previously noted, only the OTOC 1 × OTOC x terms can contribute at O(1/q 2 ). This allows us to drop all but the −| Z ⊗2 term in the φ − | state (see equation (92)) of site 1 and the Z(T) ⊗2 |+ term in the state |φ + (T) of site x. The fact that OTOC 1 begins with −| Z ⊗2 and OTOC x ends with Z(T) ⊗2 |+ and the condition that these OTOCs are complex conjugates of each other (using theorem 2) forces both OTOCs to be physical OTOCs of the same length (the length of physical OTOCs is the difference between the latest and earliest time appearing in the OTOC). We will sum over all possible OTOC lengths τ , 1 τ T − 1.
We now describe how we select only those decorations that produce a physical OTOC with length τ . For OTOC 1 , we must ensure that every unitary layer t > τ does not decorate the |+ wirings on site 1. We do this by inserting the projector |+ +| to the left of each of these layers. Requiring then that the τ th unitary layer decorates the |+ wiring by leaving Z(τ ) ⊗2 |+ is achieved by sandwiching the layer with q ⊥| and |+ . Making all of these selections, the contribution from site 1 is takes the form shown below, We use the same strategy to select decorations that contribute physical OTOCs of length τ on site x as well. The resulting contribution takes the form shown below, where we have defined Rather than focus on a single decoration Γ, we are able to select every O(1/q 2 ) to D 2,1 (x = 2, T) simultaneous by summing over decorations Γ with the appropriate coefficients C Γ (as introduced in the decoration expansion in equation (88)), Each of these layers is contracted by various combinations of the +, −, ⊥ and 0 states. We introduce a short-hand for each of these contractions, this is given below, (C.14) One further short-hand we use is The contractions (of the unitary layers) at site 1 now take the more readable form, The short-hand version for site x is found similarly, but with T − τ − 1 '−' contractions, followed by a '0' contraction on layer T − τ , followed by the OTOC. We now also apply this short-hand to the contributions on site r, 1 < r < x, in particular, this yields (C. 16) where we have discarded the decoration that never decorates the initial state, as previously discussed.
By keeping only the O(1/q 2 ) contributions to D 2,1 (x 2, T), have found a set of diagrams labelled by: τ , the length of each of the physical OTOCs; t r for each site r ∈ {2, . . . , x − 1}, the positions the '0' contraction on site r. Setting t 1 = τ and t x = T − τ , we label each diagram by a sequence (t 1 , t 2 , . . . , t x−1 , t x ). An example diagram, labelled (t 1 , t 2 , t 3 , t 4 , t 5 , t 6 ) = (5,3,6,2,8,7), that contributes to D 2,1 (x = 6, T = 12) is given below In fact, for x > 2, only contributions where t 1 t 2 < t 3 < . . . < t x are non-zero. One of the following motifs must appear in any diagram not satisfying this property, each of which is zero, (C. 17) In the non-zero diagrams (i.e., those for which t 1 t 2 < t 3 < . . . < t x ) the contracted unitary layers collapse into contractions of only a short portion of the full layer this follows from the brick property equation (70). We demonstrate this process by collapsing a semi-infinite domain of '+' contractions below, We collapse the '−' domain in the same way. Ultimately, every layer reduces to one of the motifs below, (C. 19) where M(t) is a single site operator. Each diagram as a whole decomposes into a products of the motifs below, which we use to introduce a compact diagrammatic notation. We also give the numerical value of each motif.
To give an example of the correspondence between the contracted unitary layer diagrams and this new diagrammatic notation, consider the diagram corresponding to the sequence (t 1 , . . . , t 4 ) = (3, 3, 6, 7) below (C.21) For x > 2, the diagrams come in two qualitatively different types: (1) t 1 < t 2 and (2) t 1 = t 2 . All type 1 diagrams have the following diagrammatic form (C. 22) where T − t x = t 1 . Whereas, type 2 diagrams have the form (C.23) For x = 2, we will find diagrams with similar motifs. We had found that for x 2, the relevant diagrams are labelled by a sequence (t 1 , t 2 , . . . , t x ), with t x = T − t 1 and where t 1 = τ is the length of the OTOCs. For x = 2 the diagrams are simply labelled by (τ , T − τ ). A complication for x = 2 is the fact that the OTOCs may overlap in time. This is because no matter where we position the '⊥' contraction of site 1 and '0' contraction of site 2, we can never encounter any of the vanishing motifs of equation (C.17), which in the case of x > 2 force t 1 t 2 .
To address this complication, we split x = 2 in three types of diagram: (1) T − τ > τ, the OTOCs do not overlap-the treatment of these diagrams is exactly the same the type-1 diagrams discussed for x > 2; (2) τ = T − τ , the OTOC's 'touch'-this is similar to the type-2 diagrams in x > 2; (3) τ > T − τ , the OTOCs overlap.
The touching OTOC contributions (τ = T − τ ) have the following form/motif, We give the overlapping OTOC contributions (τ > T − τ ) the following compact notation, (C. 25) We study these contributions in detail in the next section, appendix C.4. We are interested in computing dV x D (2,1) (x, T), we have seen earlier in this appendix that the x = 0, 1 contributions are O(1/q 3 ) (as are the x < 0 contributions, equation (89)). Therefore, at O(1/q 2 ), we need only sum over x 2. Fortunately, there is an abundance of cancellation between these diagrams. We will cover some examples and then give the general result. Starting with the simplest, we compute dV x D (2,1) (x, T) for T = 2, 3 and 4 explicitly (where the Haar average is implied, but not written below). The additional factor of g 2 in equation (C.7) has been divided through in the equations below.
We have used the third rule of equation (C.20) to cancel the two terms (associated with x = 2 and x = 3) for T = 2 and to cancel the second and fourth terms and the fifth and sixth terms for T = 4. Notice that the terms that remain after cancellation are all connected diagrams (i.e., the OTOCs either touch or overlap), all the diagrams with 'gaps' (i.e., where a vertical line can be drawn through them without intersecting a wobbly line, representing an OTOC) have conspired to cancel. This is no coincidence, it is a consequence of the fact that processes that contribute to Σ (and hence the corrections to v B ) must explore only the fast space, this is due to the Q projectors that project out all slow components at every time-step in Σ. Diagrams with a gap represent processes that take a detour to the slow space. We can see this by returning to the contracted unitary layer picture; take, for example, the T = 3 diagram associated with (t 1 = 1, t 2 = 2), this is the first diagram in the T = 3 sum above. It is equivalently given by (C. 26) Between the two unitary layers we have vectors that are clearly in P. The only diagrams that have no gaps are those for x = 1 (which we have seen all vanish at O(1/q 2 )) and the overlapping or touching OTOC diagrams for x = 2. We will next evaluate the overlapping OTOC diagram contributions, before finally calculating the touching OTOC diagram contributions.

C.4. Overlapping OTOC diagrams
In this section we investigate the contributions to D (2,1) (x = 2, T) that take the form of overlapping OTOCs. We name this contribution D (2,1) O (x = 2, T). These OTOC overlap diagrams are given in detail below for OTOC length τ and total diagram length T.

(C.27)
It will be useful to define the single site operators M + (τ ) and M − (τ ) and repeat the definition for M(t) seen in equation (C. 19). (C.28) The Floquet layers T − τ + 1 through to τ − 1 have been reduced to two site operators which has already been introduced in equation (112) and named T (t) for Floquet layer t. With these definitions we have (C. 29) where we have denoted the following, Using theorem 2 of section 6.1, the Haar average of a product of two physical OTOCs is Each of the decoration delta constraints can be implemented as described in appendix B. In doing so, we sandwich each decoration layer with a wiring configuration labelled A for the first term in equation (C.30) and by a configuration labelled B for the second term. We then use the shorthand below.
(C. 31) where the grey boxes are placeholders for the possible decorations at each layer. A grey box (labeled t) may decorated each of the incoming legs 1 and 2 with Z(t) and each of the legs 1 and 2 with Z(t) * . On the right-hand side, every super-leg carries a label a which labels one of the permutations A, B of the legs 1, 1, 2, 2 given in equation (C.32) below, Each super-leg is implicitly carrying a factor 1/q 2 . The contraction between decorations within a column is given more explicitly below where on the right-hand side we have introduced a shorthand which makes obvious the decomposition into two chains of tensor contractions. In general, the Haar average decomposes into t + 1 chains. By summing over T 1, we are able to drop the delta constraint above, doing this sum is equivalent to calculating the Laplace transformed dVD 2,1 (x = 2, T) at z = 0, i.e. dVD 2,1 (x = 2, z = 0). This is sufficient for calculating v B . Before we determine the decomposition for a general s 0 and t 0, we first define the following chains, For any (s, t) There are two qualitatively distinct types of contribution to equation (C.34): (1) t + 1 ≡ 0(mod s + 1), a product of tC a chains and one C a +− chain; (2) otherwise, a product of t − 1C a chains, one C a + chain and one C a − chain. For case 1, the C a +− chain is n = s+1 t+1 − 1 0 transfer matrices long, i.e., C a +− (n), while all t of C a chains are n + 1 matrices long, C a (n + 1). For case 2 with s + 1 = n(t + 1) + k for 1 k t and starting from the left, the first k − 1 M's are on chains of length n + 1 and terminate on an M. The kth M sits on a chain of length n and terminates on M − , the following t − k M's sit on chains length n that terminate on an M. Finally, the M + sits on a chain of length n and terminates on an M. All together, the contribution is C a + (n)C a − (n)C a (n + 1) k−1 C a (n) t−k . This is summarised below, In both cases, all but the n sums can be evaluated to give, C a (n + 1)) . Notice that the dependence on the label a vanished. This is true of all chains. Algebraically, these chains (now without leg labels) are equivalently given by where these angles braces reflect the trace inner product for tensors with input and output super legs l = (1, 1, 2, 2).
(C. 42) Equipped with this, we can now write, for the contribution due to overlapping OTOC diagrams, the following where f (ε) is given by where the chains are implicitly dependent on ε. To evaluate this sum, we must understand the space that the transfer matrix acts on. Each of the legs 1, 1, 2 and 2 may be either undecorated or carry a Z decoration. This means that the state our state space is dimension 2 4 T is block diagonal in the subspaces S and A, and since all |M (±) lie in S, we are able to restrict our considerations to this five dimensional space only. In this restricted space with basis order ( , 4, S 1,1 , S 1,2 , S 1,2 ), T is given by the product T =T −ŨT + , wherẽ Four of the eigenvalues of T are bounded by |λ 1,2,3,4 | |g(ε)|(1 − 2|g(ε)|)/2. The largest eigenvalue (for all ε) is bounded by |λ 5 | (1 − |g(ε)|) 3 .
In section 4 we showed that v B (ε) must have the symmetries ε → −ε and ε → π/2 + ε. All analytically computed O(1/q 2 ) contributions (the (a, b) = (4, 4) contribution and the touching OTOC contribution) are found to respect this symmetry, as does the O(1) contribution from Ω. We therefore conclude that f (ε) must also have this symmetry. Additionally, we know that f (ε) is a function of sin(ε) 2 only (the transfer matrix and the initial and final vectors in our transfer matrix calculation are functions of sin(ε) 2 only). Together with the symmetry requirements, this means that f (ε) is in fact a function of s(ε) = sin(ε) 2 cos(ε) 2 .

C.5. Touching OTOC diagrams
We will now sum up all of the touching OTOC diagrams. These are given in equation (C.24) with T = 2τ so that the two OTOC have the same length. We name this contribution D (2,1) Touch (x = 2, T).
We are only interested in the z → i0 + limit, this is given by The factor gλ(t) 2

Appendix E. Model variation: independent scramblers V x,t
We can carry out a similar calculation for a version of the model with independently distributed scramblers V x,t for each site and at each layer of unitaries U t (analogous to the Floquet layer in the Floquet model), i.e., we break both spatial and temporal translation symmetry. We start with, for large t, the fact x xρ( Using the definition of ρ(k, n) we write the following Then writing Ω(k) = v 0 (1 − e −ik ) where v 0 = |g(ε)| is the q → ∞ butterfly velocity, and noticing that lim k→0 ∂ k (Ω(k)ρ(k, n)) = iv 0 , we find (1 + Ω(k)) n−m −1 σ(k, m).
Analogous to the definition in equation (85)  This is same expression for the circuit averaged correction to v B that we found using the MMF in equation (121). We calculate D(k = 0, z = 0) in the same way as in section 6.2. However, the two O(1/q 2 ) contributions found for the Floquet model (i.e., the so-called (2, 1) and (4,4) terms) relied on correlations between different sites and on the discrete time translation symmetry. We can employ the same calculation above for the variant of the model with spatial translation symmetry but no time translation symmetry (i.e., independently random scramblers between time-steps) in order to find δv B = δv S (ε) + O(1/q 3 ) where δv S (ε) is as given in equation (122). Comparing these result to the result of equation (121) allows us to see that the O(1/q 2 ) corrections only arise when there is spatial translation symmetry. Moreover, this enables us to identify the δv F contribution (see equation (122)) with the presence of time translation symmetry.
E.1. The ε → 0 limit in the absence of time-translation symmetry In section 6.2.4, we identified all the contributions to σ(x = 0, t) that contained the product of two non-trivial correlators and used theorem 2 to evaluate these contributions to O(1/q 2 ). We were unable to extract the correct ε → 0 behaviour of these contributions due to the appearance of ε in the denominator (after summing over the time t). Luckily, for the model variant with independently random scramblers at each time-step, we are able to accurately sum over all 'two-correlator' contributions, after circuit averaging. While this is ignores correlations between Floquet layers, it demonstrated the failure of naively expanding the memory matrix in 1/q before taking the ε → 0 limit and the resolution by summing over all contributions.
We identify the contributions that have the form of two non-trivial correlators in the same way as done in section 6.2.4. We repeat this below for ease of reading. (E.14) where, so as to ensure that only sites 0 and 1 contribute non-trivial correlators, we have sandwiched each layer by +| ( −|) and |+ (|− ) for sites x < 0 (x > 1) to leave the two site operators T (t) introduced in equation (112) and represented above in equation (E.14) by the rectangle blocks. The second ellipsis represents terms that contain at least three non-trivial correlators. Using the same argument as was used in section 6.2.4, we see that the (i, j) = (1, 1) and (2, 2) choices give identical contributions and the (1, 2) and (2, 1) contributions are the complex conjugate of the (i, i) contributions, equation (114). Therefore, we are free to study only the (i, j) = (1, 1) case. We insert projectors ensure that the 2, 1 contour of site 0 and the 2, 2 contour of site 1 remain undecorated (so as to ensure that there are only two non-trivial correlators). The contribution is given below, (E. 15) where each closed loops is associated with a normalising factor of 1/q that is suppressed in the diagram. The contracted single time-step unitaries are now just operators on four indices. We denote the contribution to D 4,4 (x = 0, T) that includes only two non-trivial contours as D 4,4 2 (x = 0, T). Separating out the scrambling part of the unitaries we find (E. 16) where each of the pink bricks is given by the following contraction of T . (E.17) Circuit averaging this tensor contraction is easy as each unitary only appears twice. We use the following result for the Haar average of two V's and two V † 's.
(E. 23) Importantly, this contribution, which keeps account of all orders in q, vanishes as ε → 0, unlike the O(1/q 2 ) contribution (naively) calculated in section 6.2.4, which (incorrectly) predicts lim ε→0 D 4,4 (x = 0, z = 0) = −1/q 2 . However, this is still far from a complete account of contributions to D(k = 0, z = 0) (for instance, contributions involving more than two correlators), but it does demonstrate that the apparent pathological behaviour as ε → 0 is the result of a naive (and incorrect at small ε) counting of the O(1/q 2 ) contributions. Notice that for qε 1, the small ε results of section 6.2.4 (which predict D 4,4 (x = 0, z = 0) ≈ −1/q 2 ) are in agreement with the analysis presented here.