Simulating lattice quantum electrodynamics on a quantum computer

U(1) lattice gauge theories (LGTs) offer a way to simulate quantum electrodynamics, one of the three forces unified by the Standard Model of particles physics. Here we provide complete, quantum-gate-by-quantum-gate algorithms to simulate U(1) LGTs on a fault-tolerant quantum computer. We further perform rigorous error analysis in order to derive concrete estimates of the quantum computational resources required for an accurate simulation of U(1) LGTs using a second-order product formula. We show that U(1) LGTs in any spatial dimension can be simulated using O˜(T32N32Λϵ−12) non-Clifford T gates, where T is the simulation time, N is the number of lattice sites, Λ is the truncation parameter for the bosonic gauge fields, and ε is the simulation error. This work paves the way towards fault-tolerant quantum simulations of physical models closely related to the Standard Model of particle physics.


Introduction
Simulating quantum systems is one of the most promising applications of a quantum computer [1], which may lead to exciting discoveries in both applied and fundamental sciences. Many efficient quantum algorithms designed to simulate quantum chemistry and materials science on fault-tolerant quantum computers have been proposed in recent years (see [2][3][4] and references therein). The same cannot be said true for quantum simulation for high-energy physics. While the development of quantum simulation for quantum field theories has made significant progress in digital quantum simulation [5][6][7][8][9][10][11][12], analog quantum simulation [13][14][15][16][17][18][19][20], and variational quantum simulation [21][22][23][24] (see [25,26] for reviews and more references therein), quantum algorithms for the fault-tolerant era are still quite scarce [6,[9][10][11]. Here we investigate how particle physics can be efficiently simulated on a universal fault-tolerant quantum computer. Specifically, we look at the problem of simulating quantum electrodynamics (QED), one of the three fundamental forces in the Standard Model of particle physics, via lattice gauge theories (LGTs).
LGTs are quantum gauge-field theories formulated on a discrete space-time lattice [27]. They provide a way to simulate particle physics non-perturbatively 4 , and have arguably become the most successful computational method in particle physics. This is evidenced by the fact that various quantities of physical relevance have been computed via Markov Chain Monte Carlo (MCMC) simulations of lattice quantum chromodynamics (QCD), i.e. the theory of strong interactions between quarks and gluons. These quantities include static properties of baryons, hadrons, and nuclei [28], and benchmark theoretical results to the recent g − 2 experiment [29,30]. Despite the success of MCMC simulations, it is not in general an efficient (classical) method to simulate quantum systems, as it suffers from the exponentially hard 'sign problem' [31]. the fermions and anti-fermions reside on the sites, while the gauge bosons that mediate the interactions between them occupy the links. As such, the lattice QED Hamiltonian is defined by [42] H =Ĥ gauge +Ĥ matter , (1) whereĤ gauge describes the dynamics of the electric and magnetic gauge fields, andĤ matter accounts for the dynamics that involve the fermionic matter. The pure gauge HamiltonianĤ gauge consists of the electricĤ E and magneticĤ B Hamiltonians, and is given by [42] H gauge =Ĥ E +Ĥ B , wherê ⃗ n,lÊ 2 (⃗ n, l), where g denotes the bare coupling strength, □ represents the elementary square cells, called plaquettes, of a lattice., andP □ is the so-called plaquette operator defined bŷ P □ =Û(⃗ n, i)Û(⃗ n +î, j)Û † (⃗ n +ĵ, i)Û † (⃗ n, j).
The gauge-field operatorsÊ andÛ act on an infinite-dimensional Hilbert space that is equivalent to that of a planar rotor. Here we adopt the angular-momentum-like electric basis |E , in whichÊ is diagonal and integer-valued, i.e. [43] andÛ is a ladder operator defined byÛ They satisfy the on-link commutators Furthermore, the squared electric-field operator is readily given bŷ Note that we dropped the link location indices for notational convenience. In order to represent the infinite-dimensional gauge-field operator on each link on a finite-size quantum computer, its Hilbert space must be truncated at a cutoff, Λ 6 . The electric field operator then becomeŝ A non-negative integer 0 ⩽ j < 2 η is represented on the binary η-qubit register as Using this binary computational basis, the eigenbasis |E is encoded via E = j − Λ. Then, the number of qubits on the link register for each link is given by η = log(2Λ), where Λ is assumed to be a non-negative power of two, and we have and will continue to assume all logarithms are base two, unless otherwise specified.
As in [9], we periodically wrapped the electric fields at Λ such that This spoils the on-link commutator at the cutoff to give However, as is also explicitly discussed in [9], for a truncation with a large cutoff value, the states |−Λ and |Λ − 1 are energetically unfavorable, and hence, will hardly be populated at all. Therefore, the spoiled commutator will likely not be a problem.
The matter Hamiltonian consists of two terms, massĤ M and kineticĤ K Hamiltonians, defined by [42] H matter =Ĥ M +Ĥ K , wherê (ψ(⃗ n) †Û (⃗ n, l)ψ(⃗ n +l) +ψ(⃗ n)Û † (⃗ n, l)ψ † (⃗ n +l)), where m is the fermionic mass, a is the lattice spacing,ψ(⃗ n),ψ † (⃗ n) are the fermionic annihilation and creation operators, respectively, at site ⃗ n. Physically, we can interpret creating (destroying) a particle at an even (odd) site as creating a fermion (anti-fermion).Ĥ M computes the mass of all fermionic matter by multiplying the number of fermions and anti-fermions in the lattice by m. As such,Ĥ M governs the dynamics of free fermions and anti-fermions in the absence of gauge fields.Ĥ K describes the dynamics of fermionanti-fermion pair creation and annihilation, and the corresponding changes in the mediating gauge fields. We map the fermionic operators to qubit operators using Jordan-Wigner (JW) transformation [44].

Quantum circuit implementations
In this section, we describe in detail how to implement each Trotter step on a quantum computer. First we provide the Hamiltonian in the qubit space, i.e. whereD are diagonal operators, where (−1) ⃗ n is either +1 or −1 depending on whether ⃗ n is a fermion or anti-fermion site, respectively, reflective of the use of staggered-fermions [42], and [(Û(⃗ n, l) +Û † (⃗ n, l))(X(⃗ n)X(⃗ n +l) +Ŷ(⃗ n)Ŷ(⃗ n +l))ζ ⃗ n,l + i(Û(⃗ n, l) −Û † (⃗ n, l))(X(⃗ n)Ŷ(⃗ n +l) −Ŷ(⃗ n)X(⃗ n +l))ζ ⃗ n,l ] is an off-diagonal operator. We useX,Ŷ, andẐ as the Pauli x, y, and z matrices, respectively. We abuse the notationl to denote a unit vector in direction l. The operatorsζ ⃗ n,l are tensor products ofẐ, which arise from JW. We consider a d-dimensional L d -site lattice, where there are L sites in each direction, with periodic boundary conditions. The length of eachζ ⃗ n,l is O(L d−1 ). For brevity, we suppress theζ ⃗ n,l operators in the remaining part of the section. The second off-diagonal operator due to the magnetic contribution is given bŷ where h.c. denotes Hermitian conjugate. We use Suzuki-Trotter formula [35] as our simulation method. The Trotter terms to be implemented are of the form e iD (M) ⃗ n t , e iD (E) ⃗ n t , e iT (K) ⃗ n t , e iL (B) ⃗ n t , where t is a sufficiently small number to ensure the Trotter error incurred is within a pre-specified tolerance. In the remainder of this section, we discuss synthesizing circuitss for each of the four Trotter terms.

Mass term e iD (M)
⃗ n t The implementation of this term is straightforward. A single-qubit R z (θ) = exp(−iθẐ/2) gate, where θ = −m(−1) ⃗ n t, applied to the qubit that corresponds to site ⃗ n in the site register suffices. Note that the angles of rotation are the same for all even and odd sites, respectively, up to a sign. The sign difference can be rectified by conjugating the z-rotations with NOT gates. Then, a circuit with one layer of R z gates with the same angle of rotation results. This circuit can be implemented efficiently using the weight-sum trick in [38,39]. Briefly, consider applying the same angle R z gates on p qubits simultaneously. This imparts a phase to an input state with the phase angle being proportional to the Hamming weight of the input. This can thus alternatively be implemented by first computing Weight(p) into an ancilla register, while incurring p − Weight(p) ancilla qubits and at most 4(p − Weight(p)) T gates, where Weight denotes the number of ones in the binary expansion of the integer number p. Finally, we apply log(p) + 1 R z rotations to the ancilla register to impart the correct phase, and then uncompute the weight on the ancilla register. For a d-dimensional lattice with L d lattice sites, p = L d .

⃗ n t
Here, we present a method to implement the electric term. The method modifies that presented in [9], and provides an improvement in gate counts. We import the steps detailed in [9] for the convenience of the readers.D

(E)
⃗ n is a sum of d commuting terms, and hence, its evolution can be implemented exactly as a product of d sub-evolutions, We next discuss the implementation of only one sub-evolution without loss of generality. We herein drop the link location index for notational convenience. The electric field operator and a qubit-encoded gauge field state obeys the eigenvalue relationÊ As such, the evolution of the electric part e it g 2 is given by To implement the term for each link, we first compute (j − 2 η−1 ) 2 into an ancilla register, and then, impart the phase by applying an R z gate on every qubit in the ancilla register. We perform the arithmetic operations by first computing j − 2 η−1 , using an out-of-place adder, which incurs 4(η − 2) T gates and η reusable ancilla qubits [38], and then squaring the (η + 1)−bit ancilla state, which costs 4η(12η − 3 log(η + 1) − 2) T gates with the multiplier proposed in [9]. We induce approximate phases (described below) and then finally uncompute the ancilla register. Therefore, the entire arithmetic operations cost 8(η − 2) + 8η(12η − 3 log(η + 1) − 2) T gates. Here, we choose to perform the arithmetic operations in series to reduce the ancilla-qubit count. Since there are dL d links on an L d -site d-dimensional lattice, the arithmetic operations on all links cost at most 8dL d [(η − 2) + η(12η − 3 log(η + 1) − 2)] T gates, 3(η + 1)dL d ancilla qubits to store j − 2 η+1 and (j − 2 η+1 ) 2 , and 3(η + 1) − log(η + 1) − 1 reusable workspace ancilla qubits [9]. If we choose to optimize the T-depth, we can parallelize the squaring operations, at the cost of increasing the workspace ancilla-qubit count.
We now discuss the phase induction. The correct phase can be induced by applying R z (2 k θ), where θ = g 2 t 2a d−2 , on the kth qubit of the 2(η + 1)−bit ancilla state, (j − 2 η+1 ) 2 . Hence, there are 2(η + 1) sets of dL d same-angle R z rotations to implement, where each set can be effected using the weight-sum trick. Once again, we first compute Weight(dL d ) into the ancilla register, incurring 4(dL d − Weight(dL d )) T gates and dL d − Weight(dL d ) ancilla qubits, and then, applying log(dL d ) + 1 R z gates to the ancilla register to induce the right phase.
There is an alternative method for simulations with a fixed Trotter step t, d, and g 2 , where a can be chosen such that The electric evolution is then given by Once again, we first compute (j − 2 η−1 ) 2 into the ancilla register. Then, we impart the phase by a phase gradient operation, which consists of an M-bit addition on a specially prepared phase gradient state [45] incurring 4M + O(1) T gates due to the M−bit adder [38]. Here, we perform the arithmetic operations and phase gradient operation on one link at a time. Since M = log( 2πa d−2 g 2 t ), the number of T gates required by the adders operations on all the links is 4dL d log( 2πa d−2 In order to synthesize the phase gradient state, which can be reused for all phase gradient operations, M − 1 Z α phase-shift rotation gates, defined by are needed [39]. Each Z α can be synthesized by using repeat-until-success (RUS) circuits [41].

Kinetic term e iT (K)
⃗ n t Here we present two different methods to implement the kinetic term. The first method is a small modification of the method in [9], so we import the steps detailed in [9] for the convenience of the readers. The second method is based on the diagonalization ofÛ operators. Herein, we drop the exact site and link position dependence and instead use r and r + 1 to denote two sites without loss of generality.

Diagonal decomposition
In order to decompose the off-diagonal term into elementary gates, we writê whereÂ =Î ⊗Î . . . ⊗X andÂ =Û †ÂÛ , and similarly, whereB =Î ⊗Î . . . ⊗Ŷ andB =Û †BÛ . Furthermore, we defineP To simulate the off-diagonal term, we approximate whereT Figure 1. Circuit for an individual kinetic term. Top: the zeroth qubit of the bosonic gauge field between fermionic sites r and r + 1. Middle/Bottom: fermionic sites r and r + 1. Conjugation of this circuit by controlled-Z gates with the target on the bottom qubit addresses the JW string, and with the control on a qubit in the JW string.

Figure 2.
Commutation and cancellation rules used to parallelize the U(1) kinetic term implementation. Note two closed circles connected by a line indicates a controlled-Z gate. A collision of two controlled-Z gates on the same two-qubit lines cancel each other (not shown). Two gates acting on two disjoint set of qubits commute (not shown).
In contrast to [9], wherein the Trotterization was performed for individualT terms in the order ofT r , andT (4) r , here, we Trotterize them into two terms (see (28)). We do this since there is an efficient circuit known to implement e −it(T (1) r +T (4) r )/2 and e −it(T (2) r +T (3) r )/2 [46]. Briefly, the circuit is a doubly-controlled R x gate whose angle is four times the angle of rotation in the Trotter term written in the Pauli basis, conjugated by a simple CNOT network-in this particular case the network is a CNOT gate with control on fermion site r + 1 and target on fermion site r, followed by another CNOT gate with the same control but the target being the zeroth bit of the Bosonic link in between. See figure 1. The doubly-controlled R x gate can be implemented using two uncontrolled R z gates and two relative-phase Toffoli gates, which cost 4 T gates each [47]. Note that the angles of R z -rotations here, one negative and one positive, per r, are the same, for all choices of r. Conjugated by a pair of NOT gates, the negative angle rotations become positive. As such, the emergent two subcircuits, each being a layer of individual R z gates associated with each r, have the same angle of rotation. As in the mass term, we can use the weight-sum trick to implement the kinetic term.
However, unlike the mass term, we cannot implement the kinetic terms of all the sites in parallel due to two reasons. First, the kinetic terms for nearest-neighbors do not commute. As such, we have to evolve the odd and even sites separately. Second, the evolution operator includes the multi-site Pauli-z operatorsζ, shown in (17), due to the JW transformation. Unless two nearest-neighbor sites are connected on the JW path, the multi-site Pauli-z operators need to be taken into account. Note on a d-dimensional L d -site lattice, there are dL d individual kinetic terms to implement. In the following, we construct a circuit that implements these individual terms in parallel whenever possible.
We start with the JW transformation. Specifically, we follow a zigzag pattern. On a one-dimensional lattice, this path is simply a line. On a two-dimensional L 2 -site lattice, one can draw L lines of length L in the x-direction, for instance, and connect the neighboring lines to form a zigzagging path. This zigzagging JW path can be generalized to d-dimensional lattices.
We next consider the terms in the bulk and on the edges of the lattice separately. The edges are (d − 1)-dimensional hyperplanes, on which the terms are connected with periodic boundary conditions, but not with open boundary conditions. There are dL d−1 terms on the edges. The terms in the bulk are connected in both periodic and open boundary conditions. There are d(L d − L d−1 ) terms in the bulk.
We implement the evolution of the bulk-and edge-terms one orthogonal direction at a time. For each direction, we use a template circuit in figure 1, modified to accommodate forζ. This is straightforwardly done by a conjugation of the circuit by controlled-Z gates [46], with the controls on the qubits thatζ act on and the shared target on the qubit that the R x operation is applied to (see figure 1).
We now parallelize the circuit that implements the kinetic term for each direction as follows. Denote the subcircuit that is to the left of the doubly controlled R x , including the aforementioned controlled-Z gates, as a circuit prefix P. Denote the subcircuit that appears to the right of the doubly controlled R x , including the aforementioned controlled-Z gates, as a circuit suffix S. We can gather all of the Ps, in the order of their appearance, to the left end of the circuit. Similarly, we can collect all of the Ss to the right end of the circuit. Note that this process requires applications of gate commutation and cancellation rules. We report these in figure 2.
The resulting, parallelized circuit has a layer of doubly controlled R x gates in the middle. We discussed earlier how to implement a doubly-controlled R x gate using two R z gates in parallel. With all angles of rotation being the same, we can employ the weight-sum trick to reduce the gate complexity.
We now gather everything. Recall we have four different levels to consider: (a)T (1/4) vsT (2/3) , (b) even vs odd sites, (c) bulk vs edge terms, and (d) each orthogonal directions i = 1, . . . , d in the d-dimensional lattice. Levels (a) and (b) imply that we apply four stages of circuits per given levels (c) and (d). Consider a single stage. For the bulk terms, for each direction i, the number of controlled-Z gates required in the parallelized circuit is O(L i−1 ), assuming our zigzag pattern is formed in the ascending order of i. The number of doubly-controlled R x gates implemented in parallel is N B = (L d − L d−1 )/2. Our implementation requires 8N B T gates to detach the double controls from R x gates, N B ancilla qubits to parallelize the R z implementations, and additional 2N B − Weight(2N B ) ancilla qubits and 4(2N B − Weight(2N B )) T gates to compute the weight into the ancilla register, after which log(2N B ) + 1 R z gates are applied to the ancilla register. For the edge terms, for each direction i, the number of controlled-Z gates required in the parallelized circuit is O(L i ). The number of doubly-controlled R x gates implemented in parallel is N E = L d−1 /2. The resource required is the same as above, where all occurrences of N B is replaced by N E . To be complete, for level (i), note e −it(T (2) r +T (3) r )/2 has an extra incrementer and decrementer. This incurs an additional cost of 8(η − 2) T gates and η reusable ancilla qubits [9], when using the compute-uncompute trick for logical ANDs in [38] and the Toffoli construction in [48].

Diagonalization ofÛ
Here we consider an alternative method to implement the kinetic term. We first consider an η-qubit QFT F defined according to ConjugatingÛ † with the QFT, we obtain Taking the Hermitian conjugate, By linearity, we obtain whereD c andD s are diagonal. The fermionic operatorsP r ,P r can also be diagonalized. Define the basis as, whereb is the binary negation of b. LetÛ Bell be 1 a,b=0 Then,P NoteD f is diagonal. As such, the kinetic term can be written as a sum of two diagonalizable operators To first order, the Trotterization of the kinetic term is then where the first equality is due to e itÛDÛ † =Ûe itDÛ † , withÛ andD being a unitary and a diagonal operator, respectively. Since the circuit implementation of the QFT and its inverse is known [49], the implementation of the kinetic term hinges upon the syntheses of e i t 8a (Dc⊗D f ) and e i t 8a (Ds⊗D f ) . ExpandingD c ⊗D f , we obtain the relation which can be implemented via applications ofÛ c ≡ e i t 4aD c controlled upon the fermionic state. Similarly, e i t 4a (Ds⊗D f ) can be implemented by controlled applications ofÛ s ≡ e i t 4aD s . We next show that bothÛ c andÛ s can be implemented efficiently via QSP [34]. First, we rewrite them in the form of λ e −iτ sin(θ λ ) |u λ u λ |:Û where τ = t 2a . QSP implements an operator in such form with N queries of an oracleÛ ϕ , which is defined aŝ In this case, for bothÛ c andÛ s , theŴ oracle is given bŷ which can be implemented with an η−qubit phase gradient at the cost of 4η + O(1) T gates. TheŴ oracle needs to be controlled by three qubits, one due to the QSP oracle in (48) and two from the fermionic register. See (44). This can be accomplished by applying a relative phase Toffoli on an ancilla, controlled by the three qubits, then applying aŴ controlled by the ancilla, and then uncomputing the Toffoli gate. The controlled-W operation can be effected by a controlled phase gradient, which can be synthesized with 8η + O(1) T gates [50], and each triply-controlled relative-phase Toffoli costs 8 T gates [47], which adds a constant overhead to the triply-controlled-Ŵ. Now we consider the simulation of an individual kinetic term, which is a diagonal norm-one Hamiltonian. In this case, the QSP query complexity depends on only the simulation length τ , and error ϵ. Since the simulation length is fixed to be τ = t 2a = π g 2 2 M , which is much smaller than one, as required by the phase gradient operation for the electric term, we expect that a small number of query is enough to implementÛ g with g ∈ {c, s}, whereÛ g is an approximation ofÛ g . In particular, since our error is bounded by 4τ q 2 q q! , where t ⩽ q − 1 and the number of queries is 2(q − 1) [34], with small τ , q = 2 is likely sufficient. We emphasize in passing that, when the cutoff is severe, the effects of the unwanted periodic wrapping terms in our implementation ofÛ r andÛ † r , which connect |−Λ = |00 . .  1) LGT, |−Λ will not arise because |Λ − 1 is in the null space ofÛ r . This effect can be reversed by the evolution e −it . This can be implemented using 2(η + 1) CNOTs, two Hadamard gates, and two R z gates.

⃗ n t
Here, we extend the two methods used to implement the kinetic term to implement the magnetic term. Once again, we drop the location indices for brevity.

Diagonal decomposition
First, we decompose the ladder operators into the following off-diagonal operators: whereR =Î ⊗Î ⊗ . . . ⊗σ + andÛ †RÛ are raising operators for |E when E is odd and even, respectively, and similarly,R † =Î ⊗Î . . . ⊗σ − andÛ †R †Û are lowering operators for |E when E is even and odd, respectively. DefiningR each plaquette operator inL (B) can then be expressed as where the superscripts in the parentheses are used to denote the links around a plaquette, and the sum is over the powers to which the gauge field ladder operators are raised. In order to maximize the cancellation ofÛ andÛ † in the plaquette operator's first-order Trotter evolution, we order the sum using the Gray code. The Gray code is a binary encoding where two successive values differ in only one bit. In particular, we first label each term by a vector of the ladder operators' powers (i, j, k, l), and then, arrange the labels in the Gray code ordering of the integers 0-15. The first-order Trotterization of the evolution is thus given by where GC(n) is the Gray code encoding of an integer n. There are 16 sub-evolutions, and between each consecutive sub-evolutions, there will be oneÛ orÛ † that needs to be implemented in this ordering. Including the oneÛ † operator at the end, there will be 16Û andÛ † operators in the evolution.Û andÛ † can be implemented as an η−qubit binary incrementer and decrementer, respectively, each of which costs 4(η − 2) T gates and η reusable ancilla qubits [9]. Furthermore, each e −i t 2a 4−d g 2R□ operator costs two relative-phase triply-controlled Toffoli gates, which take 16 T gates in total to construct, and two R z gates [46].
Briefly, any operator U that is of the form e −i(θ⊗ kσk +h.c.)/2 , whereσ k ∈ {σ + ,σ − } and k = 1, 2, . . . , k max can be diagonalized by C as CDC −1 , where C is composed of a CNOT network with k − 1 CNOTs and a Hadamard gate and D is a diagonal operator. Without loss of generality, consider a case where there are pσ + 's and k max − pσ − 's. Further pick an arbitrary qubit index, say, k max andσ kmax =σ + . We apply NOTs to k max − pσ − , all controlled on the same qubit k max . We now pick one of the k max − p qubits withσ − , say k ′ . We apply NOTs to theσ + qubits except for the k max 'th qubit, of which there are p − 1, all controlled on k ′ . Applying a Hadamard gate on the k max 'th qubit diagonalizes U, i.e. D would now be an k max − 1-controlled R z gate with target on k max . A standard method to detach the controls results in two uncontrolled R z gates.
Note the R z gates have the same rotation angles, up to a sign, for each and every plaquette. This means, once again, just as in mass and kinetic terms, we can use the weight-sum trick to reduce the number of R z gates to be implemented. Since the magnetic terms for nearest-neighbors act on overlapping links, their evolutions are difficult to implement in parallel. Similar to the case of the kinetic term, we implement the magnetic evolutions for the even and odd sites along each two-dimensional plane, separately, in parallel.
Without loss of generality, we assume the number of odd and even sites are L d 2 each. Hence, for each two-dimensional plane, there are L d 2 plaquette evolutions to apply in parallel for even and odd sites, respectively. Consider just the even sites on one plane, the relative-phase triply-controlled Toffoli gates contribute L d 2 · 16 · 16 T gates. Further, for each of the 16 sub-evolutions, there is a layer of L d equal-angle R z gates to implement in parallel. Once again, we employ the weight-sum trick to effect the equal-angle R z gates. The first step of computing Weight(L d ) costs 4(L d − Weight(L d )) T gates and L d − Weight(L d ) ancilla qubits. Then, we apply log(L d ) + 1 R z rotations for 16 times. Lastly, the incrementers and decrementers cost L d 2 · 16 · 4(η − 2) T gates and η ancilla qubits [9].
Again, by linearity, Now we show that the evolution operator implied by the magnetic term, all of which are of the form, where h.c. denotes the Hermitian conjugate, can also be diagonalized by a tensor product of four QFTs. Taylor expanding the evolution operator, we get Since the circuit implementation of the QFT and its inverse is known [49], all that remains is to find a circuit that implementsÛ □ .
To implementÛ □ , we consider its action on the input state |k 1 , Similar to the kinetic term implementation, we use QSP to implement the evolution of the plaquette term. As explained in the kinetic term section, it boils down to synthesizing a controlledŴ operator, which is given bŷ The controlledŴ operator can be effected by first computing k 1 + k 2 − k 3 − k 4 into an ancilla register, then performing a controlled phase gradient operation on the ancilla, and finally uncomputing the ancilla. We can compute k 1 + k 2 − k 3 − k 4 using two out-of-place η−bit adders and one out-of-place (η + 1)-bit adder proposed in [51]. The two η−bit adders compute |k 1 |k 2 |k 3 |k 4 → |k 1 |k 2 |k 3 |k 4 |k 1 + k 2 |k 3 + k 4 , using 2 · (5η − 3 log(η) ) − 4) Toffoli gates and at most η − log(η) − 1 ancillas. The (η + 1)-bit adder computes where |k 1 + k 2 and |k 3 + k 4 are (η + 1)−qubit registers, and |k 1 + k 2 − k 3 − k 4 is an (η + 2)−qubit registers. This operation requires 5(η + 1) − 3 log(η + 1) − 4 Toffoli gates and (η + 1) − log(η + 1) − 1 ancillas. As such, the computation and uncomputation of T gates, using the Toffoli construction in [48], where each Toffoli gate costs four T gates and an ancilla qubit. As such, η − log(η + 1) workspace ancilla qubits are used, and 3η + 4 qubits are needed to store the outputs We consider the simulation of an individual magnetic term, which is a diagonal norm-one Hamiltonian. As in the case of kinetic term, the QSP query complexity is determined by the simulation length τ = t a 4−d g 2 , shown in (59). In order to keep the trotter error small, t, and hence t a 4−d g 2 , must be much smaller than one. Once again, we expect that two queries are enough to implementÛ □ , which is an approximation ofÛ □ . Therefore, the evolution of a magnetic term requires three serial R z gates, which is an improvement over the diagonal-decomposition method, which needs 16 serial pairs of parallel R z gates. As for the entire magnetic Hamiltonian evolution, we adopt the same parallelization strategy as in the diagonal-decomposition method, and achieve factors of 32/3 and 16/3 improvement in the total number and layers of R z gates, respectively.
As in the kinetic term, when the cutoff is severe, the effects of the unwanted periodic wrapping terms of the operatorsÛ ( †) , which connect |−Λ = |00 . . . 0 , |Λ − 1 = |11 . . . 1 , are no longer negligible. Suppose we want to undo the periodic wrapping effect on the ith link of the plaquette. Then, we implement the evolution , where the Pauli ladder operators act on the ith link. This evolution can be implemented using both the diagonal-decomposition method and the diagonalization method. Using the diagonal-decomposition method, the evolution can be implemented by whereR is defined in (51), and e it can be effected using 2(η + 2) CNOTs, two Hamadard gates, and two R z gates. Using the diagonalization method, we first diagonalize the evolution by applying CNOTs and Hadamards on the ith link to diagonalize the Pauli ladder operators. Next, we apply QFT on links j, k, l to diagonalize the remaining part of the evolution, and obtain where b q is the bit-value of the qth qubit of the ith link register |m i . Once again, the evolution can be implemented using two controlled QSP, as in the diagonalization method for the kinetic term. Now suppose we are to undo the periodic wrapping effect on two links, e.g. jth and kth links. Then, we implement the , using either the diagonal-decomposition or diagonalization method. This technique can be straightforwardly generalized to undo the periodic wrapping effect on all combinations of links on each plaquette.

Resource requirement estimates
In this section, we analyze the algorithmic and synthesis errors for our simulations. First in section 4.1, we provide the algorithmic error for the Suzuki-Trotter PF, and leave the full derivation of it in appendix A. Then in section 4.2, we compute the R z synthesis error. Finally in section 4.3, we combine the two errors to report the gate complexity and ancilla requirements.

Trotter error
The first step for any PF algorithm is to divide up a total simulation time T into r segments. Then, the evolution generated by a HamiltonianĤ can be expressed as e −iĤT = (e −iĤT/r ) r ≡ (e −iĤt ) r . Further, the HamiltonianĤ is decomposed into a sum of simpler Hamiltonians,Ĥ = l j=1Ĥ j , where each e −iĤ j t can be simulated with an efficient quantum circuit. The first-order PF is given bŷ The error is bounded by [40] Therefore, for a given spectral-norm error ϵ, the value of r grows as O(T 2 /ϵ). This scaling can be improved by using a higher-order PF. For instance, consider the second-order PF, defined aŝ In this case, the error bound is [40] which leads to a Trotter-step r scaling of O(T 3/2 /ϵ 1/2 ). Further improvements can be achieved with high-order PF, which can be constructed recursively [35]. However, it is reported in [9] the higher-order PFs are rarely preferred for quantum simulations due to the fact that the second-order formula can actually cost fewer computational resources and outperform asymptotically more efficient methods such as LCU and QSP [52]. Furthermore, it is reported in [9], compared to the error bounds for the first-and second-order PF, those for higher-order PF are unlikely to be tight and are more difficult to compute [40]. Therefore, we as well choose to use the second-order PF as our simulation algorithm, just as in [9], and evaluate the commutator bound for the error given in (66). The result is where ρ = 1 12 The full derivation of the above results is shown in appendix A. Readers interested in how the results compare with the size of the synthesis error and how, together, they affect our simulation gate complexity should proceed to sections 4.2 and 4.3.

Synthesis error
Here, we compute the synthesis error for R z gates required for each of the four terms, i.e. e iD (M) ⃗ n t , which were described in detail in section 3. To start, we consider the mass term. In this term, we have log(L d ) + 1 R z gates to implement. Therefore, we incur for each mass term log(L d ) + 1 · ϵ(R z ) amount of error, where ϵ(R z ) denotes the error per R z gate.
Next, we consider the electric term, which has 2(η + 1) log(dL d ) + 1 R z gates. Therefore, each electric term incurs 2(η + 1) log(dL d ) + 1 · ϵ(R z ) amount of error. If we instead use the phase gradient operation, once the gadget state |ψ M in (23) is prepared, each quantum adder call to implement the operation does not incur any synthesis error. We come back to the error incurred in preparing the gadget state itself in the next section.
For the magnetic term, there are 16d(d − 1) log(L d ) + 1 R z gates to apply. Hence, the amount of error per magnetic term is 16d(d − 1) · log(L d ) + 1 · ϵ(R z ).
Lastly, we compute the total synthesis error ϵ synthesis . Note that each term appears twice per Trotter step in the second-order PF in (65). However, the implementation of the diagonal terms can be optimized. In particular, the diagonal mass and electric terms applied at the beginning (end) of each Trotter step can be applied together with the terms at the end (beginning) of the previous (next) Trotter step, unless the terms are at the very beginning or end of the simulation. As such, there are r + 1 diagonal mass and electric terms, and 2r off-diagonal kinetic and magnetic terms to implement in total. Thus, ϵ synthesis is given by where r, to reiterate for the convenience of the readers, is the total number of Trotter steps.

Complexity analysis
Having computed both the Trotter and synthesis errors in the two previous sections, we are now ready to perform the complexity analysis for the U (1) LGT. The total error is given by Here, we evenly distribute the total error between the synthesis and Trotter errors. Focusing on the Trotter error, we obtain the number of Trotter steps by As such, we can compute the error each R z gate can incur by With this, we obtain the number of T gates required to synthesize each R z gate using RUS circuit [41], Combining the T gates required for implementation of R z gates and the T gates used elsewhere in the circuit, we obtain the total number of T gates for the entire circuit as The size of the ancilla register is given by the ancilla qubits required by the electric Hamiltonian, since it requires the most out of all circuit elements. Taking this into account, we obtain the total number of qubits required for the simulation by summing up those in the ancilla, fermionic, and gauge-field registers, which is given by We plug in various parameter settings to (74) and (75) so as to obtain the upper bounds for the T-gate and qubit counts in numbers, which are displayed in table 1.
Note that in the case where the electric term is implemented using phase gradient operation, the T-gate count changes to where the Cost(R z ) needs to be modified, since ϵ(R z ) has changed to ϵ(R z ) = ϵ total 2 T 3/2 2 1/2 ρ 1/2 ϵ 1/2 total Further, Cost(|ψ M ), which denotes the one-time synthesis costs of the phase gradient gadget state. Here, we choose to use the synthesis method delineated in [50]. Briefly, we apply Hadamard gates to the register |00 . . . 0 , and then apply gates Z, Z −1/2 , . . . , Z −1/2 M−1 . Each Z α gates are synthesized using RUS circuits [41]. Let δ be the error of preparing the gadget state |ψ M . Then, each gate can incur at most M/δ error, and thus, costs 1.15 log(M/δ), using RUS circuits [41]. Thus, the gadget state preparation costs 1.15M log(M/δ). Finally, in this case, the ancilla-qubit count is given by that of the magnetic term and the phase gradient state. As such, the total qubit count is given by (78)

Conclusion
In this paper, we have shown how to efficiently simulate QED in d spatial dimensions via the complete U(1) Kogut-Susskind Hamiltonian on a universal fault-tolerant quantum computer. Should such a simulation be one day realized, it could enable the calculation of physical properties of gauge-field theories that are hard to extract classically, including but are not limited to transport coefficients [53], scattering amplitudes, and cross-sections [54]. Furthermore, since QED is one of the fundamental forces that underpins the Standard Model of particle physics, our work paves the way towards scalable quantum simulations of Standard Model physics, and thus, constitutes an important milestone in the nascent field of quantum computational particle physics. One avenue of future work is to investigate the implementation and resource requirements of simulation algorithms, other than the Suzuki-Trotter formula that we have used, applied to lattice QED. This will provide insights into the practical performances of these algorithms. Analysis of initial state preparation and measurement protocols for physical observables in lattice gauge theories also form interesting subjects for future research. Furthermore, one could consider extensions of the Kogut-Susskind Hamiltonian, for instance, by adding a topological θ-term [55], which is directly related to and could help demystify the matter-anti-matter asymmetry.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.
In this appendix, we derive the Trotter errors for a second-order PF simulation of lattice QED in d dimensions. We begin the derivation by ordering the terms in the HamiltonianĤ according to an ordered list ⃗ ne,1 +T ⃗ ne,1 +T where the grouping of terms as they appear in the list is motivated by the commutation so that each element in the list does not commute with at least one of the elements in the ordered set. For convenience, we group all the mass terms and electric terms (the first two in the list) into one term each, since the grouping incurs no Trotter error as the individual mass/electric terms commute with one another. For the off-diagonal kinetic and magnetic terms, we will analyze only the diagonal-decomposition method because it is simpler to analyze than and costs about the same as the diagonalization method. ⃗ n i,p , where a ∈ {1, 2, 3, 4}, acts on a link in direction p, which originate from either an odd or even site, depending on i, in either the bulk or edge, depending on the value of p. Furthermore, the sum of all kinetic terms is equivalent to ⃗ nT (K) ⃗ n , defined in (17). The set of subindices on each magnetic term {⃗ n k,□ }, with k ∈ {e, o}, stands for the set of even or odd sites, respectively, on a two-dimensional plane denoted by □. In relation to (18), each □ corresponds to a specific pair of directions (i, j). For a d-dimensional lattice, there are d(d−1) 2 two-dimensional planes and plaquette operators in the magnetic term at each site. Moreover, note that the sum of all magnetic terms yields ⃗ nL (B) ⃗ n , as defined in (18).
⃗ nD The first equality in (A13) is due to the fact that each kinetic term at site ⃗ n couples two sites, ⃗ n and ⃗ n +l with l denoting the direction considered. The inequality that immediately follows from it is due to the triangle inequality. This is because, although the kinetic operators haveζ JW strings, mass terms are diagonal, and thus they commute with the JW strings. The second inequality is due to (A4). In the last inequality, the bound for the mass and kinetic terms are due to (15) and (29)-(32), respectively. C 1,3 is straightforward to evaluate since mass and magnetic terms commute and hence, Before we evaluate the commutator between the electric and kinetic terms (C 1,4 ), we provide a couple useful properties about the kinetic operatorsK(⃗ n, l). Acting on the fermionic space, a kinetic operator takes a computational basis state to another basis state. Acting on the gauge field on a link, it takes |E → |E ± 1 , where E ∈ [−Λ, Λ − 1], up to a multiplicative constant. Therefore, if we consider an electric and a kinetic operator acting on the same link, we obtain where we usedÊ 2 (⃗ n, l) to denote an electric term for the link (⃗ n, l) up to a multiplicative constant, and Using these equations, we evaluate the bound of C 1,4 and obtain ⃗ nD Next, we evaluate the commutators between the electric and plaquette operators (C 1,5 ), which are trivial unless the operators act on a common link. For the sake of brevity, we let the plaquette operator act on links 1, 2, 3, 4, and let the electric field operator act on link 1. The commutators are then Combining the two commutators, we obtain We are now equipped to evaluate the bound of C 1,5 . The bound is ⃗ nD ⃗ n . Considering C 1,6 , we first fix the evenness and oddness of the sites. We consider the bulk and edge terms separately, although the required analysis is similar. We use p to denote the direction of the links acted on by the kinetic term. The commutator is trivially zero if the magnetic part and kinetic part act on different links.
Thus we consider the case where each plaquette operator in the magnetic part acts on links that the kinetic operators also act on. For direction p, we then have where N k ∈ {N B , N E } is the number of odd or even sites in the bulk or on the edges in each direction. Recall that N B = (L d − L d−1 )/2, and N E = L d−1 /2. In the third line of (A22), we have used the fact that for each kinetic term, there is a plaquette operator, of the type ⃗ n i,□ , acting on the same link. Expanding this to include all sites, all directions, and all plaquette and kinetic operators, we obtain where in the last inequality, the factor d comes from the fact that there are d directions each for the bulk and edges of the lattice. Further, for each direction p, there are d − 1 two-dimensional planes □ that contain links in that direction, hence the factor d − 1. Now, we compute C 1,7 , which consists of the commutators between kinetic terms. There are three types of commutators; those between (a) two bulk terms, (b) two edge terms, and (c) a bulk and an edge term. We analyze case (i) first. We reiterate that there are 4 d bulk terms of the form T in total (we remind the readers that there are even/odd sites, thenT (1) +T (4) andT (2) +T (3) ), where {⃗ n i,p } runs over all sites for a given parity i, with fixed direction p. Notice that, in a non-vanishing commutator, two bulk terms must either act on the same set of links and sites, or they overlap on one of the fermionic sites. This is so since, in all other scenarios, either the two kinetic terms simply act on disjoint Hilbert spaces (they act on two disjoint sets of qubit registers) or a kinetic term's JW stringζ always commutes with the other kinetic term in our zig-zag JW path. To illustrate, consider a number system with basis d, where a number here encodes a fermionic site. A kinetic term is defined over picking a pair of two numbers that are different by one digit, with the difference of that digit being one. Consider two pairs of these numbers. The two pairs have two different ranges of numbers, over which the JW string acts. The former scenario of disjointedness arises when the two ranges do not overlap. The latter scenario arises when one range is inside the other range. Since operatorsP r andP r in (27) that act on fermion registers that collide with the JW string commute with the string, the latter scenario does not contribute to the commutator bound.
Since we always implement {⃗ n i,p }T , as shown in (A1), for each ⃗ n i,p , we Hence, the commutator for the case where two bulk terms act on the same set of links and sites is The first inequality is due to the fact that the kinetic terms acting on different sites and links commute. We used (A4) for the second inequality. In the case where the two bulk terms overlap on one of the fermionic sites, the commutator is given by where we once again used (A4) and the fact that each term at site ⃗ n i,p has two neighboring kinetic terms acting on the sites ⃗ n i,p or ⃗ n i,p +p. We now compute the number of occurrences of the commutators considered in (A24) and (A25). Since there are d directions, and two parities, there are 2d terms of the form (A24). For (A25), since each term T (a,b) ⃗ n i,p commutes with itself, but not with the terms that are implemented afterwards, there are where a factor of 2 is due to the fact that there are two (a, b)-combinations, non-vanishing commutators in total. We have also used the fact that there are 2d combinations of parity and direction, i.e. i and p, in the bulk. As such, the sum of the commutators between all the bulk terms is bounded by Using similar arguments, we obtain the bound for the sum of the commutators between all the edge terms, (16d 2 − 6d) NE a 3 . Lastly, we evaluate the commutators between a termK(⃗ n j , p ′ ) from the bulk and a term T on an edge link has twoK(⃗ n j , p ′ ) terms from the bulk acting on the same sites, and hence, the commutator between a bulk and edge term is bounded by 4NE a 3 , where we have replaced N B with the smaller N E in (A25). Since there are 2dK(⃗ n j , p ′ ) terms from the bulk, and 4 d T terms from the edge, there are in total 8d 2 commutators between bulk and edge terms. The bulk and edge terms that act on links along the same direction and sites with the same parity commute, and there are 4 d pairs of such terms. Hence, the sum of such commutators is upper-bounded by (8d 2 − 4d) 4NE a 3 . In total, C 1,7 is bounded from above by Next, we analyze the second sum in (66), which is given by where where we have implicitly assumed in C 2,8 , C 2,9 and C 2,10 that (T ⃗ n i,p )/2. We have removed any trivially vanishing terms, which involve commutators between mass and electric terms, mass and magnetic terms, and two magnetic terms, since in each respective case the operators commute with each other. In the following, we evaluate the bounds for each C 2,n .
For C 2,1 , we obtain the bound ⃗ nD where in the first inequality, we have used the fact that each kinetic termK(⃗ n, l) acts on the sites ⃗ n and ⃗ n +l, and the link (⃗ n, l). The term does not commute with the mass and electric terms acting on the same space, but commutes with the rest. For C 2,2 , we obtain the bound ⃗ nD where the factor of (4d − 2) in the third norm term of the second inequality is due to the fact that there are (4d − 2) kinetic terms of the typeK(⃗ n, l) acting on the same fermionic sites asK(⃗ n, l).
Using a similar method, we evaluate the bound of the C 2,4 term (we consider the C 2,3 term right afterwards). The only difference from the C 2,2 term is that the mass term is replaced by the electric term. As such, we obtain the upper bound of C 2,4 , where in the first inequality, we have used the fact that the electric terms acting on different links as the kinetic terms commute, and in the third inequality, we used the bound in (A15).
Moving onto the C 2,3 term in (A29), we evaluate its bound as follows: where in the second inequality, we have used the fact that there is one kinetic and 2(d − 1) magnetic operators acting on the same link.
We bound the C 2,5 term using a similar method, but with the mass term replaced by the electric term. We evaluate its bound as follows: where we have used (A15). Each commutator in the C 2,6 term contains an electric, magnetic, and kinetic term. Evaluating the bound, we have ⃗ nD where we used the fact that terms not acting on the same links always commute, and in the second inequality, we used the bound in (A20).
We now evaluate the bound of the C 2,7 term, which contains two magnetic terms, using a similar method, where in the first inequality, we used the bound in (A20) for the first norm term, and the fact that there are 8d − 11 magnetic operators acting on the same plaquette as the magnetic operator in the inner commutator for the second norm term. The factor 8d − 8 is due to the fact that each of the four links on a plaquette is acted on by 2(d − 1) magnetic terms, but 3 out of the 8d − 8 have been overcounted. C 2,8 is a sum of commutators between three kinetic terms, which we label asĤ i ,Ĥ j , andĤ k . We remind the readers that the edge terms are evolved before the bulk terms, as indicated by (A1). As such, we divide up the tuples (Ĥ i ,Ĥ j ,Ĥ k ) into five types: (a) all three terms are bulk terms, (b) all three terms are edge terms, (c) H i andĤ j are edge terms, andĤ k is a bulk term, (d)Ĥ i andĤ k are edge terms, andĤ j is a bulk term, and (e) H i is an edge term, andĤ j andĤ k are bulk terms.
We consider the type-(i) terms, and further divide it into six separate cases. In the first case, we consider a scenario whereĤ i andĤ j act on the same links andĤ k =Ĥ j . There are a total of 2d of such instances with even/odd parities and d directions. Since we always implement {⃗ n i,p }T . Hence, the bound for each of the instance is given by where the first equality is due to the fact that kinetic operators acting on different links and sites commute, and we have used (A4) for the inequality. The second case we consider is whenĤ i andĤ j act on the same links andĤ k =Ĥ j . There are in total 2d 2 − d of such instances. This is so because a link can have 2d combinations of parity and direction, and ifĤ i andĤ j act on the link labeled by the nth combination, thenĤ k can act on links with 2d − n different combinations, since k > i andĤ k =Ĥ j . Thus, we obtain (2d − 1) + (2d − 2) + . . . + 1 = 2d 2 − d for the total number of instances. We consider one of such instance, where in the first inequality, the extra factor two in the second norm expression is due to the fact that there are two choices of sites in {⃗ n k,p ′ } forĤ k that result in non-vanishing commutator withĤ i andĤ j due to their collisions on the two fermionic sites that sit at the two ends of a link thatĤ i andĤ j act on. Similarly, in the third case,Ĥ i andĤ k act on the same links, whileĤ j =Ĥ k act on links of a different parity in the same direction. Since (1, 4) is implemented after (2, 3), and even terms are implemented before odd terms,Ĥ i ,Ĥ j andĤ k are of the forms {⃗ ne,p}T (1) ⃗ ne,p +T ⃗ ne,p +T , respectively. There are d such instances. We consider one of such instance, and obtain its bound ⃗ ne,p +T ⃗ ne,p +T where in the first inequality, the factor two in the second norm term is due to the fact that there are two choices of sites in {⃗ n o,p } forĤ j that result in collisions on two sites withĤ i . Further, there are threeĤ k terms that collide with the inner commutator, which acts on three links and four sites, i.e.Ĥ k andĤ i act on the same link, andĤ k collides withĤ j on two sites. We consider the fourth case where nowĤ j acts on links in a different direction. There are 2d 2 − 2d such instances. This is so because a link can have d different directions, and ifĤ i andĤ k act on the link with the nth direction, thenĤ j can act on links in d − n directions. Thus, we obtain 4[(d − 1) + (d − 2) . . . + 1] = 2d 2 − 2d for the total number of instances, where the factor of 4 is because there are two parities forĤ i andĤ k , andĤ j . We consider one of such instance, (a) ⃗ n k,p ′ 2 , and obtain its bound where the factor two in the second norm expression of the third line is due to the fact that there are two choices of sites in {⃗ n k,p ′ } forĤ j that collide on two sites withĤ i andĤ k . The fifth case we consider is when Similarly, in the third case,Ĥ i andĤ k act on the same links, whileĤ j acts on links in a different direction of the same parity. There are d 2 − d of such instances, as in the second case, and each instance is bounded by L d−2 2a 3 , following similar arguments in (A45). The fourth case we consider is whenĤ i andĤ j act on links in a different direction of the same parity, There are twice as many instances as the second case, i.e. 2d 2 − 2d, becauseĤ i can be of the kinds (1, 4) or (2, 3). Once again, each instance is bounded by L d−2 2a 3 , following similar arguments in (A45). We now consider the fifth case, whereĤ i ,Ĥ j , andĤ k all act on links of different directions, but the same parity. There are 2 3 (d 3 − 3d 2 + 2d) such instances. This is so because ifĤ i acts on links in the nth direction, thenĤ j andĤ k can act on links in d − n and d − n − 1 different directions. Thus, we obtain 2 for the total number of instances, where the factor of two is due to the two parity degrees of freedom. Furthermore, since each edge is a (d − 1)−dimensional surface,Ĥ i ,Ĥ j , andĤ k will collide on a (d − 3)-dimensional surface, containing L d− 3 2 sites for each parity. As such, we consider one of such commutator in this case, and obtain its bound All together, the type-(ii) bound is given by Now, we consider type-(iii) terms. We divide the terms into two cases. In the first case, the edge terms,Ĥ i andĤ j , act on the same links. There are 4d 2 of [[Ĥ i ,Ĥ j ],Ĥ k ] commutators, in this case, since there are 2d choices forĤ i andĤ j pairs and another factor of 2d forĤ k , but 2d of them are trivially zero. This is so since for a given parity and direction forĤ i andĤ j , there is oneĤ k term, of the type 4 a=1T (a) 2 , with the same parity and direction asĤ i andĤ j , that commute withĤ i andĤ j . Therefore, there are 4d 2 − 2d nonvanishing commutators in this case, and each is bounded by 2NE a 3 , where we have replaced N B with the smaller N E in (A39). In the second case,Ĥ i andĤ j act on different links. There are 4d 2 − 2d such pairs ofĤ i andĤ j . Since, there are 2d choices ofĤ k , there are 8d 3 − 4d 2 terms in this case, and each of which is bounded by 4NE a 3 , using (A43). Therefore, we obtain the bound for the sum of all type-(iii) terms to be Note for type-(iv) terms, the same bound can be obtained along similar lines of reasoning, with the only difference beingĤ j is in the bulk, andĤ k is in the edge. Lastly, type-(v) terms can be separated into two cases. The first case is when the bulk termŝ In the second case,Ĥ j andĤ k act on different links. Now, for each non-trivial pair ofĤ i andĤ j , there are 2d − 2 choices ofĤ k . Thus, there are 4d(2d − 1)(2d − 2) such terms, each of which is bounded by 4NE a 3 , using (A43). Therefore, type-(v) terms are bounded by As such, C 2,8 is bounded by Now we consider the C 2,9 term, whereĤ i andĤ j are kinetic terms, andĤ k is a magnetic term. The terms can be separated into three types. Type (i): the terms whereĤ i andĤ j are both of an edge kind. Type (ii): the terms whereĤ i andĤ j are both of a bulk kind. The terms whereĤ i is in the edge andĤ j is in the bulk belong to type (iii). We further divide type (i) into three cases. The first case is whenĤ i andĤ j act on the same link. There are 2d such pairs ofĤ i andĤ j , and for each pair, there are 2(d − 1) magnetic terms that act on links in the same direction. As such, there are 4d 2 − 4d commutators in this case, each of which is bounded by where in step two, we have used the fact that only kinetic and magnetic operators that act on the same links yield non-zero commutators. The bound for all the terms in this case is then The second case is whenĤ i andĤ j act on different links, but in the same direction.Ĥ i andĤ j have to be even and odd, respectively, due to the ordering. Since the links in the edge for any given direction are not connected,Ĥ i andĤ j must commute. Therefore, the bound for this case is zero.
In the third case,Ĥ i andĤ j act on links in different directions. There are 4d 2 − 4d pairs in this case. This is so because ifĤ i acts on links in the nth direction, thenĤ j can act on links in d − n different directions. Thus, we obtain 8[(d − 1) + (d − 2) + . . . + 1] = 4d 2 − 4d, where the factor of 8 is due to the fact thatĤ i can be of the kinds (1, 4) or (2, 3), and each ofĤ i andĤ j can be of two parities. For each pair ofĤ i andĤ j , there are 2 magnetic terms that do not commute with both, since the two directions from the pair form a plane, and for each plane the magnetic terms can be of different parities. Each inner commutator acts on three links, due to the fact that there are twoĤ j terms acting on the same sites as anĤ i term. For each of such triple-link configuration, there are 2(d − 2) magnetic terms acting on each single link, where the factor of two comes from the two parities, and the factor of (d − 2) is due to the fact that each magnetic term do not act on both links at the same time. Therefore, for each pair ofĤ i andĤ j , the commutator is bounded by  Therefore, the third case is bounded by Finally, the bound for type-(i) terms is then Similarly, we separate type-(ii) terms into three cases, and obtain the bounds for case (i) and (iii) as and respectively. In the second case,Ĥ i andĤ j act on different links, but in the same direction. There are 2d such pairs ofĤ i andĤ j becauseĤ i andĤ j have to be even and odd, respectively, due to the ordering, andĤ i can be either (1, 4) or (2, 3), whileĤ j is of the form 6d − 10 magnetic terms that do not commute with each pair, following similar arguments in (A54). Once again, for each pair ofĤ i andĤ j , the bound for the commutator is (24d−40)NE g 2 a 6−d , using similar arguments as (A54). As such the bound for type-(iii) terms is given by N E g 2 a 6−d (192d 3 − 416d 2 + 224d). (A62) Summing up the bounds for type-(i), -(ii), and -(iii) terms, we obtain the bound for C 2,9 as L d g 2 a 6−d (48d 3 − 102d 2 + 54d) + L d−1 g 2 a 6−d (96d 3 − 232d 2 + 136d). (A63) Now for C 2,10 , we can use a similar approach. The only difference is that in each case, we only need to consider the magnetic terms,Ĥ j , that do not commute with the first kinetic term,Ĥ i . This is because if [Ĥ i ,Ĥ j ] = 0, [[Ĥ i ,Ĥ j ],Ĥ k ] = 0. If we divide up the terms into cases based on the kinetic termsĤ i andĤ k as we did for C 2,9 , and use the fact that there are 2(d − 1) magnetic termsĤ j that do not commute with eachĤ i , we can compute the bound for each case. For case (i) of type (i) and (ii), we obtain the bound by replacing the number of non-commuting magnetic terms. By summing up the bounds for all cases, we obtain the bound for C 2,10 , Lastly, we consider C 2,11 , whereĤ i is a kinetic term, andĤ j andĤ k are magnetic terms. We evaluate its bound as follows: where in the second line, the factor of 2(d − 1) in the second norm term is the number of magnetic terms that act on the same link as the kinetic term, since there are (d − 1) planes that share a direction with the kinetic term, and on each plane, there is a pair of magnetic terms of two different parities. For each pair of such magnetic terms, there are 8 colliding magnetic terms on the same plane, and 7 · 2(d − 2) colliding magnetic terms on different planes, where 7 is the number of links acted on by the pair and 2(d − 2) is the number of colliding magnetic terms on each of the 7 links. This explains the factor of 8 + 7 · 2(d − 2) = 14d − 20.
As a final step, we include the second-order Trotter error for the magnetic term, of which the ordering is given by (55). We use (A4) to bound both the commutators [[Ĥ i , j>iĤ j ],Ĥ i ] and [[Ĥ i , j>iĤ j ], k>iĤ k ]. We remind the readers that eachĤ i is of the form of which the norm is two, excluding the prefactor −1 2a 4−d g 2 . Therefore, we obtain the inequality 4 a 12−3d g 6 . (A75) As such, in accordance to (66) and a straightforward counting argument, the error of the magnetic term is given by where L d is the number of sites, d(d−1) 2 is the number of plaquettes per site, and 1 a 12−3d g 6 is the Trotter error per plaquette.