Quantum-circuit design for efficient simulations of many-body quantum dynamics

We construct an efficient autonomous quantum-circuit design algorithm for creating efficient quantum circuits to simulate Hamiltonian many-body quantum dynamics for arbitrary input states. The resultant quantum circuits have optimal space complexity and employ a sequence of gates that is close to optimal with respect to time complexity. We also devise an algorithm that exploits commutativity to optimize the circuits for parallel execution. As examples, we show how our autonomous algorithm constructs circuits for simulating the dynamics of Kitaev's honeycomb model and the Bardeen–Cooper–Schrieffer model of superconductivity. Furthermore, we provide numerical evidence that the rigorously proven upper bounds for the simulation error here and in previous work may sometimes overestimate the error by orders of magnitude compared to the best achievable performance for some physics-inspired simulations.


I. INTRODUCTION
Feynman proposed quantum computing as a means to "imitate" quantum dynamics in order to overcome apparent intractability of universal quantum simulation on classical computers [1].He conjectured that a universal quantum simulator (UQS) could efficiently simulate quantum evolution.Lloyd formalized Feynman's concept by employing a Trotter ordered-operator expansion to convert continuous time evolution into a quantum circuit C comprising unitary quantum gates [2].The UQS is now also referred to as "digital quantum simulation", both theoretically [3][4][5][6] and experimentally [7].The adjective "digital" is used to contrast with the term "analogue quantum simulation", which aims to emulate evolution of a Hamiltonian Ĥ in a custom-designed experiment [8][9][10][11][12].The importance and nearfuture feasibility of the (digital) UQS, albeit without quantum error correction, drives experimental efforts to create these simulators for restricted types of Ĥ [7].
We present the first autonomous algorithm to design circuits for simulating the evolution generated by a general n-qubit k-local Hamiltonian Ĥ(n) within a pre-specified tolerance ǫ.An n-qubit k-local Ĥ(n) is defined to be a linear combination of m Hamiltonians ĥ(n) j , each acting on n qubits as an identity operator 1 1 on all but k ∈ polylog(n) qubits [13], and polylog(n) is a polynomial function of log n.Our tolerance ǫ is the worst-case 2-norm distance between the true evolved state under the specified evolution and the simulated output state, maximized over all allowed input states.We also show in Sec.VIII that these worst-case error bounds that go into these estimates can overestimate the error for random 2-local Hamiltonians by orders of magnitude, which suggests that UQS experiments may be much more feasible than previous simulation work has suggested [14][15][16][17][18].
In our analysis, we consider two independent cases of universal gate sets.One case corresponds to a finite set comprising a single two-qubit entangling gate plus a finite number of one-qubit gates.The second case incorporates a single two-qubit entangling gate plus both discrete and continuously-parameterized single-qubit gates.Strictly speaking only a finite gate set should be permitted for quantum error correction and scalability, but there is a trend in experimental studies to report quantum simulation with continuously-parametrized single-qubit gates.We want our algorithm to be relevant both to the strict case of a finite gate set and to experimental efforts that employ continuous tunability.
Our resultant circuits are not only efficient (meaning that the circuit size scales polynomially with the number of simulated qubits for fixed k) but also uses the smallest number of qubits possible given the size of the system being simulated.Additionally, the circuit size also scales near-optimally with the run-time t of the simulation.This minimization over space and time costs (number of qubits required in the simulator and number of gates) is important for making quantum simulators as close as possible to practical implementation.Each additional qubit and each additional gate can be challenging to implement in practice so reducing these costs is not only important for proving that the scaling is efficient hence possible in principle but also to reduce the costs to make the simulation feasible with small simulators in the near future.Our minimum run-time algorithm is also improved by parallelizing gates by grouping commuting terms in the Trotter decomposition of the evolutionary operator, thus enhancing the near-term feasibility of the quantum simulator.Our work thus enables feasibility of UQS circuits.Experimental UQS circuits will be valuable to predict resultant states under Ĥ-evolution or to provide UQS-generated states as inputs to quantum algorithms for purposes such as acquiring spectral properties or eigenstates of Hamiltonians [19] or to determine particle scattering, e.g. in a relativistic quantum field theory [20,21].The ground state is regarded as the most important eigenstate as it uniquely determines all properties of the system [22] and could be used to solve outstanding condensed-matter problems such as determining the energy gap for general Bardeen-Cooper-Schrieffer (BCS) superconductivity models of finite systems [23].Dynamical simulation algorithms have been devised to simulate ground-state properties via adiabatic state preparation [23] or via dissipative interaction with the environment [4,5].

A. Concept
Our algorithm yields a string representing a quantum circuit that comprises a sequence of quantum gates to simulate k-local Hamiltonian evolution within fixed 2-norm distance ǫ.The n-qubit k-local Hamiltonian is expressed in the Pauli operator basis as The total number of non-identity Pauli operators in each ĥ(n) j is at most an n-independent constant k.Our algorithm is designed to produce a poly(n)-size description of a quantum circuit that implements a unitary where • denotes the 2-norm.This condition implies that the trace distance between the ideal evolution and the simulated evolution is at most ǫ for any initial state [17,18].Below we discuss the algorithmic input, processing and output.

B. Input
The algorithm requires the following inputs: n: number of qubits in the system; k: locality parameter of the Hamiltonian; t: evolution time for the simulation; ǫ: worst-case 2-norm error tolerance (distance) between the true evolved state and the simulated state; ̟; specifies which single-qubit gate set to use, namely {H, The Hamiltonian Ĥ(n) is entered into the algorithm as a bit string no larger than a poly(n)-size representation for this input to be efficient.Our algorithm accepts the Hamiltonian Ĥ(n) input as the bit-string representation Ĥ(n) := {(a j , l j , S j ) ; j = 1, . . ., m} , with the vector corresponding to the numbers of each type of Pauli operators in ĥ(n) j and S j = (S Xj , S Y j , S Zj ) (6) with S Xj , S Y j and S Zj the strings corresponding to the positions of each of the X, Y and Z operators respectively in ĥ(n) j .In Eq. ( 4), substrings representing Hamiltonians ĥ(n) j appear as triplets of strings.These triplets are delimited by parentheses thus ensuring easy parsing of the overall string for Ĥ(n) .Whereas a general matrix representation of Ĥ(n) is exponentially large in n, our representation [ Ĥ(n) ] of this k-local Hamiltonian has poly(n) size.
As an example of this encoding, consider the three-qubit Hamiltonian with m = 3.This Hamiltonian is representated as and all other strings are empty.
Our algorithm can compute the number of time steps r of equal duration t r so that the evolution from 0 to t is broken down into a sequence of finite steps.Furthermore our algorithm employs the Trotter-Suzuki (TS) orderedexponential decomposition [17,18,24,25] of order χ to simulate the evolution sequentially over each of the r time steps.However, our algorithm determines r and χ based on optimizing an upper bound on tolerance, and superior choices of r and χ are possible but not by using our algorithm or any other known algorithm.We provide numerical evidence of this fact, for random 2-body Hamiltonians, in Section VIII.
As the user of our algorithm might wish to employ or test other choices of r and χ, we design the algorithm so that the user can override our choices of r and χ.If our algorithmically determined value of r is overridden, the guarantee that the quantum simulations yields a resultant state within tolerance ǫ no longer holds.Therefore, overriding the algorithm's r comes with the warning that the error in the quantum simulation is unknown.Specifically setting r = 0 and χ = 0 causes our algorithm to determine optimal r and χ values based on minimizing the number of gates required to guarantee that the simulated state has 2-norm error less than ǫ for any input state.Positive values of r and χ override our algorithmic determination of these parameters and use the input values instead.
The purpose of the input variable ̟ is to enable circuit design that strictly uses a finite gate set in accordance with principles of quantum error correction and scalability [26] or to include a continuously-parametrized single-qubit gate, namely rotations by θ around the z-axis using unitary operator R z (θ), in accordance with prevalent experimental practice [7].More generally one could consider the case that ̟ is any desired universal gate set, but here we restrict our attention to just two single-qubit gates, either {H, T = Z −1/4 } or {H, R z (θ)}, so ̟ is a binary, or logical, variable.In our algorithm, both of these sets are accompanied by two-qubit controlled-not gate CNOT ij with the i labelling the control qubit and j labelling the target qubit.

C. Output
Our algorithm yields an output string [C] that represents the resultant quantum circuit.This quantum circuit is a sequence of quantum gates that simulate the dynamics of the quantum system for an arbitrary input state.In our algorithm the basic quantum gates are members either of the finite-size universal instruction set {H, T, CNOT} or the continuously-parametrized set {H, R z (θ), CNOT}.
The single qubit gates are represented in the output as a string of the form "Hx" or "Tx" with x a bit string labeling the qubit acted upon by gate T or H, respectively.These gates, along with the identity 1 1, which is not explicitly stated in the circuit, can be parallelized and concatenated to form circuits.For example, the string H1T 2 represents a circuit that first performs the Hadamard operation on qubit 1 then a π/8-gate on qubit 2. The CNOT operation is represented similarly with the string "CNOTx,y" representing the quantum operation CNOT xy .For the case of the gate set that includes continuously parameterized single-qubit z-rotation gates, R z (θ), in the output, The action of R z (θ) on qubit x is represented as the string "RZ[θ], x" where [θ] is a string representing the rotation angle θ.

D. Processing
Our circuit-design algorithm proceeds through three stages.
Hamiltonian sorting algorithm: mutually commuting terms in H (n) are grouped together resulting in parallelized quantum simulation that reduces total runtime.
Circuit design algorithm for Pauli-exponentials: determine the quantum circuit and convert into output string [C].These algorithmic stages are described in the following sections.

E. Summary
An algorithm consists of input and output with the output obtained by processing the input.We have been careful to discuss the input and output as bit strings as our algorithm is classical and runs on a classical computer.The output of the algorithm is a design procedure to construct a quantum simulator circuit that would simulate the evolution of a quantum state.In the following sections we describe the algorithmic stages.

III. TROTTER-SUZUKI FORMULAS
In this section we discuss the second stage of the algorithm, which concerns decomposing the TS ordered-operator exponential into a sequence of exponentials of tensor products of Pauli operators.The second stage of the algorithm is discussed first because the first stage groups these TS terms so understanding the second stage helps to understand the terms being grouped in the first stage.
We use the TS method to determine a product of exponentials of ĥ(n) j that approximates U (n) (t) within 2-norm distance ǫ [17,18,24].The total time of evolution t is divided into r time intervals each of equal duration ∆t = t r .
For Ũ (n) χ the χ th -order TS iterate approximating the n-qubit unitary evolution U (n) , the distance between the 'true' evolution operator U (n) (t) and the TS approximated evolution operator is The iterative TS formula for generating U χ is well known [24].The formulae are widely used in quantum simulation algorithms because they generate an approximation which is a product of unitary evolutions, for Hamiltonians ĥ(n) j1 represented by the sequence j i , and a sequence of times t i .The TS formulae comprises a sequence of exponentials of Pauli operators that have simple circuits for quantum computer implementation [26].
χ (∆t) ( 14) using Suzuki's procedure [24].end function and the integer p obeys 1 < p ≤ χ.We emphasize this form of the TS formula because it is important for our grouping algorithm that the order of the exponentials in the product formula matches the order of the exponentials in the Hamiltonian.We express this TS stage of the algorithm as an outline of a computer program.The program's input is the bitstring representation [ Ĥ(n) ] of the the n-qubit Hamiltonian, the desired order of the Suzuki iteration χ, the evolution time t and the number of intervals r, which together yield the time step ∆t (9).The program's output bit-string representation for the χ th -order approximation to the true evolution: with M = 2m5 χ−1 following from the recursive form of the TS formulae ( 12) [24].The representation in ( 14) stores each exponential in the approximation as a sequence of strings that represent a Hamiltonian term a j h and then the duration of the evolution step (stored to finite precision).Our representation stores the exponentials in order of their execution; although, in this case, the symmetry of the TS formulae implies that we could also store them in reverse order without changing the result.The TS algorithm that generates a TS approximation of the form ( 14) is given by Algorithm 1.
The performance of the resulting simulation depends strongly on the chosen values for r and χ.If r = 0 or χ = 0, our program determines suitable values of r or χ; otherwise this program uses the user-supplied values.Given a specified value of χ, then is optimal [14,16], which guarantees that [14] provided that We employ the value of r in (15) as the default value of r for the algorithm and take our default value of χ to be because it causes the number of operations in the simulation to scale nearly linearly with t [14].
The value of r given in (15) can be larger than necessary for certain Hamiltonians (as shown in Section VIII).If a guarantee that the error is less than the tolerance ǫ is not required, then choosing r smaller than the optimal value given in (15)  Now we return to the first stage of the algorithm, which aims to reduce runtime by grouping TS terms based on generation by commuting Hamiltonians.In other words, TS terms are grouped together to parallelize the quantum simulation circuit.The benefits of grouping terms and exploiting parallelism is most notable in the case of physically local Hamiltonians, where we show that parallelism typically leads to a near-quadratic improvement to the scaling of the time required to simulate the quantum system's evolution.
The algorithm achieves this grouping by decomposing the Hamiltonian (1) into m groups of terms as such that all ĥ(n) l ∈ G j mutually commute.The Trotter-Suzuki algorithm can then be used to determine a sequence of exponentials of ĝ(n) j , namely the sum of the terms in each commuting set that simulates within error ǫ.Product-formula approximations are not needed to decompose the exponential of each group into a product of exponentials of Pauli-operations as each ĥ(n) j in any given group mutually commutes.In other words exp −iĝ for any value of t.
Finally, we note that explicitly grouping the terms in Ĥ(n) into ĝ(n) is unnecessary.Instead it suffices to sort the terms in the Hamiltonian by group membership (i.e., terms that are assigned to group G 1 appear first in [ Ĥ(n) ], then terms in G 2 appear next and so forth).If we use (12) to construct the TS formulae, then the resulting simulation will sort the steps in the simulation into commuting groups of operations.As commuting operations can often be executed in parallel, this procedure reduces the time required to execute the circuit in systems that can use parallelism.
In the first stage of the algorithm our program accepts the string S i , as introduced in Sec.II, to determine if two terms in Ĥ(n) commute.The Hamiltonians ĥ(n) i and ĥ(n) j commute if and only if, for the dummy variables v, w ∈ {X, Y, Z}, where | • | denotes the size of a set •.
As a clarifying example, consider the Hamiltonian (7).The first two terms in (7) commute because of the anticommutativity of Pauli-operators and because both terms have differing actions on an even number of qubits.Criterion ( 22) also tells us that they commute because On the other hand, the second and third terms do not commute as Criterion ( 22) accounts for the commutativity of Hamiltonians that act on disjoint sets of qubits as well as other commuting Hamiltonians such as the star and plaquette operators in the toric code [27].A proof of the general validity of ( 22) as a criterion for commutativity is given in Appendix A. If desired, a more restrictive grouping condition can be used only to group together operations that have actions on disjoint sets of qubits.
The criterion for commutativity in (22) can be used to find an efficient classical algorithm for grouping { ĥ(n) j } into groups of mutually commuting Hamiltonians, which we describe below formally.This grouping algorithm is efficient because m is polynomially large in n for local-Hamiltonians and ( 22) can be efficiently evaluated.
The depth of the resulting quantum circuit depends on the value of m for the Hamiltonian.The reductions in the depth vary with the number of groups required for the Hamiltonian in question.The question can, however, be addressed for the cases of generic k-local or physically k-local Hamiltonians (by generic we mean k-local Hamiltonians that include every possible p-body interaction for p ≤ k.).
We estimate the worst-case scaling of m for generic k-local Ĥ(n) by using the more restrictive grouping condition that ĥ(n) i and ĥ(n) j are assigned to the same group only if condition (25) is satisfied.This requirement will generically result in a smaller value of m than our grouping algorithm will yield.In this case, at most ⌊n/k⌋ k-body terms can be assigned to each group for k-local interactions.Terms with (k − 1)-body interactions (or fewer) can be neglected because n is assumed to be large and the vast majority of terms in the k-local Hamiltonian are k-body.Specifically, there are O(n k−1 ) terms with only (k − 1)-body interactions or fewer and O(n k ) terms with k-body interactions, which means that we can neglect terms that are only (k − 1)-local for generic Hamiltonians in the limit of large n.
The number of k-body Hamiltonians that can be assigned to each group before the Hamiltonians violate the grouping criterion (22), scales as Θ(n/k), with Θ the Bachmann-Landau notation for indicating that a function of this order is asymptotically bounded above and below by n/k up to multiplicative constants.For k a constant Physically local Hamiltonians are constrained such that each qubit interacts with at most a constant number of qubits.This implies that there are O(n) k-body terms present in physically k-local Hamiltonians.Therefore, the number of groups required for physically-local Hamiltonians scales as which we will see constitutes a nearly-quadratic reduction in the circuit depth for some simulations.

V. IMPLEMENTING TROTTER-SUZUKI FORMULAS
The main primitive element of the circuit-design algorithm is a basic circuit element C ℓ corresponding to a particular sequence of gates in our gate set to simulate evolution due to each exp{−ia j ℓ ĥ(n) j ℓ t ℓ } in the output of the subroutine described in Sec.III.The circuit construction that we use is an optimized version of that presented in [26].Underpinning this primitive C ℓ is the conversion of Pauli gate operations to operations in our gate set [4][5][6].
The simplest way to explain this step is by example: shown in Fig. 1.This circuit allows for a continuously parametrized phase-rotation gate R z (2φ) = exp(−iφZ), but of course a proper quantum algorithm would work with a finite gate set.However, the gate R Z (2φ) can be reduced to a finite gate set using the constructive version of the Solovay-Kitaev algorithm [28].Our algorithm takes a logical variable,̟ ∈ {0, 1} as an input that specifies the gate set from which the gates in the output should be drawn and uses the Dawson-Nielsen algorithm [28] to convert the continuous rotation gates into discrete gates if gate set 0 is chosen.We label the gate sets The circuit primitive for C ℓ , as illustrated in Fig. 1, is constructed by first choosing one of the system qubits to be the 'parity qubit'.The role of this qubit is to track the parity of qubits affected by ĥ(n) j ℓ when expressed in the eigenbasis of ĥ(n) j ℓ .This parity is required to perform U ℓ (a j ℓ t ℓ ) using diagonalization [4][5][6].The parity qubit is always chosen to be the qubit with the largest label amongst this set that is non-trivially affected by the Hamiltonian.We say that a qubit is trivially affected by a Hamiltonian if it acts as the identity on that qubit.As an example, the fourth (bottom) qubit is chosen to be the parity qubit for the Hamiltonian in Fig. 1.
The method we use to construct the simulation circuit is given in Algorithm 3. Specifically this stage of the algorithm produces a bit-string representation for a circuit approximating exp{−ia i h The algorithm for this is provided below.

VI. MAIN ALGORITHM A. Complete Procedure
We now construct the circuit-design algorithm, which employs the programs described in the previous three sections.The algorithm begins by using Algorithm 2 to sort the terms in the Hamiltonian, which ensures that neighboring entries in the list of terms that comprise the Hamiltonian commute (if possible).The next step uses the Trotter-Suzuki algorithm to find a sequence of simulations of the ĥ(n) j ℓ that approximates e −i Ĥ(n) t .The final step utilizes Algorithm 3 to find a quantum circuit that approximates each of the exp{−i ĥ(n) j ℓ t ℓ } to yield a complete description of the overall simulation circuit.The procedure is described in greater detail in Algorithm 4.
⊲ The efficiency of our main algorithm depends on whether r and 2m5 χ0−1 are polynomially large.It is straightforward to see by substitution that the default values used for both of these quantities scale polynomially with the simulation parameters, and hence is efficient.On the other hand, if the default values of r and χ are not used then the above algorithm may not be efficient.

B. Cost Estimates
We now provide upper bounds for the scaling of the circuit-size of the circuits yielded by our design algorithm using the default values χ 0 and r 0 , which are chosen to guarantee that the simulation time scales near-linearly with t and the error is at most ǫ/2 respectively.There are three costs that we consider: the number of gates in the resulting circuit, N op , the time required to execute the circuit using parallelism, τ , and the number of qubits required which in our case is trivially n.We assess the remaining costs N op and τ below.
The value of N op is bounded above by the number of circuit primitives, C ℓ , needed to simulate the evolution multiplied by the maximum cost of implementing a circuit primitive.The cost for implementing each circuit primitive C ℓ for a k-local Ĥ requires at most 10k single-qubit gates, 2k − 2 CNOT gates and one R Z rotation.This worst-case estimate is given by the cost of simulating a ĥ(n) j that acts as Y on k qubits because Y -interactions are the most expensive to simulate using our algorithm for finding C ℓ .The Solovay-Kitaev algorithm of Dawson and Nielsen [28] gives the cost of implementing the continuous gate R Z (2φ) within tolerance ǫ/(4m5 χ0−1 r 0 ) as

Algorithm 4 Main Algorithm
Input: [ Ĥ(n) ]: bit-string representation of the Hamiltonian n: The number of qubits t: evolution time ǫ: error tolerance r: number of time steps ⊲ r = 0 guarantees error is at most ǫ χ: iteration order of TS formula ⊲ χ = 0 guarantees near-linear time scaling.̟: logical value that indicates whether the discrete or continuous gate setis used Output: [C]: simulation circuit for exp(−i Ĥ(n we have that log(5 χ0 ) ∈ O( log(ma max t/ǫ)).Then using the properties of logarithms, we find that The total number of operations used to implement each primitive circuit C ℓ within tolerance ǫ is thus is simulated by C r so the total number of operations scales as We then substitute the value of r 0 into this expression to find that, with ǫ o (1) , m o (1) , and (max i |a i |t) o (1) representing quantities that scale sub-polynomially but not quite polylogarithmically.As N SK varies sub-polynomially with respect to all parameters and k is a constant, N SK + k can be incorporated into (m max i |a i |t/ǫ) o (1) in Relation (33).This leads to the conclusion that The performance of our circuit-design algorithm is enhanced if a reasonable extra restriction is placed on the k-local Ĥ.The upper bound m is different for k-local vs physically k-local Ĥ as we now see.For k-local Ĥ(n) , because there are at most 3 q n q q-body terms in Ĥ(n) for q = 1, . . ., k.The scaling m ∈ O(n k ) arises from standard inequalities for binomial sums and, as k is a constant, then so is 3 k .If Ĥ(n) is physically k-local, m ∈ O(n) because each qubit interacts with at most a constant number of neighbors.
We can then eliminate m from ( 33) by noting that, if Ĥ(n) is k-local, then m ∈ O(n k ) for k a constant, and We substitute this scaling into (33) to obtain Comparing ( 36) to (37) shows that the simulation cost is dramatically reduced for Ĥ(n) physically k-local rather than just k-local.This cost reduction does not occur from a modification of the algorithm, but rather a more careful costing of the performance of our algorithm.
In fact the circuits generated by our algorithm are optimal, or near-optimal, in three distinct ways.First, they exhibit near-optimal scaling with t because linear scaling is known to be a lower bound for general quantum simulation [17,18,29].Second, they have optimal space complexity because a minimum of n-qubits of memory is needed to simulate the quantum dynamics of an n qubit system.Finally, the n-scaling of ((36)) and ( 37) is unlikely to be surpassed by other general purpose TS-based simulation algorithms because the scaling with n is derived from the value of m for the Hamiltonian [17,18].
The fact that better scaling with n cannot be obtained by using a superior decomposition method for the Hamiltonian follows from Vizing's edge-coloring graph algorithm [30], which states that a graph with maximum degree d cannot be colored using fewer than d colors.This implies that a d-sparse Hamiltonian can be decomposed into at best d one-sparse matrices.A k-local Hamiltonian can be at most O(n k ) sparse, which implies that O(n k ) terms will be present in the Hamiltonian using the optimal decomposition method.This value of m coincides with the value that our algorithm finds for simulating k-local Hamiltonians; hence our algorithm is unlikely to be significantly surpassed by other algorithms that use similar strategies.Now we will examine the scaling of the time required to implement the resultant quantum circuits on quantum computers that can exploit parallelism.Without grouping, the depth of the quantum circuits yielded by our algorithm scales with m is at worst m 2+o (1) .Although our grouping step does not change the circuit size, it causes the depth of the resulting circuits to scale as mm 1+o (1) , where m may be smaller than m.
The factor of m 1+o (1) remaining in the scaling comes from the upper-bound used to estimate error in the Trotter-Suzuki formulas, which does not change if grouping is used.Thus, parallel execution of the exponents in each group can be used to reduce the scaling of the execution time of the quantum simulation, τ , for k-local Hamiltonians to and for the case of physically k-local Hamiltonians it becomes which scales nearly-quadratically better with n than what we would expect if grouping were not used.

VII. EXAMPLES
We now examine the performance of our circuit construction algorithm when applied to simulating the quantum dynamics of two important physical systems.Specifically, we examine simulating Kitaev's Honeycomb model and pairing models similar to the Bardeen-Cooper-Schreifer model of superconductivity.
A. Simulating Kitaev's Honeycomb Model Consider Kitaev's honeycomb model [27] described by the Hamiltonian Ĥ with the links shown in the honeycomb-lattice representation depicted in Fig. 2.Although exactly solvable [27], the ground state has applications for topological error correction and is difficult to experimentally prepare.Our simulation circuits can then be used (in conjunction with a ground-state preparation method such as adiabatic state preparation [23] or the Abrams Lloyd algorithm [19]) to prepare the ground states of such Hamiltonians.
The resulting sequence of exponentials yielded by our circuit design algorithm yields a sequence of exponentials of X i X j , Y i Y j and Z i Z j .Figure 3 gives the simulation circuits that our algorithm uses to simulate each of these exponentials.We can use these diagrams to find the number of operations used in the simulation by using the fact that each term in the Hamiltonian appears 2(5) χ−1 times in the TS formula and that there are n/2 different XX, Y Y and ZZ interaction terms in the Hamiltonian.
As r TS formulas are used in the simulation, the total number of times each of these three types of interactions appears is n5 χ−1 r.The total number of gates required for the simulation can be found by multiplying the number of interactions of each type by the number of gates needed to simulate that type of interactions (explicit constructions for these circuits are given in Fig. 3).The total number of operations required to simulate the Honeycomb model is summarized below.
• Hadamard gates: 8n5 χ−1 r, • T gates: 16n5 χ−1 r, • Z-Rotation gates: 3n5 χ−1 r, • C-NOT gates: 6n5 χ−1 r, where r is the number of time steps used in the simulation and χ is the iteration order of the Trotter-Suzuki formula used in the simulation.If r is chosen as per (15) then the error is promised to be less than ǫ/2 given ǫ is sufficiently small [17,18]; however, other choices of r are possible if rigorous guarantees that the simulation error is less than ǫ are not desired.
We can then directly estimate the scaling of the circuit size with the simulation parameters if χ and r are taken to be the default values for our algorithm.Kitaev's honeycomb-model Hamiltonian is physically two-local with m ≤ 3n/2.

Combining the observation that max
with (37) implies that our circuit-design algorithm yields C for simulating U (n) (t) within error tolerance ǫ and with a circuit size that scales as elementary gates from G acting on only n qubits.This scaling is significantly better than previous algorithms, which had a bound on the number of gates in O(n 4 log * n) and utilize quantum oracles that may be difficult to implement [17,18].

B. Simulating Pairing Models
Our second example simulates general pairing Hamiltonian evolution, which is central to studies of superconductivity in many-body systems.A notable example of such Hamiltonians is the BCS Hamiltonian, which describes the interaction of electrons on a lattice according to [31,32] where âp is the fermionic annihilation operator for a fermion in states p = (p, ↑) and −p = (−p, ↓) with p the particle's momentum and ↑ and ↓ its spin, Np is the number operator for fermions in state p, E p is the on-site interaction strength, and V pl is the interaction strength between neighboring fermions.The general BCS Hamiltonian (42) can be mapped to a spin system by using each spin to represent the presence or absence of a fermion in that mode.Wu, Byrd and Lidar [23] show that a large class of pairing models that subsume the BCS Hamiltonian can be expressed as for X p , Y p and Z p the Pauli X, Y and Z operators applied to qubit p. Specifically, the BCS Hamiltonian has V − pl = 0 and γ p = E p + V pp .As in the previous example, our circuit design algorithm yields a sequence of exponentials of the different terms in the Hamiltonian.The circuit implementations of the resulting exponentials can also be seen in Fig. 3.By multiplying the total number of exponentials of each type by the number of gates used in the implementation of each exponential and collecting the result, we find that our algorithm yields a circuit containing the following numbers of gates: • Hadamard gates: 16n(n − 1)5 χ−1 r, • T gates: 32n(n − 1)5 χ−1 r, • Z-Rotation gates: 2n(2n − 1)5 χ−1 r, • C-NOT gates: 8n(n − 1)5 χ−1 r, where the algorithms cited scaling is found by choosing r as per Eq. ( 15) and substituting the asymptotic scaling of r for χ chosen approximately optimally [17,18].The value of χ that reduces the total number of gates can be found by minimizing the sum of these gate counts over all χ.
Previously, circuit design yielded circuits with O(n 5 t 2 ) gates with no promises about the accuracy of the simulation [23].The circuits yielded by our algorithm are polynomially shorter than this previous best method.This follows from the fact that H p is two-local and hence m ∈ O(n 2 ), which implies that our algorithm yields a circuit with complexity when the default values of r and χ are used.
Our algorithm can thus generate efficient quantum circuits for simulating the dynamics of general BCS Hamiltonians.Our resultant circuits can be used in conjunction with eigenvector simulation techniques or adiabatic state preparation to approximate the ground state of a quantum system and can thus be useful for autonomously determining whether classes of pairing models afford exotic types of superconductivity.

VIII. NUMERICAL ESTIMATES OF ERROR IN LOW-ORDER TROTTER-SUZUKI FORMULAE FOR 2-BODY HAMILTONIANS
The results of the previous section suggest that quantum simulations of pairing Hamiltonians may be difficult for large n because the number of required operations scales, at most, as n 4+o (1) ; however, we do not know whether this upperbound is tight.Here we provide numerical evidence that the error bounds on which this scaling is based can substantially overestimate the error in the Trotter-Suzuki formula for a given simulation.The upper bound in question is used to estimate the number of time steps, r, needed in the simulation and therefore we will conclude that the upper bounds for r cited in [14,17,18] are too loose for some practical cases.
We analyse the Trotter-Suzuki error for Hamiltonians chosen randomly from an ensemble of 2-body Hamiltonians that have their a j independently distributed according to a Gaussian with mean zero and unit variance.We choose two-body, rather than 2-local Hamiltonians, for simplicity because the 1-body terms present in 2-local Hamiltonians become less relevant to the random Hamiltonians that we consider as n increases.Their ellimination therefore allows us to estimate the scaling of the error and H (n) over a larger range of n.
We estimate the error invoked by using the two lowest-order Trotter-Suzuki formulas by numerically sampling random 2-body Hamiltonians taken from our Gaussian ensemble.The error in the formula is measured by the 2-norm FIG.6: This plot shows the mean error in the second-lowest-order TS formula as a function of the duration of each time step and the approximation (46) for a set of 50 randomly generated 2-local Hamiltonians sampled from our Gaussian ensemble for each data point.We observe that the data is within statistical error of the fit in (46).
If inequality (51) is not satisfied, then higher-order formulae such as are likely to yield more efficient simulation circuits.Present simulation experiments, however, are often confined to short evolutions with relatively large error tolerances (because direct comparison with numerical results is possible).As a result, low-order approximations remain relevant for present experiments.
Tables I and II provide estimates of the number of exponentials that need to be implemented by Algorithm 3 in a simulation of a random 2-body Hamiltonian.We vary the number of qubits over an experimentally reasonable regime (n = 2, 4, 10 qubits) and then extrapolate the scaling beyond the limitations of existing classical simulators to 40 and 100 qubits.In order to perform this extrapolation, we need to know the norm of the Hamiltonian.We find from data fitting for cases up to 8 qubits that the ensemble average of the norm of our random 2-body Hamiltonians obeys that have to be implemented to simulate a typical random 2-body Hamiltonian with aj ∼ N (0, 1) for ǫ = 10 −6 and t = 0.01. of ǫ.Note that although the time is kept constant for these data sets, the (expected) norm of the Hamiltonian does not.
The number of operations required in the simulation scales (for fixed order Trotter-Suzuki formulas) as O(n 2 r).Our numerical estimate of the ensemble mean of the norm of H in (52) and the approximate bounds for r in (48) and (49) lead us to the conclusion that the number of operations required to simulate a random 2-body Hamiltonian scales with n as o(n 3.4 ) as opposed to the n 4+o (1) scaling predicted from the upper bounds for the error incurred by using either U  The results in Tables I indicates that using U    II shows that U 2 is more efficient than U 1 in every case considered except for the case where n = 2.We can therefore conclude that low-order Trotter-Suzuki formulas can sometimes be more efficient than high-order formulas for relatively undemanding simulation problems.The data also suggests that extremely small gate errors (error on the order of 10 −7 per gate) may be required to extend the DQS paradigm out to 40 qubits and beyond because millions of gates are expected to be required for relatively modest simulations of 2-local Hamiltonians.

IX. CONCLUSION
In conclusion, we have designed an efficient classical algorithm for autonomous construction of efficient quantum circuits to simulate state evolution.The circuits are costed in terms of a small standard gate set and require neither a Hamiltonian oracle nor ancillary qubits, thereby making the universal quantum simulator minimal in space cost.Furthermore, we show that the costing is still too pessimistic in some cases and could be improved but some orders of magnitudes.The algorithm also systematically searches out commuting terms in the Hamiltonian and groups them to reduce the depths of the circuits yielded by our algorithm; thereby reducing the time required to execute the circuits using parallelism.
Our circuit construction algorithm is a significant advance because it is straightforward to implement the algorithm on a computer, it is highly efficient and it gives upper bounds for the simulation error.Knowing error bounds is important for assessing the veracity of any quantum simulation.Our work thus provides an important step towards constructing a practical, efficient and trustworthy simulator of quantum dynamics.
There are several remaining open problems that have not been addressed by this work.First is the issue of simulating time-dependent quantum systems.Such issues can be resolved by employing Trotter-Suzuki formulae for ordered operator exponentials [14,15] although the optimality of those error bounds for actual circuits remains to be checked using algorithmic methods we have introduced here.
Similarly, providing better upper bounds for r remains an important problem as our work has shown that in physically significant cases these bounds can be far too pessimistic.Providing such bounds would enable much more challenging simulation experiments to be performed in cases where rigorous error bounds are required of the output states.
A final important extension of this work is discussing the optimization of the resulting circuits that are yielded by the algorithm.Further optimization should be possible by using circuit identities to simplify the resulting circuits.Finding autonomous methods to optimize the output of such algorithms would be a significant asset in the development of a DQS that exceeds the power of existing classical simulators of quantum systems.

FIG. 2 :
FIG. 2: Kitaev's honeycomb lattice with each vertex representing a physical qubit and each edge representing an interaction between two qubits denoted by x-, y-, and z-links.

FIG. 3 :
FIG.3: Circuits for simulating evolution due to exponentials of the terms present in the honeycomb model or pairing models: (a) simulates e −iY i Y j φ (b) simulates e −iX i X j φ and (c) simulates e −iZ i Z j φ .Exponentials of the form e −iZpφ can be trivially simulated with an Rz rotation.

FIG. 4 :
FIG.4:This plot shows the mean error or mean upper bound for the error incurred when using the lowest-order Trotter-Suzuki formula on 50 randomly generated two-body Hamiltonians acting on 4 qubits that were sampled from our Gaussian ensemble of Hamiltonians.This result shows that the lowest-order Trotter-Suzuki formula given in (44) can be too loose by as much as six orders of magnitude when applied to simulations of 2-local Hamiltonians.

FIG. 5 :
FIG. 5:This plot shows the mean error in the lowest-order TS formula as a function of the duration of each time step and the approximation (45) where each point is randomly drawn from a Gaussian ensemble of 2-body Hamiltonians.Error bars are computed via the standard deviations of the measured results for the ensembles and only the upper bar is plotted because the standard deviation is comparable or exceeds the mean value for all of these data points.The data are therefore within statistical error of our approximation.
/r) or U /r).This suggests that the complexity of simulating random two-local Hamiltonians may be polynomially smaller than previous work implies.
/r) instead of U /r) leads to a reduction in the simulation complexity for small values of n, but U /r) leads to more efficient simulations for n ≥ 40.In contrast, the data in Table could suffice and thereby reduce runtime.From 2 to m do isAssigned ← 0. for p from 1 to ms do if isAssigned = 0 then⊲ Checks if term commutes with terms in Gp isAssigned ← 1 if v =w |Sv i ∩ Sw j | for each (ai, li, Si) in Gp. if isAssigned = 1 thenGp ← Concatenation of Gp with (aj, lj , Sj).

Table I
shows how the number of required exponentials varies if U 2 is used instead of U 1 for a relatively modest value of ǫ and t, whereas Table II presents the number of exponentials for a shorter evolution with a very small value n Nexp for U j