Divide and conquer approach to quantum Hamiltonian simulation

We present a divide and conquer approach for simulating quantum mechanical systems on quantum computers, obtaining fast simulation algorithms by exploiting Hamiltonian structure. Given a sum of Hamiltonians, we split the terms into groups, simulate each group separately, and combine the partial results. The simulation of each group is customized to take advantage of its properties, yielding refined bounds on the overall simulation cost. We illustrate our results using the electronic structure problem of quantum chemistry, where we obtain significantly improved cost estimates under very mild assumptions.


Introduction
Classical simulation of quantum mechanical systems is a very difficult problem. The computational cost of the best classical deterministic algorithm known grows exponentially with the system size. In some cases classical randomized algorithms, such as quantum Monte Carlo, have been used to overcome the difficulties, but these algorithms also have limitations. On the other hand, as Feynman proposed [1], quantum computers may be able to carry out the simulation more efficiently than classical computers. This led to a large body of research dealing with quantum algorithms for Hamiltonian simulation, with numerous applications to problems in physics and chemistry [30-35].
In the Hamiltonian simulation problem one is given a Hamiltonian H acting on q qubits, a time t ∈ ℝ, and an accuracy demand ε, and the goal is to derive an algorithm constructing an operator Ũ approximating the unitary operator e^{−iHt} with error ‖Ũ − e^{−iHt}‖ measured in the spectral norm. When the Hamiltonian is given explicitly, the size of the quantum circuit realizing the algorithm is its cost. In particular, the cost depends on the complexity parameters q, t and ε^{−1}. On the other hand, when the Hamiltonian is given by an oracle, the number of queries (oracle calls) used by the algorithm plays a major role in its cost, in addition to the number of qubits and the other necessary quantum operations. Different types of queries have been considered in the literature. It is interesting to note that there are cases where the query complexity is low, but when considering the query implementation and the resulting total gate count the picture may be quite different. We give such an example in section 2.
There are papers that study only the query complexity. For example, [13] uses a splitting formula of order 2k+1 [36,37]. It is assumed that the Hamiltonian H is given by an oracle (a 'black box'), and that H can be decomposed efficiently by a quantum algorithm using oracle calls into a sum of Hamiltonians H_j, j = 1, ..., m, that individually can be simulated efficiently. They approximate e^{−iHt} with error ε by a sequence of N unitary operators, each an exponential of the form e^{−iH_j t'} for some H_j and time t'. This kind of query has been considered in numerous other papers, see e.g., [2,8,10,12,19]. The cost of the simulation is the total number of oracle calls, which is proportional to the number of exponentials N. For each Hamiltonian H_j appearing in the sequence, the algorithm must make oracle calls to simulate it. In principle, since the H_j are obtained by decomposing H by the algorithm, an oracle call to any H_j is simulated by making oracle calls to H; see [10, section 5] for details.
Then [13] shows that the number of exponentials N is bounded from above by an expression, (1), that scales with m as m^{2+1/(2k)}. Our approach is especially useful when the number m of Hamiltonians H_j is large, and many of the H_j have relatively small norm. Such Hamiltonians are common in physics and chemistry [31,34,35,38-41]. For example, a system of interacting bodies or particles is typically described by a Hamiltonian of the above form.
Without loss of generality, assume that the H_j are indexed so that ‖H_1‖ ≥ ‖H_2‖ ≥ ... ≥ ‖H_m‖. For many problems the norms ‖H_j‖ vary substantially, and many Hamiltonians may have norm ‖H_j‖ ≪ ‖H_1‖. Then one can take advantage of the discrepancy between the norm sizes to derive fast simulation algorithms. The main idea is as follows:
1. Partition the Hamiltonians H_1, H_2, ..., H_m into groups according to the magnitude of their norms; ideally, Hamiltonians with similar norm magnitudes are grouped together.
2. Approximate e^{−iHt} pretending that the sum of the Hamiltonians in each group can be simulated exactly.
3. Simulate the sum of the Hamiltonians in each group separately with sufficient accuracy.
4. Combine all the group simulation results, by plugging them into the approximation of step 2, to get the overall simulation of H.
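As a concrete illustration of the grouping step, a norm-based partition into a large-norm group A and a small-norm group B might be sketched as follows; the specific cutoff value is a hypothetical choice for illustration, not one prescribed by the text.

```python
import numpy as np

def partition_by_norm(hams, threshold):
    """Split a list of Hermitian matrices into a large-norm group A and a
    small-norm group B, comparing each spectral norm to `threshold`
    (an illustrative cutoff, not prescribed by the text)."""
    norms = [np.linalg.norm(H, ord=2) for H in hams]
    A = [H for H, nH in zip(hams, norms) if nH >= threshold]
    B = [H for H, nH in zip(hams, norms) if nH < threshold]
    return A, B

# Example: two strong terms and four weak ones (Pauli matrices).
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
hams = [X, Z] + [1e-3 * X, 1e-3 * Z, 2e-3 * X, 5e-4 * Z]
A, B = partition_by_norm(hams, threshold=0.1)
```

In practice one would group by order of magnitude rather than a single threshold when more than two groups are used.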
A rough top-level description of the procedure above, applied to two groups and utilizing splitting formulas, is shown in figure 1 below: the two groups are simulated independently, and the partial simulation results are then combined using a splitting formula. Nevertheless, our approach is not limited to splitting formulas. Observe that e^{−iHt} → e^{−iAt} as ‖B‖ → 0, where B denotes the sum of the Hamiltonians in the second group, and in the limit the total simulation cost becomes independent of m. Thus in the limit the bound (1) holds with m replaced by 2. This suggests that when many Hamiltonians are small in norm one should be able to improve the cost estimate (1) by partitioning them into groups and, for instance, using splitting formulas of different orders, as we indicate in figure 1 and explain in detail later.
An application enjoying these properties is the simulation of the Born-Oppenheimer approximate electronic Hamiltonian in the second-quantized form [42,43],

H = Σ_{p,q=1}^{𝒩} h_{pq} a_p† a_q + (1/2) Σ_{p,q,r,s=1}^{𝒩} h_{pqrs} a_p† a_q† a_r a_s,    (4)

where a† and a are fermionic creation and annihilation operators, respectively, and the coefficients h_{pq}, h_{pqrs} ∈ ℝ, p, q, r, s = 1, ..., 𝒩, are provided as input and are computed with respect to a selected set of single-particle basis functions. This simulation problem has been well studied in the literature; see e.g. [17, 27-29, 31, 44-47]. The number of single-particle basis functions 𝒩 is typically chosen to be proportional to the number of particles in a given problem. The best classical algorithms can reasonably solve problem instances with 𝒩 in the range 50-70, and it is believed that a quantum computer able to simulate problem instances with 𝒩 ≈ 100 will solve many important applications ranging from chemical engineering to biology [45]. In these cases, the modest number of required qubits (typically Θ(𝒩) [48]) makes these very attractive applications for early quantum computers. Note that the cost of Hamiltonian simulation remains the primary bottleneck to solving this problem on a quantum computer, despite the many recent advances in quantum simulation algorithms. To see this, even without considering the other complexity parameters t, ε^{−1}, the number of Hamiltonians in (4) is m = Θ(𝒩^4), which is substantial. Algorithms that have cost, say, proportional to Θ(𝒩^9) appear impractical, because for 𝒩 = 100 we have 𝒩^9 = 10^{18}.
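To make the scaling concrete, a quick count of the terms in (4), assuming all 𝒩^2 one-body and 𝒩^4 two-body coefficients are kept, shows why m = Θ(𝒩^4) dominates:

```python
def term_count(N):
    """Number of terms in the second-quantized Hamiltonian (4):
    N**2 one-body terms h_pq plus N**4 two-body terms h_pqrs,
    assuming no coefficients are discarded."""
    return N**2 + N**4

m = term_count(100)   # 100_010_000 terms for N = 100
n9 = 100**9           # a hypothetical N**9-cost algorithm: 10**18
```

Even reading out all m coefficients once is already a sizeable computation at 𝒩 = 100; any cost scaling like 𝒩^9 is far beyond reach.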

Summary of the main results
For simplicity and brevity we only discuss here the case where we partition the Hamiltonians into two groups, but the idea extends to many groups, as we show in section 3. Let H = A + B, with A = Σ_{j=1}^{m'} H_j and B = Σ_{j=m'+1}^{m} H_j for some m' < m. The bound (1) for the number of queries N scales with m as m^{2+1/(2k)}, and our goal is to improve that.
1. Suppose we have two arbitrary algorithms Ũ_A(τ) and Ũ_B(τ) simulating the Hamiltonians A and B, respectively, for a given amount of time τ ∈ ℝ. We show how splitting formula structure may be used to combine Ũ_A and Ũ_B such that an approximation Ũ(t) to U(t) = e^{−iHt} is achieved. Indeed, slicing the time t into n intervals of length t/n and using the Strang splitting formula [13] we get the overall approximation

Ũ(t) = (Ũ_A(t/(2n)) Ũ_B(t/n) Ũ_A(t/(2n)))^n.    (5)

Then, to obtain overall error ε, it suffices that the errors of Ũ_A and Ũ_B within each time slice are each of order ε/n. It may be desirable to use higher-order splitting formulas instead of the Strang splitting formula to combine the partial results, such that error and cost are further reduced.
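A minimal numerical sketch of (5): below, exact matrix exponentials stand in for the subroutines Ũ_A and Ũ_B (in the algorithm they would themselves be approximations), and the combination error shrinks roughly like 1/n^2.

```python
import numpy as np

def expmh(H, t):
    """exp(-1j*H*t) for Hermitian H, via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * t)) @ V.conj().T

def strang(A, B, t, n):
    """(U_A(t/2n) U_B(t/n) U_A(t/2n))^n as in (5), with exact
    evolutions standing in for the subroutines U_A, U_B."""
    dt = t / n
    step = expmh(A, dt / 2) @ expmh(B, dt) @ expmh(A, dt / 2)
    return np.linalg.matrix_power(step, n)

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
A, B, t = X, Z, 1.0
exact = expmh(A + B, t)
err = {n: np.linalg.norm(strang(A, B, t, n) - exact, 2) for n in (1, 2, 4, 8)}
```

Doubling n should reduce the error by roughly a factor of four, consistent with the second-order accuracy of the Strang combination.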
In what follows we consider splitting formulas to derive Ũ_A and Ũ_B; however, in principle, other applicable simulation techniques can be used instead for Ũ_A and Ũ_B. Moreover, in practice, criteria other than the norms could potentially be used to group the Hamiltonians, such as sparsity, commutativity, unitarity, or any other property that may allow one to use an advantageous algorithm for simulating the Hamiltonians in that group.
Roughly speaking, the two terms of the cost bound above correspond to the cost of simulating the Hamiltonians in the two groups forming A and B, respectively, plus some partitioning/recombining overhead that is captured by the maximum function. The novelty of the algorithm is that it uses substantially fewer exponentials to simulate Hamiltonians of small norm, relative to the number of exponentials required for Hamiltonians of much larger norm, while maintaining the desired accuracy. In this respect, different time slices are chosen adaptively to simulate Hamiltonians in different groups. As a result, it is possible to use few exponentials to simulate a large number of Hamiltonians H j of relatively small norm for longer time slices, and this reduces the overall simulation cost. We emphasize that even though the cost bound (6) appears complicated, implementing the algorithms achieving this bound is straightforward; for example see (5). Items 3 and 4 below illustrate the impact of the divide and conquer approach, relative to earlier work, as the number m of terms grows and becomes huge. Item 5 shows the practical advantage of the divide and conquer approach for the simulation of the electronic Hamiltonian.
3. In particular, when a relatively small number of the H_j form A, so that m' = O(m^a), and the sum B of the remaining Hamiltonians has norm ‖B‖ = O(m^b) for some 0 ≤ b ≤ a < 1, we have a speedup N/N_prev over the number of queries in [13], independently of ε, t, where N_prev denotes the upper bound shown in (1) with the same k. Observe that this ratio goes to 0 as m → ∞.
4. In [10,13,19], it is shown how for splitting methods the order of the splitting formula may be selected 'optimally' such that the derived cost bound is minimized. We show how the parameters k, k_A, and k_B may be similarly selected for our algorithm to minimize a cost upper bound. Let N*_prev and N* be the resulting numbers of queries for the algorithm in [13] and for our algorithm, respectively. We show conditions for a strong speedup over [13], in the sense that N*/N*_prev → 0 as m → ∞ for fixed ε, t.
5. We apply our algorithm to the approximate electronic Hamiltonian (4) of quantum chemistry. Let 𝒩 be the number of single-particle basis functions. The number of Hamiltonians in (4) is O(𝒩^4). We can assume that the largest Hamiltonian norm in the sum is constant. It is known that in practical cases a large number of terms have very small norm [28,46,49]. This allows us to dramatically improve the simulation cost. The table that follows illustrates this point by comparing our techniques to others. Recall that the important complexity parameter is 𝒩 and not ε. We express the cost with respect to 𝒩 in table 1 below, assuming ε, t are constants. We remark that our cost estimates of 𝒩^5-𝒩^7 are consistent with empirical studies indicating that previous cost and error estimates may be overly conservative for practical applications [27]. We emphasize that standard circuits implementing the evolution of the terms in (4) can be incorporated into our algorithm directly to yield its gate-level implementation. For example, one can use the circuits in [17], obtained through the Jordan-Wigner transformation, to get a total gate count proportional to the number of queries multiplied by 𝒩. Alternatively, using the Bravyi-Kitaev transformation [50], the total gate count is proportional to the number of queries multiplied by log 𝒩.
In the remainder of this paper, we describe our general approach to Hamiltonian simulation in detail, derive two explicit algorithms that use splitting formulas for the group simulations, and consider their application to quantum chemistry. The paper is organized as follows. In section 2, we discuss general considerations concerning the query complexity and different types of queries for Hamiltonian simulation. In section 3, we review splitting methods, discuss recursive splitting formulas, and analyze the recombination of partial simulation results in general terms. In section 4 we give two algorithms for Hamiltonian simulation and derive error and cost bounds, and the speedup they offer. In section 5 we apply our approach to simulating the electronic Hamiltonian and show how for many practical problem instances our approach gives a significant speedup. Several of the proofs of our results are included in the appendix.

General considerations for Hamiltonian simulation
We briefly discuss some considerations concerning the query complexity of Hamiltonian simulation. As we indicated, algorithms using splitting formulas assume that H = Σ_{j=1}^{m} H_j is given by an oracle, and the query complexity is the number of oracle calls the algorithm makes, which is proportional to the number of exponentials e^{−iH_j t'}. It is further assumed that the exponentials e^{−iH_j t'} can be computed exactly, for otherwise any error must be accounted for in the total error estimates. When we are dealing with query complexity, the implementation cost of each e^{−iH_j t'} is not a concern. However, any physical realization of the algorithm should account for that as well. There are cases where the implementation cost of the exponential is low, for example, when dealing with the Laplacian Δ [51], or other operators that can be diagonalized efficiently, as well as the terms of the electronic Hamiltonian (4), as shown in [17,50].
The algorithms in [10,13], as well as the algorithms presented in this paper, use splitting formulas and express the query complexity in terms of the norms of the Hamiltonians comprising H. The implementation of these algorithms requires knowledge of the norms ‖H_j‖. This could be a limitation, although there are applications where adequate norm information is available or can be easily obtained. In certain applications involving partial differential equations, the norms are known analytically. Furthermore, we do not require precise estimates of these norms to implement our algorithms: overestimates will maintain the accuracy while increasing the cost accordingly. In the case of the computational chemistry problem that we consider, modulo constant factors, the norms are given by the |h_{pq}|, |h_{pqrs}| and are obtained by estimating the one-electron and two-electron integrals numerically with sufficient accuracy. These integrals depend on the basis function set under consideration, and are typically approximated on a classical computer.
An advantage of splitting methods is that they lead to algorithms that are conceptually easy to understand, since they are products of matrix exponentials. They offer flexibility in the design and analysis of the algorithms by allowing one to perform partial simulations and then combine their results, due to the fact that they can be implemented recursively, as we show in section 3. They are deterministic in the sense that any repetition produces the same output with exactly the same accuracy. The simulation methods of [24-26] do not have this property. Finally, splitting methods are regarded as having some appealing features for physics applications. A splitting method 'conserves important symmetries of the system dynamics', and has a 'remarkable advantage' according to [52]. Suzuki also remarks that splitting methods are particularly useful for studying quantum coherence [36].
There are alternatives to splitting formulas. In particular, [24] uses a different type of query to simulate d-sparse Hamiltonians. Namely, one is given access to a d-sparse Hamiltonian H acting on q qubits via a black box that accepts a row index i and a number j between 1 and d, and returns the position and value of the jth nonzero entry of H in row i. The paper shows a clever technique applied in combination with oblivious amplitude amplification; the resulting query complexity is given in (7) below.

Table 1. Summary of the cost, expressed in terms of 𝒩, for different algorithms for the simulation of the electronic Hamiltonian (4). In rows 1 and 3 the cost is expressed in terms of the number of queries. In row 2 the cost is expressed as the total gate count.

Method | Cost dependence on 𝒩
Splitting formulas [13,45] | 𝒩^8-𝒩^9
Truncated Taylor series [25,29] | 𝒩^8
Divide and conquer splitting formulas (this paper) | 𝒩^5-𝒩^7

The query complexity shown in [24] is

O(τ log(τ/ε) / log log(τ/ε)), with τ = d ‖H‖_max t,    (7)

where ‖·‖_max is the maximum norm. This is an important result. The dependence of the cost on ε^{−1} is exponentially better in the latter case. However, this fact is not sufficient to conclude that the algorithm is exponentially faster than previously known simulation algorithms, because the size of the other complexity parameters, τ and particularly the Hamiltonian norm ‖H‖_max, needs to be taken into account as well. For instance, the spectral and maximum norms are proportional to dh^{−2} in the case where H = −Δ_h + V_h is a matrix obtained from the discretization of the d-variate Laplacian Δ and a uniformly bounded d-variate potential function V on a grid with mesh size h [53]. In this case, the sparsity of H is Θ(d). Thus, for univariate functions the sparsity is constant. If we set h = ε, both cost estimates (1) and (7) become polynomial in ε^{−1} and there is no exponential speedup. It is easy to extend this argument to d-variate functions, where the situation is more interesting. In this case H is a matrix of size ε^{−d} × ε^{−d}, and both query estimates, modulo polylog factors, are low-degree polynomials in each of the complexity parameters. Moreover, polynomial improvements, such as reducing the exponent of d in (7) by one, as in [26], hardly make a difference. This situation is typical for matrices obtained from the discretization of ordinary and partial differential equations. We may have an exponential speedup when τ is at most polylogarithmic in ε^{−1}, but this is not typically the case in practice. Indeed [24] does not mention any practical situation where an exponential speedup is realized. These considerations apply to other recent papers also showing polylogarithmic dependence on ε^{−1} of the query complexity [25,26].
It is interesting to observe that the query complexity might be low and depend on ε^{−1} polylogarithmically, as in [25], yet when one considers the total gate count the picture may be quite different. An example can be found in [29, tables 1 and 2], which applies [25] to the simulation of the second-quantized electronic Hamiltonian (4). In particular, the query complexity is proportional to 𝒩^4 t times a quantity polylogarithmic in 𝒩, t, and ε^{−1}, while the total gate count is proportional to 𝒩^8 t times a quantity polylogarithmic in 𝒩, t, and ε^{−1}. Improvements of the gate count are possible under significant assumptions on the class of basis functions used and on the cost and accuracy of computing the h_pq and h_pqrs by the quantum algorithm. Moreover, in chemistry the desired accuracy is not arbitrarily small [28] and thus it may impact the cost only by a constant factor. The important parameter is 𝒩, the number of single-particle basis functions used in the approximation of the Born-Oppenheimer electronic Hamiltonian. Larger values of 𝒩 give more accurate approximations of the Hamiltonian operator.
Although exponentially improving the dependence of the simulation cost on ε^{−1} is very significant, there are other issues to consider as well. We already mentioned that the other complexity parameters may be dominant. The particular type of queries used assumes that one has precomputed and stored the positions and values of all nonzero entries for every single row of the Hamiltonian. This is also discussed in [54]. Simulation algorithms relying on oblivious amplitude amplification are probabilistic. This means that for applications where numerous Hamiltonian simulations need to be carried out, such as in phase estimation, the overall success probability must be boosted. In particular, making oblivious amplitude amplification deterministic requires a special rotation denoted by S in [24, figure 2]. This rotation depends on the exact computation (i.e., with potentially infinitely many bits of accuracy) of a numerical expression that depends on the input data, and the physical implementation of this rotation must be exact using quantum gates from a finite universal set. The cost of this can be immense, and it is not explained in [24], which only mentions that such a rotation S exists. The difficulty in obtaining a deterministic algorithm is a numerical stability consideration.

Divide and conquer approach to Hamiltonian simulation
We now formalize the main ideas of our approach, and give a preliminary analysis of the general case where arbitrary simulation algorithms may be used for the simulation of different groups of Hamiltonians. We first review some details of quantum algorithms based on high-order splitting formulas. In particular, the analysis of our algorithms in later sections will generalize that of [10,13]. Along the way, we derive slightly tighter results than those of [13] for the usual application of splitting formulas, and we include them in the appendix.
Our goal is to take the Hamiltonian simulation problem and partition it into a number of smaller and simpler Hamiltonian simulation problems, then solve each one of them, and combine the results. The splitting should be customized to take advantage of the properties of each of the subproblems, yielding refined bounds for the overall simulation cost.
In certain applications, for instance in chemistry, Hamiltonians with very small norm can be discarded from the sum (2), to the extent that this does not affect the desired accuracy.
Hamiltonians that do not affect the accuracy can be ignored

We formalize the notion that Hamiltonians of very small norm relative to the accuracy ε may be discarded, so that it suffices to consider the simulation problem for the remaining Hamiltonians. This may substantially reduce the cost, particularly for problems where ε is not arbitrarily small.
The proof of the proposition is given in the appendix. Thus, when the conditions of the proposition are satisfied, simulating A with error ε/2 implies the simulation of H with error ε.
Remark 1. Equation (8) implies that the aggregate norm of the discarded Hamiltonians must be small, not just the norm of each discarded Hamiltonian individually. Generally, Hamiltonians cannot be discarded without considering how many they are and the magnitudes of the other problem parameters for the particular problem instance.
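The effect of discarding B can be probed numerically. For unitary evolutions one has the standard Duhamel-type bound ‖e^{−i(A+B)t} − e^{−iAt}‖ ≤ t‖B‖; we state and check this simpler bound here as an illustration of the proposition's spirit, not as its exact form.

```python
import numpy as np

def expmh(H, t):
    """exp(-1j*H*t) for Hermitian H, via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * t)) @ V.conj().T

rng = np.random.default_rng(0)
M = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
A = (M + M.conj().T) / 2               # a generic Hermitian A
P = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
B = 1e-4 * (P + P.conj().T) / 2        # small-norm perturbation B

t = 2.0
err = np.linalg.norm(expmh(A + B, t) - expmh(A, t), 2)
bound = t * np.linalg.norm(B, 2)       # err <= t * ||B||
```

Note that the bound grows linearly in t, which is why, as remarked above, the simulation time enters the discarding condition.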
Given upper bounds on the individual norms, one obtains a sufficient condition for the Hamiltonians H_j in B to satisfy (8); see (9). (Note that [46, equation (24)] considers discarding Hamiltonians in the context of eigenvalue estimation and derives a similar condition to (9), but without the factor of t, which is important for simulation.) In practical applications a large number of 'negligible' Hamiltonians are sometimes discarded, often using heuristics. For example, in quantum chemistry, an ad hoc fixed cutoff parameter, say 10^{−10}, is used [43]. However, in general the effect of discarding terms must be accounted for in the error analysis. We will assume that possible discarding of Hamiltonians according to proposition 1 may have happened as a preprocessing step. Our results and proof techniques do not depend on whether Hamiltonians have been discarded or not. Thus, from this point on, m will refer to the total number of Hamiltonians that we consider for our algorithms.

The Lie-Trotter formula expresses e^{−iHt} as a limit of products of exponentials of the H_j,

e^{−iHt} = lim_{n→∞} (e^{−iH_1 t/n} e^{−iH_2 t/n} ... e^{−iH_m t/n})^n.    (10)

Using this formula with finite n gives an approximation of e^{−iHt}. Without loss of generality, and to avoid dealing with absolute values, we will assume t > 0. Selecting n, often called the Trotter number, large enough yields an approximation with error at most ε. This gives a second-order approximation. A third-order approximation is given by the Strang splitting formula

S_2(Δt) = e^{−iH_1 Δt/2} ... e^{−iH_m Δt/2} e^{−iH_m Δt/2} ... e^{−iH_1 Δt/2},    (11)

where Δt = t/n. (For simplicity, when the underlying Hamiltonian decomposition is clear we will use this abbreviated notation.) Applying S_2(Δt) over each time slice Δt yields the approximation (S_2(Δt))^n ≈ e^{−iHt}. Assume for the moment that Δt is chosen such that the number of time slices n = t/Δt is indeed a positive integer; otherwise, we would have n = ⌈t/Δt⌉ and a single different final time slice. Suzuki [36,37] gave high-order splitting formulas.
These are recursive formulas S_{2k} of order 2k+1, defined by

S_{2k}(Δt) = [S_{2k−2}(p_k Δt)]^2 S_{2k−2}((1 − 4p_k)Δt) [S_{2k−2}(p_k Δt)]^2,  p_k = (4 − 4^{1/(2k−1)})^{−1},  k > 1.    (12)

Applying S_{2k} over each time slice Δt and unwinding the recurrence relation yields a product of N exponentials. It is important to observe that Suzuki's error bounds hold asymptotically for sufficiently small Δt, and do not reveal the dependence of the error on m or the norms ‖H_j‖. Application of these formulas requires explicit calculation of the prefactors in the error bounds. Typically, cost estimates for splitting methods are expressed as the product of the number of time slices and the number of exponentials required to carry out the simulation within each time slice. In particular, the estimates for the simulation error and cost in [13] depend on m, k, ε, the largest norm ‖H_1‖, and the second largest norm ‖H_2‖. In [13, section 4] the quantity M is defined in terms of these parameters (definition (14)), leading to the upper bound (15) on the number of exponentials N. This bound is derived as the product of two terms: the first factor is the number of exponentials per time slice, which is bounded by 2(m−1)5^{k−1} + 1, and the second factor is equal to n, which bounds the number of time slices. Note that if the argument of the ceiling function is at most one, a single time interval suffices for the simulation. The cost bounds in section 4 for algorithms 1 and 2 are generalizations of (15).
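The recursive construction (12) can be written directly. A sketch (using exact exponentials per term and Hermitian inputs; the order-4 formula should be visibly more accurate than Strang at the same step size):

```python
import numpy as np

def expmh(H, t):
    """exp(-1j*H*t) for Hermitian H, via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * t)) @ V.conj().T

def S2(Hs, dt):
    """Strang formula (11): forward sweep of half-steps, then backward sweep."""
    U = np.eye(Hs[0].shape[0], dtype=complex)
    for H in Hs:
        U = U @ expmh(H, dt / 2)
    for H in reversed(Hs):
        U = U @ expmh(H, dt / 2)
    return U

def S2k(Hs, dt, k):
    """Suzuki's recursion (12): S_2k built from five S_{2k-2} factors."""
    if k == 1:
        return S2(Hs, dt)
    p = 1.0 / (4.0 - 4.0 ** (1.0 / (2 * k - 1)))
    side = S2k(Hs, p * dt, k - 1)
    mid = S2k(Hs, (1 - 4 * p) * dt, k - 1)
    return side @ side @ mid @ side @ side

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
Hs, dt = [X, Z], 0.05
exact = expmh(X + Z, dt)
err2 = np.linalg.norm(S2k(Hs, dt, 1) - exact, 2)  # Strang, order 3 locally
err4 = np.linalg.norm(S2k(Hs, dt, 2) - exact, 2)  # Suzuki order 5 locally
```

Unwinding the recursion shows each increment of k multiplies the number of exponentials per slice by roughly five, which is the 5^{k−1} factor in the per-slice exponential count.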
Recall that the upper bound (15) does not account for any finer problem structure, such as the possibility that a number of Hamiltonians have norms significantly smaller than ‖H_2‖. The Hamiltonians may be partitioned into groups based on their relative norms, and each group simulated independently with our algorithms. This leads us to refined cost estimates which depend not just on m, ‖H_1‖, and ‖H_2‖, but on the number of Hamiltonians in each group and the largest Hamiltonian norm within each group.
Furthermore, under weak conditions which guarantee that the argument of the ceiling function in (15) is at least one (e.g., for sufficiently large m, ‖H_1‖, t, or ε^{−1}), (15) may be bounded by a closed-form expression N(k), and from this [13, section 5] derives the 'optimal' k (in the sense of minimizing the upper bound N(k)), which we denote by k*. We will be comparing our results against this estimate. Further observe that k* is given by an extremely slow-growing function of the problem parameters, roughly the square root of a logarithm. Therefore, in most practical cases, one can determine the optimal value of k by inspection, without carrying out a formal analysis.
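The slow growth of the optimal order can be seen with a generic cost model of the form N(k) ∝ 5^{k−1} x^{1/(2k)}, an illustrative stand-in for the actual bound N(k), with x collecting the dependence on m, the norms, t, and 1/ε (this model is an assumption for the sketch, not the bound of [13]):

```python
def best_k(x, k_max=25):
    """Minimize the illustrative cost model 5**(k-1) * x**(1/(2k))
    over integer k; a stand-in for minimizing N(k) by inspection."""
    costs = {k: 5 ** (k - 1) * x ** (1.0 / (2 * k)) for k in range(1, k_max + 1)}
    return min(costs, key=costs.get)

k2 = best_k(1e2)    # modest problem: first-order splitting suffices
k6 = best_k(1e6)
k12 = best_k(1e12)  # even a huge x yields a tiny optimal k
```

Setting the derivative of the log-cost to zero gives k* ≈ sqrt(ln x / (2 ln 5)), so k* grows only like the square root of a logarithm, matching the observation above.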

Recursive splitting formulas
Our discussion in this section can be extended to high-order splitting methods. However, for brevity we consider the Trotter formula to illustrate the ideas.
Suppose the number m of Hamiltonians is large, and we are given a partition, specified by some number m' < m, as

H = A + B,  A = Σ_{j=1}^{m'} H_j,  B = Σ_{j=m'+1}^{m} H_j.    (19)

We consider partitions into two groups to make the ideas of this section clear; it is straightforward to extend them to an arbitrary number of groups μ. As A and B are themselves Hamiltonians, we may apply the Lie-Trotter formula with respect to them to give

e^{−iHt} = lim_{n→∞} (e^{−iAt/n} e^{−iBt/n})^n    (20)

to approximate e^{−iHt}. On the other hand, we can recursively apply (10) to e^{−iAt/n} and e^{−iBt/n}, with α and β inner Trotter steps respectively. The limits may be taken outside and in any order, which yields the recursed Lie-Trotter formula

e^{−iHt} = lim_{n,α,β→∞} [ (e^{−iH_1 t/(nα)} ... e^{−iH_{m'} t/(nα)})^α (e^{−iH_{m'+1} t/(nβ)} ... e^{−iH_m t/(nβ)})^β ]^n.    (21)

Compared to (10), there are now three parameters n, α, β in (21), which reduce the error of the truncated product approximation as they are increased.
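The role of the three parameters in (21) can be probed numerically. The example below is a hypothetical two-term-per-group instance in which the group B norms are tiny: refining α (the inner Trotter steps for A) visibly reduces the error, while β can stay small at negligible cost in accuracy.

```python
import numpy as np

def expmh(H, t):
    """exp(-1j*H*t) for Hermitian H, via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * t)) @ V.conj().T

def recursed_trotter(As, Bs, t, n, alpha, beta):
    """Truncation of (21): n outer steps, each refining group A with
    `alpha` inner first-order Trotter steps and group B with `beta`."""
    d = As[0].shape[0]
    tau = t / n
    stepA = np.eye(d, dtype=complex)
    for H in As:
        stepA = stepA @ expmh(H, tau / alpha)
    stepB = np.eye(d, dtype=complex)
    for H in Bs:
        stepB = stepB @ expmh(H, tau / beta)
    step = np.linalg.matrix_power(stepA, alpha) @ np.linalg.matrix_power(stepB, beta)
    return np.linalg.matrix_power(step, n)

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
As, Bs = [X, Z], [1e-3 * Y, 1e-3 * X]   # B has much smaller norms
exact = expmh(sum(As) + sum(Bs), 1.0)

def err(alpha, beta, n=4):
    return np.linalg.norm(recursed_trotter(As, Bs, 1.0, n, alpha, beta) - exact, 2)
```

Here err(4, 1) is markedly smaller than err(1, 1), while err(4, 1) and err(4, 4) are nearly identical, illustrating why effort spent refining the small-norm group is wasted.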
for some 1<ℓ=m; then, grouping the large Hamiltonians in A and the remaining Hamiltonians in B, it follows that we may want to take α>β as to reduce the overall error, while keeping β relatively small to reduce the overall cost. We will shortly derive divide and conquer simulation algorithms based on splitting formulas which will take three parameters k, k A , k B specifying the order of each formula. Thus we may use a high-order splitting formula for A and a low-order splitting formula (and also larger time slices) for B, without compromising the error and such that the overall cost is reduced. Recall that methods which do not distinguish between A and B would spend the considerably more effort required for the simulation of A in the simulation of B as well.
We remark that generalizing (21) to more than two groups of Hamiltonians gives a Trotter step parameter α_i for each group. Alternatively, this formula could be recursed deeper by further decomposing A and B into subgroups of Hamiltonians and again applying (10).

Consider a Hamiltonian as in (2) and (3), and let U = e^{−iHt}. Assume the H_j have been partitioned into μ = O(1) disjoint groups, where we denote by A_1, ..., A_μ the sums of the Hamiltonians in the respective groups. We are not concerned with how the partitioning is done at this point. As we will see later, the partitioning can be done adaptively and follows from general cost estimates. In practice small values of μ will suffice, and we will see an example in section 5.

Suppose we divide the simulation time t into n intervals of length Δt = t/n, n ∈ ℕ; we will select n later. Applying a high-order splitting formula of order 2k+1 with respect to this partition, i.e., applying S_{2k}(Δt) of (12) to the Hamiltonians A_1, ..., A_μ over each time slice, yields an operator which we denote by (22). Then, if we have algorithms Ũ_{A_j}(τ) approximating the evolutions e^{−iA_j τ} appearing on the right-hand side of (22), we can substitute them into (22) and obtain the approximation Ũ of U as an ordered product of the Ũ_{A_j}; we denote this product by (23). The precise ordering of the product is obtained from the particular choice of the splitting formula of order 2k+1; see [36,37]. For example, using the Strang formula, for k = 1, each time slice contributes

Ũ_{A_1}(Δt/2) ... Ũ_{A_μ}(Δt/2) Ũ_{A_μ}(Δt/2) ... Ũ_{A_1}(Δt/2).    (24)

In principle, any available method may be used to implement the approximations Ũ_{A_j}, with the possibility of using different subroutines for different j.
We bound the overall error by the triangle inequality,

‖U − Ũ‖ ≤ ‖U − Û‖ + ‖Û − Ũ‖,    (25)

where Û denotes the operator (22) with each group evolution e^{−iA_j τ} implemented exactly. We refer to ‖U − Û‖ and ‖Û − Ũ‖ as the first-step error and second-step error, respectively. Clearly, if both error terms are O(ε), then so is the overall error ‖U − Ũ‖.
The first-step error depends only on the splitting formula used at the first step, and is independent of the subroutines used to simulate each group at the second step. It satisfies a bound, (26), equal to n times the error of S_{2k} over a single time slice. Following the approach of [13] (see (14)) for the simulation of a sum of μ-many Hamiltonians, we define the analogous quantity M in terms of μ, k, ε, t, and the norms ‖A_j‖. Observe that the final time slice may be smaller than Δt; with this in mind, for simplicity we take n = ⌈t/Δt⌉. The total simulation cost is the number of time slices n times the cost per time slice. The latter is the sum of the costs of the group simulation subroutines Ũ_{A_j} appearing within a single time slice. Therefore the total simulation cost is n times this quantity; we refer to the resulting expression as (29). We remark that the results of [13] are important for our analysis. Specifically, they show how the simulation cost depends on the largest and second largest Hamiltonian norms, which translates here to a dependence on ‖A_1‖ and ‖A_2‖.

Algorithms
We give two algorithms using recursive splitting formulas, and derive worst-case error and cost bounds.

Consider again Hamiltonians of the form (2), where m is large. Recall that one of our goals is to reduce the dependence of the simulation cost on m, particularly for problems where there is a substantial difference between the largest and smallest Hamiltonian norms, and where the number of Hamiltonians with relatively small norm is significant.
Consider (2), (3), and the partitioning H=A+B given in (19). The two algorithms we present are based on figure 1. Algorithm 1 is a special case of algorithm 2. The difference between them is that algorithm 1 uses k=1, while algorithm 2 considers a general k in step 1. Even though this difference might appear insignificant, the analysis of algorithm 2 turns out to be much more complicated. Algorithm 1 is simpler to understand and implement. On the other hand, algorithm 2 is more general, offering one the possibility to reduce the number of exponentials by selecting k optimally, as we will see later.

Algorithm 1
The construction of algorithm 1 follows that of section 3.4 for the general case, applied to the partition H = A + B. At the first step, applying the Strang splitting formula, k = 1, gives operators of the form e^{−iAΔt/2} e^{−iBΔt} e^{−iAΔt/2} over each time slice, where Δt = t/n and we will define n below; see figure 1. For the second step, algorithm 1 approximates the operators e^{−iAΔt/2} and e^{−iBΔt} using different high-order splitting formulas, of orders 2k_A+1 and 2k_B+1, respectively. This yields the overall approximation Ũ of U = e^{−iHt}, which is defined by (31). For splitting formulas, a rescaling of the Hamiltonian norms is equivalent to a rescaling of the respective group simulation times. Observe that the Hamiltonians in A and B are rescaled by different quantities, which in general leads to different simulation times for A and for B.
The time slice sizes for simulating A and B follow from this rescaling. Since we have effectively rescaled the simulation times by dividing by the respective largest Hamiltonian norms, we are actually subdividing the rescaled intervals into subintervals of size 1/M_A and 1/M_B, respectively. Clearly, the last of these subintervals in either case may have length less than 1/M_A or 1/M_B, respectively; in such a case, the length of the last subinterval equals the remainder. That is the reason why we have taken the floors of the exponents in the first factors of (32) and (33).
From (25), we have the error bound (34), and we consider each error term separately. The first error term in (34) is independent of our implementations of Ũ_A and Ũ_B, and results only from the first-step Strang splitting and the time slice size Δt = t/n, n ∈ ℕ. From lemma 2 in the appendix, we obtain a bound on this term. From (29) the cost of our algorithm is proportional to n, and therefore we would like to minimize this quantity. Setting the right-hand side of the bound above to ε/2 determines n. Now consider the second error term in (34).
From (28), this term involves the errors of the subroutines Ũ_A and Ũ_B. The quantity M_A is defined by applying (14) to the simulation of A with time t/(2n) and error at most ε/(8n); it determines how each half time slice for A is subdivided. When the ceiling function argument is at most one, no subdivision is necessary, and it may then be possible to reduce the cost further by decreasing k_A. Similarly, for the simulation of B for time Δt = t/n and error at most ε/(4n), we set M_B(k_B) as in (14); it determines how each time slice for B is subdivided. We may now bound the total cost of our algorithm, i.e., bound the number N of exponentials used to construct Ũ. From (29), with the quantities defined above and Ũ defined by (31), the number N of exponentials for the approximation of e^{−iHt} by Ũ with accuracy ε is at most the bound (42). The bound (42) is minimized with respect to k_A and k_B by appropriately selecting optimal values k_A*, k_B*. It is relatively straightforward to extend algorithm 1 to the case where H is partitioned into μ ≥ 2 groups H = A_1 + ... + A_μ. The overall approximation Ũ of U = e^{−iHt} follows from (22) and (23) with k=1 and becomes (43), where the subroutine for each group is given in (24). The analysis is similar to the case μ=2 considered above. The main difference is that lemma 2 in the appendix no longer applies for bounding the first-step error (35), and we use [13, lemma 2] instead, which gives a similar result. The rest of the analysis is the same as that in the proof of proposition 2. We summarize our results in the following theorem; the proof follows from that of theorem 2, which we state in the next section, and is found in the appendix.
Remark 5. The way the Hamiltonians are grouped will influence the upper bound (45). Ideally, the formation of the groups should minimize this upper bound. Roughly speaking, Hamiltonians of relatively large norm should be put in groups of relatively small cardinality.

Remark 6.
The bound for the number of exponentials shown in the previous theorem holds under general conditions and does not depend on how the partitioning of the Hamiltonians into groups is performed. Finding parameters that minimize equation (45) is a separate task, which is to be carried out on a classical computer.
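Minimizing a bound like (45) over the integer splitting orders is a small classical computation. The sketch below brute-forces the orders for two groups against a hypothetical cost model of the standard Suzuki form; all constants, norms, and cardinalities are illustrative stand-ins, not the bound (45) itself:

```python
# Hypothetical stand-in for a two-group cost bound like (42)/(45): the number
# of exponentials as a function of the splitting orders of the groups. All
# constants, norms, and cardinalities below are illustrative, not the paper's.
def cost_bound(k_A, k_B, normA=10.0, normB=0.01, mA=5, mB=500, t=1.0, eps=1e-6):
    def group(k, norm, m):
        # A Suzuki order-2k formula uses ~2*m*5^(k-1) exponentials per slice,
        # and the number of slices grows like norm*t * (norm*t/eps)^(1/(2k)).
        slices = norm * t * (norm * t / eps) ** (1.0 / (2 * k))
        return 2 * m * 5 ** (k - 1) * max(slices, 1.0)
    return group(k_A, normA, mA) + group(k_B, normB, mB)

# Brute-force the integer orders on a classical computer.
best = min((cost_bound(kA, kB), kA, kB)
           for kA in range(1, 8) for kB in range(1, 8))
```

In this toy instance the large-norm group ends up with a higher optimal order than the many small-norm terms, consistent with remark 10 below.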

Algorithm 2
Algorithm 2 generalizes algorithm 1 by applying an arbitrary splitting formula at its first step instead of applying specifically the Strang splitting formula; see figure 1. The details and analysis of algorithm 2 are similar to, but more complicated than, those of algorithm 1. We state the main results here, and provide the proofs in the appendix. With Ũ defined by (23), the number N of exponentials for the approximation of e^{−iHt} by Ũ with accuracy ε is bounded as in proposition 3.

Remark 7. If k=1, we get the cost bound of algorithm 1 (up to small constant factors). Note that in some cases, e.g. when t/ε is large, even though we could use k=1, selecting a value k>1 may yield an n that is substantially smaller than that shown in (40) in proposition 2.

Remark 8. If any of the conditions assumed in proposition 3 are violated, then we end up with an easier simulation problem, as can be seen in the proof of that proposition. Roughly speaking, a violation would imply that ε is relatively large. So, in a way, these conditions specify the interesting case.

It is again relatively straightforward to extend algorithm 2 to the case where H is partitioned into μ ≥ 2 groups H = A_1 + ... + A_μ. The overall approximation Ũ of U = e^{−iHt} then becomes a product of subroutines, each constructed as in (23). We summarize the results for this case in the following theorem, whose proof can be found in the appendix. Remarks 5 and 6 apply to this theorem as well. A particular grouping is used here for simplicity; for general groupings, the theorem still holds with the quantities ‖A_1‖ and ‖A_2‖ replaced by the largest and second largest group norms.

Selecting the order of the splitting formulas
For any partitioning of the Hamiltonians into μ groups we want to determine the orders of the splitting formulas. The parameters k_1, ..., k_μ allow splitting formulas of different orders to be used for the Hamiltonians in each group. The parameter k determines the order of the splitting formula used in the first algorithm step. Ideally we want to find the optimal parameters k*, k_1*, ..., k_μ* that minimize the simulation cost bound (50). This expression is complicated, and to simplify matters we provide sharp upper bounds k^(max), k_1^(max), ..., k_μ^(max) to the optimal values.

Proposition 4. The simulation cost bound (50) of theorem 2 is minimized by integers k* ≤ k^(max) and k_j* ≤ k_j^(max), j = 1, ..., μ, with k^(max) and k_j^(max) as in (52) and (53).

Proof. Let the functions g(k) := n(k) 5^k and h_j(k_j) be read off from the right-hand side of (50); note that k cancels out in the latter case, so h_j(k_j) is a univariate function. Consider minimizing g(·) and h_j(·) independently. For g(k), setting its derivative to zero gives k^(max) as in (52). Repeating this argument for h_j(k_j) gives k_j^(max) as in (53). Since g(·) and h_j(·) are log-convex functions [55], the values k^(max) and k_j^(max) give the respective minima. Observe that we may rewrite the right-hand side of (50) in terms of these functions. Therefore, the minimizers k*, k_1*, ..., k_μ* satisfy the stated bounds; hence, without loss of generality we may restrict to parameters k ≤ k^(max) and k_j ≤ k_j^(max), j = 1, ..., μ.

Remark 10. For each individual group, (53) shows that small cardinality and small Hamiltonian norms reduce the order of the splitting formula that suffices for its simulation.
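The log-convexity argument can be checked numerically. The toy function below models the cost factor 5^k · R^{1/(2k)} appearing in splitting-formula bounds, where R stands in for a quantity like ‖H‖t/ε (an illustrative model, not the paper's exact g): its continuous minimizer is k = sqrt(ln R / (2 ln 5)), and the best integer order sits next to it.

```python
import math

R = 1e12  # stands in for a quantity like ||H|| t / eps; illustrative only

def g(k):
    # Model of the log-convex cost factor 5^k * R^(1/(2k)) appearing in
    # splitting-formula cost bounds (cf. g(k) = n(k) 5^k in the proof).
    return 5.0 ** k * R ** (1.0 / (2 * k))

# Continuous minimizer from d/dk [k ln 5 + (ln R) / (2k)] = 0.
k_star = math.sqrt(math.log(R) / (2 * math.log(5)))
# The best integer order is adjacent to the continuous minimizer.
k_opt = min(range(1, 20), key=g)
```

Because g is log-convex in k, checking the two integers around k_star suffices in practice.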

Speedup
We illustrate our results by showing the speedup of our algorithms relative to those in [13] for a number of cases. Generally, our approach is preferable when there is a disparity in the Hamiltonian norms and many of them are very small. The cost of [13] is given by (55), where k is the order parameter of the splitting formula. Selecting k = k* as in (17) yields (56), for any δ>0. The explicit expressions for (55) and (56) are shown in (16) and (18), but here we use asymptotic notation to focus on the problem parameters and to ignore the constants.
For the comparison, take A to be the sum of the m′ Hamiltonians of largest norm, and B equal to the sum of the remaining Hamiltonians, as in (19). The number of exponentials for algorithm 2 is shown in (46) in proposition 3. Assume that ‖A‖ ≥ ‖B‖, and in addition assume the condition (57). Note that the left-hand side of that inequality is an upper bound to ‖B‖, and the inequality relates it to the number of Hamiltonians forming A times the overall second largest Hamiltonian norm. This condition is easy to check in principle, and it holds in cases where the original Hamiltonians have quite disproportionate norms. Clearly m′ ≤ m, and we select the parameter n of proposition 3 by (58). The quantities k*, k_A*, k_B* denote the optimal parameters, and η* the optimal cost bound of algorithm 2. The cost bound of proposition 3 satisfies (59). For different values of the parameters we have the following speedups.
1. Comparison when all splitting formulas have the same order, i.e., k=k_A=k_B, k=O(1): The cost bound (46) has the same dependence on t and ε as that of (55), so when we divide the two cost bounds to obtain the speedup the parameters t and ε cancel out. The ratio is bounded below as in (59), where C is a constant, and hence the speedup over [13] (with the same k) holds for all ε, t. Therefore, the algorithm in [13] is slower than the one in this paper by a factor proportional to a polynomial in m/m′, the degree of which is in the range [1, 2.5]. This is particularly important when m′ ≪ m.
2. Comparison to the cost of [13] with optimally chosen parameter: We use the previous case to derive a rough estimate. Observe that, for fixed k, we have the relation (60). Thus, again considering k=k_A=k_B, k=O(1), we obtain the estimate which follows from (56) and (59). Therefore, for fixed ‖H_2‖, t, and ε, the algorithm in [13] with optimally chosen parameters remains slower than algorithm 2 with arbitrary k=k_A=k_B. The speedup depends on a polynomial in m/m′, the degree of which is in the range [1,2]. Clearly, optimally selecting k, k_A, and k_B as in section 4.3 can only improve the speedup over [13].
3. Comparison among optimal methods, i.e., using the optimal splitting formulas in the respective cases: Assuming that all complexity parameters are fixed, with the exception of m and m′ = O(m^{5/6}), we have the speedup (61), where η* is given by (46) for optimally chosen k, k_A, and k_B. The proof is given in appendix A.5.
In this sense we achieve a strong speedup over [13].
4. Comparison when a significant number of Hamiltonians have very small norm relative to ‖H_2‖: Recall that we are interested in simulation problems where a significant number of the Hamiltonians H_j are relatively small in norm, where existing simulation methods do not take advantage of this structure. We use two parameters 0 ≤ b ≤ a < 1 to describe the relationship of ‖B‖ and ‖A‖ to the problem size. This approach has applications to problems such as the simulation of the electronic Hamiltonian, as we will see in the following section. As in case (1) above (where the speedup is independent of ε and t), using these assumptions in (59) we obtain a speedup that grows with m. Similarly, for the case of (60) with fixed ‖H_2‖, t, and ε, using the new assumptions and selecting k such that the exponent of the denominator is positive again yields a speedup that grows with m. In the next section we will use the parameters a and b to estimate the cost of our algorithms for practical instances of the electronic Hamiltonian.

Application to quantum chemistry
Solving difficult problems in quantum chemistry is viewed as a primary application of quantum computers. We apply our algorithms to simulate the electronic Hamiltonian, which describes molecular systems. Quantum algorithms for Hamiltonian simulation also have applications in the calculation of electronic energies, reaction rates, and other chemical properties [11,17,31,48,56]. Robust classical algorithms for this simulation exist (e.g. diagonalization), but in general they are intractable as their cost grows exponentially with the number of particles. Thus, large molecules are out of reach for classical computers [17]. On the other hand, quantum algorithms [17,48] can efficiently simulate the second-quantized form of the approximate Born-Oppenheimer electronic Hamiltonian (4). This Hamiltonian can be represented in the form H = H_1 + ... + H_m that we consider in this paper. There exist quantum algorithms simulating (4) with cost that exhibits a polynomial dependence on m. Unfortunately, the combination of the size of m and the degree of the polynomial would make the simulation cost prohibitive in cases of interest [17,23,41,45,46]. Hence, reducing the cost of Hamiltonian simulation will have a significant impact in chemistry.

Electronic Hamiltonian
Recall the Born-Oppenheimer approximate electronic Hamiltonian in second-quantized form (4). The quantities h_pq and h_pqrs are obtained by considering N single-particle basis functions (spin-orbitals) taken from a given family of such functions. In particular, the h_pq and h_pqrs are one-electron and two-electron integrals, respectively, as defined in [17, section 3] through a set of N basis functions. The a_p† and a_p are the creation and annihilation operators for the pth orbital, which encode the fermionic exchange antisymmetry of the problem. The general Hamiltonian form is the same for all molecules; the Hamiltonian of a particular molecule is therefore defined by N and the h_pq and h_pqrs. Using the terminology of the earlier sections of the paper, the Hamiltonian above can be written in the form H = H_1 + ... + H_m, with m = Θ(N^4), where the H_j are Hamiltonians obtained from the terms of (4) by combining adjoint pairs; see e.g. [17,49]. Thus, modulo constant factors, the norms ‖H_j‖ are given by the |h_pq| and |h_pqrs|. These quantities depend on molecular geometry and the chosen set of basis functions [42,43]. For basis functions that are spatially localized, which are called local basis sets, many of the |h_pq| and |h_pqrs| are small or very small relative to their largest magnitude [43,57,58]. We use this disparity to partition the Hamiltonians into groups for our algorithms.
For example, [49] considers the quantum simulation of the lithium hydride (LiH) molecule with different choices of basis sets. The authors of [49] consider Slater-type (STO-3G) [59] and triple-zeta (TZVP) [60] basis sets, and in both cases they find that a substantial fraction of the H_j have quite small norm. In table 2, we illustrate how one can partition the Hamiltonian using the |h_pq|, |h_pqrs| values shown in [49] to obtain H=A+B, where the Hamiltonian B is the sum of the terms for which the corresponding |h_pq| and |h_pqrs| are less than or equal to different 'cutoffs' and A is the sum of the remaining terms. Clearly, different partitions lead to different bounds for the norm of each group, which will be reflected in the cost bounds of the algorithms as shown in theorems 1 and 2. Extending this idea to μ>2 groups is straightforward.
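The cutoff partitioning of table 2 is mechanical. The sketch below applies it to synthetic coefficient magnitudes with a wide dynamic range (random stand-ins, not the LiH integrals of [49]); the triangle inequality gives the group-norm bounds:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic coefficient magnitudes with a wide dynamic range, mimicking the
# disparity of |h_pq|, |h_pqrs| for local basis sets (not the data of [49]).
coeffs = 10.0 ** rng.uniform(-10, 0, size=5000)

def partition(coeffs, cutoff):
    # Group A collects the terms above the cutoff, group B the rest; the
    # triangle inequality bounds each group norm by the sum of magnitudes.
    above = coeffs[coeffs > cutoff]
    below = coeffs[coeffs <= cutoff]
    return above.sum(), below.sum(), above.size

# Tighter cutoffs enlarge A (m' grows) and shrink the bound on ||B||.
results = [partition(coeffs, c) for c in (1e-2, 1e-4, 1e-6)]
```

Lowering the cutoff trades a larger group A (more terms to simulate accurately) for a smaller bound on ‖B‖, mirroring the rows of table 2.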
We digress for a moment to remark that in practical applications of quantum chemistry, the computational cost is often reduced by discarding terms of H that have 'negligible' norm relative to some cutoff, say 10^{-10} [43,49,57], but this cannot be done in an ad hoc way. Recall that proposition 1 shows that we may possibly, depending on the particular problem instance, discard certain terms from H, subject to the relationship between the cutoff, ε, t, and the number of terms m − m′ below the cutoff. On the other hand, when the product of the cutoff with m − m′ exceeds ε/t, we cannot arbitrarily discard the m − m′ terms, even though individually they may be tiny, because this could introduce truncation error that would exceed the desired simulation accuracy. This is also made particularly clear in the last three rows of table 2, where excluding the terms below the cutoff may introduce error exceeding any reasonable accuracy, as evidenced by the respective estimates of ‖B‖.
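The discarding criterion just described reduces to a one-line check: dropping m − m′ terms, each of norm at most the cutoff, perturbs H by at most cutoff·(m − m′) and hence the evolution by at most that quantity times t. A minimal sketch (the numbers are illustrative):

```python
def can_discard(cutoff, num_discarded, eps, t):
    # Dropping terms, each of norm <= cutoff, perturbs H by at most
    # cutoff * num_discarded, and the evolution error by at most that times t.
    return cutoff * num_discarded * t <= eps

# 20000 terms below a 1e-10 cutoff are safely negligible at eps=1e-3, t=1 ...
ok = can_discard(1e-10, 20000, 1e-3, 1.0)
# ... but the same count below a 1e-6 cutoff is not.
bad = can_discard(1e-6, 20000, 1e-3, 1.0)
```

This is the quantitative content of proposition 1: whether truncation is safe depends jointly on the cutoff, the number of discarded terms, t, and ε, not on the cutoff alone.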

Simulation cost
In chemistry problems the desired simulation accuracy is not arbitrarily small [28], while N can be quite large so that (64) adequately represents the system of interest [46]. Therefore, the important parameters affecting the simulation cost are the number of single-particle basis functions N, and the magnitudes of the h_pq and h_pqrs. In the second quantization, i.e., the occupation number representation, states are given by linear combinations of N-bit strings, where a 1/0 indicates which orbitals are occupied/unoccupied by electrons, respectively [42,43]. Thus H acts on N qubits. Each Hamiltonian H_j in (64) can be represented by tensor products of Pauli matrices through the Jordan-Wigner transformation, and can be simulated efficiently using O(N) standard quantum gates [17]. Alternatives to the Jordan-Wigner transformation have been proposed, such as the Bravyi-Kitaev transformation [50,61], which improves the gate count for simulating the individual H_j.

Table 2. For simulating LiH with bond distance 1.63 Å, in [49] they approximate the Born-Oppenheimer electronic Hamiltonian in two ways. The first approximation uses a minimal basis set (STO-3G) and has m=231. The second approximation uses a more accurate basis set (TZVP) and has m=22155. In each case m is the number of nonzero h_pq and h_pqrs. The quantity m′ is the number of |h_pq| and |h_pqrs| that are larger than the different cutoff values. The quantities of the leftmost four columns in the first two rows are taken from [49, section 3.2]; those in the remaining rows are estimated from [49, figure 10]. We consider H=A+B, where A is the sum of the m′ terms corresponding to |h_pq|, |h_pqrs| larger than the cutoff, and B is the sum of the remaining m − m′ terms. The norms of A and B are estimated using the triangle inequality.
Hence, our cost bounds for the number of queries (exponentials) immediately translate to bounds for the total gate count through multiplication. Thus, using [50,61], modulo polylogarithmic factors, the total gate count is proportional to the number of queries. This is what we will consider for our algorithms.
Consider now the simulation of H using splitting formulas. For N spin-orbitals, the number of terms in (64) is m = Θ(N^4). Naively applying an order 2k+1 splitting formula (15) yields a number of queries (i.e., a number of exponentials) that, for arbitrary k, grows with N at least as N^8. In particular, if we use the Strang splitting formula (k=1), the number of queries is proportional to N^10. Hence, a straightforward application of splitting formulas yields a number of queries in the range N^8 to N^10, which clearly becomes impractical even for moderate N (e.g., N = 100).
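For fixed t and ε, the naive scaling above can be tabulated. The helper below uses the standard splitting-formula count m^2 (mt/ε)^{1/(2k)} with m = Θ(N^4), which gives the power of N as 8 + 2/k (constants and the t, ε dependence suppressed); this is the asymptotic form consistent with the range N^8 to N^10 just stated:

```python
def naive_query_exponent(k):
    # Queries ~ m^2 * (m t / eps)^(1/(2k)) with m = Theta(N^4) and t, eps
    # fixed, so the power of N is 8 + 4/(2k) = 8 + 2/k.
    return 8 + 2.0 / k

exps = {k: naive_query_exponent(k) for k in (1, 2, 3)}
```

The Strang case k=1 gives the N^10 scaling, and increasing k only drives the exponent down toward 8, never below it, without partitioning the Hamiltonian.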
Improving this cost bound is critical for quantum computers to have an impact in quantum chemistry applications. A sequence of papers [27][28][29][45][46][47] describes the recent progress. They show both analytic and empirical results. Some of them perform gate-level optimizations across queries, and are thereby specific to the particular problem instance. In [45, table 1], the number of queries using the Strang splitting formula is shown to be proportional to N^10, which corresponds to the one that follows from [13] as shown above. It is also shown in [45, appendix B] that the number of queries can be reduced to N^9, and it is conjectured that the proof leading to this reduction in the case k=1 could be extended to high-order splitting formulas (k>1). The paper also considers the implementation of the queries using the Jordan-Wigner transformation. Thus, the total gate count becomes proportional to N^10, but allowing parallel gate execution the circuit depth becomes proportional to N^9. Moreover, the authors of the paper carried out numerical tests of molecules from a random ensemble suggesting a number of queries proportional to N^8, as shown in [45, table 1]. Gate-level optimizations on the entire circuit are considered in [47]. In particular, using the Jordan-Wigner transformation for implementing the queries, the authors of that paper conclude that their optimizations make the total gate count proportional to the total number of queries. Therefore, for the Strang formula as presented in [45], the total gate count is proportional to N^9, and allowing parallel execution in conjunction with gate-level optimization leads to a circuit with depth proportional to N^7.
In [28,46], it was argued using empirical evidence that similar improvements on the number of queries are possible for certain restricted but commonly used basis function sets, and this may lead to a number of queries proportional to N^5.5 to N^7, while [27] reports even better empirical query estimates in the range N^5.5 to N^6.5. Finally, a recent paper [29] that uses the simulation method of [25], with different queries than the matrix exponentials used in splitting formulas, obtains a total gate count proportional to N^8, modulo polylogarithmic factors. Furthermore, in a special case they are able to obtain a total gate count proportional to N^5 (up to polylogarithmic factors), under strong assumptions on the basis functions and on the computation of the h_pq, h_pqrs and the resulting accuracy and cost. However, we point out that other authors consider the computation of these quantities to be 'complicated business' in general [43, section 9.9.5].
Further note that the possibility of using problem-specific information in quantum chemistry (e.g., simulating different Hamiltonians for different amounts of time, or simulating them in a certain order) to improve the simulation cost was suggested in [27,28,45,47,49], without presenting an algorithm or a rigorous analysis exhibiting error and cost bounds. Our goal is to obtain rigorous simulation cost improvements under fairly general conditions.

Divide and conquer simulation
Consider a set of local basis functions. In general, the number of non-negligible |h_pq| and |h_pqrs| is significantly less than N^4. In [43, section 9.12.2], it is argued that for sufficiently large molecules this number is of order N^2. In [46], the authors claim that this number can scale even as N using local basis functions; however, for practical problems scaling closer to N^2 is expected. Also, [28] reports counts in this range, with 0 ≤ b ≤ a. In terms of the parameters of the previous section, [43] suggests that a=1/2 and [28,46] suggest that a=1/4. Observe that our assumptions are consistent with the situation depicted in table 2.
Consider algorithm 2 with H=A+B. Assume t and ε are arbitrary but fixed, and let us study the simulation cost with respect to  . Even if we do not select the optimal values for k, k A , and k B , and we simply assume they are O(1), we obtain a simulation cost improvement. The quantities of proposition 3 become Using these quantities and (46) the number of queries for simulation is bounded from above by where c 1 , c 2 >0 are constants and e t, are fixed. Since 0ba1/2 the previous expression is bounded by a quantity proportional to Taking kk B yields that the simulation cost is proportional to Recall that by using the Bravyi-Kitaev transformation [50] for the terms of (64), the number of queries of our algorithms, modulo polylogarithmic factors, is proportional to the total gate count. We compare our results to those from the literature in table 3 (we have summarized this table in table 1

in introduction).
Remark 13. Using the estimates a=1/2 and a=1/4 for local basis functions from [28,43,46], table 3 shows the resulting scaling of the number of queries of algorithm 2. This is consistent with the empirical results in [27,28].

As a, b → 0, the cost of algorithm 2 tends to O(N^4), which is a lower bound to the simulation cost since the input size is Θ(N^4). In contrast, a naive application of an order 2k+1 splitting formula without partitioning the Hamiltonian would still have cost proportional to N^{8+2/k}.
Our algorithms take advantage of problem structure in terms of the Hamiltonian norms, without relying on other domain-specific information or implementation-level assumptions. As part of future work, it would be interesting to study how gate-level optimizations and other information specific to chemistry could further improve the performance of our algorithms. Furthermore, partitioning the Hamiltonian into μ>2 groups may lead to further cost improvements in applications such as those in chemistry.
We conclude by pointing out that the advantages of our approach extend to problems beyond chemistry. Our algorithms take advantage of the problem structure without relying on heavy assumptions, and are as simple to implement as standard splitting formulas, but can lead to significantly lower cost. Just like splitting formulas, our algorithms succeed deterministically, and therefore they can be used as subroutines that can be called numerous times in other quantum algorithms without affecting the overall success probability.

Table 3. Comparison of empirical and analytic cost bounds with respect to the number N of single-particle basis functions for the simulation of the electronic Hamiltonian. The top half of the table gives estimates taken from the literature, ignoring any polylogarithmic factors. The bottom half of the table gives the estimated scaling for algorithm 2, where the parameters a and b have been estimated; see the text for details. We give a range for the cost dependence in cases where it varies with some of the algorithm parameters, or when the cost is obtained empirically. All cost estimates refer to the number of queries, except in the case of [29], which does not use splitting formulas and presents the total gate count. For all estimates concerning queries, the transition from queries to gate counts involves a multiplication by an O(log N) factor in the most favorable case.

Appendix
In this appendix we give several formal results necessary for the analysis of our algorithms. We give the full details of algorithm 2, and give some slightly improved results for the usual application of splitting formulas.
A.1. Hamiltonians that do not affect the accuracy can be ignored

Proof of proposition 1. Using the variation-of-constants formula [62], for any vector v we obtain the stated bound. In particular, for α = β = 1 the usual Lie-Trotter formula (10) is reproduced.
Moreover, we may also take limits with respect to α and β, in any order, to which we may apply the Lie-Trotter formula with respect to n to yield the corresponding product expression. This expression holds for arbitrary but fixed α and β. Now suppose we take α → ∞ while keeping n and β fixed; then the corresponding limit follows. [39] provides error bounds for the Trotter formula, the Strang formula, and other high-order splitting formulas. We build on the analysis of [45] to derive a useful bound for the error of the Strang splitting formula. We note that the original analysis of [45] contains a small error, which has been corrected in [34, appendix A]. We also use results from [39].
Throughout the paper we make use of the inequality ‖a^n − b^n‖ ≤ n‖a − b‖ (e.g., [39, lemma 1]), for a, b norm-one elements of a Banach operator algebra and n ∈ ℕ. We also recall the identity for the commutator operator.

Lemma 2 (Strang splitting formula error). Let n ∈ ℕ, t > 0, and Δt := t/n. Let A, B be Hermitian matrices and H = A + B. Then the error of the Strang splitting formula over the n time slices satisfies the stated bound; the final step follows from (66) with unitary a and b.

A.4. Algorithm 2 details
We give the details of algorithm 2, which generalizes algorithm 1 by first applying a splitting formula of order 2k+1; see figure 1. We apply the results of [13], which achieve improved bounds on the simulation error and cost by rescaling the Hamiltonians to have norm at most 1. Note that such rescalings are equivalent to rescalings of the simulation time. Indeed, for Hamiltonians A, B, H=A+B, and ℓ>0, we have S_2k(A, B, t) = S_2k(A/ℓ, B/ℓ, ℓt), where the definition of S_2k is given in (12).
Proof of proposition 3. Recall the preliminary analysis given in section 3.4. Consider the Hamiltonian H as in (2) and (3), partitioned as H=A+B, and simulate the Hamiltonians rescaled by the norm of C for the correspondingly rescaled time; the quantity M ≥ 1 is sufficiently large and will be defined shortly. Thus the (algorithm first-step) time slice size is 1/(M‖C‖), and the numbers of first-step slices are determined by the integer and fractional parts of M‖C‖t. Recall it is equivalent to simulate the rescaled groups for the rescaled times. Unwinding the recurrence (12) defining S_2k for two Hamiltonians X and Y and τ ∈ ℝ yields [12,13] a product of K = 5^{k−1} Strang factors, where each z_ℓ is defined according to the recursive scheme of (12), ℓ = 1, ..., K. In particular, each z_ℓ is given as a product of k−1 factors. Hence, applying the above to (67) we get (68).

Error and cost. Using (67) and (68), we bound the overall error by the sum of two terms. The first term on the right-hand side corresponds to the error of a splitting formula at the first step of the algorithm, where we pretend that the exponentials are given to us exactly, and the second term corresponds to the error in the second step of the algorithm, i.e., the error introduced by the splitting formulas approximating e^{−iAτ} and e^{−iBτ}.
As explained in section 3.4, to guarantee ‖U − Ũ‖ ≤ ε/2 we set the quantity M as in (27), which gives (70). To apply [13, theorem 1] for H=A+B and accuracy ε/2, the condition of that theorem becomes (71), which implies M ≥ 1. Hence, the number of S_2k factors comprising Ũ in (67) is at most the quantity in (72), where ⌈M‖C‖t⌉ gives the number of time intervals at the first step. The interesting case is M‖C‖t ≥ 1, for which it suffices to assume ‖C‖t ≥ 1 (otherwise, as explained in the analysis of algorithm 1, we would be dealing with an easy problem). Then the above quantity may be further bounded accordingly. In order to obtain estimates for N_A and N_B we turn to the second-step error, where we require the corresponding accuracy per factor. Note that the argument of this ceiling function may be greater than or less than one, depending on the problem instance and the algorithm parameters; in the latter case, the time intervals involving z_ℓ do not require any subdivision at all.
We now consider the factors Ũ(z_ℓ/M) which approximately simulate B/‖C‖ for time z_ℓ/M, and proceed similarly, requiring error at most the corresponding share of the budget. Observe that the factors of M and ‖C‖ have again canceled, and M_B is of the same form as M_A. To apply [13, theorem 2], the corresponding condition must hold; clearly the inequality remains valid when ñ is replaced by any n with n ≥ ñ.

Proof of theorem 2. As the analysis is similar to that of proposition 3, we give only the important parts. Recall the preliminary analysis given in section 3.4. Consider a Hamiltonian as in (2) and (3), partitioned into μ groups H = A_1 + ... + A_μ as in section 3.4, labeled such that ‖A_1‖ ≥ ‖A_2‖ ≥ ... ≥ ‖A_μ‖. We approximate U = e^{−iHt} with Ũ given in (23), where in the statement of the theorem we have assumed t‖A_2‖ ≥ ε. Note that we do not require ñ ∈ ℕ; this is, however, evident from (72). We remark that setting k=1 above reproduces the results of theorem 1 up to small constants.

A.5. Proof of (61)
Proof. Consider all problem parameters to be fixed except m and m′. Observe that (46) contains two maximum functions, and hence we have four cases to consider with respect to the relative magnitudes of n(k), n_A(k, k_A), and n_B(k, k_B). Recall n(k) is given in (58). Here we estimate each maximum function by the sum of its arguments, which gives the stated asymptotic bound; we use asymptotic notation here and in the rest of this section to focus on how the problem parameters affect the number of queries. The quantities under the square roots are derived as in [13]. Next observe the relation that follows from (57).