Quantum bootstrapping via compressed quantum Hamiltonian learning

A major problem facing the development of quantum computers and large scale quantum simulators is that general methods for characterizing and controlling them are intractable. We provide a new approach to this problem that uses small quantum simulators to efficiently characterize and learn control models for larger devices. Our protocol achieves this by using Bayesian inference in concert with Lieb–Robinson bounds and interactive quantum learning methods to achieve compressed simulations for characterization. We also show that the Lieb–Robinson velocity is epistemic for our protocol, meaning that information propagates at a rate that depends on the uncertainty in the system Hamiltonian. We illustrate the efficiency of our bootstrapping protocol by showing numerically that an 8 qubit Ising model simulator can be used to calibrate and control a 50 qubit Ising simulator while using only about 750 kilobits of experimental data. Finally, we provide upper bounds for the Fisher information which show that the number of experiments needed to characterize a system rapidly diverges as the duration of the experiments used in the characterization shrinks; this motivates the use of methods such as ours that do not require short evolution times.


Introduction
Rapid progress has been made within the last few years towards building computationally useful quantum simulators or computers, which promise to revolutionize the ways in which we solve problems in chemistry and material science, data analysis and cryptography [1][2][3][4][5]. Despite this, looming challenges involving calibrating and debugging quantum devices suggest another possible application for a small scale quantum computer: designing a larger quantum computer. This application is increasingly relevant as experiments push towards building fault-tolerant devices [6] and demonstrating large scale verifiable quantum computing protocols [7]. This task can be quite challenging classically. Simply characterizing the Hamiltonian dynamics of the system via tomography is inefficient [8], and existing efficient methods such as [9] require an amount of data that scales polynomially with the error tolerance, are not known to be error robust, and are only efficient for specific classes of Hamiltonians and measurements. This can render them impractical for problems like designing controls for quantum systems, where exacting error tolerance and low fault sensitivity is required. Other methods are error robust and use a logarithmic amount of data, but also require performing quantum simulations that are intractable classically [10,11]. The use of quantum simulators has been proposed as a solution to this problem [12,13], but such schemes do not provide a means for characterizing and controlling a large quantum system because they require a simulator that is at least as large as the system being characterized. Other schemes have been introduced that allow a small quantum system to efficiently certify that a large quantum system implements a set of quantum gates or prepares a given state [7,[14][15][16]. These schemes are inspired by multi-prover systems and output a certificate that states whether the errors in the gates are above or below a threshold.
Such protocols are rigorous, require very weak assumptions about the errors in the larger system and allow the verifier to use a rudimentary quantum device. Unfortunately, they can also be computationally expensive and are difficult to apply to the case of certifying analog simulation. In particular, existing certification schemes that use small quantum verifiers do not characterize the larger system's Hamiltonian dynamics.
We provide a framework in this paper for overcoming the aforementioned obstacles to quantum device characterization. The idea behind our approach is to use a small simulator as a point of reference against which to compare the dynamics of the larger system. We achieve this by using an interactive protocol wherein the dynamics of subsystems of the larger device are measured against the dynamics of the smaller system, thereby allowing us to infer a model for the larger system using the data collected about the relative dynamics of the two devices on the individual subsystems. We call this process compressed quantum Hamiltonian learning (cQHL).
It is often insufficient to give a model of a system relative to an experimental device for which no mathematical model is known. To this end, we consider starting with a small quantum simulator for which a firm mathematical model is known. This system, which we call the trusted simulator [12,13], is the key to allowing our method to provide an absolute, rather than a relative, model for the larger system's quantum dynamics.
Compressed Hamiltonian learning in turn leads to the ability to learn models of control dynamics, rather than simply the internal dynamics intrinsic to the untrusted device under study. This inferred model for control dynamics then allows cQHL to be performed again, with the previously-untrusted system now being used as the trusted simulator. That is, cQHL directly enables quantum bootstrapping, the process of iteratively building larger trusted simulators out of smaller trusted simulators.
To summarize, we provide two distinct applications:
Compressed QHL: learning a Hamiltonian model for a large quantum system using a small quantum simulator.
Quantum bootstrapping: designing and calibrating controls for a quantum system with rapidly decaying interactions using a smaller quantum simulator.
These applications are efficient if the unknown Hamiltonian belongs to an efficiently parameterized class of model Hamiltonians for which the interaction strengths between subsystems decay rapidly with distance, experiments are chosen such that the resultant distribution of likelihoods is far from uniform, and, in the bootstrapping case, the control maps are well conditioned. Efficiency also requires that the resampling algorithms used in the methods do not introduce substantial error. This assumption empirically holds for non-degenerate learning problems, such as those we consider here and in prior work [12,13].
The remainder of the paper is laid out as follows. We first provide a review of Bayesian methods for Hamiltonian characterization and quantum Hamiltonian learning (QHL) in section 2. We then discuss cQHL in section 3 and quantum bootstrapping in section 4. We then present in section 5 numerical results for bootstrapping and compressed quantum Hamiltonian learning in an important special case, namely characterizing and controlling a 50 qubit Ising model simulator, before concluding.

Review of Bayesian characterization and QHL
In developing cQHL, and hence quantum bootstrapping, we will use Bayesian particle filters as a subroutine. Here, we briefly review these methods in a classical context, as well as with the inclusion of quantum resources.

Bayesian characterization and sequential Monte Carlo
Bayesian methods have been used in a wide range of quantum information applications and experiments; for instance, to discriminate [17] or estimate states [18,19], to incorporate models of noisy measurements [20], to characterize drifting frequencies [21], and to estimate Hamiltonians [10,[22][23][24]. They are particularly well-suited for quantum information, owing to their generality, robustness and the ease with which prior information can be incorporated into the algorithm. Moreover, Bayesian approaches have been shown to allow for near-optimal Hamiltonian learning in simple analytically tractable cases [10,25].
Bayes' theorem provides the proper way to re-assess, or update, prior beliefs about the Hamiltonian for a system given an experimental outcome and a distribution describing prior beliefs. In particular,

Pr(H|data) = Pr(data|H) Pr(H) / Pr(data),    (1)

where Pr(H|data) is called the posterior distribution, Pr(H) is the prior distribution that encodes our initial beliefs about H and where Pr(data|H) is the likelihood function, which computes the probability that the observed data would occur if the Hamiltonian H correctly modeled the system. The likelihood function can be estimated by sampling from a quantum simulator for the Hamiltonian H, and thus Bayesian inference causes Hamiltonian learning to reduce to a Hamiltonian simulation problem [10,12,13,25].
Once the posterior distribution is found, an estimate of the Hamiltonian, Ĥ, is given by the expectation over the posterior,

Ĥ = ∫ H Pr(H|data) dH.    (2)

This integral is unlikely to be analytically tractable in practice, as it requires integrating the likelihood function Pr(data|H) over H. Monte Carlo integration, on the other hand, can be much more practical.
The sequential Monte Carlo algorithm (also known as a particle filter) provides a means of sampling from an inaccessible distribution using a transition kernel from some initial distribution [26]. We can sample from the posterior by using Bayes' rule as the SMC transition kernel, given samples from a prior distribution and evaluations of the likelihood function. Integrals over the posterior can then be approximated by using these samples, which allows quantities such as Ĥ to be efficiently estimated. SMC has seen use in a range of quantum information tasks, including state estimation [19], frequency and Hamiltonian learning [10], benchmarking quantum operations [27], and in characterizing superconducting device environments [11]. Similar methods have also been applied to quantum error correction [28].
Hamiltonians are not usually represented explicitly as matrices when using SMC algorithms and are instead parameterized by a vector x of model parameters such that H = H(x). This representation allows for parameter reduction with prior information and can include effects outside of a purely quantum formalism, such as control distortions or stochastic fluctuations in measurement visibility. It also has the advantage that Hamiltonian learning is possible even in cases where matrix representations of individual terms in the Hamiltonian are not formally known.
Concretely, the SMC algorithm approximates prior and posterior distributions by weighted sums of delta functions,

Pr(x) ≈ Σ_i w_i δ(x − x_i),    (3)

such that the current state of knowledge can be tracked online using a classical computer to record a list of particles, each corresponding to a hypothesis x_i and having a relative weight w_i. These weights are then updated by substituting the SMC approximation (3) into Bayes' rule (1) to obtain

w_i ↦ w_i Pr(d|x_i) / Σ_j w_j Pr(d|x_j),    (4)

where d is an observation from the experimental system. Over time, the particle weights for the majority of the particles will diminish as the SMC algorithm becomes more confident that certain hypotheses are wrong. This reduces the total effective number of particles in the posterior distribution and ultimately prevents learning. This issue is addressed by using a resampling algorithm, which draws a new set of uniformly weighted SMC particles that approximately maintains the mean and covariance matrix of the posterior distribution [29].
Although the remainder of the SMC learning process is rigorous and well understood, error estimates are not known for the resampling step. Furthermore, resampling methods such as the Liu and West algorithm [29] have known pathologies: they can fail if they are provided with a multi-modal distribution, or if an improbable sequence of measurements leads to a resampling step that causes the SMC particle cloud to have no support over the true model. These issues are often addressed by varying the parameters used in the resampling algorithm, majority voting on the identity of the true model over multiple runs of SMC, or adjusting the guess heuristic used to choose experiments. Moreover, these shortcomings are often heralded by effective sample size criteria built into SMC software [30], such that a more appropriate resampler or set of resampling parameters can be chosen. In spite of these theoretical shortcomings, resampling methods work exceptionally well in practice for problems in Hamiltonian learning [10][11][12][13], machine learning [31], computer vision [32], and artificial intelligence [26,33].
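As an illustration, the weight update (4) and a Liu–West-style resampling step can be sketched in a few lines. This is a minimal sketch of our own, not the paper's implementation; the shrinkage parameter a = 0.98 and the Gaussian perturbation kernel are the standard Liu–West choices.

```python
import numpy as np

def smc_update(weights, likelihoods):
    """Bayes update of SMC weights: w_i <- w_i Pr(d|x_i) / sum_j w_j Pr(d|x_j)."""
    weights = weights * likelihoods
    return weights / weights.sum()

def liu_west_resample(particles, weights, a=0.98, rng=None):
    """Draw a new, uniformly weighted particle cloud that approximately
    preserves the mean and covariance of the current posterior."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = particles.shape
    mean = np.average(particles, weights=weights, axis=0)
    cov = np.atleast_2d(np.cov(particles.T, aweights=weights, ddof=0))
    idx = rng.choice(n, size=n, p=weights)
    # Shrink towards the mean so that the perturbed cloud keeps the covariance.
    centers = a * particles[idx] + (1 - a) * mean
    noise = rng.multivariate_normal(np.zeros(d), (1 - a**2) * cov, size=n)
    return centers + noise, np.full(n, 1.0 / n)
```

After resampling, all weights are uniform, so subsequent updates again have a large effective sample size.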

Quantum Hamiltonian learning
QHL builds upon SMC by introducing weak simulation, in which the experimentalist has access to a 'black box' that produces data according to an input hypothesis x. By repeatedly sampling this black box for each SMC hypothesis, the likelihood can be inferred from the frequencies of data output by the black-box simulator [34].
QHL is therefore a classical Bayesian inference algorithm that uses a fast quantum method for estimating the likelihood function via quantum simulation [12]. This augmented procedure is robust to errors in the likelihood function introduced by finite sampling of the black box and to approximation errors in the Hamiltonians used [13]. This latter property will be of particular importance in the development of cQHL, as it allows us to use a truncation of the complete system as an approximate simulator.
The simulators used in QHL can take many forms: they could be special purpose analog simulators, such as an ion trap that implements a family of transverse Ising models [35]. On the other hand, the quantum simulation could be implemented by using a quantum computer to run a digital simulation algorithm that is capable of efficiently simulating a wide array of Hamiltonian models [36] (such as d-sparse row-computable Hamiltonians). We refer to all such devices as quantum simulators to reflect the fact that they need not be fault-tolerant quantum computers. In our work we also require that the simulator be able to accept its initial state as input from another quantum system, but there are simpler QHL methods that do not need to be run in this fashion.
The simplest experimental design proposed for QHL is quantum likelihood evaluation (QLE), in which the experimenter prepares a state |ψ⟩ on the untrusted system, evolves it under the 'true' Hamiltonian H(x_0) for some time t, and then measures {|ψ⟩⟨ψ|, 1 − |ψ⟩⟨ψ|}. The corresponding experiment is then repeated on the trusted simulator for each SMC hypothesis x_i until the variance in the estimated likelihood becomes sufficiently small. The experiment design is illustrated in figure 1. QLE can be effective for learning Hamiltonians, although it suffers from the fact that the evolution times used by the experiments must be small for most Hamiltonians. In the case of QLE, long evolution times for typical Hamiltonians (such as Gaussian random Hamiltonians [37,38]) produce a distribution over measurement outcomes that is very close to uniform, such that experiments provide an exponentially small amount of information about the parameters.
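For intuition, consider the simplest case of a single-qubit precession model, H(ω) = ωσ_z/2 with |ψ⟩ = |+⟩, for which the QLE likelihood is cos²(ωt/2). The sketch below is our own illustrative example (the model and sample count are not taken from the text); it shows the analytic likelihood alongside its weak-simulation estimate from finite sampling.

```python
import numpy as np

def qle_likelihood(omega, t):
    """Analytic QLE likelihood |<+| exp(-i omega sigma_z t / 2) |+>|^2 = cos^2(omega t / 2)."""
    return np.cos(omega * t / 2) ** 2

def sampled_likelihood(omega, t, n_samp, rng):
    """Weak simulation: infer the likelihood from the frequency of outcomes
    returned by a (here, classically simulated) black box."""
    outcomes = rng.random(n_samp) < qle_likelihood(omega, t)
    return outcomes.mean()
```

In QHL proper, the analytic expression is unavailable and only the sampled estimate can be obtained from the trusted simulator.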
The tendency of quantum systems to rapidly equilibrate also causes the update step in the SMC algorithm to become unstable [13]. Here by unstable we mean that small perturbations in the estimated likelihoods result in large deviations in the posterior distribution. This can be combatted by using short experiments, which lead to uncertainties about the true model (after one update) that scale at least as 1 − O(t²) relative to the prior uncertainty. As a result, short-time evolutions necessitate processing exponentially more data than would be required if long experiments could be performed. Moreover, given that the time required for state preparation is independent of t, it is clear that the total experimental time used to learn the true model will be prohibitively large in such cases. Thus the ability to use long experiments can lead to substantial improvements for Hamiltonian learning.
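The penalty for short experiments can be checked directly for the single-qubit precession likelihood cos²(ωt/2) (an illustrative model of our choosing): the per-shot Fisher information works out to t², so the Cramér–Rao bound on the variance of any unbiased estimator is 1/(Nt²), which diverges as Ω(t^{−2}) when t shrinks.

```python
import numpy as np

def fisher_information(omega, t, eps=1e-6):
    """Per-shot Fisher information of omega for the two-outcome likelihood
    p = cos^2(omega t / 2), computed by numerical differentiation."""
    p = lambda w: np.cos(w * t / 2) ** 2
    dp = (p(omega + eps) - p(omega - eps)) / (2 * eps)
    pv = p(omega)
    return dp ** 2 / (pv * (1 - pv))
```

For instance, fisher_information(1.0, 0.5) ≈ 0.25 = t², so halving the evolution time quadruples the number of measurements needed to reach a given precision.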
To use the long evolution times requisite for expedient high-accuracy characterization, the system of interest can be coupled to the simulator using SWAP gates, as shown in figure 1. This experiment design, interactive quantum likelihood evaluation (IQLE), uses the simulator to approximately invert the forward evolution under the unknown system, such that the net evolution is approximately generated by H − H(x_−) for an inversion hypothesis x_−. Such experiments also reduce the norm of the effective system Hamiltonian, which typically allows the system to evolve for much longer before the quantum probability distribution becomes flat. These SWAP gates need not be perfect, as the learning protocol is known to be robust to such errors [13]. We further discuss the effects of faulty SWAP gates on bootstrapping in appendix D. In cases where a SWAP gate cannot be implemented, such as when the trusted resource is implemented using an incompatible modality, non-interactive QHL can be used to perform the first iteration of bootstrapping, such that SWAP gates are available in all further iterations. That is, we can use QLE to initialize the bootstrapping procedure, and can proceed using IQLE. In order to combat the exponentially diminishing likelihood of the system returning to its initial state after the inversion, we require that ∥H − H(x_−)∥t be small [12]. We use the particle guess heuristic (PGH) to achieve this. The PGH involves drawing two hypotheses about H, x_− and x'_−, from the prior distribution, inverting with H(x_−), and choosing the evolution time t = 1/∥x_− − x'_−∥. Since ∥x_− − x'_−∥ is an estimate of the uncertainty in the Hamiltonian, we expect that at most a constant fraction of the prior distribution will satisfy ∥H(x) − H(x_−)∥t ≫ 1 if the map x ↦ H(x) is linear and the prior distribution has converged to a unimodal distribution centered about the true Hamiltonian. The heuristic therefore seldom leads to experiments for which ∥H(x) − H(x_−)∥t ≫ 1 for most x. Moreover, since the PGH relies only on the current SMC approximation to the posterior, the heuristic incurs no additional simulation costs.
Rather, the PGH provides adaptivity by depending on the current state of knowledge about the quantum system through the particles x_− and x'_−. Experiment design via the particle guess heuristic has been shown to lead to efficient estimation of Hamiltonians using IQLE [12], and has since been usefully applied in other experimental contexts [11].
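The PGH itself is only a few lines; the sketch below uses an illustrative particle cloud and is not specific to the paper's implementation.

```python
import numpy as np

def particle_guess_heuristic(particles, weights, rng):
    """Draw x- and x'- from the current distribution and set t = 1/||x- - x'-||.
    The inversion hypothesis is H(x-); t grows as the cloud narrows."""
    i = rng.choice(len(weights), p=weights)
    j = rng.choice(len(weights), p=weights)
    while j == i:  # ensure two distinct draws so that t is finite
        j = rng.choice(len(weights), p=weights)
    t = 1.0 / np.linalg.norm(particles[i] - particles[j])
    return particles[i], t
```

Because the two draws come from the current posterior, a narrow (confident) cloud automatically yields long evolution times, while a broad prior yields short, safe ones.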
Previous work has analyzed the complexity of learning using IQLE [12]. In cases where the error in the characterized Hamiltonian scales as e^{−γN_exp} and N_samp samples are used to estimate the likelihood function, the protocol requires O(N_samp N_part γ^{−1} log(1/ϵ)) simulations to learn the vector of Hamiltonian parameters to within error ϵ as measured by the 2-norm, where N_part is the number of SMC particles. In practice, the decay constant γ depends on the number of parameters used to describe H and the properties of the experiments used. It does not directly depend on the dimension of H [12,13]. The updating procedure used to combine these results is further known to be stable provided that the likelihoods of the observed experimental outcomes are not exponentially small for the majority of the SMC particles [12]. This occurs for well-posed learning problems that use two-outcome experiments.

Compressed QHL
Information locality is what enables cQHL and in turn quantum bootstrapping. This idea is made concrete via Lieb-Robinson bounds, which show that an analog of special relativity exists for local observables evolving under Hamiltonians that have rapidly decaying interactions [9,[39][40][41]. Lieb-Robinson bounds give an effective 'light cone', as illustrated in figure 2, in which the evolution of an observable A can be accurately simulated without needing to consider any subsystem outside of the light cone. Specifically, they imply that a local observable A(t) provides at most an exponentially small amount of information about subsystems that are further than distance st away from the support of A, where s is the Lieb-Robinson velocity for the system and t is the evolution time. Here, s is analogous to the speed of light, and only depends on the geometry and strengths of the interactions in the system [39][40][41]. Thus, if st is bounded above by a constant and the initial support of A is small, then the support of A(t) is at most a constant. This shows that the dynamics of A(t) can be simulated using a small quantum device, provided st is sufficiently small.

cQHL exploits this intuition by evolving an initial state under the untrusted quantum simulator, swapping the quantum state of a subsystem of the larger (uncharacterized) system into a quantum simulator, and then approximately inverting the evolution by guessing the Hamiltonian dynamics and simulating their inverse. It then measures the simulator to determine whether the inversion yielded the initial state or a state in its orthogonal complement. One step of this process is illustrated in figure 2.
The inversion process used in interactive QLE not only leads to more informative experiments; we will show in section 3.2 that generalizing to include repeated applications of swaps and inverse simulations also delays the rate at which the light cone propagates from the observable. This in turn allows much longer evolution times to be used without the observable stretching beyond the confines of the trusted simulator.
In particular, this swapping procedure leads to characteristic Lieb-Robinson velocities that shrink as the experimentalist learns more about the system. That is, the light cone represents an 'epistemic' speed of light in the coupled systems that arises from the speed of information propagation depending more strongly on the uncertainty in the Hamiltonian than on the Hamiltonian itself. Since the effective speed of light slows as more information is learned, long evolutions can be used when the uncertainty is small. This removes a major restriction of the method of Da Silva et al [9], since the variance of any unbiased estimator of the Hamiltonian parameters diverges as Ω(t^{−2}) for Hamiltonian learning methods that use short time experiments (see appendix A for more details).

To analyse the error introduced by compressed simulation, we consider learning a Hamiltonian H by measurement of an observable A supported on a sites, using a simulator with support on w > a sites. We then expand H into those terms H_in which we can access with our simulator, the terms H_out supported entirely outside of the simulator, and the interaction H_int between these two partitions. We then further break down H_int into those terms H_int^{∩A} which have non-trivial action between sites in A and the neglected sites, and those terms H_int^{∖A} which act upon sites in the simulator, but not the observable. That is, we expand H as

H = H_in + H_int^{∩A} + H_int^{∖A} + H_out.    (5)

The decomposition of the interaction Hamiltonian H_int into couplings that include and exclude A is illustrated in figure 3. cQHL neglects the terms included in H_int^{∩A} when processing data collected from the system of interest. If H exhibits a finite Lieb-Robinson velocity, then we can bound the error introduced by this approximation. Moreover, we will show that by using interactivity as a resource, we can reduce this error as we become more certain about the dynamics of the untrusted system. We discuss these two considerations in more detail below.
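The decomposition in (5) amounts to classifying each local term of H by its support relative to the simulator window and the observable. A sketch for terms on a chain (the list-of-terms representation is our own illustration):

```python
def partition_terms(terms, simulator_sites, observable_sites):
    """Split (support, coefficient) Hamiltonian terms into H_in, H_int^{∩A},
    H_int^{∖A} and H_out, according to whether each term's support lies inside
    the simulator, crosses its boundary (touching the observable A or not),
    or lies entirely outside."""
    sim, obs = set(simulator_sites), set(observable_sites)
    parts = {"in": [], "int_A": [], "int_not_A": [], "out": []}
    for support, coeff in terms:
        s = set(support)
        if s <= sim:
            parts["in"].append((support, coeff))
        elif not (s & sim):
            parts["out"].append((support, coeff))
        elif s & obs:
            parts["int_A"].append((support, coeff))
        else:
            parts["int_not_A"].append((support, coeff))
    return parts
```

For a nearest-neighbor chain with the simulator on sites 0–3 and A on sites 2–3, the single crossing term (3, 4) touches A and is therefore the only member of H_int^{∩A}.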

Commuting Hamiltonians
It is helpful, however, to first build intuition by considering the special case in which all terms in the unknown system's Hamiltonian are local and mutually commute. This is true, for instance, in the Ising models (22) that we consider in our numerical examples. In this case, the compressed interactive likelihood evaluation experiment described in figure 1 is particularly simple to analyze.
If we work in the Heisenberg picture, then it is easy to see from the assumption that the Hamiltonian terms commute with each other (but not necessarily with A) that

A(t) = e^{iHt} A e^{−iHt} = e^{i(H_in + H_int^{∩A})t} A e^{−i(H_in + H_int^{∩A})t}.    (6)

This implies that

∥A(t) − Ã(t)∥ = ∥e^{i(H_in + H_int^{∩A})t} A e^{−i(H_in + H_int^{∩A})t} − e^{iH_in t} A e^{−iH_in t}∥,    (7)

where Ã(t) := e^{iH_in t} A e^{−iH_in t} is the simulated observable within the trusted simulator. Using Hadamard's lemma and the triangle inequality to bound the truncation error ∥A(t) − Ã(t)∥, we obtain that

∥A(t) − Ã(t)∥ ≤ ∥A∥ (e^{2∥H_int^{∩A}∥t} − 1).

If the objective is to have error at most δ in the compressed simulation, then it suffices to choose experiments with evolution time at most

t ≤ log(1 + δ/∥A∥) / (2∥H_int^{∩A}∥).    (8)

If the sum of the magnitudes of the interaction terms that are neglected in the simulation is a constant, then (8) shows that t scales at most linearly in δ as δ → 0. This is potentially problematic because short experiments can provide much less information than longer experiments, so it may be desirable to increase the size of the trusted simulator as δ shrinks to reduce the experimental time needed to bootstrap the system. QHL is robust to δ [12,13], and δ ≈ 0.01 often suffices for the inference procedure to proceed without noticeable degradation.
If ∥H_int^{∩A}∥ = 0, then infinite-time simulations are possible for commuting models (such as Ising models) because no truncation error is incurred. Non-trivial cases for QHL therefore only occur in commuting models with long range interactions.
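The admissible evolution time for a target error δ can be computed directly. This is a minimal sketch, assuming the commuting-case truncation error obeys the Hadamard-lemma bound ∥A(t) − Ã(t)∥ ≤ ∥A∥(e^{2∥H_int^{∩A}∥t} − 1) stated above.

```python
import math

def max_commuting_evolution_time(delta, norm_A, norm_H_int_A):
    """Largest t with ||A|| (exp(2 ||H_int^{∩A}|| t) - 1) <= delta, i.e. the
    evolution time guaranteeing compressed-simulation error at most delta."""
    if norm_H_int_A == 0:
        return math.inf  # no neglected couplings: no truncation error at any t
    return math.log(1 + delta / norm_A) / (2 * norm_H_int_A)
```

As δ → 0 this is approximately δ/(2∥A∥∥H_int^{∩A}∥), i.e. linear in δ, while vanishing neglected couplings recover the infinite-time case.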

Non-commuting Hamiltonians
If the Hamiltonian contains non-commuting terms, then the factorization of e^{−iHt} used in (7) no longer holds. This is because the terms in H no longer commute, so e^{−iHt} does not split into a product of exponentials of the individual terms, unlike in commuting models. Such dynamics can also lead to observables A(t) that rapidly obtain non-negligible support near the boundary of the trusted simulator. The trusted system will not tend to simulate these evolutions accurately because significant interactions exist between A(t) and the neglected portion of the system. This means a more careful argument will be needed to show that bootstrapping will also be successful here. Our strategy is to break the evolution into r short segments, swapping the state into the trusted simulator and approximately inverting the evolution under the hypothesis H(x_−) after each segment. If t∥H − H(x_−)∥/r ≪ 1, then r swaps of the two registers will not cause A(t) to have substantial support on the boundary of the trusted simulator at any step in the protocol.
If a large value of r is chosen, then by a Trotter argument the system effectively evolves under

Λ := H − H(x_−).    (9)

We expect that the dynamics of A will therefore be dictated by the properties of Λ for short evolutions. We make this intuition precise by showing in appendix B that the error from using a small trusted simulator obeys

∥A(t) − Ã(t)∥ ∈ O( ∥H_int^{∩A}∥t + ∥A∥ e^{st − μ(w−a)} + t²(∥H_in∥ + ∥H_int∥)²/r )    (10)

for cases of nearest-neighbor or exponentially decaying interactions between subsystems. H_in, H_int, H_out and related terms are explained in (5) and the surrounding text. Here s is the Lieb-Robinson velocity for evolutions under Λ and μ is related to the rate at which interactions decay with the graph distance between subsystems. It is worth noting that (10) can be improved by using higher order Trotter-Suzuki formulas in place of the basic Trotter formula to reduce r [42], and also by using tighter Lieb-Robinson bounds for cases with nearest-neighbor Hamiltonians.
The variable Λ is related through the PGH to the uncertainty in the Hamiltonian, which implies that the speed of information propagation is also a function of the uncertainty in H [39,40]. That is, longer evolutions can be taken as H becomes known with ever greater certainty. This means that the Lieb-Robinson velocity does not pose a fundamental restriction on the evolution times permitted because s → 0 as ∥Λ∥ → 0. Of course, the error term ∥H_int^{∩A}∥t in (10) places a limitation on the evolution time, but that term can be suppressed exponentially by increasing the diameter of the set of qubits in the trusted simulator for systems with interactions that decay at least exponentially with distance. Thus the roadblocks facing cQHL can be addressed at modest cost by using our strategy of repeatedly swapping the subsystems in the trusted and untrusted devices.
As an example, if we assume (a) that the interactions are between qubits on a line and (b) that w − a is chosen such that st ≤ μ(w − a)/8, then in the limit as r → ∞ an evolution time t ∈ O(δ/∥H_int^{∩A}∥) suffices to guarantee simulation error of δ. This result is qualitatively similar to (8) and requires that |w − a| scales at most logarithmically with the total evolution time desired.
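The two tunable resources, the number of swap rounds r and the simulator padding w − a, can be sized from the corresponding error terms. This is a sketch assuming the Trotter term takes the form t²(∥H_in∥ + ∥H_int∥)²/r and the light-cone leakage the form ∥A∥e^{st − μ(w−a)}; the constants are illustrative.

```python
import math

def swap_rounds(t, delta, norm_H_in, norm_H_int):
    """r large enough that the Trotter error term
    t^2 (||H_in|| + ||H_int||)^2 / r is at most delta."""
    return max(1, math.ceil(t ** 2 * (norm_H_in + norm_H_int) ** 2 / delta))

def simulator_padding(s, t, mu, delta, norm_A):
    """Padding w - a large enough that the light-cone leakage term
    ||A|| exp(s t - mu (w - a)) is at most delta."""
    return max(0, math.ceil((s * t + math.log(norm_A / delta)) / mu))
```

Note that the padding grows only logarithmically with 1/δ, while r grows quadratically with the evolution time, matching the scalings discussed below.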

Scanning
The methods above provide a means for characterizing a subsystem of the global Hamiltonian. These results cannot be used directly to learn the full system Hamiltonian because the trusted simulator lacks couplings present in the full system. Instead, the Hamiltonian must be inferred by patching together the results of many Hamiltonian inference steps. This process can be thought of as a scanning procedure wherein an observable is moved across the set of qubits, collecting information about the couplings that strongly influence it. The scanning procedure is illustrated in figure 4.
In order to properly update the information about the system, we modify sequential Monte Carlo to use two particle clouds. The first is the global cloud, which keeps track of the prior distribution over all the parameters in the Hamiltonian model. The second is the local cloud, which keeps track of all of the parameters needed for the current cQHL experiment. The global cloud is constrained such that the weights of the particles in the cloud are constant (i.e. the probability density is represented by the density of particles rather than their weights), whereas the local cloud is not constrained in this fashion. This constraint on the global cloud is needed because resampling does not in general preserve the indices of each particle, so that there is no way to sensibly identify a global particle that corresponds to a particle in the local posterior.
Instead, by copying a subset of global parameters into the local cloud, our modified particle filter approximates the prior by a product distribution between the local and remaining parameters. Resampling the local posterior then makes this approximation again, ensuring that the local weights are uniform. Thus, we can copy the (newly resampled) local cloud into the global cloud, overwriting the corresponding parameters. Once the local cloud is merged back into the global cloud in this way, we begin the next step in the scan by selecting a different set of parameters for the local cloud, and continuing with the next cQHL experiment.
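The bookkeeping for the two clouds is simple to express. This is a minimal sketch of the copy-and-overwrite step; the array shapes and names are our own.

```python
import numpy as np

def extract_local(global_cloud, window):
    """Copy the columns (parameters) for the current cQHL window into a local
    cloud; the global cloud is uniformly weighted, so the implied prior is a
    product of the local marginal and the remaining parameters."""
    n = len(global_cloud)
    return global_cloud[:, window].copy(), np.full(n, 1.0 / n)

def merge_local(global_cloud, resampled_local, window):
    """Overwrite the window's columns with the resampled (uniformly weighted)
    local posterior, leaving every other parameter untouched."""
    merged = global_cloud.copy()
    merged[:, window] = resampled_local
    return merged
```

Because the local cloud is resampled to uniform weights before merging, the global cloud remains a uniformly weighted representation of the full distribution.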
We implement this scanning procedure in our numerical experiments by using a local observable centered as far left on the spin chain as possible. We then infer the Hamiltonian for this location using a fixed number of experiments, copy the Hamiltonian parameters from each of the SMC particles into the global cloud, and then move the observable one site to the right. This process is repeated until the observable has scanned over the entire chain of qubits, and then we begin again by scanning over the first 2a qubits in reverse, where a is the width of the observable. We do this to reduce the systematic bias that emerges from the fact that Hamiltonian parameters associated with couplings learned earlier in the procedure will have greater uncertainty. If we assume that the error within each scan decays as e^{−γN_exp} in the number of experiments N_exp used per scan, then to make the combined error in the inferred vector of Hamiltonian parameters, x, at most ϵ it suffices to use a total number of experiments that scales as O((n/γ) log(√n/ϵ)), where n is the number of scans.

Complexity of cQHL
Here we have used the fact that if there are n scans and the error in the Hamiltonian parameters from each scan is at most ϵ/√n, then the error in the reconstructed Hamiltonian parameters after n scans is at most √n × ϵ/√n = ϵ. This bound is again pessimistic, as in practice the information learned in subsequent scans actually reduces, rather than increases, the error in parameters learned from previous scans.
The number of calls to our trusted simulator needed to update the weights of the particles in the SMC cloud is O(N_part N_samp) per experiment. In cases with non-commuting Hamiltonians, error is also incurred by using a non-infinite value of r. If we also demand that the contribution from this source of error is O(δ), then it is straightforward to see from (10) that the necessary value of r scales at most as O(t²(∥H_in∥ + ∥H_int∥)²/δ). The above relations set the complexity (as measured by the number of experiments and the number of swaps) of Hamiltonian characterization, but the space requirements are also problem dependent in cases that have non-commuting Hamiltonians. If we assume that all interactions between arbitrary qubits x and y decay at least as e^{−ν dist(x,y)}, then the distance between A and the neglected part of the Hamiltonian H_int^{∖A} can be chosen as w − a ∈ O(max{st, log(∥A∥/δ)}/μ), where a is the number of sites on which A is supported, s is the Lieb-Robinson velocity for evolution under Λ, and where μ is the exponential clustering parameter used in (10).
From the PGH, we have that st is asymptotically constant, and thus both the number of qubits and the number of experiments scale logarithmically with the desired accuracy. The number of inversions used, r, scales polynomially with t, which scales as O(1/ϵ) [12], and thus the number of swaps scales polynomially with the desired error tolerance. This scaling can be made to approach the Heisenberg limit of O(1/ϵ) scaling by replacing the Trotter formula used in the inversion step with increasingly high-order Trotter-Suzuki formulas as ϵ shrinks [42].
The above analysis can also be extended to systems with polynomial decay of interactions. In such cases, the cQHL algorithm is less efficient because the logarithmic scaling with $1/\epsilon$ is replaced with polynomial scaling in $1/\epsilon$ in most instances of polynomially-decaying interactions. Even in such cases, however, we expect the sequential Monte Carlo algorithm to remain robust to simulation errors such as truncation. Thus, we expect that cQHL can offer advantages whenever the Lieb–Robinson velocity of an untrusted system under inversion by a hypothesis is finite.
In some cases, such as dipolar coupling, existing Lieb–Robinson bounds diverge [39]. This does not, however, imply the lack of a finite information propagation velocity; indeed, finite speeds of information propagation are expected theoretically and observed experimentally for finite systems with dipolar couplings [43]. Moreover, in the presence of disorder, experimental evidence suggests that the information propagation velocity can be dramatically reduced [44].
An important remaining issue is the scaling of $\delta$ and the number of particles. We know from prior work that Bayesian inference is highly tolerant of errors in the likelihood function [13] and that $\delta \approx 0.01$ typically suffices. Furthermore, the number of particles in the SMC cloud, $N_{\rm part}$, is a slowly increasing function of the number of model parameters for the Hamiltonian and does not explicitly depend on the Hilbert space dimension [45]. In the numerical experiments we have performed in this and prior work, we observe that $N_{\rm part}$ tends to scale roughly logarithmically with the number of model parameters [10,12,13]. This means that none of the above issues present a fundamental obstacle for cQHL.
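As a minimal illustration of the sequential Monte Carlo machinery referred to above (a self-contained NumPy sketch, not QInfer's actual API), a single Bayesian update of a particle cloud reweights each hypothesis by the likelihood of the observed datum; the likelihood below is a made-up toy favoring a "true" parameter of 1.2.

```python
import numpy as np

def smc_update(particles, weights, likelihood):
    """One Bayesian update of an SMC cloud: reweight each particle
    (a hypothesis about a Hamiltonian parameter) by the likelihood
    of the observed datum, then renormalize."""
    weights = weights * likelihood(particles)
    return weights / weights.sum()

rng = np.random.default_rng(0)
particles = rng.normal(1.0, 0.5, size=500)   # prior hypotheses for one coupling
weights = np.full(500, 1 / 500)

# Toy likelihood: the datum favors parameters near the true value 1.2.
like = lambda x: np.exp(-(x - 1.2) ** 2 / 0.02)
weights = smc_update(particles, weights, like)

estimate = np.sum(weights * particles)       # posterior mean estimate
```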

Quantum bootstrapping
We will now turn our attention to quantum bootstrapping, which is an application wherein cQHL is used to infer control maps for uncharacterized devices. Control maps relate control settings of a device to its system Hamiltonian. Learning these maps is of particular importance if cross-talk or defects cause different parts of the system to respond differently to the same controls. In such cases, Hamiltonian characterization is a necessary part of the control design and calibration process.
To concretely show how QHL can address this challenge, we consider a model in which a row-vector of control settings, $C$, is related to the system Hamiltonian by an affine map $H(C)$. Performing cQHL at each control setting yields a vector of Hamiltonian parameters that describes each control term $H_k$. If we then form the matrix $G$ whose columns are these parameter vectors, a model for $H(C)$ is given by $CG^{T}$, which allows the effect of control on the quantum system of interest to be predicted.
Nonlinear controls can be learned in a similar fashion by locally approximating the control function with a piecewise-linear function.
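The control map can thus be estimated from pairs of control settings and cQHL-inferred parameter vectors. The sketch below uses a least-squares fit via the pseudoinverse, with made-up synthetic data and the transpose-free row convention $x = CG$ (equivalent to the affine-map picture up to a transpose); the small noise term stands in for residual cQHL inference error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ground truth: 3 control knobs map to 5 Hamiltonian
# parameters via x = C G (rows of G are the per-control responses).
G_true = rng.normal(size=(3, 5))

# Probe the device with random control settings; cQHL would supply the
# corresponding inferred parameter vectors (here: exact + small noise).
C = rng.normal(size=(20, 3))
X = C @ G_true + 1e-6 * rng.normal(size=(20, 5))

# Least-squares estimate of the control map via the pseudoinverse.
G_hat = np.linalg.pinv(C) @ X

# Predict the Hamiltonian parameters produced by a new control setting.
c_new = np.array([[0.3, -1.0, 0.5]])
x_pred = c_new @ G_hat
```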
We complete our description of quantum bootstrapping by detailing how control learning can be used to calibrate an initially untrusted device. If $H(C)$ is an affine map then this can be accomplished as follows. If $H_0 = 0$, the inferred control map can simply be inverted, which means that an arbitrary Hamiltonian formed from a linear combination of the $H_j$ can be implemented. If $H_0 \neq 0$, then this process is less straightforward. It can be solved by applying a pseudoinverse to find a control vector $C(a,b)$ that produces $aH_1 + bH_2$, but such controls will be specific to $a$ and $b$. A simple way to construct a general control sequence is to use Trotter–Suzuki formulas to approximate the dynamics in terms of the Hamiltonians $H(C)$ that the controls can reach. Higher-order Trotter–Suzuki methods can be used to reduce the value of $R$ if desired [42]. Errors accrue as the bootstrapping procedure progresses. However, since the error shrinks exponentially with the number of experiments for well-posed learning problems, the number of experiments needed per recursion will often scale linearly with the total number of intermediate untrusted devices needed to reach the final system of interest (see appendix D).
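The Trotter–Suzuki construction described above can be sketched numerically: given Hamiltonians that the device can switch on individually, a first-order product formula approximates evolution under their sum. This is a toy two-qubit example with assumed Hamiltonians and step count, not the paper's implementation; the error of the first-order formula shrinks as $O(t^2/r)$.

```python
import numpy as np
from scipy.linalg import expm

Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
I = np.eye(2)

# Two Hamiltonians the controls can switch on individually.
H1 = np.kron(Z, Z)        # Ising coupling
H2 = np.kron(X, I)        # local transverse field

t, r = 1.0, 100           # total time, number of Trotter steps
exact = expm(-1j * (H1 + H2) * t)

# First-order Trotter: alternate short evolutions under H1 and H2.
step = expm(-1j * H1 * t / r) @ expm(-1j * H2 * t / r)
trotter = np.linalg.matrix_power(step, r)

err = np.linalg.norm(trotter - exact, 2)   # O(t^2 / r) error
```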

Error propagation in bootstrapping
Let $G$ be the control map that is inferred via the inversion method discussed above and let $G + \mathcal{E}$ be the actual control map that the system implements. If we measure the error in a single step of bootstrapping by the operator norm of the difference between the bootstrapped and the target Hamiltonians, then the control error for the system after bootstrapping obeys (18). Equation (18) shows that the error after a single step is a multiple of the norm of the control Hamiltonian that depends not only on the error in the cQHL algorithm but also on the condition number of $G$, which measures the invertibility of the control map. Since the error is a multiplicative factor, it should not come as a surprise that the error after $L$ bootstrapping steps grows at worst exponentially with $L$. In particular, the bootstrapping error is bounded by (19), where $\kappa_{\max}$ is the maximum condition number for $G$, and $\|\mathcal{E}\|_{\max}$ and $\|G^{+}\|_{\max}$ are the maximum values of the error operator and the pseudoinverse of $G$ over all $L$ bootstrapping steps; here $G^{+}$ denotes the pseudoinverse of $G$. The proof of (19) is a straightforward application of the triangle inequality and is provided in appendix D.
Given that the error tolerance in the bootstrapping procedure is $\Delta \leqslant 1$, that $G$ is invertible, and that $w$, $a$ and $t$ are chosen such that $\|\mathcal{E}\|_{\max} \leqslant e^{-\gamma N_{\rm exp}}$ (i.e. a constant fraction of a bit is learned per experiment), it is easy to see that the scaling in (20) follows. This process is efficient provided $\gamma$ is at most polynomially small. If $G$ is not invertible, then the error cannot generally be made less than $\Delta$ for all $\Delta > 0$. However, if $G$ is not invertible then the system is not fully controllable, and so the task of calibrating a simulator will seldom be possible in cases where $GG^{+}$ differs substantially from the identity, irrespective of the method used to control it.
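The role of the condition number can be seen in a small numerical sketch (the map $G$, the error $\mathcal{E}$ and the target parameters are all made-up illustrative values): choosing controls by inverting the inferred map and then applying the true map $G + \mathcal{E}$ produces a control error that is amplified by $\kappa(G)$.

```python
import numpy as np

def control_error(G, E, x_target):
    """Error in the implemented Hamiltonian parameters after one
    bootstrapping step: controls are computed from the inferred map G,
    but the device actually applies the true map G + E."""
    C = x_target @ np.linalg.pinv(G)   # controls chosen by inversion
    x_actual = C @ (G + E)             # what the device really produces
    return np.linalg.norm(x_actual - x_target)

E = 1e-3 * np.ones((4, 4))              # stand-in for the cQHL error
x = np.ones((1, 4))                     # target parameter vector

G_good = np.eye(4)                      # condition number 1
G_bad = np.diag([1.0, 1.0, 1.0, 1e-3])  # condition number 1000

err_good = control_error(G_good, E, x)
err_bad = control_error(G_bad, E, x)    # kappa amplifies the same E
```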
It is difficult to say in general when the conditions underlying (20) will be met, as it is always possible for experiments to be chosen that provide virtually no information about the system. For example, the observable could be chosen to commute with the dynamics so that no information can be learned from the measurement statistics. Great experimental care must be taken to ensure that such pathological cases do not emerge [13]. These pathological experiments can be avoided for Ising models with exponentially decaying interactions, and we expect exponential decay of $\|\mathcal{E}\|_{\max}$ to be common for a wide range of models that also include noise and non-commuting terms, based on previous studies [10,12,13].
The bootstrapped simulator also need not have as many controls as the simulator that is used to certify it. This does not necessarily mean that the controls in the bootstrapped device are less rich than those of the trusted simulator. If we assume, for example, that a general Ising simulator is used to bootstrap an Ising simulator with only nearest-neighbor couplings (and universal single-qubit control), then more general coupling terms can be simulated using two-body interactions. For example, next-nearest-neighbor interactions can be simulated using nearest-neighbor couplings and single-qubit control via a sequence in which the middle qubit in $|\phi\rangle$ is set to $|0\rangle$. Higher-order and parallel methods also exist for engineering such interactions [46,47].
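One standard way to see that next-nearest-neighbor couplings are reachable from nearest-neighbor resources (our illustrative reading, not necessarily the paper's exact circuit) is SWAP conjugation: $\mathrm{SWAP}_{23}\, e^{-i Z_1 Z_2 t}\, \mathrm{SWAP}_{23} = e^{-i Z_1 Z_3 t}$ exactly, as this NumPy check confirms on three qubits.

```python
import numpy as np
from scipy.linalg import expm

Z = np.diag([1.0, -1.0])
I2 = np.eye(2)
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=float)

def kron(*ops):
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

t = 0.7
ZZ12 = kron(Z, Z, I2)    # available nearest-neighbour coupling (1,2)
ZZ13 = kron(Z, I2, Z)    # target next-nearest coupling (1,3)
SWAP23 = kron(I2, SWAP)  # swap qubits 2 and 3

# Conjugating the available interaction by SWAPs yields the
# next-nearest-neighbour evolution exactly.
engineered = SWAP23 @ expm(-1j * ZZ12 * t) @ SWAP23
target = expm(-1j * ZZ13 * t)

err = np.linalg.norm(engineered - target, 2)
```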

Numerical results
In order to show that the cQHL and bootstrapping algorithms are scalable to large systems, we provide numerical evidence that 50 qubit Ising Hamiltonians with exponentially decaying interactions can be learned using an 8 qubit simulator. We further observe that only a few kilobits of experimental data are needed to infer an accurate model and that the observable, A, that is used for the inference only needs to be supported on a small number of qubits. Finally, we apply the cQHL algorithm to use the 8 qubit simulator to bootstrap a 50 qubit quantum simulator from an initially uncalibrated device with crosstalk on the controls. The bootstrapping procedure reduces the calibration errors in a 50 qubit simulator by two orders of magnitude using roughly 750 kilobits of experimental data. This calibrated 50 qubit simulator could then be used to bootstrap an even larger quantum device.
We perform numerical simulations using the open-source QInfer, SciPy and fht libraries [30,48,49]. All numerical results are simulated using IQLE (figure 1) with compressed simulation on an 8 qubit register.

Compressed quantum Hamiltonian learning
Since quantum devices capable of implementing our bootstrapping protocol are not currently available, we examine systems that can be simulated efficiently using classical computers in order to demonstrate that our algorithm applies to large systems. Thus, we focus on the example of an Ising model on a linear chain of qubits with exponentially decaying interactions. In all cases, the observable used is $A = (|+\rangle\langle+|)^{\otimes a}$ for $a \in \{2, 4, 6\}$, as this observable is maximally informative for Ising models. For more general models, a pseudorandom input state and observable can be used instead [13]. Figure 5 shows that a compressed quantum simulator using only 8 quantum bits is capable of learning a Hamiltonian model for a system with 50 qubits. The errors, as measured by the operator norm of the difference between the actual Hamiltonian and the inferred Hamiltonian, are typically on the order of $10^{-2}$ after as few as 300 experiments per scan, where 49 scans are used in total. This is especially impressive after noting that this constitutes roughly 750 kilobits of data and that this error of $10^{-2}$ is spread over 1225 terms. The data also show evidence of exponential decay of the error, which is expected from prior studies [12,13].
An important difference between this result and existing QHL schemes [12,13] is that the observable will need to be, in some cases, substantially smaller than the simulator. Choosing a small observable is potentially problematic because it becomes more likely that an erroneous outcome will be indistinguishable from the initial state. Also, if a is too small then important long-range couplings can be overlooked because their effect becomes hard to distinguish from local interactions. We find in table 1 that the cases where a = 4 and a = 6 are virtually indistinguishable whereas the median errors are substantially larger for a = 2, but not substantially worse than a = 4 for 200 experiments per scan. This provides evidence that small a can suffice for Hamiltonian learning.

Quantum bootstrapping
The next set of results shows that cQHL can be used to bootstrap a quantum simulator for a 50 qubit 1D Ising model. The bootstrapping problem that we consider can be thought of as correcting crosstalk in the large simulator. This crosstalk manifests itself when the experimentalist attempts to turn on only one of the Ising couplings. We also take $H_0 = 0$. Figure 6 reveals that our bootstrapping procedure reduces control errors by two orders of magnitude in cases where 300 experiments per scan are used in the QHL step. Further reductions could be achieved by increasing the number of experiments per scan, but at 300 experiments per scan much of the error arises from the fact that $GG^{+}$ differs from the identity, so a richer set of controls in the 50 qubit system would be needed to substantially reduce the residual control errors. The errors are sufficiently small, however, that it is reasonable that the device could be used as a trusted simulator for nearest-neighbor Ising models. This means that it could subsequently be used to bootstrap another quantum simulator.

Scaling with n
All of the examples considered so far examine cQHL for 50 qubits. Although the fact that the protocol scales successfully up to 50 qubits already provides strong evidence for its scalability, we provide further evidence here that the errors in cQHL do not rapidly vary as a function of the number of qubits in the untrusted system, n. We see in figure 7 that the error, as measured by the median L 2 distance between the inferred model and the true model, is a slowly increasing function of n. The data is consistent with a linear scaling in n, although the data does not preclude other scalings. This suggests that the error in cQHL does not rapidly increase for the class of Hamiltonians considered here and provides evidence that examples with far more than 50 qubits are not outside the realm of possibility for cQHL.

Conclusions
We show that small quantum simulators can be used to characterize and calibrate larger devices, thus providing a way to bootstrap to capabilities beyond what can be implemented classically. In particular, we provide a cQHL algorithm that can infer Hamiltonians for systems with local or rapidly decaying interactions. The compressed algorithm is feasible because local observables remain confined to light cones. Typically these light cones spread at a velocity that is dictated by the Hamiltonian. By contrast, in cQHL, light cones spread at a speed that depends on the uncertainty in the Hamiltonian. This not only allows more informative experiments to be chosen but also shows that an epistemic speed of light can exist in systems that interact with an intelligent agent.
We then show that this algorithm provides the tools necessary to bootstrap a quantum system, wherein a small simulator is used to learn controls that correct Hamiltonian errors and uncertainties present in a larger quantum device. This protocol is useful, for instance, in calibrating control designs to deal with cross-talk, uncertainties in coupling strengths and other effects that cause the controls to act differently on the quantum system than designed.
Our approaches, being based on QHL, inherit the same robustness to noise and sampling error observed in that algorithm [12,13]. We have provided numerical evidence that our techniques apply to systems with as many as 50 qubits, can further tolerate low-precision observables, and are surprisingly efficient. Thus, quantum bootstrapping provides a potentially scalable technique for application to even larger quantum devices in experimentally reasonable contexts. Our work therefore provides a critical resource for building practical quantum information processing devices and computationally useful quantum simulators.
There are several natural extensions to our work. While we have focused on the case of time-independent quantum controls and Hamiltonians, our approaches can be generalized to the time dependent case using more general Lieb-Robinson bounds [50]. This is significant because techniques such as the method of Da Silva et al [9] do not apply for H(t). Additionally, it would be very interesting to see if quantum simulation can be used to allow local optimization to design even better experiments than those yielded by the PGH. The introduction of cost effective local optimization strategies may lead to significant advantages for bootstrapping systems with non-commuting Hamiltonians.
As a final remark, our work provides an important step towards a practical general method for calibrating and controlling large quantum devices by utilizing epistemic light cones to compress the simulation, thus enabling the application of small quantum devices as a resource. In doing so, our approach also provides a platform for building tractable solutions to more complicated design problems through the application of quantum simulation algorithms and characterization techniques.

Substituting back into (A2) yields the stated bound. Third, as illustrated in figure 3, there are two types of interaction terms: interactions between the neglected particles and those in the support of $A$, and interactions between neglected qubits and those not in the support of $A$. The Hamiltonians composed of only these interactions are denoted accordingly. This justifies the claim in the main body about polynomial scaling and shows that increasing $w$ increases the maximum value of $t$ allowable in the experiment design step.

Appendix D. Bounds for errors in bootstrapping
To begin, let us consider the error incurred by trying to find a control sequence that produces a Hamiltonian $H_k$ on an initially untrusted quantum device. If the inferred control map is $G_1$ and the actual control map is $G_1 + \mathcal{E}_1$, then the error in the implemented Hamiltonian after one bootstrapping step is given by the corresponding difference. Now let us consider the error incurred after bootstrapping $L$ times, or in other words, the error that arises from using a trusted simulator that was calibrated via $L-1$ steps of bootstrapping. If we define $G_j$ and $\mathcal{E}_j$ to be the control maps and error operators that arise after $j$ steps (where each $\mathcal{E}_j$ is the error with respect to the 'trusted simulator' calibrated via $j-1$ bootstrapping steps), then the total error follows by recursion. The result in (19) then follows from the fact that $(1+x) \leqslant e^{x}$ for all $x \in \mathbb{R}$. Note that this bound is expected to be quite pessimistic for bootstrapping in general: the analysis makes liberal use of the triangle inequality and uses worst-case estimates on top of that. Additionally, the user in the bootstrapping protocol has some knowledge of the error from the fact that quantities involving $G_j$ and its pseudoinverse can be computed for these problems, since the matrices are of polynomial size. We avoid including this knowledge in the argument since the user does not necessarily know what $\mathcal{E}_j$ is, and hence it is conceivable in extremely rare cases that the errors from the approximate inversion could counteract the errors in the Hamiltonian inference. A more specialized argument may be useful for predicting better bounds for the error in specific applications.
Finally, if the swap gates also have miscalibration errors of $\Delta$ then there is a maximum value of $r$ that can be used before the contributions of such errors become dominant. A simple inductive argument shows that choosing $\Delta$ inversely proportional to the number of swap gates used suffices to guarantee that such errors sum to at most $\delta$. This shows that the protocol is only modestly sensitive to such errors and that if quantum bootstrapping is used to calibrate the swap gates then it is reasonable to expect that $\Delta$ can often be made sufficiently small using a logarithmic number of experiments.