Classical and quantum cost of measurement strategies for quantum-enhanced auxiliary field quantum Monte Carlo

Quantum-enhanced auxiliary field quantum Monte Carlo (QC-AFQMC) uses output from a quantum computer to increase the accuracy of its classical counterpart. The algorithm requires the estimation of overlaps between walker states and a trial wavefunction prepared on the quantum computer. We study the applicability of this algorithm in terms of the number of measurements required from the quantum computer and the classical costs of post-processing those measurements. We compare the classical post-processing costs of state-of-the-art measurement schemes using classical shadows to determine the overlaps and argue that the overall post-processing cost stemming from overlap estimations scales like O(N9) per walker throughout the algorithm. With further numerical simulations, we compare the variance behavior of the classical shadows when randomizing over different ensembles, e.g. Cliffords and (particle-number restricted) matchgates beyond their respective bounds, and uncover the existence of covariances between overlap estimations of the AFQMC walkers at different imaginary time steps. Moreover, we include analyses of how the error in the overlap estimation propagates into the AFQMC energy and discuss its scaling when increasing the system size.


I. INTRODUCTION
More accurate and faster solutions to the electronic structure problem are needed across various industryrelevant use cases, such as designing new drugs, developing better cathode materials and finding more efficient synthetic pathways of fertilizers.Quantum computing is believed to provide an avenue for more accurate simulations of chemistry and materials science [1][2][3].
Quantum algorithms and hardware can be classified under two umbrella terms: noisy intermediate-scale quantum (NISQ) [4] and fault-tolerant quantum computing (FTQC).The latter indicates a true paradigm shift with rigorously proven quantum advantages and promising avenues for industrial applications [5,6].In contrast, in the current NISQ era, proven quantum advantages for applications are not available.However, with the FTQC era still many years away, it is worth investigating the usefulness of NISQ algorithms for applications in industry.Due to present errors on NISQ-era computers, NISQ algorithms, such as the Variational Quantum Eigensolver (VQE) [7], are limited to shallow circuits acting on a small number of qubits.This restricts the application of those algorithms to small molecules with only tens of spin orbitals, in turn limiting the range of applicability of NISQ algorithms in industry.
A second major challenge of applying NISQ algorithms to industrial applications is the extraction of observables from the quantum computer through quantum measurements.As quantum measurements are probabilistic, numerous samples are required to estimate expectation values accurately.Recently, there have been many developments in improving the measurement bottleneck using, for example, randomized measurements [8] such as classical shadows [9][10][11][12][13][14].
In the present work, we investigate a NISQ algorithm called quantum-enhanced auxiliary field quantum Monte Carlo (QC-AFQMC) [15].This method aims to enhance the computational accuracy of classical auxiliary field quantum Monte Carlo (AFQMC) [16,17] by utilizing a quantum trial state wavefunction generated on a NISQera computer [15].A key aspect of this quantum algorithm is the estimation of overlaps between the walker in AFQMC, which are non-orthogonal Slater determinants, and the trial wavefunction.Different schemes for evaluating the overlaps have been recently investigating.For example, in Ref. [18], a general Bayesian inference method has been introduced to reduce the variance of the energy computed via AFQMC with finite number of measurements.
Here, we discuss the prevalent measurement challenge in NISQ algorithms in the context of estimating the overlaps in QC-AFQMC and investigate the scaling of the required measurements to obtain error bounds of the error in the final AFQMC energy estimates when increasing the system size.Additionally, we investigate the classical arXiv:2312.09872v2[quant-ph] 20 Mar 2024 post-processing cost to use the measurement outcomes to determine the overlaps.
The paper is organized as follows.In Sec.II, we provide an overview of the QC-AFQMC algorithm.In Sec.III, we discuss how to prepare a trial wavefunction using the variational quantum eigensolver (VQE) on the quantum computer.In Sec.IV, we highlight the steps of the algorithm where estimating overlaps between walker states and the trial wavefunction are required.In Sec.V, we discuss the various techniques for estimating overlaps when using a quantum computer.In Sec.VI, we discuss the statistical behavior associated with the overlap estimation using the different measurement schemes.In Sec.VII, we introduce an error model, which allows us to perform numerical simulations of QC-AFQMC under a finite number of measurements, and estimate the ground state energy of small molecular systems.Finally, in Sec.VIII we discuss the quantum and classical computational costs of the algorithm.

II. QUANTUM ENHANCED AUXILIARY
FIELD QUANTUM MONTE CARLO (QC-AFQMC) In this section, we briefly introduce the Auxiliary-Field Quantum Monte Carlo (AFQMC) technique and its quantum counterpart QC-AFQMC and refer the reader to the literature [17,19] for more details.
We consider the electronic Hamiltonian in the second quantization where a † p and a q are fermionic creation and annihilation operators of orbitals p and q, respectively.The indices p, q, r, s run over the set [N ] = {1, . . ., N } where N is the total number of spin orbitals.The terms h pq and v pqrs are the matrix elements of the one-body H 1 and two-body interactions H 2 of H, respectively.Here, we omit the spin indices for simplicity.
AFQMC relies on decomposing the two-body interactions H 2 in terms of sum of squares of one-body operators v γ such that the Hamiltonian H becomes with v 0 = H 1 and v γ = i pq L γ pq a † p a q .The N γ matrices L γ pq are obtained via a Cholesky decomposition of the two-body matrix elements v pqrs via v pqrs = Nγ γ=1 L γ pr L γ qs .The ground state wavefunction of H is obtained from the imaginary propagation of an initial state where ∆τ is the imaginary time step, and E 0 is the unknown ground-state energy, which can be estimated adaptively as the calculation progresses.Here we assume that ⟨Ψ 0 |Ψ I ⟩ ̸ = 0 and we choose |Ψ I ⟩ to be a Slater determinant.
To find an efficient implementation of the exponential in Eq. ( 3), a Hubbard-Stratonovich transformation [17,20] is used where p(x) is the standard normal distribution and x ∈ R Nγ are the auxiliary fields.The effect of this transformation is to map the actual interaction to an integral over one-body evolution operators.In this way, the state of the system at imaginary time step n of the propagation can be described by an ensemble of N w Slater determinants { |ϕ z ⟩ , z = 1, . . ., N w ; n = 1, . . ., N ∆τ } (N ∆τ being the total number of imaginary time steps) with weights {W n,z } and phases {e iθn,z }.The ensemble of Slater determinants is generally referred to as random walkers.
In the so-called free-projection AFQMC [17], the walker states, their weights, and phases are updated according to where the operator contains only one-body terms.The overlaps ⟨Ψ T |ϕ z ⟩ are computed by using a trial state |Ψ T ⟩.Ref. [21] has shown that quantum computers enable us to explore trial wavefunctions |Ψ T ⟩ beyond what is classically tractable, enhancing the performance of classical AFQMC calculations.With this approach, a 16-qubit experiment has been performed [15].
Different forms of the Hubbard-Stratonovich transformation exist, and they can improve the performance of AFQMC [17].One of them is called mean-field subtraction and introduces a shift v γ → v γ − v γ in the one body operators v γ in the Hamiltonian in Eq. ( 2) where the constant v γ are expectation values on the trial state This reduces fluctuations and minimizes systematic errors during the free-projection calculation [22,23].
During the AFQMC propagation, the walker ensemble allows us to access observables of the system.For example, the total energy is computed as where E loc (ϕ defined as the mixed expectation value of the Hamiltonian with the trial state |Ψ T ⟩. The energy estimator in Eq. ( 9) often experiences an exponential increase in its variance due to the fermionic sign problem [24].The phaseless approximation was introduced in Ref. [24] to prevent this exponential growth.Following Ref. [25], the projection is accomplished by changing the updating rule of the walkers to where the components of vector x n,z , called the force bias, are defined as The function I ph is defined as where I, called the importance function, is O n,z is the overlap ratio and the phase θ n,z is given by

III. GENERATION OF QUANTUM TRIAL WAVEFUNCTIONS
In this work, we use a variational quantum eigensolver (VQE) [7,26,27] to obtain approximate ground states; subsequently used as trial wavefunctions in AFQMC.Several VQE ansätze have been proposed to approximate ground states of molecular systems, such as the chemically inspired unitary coupled-cluster ansatz [28], hardware-efficient ansätze [29] or the qubit coupled cluster ansatz [30].One major challenge in achieving accurate ground state energies is the high depth of those ansätze in the presence of errors on the quantum device [31].Another challenge is ensuring the correct particle number and spin symmetries in the ansatz.The quantum-number preserving (QNP) ansatz introduced in Ref. [32] provides a low-depth circuit which preserves the particle number, the number of alpha and beta electrons, as well as the total spin squared, S 2 .In this work, we implement the QNP ansatz within the framework of PySCF [33], OpenFermion [34], and cirq [35].To optimize the variational parameters, we use L-BFGS-B [36] while employing finite differences to compute the first-order gradients.

IV. OVERLAP CALCULATIONS IN QC-AFQMC
To utilize the output from a quantum computer as a trial wavefunction within AFQMC, overlap calculations between non-orthogonal Slater determinants |ϕ (n) z ⟩ and the state present on the quantum computer |Ψ T ⟩ must be performed at various stages of the algorithm, as highlighted in Fig. 1.In the practical implementation of the algorithm [25], the evolution is split into N B -many blocks where in each block the walkers are evolved by N ∆τ time steps ∆τ .Thus, the walkers are propagated N B N ∆τ times.
During the propagation of the walkers, Eq. ( 16) must be evaluated for every walker to calculate its importance function.This requires N w N B N ∆τ overlap calculations throughout the AFQMC run.
Another instance of recurring overlap calculations is observed in the computation of the force bias, as defined in Eq. (13).The matrix element of the N γ one-body Cholesky vectors between the trial wavefunction and the walker state must be calculated to determine the force bias.As highlighted in the Appendix of [21], the force bias can be rewritten as where ϕ q p denotes a singly excited walker state with an excitation from orbital p to orbital q.We omit here the indices n and z for the projection step and the walker, respectively.The calculation then requires the estimation of the overlaps Ψ T ϕ q p on the quantum computer, while the Slater-Condon rule can be used to efficiently calculate ϕ q p L γ pq ϕ on the classical computer.Hence, the force bias calculation requires the quantum computation of O(N 2 ) overlaps for each walker, or O(N 2 N w N B N ∆τ ) overlaps in total.
A third instance of overlap calculation arises in the most important quantity of AFQMC, the local energy, as defined in Eq. (10).Decomposing the local energy into its one-and two-body terms yields where ϕ q p and ϕ rs pq denote singly and doubly excited walker states respectively.Again, the overlaps Ψ T ϕ q p and Ψ T ϕ rs pq can be calculated using the quantum computer.In contrast, the matrix elements ϕ rs pq H ϕ and ϕ q p H ϕ can be efficiently calculated on a classical computer using the Slater-Condon rule and the fact that the Hamiltonian H is a two-body operator.Without considering any exploration or compression techniques, the twobody part of the Hamiltonian consists of O(N 4 ) terms, which in turn requires an equal number of overlap calculations.Using factorization techniques, such as double factorization or similar methods [37][38][39][40][41], reduces the scaling to O(N 3 ).In contrast to the overlap ratio and force bias, the local energy is calculated only periodically.Although this reduces the number of overlap calculations, it remains the most computationally demanding quantity for larger systems due to its O(N 3 ) scaling.In total, the calculation of the local energy would require O(N w N 3 N B ) overlap calculations in the entire algorithm.
We want to point out that determining the mean field shift, Eq. ( 8), also requires access to the trial wavefunction.Consequently, this quantity must be computed using the quantum state, too.However, unlike other quantities, this calculation cannot be efficiently translated to overlap calculations.Furthermore, it is worth mentioning that this quantity only needs to be computed once throughout the entire AFQMC run.In cases where the quantum state is generated via VQE, this quantity has already been calculated.Otherwise, if the state was generated differently and one would like to avoid measuring this expectation value, one could use a classical approximation to the trial wavefunction.
In the subsequent sections, we will discuss various approaches for computing the necessary overlaps using a quantum computer.

V. MEASUREMENT METHODS
This section reviews the quantum measurement schemes employed in this work to obtain estimates of the overlap terms ⟨Ψ T |ϕ z ⟩, including the Hadamard test in Sec.V A and the classical shadows algorithm in Sec.V B, especially its variants: N -qubit Clifford shadows in Sec.V B 1, matchgate shadows in Sec.V B 2, and orbital rotation shadows in Sec.V B 3. Readers familiar with the topic can proceed to the next section.The main takeaway is that different measurement methods exhibit varying guaranteed upper bounds to sampling complexity and post-processing costs when obtaining the desired quantities from the quantum device.In Sec.VI, we then report statistics of overlap estimates obtained from numerical simulations, while in Sec.VIII, we focus on examining the concrete scaling behavior of these methods.

A. Hadamard Test
The Hadamard test [42][43][44] is a quantum algorithm that allows estimating expectation values ⟨Ψ|U |Ψ⟩ directly from a quantum processor with sampling complexity O(1/ϵ 2 ) where ϵ is the additive error.The underlying idea is to encode the numerical value of ⟨Ψ|U |Ψ⟩ directly into the probability amplitude of an auxiliary qubit.To implement the Hadamard test, one initializes an auxiliary qubit in |+⟩ and applies a controlled U gate between the auxiliary qubit and the target state |Ψ⟩.When measuring the auxiliary qubit in the X basis, we obtain the real part Re( ⟨Ψ|U |Ψ⟩) as the expectation value by associating outcomes |0⟩ and |1⟩ as +1 and −1, respectively.The imaginary part Im( ⟨Ψ|U |Ψ⟩) can be estimated with the same procedure, but the auxiliary qubit being initialized to 1/ √ 2(|0⟩ − i |1⟩).This approach can be adapted to estimate overlaps between two quantum states [45], see Fig. 2, allowing its application in QC-AFQMC where we are interested in the overlap ⟨Ψ T |ϕ z ⟩.However, the Hadamard test necessitates access to controlled versions of state preparation unitaries |Ψ T ⟩ = U ΨT |0⟩ and |ϕ z ⟩ = U ϕz |0⟩ resulting in high circuit depths.Due to this condition, the Hadamard test is impractical in the current NISQ era.More resource-conscious modifications of the Hadamard test are studied in [46][47][48].
Although the sampling complexity of the Hadamard test does not increase with the system size, it will still demand significant computing time on the quantum device as the circuits need to be executed for each walker at each time step.

B. Classical Shadows
Classical shadows, a scheme introduced in [9] based on the shadow tomography proposal [49], can predict many properties of a quantum system from very few measurements.Since its publication, various flavors [12][13][14] and applications to hybrid quantum-classical algorithms have been developed [10,50,51].QC-AFQMC provides a natural application of the classical shadow framework when one has only limited quantum resources because the algorithm requires the calculation of many observables of the same state in the form of overlaps of the quantum trial wavefunction and classically computed walker states ⟨Ψ T |ϕ z ⟩.In the following, we summarize the general workflow of the classical shadow method as introduced in [9].
• Step 1: Generate n s copies of the N -qubit quantum state ρ, of which we want to estimate M properties or observables • Step 2: Measure each copy in a randomly rotated basis.In practice, this is equivalent to applying a randomly sampled unitary from a tomographically complete ensemble U ∼ U and then measuring in the computational basis to gather n s bit strings | b⟩ ∈ {0, 1} N .
• Step 3: Generate snapshots and eventually the classical shadows from the measurement statistics by uncomputing | b⟩.When the ensemble U is chosen carefully, the mean estimate of the measurements is a quantum channel For each measurement outcome | b⟩, one obtains an unbiased estimator, the classical snapshot ρ of the quantum state ρ by inverting this channel The specific form of inverse channel M −1 is dependent on the choice of the distribution U.The collection of all snapshots forms the classical shadows.
• Step 4: Predict the desired property ⟨O i ⟩ from the classical shadows.Generally, it is advised to use the median-of-means method to confine the estimator's variance [9].Thus, one divides the set of shadows into K equally-sized parts and compute the ρ-estimator per subset ρ(k Note that the last step is not necessary if the statistics of the estimator follow a Gaussian distribution, as is the case in this work (see Fig. 10).Ref. [9] provides rigorous statistical bounds to the shadow estimates.The variance of general shadows scales as where the shadow norm is defined as

N -qubit Clifford Shadows
If we choose the ensemble U to be the set of all Nqubit Clifford operations, the inverse of the measurement channel becomes (24) and thus, the variance can be bounded with A numerical scheme to sample efficiently from U was devised in [52], and made available through Qiskit [53].
The overlap estimation can be formulated in terms of Hermitian observables as illustrated in Ref. [15].This is achieved by incorporating a state orthogonal to the two states in question.Instead of the direct VQE output, one constructs on the quantum computer.Then, using two projectors the overlap is estimated as The upper bound of the variance of the estimate in the Clifford case is given by Eq. ( 25), giving rise to a highly efficient sampling complexity of O(log(M )/ϵ 2 ) for O = P + + iP − , where M is the number of observables and ϵ is the desired additive error.Hence, in the context of QC-AFQMC, overlaps of all walkers at all time steps can be estimated from a small set of quantum measurements of |τ ⟩.However, evaluating Eq. ( 29) gives rise to an exponentially costly classical post-processing step due to the different bases of the Clifford shadows and the walker states, which are non-orthogonal Slater determinants.This issue was later addressed by Ref. [13], which proposes constructing shadows using matchgate tensors.

Matchgate Shadows
To circumvent the exponential post-processing step of Clifford shadows, a different form of shadows, known as matchgate shadows, was proposed [13].
Instead of sampling from Clifford circuits, the distribution used for the shadows is matchgate circuits.Matchgate circuits are described by fermionic Gaussian unitaries (FGU) U (Q) determined by 2N × 2N matrices from the real orthogonal group Q ∈ O(2N ), such that where γ µ are Majorana operators, γ 2j−1 = a j + a † j and γ 2j = −i(a j − a † j ).Matchgate circuits represent FGUs in quantum gates after the Jordan-Wigner transformation, which do not in general preserve the number of fermions η.This method efficiently calculates overlaps between a pure state, such as the trial wavefunction |Ψ T ⟩, and arbitrary Slater determinants, such as the walker states in QC-AFQMC.
Similar to the Clifford circuit case, to calculate overlaps with Slater determinants, one prepares the state |τ ⟩, see Eq. ( 26), on the quantum device.
To estimate the overlap between the trial wavefunction and a walker, one first calculates the coefficients c ℓ in front of the monomial z ℓ in the polynomial where α η,N = i η/2 /2 N −η/2 , Pf(A) is the Pfaffian of a real-valued and anti-symmetric matrix A = −A T , C i is the covariance matrix of the computational basis state i, W is a basis change from the set [13], Q encodes the Slater determinant, Q is the matchgate rotation for a given snapshot and The Pfaffian is a polynomial with degree of at most N −η/2; thus, calculating the coefficients of q(z) requires O((N − η/2) 4 ) time on a classical computer using polynomial interpolation [13].The coefficients are then used to calculate the unbiased estimate o (j) (j ∈ [n s ] with n s number of shadows) of the overlap with a Slater determinant using The final estimate of the overlap ⟨ϕ|Ψ T ⟩ is given by averaging the unbiased estimators ⟨ϕ|Ψ T ⟩ ≈ j o (j) /n s or by median-of-means estimate [13].
The variance of this estimation is bounded by Eq. (C2) and scales with O( √ N log N ), which means that the number of samples required scales polynomially with system size.Thus, this method is both efficient in the number of samples and in its classical post-processing step [13].
Either the starting state |Ψ T ⟩ or the observables being measured must be even, in that they are in the span of the product space of operators with an even number of Majorana operators.These hold in most practical cases.And in the case of QC-AFQMC, if this condition is not fulfilled, one can transform the problem to satisfy it, as shown in Appendix A of [13].

Orbital rotation shadows
Shadow estimation schemes [12,14] twirling over the particle number restricted ensemble of matchgates have been proposed.The latest [14] recovers efficient estimation independent of N , only scaling in the number of occupied fermionic modes η, also referred to as the particle number of the trial state.When restricting FGUs to preserve particle number symmetry, one obtains the ensemble of single-particle basis rotations or orbital rotations U (u) which are defined by restricting the following transformations of the fermionic creation and annihilation operators where u ∈ C N ×N is a unitary matrix defining the change of basis of the fermionic modes, with their representation U (u) ∈ C 2 N ×2 N in Hilbert space.We refer to the submatrix restricted to the k-particle subspace as In this notation, it is convenient to use expectation values of k-particle reduced density matrices (k-RDM) defined by for any quantum state ρ, where ⃗ p (⃗ q) are bitstring representations of the p,q,(r,s) indices from Sec. II for general k-body excitations in normal ordering and index the column (row) of the k-RDM.More explicitly, ⃗ p, ⃗ q ∈ S N,k , where (35) p i (q i ) refer to the bit of i-th index of the bitstring ⃗ p(⃗ q).As the evaluation of general k-RDMs is possible from these measurements we describe the following procedure for general k, the basis state overlaps can be calculated from the special case of the (k = η)-RDM as discussed below.
The single-shot estimator for the expectation value of a given k-RDM for a measurement string b in the computational basis under the basis transformation generated by u is given by [14] where the basis states ⟨⃗ p| and |⃗ q⟩ are defined by with s ′ = i r o i r i and ⃗ r indexing the basis states in the k-particle subspace in the same notation as ⃗ p and ⃗ q.To estimate overlap with Slater determinants, one needs an additional register of ancillas equal to the particle number η of |Ψ T ⟩ and must prepare a modified state |τ ′ ⟩ [14]: where |⟩ S refers to the original register and |⟩ A to the added ancilla register.This is due to the state |Ψ T ⟩ fully occupying the subspace of η particles in general, therefore one has to add additional 'space' for a reference state as the ancilla register as the reference state used in Eq. 26 has the wrong number of particles.

VI. STATISTICAL PROPERTIES OF OVERLAP ESTIMATIONS
In the following, we discuss the statistical properties of the measurement methods reviewed in the previous section.Specifically, we focus on their application in estimating overlaps between non-orthogonal Slater determinants and a trial wavefunction.

A. Hadamard test
The statistical properties of the Hadamard test are well-established [54].When employed to compute the real component of the overlap, the variance of the estimate is given by where n m denotes the number of circuit repetitions for a single overlap estimation.An analogous equation holds for the imaginary part.It is important to note that since we are dealing with a continuous set of walker states { |ϕ z ⟩}, the Hadamard test needs to be repeated for each overlap calculation, and the measurements cannot be reused.By utilizing the amplitude estimation algorithm, the Hadamard test could achieve Heisenberg scaling [55,56]; however, this comes at the expense of longer quantum circuits.

B. Classical Shadows
Using classical shadows to calculate the overlaps presents a significant advantage over the Hadamard test, as it allows for collecting the measurement outcomes prior to initiating the AFQMC calculations.This approach not only minimizes the idle time required to compute new overlaps between imaginary time steps but also reduces the number of measurements required.All measurements can be used to calculate the overlaps between the trial wavefunction and all walker states, rather than just a single overlap with a walker state.
To examine how the error in estimating overlaps ⟨Ψ T |ϕ⟩ scales with an increasing number of classical shadows, we utilize results from a simulation of the H 4 square with a side length of 1.23 Å in STO-3G.Upon generating an approximate ground state using the VQE ansatz from Sec. III, we construct classical shadows using the N -qubit Clifford ensemble, matchgate shadows, and orbital rotation shadows.In Fig. 3 (main plot), we present the statistical analysis of estimating the overlap between the VQE output state and two arbitrarily chosen walker states from an AFQMC run using matchgate shadows.Each data point in the histogram signifies a single estimation of the overlap, employing 10 4 shadows.We calculated the overlap 2000 times to obtain statistics, with each calculation involving the sampling of new shadows.A fit of the data to a Gaussian distribution highlights that the error in the overlap estimations is normally distributed [57].In the inset of Fig. 3 we investigate the scaling of the variance of the overlap estimation, Eq. ( 22), for one walker state as the number of snapshots increase.We observe the expected shot-noise scaling of classical shadows n −1/2 s [9].In Fig. 10, we show the same finding for the case of Clifford shadows.
We further calculate the average shot variance of the three flavors of shadow estimation schemes (Matchgate, Clifford, Orbital rotation) for all computational basis states with the correct particle number for an approximate ground state of H 4 in the STO-3G and 6-31G basis obtained with VQE.We compare the numerical outcome against their analytical bounds of their respective estimators in Fig. 4. The bounds are given by 4 b(N, η = 4) from Eq. (C2) (matchgate), 3 Tr(O 2 ) from Eq. ( 25) (Clifford) and 4 E ⃗ p,⃗ q Var D⃗ p ⃗ q from Eq. ( 40) (Orbital rotation).The factors of 4 for matchgate and orbital rotation shadows in the analytical bounds are due to a factor of 2 in the estimation of the overlap with Slater determinants from Eq. (C1) and Eq. ( 39).The bound given for the orbital rotation shadow is bounding the average shot variance of the entries of the η-RDM [14], while the bounds in Clifford and matchgate shadows apply to the variance of individual observables; for further investigation, see Appendix D. We investigate the scaling behavior for fixed η and k.The matchgate shadows scale O(N 2 ) while both the Clifford and orbital rotation shadows stay bounded, the orbital rotation shadow estimation shot variance even decreases with increasing N .This comes at the cost of increasing the qubit requirement by η, an increase in circuit depth for the orbital rotation shadows, and a classically inefficient post-processing step for the Clifford shadows.For the smaller basis, STO-3G, the matchgate shadow attains a lower shot variance than the orbital rotation shadow, while for the larger 6-31G basis, the orbital rotation shadow achieves the lowest average shot variance of ≈ 2.23, roughly a factor of 2 lower than the numerical bound.Up to this point, we studied the error scaling of the various methods assuming we are interested in calculating single overlaps.However, in QC-AFQMC, the same shadows are used to estimate the overlaps between the trial wavefunction and walker states over many imaginary time steps.Although the walker states fluctuate throughout the AFQMC run, they maintain a significant overlap and cannot be considered independent.This raises the question of whether covariances might have an impact when the same classical shadows are reused to calculate overlaps between the (dependent) walker states and the VQE output.
To answer this question, we gather 100 independent shadows, each generated by averaging 100 snapshots.We use each shadow to compute the overlap between the VQE output state and (i) all computational states and (ii) 1000 walker states obtained from propagating a single walker 1000 imaginary time steps.Subsequently, we use the data to determine the covariance matrices for both sets of states.In Fig. 5, we present our findings for the Clifford shadows, the matchgate shadows, and the orbital rotation shadows.Our findings reveal that for all ensembles, no covariances are present when estimating overlaps of computational states (right); however, covariances play a significant role when estimating overlaps with respect to walker states (left).As walker states have significant overlap with each other, see Fig. 8, this observation is not unexpected but has to be taken into account when running the algorithm.However, we note that many techniques exist to reduce covariances from statistical estimates, such as resampling techniques [58] or the use of Cholesky transformations.

VII. QC-AFQMC UNDER A FINITE NUMBER OF MEASUREMENTS
In the following, we evaluate the performance of QC-AFQMC under the assumption that only a finite number of measurements can be taken on the quantum computer.

A. Propagation of errors
The task of simulating the estimation of overlap between a VQE output state and walker states via classical shadows becomes computationally demanding for larger systems due to the many overlaps required in the propagation, Eq. ( 17), the calculation of the local energy, Eq. ( 10), and calculation of the force bias, Eq. ( 13).Furthermore, modern and efficient AFQMC implementations do not evaluate the local energy as in Eq. ( 10), but rather leverage more efficient representations via Green's functions [25].To emulate the impact of overlap errors (and finite measurements) on the performance of AFQMC when applied to larger systems, we introduce an error model.This allows us to carry out efficient AFQMC calculations in the presence of finite measurements.As all measurement methods generate estimates, which follow a Gaussian error model, we draw a random error with mean 0 and standard deviation ϵ and add it to the exact overlap.Using Gaussian errors in combination with standard error propagation techniques, we propagate the error from the overlap calculation to the local energy calculation and the force bias calculation.In the following, we show this propagation for the local energy.For more details and the discussion of the error propagation to the force bias, we refer to Appendix A.
As outlined in Sec.IV, the computation of the local energy in Eq. ( 10) requires the estimation of up to N 4 overlaps on the quantum computer.Propagating the errors from all overlap estimations yields the following error in the local energy: Here, λ 2 = pq |h pq | 2 + pqrs |h pqrs | 2 depends on the 2-norm of the Hamiltonian terms.We note that this error estimate decreases when restricting the sum to elements that correspond to reachable determinants within one-and two-body excitations from the walker state, see Appendix A for details.In contrast, the error in variational quantum algorithms such as VQE, depends on the 1-norm of the Hamiltonian terms [59][60][61].In molecular Hamiltonians where many terms are small, the here found 2-norm can be significantly smaller than the 1norm.Assuming error-free walker weights and disregarding auto-correlation, we can propagate the error to the final estimate of the ground state energy, finding where N B and N w is the number of blocks and walkers respectively.Ignoring potential auto-correlations and other statistical issues, rather than allocating more measurements on the quantum computer, one could augment the number of blocks or walkers, albeit at the expense of escalating computational efforts on the classical computer.

B. Numerical simulations
As a first test system, we employ the same system as in Sec.VI; a H 4 square with a side length of 1.23 Å, a frequently examined small system for correlated methods due to its present non-trivial correlations [21,62,63].To simulate the VQE circuit on a classical computer, we use a minimal basis set, STO-3G.We implement the previously described QNP-VQE with 4 layers.Upon optimizing the parameters using L-BFGS-B [36], we find a VQE energy of −1.962Ha, which is 7mHa away from the FIG. 5. Single-shot covariances using 100 independent shadows estimates from the H4 in STO-3G, each obtained by averaging 100 snapshots.(left) The covariance matrix obtained when using the shadows to estimate the overlaps of 1000 walker states obtained from 1000 imaginary time steps.(right) The covariance matrix when using the same shadows to estimate the overlaps of the 70 computational basis states with correct particle number.The plots (a), (b), and (c) correspond to Clifford shadows, matchgate shadows, and orbital rotation shadows, respectively.exact ground state energy.It is worth mentioning that achieving a VQE value closer to the exact ground state energy for this small system is possible by increasing the number of layers and allowing more optimization cycles.However, our primary objective is to examine the impact of erroneous overlaps on the performance of QC-AFQMC rather than exploring the convergence of the VQE ansatz or comparing the performance of QC-AFQMC to classical quantum chemistry methods.
Following this, we utilize the VQE output state as the trial wavefunction for AFQMC calculations.For all AFQMC calculations, we use the Python package ipie [25] with N w = 1000 walkers, N B = 1000 blocks with N ∆τ = 20 imaginary time steps per block with an imaginary time step size of ∆τ = 0.005.To estimate the The dashed red curve represents the exact ground state energy for each system, while the red shaded area corresponds to energies which lie within chemical accuracy with respect to the ground state energy.
AFQMC energy, we disregard the first 200 blocks as the equilibration period.Fig. 6 (top) displays the AFQMC estimates in dependence of the noise level ϵ used.As illustrated in the plot, introducing errors into the propagation, Eq. ( 16), results in a higher energy estimate than the error-less AFQMC calculation.When incorporating errors into the propagation, Eq. ( 16), the calculation of the force bias, Eq. ( 13), and the estimation of the local energy, Eq. ( 10), we observe a similar pattern as before; however, the errors in the estimate of the AFQMC energy are significantly larger than previously.It is worth mentioning that we did not employ reblocking methods to eliminate autocorrelation when calculating the AFQMC energy estimates.Statistical errors on the overlap propagate into the energy estimates at each step, causing fluctuations and making detecting autocorrelations more challenging.To ensure a fair comparison across all runs, we disregarded the effect of any autocorrelations.
As the second, slightly-larger test system, we use H 2 O with the geometry from the HEAT dataset [64], r OH = 0.95623 Å and θ HOH = 104.25 • .We again exploit a minimal basis, STO-3G, and use the QNP-VQE ansatz with 12 layers to generate a trial wavefunction.After optimizing the parameters, we find an energy of −75.002Ha, which is 10 mHa away from the exact ground state energy.As before, we use the VQE output as a trial wavefunction in AFQMC, while exploring varying noise levels to estimate the overlaps.We report the results in Fig. 6 (bottom).When reaching a noise level of 10 −2 , the AFQMC algorithm converges to an energy within chemical accuracy.However, the larger errors in the local energy estimation make the final AFQMC energy less predictable.
To study the effect of finite measurements on larger systems, we examine the H 4 in larger basis, cc-pVDZ and cc-pVTZ.As this maps to 40 and 128 spin orbitals and correspondingly the same number of qubits, running VQE is not feasible anymore.Instead, we use PySCF's [33] heat-bath CI implementation to generate trial wavefunctions.For both basis sets, we focus on finding the lowest singlet state.Subsequently, we use the trial wavefunction inside AFQMC with varying noise levels ϵ.We summarize our results in Table I.For both basis sets, we find that the AFQMC energy estimates converge to the wrong value when the overlap errors used in the propagation increase.When incorporating overlap errors in the local energy estimation as well, we find that the errors increase faster when using the larger basis set due to Eq. (42).

VIII. DISCUSSION ABOUT CLASSICAL & QUANTUM COSTS
As mentioned in Sec.IV, the overlaps for the importance function need to be calculated N w N B N ∆τ ∼ O(1) many times in the AFQMC run.There may be some implicit dependence on system size with regard to how many walkers and how long of an evolution are required,  but we ignore this for simplicity.The force bias requires O(N 2 ) overlaps for each walker to be calculated; thus, it needs ) calculations of the overlap.Finally, the local energy estimate scales O(N 3 ) (when using compression methods) but must only be calculated N B -many times.These are the base factors for the scaling of calculating each quantity regardless of the method.We first consider the classical cost, summarized in Table II, and then the quantum cost, summarized in Table III.
On top of the previously noted quantities of how many overlaps needed, we focus now on the number of classical operations required to calculate one overlap.In the case of the Hadamard test, the overlap is calculated by a simple average of samples and is not dependent on the system size.Thus, this method scales with O(1) and thus is independent of the system size.
In the case of Clifford shadows, the walkers, which are arbitrary Slater determinants, are described by an exponentially scaling linear combination of computational basis states, |ϕ⟩ = α c α |b α ⟩; thus, calculating the overlap scales exponentially with system size.Further, this needs to be performed for each snapshot.Thus, the scaling is O(2 N ).
In the case of matchgate shadows, the calculation is dominated by the coefficient estimation performed by polynomial interpolation, which scales as O(N 4 ) because interpolation requires O(N )-many calculations of a Pfaffian, of which the calculation scales as O(N 3 ) [65].In orbital rotation shadows, this is extended to a grid of N 2 points for two-dimensional polynomial interpolation, therefore scaling as O(N 5 ).We further note that the default precision is typically insufficient as the system size grows for interpolating the polynomial coefficients.One needs to increase the numerical precision to get a sufficiently accurate estimate of the coefficients, which we discuss in more detail in Appendix B. Keeping the error energy estimates in AFQMC constant when increasing the system size typically requires O(N 2 ) samples of energy estimates [66].For a fixed number of walkers N w and imaginary time steps N ∆τ , the classical computational cost to post-process all overlaps scales like O(N 9 ) when using matchgate and O(N 10 ) when using orbital rotation shadows.We also note that it might be beneficial to determine the overlaps of all computational states with the right symmetries and classically calculate the overlaps with the walker states.While this approach would  technically scale exponentially with the system size, for certain regimes (e.g., skinny spaces), it might require less computational cost than determining the walker overlaps one by one.
The quantum computational cost of each method has two considerations: the number of copies of the trial state that are needed and the additional depth of the circuit required to perform the discussed quantum measurement protocols.We first look at the number of copies of the state required to calculate the overlaps for all walkers in the system, summarized in Table III.Then, we discuss the circuits required for each case.
When using the Hadamard test, the number of samples needed is determined by the desired accuracy of the result O(1/ϵ 2 ); however, one needs new copies of the trial state for every overlap calculation, in other words for every walker state, N w N B N ∆τ in the total algorithm for the calculation of the importance sampling.In the case of calculating the local energy, O(N 3 ) copies per walker are required.Each term in the force bias also requires multiple copies of the trial state, which needs O(N 2 ) many overlap calculations per walker.In the case of Clifford shadows, the number of samples required scales with O(log(N w )/ϵ 2 ).In the case of matchgate shadows, the number of samples grows according to O( √ N log(N ) log(N w )/ϵ 2 ).In both cases, due to the covariance problem of the walker states throughout the evolution, additional copies may be needed, but most likely only to a constant prefactor, to mitigate the effects of the covariance problem.We note that ϵ corresponds to the error made in the overlap estimation.As shown in Eq. ( 43), to bound the error in the AFQMC energy, the error has to decrease according to the 2-norm of the Hamiltonian terms.
The scaling of the circuit depth also depends on the method used to calculate the overlaps.As seen in Fig. 2, when using the modified Hadamard test, the circuit requires controlled versions of the state preparation circuit of the two states.Since the scaling of the controlled unitary acting on the full system scales with a high polynomial, one could employ methods that can replace the controlled unitary with an uncontrolled unitary [46][47][48].In our case, the unitaries prepare the walker state and the trial wavefunction.The walker state is an arbitrary Slater determinant, which can be efficiently pre-pared with the latest algorithm using O(η log 2 (N )) gate depth [67][68][69][70].The ansatz used for the trial state can be chosen to be efficient; however, the circuits for the walker states are still beyond the hardware available in the NISQ era.In the shadow technique framework, the depth of the circuit is dependent on the ensemble being sampled.In the case of N -qubit Clifford circuits, the Clifford unitaries can be implemented in O(N log(N )) depth using fully-connected topology [71].In the case of matchgate circuits, the matchgate circuits that are also Clifford circuits can be used as an ensemble.Thus, the implementation of matchgate circuits is also efficient.Orbital rotation unitaries can be implemented in O(N ) [72].

IX. CONCLUSION
In this work, we studied the applicability of QC-AFQMC in terms of the number of measurements required from the quantum computer and the postprocessing costs on the classical computer.We used various state-of-the-art measurement schemes to calculate the required overlaps between the non-orthogonal Slater determinants and the trial wavefunction prepared on the quantum computer and examined their statistical behavior.Numerical simulations revealed that unwanted covariances are present between estimates of walker overlaps at different imaginary time steps.Moreover, we exploited an error model based on Gaussian errors in the overlap calculations to analyze the error propagation to the AFQMC energy estimates.We found that the error of the AFQMC energy estimate can be upper bounded by the 2-norm of the Hamiltonian terms.This is in contrast with variational quantum algorithms which typically depend on the 1-norm.We further discussed the classical post-processing cost of the various measurement methods and found that the post-processing of all required overlaps in QC-AFQMC scales with O(N 9 ) using matchgate shadows and O(N 10 ) using orbital rotation shadows.This scaling of the current version of the algorithm hinders future applications in industrial settings.
Several steps are required to make QC-AFQMC viable for industrial applications.Firstly, regarding quantum computing, the development of quantum algorithms to generate classically intractable trial wavefunctions is vital for the future success of the method.Secondly, it is crucial to decrease the post-processing cost associated with the method substantially.Moreover, a comparison where the additional classical cost is spent on creating trial classical wavefunctions, e.g., from selected CI methods or low-bond-dimension DMRG calculations, could shed light on the application range of QC-AFQMC.Next to advancements in the algorithm, most importantly, it is necessary to pinpoint industrial-relevant molecular systems where the method's high polynomial scaling can be justified in comparison to existing classical methods.
where the coefficients {c i } n i=0 are unknown.One can form a system of linear equations by evaluating the polynomial at the points {x i } m i=0 obtaining the outputs {p(x i ) = y i } m i=0 described by the matrix vector multiplication      where the matrix is known as the Vandermonde matrix, V .This matrix is always invertible when m = n.Thus, one can obtain the unique solution of the coefficients c = V −1 y.The best choice for the points {x i } m i=0 is the Chebyshev nodes which are defined in the interval [−1, 1] as When implementing this, one encounters issues of numerical stability due to the high condition number of the Vandermonde matrix.When using standard python scripts, we saw that once the polynomial reaches a degree of 20-25, the instability is so significant that the results are no longer sufficiently accurate to machine precision (assumed to be 10 −13 ).Therefore, one needs to increase the precision of the calculations used.
We used the package mpmath [73] to increase the precision of the calculations.We calculated the maximum error of interpolating the polynomial coefficients as the degree of the polynomial increases for various digits of precision used in the calculations.As seen in Fig. 7, the error in the estimate for every level of precision grows as the order of the polynomial increases.As seen in the inset of Fig. 7, our analysis suggests that the required digits of precision grow linearly with the order of the polynomial.

Covariances in overlap estimations
It is well known that for real-valued random variables X i , Y j and constants a i , b j the covariances between linear combinations of the random variables can be calculated as: In our case, the overlap of a trial state and a general Slater determinant ϕ a is a linear combination out of overlaps with the overlaps of computational basis states c i , which have vanishing covariances as discussed above.For the covariance where we used that covariances between estimates of different computational basis states vanish and approximated all variances by the mean variance Var mean , which is a good approximation for a large number of shots as shown in [57].We can see that the covariance of the overlap is approximately proportional to the overlap between the walkers ⟨ϕ a |ϕ b ⟩, which can be seen when comparing the plot of the walker overlaps in Fig. 8 with the plot of the covariances in Fig. 5.

FIG. 1 .
FIG.1.Visualization of the AFQMC algorithm.Colored terms are steps where we intend to measure overlaps between the trial wavefunction |ΨT⟩ and walker states |ϕ⟩ on a quantum device in QC-AFQMC.We note that the O(N 3 ) scaling in the local energy is achieved when exploiting Hamiltonian compression methods, such as double factorization or similar methods[37][38][39].Without these compression strategies, it would scale as O(N 4 ).

FIG. 3 .
FIG. 3. Estimating overlaps with matchgate shadows.(main plot) The histograms show the distribution of the absolute values of the overlap estimates of two walker states w.r.t. a VQE output state (H4) by conducting 2000 repeated shadow experiments, with each data point utilizing 10 4 matchgate shadows.The dashed red lines represent two Gaussian distributions fitted to both distributions.(inset) The variance in the overlap distribution as the number of shadows increases.

FIG. 4 .
FIG.4.Numerical estimation of the average shot variances of all overlaps with computational basis states of the matchgate, Clifford and orbital rotation shadows for approximate ground states of H4 in STO-3G basis (N = 8, η = 4) and in 6-31G basis (N = 16, η = 4) for 10 4 shots in comparison with the analytical bounds of the variances.Offset in N of orbital rotation shadow estimation is due to the reference state requiring N + η qubits.Dotted lines serve as a guide to the eye.

FIG. 6 .
FIG.6.AFQMC energy estimates for two systems: (top) the H4 square in STO-3G basis set, and (bottom) H2O in STO-3G basis set.We investigate the impact of varying noise levels ϵ in the overlap estimation.For comparison, we apply finite measurements in two scenarios: (1) only during propagation (blue curve), and (2) for estimating all quantities (orange curve).The dashed red curve represents the exact ground state energy for each system, while the red shaded area corresponds to energies which lie within chemical accuracy with respect to the ground state energy.

FIG. 7 .
FIG.7.We calculated the maximum error of interpolating the polynomial coefficients as the degree of the polynomial increases for various digits of precision used in the calculations.The inset shows the required number of digits of precision to stay below machine error (10 −13 ) for various different degrees of polynomial and the line of best fit (the dashed red line).

≈
FIG.8.The overlap matrix ⟨ϕi|ϕj⟩ between walkers at different time steps i and j.

FIG. 10 .
FIG. 10.Estimating overlaps with Clifford shadows.(main plot) The histograms show the distribution of the absolute values of the overlap estimates of two walker states w.r.t. a VQE output state (H4) by conducting 2000 repeated shadow experiments, with each data point utilizing 10 4 Clifford shadows.(inset) The standard deviation in the overlap distribution as the number of shadows increases.
The quantum circuit representing the modified Hadamard test to calculate the overlap ⟨ΨT|ϕz⟩ between the trial wavefunction |ΨT⟩ and a walker state |ϕz⟩ by using an ancilla qubit.The gate H is a Hadamard gate, the unitaries UΨ T and U ϕz prepare the states |ΨT⟩ and |ϕz⟩, respectively.Depending on the application of the phase gate S, the real or imaginary part of the overlap is calculated.

TABLE I .
Energy estimates derived from AFQMC calculations of H4 using the cc-pVDZ (top) and cc-pVTZ (bottom) basis sets, considering different overlap errors (ϵ).The Propagation column presents estimates from AFQMC runs with errors affecting only the overlaps in the importance function, Eq. (16).Conversely, the Full column displays results where all overlap calculations are influenced by errors.All energies are reported in Hartree.

TABLE II .
The scaling of the classical post-processing cost to estimate the overlap, the force bias for, and the local energy of a single walker.Only showing the largest scaling factor.