Accelerating variational quantum eigensolver convergence using parameter transfer

One impediment to the useful application of variational quantum algorithms in quantum chemistry is slow convergence with large numbers of classical optimization parameters. In this work, we evaluate a quantum computational warm-start approach for potential energy surface calculations. Our approach, which is inspired by conventional computational methods, is evaluated using simulations of the variational quantum eigensolver. Significant speedup is demonstrated relative to calculations that rely on a Hartree–Fock initial state, both for ideal and sampled simulations. The general approach of transferring parameters between similar problems is promising for accelerating current and near-term quantum chemistry calculations on quantum hardware, and is likely applicable beyond the tested algorithm and use case.


Introduction
Quantum computing has been proposed as a possible next step in the field of computational chemistry, especially when applied to highly correlated systems [1]. Theoretical proof of potential quantum speedup exists for problems involving highly correlated systems [2,3]. However, these results rely on fault-tolerant quantum computers. While quantum computing has come a long way, fault-tolerant computing is yet to be realized [4][5][6]. While the technology matures, other avenues to useful quantum computing are being explored. The period leading up to fault-tolerant quantum computing is often referred to as the Noisy Intermediate-Scale Quantum (NISQ) era [7]. Quantum devices in this regime are susceptible to many types of errors and noise [8,9].
Despite these impediments, algorithms have been designed to leverage existing NISQ hardware for quantum chemistry calculations [1,3,10]. Several such approaches rely on so-called Variational Quantum Algorithms (VQAs) [11]. These variational algorithms, a subset of which will be described in more detail below, utilize a parameterized quantum circuit to generate trial states for which a cost (or loss) can be evaluated through measurement on the NISQ device. By coupling the quantum computer to a classical optimization algorithm, the circuit parameters can be optimized such that the cost is minimized. Even though the addition of a classical optimization step allows for some level of noise suppression [11,12], several works [9,12-15] have highlighted the challenges of optimizing the high dimensional parameter spaces associated with these types of quantum circuits. While there is no formal proof that VQAs can outperform classical implementations, they are nonetheless useful in pushing the limits of near-term quantum computation [16,17].
The number of parameters and energy evaluations required for VQE calculations can reach into the tens or hundreds of thousands, even for small molecules [12]. The large number of parameters makes calculations slow and, in some instances, practically infeasible. For example, more than 16 000 variational parameters and 69 qubits are required to describe a single hydrogen cyanide molecule using unitary coupled cluster theory including single and double excitations (UCCSD) and a standard 6-31+G(d) basis set. More advanced versions of the VQE algorithm exist that reduce the number of parameters. However, the fundamental challenge of high-dimensional optimization remains as the size of the simulated system increases.
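As a rough illustration of how the parameter count grows, one can count unique single and double excitations between occupied and virtual spin orbitals; the naive tally below ignores the spin and spatial symmetry restrictions used in practice, so real UCCSD implementations typically need fewer parameters:

```python
from math import comb

def uccsd_parameter_count(n_occ: int, n_virt: int) -> int:
    """Naive count of unique single and double excitations between
    occupied and virtual spin orbitals; each excitation carries one
    variational parameter in a plain UCCSD ansatz."""
    singles = n_occ * n_virt
    doubles = comb(n_occ, 2) * comb(n_virt, 2)
    return singles + doubles

# Example: 4 electrons in 10 spin orbitals -> 4 occupied, 6 virtual.
print(uccsd_parameter_count(4, 6))  # 4*6 + 6*15 = 114
```

The combinatorial growth of the doubles term dominates as the system size increases, which is the scaling behavior discussed above.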
In this work, we outline and evaluate a general strategy for accelerating convergence of VQE-type [23][24][25] algorithms through the use of a parameter transfer (PT) approach. We provide proof-of-concept calculations that demonstrate speedup of VQE-based evaluation of potential energy surfaces (PESs) of molecules, both for ideal simulations and in the presence of sampling noise.

The VQE algorithm
The objective of any VQA is to find the parameters θ_min that minimize the value of some cost (loss) function C(θ). How one defines the cost function varies. Nevertheless, some aspects of the cost function are common to all implementations. The type of cost functions that we will discuss in this work can be seen as functions of (1) a parameterized circuit, U(θ), and (2) a set of measurable operators (observables), Ô_k, whose expectation values sum to the cost.
The VQE is a particular implementation of a VQA that aims to identify a set of parameters that minimize the expectation value of a Hamiltonian [26]. The VQE algorithm can consequently be formulated as a minimization problem where the cost is the energy,

E(θ) = ⟨Φ(θ)|Ĥ(R)|Φ(θ)⟩.    (1)

Because we rely on the Born-Oppenheimer approximation, energy minimization is performed for fixed nuclear coordinates R with the corresponding Hamiltonian Ĥ(R). When formulated in second quantization, this electronic Hamiltonian reads

Ĥ(R) = Σ_ij h_ij â†_i â_j + ½ Σ_ijkl h_ijkl â†_i â†_j â_l â_k + E_NN.    (2)

In equation (2), h_ij and h_ijkl are one- and two-electron integrals, E_NN is the classical nuclear-nuclear repulsion energy, and â†_m and â_m are the fermionic creation and annihilation operators for spin orbital m [27,28]. As we simulate physical systems, the energy minimization of equation (1) is commonly subjected to constraints, such as particle and spin conservation.
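To make the second-quantized form concrete, the sketch below builds the one-electron part of such a Hamiltonian for two spin orbitals using Jordan-Wigner-style matrices; the integral values in h are hypothetical, and the two-electron and nuclear-repulsion terms are omitted for brevity:

```python
import numpy as np

# Jordan-Wigner-style matrices on a 4-dimensional, 2-spin-orbital Fock space.
I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
lower = np.array([[0.0, 1.0],
                  [0.0, 0.0]])          # annihilates an occupied orbital

a = [np.kron(lower, I2), np.kron(Z, lower)]   # annihilation operators
adag = [m.conj().T for m in a]                # creation operators

# Hypothetical one-electron integrals h_ij for a toy two-level problem.
h = np.array([[-1.0, 0.2],
              [0.2, -0.5]])

# One-electron part of the second-quantized Hamiltonian.
H = sum(h[i, j] * adag[i] @ a[j] for i in range(2) for j in range(2))

# For this non-interacting toy model, the Fock-space eigenvalues are sums
# of subsets of the orbital energies; filling both (negative-energy)
# orbitals gives the ground state at trace(h) = -1.5.
print(np.linalg.eigvalsh(H).min())  # -1.5 (up to rounding)
```

The matrices reproduce the fermionic anticommutation relations, which is what the parity or Jordan-Wigner encodings guarantee on actual qubits.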
The implementation of VQE can be broken down into three main components: quantum circuit execution, measurement, and classical optimization of parameters with respect to the expectation value (figure 1). The quantum circuit itself can be represented by two unitary operators: the state preparation, U_prep, and the ansatz, U(θ). The state preparation circuit acts on the all-zero state to create some initial state |Φ_init⟩ = U_prep|0…0⟩. This initial state is then modified through the application of the ansatz to produce a trial state |Φ(θ)⟩, that is U(θ)|Φ_init⟩ = |Φ(θ)⟩. Measurements of relevant Pauli operators are subsequently used to reconstruct the expectation value that, in turn, is used to guess a new set of parameters to be used in the ansatz. This process is repeated until some convergence criterion is met. While the VQE algorithm may appear straightforward, each step presents challenges that are critical for the quality of the calculation. A more detailed description of how the electronic structure problem is mapped to a quantum circuit ansatz is provided in the supporting information (SI).
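As a minimal classical analogue of this loop, the sketch below replaces the quantum measurement step with exact linear algebra on a toy 2×2 Hamiltonian and a one-parameter "ansatz"; all values are illustrative and this is not the UCCSD circuit used in this work:

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in Hamiltonian (a real VQE reconstructs this expectation
# value from repeated Pauli measurements on hardware).
H = np.array([[1.0, 0.5],
              [0.5, -1.0]])

def trial_state(theta):
    # One-parameter "ansatz" acting on the initial state |0>.
    return np.array([np.cos(theta[0]), np.sin(theta[0])])

def energy(theta):
    # Expectation value <Phi(theta)|H|Phi(theta)>.
    phi = trial_state(theta)
    return float(phi @ H @ phi)

# The classical optimizer closes the loop: propose theta, evaluate the
# energy, repeat until convergence.
result = minimize(energy, x0=[0.0], method="SLSQP")
print(result.fun)  # approaches the lowest eigenvalue of H
```

Because the ansatz spans all real unit vectors, the loop converges to the exact ground-state energy of this toy Hamiltonian, -sqrt(1.25).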
A crucial part of the VQE algorithm, which can make or break any attempt at convergence, is providing a suitable initial guess (state), |Φ init ⟩, for the wavefunction (state preparation in figure 1). In this work, we explore the general approach of accelerating convergence by transferring parameters from a previously converged calculation or otherwise known solution.

Parameter Transfer (PT)
Central to the idea of PT is the existence of an underlying similarity between two problems. In many cases, some form of commonality is apparent. For example, in chemistry, molecular structures might differ by a minor chemical substitution, or be related by symmetry or conformation. An important class of problems also related by similarity is quantum mechanical problems treated at different levels of theory or accuracy. The level of theory might, for instance, be increased by utilizing larger basis sets, more flexible wavefunctions, or by lifting approximations in the Hamiltonian. Others, including one of us (MS), have shown that PT can be used to accelerate post-Born-Oppenheimer VQE calculations by relying on related Born-Oppenheimer approximation-based calculations [29]. In what follows, we will first briefly discuss PT in the context of a general cost function, C(θ). We will then address the case of molecular ground state energies (cf. E(θ) in equation (1)). Providing a general definition or measure of similarity is outside the scope of this work. We here view similarity in terms of relations between the shapes of the landscapes of two cost functions, C_A(θ_A) and C_B(θ_B). Regions around the minima of the cost functions, θ_A^min and θ_B^min, are considered similar if each minimum offers an approximate solution to the other problem, i.e. if C_B(θ_A^min) ≈ C_B(θ_B^min), and vice versa. Note that the two cost functions need not have similar values, nor the same number of parameters. To understand this seeming dichotomy, we can consider a simple example: the separate minimizations in x of y = x² and y = x⁴ + c. Both problems share the same solution x_min = 0 while differing in cost at any given x, since c is arbitrary. For the purpose of optimization, it is important that the two convex regions surrounding each minimum overlap (figure 2). Given that such an overlap exists, it is possible to start at θ_A^min and follow the steepest descent to reach θ_B^min.
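The x² versus x⁴ + c example can be checked numerically. In this sketch (with an arbitrary offset c = 3), the minimizer of problem A is reused as the starting point for problem B, and the warm start converges using no more cost evaluations than a cold start:

```python
from scipy.optimize import minimize

# Two cost landscapes with different values but the same minimizer x = 0.
c_A = lambda x: x[0] ** 2
c_B = lambda x: x[0] ** 4 + 3.0  # arbitrary offset c = 3

# Solve problem A, then transfer its minimizer as the starting point
# for problem B; compare against a "cold" start from x = 2.
x_min_A = minimize(c_A, x0=[2.0], method="SLSQP").x
res_cold = minimize(c_B, x0=[2.0], method="SLSQP")
res_warm = minimize(c_B, x0=x_min_A, method="SLSQP")

# The warm start begins (numerically) at the shared minimum, so it
# needs far fewer cost-function evaluations.
print(res_warm.nfev, res_cold.nfev)
```

This is the one-dimensional caricature of PT: the convex regions around the two minima overlap, so the transferred parameters already lie in the basin of the target minimum.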
In the optimization problems considered herein, there is an underlying order to the calculations; one is performed before the other. We use source to mean the initial calculation. Target, in contrast, refers to the calculation to which the parameters are transferred and that we wish to accelerate.

Figure 3. Schematic view of two VQE calculations being performed in series, where spirals represent gradual convergence towards a ground state. The converged parameters, θ_k^min, for a geometry R_k are passed to the next VQE calculation for geometry R_{k+1}. PT thereby allows for shortcutting the optimization process, as compared to an otherwise default initial guess.
The underlying principle of PT is common practice in conventional quantum chemistry. For example, PT can be used to speed up convergence of self-consistent field optimization in PES sampling, either by transferring parameters between wavefunctions of neighboring geometries or between different levels of theory. PESs, in turn, form the basis for evaluating equilibrium and transition state geometries, reaction rates, spectroscopic constants, and other molecular properties. PT is also related to the field of transfer learning in machine learning [30,31]. The implementation of PT in quantum computation has thus far been limited. Promising speedup has been demonstrated with similar approaches for the Quantum Approximate Optimization Algorithm [32-37] and for VQA models [38]. IBM has implemented extrapolation (or bootstrapping) methods in their Qiskit framework for the purpose of accelerating PES sampling of molecules [39]. However, while some methods are available, testing and characterization of these methods is lacking. Overall, little is known about the efficiency of PT when applied to the VQE, or in the presence of noise.
Here, we combine PT with VQE and apply it to the problem of PES sampling of molecules. Figure 3 illustrates how PT connects, and ideally accelerates, consecutive VQE optimizations on a PES by providing improved initial guesses. We emphasize that this combination of PT with VQE is but a small first step towards extending PT to quantum computation of chemistry more generally. One of us (MS) has already built upon this first step and explored PT between levels of theory in another work [29].
The calculation at the initial geometry, R_0, in a PT-VQE PES evaluation is, naturally, performed without any transfer (as there is no previous solution to transfer from). The subsequent calculation at a neighboring geometry, R_1, is then initiated using the parameters θ_0^min optimized for R_0. Extending to the general case of neighboring geometries R_k and R_{k+1}, we utilize a converged source ground state, |Φ(θ_{R_k}^min)⟩, as the initial guess for the target ground state |Φ(θ_{R_{k+1}}^GS)⟩. The difference in geometry (the transfer step) is then d = R_{k+1} − R_k and the transfer distance is d = ∥R_{k+1} − R_k∥. Here we intentionally leave the norm unspecified for the general case, since what constitutes distance may vary between systems. We will apply PT to accelerate relatively simple one-dimensional PES sampling using VQE, and for this reason refer to Euclidean distances (2-norm).
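A PT-accelerated scan of a one-dimensional PES can be sketched as the loop below; `pes_scan` and the toy cost landscape are hypothetical illustrations of the warm-start chaining, not the UCCSD workflow used in this work:

```python
import numpy as np
from scipy.optimize import minimize

def pes_scan(geometries, energy_at, theta_hf):
    """Sketch of a PT-accelerated PES scan: each geometry's VQE is
    warm-started from the previous geometry's converged parameters.
    `energy_at(R)` returns the cost function E(theta; R) for fixed R;
    `theta_hf` is the default initial guess used at the first geometry."""
    energies, theta = [], np.asarray(theta_hf, dtype=float)
    for R in geometries:
        res = minimize(energy_at(R), x0=theta, method="SLSQP")
        theta = res.x          # transfer: next guess = this minimum
        energies.append(res.fun)
    return energies

# Toy "PES": a quadratic landscape whose minimizer drifts slowly with R,
# so neighboring geometries share similar optimal parameters.
energy_at = lambda R: (lambda t: (t[0] - 0.3 * R) ** 2 + np.cos(R))
curve = pes_scan(np.linspace(0.5, 2.0, 4), energy_at, theta_hf=[0.0])
print(curve)  # approximates cos(R) along the scan
```

Because each transfer step moves the minimizer only slightly, every warm start lands inside the basin of the next minimum, which is exactly the similarity condition discussed above.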

Quantifying speedup
To quantify the efficacy of PT-accelerated VQE, we compare it to the de facto standard initial guess for chemically inspired ansätze: Hartree-Fock (HF), a mean-field description of electron interactions [28]. We define speedup for a given target geometry R and transfer step d as

S(R, d) = (N_HF − N_PT)/N_HF,

where N_HF and N_PT are the number of times the energy expectation value, ⟨E⟩, needs to be evaluated for convergence to be reached with a HF and a PT initial guess, respectively. In our following evaluation of PT, we will additionally present the speedup averaged over all n considered combinations (R, d) with ∥d∥ = d,

S̄(d) = (1/n) Σ_(R,d) S(R, d).

Some optimization algorithms require additional evaluations as part of a setup routine, as discussed in the supporting information.
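Assuming speedup is defined as the relative reduction in the number of energy evaluations (consistent with the percentages quoted later), the bookkeeping is straightforward; the evaluation counts below are invented for illustration:

```python
def speedup(n_hf: int, n_pt: int) -> float:
    """Relative reduction (in percent) in energy evaluations when
    starting from transferred parameters instead of Hartree-Fock."""
    return 100.0 * (n_hf - n_pt) / n_hf

def mean_speedup(pairs) -> float:
    """Average speedup over (N_HF, N_PT) pairs sharing one transfer distance."""
    return sum(speedup(h, p) for h, p in pairs) / len(pairs)

# Hypothetical evaluation counts for three target geometries at fixed d:
print(mean_speedup([(400, 120), (250, 75), (600, 180)]))  # 70.0
```

Under this definition a 70% speedup corresponds to needing roughly one third of the energy evaluations required by a HF-initialized calculation.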

Methods
All simulations were performed using the VQE algorithm implemented in Qiskit 0.39.3 [39] combined with Gaussian 16, version B.01-AVX2 [40]. All calculations relied on Unitary Coupled Cluster Singles and Doubles (UCCSD) theory, while basis sets and active spaces were varied as indicated in figure 4. We denote active spaces using the notation (electrons, orbitals). Active spaces were determined using Qiskit's ActiveSpaceTransformer, limiting the active space to the orbitals closest to the Fermi level. Molecular Hamiltonians were mapped to simulated qubits using parity encoding. All simulations implemented two-qubit reduction, made possible by considering alpha and beta spin-parity conservation [41]. The resulting quantum circuits range in size from 2 to 14 qubits in the noise-free simulations (details are provided in the SI).
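As a small bookkeeping sketch, the qubit counts used here follow from one qubit per spin orbital, minus two when the spin-parity symmetries are exploited; `qubit_count` is an illustrative helper, not a Qiskit function:

```python
def qubit_count(n_spatial_orbitals: int, two_qubit_reduction: bool = True) -> int:
    """Qubits needed under a parity-type fermion-to-qubit encoding:
    one qubit per spin orbital (two per spatial orbital), minus two
    when alpha/beta spin-parity conservation allows the reduction."""
    n = 2 * n_spatial_orbitals
    return n - 2 if two_qubit_reduction else n

# A (2,2) active space uses 2 qubits and (2,3) uses 4, matching the
# circuit sizes reported in this work.
print(qubit_count(2), qubit_count(3))  # 2 4
```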
We relied on Qiskit's Sequential Least Squares Programming (SLSQP) [42] optimization algorithm for exact simulations. The Constrained Optimization By Linear Approximation (COBYLA) [43] and Simultaneous Perturbation Stochastic Approximation (SPSA) [44] optimization algorithms were both evaluated for sampling-based simulations. Convergence of the SPSA algorithm was determined over the last 100 energy evaluations by comparing ⟨E⟩ averaged over the first and last sets of 50 values. The calculation was considered converged if the difference between the two averages was below a threshold of 0.0001 Ha.
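The SPSA convergence test described above can be sketched as follows (the function name and the example traces are illustrative):

```python
def spsa_converged(energies, window=100, threshold=1e-4):
    """Convergence test described in the text: over the last `window`
    energy evaluations, compare the mean of the first half with the
    mean of the last half; converged if they differ by less than
    `threshold` (in Hartree)."""
    if len(energies) < window:
        return False
    tail = energies[-window:]
    half = window // 2
    first, last = tail[:half], tail[half:]
    return abs(sum(first) / half - sum(last) / half) < threshold

# A trace that has flattened out passes; a still-drifting one does not.
flat = [-1.0] * 120
drifting = [-1.0 - 0.001 * i for i in range(120)]
print(spsa_converged(flat), spsa_converged(drifting))  # True False
```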
Sampling and noise: Each combination of target and transfer geometry was run with the QASM simulator in Qiskit, which samples from an ideal statevector simulation (as opposed to evaluating the exact statevector). As a practical compromise between simulation speed and sampling precision, we took 32 768 (2^15) samples for each calculation when using the SPSA optimizer. When using the COBYLA optimizer, 1 048 576 (2^20) samples were instead used, owing to the algorithm's lower noise resilience. These choices retain a substantial stochastic-noise contribution in the simulations, similar in magnitude to what can be expected in current and near-term hardware experiments [20]. To evaluate average performance, multiple calculations were performed for both the HF starting guess and the PT implementation, and the results were averaged. Speedup for each point (R, d) was evaluated as an average over all possible combinations of 10-15 separate calculations initialized with HF and PT. To limit the computational cost of these simulations, the considered circuit sizes were reduced to 2 qubits (H2 and HeH+) and 4 qubits (LiH).
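The role of the shot count can be illustrated with a single-qubit toy model: the standard error of a sampled expectation value shrinks as 1/sqrt(shots), which motivates shot counts such as 2^15 and 2^20 (the probability and repeat counts below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sampled_expectation(p_up, shots):
    """Estimate <Z> = p_up - p_down of one qubit from a finite number
    of single-shot measurements; a stand-in for hardware sampling."""
    ups = rng.binomial(shots, p_up)
    return (2 * ups - shots) / shots

# The scatter of repeated estimates shrinks as 1/sqrt(shots).
for shots in (2**10, 2**15, 2**20):
    samples = [sampled_expectation(0.8, shots) for _ in range(200)]
    print(shots, float(np.std(samples)))
```

Raising the shot count from 2^15 to 2^20 thus reduces the sampling noise by roughly a factor of sqrt(32), which is why the less noise-resilient COBYLA optimizer was given the larger budget.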

PT in noise-free VQE
To evaluate PT, we first look at noise-free (statevector) simulations. While not faithful models of NISQ hardware, noise-free simulations can provide valuable insight into the limits of quantum algorithms. By removing the effects of noise, algorithmic errors can be separated from those caused by imperfect hardware and from choices of peripherals such as optimizers. We will later return to assess the effects of sampling noise in a subset of examples.
Our test set consists of 36 different quantum chemistry problems: PESs of 10 molecules treated with different basis sets and active spaces, as detailed below. For each case, we consider the dissociation of a single atom from a molecule, yielding a one-dimensional PES. For triatomics, such as BeH2, H2O and HCN, a single H atom is dissociated while all bond angles remain fixed.
The molecules we study cover a range of electronic structures in terms of electron correlation, including both archetypical ionic dimers (HeH + and NaCl) and covalently bound molecules (H 2 and H 2 O). Our test set was selected to represent a variety of parameter set sizes, quantum circuits and Hamiltonians of relevance to current and near-term hardware experiments, while still allowing for classical simulation.
The speedup quantified for a given calculation will inevitably be linked to the choice of optimization algorithm. Some such algorithms function well for noiseless simulations, whereas others are more suitable in the presence of noise. For noise-free optimization, we here rely on the SLSQP optimizer. SLSQP is a representative optimizer for convex optimization in the absence of noise, commonly used in statevector simulations of VQE [45][46][47][48]. As we later introduce sampling noise, we will instead use the SPSA algorithm, which is better adapted for optimization in the presence of noise. We stress that it is outside the scope of this work to provide benchmarking of optimization algorithms (such benchmarking has been done by others [49][50][51]). Nevertheless, we expect the general conclusion of our work, useful speedup with PT, to hold for reasonable choices of optimizers.
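For intuition, a minimal SPSA sketch is given below: each iteration estimates the gradient from just two cost evaluations along a random perturbation, which is what lends the method its noise tolerance. The gain constants and the test function are illustrative, and this is not Qiskit's implementation:

```python
import numpy as np

def spsa_minimize(cost, theta0, iterations=200, a=0.2, c=0.1, seed=1):
    """Minimal SPSA sketch: per iteration, estimate the gradient from
    two cost evaluations along a random +/-1 perturbation vector."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for k in range(1, iterations + 1):
        ak, ck = a / k**0.602, c / k**0.101   # standard gain schedules
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        diff = cost(theta + ck * delta) - cost(theta - ck * delta)
        g_hat = diff / (2 * ck) * delta        # valid for +/-1 perturbations
        theta -= ak * g_hat
    return theta

# Noisy quadratic: SPSA still drifts toward the minimum at (1, -1).
rng_noise = np.random.default_rng(2)
noisy_cost = lambda t: float((t[0] - 1) ** 2 + (t[1] + 1) ** 2
                             + 0.01 * rng_noise.normal())
print(spsa_minimize(noisy_cost, [0.0, 0.0]))
```

Because only two cost evaluations are needed per iteration regardless of the parameter count, SPSA scales gracefully to the high-dimensional, noisy landscapes discussed in this section.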
One important limit to PT is the transfer distance d. As d approaches zero in the absence of noise, the transferred parameters correspond exactly to the already converged solution. Figure 4(a) shows the expected decrease in average and median speedup over our test set as a function of increasing transfer distance. Figures 4(b)-(e) illustrate the variety in speedup between calculations in our test set. We attribute these differences to the suitability of the HF initial guess: there is less benefit to PT in systems that are well described by a HF state. In some cases, such as the H dissociation from HCN, the speedup is marginal or non-existent, except for very short transfer distances.
For a transfer distance of 0.1 Å, a practical choice for PES sampling, we note an average speedup of 70% (figure 4(a)). Note that a fixed transfer distance for all systems is far from optimal; 0.1 Å is merely chosen here to facilitate comparison. In figure 5 we look further at the speedup corresponding to this specific transfer distance, additionally averaged over entire bond dissociation curves, from compressed bonds through equilibrium geometries to dissociation. Results in figure 5 correspond to calculations with an increasing number of circuit parameters from left to right. No overall trend is apparent, although for some molecules (like Li2) an improvement with the number of parameters may be present. The PT approach appears to perform well under noiseless conditions even for calculations exceeding 100 parameters.

Effects of measurement sampling
Even with fault-tolerant quantum computing, which is yet to be realized, repeated sampling of expectation values of the quantum system is required. Such sampling, even from a noise-free state, introduces stochastic variability, or sampling noise. In this section, we reevaluate three of our test set calculations, H2 (2,2), HeH+ (2,2), and LiH (2,3), in the presence of sampling noise. Because of the sampling noise, SLSQP is no longer suitable (nor practically feasible), and we instead use the SPSA and COBYLA optimization algorithms (figure 6).
The most notable difference in the performance of PT when sampling noise is introduced is a markedly different dependence of speedup on transfer distance. Instead of the rapid decline seen in figure 4, speedup appears to decrease linearly with increasing transfer distance. We attribute this difference to the inherently greater challenge of optimization in the presence of noise. Figure 6(c) serves to highlight the importance of choosing optimizers well suited to the task at hand. Even though convergence is reached with the COBYLA optimizer, the number of evaluations at convergence decreases with increasing sampling noise, indicating premature convergence; figure 6(c) thus reflects a poor choice of optimizer, not a failure of PT.
The large effect of even relatively low noise levels, in combination with improper choices of optimizers, highlights a challenge that NISQ algorithms face when compared to conventional methods. We further discuss optimizer suitability in the supporting information. Overall, our evaluation suggests the potential for substantial PT-driven speedup both with and without sampling noise.

Conclusions
In this work, we have outlined and provided a proof-of-concept for PT, an approach for accelerating consecutive VQE calculations. In our test set, PT provides an average speedup of 70% over simulations initialized with a standard HF state, provided that a suitable optimization algorithm is used. Speedup is apparent both in ideal and sampled statevector simulations, provided that source and target geometries are reasonably close (∼0.1 Å). Results can nevertheless vary substantially between molecular systems, as expected. The PT approach does not appear sensitive to problem complexity, quantified in terms of the number of quantum circuit parameters. Simulations subjected to sampling noise also appear less sensitive to the choice of transfer distance. Combined, these results lead us to conclude that PT is an attractive approach for accelerating VQE convergence, even for larger, more correlated systems. While this study is limited to PT within the context of PES sampling, we stress that PT can be considered a general principle, extendable to other types of calculations. In particular, transferring parameters between different levels of theory is a promising avenue, as shown in [29].

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).