Near-term quantum algorithms for linear systems of equations

Solving linear systems of equations is essential for many problems in science and technology, including problems in machine learning. Existing quantum algorithms have demonstrated the potential for large speedups, but the required quantum resources are not immediately available on near-term quantum devices. In this work, we study near-term quantum algorithms for linear systems of equations of the form $Ax = b$. We investigate the use of variational algorithms and analyze their optimization landscapes. There exist types of linear systems for which variational algorithms designed to avoid barren plateaus, such as properly-initialized imaginary time evolution and adiabatic-inspired optimization, suffer from a different plateau problem. To circumvent this issue, we design near-term algorithms based on a core idea: the classical combination of variational quantum states (CQS). We exhibit several provable guarantees for these algorithms, supported by the representation of the linear system on a so-called Ansatz tree. The CQS approach and the Ansatz tree also admit the systematic application of heuristic approaches, including a gradient-based search. We have conducted numerical experiments solving linear systems as large as $2^{300} \times 2^{300}$ by considering cases where we can simulate the quantum algorithm efficiently on a classical computer. These experiments demonstrate the algorithms' ability to scale to system sizes within reach in near-term quantum devices of about $100$-$300$ qubits.


I. INTRODUCTION
Quantum computing promises speedups for a set of problems including integer factoring and search. Speedups have also been discussed for finding approximate solutions to linear systems of equations and convex optimization [1][2][3][4][5][6]. Many of these algorithms require a large amount of low-decoherence and fully connected quantum bits, beyond the reach of near-term available quantum computing hardware. As near-term quantum devices approach sizes of more than 50 qubits, a large amount of research has been devoted to finding tasks where such devices can outperform classical computers. One area of research concerns so-called "quantum supremacy" [7][8][9][10], which is about exhibiting a task for which the classical simulation is conjectured to be hard but which is performed efficiently on a quantum device. While the theoretical guarantees are sound, usually such tasks do not have straightforward practical applications, such as in the case of Boson sampling [11] and IQP circuits [12]. On the other hand, many investigations focus on finding applications for near-term quantum computers. Such applications are believed to be in quantum chemistry, optimization and machine learning, and possible algorithmic candidates are the variational quantum eigensolver (VQE) [13][14][15] and quantum approximate optimization (QAOA) [16,17]. A good definition of near-term quantum computing for applications is to find quantum algorithms that minimize the number of qubits, the number of gates, the complexity of the quantum gates (for example, in terms of controlled operations), and are tailored to the available hardware, while at the same time the problem should be of significant practical relevance.
Linear systems are important in a large variety of applications in engineering and sciences. Generically, a linear system is specified by a non-square matrix A ∈ R M ×N and a right-hand side vector b ∈ R M . The task is to find a solution vector x ∈ R N for which Ax = b. Depending on the dimensions M and N and the rank of the matrix, the task of solving the linear system takes on various forms. First, if the matrix is square and invertible, we can use matrix inversion to solve the linear system to find a unique solution. If the matrix is square and non-invertible, the pseudoinverse inverts only non-zero eigenvalues. If the matrix is non-square, we have the case of overdetermined and underdetermined equation systems. An overdetermined equation system appears for example in regression, where a few parameters, given by the vector x, are used to explain a larger amount of data points, specified by the vector b. In this case, no exact solution is possible and one often minimizes the 2 norm Ax − b In this work, we study near-term quantum algorithms for solving linear systems. We start by analyzing the use of basic variational algorithms for this task. In variational algorithms, the quantum computer is used to prepare candidates for the solution vector, using a shallow sequence of parameterized quantum gates. Then, measurements are performed on the solution candidate to evaluate the quality of the candidate defined in terms of a loss function. Finally, an optimization loop updates the variational parameters to improve the quality of the solution candidate. We propose two types of Ansätze for these basic variational algorithms. The first one uses Ansätze that are hardware-efficient, and without explicit usage of the matrix A and b, hence is called the "Agnostic" Ansatz. Such Agnostic Ansätze can be used in various forms of optimization methods, such as Nelder-Mead method [20], imaginary-time evolution [21,22], or adiabatic-inspired optimization [23]. The second Ansatz, an Alternating Operator Ansatz, is strongly dependent on the linear system and fully uses A and b via Hamiltonian simulation [24][25][26][27][28]. This approach is inspired by the adiabatic approach and the QAOA method. The drawback is the more far-term nature of the approach. Very recently, there have been other works that investigate the use of variational algorithms to solve linear systems [29][30][31], which share similarities with the above ideas.
We encounter simple types of linear systems which exhibit a potential problem for variational approaches. For these linear systems, any pre-specified Ansatz with a polynomial number of variational parameters will have a mostly flat optimization landscape with large loss function, which can be considered as a type of plateau effect. A flat optimization landscape will thwart any optimization techniques to find the optimum. We analyze various efforts that circumvent the known barren plateau issue [32], such as properly-initialized imaginary time evolution or the adiabatic-inspired optimization. We show that these attempts may still show the plateau effect, which motivates further efforts and alternative ideas to solve large-scale linear systems with near-term quantum devices and achieve quantum advantage.
To provide a potential solution to the aforementioned problems, we pursue a different route and propose a different class of algorithms for solving linear systems on near-term quantum devices. These algorithms are based on a concept we call Classical Combination of Variational Quantum States (abbreviated as CQS). As the name suggests, we show that different quantum states, variational or not, can be combined via classical pre-and post-processing to increase the power of near-term quantum computers. This combination avoids difficulties in optimizing the variational parameters and provides some provable guarantees for solving linear systems. We introduce the notion of an Ansatz tree and show how to find a good set of quantum states and the optimal combination coefficients. We show that at enough depth the Ansatz tree is guaranteed to include the solution, and also study the case of Tikhonov regularized regression. We introduce a heuristic approach for judiciously pruning the tree in the less important branches and expanding the tree in the more important branches. Our proposed heuristics is called the gradient expansion heuristics, and can be considered as just one example of many potential heuristics for CQS and the Ansatz tree. To demonstrate the potential of this class of algorithms, we have conducted numerical simulations for solving linear systems with sizes up to 2 300 × 2 300 . Finally, we show that a variation of the CQS method can achieve a similar provable guarantee as existing quantum algorithms for linear systems and can improve upon a recent work [33], reducing the quantum gate count by (1/ )-fold, where is the desired error to the optimal solution (e.g., = 0.01), while maintaining the use of only one ancilla qubit.

II. CLASSICAL AND QUANTUM SETTING
We are given a Hermitian matrix A ∈ C N ×N with spectral radius ρ(A) ≤ 1. Assume without loss of generality that N = 2 n . For non-Hermitian matrices, the standard Hermitian embedding can be used. The right-hand side vector b ∈ C N , which we assume to be normalized and write in quantum notation as |b . The normalization is not a major restriction as we could rescale the length of x. The main task is to find a vector x that solves the system of equations In the quantum setting, we have to make assumptions about the access to the linear system. First, we require quantum access to the right-hand side vector, i.e., a quantum circuit that prepares the vector |b as a quantum state.
Assumption 1. Assume availability of an efficient n-qubit quantum circuit described by the unitary U b such that U b |0 n = |b .
Next, we require access to the matrix defining the linear system. Here, our main assumption is that the matrix is given by a small linear combination of known unitaries. This assumption is weaker than the assumption of an efficient Pauli decomposition. Assumption 2. Assume an efficient unitary decomposition of the matrix A ∈ C N ×N , i.e. A = K A k=1 β k U k , with K A = O (poly(log N )) and unitaries U k ∈ C N ×N with known efficient quantum circuits. We can always absorb the phase of β k into U k , so we can assume β k > 0.
Next, we discuss the main loss functions used in this work. The first loss function is the well-known 2 -norm loss used in regression methods. This loss function is convex in x. Definition 1. Let the linear system be given by A ∈ C N ×M and |b ∈ C N . Define the loss function L R (x) := Instead of Definition 1, one may want to use the regularized version. This version is common in statistics and machine learning, and is known as Tikhonov regularization [34], or ridge regression [35].
Definition 2. Let the linear system be given by A ∈ C N ×M and |b ∈ C N . Define the loss function L T (x) := The regularization turns the loss function into a strongly convex function. This loss function will be used to prove a faster convergence of the Ansatz tree approach in Proposition 4. The third loss function is obtained by defining a Hamiltonian which has a unique ground state that is the solution to the linear system. This definition borrows techniques presented in [33] for solving the linear system via a method inspired by adiabatic quantum computation. To keep the adiabatic Hamiltonian positive across the adiabatic sweep, an ancilla has been introduced in Definition 3. Among other properties, in [33] it was shown that H(1) has a unique ground state with zero eigenvalue given by |+ |x * = |+ A −1 |b , which is proportional to the solution A −1 |b after removing the ancilla. This Hamiltonian implies the following loss function. The loss function can also be written as L H (|x ) = x|A 2 |x − x|A|b b|A|x without the ancilla.

III. VARIATIONAL ALGORITHMS AND ANSÄTZE
We first discuss basic variational algorithms for solving linear systems. A typical variational algorithm works as follows: one prepares multiple copies of a parameterized quantum state Ansatz and measures observables on it; the measurement results provide an estimate of the loss function. An optimization loop changes the parameters of the Ansatz with the goal of minimizing the loss function. In this section, we consider two types of variational Ansätze. Different Ansätze require different assumptions on the available hardware and can lead to different sets of solutions.
• Agnostic Ansatz: We take Ansätze which perform single qubit rotations and entangling operations. We do not take into account information of the linear system itself except by measuring the loss function.
• Alternating Operator Ansatz: We alternate the use of operators constructed from A and the vector |b for generating the Ansatz. This requires Hamiltonian simulation of operators derived from A and |b b|.
In particular, we focus on minimizing the Hamiltonian loss function L H (|x ), which is equivalent to finding the ground state of the Hamiltonian H(1). This allows the use of tools such as variational quantum eigensolver in quantum chemistry to solve linear systems of equations. The detailed procedure to measure the Hamiltonian loss function is discussed in Appendix A.
A. Details on variational algorithms for optimizing the Ansatz We first discuss the details of variational algorithms. We consider an Ansatz generated by a quantum circuit parametrized by θ, i.e., |ψ(θ) = U Ansatz (θ)|0 n . First, we show the basic variational quantum eigensolver (VQE) for finding the ground state of a Hamiltonian H. Initialize the variational parameters θ to be θ init . While θ has not converged, do the following steps: 1. Prepare quantum state |ψ(θ) on the quantum computer.
3. Update θ according to the obtained estimate of the loss function (e.g., using Nelder-Mead).
In addition to using Nelder-Mead, another strategy for optimizing the variational parameters is through the use of imaginary time propagation. Ideally, imaginary time propagation will move a given initial state to the ground state of the Hamiltonian as all excited states will be quickly suppressed. As Ref. [36] shows, instead of propagating the quantum system, one can directly propagate the parameters θ. The detailed algorithm works as follows. Set θ(0) = θ init . For t = 0, δt, 2δt, · · · , T , do the steps: 1. Obtain an estimate for all terms C i (t) and M ij (t) using copies of |ψ(θ(t)) , where M ij (t) = Re ∂ ∂θi ψ(θ(t)) † ∂ ∂θj |ψ(θ(t)) and C i (t) = Re ∂ ∂θi |ψ(θ(t)) † H(s)|ψ(θ(t)) .

Perform variational imaginary time propagation
An approach to improve convergence to the solution is based on adiabatic evolution. Adiabatic evolution gradually changes the Hamiltonian H(s) from the initial Hamiltonian (at s = 0) to the target Hamiltonian (at s = 1). In the VQE setting, an adiabatic-assisted optimization was used in [37,38]. We follow [38] and refer to this approach as the adiabatic-assisted VQE (AAVQE). To implement AAVQE, we discretize s into T adiabatic steps, s 0 = 0, s 1 , . . . , s T −1 , s T = 1. At each adiabatic step t, we use the optimized variational parameter θ * t−1 for H(s t−1 ) as the initial guess for H(s t ). We first initialize θ to be θ * 0 , where |ψ(θ * 0 ) is the ground state for the initial Hamiltonian H(0). Then, for t = 1, · · · , T , we perform Nelder-Mead or imaginary time propagation on the parameter θ to find an optimized variational parameter θ * t for H(s t ) by starting from θ * t−1 .

B. Agnostic Ansatz
We consider a pre-specified Ansatz with several layers, where each layer consists of single-qubit rotation for every qubit and a set of controlled NOT (CNOT) gates for entanging different qubits. The variational parameters are the rotation angles in the single-qubit rotations. This Ansatz does not take explicit account of the linear systems A, b and hence we use the name Agnostic Ansatz. We have performed numerical experiments on the Rigetti quantum virtual machine [39] with system sizes up to N = 16 and explored various patterns of how the CNOT gates are applied. We observed that most CNOT gate patterns are able to find the solution as one increases the number of layers, and hence also increases the number of variational parameters. We have also tested the adiabatic-assisted VQE algorithm [37,38]. The average accuracy (average fidelity of the output vector with the actual solution over randomly generated linear systems) approaches unity as one increases the number of adiabatic steps. In particular, we observed an improvement using adiabatic-assisted VQE over standard VQE. For plots and detailed findings, please refer to Appendix E.

C. Alternating Operator Ansatz
We now discuss a different Ansatz that contains information about the linear system, i.e., the matrix A and the vector b. This Ansatz comes at the cost of requiring Hamiltonian simulation of operators involving A and b. The Ansatz is inspired by the method presented in [33] for solving the linear system via adiabatic techniques. We can write out the Hamiltonian in Definition 3 as Examining the Hamiltonian leads to four Hermitian operators that make up H(s), which are scaled by combinations of s and ±(1 − s). The Hamiltonians are b|, aside from the identity matrix which only shifts the spectrum and induces a global phase in the dynamics.
Based on the four Hamiltonians H 1 , H 2 , H 3 , and H 4 , we can construct an Alternating Operator Ansatz, which is a direct translation of the approach in [33] into the QAOA framework [16]. Let us define the Ansatz as follows. Let p be the number of layers of alternating unitaries. For a set of variational parameters θ k,j , k ∈ [p] and j ∈ [4], we define the parameterized unitaries corresponding to the four Hamiltonians, U j (θ k,j ) := e −iθ k,j Hj . Then our variational Ansatz is This Ansatz contains explicit information of A and b, which avoids potential problems in variational algorithms discussed in Section III D. However, the suitability of this method for the use in near-term quantum computers depends on the difficulty of simulating the unitaries U j (θ k,j ). Given Assumptions 1 and 2, we can express the Hamiltonians as x ) k 1 ⊗ · · · ⊗ (σ (n) x ) kn , |b = |0 n using the variational Ansatz Eq. (3). Left: Here, we plot a one-dimensional cut through the high-dimensional landscape, tracing along a line connecting the initial point (parameter = 0) and the optimal solution (parameter = 1) for varying system size. We can clearly see the appearance of a plateau with the solution being a sharp valley at some point in the high-dimensional surface. Right: We plot the landscape on a surface that contains the initial point (bottom left corner with loss function = 1.0) and the optimal solution (the middle point with loss function = 0.0) for a system with n = 100. The loss function is near flat everywhere except a sharp hole in the middle that contains the solution.
. As these operators are combinations of unitaries and projectors, Hamiltonian simulation for simple cases may be within the realm of near-term hardware. However, at this point, there are no guarantees on performance due to the potentially difficult optimization of the variational parameters θ k,j .

D. Potential problems in variational algorithms for solving linear systems
Typical optimization for linear systems minimizing Ax − |b 2 2 from Definition 1 is convex in x, and hence is easy to solve in principle. This is because the gradient is larger when we are further away from the optimal solution and the negative gradient always points in the descent direction. On the other hand, when we restrict to the variational quantum state space, the optimization landscape is no longer convex and is poorly understood. We consider toy classes of linear systems that show difficulties when using the variational methods. The difficulties arise essentially from the fact that a random linear system in an exponentially large Hilbert space will have solutions that only have exponentially small overlap with the Ansatz |ψ(θ) under most parameters θ. While this may extend to a broader class of linear systems than discussed here, we note that this argument does not preclude the existence of certain classes of linear systems and Ansätze in combination with properly chosen loss functions for which variational optimization can provide quantum advantages. Linear systems with additional structure may be one example and a better characterization of such linear systems is left for future work. In section IV, we provide a method to overcome the issues discussed here.
Consider the following toy problem. Let k ∈ {0, 1} n be an arbitrary n-bit string, and U b be the quantum circuit for generating the state |b . Let the problem be given by When U b is 1, the solution to the equation A|x = |b is simply |k . Note that A is sparse and the condition number of A is 1, hence existing quantum algorithms for linear systems [1,40,41] are able to solve this linear system efficiently. Now assume a variational Ansatz |x(θ) that contains the solution (up to global phase) for every k ∈ {0, 1} n . For example, one possible, but not the only, choice would be with m = n variational parameters θ. We now show that for both loss functions Definition 1 (the regression loss function) and Definition 4 (the Hamiltonian loss function), no matter what the initial θ is, the loss function will be flat at that point with an exponentially small slope with high probability. For A|x(θ) − |b 2 2 from Definition 1, consider an initial point θ 0 , the local expansion is The loss function at the initial point θ 0 is 2 − 2Re b|A|x(θ 0 ) and the gradient in the i-th parameter θ i is −2Re b|A ∂ ∂θi |x(θ 0 ) . We usually consider an Ansatz |x(θ) that changes slightly when θ differs by a small amount, e.g., when θ are the rotation angles of single-qubit rotations. Formally, this means ∂ ∂θi |x(θ 0 ) ≤ G, where G is some constant. Now, we will show that the gradient in each direction is exponentially small. First, because the norm of ∂ ∂θi |x(θ 0 ) is bounded by G, we can see that there are at most 2 n/2 entries in ∂ ∂θi |x(θ 0 ) with squared absolute value ≥ G 2 /2 n/2 . So the probability of | b|A ∂ ∂θi |x(θ 0 ) | 2 ≥ G 2 /2 n/2 is at most 1/2 n/2 . By union bound, with probability at least 1 − m/2 n/2 ≈ 1, we will have Hence the gradient in the i-th variational parameter b|A ∂ ∂θi |x(θ 0 ) will be exponentially small for all i = 1, . . . , m with high probability as long as m 2 n/2 . Hence, around θ 0 , the loss function A|x(θ) − |b 2 2 is flat and has an almost constant value 2 − 2Re b|A|x(θ 0 ) .
The analysis is analogous for x(θ)|(A 2 − A|b b|A)|x(θ) from Definition 4. Pick an initial point θ 0 , the local expansion of the loss function at θ 0 is We have that b|A ∂ ∂θi |x(θ 0 ) is exponentially small for all i with high probability. Thus the loss function is flat with a function value of 1 − x(θ 0 )|A|b b|A|x(θ 0 ) around θ 0 . The flat region is not a local minimum, but a plateau with large loss function.
For the example linear system defined in Eq. (2), this plateau problem holds no matter how the variational circuit for generating |x(θ) is structured. The behavior appears even for shallow circuits, as in Eq. (3). The behavior only becomes evident when the system size is large compared to the number of variational parameters, i.e., m 2 n/2 , but still within reach in the NISQ era [42] with few tens of qubits. A numerical experiment that demonstrates this behavior can be seen in Figure 1. From the figure, we can clearly see the appearance of the plateau as the system size grows larger.
An intuitive view on the failure in this toy example is that a rank-1 projection, such as x(θ)|A|b b|A|x(θ) in the Hamiltonian loss function, would be exponentially small, so it gives little signal on how to find the solution. In [43][44][45], another type of loss function, called the local loss function, have been defined to ameliorate this problem. In the context of linear systems, it is defined as where U b is the circuit to generate the state |b and |0 i 0 i | is the projection onto the i-th qubit. Intuitively, this prevents the projection from being exponentially small because it is now a sum of single qubit projections. When x ) kn , this local loss function could indeed provide sufficient signals to solve the problem. However, a similar plateau problem may appear in cases where A and U b contains many entangling circuits. In Appendix B, we will show that when A and U b are random polynomial-sized quantum circuits, the local loss function plateaus at the value 1/2. The basic idea is that the single-qubit reduced density matrix of an entangled quantum state will be very close to a completely mixed state. Even though single qubit projection |0 i 0 i | will not be exponentially small, it will be sharply concentrated at the value 1/2 with an exponentially small variance. While the local loss function would have a flat landscape when A and |b are based on polynomial-sized quantum circuits sampled randomly, if A and |b have more structure the local loss function can provide benefits for the variation optimization [43][44][45].
In all cases when the loss function landscape is essentially flat, the presence of small errors due to statistical fluctuations in the quantum measurements would make it very hard for existing optimization approaches to find the optimal solution efficiently even if there exists a solution in the Ansatz. For example, if we use variational imaginary time evolution as described in Section III A to optimize the variational parameters, the same analysis shows that C(t) used in the propagation of θ would be an exponentially small vector. This means even if C(t) is measured to exponential-precision (which already requires exponential time due to the statistical error in quantum measurements), the imaginary time propagation would still take exponential time to find the ground state.
One important note is that this plateau effect originates from a different cause compared to the barren plateau problem [32], which appears in Ansatz initialized as a random quantum circuit with enough depth. Hence previous attempts to evade barren plateaus may still face this problem. For example, one may perform adiabatic evolution to avoid barren plateaus, as suggested from Definition 3. Start with |x(θ 0 ) = |−, b , which is the ground state of H(s = 0) = A(0)(1 − |+, b +, b|)A(0), with A(s) = (1 − s)Z ⊗ 1+sX ⊗ A; then gradually change s from 0 to 1 and use variational imaginary time evolution to maintain the quantum state in the ground state. Intuitively, when s is changed slightly, the ground state of H(s) will only shift by a small amount, so performing imaginary time evolution allows to follow closely the adiabatic path. This intuition is true in the x ) kn , |b = |0 n using the variational Ansatz Eq. (3). Here, we plot a one-dimensional slice through the high-dimensional landscape on a line passing through the initial point (parameter = 0) to the optimal solution (parameter = 1). Left: A system size of 5 qubits. Right: A system size of 100 qubits. For small system size, after s > 0.5, the variational parameter will be able to move toward the solution. For large system size, the adiabatic evolution will continue to be stuck at the initial point and end up at a plateau when s = 1.0.
original exponential-sized Hilbert space, but does not hold in the polynomial-sized variational parameter space. The reason is that the topology in the polynomial-sized variational parameter space is very different from the original exponential-sized Hilbert space. For example, let a, b, c ∈ {0, 1} n , and 1, then |a + |b is close to |a + |c in the original Hilbert space, but they may be very far away in the polynomial-sized variational parameter space.
Let us consider the same toy example as in Eq. (2) and the loss function analogous to Definition 4 is . At s = 0, the optimum is |−, 0 n . At s = 1, the optimum is |+, k . The adiabatic evolution starts at s = 0 with |x(θ 0 ) = |−, 0 n . In the original Hilbert space, this loss function is quadratic in |x and we can always move in a direction that decreases the loss function. However, in the polynomial-sized variational parameter space (m 2 n/2 ), the landscape of g(θ) and h(θ) will both be flat around θ 0 due to the inner product +, b|(1 ⊗ A)|x(θ) and the previous analysis. This means the loss function looks like c(s) − (1 − s) 2 f (θ) locally around θ 0 and is always minimized at |−, 0 n = |x(θ 0 ) for all s ∈ [0, 1]. So even if s has been changed to ∆ > 0, the variational parameter will still stay at θ 0 . The adiabatic evolution in the variational parameter space would always stay at |x(θ 0 ) = |−, 0 n and fail to find the solution. An illustration of this analysis can be found in Figure 2.
The conclusion is that for variational algorithms with a pre-specified Ansatz, it could be challenging to solve many linear systems due to the flat landscape. This is not to say that variational quantum algorithms are hopeless in achieving quantum advantages for linear systems. If we have a way of finding an Ansatz with an initial parameter that is close to the solution by making use of structure in A, |b , then this problem could be circumvented. For example, an Ansatz that contains explicit application of A, such as e −iAt , in its variational circuit does not fall into this problem. The Alternating Operator Ansatz proposed in Section III C is such a choice that may be able to avoid this problem, but require further study of convergence guarantees and the optimization of variational parameters may also be hard. In the next section, we aim to propose an alternative approach that could circumvent the problem discussed here and offer some provable guarantees.

A. Main idea
Most hybrid quantum-classical algorithms parameterize the quantum state with classical parameters, and have the quantum state created on the quantum processor. To extend the reach of near-term quantum devices, we consider an approach that broadens the set of manipulable states. Let the N = 2 n -dimensional Hilbert space be H. Let U denote an n-qubit circuit with k real parameters, i.e., it may denote a circuit with alternating parameterized single-qubit rotations (k in total) and all-to-all fixed CNOTs. Also let H denote the corresponding quantum states. A typical variational algorithm is based on a such a parameterized Ansatz V U and tries to find the best |ψ U (θ) in V U by tuning θ. However, there are two drawbacks when using typical variational algorithms.
• V U may not be large enough to contain the solution. Such a drawback may often be the case in hardwareefficient Ansätze and even the Alternating Operator Ansatz discussed above.
• Even when V U contains the solution, the variational parameter θ could be difficult to optimize. This problem can already be seen in the toy examples presented in previous sections.
Here, we improve on both drawbacks with a method called classical combination of variational quantum states (CQS). In classical combination of variational quantum states, we consider a hybrid quantum-classical state. Let U i for i = 1, . . . , m be quantum circuits with k i parameters each. We construct a state vector x ∈ H as a quantum-classical hybrid, where α i are the combination parameters and θ i are the usual variational parameters. Both parameters are stored on the classical processor. However, the state vector x ∈ H is never created on the quantum processor. Furthermore x may not be normalized, so it is not a quantum state in general.
To manipulate x with near-term quantum algorithms, the most important component is the ability to measure its expectation value for an observable O. We can obtain the expectation value x † Ox by performing quantum measurements and classical post-processing, via the following steps.
1. Estimate ψ Ui (θ i )|O|ψ Uj (θ j ) using a modified Hadamard test (see Proposition 9 in Appendix A) on the quantum processor. The modified Hadamard test comes at the cost of preparing |ψ Ui (θ i ) and |ψ Uj (θ j ) in superposition using one additional ancilla.

Compute
on the classical processor. For comparison, suppose for the moment that x is a (normalized) quantum state. To create x on the quantum processor, we need at least O(log(m)) ancilla qubits and m controlled unitaries that prepare all |ψ Ui (θ i ) , ∀i in superposition. Hence the improvement in terms of quantum resources using the hybrid quantum-classical state x instead of the corresponding quantum state is gate count: m times → 2 times, ancilla count: O(log(m)) → 1.
The CQS method comes at the cost of many repetitions in quantum measurements. However, we do not need to maintain quantum coherence between measurements, hence it would be especially beneficial on noisy intermediate-scale quantum devices as the gate count is reduced m/2-fold. For example, when we consider a classical combination of 300 variational quantum states, then we can reduce the gate count by 150 times. On a near term quantum device, the gate count is often limited due to the error present in the device. Hence the space of possible variational quantum states V U that can be prepared without error on the quantum processor will be limited by the gate count. The classical combination of quantum states thus provides a considerable improvement upon the space of manipulable states on near-term quantum processors. An illustration of this idea is shown in Figure 3.
We now present the meta strategy for finding an x ∈ H that solves the linear system of equations. The approach consists of an optimization and an expansion step. The approach may avoid the optimization of θ i which can involve a complicated optimization landscape. We start with m = 1 and a quantum state |ψ U1 (θ 1 ) . Each iteration proceeds as follows.
A few comments are in order. The optimization Step 1 is convex and is described in the next Section IV B. The expansion Step 2 assumes that we can find a new circuit U m+1 . This circuit may or may not be parameterized by a parameter vector θ m+1 ∈ R km+1 , the meta strategy can operate in both cases. In fact, the remainder of this section including the Ansatz tree approach in Section IV C does not explicitly discuss these parameters, but all states can also be thought of as parameterized. In the case that the circuits are indeed parameterized, setting the parameters to useful values may involve optimization, which again may run into the plateau issues discussed above. However, the strategy can also be used without optimizing θ m+1 as long as the new state |ψ Um+1 (θ m+1 ) is sufficiently different from the previous states. Also note that the number of parameters k m+1 may not have any particular relationship with the number of parameters of the previous steps k 1 , . . . , k m .  . This case is illustrated by the middle picture on the right-hand side. Using a hybrid quantum-classical emulation, we can operate in this larger space using only a single additional ancilla qubit and twice as many gates, as illustrated in the top picture on the right-hand side.

B. Optimization of combination parameters
We first focus on the case when we have selected a good set of |ψ U1 , . . . , |ψ Um , e.g., A −1 |b ∈ span{|ψ U1 , . . . , |ψ Um }, and we want to optimize over α 1 , . . . , α m . We will show that the optimization of α 1 , . . . , α m will always find the optimal solution. This optimization is in stark contrast to the optimization over θ i , which can result in plateaus and local minima. To simplify notation, we let |u i = |ψ Ui further on. In order to solve linear systems of equations, the standard regression loss function is as in Definition 1. Given x = m i=1 α i |u i , we can reduce the optimization in an exponentially large space x ∈ H to an optimization over m variables. Let V = (v 1 , · · · , v m ) with the column vectors v i = A|u i . We can now simply express the left-hand side of the linear system as Ax = m i=1 α j A|u i = V α. Thus, we would like to minimize where we introduced q i = i|V † |b = u i |A † |b . We obtain a simple regression problem for the combination parameters α with the kernel matrix (V † V ) ij = u i |A † A|u j . We can cast this quadratic optimization problem with complex variable α ∈ C m to a real optimization problem min z z T Qz − 2r T z + 1, by letting z = [Re {α} , Im {α}] ∈ R 2m and let Once all the input quantities Q and r are determined, such a regression problem can be solved with standard methods for convex quadratic programming. The inputs Q and r can be measured on a quantum computer using the strategies in Appendix A and C. However, such measurements result in erroneous estimates of the quantities. The error will translate into an error in the loss function and the proposed solution for the combination parameters α. Using standard results in random matrix theory, we are able to achieve a rigorous bound on the error of the obtained solution, see Proposition 1. See also Appendix C for a detailed analysis and Proposition 10 for the complete statement.
Proposition 1 (informal). We can find anα ∈ C m such that it is -close to optimal, On the left-hand side, we select a node that is a child of the nodes in the green region and that has the largest overlap with the gradient, here U2|b . The right-hand side shows the tree after adding the new element to the subspace.

Growing the Ansatz Tree
A natural question that arises is whether the problem of solving linear system will become much easier when we consider optimization over a small subspace span(|u 1 , |u 2 , . . . , |u m ). We can show that finding a near-optimal combination parameters in a subspace is BQP-complete.

Proposition 2 (informal).
Finding the combination parameters of |u 1 , |u 2 , . . . , |u m to minimize See Proposition 11 in Appendix C for the complete statement and proof.

C. Ansatz tree approach for finding the subspace
We have shown good theoretical properties in the case where the subspace is fixed, such as a guarantee for finding a near-optimal solution in the subspace and BQP-completeness. However, the results so far rely on already knowing a subspace that approximately contains the solution x. Here, we propose an approach that constructs the subspace by exploring the space of solutions on a tree structure we call the Ansatz tree. We use the structure of A in Assumption 2 to construct such a tree based on the unitaries that make up A. The core idea is to associate the nodes of this tree with the subspace states |u i from the previous section. We show that a near-optimal solution is guaranteed to be found after enough nodes are included to the Ansatz tree. While the size of the tree may be very large in the worst case, the tree allows the systematic use of heuristic approaches to explore, prune, and expand it.
The construction of the full Ansatz tree is given in Definition 5. We start with the quantum state |b and recursively construct the child nodes generated by the matrix A. An illustration is shown in Figure 4. The root of the tree is |b . Each node |ψ on the tree has K child nodes: U 1 |ψ , . . . , U K |ψ .
Most straightforwardly one can take all the nodes of the Ansatz tree up to some depth as the solution subspace. We now derive guarantees for this approach which also show that the number of required nodes may be very large. We then discuss a heuristic approach to prune the tree in the less important directions and expand the tree in the more important directions, a method we call gradient expansion heuristics. This heuristic can reduce the number of nodes included in the subspace for a good solution.
Taking the full Ansatz tree and the regression loss from Definition 1, we are guaranteed to find a near-optimal solution after enough depth, see the next proposition.
The result is an extension and variation of known results on using polynomial approximation of 1/x to solve linear systems of equations [41]. See Appendix D for a detailed proof. The required depth can be large in the worst case, especially when κ is large. Moreover, this guarantee requires a large number of nodes on the Ansatz tree for a good approximation to the solution. In particular, the number of nodes scales exponentially in the condition number, i.e., m = K O(κ log (κ/ )) A . It is hence important to find ways to reduce the number of nodes.
One possible way of reducing the number of nodes is by solving a regularized linear system of equations, using the loss function from Definition 2. Here, a polynomial number of nodes is enough to guarantee the performance of the solution even in the worst case, see the next proposition.
The depth only depends on how good the approximation is, which is characterized by . For example, when = 0.02, we only need depth at most 4 and number of nodes m ≤ K A 4 .
We sketch an argument for the log(1/ ) dependency here based on standard convex optimization results and refer to Appendix D for a careful analysis. First, using A is Hermitian, the gradient of the loss function is , and the Hessian is given by ∇ 2 L T (x) = 1 + 2A 2 . Hence we have the bounds 1 ∇ 2 L T (x) 31 for all x. Performing one gradient descent steps at the t-th iteration yields the following relation 2A|b ). The step size η can be determined by exact line search [46]. Suppose we start from x (0) = |b , then after T iterations, the solution can be expressed as a polynomial p(z) of degree T 2 of the matrix A applied to |b , i.e., x (T ) = p(A)|b , with well-defined polynomial coefficients. This polynomial directly expresses how to linearly combine the nodes in the Ansatz tree up to depth T 2 . From standard convex analysis [47], for strongly convex functions, if Hence, we have derived a required depth of O log 2 (1/ ) . Using Newton's method this depth can be improved to O (log(1/ )). As we have to include number of nodes exponential in the depth, the prefactor in the log 1/ dependency is crucial for near-term quantum computing applications. Proposition 4 shows a favorable prefactor using a more intricate proof (see Appendix D). Another line of idea for reducing the number of nodes is to investigate methods which judiciously include nodes on the tree and prune branches which are not essential to the problem. We introduce a heuristic procedure for exploring the tree which we call the gradient expansion heuristics. The key idea is to use the gradient information of the current state to expand the state space. At the start, let the subspace S contain only the root of the Ansatz tree, i.e., S = {|b }. At each step, we perform the following steps.
1. Solve for the optimal x S = |ψi ∈S α * i |ψ i by optimizing over the combination parameters α 1 , . . . , α m as discussed in Section IV B.
2. For each quantum state |ψ in the set of child nodes of the set S on the Ansatz tree, denoted as C(S), compute the gradient overlap ψ|∇L R (x S ) = 2 |ψi ∈S α * i ψ|A 2 |ψ i − 2 ψ|A|b . This can be done by estimating ψ|ψ i , ψ|A|ψ i , ψ|A 2 |ψ i for all |ψ i ∈ S. The overlaps can be computed efficiently using the Hadamard test via Proposition 7 and 9.
3. Add a new node to the subspace S, such that the node has the largest overlap with the gradient. More formally, select |ψ * = arg max |ψ ∈C(S) | ψ|∇L R (x S )| and grow the set S ← S ∪ {|ψ * }.
An illustration of this method can be found in Figure 4. This gradient expansion procedure can be justified by the following proposition (See Appendix D for proof).

Proposition 5.
If |ψ * has gradient overlap g = | ψ * |∇L R (x S )| > 0, then after expanding the subspace S ← S ∪ {|ψ * } and optimizing the combination parameters, it is guaranteed that As a result of this proposition, if we find a new quantum state |ψ * with a nonzero gradient overlap, we are guaranteed that the next state vector x S∪{|ψ * } will be better than the current vector x S . Furthermore, a larger gradient overlap guarantees a larger decrease in the loss function. Hence it is best to find a state vector that has the largest gradient overlap.
We now pinpoint limitations of the proposed Ansatz tree CQS approach. The potential problems for variational algorithms discussed in Section III D do not apply to this Ansatz tree CQS approach and the problematic linear systems could actually be solved easily. For the Tikhonov loss function, the CQS approach is also guaranteed to find a solution with near-optimal loss function in polynomial time. However, for the regression loss function, this approach is guaranteed to efficiently find the solution only when the condition number of A is Left: Comparison of breadth-first search and the gradient expansion heuristics for adding nodes on the Ansatz tree to the subspace. We consider solving linear systems with a system size of 256 × 256 where A is generated by sampling random weighted sum of Haar-random unitaries. Matrices A generated this way have a large condition number (as large as the system size), so a large number of Ansatz states are needed find the solution. Each line represents an independent run. Right: Solving linear systems over a wide range of system sizes (from 2 10 × 2 10 to 2 300 × 2 300 ). For efficient classical simulation, the linear systems are generated as random weighted sums of Pauli strings, i.e., tensor product of Pauli operators. The shaded areas represent the standard deviation over five independent runs.
bounded by a constant. Note that traditional quantum algorithms for linear system [33,41] are efficient for condition numbers that are polynomial in the number of qubits. When the condition number of A is too large, this Ansatz tree CQS approach cannot guarantee to find the optimal solution in polynomial time. The gradient expansion heuristics can ameliorate this shortcoming in practice, but, in the worst case, it may still require an exponential amount of time when κ is too large.

D. Numerical experiments
We now present numerical experiments for the CQS-based algorithm. The experiments are shown in Figure 5. In Figure 5 (left), we compare the use of gradient expansion heuristics with the use of a breadth-first search that simply includes every node on the Ansatz tree layer-by-layer. We consider randomly generated linear systems of size 256 × 256. We generate a random linear system by selecting several unitary matrices U 1 , . . . , U S from the Haar measure, random scalars α 1 , . . . , α S from uniform distribution [−2, 2] and let A = . This construction guarantees that A is Hermitian and is a weighted sum of unitary matrices. The condition number generated this way is very large (in the order of the system size). In particular, we consider S = 10 (hence A is a sum of 20 unitaries). A clear improvement can be seen when using the gradient expansion heuristics. This gradient heuristics converges quickly to the optimal point. On the other hand, a breath-first search, which layer by layer includes every node on the Ansatz tree, results in a very slow convergence after an initial rapid convergence for 20 rounds (this includes the first layer of the Ansatz tree).
In Figure 5 (right), we consider a special class of (sparse) linear systems that are extremely large. In particular, we consider system sizes ranging from 2 10 × 2 10 to 2 300 × 2 300 to investigate whether our new approach suffers from the plateau issue discussed in Section III D. To facilitate classical simulation, we consider A ∈ C 2 n ×2 n with efficient Pauli decomposition, i.e., A = j is a single-qubit Pauli operator (including the identity). In addition, we set |b = |0 n . To generate a random matrix A, we sample each α i from the uniform distribution over [−2, 2], and a random tensor product of Pauli operators from the uniform distribution over 4 n possible choices. Here, we consider S = 8. Note that when S = 1 and we only sample from I or X in the Pauli string, then we recover the toy problem that leads to the plateau issues discussed in Section III D. From Figure 5, we can see that this approach circumvents the plateau issue and has a clear convergence over an remarkably wide range of system sizes.

E. Hamiltonian CQS approach and connection to previous results
The Ansatz tree approach so far is based on the direct use of the decomposition of the matrix A, see Assumption 2. As an alternative approach, one can generate a subspace by performing Hamiltonian simulation.
Because of the use of Hamiltonian simulation, this approach is less near-term compared to using the Ansatz tree. The benefit of this approach is that it greatly reduces the size of the Ansatz subspace compared to Proposition 3.
Ref. [41] shows that the function 1/x can be well approximated by a Fourier series with near-optimal number of terms. This Fourier series can be viewed as a series for approximating 1/A and obtains a linear combination of unitaries of the form e −iAtj with pre-specified times t j . Using a truncated Taylor method for quantum simulation of this linear combination [25] obtains a quantum state proportional to the exact solution A −1 |b . Based on Ref. [41] (Lemma 11), we can choose the set of Ansatz states as follows where J = Θ κ 2 log 2 (κ/ )/ and κ is an upper-bound on the condition number of A. This set is of size 2J + 1 = Θ κ 2 log 2 (κ/ )/ and the number of gates to generate each Ansatz is about t J = O (κ log(κ/ )) (assuming each application of A takes constant number of gates).
Using the CQS strategy, we are guaranteed to find an -close solution for Ax = |b because of the Fourier approximation results of [41]. As mentioned, the single run circuit depth is O(κ log(κ/ )) and uses only one additional ancilla for performing the Hadamard test. Because we never create the solution x on a quantum computer, but emulate x quantum-classically, we avoid the need of many ancilla qubits in existing quantum algorithms based on amplitude amplification and function approximations [25,41]. We now compare this approach to the random-sampling approach taken in [33].
The approach in [33] eliminates the use of amplitude amplification and function approximation. In that work, the operators e −iH(sj )tj are used with the Hamiltonian from Definition 3. The s j are fixed via a natural parameterization and the t j are randomly sampled from an interval of size about [0, O κ 2 ] for j = 1, . . . , Θ log 2 (κ)/ . The method is able to again achieve an -close approximation. The circuit depth is about O (κ log(κ)/ ), not counting details of the Hamiltonian simulation for H(s). In comparison, the Ansatz defined by the set in Eq. (4) shows the single-run circuit depth of O (κ log(κ/ )). As an example, when = 0.01, we would achieve a roughly 100-times reduction in the circuit depth. This reduction in quantum resources comes at the cost of more classical repetitions. Hence, this Hamiltonian CQS approach fits into the general near-term strategy of trading the more expensive circuit depth for the cheaper number of runs of the experiments.

V. DISCUSSION
The work provides algorithms for solving linear systems on near-term quantum computers. The flavor of the presented algorithms is two-fold. The first set of algorithms are variational in nature and draw their inspiration from other variational quantum algorithms for quantum chemistry [13,37,48,49] and quantum optimization [16,50,51]. For such algorithms, the quantum computer implements a single wavefunction Ansatz which is dependent on a set of variational parameters, usually in a non-linear fashion. The type of Ansatz in this setting can, in the extreme cases, be linear system-independent (agnostic) or fully linear system-dependent. As the agnostic case is useful for example when limitations of the hardware are dominating the overall implementation, such Ansätze have also been called "hardware efficient" [15]. On the other hand, the dependent Ansatz takes into account the linear system at the cost of requiring Hamiltonian simulation which increases the overall complexity of implementing such an Ansatz in a near-term quantum processor. In this work, we exhibit types of linear systems for which variational approaches with a polynomial number of variational parameters show plateau issues. These plateau issues may however not arise for other types of linear systems and Ansätze with more structure and when using different loss functions. An important future work is to better characterize those linear systems where variational methods offer near-term quantum advantages.
The second set of approaches are based on classical combination of variational quantum states (CQS). The method is inspired from the basic concept of diversification and robustness. Using a single class of methods can provide only limited benefits when compared to combining multiple different methods and using the best parts of each. This method introduces a new set of combination parameters to add together different variational quantum states. The combination is emulated classically rather than represented directly on the quantum computer. Hence, the method increases the overall expressiveness and power of the Ansatz without the need of additional quantum resources. We can use the variational states such as the ones presented in the first part, and also others yet to be developed. Our CQS approach is also reminiscent of techniques used for example in quantum chemistry, where the linear combination of atomic orbitals (LCAO) approach allows to optimally construct molecular orbitals from atomic orbitals.
To avoid the complexity of optimizing the variational parameters, which can involve an ill-shaped optimization landscape with plateaus and local minima, we have proposed an approach that alternates between solving for the optimal solution in a subspace and growing the subspace on an Ansatz tree. This approach is inspired by the Krylov subspace method in solving linear systems. Krylov subspace is a subspace spanned by {b, Ab, A 2 b, . . . , A r−1 b}, which is similar to the Ansatz tree we defined. The Krylov subspace method solves for the optimal solution in the subspace and increases r if the obtained solution is not good enough. A popular variant of Krylov subspace method for linear systems is the conjugate gradient method [52]. The Ansatz tree is also reminiscent of the coupled-cluster Ansatz in quantum chemistry [53][54][55][56], which systematically takes into account higher and higher orders of the electron correlation at the cost of increasing the complexity of preparing the Ansatz.
We have performed numerical experiments solving exponentially large linear systems with sizes up to 2 300 × 2 300 . These experiments are achieved by considering a special class of linear systems that allows efficient simulation of the proposed quantum algorithm on a classical computer. To achieve actual quantum advantage, we require either A to be a sum of unitaries that cannot be simulated efficiently on a classical computer or b to be a quantum state generated by some quantum circuit. It should be noted that there will always be a trade-off between how near-term the quantum algorithm is (the required quantum coherence, entanglement, and interference) and how much quantum advantage we can expect from executing the quantum algorithm. An important future direction would be a detailed analysis on the performance of the proposed algorithms under the effect of decoherence and imperfections of real-world quantum devices. We believe the synthesis and future improvement of the proposed ideas can provide real benefits for solving linear systems when quantum computers achieve sizes of 50-70 high-quality qubits. Lemma 1. Let > 0 and P k be a certain Pauli string over n qubits. Let multiple copies of an arbitrary n-qubit quantum state |ψ be given. The expectation value ψ|P k |ψ can be determined to additive accuracy with failure probability at most δ using O 1 2 log( 1 δ ) copies of |ψ . Proof. A single measurement obtains the outcome m ± = ±1. We have ψ|P k |ψ = pm + + (1 − p)m − = 2p − 1, where p is the probability of measuring +1. To estimate this probability, perform independent trials of the Bernoulli test. Each trial has expectation value p. We use the statistic M + /M , where M + is the number of positive outcomes over M trials. For the error estimate, we require P [|2M + /M − 1 − ψ|P k |ψ | ≥ ] ≤ δ from which the number of measurements is O 1 2 log( 1 δ ) via Hoeffding's inequality. Proposition 6 (Swap test). Given multiple copies of n-qubit quantum states |u and |v . There is a quantum algorithm that determines the overlap | v|u | 2 to additive accuracy with failure probability at most δ using O 1 2 log( 1 δ ) copies andÕ 1 2 log( 1 δ ) operations. Proof. Use an ancilla and perform a controlled swap 1 √ 2 (|0 + |1 ) |u |v → 1 √ 2 (|0 |u |v + |1 |v |u ). Performing a Hadamard on the ancilla obtains 1 2 (|0 (|u |v + |v |u ) + |1 (|u |v − |v |u )) =: |ξ . Now measure the ancilla in Z.
The expectation value is ξ|Z|ξ = 1 4 ( u| v| + v| u|) (|u |v + |v |u ) − We can also measure the real and imaginary part separately under a different input model.
Proof. Use an ancilla prepared in (|0 + α|1 )/ √ 2 with α = 1 or α = i. Apply U prep to obtain 1 √ 2 (|0 |v 0 + α|1 |v 1 ). Another Hadamard on the ancilla obtains 1 2 (|0 (|v 0 + α|v 1 ) + |1 (|v 0 − α|v 1 )) =: |ξ . Now measure the ancilla in Z. The expectation value is ξ|Z|ξ = 1 2 (α v 0 |v 1 + α * v 1 |v 0 ). If α = 1, then Proposition 8 (Measuring the Hamiltonian Loss Function). Given Assumptions 2 on the unitary decomposition of A and multiple copies of the quantum state |x . The loss function x|A 2 |x − x|A|b b|A|x can be estimated efficiently on a quantum computer. More precisely, an estimate up to additive error can be obtained with Proof. From Definition 2, define the size parameter η = k,l |β k β l |. The first term of the loss function is A 2 = k l β k β l U k U l . Measure each term x|U k U l |x individually using the Hadamard test in Proposition 7 with |v 0 = |x and |v 1 = U k U l |x . For each term, perform O |β k β l |η 2 log K A δ quantum measurements to produce an estimate x|U k U l |x of x|U k U l |x up to additive accuracy / η|β k β l | with success probability . Thus the additive accuracy for the estimate x|A 2 |x of x|A 2 |x is x|A 2 |x − x|A 2 |x ≤ , which is obtained by adding up the variances of each independent estimate scaled by the coefficients |β k β l | 2 and taking the square root. The total success probability is 1 − δ For the second term, with A = k β k U k estimate b|A|x = k β k b|U k |x . Use the Hadamard test in Proposition 7 with |v 0 = |b and |v 1 = U k |x to estimate each term b|U k |x . For each term, perform  Proof. Use an ancilla prepared in (|0 + α|1 ) / √ 2 with α = 1 or α = i.
Another Hadamard on the ancilla gives |ξ = 1 2 (I ⊗ U ) (|0 (|v 0 + α|v 1 ) + |1 (|v 0 − α|v 1 )). Now measure the ancilla in Z to obtain z a ∈ {±1} and the rest of the state in the computational basis to obtain b ∈ {0, 1} n . Then compute z a f (b). The expectation value Consider the case where A ∈ C 2 n ×2 n is Hermitian and unitary (equivalently, all eigenvalue of A are 1 or −1) and |b = U b |0 n . And let |x(θ) be some pre-specified Ansatz with variational parameter θ. We now analyze the local loss function When U b is a random quantum circuit consisting of O n 2 1D nearest neighbor two-qubit gates, [57] shows that for any quantum state |x(θ) , the two-design property of the random unitary U † b gives where S is the swap operator and · 1 is the trace norm.
where Z i is the Pauli Z operator on the i-th qubit. Hence by Markov's inequality, For any linear operator P, Q and Q , we have Combining Inequality (B1) and (B3), we have Now putting this result back to the Markov's inequality in (B2), we have Using union bound and the definition of the local loss function, we have with probability at least 1 − n/2 n/2 . Hence, for any θ 1 , . . . , θ M (there can be an exponential number of them, i.e., M ≤ 2 n/2 50n ), with probability at least 0.98. This means that even if we randomly select a very large number of variational parameters, the local loss function will still be exponentially close to the value 1/2 for all of them. Now, consider a local expansion of L L (θ) around θ 0 , We analyze the partial derivative of the loss function in θ k . We consider the random variable , and define u = |x(θ 0 ) and v = ∂ ∂θ k |x(θ 0 ) . We assume that v = ∂ ∂θ k |x(θ 0 ) ≤ C, for some constant C, which is true when the variational parameters are single-qubit rotation angles. The random variable g k could be written as We first note that the normalization condition of quantum states x(θ)|x(θ) = 1 implies that ∂ ∂θ k x(θ)||x(θ) = u † v + v † u = 0. Using the results in [57], the two-design property of the random unitary U b gives Now we bound the second moment of the random variable g k , The first line uses Equation (B4), and the second line uses Inequality (B3) and (B5). The third line uses the fact that u † v + v † u = 0, while the last line uses the fact that v ≤ C. Again by Markov's inequality, we have Thus given m 2 n/2 , union bound on the above result gives with high probability (more precisely, with probability ≥ 1 − 2m/2 n/2 ). This means that the gradient would be exponentially small and the optimization landscape is locally flat around θ 0 . From the analysis, we can see that the local loss function L L (θ) will plateau at the value 1/2 with an exponentially small deviation at each point and an exponentially small gradient.
if a total number of measurements of O B 2 K 2 A m 3 Q Q −1 2 (1 + z * ) 2 / is used. A m 2 T quantum measurements, we obtain an independent estimate for all the entries in Q, r. Standard results in random matrix theory [58] give Q − Q ≤ O m/T and r − r ≤ O m/T with high probability.
Let z * = Q −1 r be the solution to the original problem. The solutionẑ =Q −1r minimizesL(z) = z TQ z−2r T z.
The last inequality requires setting T > Cm Q Q −1 2 (1 + z * ) 2 / , with a constant C, such that 1 − The total number of quantum measurements is The following is a detailed statement on the BQP-completeness for optimizing combination coefficients.
Proposition 11. Given a Hermitian matrix A that have an efficient unitary decomposition and each entry can be efficiently computed classically, a quantum circuit that generates |b , and two quantum circuits for |u 1 , |u 2 . It is BQP-complete to outputα 1 ,α 2 ∈ C, such that Proof. Consider a quantum circuit consisting of T single-and two-qubit gates U 1 , . . . , U T acting on n qubits. Determining the probability of measuring 0 in the first qubit after applying U 1 , . . . , U T on |0 n , P 0 ≡ 0 n |U † 1 . . . U † T (|0 0| ⊗ 1 ⊗ . . . ⊗ 1)U T . . . U 1 |0 n , up to small error is BQP-complete. The same is true for the probability of measuring 1, P 1 = 0 n |U † 1 . . . U † T (|1 1| ⊗ 1 ⊗ . . . ⊗ 1)U T . . . U 1 |0 n . We now construct a linear system of dimension 2 n+1 , involving 1 + n qubits where in the notation the first qubit is now the added one. Let the matrix A be a simple controlled-NOT gate, controlled by the second qubit and acting on the first qubit. Note that A is both Hermitian and unitary, and each entry can be efficiently computed classically. The three quantum states are given by We can easily see that b|A|u 1 = P 0 . Similarly, b|A|u 2 = P 1 . Suppose there is an algorithm that can efficiently findα 1 ,α 2 ∈ C that satisfies Equation (C1). We now show thatα 1 ,α 2 can be used to infer P 0 , P 1 . By expansion, we have = |α 1 | 2 + |α 2 | 2 − 2Re {α 1 b|A|u 1 + α 2 b|A|u 2 } + 1.
This means the algorithm can useα 1 ,α 2 to determine P 0 and P 1 , which is BQP-complete.
We provide proof for the following propositions. This is a simple extension and variation of known results on using polynomial approximation of 1/x to solve linear systems of equations [41].
Proof. By including all nodes {|u 1 , . . . , |u m } on the Ansatz tree with depth at most O(κ log(κ/ )), the subspace contains p(A)b for any polynomial p(·) with degree at most O(κ log(κ/ )). In Lemma 14 in [41], it was shown that there exists a set of constants p j , ∀j = 0, . . . , j 0 such that p(z) = The last equality uses the fact that x = A −1 b satisfy Ax − b 2 = 0. Note that we actually achieve 2 error, which is better than since < 1. Proof. We first diagonalize A to be V DV † , where D is diagonal and V is a unitary matrix. We also set N = 2 n to be the system size. The eigenvalues of A are denoted as λ i , ∀i = 1, . . . , N . We setb = V † |b and note that N i=1 |b i | 2 = 1, as |b is normalized. We consider a rotated x,x = V † x ∈ C N . Usingx, the loss function We can minimize this expression analytically as x i = 2λ ibi 2λ 2 i + 1 ∈ C, ∀i = 1, . . . , N.
Plugging this optimal solution into the loss function yields where y i = 2λ i /(2λ 2 i + 1) ∈ R. We now consider the space of all linear combinations of A k |b , ∀k = 0, . . . , K 0 , which is a subspace of span(u 1 , . . . , u m ). This space is written as K0 k=0 p k A k |b . In this subspace, the loss function can be written as We now analyze how accurate a polynomial k p k x k can approximate 2x/(2x 2 + 1) within  Proof. Consider the loss function on x S + α|ψ * , where α ∈ C. Because x S is a linear combination of |ψ i ∈ S and L R (x) = Ax − b 2 2 , we know that the optimal combination of S ∪ {|ψ * } satisfies min α1,...,αm+1∈C for any α ∈ C. By expanding L R (x S + α|ψ * ), we have (α denotes the complex conjugated α) L R (x S + α|ψ * ) = L R (x S ) + |α| 2 ψ * |A † A|ψ * + Re α ψ * |∇L R (x S ) .
By selecting α = − ψ * |∇L R (x S )/2 ψ * |A † A|ψ * , we have L R (x S + α|ψ * ) = L R (x S ) − | ψ * |∇L R (x S )| 2 4 ψ * |A † A|ψ * ≤ min The last inequality uses the assumption that the spectral radius of A is no greater than 1. The mean accuracy versus circuit layer depth for various Ansätze employed to solve linear systems for N = 16 is shown. The mean accuracy goes to unity for line graph, ring graph and complete graph Ansätze with a 20-layer circuit. The star-graph Ansatz performs performs worst and eventually levels at accuracy 0.6. Right: We implement the adiabatic-assisted VQE (AAVQE) approach for N = 8 and see improvement over standard VQE (adiabatic steps = 1) over an average of all Ansätze and number of layers. Employing more adiabatic steps also improves the mean accuracy. For example, the mean accuracy for the five and nine layered star Ansätze improves to 1.0 as we increase the number of adiabatic steps from 1 to 6. The dots represent the mean accuracy and the bars represent the spread corresponding to a standard deviation with the upper and lower cutoffs as 1 and 0 respectively.