Entanglement Diagnostics for Efficient Quantum Computation

We consider information spreading measures in randomly initialized variational quantum circuits and introduce entanglement diagnostics for efficient variational quantum/classical computations. We establish a robust connection between entanglement measures and optimization accuracy by solving two eigensolver problems for Ising Hamiltonians with nearest-neighbor and long-range spin interactions. As the circuit depth affects the average entanglement of random circuit states, the entanglement diagnostics can identify a high-performing depth range for optimization tasks encoded in local Hamiltonians. We argue, based on an eigensolver problem for the Sachdev-Ye-Kitaev model, that entanglement alone is insufficient as a diagnostic for the approximation of volume-law entangled target states and that a large number of circuit parameters is needed for such an optimization task.


I. INTRODUCTION
Noisy Intermediate-Scale Quantum (NISQ) technology is being developed rapidly, which poses the great challenge of designing efficient quantum algorithms [1] that run on NISQ computers and outperform classical algorithms. Many real-world use cases are associated with machine learning and optimization, for which variational quantum circuits offer an appropriate framework. Typical optimization tasks can be formulated as a search for the ground state of a Hamiltonian H, which may exactly encode a combinatorial problem [2,3].
The variational quantum algorithms (VQA) consist of two elements [4]. The first part is quantum: one constructs a parameterized quantum circuit composed of L unitary layers acting on the n-qubit product state |0⟩^{⊗n}. The layer unitaries and the quantum gates therein depend on continuous parameters, each initialized with the uniform measure on [0, 2π). Denoting all the circuit parameters collectively by θ, the variational state is written as

$$ |\psi_c(\theta)\rangle = U(\theta_L) \cdots U(\theta_2)\, U(\theta_1)\, |0\rangle^{\otimes n}. \tag{1} $$

The second part of the variational quantum algorithm is classical: we estimate the Hamiltonian expectation value in the variational circuit state,

$$ E(\theta) = \langle \psi_c(\theta) | H | \psi_c(\theta) \rangle, \tag{2} $$

and minimize it in the nL-dimensional parameter space using the gradient descent method. Entanglement encodes information in the qubit correlations, which are generated by the successive application of the circuit layers. Given two complementary subsystems A and B, the Renyi-k entropy of the reduced density matrix ρ_A,

$$ R^k_A(\rho_A) = \frac{1}{1-k} \log \mathrm{Tr}\left(\rho_A^k\right), \tag{3} $$

measures their entanglement, as does the von Neumann entropy, which corresponds to (3) in the limit k → 1:

$$ S_{EE}(\rho_A) = -\mathrm{Tr}\left(\rho_A \log \rho_A\right). \tag{4} $$

Figure 1. The circuit architecture used in Sections III and IV. (a) The horizontal axis can be interpreted as the discrete time L. We call a set of simultaneously applied, commuting 2-qubit gates a circuit layer. (b) Each gate consists of the single-qubit Pauli-y rotations (19) followed by the CZ operation (20).
The reduced density matrix ρ_A is obtained from the full circuit density matrix ρ_c(θ) = |ψ_c(θ)⟩⟨ψ_c(θ)| by taking the partial trace over subsystem B.
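As a concrete illustration of (3) and (4), the following pure-Python sketch forms ρ_A from a state vector, diagonalizes it with cyclic Jacobi rotations, and evaluates the Renyi-k and von Neumann entropies in bits. It assumes real amplitudes (Pauli-y rotations and CZ gates are real matrices, so the circuit of Figure 1 never produces complex amplitudes) and a qubit ordering in which qubit 0 is the most significant bit. The function names are hypothetical and not taken from the authors' code.

```python
import math


def reduced_density_matrix(psi, n, n_a):
    """rho_A = Tr_B |psi><psi| for the first n_a qubits of a real n-qubit state.

    Assumed index convention: qubit 0 is the most significant bit, so the
    amplitude index is a * 2**(n - n_a) + b with a labelling A and b labelling B.
    """
    d_a, d_b = 2 ** n_a, 2 ** (n - n_a)
    rows = [psi[a * d_b:(a + 1) * d_b] for a in range(d_a)]  # psi as a d_a x d_b matrix M
    # rho_A = M M^T; no complex conjugation is needed because the amplitudes are real
    return [[sum(x * y for x, y in zip(rows[i], rows[j])) for j in range(d_a)]
            for i in range(d_a)]


def symmetric_eigenvalues(a, max_sweeps=100, tol=1e-14):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    d = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
        if off < tol:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if a[p][q] == 0.0:
                    continue
                # rotation angle that annihilates the (p, q) entry
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(d):  # A <- A G: mix columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(d):  # A <- G^T A: mix rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(d)]


def renyi_entropy(eigenvalues, k):
    """Renyi-k entropy in bits; k = 1 returns the von Neumann entropy."""
    probs = [p for p in eigenvalues if p > 1e-15]  # drop numerical zeros
    if k == 1:
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** k for p in probs)) / (1 - k)


# usage: a partially entangled 2-qubit state (|00> + |01> + |10>) / sqrt(3)
psi = [3 ** -0.5, 3 ** -0.5, 3 ** -0.5, 0.0]
spectrum = symmetric_eigenvalues(reduced_density_matrix(psi, n=2, n_a=1))
print(renyi_entropy(spectrum, 1), renyi_entropy(spectrum, 2))
```

For n = 12 and the equal bipartition, ρ_A is a 64 × 64 matrix, which this sketch still handles; a numerical library's eigensolver is the practical choice for larger n.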
The performance of the variational quantum algorithm depends largely on whether the quantum circuit can prepare an initial variational state |ψ_c(θ)⟩ that is close to the target ground state |ψ_g⟩ of the Hamiltonian. In this paper we argue that the average entanglement entropy (3) or (4) of random circuit states provides a distance measure that can quantify a successful minimization of the energy function. Throughout, we compute these entropies for the equal bipartition n_A = n_B = n/2 using the binary logarithm.
The evolution of the entanglement entropies as a function of the circuit depth L is schematically drawn in Figure 2. It is convenient to divide the range of L into three regions A, B, and C.¹ In A the entanglement entropy continues to grow, while in C it has saturated to a constant value. As for their scaling behavior in n, the random circuit states in A and C obey area-law and volume-law scaling of the entanglement entropies, respectively. As we will show, the circuits in A lead to efficient VQA optimization, in contrast to those in C. We also identify B as a transition region between A and C, where the entanglement entropy has already saturated, yet the random initial parameters can determine the success or failure of the VQA optimization.
The technical reason why the circuit optimization fails in region C is the vanishing gradient problem. When the circuit distribution forms an approximate 2-design, i.e., its first and second moments are indistinguishable from those of the Haar distribution, the energy gradient at randomly initialized parameters deviates from zero only with a probability that decays exponentially in n [5-8]. This happens for the circuit ensembles in B and C, where the Renyi-2 entropy, a diagnostic of the quantum 2-design, is closest to the value near n_A = n/2 that corresponds to the Haar ensemble.
So far we have assumed that the entanglement entropy of the target state follows area-law scaling, as in gapped local one-dimensional systems [9]. This does not always hold, and when it fails, a variational circuit in A cannot minimize the circuit energy (2) down to the ground level. For the Sachdev-Ye-Kitaev (SYK) model [10-12], whose ground state exhibits volume-law entanglement [13], the optimization fails regardless of whether the variational circuit belongs to A, B, or C. We further argue that a higher-dimensional parameter space can assist the circuit optimization even at a high level of circuit state entanglement, so that over-parameterized circuits can offer high-precision approximations of volume-law entangled target states, including the SYK ground state [14].
The rest of this paper is organized as follows. Section II motivates the entanglement diagnostics as an initialization condition that places variational states close to the target. Section III studies the average entanglement growth of circuit states as a function of the circuit depth. Section IV examines the importance of the entanglement diagnostics in the local gradient search for optimal circuit parameters. Section V checks the validity of the entanglement diagnostics by testing them against different circuit architectures and also discusses the impact of shrinking the circuit parameter dimension. The paper concludes with a discussion and outlook in Section VI. Additional details are given in the appendices.

II. ENTANGLEMENT DIAGNOSTICS
Using the density matrix of the quantum circuit ρ_c(θ), the expectation value of the Hamiltonian (2) reads

$$ E(\theta) = \mathrm{Tr}\left[\rho_c(\theta)\, H\right]. \tag{5} $$

Our optimization task is to get as close as possible to the ground state of the Hamiltonian by minimizing (5). This can be achieved by repeatedly evaluating the density matrix ρ_c(θ) and updating the parameters via the gradient descent (26), which finally stops at θ = θ_f. We would like to reach final parameters θ_f such that

$$ \Delta E \equiv \mathrm{Tr}\left[\rho_c(\theta_f)\, H\right] - \mathrm{Tr}\left[\rho_g\, H\right] \approx 0, \tag{6} $$

where ρ_g is the exact ground state of the Hamiltonian.
A simple upper bound on this approximation error ΔE follows from Hölder's inequality, the generalization of the Cauchy-Schwarz inequality to Schatten norms:

$$ \Delta E = \left| \mathrm{Tr}\left[(\rho_c(\theta_f) - \rho_g)\, H\right] \right| \le \left\| \rho_c(\theta_f) - \rho_g \right\|_1 \, \| H \|_\infty, \tag{7} $$

where the trace norm ‖O‖_1 is the sum of the singular values of an operator O, i.e., of the eigenvalues of (O†O)^{1/2}, and ‖H‖_∞ is the operator norm (largest singular value) of H. A natural condition for efficiently reducing ΔE is to prepare an initial circuit state ρ_c(θ_in) close to the ground state, with a small trace distance ‖ρ_c(θ_in) − ρ_g‖_1. However, we face two issues. First, we generally do not know the ground state and are thus unable to estimate the trace distance ‖ρ_c(θ_in) − ρ_g‖_1. Second, the trace distance can be very sensitive to tiny changes of quantum states, so the above condition is often over-restrictive and discards most reasonable initial states.
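The second issue can be illustrated with a toy example of our own (not from the paper): rotate each of n qubits by a small angle ε away from |0⟩. For pure states the trace norm is 2√(1 − |⟨ψ|φ⟩|²), and the overlap factorizes over qubits, so the trace distance approaches its maximum value 2 as n grows, while the change in ⟨Z_i⟩ per qubit stays at 1 − cos ε. The sketch assumes R_y(ε) = exp(−iεY/2); the function names are hypothetical.

```python
import math


def pure_state_trace_norm(overlap):
    """|| |psi><psi| - |phi><phi| ||_1 = 2 sqrt(1 - |<psi|phi>|^2) for pure states."""
    return 2.0 * math.sqrt(max(0.0, 1.0 - overlap ** 2))


def product_state_distance(n, eps):
    """Trace norm between |0>^n and (R_y(eps)|0>)^n; <0|R_y(eps)|0> = cos(eps/2) per qubit."""
    return pure_state_trace_norm(math.cos(eps / 2) ** n)


def energy_shift_per_site(eps):
    """Change of <Z_i> per qubit, 1 - cos(eps), e.g. for an assumed H = -sum_i Z_i."""
    return 1.0 - math.cos(eps)


for n in (1, 10, 100, 1000):
    print(n, round(product_state_distance(n, 0.1), 3), round(energy_shift_per_site(0.1), 4))
```

The local observable barely moves, while the trace distance saturates, which is why the trace distance alone is an over-restrictive proximity criterion.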
Instead, we relax the condition by using the entanglement entropy of an initial circuit state as a proxy for the distance between ρ_c(θ_in) and ρ_g, from which one can anticipate the success or failure of the circuit optimization. This can be motivated as follows. The continuity inequalities for the von Neumann and Renyi-k entropies [15-18] bound the entropy difference of two quantum states ρ_A and σ_A by a function of their trace distance ‖ρ_A − σ_A‖_1 that vanishes as the trace distance goes to zero. They show that being close in quantum entropies is necessary for two states to be close in trace distance. Suppose now that ρ_A and σ_A are the density matrices of a subsystem A, which together with a complementary part B constitutes the entire n-qubit system, i.e., ρ_A ≡ Tr_B(ρ) and σ_A ≡ Tr_B(σ). Monotonicity of the trace distance under the partial trace,

$$ \| \rho_A - \sigma_A \|_1 \le \| \rho - \sigma \|_1, $$

implies in turn that, for the trace distance between two quantum states ρ and σ to be small, the difference in their entanglement entropies must necessarily be small. Hence, the entanglement diagnostics of initial circuit states can be considered a weaker version of the proximity measure. We usually cannot estimate the trace distance from the exact ground state ρ_g because ρ_g is unknown. However, we expect the ground states of gapped local Hamiltonians to be far from typical quantum states σ, whose reduced density matrices σ_A are approximately maximally mixed [19]. We therefore require the trace distance between the reduced circuit state ρ_A and the maximally mixed state to be large. This requires non-maximal entanglement entropies of the circuit states, i.e., entropies that do not scale with the subsystem size n_A [20]. This is made precise by the following proposition.

(i) The trace distance between the reduced density matrix ρ_A and the maximally mixed state 2^{−n_A} I_{n_A} satisfies

$$ \left\| \rho_A - 2^{-n_A} I_{n_A} \right\|_1 \le \sqrt{2 \ln 2 \, \big( n_A - S_{EE}(\rho_A) \big)}, \tag{12} $$

together with an analogous bound in terms of R^k_A(ρ_A), where S_EE(ρ_A) and R^k_A(ρ_A) are the von Neumann and Renyi-k entropies of the reduced state ρ_A, respectively.

(ii) In the large-size limit n_A ≫ 1 of subsystem A, the following lower bound holds asymptotically:

$$ \tfrac{1}{2} \left\| \rho_A - 2^{-n_A} I_{n_A} \right\|_1 \gtrsim 1 - \frac{S_{EE}(\rho_A)}{n_A}. \tag{14} $$

Proof. (i) We start from Pinsker's inequality,

$$ D(\rho \,\|\, \sigma) \ge \frac{\| \rho - \sigma \|_1^2}{2 \ln 2}, \tag{15} $$

relating the trace distance between two states ρ and σ to their relative entropy D(ρ‖σ) = Tr[ρ(log ρ − log σ)]. Plugging in ρ = ρ_A and σ = 2^{−n_A} I_{n_A} gives D(ρ_A‖2^{−n_A} I_{n_A}) = n_A − S_EE(ρ_A), such that (15) becomes (12). The Renyi-k counterpart follows from the continuity bound of the Tsallis-k entropy [16] combined with the results of [17].

(ii) The continuity bound of the von Neumann entropy [15-18] reads

$$ \left| S_{EE}(\rho_A) - S_{EE}(\sigma_A) \right| \le T \log\left(2^{n_A} - 1\right) + H(T), \qquad T \equiv \tfrac{1}{2} \| \rho_A - \sigma_A \|_1, \tag{18} $$

where H(t) ≡ −t log t − (1 − t) log(1 − t). Substituting σ_A = 2^{−n_A} I_{n_A}, the LHS of (18) becomes n_A − S_EE(ρ_A), while the RHS is at most T n_A + 1. Taking the large system size limit n_A ≫ 1 then leads to the asymptotic inequality (14).
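The bounds behind this argument can be checked numerically. Because the maximally mixed state commutes with ρ_A, the trace norm ‖ρ_A − 2^{−n_A} I‖_1 = Σ_i |p_i − 2^{−n_A}| depends only on the spectrum {p_i} of ρ_A. The sketch below samples random spectra (a hypothetical test distribution, not circuit data) and checks Pinsker's bound ‖ρ_A − 2^{−n_A} I‖_1 ≤ √(2 ln 2 (n_A − S_EE)) and the von Neumann continuity bound n_A − S_EE ≤ T log₂(2^{n_A} − 1) + H(T) with T = ½‖ρ_A − 2^{−n_A} I‖_1. Since log₂(2^{n_A} − 1) < n_A and H(T) ≤ 1, the latter also yields the finite-size inequality T ≥ 1 − (S_EE + 1)/n_A, which reduces to the asymptotic bound (14) for n_A ≫ 1.

```python
import math
import random


def binary_entropy(t):
    """H(t) = -t log2 t - (1 - t) log2(1 - t), with H(0) = H(1) = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)


def entropy_bounds(spectrum):
    """Check Pinsker and continuity bounds for rho_A (given by its spectrum) against I / d.

    Returns (pinsker_ok, continuity_ok, T, finite-size lower bound on T).
    """
    d = len(spectrum)
    n_a = math.log2(d)
    s_ee = -sum(p * math.log2(p) for p in spectrum if p > 0)
    trace_norm = sum(abs(p - 1.0 / d) for p in spectrum)  # ||rho_A - I/d||_1
    deficit = n_a - s_ee  # relative entropy D(rho_A || I/d) in bits
    pinsker_ok = trace_norm <= math.sqrt(2 * math.log(2) * max(deficit, 0.0)) + 1e-12
    t = 0.5 * trace_norm
    continuity_ok = deficit <= t * math.log2(d - 1) + binary_entropy(t) + 1e-12
    lower = 1.0 - (s_ee + 1.0) / n_a  # uses log2(d - 1) < n_a and H(t) <= 1
    return pinsker_ok, continuity_ok, t, lower


rng = random.Random(0)
for n_a in (2, 4, 6):
    weights = [rng.random() ** 4 for _ in range(2 ** n_a)]
    total = sum(weights)
    print(n_a, entropy_bounds([w / total for w in weights]))
```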
We stress that the entanglement diagnostic for circuit states is only a necessary condition for the initial and target states to be close. Remarkably, as we will see in Section IV, gradient-based optimization indeed works efficiently for variational circuits whose average entanglement entropy scales slower than the volume law. In terms of circuit depth, this suggests avoiding intermediate-depth and high-depth circuits, corresponding respectively to B and C in Figure 2, and favoring circuits with fewer layers that belong to A. We estimate the critical depth L_s that divides A from B/C in the following Section III.

III. RANDOM QUANTUM CIRCUIT
In this section, we study the growth of entanglement entropy for circuit states generated by a random circuit evolution of the initial product state |0⟩^{⊗n}. Figure 1 shows the quantum circuit architecture used in this paper. It defines a (1 + 1)-dimensional discrete quantum system, where the n qubits along the vertical axis represent space and the L layers along the horizontal axis span time. Qubit labels are identified modulo n, i.e., i ≅ i + n, which imposes a periodic boundary condition along the spatial direction. At each time step, the wavefunction evolves by a chain of 2-qubit unitary gates acting alternately on all neighboring odd-even and even-odd qubit pairs. Each 2-qubit gate is made of independent Pauli-y rotations acting on single qubits, followed by the controlled-Z operation, which generically creates pairwise entanglement. We collectively denote all rotation angles by θ and use θ_{l,i} for the angle that rotates the i-th qubit in the l-th layer, where 1 ≤ i ≤ n and 1 ≤ l ≤ L. These variables are drawn from U(0, 2π), the uniform distribution on [0, 2π).
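A minimal state-vector sketch of this architecture follows. The layer with index l (counting from 0) applies R_y rotations and a CZ to the pairs (0,1), (2,3), ... when l is even and to (1,2), ..., (n−1,0) when l is odd, so each layer consumes exactly n angles, one per qubit, for nL parameters in total. We assume R_y(θ) = exp(−iθY/2) for the Pauli-y rotation (19), and we obtain the half-chain Renyi-2 entropy from the purity Tr ρ_A², which avoids any diagonalization. The pairing order, bit ordering, and function names are our assumptions.

```python
import math
import random


def apply_ry(psi, n, qubit, theta):
    """In place: apply the assumed R_y(theta) = exp(-i theta Y / 2) to `qubit`."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    bit = 1 << (n - 1 - qubit)  # qubit 0 is the most significant bit
    for idx in range(len(psi)):
        if not idx & bit:
            a0, a1 = psi[idx], psi[idx | bit]
            psi[idx], psi[idx | bit] = c * a0 - s * a1, s * a0 + c * a1


def apply_cz(psi, n, q1, q2):
    """In place: flip the sign of amplitudes in which both qubits are 1."""
    b1, b2 = 1 << (n - 1 - q1), 1 << (n - 1 - q2)
    for idx in range(len(psi)):
        if idx & b1 and idx & b2:
            psi[idx] = -psi[idx]


def brickwork_state(n, thetas):
    """|psi_c(theta)> for thetas[l][i] = angle on qubit i in layer l (n even, periodic)."""
    psi = [0.0] * 2 ** n
    psi[0] = 1.0  # |0...0>
    for l, layer in enumerate(thetas):
        offset = l % 2  # alternate between the two pairings
        for j in range(n // 2):
            q1, q2 = (2 * j + offset) % n, (2 * j + 1 + offset) % n
            apply_ry(psi, n, q1, layer[q1])
            apply_ry(psi, n, q2, layer[q2])
            apply_cz(psi, n, q1, q2)
    return psi


def renyi2_half_chain(psi, n):
    """R^2_A = -log2 Tr(rho_A^2) for A = the first n/2 qubits (real amplitudes)."""
    d_b = 2 ** (n - n // 2)
    rows = [psi[a * d_b:(a + 1) * d_b] for a in range(2 ** (n // 2))]
    purity = sum(sum(x * y for x, y in zip(r1, r2)) ** 2 for r1 in rows for r2 in rows)
    return -math.log2(purity)


def average_renyi2(n, depth, samples=10, seed=0):
    """Sample average of R^2_A over circuits with angles drawn from U(0, 2 pi)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        thetas = [[rng.uniform(0, 2 * math.pi) for _ in range(n)] for _ in range(depth)]
        total += renyi2_half_chain(brickwork_state(n, thetas), n)
    return total / samples


for depth in (0, 2, 4, 8, 16, 32):
    print(depth, round(average_renyi2(8, depth, samples=5), 3))
```

Scanning L in this way should show the qualitative pattern of early growth followed by a plateau; the pure-Python state vector limits the sketch to small n.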

A. Linearity of the Initial Entanglement Growth
Let us consider the evolution of the n-qubit state under the random circuit unitaries of Figure 1, as a function of the number of layers L. We measure the average growth of the bipartite entanglement of random circuit states by decomposing the n qubits into two equal-size subsystems, n A = n B = n/2, and calculating the sample statistics of various Renyi entropies for different n and L.
Figures 3a and 3b show the von Neumann and Renyi-2 entropies averaged over 50 random circuit states for different numbers of qubits n. Figure 4 compares the Renyi entropies of different orders averaged over 50 random circuit states with n = 12 and 20 qubits. All curves exhibit linear growth of the entanglement entropies at early times. The growth then slows down, and the curves eventually reach plateaus. See Figures 19-21 in the appendix for the growth curves of several other entanglement quantities for different system sizes n.
The early linear growth of the entanglement entropy,

$$ R^k_A(L) \simeq v_k L, \tag{21} $$

is a characteristic feature of global quench dynamics [21], which in our case is driven by the successive application of the layer unitaries U(θ_l) to the n-qubit product state |0⟩^{⊗n}. The coefficient v_k is known as the entanglement velocity and generally depends on k. We determine v_k by linear regression of the early-time entropies over the range 0 ≤ L ≤ n/2. The estimated values of v_k for different n and k are summarized in the third columns of Tables I and II. We find that v_k is independent of n up to minor fluctuations, which identifies (21) with the area-law entanglement of the early-time circuit states. Furthermore, v_k decreases as the order k of the Renyi entropy increases, i.e., v_{k_1} > v_{k_2} for k_1 < k_2.
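The velocity fit can be sketched as a least-squares slope through the origin, matching the form of (21). Whether the authors' regression also includes an intercept is not stated, so this choice is an assumption; the synthetic curve below is illustrative data, not the paper's.

```python
def entanglement_velocity(depths, entropies, n):
    """Least-squares slope v_k of R(L) ~ v_k * L on 0 <= L <= n/2, fitted through the origin."""
    window = [(L, r) for L, r in zip(depths, entropies) if 0 <= L <= n / 2]
    return sum(L * r for L, r in window) / sum(L * L for L, _ in window)


# usage with a synthetic curve that grows linearly and then saturates
depths = list(range(21))
entropies = [min(0.8 * L, 5.0) for L in depths]
print(entanglement_velocity(depths, entropies, n=12))
```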
On the other end, at late times, the Renyi-k entropy saturates to a constant r_{n,k} for any n and k. We compute the saturated value of R^k_A by averaging it over the range 200 ≤ L ≤ 250 and record it in the fifth columns of Tables I and II. The resulting constants r_{n,k} exhibit the following simple dependence on n_A = n/2:

$$ r_{n,k} \simeq n_A - c_k = \frac{n}{2} - c_k, \tag{22} $$

implying volume-law entanglement of the late-time circuit states [22]. We also find that the saturated value r_{n,k} declines monotonically as the entropy order k increases, so the shift constant c_k > 0 grows with k.
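As a reference point for (22), not a reproduction of the fitted values in Tables I and II, the Haar average of the purity of a pure state on d_A × d_B dimensions is E[Tr ρ_A²] = (d_A + d_B)/(d_A d_B + 1). The sketch below turns it into an annealed estimate −log₂ E[Tr ρ_A²] of the saturated Renyi-2 entropy for n_A = n_B = n/2, which gives a shift c_2 close to 1. Since the RY/CZ circuit produces real amplitudes, a real random-state ensemble is arguably the closer reference, and its value may differ slightly.

```python
import math


def haar_renyi2_estimate(n):
    """Annealed estimate -log2 E[Tr rho_A^2] for Haar-random pure states with n_A = n_B = n/2."""
    d_a = d_b = 2 ** (n // 2)
    mean_purity = (d_a + d_b) / (d_a * d_b + 1)  # Haar average of the purity
    return -math.log2(mean_purity)


for n in (8, 12, 16, 20):
    r2 = haar_renyi2_estimate(n)
    print(n, round(r2, 4), "shift c_2 ~", round(n / 2 - r2, 4))
```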
Combined with the discussion in Section II, the average entanglement curves suggest refraining from using a variational circuit in the plateau region, i.e., L ≥ L_s, in order to prepare an initial circuit state close to a target ground state that obeys area-law entanglement. We now examine the scaling behavior of the early-time and late-time scales, L_l and L_s.

B. Timescale for the Entanglement Growth
Let us study the early-time scale L_l and the late-time scale L_s, defined respectively as the depth at which the linear growth (21) ends and the depth at which the saturation (22) begins. We measure L_l and L_s using the following operational definitions. L_l is the maximum depth L up to which the gap |R^k_A(L) − v_k L| between the Renyi entropy and its linear approximation remains smaller than twice the RMS deviation computed over 0 ≤ L ≤ n/2. Similarly, L_s is the minimum depth L from which the difference |R^k_A(L) − r_{n,k}| between the Renyi entropy and its saturated value remains smaller than twice the RMS deviation (24) computed over 200 ≤ L ≤ 250.
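The operational definitions can be sketched as follows, under our reading that each RMS deviation is taken over the residuals in its window (the fit residuals for 0 ≤ L ≤ n/2, the deviations from r_{n,k} in the plateau window) and that the gap must stay below the threshold contiguously, from L = 1 upward for L_l and from the end of the plateau window downward for L_s. The function name, the fit through the origin, and the default window are assumptions; the test curve is synthetic.

```python
import math


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def growth_timescales(entropy, n, plateau=(200, 250)):
    """Return (v_k, L_l, L_s) from entropy[L], the averaged Renyi-k entropy at depth L."""
    # early-time fit R ~ v L over 0 <= L <= n/2 (through the origin)
    window = range(n // 2 + 1)
    v = sum(L * entropy[L] for L in window) / sum(L * L for L in window)
    lin_tol = 2 * rms([entropy[L] - v * L for L in window])
    L_l = 0
    while L_l + 1 < len(entropy) and abs(entropy[L_l + 1] - v * (L_l + 1)) < lin_tol:
        L_l += 1  # extend while the curve stays within 2 RMS of the linear fit
    # late-time plateau value r_{n,k} and its RMS deviation
    lo, hi = plateau
    plat = entropy[lo:hi + 1]
    r = sum(plat) / len(plat)
    sat_tol = 2 * rms([x - r for x in plat])
    L_s = hi
    while L_s > 0 and abs(entropy[L_s - 1] - r) < sat_tol:
        L_s -= 1  # move earlier while the curve stays within 2 RMS of the plateau
    return v, L_l, L_s


# usage: a noisy ramp that saturates at 6, with a plateau window 20 <= L <= 30
curve = [min(L, 6) + 0.05 * (-1) ** L for L in range(31)]
print(growth_timescales(curve, n=12, plateau=(20, 30)))
```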
The estimated values of L_l and L_s for different values of n and k are summarized in the fourth and sixth columns of Tables I and II. We make three observations. First, both timescales L_l and L_s increase as the entropy order k goes up. Second, the saturation time L_s scales linearly with the system size n, i.e., L_s ∼ O(n), as indicated by the n-dependence of the estimates in Tables I and II. This is consistent with the finding of [22] that a unitary design that maximizes all Renyi entropies can be reached with linear complexity in the system size n. Third, at least for finite-sized systems, there is a transient gap between L_l and L_s in which the entanglement growth is slower. Details of the entanglement curves in this crossover region are largely model-dependent; see [23] for an example.

IV. CIRCUIT OPTIMIZATION
Our focus in this section is on the classical component of the hybrid quantum/classical algorithm. The objective is to find circuit parameters θ* that closely approximate the ground state energy, E(θ*) ≃ E_g, by taking iterative steps proportional to the negative gradient of the energy function (5) at each point θ_τ,²

$$ \theta_{\tau+1} = \theta_\tau - \eta\, \nabla_\theta E(\theta_\tau). \tag{26} $$

The learning rate η scales the step size of each update. A too-large η can cause overshooting near the minimum θ*, while a too-small η slows progress and can leave the optimization trajectory stuck at local minima. We use η = 0.005 for most experiments. When the parameter update is small, each step of the gradient descent reduces the energy by

$$ E(\theta_{\tau+1}) - E(\theta_\tau) \simeq -\eta\, \left\| \nabla_\theta E(\theta_\tau) \right\|^2 \le 0. \tag{27} $$

Owing to the steady decrease of the energy (27), we expect to reach E_g eventually if there are no other obstacles. We terminate the iteration after 10⁴ updates of the circuit parameters in all our numerical experiments.
² Estimating the gradient requires the readout of the circuit state ρ_c at shifted gate parameters [24], which is carried out by repeated measurements of Pauli strings. We do not consider the effect of readout noise in this paper.
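A toy sketch of the update (26) follows, with gradients obtained from shifted gate parameters via the parameter-shift rule: for a gate R_y(θ) = exp(−iθY/2), dE/dθ = [E(θ + π/2) − E(θ − π/2)]/2. To stay self-contained it uses a single qubit and a hypothetical real 2 × 2 Hamiltonian with ground energy −√1.25, together with η = 0.005 and 10⁴ updates as in the text. For such small steps the recorded energies should decrease monotonically, in line with (27).

```python
import math

# Hypothetical single-qubit Hamiltonian (real symmetric); ground energy is -sqrt(1.25)
H = [[1.0, 0.5], [0.5, -1.0]]


def energy(theta):
    """E(theta) = <psi|H|psi> for |psi> = R_y(theta)|0> = (cos(theta/2), sin(theta/2))."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return c * c * H[0][0] + 2 * c * s * H[0][1] + s * s * H[1][1]


def parameter_shift_gradient(theta):
    """dE/dtheta from two energy evaluations at theta +/- pi/2."""
    return 0.5 * (energy(theta + math.pi / 2) - energy(theta - math.pi / 2))


def gradient_descent(theta, eta=0.005, steps=10_000):
    """Plain gradient descent theta <- theta - eta * dE/dtheta; returns final angle and energies."""
    history = [energy(theta)]
    for _ in range(steps):
        theta -= eta * parameter_shift_gradient(theta)
        history.append(energy(theta))
    return theta, history


theta_f, history = gradient_descent(2.0)
print(history[0], history[-1], -math.sqrt(1.25))
```

In the multi-qubit circuit the same two-evaluation rule would be applied to each of the nL angles in turn, which is why each gradient estimate requires repeated readouts.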

A. Results
Let us discuss the eigensolver optimization results that aim to solve the ground state of many-body systems.We specifically consider the 1d transverse-field Ising models with nearest-neighbor and long-range interactions and the Sachdev-Ye-Kitaev (SYK) models.See Appendix A for a brief review of their Hamiltonians and ground-state entanglement properties.

The Transverse-Field Ising Models
We search for the ground states of interacting 1d spin-chain systems. To break the ground-state degeneracy, we turn on a magnetic coupling g acting on all spin variables, choosing g = 1 or 2. Since we are interested in a general correlation between the entanglement diagnostics and the success of optimization, rather than in specific characteristics of particular Hamiltonians, we study the optimization for the following three Ising models: (i) the nearest-neighbor spin coupling (A1) with g = 2; (ii) the nearest-neighbor spin coupling (A1) with g = 1; (iii) the long-range spin coupling (A3) with α = g = 1.
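For small n, the exact ground energy E_g that serves as the optimization target can be obtained by exact diagonalization. The sketch below assumes the common convention H = −Σ_i Z_i Z_{i+1} − g Σ_i X_i with periodic boundary conditions for the nearest-neighbor model; the actual form of (A1), including signs and boundary conditions, is given in Appendix A and may differ. It finds the lowest eigenvalue by power iteration on the shifted operator c·I − H with c larger than ‖H‖. The uniform starting vector overlaps the ground state because all off-diagonal elements of this H are non-positive.

```python
import math


def apply_tfim(v, n, g):
    """w = H v for the assumed H = -sum_i Z_i Z_{i+1} - g sum_i X_i (periodic chain)."""
    w = [0.0] * len(v)
    for idx, amp in enumerate(v):
        if amp == 0.0:
            continue
        spins = [1 - 2 * ((idx >> (n - 1 - i)) & 1) for i in range(n)]  # Z eigenvalues
        zz = sum(spins[i] * spins[(i + 1) % n] for i in range(n))
        w[idx] -= zz * amp  # diagonal Ising term
        for i in range(n):
            w[idx ^ (1 << (n - 1 - i))] -= g * amp  # X_i flips qubit i
    return w


def ground_energy(n, g, iters=2000):
    """Lowest eigenvalue of H via power iteration on shift * I - H."""
    shift = n * (1 + abs(g)) + 1.0  # exceeds ||H||, so shift * I - H is positive definite
    v = [1.0] * 2 ** n
    for _ in range(iters):
        hv = apply_tfim(v, n, g)
        v = [shift * a - b for a, b in zip(v, hv)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    hv = apply_tfim(v, n, g)
    return sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient <v|H|v>


for g in (0.0, 1.0, 2.0):
    print(g, ground_energy(4, g))
```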
We repeat the circuit optimization 50 times to average out fluctuations caused by the random parameter initialization, and record the circuit outputs in Figures 5-7 as a function of the circuit depth L.
Each figure consists of three panels. The left panels show the energy difference E(θ) − E_g between the circuit state (5) and the exact ground state (A2). The middle panels show the trace distance between the reduced circuit state ρ_{c,A} and the reduced ground state ρ_{g,A}. The right panels display the Renyi-2 entropy of the reduced circuit state. In all panels, the orange and blue curves represent the corresponding quantity before and after the optimization, respectively.
Figure 5 is for the nearest-neighbor Ising model (A1) with g = 2. It reveals the relation between the average entanglement entropy of initial circuit states and the success of gradient-based optimization: the optimization works well for circuits with L ≲ 36. In the intermediate range 36 ≲ L ≲ 52, however, the success rate gradually decreases as the circuit becomes deeper. Beyond that, i.e., for L ≳ 52, the optimization always fails to close the gap between the exact ground state and the circuit state in both energy and entanglement entropy. This relation shows the advantage of using circuits with L < L_s, whose entanglement curve has not yet reached the plateau. The same pattern persists in Figures 6 and 7, which correspond to the nearest-neighbor and long-range Ising models with g = 1.
A notable difference between Figure 6 and Figures 5 and 7 appears in the trace distance curve: the optimization fails to narrow the distance even when the circuit energy is close to the exact ground state energy. This is related to the fact that the ground state entanglement entropy of the g = 1 nearest-neighbor Ising model is higher than those of the other Ising models, as shown in Figure 18.
When the entanglement entropy of the target ground state is higher, the local search for approximating circuit parameters becomes increasingly difficult. This difficulty leads to deviations between the post-optimization circuit state and the exact ground state, which the trace distance reflects much more sensitively than the energy and entanglement entropy differences do.

The Sachdev-Ye-Kitaev Model
We will now discuss the circuit optimization in a situation where the Hamiltonian ground state exhibits a volume law scaling of the entanglement entropy.
The SYK_4 Hamiltonian (A4), defined with an instance of random coupling constants, has a ground state that follows volume-law scaling of entanglement [13], as reviewed in Appendix A and shown in Figure 18. We optimize the variational circuit to approximate the SYK_4 ground state and summarize the output in Figure 8 as a function of the circuit depth L.
Since the target state itself behaves like a generic quantum state in terms of entanglement, the optimization task is now much more challenging. Unlike the optimization towards the Ising ground states, even choosing a less entangled circuit within the range L ≲ 36 does not lead to success. Figure 8 manifests this failure, not only in the trace distance between the circuit and exact ground states but also in their differences in energy and entanglement entropy.

Optimization Speed
As another indicator of how difficult the circuit optimization is, we plot in Figure 9 the evolution of the Renyi-2 entanglement entropy R^2_A(ρ_{c,A}) as a function of the number of parameter updates τ. The three curves are for circuits with L = 12, 40, and 68, representative of low-, intermediate-, and high-depth circuits. The entanglement entropy of the ground states is marked by dashed lines.
Towards the Ising ground states, the gradient descent works efficiently for the L = 12 circuit, rapidly reducing the entanglement entropy within 10³ update steps. However, the same gradient descent takes much longer for the L = 40 circuit and fails to come close to the target state for the L = 68 circuit. As the average entanglement entropy of initial circuit states increases, the optimization difficulty becomes evident not only in the trace distance, as in Figure 6, but also in the entanglement diagnostics.
The optimization task towards the SYK_4 ground state is inherently more challenging, so that all three curves leave a large entanglement gap from the target state. Interestingly, the gradient descent steadily reduces the entanglement entropy of the L = 12 circuit state, enlarging the gap over the optimization steps τ. In general, over-parameterization can assist a circuit optimization that starts from or ends at a highly entangled typical quantum state. An exponentially high-dimensional parameter space was needed for the SYK_4 model to approximate its ground-level energy with very high precision [14].

B. Entanglement Diagnostics and Optimization
Our results in Section IV A exemplify the difficulty of finding a successful optimization trajectory that starts from or ends at a typical quantum state, i.e., one of the states that make up the vast majority of the Hilbert space. This difficulty is best described through the evolution of the entanglement entropies (3) and (4) over the optimization steps, rather than through a more commonly used but overly sensitive metric such as the trace distance between the circuit and target states.
Suppose we can divide the Hilbert space into two subregions distinguished by their entanglement entropies, say A and B/C, in accordance with Figure 2. Generic random states belong to region B/C, whose entanglement entropies are approximately maximal.
For many interesting applications, the target state is a non-generic state that resides in region A, i.e., it follows area-law scaling of the entanglement [3]. Along an optimization path inside region A, the circuit state entanglement tends to decrease steadily. However, when an initial state ρ_c(θ_in) belongs to region B/C, the local parameter update (26) is unable to cross over to region A and thus fails to reach the ground state energy. We draw these observations from the optimization results for the Ising Hamiltonians in Section IV A 1.
Even when the target state is maximally entangled and lies in region B/C, the entanglement entropy of the circuit state ρ_c(θ_τ) still tends to decrease on average. This means that the entanglement gap between the circuit and target states can grow if the initial circuit state has a smaller entanglement entropy than the target state, i.e., R^2_A(ρ_{c,A}(θ_in)) < R^2_A(ρ_{g,A}). In the opposite case, R^2_A(ρ_{c,A}(θ_in)) > R^2_A(ρ_{g,A}), the optimization moves towards narrowing the gap but often fails to reach the desired level of entanglement. These observations are based on the optimization results for the 1d SYK Hamiltonian in Section IV A 2.
The numerical results suggest that the Hilbert space can be partitioned into multiple layers distinguished by the amount of bipartite entanglement entropy they support. Moving across distant layers via the gradient descent (26) is a very demanding task, feasible only for over-parameterized circuits with an exponentially large parameter space [14].

V. OTHER CIRCUIT ARCHITECTURES
We have discussed the importance of choosing a circuit that avoids the saturation of its average initial entanglement entropy, for generic optimization tasks whose target state obeys area-law entanglement. This section examines whether the entanglement diagnostic still serves as an indicator of efficient circuit optimization for different circuit architectures. We also consider the effect of reducing the number of circuit parameters while maintaining a similar degree of entanglement.

A. Random Graph Architecture
Let us study a simple stochastic variation of the circuit architecture that keeps the CZ entangler (20) inside each 2-qubit gate of Figure 1b only with a fixed probability p, i.e., omits it with probability 1 − p; we take p = 1/2.

Figure 13. Measurements averaged over 50 independent restricted circuits (28), before/after the VQA optimization with the nearest-neighbor Ising Hamiltonian (A1) at g = 1, as a function of the number of circuit layers L.

Entanglement Growth
Since the average number of entanglers is cut in half, we expect the entanglement growth rate to be halved. Accordingly, the circuit depth L needed to saturate the entanglement entropy should double.
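The stochastic architecture can be sketched as a random mask over the n/2 two-qubit gates of each layer: each CZ is kept with probability p and dropped otherwise, while all single-qubit rotations remain, so the parameter count stays nL. With p = 1/2 the expected number of entanglers per layer is n/4, half that of the deterministic circuit (p = 1), which is the origin of the expected halving of the growth rate. The function names are hypothetical.

```python
import random


def sample_entangler_mask(n, depth, p=0.5, seed=None):
    """mask[l][j] is True if the CZ of the j-th 2-qubit gate in layer l is kept.

    p is the probability of keeping each CZ; p = 1 recovers the deterministic circuit.
    Single-qubit rotations are never dropped, so the parameter count stays n * depth.
    """
    rng = random.Random(seed)
    return [[rng.random() < p for _ in range(n // 2)] for _ in range(depth)]


def count_entanglers(mask):
    return sum(sum(layer) for layer in mask)


n, depth = 12, 40
mask = sample_entangler_mask(n, depth, p=0.5, seed=0)
print(count_entanglers(mask), "of", n // 2 * depth, "CZ gates kept")
```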
Figures 10 and 11 show the evolution of entanglement diagnostics as a function of the circuit depth L, estimated by the sample averages of 50 random states.The overall shape of the curves remains the same, but the growth rate has significantly decreased.Reaching a certain level of the entanglement diagnostics requires twice the circuit depth compared to the non-stochastic architecture, i.e., p = 1, as expected.See Figure 21b for the curve of the geometric measure whose growth rate has been halved.

Optimization
Given the optimization task of reaching the nearest-neighbor Ising ground energy (A2) with the background field coupling g = 1, the outputs of the p = 1/2 stochastic circuit are collected in Figure 12 as a function of L.
The depth range in which the gradient descent remains successful for the p = 1/2 circuits has increased to L ≲ 96. Beyond that, the optimization success rate keeps dropping until it reaches 0% at L ∼ 144 and above. This is consistent with the entanglement growth curves, which maintain the same overall shape as in Section III but with a lower growth rate. We remark that the low and intermediate ranges, in which the optimization may succeed with a non-zero probability, have been extended to L ≲ 136, more than a mere doubling. We attribute this to the expanded parameter space, whose dimension has doubled, as required for the p = 1/2 circuit to hold the same level of entanglement.
Over the entire range of L, unlike the trace distance, the entropy diagnostic holds a robust correlation with the successful minimization of the circuit energy (5), showing its usefulness regardless of circuit-specific details.

B. Restricted Circuit Parametrization
Recall that an additional circuit layer increases both the average entanglement entropy of initial circuit states and the number of classical parameters. To isolate the effect of the classical parameter space, we study the consequence of imposing the restriction

$$ \theta_{l,i} = \theta_l \quad \text{for all } 1 \le i \le n, \tag{28} $$

which equates all the parameters in each layer yet maintains the same growth rate of the entanglement diagnostics. The basic 2-qubit gate O_{i,j} in Figure 1b reads

$$ O_{i,j} = \mathrm{CZ}\,\big( R(\theta_{l,i}) \otimes R(\theta_{l,j}) \big), \tag{29} $$

where CZ = diag(1, 1, 1, −1) and R(θ) is the Pauli rotation (19) around the y-axis. Interestingly, the constraint (28) is equivalent to imposing [O_{i,j}, Q_{i,j}] = 0 on the Hilbert space of the (i, j) qubits, for a fixed two-qubit operator Q_{i,j} specified in the computational basis of the (i, j) qubits. Still, there is no globally conserved charge written as a sum of the tensor products Q_{i,i+1}, because Q_{i,i+1} does not generically commute with O_{i−1,i} and O_{i,i+1} on the next layer.
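The role of the constraint (28) can be checked on a single gate. When both rotations in a gate share the same angle, R(θ) ⊗ R(θ) commutes with the SWAP operator, and so does CZ, which is symmetric under exchanging the two qubits; hence the gate commutes with SWAP, while unequal angles break this. SWAP is shown only as one operator with this property; we do not claim that it coincides with the Q_{i,j} used in the text. The sketch assumes R(θ) = exp(−iθY/2).

```python
import math


def ry(theta):
    """Assumed Pauli-y rotation R(theta) = exp(-i theta Y / 2), a real 2 x 2 matrix."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def kron2(a, b):
    """Kronecker product of two 2 x 2 matrices (first factor acts on qubit i)."""
    return [[a[r // 2][col // 2] * b[r % 2][col % 2] for col in range(4)] for r in range(4)]


def matmul4(a, b):
    return [[sum(a[r][k] * b[k][col] for k in range(4)) for col in range(4)] for r in range(4)]


CZ = [[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, -1.0]]
SWAP = [[1.0, 0, 0, 0], [0, 0, 1.0, 0], [0, 1.0, 0, 0], [0, 0, 0, 1.0]]


def gate(theta_i, theta_j):
    """O_{i,j}: rotations on both qubits followed by CZ."""
    return matmul4(CZ, kron2(ry(theta_i), ry(theta_j)))


def commutator_size(a, b):
    """Largest entry of |AB - BA|."""
    ab, ba = matmul4(a, b), matmul4(b, a)
    return max(abs(ab[r][col] - ba[r][col]) for r in range(4) for col in range(4))


print(commutator_size(gate(0.7, 0.7), SWAP))  # equal angles, as under (28)
print(commutator_size(gate(0.3, 1.1), SWAP))  # independent angles
```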

Entanglement Growth
The entanglement entropies averaged over 50 random circuit states under the parameter space restriction (28) are shown in Figures 14 and 15 as a function of the number of circuit layers L. Apart from small extra wiggles, the overall shape and speed of growth of the entanglement diagnostics are similar to those of the unconstrained circuit. This correspondence of the entanglement growth curves makes the restricted circuit an appropriate setup for studying separately the effect of the parameter space dimension on the circuit optimization.
As a side remark, the evolution curve of the geometric measure for the restricted circuit, shown in Figure 21c, fluctuates far more than that of the unconstrained circuit, while their saturation depth scales remain largely the same.

Optimization
We optimize the restricted circuit to approximate the ground state of the nearest-neighbor Ising model at g = 1 using gradient descent. The results are summarized in Figure 13 as a function of L. Notably, even the circuit optimization with L ≲ 20 often stops at ΔE ∼ 1, not giving a reliable approximation of the ground energy. Furthermore, starting from L ≈ 24, an increasingly large proportion of the randomly initialized circuits fails to reach the ground level energy E_g and leaves a large gap, ΔE ∼ 8. This transition emerges at a much lower depth than L = 44 for the unconstrained circuit.
This emphasizes the importance of having enough parameters when applying gradient descent to optimization tasks, even for circuits that remain within a suitable range of the entanglement diagnostics.

VI. DISCUSSION AND OUTLOOK
In this paper we considered the variational circuit model of quantum computation and argued for the effectiveness of entanglement diagnostics in finding circuit architectures that allow efficient parameter optimization for minimizing the Hamiltonian expectation value. Introduced as a distance measure between the circuit and target states, the entanglement diagnostic has proven useful: quantum circuit states within a suitable range of entanglement entropies can successfully reach the ground level energy of local Hamiltonians. It also indicates that, while entanglement is a valuable non-local resource for quantum computation, highly entangled circuit states do not necessarily have an advantage; it can be rather the opposite.³
One way to control the average entanglement entropy of circuit states is to adjust the number of circuit layers. The mean entropy grows linearly with the circuit depth, then gradually slows down, and finally converges to a constant near the theoretical maximum. Denoting by L_s the saturation depth beyond which the average entanglement entropy has converged, we divided the depth range into two intervals, L < L_s and L ≥ L_s, called A and B/C, respectively. A is typically the optimal region that leads to efficient VQA computation, e.g., when we search for an area-law entangled target state, while B/C can suffer from the barren plateau problem. One can further differentiate C from B based on whether the optimization success rate has dropped to 0% or not.
Although the assumption of an area-law entangled target state covers most interesting VQA applications [3], the ground states of some important Hamiltonians exhibit volume-law entanglement scaling. Matching the entanglement diagnostic alone is not sufficient to approximate such states. Highly entangled quantum states make up the overwhelming majority of the state space, so matching the entropy does little to single out the target. We need deep variational circuits whose depth L lies in B/C and that are equipped with a large parameter space, which enables high-resolution specification and approximation of the desired target state [14]. Having more circuit parameters generally helps to approximate the ground state better. This is exemplified by the decreased accuracy with a reduced number of independent variables⁴ in Section V B, as well as the increased success rate for circuits with extra single-qubit rotation parameters [14,27].
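The dominance of highly entangled states can be made quantitative with Page's formula for the average entanglement entropy of Haar-random pure states: S = Σ_{k=d_B+1}^{d_A d_B} 1/k − (d_A − 1)/(2 d_B) for d_A ≤ d_B. For d_A = d_B this is roughly n_A ln 2 − 1/2, i.e., within about half a nat of the maximum. The sketch below is illustrative rather than taken from the paper, and its function names are hypothetical. It compares Page's value with sampled Haar-random states.

```python
import numpy as np


def haar_state(n, rng):
    """Haar-random n-qubit pure state: a normalized complex Gaussian vector."""
    v = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
    return v / np.linalg.norm(v)


def von_neumann(psi, n, nA):
    """Entanglement entropy (nats) of the first nA qubits, from the Schmidt spectrum."""
    M = psi.reshape(2**nA, 2 ** (n - nA))
    p = np.linalg.svd(M, compute_uv=False) ** 2
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log(p)))


def page_value(nA, nB):
    """Page's exact average entropy for d_A = 2**nA <= d_B = 2**nB."""
    dA, dB = 2**nA, 2**nB
    k = np.arange(dB + 1, dA * dB + 1)
    return float(np.sum(1.0 / k) - (dA - 1) / (2 * dB))


rng = np.random.default_rng(0)
n, nA = 10, 5
sampled = np.mean([von_neumann(haar_state(n, rng), n, nA) for _ in range(50)])
print(f"sampled {sampled:.4f}, Page {page_value(nA, n - nA):.4f}, max {nA * np.log(2):.4f}")
```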
There are many directions for further investigation. First, to obtain additional evidence for the validity of the entanglement diagnostics, it would be crucial to consider 2D gapped local Hamiltonians whose ground states follow area-law entanglement scaling but are difficult to approximate. Second, we would like to explore various circuit architectures, e.g., using other rotation and entangling gates [28], built on different graph structures [29,30], or conserving diffusive charges [31,32]. In particular, symmetry-preserving circuits can work efficiently for VQA optimization if the target state is known to respect the built-in symmetries [33,34]. Third, the layered circuit defines a discrete dynamical system. We would like to investigate the appearance of quantum chaos in the circuit wavefunction, such as the emergence of random-matrix ensembles for the reduced density matrix [35] and operator spreading [36], and relate them to efficient VQA optimization [37]. Finally, it is important to study different types of noise and analyse how they affect VQA optimization performance [38,39].

We also draw in Figure 18 the curves of various Renyi-k entropies of the n = 12 Ising ground state for 0 ≤ g ≤ 2.5. Figure 17 contains the scatter plot of energy and Renyi-2 entanglement entropy over all the eigenstates of the g = 1 Ising Hamiltonian with n = 12 qubits.
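Curves of the kind shown in Figure 18 can be reproduced in outline by exact diagonalization. The sketch below assumes the form H = Σ_i Z_i Z_{i+1} + g Σ_i X_i for (A1) with periodic boundaries and a half-chain bipartition; both are assumptions, and the function names are hypothetical. The function `renyi` also covers the limits k → 1 (von Neumann), k = 0 (max-entropy, the log of the rank of ρ_A), and k = ∞ (min-entropy, −ln λ_max). These satisfy S_0 ≥ S_1 ≥ S_2 ≥ S_∞.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])


def op_on(op, k, n):
    """Embed a single-qubit operator on qubit k (qubit 0 = leftmost Kronecker factor)."""
    out = np.array([[1.0]])
    for j in range(n):
        out = np.kron(out, op if j == k else I2)
    return out


def ising_nn(n, g):
    """ASSUMED form of (A1): H = sum_i Z_i Z_{i+1} + g sum_i X_i on a periodic chain."""
    H = np.zeros((2**n, 2**n))
    for i in range(n):
        H += op_on(Z, i, n) @ op_on(Z, (i + 1) % n, n) + g * op_on(X, i, n)
    return H


def renyi(psi, nA, k):
    """Renyi-k entropy (nats) of the first nA qubits; k may be 0, 1, np.inf, or any k > 0."""
    n = psi.size.bit_length() - 1
    lam = np.linalg.svd(psi.reshape(2**nA, 2 ** (n - nA)), compute_uv=False) ** 2
    lam = lam[lam > 1e-12]  # drop numerically zero eigenvalues of rho_A
    if k == 0:
        return float(np.log(lam.size))
    if k == 1:
        return float(-np.sum(lam * np.log(lam)))
    if np.isinf(k):
        return float(-np.log(lam.max()))
    return float(np.log(np.sum(lam**k)) / (1 - k))


def ground_state(n, g):
    _, vecs = np.linalg.eigh(ising_nn(n, g))
    return vecs[:, 0]


n = 8
for g in (0.5, 1.0, 2.0):
    psi = ground_state(n, g)
    print(g, [round(renyi(psi, n // 2, k), 3) for k in (0, 1, 2, np.inf)])
```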

Long-Range Ising Model
We add a long-range spin-spin interaction whose couplings decay with the distance d(i, j), the shortest distance between the i-th and j-th spins under the periodic boundary condition i ∼ i + n. We refer to the resulting Hamiltonian as (A3).
All non-local interactions vanish in the limit α → ∞, so the Hamiltonian (A3) reduces to (A1).
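A sketch of (A3) and its α → ∞ limit follows. The explicit form of (A3) is not reproduced in this text, so the code assumes H = Σ_{i<j} d(i, j)^{−α} Z_i Z_j + g Σ_i X_i with no Kac-type normalization of the couplings. The helper names `periodic_distance` and `ising_long_range` are hypothetical. Under this assumption, the only couplings that survive as α grows are those with d(i, j) = 1, which reproduces the (equally assumed) nearest-neighbor form of (A1).

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])


def op_on(op, k, n):
    """Embed a single-qubit operator on qubit k (qubit 0 = leftmost Kronecker factor)."""
    out = np.array([[1.0]])
    for j in range(n):
        out = np.kron(out, op if j == k else I2)
    return out


def periodic_distance(i, j, n):
    """Shortest distance between sites i and j on a ring of n sites (i ~ i + n)."""
    d = abs(i - j) % n
    return min(d, n - d)


def ising_long_range(n, g, alpha):
    """ASSUMED form of (A3): sum_{i<j} d(i,j)^(-alpha) Z_i Z_j + g sum_i X_i."""
    H = np.zeros((2**n, 2**n))
    for i in range(n):
        H += g * op_on(X, i, n)
        for j in range(i + 1, n):
            J = float(periodic_distance(i, j, n)) ** (-alpha)
            H += J * (op_on(Z, i, n) @ op_on(Z, j, n))
    return H


def ising_nn(n, g):
    """ASSUMED form of (A1), used only to check the alpha -> infinity limit."""
    H = np.zeros((2**n, 2**n))
    for i in range(n):
        H += op_on(Z, i, n) @ op_on(Z, (i + 1) % n, n) + g * op_on(X, i, n)
    return H


n = 6
for alpha in (0.0, 1.0, 3.0, 10.0, 20.0):
    print(alpha, np.linalg.eigvalsh(ising_long_range(n, 1.0, alpha))[0])
print("nearest-neighbor:", np.linalg.eigvalsh(ising_nn(n, 1.0))[0])
```

With these couplings the distance-2 terms scale as 2^{−α}, so the printed ground energies should approach the nearest-neighbor value once α is of order 10.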
As in the nearest-neighbor model, for any α ≥ 0 the long-range Ising model exhibits a transition between the anti-ferromagnetic and paramagnetic phases. Note that the ground state in the anti-ferromagnetic phase can have an entanglement entropy that grows with n, thus violating the area law. Its scaling in n is logarithmic for α > 1 and sub-logarithmic for α < 1 [43]. Since the matrix product state ansatz can still closely approximate the ground state [43,44], we expect this mild violation of the area law not to be a major obstacle to gradient-based optimization, even for large n. Several Renyi-k entropies of the n = 12 long-range Ising ground state are shown in Figure 18 as a function of α for 0 ≤ α ≤ 20.
Figure 16 plots the g = 1 ground energy as a function of the exponent α for 0 ≤ α ≤ 20 and different system sizes n. Since the long-range interactions are almost negligible for α ≳ 10, the curves converge to the energy (A2) of the nearest-neighbor Ising model at g = 1. We also draw the scatter plot of energy and Renyi-2 entanglement entropy in Figure 17, where each point denotes an eigenstate of the g = α = 1 long-range Ising Hamiltonian with n = 12 qubits.

The SYK Model
The Sachdev-Ye-Kitaev (SYK) model [10,11] consists of random, long-range, all-to-all interactions of n qubits, which correspond to the following random couplings of q Majorana fermions:

$$H = i^{q/2} \sum_{1 \le i_1 < \cdots < i_q \le 2n} J_{i_1 \ldots i_q}\, \gamma_{i_1} \cdots \gamma_{i_q}, \tag{A4}$$

where the Majorana fermions {γ_i}_{1≤i≤2n} satisfy the Clifford algebra {γ_i, γ_j} = δ_ij and can be translated to spin variables via the Jordan-Wigner map. The coupling constants J_{i_1…i_q} are randomly drawn from the Gaussian distribution with mean 0 and variance (q − 1)!/(2n)^{q−1}. Much attention has been paid to the SYK_q model because it is exactly solvable and, at the same time, exhibits chaotic dynamics for q ≥ 4 [11,12]. We focus on the SYK_4 model and drop the subscript for brevity. Each random draw of the coupling constants J_{i_1…i_4} from the Gaussian distribution defines a different instance of the SYK Hamiltonian. The SYK ground energies of 100 individual instances with n qubits, or equivalently 2n Majorana fermions, are displayed in Figure 16. Similarly, the entanglement entropies of 100 instances of the n = 12 SYK ground state are visualized in Figure 18, illustrating that the SYK ground state is much more highly entangled than those of the Ising models. More generally, the energy-entropy scatter plot of Figure 17 shows the full spectrum of an instance of the n = 12 SYK Hamiltonian, exhibiting volume-law scaling of the entanglement entropies [13]. Its energy gap is notably smaller than that of the Ising models, thus violating the assumption of [9] that underlies area-law entanglement of the ground state.

density matrix, i.e., λ_i(ρ_A) ≥ 0 and Σ_i λ_i(ρ_A) = 1, then implies a change in the eigenvalue statistics: from a single non-zero eigenvalue equal to 1 to all eigenvalues being non-zero with similar values around 2^{−n_A}. This decrease in the largest eigenvalue of ρ_A is displayed in the min-entropy curve. We note that this spectral change of ρ_A can be a contributing factor to the emergence of the random-matrix behavior of ρ_A, studied elsewhere [37].
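The SYK_4 Hamiltonian (A4) can be built explicitly for small n. The sketch below is illustrative rather than the paper's code, and its function names are hypothetical. It assumes one standard Jordan-Wigner convention (0-indexed): γ_{2k} = Z⊗…⊗Z⊗X⊗I⊗…/√2 and γ_{2k+1} = Z⊗…⊗Z⊗Y⊗I⊗…/√2. This convention satisfies {γ_a, γ_b} = δ_ab. The couplings J are drawn with the variance (q − 1)!/(2n)^{q−1} stated above. The sketch also returns the eigenvalues of the half-chain reduced density matrix of the ground state, whose largest eigenvalue λ_max sets the min-entropy −ln λ_max. If the ground level is degenerate, `eigh` returns an arbitrary vector within it.

```python
import itertools
from math import factorial

import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def kron_all(ops):
    out = np.eye(1, dtype=complex)
    for op in ops:
        out = np.kron(out, op)
    return out


def majoranas(n):
    """2n Jordan-Wigner Majoranas normalized so that {g_a, g_b} = delta_ab (g_a^2 = 1/2)."""
    gammas = []
    for k in range(n):
        left, right = [Z] * k, [I2] * (n - k - 1)
        gammas.append(kron_all(left + [X] + right) / np.sqrt(2))
        gammas.append(kron_all(left + [Y] + right) / np.sqrt(2))
    return gammas


def syk_hamiltonian(n, q=4, seed=0):
    """One random instance of (A4): H = i^{q/2} sum_{i1<...<iq} J g_{i1}...g_{iq}."""
    rng = np.random.default_rng(seed)
    gammas = majoranas(n)
    sigma = np.sqrt(factorial(q - 1) / (2 * n) ** (q - 1))
    H = np.zeros((2**n, 2**n), dtype=complex)
    for idx in itertools.combinations(range(2 * n), q):
        term = np.eye(2**n, dtype=complex)
        for a in idx:
            term = term @ gammas[a]
        H += rng.normal(0.0, sigma) * term
    return (1j) ** (q // 2) * H


def ground_state_half_spectrum(n, seed=0):
    """Eigenvalues lambda_i(rho_A) for A = the first n // 2 qubits of the SYK ground state."""
    _, vecs = np.linalg.eigh(syk_hamiltonian(n, seed=seed))
    psi = vecs[:, 0]
    nA = n // 2
    return np.linalg.svd(psi.reshape(2**nA, 2 ** (n - nA)), compute_uv=False) ** 2


n = 6
lam = ground_state_half_spectrum(n)
print("min-entropy:", -np.log(lam.max()), " maximum possible:", (n // 2) * np.log(2))
```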

Geometric Measure
Another way to measure the quantum entanglement of circuit states is the geometric measure of entanglement [45]. It is based on the overlap between the circuit state |ψ(θ)⟩ and its nearest product state, i.e., on the maximal overlap max_{|φ⟩∈P} |⟨φ|ψ(θ)⟩|, where P is the set of qubit product states. Figure 21 shows that the geometric measure of entanglement grows with L in the same pattern as the entanglement entropy curve. Note, however, that the geometric measure is calculated directly from the full density matrix ρ, whereas the entanglement entropies are obtained from the reduced state ρ_A.
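Since the defining formula is not reproduced here, the sketch below assumes the common convention E_G(ψ) = 1 − max_{|φ⟩∈P} |⟨φ|ψ⟩|². A logarithmic variant, −ln max |⟨φ|ψ⟩|², is also in use. The maximization over product states uses a heuristic alternating (one-qubit-at-a-time) update with random restarts. This is an illustrative choice that is not guaranteed to find the global optimum in general, and the function names are hypothetical. Each update sets one single-qubit factor to the normalized contraction of ψ with all other factors, which is the optimal choice for that factor and so never decreases the overlap.

```python
import numpy as np


def contract_except(t, phis, k):
    """Contract tensor t with conj(phi_j) on every axis j != k; returns a length-2 vector."""
    for j in reversed(range(len(phis))):  # highest axis first keeps lower axis indices valid
        if j != k:
            t = np.tensordot(t, phis[j].conj(), axes=([j], [0]))
    return t


def max_product_overlap(psi, n, restarts=5, sweeps=100, seed=0):
    """Heuristic max of |<phi|psi>| over product states via alternating single-qubit updates."""
    rng = np.random.default_rng(seed)
    t = psi.reshape([2] * n)
    best = 0.0
    for _ in range(restarts):
        phis = [rng.normal(size=2) + 1j * rng.normal(size=2) for _ in range(n)]
        phis = [p / np.linalg.norm(p) for p in phis]
        for _ in range(sweeps):
            for k in range(n):
                v = contract_except(t, phis, k)
                phis[k] = v / np.linalg.norm(v)  # optimal factor for qubit k, others fixed
        best = max(best, abs(np.vdot(phis[0], contract_except(t, phis, 0))))
    return best


def geometric_measure(psi, n):
    """ASSUMED convention: E_G = 1 - max_{phi in P} |<phi|psi>|^2."""
    return 1.0 - max_product_overlap(psi, n) ** 2


ghz = np.zeros(8, dtype=complex)
ghz[[0, 7]] = 1 / np.sqrt(2)
print("GHZ_3:", geometric_measure(ghz, 3))
```

For the three-qubit GHZ state the exact value in this convention is 1/2, since the best product states are |000⟩ and |111⟩. A product input should give a value at the level of rounding error.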


Figure 3. Von Neumann and Renyi-2 entropies for 8 ≤ n ≤ 20 averaged over 50 independent circuit states, as a function of the circuit depth L.

Figure 4. Various Renyi-k entropies for n = 12 and n = 20 averaged over 50 independent circuit states, as a function of the circuit depth L.

Figure 5. Measurements averaged over 50 independent circuits, before/after the VQA optimization with the nearest-neighbor Ising Hamiltonian (A1) at g = 2, as a function of the number of circuit layers L.

Figure 7. Measurements averaged over 50 independent circuits, before/after the VQA optimization with the non-local Ising Hamiltonian (A3) at α = g = 1, as a function of the number of circuit layers L.

Figure 9. The evolution of the Renyi-2 entropy during the gradient-based VQA optimization with the Ising and SYK_4 Hamiltonians, i.e., (A1), (A3), (A4), for n = 12 qubits. The horizontal direction denotes the number of parameter updates τ. The dashed line shows the Renyi-2 entropy of the exact ground state of each Hamiltonian.

Figure 10. Von Neumann and Renyi-2 entropies for 8 ≤ n ≤ 20 averaged over 50 independent p = 1/2 circuit states, as a function of the circuit depth L.

Figure 11. Various Renyi-k entropies for n = 12 and n = 20 averaged over 50 independent p = 1/2 circuit states, as a function of the circuit depth L.

Figure 12. Measurements averaged over 50 independent p = 1/2 circuits, before/after the VQA optimization with the nearest-neighbor Ising Hamiltonian (A1) at g = 1, as a function of the number of circuit layers L.

Figure 14. Von Neumann and Renyi-2 entropies for 8 ≤ n ≤ 20 averaged over 50 independent restricted circuit states (28), as a function of the circuit depth L.

Figure 15. Various Renyi-k entropies for n = 12 and n = 20 averaged over 50 independent restricted circuit states (28), as a function of the circuit depth L.

Figure 16. The ground energy of the following Hamiltonian systems of different sizes n: (Left) Nearest-neighbor Ising Hamiltonian with different g. (Middle) Long-range Ising Hamiltonian at g = 1 with different α. (Right) SYK_4 Hamiltonian with 100 different instances of Gaussian random couplings.

Figure 19. The max-entropy of the circuit reduced density matrix ρ_A(θ) as a function of the number of layers L.

Figure 20. The min-entropy of the circuit reduced density matrix ρ_A(θ) as a function of the number of layers L.
Panels: the circuit of Figure 1; the p = 1/2 circuit.

Figure 21. The geometric measure of entanglement of the circuit state |ψ(θ)⟩ as a function of the number of layers L.