Sequential optimal selections of single-qubit gates in parameterized quantum circuits

In variational quantum algorithms, it is important to balance the conflicting requirements of expressibility and trainability of a parameterized quantum circuit (PQC). However, designing an appropriate PQC is not trivial. Here, we propose an algorithm for optimizing the PQC structure, in which single-qubit gates are sequentially replaced by the optimal ones via diagonalization of a matrix whose elements are evaluated on slightly modified circuits. This replacement leads to a better approximation of target states with limited circuit depth. Furthermore, we clarify the existence of a barren plateau in the sequential optimization in terms of the spectrum concentration of the matrix, which defines the cost landscape with respect to changes in the target gate. We then rigorously show that, with local observables, the concentration is no faster than polynomial in the number of qubits when the depth of an n-qubit PQC is O(log n). Finally, numerical experiments show that our method converges faster than classical optimizers on both simulators and a real device. Our results provide evidence for sequential optimizers as better alternatives for optimizing PQCs on near-term quantum devices.

The objective (cost) function of a variational quantum algorithm (VQA) is often formulated as the expectation value of the Hamiltonian H of a target system, whose solution can be obtained from the eigensystem of H computed with the variational quantum eigensolver (VQE), e.g., for quantum chemical calculations using a fermionic or spin Hamiltonian [2]. Appropriate designs of PQCs are essential to express the quantum states of interest. PQCs are classified into physics-based ansatz and heuristic-based ansatz. A physics-based ansatz, including the unitary coupled cluster [19] and the Hamiltonian variational ansatz [20,21], can achieve efficient optimization by limiting the Hilbert space spanned by the ansatz to the neighborhood of the target states. However, it is difficult to run on near-term devices because of its deep circuits. In contrast, a heuristic-based ansatz puts more weight on feasibility on near-term devices, which generally results in shallower circuits, albeit with uncertainty about whether the target states can be expressed.
A typical strategy to mitigate this uncertainty is to systematically increase the circuit depth by adding layers of gates. However, adding layers can cause the gradient of the cost function with respect to the PQC parameters to vanish exponentially as the number of qubits grows, a phenomenon termed the barren plateau [22,23]. The barren plateau renders gradient-based approaches useless. Several remedies, such as layerwise learning [24] and parameter correlation [25], have been proposed, but they can only cope with noiseless conditions. To date, the only effective strategy against the noise-induced barren plateau is to reduce the circuit depth. Hence, PQCs with a heuristic ansatz face two conflicting requirements: they should be deep enough to express the target states but as shallow as possible to avoid both types of barren plateau.
Circuit structure optimization as implemented in Rotoselect [26] has drawn attention as a way to deal with this dilemma, an idea shared by the Variable Ansatz (VAns) algorithm [27]. Rotoselect is also one of the sequential quantum optimizers, such as NFT [28] (also termed Rotosolve in [26]), in which single-qubit gates are sequentially and analytically optimized without gradients. It selects the optimal single-qubit rotational gate among $R_x(\theta)$, $R_y(\theta)$, and $R_z(\theta)$ to minimize the objective function with respect to a single-qubit gate in a PQC. In this optimal selection, the method utilizes the periodicity of the objective function in θ, which follows from $R_x(\theta) = \cos(\theta/2)\,I - i\sin(\theta/2)\,X$ and similarly for $R_y$ and $R_z$. Because Rotoselect gains flexibility by choosing the rotation axes of the single-qubit gates through cost minimization, it is regarded as structural optimization of fixed-depth PQCs. However, it has two drawbacks: the rotation axis is chosen from a finite, discrete gate set, and each parameter of the PQC is updated locally. These make it prone to local optima.
Fig. 1 (caption): The blue and red circles correspond to the expressibility of the single-qubit gate of FQS and Fraxis, respectively. Since FQS (Fraxis) has full (partial) degrees of freedom in a single-qubit gate, the FQS (blue) circles are larger than those of Fraxis (red). FQS and Fraxis can each find the optimal state within their circle, which drifts stepwise as the circuit structure is optimized.
To deal with these drawbacks, "Free-axis selection" (Fraxis) was proposed, based on a representation of a single-qubit gate whose rotational axis is an arbitrary three-dimensional vector but whose rotational angle is fixed to π [29]. Fraxis is also a sequential quantum optimizer, but with more parameters (i.e., the axes of rotation) to optimize simultaneously. The optimal axis is obtained by diagonalizing a matrix whose elements are computed from expectation values of H for quantum states generated from PQCs in which the gate of interest is replaced by a set of unitaries. Because Fraxis continuously varies the axis of each Fraxis gate $R_n(\pi)$ in a PQC, it has a higher degree of freedom and better expressibility with limited depth. However, unlike for the gradient-based optimizers, nothing is known about the properties of sequential optimizers with respect to barren plateaus: [18] suggested they might be inferior to the gradient-based ones despite their faster convergence.
In this work, we first extend the matrix diagonalization framework of Fraxis to full optimization of a general single-qubit gate, termed Free Quaternion Selection (FQS) after the quaternion representation of a single-qubit gate. More precisely, to optimize one of the single-qubit gates in a PQC, the VQA cost function can be transformed into a solvable quadratic form by mapping the single-qubit gate $R_n(\psi)$ to a unit quaternion. Because the single-qubit gates of PQCs with FQS have the highest degree of freedom, FQS can more effectively optimize the circuit structure to express target quantum states with limited depth, as illustrated in Fig. 1. In addition, we show that FQS generalizes the existing sequential quantum optimizers; all of them can be formulated with a matrix diagonalization whose minimal eigenvalue corresponds to the (locally) optimal energy.
Next, we unveil several important properties of this optimizer family with respect to barren plateaus. Originally, the barren plateau was shown in the gradient-based framework, where it manifests as an exponential decay of the variance of the gradient. Since sequential quantum optimizers do not directly use gradients, it is not straightforward to study their properties with respect to barren plateaus. However, the essential nature of barren plateaus is the loss of trainability as the number of qubits increases, which reflects an extreme separation between the initial and the target states caused by the exponential growth of the dimension of the Hilbert space. Hence, it is worth investigating the relation between the loss of trainability and the system size in sequential quantum optimizers. To clarify this point, we introduce the spectral radius of the matrix associated with each optimizer as an alternative measure of trainability. The spectral radius corresponds to the cost difference achievable by a single application of sequential optimization to the gate of interest. Therefore, if sequential optimizers are subject to barren plateaus, the second moment of the spectral radius should vanish exponentially with the system size.
Here, we rigorously prove that, under the same conditions as assumed for their gradient-based counterparts [22,30], the sequential optimizers will likely run into barren plateaus when a PQC has sufficient expressibility. More precisely, if the circuits become sufficiently deep to form a unitary 2-design over the whole system, the second moment of the spectral radius decays exponentially regardless of whether the cost is global or local. On the other hand, we also rigorously prove that when the cost functions are local observables on an alternating layered ansatz [30,31], the exponential concentration of the spectrum cannot occur as long as the depth of the n-qubit PQC is O(log n), and as a result the barren-plateau problem can be avoided. The latter result is obtained by combining the analysis of cost-function-dependent barren plateaus in [30] with our bounds on the spectral radius of the matrices used to optimize single-qubit gates sequentially. We further demonstrate these properties using extensive numerical simulations with up to 12 qubits. Because the sequential optimizers have been shown to converge faster, our results shed new light on sequential optimizers as better alternatives to the gradient-based ones.
The remainder of this paper is organized as follows. In Sec. II, we describe the theoretical aspects of FQS. We first give the quaternion representation for a single-qubit gate and then derive the matrix for FQS that completely characterizes the energy landscape of the PQCs. We then present FQS as a generalized form of existing methods by showing their corresponding matrices. The optimal energy can be determined from the optimal eigenvalues of the matrices. Rigorous proofs on the relation between barren plateaus and sequential optimizers are then derived. In Sec. III, we provide extensive numerical experiments demonstrating the properties of FQS. We then confirm that FQS outperforms other sequential quantum optimizers as well as gradient-based optimizers. Moreover, in an application to a mixed field Ising model with up to 12 qubits, we demonstrate that VQE with FQS on an alternating layered ansatz with a local cost keeps its optimization capacity and avoids barren plateaus when the circuit is sufficiently shallow. Finally, we conclude this study in Sec. IV.

A. Quaternion representation for single-qubit gate
A general single-qubit gate is conventionally represented as
$$R_n(\psi) = e^{-i\frac{\psi}{2}\, n\cdot\sigma} = \cos\frac{\psi}{2}\, I - i \sin\frac{\psi}{2}\; n\cdot\sigma, \tag{1}$$
where I and $\sigma = (\sigma_1, \sigma_2, \sigma_3) = (X, Y, Z)$ denote the 1-qubit identity operator and the Pauli matrices. The parameters n and ψ correspond to a rotational axis and angle in the Bloch sphere, respectively.
Here, we show another way to parameterize the general single-qubit gate based on the well-known relationship between a single-qubit gate and a unit quaternion. Since the rotational axis n is a three-dimensional real unit vector, we can write it in the polar coordinate system with the zenith angle θ and the azimuth angle φ as
$$n = n(\theta, \phi) = (\cos\theta,\ \sin\theta\cos\phi,\ \sin\theta\sin\phi). \tag{2}$$
Substituting Eq. (2) into Eq. (1), we obtain the quaternion representation of a single-qubit gate as
$$R_n(\psi) = q\cdot\varsigma = \sum_{\mu=0}^{3} q_\mu \varsigma_\mu, \tag{3}$$
where a unit quaternion $q = (q_0, q_1, q_2, q_3)$ (i.e., $q \in \mathbb{R}^4$, $|q| = 1$) is parameterized with (ψ, θ, φ) as
$$q = \left(\cos\frac{\psi}{2},\ \sin\frac{\psi}{2}\cos\theta,\ \sin\frac{\psi}{2}\sin\theta\cos\phi,\ \sin\frac{\psi}{2}\sin\theta\sin\phi\right). \tag{4}$$
Here, $\varsigma = (\varsigma_0, \varsigma_1, \varsigma_2, \varsigma_3)$ is an extension of the Pauli matrices defined as
$$\varsigma = (I,\ -iX,\ -iY,\ -iZ). \tag{5}$$
Equation (3) allows us to identify a point q on the three-dimensional spherical surface with a single-qubit gate.
Based on this identification, we write $R_n(\psi)$ as R(q) for simplicity in the rest of this paper. Note that if we focus on the conventional single-qubit rotation gate with one parameter θ for the rotational angle, such as $R_x(\theta)$, this gate can be identified with a point on the one-dimensional spherical surface, i.e., the unit circle. We can decompose a general single-qubit gate R(q) into three $R_z$ gates and two $\sqrt{X}$ gates up to a global phase [32], where the three $R_z$ angles θ, φ, λ can be determined from q. Since $R_z$ gates on IBM Quantum devices require no pulse operation, the general single-qubit gate can be implemented with only two pulse operations, one for each $\sqrt{X}$.
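The identification between unit quaternions and single-qubit gates can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names `quaternion_from_angles` and `gate_from_quaternion` are hypothetical and introduced only for illustration. It builds q from (ψ, θ, φ) with the axis convention of Eq. (2) and assembles R(q) as a linear combination of I, −iX, −iY, −iZ.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
VARSIGMA = [I2, -1j * X, -1j * Y, -1j * Z]  # extended Pauli basis (I, -iX, -iY, -iZ)


def quaternion_from_angles(psi, theta, phi):
    """Unit quaternion for a rotation by angle psi about the axis n(theta, phi)."""
    n = np.array([np.cos(theta),
                  np.sin(theta) * np.cos(phi),
                  np.sin(theta) * np.sin(phi)])
    return np.concatenate(([np.cos(psi / 2)], np.sin(psi / 2) * n))


def gate_from_quaternion(q):
    """R(q) = sum_mu q_mu * varsigma_mu, a 2x2 unitary whenever |q| = 1."""
    return sum(q_mu * s_mu for q_mu, s_mu in zip(q, VARSIGMA))


q = quaternion_from_angles(psi=0.7, theta=1.1, phi=2.3)
R = gate_from_quaternion(q)
print(np.linalg.norm(q))                # unit norm up to rounding
print(np.allclose(R.conj().T @ R, I2))  # True: R(q) is unitary
```

With θ = 0 the axis reduces to (1, 0, 0), so `gate_from_quaternion(quaternion_from_angles(t, 0, 0))` reproduces the fixed-axis gate cos(t/2) I − i sin(t/2) X.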

B. Our algorithm: Free Quaternion Selection for Variational Quantum Algorithm
Let us consider an n-qubit parameterized quantum circuit U consisting of D parameterized single-qubit gates $\{R(q_d)\}_{d=1}^{D}$ and parameter-free gates such as the CNOT gate. By tuning the parameters, we aim to solve an optimization task whose objective function combines K expected values of the form $\mathrm{tr}[H_k U \rho_k U^\dagger]$ ($k = 1, \dots, K$), where $\rho_k$ is an n-qubit initial state from a training set and $H_k$ is some observable. In the following, without loss of generality, we focus on a single expected value in the objective function (i.e., K = 1, which can simply be considered as the minimization of energy for the Hamiltonian $H := H_1$ and the input state $\rho_{\mathrm{in}} := \rho_1$). The extension of our algorithm to the whole objective function is trivial due to the linearity. For the energy minimization, we focus on a sequential optimization regarding $R(q_d)$ in which all parameters are fixed except for the dth single-qubit gate, as shown in Fig. 2.
Fig. 2 (caption): A quantum circuit to optimize the single-qubit gate $R(q_d)$. The variational parameters in $U_1$ and $U_2$ are fixed during the optimization of the gate of interest. Our algorithm sequentially replaces $R(q_d)$ in the circuit by the optimal one, which is computed by diagonalizing the corresponding matrix S, whose elements can be evaluated from the circuit by replacing $R(q_d)$ with ten different gates as described in Sec. II C.
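The energy expectation of the variational state is written as
$$\langle H \rangle = \mathrm{tr}\!\left[\rho'_{\mathrm{in}}\, R(q_d)^\dagger H' R(q_d)\right], \tag{8}$$
where $U_1$ and $U_2$ are the quantum circuits before and after $R(q_d)$, respectively. $H'$ and $\rho'_{\mathrm{in}}$ are defined as
$$H' = U_2^\dagger H U_2, \qquad \rho'_{\mathrm{in}} = U_1 \rho_{\mathrm{in}} U_1^\dagger. \tag{9}$$
Here we omit the subscript d for simplicity. Substituting Eq. (3) into Eq. (8), we obtain the following quadratic form
$$\langle H \rangle = q^{\top} S q, \tag{10}$$
where the superscript ⊤ denotes a transpose operation, and $S = (S_{\mu\nu})$ is a 4 × 4 real-symmetric matrix whose elements are defined as
$$S_{\mu\nu} = \frac{1}{2}\,\mathrm{tr}\!\left[\rho'_{\mathrm{in}}\left(\varsigma_\mu^\dagger H' \varsigma_\nu + \varsigma_\nu^\dagger H' \varsigma_\mu\right)\right], \tag{11}$$
and more explicitly, for $k, l \in \{1, 2, 3\}$,
$$S_{00} = \mathrm{tr}\!\left[\rho'_{\mathrm{in}} H'\right], \qquad S_{0k} = \frac{i}{2}\,\mathrm{tr}\!\left[\rho'_{\mathrm{in}}\,[\sigma_k, H']\right], \qquad S_{kl} = \frac{1}{2}\,\mathrm{tr}\!\left[\rho'_{\mathrm{in}}\left(\sigma_k H' \sigma_l + \sigma_l H' \sigma_k\right)\right], \tag{12}$$
where $[\cdot, \cdot]$ denotes the commutator. See Appendix VII A for the derivation of the quadratic form.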
The matrix S can be obtained by running and measuring ten quantum circuits, one for each element in the upper triangle of S (diagonal included), as detailed in the next subsection. Note that the quadratic form is minimized exactly by the eigenvector corresponding to the lowest eigenvalue of S. In addition, this optimization over the whole SU(2) is a generalization of other sequential optimizers [26,28,29], which optimize over only a part of SU(2). To clarify this point, we show in Sec. II D that they can be derived from our general framework. Note that a special FQS, which applies to objective functions of a special form, was proposed for time-evolution simulation [11]. Our formulation can also be regarded as an extension of that special FQS.
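The quadratic form and its minimization by the lowest eigenvector can be illustrated with a small classical calculation. The sketch below assumes NumPy and builds S directly from matrices rather than from measured circuits; the function names `embed`, `fqs_matrix`, and `energy`, and the random two-qubit instance standing in for the state after $U_1$ and the Hamiltonian conjugated by $U_2$, are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
VARSIGMA = [I2, -1j * X, -1j * Y, -1j * Z]


def embed(op, n_qubits, target):
    """Tensor a single-qubit operator on `target` with identities elsewhere."""
    out = np.eye(1, dtype=complex)
    for k in range(n_qubits):
        out = np.kron(out, op if k == target else I2)
    return out


def fqs_matrix(rho_p, h_p, n_qubits, target):
    """S[mu, nu] = (1/2) tr[rho' (s_mu^dag H' s_nu + s_nu^dag H' s_mu)]."""
    s = [embed(v, n_qubits, target) for v in VARSIGMA]
    S = np.empty((4, 4))
    for mu in range(4):
        for nu in range(4):
            m = s[mu].conj().T @ h_p @ s[nu]
            S[mu, nu] = 0.5 * np.trace(rho_p @ (m + m.conj().T)).real
    return S


def energy(q, rho_p, h_p, n_qubits, target):
    """tr[rho' R(q)^dag H' R(q)] for the gate R(q) acting on qubit `target`."""
    R = embed(sum(q_mu * v for q_mu, v in zip(q, VARSIGMA)), n_qubits, target)
    return np.trace(rho_p @ R.conj().T @ h_p @ R).real


# Hypothetical 2-qubit instance: random pure state rho' and random Hermitian H'.
n_qubits, target = 2, 0
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
rho_p = np.outer(psi, psi.conj())
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
h_p = (A + A.conj().T) / 2

S = fqs_matrix(rho_p, h_p, n_qubits, target)
eigvals, eigvecs = np.linalg.eigh(S)
q_opt = eigvecs[:, 0]  # unit eigenvector of the lowest eigenvalue
print(eigvals[0], energy(q_opt, rho_p, h_p, n_qubits, target))  # the two values agree
```

Because `np.linalg.eigh` returns eigenvalues in ascending order with orthonormal eigenvectors, the first column is already a valid unit quaternion and no further normalization is needed.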
Since FQS can select the optimal gate from the whole SU(2) to minimize the energy expectation in Eq. (8), it can incorporate the multi-parameter correlation in SU(2) and thus achieves better performance in optimizing PQCs than other sequential optimizers. Rotoselect, Rotosolve, and their variants [26,28,33-36] can be used to optimize a general single-qubit gate by first decomposing the multi-parameter gate with the ZYZ decomposition [37] into three single-parameter gates. In Rotosolve/Rotoselect and their variants, each parameter of the decomposed gate is then optimized locally, in contrast to FQS and Fraxis. For Fraxis [29], a general single-qubit gate is decomposed into two Fraxis gates, defined as $R_n(\pi)$; Fraxis updates these gates individually but simultaneously optimizes the two parameters within each gate.
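To make the sequential procedure concrete, here is a minimal statevector sketch of repeated FQS updates on a toy two-qubit circuit, assuming NumPy. Everything specific in it is an assumption for illustration: the circuit layout `CIRCUIT`, the test Hamiltonian, the helper names, and the use of exact statevectors to build S instead of the ten measured circuits. It is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(4)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
VARSIGMA = [I2, -1j * X, -1j * Y, -1j * Z]
CX = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

# Toy 2-qubit circuit in time order: ("R", gate index, qubit) or ("CX",).
CIRCUIT = [("R", 0, 0), ("R", 1, 1), ("CX",), ("R", 2, 0), ("R", 3, 1)]


def gate(q, qubit):
    """Embed R(q) = sum_mu q_mu varsigma_mu on `qubit` of the 2-qubit register."""
    r = sum(q_mu * v for q_mu, v in zip(q, VARSIGMA))
    return np.kron(r, I2) if qubit == 0 else np.kron(I2, r)


def unitaries(qs):
    return [CX if op[0] == "CX" else gate(qs[op[1]], op[2]) for op in CIRCUIT]


def product(ops):
    """Multiply operators given in time order (earliest first)."""
    u = np.eye(4, dtype=complex)
    for op in ops:
        u = op @ u
    return u


def energy(H, qs, psi0):
    psi = product(unitaries(qs)) @ psi0
    return (psi.conj() @ H @ psi).real


def fqs_update(H, qs, psi0, pos):
    """Replace the gate at circuit position `pos` by the minimizer of q^T S q."""
    ops = unitaries(qs)
    phi = product(ops[:pos]) @ psi0          # state after U1
    u2 = product(ops[pos + 1:])
    h_p = u2.conj().T @ H @ u2               # H' = U2^dag H U2
    _, d, qubit = CIRCUIT[pos]
    V = np.stack([gate(e, qubit) @ phi for e in np.eye(4)])  # rows: varsigma_mu U1|psi0>
    S = (V.conj() @ h_p @ V.T).real          # S[mu, nu] = Re <s_mu^dag H' s_nu>
    qs[d] = np.linalg.eigh(S)[1][:, 0]       # eigenvector of the lowest eigenvalue


H = np.kron(Z, Z) + 0.5 * (np.kron(X, I2) + np.kron(I2, X))  # hypothetical test Hamiltonian
psi0 = np.zeros(4, dtype=complex)
psi0[0] = 1.0
qs = [q / np.linalg.norm(q) for q in rng.normal(size=(4, 4))]  # random initial gates
history = [energy(H, qs, psi0)]
for sweep in range(5):
    for pos, op in enumerate(CIRCUIT):
        if op[0] == "R":
            fqs_update(H, qs, psi0, pos)
            history.append(energy(H, qs, psi0))
print(history[0], history[-1])
```

Each update replaces one gate by the exact minimizer over all single-qubit gates with the other gates held fixed, so the recorded energies can never increase from one update to the next; whether the sweep reaches the ground energy depends on the circuit and is not guaranteed.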

C. Evaluation of the S elements
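To determine the optimal single-qubit gate for the cost minimization, we evaluate the matrix S that constitutes the quadratic form Eq. (10). All the elements of S in Eq. (12) are calculated from ten expected values classified into three types as below, with $\rho'_{\mathrm{in}}$ and $H'$ as in Eq. (9).
Type-A:
$$\mathrm{tr}\!\left[\rho'_{\mathrm{in}} H'\right],$$
Type-B: for k = 1, 2, 3,
$$\mathrm{tr}\!\left[\rho'_{\mathrm{in}}\, R_k(\pm\pi/2)^\dagger H' R_k(\pm\pi/2)\right], \qquad R_k(\pm\pi/2) = \frac{\varsigma_0 \pm \varsigma_k}{\sqrt{2}},$$
Type-C: for (k, m) = (1, 2), (1, 3), (2, 3),
$$\mathrm{tr}\!\left[\rho'_{\mathrm{in}} \left(\frac{\varsigma_k + \varsigma_m}{\sqrt{2}}\right)^{\!\dagger} H' \left(\frac{\varsigma_k + \varsigma_m}{\sqrt{2}}\right)\right].$$
Here, $R_1, R_2, R_3$ stand for $R_x, R_y, R_z$. Type-A is the (0, 0)th element of S and corresponds to the expected value of H on the PQC when the single-qubit gate of interest, as in Fig. 2, is replaced with the identity. The other diagonal elements are produced by the type-A and type-B values with the following identity:
$$\mathrm{tr}\!\left[\rho'_{\mathrm{in}}\, R_k(\pm\pi/2)^\dagger H' R_k(\pm\pi/2)\right] = \frac{S_{00} + S_{kk}}{2} \pm S_{0k},$$
so that the sum of the two type-B values for a given k equals $S_{00} + S_{kk}$. Note that the type-B values correspond to the expected values of H on the PQC when the single-qubit gate of interest, as in Fig. 2, is replaced with, respectively, $R_x(-\pi/2)$, $R_x(\pi/2)$, $R_y(-\pi/2)$, $R_y(\pi/2)$, $R_z(-\pi/2)$, and $R_z(\pi/2)$.
In contrast, subtracting the type-B values with different signs yields the other elements in the first row directly. The remaining off-diagonal elements are produced by the type-C expected values and the already obtained diagonal elements with the following identity:
$$\mathrm{tr}\!\left[\rho'_{\mathrm{in}} \left(\frac{\varsigma_k + \varsigma_m}{\sqrt{2}}\right)^{\!\dagger} H' \left(\frac{\varsigma_k + \varsigma_m}{\sqrt{2}}\right)\right] = \frac{S_{kk} + S_{mm}}{2} + S_{km}.$$
Note that the type-C values correspond to the expected values of H on the PQC when the single-qubit gate to be optimized, as in Fig. 2, is replaced with, respectively, the Fraxis gate $R_n(\pi)$ for $n \propto (1, 1, 0), (1, 0, 1), (0, 1, 1)$. All expected values of Type-A, B, and C can be evaluated with direct measurements without any control operation such as the Hadamard test. Since a 4 × 4 real-symmetric matrix has ten degrees of freedom, the number of required direct measurements should be optimal.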
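The reconstruction of S from ten expected values can be sketched as follows, assuming NumPy. The callable `energy(q)` stands in for running the circuit with the target gate replaced by R(q) and measuring H; here it is emulated with a random two-qubit state and Hamiltonian. The names `make_energy` and `s_from_ten_energies` are hypothetical. The algebra uses the fact that the energy for a quaternion q is qᵀSq, so q = (e₀ ± e_k)/√2 gives (S₀₀ + S_kk)/2 ± S₀k and q = (e_k + e_m)/√2 gives (S_kk + S_mm)/2 + S_km.

```python
import numpy as np

rng = np.random.default_rng(1)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
VARSIGMA = [I2, -1j * X, -1j * Y, -1j * Z]


def make_energy(rho_p, h_p):
    """Return E(q) = tr[rho' R(q)^dag H' R(q)] for a gate on the first of two qubits.

    This emulates one circuit evaluation with the target gate replaced by R(q).
    """
    def energy(q):
        R = np.kron(sum(q_mu * v for q_mu, v in zip(q, VARSIGMA)), I2)
        return np.trace(rho_p @ R.conj().T @ h_p @ R).real
    return energy


def s_from_ten_energies(energy):
    """Assemble the 4x4 matrix S from ten calls to `energy`."""
    r, e = 1 / np.sqrt(2), np.eye(4)
    S = np.zeros((4, 4))
    S[0, 0] = energy(e[0])                      # Type-A: gate replaced by identity
    for k in (1, 2, 3):                         # Type-B: gate replaced by R_k(+-pi/2)
        plus = energy(r * (e[0] + e[k]))
        minus = energy(r * (e[0] - e[k]))
        S[k, k] = plus + minus - S[0, 0]
        S[0, k] = S[k, 0] = (plus - minus) / 2
    for k, m in ((1, 2), (1, 3), (2, 3)):       # Type-C: Fraxis gate, axis along e_k + e_m
        c = energy(r * (e[k] + e[m]))
        S[k, m] = S[m, k] = c - (S[k, k] + S[m, m]) / 2
    return S


# Hypothetical instance: random pure state and random Hermitian observable.
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
energy = make_energy(np.outer(psi, psi.conj()), (A + A.conj().T) / 2)

S = s_from_ten_energies(energy)
q = rng.normal(size=4)
q /= np.linalg.norm(q)
print(energy(q), q @ S @ q)  # the quadratic form reproduces the energy
```

On hardware each `energy` call would be an estimate with shot noise, so the reconstructed S, and hence the selected gate, is only as accurate as those ten estimates.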

D. Unification of sequential quantum optimizers
FQS generalizes all known sequential optimizers for single-qubit gates of PQCs, such as Rotosolve (NFT), Rotoselect, and Fraxis; those methods can be regarded as special cases of FQS.
For NFT, a single-qubit gate is restricted to a fixed axis m, i.e., $U_{\mathrm{NFT}} := R_m(\psi)$. Then the corresponding objective function in quadratic form is
$$\langle H \rangle = c^{\top} \begin{pmatrix} S_{00} & S_0 \cdot m \\ S_0 \cdot m & m^{\top} \tilde{S} m \end{pmatrix} c, \tag{15}$$
where $c := (\cos(\psi/2), \sin(\psi/2))$ and $S_0 := (S_{01}, S_{02}, S_{03})$. $\tilde{S}$ denotes the lower-right 3 × 3 part of the S matrix. The derivation of the quadratic form is detailed in Appendix A. The real-symmetric matrix in Eq. (15) can be regarded as a contraction of the FQS matrix S with respect to the rotational axis m, which reduces its degrees of freedom to three. In Rotoselect, three contracted matrices of the form in Eq. (15) are constructed for m ∈ {(1, 0, 0), (0, 1, 0), (0, 0, 1)}, and the lowest eigenvalue is selected after three separate diagonalization procedures. Since the three matrices share $S_{00}$, the total number of circuit evaluations can be reduced to seven.
As for Fraxis, the target gate $U_{\mathrm{Fraxis}} := R_n(\pi)$ is simply expressed by the quaternion q = (0, n). Thus, substituting q = (0, n) into the objective function, we can reproduce the previous results derived in [29] as follows
$$\langle H \rangle = n^{\top} \tilde{S} n. \tag{16}$$
Note that the θ-Fraxis of [29], in which the rotation angle is fixed to an arbitrary value θ instead of π, can be regarded as minimizing Eq. (10) for $\tilde{q} := (q_1, q_2, q_3) \in \mathbb{R}^3$ under the constraint $|\tilde{q}|^2 = 1 - q_0^2$, which results in solving simultaneous equations rather than a diagonalization.
It is worth noting that the required number of circuit evaluations for each sequential quantum optimizer coincides with the degrees of freedom for the real-symmetric matrix of respective methods, i.e., 3=1+2 circuit evaluations for NFT/Rotosolve, 6=1+2+3 circuit evaluations for Fraxis, and 10=1+2+3+4 circuit evaluations for FQS.
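The way the other optimizers arise as restrictions of the FQS matrix can be sketched in a few lines, assuming NumPy. The random symmetric matrix is only a stand-in for a measured S, and the function names are hypothetical. NFT/Rotosolve uses the 2 × 2 contraction PᵀSP with q = P (cos(ψ/2), sin(ψ/2)), Rotoselect takes the best of the three axis choices, and Fraxis uses the lower-right 3 × 3 block.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
S = (A + A.T) / 2  # stand-in for a measured 4x4 FQS matrix


def fqs_step(S):
    """Lowest eigenpair of the full 4x4 matrix: optimum over all of SU(2)."""
    w, v = np.linalg.eigh(S)
    return w[0], v[:, 0]


def fraxis_step(S):
    """Lowest eigenpair of the lower-right 3x3 block: optimal axis n of R_n(pi)."""
    w, v = np.linalg.eigh(S[1:, 1:])
    return w[0], v[:, 0]


def nft_step(S, m):
    """Optimal (energy, psi) for a fixed unit axis m via the 2x2 contraction."""
    P = np.zeros((4, 2))
    P[0, 0] = 1.0
    P[1:, 1] = np.asarray(m, dtype=float)  # q = P @ (cos(psi/2), sin(psi/2))
    w, v = np.linalg.eigh(P.T @ S @ P)
    c = v[:, 0]
    return w[0], 2 * np.arctan2(c[1], c[0])


def rotoselect_step(S):
    """Best (energy, psi, axis) among the X, Y, and Z axes."""
    candidates = [nft_step(S, m) + (tuple(m),) for m in np.eye(3)]
    return min(candidates, key=lambda t: t[0])


e_fqs, q_fqs = fqs_step(S)
e_fraxis, n_opt = fraxis_step(S)
e_select, psi_opt, axis = rotoselect_step(S)
print(e_fqs, e_fraxis, e_select)  # the FQS value is never above the other two
```

Since Fraxis and Rotoselect minimize the same quadratic form over subsets of the unit quaternions, their optimal values are bounded below by the lowest eigenvalue of the full matrix; no ordering between Fraxis and Rotoselect is implied.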

E. Barren plateaus in sequential quantum optimizers
We have shown that the energy landscape of all sequential optimizers can be derived from the eigenvalues of the S matrices, whose elements are computed from circuit evaluations of slightly modified PQCs. Sequential quantum optimizers can effectively obtain a better single-qubit gate that minimizes the cost function as long as the eigenvalues of S are far from degenerate. Hence, it is useful to quantify the degeneracy of S to evaluate the performance of sequential quantum optimizers.
To this end, we introduce a centered matrix $S_c^{(p)}$ defined as
$$S_c^{(p)} = S^{(p)} - \frac{\mathrm{tr}[S^{(p)}]}{p}\, I_{p \times p}, \tag{17}$$
where $I_{p \times p}$ denotes a p × p identity matrix. Here, $S^{(p)}$ is the p × p real-symmetric matrix for each sequential quantum optimizer, and the number of degrees of freedom of the target single-qubit gate is p − 1, e.g., p = 2 (Rotosolve/NFT), p = 3 (Fraxis), and p = 4 (FQS). Henceforth, we omit the superscript p for simplicity because it is clear from the context. Since the mean of the eigenvalues of $S_c$ is zero, the spectral radius of $S_c$ indicates the spread of the spectrum of S around the mean of its eigenvalues. More precisely, defining λ(S) as the eigenvalue of S with the largest distance from tr[S]/p, which is the mean of the eigenvalues of S, the spectral radius r of $S_c$ is equivalent to
$$r = \left| \lambda(S) - \frac{\mathrm{tr}[S]}{p} \right|. \tag{18}$$
In the following, we provide two theorems on the behavior of the spectrum of each real-symmetric matrix in Eqs. (10), (15), and (16) under the condition that the parameters in $U_1$ and $U_2$, as in Fig. 2, are randomly initialized, which is standard for studying barren plateaus in gradient-based methods [22,30].
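The centered matrix and its spectral radius are straightforward to compute once S is known. A minimal sketch, assuming NumPy, with hypothetical function names and a diagonal example matrix chosen so the result can be read off by hand:

```python
import numpy as np


def centered(S):
    """Subtract the mean eigenvalue tr[S]/p from the diagonal of a p x p matrix."""
    p = S.shape[0]
    return S - np.trace(S) / p * np.eye(p)


def spectral_radius(S):
    """Largest distance of any eigenvalue of S from the mean eigenvalue."""
    return np.max(np.abs(np.linalg.eigvalsh(centered(S))))


# Example: eigenvalues 1, 2, 3, 6 have mean 3, so the farthest one (6) is 3 away.
S_example = np.diag([1.0, 2.0, 3.0, 6.0])
print(spectral_radius(S_example))
```

A fully degenerate matrix such as a multiple of the identity gives r = 0, which is the situation in which replacing the gate cannot change the cost at all.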

Sequential quantum optimizers catching barren plateaus
An upper bound on Eq. (18) that decays exponentially with the number of qubits hints at the existence of a flat energy landscape (barren plateau). Indeed, we reveal that the flat landscape can occur under conditions similar to those assumed in [22], i.e., if the circuits become sufficiently deep to form a unitary 2-design, the second moment of the spectral radius of the centered matrix shrinks exponentially regardless of whether the cost is global or local.
Theorem 1. Suppose that the quantum circuits $U_1$ and $U_2$ are randomly and independently generated. If either $U_1$ or $U_2$ forms a unitary t-design with t ≥ 2, the second moment of the spectral radius r of the centered matrix Eq. (17) is upper bounded as, where the first case corresponds to $U_1$ being a t-design, and the second case corresponds to $U_2$ being a t-design.
Here, $\mathbb{E}_{U_1,U_2}[\cdot]$ is defined as the expectation over the random quantum circuits $U_1$ and $U_2$, $\delta_{\mu\nu}$ denotes the Kronecker delta, h.c. means the Hermitian conjugate of the preceding term, and ∆ρ in,2 , ∆H 2 are defined as This theorem means that if either $U_1$ or $U_2$ has sufficient expressibility, i.e., forms a unitary 2-design over n qubits, the spectrum of the S matrix concentrates on a single value and its deviation is exponentially small in the number of qubits. As a result, this spectrum concentration implies that the energy of the output quantum state becomes exponentially insensitive to the selection of single-qubit gates.
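The concentration described by Theorem 1 can be probed with a small Monte Carlo experiment. The sketch below assumes NumPy and makes several simplifying assumptions that are not taken from the text: the state after $U_1$ is drawn as a Haar-random pure state (a normalized complex Gaussian vector), $U_2$ is the identity, the observable is Z on the first qubit, and the target gate also acts on the first qubit. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
VARSIGMA = [I2, -1j * X, -1j * Y, -1j * Z]


def embed(op, n_qubits):
    """Single-qubit operator on qubit 0, identity on the remaining qubits."""
    return np.kron(op, np.eye(2 ** (n_qubits - 1), dtype=complex))


def haar_state(dim):
    """Haar-random pure state: a normalized complex Gaussian vector."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)


def spectral_radius_sq(psi, h_p, n_qubits):
    """r^2 of the centered FQS matrix for a gate on qubit 0 acting on state psi."""
    V = np.stack([embed(v, n_qubits) @ psi for v in VARSIGMA])  # rows: varsigma_mu |psi>
    S = (V.conj() @ h_p @ V.T).real                             # S[mu, nu]
    S_c = S - np.trace(S) / 4 * np.eye(4)
    return np.max(np.abs(np.linalg.eigvalsh(S_c))) ** 2


means = {}
for n_qubits in (2, 4, 6):
    h_p = embed(Z, n_qubits)  # H' = Z on qubit 0 (U2 taken as the identity)
    samples = [spectral_radius_sq(haar_state(2 ** n_qubits), h_p, n_qubits)
               for _ in range(200)]
    means[n_qubits] = float(np.mean(samples))
print(means)  # the sample mean of r^2 shrinks as qubits are added
```

This only illustrates the trend for one simple setting; it does not reproduce the bound of the theorem or the numerical experiments reported later in the paper.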

Sequential quantum optimizers avoiding barren plateaus
Although Theorem 1 assumes a relatively deep circuit forming a unitary t-design with t ≥ 2, we are interested in shallower circuits because of the noise-induced barren plateau. Here, we show that, in contrast to Theorem 1, there are shallow circuits that avoid the exponential shrinkage of the spectral radius when the target Hamiltonian is local.
To this end, let us consider an n-qubit alternating layered ansatz U [30,31] with m-qubit parametrized unitary blocks, as depicted in Fig. 3(a). Each block consists of parametrized single-qubit gates and parameter-free gates. Suppose the ansatz consists of L layers, and each layer contains ξ blocks (i.e., n = ξm). We define $S_k$ ($k = 1, 2, \dots, \xi$) as the m-qubit subsystem on which the kth block from the top in the final layer acts. In the following, we focus on a block W in the lth layer and a parameterized single-qubit gate R in the block, as in Fig. 3(b). Moreover, we define the forward light-cone $\mathcal{L}$ of the block W as the series of gates that have at least one input qubit causally connected to the output qubits of W. Likewise, the backward light-cone $\mathcal{L}_B$ of W is defined in the reverse direction of $\mathcal{L}$.
Theorem 2. Suppose that the whole quantum circuit U is an n-qubit alternating layered ansatz with m-qubit blocks as described in Fig. 3. Here, we focus on a block W in the lth layer and a parameterized single-qubit gate R in the block. We assume that the quantum circuits $W_A, W_B \subset W$, which are located after and before the target gate, respectively, and the other blocks each independently form a local 2-design. In addition, the Hamiltonian H is assumed to be m-local as in Eq. (21), where $h_i$ is a tensor product of Pauli matrices that acts non-trivially on at most m qubits. (The same assumption was used in [30].) Then, the second moment of the spectral radius r of the centered matrix Eq. (17) satisfies where L denotes the total number of layers. Here, $i_{\mathcal{L}}$ is the set of i indices whose associated operators
This theorem means that the lower bound on the second moment of the spectral radius does not vanish exponentially fast when we employ a shallow L, which leads to properties similar to those of the gradient-based counterparts [30]. More precisely, if at least one term c 2 i (ρ k,k ) (h i ) vanishes no faster than Ω(1/poly(n)), and if the number of layers L is O(log(n)), then the second moment of r is Ω(1/poly(n)). This guarantees the trainability of sequential quantum optimizers at the beginning of the optimization. On the other hand, if at least one term c 2 i (ρ k,k ) (h i ) vanishes no faster than $\Omega(1/2^{\mathrm{poly}(\log(n))})$, and if the number of layers L is O(poly(log(n))), then the second moment of r is $\Omega(1/2^{\mathrm{poly}(\log(n))})$. This suggests a transition region of trainability in which the spread of the spectrum decays faster than polynomially but slower than exponentially. Note that Cerezo et al. [30] gave a detailed analysis of the variance of the gradient with respect to the angles of fixed-axis rotation gates. In contrast, for a global cost function, the spectral radius is expected to shrink exponentially even on a constant-depth alternating layered ansatz, in line with the gradient-based methods of [30].

III. NUMERICAL EXPERIMENTS
We have seen that FQS generalizes known sequential optimizers such as NFT, Rotosolve/select, and Fraxis.
By comparing FQS with both other sequential quantum optimizers and a gradient-based optimizer, we show its advantages in VQE for the mixed field Ising model and in fidelity maximization, where the former has a local Hamiltonian as the cost function while the latter is a global cost. In addition, we demonstrate that our method can avoid the barren plateau (more precisely, the exponential concentration of the spectrum of S) by numerically evaluating the second moment of the spectral radius. In the following, we provide numerical simulations based on the statevector simulator of Qiskit [38].

A. Mixed field Ising model
To benchmark the performance of FQS, we carried out VQE optimizations for the 1-dimensional mixed field Ising model with five qubits whose Hamiltonian is where the superscript i denotes the index of each site, and We employed the periodic boundary condition, that is, $Z^{(n+1)} = Z^{(1)}$. We employed two types of ansatz as in Fig. 4. In the FQS optimizations, we prepared 20 initial parameter sets in the state-random manner, where the respective single-qubit gates were initialized based on the Haar random distribution. Similarly, in the Fraxis optimizations, the rotational axis n in $R_n(\psi)$ of each single-qubit gate was sampled from the uniform probability distribution on the Bloch sphere, while the rotational angles were fixed to π. As for Rotoselect, the initial rotational axes were randomly selected from the X, Y, and Z axes, while the rotational angles were randomly initialized in both Rotosolve (NFT) and Rotoselect.
Figure 5(a) shows averaged trajectories of 20 independent VQE simulations based on the 5-layer alternating layered ansatz. Here we compare the convergence efficiency of FQS with those of the other sequential quantum optimizers with respect to the number of gate updates. Although the number of circuit evaluations needed to update a single gate is not the same among the optimizers, this may be a fair comparison because the practical wall times for a single gate update by these optimizers would be comparable if parallel circuit evaluations were allowed.
From Fig. 5(a), it is evident that FQS converged the most efficiently to the lowest energy value within 6000 updates. Considering the gate expressibility, it may appear reasonable that FQS reached a better solution than Rotosolve and Fraxis. We note, however, that the efficiency of FQS is not trivial. To verify it, we decomposed a general single-qubit gate, which we term an FQS gate, into three equivalent fixed-axis rotation gates $R_z(\phi) R_y(\vartheta) R_z(\lambda)$ (more precisely, we replaced R in Fig. 4 with RzRyRz gates) and sequentially optimized the three gates by Rotosolve. In this case, the optimization turned out to be far slower not only than FQS but also than Fraxis. The ansatz with the three-gate decomposition was also slower in optimization than the RyRz ansatz. This deceleration of the RzRyRz ansatz compared with the RyRz ansatz is consistent with the known dilemma that higher expressibility leads to lower trainability [39]. In contrast, FQS maintains high optimization efficiency despite its high expressibility. We can attribute this efficiency of FQS to the incorporated parameter correlation within a single-qubit gate, which has three degrees of freedom. This insight is consistent with the fact that Fraxis also outperforms the sequential optimization of the RyRz ansatz by Rotosolve, because Fraxis incorporates the correlation of two degrees of freedom.
A comparison of performance between FQS and the gradient-based optimizers is presumably in great demand. However, it is not straightforward to compare them in a fair manner, because their apparent performance varies depending on the assumed hardware. Figure 5(b) shows a comparison of optimization efficiency between FQS and a gradient-based optimizer (Adam) [40], where the horizontal axis represents the number of expectation evaluations, which differs from that of Fig. 5(a). Although parallel computing is possible in principle for the gradient-based optimizers, the required number of cost evaluations for each optimization step is O(D), where D is the total number of gates in the ansatz. This is in contrast to FQS, which requires a constant number of cost evaluations, i.e., at most 10 circuits for a single gate update. Since we did not find parallel computing practical for the gradient-based optimizations, we here employed the direct computational cost, i.e., the number of expectation evaluations, as an alternative measure. Figure 5(b) clearly shows the advantage of FQS over the Adam optimizer. The slower convergence of Adam with the RzRyRz gates can be attributed to the same reason as for Rotosolve in Fig. 5(a). Note that we employed a learning rate of 0.1 as a hyperparameter for Adam, which appears to be rather larger than in its conventional usage in VQA, but provided modest results compared to 0.01 and 0.001 in benchmark simulations, as shown in Fig. 13 in the Appendix.
Figure 6 shows the optimized energy after 100 sweeps based on the ansätze in Figs. 4(a) and (b). In a single sweep, all gates were updated once in ascending order of the gate set index as labeled in Figs. 4(a) and (b). Although 100 sweeps may not be sufficient for rigorous convergence, the energy update per sweep is too small to expect a drastic improvement from further iterations. Figure 6 provides several insights as follows:
1. For all ansätze employed, the optimizers showed no distinct difference, except for Rotosolve with the Ry ansatz, when the circuit was extremely shallow, i.e., L = 1.
2. FQS showed the best performance among all the sequential optimizers on all the ansätze (alternating layered, cyclic, and ladder entanglers; see also Fig. 11 in the Appendix). In particular, we confirmed a systematic improvement associated with the degrees of freedom of the target gate, i.e., FQS optimizes 3 degrees, Fraxis optimizes 2 degrees, and Rotosolve optimizes 1 degree.
3. The relative advantage of FQS over Rotosolve for the RzRyRz ansatz depends on the circuit structure; it is more distinct on the alternating layered ansatz than on the cyclic entangler ansatz.
4. The Rotosolve applications to a series of fixed-axis gates (RyRz and RzRyRz) were not better than FQS even though they have equivalent expressibility. On the contrary, in some cases a series of RyRz gates showed better performance than Rotosolve with RzRyRz gates, which possess full expressibility for a single qubit, as does an FQS gate.
5. Rotoselect, which selects an optimal single-qubit gate within a part of SU(2) unlike FQS, tended to be trapped in local minima.
Taking the higher expressibility and the correlation among parameters of an FQS gate into account, the second insight appears promising. However, the third insight suggests that higher circuit expressibility alone may not be sufficient for successful optimization, and on the contrary, in some cases it may hinder further optimization if the correlation among parameters is not considered. Note that barren plateaus do not account for this hindrance, because the FQS application, which has expressibility equivalent to RzRyRz, showed better performance. Hence, we suppose that Rotosolve optimization of RzRyRz gates is likely to be trapped at local minima and/or saddle points, while FQS may be more resilient because it incorporates parameter correlation.

B. Fidelity maximization
As another example to verify the performance of FQS, we conducted fidelity maximization, where the infidelity with respect to a reference state was taken as the cost. The reference states were independently prepared with the Haar random generator in Qiskit [38] for each optimization. Figure 7 shows the results after 100 sweeps for 40 independent runs, where we employed the alternating layered ansatz in Fig. 4(a). FQS again performed better than the other optimizers, and its advantage became more distinct as the number of layers L increased. This observation is consistent with the previous results for a local Hamiltonian.
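The infidelity cost can be sketched as follows. This is a NumPy stand-in, not the Qiskit generator used in the experiments: a Haar-random pure state is drawn by normalizing a complex Gaussian vector, and the function names `haar_state` and `infidelity` are introduced here only for illustration.

```python
import numpy as np


def haar_state(n_qubits, rng):
    """Draw a Haar-random pure state by normalizing a complex Gaussian vector."""
    d = 2 ** n_qubits
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)


def infidelity(psi, ref):
    """Cost for fidelity maximization: 1 - |<ref|psi>|^2."""
    return 1.0 - abs(np.vdot(ref, psi)) ** 2


rng = np.random.default_rng(0)
n_qubits = 3
ref = haar_state(n_qubits, rng)

# Cost of the untrained state |0...0> with respect to the reference state.
zero = np.zeros(2 ** n_qubits, dtype=complex)
zero[0] = 1.0
cost = infidelity(zero, ref)

# The average fidelity between a fixed state and a Haar-random state is 1/d.
mean_fidelity = np.mean([1.0 - infidelity(zero, haar_state(n_qubits, rng)) for _ in range(4000)])
```

The 1/d average fidelity illustrates why a random initial circuit starts exponentially far from a Haar-random target as the number of qubits grows.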

C. Spectral radius and noise-free barren plateau
To verify our theorem on the spectrum of the FQS matrix, we conducted numerical experiments evaluating the second moment of the spectral radius of the centered FQS matrix in Eq. (17), where an FQS matrix S is evaluated with a randomly initialized quantum circuit. Here, we employed the alternating layered ansatz in Fig. 4(a) and two types of cost function. The first is the infidelity with a quantum state randomly generated with the n-qubit Haar measure, which is a global cost function. The second is the expectation value of the 1-local Hamiltonian $H = Z \otimes I^{\otimes (n-1)}$.
Figure 8(a) shows that the second moment is independent of the number of layers but decreases exponentially with the number of qubits. Since the reference states generated with a random unitary $U_{\rm Haar}$ drawn from the unitary group according to the Haar measure are written as $U_{\rm Haar}|0\rangle^{\otimes n}$, the fidelity can be regarded as a projective measurement onto $|0\rangle^{\otimes n}$ of the input ansatz $U$ with a randomly generated unitary appended:
$$\left| \langle 0|^{\otimes n}\, U_{\rm Haar}^{\dagger}\, U\, |0\rangle^{\otimes n} \right|^2 .$$
As a result, the condition employed in Theorem 1 holds, which is consistent with the present experiments. In the case of the local cost function, the exponential decay of the second moment can be confirmed in the limit of a large number of layers in Fig. 8(b). However, the second moment exhibited a transition from a constant value to exponentially small values as the number of layers increased. Therefore, sequential quantum optimizers are able to circumvent the noise-free barren plateau by using an alternating layered ansatz with a limited number of layers for local cost functions. We remark that although each block in the alternating layered ansatz used in the present experiments does not form a unitary 2-design, the results seem to be consistent with the consequence of Theorem 2. Therefore, a unitary 2-design in each block, and even the alternating layered ansatz itself, may not necessarily be required to circumvent the barren plateau. Furthermore, we also observed the transition region of the second moment of the spectral radius with a different type of alternating layered ansatz (see Fig. 12 in the Appendix).
We also note that, at present, the barren plateau has been analytically proven only under randomly initialized conditions. Although this is the case at the beginning of the optimization, the randomness no longer holds during the optimization, and thus it is not trivial how the optimization proceeds after random initialization. For further understanding, we carried out additional optimizations with the 5-layer alternating layered ansatz while varying the number of qubits. Figure 9 shows the resulting relative errors after 100 sweeps of VQAs, comparing the global and local cost functions. Here, for the global cost we employed the fidelity with randomly generated states as described in the previous section, while the local cost is the Hamiltonian of the mixed-field Ising model in Eq. (25). Note that the number of FQS applications in one sweep is not constant, because the total number of single-qubit gates increases with the number of qubits. The resulting errors for the global cost become exponentially larger as the number of qubits increases, which is in line with the spectral radius at the beginning of the optimizations in Fig. 8(a). In contrast, the results for the local cost did not exhibit the exponential deterioration of trainability, which is also consistent with Fig. 8(b); the relative error leveled off beyond 6 qubits. However, we expect that this error may increase again as the number of qubits grows beyond a certain threshold, because a circuit with a constant number of layers does not have sufficient expressibility for the target states, even though the optimization proceeds to some extent in the beginning. Thus, scalability requires a highly expressive circuit, e.g., an alternating layered ansatz with highly expressive blocks [31], while keeping the number of layers small. Since our method utilizes the full expressibility of single-qubit gates to optimize the circuit structure while keeping the number of layers fixed, we believe that FQS is effective in meeting this requirement.
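The quantity studied in this section can be estimated with a short Monte Carlo sketch. The code below is an illustration under simplifying assumptions and does not reproduce Fig. 8: instead of the alternating layered ansatz, the circuits before and after the target gate (named `U_before` and `U_after` here) are drawn as Haar-random n-qubit unitaries, which corresponds to the deep-circuit regime, and the target gate acts on the first qubit with the local cost $H = Z \otimes I^{\otimes (n-1)}$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
BASIS = [I2, -1j * X, -1j * Y, -1j * Z]


def haar_unitary(d, rng):
    """Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
    g = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(g)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases                      # rescale column j by the phase of r[j, j]


def centered_fqs_matrix(n, rng):
    """Centered 4x4 FQS matrix S_c = S - tr[S] I / 4 for a gate on the first qubit."""
    d = 2 ** n
    rest = np.eye(d // 2, dtype=complex)
    H = np.kron(Z, rest)                   # local cost Z (x) I^(n-1)
    U_before, U_after = haar_unitary(d, rng), haar_unitary(d, rng)
    psi = U_before[:, 0]                   # U_before |0...0>
    H_eff = U_after.conj().T @ H @ U_after
    vecs = [np.kron(s, rest) @ psi for s in BASIS]
    S = np.array([[np.real(vm.conj() @ H_eff @ vn) for vn in vecs] for vm in vecs])
    return S - np.trace(S) / 4 * np.eye(4)


def second_moment(n, samples, rng):
    """Sample average of the squared spectral radius of the centered matrix."""
    radii = [np.max(np.abs(np.linalg.eigvalsh(centered_fqs_matrix(n, rng))))
             for _ in range(samples)]
    return float(np.mean(np.square(radii)))


rng = np.random.default_rng(1)
moments = {n: second_moment(n, 200, rng) for n in (2, 3, 4, 5)}
```

With globally random circuits the estimates in `moments` shrink as n grows, in line with the concentration for deep circuits; a shallow alternating layered ansatz would have to be simulated explicitly to observe the constant regime.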

FIG. 9. Boxplots of the relative errors of the obtained cost, measured from the exact solution, after 100 sweeps as a function of the number of qubits. For the global cost, we executed the fidelity maximization described in Sec. III B, while the mixed-field Ising Hamiltonian in Eq. (25) was employed as the local cost. In both cases, the 5-layer alternating layered ansatz in Fig. 4(a) was employed. The box plots denote quantiles over 20 independent optimizations.

IV. CONCLUSION
We proposed a new quantum algorithm for VQAs, called FQS, based on analytical optimization of the circuit structure with respect to a single-qubit gate. We have shown that the expectation value of an observable on a quantum state prepared by a parameterized quantum circuit can be rewritten as a solvable quadratic form in the parameters of a single-qubit gate in the circuit, and our algorithm utilizes the matrix factorization based on this quadratic form. The matrix factorization framework has also revealed the hierarchical relation among sequential quantum optimizers in terms of the degrees of freedom simultaneously optimized in a single-qubit gate, i.e., Rotosolve (NFT) ≤ Rotoselect ≤ Fraxis ≤ FQS. Moreover, by introducing the spectral radius as a measure to evaluate the performance of sequential optimizers (including the existing methods), we rigorously proved the exponential scaling of the spectrum concentration of the matrix associated with the quadratic form if the circuit is too deep, which is inherently equivalent to the barren plateau in gradient-based optimizers. On the other hand, we also proved that the exponential concentration can be avoided by assuming a local cost function and an alternating layered ansatz.
In the numerical experiments, we confirmed that FQS achieved a good balance between trainability and expressibility through circuit structure optimization and incorporation of the intra-gate parameter correlation, and that it outperformed a gradient-based optimizer as well as the other sequential optimizers. We showed the efficacy of our proposed framework through extensive numerical experiments and confirmed the relation between sequential optimizers and barren plateaus. Since FQS advances circuit structure optimization from a heuristic to an analytical procedure, it may provide a solution to a critical problem of VQA, namely circuit design. We hope the results are instrumental in promoting the use of sequential optimizers in VQAs.

VII. APPENDIX
A. Derivation of quadratic forms
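Substituting the quaternion representation of single-qubit gates, $R(q) = q \cdot \varsigma$, into the energy (8), we obtain
$$\langle H \rangle = \sum_{\mu,\nu=0}^{3} A_{\mu\nu} q_\mu q_\nu = \frac{1}{2}\left( \sum_{\mu,\nu=0}^{3} A_{\mu\nu} q_\mu q_\nu + \sum_{\nu,\mu=0}^{3} A_{\nu\mu} q_\nu q_\mu \right) = \sum_{\mu,\nu=0}^{3} S_{\mu\nu} q_\mu q_\nu, \tag{A.1}$$
where the matrix $S := (S_{\mu\nu})$ in the last line is defined by symmetrization of $A := (A_{\mu\nu})$ as follows:
$$S_{\mu\nu} = \frac{A_{\mu\nu} + A_{\nu\mu}}{2}. \tag{A.2}$$
Here, $S$ is a real symmetric matrix since $\varsigma_\mu^\dagger H \varsigma_\nu + \varsigma_\nu^\dagger H \varsigma_\mu$ is a Hermitian operator. We write $\tilde{S}$ for the lower-right $3 \times 3$ part of the matrix $S$. Then, the quadratic form for Fraxis (more precisely, the π-Fraxis algorithm) [29] is represented with $\tilde{S}$. Specifically, the Fraxis algorithm deals with the single-qubit gate $U_{\rm Fraxis} := R_n(\pi)$, whose quaternion $q = (0, n)$ is located on the two-dimensional spherical surface. Thus, from Eq. (A.1) we directly obtain the quadratic form for Fraxis as
$$\langle H \rangle = n^{T} \tilde{S}\, n. \tag{A.3}$$
As for NFT [28] and Rotosolve/Rotoselect [26], they optimize an axis-fixed single-qubit gate
$$R_m(\psi) = \cos\frac{\psi}{2}\, I - i \sin\frac{\psi}{2}\, m \cdot \sigma,$$
where $m$ is a fixed rotational axis. Substituting the quaternion $q = (\cos(\psi/2),\, m \sin(\psi/2))$ corresponding to this gate into the quadratic form of FQS, we obtain
$$\langle H \rangle = \begin{pmatrix} \cos\frac{\psi}{2} & \sin\frac{\psi}{2} \end{pmatrix} \begin{pmatrix} S_{00} & S_0 \cdot m \\ S_0 \cdot m & m^{T} \tilde{S}\, m \end{pmatrix} \begin{pmatrix} \cos\frac{\psi}{2} \\ \sin\frac{\psi}{2} \end{pmatrix}, \tag{A.4}$$
where $S_0 := (S_{01}, S_{02}, S_{03})$. Here, Rotoselect simply solves the minimization problem of the quadratic form for each axis $m \in \Lambda$ and selects the axis that minimizes the energy, where $\Lambda$ is a predefined set of rotational axes such as $\Lambda = \{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$.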
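The quadratic forms above can be checked numerically in a single-qubit toy setting. The sketch below is an illustration, not the authors' implementation: it assumes that no other gates surround the target gate, takes the input state as |0⟩⟨0| and a random Hermitian observable, and uses the convention $A_{\mu\nu} = \mathrm{tr}[\rho\, \varsigma_\mu^\dagger H \varsigma_\nu]$, which is an assumption of this example. It builds $S$, verifies that the energy equals the quadratic form, and compares the minima reachable by the FQS (4 × 4), Fraxis (lower-right 3 × 3), and fixed-axis (2 × 2, axis z) forms.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
SIGMA = [I2, -1j * X, -1j * Y, -1j * Z]    # quaternion basis (I, -iX, -iY, -iZ)


def gate(q):
    """R(q) = q . varsigma for a unit quaternion q."""
    return sum(c * s for c, s in zip(q, SIGMA))


def energy(q, rho, H):
    """Energy tr[rho R(q)^dagger H R(q)] evaluated directly."""
    R = gate(q)
    return float(np.real(np.trace(rho @ R.conj().T @ H @ R)))


def fqs_matrix(rho, H):
    """S = (A + A^T) / 2 with A_{mu nu} = tr[rho varsigma_mu^dagger H varsigma_nu]."""
    A = np.array([[np.trace(rho @ sm.conj().T @ H @ sn) for sn in SIGMA] for sm in SIGMA])
    return ((A + A.T) / 2).real


rng = np.random.default_rng(0)
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
H = (M + M.conj().T) / 2                   # random Hermitian observable
rho = np.array([[1, 0], [0, 0]], dtype=complex)

S = fqs_matrix(rho, H)
evals, evecs = np.linalg.eigh(S)
q_opt = evecs[:, 0]                        # FQS update: lowest eigenvector of S
e_fqs = evals[0]
e_fraxis = np.linalg.eigvalsh(S[1:, 1:])[0]                     # Fraxis: q = (0, n)
e_fixed_z = np.linalg.eigvalsh(S[np.ix_([0, 3], [0, 3])])[0]    # fixed axis m = (0, 0, 1)

q_rand = rng.normal(size=4)
q_rand /= np.linalg.norm(q_rand)           # arbitrary unit quaternion for a consistency check
```

Because the Fraxis and fixed-axis forms restrict $q$ to subspaces of $\mathbb{R}^4$, their minima cannot lie below the FQS minimum, which reflects the hierarchy of degrees of freedom among the optimizers.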

B. Proof of main theorems
As derived in Appendix VII A, each sequential quantum optimizer has a $p \times p$ real symmetric matrix. Here, $p - 1$ is the number of degrees of freedom of the target single-qubit gate (the $-1$ accounts for the normalization constraint). Note that the FQS gate, i.e., a general single-qubit gate $R(q)$ ($p = 4$), the Fraxis gate $R_n(\pi)$ ($p = 3$), and the NFT gate with axis $m$, $R_m(\psi)$ ($p = 2$), are written in the following unified form
$$R = q \cdot \varsigma^{(p)} = \sum_{\mu} q_\mu\, \varsigma^{(p)}_\mu,$$
where $q_\mu$ is the $\mu$th element of a unit vector in $\mathbb{R}^p$. Here, $\varsigma^{(p)}$ is an extension of the Pauli matrices such as $\varsigma^{(3)} = (-iX, -iY, -iZ)$ for the Fraxis gate and $\varsigma^{(2)} = (I, -i m \cdot \sigma)$ for the NFT gate. Accordingly, each $p \times p$ real symmetric matrix is also written as
Note that $S^{(3)}$ and $S^{(2)}$ correspond to the real symmetric matrices in Eqs. (A.3) and (A.4), respectively. In what follows, we omit the superscript $(p)$ for simplicity. Moreover, it is convenient to write the elements of $S$ as in
In the following, $d$ denotes the dimension of the n-qubit system, i.e., $d = 2^n$.
Here, we provide a proof of Theorem 1. For convenience, we recall it.
Theorem 1. Suppose that the quantum circuits $U_1$ and $U_2$ are randomly and independently generated. If either $U_1$ or $U_2$ forms a unitary t-design with $t \ge 2$, the second moment of the spectral radius $r$ of the centered matrix $S_c = S - \mathrm{tr}[S]\, I_{p \times p}/p$ ($I_{p \times p}$ denotes the $p \times p$ identity) is upper bounded as follows.

, (B.4)
where the first case corresponds to $U_1$ being a t-design, and the second case corresponds to $U_2$ being a t-design. Here, $E_{U_1,U_2}[\cdot]$ denotes the expectation over the random quantum circuits $U_1$ and $U_2$, $\delta_{\mu\nu}$ denotes the Kronecker delta, h.c. means the Hermitian conjugate of the preceding term, and $\Delta\rho_{\mathrm{in},2}$ and $\Delta H_2$ are defined as
Proof. Since the centered matrix $S_c$ is a real symmetric matrix regardless of the quantum circuits $U_1$ and $U_2$, the following inequality holds, because the maximum absolute eigenvalue is at most the square root of the sum of the squared eigenvalues:
$$r \le \| S_c \|_F,$$
where $\| \cdot \|_F$ denotes the Frobenius norm. Thus, the second moment of the spectral radius is evaluated as
$$E_{U_1,U_2}[r^2] \le E_{U_1,U_2}\!\left[ \| S_c \|_F^2 \right] = \sum_{\mu,\nu} E_{U_1,U_2}\!\left[ S_{\mu\nu}^2 \right] - \frac{1}{p} \sum_{\mu,\nu} E_{U_1,U_2}\!\left[ S_{\mu\mu} S_{\nu\nu} \right]. \tag{B.7}$$
First, we evaluate the r.h.s. of Eq. (B.7) under the condition that the random quantum circuit $U_1$ forms a unitary t-design with $t \ge 2$. Using Lemma 1, which is proved in the next subsection, the following identity holds,
where we defined
Hence, we can calculate the expectations in the final line of Eq. (B.7) as follows.
where we defined
Accordingly, we can evaluate the second moment of the spectral radius as in
$\cdots\; \varsigma_\nu^\dagger \varsigma_\mu + \mathrm{h.c.}$, (B.17)
where we used $\varsigma_\nu^\dagger \varsigma_\mu = (-1)^{1-\delta_{\mu\nu}} \varsigma_\mu^\dagger \varsigma_\nu$ in the last equality, and
Here, we provide a proof of Theorem 2. For convenience, we recall it.
Theorem 2. Suppose that the whole quantum circuit $U$ is an n-qubit alternating layered ansatz with m-qubit blocks as described in Sec. II E 2. Here, we focus on a block $W$ in the $l$th layer and a parameterized single-qubit gate $R$ in the block. We assume that the quantum circuits $W_A, W_B \subset W$, which are located after and before the target gate, respectively, and the other blocks independently form local 2-designs. In addition, the Hamiltonian $H$ is assumed to be m-local such as
where a tensor product of Pauli matrices $h_i$ acts non-trivially on at most m qubits. (This is the same assumption as in [30].) Then, the second moment of the spectral radius is lower bounded as
Proof. To establish the lower bound of the second moment, we begin with the following inequality, which holds because the maximum absolute eigenvalue is at least the square root of the average of the squared eigenvalues:
$$r \ge \frac{1}{\sqrt{p}}\, \| S_c \|_F,$$
where $\| \cdot \|_F$ denotes the Frobenius norm. Since the inequality holds for any quantum circuits $U_1$ and $U_2$ in Fig. 2,
$$E[r^2] \ge \frac{1}{p}\, E\!\left[ \| S_c \|_F^2 \right].$$
We first evaluate the expectation of $S_{\mu\nu} S_{\mu'\nu'}$ over the block of interest. From the setting, the block $W$ containing a single-qubit gate $R$ is decomposed as
$$W = W_A \left( R \otimes 1_{m-1} \right) W_B,$$
where $1_{m-1}$ is the identity on the $(m-1)$-qubit system, and $W_A, W_B \subset W$ are the quantum circuits after and before $R$, respectively. As shown in Fig. 2, the quantum circuits $U_1$ and $U_2$ for an alternating layered ansatz can be written as
where $1_w$ denotes the identity over the qubits on which the block $W$ acts trivially. Here, $V_2$ contains the gates in the forward light-cone $\mathcal{L}$ of $W$, i.e., all gates with at least one input qubit causally connected to the qubits of $W$ as in Fig. 3, and $V_1$ contains the other gates. Accordingly, we can write the elements of the $S$ matrix as
and we obtain
Here, we defined
where $\mathrm{tr}_{\omega}[\cdot]$ means the partial trace over the qubits that are not in $W$, and $\{|i\rangle\}$ denotes the computational basis of the $(n-m)$-qubit system.
Since $W_A$ forms a local 2-design, we first compute the expectation of Eq. (B.26) over $W_A$ as
Here, we employed the following integration formula from the Weingarten calculus [30]
where $W$ is Haar-distributed on the unitary group of degree $2^m$, and $A$, $B$, $C$, and $D$ are arbitrary linear operators on an m-qubit system. Noting that
$\cdots\; {}_{\mathrm{in},ij}\, W_B^\dagger = 2\delta_{\mu\nu}\, \mathrm{tr}\, \rho\; \cdots$
where we used the formula Eq. (B.29) in the first equality. In the second equality, we used the following relation, which can be derived by direct calculation, as
Here, we defined
Finally, we evaluate the expectations over $V_1$ and $V_2$ in Eq. (B.39), which can be calculated essentially by a series of integrations over the m-qubit blocks in $V_1$ and $V_2$. Under the same assumptions as ours, the previous study [30] has shown that the following inequality holds:
where we recall that $L$ is the total number of layers, and the block $W$ of interest is in the $l$th layer. Combining this inequality with Eq. (B.39), we establish the lower bound for the second moment of the spectral radius as
In this subsection, we provide some lemmas for the proof of Theorem 1.
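Proofs of this kind rest on second-moment Haar integrals. As a hedged illustration, the sketch below checks one standard identity of the Weingarten calculus by Monte Carlo sampling; whether this is exactly the variant used as Eq. (B.29) cannot be confirmed from the text above, and the function names are assumptions of this example. For $W$ Haar-distributed on the unitary group of degree $d$, the identity reads $\int d\mu(W)\, \mathrm{tr}[W A W^\dagger B]\, \mathrm{tr}[W C W^\dagger D] = \frac{\mathrm{tr}[A]\mathrm{tr}[B]\mathrm{tr}[C]\mathrm{tr}[D] + \mathrm{tr}[AC]\mathrm{tr}[BD]}{d^2-1} - \frac{\mathrm{tr}[AC]\mathrm{tr}[B]\mathrm{tr}[D] + \mathrm{tr}[A]\mathrm{tr}[C]\mathrm{tr}[BD]}{d(d^2-1)}$.

```python
import numpy as np


def haar_unitary(d, rng):
    """Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
    g = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(g)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases                      # rescale column j by the phase of r[j, j]


def haar_second_moment(A, B, C, D):
    """Closed form of the Haar average of tr[W A W^dag B] tr[W C W^dag D]."""
    d = A.shape[0]
    tr = np.trace
    same = tr(A) * tr(B) * tr(C) * tr(D) + tr(A @ C) * tr(B @ D)
    cross = tr(A @ C) * tr(B) * tr(D) + tr(A) * tr(C) * tr(B @ D)
    return same / (d ** 2 - 1) - cross / (d * (d ** 2 - 1))


def monte_carlo(A, B, C, D, samples, rng):
    """Sample average of tr[W A W^dag B] tr[W C W^dag D] over Haar-random W."""
    d = A.shape[0]
    total = 0.0
    for _ in range(samples):
        W = haar_unitary(d, rng)
        Wd = W.conj().T
        total += np.trace(W @ A @ Wd @ B) * np.trace(W @ C @ Wd @ D)
    return total / samples


# Example: A = C = |0><0| and B = D = Z, i.e., the Haar average of <Z>^2 on one qubit.
P0 = np.diag([1.0, 0.0]).astype(complex)
Zm = np.diag([1.0, -1.0]).astype(complex)
rng = np.random.default_rng(0)
exact = haar_second_moment(P0, Zm, P0, Zm)
estimate = monte_carlo(P0, Zm, P0, Zm, 5000, rng)
```

In this example the closed form evaluates to 1/3, the average of the squared z-component of a uniformly distributed Bloch vector, and the sampled estimate should agree within statistical error.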

FIG. 1. Schematic diagram of VQA with FQS in comparison with other optimizers. The blue and red circles correspond to the expressibility of the single-qubit gate of FQS and Fraxis, respectively. Since FQS (Fraxis) has full (partial) degrees of freedom in a single-qubit gate, the FQS (blue) circles are larger than those of Fraxis (red). FQS and Fraxis can each find the optimal state within its circle, which drifts stepwise, reflecting the circuit structure optimization.

FIG. 3. Schematic diagram of the alternating layered ansatz with m-qubit unitary blocks. In Theorem 2, we focus on a single-qubit gate $R$ in a block $W$ of the $l$th layer as depicted in (b), where $W_A$ and $W_B$ are sets of unitary gates located after and before the target single-qubit gate, respectively. $\mathcal{L}$ ($\mathcal{L}_B$) denotes the forward (backward) light-cone with respect to $W$. Here, $U_1$ in Fig. 2 includes the gates in $\mathcal{L}$ and $W_A$, and $U_2$ includes the other gates.

FIG. 4. PQCs employed for numerical experiments. Each layer consists of the gates enclosed by the dashed line, and the total number of layers is denoted by L. In sequential quantum optimization, the parameterized single-qubit gates R are updated in ascending order of their subscripts.

FIG. 6. Boxplots of the resulting energy of VQE for the 5-qubit one-dimensional mixed-field Ising model after 100 sweeps. Twenty independent VQE runs were conducted from randomly generated initial states. The vertical axis represents the difference between the obtained energy and the exact ground-state energy.

FIG. 7. Boxplots of the results for fidelity maximization as a function of the number of circuit layers. The alternating layered ansatz in Fig. 4(a) was employed. The box plots denote quantiles over 40 independent results with randomly generated initial parameter sets and target states taken from the Haar distribution.

FIG. 8. The second moment of the spectral radius of the centered FQS matrix. The FQS matrix is evaluated for the single-qubit gate acting on the first qubit in the first layer of the alternating layered ansatz. The second moment is evaluated with 1000 samples based on the randomly initialized circuits with respect to the parameterized single-qubit gates. We employed the infidelity with a state generated using the Haar measure on the n-qubit unitary group as the global cost. For the local cost function, we employed the expectation value of the Hamiltonian $H = Z \otimes I^{\otimes (n-1)}$.

(B.20)
where $L$ denotes the total number of layers. Here, $i_{\mathcal{L}}$ is the set of $i$ indices whose associated operators $h_i$ act on qubits in the forward light-cone $\mathcal{L}$ of $W$, and $k_{\mathcal{L}_B}$ is the set of $k$ indices whose associated subsystems $S_k$ are in the backward light-cone $\mathcal{L}_B$ of $W$. The quantum state $\rho_{k,k'}$ is the reduced density matrix of the input state $\rho_{\mathrm{in}}$ on $S_k S_{k+1} \cdots S_{k'}$, and the function $\epsilon(M)$ for a matrix $M$ is defined as $\epsilon(M) = D_{\mathrm{HS}}(M, \mathrm{tr}(M)\, 1/d_M)$, where $D_{\mathrm{HS}}$ is the Hilbert-Schmidt distance and $d_M$ is the dimension of the matrix $M$.
(B.30) holds, then the first term of Eq. (B.28) is independent of $W_B$. Since $W_B$ also forms a local 2-design, a part of the second term of Eq. (B.28) is evaluated as
$[\cdots]^{L+l} \sum_{i \in i_{\mathcal{L}}} \sum_{(k,k') \in k_{\mathcal{L}_B},\, k' \ge k} c_i^2\, \epsilon(\rho_{k,k'})\, \epsilon(h_i)$, (B.40)

FIG. 12. The second moment of the spectral radius of the centered FQS matrix. The FQS matrix was evaluated for the single-qubit gate acting on the first qubit in the first layer of the alternating layered ansatz with ladder entanglers as shown in Fig. 10. The second moment was evaluated with 1000 samples based on the randomly parameterized circuits. For the local cost function, we employed the expectation value of the Hamiltonian $H = Z \otimes I^{\otimes (n-1)}$.