An exponentially-growing family of universal quantum circuits

Quantum machine learning has become an area of growing interest but has certain theoretical and hardware-specific limitations. Notably, the problem of vanishing gradients, or barren plateaus, renders training impossible for circuits with high qubit counts, imposing a limit on the number of qubits that data scientists can use for solving problems. Independently, angle-embedded supervised quantum neural networks were shown to produce truncated Fourier series with a degree directly dependent on two factors: the depth of the encoding and the number of parallel qubits the encoding is applied to. The degree of the Fourier series limits the model expressivity. This work introduces two new architectures whose Fourier degrees grow exponentially: the sequential and parallel exponential quantum machine learning architectures. This is done by efficiently using the available Hilbert space when encoding, increasing the expressivity of the quantum encoding. Therefore, the exponential growth allows staying at the low-qubit limit to create highly expressive circuits that avoid barren plateaus. Practically, the parallel exponential architecture was shown to outperform the existing linear architectures by reducing their final mean square error value by up to 44.7% in a one-dimensional test problem. Furthermore, the feasibility of this technique was also shown on a trapped-ion quantum processing unit.


I. INTRODUCTION
The successes of quantum computing in the past decade have laid the foundations for the interdisciplinary field of quantum machine learning (QML) [1,2], where parameterised quantum circuits (PQC) are used as machine learning models to extract features from the training data. It was argued [3] that such quantum neural networks (QNN) could have higher trainability, capacity, and generalization bound than their classical counterparts. Practically, hybrid quantum neural networks (HQNN) have shown promise in small-scale benchmarking tasks [4][5][6][7] and larger-scale industrial tasks [8][9][10][11]. Nevertheless, the utility, practicality, and scalability of pure QNNs are still unclear. Furthermore, [12] provided a thorough overview of this field, showing that while classical machine learning is solving large real-world problems, QNNs are mostly tried on synthetic, clean datasets and show no immediate real-world advantages in their current state. It also suggested that the QNN research focus should be shifted from seeking quantum advantage to new research questions, such as finding a new, advantageous quantum neuron. This work explores a new quantum neuron that moves beyond single-qubit Pauli gates for encoding data and instead employs higher-dimensional unitaries through gate decomposition.

Since quantum gates are represented by elements of compact groups, Fourier analysis is a natural tool for analysing QNNs. [16] showed that certain quantum encodings can create an asymptotically universal Fourier estimator. A universal Fourier estimator is a model that can fit a trigonometric series to any given function. As the number of terms in the series approaches infinity, the fit becomes an asymptotically perfect estimate. This estimator can initially infer the coarse correlations in the supplied data, and by increasing the number of Fourier terms, it can incrementally resolve the more granular properties of the dataset. This provides an adjustable, highly-expressive machine learning model.

In [16], the authors showed that a QNN could operate as such in two ways: a sequential single-qubit architecture with n repetitions of the same encoding could yield n Fourier bases, which could also arise from an n-qubit architecture with the encoding gates applied to all qubits in parallel. The sequential architecture is widely shown to be an efficient Fourier estimator, demonstrated both theoretically [16] and empirically [17,18]. However, sequential circuits are often deep [18], and assuming that near-term quantum computers will have a non-negligible noise associated with each gate, these circuits can experience noise-induced barren plateaus [19]. Barren plateaus are a phenomenon observed in optimization problems where the gradients of the model vanish, rendering the model impossible to train. More importantly, a single qubit can be simulated efficiently in a classical setting, so this architecture brings no quantum advantage. In contrast, the parallel setting offers the exponential space advantage of quantum computing but poses two challenges for large numbers of qubits:

• The parameter count required to span the entire group space SU(2^n) grows exponentially with the number of qubits [20], which could also lead to noise-induced barren plateaus. Spanning the full space is especially important for a priori problems, where the best model lies somewhere in the Hilbert space but the machine learning scientist has no prior knowledge of how to parameterise a circuit to reach it. In addition, gradient calculations in QNNs, as of this publication, use the parameter-shift rule presented in [21]. This method requires two evaluations of the QNN to find the derivative of the circuit with respect to each of the trainable parameters. An exponentially growing number of trainable parameters therefore translates to exponentially increasing resources required for gradient computation. In mitigation, [22,23] showed that a polynomially growing number of parameters could generate a similar result based on the quantum t-design limits [24].

• Strongly-parameterised QNNs have a variance of the gradient that decreases exponentially with the number of qubits [25]. This means that, for a large number of qubits, a randomly initialised QNN will encounter a barren plateau. This happens because the expectation value of the derivative of the loss function with respect to each variable is zero for any well-parameterised quantum circuit, and its variance decreases exponentially with the number of qubits. Mitigation methods have been suggested in [26,27], and most notably, [28] showed that by relaxing the well-parameterised constraint to a circuit depth that grows only logarithmically with the number of qubits and by using local measurements, the circuit is guaranteed to evade barren plateaus. [29] developed a platform based on ZX-calculus to explore which QNN architectures are affected by the barren plateau phenomenon and found that strongly-entangling, hardware-efficient circuits suffer from them. In contrast with the previously-mentioned cases of barren plateaus, the latter is not noise-induced. Thus, this problem must be addressed even in the fault-tolerant future of quantum computing.
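The two-evaluation cost of the parameter-shift rule mentioned in the first challenge can be illustrated with a minimal sketch. It assumes a single-qubit toy circuit, RY(θ)|0⟩ measured in σ_z, chosen only because its expectation value cos θ is known in closed form; the function names are hypothetical and are not taken from [21].

```python
import numpy as np

def expectation(theta):
    """<sigma_z> for the state RY(theta)|0>, which equals cos(theta)."""
    state = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    return state[0] ** 2 - state[1] ** 2

def parameter_shift(f, theta, shift=np.pi / 2):
    """Exact derivative of f from two evaluations.

    Valid when theta enters through a gate exp(-i * theta * P / 2)
    with P a Pauli operator (assumption of this sketch).
    """
    return (f(theta + shift) - f(theta - shift)) / (2 * np.sin(shift))

theta = 0.7
gradient = parameter_shift(expectation, theta)
print(gradient, -np.sin(theta))  # the two values agree
```

Each trainable parameter costs two circuit evaluations, so a circuit with P parameters needs 2P evaluations per gradient, which is the source of the exponential cost when P grows exponentially.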
Therefore, the practising QML scientist is limited in choosing QNN architectures for general data science problems: the circuits need to be shallow or employ only a few qubits. This contribution suggests modifying the encoding strategies in [16] to increase the growth of the Fourier bases in a QNN from linear in the number of qubits/number of repetitions to exponential. The proposed encoding is constructed by decomposing large unitary generators into local Pauli-Z rotations. This improves the expressivity of the QNNs without requiring additional qubits or encoding repetitions. The increased expressivity is a product of eliminating the encoding degeneracies of the quantum kernel, making efficient use of the available Hilbert space by assigning a unique wavevector to each of its dimensions. However, such encodings could introduce a greater risk of limiting the model's Fourier accessibility.

Sec. II provides a review of how angle-embedded QNNs approximate their input distributions by fitting a truncated Fourier series to them. Specifically, Sec. II reviews the linear encoding architectures and how their number of Fourier bases grows linearly with the number of repetitions (sequential linear, Sec. II A) as well as with the number of qubits (parallel linear, Sec. II B). Then, Sec. III introduces the same two architectures, slightly modified to represent an exponentially growing number of Fourier bases. To use these architectures in practice, Sec. III C compares the training performance of these architectures and shows that the parallel exponential has a superior training performance to the other architectures on a synthetic, one-dimensional dataset. Finally, Sec. III D critically evaluates the work and suggests areas for future investigation.

II. BACKGROUND REVIEW - LINEAR ARCHITECTURES
As discussed in [16], all quantum neural networks that use angle embedding as their encoding strategy produce a truncated Fourier series approximation to the dataset. [16] also specifically explored two families of architectures of quantum neurons: a single-qubit architecture with a series of sequential SU(2) gates and a multi-qubit architecture with parallel SU(2) encoding gates. In this section, the results and the architectures introduced in [16] are explored in depth. In Sec. III, two alternative QNN architectures are presented with the capability of achieving an exponentially higher Fourier expressivity for the same number of gates.

Consider a quantum neuron that maps a real feature $x \in X$ onto the quantum circuit via a parametric gate $S(x) = e^{-iGx}$. In most common architectures, the only parametric gates are single-qubit rotations $\{R_x, R_y, R_z\}$. For this work, the Pauli-Z generated rotations are used without any loss of generality, so the embedding gate takes the simple form

$$S(x) = \begin{pmatrix} e^{-ix/2} & 0 \\ 0 & e^{ix/2} \end{pmatrix}.$$

In general, the dependence of the expected value $\langle M \rangle$ of any observable $M$ on the parameter $x$ is then given by

$$\langle M \rangle = c_0 + c_1 e^{ix} + c_1^{*} e^{-ix},$$

with some complex parameters $c_0$ and $c_1$, which depend on the rest of the circuit and the measurements. This expected value is a function of the feature $x$ with a very simple Fourier series.

The data reuploading method [17] is a natural way to construct neurons that give rise to richer Fourier series. These are architectures where several parametric gates depend on the same $x$. It is most straightforward to consider gates that have a hardwired dependence on the feature. In particular, the gates are chosen such that the expected value of any observable takes the form of a discrete Fourier series

$$\langle M \rangle(x, \theta) = \sum_{k \in \{k_1, k_2, \dots\}} c_k(\theta)\, e^{ikx}, \tag{1}$$

where $\theta$ are the variational parameters and $c_k \in \mathbb{C}$ with $c_{-k}^{*} = c_k$ for real observables. In Sec. II A and II B, two architectures exhibited in [16] are reviewed. The Fourier expressivities of these architectures are of particular interest, that is, the list of wavenumbers $\{k_1, k_2, \dots\}$ appearing in the exponents in Eq. (1).

A. Sequential linear
The single-qubit sequential linear method uses repetitions of the same single-qubit encoding gate $S(x)$ interlaced between trainable variational layers. Fig. 1a shows this implementation with generalized variational gates. Since the eigenvalues of each unitary are $e^{\pm ix/2}$, it is straightforward to observe (see, e.g., App. A 1) that after $n$ encoding layers, the expected value $\langle M \rangle$ of any observable takes the form

$$\langle M \rangle = \sum_{k=-n}^{n} c_k\, e^{ikx}.$$

Thus, the repetitions have an additive effect such that for $n$ repetitions, the final list of wavenumbers becomes $\{-n, \dots, -1, 0, 1, \dots, n\}$, with many combinations of eigenvalues contributing to the same frequency. Therefore, for $n$ repetitions of the encoding $S(x) = e^{-iGx}$, $n$ distinct Fourier bases are generated.
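The linear growth of the spectrum can be checked numerically. The following is a minimal NumPy sketch, not the implementation used in this work: it assumes Haar-like random single-qubit unitaries as stand-ins for the trained variational layers and a σ_z measurement, and all function names are hypothetical.

```python
import numpy as np

def rz(angle):
    """Pauli-Z rotation exp(-i * angle * sigma_z / 2)."""
    return np.diag([np.exp(-0.5j * angle), np.exp(0.5j * angle)])

def random_su2(rng):
    """Random single-qubit unitary from the QR decomposition of a Gaussian matrix."""
    z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def sequential_model(x, variational, scales):
    """<sigma_z> after alternating variational layers and encodings S(scale * x)."""
    state = variational[0] @ np.array([1.0, 0.0], dtype=complex)
    for w, m in zip(variational[1:], scales):
        state = w @ rz(m * x) @ state
    return float(np.real(np.conj(state) @ (np.diag([1.0, -1.0]) @ state)))

def fourier_coefficients(f, max_k):
    """Coefficients c_0 .. c_max_k of a 2*pi-periodic function via the FFT."""
    n_samples = 2 * max_k + 2
    xs = 2 * np.pi * np.arange(n_samples) / n_samples
    values = np.array([f(x) for x in xs])
    return np.fft.rfft(values)[: max_k + 1] / n_samples

rng = np.random.default_rng(seed=0)
n = 3                                       # number of encoding repetitions
layers = [random_su2(rng) for _ in range(n + 1)]
coeffs = fourier_coefficients(lambda x: sequential_model(x, layers, [1] * n), 6)
print(np.round(np.abs(coeffs), 4))          # entries for k > n vanish
```

With three repetitions, only the coefficients up to k = 3 can be non-zero; the higher ones vanish to numerical precision regardless of the variational layers.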

B. Parallel linear
In the parallel setting, the single-qubit encoding gates are applied in parallel on separate qubits (see Fig. 1c). Similarly to the sequential encoding, for n parallel rotations n Fourier bases are produced. This is due to the commutativity between the parallel rotations as they act on separate qubits. The generator $G$ becomes

$$G = \frac{1}{2} \sum_{q=1}^{r} \sigma_z^{(q)},$$

where $q$ is the qubit index, $r$ indicates the total number of qubits, and $\sigma_z^{(q)}$ is the Pauli-Z matrix applied to the $q$-th qubit. In App. A 2, it is shown that $G$, being a square matrix of dimension $2^r$, has only $r + 1$ unique eigenvalues. This suggests a high degree of degeneracy in its eigenspectrum. As before, subtracting these values from themselves yields a list of $2r + 1$ wavenumbers ranging from $-r$ to $r$, generating $r$ Fourier bases.
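The degeneracy of this eigenspectrum can be verified directly. The sketch below assumes the generator is G = ½ Σ_q σ_z^(q) and builds its diagonal explicitly; the helper names are hypothetical.

```python
import numpy as np
from itertools import product

def generator_diagonal(scales):
    """Diagonal of G = 1/2 * sum_q scales[q] * sigma_z^(q) in the computational basis."""
    return np.array([0.5 * sum(s * z for s, z in zip(scales, signs))
                     for signs in product([1, -1], repeat=len(scales))])

def wavenumbers(diagonal):
    """Sorted unique pairwise differences of the eigenvalues."""
    return np.unique(np.round(diagonal[:, None] - diagonal[None, :], 9))

r = 3
diag = generator_diagonal([1] * r)     # parallel linear: every qubit encodes x
print(len(diag))                       # 2**r diagonal entries
print(np.unique(diag))                 # only r + 1 distinct eigenvalues
print(wavenumbers(diag))               # integers from -r to r
```

For three qubits, the eight diagonal entries collapse to four distinct eigenvalues, and their differences give the seven wavenumbers from −3 to 3.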

C. Redundancy
In both the sequential and parallel linear architectures, there is a lot of redundancy in how the feature is encoded into the circuit. This is easiest to see for the parallel architecture, where most of the eigenvalues of $\exp(-ixG)$ are highly degenerate because the encoding commutes with qubit permutations.

III. RESULTS - EXPONENTIAL ARCHITECTURES
In this section, two new families of architectures are suggested that can encode an exponential number of Fourier bases for a given number of repetitions/parallel encodings. The basis of this generalization is to modify each "subsequent" appearance of the encoding gate in the circuit by a re-scaling of the generator, $S(x) \to S(mx)$, with an integer $m$. Keeping the factors $m$ integer guarantees that this procedure results in a discrete Fourier series in the form of Eq. (1).

A. Sequential exponential
It was shown in Sec. II A that the wavenumbers created in the linear models are highly degenerate. By modifying the circuit encoding, this degeneracy can be reduced, adding new wavenumbers to the list. This is accomplished by altering the generators in the individual encoding layers. In the linear case, the diagonal elements of the generator, $\lambda_i$, always belonged to the list $\{-\tfrac{1}{2}, \tfrac{1}{2}\}$, but they can be altered by scaling the generator $G$ in each layer. In practice, this is achieved by scaling the embedded data $x$ and mathematically associating the scaling factor with the generator. The diagonal elements in layer $l$ then become $\lambda_i \in \tfrac{1}{2}\{-a_l, a_l\}$ for $a_l \in \mathbb{N}$. In this work, $a_l$ scales as follows: $a_l \in \{2^0, 2^1, 2^2, \dots, 2^{n-2}, 2^{n-1} + 1\}$. The motivation behind this choice is the sum of powers of 2, $\sum_{i=0}^{n-1} 2^i = 2^n - 1$: the largest wavenumber possible, $2^n$, is obtained by taking all the positive contributions from the list of eigenvalues, i.e. $k_{\max} = \sum_{i=0}^{n-2} 2^i + 2^{n-1} + 1 = 2^n$. Next, one can switch the signs of the positive values to negative, starting from the smallest term, to produce all integers from $-2^n$ to $2^n$. This generates $2^n$ Fourier frequencies. Fig. 1b shows a quantum circuit encoded using the sequential exponential strategy with 2 layers. App. C demonstrates that this network produces extreme constraints on the Fourier accessibility and thus is an undesirable choice for general data modelling. However, this scheme motivates extending this idea to parallel architectures.
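That this choice of scaling factors leaves no gaps in the spectrum can be checked by enumeration. The sketch below assumes the factors are 2^0, 2^1, ..., 2^(n−2), 2^(n−1)+1 for n ≥ 2 encoding layers, and uses the fact that a layer with eigenvalues ±a_l/2 shifts the wavenumber by −a_l, 0 or +a_l; the function names are hypothetical.

```python
from itertools import product

def exponential_scales(n):
    """Scaling factors 2^0, ..., 2^(n-2), 2^(n-1) + 1 for n >= 2 encoding gates."""
    return [2 ** l for l in range(n - 1)] + [2 ** (n - 1) + 1]

def reachable_wavenumbers(scales):
    """All sums of eps_l * a_l with eps_l in {-1, 0, +1}, sorted."""
    return sorted({sum(e * a for e, a in zip(eps, scales))
                   for eps in product((-1, 0, 1), repeat=len(scales))})

for n in range(2, 6):
    scales = exponential_scales(n)
    ks = reachable_wavenumbers(scales)
    # every integer between -2^n and 2^n appears exactly as claimed
    print(n, scales, ks[0], ks[-1], ks == list(range(-2 ** n, 2 ** n + 1)))
```

For comparison, `reachable_wavenumbers([1] * n)` reproduces the linear case, whose spectrum only spans −n to n.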

B. Parallel exponential
To perform this extension, it is appropriate to proceed with a two-qubit example. The parallel linear method described in Sec. II B produces the generator

$$G = \tfrac{1}{2}\left(\sigma_z^{(1)} + \sigma_z^{(2)}\right) = \mathrm{diag}(1, 0, 0, -1).$$

This matrix has three unique eigenvalues, $\lambda \in \{-1, 0, 1\}$, and when subtracted from itself, yielding wavenumbers $L_k^{(\mathrm{lin})} = \{-2, -1, 0, 1, 2\}$, it can produce 2 Fourier bases with frequencies $\{1, 2\}$. One could generate a matrix with more unique eigenvalues. For example,

$$G = \tfrac{1}{2}\left(\sigma_z^{(1)} + 3\,\sigma_z^{(2)}\right) = \mathrm{diag}(2, -1, 1, -2)$$

is a generator with four unique eigenvalues that generate nine wavenumbers $\{-4, -3, -2, -1, 0, 1, 2, 3, 4\}$. This generator can be constructed using the quantum circuit shown in Fig. 1d. In this case, an SU(4) generator is employed. This is decomposed into two SU(2) generators, one using the group parameter, $x$, and the other $3x$. This can be generalized to larger numbers of qubits: for $n$ qubits, $G$ would be a diagonal matrix with entries ranging from $-2^{n-1}$ up to $2^{n-1}$, producing $2^n$ Fourier bases. The quantum circuit associated with this generator is an application of Pauli-Z rotations of $x$ with frequencies increasing in the following way:

$$\{2^0, 2^1, \dots, 2^{n-2}, 2^{n-1} + 1\},$$

where $n$ is the number of qubits. Note the similarities between the sequential and parallel encodings and their symmetries in how the circuits are constructed. One also recognizes similarities between the parallel encoding and Kitaev's quantum phase estimation algorithm [34], albeit in this case, $x$ is a classical feature. This can be significantly more expressive than the parallel linear method. Still, this advantage needs to be accompanied by Fourier accessibility. If the Fourier values of these newly-acquired bases cannot be altered, there would be no advantage in pursuing this setting. Sec. III C shows a significant advantage in using parallel exponential encoding in a simple toy example.
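The two-qubit comparison can be reproduced with a small state-vector calculation. This is a sketch under stated assumptions rather than the circuits of Fig. 1: random 4×4 unitaries stand in for the variational layers, σ_z on the first qubit is measured, and the helper names are hypothetical.

```python
import numpy as np
from itertools import product

def generator_diagonal(scales):
    """Diagonal of G = 1/2 * sum_q scales[q] * sigma_z^(q) in the computational basis."""
    return np.array([0.5 * sum(s * z for s, z in zip(scales, signs))
                     for signs in product([1, -1], repeat=len(scales))])

def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def parallel_model(x, scales, w_in, w_out):
    """<sigma_z on the first qubit> for the state W_out exp(-i x G) W_in |0...0>."""
    diag = generator_diagonal(scales)
    state = w_out @ (np.exp(-1j * x * diag) * w_in[:, 0])
    z_first = np.repeat([1.0, -1.0], len(diag) // 2)
    return float(np.sum(z_first * np.abs(state) ** 2))

def fourier_magnitudes(f, max_k):
    """|c_0| .. |c_max_k| of a 2*pi-periodic function via the FFT."""
    n_samples = 2 * max_k + 2
    xs = 2 * np.pi * np.arange(n_samples) / n_samples
    values = np.array([f(x) for x in xs])
    return np.abs(np.fft.rfft(values)[: max_k + 1]) / n_samples

rng = np.random.default_rng(seed=1)
w_in, w_out = random_unitary(4, rng), random_unitary(4, rng)
linear = fourier_magnitudes(lambda x: parallel_model(x, [1, 1], w_in, w_out), 6)
exponential = fourier_magnitudes(lambda x: parallel_model(x, [1, 3], w_in, w_out), 6)
print(np.round(linear, 3))        # non-zero only up to k = 2
print(np.round(exponential, 3))   # non-zero up to k = 4
```

The only difference between the two models is the factor 3 on the second qubit, which is what lifts the degeneracy of the generator.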

C. Training
In this section, the training performance of these four architectures on a simple dataset is compared. Each architecture is trained to reproduce a one-dimensional top-hat function. Fig. 4 shows the ground truth as well as the fitting performances of these architectures, and Fig. 3 shows their training performance. It is clear that the parallel exponential architecture fits a function closer to the ground truth, and in contrast, the sequential exponential architecture has the worst performance of all models. Furthermore, the Fourier decompositions of the models in Fig. 6 show that exactly two Fourier terms are accessed by the linear architectures and four by the exponential ones. Additionally, Fig. 5 demonstrates the performance of the parallel exponential architecture on a trapped-ion quantum processor.

[Figure 4. Panels: Sequential, Parallel. Legend: Linear, Exponential, Ground Truth.]
FIG. 4: The QNNs fit the best possible truncated Fourier series on the top-hat function. The parallel exponential architecture provides the best fit. Even though the sequential exponential architecture has access to the same four Fourier frequencies, it fails to access all of them efficiently, and as a result, it performs sub-optimally. The linear architectures perform similarly to each other, potentially arising from their high Fourier accessibility to the two Fourier frequencies that they can represent.

[Figure 5. Legend: Ground Truth, Parallel Exponential - QMware Simulator.]
FIG. 5: The parallel exponential fit to the top-hat function on a simulator vs on the IonQ Harmony quantum processor. The noisy solid line evaluates this network for 100 equally-spaced points using 100 shots of this device.
IonQ implements a high-fidelity gate-based quantum processing unit through laser pumping of trapped ions, as explained in [37]. The hardware was shown to be one of the most accurate in recent benchmarking tests [38]. We specifically used the hardware introduced in [39] with a single-qubit fidelity of 0.997 and a two-qubit fidelity of 0.9725. The code implementation was done through Amazon Web Services (AWS) Braket, and the forward pass for 100 data points took four hours and 11 minutes due to delays and queuing times. It can be observed that the low number of shots is the dominant source of noise here, and higher shot counts could yield a smoother curve that is closer to the simulator. It is noteworthy that even though superconductor-based QPUs were not tested here, they are expected to produce a similar result if they match the single- and two-qubit gate fidelities.
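The scale of the shot noise at 100 shots can be estimated with a short calculation. This is an idealised sketch that assumes a single ±1-valued observable and no gate or readout errors, so it only captures the statistical part of the noise; the function names are hypothetical.

```python
import numpy as np

def sampled_expectation(p_plus, shots, rng):
    """Estimate <Z> = 2 * p_plus - 1 from a finite number of +-1 outcomes."""
    plus_counts = rng.binomial(shots, p_plus)
    return 2.0 * plus_counts / shots - 1.0

def standard_error(p_plus, shots):
    """Standard deviation of the estimator: 2 * sqrt(p (1 - p) / shots)."""
    return 2.0 * np.sqrt(p_plus * (1.0 - p_plus) / shots)

rng = np.random.default_rng(seed=0)
print(standard_error(0.5, 100))      # 0.1 at 100 shots
print(standard_error(0.5, 10_000))   # 0.01 at 10,000 shots
print(sampled_expectation(0.5, 100, rng))
```

Under these assumptions, each of the 100 evaluated points carries a statistical uncertainty of up to about 0.1, and reducing it tenfold requires a hundredfold increase in the number of shots.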

D. Critical Evaluation
While the results for the parallel exponentials are encouraging, it is equally important to understand the limitations of this approach. Firstly, while the exponential growth in the number of Fourier frequencies is evident, this is not the upper limit of Fourier frequency growth. [16] showed that for $L$ repetitions of an encoding gate with a Hilbert space of dimension $d$, the number of Fourier frequencies $K$ has an upper limit that scales as $d^{2L}$. This suggests a potential for square-exponential growth, whereas the method discussed in this work only grows exponentially. In App. D, a mathematical problem is proposed whose solution could unlock the maximum possible Fourier accessibility. Secondly, it is important to emphasize that the two parallel architectures are the same, with a minor multiplicative factor added in the exponential case. Training them for a fixed number of epochs requires the same computational resources. However, adding more Fourier bases by eliminating the network's degeneracy could result in under-parameterised models. Therefore, it is often necessary to parameterise the exponential architectures more heavily than the linear ones, indirectly affecting the required resources. Every Fourier frequency requires two degrees of freedom (real-valued parameters), so an exponentially growing Fourier space requires the resources to grow exponentially, too. These resources could include the classical memory required to store the parameters or the classical optimizer that needs to calculate the gradient for these parameters. Lastly, extending this approach to many qubits will still result in barren plateaus.

IV. CONCLUSION AND FUTURE WORK
This work suggested two new families of QNN architectures, dubbed sequential and parallel exponential circuits, that provide an exponentially growing Fourier space. It was demonstrated that the former struggled with accessing these frequencies, whereas the latter showed an advantage in approximating a top-hat function.

Future work could focus on a quantitative understanding of the Fourier accessibility of these networks, such that the optimal variational parameterisation could be chosen for a specific problem. Another possible direction for future work is to depart from hardwired encoding gates. A natural elementary step in this direction is to consider single-qubit gates of the form $S_i(x, w_i) = \exp(-i\, x w_i \tfrac{1}{2} \sigma_z)$, where the scaling factor $w_i$ is an independent scalar trainable parameter for each occurrence of the encoding gate in the circuit. In this case, the final wavevectors $k$ are linear combinations of the parameters $w_i$ that can potentially be trained efficiently.

As an added note, the parallel exponential encoding introduced in this work for up to two qubits coincides with the commendable work in [40]. This paper came to our attention after we had released the preprint, and we recognise that the parallel exponential architecture bears resemblance to the ternary encoding both for the two-qubit case and in the type of growth in Fourier terms, albeit with different scaling strategies. Furthermore, [41] follows a similar example and creates this architecture for an optical setup.

FIG. 7: Fourier phases of the two exponential architectures: (a) sequential and (b) parallel. Each architecture was realised 10,000 times, and their arguments were calculated using the discrete Fourier transform of their outputs. We see that the sequential architecture has a restricted four-dimensional behaviour and that a linear dependence between the phases seems to exist, demonstrating a lack of Fourier accessibility. In contrast, the parallel architecture can fill the space, but still, some constraint is visible between the arg(c1) and arg(c4) bases.
FIG. 1: The four general circuits under analysis in this paper. In the exponential architectures, the first encoding is kept the same, and the subsequent encoding gates are multiplied by the coefficients in Eq. (6). The parallel circuits have a CNOT layer at the end to ensure that all qubits are cooperating in the training by propagating the π-measurement through all quantum wires.

FIG. 3: Training losses indicate a training advantage for the parallel exponential, and the sequential exponential architecture performs only marginally better than the linear architectures. The training was done on QMware hardware [35] using the PennyLane Python package [36]. The Adam optimizer minimises a mean squared loss function with a learning rate of ϵ = 0.1 and with uniformly-distributed parameters θ ∈ [0, 2π].

FIG. 6: Fourier decomposition of the four architectures after training to fit the top-hat function. The linear architectures can only access two Fourier frequencies, whereas the exponential ones can access four.

where $G_1^{\mathrm{new}} = G_1$ remains unchanged. As the rotation angles are now equal (both are now simply $x$), one could add the generators to obtain

$$G = \tfrac{1}{2}\left(\sigma_z^{(1)} + 3\,\sigma_z^{(2)}\right).$$

This generator produces the eigenvalue list $\lambda \in \{-2, -1, 1, 2\}$ and by subtracting this list from itself, one can obtain the list of wavenumbers $k \in \{-4, -3, \dots, 3, 4\}$, a list of exponential growth with the number of qubits.

Appendix C: Constraints of the sequential exponential

In Sec. III C, it was shown that both sequential and parallel exponential architectures represented four Fourier frequencies. However, the latter achieved a lower training loss and a better fit for the top-hat function. This is due to the reduced Fourier accessibility of the sequential architecture, meaning it lacks the freedom to achieve any desired point in the Fourier space. The models with four Fourier frequencies are realised in a nine-dimensional space that includes $c_0$ and the real and imaginary values of $c_i$. Each realisation of the trainable parameters of the quantum circuit produces a 9-dimensional array, creating a point in this 9-dimensional space. Realising these two architectures many times makes it possible to analyse the geometry of the Fourier space for each architecture. Still, for manual observation, finding an efficient way to reduce this dimensionality to three dimensions (or four with colour) is essential. Fig. 7 shows a choice for this dimensionality reduction by investigating the arguments of the complex Fourier coefficients, which, based on Eq. A7, represent the phases of the co-sinusoidal terms. These show that the sequential exponential architecture is dramatically constrained in the collection of phases it can represent and that the parallel exponential is unconstrained in this way.

Appendix D: Beyond exponential growth

As in App. A 1, reaching the final list of wavenumbers requires subtracting the eigenvalues of the Hamiltonian in pairs. This section proposes a mathematical problem leading to the largest possible Fourier series.

Problem 1. For a given $m \in \mathbb{N}$, find a list of integers $L \in \mathbb{Z}^m$ such that when subtracted from itself, it produces a new list $k_L^{(\max)} = \{x - y \mid x, y \in L\}$ whose elements are sequential integers and, except for zero, all the elements have a degeneracy of precisely one.

The problem statement produces a list of eigenvalues $L$ from which one can make a list of wavenumbers $k_L^{(\max)}$. After finding this list, it is crucial to check if one can create a diagonal Hamiltonian using $R_Z$ rotations and non-parameterised gates whose diagonal elements are the numbers in $L$. In Appendix C.3 of [42], this problem is equated to the perfect Golomb ruler, where for $m \geq 5$ this becomes impossible, and the numbers either become non-sequential or degenerate.
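For small m, Problem 1 can be explored by exhaustive search. The sketch below is a brute-force illustration only: it fixes the smallest element of L to zero (a constant shift does not change the differences) and the largest to m(m−1)/2, the only length at which all positive differences can be sequential and non-degenerate. The function name is hypothetical, and m ≥ 2 is assumed.

```python
from itertools import combinations

def perfect_rulers(m):
    """Lists of m integers starting at 0 whose positive pairwise differences
    are exactly 1, 2, ..., m(m-1)/2, each occurring once (requires m >= 2)."""
    length = m * (m - 1) // 2
    target = list(range(1, length + 1))
    found = []
    for inner in combinations(range(1, length), m - 2):
        marks = (0, *inner, length)
        diffs = sorted(b - a for a, b in combinations(marks, 2))
        if diffs == target:
            found.append(list(marks))
    return found

for m in range(2, 7):
    print(m, perfect_rulers(m))
```

The search finds solutions for m ≤ 4, for example {0, 1, 4, 6} with wavenumbers from −6 to 6, and none for m = 5 or m = 6, in line with the Golomb-ruler argument above. Whether a given solution can be realised with $R_Z$ rotations and non-parameterised gates is a separate question that this search does not address.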