Neural predictor based quantum architecture search

Variational quantum algorithms (VQAs) are widely speculated to deliver quantum advantages for practical problems under the quantum–classical hybrid computational paradigm in the near term. Both theoretical and practical developments of VQAs share many similarities with those of deep learning. For instance, a key component of VQAs is the design of task-dependent parameterized quantum circuits (PQCs), analogous to designing a good neural architecture in deep learning. Partly inspired by the recent success of AutoML and neural architecture search (NAS), quantum architecture search (QAS) is a collection of methods devised to engineer an optimal task-specific PQC. It has been shown that QAS-designed VQAs can outperform expert-crafted VQAs in various scenarios. In this work, we propose to use a neural network based predictor as the evaluation policy for QAS. We demonstrate that a neural predictor guided QAS can discover powerful quantum circuit ansatzes, yielding state-of-the-art results for various examples from quantum simulation and quantum machine learning. Notably, neural predictor guided QAS provides a better solution than the random-search baseline while using an order of magnitude fewer circuit evaluations. Moreover, the predictor for QAS as well as the optimal ansatz found by QAS can both be transferred and generalized to address similar problems.

The conventional VQA requires a fixed quantum structure ansatz, where the trainable parameters are iteratively adjusted via a classical optimizer in order to minimize an objective function. Some famous PQCs include the QAOA ansatz for combinatorial problems [10], the hardware efficient ansatz [28], and the unitary coupled cluster (UCC) ansatz [4,5,29] for VQE. There are also extensive studies on the expressive power and trainability of these ansatzes [30][31][32][33][34]. However, how to discover quantum ansatzes specifically tailored to different tasks remains an open question. Partly inspired by neural architecture search (NAS) from the AutoML community [35][36][37][38], in Ref. [39] we introduced the concept of quantum architecture search (QAS), which refers to a collection of effective methods that systematically search for an optimal quantum circuit ansatz for a given problem. Due to the apparent analogy between variational quantum circuits and neural networks, approaches developed in NAS have been adapted to QAS in the quantum computing domain. Representative works have utilized ideas including evolutionary/genetic algorithms [40][41][42][43][44][45][46], reinforcement learning (RL) approaches [47,48], greedy algorithms [9,49,50,51], and differentiable architecture search [39].
While QAS is an elegant idea, it faces the important challenge of exploring and evaluating many quantum ansatzes during the search. This computational bottleneck is intrinsic to all design-by-search methodologies, including NAS in deep learning. To enhance search efficiency, two mainstream evaluation policies are widely adopted in NAS. The first one is weight sharing, where trainable parameters are reused instead of being trained standalone for each ansatz. Such a weight sharing policy is utilized in one-shot search [52][53][54] and DARTS [55] in NAS, and the same idea has been exploited in the corresponding QAS frameworks, quantum circuit architecture search [56] and differentiable quantum architecture search [39], respectively. The second type of evaluation policy is to evaluate the fitness of an architecture with a meta machine learning model. Such predictor-based methods constitute another well-established and actively researched subfield of NAS [57][58][59][60][61][62][63][64][65][66][67][68][69]; yet, to the best of our knowledge, a predictor-based QAS framework has not been explored so far. In this work, we introduce the first predictor-based QAS. We train a neural predictor to directly gauge the performance of candidate quantum circuits using only the structure of the quantum ansatz. This predictor is then integrated into a QAS workflow (to be elucidated) and can substantially accelerate the search process.
The main contributions of the present work are summarized below.
NAS [35][36][37][38] is a recently emerging and rapidly developing field. It aims to automatically search for optimal neural networks for given tasks without relying on handcrafted design and expert guidance. An effective NAS workflow is composed of several ingredients, with the sampling strategy and the evaluation protocol being among the most important. In short, the sampling strategy refers to a customized recommendation of candidate neural networks, whose fitness for a specific task should be evaluated (e.g., by training and checking accuracy on some validation/test sets). Since the search space is exponentially large, we need an efficient method to traverse and sample candidate architectures. The sampling strategies that have been previously attempted include random search [70], local/greedy search [71][72][73], evolutionary/genetic type search [74][75][76][77][78], RL based search [79][80][81][82], Bayesian optimization search [59,61,62] and so on. While sampling is essential for NAS, the computational bottleneck is really the evaluation part, as it is time consuming to train individual networks (from scratch) on a large dataset. In fact, early NAS works [79] using such plain evaluation methods with an RL or evolutionary engine often take thousands of GPU hours before identifying an optimal neural network architecture.
There are two approaches to lower the cost of individual evaluations. The first one is weight sharing. In this setup, all candidate networks are organized within a super network, which can be trained either by training the entire network or by training sampled subnetworks in each epoch. After training the super network, we can evaluate each candidate subnetwork by regarding it as a child of the super network; the weights of such candidates are inherited directly from the super network without further tuning. Therefore, the evaluation of a candidate network is as simple as a forward pass for inference. Such a parameter sharing setup is inspired by one-shot NAS [52][53][54] and has also become popular for DARTS [55,83,84,85]. Parameter sharing is not perfect, though, as there is no theoretical guarantee on the correlation between the accuracy of subnetworks with inherited weights and that of the same subnetworks with optimal weights from individual training.
This work focuses on the second strategy to improve the evaluation methodology. Instead of training a candidate network and evaluating its performance on some validation dataset, we directly build machine learning (ML) models to predict the network performance based on the network structure alone, i.e., without specifying trainable weights. If the predictions manifest a non-trivial correlation with the ground truth, then such a predictor based evaluation method may greatly improve NAS efficiency. We denote works along this line as predictor based NAS, in which the so-called "predictor" is a regression model [57][58][59][60][61][62][63][64][65][66][67][68][69]. Some works take a further step by constructing a variational autoencoder (VAE) [86] for neural architectures and training a predictor with input from the latent space of the VAE [87][88][89][90][91][92][93]. Such a predictor can also be transferred to infer other metrics of the network such as latency or FLOPs [59,68]. It is worth noting that predictor based NAS can also be combined with weight sharing tricks, where the predictor is trained with fitness labels obtained from a one-shot setup [62,65,87,88].
Despite a wide application range and initial successes, pushing for quantum advantages with VQA faces many challenges such as noise-induced decoherence, barren plateaus [123,124] that derail the training of parameters, and reachability deficits with certain fixed ansatzes [30][31][32][34]. Although there are various proposals [98,125,126,127,128] for alleviating these issues, it is extremely difficult to fully resolve them as long as the circuit ansatz is fixed at the beginning.

C. Quantum Architecture Search
As discussed in the last section, a parametrized quantum circuit is required as an ansatz for VQA. A badly designed ansatz could possess limited expressive power and/or entangling capacity, leaving the global minimum of an optimization problem out of reach. Furthermore, such an ansatz may be more susceptible to noise [129], waste quantum resources, or lead to barren plateaus that frustrate the optimization procedure [123,124].
Therefore, a systematic approach to search for an optimal circuit ansatz is desired, and we denote such a workflow as "quantum architecture search" [39]. The aim of QAS is to recommend tailored quantum circuits for a given problem such that they not only minimize a loss function, but also satisfy a few other constraints imposed by the hardware connectivity among qubits, the native quantum gate set, the quantum noise model, the training loss landscape and other practical issues.
Previous QAS works have heavily borrowed ideas from NAS. More specifically, greedy methods [9,49,50,51], evolutionary or genetic methodologies [40][41][42][43][44][45][46], RL engine based approaches [47,48], Bayesian optimization [130], one-shot search [56] and gradient based methods [39] have all been adopted to discover better circuit ansatzes for VQA. However, as far as we know, a predictor based evaluation strategy has not been applied to quantum circuit design. Since predictor based NAS has been empirically demonstrated to be highly efficient, one may anticipate that predictor based QAS will deliver a similar performance boost for VQA in the NISQ era.

III. METHODS
In this section, we describe the essential technical ingredients of our proposed neural predictor based QAS workflow. To facilitate the discussion, we lay out a few definitions. Below, we use N to denote the number of qubits in a circuit, t to denote the number of types of quantum gate primitives in the search space, and $n_t$ to denote the total number of quantum primitives in one circuit.

A. Search space for QAS
We adopt two distinct representations (a list of gates and an image of a circuit) to denote each candidate circuit. In particular, the list representation comes with a strict syntax delineating a quantum circuit in terms of a sequence of applied quantum gates. Each list is an ordered sequence of tuples, and each tuple encodes a quantum gate (referenced to a given gate set) and positional information (i.e., qubits numbered in a certain way). For instance, the tuple (3, 1, 2) indicates a two-qubit gate of the third type acting on the first and second qubits in the circuit. To reconstruct the corresponding quantum circuit from a list, the gates should be sequentially placed onto an initially empty circuit, according to the given tuple sequence. Note that a circuit ansatz may have multiple list representations. One possibility is that the order of two gate placements can be freely exchanged as long as the gates encoded by the two tuples commute. This syntax inherently gives a legitimate search space for circuit ansatzes. Hence, in this work, sampled circuits are generated in this list representation.
As for the quantum gate sets, we choose different primitives for different problems. Some common examples include non-parameterized gates such as the Hadamard gate H; single-qubit gates with trainable parameters such as the rotation gate $R_x = e^{-i\theta X/2}$ and, similarly, $R_y$ and $R_z$, where X, Y, Z are the corresponding Pauli matrices; and parameterized two-qubit gates such as $XX = e^{-i\theta X_1 X_2/2}$, with counterparts for the YY and ZZ gates, as well as the parameterized SWAP gate $\mathrm{SWAP}_\theta = e^{-i\theta\,\mathrm{SWAP}/2}$, where $\mathrm{SWAP}_{12} = (I + X_1 X_2 + Y_1 Y_2 + Z_1 Z_2)/2$.
Now, let us elaborate on the sampling strategy considered in this work. First, we comment on the naive method of random sampling. After fixing the number of qubits and the total number of quantum gates in a circuit, one can randomly sample list representations of quantum circuit ansatzes. This simple strategy is problematic and inferior in several ways. For instance, the gate layout of randomly sampled circuits is highly "chaotic" and usually incurs severe barren plateau issues, since the circuits behave somewhat like random unitaries drawn from the Haar measure. Besides, such a randomly generated circuit ansatz is often deep, as the arrangement of quantum gates is sparse. Lastly, a random ansatz is not amenable to further utilization in the sense of generalization and transferability to problems of different size, since there is no obvious pattern to extract. Therefore, we devise two pipelines for circuit sampling that substantially alleviate the aforementioned concerns.
The two sampling pipelines are gatewise generation and layerwise generation, respectively. In the first pipeline, we construct the circuit gate by gate by specifying their positions and types, while in the second pipeline, we construct the circuit by iteratively adding half-layers. Namely, whenever we pick a type of quantum gate, we have to apply it on the set of even qubits or odd qubits. Both pipelines further incorporate additional techniques such as hierarchical generation and gate correlation enforcement. See Supplemental Materials for the details of these two sampling pipelines and the considerations on the design of the circuit search space.
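The half-layer idea can be illustrated with a minimal Python sketch. It is not the pipeline of the Supplemental Materials: the gate names, the open-chain pairing of neighbouring qubits, the uniform random choices, and the helper names `sample_half_layers` and `expand` are all assumptions made for illustration, and hierarchical generation and gate correlation enforcement are omitted.

```python
import random

# Hypothetical gate set for illustration; the actual primitives are problem dependent.
SINGLE_QUBIT = ("Rx", "Ry", "Rz")
TWO_QUBIT = ("XX", "YY", "ZZ", "SWAP")


def sample_half_layers(n_half_layers, rng):
    """Pick a (gate name, parity) pair for each half-layer, uniformly at random."""
    gate_names = SINGLE_QUBIT + TWO_QUBIT
    return [(rng.choice(gate_names), rng.choice(("odd", "even")))
            for _ in range(n_half_layers)]


def expand(half_layers, n_qubits):
    """Turn half-layer specs into a gate list of (name, qubit[, qubit + 1]) tuples."""
    gates = []
    for name, parity in half_layers:
        first = 1 if parity == "odd" else 2      # qubits are numbered from 1
        for q in range(first, n_qubits + 1, 2):
            if name in SINGLE_QUBIT:
                gates.append((name, q))
            elif q < n_qubits:                   # assumed open chain: no wrap-around pair
                gates.append((name, q, q + 1))
    return gates


spec = sample_half_layers(6, random.Random(0))
circuit = expand(spec, n_qubits=10)
```

Because a half-layer specification does not mention the system size, the same `spec` can be expanded for any number of qubits, which is the property that makes layerwise circuits convenient to reuse on larger problems.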

B. Representation of the quantum circuit
Apart from the list representation of quantum circuits, we have alluded to the image representation, which is designed for training the neural predictor. To encode the circuit structure as input for an ML predictor model, we need a systematic way to represent circuit structures in the form of tensors. The strategy we invented is to transform a circuit into an image with multiple channels. The shape of the input tensor is [#depth, #qubit, #gate types]. The size of the image is the number of qubits times the depth of the circuit, where the depth is the number of gate layers in a circuit. For the circuit generation pipelines considered in this work, the total number of quantum gates in a circuit, $n_t$, is fixed, but different candidate circuits tend to have different circuit depths. Therefore, we need to set a maximum depth cutoff D. All circuits with depth less than D are zero padded up to D columns, and all circuits with depth more than D are simply discarded. In other words, we only process candidate circuits with $n_t$ quantum gates and a depth of at most D. Fig. 1 gives an example of both the list representation and the image representation for an (N = 3, D = 4) quantum circuit.
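A minimal sketch of the conversion from the list representation to the image representation is given below. It assumes that gate types and qubits are numbered from 1, that the gate type indexes the whole gate set, and that each gate is placed in the earliest layer where all of its qubits are free; the function name `circuit_to_image` is hypothetical.

```python
def circuit_to_image(gates, n_qubits, n_types, max_depth):
    """Convert a gate list into a nested list image[layer][qubit][gate_type].

    Each gate is a tuple (gate_type, qubit) or (gate_type, qubit_a, qubit_b),
    with gate types and qubits numbered from 1 (an assumption of this sketch).
    Returns None when the circuit is deeper than max_depth, so that such
    candidates can be dropped.
    """
    image = [[[0] * n_types for _ in range(n_qubits)] for _ in range(max_depth)]
    next_free = [0] * n_qubits                 # first unoccupied layer of each qubit
    for gate_type, *qubits in gates:
        layer = max(next_free[q - 1] for q in qubits)
        if layer >= max_depth:
            return None
        for q in qubits:
            image[layer][q - 1][gate_type - 1] = 1
            next_free[q - 1] = layer + 1
    return image                               # unused layers stay zero (zero padding)


# An N = 3 circuit with three gate types and depth cutoff D = 4.
example = circuit_to_image([(1, 1), (3, 1, 2), (2, 3)], n_qubits=3, n_types=3, max_depth=4)
```

Marking both qubits of a two-qubit gate in the same channel is what makes the restrictions to adjacent qubits and symmetric gates necessary for the mapping to stay unambiguous.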
In our setup, this image representation maintains a one-to-one mapping with the actual quantum circuit architecture. This is only true because we implicitly impose two restrictions on the search space: 1. the two-qubit gates are restricted to act on adjacent qubits, and 2. all two-qubit gate primitives in our work are symmetric. Namely, we do not use asymmetric two-qubit gates such as CNOT, in which the two qubits play different roles as the control and the target. If the first condition is relaxed, then whenever more than one two-qubit gate of the same type appears in the same layer of the circuit, the image representation is ambiguous about how to divide the qubits into pairs. For example, if in one row of the image representation (the same layer of the circuit) the first four elements are set in the SWAP channel (two SWAP gates are defined in this layer on qubits 1, 2, 3, 4), then it is impossible to resolve whether the original circuit has $\mathrm{SWAP}_{12}$, $\mathrm{SWAP}_{34}$ or $\mathrm{SWAP}_{13}$, $\mathrm{SWAP}_{24}$. On the other hand, if the second restriction is relaxed, we have to add more channels than the number of quantum gate types t. For example, the CNOT gate may require two different channels to distinguish the role of each qubit.
In the proposed QAS workflow, one may further consider different architectures of ML predictors. If an RNN based model is used, then we treat the depth dimension as the time dimension, while the dimensions of qubits and gate types are flattened into an input vector for each time slice of the RNN based predictor.
Further data augmentation can be applied to the image representation of quantum circuits. In many VQA problems, the final measurement observable is independent of the order of qubits, so that a permutation of the qubit order leaves the final result unchanged. Strictly speaking, in our search space there is no qubit permutation symmetry or redundancy, since two-qubit gates are only defined on neighboring qubits and a random permutation may break this restriction. But if we assume there is still such a permutation symmetry in the representation, input permutation on the qubits helps to avoid overfitting, as it essentially creates N! times more data than the original input.
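Both operations on the image representation, flattening for an RNN and qubit-permutation augmentation, can be sketched in a few lines of Python. The nested-list layout image[layer][qubit][gate type] and the helper names are assumptions of this sketch.

```python
from itertools import permutations


def permute_qubits(image, perm):
    """Reorder the qubit axis: new position j holds the old qubit perm[j]."""
    return [[list(layer[old]) for old in perm] for layer in image]


def augment(image):
    """Return all N! qubit-permuted copies of one image (identity included)."""
    n_qubits = len(image[0])
    return [permute_qubits(image, perm) for perm in permutations(range(n_qubits))]


def to_sequence(image):
    """Flatten (qubit, gate type) per layer: one input vector per RNN time step."""
    return [[value for qubit in layer for value in qubit] for layer in image]


# Toy image with 2 layers, 3 qubits and 2 gate types.
image = [[[1, 0], [0, 0], [0, 1]],
         [[0, 0], [0, 1], [0, 1]]]
augmented = augment(image)
sequence = to_sequence(image)
```

Since the number of copies grows as N!, in practice one would likely sample a subset of permutations rather than enumerate all of them for larger N.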

C. Architecture of predictor model
We have tried MLP, CNN and LSTM as the neural predictors to evaluate circuits in the proposed QAS workflow. In general, we find that the LSTM based RNN performs better than the others in terms of predicting the fitness of circuit ansatzes. See Fig. 2 for a schematic of the RNN neural predictor used in this work.
In some VQA problems, good circuit ansatzes are dense in the search space, with a very long tail distribution of bad candidates. Such a distribution is hard to fit with one regression model, and we adopt the strategy of two-stage classification for screening circuits. First, a CNN based binary classification model is trained to differentiate between good and bad circuits for a task. Only good candidates are further fed into an RNN based regression model for a more fine-grained evaluation. This regression model is only trained with good ansatzes.

D. Training neural predictors
To train the neural predictor, we need to prepare a dataset in which each entry is composed of a circuit structure and its performance according to a task-specific evaluation metric, e.g., the estimated energy in a VQE simulation or the validation accuracy for QML. It is important that the training of such a predictor model is sample efficient, i.e., a small number of training data points should allow the predictor to deliver an acceptable accuracy. Otherwise, predictor based QAS is not desirable, as it already takes an enormous amount of time and resources to prepare the training dataset. In our experiments, $O(10^2)$ data pairs are in general enough to make the QAS workflow a success. Of course, more training data can boost the predictor's accuracy and, in turn, elevate the overall efficiency of the architecture search. There is certainly a tradeoff between the effort to prepare the training dataset (for neural predictors) and the search efficiency in a later stage of a QAS workflow.
While preparing the training dataset, it is desirable to evaluate each circuit multiple times with different initializations of the parameters. It is well known that the energy landscape of typical cost functions for quantum circuits is riddled with local minima. Therefore, one should use the minimum loss over multiple runs as the training label for the neural predictor. There are additional benefits to running the same circuit multiple times. For instance, one may train another regression model to predict the standard deviation of the losses over multiple optimization runs of a quantum circuit. This frustration indicator gives us a hint as to whether a candidate circuit can be consistently and easily trained for the task at hand. A prediction of a large standard deviation implies that the candidate circuit may suffer from a more rugged energy landscape as well as possible barren plateau issues. In principle, we can train a multi-task neural predictor that guides us toward circuits that not only have better potential to perform well but are also easier to train from scratch.
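As a small illustration of how the two training targets could be derived from repeated optimization runs, consider the sketch below. The function name `summarize_runs` and the loss values are hypothetical, and the population standard deviation is one possible choice for the frustration indicator.

```python
import statistics


def summarize_runs(final_losses):
    """Collapse the final losses of several independent runs of one circuit.

    Returns (label, spread): the minimum loss, used as the regression label,
    and the standard deviation across runs, usable as a frustration indicator.
    """
    label = min(final_losses)
    spread = statistics.pstdev(final_losses)
    return label, spread


# Hypothetical final energies from four random initializations of one ansatz.
label, spread = summarize_runs([-7.61, -7.12, -7.58, -6.95])
```

A circuit whose runs all land near the same loss gets a small spread, while one that frequently gets stuck in poor local minima gets a large one.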
Both mean squared loss and mean absolute loss are tried in the training of regression models, and they tend to give similar results. The Adam optimizer is utilized for all the trainings in this work. The batch size is 32 or 64 in most of the trainings. Dropout layers with high dropout rates are heavily applied to avoid severe overfitting.
It is worth noting that the trained neural predictors often do not perform well in terms of conventional ML evaluation metrics. The training tends to overfit even with networks of fewer parameters and large dropout ratios. However, as we will show in the Results section, such predictors are actually good enough to greatly improve the search efficiency of QAS.

E. QAS workflow
Once the neural predictor is trained, we randomly sample a large number of quantum circuits according to the generation pipelines we introduced. Only circuits that pass the predictor screening (a tiny fraction of all sampled circuits) will actually be tested to verify their fitness. In the end, one picks the candidate circuit with the best fitness for the current task. The entire workflow of neural predictor based QAS is succinctly summarized in Fig. 3.
FIG. 3. The workflow of predictor based QAS. There are two phases of QAS. In phase I (the upper row), we generate the dataset of quantum circuits and train a neural predictor. In phase II (the lower row), we utilize the trained predictor to filter a large number of quantum circuit candidates, where only the small fraction with predicted performance better than a threshold is further evaluated from scratch. The ansatz with the best performance is picked as the final result of QAS.
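The screening phase can be summarized by the following minimal sketch. The callables `predict` (the trained predictor, possibly a classifier followed by a regressor) and `evaluate` (full training of a circuit from scratch) are stand-ins supplied by the caller, and the function name and the toy candidates are hypothetical.

```python
def predictor_guided_search(candidates, predict, evaluate, threshold):
    """Screen candidates with a cheap predictor, then truly evaluate the survivors.

    predict(c)  -> predicted fitness (cheap, no circuit training)
    evaluate(c) -> true fitness (expensive, trains the circuit from scratch)
    Returns (best candidate, its true fitness, number of expensive evaluations).
    """
    shortlisted = [c for c in candidates if predict(c) > threshold]
    if not shortlisted:
        return None, None, 0
    scored = [(evaluate(c), c) for c in shortlisted]
    best_fitness, best = max(scored, key=lambda pair: pair[0])
    return best, best_fitness, len(shortlisted)


# Toy stand-ins: integers play the role of circuits.
best, best_fitness, n_evaluated = predictor_guided_search(
    candidates=list(range(100)),
    predict=lambda c: (c % 10) / 10,
    evaluate=lambda c: c / 100,
    threshold=0.85,
)
```

The saving comes from the ratio between the number of sampled candidates and the number of expensive evaluations, here 100 versus 10 in the toy example.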
Such a neural predictor based evaluation policy can be combined with more sophisticated sampling strategies to further improve search efficiency. For example, the neural predictor can be further trained with additional data collected during phase II of the QAS workflow. One can then screen more circuits with help from the refined predictor. This fine-tuning of the neural predictor can be iterated for a few rounds and should be beneficial for finding a really strong candidate circuit. Moreover, genetic transformations or Bayesian optimization can be used as the sampling strategy in phase II of QAS to accelerate the search. Compared with the plain random sampling utilized in this work, these advanced sampling strategies should help, as they do in high-throughput virtual screening of molecules and materials. Furthermore, a neural predictor combined with a VAE setup can be exploited to directly search for strong candidate circuits with gradient-based methods in the continuous latent space. We leave these possibilities for extending the predictor based QAS workflow as promising directions for future work.

F. Transferability of optimal quantum ansatz
It is highly desirable that an optimal ansatz identified by QAS supports a transfer capacity, since direct QAS on large systems is prohibitively difficult. For each optimal quantum circuit selected from QAS, we then test whether such a circuit ansatz can be transferred to similar problems involving a larger number of qubits while maintaining state-of-the-art performance. Quantum circuits generated by the layerwise pipeline can be straightforwardly adapted to work on a larger system by simply extending the range of each gate to cover either all odd or all even qubits in the larger circuit. However, such a direct transfer of the quantum ansatz is not rooted in a rigorous theoretical analysis and may suffer from a huge performance drop.
In order to find good ansatzes for large systems with the knowledge gained from QAS on small systems, we develop a beam search based method that accelerates the search for appropriate ansatzes for large systems while keeping the extra overhead in quantum resources low. Since quantum circuits generated by the layerwise pipeline always group the same quantum gate acting on either even or odd qubits in one layer, we can first fill the circuit by extending each quantum gate to all qubits (half-layer → full layer). Such "fill-in" quantum circuits are, in general, very good candidates with great fitness for larger size problems. However, the new quantum circuit costs twice the quantum resources of the original circuit with half-layers. To reduce the number of gates while maintaining the performance above a threshold, we use a beam search scheme [51] to find simpler circuit structures with respect to a given "fill-in" circuit. There are three phases at each step of the beam search. In the reduction phase, we remove the quantum gates of one half-layer, generating every possible reduction (O(2D) in total). In the evaluation phase, these reduced circuit candidates are evaluated for their fitness. Since these quantum structures can be viewed as subcircuits of the "fill-in" one, the weights (trainable parameters) can be directly inherited and only some light fine tuning is required to get a reasonably accurate evaluation. In the selection phase, only the top-q circuits in terms of fitness are kept in the queue (q = 1 reduces to the greedy search algorithm). This iterative pruning of circuit structures ends when no further reduction of quantum gates is possible without compromising the expected performance. Finally, we note that this beam search technique can also incorporate mutations (or transformations) of quantum circuits in addition to eliminations of quantum gates. In this augmented version of beam search, our transfer protocol operates like an evolutionary algorithm for finding suitable circuits for large-size problems. See Fig. 4 for the workflow schematic of this fill-in + beam search protocol.
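The reduction, evaluation and selection phases can be sketched as follows. This is only an illustration under simplifying assumptions: a circuit is a tuple of half-layer labels, and `fitness` is a caller-supplied stand-in for the evaluation with inherited weights and light fine tuning; the function name and the toy fitness are hypothetical.

```python
def beam_search_prune(filled, fitness, threshold, beam_width):
    """Iteratively drop half-layers from a filled circuit while fitness stays high.

    filled     -- sequence of hashable half-layer labels (the fill-in circuit)
    fitness    -- callable returning the fitness of a candidate circuit
    threshold  -- minimum acceptable fitness
    beam_width -- number q of circuits kept per step (q = 1 is greedy search)
    """
    beam = [tuple(filled)]
    best = beam[0]
    while True:
        # Reduction phase: remove one half-layer in every possible way.
        children = {c[:i] + c[i + 1:] for c in beam for i in range(len(c))}
        # Evaluation phase: score each reduced circuit once.
        scored = [(fitness(c), c) for c in sorted(children)]
        survivors = [(f, c) for f, c in scored if f >= threshold]
        if not survivors:
            return best                      # no further reduction is acceptable
        # Selection phase: keep the top-q circuits.
        survivors.sort(key=lambda pair: -pair[0])
        beam = [c for _, c in survivors[:beam_width]]
        best = beam[0]


def toy_fitness(circuit):
    """Toy stand-in: the circuit is good only if two specific half-layers survive."""
    return 0.9 if {"ZZ-odd", "Rx-even"} <= set(circuit) else 0.5


pruned = beam_search_prune(("ZZ-odd", "ZZ-even", "Rx-odd", "Rx-even"),
                           toy_fitness, threshold=0.8, beam_width=2)
```

In the toy run the two half-layers that do not affect the fitness are removed, leaving a circuit with half the gates of the fill-in version.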

IV. RESULTS
In this section, we present the main results of neural predictor based QAS on two specific VQA tasks: 1. a supervised learning task for binary classification on the fashion-MNIST dataset and 2. a quantum simulation to estimate the ground state energy of the transverse field Ising model (TFIM).
FIG. 4. Schematic workflow of transferring an optimal quantum ansatz. We first fill in the layerwise generated circuit structure and then perform reductions by beam search (q = 2 in this example). The circuit structure is shown in the image representation, where a light grey square indicates that there is no quantum gate. The circuit in this example has N = 4 qubits and depth D = 3. Two steps of the beam search are shown, and more steps are possible as long as the fitness in the evaluation phase is above the predefined threshold.

A. Quantum machine learning on classification of fashion-MNIST
Setup. We train a PQC as an ML model for a binary classification task on a well-established benchmark in the ML community, and QAS is utilized to identify a suitable circuit architecture (conforming to specified constraints) that can attain high accuracy on the validation set. We choose fashion-MNIST [131] as the benchmark because it is more challenging than the MNIST dataset commonly tested in the QML literature. We only use the data labeled with T-shirt/top (0) and Dress (3) in order to focus on a binary classification problem instead of a multi-class classification. As demonstrated in [27], QML could perform better than its classical counterparts when the training dataset is small. We hence select only 500 datapoints from fashion-MNIST for training, and select another 500 datapoints for validation. Each 28 × 28 image of clothes is padded and flattened to a 1024-dimensional vector, which is further encoded into a 10-qubit wavefunction using amplitude encoding. Our quantum ansatzes are hence defined in the search space with qubit number N = 10, number of quantum primitive gates $n_t$ = 50 and circuit depth cutoff D = 10. We only focus on quantum ansatzes sampled from the layerwise pipeline in this QML experiment. The quantum gate primitives include t = 7 types in total, namely the parameterized Rx, Ry, Rz, XX, YY, ZZ, and SWAP gates.
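The classical preprocessing behind the amplitude encoding can be sketched as below. The symmetric zero padding to 32 × 32 and the function name `amplitude_encode` are assumptions of this sketch; the text only specifies that the image is padded and flattened to 1024 entries, which are then used as the amplitudes of a 10-qubit state.

```python
import math


def amplitude_encode(image, side=32):
    """Zero pad a square image to side x side, flatten it and normalize to unit norm.

    The returned list of side * side real numbers can serve as the amplitudes
    of a log2(side * side)-qubit state. Centered padding is an assumption here.
    """
    size = len(image)
    pad = (side - size) // 2
    padded = [[0.0] * side for _ in range(side)]
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            padded[r + pad][c + pad] = float(value)
    flat = [value for row in padded for value in row]
    norm = math.sqrt(sum(value * value for value in flat))
    if norm == 0.0:
        raise ValueError("an all-zero image cannot be amplitude encoded")
    return [value / norm for value in flat]


# A uniform 28 x 28 toy image stands in for a fashion-MNIST sample.
amplitudes = amplitude_encode([[1.0] * 28 for _ in range(28)])
n_qubits = len(amplitudes).bit_length() - 1   # 1024 amplitudes -> 10 qubits
```

The normalization step is required because the squared amplitudes of a quantum state must sum to one.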
Inspired by the recent proposal of classical shadows with random measurements [132,133], we introduce a classical post-processing module to make the actual prediction based on features extracted from random measurements on the proposed quantum circuit. This "random measurement" is implemented with one extra layer of Rx gates, appended to the end of the ansatz-generation circuit. Classical information is then extracted by measuring the Pauli Z operator on each qubit, yielding $\langle Z_i \rangle$. The collected classical information is subsequently fed into a classical dense layer as $y_{\mathrm{pred}} = \sigma\left(\sum_i k_i \langle Z_i \rangle + b\right)$, where $k_i$ and b are classical trainable parameters and $\sigma(x) = 10/(1 + e^{-x})$ is a scaled sigmoid function. Finally, the mean squared loss between $y_{\mathrm{pred}}$ and $y_{\mathrm{true}}$ (the ground truth) is minimized with a simple gradient descent approach, where the weights of the quantum circuit and the classical post-processing module are jointly optimized. Once the training of the QML model converges, we take the classification accuracy on the validation set as the evaluation metric.
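A minimal sketch of such a classical readout is shown below. It assumes that the dense layer is a single affine map of the measured Pauli Z expectation values followed by the scaled sigmoid $\sigma(x) = 10/(1 + e^{-x})$; the function names and the input values are hypothetical, and the joint gradient descent over circuit and readout parameters is not shown.

```python
import math


def scaled_sigmoid(x):
    """Sigmoid scaled to the range (0, 10)."""
    return 10.0 / (1.0 + math.exp(-x))


def readout(z_expectations, k, b):
    """Dense layer on Pauli Z expectation values followed by the scaled sigmoid."""
    pre_activation = sum(ki * zi for ki, zi in zip(k, z_expectations)) + b
    return scaled_sigmoid(pre_activation)


def mean_squared_loss(predictions, targets):
    """Mean squared loss between predictions and ground-truth labels."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


# With zero weights and bias, the readout sits at the midpoint of its range.
midpoint = readout([0.3, -0.2], k=[0.0, 0.0], b=0.0)
```

The scaling factor of 10 lets the output cover numeric class labels such as 0 and 3 without saturating the sigmoid.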
Next, we discuss details of preparing the training dataset for the neural predictors. For this QML benchmark, we adopt a loose convergence condition in phase I of QAS, since the final validation accuracy is known to be strongly correlated with the accuracy at the early stop. By using the early stop strategy, we may evaluate a circuit's prediction accuracy (for the fashion-MNIST data) without too many training steps. Furthermore, we decide to train each circuit only once, because it is rather time consuming to independently re-train a QML model multiple times. While we run the risk of getting a suboptimal estimate of a circuit's performance, we recall that our ultimate goal is to run a high-throughput screening of many quantum circuits in phase II of a QAS workflow. Candidate circuits that pass the predictor-based screening still have to be experimentally trained and verified for their actual performance. As long as some optimal circuit architectures are detected and pass through the screening, a QAS run is deemed successful. From this perspective, building a highly accurate neural predictor is certainly desirable but not strictly indispensable.
We neural predictor. Each datapoint comprises a quantum circuit and its corresponding validation accuracy on fashion-MNIST as the label. The regression result is shown in Fig. 5. The performance can be evaluated by the $R^2$ score of the linear regression, which is around 0.71 on the validation dataset.
QAS result. The ultimate test for QAS is how well the predictor-based screening performs in phase II. At this stage, we randomly generate quantum circuits from the layerwise pipeline and predict their accuracy with the trained predictors. We only retain circuits with a predicted accuracy larger than 0.89 for further training and verification of their true accuracy in experiments. Of 30,000 random circuits, only 195 (0.65%) are kept and proceed to actual testing. The comparison between the true accuracy of quantum circuits from random search (used as the training dataset for the predictor) and that of circuits from QAS filtered by the neural predictor is summarized in Fig. 6. As shown, the optimal ansatz found by QAS has a validation accuracy above 0.92, which is significantly larger than the best result seen in the training set (all lower than 0.9). The optimal quantum circuit recommended by QAS has the layerwise layout YY, ZZ-odd, Rz-odd, YY-even, Rz-even, ZZ-even, SWAP (see Supplemental Materials for the layered ansatz notation). It is interesting to observe the generalizability of such a neural predictor based approach. Although the predictor is only trained with suboptimal quantum circuits with accuracy less than 0.9, it can still single out quantum circuits with accuracy better than any it has seen during training.
We further compare the QAS-screened QML model against a baseline established by a QML model built with the conventional hardware efficient ansatz. The details are given in Fig. 7. As seen, both classical post-processing and QAS contribute to the improved accuracy.

B. Variational quantum eigensolver for transverse field Ising model
Model. Next, we investigate how predictor based QAS may help to identify an efficient state-generation circuit for representing the ground state of a many-body Hamiltonian in a VQE simulation. We consider an N = 6 TFIM with periodic boundary condition (PBC) for illustration. The Hamiltonian is given by $H = \sum_i \left( Z_i Z_{i+1} + X_i \right)$, and this system can be exactly modeled with 6 qubits, with exact ground state energy $E_0 = -7.7274066$. The evaluation metric in this case is simply the best energy estimate from the VQE optimization.
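The quoted ground state energy can be cross-checked with a short classical calculation. The sketch below applies shifted power iteration to the 64-dimensional Hamiltonian matrix using only the Python standard library; the iteration count, the random start vector and the function names are choices made for this illustration and are unrelated to the VQE procedure.

```python
import math
import random


def tfim_diagonal(n):
    """Diagonal of sum_i Z_i Z_{i+1} on a ring: +1 per equal neighbour pair, else -1."""
    return [sum(1 if ((s >> i) & 1) == ((s >> ((i + 1) % n)) & 1) else -1
                for i in range(n))
            for s in range(2 ** n)]


def apply_tfim(vec, diag, n):
    """Return H|vec> for H = sum_i (Z_i Z_{i+1} + X_i) with periodic boundary."""
    out = [d * amp for d, amp in zip(diag, vec)]
    for state, amp in enumerate(vec):
        for i in range(n):
            out[state ^ (1 << i)] += amp      # X_i flips bit i
    return out


def ground_energy(n, iterations=2000, seed=0):
    """Lowest eigenvalue of H via power iteration on (shift - H)."""
    shift = 2.0 * n                           # ||H|| <= 2n, so shift - H is positive
    diag = tfim_diagonal(n)
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(2 ** n)]
    for _ in range(iterations):
        h_vec = apply_tfim(vec, diag, n)
        vec = [shift * v - h for v, h in zip(vec, h_vec)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    h_vec = apply_tfim(vec, diag, n)
    return sum(v * h for v, h in zip(vec, h_vec))   # Rayleigh quotient


energy = ground_energy(6)
```

For N = 6 the result agrees with the value of $E_0$ quoted above to the stated digits; for larger systems a sparse eigensolver would be a more practical choice than plain power iteration.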
QAOA baseline. We first give the baseline for this VQE simulation, a QAOA-inspired circuit. This baseline circuit generalizes the QAOA structure by allowing the parameters within each layer to be tuned independently. More precisely, the ansatz wavefunction reads $\prod_p \prod_i \left(e^{i\phi_{pi} X_i} e^{i\theta_{pi} Z_i Z_{i+1}}\right) \prod_i H_i |0\rangle$, where p runs over the layers of the ansatz (and, below, also denotes their number), all $\phi$, $\theta$ are trainable parameters, and $H_i$ is the Hadamard gate on qubit i. In the following analysis, we compare results from the QAS-designed circuits against baselines derived from this QAOA-inspired circuit with p = 1, 2, 3, respectively. For the record, these three baselines estimate the TFIM ground-state energy to be −7.24264, −7.4641, and −7.7274066, respectively. Compared with the exact energy, the p = 3 ansatz, with 42 quantum gates and 36 parameters, fully represents the ground state of the N = 6 TFIM system.
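To make the baseline concrete, the sketch below builds the QAOA-inspired state with NumPy and evaluates its energy. It is an illustration under stated assumptions, not the implementation used in this work: the order in which the site index i is traversed inside a layer is a choice made here, and all function names are hypothetical.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
HAD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def embed(ops, n):
    """Kronecker product with ops[q] on qubit q and the identity elsewhere."""
    return reduce(np.kron, [ops.get(q, I2) for q in range(n)])

def exp_pauli(angle, pauli):
    """exp(i * angle * P), valid for any operator P with P @ P = identity."""
    return np.cos(angle) * np.eye(pauli.shape[0]) + 1j * np.sin(angle) * pauli

def qaoa_inspired_state(phi, theta):
    """phi and theta have shape (p, n): one angle per layer and per site."""
    p, n = phi.shape
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0
    state = embed({q: HAD for q in range(n)}, n) @ state  # Hadamard layer
    for layer in range(p):
        for i in range(n):  # assumed ordering of sites within a layer
            zz = embed({i: Z, (i + 1) % n: Z}, n)
            state = exp_pauli(theta[layer, i], zz) @ state
            state = exp_pauli(phi[layer, i], embed({i: X}, n)) @ state
    return state

def energy(state):
    """Expectation value of H = sum_i (Z_i Z_{i+1} + X_i), periodic boundaries."""
    n = state.size.bit_length() - 1
    h = sum(embed({i: Z, (i + 1) % n: Z}, n) + embed({i: X}, n) for i in range(n))
    return float(np.real(np.vdot(state, h @ state)))

def resources(n, p):
    """Gate and parameter counts: n Hadamards plus 2n parameterized gates per layer."""
    return {"gates": n + 2 * n * p, "parameters": 2 * n * p}

print(resources(6, 3))
print(energy(qaoa_inspired_state(np.zeros((3, 6)), np.zeros((3, 6)))))
```

With all angles set to zero the state reduces to $|+\rangle$ on every qubit, so only the field term contributes to the energy; a VQE run would minimize `energy` over `phi` and `theta`.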
Setup. Since obtaining optimal weights for a VQE circuit usually takes less time than for a QML task, we train each quantum circuit in 10 independent runs with different random initializations to avoid local minima in the estimation of the ground-state energy. With a suitable design, these independent training runs can be cast into a batch dimension with the help of a quantum machine learning framework [134], which enables fast simultaneous optimization over the independent runs. The search space for this VQE task is confined to a qubit number N = 6 and a total number of quantum primitive gates $n_t = 36$, with depth cutoff D = 10. Candidate circuits are sampled from the layerwise and gatewise generation pipelines in equal proportion. The primitive set of quantum gates this time includes the H, Rx, Ry, Rz, XX, YY, and ZZ gates. Among these, the Hadamard gate H is the only type without a tunable parameter. For the layerwise pipeline, we set 30% of all generated circuits to begin with a layer of Hadamard gates applied to all qubits, as starting from the state $|+\rangle$ may help VQE find a better approximation to the ground state.
We adopt the two-stage screening for this VQE investigation. The rationale is that the energy distribution of the TFIM is not smooth across a wide region of the search space; therefore, a single high-quality predictor is extremely difficult to train. To overcome this problem, we resort to using two predictors as described in the Method section. First, we use a CNN-based binary classification model to quickly rule out inappropriate circuits covering a wide variety of rather arbitrary circuit structures. The "good" circuits with limited variety are then screened again with an RNN-based regression model to evaluate their performance more precisely. In this case, we use the error ratio $(E − E_0)/14 \in [0, 1)$, i.e. the normalized deviation of the estimated ground-state energy, as the predicted label.
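The regression label is a simple affine rescaling of the VQE energy, which the following short sketch makes explicit. The function names are hypothetical; the energies plugged in are values quoted elsewhere in this section.

```python
E0 = -7.7274066  # exact ground-state energy of the N = 6 TFIM quoted in the text

def error_ratio(energy, e0=E0, scale=14.0):
    """Normalized deviation (E - E0) / 14 used as the regression label."""
    return (energy - e0) / scale

def energy_from_ratio(ratio, e0=E0, scale=14.0):
    """Inverse map: the energy corresponding to a given error ratio."""
    return e0 + scale * ratio

# Best energy in the training dataset quoted in the text.
print(error_ratio(-7.61694))
# Energy below which a circuit is called optimal in the text.
print(error_ratio(-7.7))
# Energy corresponding to the regression threshold 0.005 used in the screening.
print(energy_from_ratio(0.005))
```

A smaller error ratio therefore means a VQE energy closer to the exact ground-state energy, with zero corresponding to an exact match.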
We again randomly pick and optimize 300 independent quantum circuits to build the training datasets for the two neural predictors. Fig. 8 shows the distribution of the converged energy (error ratio) for the dataset of 300 circuits. Such a distorted distribution illustrates why it is difficult to rely on a single regression model to directly pick out top-performing circuits from the entire candidate pool. Rather, in our two-stage screening setup, the regression model is only expected to provide an accurate characterization of potentially good circuit candidates with limited variety.
For the CNN-based classification model, we adopt data augmentation by applying random permutations to the qubits, since the TFIM is translationally invariant and the energy is agnostic to the order of qubits. In addition, multi-layer convolution with dilation, batch normalization on the qubit dimension, dropout, ELU activation, and $L_2$ regularization on the weights are all utilized in the classification model.
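The qubit-permutation augmentation can be sketched as a reindexing of the qubit axis of the circuit image. The example below assumes the [10, 6, 7] input shape quoted in the caption of Fig. S2 (depth, qubits, gate types) and treats the image as an opaque tensor; how two-qubit gates are encoded in the image is not specified here, and the function name is hypothetical.

```python
import numpy as np

def random_qubit_permutation(circuit_image, rng):
    """Return a copy of the image with its qubit axis (axis 1) randomly permuted."""
    perm = rng.permutation(circuit_image.shape[1])
    return circuit_image[:, perm, :]

rng = np.random.default_rng(0)

# Toy circuit image with shape (depth D = 10, qubits N = 6, gate types t = 7).
image = np.zeros((10, 6, 7))
image[0, :, 0] = 1.0  # first layer: gate type 0 on every qubit
image[1, 2, 3] = 1.0  # second layer: gate type 3 on qubit 2 only

augmented = random_qubit_permutation(image, rng)
print(augmented.shape)
print(np.argwhere(augmented[1] == 1.0))  # new position of the single gate
```

Each augmented copy keeps the label of the original circuit, so the classifier sees several equivalent views of every training datapoint.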
The training dataset we use contains only 300 quantum circuits and their corresponding error ratios obtained from VQE optimizations. When one is limited to such a small training set of 300 data points, how the circuits are selected to represent the circuit pool for a particular problem can be crucial. Prior knowledge (such as insights from physics) should prove beneficial at this stage; for instance, one may generate only circuits conforming to a certain symmetry. However, for the present study, we want to emphasize the universality of this predictor-based QAS and simply sample circuits at random according to the two sampling pipelines. With such a scarcity of training data, the neural predictors do not perform particularly well according to the traditional metrics for ML model evaluation. Nevertheless, we note that a complete QAS workflow operates more like a high-throughput screening whose ultimate goal is to discover one or a few top-performing circuits, as opposed to making highly accurate predictions for all circuits. Although this focus differs from that of standard ML, we still have to deal with some non-trivial challenges shared by many ML tasks. For the binary classification, we face a highly imbalanced classification problem with a trade-off between precision and recall. In this case, we prefer a model with higher precision rather than higher recall, as the goal is to efficiently filter a large search space. Specifically, our trained CNN-based classification model has a precision of around 0.7 and a recall of around 0.47.
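For reference, precision and recall follow directly from the confusion counts of the classifier. In the sketch below the counts are hypothetical, chosen only so that the resulting values resemble the ones quoted above; they are not the actual confusion matrix of the trained model.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall

# Hypothetical counts on a validation set of "good" vs. "bad" circuits.
precision, recall = precision_recall(true_pos=7, false_pos=3, false_neg=8)
print(precision, recall)
```

A high precision means that few circuits forwarded to the second stage are wasted, while a moderate recall only means that some good circuits are discarded, which is acceptable when the candidate pool is large.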
QAS result. With the two-stage predictor-based screening in place, we filter a large number of circuits generated via either the gatewise or the layerwise pipeline. The filter thresholds for the two models are set at 0.85 and 0.005, respectively, so that we only keep the most confident candidates for further VQE evaluation. At the first stage, only quantum circuits with a predicted value larger than 0.85, instead of 0.5, are kept as promising candidates for further evaluation. At the second stage, only candidates with a predicted error ratio less than 0.005 are recommended for experimental verification (i.e. going through standard VQE optimizations). It is remarkable that our RNN-based regression model can output predicted error ratios below 0.005 when the smallest value in its training dataset, 0.00789, is larger than this threshold.
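The two-stage filter can be summarized in a few lines of Python. This is a sketch under the assumption that both predictors are available as callables returning a score per circuit; the function name, the toy circuit labels, and all but the first pair of scores are hypothetical.

```python
def two_stage_screen(candidates, classifier, regressor,
                     class_threshold=0.85, ratio_threshold=0.005):
    """Keep candidates passing the classifier, then the regressor, in that order."""
    promising = [c for c in candidates if classifier(c) > class_threshold]
    return [c for c in promising if regressor(c) < ratio_threshold]

# Toy stand-ins for predictor outputs: (classification score, predicted error ratio).
# The first pair reuses the outputs quoted in the text for the p = 3
# QAOA-inspired circuit; the other two are made up.
toy_scores = {
    "circuit_a": (0.99979, 0.004),
    "circuit_b": (0.60, 0.003),   # rejected at the first stage
    "circuit_c": (0.90, 0.020),   # rejected at the second stage
}

selected = two_stage_screen(
    toy_scores,
    classifier=lambda c: toy_scores[c][0],
    regressor=lambda c: toy_scores[c][1],
)
print(selected)
```

Running the cheaper and coarser classifier first means that the regressor only has to be accurate on the narrow set of circuits that already look promising.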
We randomly sample 50,000 quantum circuits, and only 626 (1.25%) pass the two-stage screening and are actually evaluated with the VQE optimization. Among these final candidates, 5 circuits give an energy less than −7.7, and we denote them as optimal ansatzes. This is an interesting result, as the training dataset of 300 quantum circuits does not feature such good circuits: the best energy estimate in our training dataset is only −7.61694. This observation demonstrates the effectiveness of our few-data trained predictors in screening unseen structures in a QAS context. In this case, the search efficiency of the predictor-based QAS workflow is around 1%, i.e. about 1 out of every 100 quantum circuits recommended for VQE experiments is optimal (in the sense of a VQE energy less than −7.7). Without the neural predictor as the filter, the random search efficiency in our search space is around 1/2000, according to a large number of numerical simulations. In other words, on average, predictor-based QAS only needs to conduct 100 independent VQE optimizations before discovering an optimal candidate, while naive random search requires 2000 circuit evaluations before encountering a good candidate in our pre-defined search space. This is a 20-fold boost in search efficiency. Fig. 9 summarizes the gain brought by the predictor-based QAS compared to the random search.
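The efficiency figures above follow from simple ratios, reproduced in the short sketch below using only numbers quoted in the text. The variable names are introduced here for illustration.

```python
sampled = 50_000       # random circuits generated
screened = 626         # circuits passing the two-stage screening
optimal = 5            # screened circuits with VQE energy below -7.7

pass_rate = screened / sampled   # fraction sent to VQE optimization
hit_rate = optimal / screened    # raw fraction of optimal circuits among them

qas_efficiency = 1 / 100         # rounded efficiency quoted in the text
random_efficiency = 1 / 2000     # random search efficiency quoted in the text
speedup = qas_efficiency / random_efficiency

print(f"pass rate: {pass_rate:.4f}")
print(f"raw hit rate: {hit_rate:.4f}")
print(f"speedup: {speedup:.0f}x")
```

The raw hit rate 5/626 is about 0.8%, which is rounded to 1% in the discussion above; the 20-fold figure corresponds to the rounded value.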
Transfer of the predictor. Next, recall that our predictors are trained with circuits limited to exactly $n_t = 36$ quantum gates, yet they can be transferred, with zero tuning, to predict the fitness of quantum circuits consisting of a different number of quantum gates. For example, we know that the p = 3 QAOA-inspired ansatz with $n_t = 42$ gates can give the exact ground state. Although the total gate number is substantially changed compared to the instances in the training set ($n_t = 36$) for the predictors, the screening pipeline still works well. The classification predictor at the first stage gives the output 0.99979, and the regression predictor gives 0.004 as the predicted error ratio for the p = 3 QAOA-inspired circuit. In short, this optimal circuit successfully passes the two-stage screening without having to train these predictors with circuits comprising the same number of quantum gates. The predictors are able to make reasonably accurate predictions not only for circuits with more quantum gates but also for circuits with fewer gates. For instance, we conduct a large-scale search for circuits with $n_t = 30$ gates. Out of 100,000 quantum circuit structures, 360 pass the screening, and 3 of them give a VQE energy smaller than −7.7, with the best one being −7.7274. This is a highly nontrivial result, as the p = 2 QAOA ansatz with the same amount of quantum resources only gives an energy of −7.46. The optimal circuit structure found by the transferred predictor is displayed in Fig. 10.
FIG. 10. The optimal quantum ansatz found by the neural predictor on the $n_t = 30$ search space. $v_{i}$ are the trainable parameters of this VQE ansatz. The layerwise ansatz can be summarized as H, YY-odd, ZZ-even, YY-odd, ZZ-even, YY-odd, Rx-even, ZZ-even, Rx-odd, following our convention for layerwise ansatzes.
Transferability of the optimal ansatz. Apart from the neural predictors, the QAS-designed quantum ansatz can also be transferred to guide the search for optimal circuits for similar but larger problems involving more qubits in the layerwise search space. Such transferability is desirable, as it is time consuming to conduct QAS directly in a larger search space. Therefore, we propose that QAS on large systems should proceed in two steps: first, we run QAS on smaller systems as a proxy task; then, we transfer the optimal ansatz found by QAS to larger systems. We conduct such a transfer experiment from the optimal ansatz in Fig. 10 to a larger TFIM with N = 10 spins. The exact energy and the p = 3 QAOA baseline for the N = 10 TFIM are −12.7849 and −12.56758, respectively.
We adopt the beam search approach developed in the Method section. Starting from a fully filled-in quantum circuit, we utilize the beam search to remove potentially redundant quantum layers without compromising the final performance too much. With this approach we obtain the following circuit structure: H-even, YY, ZZ, YY, ZZ-even, YY-odd, Rx-even. This ansatz gives a VQE energy estimate of −12.634, significantly better than the p = 3 QAOA baseline (45 trainable parameters in the transferred ansatz versus 60 trainable parameters in the QAOA-inspired circuit). If we further relax the optimality threshold, we obtain the circuit structure H-even, YY, ZZ, YY, Rx-even. This extremely compact ansatz, containing 35 trainable parameters and 40 quantum primitive units, outperforms the more complex p = 3 QAOA-inspired ansatz with a VQE energy of −12.587. These numerical results strongly support our strategy of transferring optimal circuit structures to similar many-body simulation problems involving a different number of qubits.
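The parameter and gate counts quoted above can be reproduced with a short tally. The sketch assumes a reading of the layerwise notation in which a plain layer places one gate per qubit (or per bond of the periodic chain), an "-even" or "-odd" layer places gates on half of them, and H and SWAP layers carry no trainable parameter; the notation itself is defined in the Supplemental Materials, so this convention and the function names are assumptions.

```python
PARAMETER_FREE = {"H", "SWAP"}  # gate types assumed to have no trainable parameter

def layer_gate_count(layer, n_qubits):
    """Number of primitive gates in a layer such as 'YY' or 'ZZ-even'."""
    _, _, suffix = layer.partition("-")
    return n_qubits // 2 if suffix in ("even", "odd") else n_qubits

def count_resources(layers, n_qubits):
    """Return (primitive units, trainable parameters) of a layerwise ansatz."""
    units = sum(layer_gate_count(layer, n_qubits) for layer in layers)
    params = sum(
        layer_gate_count(layer, n_qubits)
        for layer in layers
        if layer.partition("-")[0] not in PARAMETER_FREE
    )
    return units, params

transferred = ["H-even", "YY", "ZZ", "YY", "ZZ-even", "YY-odd", "Rx-even"]
compact = ["H-even", "YY", "ZZ", "YY", "Rx-even"]

print(count_resources(transferred, n_qubits=10))
print(count_resources(compact, n_qubits=10))
```

Under the same convention, the N = 6 ansatz of Fig. 10 tallies to 30 primitive gates, in line with the $n_t = 30$ search space it was found in.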

V. DISCUSSIONS
There are various future directions to refine the proposed QAS workflow and to explore other novel strategies to discover optimal quantum circuits. One obvious possibility is to combine a more advanced sampling engine than simple random search with our predictor-based evaluation policy in QAS. For example, we may invoke an evolutionary algorithm based sampling policy together with the learned predictor, which might further improve the search efficiency, as witnessed in many high-throughput virtual screening studies. Next, we may extend phase II of the current QAS workflow into a loop with multiple rounds of screening. In this way, the neural predictors can be iteratively updated and fine-tuned as batches of new data points become available within each round of QAS verification of the proposed circuits. Moreover, a weight-sharing mechanism can be combined with the predictor approach to further speed up the preparation of the training dataset. It is also of great importance to further investigate transferability and to propose more systematic transfer protocols for the optimal ansatz, since challenging problems are always large in size. Finally, other types of neural predictors might be helpful, such as fast indicators for quantum noise resilience [135] or for frustration in the training energy landscape. These additional considerations may be particularly crucial to establishing nontrivial quantum advantages with VQA-based approaches in the NISQ era.

Conclusion.
In this work, we introduced a neural predictor as the evaluation policy for quantum architecture search. We demonstrated the effectiveness of predictor-based QAS on various examples from VQE and QML. We found greatly improved search efficiency and new state-of-the-art quantum architectures for these VQA tasks. In addition, we showed that the trained predictor and the QAS-designed optimal ansatz can be transferred to a different ansatz search space and to problems of different size, respectively.

FIG. 2. The schematic RNN architecture for the predictor of QAS. Layers of the circuit image representation are fed into the network as different time steps. The information processing flow in such a network is similar to the real quantum dynamics on a quantum circuit.

FIG. 7. Validation accuracy on the Fashion-MNIST classification task from different contributing factors. The baseline accuracy (blue) is achieved by the hardware efficient ansatz in the layer form Rx, Ry, ZZ, Rx, Ry, ZZ. By attaching the classical post-processing part from random measurements, the accuracy of the hardware efficient ansatz improves to 0.888 (orange). With random search over 500 quantum circuit candidates, the best of them gives an accuracy of 0.898 (green). Further QAS via the neural predictor trained on the random search data records the best accuracy of 0.924 while training only O(100) quantum circuit candidates (red).

FIG. 8. Sorted error ratio of the VQE energy for the training dataset of quantum ansatzes. The red dashed line (0.014) is our threshold for the CNN-based classification model at the first stage, i.e. we train the classification model to label candidates as good if their error ratio is less than 0.014. The inset zooms in on the region of good quantum ansatz candidates.

FIG. 9. Histogram of VQE energies for quantum ansatzes from random search and from neural predictor based QAS. The 300 samples from random search (blue) are also the training dataset of the predictor. QAS finds quantum ansatzes with lower energy on average and selects several candidates with energy lower than the best result in the training dataset. The red dashed line is the exact ground-state energy given by exact diagonalization. The optimal ansatz found by QAS can indeed match the ground truth.
FIG. S2. (a) CNN-based model for classification at the first stage of the predictor pipeline. (b) RNN (LSTM) based model for regression at the second stage of the predictor pipeline. [10, 6, 7] is the input shape, corresponding to depth D = 10, qubit number N = 6, and t = 7 gate types, consistent with the VQE task we investigated. RandomExchange is the data augmentation layer, which permutes the qubit order.
collect 300 datapoints for training the RNN based