Discovering quantum circuit components with program synthesis

Despite rapid progress in the field, it is still challenging to discover new ways to leverage quantum computation: all quantum algorithms must be designed by hand, and quantum mechanics is notoriously counterintuitive. In this paper, we study how artificial intelligence, in the form of program synthesis, may help overcome some of these difficulties, by showing how a computer can incrementally learn concepts relevant to quantum circuit synthesis with experience, and reuse them in unseen tasks. In particular, we focus on the decomposition of unitary matrices into quantum circuits, and show how, starting from a set of elementary gates, we can automatically discover a library of useful new composite gates and use them to decompose increasingly complicated unitaries.

It has been theorized for decades that quantum computers can perform tasks more efficiently than classical ones [1]. Consider for example Shor's algorithm for factorization [2] or Grover's algorithm for search [3], the variational quantum eigensolver [4] or other quantum machine learning algorithms [5], which can all have a large impact on important problems. Even though those algorithms are already very promising, and they justify the current effort in the development of large-scale quantum computers [6], it is hard to automatically exploit the advantages offered by quantum computing. Indeed, each of those algorithms has been invented specifically for the task it solves, and often their principles do not easily generalize to other tasks. Up to now, we can barely count more than two hundred algorithms in total [7]. While there is strong evidence of a quantum advantage [8], i.e. quantum computers can be more powerful than classical ones, and this advantage on near-term devices has even been shown experimentally on specific purpose-designed tasks [9][10][11], we do not have a way to automatically make use of quantum principles like superposition or entanglement to speed up classical algorithms: each algorithm must be designed from scratch, and it is not clear a priori whether a corresponding faster quantum algorithm exists. Compared to the classical regime, quantum mechanics is generally counterintuitive to human common sense, and thus scientists require a large effort and imagination to conceive quantum algorithms. Hence, it would be useful if there were a technique that helped to understand the laws of the quantum world and aided in finding an efficient way to solve a given task on a quantum computer. It would be an invaluable tool to assist researchers in the development of new quantum algorithms, perhaps contributing to understanding general principles that can grant quantum speed-up. This long-standing goal, if attainable at all, is clearly out of reach at the moment, but it is interesting to explore it and develop ingredients that could be useful for that purpose.
This paper takes a step in that direction by considering a simpler yet core subproblem: instead of working on entire quantum algorithms, we focus on the purely quantum part, excluding measurements and classical processing. We develop a machine learning technique to automatically produce a quantum circuit that performs a requested unitary transformation of a quantum state. The main innovation of our approach to this general problem is the gradual discovery of new composite gates, which can be subsequently used to decompose more and more complicated unitary matrices. In this way, we can build a self-learning compiler, which can express a given unitary matrix in a given set of gates and respect a given qubit connectivity, without ever providing explicit transformation rules. Instead, we just propose a training set of unitary matrices to decompose, with varying degrees of difficulty.
Figure 1: Synthesis of quantum unitaries. Given a dataset of target unitary matrices, we enumerate quantum circuits for a given timeout interval, using as components the elementary gates in the library. After some matrix decompositions have been found, the solutions are analyzed and the most useful components are added to the library as new available gates. The procedure is repeated for a given number of iterations.
At a high level, our method works by iteratively (1) searching for correct decompositions, assembling together gates from our current set of components, and (2) extracting new components by analyzing the solutions found in the previous step. In doing so, our system gradually learns increasingly useful quantum operations, which it uses to decompose more and more complex matrices (Fig. 1). At first, our method automatically rediscovers suitable representations of common gates like SWAP and CZ, even when these gates are not initially provided to the system. Importantly, it also proposes new unexpected combinations of gates that prove to be valuable ingredients in the construction of more sophisticated unitaries. Circuits that would be too difficult to synthesize by randomly assembling the initial elementary components can be tractably found using new gates that the system itself proposed to use when building easier circuits. In that way, the system bootstraps a set of new gates from solving easier problems, which unlocks solving harder problems, which leads to new gates, and so on.
The technique presented here introduces the ideas of concept extraction and program synthesis to the field of quantum computing and shows proof-of-concept applications to the domain of unitary matrix decomposition. We show the importance of bootstrapping more advanced concepts from simpler ones, and how these concepts can then be used to solve increasingly complex problems. The choice of expressing concepts as small "programs" also allows for better interpretability, in contrast to other possible black-box approaches like neural networks [12]. Extensions of these ideas may allow significant improvements to existing machine learning methods in quantum computing.

Related Works
The synthesis of quantum unitary matrices, which we will also simply call "unitaries" in the following, is the process of building a quantum circuit by placing gates one after the other, acting on selected qubits, to reproduce the effect of the given unitary. The problem of building a circuit that reproduces the action of a given unitary, or of another quantum circuit, using only a given set of gates, or respecting some given constraints, is well-known in the literature, and is called compilation [13]. On the other hand, if the unitary is given by a circuit expressed in a set of gates, and we want to find an equivalent circuit which uses another set of gates, the process is called transpilation. Many algorithms already exist to compile given unitaries into circuits [14] that only use a given set of gates, e.g. the ones that can be physically implemented, some even using machine learning and reinforcement learning techniques [15]. Many other works explore different directions for optimizing the total number of gates in a circuit [16] or reducing the number of a given kind of gate, which may be more expensive or noisy in the considered implementation [17]. For example, the Solovay-Kitaev algorithm [18,19] can approximate with arbitrary error any given unitary matrix with a given finite set of gates, as long as this set can approximate any one-qubit gate.
In our case, we build a self-learning compiler, which changes its behavior according to its experience, until converging to the optimal solution for the given architecture. This kind of compiler does not need explicit rules about how to decompose the given unitary matrix, but just a series of matrices to decompose with increasing difficulty. We emphasize that the main interest in our case is not in reproducing the performance of a state-of-the-art compiler, or transpiler, but in investigating how to imitate the ability of scientists to learn new concepts and reason with them, for example by building more and more complicated circuits from elementary components whose behavior is known. In our examples, we mainly optimize for conceptual efficiency, i.e. the number of high-level operations used to express a quantum operation, rather than for the effective experimental implementation cost (e.g. total gate number minimization). Different constraints can potentially be chosen nonetheless.
After expressing quantum circuits more conveniently, we can take advantage of machine learning techniques to assemble them until the requested unitary is produced. We build on machine learning techniques for program synthesis [20]. Program synthesis methods automatically construct source code, and our work exploits the fact that a quantum circuit can be easily expressed as a simple program. Recent program synthesis techniques use neural networks to learn how to generate source code ([21], inter alia). Our work uses a slightly different family of learning methods which casts program synthesis as Bayesian inference [22]. This probabilistic Bayesian framing allows searching for the most likely programs that solve a given unitary, and also learning to generate good programs via hierarchical Bayesian methods. These Bayesian program learning methods were developed in a series of works [23][24][25]. We directly build upon DreamCoder [26], a recent work in this family.
Coming back to the much higher conceptual level, regarding the longer-term idea of an "artificial scientist", the first proof-of-principle of an agent capable of conducting research on its own has been shown in [27], to automate functional genomics experiments. The idea of automatic concept discovery from experience is a helpful ingredient for this long-term goal, as it is a first step towards reasoning. Indeed, there has already been a large interest also in other fields of physics, ranging from the design of new quantum optics setups [28] and the use of symbolic regression and graph neural networks to find new laws of astrophysics [29], to the general development of algorithms capable of formulating scientific laws [30].
In [31], the idea of extraction of building blocks is used in a reinforcement learning setting to produce quantum entangled states. Also, projective simulation [32] allows employing reinforcement learning agents to explore novel algorithms for quantum communication [33]. While these techniques share the idea of concept extraction to solve a specific quantum task, in the usual reinforcement learning setting the discovery of new components is an incidental effect, obtained while achieving the task. In our case, the goal of circuit decomposition itself is the discovery of new useful gates, by minimizing the overall complexity of the solutions, quantified in terms of description length [34,35].

Methods
In this section, we explain how our algorithm for unitary synthesis and gate extraction works. As shown in Fig. 2, quantum circuits can be seen as programs that successively apply operations to a given state: starting from the identity matrix, representing a circuit without any gates, each operation corresponds to a multiplication by the unitary matrix associated with the applied gate.
The final unitary matrix is built up by sequentially multiplying all the unitary matrices. By working on programs that build up the sequence of operations that make a circuit, we can explore the space of possible unitary matrix decompositions with program synthesis.
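The view of a circuit as a running product of gate unitaries can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in this work: the helper names (`single`, `cnot`, `circuit_unitary`), the convention that qubit 0 is the most significant tensor factor, and the choice that later gates multiply from the left are all assumptions made for the example.

```python
import numpy as np

N_QUBITS = 3  # assumed register size, matching the 3-qubit examples in the text

I2 = np.eye(2, dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])
X = np.array([[0, 1], [1, 0]], dtype=complex)
P0 = np.diag([1, 0]).astype(complex)  # projector |0><0|
P1 = np.diag([0, 1]).astype(complex)  # projector |1><1|


def kron_all(mats):
    """Tensor product of a list of matrices (first entry = qubit 0)."""
    out = np.eye(1, dtype=complex)
    for m in mats:
        out = np.kron(out, m)
    return out


def single(gate, q, n=N_QUBITS):
    """Full-register unitary of a one-qubit gate acting on qubit q."""
    mats = [I2] * n
    mats[q] = gate
    return kron_all(mats)


def cnot(control, target, n=N_QUBITS):
    """Full-register CNOT: identity if control is 0, X on target if control is 1."""
    zero_branch = [I2] * n
    zero_branch[control] = P0
    one_branch = [I2] * n
    one_branch[control] = P1
    one_branch[target] = X
    return kron_all(zero_branch) + kron_all(one_branch)


def circuit_unitary(ops, n=N_QUBITS):
    """Start from the identity (empty circuit) and multiply in each gate in turn."""
    u = np.eye(2 ** n, dtype=complex)
    for op in ops:
        u = op @ u  # a gate applied later in time multiplies from the left
    return u


# H, four T gates, H on qubit 0 reproduces a Pauli X on qubit 0.
pauli_x = circuit_unitary([single(H, 0)] + [single(T, 0)] * 4 + [single(H, 0)])
# Three alternating CNOTs exchange qubits 0 and 1 (a SWAP).
swap01 = circuit_unitary([cnot(0, 1), cnot(1, 0), cnot(0, 1)])
```

Both examples are composite gates built only from H, T and CNOT, which is the kind of decomposition the synthesis procedure searches for.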
To begin, we define a probability distribution over quantum circuits c. Our learning algorithm works by adjusting the distribution over c to make useful circuits more likely. The distribution depends on the set of allowed gates, G, together with the probability of each gate g ∈ G, which we write θ_g. We decompose it as a product of the probability of choosing the specific gates and applying those gates to the selected qubits. We assume gates are generated independently at random, and that they attach to wires (i.e. qubits) drawn uniformly and independently at random:

$$P(c \mid G, \theta) = \prod_{g' \in c} \sum_{g \in G} \theta_g \, \mathbb{1}[g = g'] \, \chi(c, g), \tag{1}$$

where 1[·] is the indicator function, yielding one iff the condition is fulfilled, and χ(c, g) the probability of connecting the gate to the specific qubits.
For example, we use uniform probability of associating a gate to any of the possible qubits:

$$\chi(c, g) = N_c^{-n_g},$$

where n_g is the number of inputs to gate g and N_c is the number of qubits in the circuit c. If the resulting circuit is not valid, for example if the inputs of a CNOT gate are repeated, it is automatically discarded, which leads to a small modification of the probability χ whose discussion we omit since it is not crucial for understanding the workings of the algorithm.
Notice that by optimizing the choice of the set of gates G, it is possible to make more complex circuits more likely.
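To make this remark concrete, the following sketch evaluates the log-probability of a circuit under a factorized model of the kind described above: each gate contributes its weight θ_g and a uniform wiring factor of 1/N_c per input qubit. The function name, the gate labels, the uniform weights, and the "swap" entry in the library are illustrative assumptions; the validity correction mentioned in the text is ignored.

```python
import math


def log_prob_circuit(circuit, theta, n_qubits):
    """Log-probability of a gate sequence under an independent-gate model.

    circuit: list of (gate_name, qubits) pairs
    theta:   dict mapping gate_name -> probability of choosing that gate
    """
    logp = 0.0
    for gate, qubits in circuit:
        logp += math.log(theta[gate])             # probability of picking this gate
        logp -= len(qubits) * math.log(n_qubits)  # uniform choice of each input qubit
    return logp


# Hypothetical library in which a composite "swap" gate has been added.
theta = {"h": 0.2, "t": 0.2, "tdg": 0.2, "cnot": 0.2, "swap": 0.2}

three_cnots = [("cnot", (0, 1)), ("cnot", (1, 0)), ("cnot", (0, 1))]
one_swap = [("swap", (0, 1))]

lp_long = log_prob_circuit(three_cnots, theta, n_qubits=3)
lp_short = log_prob_circuit(one_swap, theta, n_qubits=3)
```

With these assumed numbers, the same operation is far more probable when expressed with the composite gate than as three CNOTs, which is the effect that library learning exploits.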
Ultimately our aim is not to probabilistically score specific circuits but to learn a collection of gates that are valuable for solving a broad range of unitary synthesis problems. To that end, we assume that we have a training set of unitary matrices to decompose, collectively written U. Our algorithm tries to maximize the posterior probability of the gate set, given that it must solve every unitary in U: it finds the optimal gate set and optimal gate probabilities as

$$G^*, \theta^* = \arg\max_{G, \theta} P(G, \theta \mid U),$$

where we employ Bayesian reasoning to write

$$P(G, \theta \mid U) \propto P(G)\, P(\theta \mid G) \prod_{u \in U} \sum_{c} P(c \mid G, \theta)\, \mathbb{1}[U(c) = u], \tag{4}$$

where P(G) is the prior over gate sets, P(θ|G) is the prior of gate weights of a given gate set, and U(c) is the operator that gives the unitary matrix associated with a given circuit. The above equation is computationally intractable because it includes summing over the infinite space of all circuits (inner sum over c). We introduce a tractable lower bound on Eq. 4 by only summing over a small set of possible circuits for each unitary. Writing B_u for the small set of circuits we consider for unitary u, our objective function becomes lower-bounded by

$$\mathcal{L}(G, \theta, \{B_u\}) = P(G)\, P(\theta \mid G) \prod_{u \in U} \sum_{c \in B_u} P(c \mid G, \theta)\, \mathbb{1}[U(c) = u]. \tag{5}$$

Eq. 5 serves as our core objective function for learning a library of gates. A more detailed derivation of this objective is given in Appendix B, and in the original work [26]. Maximizing it with respect to (G, θ) corresponds to updating our gate set to increase the probability of a circuit solving each unitary. Maximizing it with respect to B_u corresponds to program synthesis: finding a handful of likely circuits that evaluate to a given unitary.
More precisely, our system takes as inputs the example unitary matrices to learn to decompose, U, together with a set of initial elementary gates, G_0. The unitaries provided as examples in U determine which assemblies of gates are the most useful, and thus the optimal set of learned gates G*. The algorithm iterates many times through two phases: program synthesis, where circuits that decompose target matrices are proposed, and library learning, where concepts are extracted from the found circuits and the most useful sequences of gates are added to the set of elementary gates G as composite gates. It can then use each new gate as a single block in the subsequent iterations of program synthesis. Mathematically, program synthesis corresponds to maximizing Eq. 5 w.r.t. B_u, while library learning corresponds to maximizing w.r.t. (θ, G). A sketch of the algorithm is shown in Fig. 1.

Program Synthesis
During this phase of the algorithm, we seek the top k most likely programs solving each unitary:

$$B_u = \operatorname*{arg\,k\text{-}max}_{c} \; P(c \mid G, \theta)\, \mathbb{1}[U(c) = u],$$

where arg k-max_c is the function that returns the arguments with the largest k values. To find those top k circuits, we enumerate programs in order of decreasing probability under P(·|G, θ) until k solutions have been found or we reach a timeout. We construct the syntax trees of candidate programs bottom-up, with higher probability expressions being generated first, using recent algorithms for probabilistic program enumeration [36,37]. As an optimization, we discard any programs containing subexpressions that are semantically equivalent to higher-probability subexpressions, meaning they evaluate to the same unitary (in the literature called pruning by observational equivalence [38]). Within our implementation, we search for a maximum budget of 200 seconds and collect the top k = 2 programs for each unitary.
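The two ingredients of this phase, enumeration in order of decreasing probability and pruning by observational equivalence, can be illustrated with a much simpler stand-in than the typed program enumerators cited above: a best-first (Dijkstra-style) search over flat gate sequences on two qubits, where the cost of a sequence is its negative log-probability. Everything here (the function names, the two-qubit setting, the uniform weights, the rounding-based hash) is an assumption made for illustration and is not the enumerator used in this work.

```python
import heapq
import itertools
import math

import numpy as np

I2 = np.eye(2, dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])
CNOT01 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
CNOT10 = np.array([[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]], dtype=complex)

# Assumed model: theta = 1/3 per gate type, uniform wiring over 2 qubits.
COST_1Q = -math.log((1 / 3) * (1 / 2))
COST_2Q = -math.log((1 / 3) * (1 / 4))
MOVES = [
    ("h(0)", np.kron(H, I2), COST_1Q), ("h(1)", np.kron(I2, H), COST_1Q),
    ("t(0)", np.kron(T, I2), COST_1Q), ("t(1)", np.kron(I2, T), COST_1Q),
    ("cnot(0,1)", CNOT01, COST_2Q), ("cnot(1,0)", CNOT10, COST_2Q),
]


def unitary_key(u):
    """Hashable fingerprint of a unitary (adding 0.0 normalizes negative zeros)."""
    return (np.round(u, 6) + 0.0).tobytes()


def best_first_search(target, moves, max_pops=10000):
    """Return (program, cost) for the most probable gate sequence equal to target."""
    tie = itertools.count()  # tie-breaker so the heap never compares arrays
    heap = [(0.0, next(tie), (), np.eye(target.shape[0], dtype=complex))]
    seen = set()
    for _ in range(max_pops):
        if not heap:
            break
        cost, _tie, prog, u = heapq.heappop(heap)
        key = unitary_key(u)
        if key in seen:
            continue  # same unitary already reached by a more probable program
        seen.add(key)
        if np.allclose(u, target):
            return prog, cost
        for label, mat, c in moves:
            heapq.heappush(heap, (cost + c, next(tie), prog + (label,), mat @ u))
    return None


CZ = np.diag([1, 1, 1, -1]).astype(complex)
result = best_first_search(CZ, MOVES)
```

Under these assumptions the search returns a three-gate decomposition of CZ with a CNOT sandwiched between two Hadamard gates, found after expanding only a small number of distinct unitaries.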
In practice, to accelerate the convergence of the algorithm, we do not update the programs for every unitary at each iteration. Instead, we sample a small batch of unitaries and only synthesize programs for those tasks. This is analogous to the use of mini-batching for training neural networks using gradient descent [12]. Essentially, it allows taking fast, small learning steps (updating the set of available gates G) without examining and analyzing the entire training set.
Although enumeration may seem like a very basic program synthesis strategy, our goal is to learn a sophisticated set of gates G such that even a simple enumerative search can quickly uncover interesting unitaries. Thus, the ability of the program synthesis to succeed hinges critically on learning a good gate library G, which we describe next.

Library building
During the library building step, we augment the library of gates by adding new compositions of gates that the system itself proposes. We do this by analyzing the circuits found during the program synthesis phase and extracting commonly occurring patterns of gates. Adding these new patterns of gates to G increases the probability of generating circuits that use them.
On the other hand, the goal of library building is not to simply memorize every successful circuit, even though memorizing would most increase the probability of the programs found so far. Instead, we want to find new gates that generalize the patterns found in the synthesized programs. Striking the right balance is accomplished by prioritizing gate sets that are compressive, i.e. have small description length [34,35]. Remembering that our goal is to maximize Eq. 4, we see that we need to not just make the circuits likely under G, but also have a G with a high prior probability P(G).
Our system uses a prior that assigns less probability to larger sets of gates and to gates with many subcomponents, which exerts pressure for proposing new gates that are small, yet broadly useful across many tasks. Algorithmically, our system proposes new gates by extracting fragments of program syntax trees discovered during the previous program synthesis phase. Given a set of candidate new gates, G', it then constructs a set of candidate new libraries that extend the old library by exactly one gate: {G ∪ {g'} : g' ∈ G'}. For each such G ∪ {g'}, the system estimates a new θ using Expectation-Maximization [35]. The system finally computes the objective function in Eq. 5, and takes the gate which most increases it. This entire process repeats until Eq. 5 fails to improve, and then another round of program synthesis begins. See [26] for details.
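The trade-off between memorizing solutions and finding compressive gates can be illustrated with a deliberately simplified stand-in for the procedure above. Instead of extracting syntax-tree fragments and re-estimating θ with Expectation-Maximization, the sketch below treats circuits as flat token sequences, proposes contiguous sub-sequences as candidate gates, and scores each candidate by a crude description length: the number of tokens needed to write all circuits plus the size of the new gate's definition. The function names, the scoring rule, and the example circuits are all hypothetical.

```python
from collections import Counter


def rewrite(circuit, pattern, name):
    """Replace non-overlapping occurrences of pattern in circuit by a single token."""
    out, i, n = [], 0, len(pattern)
    while i < len(circuit):
        if tuple(circuit[i:i + n]) == pattern:
            out.append(name)
            i += n
        else:
            out.append(circuit[i])
            i += 1
    return out


def description_length(circuits, definition_cost):
    """Tokens needed to write all circuits, plus the cost of storing a new gate."""
    return definition_cost + sum(len(c) for c in circuits)


def best_new_gate(circuits, max_len=4):
    """Pick the sub-sequence whose abstraction most reduces the description length."""
    candidates = Counter()
    for c in circuits:
        for n in range(2, max_len + 1):
            for i in range(len(c) - n + 1):
                candidates[tuple(c[i:i + n])] += 1
    base = description_length(circuits, 0)
    best, best_dl = None, base
    for pattern in candidates:
        rewritten = [rewrite(c, pattern, "f_new") for c in circuits]
        dl = description_length(rewritten, len(pattern))  # pay for the definition
        if dl < best_dl:
            best, best_dl = pattern, dl
    return best, base, best_dl


# Hypothetical solutions found during synthesis; all contain a CZ-like motif.
solutions = [
    ["h(1)", "cnot(0,1)", "h(1)", "t(0)"],
    ["t(1)", "h(1)", "cnot(0,1)", "h(1)"],
    ["h(1)", "cnot(0,1)", "h(1)", "h(0)"],
]
best, base_dl, best_dl = best_new_gate(solutions)
```

In this toy example, memorizing any whole four-gate circuit increases the total description length, while abstracting the shared three-gate motif reduces it, so the motif is the gate that gets proposed.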
Despite the apparent simplicity of the synthesis step, the overall algorithm is substantially more efficient than simple brute-force enumeration, as each component is used according to its assigned probability, and branches of the tree are pruned as they are discovered to be equivalent to already known branches. Also, the addition of the extracted gates to the set of elementary gates increases the breadth of the search tree (more components to choose from at each step) but reduces the required depth of the search (number of components to put together one after the other). Without these tricks, there would be a combinatorial explosion with the depth of the tree (e.g. O((gn)^d) with g elementary gates, n qubits and depth of the tree d, considering only 1-qubit gates in this rough estimate). It would be simply infeasible to decompose very long circuits, and this is why reducing the depth to explore is so helpful. Using the probabilistic guidance of the learned (G, θ), we can discover a circuit c in at most O(1/P(c|G, θ)), which may be much better than O((gn)^d) if the target circuit c employs similar computational motifs to the training data. In some sense, this algorithm allows us to learn a domain-specific language for quantum circuits, by discovering a good prior to guide the circuit synthesis.
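As a rough illustration of this trade-off between breadth and depth, consider the following worked numbers. They are assumptions chosen for the example (g = 4 elementary gates, n = 3 qubits, a target needing d = 9 elementary gates, and one extracted composite gate that stands for three elementary gates), and they use the same crude 1-qubit counting as the estimate above.

```latex
% Brute-force estimate with the elementary set only: g = 4, n = 3, d = 9
(g n)^d = (4 \cdot 3)^9 = 12^9 \approx 5.2 \times 10^{9}

% After adding one composite gate (g = 5), the same target needs depth d = 3
(g n)^d = (5 \cdot 3)^3 = 15^3 = 3375
```

The branching factor grows slightly, but the shorter depth reduces the estimated search space by about six orders of magnitude.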

Results and discussion
In this section, we show the application of our unitary matrix decomposition algorithm using an elementary set of gates, which can theoretically approximate any circuit. We show results when enforcing either full connectivity between qubits, or only allowing gates between nearest-neighboring qubits (e.g. between qubit 0 and 1 but not 0 and 2). To make the search faster and focus on the proof of concept, we limit ourselves to discrete gates, i.e. gates that do not depend on a tunable real parameter: including such parameters would require an optimization over the parameters of the gates, in addition to the search among the possible programs. For simplicity, we also fix the number of qubits to be the same in the entire set. In particular, to avoid the combinatorial explosion due to input qubit combinations (an n-qubit gate should be tested on all permutations of qubit inputs), we limit our examples to circuits with only 3 qubits, but generalizations to larger circuits are of course possible.
We choose G_0 = {H, T, T†, CNOT} as the elementary gate set. This essentially corresponds to the Clifford gate set, plus T gates to make it a universal approximator [39,40]. We also include the T† gate, which itself corresponds to seven T gates, to speed up the search. The choice of the target unitary set U is important, as the extracted gates will be selected to maximize the decomposition efficiency over those tasks. In general, some unitaries are much harder to approximate because they require many elementary gates. Sampling from the space of unitary matrices would generally produce matrices that are too hard to decompose when starting from our elementary set of gates and trying combinations of them. In our experiments, to be sure that each target can be decomposed in a finite amount of search time, we build U by defining another set of gates, G_tasks, which uses more high-level operations. We sample circuits from G_tasks by randomly putting gates on the circuit. The unitaries associated with the sampled circuits will be the targets for our algorithm. To keep the decomposition difficulty under control, the gates in G_tasks also have no continuous parameters; in particular, G_tasks = {H, T, T†, S, X, Y, Z, SX, SX†, CNOT, CY, CZ, CS, CH, SWAP, iSWAP}. We first generate a set of matrices by enumerating all the possible circuits given by this set within 50 seconds. From this set, we select 1000 matrices to build U. To make sure that the train set always contains a significant fraction of both easy and difficult decomposition tasks, we choose them with uniform probability in the number of gates of the initial circuits, taking into account only circuits generated within the timeout. All other generated unitaries are included in our test set T, and they will be used to assess the performance of the algorithm. The target dataset thus contains tasks with different levels of difficulty, allowing the algorithm to gradually learn to solve more and more complicated tasks. It is important to include unitaries with different decomposition lengths. Indeed, only after some tasks are solved is it possible to extract gates that can be used to solve other tasks, since we need at least some elements in B_u in Eq. 5. If all tasks are too complicated, the learning procedure will not start and each iteration will not provide any benefit, since it will always propose the same programs. In that case, the enumeration timeout should be increased until some solutions are found.
Figure 4: (a) Evolution of the probability of using a certain gate (θ). Some extracted gates (in blue) become more important during the iterations, while some elementary gates (in black) become rarer. (b) Likelihood of decomposing the target unitaries with different elementary sets: the initial set of gates, the final one (which also includes the extracted gates), and the set of gates with which the target dataset has been generated. (c) Final probability of using a certain gate in G_100 ("weight") and in the found solutions ("frequency"). In blue are shown the extracted gates, in black the initial ones (in decreasing order: CNOT, T, Hadamard, and T†). (d) The first few most useful extracted gates.
We perform 100 algorithm iterations, each time considering batches of 25 tasks and enumerating for 150 seconds (in parallel with 32 CPUs). We see that after some iterations we can solve most of the about 50000 tasks in the test set. Every time we learn a new gate, the algorithm starts using it to explore new circuits, thus solving more tasks. To evaluate the performance of the algorithm, we can consider the test set T (which the algorithm has never seen during the previous iterations) and check how many unitaries can be decomposed into a circuit. Results are shown in Fig. 3. After about 50 iterations, the algorithm can decompose almost all the proposed matrices. The algorithm rediscovers suitable decompositions of gates into the available elementary gates. For example, it finds useful elementary decompositions of high-level gates like the SWAP gate and the Pauli gates, adding them as building blocks to its library. Importantly, it goes beyond those simpler examples and discovers more complicated composite gates, which also seem to be useful to reuse. By discovering useful building blocks, matrices that are initially impossible to decompose in the given time budget because of the high number of required components are easily decomposed into short sequences of the newly extracted components. Indeed, as soon as a new gate is extracted, many new unitary matrices can immediately be decomposed. The "train" and "test" curves are obtained by checking all the found programs at a given iteration against each unitary matrix in the target set U and test set T. We recall that here we consider the whole target set for evaluation purposes, but the algorithm only has access to a small random batch of matrices at each iteration. As Fig. 3b shows, the difficulty of decomposing a given matrix changes during the algorithm iterations: when new gates are added, some unitaries suddenly become easier to decompose, while other matrices, which mainly use the initial elementary components, become less likely to occur because those components become less frequent in the enumeration. Overall, the matrices become easier to decompose.
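The elementary set G_0 introduced earlier in this section includes T† as a shortcut for seven consecutive T gates. This identity follows from T^8 = I and can be checked numerically with a short sketch (variable names are illustrative):

```python
import numpy as np

# T gate: a phase of pi/4 on the |1> state.
T = np.diag([1, np.exp(1j * np.pi / 4)])
T_dagger = T.conj().T

# Seven T gates in a row accumulate a phase of 7*pi/4, i.e. -pi/4 modulo 2*pi.
seven_t = np.linalg.matrix_power(T, 7)
eight_t = np.linalg.matrix_power(T, 8)
```

Here `seven_t` agrees with `T_dagger` and `eight_t` with the identity up to floating-point error, so providing T† explicitly replaces a depth-7 subcircuit by a single component.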
It is also interesting to inspect the library of extracted gates and try to interpret their behavior. As seen in Fig. 4, after discovering composite gates, initial gates like T become less useful and are used less often, while some new gates like "f18", which corresponds to a controlled Hadamard gate suitably decomposed into elementary gates, and "f4", which corresponds to a controlled Z gate, are used more often than CNOT. By looking at Fig. 4b, we see that, initially, it is generally very hard to find decompositions for the target matrices (in orange). After running the algorithm, we have a new set of gates (in blue) that allows decomposing most of the matrices. We go from problems with probability e^{-60} of being solved to e^{-20}, a probability about 17 orders of magnitude larger, which makes it feasible to find the decomposition in a finite amount of time. It is interesting to observe how a different choice of elementary components can make the decomposition easier. In particular, the extracted set of gates at the final iteration, G_100, makes the decomposition of the target matrices even easier than when using exactly the same set we used to generate the target dataset itself (in green). In other words, our algorithm discovers a set of quantum gates to describe the target dataset that is even better than the set that we used to generate it, G_tasks. For example, it turns out that it is much more useful to have two-qubit gates like CH and CZ than CNOT.
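The figure of about 17 orders of magnitude quoted above follows directly from converting the ratio of the two probabilities from base e to base 10:

```latex
\frac{e^{-20}}{e^{-60}} = e^{40} = 10^{40/\ln 10} \approx 10^{17.4}
```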
We also performed another experiment with the same parameters, but this time we constrained the 2-qubit gates to only act on neighboring qubits. This configuration resembles a linear array of qubits, where interactions are constrained to nearest neighbors. Also in this case, the algorithm learns to decompose more and more matrices with experience. We notice that this time a larger fraction cannot be decomposed even after 100 iterations. This is due to the larger difficulty of this problem, and with more iterations and a larger enumeration timeout results would keep improving. Again, the algorithm learns more complicated gates and finds similar results as in the previous example. In addition, it also learns gates that allow it to efficiently handle the enforced connectivity constraint, like the SWAP gate between the first and the third qubit (by swapping with the middle one) and similar two- and three-qubit gates. The automatic extraction of composite gates allows for expressing very long sequences of gates concisely. Results are shown in Fig. 5.
The final outcome of the algorithm depends of course on the chosen definition of being a better set of gates: in this case, we wanted to minimize the number of components to put in a circuit so that all target matrices could be decomposed with some high-level gates.However, different constraints can be considered, for example, to include the overall length of the circuit in terms of elementary components or a different cost for the use of each component.

Outlook
In this paper, we have shown how concept extraction and program synthesis techniques can potentially help quantum computing, by providing tools to work and reason with the quantum world.
In particular, we have shown a procedure to discover useful quantum gates (in terms of reusability) by just giving a set of unitary matrices to decompose. This can be seen as a first step toward the longer-term goal of enabling the discovery of new quantum algorithms. The extension to larger qubit numbers will require careful optimization of the performance of the different parts of the algorithm (search and library building), but we anticipate there is a lot of room for improvement here. Indeed, our experiments take about 20 minutes per iteration to run, and systems with more qubits would require much more time. It is also possible to test the generalization capabilities by running first on smaller systems and then trying to solve more complicated tasks on larger systems. To improve performance, it would be possible to restrict the decomposition to the Clifford gate set, so that calculation would be faster, e.g. by exploiting highly optimized Clifford simulators that can deal with large qubit numbers [41]. To tackle the combinatorial explosion due to the larger number of possible qubits a gate can be applied to, more advanced approximations for the circuit distribution P(c) in Eq. (1) may be employed. For example, instead of factorizing the circuit distribution as the product of the probability of its gates, we could condition the probability of a gate on the previous ones, improving the precision of the enumerator.
One of the most important future extensions would concern the choice of the set of target unitaries.While in our case these were generated as random circuits from a high-level gate set, the application to more structured training sets would greatly increase the power of the approach.We are thinking, in particular, of circuits generated from a library of quantum algorithms.
Different connectivity constraints may also be enforced, or one could extend the programs that generate the circuits to also include programming constructs like conditions and loops. In the long run, the presented library bootstrapping procedure can be part of future algorithms to automatically extract components and reuse them in a curriculum-learning approach [42]. Also, it would be possible to adopt this concept extraction algorithm as an additional step of a reinforcement learning agent [43] that tries to decompose a unitary matrix. In that case, the goal would be to train an agent (i.e. a probability distribution of putting a certain gate given the current circuit) to synthesize the unitary, where the state would be the current circuit, and the possible actions would be the allowed elementary gates. It would be possible to extend the action space by adding the extracted gates, thus facilitating the exploration of the circuit space, as in hierarchical reinforcement learning [44].
Finally, on a more general level, the ability to extract concepts and to use them in further exploration is also interesting for the development of future "artificial scientist" algorithms, here applied to the quantum domain and specifically quantum computation, aimed at reasoning and developing scientific models similarly to humans: the possibility to define concepts and reason about them is arguably a necessary skill for this purpose. Similar techniques can be a useful addition to existing machine learning algorithms for quantum circuit design, and, in the long run, they may help to develop new quantum algorithms.
The code of the algorithm and the instructions to reproduce the presented examples are open-sourced on GitHub.

Figure 6: Example circuit.

A The program synthesis algorithm
To apply the program synthesis framework presented in [26], we need to express quantum circuits as programs. The specific formalism that we adopt is that of functional programming, typed λ-calculus [45] in particular. Each quantum gate becomes a function that takes as input a quantum circuit and the sequence of qubits on which it should act, and returns a quantum circuit with the requested gate applied on the right. In this way, a program is simply the sequential application of many gates, starting from the empty circuit I. For example, the circuit in Fig. 6 can be expressed as

f = cnot(x(I,0),0,1)

Lambda calculus allows representing and working with these kinds of expressions efficiently, so that each function and its arguments are associated with the leaves of a tree and new programs can be obtained by modifying existing trees. In this language, the previous function is written as a lambda expression in which $0 is the initial input circuit and $i the qubit index (increased by one). For the technical description of the lambda calculus programs as trees and their advantages, we refer to the DreamCoder paper [26].
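The gates-as-functions view can be mimicked in a few lines of Python. This is a minimal sketch under stated assumptions, not the typed λ-calculus representation used by the framework of [26]: a circuit is stored as a tuple of gate applications, and the helper `run_on_basis_state` is a hypothetical name that only handles the two gates of the example, both of which map computational basis states to basis states.

```python
I = ()  # the empty circuit


def x(circuit, q):
    """Return a new circuit with an X gate on qubit q appended on the right."""
    return circuit + (("x", (q,)),)


def cnot(circuit, control, target):
    """Return a new circuit with a CNOT (control -> target) appended on the right."""
    return circuit + (("cnot", (control, target)),)


def run_on_basis_state(circuit, bits):
    """Apply the circuit to a computational basis state given as a tuple of bits."""
    bits = list(bits)
    for name, qubits in circuit:
        if name == "x":
            bits[qubits[0]] ^= 1  # X flips the bit
        elif name == "cnot":
            control, target = qubits
            bits[target] ^= bits[control]  # flip target if control is 1
        else:
            raise ValueError(f"unsupported gate: {name}")
    return tuple(bits)


# The program from the text: X on qubit 0, followed by CNOT from qubit 0 to 1.
f = cnot(x(I, 0), 0, 1)
print(f)
print(run_on_basis_state(f, (0, 0)))
```

Because every gate function returns a new circuit, nested calls read exactly like the program in the text, with the innermost call applied first.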

B Probabilistic framing
In this section, we give some more details about the probabilistic framing that yields the optimization objective of our algorithm, while still referring to [26] for the full treatment. We want to find the optimal set of gates G* and the optimal single-gate probabilities θ* = {θ_g, ∀g ∈ G}. For this purpose, we perform Bayesian inference using Bayes' theorem [46]. The posterior reads

\[ P(G, \theta \mid U) = \frac{P(U \mid G, \theta)\, P(G, \theta)}{P(U)}, \tag{2} \]

where P(G, θ) is the prior, P(U|G, θ) is the likelihood, and P(U) is the marginal probability of the evidence. We can consider P(U) as constant because it does not depend on the set of gates, hence we drop the denominator from Eq. (2):

\[ G^*, \theta^* = \arg\max_{G, \theta}\; P(U \mid G, \theta)\, P(G, \theta). \]

By definition, we can factorize the joint distribution as P(G, θ) = P(G)P(θ|G), and, because of the independence between different unitary matrices in the dataset, write the likelihood as

\[ P(U \mid G, \theta) = \prod_{u \in U} P(u \mid G, \theta), \]

where P(u|G, θ) is the probability of a specific unitary u given G and θ. The probability of a specific unitary can in turn be written as a function of the probability of generating it from sequences of gates in G:

\[ P(u \mid G, \theta) = \sum_{c} P(u \mid c)\, P(c \mid G, \theta), \]

where the sum runs over the circuits c that can be built from the gates in G.

Using batches of tasks speeds up the library building phase, since we limit the number of unitary matrices each enumerated circuit should be tested against. Of course, an additional consequence is that more iterations are needed overall, since it takes N/N_batch iterations on average just to check against all the tasks. The algorithm learns to decompose new matrices, but it accounts for that only when those are selected as target tasks. As shown in Fig. 7, the performance that would be inferred at train time is therefore lower than the effective one on the train set, simply because we do not check all the tasks at each iteration. The "seen train" curve (calculated by considering only the decomposed matrices in the iteration batch) takes more time to take advantage of the discovered gates, since at each iteration it is only evaluated on a subset of the total elements. The same set, but completely evaluated at each iteration of the algorithm, is shown as the "train" set (this is the same curve as in Fig. 3). We see that, even when the algorithm can already decompose all the matrices (for example after iteration 30), it can still invent new gates to make the decomposition easier in terms of the number of gates used.
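The gap between the "seen train" and "train" curves can be reproduced with a toy simulation. The sketch below is an illustration under strong assumptions, not the procedure used for Fig. 7: the set of solvable tasks is held fixed instead of growing with the library, batches are drawn uniformly at random, and the function name `solved_fractions` and all numbers are hypothetical.

```python
import random


def solved_fractions(n_tasks, batch_size, solvable, n_iterations, seed=0):
    """Track the fraction of tasks known to be solved when only a random
    batch is checked per iteration, next to the fraction actually solvable."""
    rng = random.Random(seed)
    seen_solved = set()  # tasks that were in a batch and got decomposed
    history = []
    for _ in range(n_iterations):
        batch = rng.sample(range(n_tasks), batch_size)
        seen_solved.update(task for task in batch if task in solvable)
        history.append((len(seen_solved) / n_tasks, len(solvable) / n_tasks))
    return history


# Toy setting: 100 tasks, 60 of them solvable, 20 tasks checked per iteration.
history = solved_fractions(
    n_tasks=100, batch_size=20, solvable=set(range(60)), n_iterations=15
)
for iteration, (seen, full) in enumerate(history, start=1):
    print(f"iteration {iteration:2d}: seen = {seen:.2f}, full = {full:.2f}")
```

In this toy setting the "seen" fraction can only approach the "full" one from below, which mirrors why the performance inferred at train time underestimates the effective one.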
To produce plots of the circuits we use the Qiskit library [47]. The complete code to run the algorithm and reproduce the experiments is available on GitHub.

D Experiments
In this Appendix, we show more details about the experiments presented in the main text. In Table 1, we show the list of all the gates that the algorithm extracted to solve the proposed tasks, in the case where no connectivity constraints were enforced (i.e. Figure 3 in the main text).

Table 1 columns: #, Gate representation, Expanded circuit, Program.
Table 2 shows some examples of unitary matrices in our set (expressed in terms of the circuit that we used to generate the matrix) and the decompositions proposed by our algorithm. We see that the algorithm also finds unexpected ways to decompose the associated unitary matrix, which do not necessarily use exactly the same blocks we used to build it.

Figure 3: Unitary synthesis with full connectivity among qubits. (a) Fraction of solved tasks at each iteration. (b) Likelihood of decomposing some target unitaries during the algorithm iterations (P(u|G, θ)). The star symbol shows the first iteration in which the target matrix has been successfully decomposed. As new gates are discovered, some matrices become easier to decompose. For three tasks, we show the circuit that generates the target matrix in the insets. In both figures, the blue symbols mark the iterations at which a new gate has been extracted.

Figure 4: Analysis of extracted gates. (a) Evolution of the probability of using a certain gate (θ). Some extracted gates (in blue) become more important during the iterations, while some elementary gates (in black) become rarer. (b) Likelihood of decomposing the target unitaries with different elementary sets: the initial set of gates, the final one (which also includes the extracted gates), and the set of gates with which the target dataset has been generated. (c) Final probability of using a certain gate in G_100 ("weight") and in the found solutions ("frequency"). The extracted gates are shown in blue, the initial ones in black (in decreasing order: CNOT, T, Hadamard, and T†). (d) The first few most useful extracted gates.

Figure 5: Unitary synthesis with only nearest-neighbor connectivity among qubits. (a) Percentage of solved tasks at each iteration. (b) Some extracted gates. (c) Example decomposition of a high-level circuit into the requested gate set.

Figure 7: Estimated performance on the target set of matrices during algorithm iterations, calculated on the effectively seen matrices (in green), and on the complete set (in blue). Using batches of tasks adds some stochasticity to the algorithm (which helps to make it more robust) and speeds up the library building routine, but increases the number of required iterations.

Table 1: List of extracted gates after 100 iterations, with no connectivity constraints between qubits.


Table 2: Examples of decomposed matrices.