
A strategy for quantum algorithm design assisted by machine learning


Published 14 July 2014 © 2014 IOP Publishing Ltd and Deutsche Physikalische Gesellschaft
Citation: Jeongho Bang et al 2014 New J. Phys. 16 073017. DOI: 10.1088/1367-2630/16/7/073017


Abstract

We propose a method for quantum algorithm design assisted by machine learning. The method uses a quantum–classical hybrid simulator, where a 'quantum student' is being taught by a 'classical teacher'. In other words, in our method, the learning system is supposed to evolve into a quantum algorithm for a given problem, assisted by a classical main-feedback system. Our method is applicable for designing quantum oracle-based algorithms. We chose, as a case study, an oracle decision problem, called a Deutsch–Jozsa problem. We showed by using Monte Carlo simulations that our simulator can faithfully learn a quantum algorithm for solving the problem for a given oracle. Remarkably, the learning time is proportional to the square root of the total number of parameters, rather than showing the exponential dependence found in the classical machine learning-based method.


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Quantum information science has seen explosive growth in recent years, as a more powerful generalization of classical information theory [1]. In particular, quantum computation has received momentum from quantum algorithms that outperform their classical counterparts [2–5]. Thus, the development of quantum algorithms is one of the most important areas of computer science. Unfortunately, however, recent research on quantum algorithm design has been rather stagnant compared to other areas in quantum information, as scarcely any new quantum algorithms have been discovered in the past few years [6]. We believe that this is due to the fact that we—the designers—are used to classical logic. We therefore think that quantum algorithm design should turn towards a new methodology, different from the current approach.

Machine learning is a well-developed branch of artificial intelligence and automatic control. Although 'learning' is often thought of as a uniquely human trait, a machine that is given feedback (taught) can improve its performance (learn) in a given task [7, 8]. In the past few decades, there has been growing interest not only in theoretical studies of machine learning but also in a variety of its applications. Recently, many quantum implementations of machine learning have been introduced to achieve better performance for quantum information processing [9–13]. These works motivate us to look at machine learning as an alternative approach for quantum algorithm design.

Keeping our primary goal in mind, we ask whether a quantum algorithm can be found by the machine that also implements it. On the basis of this idea, we consider a machine which is able to learn quantum algorithms in a real experiment. Such a machine may discover solutions which are difficult for humans to find because of our classical way of thinking. Since we can always simulate a quantum machine on a classical computer (though not always efficiently), we can use such simulations to design quantum algorithms without the need for a programmable quantum computer. This classical machine can thus be regarded as a simulator that learns a quantum algorithm—a so-called learning simulator. The novelty of such a learning simulator lies in its capabilities of 'learning' and 'teaching'. With regard to these abilities, we consider two internal systems: one is a learning system (the 'student', say), and the other is a main-feedback system (the 'teacher', say). While the standard approach is to assume that both student and teacher are quantum machines, here we use a quantum–classical hybrid simulator such that the student is a quantum machine and the teacher a classical machine. Such a hybrid is easier and more economical to realize, provided that the desired algorithms can still be learned.

In this paper, we employ a learning simulator for quantum algorithm design. The main question of this work is: 'Can our learning simulator help in designing a quantum algorithm?' The answer to this question is affirmative, as we show, using Monte Carlo simulations, that our learning simulator can faithfully learn appropriate elements of a quantum algorithm for solving an oracle decision problem, called the Deutsch–Jozsa problem. The algorithms found are equivalent, but not exactly equal, to the original Deutsch–Jozsa algorithm. We also investigate the learning time, as it becomes important in application not only due to the large-scale problems often arising in machine learning but also because, in its learning, our simulator will exhibit the quantum speedup (if any) of an algorithm to be found, as described later. We observe that the learning time is proportional to the square root of the total number of parameters, in contrast to the exponential tendency found in classical machine learning. We expect our learning simulator to reflect the quantum speedup of the algorithm found in its learning, possibly in synergy with the finding that the size of the parameter space can be significantly smaller for quantum algorithms than for their classical counterparts [14]. We note that the method presented is aimed at a real experiment, in contrast to the techniques of [15, 16].

2. The basic architecture of the learning simulator

Before discussing the details of the learning simulator, it is important to have an understanding of what machine learning is. A typical task of machine learning is to find a function $f\left( x \right)={{t}_{x}}$ for the input x and the target ${{t}_{x}}$ based on the observations in supervised learning, or to find some hidden structure in unsupervised learning [7, 8]. The main difference between supervised and unsupervised learning is that in the latter case the target ${{t}_{x}}$ is unknown. Throughout this paper, we consider supervised learning where the target ${{t}_{x}}$ is known.

We now briefly sketch our method (see also figure 1). To begin, a supervisor defines the problem to be solved, and arranges the necessary prerequisites (e.g., the input–target pairs (x, ${{t}_{x}}$), and a function Q produced by a non-trivial device, the so-called oracle) for learning. The preliminary information is passed to the learning simulator at once. The simulator encodes the communicated information on its own elements. We note here that one could consider two main tasks in designing a quantum algorithm. The first is to construct a useful form of quantum oracle, and the second is to find the other quantum operation(s) to incorporate so as to maximize the quantum advantages, such as superposition enabling parallelism [17] or entanglement [18]. We focus here on the latter (see footnote 5). Note, however, that it is necessary to define a specific oracle operation (see appendix A). This task is also performed by the supervisor at this preliminary stage.


Figure 1. Schematic picture of our method. A supervisor defines the problem to be solved and arranges the necessary prerequisites for learning the quantum algorithm. All of this information is communicated to the learning simulator at once. The simulator encodes the information on its own elements. The simulator consists of quantum elements, i.e., preparation P, operation U, and measurement M, assisted by the classical main feedback F. The classical channels ${{\mathcal{C}}_{MF}}$ and ${{\mathcal{C}}_{FU}}$ enable one-way communication from M to F and from F to U.


We now describe the basic elements of the learning simulator in figure 1. The simulator consists of two internal parts. One is the learning system which is supposed to eventually perform a quantum algorithm, and the other is the feedback system responsible for teaching the former. The learning system consists of the standard quantum information-processing devices: preparation P for preparing a pure quantum state, operation U for performing a unitary operation, and measurement M. Here, the chosen quantum oracle is involved in U. On the other hand, the feedback system is classical, as this is easier and less expensive to realize in practice. Furthermore, by employing classical feedback, we can use a well-known (classical) learning algorithm whose performance has already been proved to be reliable. Recently, a scheme for machine learning involving quantum feedback has been reported [21], but the usefulness of the quantumness has not been clearly elucidated, even though the reported results are meaningful in some applications. Moreover, whether classical feedback alone is applicable to quantum algorithm design is itself still an open question. Consequently, we prefer to use classical feedback in this work. In this sense, our simulator is a quantum–classical hybrid. The feedback system is equipped with a main-feedback device F which involves the classical memory $\mathcal{S}$ and the learning algorithm $\mathcal{A}$. $\mathcal{S}$ records the control parameters of U and the measurement results of M. $\mathcal{A}$ corresponds to a series of rules for updating U.

We illustrate how our simulator performs the learning. Let us start with the set of K input–target pairs communicated from the supervisor:

$T=\left\{ \left( {{x}_{i}},{{t}_{{{x}_{i}}}} \right)\;:\;{{t}_{{{x}_{i}}}}=f\left( {{x}_{i}} \right),\ i=1,2,\ldots ,K \right\}, \qquad (1)$

where f is a function that transforms the inputs ${{x}_{i}}$ into their targets (see footnote 6). The main task of the simulator is to find f. Firstly, an initial state $\left| {{\Psi }_{{\rm in}}} \right\rangle $ is prepared in P and transformed to $\left| {{\Psi }_{{\rm out}}} \right\rangle $ by U. Then M performs a measurement on $\left| {{\Psi }_{{\rm out}}} \right\rangle $ with a chosen measurement basis. The measurement result is delivered to F through ${{\mathcal{C}}_{MF}}$. Note here that the information about the initial state $\left| {{\Psi }_{{\rm in}}} \right\rangle $ and the measurement basis encoded in P and M is also determined by the supervisor before the learning. Finally, F updates U on the basis of $\mathcal{A}$. Basically, the learning is just the repetition of these steps. When the learning is completed, we obtain a PUM device that implements f by simply removing F. The supervisor then investigates whether the PUM device found provides any speedup by reducing the overall number of oracle references, or saves any computational resources for implementing the algorithm [22]. In particular, the supervisor would standardize the identified operations U as an algorithm. Here, we clarify that the input information in T and the measurement results are classical. Nevertheless, the simulator is supposed to exploit quantum effects in learning, because the operations before measurement are all quantum. This assumption is supported by recent theoretical studies that show the improvement of learning efficiency achieved by using quantum superposition [14, 23].
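To make the P–U–M–F cycle above concrete, the following is a minimal Python sketch of a single learning iteration, assuming a state-vector simulation in NumPy; the names prepare_state, build_unitary and classical_feedback are illustrative placeholders rather than part of the method described in the paper.

```python
import numpy as np

def prepare_state(n):
    """P: prepare the fixed n-qubit pure input state |00...0>."""
    psi = np.zeros(2 ** n, dtype=complex)
    psi[0] = 1.0
    return psi

def measure(psi, shots, rng):
    """M: sample computational-basis outcomes from the Born probabilities |psi_k|^2."""
    probs = np.abs(psi) ** 2
    return rng.choice(len(psi), size=shots, p=probs / probs.sum())

def learning_iteration(params, build_unitary, classical_feedback, n=2, shots=100, seed=0):
    """One pass of the hybrid loop: P -> U(params) -> M, then classical feedback F."""
    rng = np.random.default_rng(seed)
    psi_in = prepare_state(n)                    # P
    psi_out = build_unitary(params) @ psi_in     # U (the oracle is part of U)
    outcomes = measure(psi_out, shots, rng)      # M, sent to F over C_MF
    return classical_feedback(params, outcomes)  # F returns updated parameters over C_FU

# Tiny demonstration with placeholder devices (identity unitary, no-op feedback).
demo_U = lambda params: np.eye(4, dtype=complex)
demo_F = lambda params, outcomes: params
print(learning_iteration(np.zeros(3), demo_U, demo_F))
```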

3. Construction of the learning simulator

The general design of the learning simulator depicted in figure 1 works fine for problems such as number factorization. However, in problems requiring a large number of oracle references, the input is the oracle itself and, by definition, it is a (unitary) transformation rather than a string of bits. To allow for input in the form of a unitary matrix, we need to refine our simulator a little (but let us stress that this does not mean that our method is not general). The refined version depicted in figure 2 allows the simulator to learn an algorithm of iterative type. The difference in the learning simulator stems directly from the formulation of the problem.


Figure 2. Architecture of our simulator for learning a quantum algorithm, where the unitary operation U consists of three sub-operations (see the text).


The most important aspect of the refined learning simulator is the decomposition of U. In order to deal with both classical and quantum information, we divide U into three sub-devices, such that

${{\hat{U}}_{tot}}={{\hat{U}}_{3}}{{\hat{U}}_{2}}{{\hat{U}}_{1}}, \qquad (2)$

where ${{\hat{U}}_{tot}}$ is the total unitary operator, and ${{\hat{U}}_{j}}$ $(j=1,2,3)$ denotes the unitary operator of the jth sub-device. Here, ${{\hat{U}}_{1}}$ and ${{\hat{U}}_{3}}$ are n-qubit controllable unitary operators, whereas ${{\hat{U}}_{2}}$ is the oracle for encoding the input ${{x}_{i}}$. By 'controllable' we mean here, and throughout the paper, that they can be changed by the feedback.

The unitary operators are generally parameterized as

$\hat{U}\left( {\bf p} \right)=\exp \left( -{\rm i}\,{\bf p}\cdot {\bf G} \right)=\exp \left( -{\rm i}\sum\nolimits_{j=1}^{{{d}^{2}}-1}{{p}_{j}}{{{\hat{g}}}_{j}} \right), \qquad (3)$

where ${\bf p}={{\left( {{p}_{1}},{{p}_{2}},\ldots ,{{p}_{{{d}^{2}}-1}} \right)}^{{\rm T}}}$ is a real vector in $\left( {{d}^{2}}-1 \right)$-dimensional Bloch space for $d={{2}^{n}}$, and ${\bf G}={{\left( {{{\hat{g}}}_{1}},{{{\hat{g}}}_{2}},\ldots {{{\hat{g}}}_{{{d}^{2}}-1}} \right)}^{{\rm T}}}$ is a vector whose components are SU(d) group generators [24, 25]. The components ${{p}_{j}}\in \left[ -\pi ,\pi \right]$ of ${\bf p}$ can be directly matched to control parameters in some experimental schemes, e.g., beam-splitter and phase-shifter alignments in linear optical systems [26] and radio-frequency (rf) pulse sequences in nuclear magnetic resonance (NMR) systems [27]. In that sense, we call ${\bf p}$ a control-parameter vector. Here, ${{{\bf p}}_{2}}$ is determined by $Q\left( {{x}_{i}} \right)\mapsto {{{\bf p}}_{2}}\left( {{x}_{i}} \right)$, as described above. In such a setting, we expect our simulator to learn an optimal set of $\left\{ {{{\bf p}}_{1}},{{{\bf p}}_{3}} \right\}$, such that ${{\hat{U}}_{1}}$ and ${{\hat{U}}_{3}}$ will solve a given problem.
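As a concrete illustration of this parameterization, the short NumPy/SciPy sketch below builds a set of generalized Gell-Mann matrices as the SU(d) generators and forms $\hat{U}\left( {\bf p} \right)$ by the exponential map, following the reconstruction of equation (3); the particular generator basis and sign convention are assumptions, since any complete SU(d) generator basis works equally well.

```python
import numpy as np
from scipy.linalg import expm

def su_generators(d):
    """Generalized Gell-Mann matrices: d^2 - 1 traceless Hermitian generators of SU(d)."""
    gens = []
    for j in range(d):            # symmetric and antisymmetric off-diagonal generators
        for k in range(j + 1, d):
            s = np.zeros((d, d), dtype=complex); s[j, k] = s[k, j] = 1.0
            a = np.zeros((d, d), dtype=complex); a[j, k] = -1j; a[k, j] = 1j
            gens += [s, a]
    for l in range(1, d):         # diagonal generators
        diag = np.zeros(d); diag[:l] = 1.0; diag[l] = -float(l)
        gens.append(np.sqrt(2.0 / (l * (l + 1))) * np.diag(diag).astype(complex))
    return gens

def unitary_from_params(p, gens):
    """U(p) = exp(-i p . G), with p a real vector of d^2 - 1 control parameters."""
    return expm(-1j * sum(pj * gj for pj, gj in zip(p, gens)))

# Example: a random two-qubit (d = 4) controllable unitary, as used for U1 and U3.
d = 4
G = su_generators(d)
p = np.random.uniform(-np.pi, np.pi, d ** 2 - 1)
U = unitary_from_params(p, G)
assert np.allclose(U.conj().T @ U, np.eye(d))  # unitarity check
```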

Our simulator is actually well-suited to learning even iterative algorithms, such as Groverʼs [5]. We envisage using our simulator as follows: in the first stage, apply ${{\hat{U}}_{1}}$ to an input state, then ${{\hat{U}}_{2}}$, which is a non-trivial operation, say the oracle, and finally ${{\hat{U}}_{3}}$, to generate an output state. The feedback system updates ${{\hat{U}}_{1}}$ and ${{\hat{U}}_{3}}$. Then, after a certain number of iterations which do not lead to any improvements, our simulator goes to the second stage, where the output state is fed back as the input state and ${{\hat{U}}_{1}}$${{\hat{U}}_{2}}$${{\hat{U}}_{3}}$ is applied again. Therefore, in the second stage, the oracle is referenced twice. If this fails again, the simulator loops three times at the third stage. After some number of stages, there will be enough oracle references to solve the problem. In such a way, our simulator can learn even a quantum algorithm of iterative type (see footnote 7), without adding any sub-devices or altering the structure in a real experiment. Thus, the size of the search space is determined only by the number of control parameters in ${{\hat{U}}_{1}}$ and ${{\hat{U}}_{3}}$, given by $D=2\left( {{d}^{2}}-1 \right)$, where $d={{2}^{n}}$.

Here, we highlight another subsidiary question: how long does it take for our simulator to learn an (almost) deterministic quantum algorithm? Investigating this issue will become increasingly important, especially in the application of our simulator to very large-scale (i.e., $D\gg 1$) problems. One may raise the objection that our simulator runs extremely slowly for large-size problems. On the other hand, however, it is also likely that, in its learning, our simulator enjoys the quantum speedup, if any, of the algorithm to be found. To see this, consider two cases: a classical algorithm and a quantum algorithm which our simulator tries to find, assuming that they are of different complexities in terms of the number of oracle queries. For instance, the quantum one may query the oracle a polynomial number of times, whereas the number of queries for the classical one increases exponentially with respect to the problem size. Regardless of the method of realization, a learning simulator cannot reduce the number of stages to less than the number of oracle queries in a given algorithm to be found. This is reflected in the learning time. In other words, our simulator may show a learning speedup, exploring far fewer stages in the learning of the quantum algorithm, as long as the algorithm to be found exhibits quantum speedup. These competing considerations require us to investigate the learning time as well as the effectiveness of our simulator.

4. Application to the Deutsch–Jozsa problem

As a case study, consider an n-bit oracle decision problem, called the Deutsch–Jozsa (DJ) problem. The problem is to decide whether a given binary function ${{x}_{i}}:{{\left\{ 0,1 \right\}}^{n}}\to \left\{ 0,1 \right\}$ is constant (${{x}_{i}}$ generates the same value 0 or 1 for every input) or balanced (${{x}_{i}}$ generates 0 for exactly half of the inputs, and 1 for the rest of the inputs) [2, 3]. On a classical Turing machine, ${{2}^{n-1}}+1$ queries are required to solve this problem deterministically. If we use a probabilistic classical algorithm, we can determine the function ${{x}_{i}}$ with an error smaller than ${{2}^{-q}}$ by making q queries [28, 29].

On the other hand, the DJ quantum algorithm solves the problem with only a single query [29, 30]. The DJ quantum algorithm runs as follows: first, apply ${{\hat{H}}^{\otimes n}}$ on the input state $\left| {{\Psi }_{{\rm in}}} \right\rangle =\left| 00\cdots 0 \right\rangle $, then ${{\hat{U}}_{x}}$ to evaluate the input function, and finally ${{\hat{H}}^{\otimes n}}$ again to produce an output state $\left| {{\Psi }_{{\rm out}}} \right\rangle $. Here, $\hat{H}$ is the Hadamard gate which transforms the qubit states $\left| 0 \right\rangle $ and $\left| 1 \right\rangle $ into equal superposition states $\hat{H}\left| 0 \right\rangle =(\left| 0 \right\rangle +\left| 1 \right\rangle )/\sqrt{2}$ and $\hat{H}\left| 1 \right\rangle =(\left| 0 \right\rangle -\left| 1 \right\rangle )/\sqrt{2}$ respectively. ${{\hat{U}}_{x}}$ is the function-evaluation gate that calculates a given function ${{x}_{i}}$. It is defined by its action,

${{\hat{U}}_{x}}\left| {{k}_{1}}{{k}_{2}}\cdots {{k}_{n}} \right\rangle ={{\left( -1 \right)}^{{{x}_{i}}\left( {{k}_{1}}{{k}_{2}}\cdots {{k}_{n}} \right)}}\left| {{k}_{1}}{{k}_{2}}\cdots {{k}_{n}} \right\rangle , \qquad (4)$

where ${{k}_{1}}{{k}_{2}}\cdots {{k}_{n}}\in {{\left\{ 0,1 \right\}}^{n}}$ is the binary sequence of the computational basis. Then, the output state is given as

Equation (5)

where C and B are the sets of constant and balanced functions, respectively, and the binary components ${{z}_{j}}\in \left\{ 0,1 \right\}$ $\left( j=1,2,\ldots ,n \right)$ depend on the $\binom{d}{d/2}$ balanced functions (excluding the case ${{z}_{j}}=0$ for all j). In the last step, a von Neumann measurement is performed on the output state. The corresponding measurement operator is given by $\hat{M}=\left| 00\;\cdots \;0 \right\rangle \left\langle 00\;\cdots \;0 \right|$. The other projectors constituting the observable are irrelevant because we are interested only in the probabilities associated with the first case,

${{P}_{C}}={{\left| \left\langle 00\;\cdots \;0 | {{\Psi }_{{\rm out}}} \right\rangle \right|}^{2}}=1\quad {\rm for}\ {{x}_{i}}\in C, \qquad (6)$

and the second case,

${{P}_{B}}={{\left| \left\langle 00\;\cdots \;0 | {{\Psi }_{{\rm out}}} \right\rangle \right|}^{2}}=0\quad {\rm for}\ {{x}_{i}}\in B. \qquad (7)$

Therefore, whether the function ${{x}_{i}}$ is constant or balanced is determined with only a single oracle query.
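To illustrate the single-query behaviour described above, the following NumPy sketch runs the circuit ${{\hat{H}}^{\otimes n}}\,{{\hat{U}}_{x}}\,{{\hat{H}}^{\otimes n}}$ on $\left| 00\cdots 0 \right\rangle $ with a phase-type oracle as in equation (4) and reports the probability of the all-zero outcome; the particular constant and balanced functions used are illustrative choices only.

```python
import numpy as np
from functools import reduce

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

def hadamard_n(n):
    """H tensored n times, via repeated Kronecker products."""
    return reduce(np.kron, [H] * n)

def phase_oracle(x, n):
    """Diagonal oracle U_x |k> = (-1)^{x(k)} |k> for a function x on n-bit strings."""
    return np.diag([(-1.0) ** x(k) for k in range(2 ** n)])

def dj_all_zero_probability(x, n):
    """Probability of the all-zero outcome after H^n U_x H^n |00...0>."""
    psi = np.zeros(2 ** n); psi[0] = 1.0
    Hn = hadamard_n(n)
    psi_out = Hn @ phase_oracle(x, n) @ Hn @ psi
    return abs(psi_out[0]) ** 2

n = 3
constant = lambda k: 0                         # x(k) = 0 for every input
balanced = lambda k: bin(k).count("1") % 2     # parity function: balanced on {0,1}^n
print(dj_all_zero_probability(constant, n))    # -> 1.0 (constant)
print(dj_all_zero_probability(balanced, n))    # -> ~0.0 (balanced)
```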

We are now ready to apply our method to the DJ problem. To begin, the supervisor prepares the set of input–target pairs,

$T=\left\{ \left( {{x}_{i}},{{t}_{{{x}_{i}}}} \right) \right\}$, where ${{t}_{{{x}_{i}}}}=$ 'c' if ${{x}_{i}}\in C$ and ${{t}_{{{x}_{i}}}}=$ 'b' if ${{x}_{i}}\in B$. $\qquad$ (8)

The learning simulator now has to find the 'functional' f by adjusting ${{\hat{U}}_{1}}$ and ${{\hat{U}}_{3}}$. The input functions ${{x}_{i}}$ are encoded in ${{{\bf p}}_{2}}\left( {{x}_{i}} \right)$ of ${{\hat{U}}_{2}}$. Here, we chose the same form of the oracle as in equation (4), i.e., the phase-encoding type of appendix A. Then P prepares an arbitrary initial state $\left| {{\Psi }_{{\rm in}}} \right\rangle $ and M performs the measurement on each qubit. Here we introduce a function that maps a measurement result to one of the targets (in our case, 'c' or 'b'). We call this the interpretation function. Note that the interpretation function is also to be learned, because, in general, no a priori knowledge of the quantum algorithm to be found is available. For the sake of convenience, we consider a Boolean function that transforms the measurement result ${{z}_{1}}{{z}_{2}}\;\cdots \;{{z}_{n}}$ to 0 (equivalently, 'c') only if ${{z}_{j}}=0$ for all $j=1,2,\ldots ,n$, and otherwise to 1 (equivalently, 'b'). One may generalize the interpretation function to a function ${{\left\{ 0,1 \right\}}^{n}}\to {{\left\{ 0,1 \right\}}^{m}}$ if one is interested in other problems with at most ${{2}^{m}}$ targets [31].
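A minimal sketch of this Boolean interpretation function, mapping an n-bit measurement record to the label 'c' or 'b'; the function name interpret is an illustrative choice.

```python
def interpret(z):
    """Map a measurement record z = (z_1, ..., z_n) to 'c' if all zeros, otherwise 'b'."""
    return 'c' if all(zj == 0 for zj in z) else 'b'

print(interpret([0, 0, 0]))  # 'c' -> the oracle is judged constant
print(interpret([0, 1, 0]))  # 'b' -> the oracle is judged balanced
```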

5. The learning algorithm: differential evolution

One of the most important parts of our method is the choice of a learning algorithm $\mathcal{A}$. The efficiency and accuracy of machine learning are, in general, heavily influenced by the algorithm chosen. We employ so-called 'differential evolution', as it is known to be one of the most efficient optimization methods [32]. We implement the differential evolution as follows. To begin, we prepare ${{N}_{{\rm pop}}}$ sets of the control-parameter vectors: $\left\{ {{{\bf p}}_{1,i}},{{{\bf p}}_{3,i}} \right\}$ ($i=1,2,\ldots ,{{N}_{{\rm pop}}}$). Thus we have $2{{N}_{{\rm pop}}}$ parameter vectors in total. They are chosen initially at random and recorded on $\mathcal{S}$ in F.

[${\bf L}.{\bf 1}$] Then, $2{{N}_{{\rm pop}}}$ mutant vectors ${{{\boldsymbol{ \nu }}}_{k,i}}$ are generated for ${{\hat{U}}_{k}}$ (k = 1,3), according to

${{{\boldsymbol{ \nu }}}_{k,i}}={{{\bf p}}_{k,a}}+W\left( {{{\bf p}}_{k,b}}-{{{\bf p}}_{k,c}} \right),$

where ${{{\bf p}}_{k,a}}$, ${{{\bf p}}_{k,b}}$, and ${{{\bf p}}_{k,c}}$ are randomly chosen for $a,b,c\in \left\{ 1,2,\ldots ,{{N}_{{\rm pop}}} \right\}$. These three vectors are chosen to be different from each other; for that, ${{N}_{{\rm pop}}}\geqslant 3$ is necessary. The free parameter W, called a differential weight, is a real and constant number.

[${\bf L}.{\bf 2}$] After that, all $2{{N}_{{\rm pop}}}$ parameter vectors ${{{\bf p}}_{k,i}}={{\left( {{p}_{k,i,1}},{{p}_{k,i,2}},\ldots ,{{p}_{k,i,{{d}^{2}}-1}} \right)}^{{\rm T}}}$ are reformulated as trial vectors ${{{\boldsymbol{ \tau }}}_{k,i}}={{\left( {{\tau }_{k,i,1}},{{\tau }_{k,i,2}},\ldots ,{{\tau }_{k,i,{{d}^{2}}-1}} \right)}^{{\rm T}}}$ by means of the following rule: for each j,

${{\tau }_{k,i,j}}=\left\{ \begin{array}{ll} {{\nu }_{k,i,j}} & {\rm if}\ {{R}_{j}}\leqslant {{C}_{r}}, \\ {{p}_{k,i,j}} & {\rm otherwise}, \end{array} \right. \qquad (9)$

where ${{R}_{j}}\in \left[ 0,1 \right]$ is a randomly generated number and the crossover rate ${{C}_{r}}$ is another free parameter between 0 and 1.

[${\bf L}.{\bf 3}$] Finally, $\left\{ {{{\boldsymbol{ \tau }} }_{1,i}},{{{\boldsymbol{ \tau }} }_{3,i}} \right\}$ are taken for the next iteration if ${{\hat{U}}_{1}}\left( {{{\boldsymbol{ \tau }} }_{1,i}} \right)$ and ${{\hat{U}}_{3}}\left( {{{\boldsymbol{ \tau }} }_{3,i}} \right)$ yield a fitness value larger than that from ${{\hat{U}}_{1}}\left( {{{\bf p}}_{1,i}} \right)$ and ${{\hat{U}}_{3}}\left( {{{\bf p}}_{3,i}} \right)$; if not, $\left\{ {{{\bf p}}_{1,i}},{{{\bf p}}_{3,i}} \right\}$ are retained. Here the fitness ${{\xi }_{i}}$ is defined by

Equation (10)

where ${{P}_{C,i}}$ and ${{P}_{B,i}}$ are measurement probabilities for the ith set, given by equations (6) and (7). While evaluating the ${{N}_{{\rm pop}}}$ fitness values, F records on $\mathcal{S}$ the best ${{\xi }_{{\rm best}}}$ and its corresponding parameter vector set $\left\{ {{{\bf p}}_{1,{\rm best}}},{{{\bf p}}_{3,{\rm best}}} \right\}$.

The above steps [${\bf L}.{\bf 1}$]–[${\bf L}.{\bf 3}$] are repeated until ${{\xi }_{{\rm best}}}$ becomes sufficiently close to 1. In the ideal case, the simulator finds $\left\{ {{{\bf p}}_{1,{\rm best}}},{{{\bf p}}_{3,{\rm best}}} \right\}$ yielding ${{\xi }_{{\rm best}}}=1$ with ${{P}_{C}}=1$ and ${{P}_{B}}=0$. The parameters found lead to an algorithm equivalent to the original DJ one.
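To illustrate steps [${\bf L}.{\bf 1}$]–[${\bf L}.{\bf 3}$] end to end, here is a self-contained NumPy/SciPy sketch of the differential-evolution loop for the one-bit DJ problem (d = 2), with single-qubit unitaries parameterized as in appendix B. The fitness used below, the average of ${{P}_{C}}$ and $1-{{P}_{B}}$ over the four one-bit functions, is an illustrative stand-in for equation (10), and the values of W, ${{C}_{r}}$ and ${{N}_{{\rm pop}}}$ are typical but arbitrary choices.

```python
import numpy as np
from scipy.linalg import expm

SIGMA = [np.array([[0, 1], [1, 0]], dtype=complex),      # Pauli x
         np.array([[0, -1j], [1j, 0]], dtype=complex),   # Pauli y
         np.array([[1, 0], [0, -1]], dtype=complex)]     # Pauli z

def U(p):
    """Single-qubit controllable unitary U(p) = exp(-i p . sigma), p in [-pi, pi]^3."""
    return expm(-1j * sum(pj * sj for pj, sj in zip(p, SIGMA)))

# The four one-bit functions (x(0), x(1)) and their phase oracles, as in equation (4).
CONSTANT, BALANCED = [(0, 0), (1, 1)], [(0, 1), (1, 0)]
def oracle(x):
    return np.diag([(-1.0) ** x[0], (-1.0) ** x[1]]).astype(complex)

def p_zero(p1, p3, x):
    """Probability of outcome 0 for the circuit U3 . U_x . U1 acting on |0>."""
    psi = U(p3) @ oracle(x) @ U(p1) @ np.array([1.0, 0.0], dtype=complex)
    return abs(psi[0]) ** 2

def fitness(p1, p3):
    """Illustrative fitness: average of P_C and 1 - P_B over the one-bit functions."""
    pc = np.mean([p_zero(p1, p3, x) for x in CONSTANT])
    pb = np.mean([p_zero(p1, p3, x) for x in BALANCED])
    return 0.5 * (pc + (1.0 - pb))

def learn_dj(n_pop=10, W=0.8, Cr=0.9, max_iter=500, seed=1):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-np.pi, np.pi, size=(n_pop, 2, 3))   # index k: 0 -> p1, 1 -> p3
    fit = np.array([fitness(c[0], c[1]) for c in pop])
    for _ in range(max_iter):
        for i in range(n_pop):
            trial = pop[i].copy()
            for k in range(2):                              # [L.1] mutation for U1 and U3
                a, b, c = rng.choice(n_pop, size=3, replace=False)
                nu = pop[a, k] + W * (pop[b, k] - pop[c, k])
                cross = rng.random(3) <= Cr                 # [L.2] crossover
                trial[k, cross] = nu[cross]
            f_trial = fitness(trial[0], trial[1])
            if f_trial > fit[i]:                            # [L.3] selection
                pop[i], fit[i] = trial, f_trial
        if fit.max() >= 0.99:                               # halting condition
            break
    best = pop[fit.argmax()]
    return best[0], best[1], fit.max()

p1_best, p3_best, xi_best = learn_dj()
print("best fitness:", xi_best)   # typically close to 1: a single-query DJ variant
```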

6. Numerical analysis

The simulations are done for the n-bit DJ problem, with n increasing from 1 to 5. In the simulations, we take ${{N}_{{\rm pop}}}=10$ for all n (see footnote 8). The results are given in figure 3(a), where we present the averaged best fitness ${{\bar{\xi }}_{{\rm best}}}$, sampling 1000 trials. It is clearly observed that ${{\bar{\xi }}_{{\rm best}}}$ approaches 1 as the iteration proceeds. Only a single stage is required for all n. This implies that our simulator can faithfully learn a single-query quantum algorithm for the DJ problem, showing $\xi \simeq 1$. It is also notable that the algorithms found are equivalent to, but not exactly equal to, the original DJ algorithm: the ${{\hat{U}}_{1}}$ and ${{\hat{U}}_{3}}$ found are always different, but constitute an algorithm solving the DJ problem (see appendix B).


Figure 3. (a) Averaged best fitness ${{\bar{\xi }}_{{\rm best}}}$ versus iteration r. Each data value is averaged over 1000 simulations. It is observed that ${{\bar{\xi }}_{{\rm best}}}$ approaches unity upon iterating. (b) Learning probability P(r) for the halting condition ${{\xi }_{{\rm best}}}\geqslant 0.99$, when sampling 1000 trials. P(r) is well-fitted by an integrated Gaussian (black solid line), $G\left( r \right)=\int _{-\infty }^{r}{\rm d}r^{\prime} \;\rho \left( r^{\prime} \right)$. (c) Probability density $\rho \left( r \right)$ resulting from P(r) for each n. (d) Graph of ${{r}_{c}}$ versus $\sqrt{D}$, where D is the total number of the control parameters, and ${{r}_{c}}$ is the average number of iterations needed to complete the learning. The data are well-fitted linearly by ${{r}_{c}}=A\sqrt{D}+B$ with $A\simeq 43$ and $B\simeq -57$.


We then present a learning probability P(r), defined as the probability that the learning is completed before or at the rth iteration [33]. Here we assume a halting condition ${{\xi }_{{\rm best}}}\geqslant 0.99$ for finding a nearly deterministic algorithm. In figure 3(b), we present P(r) for all n, each of which is averaged over 1000 simulations. We find that P(r) is well-fitted with an integrated Gaussian

$P\left( r \right)\simeq G\left( r \right)=\int_{-\infty }^{r}{\rm d}r^{\prime} \;\rho \left( r^{\prime} \right), \qquad (11)$

where the probability density $\rho \left( r \right)$ is a Gaussian function $\frac{1}{\sqrt{2\pi }\Delta r}{{{\rm e}}^{-\frac{{{\left( r-{{r}_{c}} \right)}^{2}}}{2\Delta {{r}^{2}}}}}$. Here, ${{r}_{c}}$ is the average iteration number and $\Delta r$ is the standard deviation over the simulations, which characterize how many iterations are sufficient for a statistical accuracy of ${{\bar{\xi }}_{{\rm best}}}\geqslant 0.99$. Note that we have finite values of ${{r}_{c}}$ and $\Delta r$ for all n. The probability density $\rho \left( r \right)$ is drawn in figure 3(c), resulting from P(r).
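For completeness, a brief sketch of how such a fit can be carried out, assuming the empirical P(r) values are available as an array; the closed form below, the Gaussian cumulative distribution written with the error function, is mathematically the same as equation (11), and the data generated here are synthetic placeholders rather than the paper's results.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erf

def integrated_gaussian(r, r_c, dr):
    """G(r): cumulative distribution of a Gaussian rho with mean r_c and width dr."""
    return 0.5 * (1.0 + erf((r - r_c) / (np.sqrt(2.0) * dr)))

# r: iteration numbers; P_r: empirical learning probabilities.
# Synthetic placeholder data are used here, not the paper's results.
r = np.arange(1, 401)
P_r = integrated_gaussian(r, 120.0, 30.0) + 0.01 * np.random.randn(r.size)

(r_c_fit, dr_fit), _ = curve_fit(integrated_gaussian, r, P_r, p0=[100.0, 20.0])
print(r_c_fit, dr_fit)   # estimates of r_c and Delta r
```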

We also investigate the learning time. As we already pointed out, the learning time becomes an intriguing issue which may be related not only to the applicability of our algorithm to large-scale problems but also to the learning speedup. Regarding ${{r}_{c}}$ as a learning time, we present a graph of ${{r}_{c}}$ versus $\sqrt{D}$ in figure 3(d). Remarkably, the data are well-fitted linearly with ${{r}_{c}}=A\sqrt{D}+B$ with $A\simeq 43$ and $B\simeq -57$. This means that the learning time is proportional to the square root of the size of the parameter space (see footnote 9). This contrasts with the typical exponential tendency for classical machine learning (see, for example, [34, 35] and the references therein).

7. Summary and remarks

We have presented a method for quantum algorithm design based on machine learning. The simulator that we have used is a quantum–classical hybrid, where the quantum student is taught by a classical teacher. We discussed why such a hybridization is beneficial in terms of usefulness and implementation cost. Our simulator is applicable to designing oracle-based quantum algorithms. We demonstrated, as a case study, that our simulator can faithfully learn a single-query quantum algorithm that solves the DJ problem, even though it is not constrained to do so. The algorithms found are equivalent, but not exactly equal, to the original DJ algorithm, with fitness $\simeq 1$.

We also investigated the learning time, as this would become increasingly important in application, not only due to the large-scale problems often arising in machine learning but also because, in its learning, our simulator potentially exhibits the quantum speedup, if any, of an algorithm to be found. In the investigation, we observed that the learning time is proportional to the square root of the size of the parameter space, instead of showing the exponential dependence of classical machine learning. This result is very suggestive. We expect our simulator to reflect the quantum speedup of the algorithm found in its learning, possibly in synergy with the finding from [14] that for quantum algorithms, the size of the parameter space can be significantly smaller than that for their classical counterparts: not only does their learning time scale more favorably with the size of the space, but also this size is smaller to begin with.

We hope that the proposed method will help in designing quantum algorithms, and provide insight into learning speedup by establishing a link between the learning time and the quantum speedup of the algorithms found. However, it remains an open question whether one would observe further improvements in quantum algorithm design when employing quantum feedback rather than classical feedback.

Acknowledgments

JB would like to thank M Żukowski, H J Briegel, and B C Sanders for discussions and comments. We acknowledge the financial support of National Research Foundation of Korea (NRF) grants funded by the Korea government (MEST; No. 2010-0018295 and No. 2010-0015059). JR and MP were supported by the Foundation for Polish Science TEAM project cofinanced by the EU European Regional Development Fund. JR was also supported by NCBiR-CHIST-ERA Project QUASAR. MP was also supported by the UK EPSRC and ERC grant QOLAPS.

Appendix A.: Quantum oracle operation

As described in the main text, one could consider two different issues in designing a certain type of quantum algorithm. The first is determining a specific form of quantum oracle operation, and the second is finding the other operations to incorporate so as to maximize the quantum advantages. Although we focused on the latter in the current work, it is also necessary, in practical terms, to ask what kind of quantum oracle is best suited to our learning simulator in figure 2.

Dealing with the quantum oracle is a twofold task: defining the appropriate query function Q and encoding its output q for the oracle operation. The query function Q maps available inputs ${{x}_{i}}$ of the problem to certain accessible values ${{q}_{{{x}_{i}}}}$, $Q:{{x}_{i}}\mapsto {{q}_{{{x}_{i}}}}$ ($i=1,2,\ldots ,K$). Here we clarify that Q is evaluated classically, and independently of the construction of the oracle operation. The finite input set $\left\{ {{x}_{i}} \right\}$ ($i=1,2,\ldots ,K$) and the query function Q are determined prior to learning, as mentioned in section 2.

Let us now consider a general process for oracle operation, such that

Equation (A.1)

where $\left| j \right\rangle $ is a computational basis, and $\left| {{x}_{i}} \right\rangle $ is a quantum state of an input ${{x}_{i}}$. Here, ${{\varphi }_{{{x}_{i}}}}$ and ${{g}_{{{x}_{i}}}}$ are controllable parameters depending on ${{x}_{i}}$. We then determine a specific form of oracle operation by choosing either $\left( {{\varphi }_{{{x}_{i}}}}=0 \right)\wedge \left( {{g}_{{{x}_{i}}}}={{q}_{{{x}_{i}}}} \right)$ or $\left( {{\varphi }_{{{x}_{i}}}}={{q}_{{{x}_{i}}}} \right)\wedge \left( {{g}_{{{x}_{i}}}}=0 \right)$. These two types of oracle are equivalent, in the sense that they are independent of the query function Q, and can be converted into each other without any alteration of the complexity of the algorithm found [36]. In this work, we considered the latter type of oracle operation, as it is more economical in the sense that the query function is encoded into the phase without any additional system.
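To make the distinction between the two encodings concrete, the following NumPy sketch constructs the standard bit-flip (register) oracle and the phase oracle for a Boolean query value $q\left( j \right)\in \left\{ 0,1 \right\}$; this is a textbook construction offered only as an illustration, not a reproduction of equation (A.1).

```python
import numpy as np

def bit_oracle(q, n):
    """Type (i): |j>|y> -> |j>|y XOR q(j)>, using one ancillary qubit."""
    d = 2 ** n
    U = np.zeros((2 * d, 2 * d))
    for j in range(d):
        for y in (0, 1):
            U[2 * j + (y ^ q(j)), 2 * j + y] = 1.0   # permutation matrix entry
    return U

def phase_oracle(q, n):
    """Type (ii): |j> -> (-1)^{q(j)} |j>, with no ancillary system."""
    return np.diag([(-1.0) ** q(j) for j in range(2 ** n)])

q = lambda j: j % 2            # an illustrative Boolean query function
print(phase_oracle(q, 2))      # 4 x 4 diagonal matrix
print(bit_oracle(q, 2))        # 8 x 8 permutation matrix
```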

Appendix B.: The variants of the original one-bit Deutsch–Jozsa algorithm

In this appendix, we discuss the original Deutsch–Jozsa algorithm and its variants for the simple case n = 1 [37]. In such a case, the learning part of our simulator consists of two single-qubit unitary operations ${{\hat{U}}_{k}}$ (k = 1,3) and one oracle operation ${{\hat{U}}_{x}}$, as in equation (4). Here it is convenient to rewrite any single-qubit unitary operation ${{\hat{U}}_{k}}$ as

${{\hat{U}}_{k}}\left( {{{\bf p}}_{k}} \right)=\exp \left( -{\rm i}\,{{{\bf p}}_{k}}\cdot {\boldsymbol{ \sigma }} \right)=\cos {{\Theta }_{k}}\,\hat{\mathbb{1}}-{\rm i}\,\sin {{\Theta }_{k}}\left( {{{\bf n}}_{k}}\cdot {\boldsymbol{ \sigma }} \right), \qquad ({\rm B}.1)$

where ${{{\bf p}}_{k}}={{\left( {{p}_{k,x}},{{p}_{k,y}},{{p}_{k,z}} \right)}^{{\rm T}}}$ is a three-dimensional real vector, and ${\boldsymbol{ \sigma }} ={{\left( {{{\hat{\sigma }}}_{x}},{{{\hat{\sigma }}}_{y}},{{{\hat{\sigma }}}_{z}} \right)}^{{\rm T}}}$ is nothing but the vector of Pauli operators. Here, ${{\Theta }_{k}}$ is given as the Euclidean vector norm of ${{{\bf p}}_{k}}$, i.e., ${{\Theta }_{k}}=\parallel {{{\bf p}}_{k}}\parallel ={{\left( {\bf p}_{k}^{{\rm T}}{{{\bf p}}_{k}} \right)}^{\frac{1}{2}}}$, and ${{{\bf n}}_{k}}=\frac{{{{\bf p}}_{k}}}{\parallel {{{\bf p}}_{k}}\parallel }$ is a normalized vector. All pure states are characterized as points on the surface of a unit sphere, called the 'Bloch sphere', and ${{\hat{U}}_{k}}$ rotates a pure state (i.e., a point on the Bloch sphere) by the angle $2{{\Theta }_{k}}$ around the axis ${{{\bf n}}_{k}}$. Such a geometric description is convenient for describing the unitary processes.
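As a quick numerical check of this geometric picture, the sketch below uses the rotation form of (B.1) with the sign convention $\hat{U}\left( {\bf p} \right)=\exp \left( -{\rm i}\,{\bf p}\cdot {\boldsymbol{ \sigma }} \right)$ (an assumption; the opposite sign only reverses the sense of rotation) and verifies that a π-rotation about ${\bf n}={{(1/\sqrt{2},0,1/\sqrt{2})}^{{\rm T}}}$ reproduces the Hadamard gate up to a global phase.

```python
import numpy as np
from scipy.linalg import expm

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def U_of(p):
    """U(p) = exp(-i p . sigma): rotation by angle 2||p|| about the axis p/||p||."""
    return expm(-1j * (p[0] * SX + p[1] * SY + p[2] * SZ))

def bloch(psi):
    """Bloch vector of a pure single-qubit state."""
    return np.real([psi.conj() @ s @ psi for s in (SX, SY, SZ)])

# A pi-rotation (Theta = pi/2) about n = (1, 0, 1)/sqrt(2) equals the Hadamard gate
# up to the global phase exp(-i pi/2) = -i.
n_vec = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)
U = U_of((np.pi / 2.0) * n_vec)
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
print(np.allclose(U, -1j * H))          # True

psi0 = np.array([1.0, 0.0], dtype=complex)
print(bloch(U @ psi0))                  # ~(1, 0, 0): |0> mapped to the equator
```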

We now turn to the one-bit DJ algorithm ${{\hat{U}}_{1}}$${{\hat{U}}_{x}}$${{\hat{U}}_{3}}$ which consists of three operation steps. Firstly, the unitary ${{\hat{U}}_{1}}$ rotates the initial state $\left| 0 \right\rangle $ to a state on the equator of the Bloch sphere, i.e., $\frac{1}{\sqrt{2}}\left( \left| 0 \right\rangle +{{{\rm e}}^{{\rm i}\phi }}\left| 1 \right\rangle \right)$, where ϕ is an arbitrary phase factor. The oracle ${{\hat{U}}_{x}}$ then flips the state to the antipodal side if ${{x}_{i}}$ is balanced, and leaves it unchanged if ${{x}_{i}}$ is constant. The last unitary ${{\hat{U}}_{3}}$ transforms the incoming state to the corresponding output,

$\left| {{\Psi }_{{\rm out}}}\left( {{x}_{i}} \right) \right\rangle =\left\{ \begin{array}{ll} \left| 0 \right\rangle & {\rm if}\ {{x}_{i}}\in C, \\ \left| 1 \right\rangle & {\rm if}\ {{x}_{i}}\in B, \end{array} \right. \qquad ({\rm B}.2)$

up to an irrelevant global phase.

Noting that the Hadamard operation $\hat{H}$ is a π-rotation about the axis ${\bf n}={{(1/\sqrt{2},0,1/\sqrt{2})}^{{\rm T}}}$, it is easily checked that the phase ϕ is zero in the original DJ algorithm. On the basis of such a description, we can infer that there are numerous sets $\left\{ \left( {{\Theta }_{k}},{{{\bf n}}_{k}} \right) \right\}$ (k = 1,3) that lead the initial state $\left| 0 \right\rangle $ to the desired output $\left| {{\Psi }_{{\rm out}}}\left( {{x}_{i}} \right) \right\rangle $ as in equation (B.2). Thus, many variants of the original DJ algorithm exist. As an example, we give ${{\hat{U}}_{1}}$ and ${{\hat{U}}_{3}}$ found by our simulator as follows:

Equation (B.3)

with

Equation (B.4)

The algorithm constructed with the above ${{\hat{U}}_{1}}$ and ${{\hat{U}}_{3}}$ runs as

Equation (B.5)

This algorithm is not exactly equal to, but equivalent to, the original one-bit DJ algorithm.

Footnotes

5. Actually, in algorithm design [19] or logic-mechanism programming [20], the important point is usually how we utilize a given oracle (or a corresponding operation for judging the positive or negative state) with other incorporated logics in order to achieve a speedup of the designed algorithm, rather than how we construct or optimize the oracle itself.

6. Here, ${{x}_{i}}$ $(i=1,2,\ldots ,K)$ can be encoded either on the state $\left| {{\Psi }_{{\rm in}}} \right\rangle $ in P or on the control parameters of U. In most cases, encoding on U is appropriate and this is the choice made for our work, as shown later.

7. The procedure is not the most general one. For full generality one would also need to add some quantum memory but, to our knowledge, no existing quantum algorithm actually uses this yet.

8. For a large-size classical learning system, a huge number ${{N}_{{\rm pop}}}$ of candidate solutions is usually needed. For example, it is appropriate to choose ${{N}_{{\rm pop}}}\simeq 5D$–$10D$ (see [32]).

9. It is worth noting that there is an alternative method, called semidefinite programming, which may be used for the purpose of finding a quantum algorithm. In [19], the authors considered the problem of finding optimal unitaries given a fixed number of queries. Their algorithm was able to solve the problem in polynomial time (i.e. polynomial in the dimension d).
