Paper (Open access)

Quantum implementation of an artificial feed-forward neural network


Published 13 October 2020 © 2020 The Author(s). Published by IOP Publishing Ltd
Citation: Francesco Tacchino et al 2020 Quantum Sci. Technol. 5 044010. DOI: 10.1088/2058-9565/abb8e4


Abstract

Artificial intelligence algorithms largely build on multi-layered neural networks. Coping with their increasing complexity and memory requirements calls for a paradigmatic change in the way these powerful algorithms are run. Quantum computing promises to solve certain tasks much more efficiently than any classical computing machine, and actual quantum processors are now becoming available through cloud access to perform experiments and testing also outside of research labs. Here we show in practice an experimental realization of an artificial feed-forward neural network implemented on a state-of-the-art superconducting quantum processor using up to 7 active qubits. The network is made of quantum artificial neurons, which individually display a potential advantage in storage capacity with respect to their classical counterparts, and it is able to carry out an elementary classification task which would be impossible to achieve with a single node. We demonstrate that this network can be equivalently operated either via classical control or in a completely coherent fashion, thus opening the way to hybrid as well as fully quantum solutions for artificial intelligence to be run on near-term intermediate-scale quantum hardware.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

The field of artificial intelligence was revolutionized by moving from the simple, single-layer perceptron design [1] to that of a complete feed-forward neural network (ffNN), constituted by several neurons organized in multiple successive layers [2, 3]. In such artificial neural network designs each constituent neuron receives, as inputs, the outputs (activations) from the neurons in the preceding layer. The advantage of ffNNs with respect to simpler designs such as single-layer perceptrons or support vector machines is that they can be used to classify data with relations that cannot be reduced to a separating hyperplane [4]. The present ubiquitous use of artificial intelligence in a wide variety of tasks, ranging from pattern or spoken language recognition to the analysis of large data sets, is mostly due to the discovery that such feed-forward networks can be trained by using well-established optimization algorithms [2–4].

Quantum computers hold promise to achieve some form of computing advantage over classical counterparts in the not-so-far future [5]. Indeed, quantum computing has been theoretically shown to offer potentially exponential speedups over traditional computing machines, especially in tasks such as large number factoring, solving linear systems of equations, and data classification [6–10]. More recently, quantum computers have been applied to the fields of artificial intelligence and biomimetics [11–15], and implementations of artificial neurons [16–19] and support vector machines [20, 21] on real quantum processors, even if limited to simple systems at present, have shown a promising route toward a practical realization of such advantage.

In order to harness the full potential that quantum computing may offer to the field of artificial intelligence it is necessary to make the transition from single-layer to deep ffNNs [22–24], which has so greatly expanded the capabilities of artificially intelligent systems to date. Here we propose the architecture of a quantum ffNN and we test it on a state-of-the-art 20-qubit IBM Quantum processor. We start from a hybrid approach combining quantum nodes with classical information feed-forward, obtained via classical control of unitary transformations on qubits. This design realizes a fully general implementation of a ffNN on a quantum processor assisted by classical registers. A minimal three-node example, specifically designed to carry out a pattern recognition task exceeding the capabilities of a single artificial neuron, is used for a proof-of-principle demonstration on real quantum hardware. We then describe and successfully implement on a seven-qubit register an equivalent fully quantum coherent configuration of the same set-up, which does not involve classical control of the feed-forward links and thus potentially opens the way to the exploration of more complex and classically inaccessible regimes.

The proposed quantum implementation of ffNN can be viewed as a first step toward a viable integration of near-term quantum technologies [25] into machine learning applications: in particular, the hybrid nature of the ffNN itself suggests a seamless combination with existing classical structures and algorithms for neural network computation [26].

2. Design of the hybrid feed-forward neural network

In this section, we outline the general structure of our proposed hybrid ffNN, including a short description of the working principles of single nodes and a more detailed discussion of layer-to-layer connections. While, for the sake of clarity, we will often refer to a specific minimal example with three nodes and two layers, the overall scheme can be generalized to arbitrary feed-forward networks.

2.1. Individual nodes

A ffNN is essentially composed of a set of individual nodes {ni }, or artificial neurons, arranged in a sequence of successive layers {Lj }. Information flows through the network in a well defined direction from the input to the output layer, traveling through neuron–neuron connections (i.e. artificial synapses). Each node performs an elementary non-linear operation on the incoming data, whose result is then passed on to one or more nodes in the successive layer.

In their simplest form, individual nodes can be designed to analyze binary-valued inputs. The artificial neurons that we consider here are based on the well known perceptron model [1]: such computational units analyze information by combining input ($ \overrightarrow {i}$) and weight ($ \overrightarrow {w}$) vectors, providing an activation response that depends on their scalar product $ \overrightarrow {i}\cdot \overrightarrow {w}$. In our case, input and weight vectors are assumed to be binary-valued m-dimensional arrays [27], i.e.

$$\overrightarrow{i} = \left(i_0, i_1, \dots, i_{m-1}\right),\qquad \overrightarrow{w} = \left(w_0, w_1, \dots, w_{m-1}\right) \tag{1}$$

where ik , wk ∈ {−1, 1} ∀ k  ∈  [0, m − 1]. The activity of a binary artificial neuron can be implemented on a quantum register of $N = \log_2 m$ qubits [19] by considering the quantum states

$$\vert \psi_i\rangle = \frac{1}{\sqrt{m}}\sum_{j=0}^{m-1} i_j \vert j\rangle,\qquad \vert \psi_w\rangle = \frac{1}{\sqrt{m}}\sum_{j=0}^{m-1} w_j \vert j\rangle \tag{2}$$

These encode the corresponding input and weight vectors by effectively exploiting the exponential size of the Hilbert space associated with the quantum register in use. The states of the form presented in equation (2) are real equally-weighted (REW) superpositions of all the computational basis states |j⟩ ∈ {|0...00⟩, |0...01⟩, ..., |1...11⟩}. The quantum procedure carrying out the perceptron-like computation for single artificial neurons can be summarized in three steps [19]. First, assuming that the N-qubit quantum register is initially in the idle configuration $\vert 0\rangle^{\otimes N}$, we prepare the quantum state encoding the input vector with a unitary operation Ui such that $\vert \psi_i\rangle = U_i \vert 0\rangle^{\otimes N}$. We then apply the weight factors of vector $ \overrightarrow {w}$ to the input state by implementing another unitary transformation, Uw , subject to the constraint $U_w\vert \psi_w\rangle = \vert 1\rangle^{\otimes N}$. An optimized yet exact implementation of Ui and Uw exploits the close relationship between REW quantum states and the class of hypergraph states [19, 28], achieving in the worst case an overall computational complexity which is linear in the size of the classical input, i.e. O(m). A detailed description of the construction of the Ui and Uw transformations and the corresponding quantum circuits for arbitrary $ \overrightarrow {i}$ and $ \overrightarrow {w}$ is provided in the supplementary information (https://stacks.iop.org/QST/05/044010/mmedia), for completeness. After the two unitaries have been performed, it is easily seen that the state of the quantum register is

$$\vert \phi_{i,w}\rangle = U_w \vert \psi_i\rangle = \sum_{j=0}^{m-1} c_j \vert j\rangle \tag{3}$$

where ${c}_{m-1}=\langle {\psi }_{w}\vert {\psi }_{i}\rangle =\left(1/m\right) \overrightarrow {i}\cdot \overrightarrow {w}$. Finally, the non-linear activation of the single artificial neuron can be implemented by performing a multi-controlled NOT gate [6] between the encoding register and an ancilla initialized in the state |0⟩a

$$\vert \phi_{i,w}\rangle\vert 0\rangle_a \mapsto \sum_{j=0}^{m-2} c_j \vert j\rangle\vert 0\rangle_a + c_{m-1}\vert m-1\rangle\vert 1\rangle_a \tag{4}$$

followed by a final measurement of the ancilla in the computational basis. Hence, the output of the quantum artificial neuron is found in the active state |1⟩a with probability p(1) = |cm−1|2.
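The single-node procedure above can be sketched numerically. The ideal activation probability reduces to the squared normalized scalar product, so a plain NumPy check (bypassing the explicit circuit for the multi-controlled NOT) reproduces p(1) = |cm−1|2; the second helper is a sketch of the hypergraph-state idea behind Ui and Uw, under the assumption that a C^kZ gate on a qubit subset flips the sign of every basis state whose support contains that subset:

```python
import numpy as np

def neuron_activation(i_vec, w_vec):
    """Ideal activation of a single quantum neuron: p(1) = |c_{m-1}|^2 with
    c_{m-1} = <psi_w|psi_i> = (1/m) i.w  (equations (2)-(4))."""
    m = len(i_vec)
    return float(np.dot(i_vec, w_vec) / m) ** 2

def hypergraph_sign_gates(v):
    """Sketch of the hypergraph-state construction behind U_i (and U_w):
    find multi-controlled-Z gates imprinting the sign pattern v (v_j = +-1)
    on the uniform superposition H^N |0...0>. A C^kZ on the qubit subset S
    (encoded as a bit mask) flips the sign of every |j> whose support
    contains S; choosing gates by increasing subset size fixes each sign
    exactly once. Returns the list of subsets (bit masks) to act on."""
    v = np.asarray(v) * v[0]          # absorb the (irrelevant) global phase
    s = np.ones(len(v), dtype=int)    # current signs of H^N |0...0>
    gates = []
    for j in sorted(range(1, len(v)), key=lambda j: bin(j).count("1")):
        if s[j] != v[j]:
            gates.append(j)
            for l in range(len(v)):
                if l & j == j:        # l's support contains j's support
                    s[l] *= -1
    return gates

# The neuron fires with certainty when input and weight patterns coincide:
print(neuron_activation([1, 1, -1, -1], [1, 1, -1, -1]))   # 1.0
# A single Z on one qubit suffices for this 2-qubit sign pattern:
print(hypergraph_sign_gates([1, 1, -1, -1]))               # [2]
```

This mirrors the O(m) worst case quoted above: at most one multi-controlled Z per sign to fix.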

2.2. Information feed-forward

When several copies of the quantum register implementing the artificial neuron model outlined above work in parallel, the respective ancillae, and the results of the measurements performed on them, can be used to feed forward the information about the input-weight processing to a successive layer. Indeed, let us suppose that a layer Lj contains ${\ell }_{j}$ independent nodes, ${\left\{{n}_{kj}\right\}}_{k=1}^{{\ell }_{j}}$, each of them characterized by a weight vector ${ \overrightarrow {w}}_{kj}$: in one cycle of operation, every node is provided with a classical input ${ \overrightarrow {i}}_{kj}$ (either coming from layer Lj−1 or directly from the original data set to be analyzed) and, upon measurement, it outputs an activation state akj ∈ {1, 0}, chosen according to a probability ${p}_{kj}\left({a}_{kj}=1\right)\propto \vert { \overrightarrow {i}}_{kj}\cdot { \overrightarrow {w}}_{kj}{\vert }^{2}$. Assuming for simplicity that the hth neuron nh(j+1) belonging to the Lj+1 layer collects the outputs of all {nkj } nodes, the corresponding binary classical input can be constructed as

$$\left(\overrightarrow{i}_{h(j+1)}\right)_k = (-1)^{a_{kj}},\qquad k = 1,\dots,\ell_j \tag{5}$$

This new input vector can then be used to parametrize the appropriate Ui transformation for the nh(j+1) node. The overall computation is then constructed by iteratively alternating the unitary quantum computation carried out by single layers with non-linear measurement and feed-forward stages. Notice that the design is fully general in terms of the number of nodes in each layer, the number of connections and the size of the various inputs to individual nodes. Moreover, as the information is formally transferred in the form of classical bits, the same input can easily be manipulated, e.g., by making classical copies to be fed to independent nodes sharing similar connections to the previous layer. An abstract representation of the proposed architecture is shown in figure 1.
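One feed-forward cycle between layers can be sketched as follows. The node activation model is the ideal one from section 2.1, and the mapping from stored activation bits to ±1 entries (a = 0 gives +1, a = 1 gives −1) is an assumed convention; the actual convention is fixed by the Ui construction:

```python
import numpy as np

rng = np.random.default_rng(seed=1234)

def activation_probability(i_vec, w_vec):
    """Ideal p(a = 1) = |i.w / m|^2 for a single quantum node."""
    m = len(w_vec)
    return (np.dot(i_vec, w_vec) / m) ** 2

def run_layer(i_vec, weight_vectors):
    """One cycle of a layer: measure every node once (sampling the ancilla
    statistics) and store the resulting classical bits."""
    return [int(rng.random() < activation_probability(i_vec, w))
            for w in weight_vectors]

def next_input(activations):
    """Binary input for the next layer (assumed convention: 0 -> +1, 1 -> -1)."""
    return [(-1) ** a for a in activations]

bits = run_layer([1, 1, -1, -1], [[1, 1, -1, -1], [1, -1, 1, -1]])
print(next_input(bits))
```

Since the information lives in classical bits at this point, copying `bits` for several downstream nodes is trivial, exactly as noted above.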

Figure 1.

Figure 1. Abstract architecture of a hybrid ffNN. Each layer Lj contains an arbitrary number of nodes {nkj }, which can individually be implemented on a quantum hardware. Upon measurement, information about the activation state of a layer is passed to the following one (Lj+1) in the form of classical bits controlling quantum operations. Full connectivity between nodes in successive layers is schematically shown, although sparser networks are also possible in principle. The dashed line represents classical inputs from a generic preceding stage, which can be, e.g., a collection of layers up to Lj−1 or the original input information.


From the technical point of view, a very natural implementation of the hybrid ffNN architecture onto a quantum processor makes use of classically controlled quantum gates. Independent quantum nodes within the same layer can either be implemented in different quantum registers, and thus computed simultaneously, or run on the same set of qubits, after proper re-initialization and by storing all the observed activation states in different positions of a classical memory register. However, it must also be noted that the classical manipulation of information between layers might limit the role and quantitative advantages brought by the use of quantum mechanical operations, reducing the hybrid architecture to a sequence of shallow quantum convolutional filters. It is therefore interesting to consider, as we will do in section 3, a different scenario in which the network is operated in a fully coherent way.

2.3. Example: pattern recognition

The working principles of our proposed hybrid ffNN, including the above technical details, are actually best clarified by describing an explicit example tailored to solve a well-defined elementary classification problem. This will also set the stage for the experimental proof-of-principle demonstration on actual superconducting quantum hardware to be presented in the next section. First, let us recall that binary input and weight vectors can be visually interpreted as images containing black or white square pixels [19]: a natural encoding scheme associates, e.g., a white pixel with an ij (wj ) = −1 entry in the corresponding input (weight) vector, as shown explicitly in figure 2(a) for the hidden (m = 4, i.e. 2 × 2 pixel images) and output (m = 2, i.e. 2 × 1 pixel images) layers of a minimal ffNN. Moreover, we can identify any such binary pattern with a unique integer label by considering the equivalent decimal representation of the binary number ${\mathtt{b}}_{3}{\mathtt{b}}_{2}{\mathtt{b}}_{1}{\mathtt{b}}_{0}$ where ${b}_{k}={\left(-1\right)}^{{\mathtt{b}}_{k}},\enspace {\mathtt{b}}_{k}\in \left\{0,1\right\}$. The task that we set out to solve with our example ffNN is the following: the network should be able to recognize (i.e., give a positive output activation with sufficiently large probability) whether there exist straight lines in 2 × 2 pixel images, regardless of whether the lines are horizontal or vertical. All the other possible input images should be classified as negative. Notice that, as the data vectors encoding horizontal and vertical lines are orthogonal to each other, there is no single hyperplane separating the four positive states from all other possible input images: therefore, the desired classification cannot be carried out by a single node accepting four-bit inputs. This behavior of quantum artificial neurons differs from their usual classical counterparts, which cannot correctly classify sets containing opposite vectors [4].
More explicitly, given an input vector ${ \overrightarrow {v}}_{1}$ and a weight vector $ \overrightarrow {w}$, a single quantum neuron would output a value proportional to $\vert { \overrightarrow {v}}_{1}\cdot \overrightarrow {w}{\vert }^{2}$, i.e. $\cos^2\theta$, where θ is the angle formed by the two vectors. If we take a second input vector ${ \overrightarrow {v}}_{2}\perp { \overrightarrow {v}}_{1}$, the output would be upper bounded by $\sin^2\theta$. As the set of patterns that should yield a positive result includes vectors that are orthogonal (those representing horizontal lines are orthogonal to those representing vertical lines) and vectors that are opposite (for instance, the vector corresponding to a vertical line on the left column of a 2 × 2 pixel image is opposite to the vector corresponding to a vertical line on the right column), it is therefore impossible to find a weight $ \overrightarrow {w}$ capable of yielding an output activation larger than 0.5 for all targets in the configuration space. We hereby show that a simple three-node network can accomplish the desired computation. A scheme of such an elementary ffNN is shown in figure 2(a), where the circles indicate individual artificial neurons, and the vectors ${ \overrightarrow {w}}_{i}$ refer to their respective weights. The network features a single hidden layer and a single binary output neuron. On a conceptual level, the working principle of the network can be interpreted as follows: with the a priori choice of weights represented in figure 2(a), the top quantum neuron of the hidden layer outputs a high activation if the input vector has vertical lines, while the bottom neuron does the same for the case of horizontal lines. The output neuron in the last layer then recognizes whether one of the neurons in the hidden layer has given a positive outcome.
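The ideal behavior of this three-node network can be reproduced with a short calculation. The specific sign patterns chosen below for the hidden weight vectors are an assumption (one choice consistent with horizontal- and vertical-line detectors), the activation model is the ideal p = |i·w/m|2, and the output probability is the convolution p1(1 − p2) + p2(1 − p1) derived later in this section:

```python
import numpy as np

W1 = np.array([1, 1, -1, -1])   # assumed hidden weight: horizontal-line detector
W2 = np.array([1, -1, 1, -1])   # assumed hidden weight: vertical-line detector

def activation(i_vec, w_vec):
    """Ideal quantum-neuron activation p = |i.w / m|^2."""
    m = len(w_vec)
    return (np.dot(i_vec, w_vec) / m) ** 2

def label_to_image(n):
    """Integer label -> +-1 vector, via b_k = (-1)^bit_k of the binary
    representation b3 b2 b1 b0 (encoding used in the text)."""
    return np.array([(-1) ** ((n >> k) & 1) for k in range(4)])

def p_out(i_vec):
    """Ideal network output probability: p1(1 - p2) + p2(1 - p1)."""
    p1, p2 = activation(i_vec, W1), activation(i_vec, W2)
    return p1 * (1 - p2) + p2 * (1 - p1)

for n in range(16):
    p = p_out(label_to_image(n))
    print(f"{n:2d}  p_out = {p:.3f}  {'LINE' if p > 0.5 else ''}")
```

Running the loop singles out exactly the four line patterns (labels 12 and 3 horizontal, 10 and 5 vertical) with p_out = 1, while every other image stays below the 0.5 threshold, matching figure 2(b).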

Figure 2.

Figure 2. Three-node ffNN for pattern recognition. (a) The minimal example of a ffNN that we analyze in this study accepts four classical binary inputs and features one hidden layer containing two artificial neurons plus one output layer made of a single neuron. Next to each neuron, the ideal shape of the weight vectors achieving the desired recognition of horizontal and vertical lines is shown. The corresponding encoding scheme in terms of black and white pixels is also reported for a generic input/weight binary vector $ \overrightarrow {b}=\left({b}_{0},\dots ,{b}_{m-1}\right)$. (b) Ideal results for the classification of 2 × 2 pixel images. Notice that the target patterns, corresponding to integer labels 12 and 3 (horizontal) and 10 and 5 (vertical), all have pout = 1, while all others have pout < 0.5 (threshold shown as a red dashed line).


A possible quantum circuit description of the ffNN introduced above, including the classical feed-forward stage between the hidden and the output layer, is provided in figure 3(a). We assume that each neuron within the hidden layer can accept four-bit inputs, such that each quantum neuron can be represented on a two-qubit encoding register plus an ancilla qubit (i.e., m = 4 and N = 2 in this case). At the same time, the output neuron takes two-dimensional inputs coming from the previous layer and provides the global activation state of the network, thus requiring a single qubit (m = 2, N = 1) to be encoded. Classical bits are also included to store the intermediate and final results.

Figure 3.

Figure 3. Circuit implementation of a ffNN. (a) Hybrid realization of the feed-forward architecture introduced in figure 2 via classical control. (b) Equivalent quantum coherent version using quantum controlled operations.


Let us call n1 and n2 the two hidden nodes, which actually accept the same classical input but process it in two different ways. As described at the beginning of this section, each artificial neuron will independently provide, upon measurement, an activation pattern ak ∈ {0, 1} (for k = 1, 2), which can be stored in a classical bit bk . We denote by pk the probability of actually observing a value ak = 1 from the kth neuron. When such a measurement is performed, we set bk = ak : as a result, the state of the classical two-bit register after the quantum computation in the hidden layer has been completed is one of the following

$$[b_1, b_2] \in \left\{[0,0],\,[0,1],\,[1,0],\,[1,1]\right\} \tag{6}$$

with the probability

$$p_{00} = (1-p_1)(1-p_2),\quad p_{01} = (1-p_1)p_2,\quad p_{10} = p_1(1-p_2),\quad p_{11} = p_1 p_2 \tag{7}$$

respectively. It is easy to see that feed-forwarding the information contained in the classical register to the output neuron n3 corresponds to providing it with one of the classical binary inputs ${ \overrightarrow {i}}_{{b}_{1}{b}_{2}}$ reading

$$\overrightarrow{i}_{b_1 b_2} = \left((-1)^{b_1},\, (-1)^{b_2}\right) \tag{8}$$

As shown in figure 3(a), a straightforward strategy for preparing the corresponding |ψi ⟩ state on the single-qubit register representing n3 is by first bringing it from the idle state |0⟩ to the superposition $\vert +\rangle = \left(\vert 0\rangle + \vert 1\rangle\right)/\sqrt{2}$ via a Hadamard (H) gate, and then conditioning the application of two Z gates (each of them adds a −1 phase to the |1⟩ component, if applied) on the two classical bits [b1, b2]. The resulting quantum state will then be

$$\vert \psi_{i, b_1 b_2}\rangle = \frac{1}{\sqrt{2}}\left(\vert 0\rangle + (-1)^{b_1 \oplus b_2}\vert 1\rangle\right) \tag{9}$$

where ⊕ denotes the usual bit sum modulo 2. If we now choose, as shown in figure 2(a), a weight vector ${ \overrightarrow {w}}_{3}=\left(1,-1\right)$ we obtain (see supplementary information) ${\mathrm{U}}_{{w}_{3}}\equiv \mathrm{H}$. Therefore, the final state of the third neuron reads

$$\mathrm{H}\,\vert \psi_{i, b_1 b_2}\rangle = \vert b_1 \oplus b_2\rangle \tag{10}$$

The overall probability of observing an active state on the output neuron can thus be written, in general, as

$$p_{\mathrm{out}} = \sum_{b_1, b_2 \in \{0,1\}} p\left(a_3 = 1 \,\vert\, [b_1, b_2]\right)\, p_{b_1 b_2} \tag{11}$$

where we employed the usual notation for conditional probabilities and

$$p_{b_1 b_2} = p\left(a_1 = b_1,\, a_2 = b_2\right) \tag{12}$$

In our specific case, it is easy to see that, given equations (7) and (10), this reduces to

$$p_{\mathrm{out}} = p_1\left(1 - p_2\right) + p_2\left(1 - p_1\right) \tag{13}$$

Since in this elementary example n3 is encoded in a single qubit, the final measurement can be performed directly without the need for an additional ancilla. In figure 2(b) we report the exact result for the convolution of equation (13): as can be seen, the ffNN ideally outputs an active state with pout = 1 for the target horizontal and vertical patterns, while pout < 0.5 in all other cases.
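As a cross-check of the derivation above, the classically conditioned single-qubit sequence (H, one Z per set bit, then the weight unitary H) can be simulated with 2 × 2 matrices; the helper name below is illustrative:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate
Z = np.diag([1.0, -1.0])                        # phase-flip gate

def output_neuron(b1, b2):
    """State preparation of n3 conditioned on the classical bits [b1, b2],
    followed by the weight unitary U_w3 = H; returns p(a3 = 1 | [b1, b2])."""
    psi = np.array([1.0, 0.0])        # idle state |0>
    psi = H @ psi                     # (|0> + |1>)/sqrt(2)
    if b1:
        psi = Z @ psi                 # classically controlled Z on bit b1
    if b2:
        psi = Z @ psi                 # classically controlled Z on bit b2
    psi = H @ psi                     # weight unitary U_w3 = H
    return abs(psi[1]) ** 2

for b1 in (0, 1):
    for b2 in (0, 1):
        print(b1, b2, round(output_neuron(b1, b2)))   # equals b1 XOR b2
```

The conditional probability is exactly the bit sum modulo 2, which is what turns the weighted sum of equation (11) into the convolution of equation (13).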

Before moving forward, it is worth mentioning that a classically conditioned Ui can also be constructed in more general cases, e.g. when the hidden layer contains more than two neurons. In particular, any node encoded on N qubits will be able to accept inputs from m = 2N nodes in the previous layer: indeed, each output configuration from the latter will be one of the 2m possible bit strings [b1, ..., bm ] that can be used to uniquely identify one of the ${2}^{m}={2}^{{2}^{N}}$ possible input states, and thus to classically program its preparation.

3. Quantum coherent feed-forward

The hybrid feed-forward architecture described so far, realized in a minimal three-node two-layer example, can also be reformulated in a fully quantum coherent way. As we will show below, and in contrast to the hybrid quantum–classical solution, this version always requires all nodes to be implemented simultaneously on a dedicated quantum register, thus making the quantum computation more demanding. At the same time, however, it reduces the necessity to store and process classical bits during intermediate stages. Moreover, fully coherent quantum neural networks offer more opportunities to be actually used on current quantum processors, as will be discussed in the final section.

In figure 3(b) we show a fully quantum coherent construction for the ffNN of figure 3(a). The fundamental reason for this equivalence lies in the well known principle of deferred measurement [6], stating that in a quantum circuit one can always move a measurement performed at an intermediate stage to the end of the computation, provided that all classically controlled operations are replaced with their quantum controlled counterparts.

Indeed, assuming that the nodes n1 and n2 are encoded in parallel, and after the operations of the first layer (except for the measurements on the ancillae) have been performed, we can write the global state of the total (3 + 3 + 1)-qubit network as

$$\vert \Psi\rangle = \bigotimes_{x=1,2}\left(c_{n_x}\vert 1\,1\rangle_{n_x}\vert 1\rangle_{a_x} + r_{n_x}\vert \varphi_{n_x}\rangle\vert 0\rangle_{a_x}\right)\otimes \vert 0\rangle_{n_3} \tag{14}$$

where ${r}_{{n}_{x}}={\left(1-{c}_{m-1,{n}_{x}}^{2}\right)}^{1/2}$ and $\vert {\varphi }_{{n}_{x}}\rangle $ contains, for each neuron, all the components other than the one leading to activation, see equation (4). Notice that, by construction, $\langle {\varphi }_{{n}_{x}}\vert 1\dots 1\rangle =0$. In the meantime, the n3 qubit is brought into the superposition $\vert +\rangle = \left(\vert 0\rangle + \vert 1\rangle\right)/\sqrt{2}$ by applying a single-qubit Hadamard gate, H. Synapses can thereafter be implemented with two CZ gates, as represented in figure 3(b). The overall state of the quantum ffNN then becomes

$$\vert \Psi'\rangle = \left(c_{n_1}c_{n_2}\vert A_{n_1}\rangle\vert A_{n_2}\rangle + r_{n_1}r_{n_2}\vert R_{n_1}\rangle\vert R_{n_2}\rangle\right)\vert +\rangle + \left(c_{n_1}r_{n_2}\vert A_{n_1}\rangle\vert R_{n_2}\rangle + r_{n_1}c_{n_2}\vert R_{n_1}\rangle\vert A_{n_2}\rangle\right)\vert -\rangle \tag{15}$$

where ${c}_{{n}_{x}}$ is a short-hand notation for ${c}_{m-1,{n}_{x}}$, and the activated |A⟩ and rest |R⟩ states of n1 and n2 are explicitly given as

$$\vert A_{n_x}\rangle = \vert 1\,1\rangle_{n_x}\vert 1\rangle_{a_x},\qquad \vert R_{n_x}\rangle = \vert \varphi_{n_x}\rangle\vert 0\rangle_{a_x} \tag{16}$$

By applying ${\mathrm{U}}_{{w}_{3}}\equiv \mathrm{H}$ on n3 we explicitly obtain an output state

$$\vert \Psi_{\mathrm{out}}\rangle = \left(c_{n_1}c_{n_2}\vert A_{n_1}\rangle\vert A_{n_2}\rangle + r_{n_1}r_{n_2}\vert R_{n_1}\rangle\vert R_{n_2}\rangle\right)\vert 0\rangle + \left(c_{n_1}r_{n_2}\vert A_{n_1}\rangle\vert R_{n_2}\rangle + r_{n_1}c_{n_2}\vert R_{n_1}\rangle\vert A_{n_2}\rangle\right)\vert 1\rangle \tag{17}$$

It is straightforward to observe at this point that the neurons of the hidden layer can in principle be measured in an activation state [b1, b2] ∈ {[0, 0], [0, 1], [1, 0], [1, 1]} with probabilities

$$p_{00} = r_{n_1}^2 r_{n_2}^2,\quad p_{01} = r_{n_1}^2 c_{n_2}^2,\quad p_{10} = c_{n_1}^2 r_{n_2}^2,\quad p_{11} = c_{n_1}^2 c_{n_2}^2 \tag{18}$$

which exactly correspond to the ones reported in equation (7). However, as long as we are interested only in the output state of the network, i.e. the activation state a3 of n3, there is no need to actually perform the final measurements on n1 and n2: similarly to equation (11), we can in fact simply discard the information contained in the variables pertaining to the hidden layer by performing a partial trace operation. This returns a density matrix for the output neuron

$$\rho_{n_3} = \mathrm{Tr}_{n_1, n_2}\left[\vert \Psi_{\mathrm{out}}\rangle\langle \Psi_{\mathrm{out}}\vert\right] = \left(1 - p_{\mathrm{out}}\right)\vert 0\rangle\langle 0\vert + p_{\mathrm{out}}\vert 1\rangle\langle 1\vert \tag{19}$$

which automatically represents the convolution of the hidden nodes, see equation (13). It is worth noticing that the role of the partial trace operation has recently been recognized and extensively discussed in the literature as a possible ingredient for a more general theory of quantum neural networks [29, 30].
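This chain of steps can be verified in a reduced description in which each hidden node is represented only by its activated/rest amplitudes carried on the corresponding ancilla (a three-qubit sketch of the seven-qubit circuit, not the full register): prepare each ancilla in cx|1⟩ + rx|0⟩, apply the two CZ synapses and the Hadamard weight on n3, then trace out the ancillae.

```python
import numpy as np

def coherent_pout(p1, p2):
    """Coherent ffNN output in the reduced (a1, a2, n3) description:
    ancilla x carries c_x|1> + r_x|0> with c_x = sqrt(p_x), two CZ synapses
    act between each ancilla and n3, then U_w3 = H and a partial trace over
    the hidden register give the diagonal density matrix of n3."""
    c1, r1 = np.sqrt(p1), np.sqrt(1 - p1)
    c2, r2 = np.sqrt(p2), np.sqrt(1 - p2)
    plus = np.array([1.0, 1.0]) / np.sqrt(2)           # n3 after Hadamard
    psi = np.kron(np.kron([r1, c1], [r2, c2]), plus)   # |a1 a2 n3>
    psi = psi.reshape(2, 2, 2)
    psi[1, :, 1] *= -1        # CZ between a1 and n3 (phase -1 on |1>|1>)
    psi[:, 1, 1] *= -1        # CZ between a2 and n3
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    psi = np.einsum('abc,dc->abd', psi, H)             # weight unitary on n3
    probs = (np.abs(psi) ** 2).sum(axis=(0, 1))        # partial trace
    return probs[1]

print(coherent_pout(1.0, 0.0))   # 1.0: exactly one hidden node active
```

The returned probability reproduces p1(1 − p2) + p2(1 − p1), confirming that deferring the measurements and tracing out the hidden register yields the same convolution as the hybrid network.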

To conclude this section, we also point out that the conversion between the two modes of operation (hybrid vs coherent) of our proposed ffNN architecture goes beyond the specific example presented in this work. Indeed, as mentioned at the end of section 2.3, any feed-forward link between successive layers can in general be decomposed in terms of classically controlled operations. Whenever such construction is known, measurement deferral and partial traces can in principle always be employed to obtain the equivalent coherent network, namely by replacing all classical controls with their quantum counterparts and by measuring only the output layer.

4. Experimental realization on a superconducting quantum processor

We have implemented the ffNN introduced in figure 2 on a real superconducting quantum processor, made available on the cloud via the IBM Quantum Experience and programmed using the Qiskit python library [31]. Employing the same device, named Poughkeepsie, we realized both the hybrid (figure 3(a)) and the fully coherent (figure 3(b)) configurations, in both cases reporting the successful completion of all the desired classification tasks.

In figures 4(a) and (b) we show the results for the three-qubit implementation of nodes n1 and n2, respectively corresponding to the first and second set of three qubits in figure 3(a), from which the probabilities p1 and p2 can be estimated for all possible input vectors while assuming the weights ${ \overrightarrow {w}}_{1}$ and ${ \overrightarrow {w}}_{2}$ shown in figure 2(a). The comparison with ideal results simulated numerically shows an excellent qualitative agreement and a good quantitative match of the outcomes: in particular, notice that each individual node can successfully single out either vertical or horizontal lines, see patterns in figure 2(b) [19]. The agreement is naturally better for the simulation of all possible n3 circuits, whose results are reported in figure 4(c): indeed, in this case the probability p(a3 = 1|[b1, b2]) can be computed operating on a single qubit. The final outcomes (i.e. pout) for the hybrid configuration of the ffNN, reported in figure 5(a), are then obtained by applying equation (11). The latter is used in place of, e.g., equation (13), in order to avoid introducing unnecessary assumptions or biases in the calculation and to take into account all possible sources of inaccuracy, such as an outcome for p(a3 = 1|[0, 0]) which is not exactly zero.

Figure 4.

Figure 4. Experimental realization of single nodes on quantum hardware. Single artificial neurons of the ffNN introduced in figure 3(a) implemented on the IBM Quantum Poughkeepsie superconducting processor and compared with ideal noiseless outcomes computed numerically with the Qiskit qasm_simulator. (a) Neuron n1, recognizing horizontal inputs. (b) Neuron n2, recognizing vertical inputs. (c) Neuron n3, recognizing two-dimensional inputs with dissimilar entries. Error mitigation is applied to data for n1 and n2 (see supplementary information https://stacks.iop.org/QST/05/044010/mmedia).

Figure 5.

Figure 5. Results for the quantum ffNN classifying horizontal and vertical lines. (a) Classification in the hybrid configuration, applying equation (11) to the (error mitigated) experimental outcomes of figure 4. (b) Classification in the coherent configuration obtained with a seven-qubit calculation on the IBM Quantum Poughkeepsie quantum processor (error mitigation is applied, see supplementary information). Despite some residual quantitative inaccuracy, all the target patterns are correctly recognized if a threshold of 0.5 (shown in red) is applied to the outcome probabilities, both in the hybrid and the coherent versions.


Finally, the experimental results for the fully coherent ffNN configuration are reported in figure 5(b). These were obtained by running the seven-qubit quantum circuit introduced in figure 3(b). As can immediately be appreciated, the outcomes are in good agreement with the corresponding ones in the hybrid version of the ffNN. We stress that such a comparison is made non-trivial from the experimental point of view by the fact that, in the fully coherent version, a register of 7 simultaneously active and typically entangled qubits is required. By contrast, the hybrid solution only requires each individual node to be separately implemented on a three-qubit quantum register and, provided that the classical outcomes are conveniently stored, such quantum computations can be carried out in dedicated runs, thus avoiding, e.g., cross-talk effects. As in the hybrid case, and despite some residual quantitative inaccuracy in the estimation of the activation probabilities, all the possible inputs are classified correctly by the ffNN, with the target horizontal and vertical patterns singled out from all other patterns.

We also mention that raw data from the quantum processor, reported in the supplementary information https://stacks.iop.org/QST/05/044010/mmedia, already allow for an accurate classification in both hybrid and coherent configurations. However, the overall quality of the outcomes greatly benefits from the application of simple error mitigation techniques [3235], as described in detail in the supplementary information.

5. Discussion

In this work we have presented an original architecture to build ffNNs on universal quantum computing hardware and demonstrated their use on near-term quantum devices. In particular, we have shown how successive layers constituted by artificial neurons and implemented on independent quantum registers can be connected to each other either via classical control operations, thus realizing a hybrid quantum–classical ffNN, or by fully coherent quantum synapses. The necessary degree of non-linearity is achieved in one case via explicit quantum measurements, in the other by a partial trace operation that effectively produces a convolution operation. We stress that our proposed procedure is hardware-independent, and it can thus in principle be implemented on any quantum computing machine, e.g. based on superconducting qubits [36], trapped ions [37], or photonic integrated circuits [38, 39]. In particular, it is worth noticing that native multi-qubit operations, such as the typical Mølmer–Sørensen collective entangling gates [40] commonly implemented in trapped-ion quantum hardware, could lead to more efficient exact or even approximate [41] realizations of the multi-controlled operations that are ubiquitous in the proposed scheme.

We have successfully tested a three-node implementation of our algorithm applied to an elementary pattern classification task, both in the hybrid and fully coherent configurations. Such a proof-of-principle demonstration was achieved on the IBM Quantum Poughkeepsie superconducting quantum processor by using up to 7 active qubits, finding substantial experimental agreement between the two proposed operating modes of the network. These results represent, to the best of our knowledge, one of the largest quantum neural network computations reported to date in terms of the total size of the actual quantum register. We also notice that the use of quantum artificial neurons as individual nodes gives the prospective advantage of an exponential gain in storage and processing ability: in turn, this confirms that hybrid quantum–classical neural networks could already be able to treat very large input vectors, beyond the capabilities of current systems. Such ability is becoming increasingly needed to handle, e.g., very large image files, health data for public health, market data for financial applications, and the 'data deluge' expected from the Internet of Things. Moreover, the hybrid structure of our proposed ffNN could actually represent a relevant technical feature in the process of integrating quantum and classical processes for machine learning tasks: indeed, it could be envisioned that a few carefully distributed quantum nodes at the input of an otherwise classical network might act as a memory-efficient convolutional layer enabling the treatment of otherwise unmanageable sets of data.

A very natural extension of this work, and particularly of the fully coherent setup, would be an exploration of classically inaccessible regimes with no hybrid (i.e. classically controlled) counterpart. This could be achieved, e.g., by allowing more complex synapses, thus letting the activation probabilities of all neurons feeding the same successive layer interfere in a truly quantum coherent way, or by engineering non-trivial quantum correlations between quantum nodes already within the same layer. In addition to the large advantage in data treatment capacity, this could then also result in new functionalities, such as the ability to deploy complicated convolution filters impossible to run on classical hardware.

Even further-reaching consequences might be expected from the possibility to directly process quantum data instead of quantum-encoded classical information, for instance to search for patterns in the output of a quantum simulator or to process quantum states coming from a quantum internet appliance. In these cases, the input would directly be given in the form of a wavefunction or a density matrix [30], without the resource cost associated with a classical input [41–43]. Indeed, it must be recalled that, in general, an exact implementation of the single-node operations Ui and Uw can require a computational cost scaling exponentially in the size of the quantum register. In this respect, it is worth mentioning that the class of quantum hypergraph states considered in this work can be seen as a subset of the more general family of locally maximally entanglable states (LMES) [44]. On the one hand, this relationship directly suggests an extension to continuously-valued data [45]. On the other hand, the properties of the LMES could be leveraged in some cases to ease the problem of input preparation: in particular, whenever the required ±1 phase factors can be described as a degree-k polynomial function of the computational basis states, the corresponding |ψi⟩ can be constructed using only k-qubit operations. Moreover, the LMES can be physically obtained as unique fixed points of engineered dissipative processes or as ground states of frustration-free Hamiltonians [44].
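The k-qubit construction can be made concrete with a small statevector simulation (all names below are ours, for illustration): each degree-k monomial of the phase polynomial f(x) contributes one C^{k−1}Z gate, acting on k qubits, applied to the uniform superposition |+⟩^⊗N.

```python
import numpy as np

def apply_ckz(state, qubits, n):
    """Apply a C^{k-1}Z gate on the k given qubits: flip the sign of
    every computational basis state in which all of them are 1.
    A 1-tuple gives a plain Z, a 2-tuple a CZ, and so on."""
    out = state.copy()
    for basis in range(2 ** n):
        if all((basis >> q) & 1 for q in qubits):
            out[basis] *= -1
    return out

def hypergraph_state(n, monomials):
    """Prepare the n-qubit hypergraph state with phases (-1)**f(x),
    where f is a sum of monomials (tuples of qubit indices): start from
    the uniform superposition and apply one C^{k-1}Z per degree-k
    monomial, hence only k-qubit operations overall."""
    state = np.full(2 ** n, 1 / np.sqrt(2 ** n))
    for mono in monomials:
        state = apply_ckz(state, mono, n)
    return state
```

For instance, the sign pattern given by f(x) = x0·x1 + x2 on three qubits requires only one CZ and one Z gate, regardless of the 2³ amplitudes involved.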

Along parallel lines, the manipulation stage represented by Uw could most effectively be rephrased in terms of parametrized variational circuits [46]. Preliminary results [47] based on recently proposed quantum unsampling protocols [48] have shown that, leveraging the characteristic tensor product structure of the output state |1⟩^⊗N in an active quantum neuron register under the proposed encoding, an efficient variational implementation of Uw for a single node can be learned using only single-qubit local cost functions. Such an adaptive quantum circuit implementation of Uw could also become an essential ingredient when tackling the problem of training the quantum neural network. While in the practical example shown in this work the weights were selected beforehand, and not discovered through an optimization process, it has already been shown that the nonlinearity intrinsically deriving from the measurement on the ancilla of each artificial neuron is sufficient to guarantee the required plasticity for training [19]. This means that the hybrid architecture for quantum artificial neural networks proposed in this work is fully compatible, in principle, with the most commonly employed classical training algorithms, such as the Newton–Raphson or backpropagation methods [4]. The latter is certainly best applied in the presence of continuously-valued input parameters: as an example, we provide in the supplementary information a numerical description of such a procedure carried out on the proposed three-node ffNN, where non-binary weight values are allowed during the learning phase. However, such a procedure is not easily transferred into the fully coherent regime where, in principle, only the final output activation is observed and the hidden layers are discarded.
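As an illustrative sketch of such a gradient-based training with continuously-valued weights (this is not the procedure of the supplementary information, which is not reproduced here; names and the finite-difference gradient are our simplifying choices), a single neuron whose response is the squared overlap of normalized weight and input vectors can be trained on a squared loss:

```python
import numpy as np

def activation(w, x):
    """Neuron response |<psi_w|psi_x>|^2 for continuous, normalized
    weight and input phase vectors (illustrative model)."""
    wn, xn = w / np.linalg.norm(w), x / np.linalg.norm(x)
    return np.dot(wn, xn) ** 2

def train(w, data, lr=0.1, epochs=500, eps=1e-6):
    """Plain gradient descent on the total squared loss over `data`
    (pairs of input vector and target activation), with the gradient
    estimated by central finite differences."""
    w = w.astype(float).copy()
    for _ in range(epochs):
        grad = np.zeros_like(w)
        for k in range(w.size):
            wp = w.copy(); wp[k] += eps
            wm = w.copy(); wm[k] -= eps
            loss_p = sum((activation(wp, x) - y) ** 2 for x, y in data)
            loss_m = sum((activation(wm, x) - y) ** 2 for x, y in data)
            grad[k] = (loss_p - loss_m) / (2 * eps)
        w -= lr * grad
    return w
```

Trained on two orthogonal patterns with targets 1 and 0, the weight vector aligns with the first pattern and becomes orthogonal to the second, mimicking the plasticity provided by the ancilla measurement.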
At the same time, it is also true that severe limitations in the training of deep quantum neural networks have recently been discovered, mainly in the form of barren plateaus [49–51], which may critically affect the scaling of many quantum ffNN architectures even under dissipative frameworks [52]. While in general the ffNN design proposed here, as well as its possible extensions in the variational and continuous variable regimes, may not be immune from such pathological phenomenology, some of the strategies that have been put forward to deal with this issue [50, 52] could be leveraged also in this case. In particular, layerwise approaches [53] could naturally be applied to the ffNN in the hybrid configuration. More generally, we point out that the modular structure of the proposed architecture, featuring subsets of clearly identifiable and potentially shallow nodes together with ancilla-based funnelling of information, may fit a recently discovered scheme [52] suggesting locality of the action of single perceptrons and of the associated cost function as a promising route for lifting the obstacles commonly associated with vanishing gradients.

In conclusion, we provide a clear-cut recipe to map classical ffNNs onto quantum processors, and our results suggest that the whole design may eventually benefit from paradigmatic quantum properties such as superposition and entanglement. This represents a necessary step toward the final goal of approaching quantum advantage in the operation and training of quantum neural network applications.

Acknowledgments

We thank M Fanizza and S Woerner for useful discussions. We acknowledge the University of Pavia Blue Sky Research project number BSR1732907. This research was also supported by the Italian Ministry of Education, University and Research (MIUR): 'Dipartimenti di Eccellenza Program (2018-2022)', Department of Physics, University of Pavia and PRIN Project INPhoPOL. IBM, the IBM logo, and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. The current list of IBM trademarks is available at https://www.ibm.com/legal/copytrade.
