Quantum Fourier networks for solving parametric PDEs

Nishant Jain; Jonas Landman; Natansh Mathur; Iordanis Kerenidis

doi:10.1088/2058-9565/ad42ce

1. Introduction

1.1. Fourier neural network

Solving partial differential equations (PDEs) has been a crucial step in understanding the dynamics of nature. They have been widely used to understand natural phenomena such as heat transfer, modelling the flow of fluids, electromagnetism, etc.

Each PDE is an equation, along with some initial conditions, for which the solution is a function f of space and time (x, t), for instance. A PDE family is determined by the equation itself, such as Burgers' equation or Navier–Stokes equation. An instance of a given PDE family is the aforementioned equation along with a specific initial condition, represented, for instance, as $f(x,t_0)$ . Modifying this initial condition leads to a new PDE instance and, therefore, to a new solution $f(x,t)$ . Note also that the solution is highly dependent on some physical parameters (i.e. viscosity in fluid dynamics).

In practical scenarios, a closed-form solution for most PDE families' instances is difficult to find. Therefore, classical solvers often rely on discretising the input space and performing many approximations to model the solution. A large number of computations for each PDE instance are required, depending on the chosen resolution of the input space.

Recently, considerable effort in research for approximating a PDE's solution is based on neural networks. The main idea is to let a neural network become the solution of the PDE by training it either for a fixed PDE instance or with various instances of a PDE family. The network is trained in a supervised way by trying to match the same solutions as the ones computed with classical solvers. The first attempts [1, 2] were aimed at finding the PDE's solution $f(x,t)$ for an input (x, t) given a specific initial condition (one PDE instance), and later [3–5] to a specific discretisation resolution for all instances of a PDE family. For the first case, once trained, the neural network can output solution function values at any resolution for the instance it was trained on. However, it has to be optimized for each instance (new initial condition) separately. In the latter case, the neural network can predict solution function values for any instance of the PDE family but for a fixed resolution on which it was trained.

A recent proposal named Fourier neural network [6] overcame these limitations and posed the problem as learning a function-to-function mapping for parametric PDEs. Parametric PDEs are families of PDEs for which the initial condition can be seen as parametric functions. Given any initial condition function of one such PDE family sampled at any resolution, the neural network can predict the solution function values at the sampled locations.

The input is usually the initial condition $f(x,t_0)$ itself. It is encoded as a vector of a certain length $N_\mathrm{s}$ by sampling it uniformly at $N_\mathrm{s}$ locations x of the input space, given some resolution. This input is also called an evaluation of the initial condition function. The number of samples $N_\mathrm{s}$ is key in analysing the computational complexity as it is the neural network input size. Note that sometimes the initial condition is also sampled for several times t as well. The output of the neural network is the corresponding PDE's solution $f(x,t)$ applied at all x sampled and for a fixed t. Experiments on widely popular PDEs showed that it was effective in learning the mapping from a parametric initial condition function to the solution operator for a family of PDEs.

The method proposes a Fourier Layer (FL) repeated several times. It consists of a Fourier Transform (FT), then a multiplication with a trainable matrix (also called a linear transform, and an Inverse Fourier Transform (IFT) operation, and ends with a standard non-linearity. This is similar to the convolution operation as it also translates to multiplication in the Fourier domain. However, a key feature of the FL is the fact that one can keep only some part of the data before the IFT, corresponding to the lowest frequency in the Fourier domain, reducing the amount of information and computational resources.

The major bottleneck which might hinder the scalability of this classical Fourier neural operator (FNO) is its time complexity, limited by the classical FT and IFT operations inside the FL. Indeed, their time complexity is $O(N_\mathrm{s} \log N_\mathrm{s})$ with a classical computer (single-threaded), where $N_\mathrm{s}$ is the input size (number of samples). For a multi-threaded classical scenario (say p threads), the best possible theoritical time-complexity of the FT will be approximately $O((N_\mathrm{s} \log N_\mathrm{s})/p)$ (for the case when work is almost equally distributed). In most real-world use cases, $N_\mathrm{s}$ is expected to be quite high for learning precisely the solution of a PDE family. To be more precise, we will see that each input, a vector of size $N_\mathrm{s}$ , is first modified and reshaped to become a matrix of size $N_\mathrm{s} \times N_\mathrm{c}$ (see figure 1). $N_\mathrm{c}$ is named channel dimension, and usually $N_\mathrm{s} \gg N_\mathrm{c}$ . This matrix will be the actual input of the FL. Thus, even for a practical multi-threaded scenario where limited cores are availble leading to $p\lt\lt N_\mathrm{s}$ , the complexity of FFT can be at-best sublinear in $N_\mathrm{s}$ (depending on $p\gt\log N_s$ ).

**Figure 1.** Overview of the Fourier neural network. Each initial condition $f(x,0)$ is sampled $N_\mathrm{s}$ times and modified via a trainable matrix P to become a matrix of size $N_\mathrm{s} \times N_\mathrm{c}$ . Then, T Fourier Layers (green block) are applied sequentially. In this paper, we are designing quantum circuits to implement the Fourier Layers. Each Fourier Layer consists of a row-wise Fourier Transform (FT), followed by a column-wise multiplication with trainable matrices labelled W, and the row-wise inverse Fourier transform (IFT). We only apply the inner matrix multiplications to the first K columns (also called *modes*) and leave the others unchanged or replaced by 0 s before the IFT. Finally, a reverse operation with a trainable matrix Q is performed to obtain the output $f(x,t)$ as a discretized vector. The trainable parts are updated with Gradient Descent until the outputs correspond to the actual solutions of the PDE.
Download figure:
Standard image High-resolution image

**Figure 1.** Overview of the Fourier neural network. Each initial condition $f(x,0)$ is sampled $N_\mathrm{s}$ times and modified via a trainable matrix P to become a matrix of size $N_\mathrm{s} \times N_\mathrm{c}$ . Then, T Fourier Layers (green block) are applied sequentially. In this paper, we are designing quantum circuits to implement the Fourier Layers. Each Fourier Layer consists of a row-wise Fourier Transform (FT), followed by a column-wise multiplication with trainable matrices labelled W, and the row-wise inverse Fourier transform (IFT). We only apply the inner matrix multiplications to the first K columns (also called *modes*) and leave the others unchanged or replaced by 0 s before the IFT. Finally, a reverse operation with a trainable matrix Q is performed to obtain the output $f(x,t)$ as a discretized vector. The trainable parts are updated with Gradient Descent until the outputs correspond to the actual solutions of the PDE.
Download figure:
Standard image High-resolution image

1.2. Quantum algorithmic proposals

Quantum computing has gained popularity due to its potential for faster performance than its classical counterpart. Among the most famous quantum algorithms is the exponentially faster quantum fourier transform (QFT), although probably only attainable with long-term quantum computers. Similarly, other long-term quantum algorithms for machine learning have been proposed [7–9].

More recently, several developments in learning techniques based on near-term quantum computing were proposed. The initial demonstrations of these algorithms involved experiments on small-scale quantum hardware [10–13], which established their effectiveness in extracting patterns. Following this, many works [14, 15] proposed small-scale implementations of fully connected quantum neural networks on near-term hardware. Other proposals [16] for deploying convolution-based learning methods on quantum devices showed effective training in practice. Furthermore, [17] proposed quantum-hardware implementation for generative adversarial networks. A different approach, where the inputs are encoded as unary states, using the two-qubit quantum gate reconfigurable beam splitter (RBS) was proposed in a recent work [18]. This encoding gave rise to the use of orthogonal properties of pure quantum unitaries, as proposed in [19, 20] for training, for instance, orthogonal feed-forward networks to damp the gradient-based issues while learning. It used a pyramid-shaped (or other architectures) circuit based on parameterised RBS gates to implement a learnable orthogonal matrix as compared to the existing classical approaches, which offer approximate orthogonality at the cost of increased training time. This orthogonality in neural networks results in much smoother convergence and also lesser parameters, as shown by [21] for feed-forward neural networks and [22] for convolutional nets. The effectiveness of these orthogonal quantum networks was further shown in another work on medical image classification [19] problem.

In this work, we develop quantum algorithms to implement the FNO. In particular, we propose three circuits equivalent to or close to the FL (See the green box in figure 1). The remaining parts of the FNO can be adapted using existing techniques [19].

At the core of our circuit, we develop a new QFT suited for near-term hardware and specific quantum data encoding, and an implementation for Controlled Parameterized Circuits. Termed as unary-QFT, which transforms only the unary states into the Fourier domain. Under the assumption that a quantum hardware with appropriate connectivity is available, allowing parallel inference of fixed quantum gates to certain disjoint qubit pairs, it provides with an exponential speedup compared to the classical operation (assuming a practical classical scenario where $p\lt\lt N_\mathrm{s}$ ). This is a widely plausible scenario in the near future since even current quantum hardwares like ones based on cold atoms do allow parallel operations on qubits (1,2), (3,4), $\ldots$ , (n-1,n). We have built it upon the recently developed idea [18, 19] of encoding inputs to unary quantum states and applying orthogonal transformations only on these unary-basis states via learnable quantum circuits. Using this, we adapt the classical Fourier neural network [6] and propose several quantum algorithms to learn the functional mapping from the initial condition function of a PDE instance to the corresponding solution function.

The circuit proposed in this work is inspired by the widely popular butterfly diagram used for the fast Fourier transform (FFT) [23]. Then, we propose an implementation of controlled butterfly-shaped learnable quantum circuits for applying the linear transform (trainable matrix multiplications) in the Fourier domain. This results in three quantum circuits inspired by the classical FL, which are faster than the classical operation or, said differently, require fewer parameters for the same architecture, thereby boosting their scalability. Given the matrix input of the FL dimension $N_\mathrm{s}\times N_\mathrm{c}$ , where $N_\mathrm{s}$ corresponds to the number of samples per PDE, and $N_\mathrm{c}$ correspond to the channel dimension, the order of depth and gate complexities corresponding to FL and proposed algorithms is shown in table 1. In an ideal quantum learning scenario, the depth complexity should also correspond to the runtime complexity for the quantum algorithm. However, the current parameterized quantum circuits have a classical control which means parameters need to be loaded to this classical control and if this operation is not done in parallel in the final QPU architectures, then it the runtime complexity would have sequential execution complexity of this operation. Since the bottleneck classical operation (FT) and the corresponding quantum operation (QFT) does not involve parameterized options, under the above discussed assumption of parallel execution of gates on QPUs, the depth complexity can be approximately taken as runtime complexity of quantum algorithms. It is possible that in many practical scenarios the above assumptions might not be perfectly applicable where gate complexity might play a significant role in deciding runtime complexity. This might be a key limitation of our proposed algorithms.

Table 1. Comparison of the order of time/depth complexities (O) of the proposed circuits with the existing classical Fourier Layer (FL). Here $N_\mathrm{s}$ denotes the sampling dimension, $N_\mathrm{c}$ denotes the channel dimension where $N_\mathrm{s}\gg N_\mathrm{c}$ and K (usually in the range 4–16) denotes the maximum number of modes allowed [6]. This implies that the proposed quantum algorithms would be faster than the classical method. Each quantum circuit requires $N_\mathrm{c}+N_\mathrm{s}$ qubits and K independent parallel circuits are required by the Parallelised QFNO.

Method	#Qubits	#Circuits	Gate complexity	Depth complexity
Classical FL	—	—	—	$N_\mathrm{c}$ + $N_\mathrm{s}$ log( $N_\mathrm{s}$ )
Parallel QFL	$N_\mathrm{c}$ + $N_\mathrm{s}$	K	$KN_\mathrm{c}$ log( $N_\mathrm{c}$ )+ $KN_\mathrm{c}N_\mathrm{s}$ log( $N_\mathrm{s}$ )	$N_\mathrm{c}$ + $N_\mathrm{c}$ log( $N_\mathrm{s}$ )
Sequential QFL	$N_\mathrm{c}$ + $N_\mathrm{s}$	1	$KN_\mathrm{c}$ log( $N_\mathrm{c}$ )+ $N_\mathrm{c}N_\mathrm{s}$ log( $N_\mathrm{s}$ )	$KN_\mathrm{c}$ + $N_\mathrm{c}$ log( $N_\mathrm{s}$ )
Composite QFL	$N_\mathrm{c}$ + $N_\mathrm{s}$	1	( $N_\mathrm{c}$ +K)log( $N_\mathrm{c}$ +K)+ $N_\mathrm{c}N_\mathrm{s}$ log( $N_\mathrm{s}$ )	log( $N_\mathrm{c}$ +K)+ $N_\mathrm{c}$ log( $N_\mathrm{s}$ )

The first algorithm corresponds to the quantum counterpart of the classical operation, where the middle trainable matrix is an orthogonal one. The other two algorithms are modifications of the first circuit with lower circuit depth to mitigate the fact the near-term quantum hardware might still be too noisy.

We have simulated all the three proposed quantum algorithms on all the three PDEs evaluated, as in the classical FNO paper [6], namely Burgers' equation, Darcy's flow equation and Navier–Stokes equation, on the synthetic datasets used in that paper. We have also simulated our quantum algorithms against the convolutional neural networks (CNNs) on several benchmark datasets for image classification. In all the experiments, the three quantum algorithms perform similarly and also show accuracies comparable to state-of-the-art classical methods.

1.3. Contributions

To summarize, the contributions of this work are the following:

We propose a new quantum circuit for performing a QFT on an input encoded as a superposition of unary states.
Using this unary-QFT, we propose three quantum circuits with a provable equivalence or approximation to the classical FL. These circuits can be sequentially combined to form a quantum version of a trainable Fourier neural network for solving Parametric PDEs.
We provide an in-depth analysis of the computational complexity of the circuits and prove their logarithmic time complexity with respect to $N_\mathrm{ s}$ (input dimension) under certain assumptions, compared to the linear time complexity of the classical counterpart.
We benchmark results, showing the effectiveness of the proposed quantum algorithms, against their classical equivalent, for solving several PDEs or even classifying images from well-known datasets.

2. Classical FNO

For solving a PDE, we are provided with a dataset where each instance is a set of the initial conditions of the family of the PDE. This initial condition is represented as a parameterised function and is sampled at various locations to generate the input. The neural network's output corresponding to this initial condition is trained to be the value of the solution $f(x,t)$ for that instance at the same locations and for a given t. The FNO [6] tries to learn this functional mapping from the initial condition function to the solution function for this PDE family. This implies that given a set of initial conditions sampled at various locations as input to the FNO, it has to predict the solution function values at all these locations for any PDE instance in the test set. An overview of the FNO is given as figure 1.

We recall that each input of the FNO is a parametric initial condition, seen for instance as a function of space $f(x,0)$ . This function is sampled at $N_\mathrm{ s}$ locations and forms a vector in $\mathbb{R}^{N_\mathrm{ s}}$ that is later modified by a trainable matrix P to become a matrix $A \in \mathbb{R}^{N_\mathrm{ c} \times N_\mathrm{ s}}$ . We denote $N_\mathrm{ c}$ the channel dimension and usually $N_\mathrm{s} \gg N_\mathrm{c}$ . The matrix A will be the input of the first FL and its size will be maintained for each following FL. As shown in figure 1, the FNO consists of a sequence of FLs. Without loss of generality, we will denote $A \in \mathbb{R}^{N_\mathrm{c} \times N_\mathrm{s}}$ to be the input of any FL.

Each FL starts by transforming its input matrix to the Fourier domain. It applies a FT on each column of the input matrix A. The resulting matrix has the same dimension, and we will refer to its $N_\mathrm{s}$ columns as modes to emphasize their presence in the Fourier basis. After the FT, we apply a learnable matrix multiplication to the first K modes, i.e. the first K columns among $N_\mathrm{s}$ (see figure 1). The original proposal indicates to crop the remaining modes by replacing them with 0. In our quantum proposal, we will let the remaining modes untouched rather than cropping them. The final operation is an Inverse Fourier Layer (IFT) which transforms the matrix back to the input space. Note that in the original proposal, the authors apply in parallel a direct convolution to the input, and both outputs are merged (Not shown in figure 1). In our quantum proposals, we discard this convolution part for simplicity, without any impact on the experimental accuracy.

In this work, we propose several quantum algorithms for mimicking the FL and name these circuits Quantum Fourier Layer (QFL). We name the resulting neural network as a Quantum Fourier neural operator (QFNO). As the other parts of the FNO (P and Q matrix multiplications from figure 1) are easier to adapt with existing quantum techniques [19], we have not focused our work on them. Next, we will formulate the analytic expression of the FL's output, in order to prove its correspondence to QFL.

Fourier layer We now discuss the mathematical details of the classical FL for a 1D PDE case (e.g. Burgers equation), showing the inputs and outputs of each transformation involved. For each PDE instance, the input is denoted as $A \in \mathbb{R}^{N_\mathrm{c}\times N_\mathrm{s}}$ , where $N_\mathrm{c}$ denotes the number of channels per sample in the input and N_s corresponds to the number of samples for this instance (initial condition function).

Regarding notation, the elements of A are a_ij, its rows are a_i while its columns are a^j . We denote the output corresponding to this classical operation as $Y \in \mathbb{R}^{N_\mathrm{c}\times N_\mathrm{s}}$ , and its elements, columns, and rows as y_ij, y_i , and y^j respectively. As the quantum matrices are orthogonal and the l₂-norm of any quantum state vector is 1, we consider the input A such that $||A||_2 = 1$ . Enforcing this condition is easy and does not have any significant impact on the optimization process.

Going further, a FT is applied to this input along each row of size $N_\mathrm{s}$ :

$\begin{equation} \hat{a}_{i} : = FT\left(a_i\right) \end{equation} \tag{ 1 }$

where $a_i = (a_{ij})_{j\in [1,N_\mathrm{s}]}$ . We can also define $\hat{a}^{j} = (\hat{a}_{ij})_{i\in[1,N_\mathrm{c}]}$ . Let $\hat{A}$ be the resulting matrix.

Denoting the maximum number of modes with K, the intermediate linear transform is in fact a multiplication with a 3-tensor $W \in \mathbb{R}^{N_\mathrm{c}\times N_\mathrm{c} \times K}$ . Each $W^{k}\in \mathbb{R}^{N_\mathrm{c}\times N_\mathrm{c}}$ corresponds to the $k\mathrm{th}$ matrix of W, indexed along the last dimension, and corresponding to $k\mathrm{th}$ mode (see figure 1). In the quantum implementation later, we will consider matrices W^k to be orthogonal, as this naturally occurs in quantum circuits. We multiply the tensor W to the first K modes (along the $N_\mathrm{s}$ dimension) of the $\hat{A}$ . Said differently, for each $j\unicode{x2A7D} K$ , the jth column of $\hat{A}$ is multiplied by W^j , resulting in the following output

$\begin{equation} \left[\left(W\,^{j}\hat{a}^{j}\right)_{j\in\left[1,K\right]} , \left(\hat{a}^j\right)_{j\in\left[K+1,N_\mathrm{s}\right]}\right]. \end{equation} \tag{ 2 }$

Let $\hat{b}^j = W^j \hat{a}^j$ , we can rewrite the previous vector as

$\begin{equation} \left[\left(\hat{b}\,^{j}\right)_{j\in\left[1,K\right]} , \left(\hat{a}^{j}\right)_{j\in\left[K+1,N_\mathrm{s}\right]}\right]. \end{equation} \tag{ 3 }$

In the original classical proposal [6], the rest of the modes are discarded (replaced by zeros). In the quantum case, it will be simpler to let the other modes unchanged. We found that this choice does not impact the performance.

Finally, we apply the IFT operation on this transformed input, row by row. It results in the following output for each row i :

$\begin{equation} y_i = \mathrm{IFT}\left(\left[\left(\hat{b}_{ij}\right)_{j\in\left[1,K\right]} , \left(\hat{a}_{ij}\right)_{j\in\left[K+1,N_\mathrm{s}\right]}\right]\right) \end{equation} \tag{ 4 }$

where $\hat{b}_{ij}$ is the ith component of $\hat{b}^j$ . In conclusion, the overall time complexity of the FL (FT + Matrix Multiplications + IFT) should be $O(KN^2_\mathrm{c}+2N_\mathrm{c}N_\mathrm{s}log(N_\mathrm{s}))$ . This runtime can be improved if we consider a distributed algorithm, considering the current availability of efficient GPUs. By parallelising classical operations, we can achieve the FL in $O(N_\mathrm{c}+2N_\mathrm{slog}(N_\mathrm{s}))$ . Note however that the dominant term remains the same as $N_\mathrm{s} \gg N_\mathrm{c}$ and K is usually a constant.

3. Quantum algorithmic tools

In this section, we introduce quantum tools necessary to build the QFL in section 4. These tools are meant to be implemented on near-term quantum computers, with modularity so that they can be useful for other applications.

We first introduce the matrix unary amplitude encoding, a fast way to load a matrix as a quantum state. Then, we develop a new quantum circuit to apply a QFT on the unary basis states. Finally, we present learnable quantum orthogonal layers, the equivalent of learnable weight matrices in classical neural networks.

We introduce here a quantum gate that will be common to the next tools: the 2-qubit RBS gates [24], parameterised by a single parameter θ. The RBS gate has the following unitary:

$\begin{equation} \mathrm{RBS} \left(\theta\right) = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & \cos\left(\theta\right) & \sin\left(\theta\right) & 0\\ 0 & -\sin\left(\theta\right) & \cos\left(\theta\right) & 0\\ 0 & 0 & 0 & 1\\ \end{pmatrix}. \end{equation} \tag{ 5 }$

It can be observed that it modifies $\mathinner{|{01}\rangle}$ and $\mathinner{|{10}\rangle}$ , while it performs the identity operation on $\mathinner{|{00}\rangle}$ and $\mathinner{|{11}\rangle}$ . The RBS, therefore, preserves the hamming weight of the input state. In particular, any superposition of the unary basis (states with hamming weight 1), is kept in this basis through a circuit made of RBS gates. Its implementation depends on the quantum hardware considered.

Now, we will discuss an identity for these RBS gates and use it in the coming section to implement controllable parameterised circuits made of these gates.

Proposition 3.1. Given two qubits, applying an RBS gate on them with angle θ, followed by a Z gate on any one of them is equivalent to applying a Z gate on the same qubit followed by an RBS gate with angle $-\theta$ on the two qubits (figure 2).

Proof. Let us first look at the circuit shown in figure 2(a). Equation (6) shows the calculation for the final unitary corresponding to the left-hand side:

$\begin{equation} \begin{aligned} \begin{pmatrix} 1&0&0&0 \\ 0&-1&0&0 \\ 0&0&1&0 \\ 0&0&0&-1 \\ \end{pmatrix} \cdot \begin{pmatrix} 1&0&0&0 \\ 0&\mathrm{cos}\left(\theta\right)&\mathrm{sin}\left(\theta\right)&0 \\ 0&-\mathrm{sin}\left(\theta\right)&\mathrm{cos}\left(\theta\right) &0 \\ 0&0&0&1 \\ \end{pmatrix} = \begin{pmatrix} 1&0&0&0 \\ 0&-\mathrm{cos}\left(\theta\right)&-\mathrm{sin}\left(\theta\right)&0 \\ 0&-\mathrm{sin}\left(\theta\right)&\mathrm{cos}\left(\theta\right) &0 \\ 0&0&0&-1 \\ \end{pmatrix} \end{aligned} \end{equation} \tag{ 6 }$

and equation (7) shows the same for the right-hand side:

$\begin{equation} \begin{aligned} \begin{pmatrix} 1&0&0&0 \\ 0&\mathrm{cos}\left(\theta\right)&-\mathrm{sin}\left(\theta\right)&0 \\ 0&\mathrm{sin}\left(\theta\right)&\mathrm{cos}\left(\theta\right) &0 \\ 0&0&0&1 \\ \end{pmatrix} \cdot \begin{pmatrix} 1&0&0&0 \\ 0&-1&0&0 \\ 0&0&1&0 \\ 0&0&0&-1 \\ \end{pmatrix} = \begin{pmatrix} 1&0&0&0 \\ 0&-\mathrm{cos}\left(\theta\right)&-\mathrm{sin}\left(\theta\right)&0 \\ 0&-\mathrm{sin}\left(\theta\right)&\mathrm{cos}\left(\theta\right) &0 \\ 0&0&0&-1 \\ \end{pmatrix} \end{aligned} \end{equation} \tag{ 7 }$

where both the equations finally arrive at the same unitary matrix, thereby proving the identity. A similar calculation for the circuit shown in figure 2(b) verifies its correctness. □

**Figure 2.** RBS identities.
Download figure:
Standard image High-resolution image

3.1. Data encoding in the unary basis

As seen in section 2, the input of each classical FL is a matrix A. As we are about to propose quantum circuits to process this data, we need a method to encode these matrices as quantum states. We chose to encode data as amplitude-encoded states, a superposition of basis states with amplitudes that correspond to the data itself. We chose the unary basis, namely the computational basis vectors that have a hamming weight of 1, e.g. $\mathinner{|{e_i}\rangle} = \mathinner{|{0\cdots010\cdots0}\rangle}$ with the 1 on the ith qubit. This choice of basis is motivated by its ability to implement near term encoding for vectors and matrices. It also allows performing tractable linear algebra tasks with provable guarantees. Higher order basis states can be used [25] but are not the focus of this work.

Given an input matrix $A \in \mathbb{R}^{n\times d}$ , its quantum state once loaded should be:

$\begin{equation} \mathinner{|{A}\rangle} = \frac{1}{\left\lVert A\right\rVert}\sum_{i = 1}^n\sum_{j = 1}^d a_{ij}\mathinner{|{e_j}\rangle}\mathinner{|{e_i}\rangle}. \end{equation} \tag{ 8 }$

To load a matrix in such a way, we use a quantum circuit made of two registers (one for the rows, the other for the columns) as shown in figure 3 from [20]. It uses subcircuits from [18] that load vectors in the unary basis. For instance, a row $a_i \in \mathbb{R}^d$ is loaded as $\mathinner{|{a_i}\rangle} = \frac{1}{\left\lVert a_i\right\rVert}\sum_{i = 1}^n a_{ij} \mathinner{|{e_i}\rangle}$ . We can get rid of the normalization factors if we assume or preprocess the vectors and matrices to be normalised.

On the n-qubits top register, we first load the vector made of the norm of each row $(\left\lVert A_1\right\rVert,\cdots,\left\lVert A_n\right\rVert)$ . On the d-qubits lower register, we sequentially load and unload each row A_i in a controlled fashion. Details can be found in [20].

With the right connectivity, the circuit to load a vector of size n has depth $O(\mathrm{log}(n))$ , hence loading a matrix $A\in\mathbb{R}^{n\times d}$ requires a circuit of depth $O(\mathrm{log}(d)+2d\mathrm{log}(n))$ .

The main advantages of using unary encoded states are:

It is possible to efficiently construct amplitude encodings of classical data (vectors) in the unary basis in logarithmic depth.
Starting with unary encoded states and using only Hamming weight preserving gates, one can restrict the size of the Hilbert space we work in, which prevents Barren Plateaus and other concentration phenomena that prevent QML in practice [26].

On the other hand, the number of qubits required is linear in the vector size. In other words, unary and general binary encodings provide a trade off between the number of qubits (linear versus logarithmic) and circuit depth (logarithmic versus linear). Last, one should look at the unary encodings as a special case of fixed Hamming weight encodings. We can use these more general encodings together with the Compound Circuit to perform compound matrix operations that can model higher order interactions. These operations are much harder to simulate classically as the Hamming weight increases, while the quantum complexity remains the same.

3.2. Unary QFT

QFT, one of the most impactful algorithms found in quantum computing literature, provides an exponential speedup compared to classical computing. It performs the discrete Fourier transform over the entire 2ⁿ-dimensional Hilbert space. In this work, we propose a new quantum circuit that performs the discrete Fourier transform over the unary basis states. This allows for a shallow-depth quantum circuit adapted to our quantum data encoding presented in the previous section.

The classical algorithm for performing FFT uses a butterfly-shaped diagram [23], shown in figure 4. Our goal is to inspire from the classical FFT diagram and perform the same operation with quantum circuits. Namely, the unitary matrix, once restricted to the unary basis, must implement the FFT matrix. With an input $x \in \mathbb{R}^n$ , The FFT matrix F_n is given by:

**Figure 4.** Diagram of the Cooley–Tukey algorithm [23], performing the classical FFT on an input $x\in\mathbb{R}^n$ . Each white box performs an elementary radix-2 operation with the root of unity $\omega = e^{i2\pi/n}$ .
Download figure:
Standard image High-resolution image

$\begin{equation} F_n = \begin{pmatrix} 1&1&1&\cdots&1 \\ 1&\omega & \omega^2 & \cdots & \omega^{\left(n-1\right)} \\ 1&\omega^2 & \omega^4 & \cdots & \omega^{2\left(n-1\right)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1&\omega^n & \omega^{2n} &\cdots & \omega^{\left(n-1\right)^2} \\ \end{pmatrix} \end{equation} \tag{ 9 }$

where $\omega^k = e^{i\frac{2\pi}{n}k}$ is the $n\mathrm{th}$ -root of unity raised to some power k based on the position of the radix. Note that F_n is not unitary, but its scaled version $F_n/\sqrt{n}$ , is unitary. Therefore, we will implement the scaled version using our quantum circuit. As shown in figure 4, the input x is permuted, which we will have to take into account when loading data quantumly.

The classical FFT algorithm is decomposed into several radix-2 operations (figure 4). Each one itself is a matrix multiplication which transforms the input $[a, b]$ into $[a+\omega^k b, a-\omega^k b]$ . In matrix multiplication terms, each of these operations applies the matrix $\begin{pmatrix} 1&\omega^k\\ 1&-\omega^k\\ \end{pmatrix}$ .

Quantumly, we want to reproduce the idea of the classical FFT diagram. It results in the circuit shown in figure 5. To reproduce the action of the radix-2 transforms (scaled by a factor of $1/\sqrt{2}$ to make it unitary) on two qubits, we use one single qubit gate and one RBS gate (equation (5)). The single qubit gate is a phase gate $\begin{pmatrix} 1&0 \\ 0&-\omega^k \\ \end{pmatrix}$ applied on the top qubit only. Then, we apply an RBS gate with an angle of $-\pi/4$ . Overall, this applies the following unitary:

**Figure 5.** Quantum circuit for implementing a Fourier Transform on the unary basis. Single qubit gates are phase gates. Vertical lines are RBS gates with $-\pi/4$ angle. It reproduces the classical FFT butterfly circuit (figure 4) by replacing each radix-2 operation with the phase gate followed by the RBS gate. The input must be a vector $\mathinner{|{x}\rangle}$ encoded in the unary basis, with the right permutation. The output will be $\mathinner{|{\hat{x}}\rangle}$ .
Download figure:
Standard image High-resolution image

$\begin{equation} \begin{aligned} \begin{pmatrix} 1&0&0&0 \\ 0&\frac{1}{\sqrt{2}}&-\frac{1}{\sqrt{2}}&0 \\ 0&\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}}&0 \\ 0&0&0&1 \\ \end{pmatrix} \cdot \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&-\omega^k &0 \\ 0&0&0&-\omega^k \\ \end{pmatrix} = \begin{pmatrix} 1&0&0&0 \\ 0&\frac{1}{\sqrt{2}}&\frac{\omega^k}{\sqrt{2}}&0 \\ 0&\frac{1}{\sqrt{2}}&-\frac{\omega^k}{\sqrt{2}}&0 \\ 0&0&0&-\omega^k \\ \end{pmatrix} \end{aligned}. \end{equation} \tag{ 10 }$

The above unitary, once restricted to the unary basis, namely $\{\mathinner{|{01}\rangle},\mathinner{|{10}\rangle}\}$ (middle rows and columns) in the case of two qubits, is exactly the desired radix operation : $\frac{1}{\sqrt{2}}\begin{pmatrix} 1&\omega^k\\ 1&-\omega^k\\ \end{pmatrix}$ .

We apply these operations exactly in the manner of the classical FFT architecture. For an n-dimensional vector, we would thus require n qubits, $\log(n)$ depth and $n\log(n)$ gates.

The Unary-QFT is meant to be applied on a unary quantum state, after a vector data loader for instance. Note that the input to the circuit needs to be the permuted version of the vector. This is a fixed permutation, and we can construct our data loader accordingly.

Finally, it is simple to apply the IFT in a similar way, named Unary-IQFT. It is simply the inverse of the above circuit (figure 5), along with the right permutation of the input data.

In the rest of this work, we will use the following equations. Given a normalized real vector $x = (x_1,\cdots,x_M)$ , and its FT $\hat{x} = (\hat{x}_1,\cdots,\hat{x}_M)$ , the QFT operation in unary basis and its inverse, the IQFT operation, can be defined as follows:

$\begin{equation} \begin{aligned} &\mathrm{QFT} \left(\sum_i x_i \mathinner{|{e_i}\rangle}\right) = \sum_i \hat{x}_i \mathinner{|{e_i}\rangle} \quad \text{and} \quad \mathrm{IQFT} \left(\sum_i \hat{x}_i \mathinner{|{e_i}\rangle}\right) = \sum_i x_i \mathinner{|{e_i}\rangle} \end{aligned} \end{equation} \tag{ 11 }$

The classical FFT time complexity is $O(n\log(n))$ for a n-bit input on a single CPU. Indeed, it consists in applying $\log(n)$ times $n/2$ butterfly operations. Our quantum unary FT requires n qubits and has $O(n\log(n))$ gates, but can run in $O(\log(n))$ timesteps in a single QPU. Indeed, when the connectivity allows it, gates acting on different qubits are applied simultaneously, for example this is indeed the case for trapped ion or cold atom quantum hardware.

3.3. Quantum circuits as trainable linear transforms

To mimic the learnable part of the FL, we need to perform quantumly some sort of matrix multiplication in the unary basis. With this in mind, we propose to use Quantum Orthogonal Layers [19], namely parameterised quantum circuits using hamming-weight preserving gates only. In our case, we are consequently ensured to preserve the superposition in the unary basis, before applying the final IFT. The unitary considered have real values and once restricted to the unary basis, correspond to orthogonal matrices, hence their name. In the context of the FL, we are now able to apply a learnable linear transform (see section 2), by implementing orthogonal matrix multiplications with trainable parameters.

Several circuits are possible, but we will mostly use the Butterfly circuit shown in figure 6: it has the same layout as the previous Unary-QFT (section 3.2), and for this reason has a logarithmic depth if the hardware connectivity allows to simultaneously apply quantum gates acting on different pairs of qubits, as is already the case for trapped ion and cold atoms hardware. For an input $x\in\mathbb{R}^n$ , the Butterfly quantum circuit has $O(n\log(n))$ parameterised gates. The corresponding matrices, once restricted to the unary basis, lie in a subgroup of orthogonal matrices. Other circuits, with nearest-neighbours qubits connectivity, are also usable in our case. They differ in their expressivity, number of parameters, and depth.

In the next sections, we will detail how to apply such circuits in a controlled fashion. This will become useful as K such orthogonal matrix multiplications will be applied on the first K modes (see section 2), after the FT.

4. Quantum circuits for FL

Using the circuits from the previous section as building blocks, we propose three quantum circuits for classical FL, named Sequential (section 4.1), Parallel (section 4.2), and Composite (section 4.3) QFL respectively. These quantum circuits are differentiated based on how the intermediate matrix multiplications in the classical FL are implemented using the learnable quantum circuit. We compare their computational complexities (see table 1) and their efficiency in practice in the following sections.

To reproduce the classical FL (see section 2) using quantum circuits, we need the final quantum state to correspond to the result of the classical FL, as shown in equation (4). Therefore, we expect ideally a quantum output state of the following form:

$\begin{equation} \begin{aligned} \mathinner{|{y}\rangle} & = \sum_i \mathinner{|{y_i}\rangle}\mathinner{|{e_i}\rangle} = \sum_i^{N_c}\sum_{j}^{N_s}y_{ij}\mathinner{|{e_{j}}\rangle}\mathinner{|{e_{i}}\rangle} \quad \text{where} \\ y_{ij} & = IFT\left(\left[\left(\hat{b}_{ij}\right)_{j\in\left[1,K\right]} , \left(a_{ij}\right)_{j\in\left[K+1,N_s\right]}\right]\right)_{j} \end{aligned} \end{equation} \tag{ 12 }$

which simply means that the matrix y is encoded in the unary basis, as is equation (8), and y_ij is the jth component of the resulting vector y_i defined in equation (4). Note that there are no normalization factors in the above equation, as the rows of the matrix $(a_1,\ldots, a_{N_\mathrm{c}})$ are assumed normalized, and the normalisation is preserved through the circuit.

For our three quantum circuit proposals- the Sequential circuit (4.1), the Parallel circuit (4.2), and the Composite circuit (4.3)-we will compare their output quantum state to the desired one (equation (12)), showing how well we are able to replicate the classical operation.

Our conclusions are the following: the Sequential circuit is returning the desired output $\mathinner{|{y}\rangle}$ but might have a prohibitive depth for near-term application. On the other hand, both Parallel and Composite circuits are designed to have a shorter depth, at the cost of producing a slightly different quantum output. Interpreting these alternative outputs is complicated, and numerical simulations in section 5 will assess their quality.

The three proposals are closely related, as shown in figures 7, 9 and 10. All circuits start with $\mathinner{|{0}\rangle}^{N_\mathrm{c}}\otimes \mathinner{|{0}\rangle}^{N_\mathrm{s}}$ , where the top and lower registers are used respectively to encode the N_c rows and the N_s columns of the input matrix A. Indeed, all circuits start by loading the input matrix A as explained in section 3.1. We recall that $N_\mathrm{s}$ is the number of samples that are used to encode each initial condition function of the PDE, as a vector. $N_\mathrm{c}$ is another dimension we use to extend this vector as a matrix. We usually have $N_\mathrm{s} \gg N_\mathrm{c}$ . After this step, the Unary-QFT from section 3.2 is applied on the lower register. At the very end, the inverse QFT is applied similarly, followed by a measurement of both registers. Between the two FTs lies the core difference between the three proposals: the implementation of the K matrix multiplications (see figure 1 and section 2). This part is a trade-off between circuit depth, circuit repetitions, and correspondence with the classical FL.

**Figure 7.** Sequential-QFL: The proposed Sequential Quantum Circuit, which replicates the classical FNO operation from figure 1(if we measure at the end). Further details regarding it are given in section 4.1. The yellow box comprises a controlled parameterized circuit having jth qubit ( $1\unicode{x2A7D} j \unicode{x2A7D} K$ ) of the lower register as the controlling qubit.
Download figure:
Standard image High-resolution image

**Figure 7.** Sequential-QFL: The proposed Sequential Quantum Circuit, which replicates the classical FNO operation from figure 1(if we measure at the end). Further details regarding it are given in section 4.1. The yellow box comprises a controlled parameterized circuit having jth qubit ( $1\unicode{x2A7D} j \unicode{x2A7D} K$ ) of the lower register as the controlling qubit.
Download figure:
Standard image High-resolution image

4.1. Sequential circuit for FL

This first proposal is represented as figure 7. As explained, the Sequential circuit starts by loading the input matrix A, which in general is itself the output of the previous FL. Therefore, the first part of the circuit is nothing else than the circuit displayed in figure 3. Its depth is $O(\log(N_\mathrm{c})+2N_\mathrm{c}\log(N_\mathrm{s}))$ . The resulting state is:

$\begin{equation} \sum_{i}^{N_\mathrm{c}}\sum_{j}^{N_\mathrm{s}} a_{ij}\mathinner{|{e_{i}}\rangle}\mathinner{|{e_{j}}\rangle}. \end{equation} \tag{ 13 }$

To follow the classical operation, the FT should be applied to each of the rows. Thus, we apply the Unary-QFT operation on the lower register, currently encoding a superposition of these row vectors.

We first rewrite the previous state as:

$\begin{equation} \sum_{i}^{N_\mathrm{c}}\mathinner{|{e_{i}}\rangle}\left(\sum_{j}^{N_\mathrm{s}} a_{ij}\mathinner{|{e_{j}}\rangle}\right). \end{equation} \tag{ 14 }$

And then apply the QFT operation, as in equation (11), to the lower register:

$\begin{align} \sum_{i}^{N_\mathrm{c}}\mathinner{|{e_{i}}\rangle}\mathrm{QFT}\left(\sum_{j}^{N_\mathrm{s}} a_{ij}\mathinner{|{e_{j}}\rangle}\right) = \sum_{i}^{N_\mathrm{c}}\mathinner{|{e_{i}}\rangle}\sum_{j}^{N_\mathrm{s}} \hat{a}_{ij}\mathinner{|{e_{j}}\rangle} = \sum_{i}^{N_\mathrm{c}}\sum_{j}^{N_\mathrm{s}} \hat{a}_{ij}\mathinner{|{e_{i}}\rangle}\mathinner{|{e_{j}}\rangle} \end{align} \tag{ 15 }$

where $\hat{a}_{ij}$ corresponds to the classical, row-wise, FT of A. Note that the resulting state has kept its superposition in the unary basis.

Next, we realize the learnable linear transform, namely K matrix multiplications, as in the middle of the classical FL (figure 1). Each matrix W^k , to follow the classical notations from section 2, will now become an orthogonal matrix in the quantum case. This multiplication is done with the parameterised quantum circuits from section 3.3, which preserves the unary basis. We choose to use the learnable butterfly circuit from figure 6, for its shallow depth. As explained before, applying these learnable circuits effectively applies an orthogonal matrix multiplication in the unary basis. The main challenge, which will differentiate the three proposals, consists in applying these matrix multiplications to the first K columns $(\hat{a}^1,\cdots,\hat{a}^K)$ , independently. Contrary to the previous QFT, the linear transform should act on the columns, encoded in the upper register.

As shown in figure 7, we propose to apply sequentially K such parameterised circuits $P_1, \cdots, P_K$ (butterfly circuits) on the top register. To ensure independent transformations are applied on each column $\hat{a}^j$ , we propose a controlled implementation of the parameterised circuit, or, a Controlled parameterised Circuit. It applies the parameterised circuit to the upper register only when a particular qubit of the lower register is not in the activated state ( $\mathinner{|{1}\rangle}$ ). For the circuit P_j , controlled by jth qubit of the lower register, this controlled implementation begins with the application of circuit P_j on the upper register, followed by the application of Controlled Z-gates to certain qubits in the upper register if the jth qubit in the lower register is in state $\mathinner{|{0}\rangle}$ . This can be seen as a controlled implementation of some matrix U_Z (see figure 7) and thus, we denote this transformation by CU_Z .

The set of qubit indices for applying these Z-gates are selected using the following rule: each RBS-gate of the parameterized circuit P_j has a Z-gate applied to exactly one of its two qubits. For instance, if P_j has nearest neighbour connectivity, then the desired set can be the set of even or odd qubit indices. Subsequently, we apply another parametrized circuit, similar to P_j , on the upper register, but this time reversing the order in which the RBS gates are applied, denoting this circuit by $P_j^{^{\prime}}$ . Finally, we repeat the controlled Z-gate operations on the same qubits. This completes the implementation of the controlled parameterised circuit. Let us now see how this implementation achieves the desired task.

Figure 8 shows the controlled version of the parameterized circuit from figure 6. Here, the lowest qubit controls all the Z-gates and thus, is the controlling qubit for the parameterized circuit. The complete circuit includes the parameterized circuit (P_j ) followed by controlled Z-gates on some of the qubits (CU_Z ). These qubits are selected using the following criterion: Every RBS-gate in the circuit has exactly one of its two qubits lying in this set. Finally, the circuit P_j is applied again but this time the order of RBS-gate is reversed ( $P_j^{^{\prime}}$ ). Denoting the angles $[\theta_1, \theta_2,...., \theta_{M}]$ by $\mathbf{\theta}$ , where M is the total number of parameterized RBS-gates in P_j , and writing the circuit P_j as $P_j({\theta})$ , the following relation holds between P_j and $P_j^{^{\prime}}$ :

$\begin{equation} P_j^{^{\prime}}\left(\theta\right) = P_j^{\dagger}\left(-\theta\right) \end{equation} \tag{ 16 }$

where $P_j^{\dagger}$ denotes the conjugate transpose of P_j .

**Figure 8.** parameterised Butterfly Circuit followed by Z gates on certain qubits and its flipped version around the vertical axis. The dash-enclosed region shows the last layer RBS gates of the parameterised circuit being cancelled out by first layer RBS gates in its flipped version, using the RBS identities in figure 2. Similarly for the second last layer after the last layer is cancelled out and so on. Thus, eventually, only Z gates will remain.
Download figure:
Standard image High-resolution image

Before proceeding further on the implementation of the controlled parameterised circuit, we first discuss a claim below regarding the existence of U_Z for any possible parameterized butterfly circuit.

Claim 4.1. For an N-qubit butterfly-shaped parameterised circuit described in section 3, with N = 2^a for any whole number a, there exists a set of qubit indices $\mathcal{I}_a$ such that every RBS-gate in the circuit has exactly one of its qubits' index lying in this set.

Proof. We show this proof by recurrence. We first establish the base case for a = 1 (or N = 2). For this we know $\mathcal{I}_1$ will be either {1} or {2}. □

Now, it is given that $\mathcal{I}_a$ exists for some a > 1. Since $\mathcal{I}_a$ consists of exactly one of the qubit index for every RBS-gate, then indices not in $\mathcal{I}_a$ (denoted by $\mathcal{I}^{^{^{\prime}}}_a$ ) cannot have any RBS-gate between them. Also, since exactly one of every RBS-gate indices is in $\mathcal{I}_a$ , therefore the other index lies in $\mathcal{I}^{^{^{\prime}}}_a$ . Thus, $\mathcal{I}^{^{^{\prime}}}_a$ is also a solution set for this problem. Now, if we combine the two $N = 2^a$ qubit circuits (corresponding to $\mathcal{I}^1_a$ and $\mathcal{I}^2_a$ ), then we see that there is no new connection on the index set $\mathcal{I}^{^{^{\prime}}1}_aU\mathcal{I}^{2}_a$ . Thus, this can correspond to the set $\mathcal{I}_{a+1}$ . Hence, we proved that if $\mathcal{I}_a$ exists, then $\mathcal{I}_{a+1}$ also exists.

Going further, we now use Identity 3.1 for RBS gates to arrive at the following equation:

$\begin{equation} U_ZP_j\left(\theta\right) = P_j\left(-\theta\right)U_Z. \end{equation} \tag{ 17 }$

Using this, we arrive at the following relation:

$\begin{equation} \begin{aligned} P_j^{^{\prime}}\left(\theta\right)U_ZP_j\left(\theta\right) = P_j^{\dagger}\left(-\theta\right)U_ZP_j\left(\theta\right) = P_j^{\dagger}\left(-\theta\right)P_j\left(-\theta\right)U_Z = U_Z \end{aligned} \end{equation} \tag{ 18 }$

Thus, it can be observed that when the lowest qubit in figure 8 is in state $\mathinner{|{0}\rangle}$ , each RBS gate in the last layer of P_K is being cancelled out by the RBS gate in the first layer of $P_K^{^{\prime}}$ . After this, the second last layer operations get cancelled by the second layer of $P_K^{^{\prime}}$ and finally only Z-gate operations (U_Z ) will remain. Finally, the application of another CU_Z unitary results in U_Z being re-applied to the upper register and using $U_ZU_Z = I$ , we can conclude that the state of the upper register is preserved. Thus, applying X-gate on the lowermost qubit again retains the initial state before the application of this circuit.

On the other hand, when the lowest qubit is in state $\mathinner{|{1}\rangle}$ , no Z-gates are applied, and the initial state of the remaining qubits is transformed by P_K and $P_K^{^{\prime}}$ . This corresponds to a controlled version of a parameterised circuit, namely $P_j P^{^{\prime}}_j$ . Note that it is slightly different from applying a controlled P_j only, whose implementation remains an open question.

Let us now apply the Controlled version of P₁ on the state in equation (15). After re-arranging terms and applying P₁, it can be written as:

$\begin{equation} \sum_j^{N_\mathrm{s}}\mathinner{|{e_j}\rangle}P_1\left(\sum_i^{N_\mathrm{c}}\hat{a}_{ij}\mathinner{|{e_i}\rangle}\right). \end{equation} \tag{ 19 }$

Now, on applying the CU_Z unitary using the first qubit of the lower register as the control (applied when this qubit is in state $\mathinner{|{0}\rangle}$ ), the state becomes:

$\begin{equation} \mathinner{|{e_1}\rangle}P_1\left(\sum_i^{N_\mathrm{c}}\hat{a}_{ij}\mathinner{|{e_i}\rangle}\right) + \sum_{j\neq 1}^{N_\mathrm{s}}\mathinner{|{e_j}\rangle}U_Z P_1\left(\sum_i^{N_\mathrm{c}}\hat{a}_{ij}\mathinner{|{e_i}\rangle}\right). \end{equation} \tag{ 20 }$

Now, applying the flipped circuit $P^{^{^{\prime}}}_1$ transforms the state to:

$\begin{equation} \mathinner{|{e_1}\rangle}P^{^{^{\prime}}}_1 P_1\left(\sum_i^{N_\mathrm{c}}\hat{a}_{ij}\mathinner{|{e_i}\rangle}\right) + \sum_{j\neq 1}^{N_\mathrm{s}}\mathinner{|{e_j}\rangle}P^{^{^{\prime}}}_1U_Z P_1\left(\sum_i^{N_\mathrm{c}}\hat{a}_{ij}\mathinner{|{e_i}\rangle}\right). \end{equation} \tag{ 21 }$

Using $P^{^{^{\prime}}}_1U_Z P_1 = U_Z$ from equation (18), the state reduces to:

$\begin{equation} \mathinner{|{e_1}\rangle}P^{^{^{\prime}}}_1 P_1\left(\sum_i^{N_\mathrm{c}}\hat{a}_{ij}\mathinner{|{e_i}\rangle}\right) + \sum_{j\neq 1}^{N_\mathrm{s}}\mathinner{|{e_j}\rangle}U_Z\left(\sum_i^{N_\mathrm{c}}\hat{a}_{ij}\mathinner{|{e_i}\rangle}\right). \end{equation} \tag{ 22 }$

Finally, repeating the controlled application of U_Z leads to:

$\begin{equation} \mathinner{|{e_1}\rangle}P^{^{^{\prime}}}_1 P_1\left(\sum_i^{N_\mathrm{c}}\hat{a}_{ij}\mathinner{|{e_i}\rangle}\right) + \sum_{j\neq 1}^{N_\mathrm{s}}\mathinner{|{e_j}\rangle}\left(\sum_i^{N_\mathrm{c}}\hat{a}_{ij}\mathinner{|{e_i}\rangle}\right). \end{equation} \tag{ 23 }$

Applying K such controlled circuits, where the jth circuit on the upper register is controlled by jth qubit in the lower register, and re-arranging the terms in the last equation leads to the following state:

$\begin{equation} \sum_{j = 1}^KP^{^{^{\prime}}}_j P_j\left(\sum_i^{N_\mathrm{c}}\hat{a}_{ij}\mathinner{|{e_i}\rangle}\right)\mathinner{|{e_j}\rangle} + \sum_{j = K+1}^{N_\mathrm{s}}\left(\sum_i^{N_\mathrm{c}}\hat{a}_{ij}\mathinner{|{e_i}\rangle}\right)\mathinner{|{e_j}\rangle}. \end{equation} \tag{ 24 }$

We now want to compare the state obtained in equation (24) to the classical output of the FNO from equation (3), before the final IFT. We recall that in the classical case, each of the first K columns $\hat{a}^j$ was multiplied by an independent matrix W^j . In the quantum case, we need to understand if the same operation is applied, and with which matrix $W_Q^j$ .

For all $j \in [1,K]$ , we saw in equation (24) that $P^{^{\prime}}_jP_j$ was applied to the unary encoding of $\hat{a}^j$ . We denote respectively by $W_{P_j}$ and $W_{P^{^{\prime}}_j}$ the unary matrix of P_j and $P^{^{\prime}}_j$ . Each matrix is the $N_\mathrm{c}\times N_\mathrm{c}$ submatrix of the whole unitary of size $2^{N_\mathrm{c}}\times 2^{N_\mathrm{c}}$ , corresponding to the basis states of unary vectors (see section 3.3). Therefore, considering only the top register, the jth operation $P^{^{^{\prime}}}_jP_{j}$ corresponds to the sub-matrix $W_Q^{j}$ :

$\begin{equation} W^j_Q = W_{P_j^{\prime}}W_{P_j}. \end{equation} \tag{ 25 }$

This matrix $W^j_Q$ is the quantum implementation corresponding to the matrix W_j used in the implementation of classical FNO (equation (3)). The overall matrix $(\hat{a}_{ij})$ can be decomposed into $N_\mathrm{s}$ vectors $\hat{a}^j = (\hat{a}_{ij})_{i\in[1,N_\mathrm{c}]}$ . Then, for the first K vectors $\hat{a}^j \in \mathbb{R}^{N_\mathrm{c}}$ we will have $\hat{b}^j = W_Q^j \hat{a}^j$ , where this $\hat{b}^j$ is the quantum counterpart for one used in classical FNO. Thus, the state after these controlled parameterised circuits can be written as:

$\begin{equation} \sum_i^{N_\mathrm{c}}\left(\sum_j^K\hat{b}_{ij}\mathinner{|{e_i}\rangle}\mathinner{|{e_j}\rangle} + \sum_{j = K+1}^{N_\mathrm{s}}\hat{a}_{ij}\mathinner{|{e_i}\rangle}\mathinner{|{e_j}\rangle}\right). \end{equation} \tag{ 26 }$

Finally, the output state of this circuit after IQFT on the lower register becomes:

$\begin{equation} \sum_{i}^{N_\mathrm{c}}\mathinner{|{e_{i}}\rangle}\mathrm{IQFT}\left(\sum_{j}^{K} \hat{b}_{ij}\mathinner{|{e_{j}}\rangle} + \sum_{j = K+1}^{N_\mathrm{s}} \hat{a}_{ij}\mathinner{|{e_{j}}\rangle}\right). \end{equation} \tag{ 27 }$

Since $\mathrm{IQFT}(\sum_{i}\hat{x}_i\mathinner{|{e_{i}}\rangle})$ = $\sum_{i}x_i\mathinner{|{e_{i}}\rangle}$ , where $\mathrm{IFT}(\hat{x}) = x$ , this implies that jth component of IFT would be same as the coefficient of jth state in IQFT. From this, we can conclude that the state in equation (12) is equivalent to the state in equation (27) and thus, this circuit replicates the classical operation. Finally, let us now discuss the depth complexity of this circuit.

Depth complexity (d). Based on the discussion of the Sequential QFL circuit, it can be divided into four parts : (a) unary loading of the input matrix ( $d_\mathrm{load}$ ), (b) applying QFT on the lower register ( $d_\mathrm{qft}$ ), (c) applying K Controlled Parameterised Circuits on the upper register ( $d_\mathrm{cpc}$ ) and (d) applying inverse QFT on the lower register ( $d_\mathrm{iqft}$ ). Thus, the depth of the complete Sequential QFL circuit becomes:

$\begin{equation} \begin{aligned} = & d_\mathrm{load} + d_\mathrm{qft} + Kd_\mathrm{cpc} + d_\mathrm{iqft} \\ = & \left(N_\mathrm{c}\text{log}\left(N_\mathrm{s}\right)+\text{log}\left(N_\mathrm{c}\right) \right) + \text{log}\left(N_\mathrm{s}\right) + \\ &K\left(N_\mathrm{c}+2\text{log}\left(N_\mathrm{c}\right)\right) + \text{log}\left(N_\mathrm{s}\right) \\ = & \left(N_\mathrm{c}+2\right)\text{log}\left(N_\mathrm{s}\right) + \left(2K+1\right)\text{log}\left(N_\mathrm{c}\right) + KN_\mathrm{c} \\ \end{aligned}. \end{equation} \tag{ 28 }$

4.2. Parallelised circuit for FL

For the Sequential QFL discussed in the previous subsection, the depth complexity of the learnable part is linear in the number of modes (K). Given the multiplicative noise model for NISQ devices, this linear dependence might hinder learning. A helpful modification then can be parallelising the learnable butterfly circuits, which can make the learning in the presence of noise more efficient and reduce the circuit's depth complexity. Figure 9 shows this modified version of the Sequential QFL, consisting of K quantum circuits operating in parallel and each implementing only one learnable circuit controlled by one of the top K qubits in the lower register. As all the circuits up to the learnable part are similar to the sequential circuit, we can directly write the state after the QFT using equation (15), as:

$\begin{equation} \left[\sum_{j}^{N_\mathrm{s}}\sum_{i}^{N_\mathrm{c}}\left(\hat{a}_{ij}\right)_{k}\mathinner{|{e_{i}}\rangle}_{k}\mathinner{|{e_{j}}\rangle}_{k}\right]_{k = 1}^{K} \end{equation} \tag{ 29 }$

where the index k denotes the $k\mathrm{th}$ parallel circuit and $(\hat{a}_{ij})_k$ , $\mathinner{|{e_i}\rangle}_k$ denote the coefficient $\hat{a}_{ij}$ , state $\mathinner{|{e_i}\rangle}$ corresponding to this $k\mathrm{th}$ circuit respectively. Also, in the $k\mathrm{th}$ parallel circuit, the learnable butterfly part is controlled by the $k\mathrm{th}$ qubit of the lower register. We recall that the parameterised circuit applied on the top register is effectively mapping the vector $\hat{a}^j$ to $\hat{b}^j$ (see equation (27)) and thus, we can write the updated state of the circuits as:

$\begin{equation} \left[\sum_{j\neq k}\sum_{i}^{N_\mathrm{c}}\left(\hat{a}_{ij}\right)_{k}\mathinner{|{e_{i}}\rangle}_{k}\mathinner{|{e_{j}}\rangle}_{k}+\sum_{i}^{N_\mathrm{c}}\left(\hat{b}_{ij}\right)_k\mathinner{|{e_{i}}\rangle}_{k}\mathinner{|{e_{k}}\rangle}_{k}\right]_{k = 1}^{K}. \end{equation} \tag{ 30 }$

Now, applying IQFT on the lower register in each of the circuits independently:

$\begin{equation} \begin{aligned} &\left[ \sum_{i}^{N_\mathrm{c}}\mathinner{|{e_{i}}\rangle}_{k} \mathrm{IQFT} \left(\left(\hat{b}_{ik}\right)_{k}\mathinner{|{e_{k}}\rangle}_{k} + \sum_{j\neq k}\left(\hat{a}_{ij}\right)_{k}\mathinner{|{e_{j}}\rangle}_{k} \right) \right]_{k = 1}^{K}. \end{aligned} \end{equation} \tag{ 31 }$

We denote coefficients of this state corresponding to $k\mathrm{th}$ circuit by $(c_{ij})_k$ and thus re-writing it as:

$\begin{equation} \begin{aligned} \left[ \sum_{i}^{N_\mathrm{c}} \sum_{j}^{N_\mathrm{s}} \left(c_{ij}\right)_k\mathinner{|{e_{i}}\rangle}_{k}\mathinner{|{e_{j}}\rangle}_{k} \right]_{k = 1}^{K} \end{aligned} \end{equation} \tag{ 32 }$

where all of the $(c_{ij})_k$ are explicitly given by using the equation for the FT as follows:

$\begin{equation} \left(c_{ij}\right)_k = \frac{1}{N_\mathrm{s}}\left( \sum_{j\neq k} \left(\hat{a}_{ij}\right)_ke^{i \frac{2\pi j}{N_\mathrm{s}}} +\left(\hat{b}_{ik}\right)_ke^{i \frac{2\pi k}{N_\mathrm{s}}} \right). \end{equation} \tag{ 33 }$

Similarly, writing c_ij for the sequential circuit discussed in the paper (using equation (27)):

$\begin{equation} c_{ij} = \frac{1}{N_\mathrm{s}}\left( \sum_{j = K+1}^{N_\mathrm{s}}\hat{a}_{ij}e^{i \frac{2\pi j}{N_\mathrm{s}}} + \sum_{j}^K \hat{b}_{ij}e^{i \frac{2\pi j}{N_\mathrm{s}}}\right). \end{equation} \tag{ 34 }$

Comparing the above two equations leads to the observation that coefficients in equation (34) would not be a subset of coefficients in equation (33), and there is no closed-form classical processing/transformation to achieve this. Thus, this parallel circuit results in a somewhat different operation which might be intuitively similar to the sequential circuit, but the output is different. However, section 5 shows that this conceptually similar operation is also effective in dealing with PDEs/Images and is expected to be more efficient than the sequential circuit for a noisy scenario. Also, if we remove the IQFT operation from this circuit and instead apply the classical IFT, measuring after equation (30), we get the following K $N_\mathrm{c}\times N_\mathrm{s}$ matrices after applying the square root operation:

$\begin{equation} \left[\left(\hat{b}^{k}\right)_{k} , \left(\hat{a}^{j}\right)_{j\neq k}\right]_{k = 1}^K \end{equation} \tag{ 35 }$

where $\hat{b}^j$ and $\hat{a}^j$ have been defined previously. In case we combine $\hat{b}^{k}$ from all of the K matrices with $(\hat{a}^j)_{j\in[K+1, N_\mathrm{s}]}$ from any of the K matrices, suppose the first one, it leads to the following $N_\mathrm{c}\times N_\mathrm{s}$ matrix:

$\begin{equation} \left[\left(\hat{b}^{j}\right)_{j\in\left[1,K\right]} , \left(\hat{a}^{j}\right)_{j\in\left[K+1,N_\mathrm{s}\right]}\right] \end{equation} \tag{ 36 }$

which is exactly the same as equation (3). Thus, this modified circuit (without the IQFT), followed by some classical post-processing and IFT, can replicate the classical FL operation.

**Figure 9.** Parallel QFL: parallelised version of the Sequential Quantum Circuit to minimize the depth of the learning part, thus making it more efficient when deployed on noisy hardware. For each mode (out of the top K) in the transformed input, there is a different circuit to perform the parameterised matrix transform.
Download figure:
Standard image High-resolution image

Depth complexity (d). Given that the only difference compared to the Sequential QFL is the parallel implementation of the controlled parameterised circuits as against sequential, the depth complexity of this circuit can be derived by substituting K = 1 in equation (37):

$\begin{equation} \begin{aligned} & = d_\mathrm{load} + d_\mathrm{qft} + d_\mathrm{cpc} + d_\mathrm{iqft} \\ & = \left(N_\mathrm{c}+2\right)\text{log}\left(N_\mathrm{s}\right) + 3\text{log}\left(N_\mathrm{c}\right) + N_\mathrm{c} \\ \end{aligned} \end{equation} \tag{ 37 }$

and a total of K independent quantum circuits are required to execute this circuit.

4.3. Composite circuit for FL

As highlighted in the previous subsection, the depth of the parameterised part of the sequential circuit might make the learning process difficult on currently available noisy quantum hardware. Even though the Parallelised QFL can deal with this, its requirement of K independent $N_\mathrm{c}+N_\mathrm{s}$ qubit circuits might not be possible in many cases.

Therefore, we propose a new operation corresponding to the learnable part of the sequential circuit and term the resulting overall circuit as the Composite QFL. It significantly decreases the learnable part's depth complexity while requiring only one quantum circuit with ( $N_\mathrm{c}+N_\mathrm{s}$ ) qubits. Here, instead of applying the K-controlled parameterised circuits, we span a single, more extensive parameterised circuit over the upper register ( $N_\mathrm{s}$ qubits) and top K qubits in the lower register. Figure 10 shows the diagram for this circuit. Note that the upper and lower registers are unary independently, before the parametrized circuit.

**Figure 10.** Composite-QFL: Variant of the Sequential-QFNO where instead of controlled butterfly circuits, there is a Composite Butterfly Circuit spanning the upper register and top K qubits of the lower register.
Download figure:
Standard image High-resolution image

If we jointly consider the upper and lower registers, the states are in a superposition over the hamming weight two basis states. However, we will consider the upper register and just the top K qubits from the lower register, in that case, the states can be in a superposition over hamming weight one and two basis states. Given that the RBS gates are hamming weight preserving, the state after applying the parameterized circuit will also be a superposition of hamming weight one and two basis states.

Note that the input superposition cannot have all the possible basis states with hamming weights 1 and 2 for the top $N_\mathrm{c}+K$ qubits. For instance, it does not comprise unary states for which the 1 is in the lower K qubits. Similarly, for hamming weight 2, it does not comprise states with both the 1 s in top $N_\mathrm{c}$ or bottom K qubits. In contrast, the output superposition can have any of the hamming weight 1 or 2 states. Recall that the number of possible hamming weight 1 states for these $N_\mathrm{c}+K$ qubits is $N_\mathrm{c}+K$ and hamming weight 2 states is $N_\mathrm{c}+K \choose 2$ .

Let us now discuss the application of parameterised circuits on these $N_\mathrm{c}+K$ qubits. The complete unitary B will be a $2^{N_\mathrm{c}+K} \times 2^{N_\mathrm{c}+K}$ block diagonal matrix with each block corresponding to a subspace with fixed hamming weight [25], $B = B_1 \otimes B_2 \otimes{\ldots}\otimes B_n$ , where B_i correspond to the block diagonal unitary for subspace with hamming weight i. Since our input has hamming weight 1 or 2, we only care about unitaries B₁ and B₂. B₁ will be of size $(N_\mathrm{c}+K) \times (N_\mathrm{c}+K)$ and B₂ of ${N_\mathrm{c}+K \choose 2} \times {N_\mathrm{c}+K \choose 2}$ .

Given the circuit is similar to the sequential circuit till the QFT operation, the state of this circuit after QFT would be the same as the one in equation (15). We now separate this complete state into two sets of states corresponding to hamming weight 1 and 2:

$\begin{equation} \begin{aligned} \sum_{i}^{N_\mathrm{c}}\sum_{j}^{K} \hat{a}_{ij}\mathinner{|{e_{i}}\rangle}\mathinner{|{e_{j}}\rangle} + \sum_{i}^{N_\mathrm{c}}\sum_{j = K+1}^{N_\mathrm{s}} \hat{a}_{ij}\mathinner{|{e_{i}}\rangle}\mathinner{|{e_{j}}\rangle} \\ \end{aligned} \end{equation} \tag{ 38 }$

where the first term corresponds to the superposition of hamming weight two basis states $\mathinner{|{h_2}\rangle}$ and similarly the second term corresponds to the superposition of hamming weight one basis states $\mathinner{|{h_1}\rangle}$ . On application of the parameterised circuit, the unitary B₁ will act on $\mathinner{|{h_1}\rangle}$ and B₂ on $\mathinner{|{h_2}\rangle}$ .

Let us first focus on the term corresponding to $\mathinner{|{h_1}\rangle}$ . It does not contain the states where the qubits in the upper register are all 0 and the 1 lies in the top K qubits of the lower register. It implies that the coefficients of all these states should be taken as zero. Therefore, the state corresponding to this $\mathinner{|{h_1}\rangle}$ can also be written as:

$\begin{equation} \mathinner{|{h_1}\rangle} = \sum_{i}^{N_\mathrm{c}}\sum_{j = K+1}^{N_\mathrm{s}} \hat{a}_{ij}\mathinner{|{e_{i}}\rangle}\mathinner{|{e_{j}}\rangle}+\sum_{i}^{K}\sum_{j = K+1}^{N_\mathrm{s}} 0\mathinner{|{e_{0}}\rangle}\mathinner{|{e_{ij}}\rangle} \end{equation} \tag{ 39 }$

where $\mathinner{|{e_0}\rangle}$ denotes the state corresponding to no ones in the upper $N_\mathrm{c}$ register and $\mathinner{|{e_{ij}}\rangle}$ denotes the hamming weight 2 states for the lower register, where i and j denote the positions of 1. Similarly, if we consider the first term in equation (38), corresponding to $\mathinner{|{h_2}\rangle}$ , we further have to include states where both ones are in the upper register or both ones in the top K of the lower register. These new states again would have zero coefficients. As a result, we can write the term corresponding to $\mathinner{|{h_2}\rangle}$ in equation (38) as:

$\begin{equation} \begin{aligned} \mathinner{|{h_2}\rangle} = \sum_{i}^{N_\mathrm{c}}\sum_{j>i}^{N_\mathrm{c}} 0\mathinner{|{e_{ij}}\rangle}\mathinner{|{e_{0}}\rangle} + \sum_{i}^{N_\mathrm{c}}\sum_{j}^{K} \hat{a}_{ij}\mathinner{|{e_{i}}\rangle}\mathinner{|{e_{j}}\rangle} +\sum_{i}^{{K}}\sum_{j>i}^{K} 0\mathinner{|{e_{0}}\rangle}\mathinner{|{e_{ij}}\rangle} \end{aligned}. \end{equation} \tag{ 40 }$

This results in a total of ${N_\mathrm{c}+K \choose 2}$ states.

Let us now discuss the application of parameterised circuit (B₁, B₂) on the $\mathinner{|{h_1}\rangle}$ and $\mathinner{|{h_2}\rangle}$ states in equations (39) and (40) respectively. For the hamming weight 1 basis, the application of parameterised circuit (B₁) is already discussed in section 4.1. For notational consistency, we denote this operation as a multiplication with matrix $W^{1}\in\mathbb{R}^{(N_\mathrm{c}+K)\times(N_\mathrm{c}+K)}$ . It results in the transformed coefficients $\hat{b}_{ij}$ :

$\begin{equation} \begin{aligned} \hat{b}_{ij} = \sum_{t}^{N_\mathrm{c}}\left(W^1_{it}\hat{a}_{tj}\right) + \sum_{t = N_\mathrm{c}+1}^{N_\mathrm{c}+K}\left(W^1_{it}\times0\right) \; i \in \left[1,N_\mathrm{c}+K\right] \; j \in \left[K+1,N_\mathrm{s}\right]. \end{aligned} \end{equation} \tag{ 41 }$

Furthermore, we also apply a post-select operation to preserve the basis, selecting only the states with a non-zero coefficient before applying the B₁.

On similar lines as hamming weight 1 basis, for the hamming weight 2 case, the application of parameterised circuit (B₂) can be interpreted as multiplication by the matrix $W^2\in\mathbb{R}^{q\times q}$ where $q = {N_\mathrm{c}+K\choose2}$ . Based on a recent work on subspace states [25], if the parameterised circuit has a nearest neighbour connectivity, then the matrix B₂ is the compound order 2 matrix [27] of B₁. Therefore, for this case, each of its elements will correspond to the determinant of a $2\times2$ submatrix of W¹:

$\begin{equation} W^2_{i,j} = W^1_{a,b}W^1_{a+k,b+k} - W^1_{a+k,b}W^1_{a,b+k} \end{equation} \tag{ 42 }$

for some $a,b,k\lt N_\mathrm{c}+K$ . For the butterfly-shaped circuit, we are not limited to nearest neighbour connectivity and thus, W² has to be extracted from the complete unitary ( $2^{N_\mathrm{c}+K} \times 2^{N_\mathrm{c}+K}$ ) only. After applying this unitary B₂ on $\mathinner{|{h_2}\rangle}$ , their transformed coefficients c_ij are:

$\begin{equation} \begin{aligned} c_{ij} = \sum_{t}^{N_\mathrm{c}\choose2}\left(W^2_{it}\times0\right) + \sum_{t}^{N_\mathrm{c}K}\left(W^2_{i,t+{N_\mathrm{c}\choose2}}\times a\,^f_{tj}\right) + \sum_{t = N_\mathrm{c}K+{N_\mathrm{c}\choose2}}^{N_\mathrm{c}+K\choose2}\left(W^2_{it}\times0\right). \end{aligned} \end{equation} \tag{ 43 }$

Similar to the case of hamming weight 1, here also we use a post-select operation to discard the states which initially had coefficients zero thereby preserving the basis.

Combining the transformed $\mathinner{|{h_1}\rangle}$ , $\mathinner{|{h_2}\rangle}$ states and applying IQFT on the lower register, the final output state of this circuit is:

$\begin{equation} \sum_{i}^{N_\mathrm{c}}\mathinner{|{e_i}\rangle}\mathrm{IQFT}\left(\sum_{j}^{K}c_{ij}\mathinner{|{e_j}\rangle} + \sum_{j = K+1}^{N_\mathrm{s}}b_{ij}\mathinner{|{e_j}\rangle}\right). \end{equation} \tag{ 44 }$

We term the overall circuit, when parameterised circuit has nearest neighbour connectivity (pyramid-shaped), as the Composite Circuit (Compound) having depth complexity $(N_\mathrm{c}+K)+\text{log}$ $(N_\mathrm{c})+(N_\mathrm{c}+2)\text{log}(N_\mathrm{s})$ and for butterfly shaped as the Composite Circuit (Butterfly) having depth complexity $\text{log}(N_\mathrm{c}+K)+\text{log}(N_\mathrm{c})+(N_\mathrm{c}+2)\text{log}(N_\mathrm{s})$ .

4.4. Learning and expressivity

Classical vs. Quantum. The quantum learning models here have their learnable part as orthogonal matrices, as they are built up from parameterized quantum gates. This indeed changes the learning process and the expressivity, in theory. Since in the classical operations any matrix can be used, the quantum case will be less expressive. In practice, this constrain on degrees of freedom can have both advantages and disadvantages. It has been shown that imposing orthogonality can result in a more uniform spectrum of weights and a more stable training due to a smaller and better structured search space [21]. The quantum models inheriting natural orthogonality can provide these advantages. Also, since they can match the performance of classical methods with fewer parameters due to structural constraints/assumptions, they can also be a better alternative in overfitting cases. A contrasting effect of these constraints can be limited expressivity of the proposed quantum methods as compared to the classical counterparts. First, note that any matrix can be embedded into an orthogonal one of double the size, so we do not consider this constraint very important. Moreover, while the pyramid circuit can cover all possible classical orthogonal matrices [28], the butterfly circuit can further limit this expressivity since it has fewer parameters. It is an interesting future direction to explore the set of matrices that can be simulated by these butterfly circuits.

In classical orthogonal deep learning, training the weight matrices while keeping them orthogonal was a major difficulty [21]. However after the introduction of quantum-inspired orthogonal matrices such as in [20, 28] the classical orthognal training has also become efficient: $O(n^2)$ complexity, which corresponds to the number of tunable parameters. Quantumly, with a pyramid circuit the number of gates is also $O(n^2)$ , with a depth of O(n), depending on the hardware connectivity and on how the classical control of those parametrized is applied. For a butterfly circuit, the depth is $O(\log n)$ . Finally, for the Composite quantum FNO, computation is performed on a higher hamming weight sub-basis of the Hilbert space, therefore the gap between classical and quantum complexity increases, opening a path for larger separations [25].

Sequential vs. Parallel vs. Composite. For noiseless cases, like usual supervised learning classical simulation, we believe that the Parallel Circuit should be at par with the Sequential Circuit. However, on a quantum hardware having noise, in a complicated task where deeper networks are required, we believe the Parallel one can significantly outperform the Sequential one. For the Compound Circuit, we believe that if it is tuned properly, it can perform at par with the Sequential one due to its higher expressivity. However, for highly complex tasks, the Compound Circuit's training might be more difficult and computationally expensive where it might suffer in performance.

In terms of expressivity, we showed that Sequential and Parallel circuits are performing almost similar operations and require the same number of parameters as well. They would be exactly the same if the Parallel circuit measurement was done before the final IQFT, and a classical IFT was applied in the right way. That being said, it is hard to know which one is more expressive than the other at this point of the analysis. For the Composite circuit, we believe it can be more expressive since we do not limit the interaction between the qubits as in the first two circuits.

5. Experiments

This section analyses our proposed quantum algorithms for solving PDEs and image classification tasks. We compare them against the state-of-the-art in both the domains, i.e. classical Fourier networks (for PDEs) and CNNs (for image classification), for both tasks. All the details related to architecture and hyperparameters are provided at the end of the paper. All the experiments shown in this section are simulated, i.e. the quantum operations have been simulated using classical matrices corresponding to quantum unitaries, since the currently available quantum hardware is too noisy for circuits of such size. However, we expect the upcoming generations of quantum hardware to support experiments of such scale. Also, the Butterfly version of the Composite Circuit, i.e. Composite Circuit (Butterfly), has been used for all the experiments.

5.1. PDEs

We show results on all the three PDEs used in the classical FL paper [6]: Burgers' equation, Darcy's Flow equation and Navier–Stokes equation using the datasets proposed in that paper. All of these equations were designed for modelling the flow of fluids and have found their applications in other domains as well. We abstractly describe the three tasks. For equations and other details, please refer to [6]. We analyse the performance of the trained networks across different resolutions (N_s ) for the first two and for different viscosity values for the third.

Burgers' equation. It is a 1D-PDE for modelling fluid motion and is expressed as follows:

$\begin{equation} \begin{aligned} \partial_t u(x,t) + \partial_x (u^2(x,t)/2) & = \nu\partial_{xx}u(x,t) & t\in(0,1] \\ \text{where} \quad u(x,0) & = u_0(x) \quad\text{and} & x\in(0,1) \end{aligned} \end{equation} \tag{ 45 }$

with $\nu\in\mathbb{R}_+$ corresponding to the fluid viscosity and u₀ denoting the initial condition function for this PDE family. We need to learn the mapping from this u₀ to the function at time one $u(x,1)$ , for a given viscosity.

Figure 11(left) shows the comparison of relative error in estimating this mapping among the classical FL, classical CNNs and proposed quantum circuits for the FLs, across different resolutions. The quantum circuits perform comparably to the classical FL and are much better than classical CNNs.

Darcy's Flow equation. In this case, it is a 2D PDE with the following equation:

$\begin{equation} \begin{aligned} -\nabla\cdot\left(a\left(x\right)\nabla u\left(x\right)\right) & = f\left(x\right) \qquad x\in\left(0,1\right)^2\\ u\left(x\right) & = 0 \qquad \quad x\in \partial\left(0,1\right)^2 \end{aligned} \end{equation} \tag{ 46 }$

where a(x) is the diffusion coefficient, f(x) is the forcing function and u(x) is the solution function. The aim here is to learn the mapping $a\mapsto u$ given the forcing function f(x). All of them are functions of positional coordinates only. Figure 11(right) shows the relative error for the 2D-version of all the methods in solving this PDE, across different resolutions. Here also the three quantum circuits and the classical FL show similar performance, consistent across resolutions, and the CNNs show much worse results, with their error increasing with resolution.

Navier–Stokes equation. We now consider the 2D Navier–Stokes equation which is as follows:

$\begin{equation} \begin{aligned} \partial_t w(x&,t) + u(x,t)\cdot\nabla w(x,t) = \nu\delta w(x,t) + f(x), \\ &\nabla\cdot u(x,t) = 0 \qquad w(x,0) = w_0(x)\\ &\quad \quad x \in (0,1)^2 \qquad t\in(0,T] \end{aligned} \end{equation} \tag{ 47 }$

where w corresponds to vorticity, w₀ being the initial vorticity, ν is the viscosity, u is a velocity field and f(x) is some sort of a forcing function. The aim here is to model the fluid vorticity up to instant $T(\gt\!\!10)$ given the vorticity up to time 10. Figure 12(a) shows the performance comparison for this equation between our proposed circuits and classical methods. It shows the convergence comparison for this family with viscosity ν fixed to $\ 1\times 10^{-3}$ for all the methods. Here, again, it can be observed that all the proposed circuits and the classical Fourier method perform significantly better than CNNs. Also, from table 2, the sequential circuit performs similarly to classical method and the others converge at a slightly higher error.

**Figure 12.** Left: convergence comparison for the Navier–Stokes equation with $\ 1\times 10^{-3}$ , trained for 500 epochs. Right: performance comparison of the CNNS, classical Fourier layer and the proposed quantum circuits on the MNIST dataset. It can be observed that all of them perform quite similarly, classical CNNs being the best.
Download figure:
Standard image High-resolution image

**Figure 12.** Left: convergence comparison for the Navier–Stokes equation with $\ 1\times 10^{-3}$ , trained for 500 epochs. Right: performance comparison of the CNNS, classical Fourier layer and the proposed quantum circuits on the MNIST dataset. It can be observed that all of them perform quite similarly, classical CNNs being the best.
Download figure:
Standard image High-resolution image

Table 2. Comparison of parameters required by one layer of the proposed circuits and the existing classical Fourier Layer along with error analysis for different ν and T values for the 2D case of a Navier–Stokes equation.

Method	Classical FNO	Sequential QFNO	Parallelised QFNO	Composite QFNO
Parameters	294 912	23 040	23 040	6144
$\nu = 1\times 10^{-3}$ ; T = 50	0.0139	0.0148	0.0167	0.0186
$\nu = 1\times 10^{-4}$ ; T = 30	0.1603	0.1618	0.1633	0.1660
$\nu = 1\times 10^{-5}$ ; T = 20	0.1601	0.1615	0.1626	0.1638

5.2. Image classification

We further compare our proposed Quantum algorithms on the downstream image classification tasks on benchmark datasets including the MNIST, FashionMNIST [30] and PneumoniaMNIST [29] datasets. The MNIST dataset consists of grayscale images corresponding to digits from 0 to 9 (both included), having a resolution of $28\times 28$ . The task is to predict the digit for a given input image. Similarly, for FashionMNIST also there is a 10-way classification task into various categories of clothing, using $28\times 28$ grayscale images. On the other hand, PneumoniaMNIST involves a binary classification task into positive or negative for a given grayscale image.

For this comparison, we have used the 2D versions of all the architectures–our quantum FNO, the classical FNO and the CNNs. Figures 12(b) and 13 show the epochs (x-axis) v/s accuracy (y-axis) plot for this evaluation. It can be observed that our proposed algorithms outperform the classical FNO and converge in close proximity (w.r.t. accuracy) to the classical CNNs, with Sequential QFL being almost comparable to the classical CNNs. This shows that the proposed QFLs are comparable in performance to the state-of-art in the vision domain as well along with solving PDEs, thereby broadening the scope of their applicability.

**Figure 13.** Left: performance comparison of the CNNS, classical Fourier layer and the proposed quantum circuits on the Pneumonia-MNIST [29] dataset. The performance of CNNs is somewhat noisy here whereas it is smoother in the case of the sequential circuit, both converging to a similar value. The composite quantum circuit and the classical Fourier baseline are also quite close to the CNNs in convergence. Right: same comparison on the FashionMNIST [30] data. Here, a significant difference in the performance is observed with CNNs being the best followed by the Sequential circuit.
Download figure:
Standard image High-resolution image

6. Conclusion

We proposed a quantum algorithm to carry out the recently proposed classical FNO on quantum hardware. We further proposed two more quantum algorithms, which perform a different operation than the classical one and can be much more efficiently deployed on noisy quantum hardware. Experimental results further confirm that the proposed quantum neural networks perform efficiently in both solving PDEs and image classification. The sequential network quite matches the best-performing classical algorithm (which is the CNNs for images and the classical FL for PDEs) on both tasks.

An interesting future direction can be further developing the learning process of the composite network so that it can perform better than the sequential network while at the same time being more efficient to deploy. The composite network intuitively performs a kind of attention mechanism (as done in [20]) that learns how to mix the mappings performed on each of the top K modes and understanding how this mixing works can provide new ideas for improving it.

Data availability statement

The data that support the findings of this study are openly available at the following URL/DOI: https://medmnist.com/.

Quantum Fourier networks for solving parametric PDEs

Article metrics

Submit

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

1.1. Fourier neural network

1.2. Quantum algorithmic proposals

1.3. Contributions

2. Classical FNO

3. Quantum algorithmic tools

3.1. Data encoding in the unary basis

3.2. Unary QFT

3.3. Quantum circuits as trainable linear transforms

4. Quantum circuits for FL

4.1. Sequential circuit for FL

4.2. Parallelised circuit for FL

4.3. Composite circuit for FL

4.4. Learning and expressivity

5. Experiments

5.1. PDEs

5.2. Image classification

6. Conclusion

Data availability statement

Quantum Fourier networks for solving parametric PDEs

Article metrics

Submit

Share this article

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

1.1. Fourier neural network

1.2. Quantum algorithmic proposals

1.3. Contributions

2. Classical FNO

3. Quantum algorithmic tools

3.1. Data encoding in the unary basis

3.2. Unary QFT

3.3. Quantum circuits as trainable linear transforms

4. Quantum circuits for FL

4.1. Sequential circuit for FL

4.2. Parallelised circuit for FL

4.3. Composite circuit for FL

4.4. Learning and expressivity

5. Experiments

5.1. PDEs

5.2. Image classification

6. Conclusion

Data availability statement