QTN-VQC: An End-to-End Learning Framework for Quantum Neural Networks

The advent of noisy intermediate-scale quantum (NISQ) computers raises the crucial challenge of designing quantum neural networks for fully quantum learning tasks. To bridge the gap, this work proposes an end-to-end learning framework named QTN-VQC, which introduces a trainable quantum tensor network (QTN) for quantum embedding on a variational quantum circuit (VQC). The architecture of QTN is composed of a parametric tensor-train network for feature extraction and a tensor product encoding for quantum embedding. We highlight QTN for quantum embedding from two perspectives: (1) we theoretically characterize QTN by analyzing its representation power over input features; (2) QTN enables an end-to-end parametric model pipeline, namely QTN-VQC, from the generation of quantum embeddings to the output measurement. Our experiments on the MNIST dataset demonstrate the advantages of QTN for quantum embedding over other quantum embedding approaches.


Introduction
The state-of-the-art machine learning (ML), particularly based on deep neural networks (DNN), has enabled a wide spectrum of successful applications ranging from the everyday deployment of speech recognition [1] and computer vision [2] through to the frontier of scientific research in synthetic biology [8]. Despite rapid theoretical and empirical progress in DNN-based regression and classification [9], DNN training algorithms are computationally expensive for many new scientific applications, such as new drug discovery [10], which require computational resources beyond the limits of classical hardware [11]. Fortunately, the imminent advent of quantum computing devices opens up new possibilities of exploiting quantum machine learning (QML) [12,13,14,15,16,17] to improve the computational efficiency of ML algorithms in these new scientific domains.
Although the exploitation of quantum computing devices to carry out QML is still in its initial exploratory stage, the rapid development of quantum hardware has motivated advances in quantum neural networks (QNN) that run on noisy intermediate-scale quantum (NISQ) devices [18,19,20,21]. On a NISQ device, not enough qubits can be spared for quantum error correction, so the imperfect qubits must be used directly at the physical layer. Even so, a compromise QNN approach has been proposed that employs hybrid quantum-classical models relying on the optimization of variational quantum circuits (VQC) [22,23]. The resilience of VQC-based models to certain types of quantum noise errors, together with their high flexibility concerning coherence time and gate requirements [24], admits many practical implementations of QNN on NISQ devices [25,26,27,28,29,30,31,32]. One notable limitation of the current QNN training pipeline is that the quantum embedding is not fully realizable on a quantum computer, which may impede the learning of the QNN. Hence, this work proposes QTN-VQC to enable an end-to-end trainable QNN, from data embedding to quantum measurement, that is easily realizable on quantum devices, where QTN stands for the quantum tensor network [33,34,12,35] for generating quantum embedding.
As shown in Figure 1, our QNN builds a unitary linear operator that consists of three main components: (1) quantum embedding generation; (2) variational quantum circuit; (3) measurement. Quantum embedding generation, also known as quantum encoding, applies a fixed unitary linear operator H_x transforming classical vectors x to quantum states |x⟩ in a Hilbert space. This step is an important aspect of designing quantum algorithms that directly impacts the overall computational cost of the VQC, and it exploits the property of quantum superposition. Moreover, the VQC comprises two types of quantum gates: (1) Controlled-NOT (CNOT) gates; (2) learnable parametric quantum gates. The CNOT gates ensure the property of quantum entanglement by mutually connecting the qubits, and the parametric quantum gates can be adjusted to best fit the quantum input states. The model parameters of the VQC are optimized by variants of gradient descent algorithms during the training process. These parametric quantum gates of the VQC are analogous to the weights of a DNN, and such quantum circuits have been shown to be resilient to quantum noise [36,21,37]. Besides, the measurement M(|g_θ(x)⟩) projects the quantum output states |g_θ(x)⟩ onto classical outputs z_i.
This work focuses on quantum embedding generation because it is closely related to practical usage in machine learning applications in terms of computational cost and the representation capability of classical input features. In particular, we design a novel quantum tensor network (QTN) for quantum embedding generation. More specifically, the QTN consists of a tensor-train network (TTN) for dimension reduction and a quantum tensor encoding framework for outputting quantum embeddings. The dimension reduction is a necessary procedure before quantum encoding because only a small number of qubits can be supported on currently available NISQ computers. A typical approach for dimension reduction relies on a classical fully-connected layer, also known as a dense layer, to convert high-dimensional input vectors y into low-dimensional ones x. However, since a dense layer cannot be physically mapped onto a quantum computer, much overhead is incurred by frequent communication between classical and quantum devices during the end-to-end training pipeline.
As shown in Figure 2 (b), one of our contributions is to leverage a tensor-train network (TTN) to replace the dense layer in Figure 2 (a). The benefits of applying TTN arise from two aspects: (1) TTN can maintain the representation power of the dense layer, which is justified by our theorems; (2) TTN is a tensor network and can be flexibly placed on quantum computers, which enables an end-to-end training process fully conducted on a quantum computer. Moreover, in this work, a tensor product encoding (TPE) is carefully designed for generating quantum embeddings, which builds the relationship between a classical vector x and the corresponding quantum state |x⟩. Besides, we further investigate the representation power of QTN-VQC in terms of model size and the non-linear activation function used in the TTN. We denote a QTN as the combination of TTN and TPE and utilize QTN-VQC as a genuine end-to-end learning framework for QNN.

Related Work
The works [14,12,38] demonstrate that VQC shows great promise in surpassing the performance of classical ML. Prominent examples of VQC-based models include the quantum approximate optimization algorithm (QAOA) [36] and quantum circuit learning (QCL) [23]. Various architectures and geometries of VQC have been shown in tasks ranging from image classification [39,40,41] to reinforcement learning [25].
As for quantum embedding, basis encoding is the process of associating classical input data in the form of binary strings with the computational basis states of a quantum system [42]. Similarly, amplitude encoding is a technique that encodes data into the amplitudes of a quantum state [43]. Unfortunately, the computational cost of both basis encoding and amplitude encoding becomes exponentially expensive with an increasing number of qubits [15].
A newer technique, angle embedding, makes use of quantum gates to generate quantum states [44], but it cannot deal with high-dimensional feature inputs. Therefore, this work exploits the use of TTN for dimension reduction, followed by a TPE for generating quantum embeddings.
In particular, this work employs the TTN for dimensionality reduction. The TTN model based on TT decomposition in neural networks was first proposed in [45], and it can be flexibly extended to the convolutional neural network (CNN) [46] and the recurrent neural network (RNN) [47]. Empirical studies of TTN on machine learning tasks show that TTN is capable of maintaining DNN baseline results [48,49,50,51]. However, to the best of our knowledge, no existing works have applied TTN to QML. Besides, since tensor-network-based machine learning models like TTN are closely related to quantum machine learning in terms of their model structures [5,6], the QTN-VQC model can be directly regarded as a classical simulation of the corresponding quantum machine learning model. In addition to a classical dense layer, more complicated architectures like AlexNet [3] can be used for dimension reduction, and we also compare the performance between TTN-based and AlexNet-based models.

Notations
We denote R^I as an I-dimensional real coordinate space, and R^{I_1 × I_2 × ··· × I_K} refers to a space of K-order tensors. The quantum gate R_Y(θ) denotes a Pauli-Y rotation whose unitary operator is defined in Eq. (1); it rotates a qubit about the Y-axis of the Bloch sphere by a given angle θ:

R_Y(θ) = [[cos(θ/2), −sin(θ/2)], [sin(θ/2), cos(θ/2)]].    (1)
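As a quick numerical sanity check of this notation (our own illustration, not part of the paper), the NumPy sketch below builds R_Y(θ), verifies that R_Y(2v)|0⟩ yields the scalar encoding cos(v)|0⟩ + sin(v)|1⟩ used later, and forms the tensor product |0⟩^{⊗S}:

```python
import numpy as np

def ry(theta):
    """Pauli-Y rotation matrix, as defined in Eq. (1)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

ket0 = np.array([1.0, 0.0])  # |0>

# For a scalar v, R_Y(2v)|0> = cos(v)|0> + sin(v)|1>
v = 0.3
ket_v = ry(2 * v) @ ket0
print(ket_v)  # [cos(0.3), sin(0.3)]

# The tensor product |0>^{tensor S} of S states is built with np.kron
S = 3
state = ket0
for _ in range(S - 1):
    state = np.kron(state, ket0)
print(state.shape)  # (8,), i.e., 2**S amplitudes
```

The `ry` helper is a hypothetical name for illustration; any statevector library provides the same gate.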
Moreover, the operator ⊗ denotes the tensor product. Given vectors v_i ∈ R^I, the tensor product of I such vectors, ⊗_{i=1}^I v_i, is an I-order tensor whose (j_1, ..., j_I)-th entry equals the product v_1(j_1) v_2(j_2) ··· v_I(j_I). Similarly, the symbol |0⟩^{⊗S} denotes the tensor product of S copies of the quantum state |0⟩. Furthermore, for a scalar v, the quantum state |v⟩ can be written as:

|v⟩ = R_Y(2v)|0⟩ = cos(v)|0⟩ + sin(v)|1⟩.    (2)

QTN-VQC: Our Proposed End-to-End Learning Framework

This section introduces our proposed end-to-end learning framework, namely QTN-VQC. As shown in Figure 3, the QTN model includes two components, (a) TTN and (b) TPE, which are separately introduced in Section 4.1 and Section 4.2. Moreover, Figure 4 illustrates the framework of the VQC, and Section 4.3 is devoted to discussing its details.
Figure 3: A demonstration of the quantum tensor network for quantum embedding. In the tensor-train network diagram, given a set of TT-ranks, a circle represents a core tensor and each line corresponds to a dimension.

Tensor Train Network for Dimension Reduction
We leverage TTN [52] for the dimension reduction of input features. TTN relies on the TT decomposition [45] and has been commonly employed in machine learning tasks such as speech processing [53] and computer vision [50]. The TT decomposition assumes that, given a set of TT-ranks {r_0, r_1, ..., r_K} with r_0 = r_K = 1, a K-order tensor X ∈ R^{I_1 × I_2 × ··· × I_K} can be factorized as:

X(i_1, i_2, ..., i_K) = G_1[i_1] G_2[i_2] ··· G_K[i_K],    (3)

where each G_k[i_k] ∈ R^{r_{k-1} × r_k} is a matrix slice of the k-th core tensor, so that the product X(i_1, ..., i_K) is a scalar value. TTN employs the TT decomposition in a dense layer and is explicitly demonstrated in Figure 3 (a). In more detail, for an input tensor X ∈ R^{I_1 × ··· × I_K} and trainable core tensors W_k with slices W_k[i_k, j_k] ∈ R^{r_{k-1} × r_k}, the output tensor Y ∈ R^{J_1 × ··· × J_K} is computed as:

Y(j_1, j_2, ..., j_K) = Σ_{i_1=1}^{I_1} ··· Σ_{i_K=1}^{I_K} W_1[i_1, j_1] W_2[i_2, j_2] ··· W_K[i_K, j_K] X(i_1, i_2, ..., i_K).    (4)

A non-linear activation function, e.g., Sigmoid, Tanh, or ReLU, is then imposed upon the tensor Y. Compared with a dense layer with ∏_{k=1}^K I_k J_k parameters, a TTN owns as few as Σ_{k=1}^K r_{k-1} r_k I_k J_k parameters. When a TTN is utilized for dimension reduction, the high-dimensional input vector x ∈ R^I is first reshaped into a tensor X ∈ R^{I_1 × I_2 × ··· × I_K}, which is represented in the TT format and goes through the TTN. The output of the TTN is a tensor Y ∈ R^{J_1 × J_2 × ··· × J_K}, which is further reshaped into a lower-dimensional vector y ∈ R^J. Here, we define ∏_{k=1}^K I_k = I and ∏_{k=1}^K J_k = J. Moreover, the computational complexities of a TTN and the related dense layer are on the same scale, as discussed in [50].
Eq. (4) suggests that TTN is a multi-dimensional extension of a dense layer, where the trainable weight matrix of the dense layer is replaced by the learnable core tensors. Additionally, many empirical studies demonstrate that a TTN is capable of maintaining the baseline results of a dense layer [53,50,52,48]. More significantly, since TTN can be flexibly mapped into a quantum circuit, the quantumness inherent in TTN brings great advantages over other architectures like the dense layer. In other words, although TTN is treated classically here, it is possible to substitute equivalent quantum circuits for TTN when more qubits become available [27], which implies that QTN-VQC stands for a genuine end-to-end QNN learning architecture on a quantum computer. Furthermore, gradient exploding and vanishing are serious issues in TTN training. To avoid these problems, we only consider 3-order core tensors and small TT-ranks to configure a simple TTN in our experimental simulations. Our theoretical analysis of QTN-VQC, based on Theorem 3 in Section 5, suggests that the representation power is not related to the TT-ranks or the tensor order K; thus, small TT-ranks and a small tensor order K are preferred. In particular, a lower K can significantly reduce the computational cost and speed up the convergence rate.
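To make the TT layer concrete, here is a minimal NumPy sketch of the contraction in Eq. (4), using the MNIST shapes from our experimental setup (784 = 7 × 16 × 7 inputs, 8 = 2 × 2 × 2 outputs, TT-ranks {1, 2, 2, 1}); the random core initialization and einsum ordering are illustrative choices, not the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Input modes, output modes, and TT-ranks from the MNIST configuration.
I, J, ranks = (7, 16, 7), (2, 2, 2), (1, 2, 2, 1)

# One core per mode: W_k has shape (r_{k-1}, I_k, J_k, r_k).
cores = [rng.normal(scale=0.1, size=(ranks[k], I[k], J[k], ranks[k + 1]))
         for k in range(3)]

def ttn_forward(x):
    """TT layer: reshape a 784-dim vector into a 7x16x7 tensor, contract with cores."""
    t = x.reshape(I)
    # Contract mode by mode; a..d are bond (rank) indices, i/j/k input modes, p/q/r output modes.
    y = np.einsum('aipb,ijk->apbjk', cores[0], t)     # -> (1, J1, r1, I2, I3)
    y = np.einsum('bjqc,apbjk->apqck', cores[1], y)   # -> (1, J1, J2, r2, I3)
    y = np.einsum('ckrd,apqck->apqrd', cores[2], y)   # -> (1, J1, J2, J3, 1)
    z = y.reshape(-1)                                  # 8-dimensional output vector
    return 1.0 / (1.0 + np.exp(-z))                    # Sigmoid, as in our TTN

x = rng.normal(size=784)
out = ttn_forward(x)
print(out.shape)  # (8,)

# Parameter count: sum_k r_{k-1} r_k I_k J_k, far fewer than a 784x8 dense layer.
n_params = sum(c.size for c in cores)
print(n_params)  # 184
```

With these shapes the TT layer holds only 184 parameters, versus 6272 for the corresponding 784 × 8 dense layer.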

Tensor Product Encoding
In this subsection, we first introduce Theorem 1, and then we derive our TPE associated with the circuit in Figure 3 (b).
Theorem 1. Given a classical vector x = [x_1, x_2, ..., x_I]^T ∈ R^I, the TPE shown in Figure 3 (b) results in a quantum state |x⟩ with the following complete vector representation:

|x⟩ = ⊗_{i=1}^I (cos(x_i)|0⟩ + sin(x_i)|1⟩).    (5)

Proof. Since each element x_i of the vector x can be encoded as |x_i⟩ = cos(x_i)|0⟩ + sin(x_i)|1⟩, the quantum state |x⟩ can be written as the tensor product |x⟩ = |x_1⟩ ⊗ |x_2⟩ ⊗ ··· ⊗ |x_I⟩. When the vector x goes through the quantum tensor network, each qubit is rotated from |0⟩ by the gate R_Y(2x_i), which implies:

|x⟩ = ⊗_{i=1}^I R_Y(2x_i)|0⟩ = ⊗_{i=1}^I (cos(x_i)|0⟩ + sin(x_i)|1⟩).

The preceding equation, in turn, implies Eq. (5).
Theorem 1 builds a connection between the vector x and the quantum state |x⟩, and the resulting |x⟩ is taken as the quantum embedding fed into the VQC. Since ⊗_{i=1}^I R_Y(2x_i) is a reversible unitary linear operator, no information is lost during the stage of quantum encoding. Furthermore, if the input is multiplied by the constant π/2, we obtain:

|x⟩ = ⊗_{i=1}^I R_Y(π x_i)|0⟩ = ⊗_{i=1}^I (cos(π x_i / 2)|0⟩ + sin(π x_i / 2)|1⟩),

which corresponds to Figure 3 (b).
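The tensor product encoding of Theorem 1 can be simulated classically in a few lines; the sketch below (our own illustration) builds the full 2^I-dimensional state vector for a 4-dimensional input and confirms that the encoding is norm-preserving, consistent with the no-information-loss argument above:

```python
import numpy as np

def tpe(x):
    """Tensor product encoding (Theorem 1): |x> = tensor_i (cos x_i|0> + sin x_i|1>)."""
    state = np.array([1.0])
    for xi in x:
        state = np.kron(state, np.array([np.cos(xi), np.sin(xi)]))
    return state

x = np.array([0.1, 0.4, 0.7, 1.0])
psi = tpe(x)
print(psi.shape)             # (16,) -- 2**4 amplitudes for 4 qubits
print(np.linalg.norm(psi))   # 1.0 -- each factor is a unit vector, so the state is normalized

# Scaling the input by pi/2, as in the text, keeps the same form with angles pi*x_i/2.
psi_scaled = tpe(np.pi / 2 * x)
```

Each qubit's amplitudes (cos x_i, sin x_i) can be read back from single-qubit marginals, reflecting the reversibility of the encoding.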

The Framework of Variational Quantum Circuit
The framework of the VQC is shown in Figure 4 (a), where 4 qubit wires are taken into account; the CNOT gates mutually entangle the channels so that |x_1⟩, |x_2⟩, |x_3⟩, and |x_4⟩ lie in the same entangled state. The rotation gates R_X(•), R_Y(•), and R_Z(•) with learnable parameters (α_1, β_1, γ_1), (α_2, β_2, γ_2), (α_3, β_3, γ_3), and (α_4, β_4, γ_4) constitute the learnable part. Similar to the unitary operator R_Y(α), the operators R_X(β) and R_Z(γ), defined in Figure 4 (b), are associated with rotations about the X-axis and Z-axis by the given angles β and γ, respectively. Besides, the quantum circuit in the dashed square can be repeatedly copied to compose a deeper architecture. The outputs of the VQC are connected to the measurement, which projects the quantum states onto a certain quantum basis, yielding a classical scalar z_i.
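The layer just described can be sketched as a small statevector simulation. The NumPy code below is our own illustration (it entangles adjacent wires only; the exact CNOT wiring of Figure 4 may differ) and checks that one VQC layer is unitary and that a Pauli-Z measurement yields a classical scalar in [-1, 1]:

```python
import numpy as np

n = 4  # number of qubit wires, as in Figure 4 (a)
I2 = np.eye(2)

def rx(t): return np.array([[np.cos(t/2), -1j*np.sin(t/2)],
                            [-1j*np.sin(t/2), np.cos(t/2)]])
def ry(t): return np.array([[np.cos(t/2), -np.sin(t/2)],
                            [np.sin(t/2),  np.cos(t/2)]])
def rz(t): return np.diag([np.exp(-1j*t/2), np.exp(1j*t/2)])

def on_wire(gate, w):
    """Lift a single-qubit gate to the full n-qubit register."""
    full = np.eye(1)
    for k in range(n):
        full = np.kron(full, gate if k == w else I2)
    return full

CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])

def cnot_adjacent(w):
    """CNOT with control w and target w + 1 (adjacent wires only)."""
    full, k = np.eye(1), 0
    while k < n:
        if k == w:
            full = np.kron(full, CNOT); k += 2
        else:
            full = np.kron(full, I2); k += 1
    return full

def vqc_layer(state, params):
    """One VQC layer: a chain of CNOTs, then R_X, R_Y, R_Z on each qubit."""
    for w in range(n - 1):
        state = cnot_adjacent(w) @ state
    for w in range(n):
        a, b, g = params[w]  # (alpha_w, beta_w, gamma_w)
        state = on_wire(rz(g) @ ry(a) @ rx(b), w) @ state
    return state

rng = np.random.default_rng(0)
params = rng.normal(size=(n, 3))
state = np.zeros(2**n, dtype=complex); state[0] = 1.0  # |0>^{tensor 4}
out = vqc_layer(state, params)

# Measurement: Pauli-Z expectation on qubit 0, a classical scalar z_0 in [-1, 1].
Z = np.diag([1.0, -1.0])
z0 = np.real(out.conj() @ (on_wire(Z, 0) @ out))
print(round(np.linalg.norm(out), 6))  # 1.0 -- the layer is unitary
```

Stacking `vqc_layer` calls corresponds to copying the dashed-square block to build a deeper circuit.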
As for the end-to-end training paradigm of QTN-VQC, the learnable parameters come from the VQC and TTN models, and they are updated by applying the back-propagation algorithm with the Adam optimizer. Given D qubits and H depths, there are in total 3DH trainable parameters for the VQC. Consequently, there are Σ_{k=1}^K r_{k-1} r_k I_k J_k + 3DH parameters for QTN-VQC. In contrast, the Dense-VQC model possesses more model parameters than QTN-VQC.
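The parameter counts above can be checked directly with the shapes used in our MNIST experiments; the following short sketch (our own bookkeeping) reproduces the model sizes reported in Table 1:

```python
# TTN with input modes 7 x 16 x 7, output modes 2 x 2 x 2, TT-ranks {1, 2, 2, 1};
# VQC with D = 8 qubits and H = 6 layers (3 rotation angles per qubit per layer).
I, J, r = (7, 16, 7), (2, 2, 2), (1, 2, 2, 1)

ttn_params = sum(r[k] * r[k + 1] * I[k] * J[k] for k in range(3))  # sum r_{k-1} r_k I_k J_k
vqc_params = 3 * 8 * 6                                             # 3DH
dense_params = 784 * 8 + vqc_params                                # Dense-VQC: weight matrix + VQC

print(ttn_params, vqc_params, ttn_params + vqc_params, dense_params)
# 184 144 328 6416
```

The totals 328 and 6416 match the QTN-VQC and Dense-VQC model sizes discussed in the experiments.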

Characterizing Representation Power of QTN-VQC
This section focuses on analyzing the representation power of QTN-VQC. As shown in Figure 5, given D qubits and a target quantum state |z⟩ = ⊗_{d=1}^D |z_d⟩, since H_θ is a linear operator and T_x is defined as a definite mapping from the input x to the unitary matrix U_x, the representation power of QTN-VQC is determined by how well the TTN can approximate the classical vector T_x^{-1}(H_θ^{-1}|z⟩). To understand the expressiveness of TTN, we first discuss the expressive capability of Dense-VQC (where a dense layer is taken for dimension reduction) and then generalize it to QTN-VQC. Based on the universal approximation theorem [54,55] for feed-forward neural networks, we derive the following theorem:

Theorem 2. Given a target vector T_x^{-1}(H_θ^{-1}|z⟩), there exists a feed-forward neural network f_dense with a dense layer connecting to D qubits such that

|| f_dense(x) − T_x^{-1}(H_θ^{-1}|z⟩) ||_2 ≤ 2C / √D,

where the activation function tanh(•) is imposed upon the dense layer and C is a constant associated with the target vector T_x^{-1}(H_θ^{-1}|z⟩).

Since TTN is a compact TT representation of a dense layer, by modifying Theorem 2 for TTN, we can also derive an upper bound on the approximation error as follows:

Theorem 3. Given a target vector T_x^{-1}(H_θ^{-1}|z⟩), there exists a TTN, denoted as f_TTN, with a TT layer connecting to D qubits such that

|| f_TTN(x) − T_x^{-1}(H_θ^{-1}|z⟩) ||_2 ≤ 2C / √(∏_{k=1}^K D_k),

where ∏_{k=1}^K D_k = D, the Sigmoid activation function is imposed upon the TTN model, K denotes the multi-dimensional order, and C is a constant associated with the target vector T_x^{-1}(H_θ^{-1}|z⟩).
Comparing the two upper bounds, we observe that TTN attains an upper bound identical to that of the dense layer on the approximation error because ∏_{k=1}^K D_k = D, which implies that TTN can at least maintain the representation power of a dense layer. Besides, the number of qubits D is a key factor determining the upper bound on the approximation error: a larger number of qubits D is expected to further improve the representation power of QTN-VQC. However, D is a small fixed number on a NISQ device, and the computational cost of classical simulation grows exponentially with the number of qubits, so a small number of qubits has to be considered in practice.
Experiments and Results

Experimental setups
We assess our QTN-VQC based end-to-end learning system on the standard MNIST dataset. MNIST is a 10-digit classification task in which 50000 and 10000 28 × 28 images are assigned for training and testing, respectively. The full MNIST dataset is challenging for quantum machine learning algorithms, and many works only consider 2-digit classification on the MNIST task [7,40]. Moreover, the images are reshaped into 784-dimensional input vectors. Dense-VQC and PCA-VQC are taken as our experimental baselines against the QTN-VQC model: Dense-VQC denotes that a dense layer is used for dimension reduction, and PCA-VQC refers to using principal component analysis (PCA) to extract low-dimensional features before training the VQC parameters.
As for the experiments on QTN-VQC, the images are reshaped into 3-order 7 × 16 × 7 tensors. We set small TT-ranks {1, 2, 2, 1} to reduce the computational cost of the TTN, and the image data are represented in the TT format according to Eq. (3) before going through the TTN model. Since 8 qubits are used for the quantum encoding, the output of the TTN is configured in the tensor format 2 × 2 × 2, which results in 8-dimensional output vectors. Besides, the model parameters of QTN-VQC are randomly initialized from a Gaussian distribution, and the back-propagation algorithm is applied to train the models. The Sigmoid function is utilized for the hidden layer of the TTN.
To be consistent with QTN-VQC, the weight matrix of the dense layer in Dense-VQC is configured with shape 784 × 8. Although Dense-VQC is a hybrid classical-quantum model, its training process can also be set up as an end-to-end pipeline in which the weights of the dense layer are updated during the training stage; the Sigmoid function is used for the dense layer. On the other hand, PCA is employed to reduce the feature dimension to 8, and the resulting low-dimensional features are further encoded into quantum states. Consequently, PCA-VQC admits only the VQC parameters to be updated during the training stage. In addition, a standard AlexNet [4] is employed to constitute an AlexNet-VQC for performance comparison. Moreover, 6 VQC layers are stacked to form a deep model, and the outputs of the VQC are connected to 10 classes through a non-trainable matrix. The back-propagation algorithm based on the Adam optimizer with a learning rate of 0.001 is employed for model training. The cross-entropy (CE) loss is utilized as the objective function during the training stage and is also taken as a metric to evaluate model performance. We leverage the PennyLane [57] and PyTorch [58] toolkits to simulate the models. In particular, we separately simulate the model performance with noiseless quantum circuits and with noisy quantum circuits corrupted by quantum noise from IBM quantum machines.
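The fixed preprocessing of the PCA-VQC baseline can be sketched in a few lines of NumPy; this is an illustrative SVD-based PCA (the paper does not specify its PCA implementation), projecting flattened images onto the top 8 principal components before quantum encoding:

```python
import numpy as np

def pca_reduce(X, d=8):
    """Project rows of X onto their top-d principal components.
    Fixed, non-trainable preprocessing, as in the PCA-VQC baseline."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 784))  # stand-in for flattened 28x28 MNIST images
Z = pca_reduce(X, d=8)
print(Z.shape)  # (100, 8) -- ready for angle-style quantum encoding
```

Because this projection is fixed, only the VQC parameters receive gradients in PCA-VQC, in contrast to the trainable embeddings of Dense-VQC and QTN-VQC.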

Experimental Results of Noiseless Quantum Circuit
Table 1 shows the final results of the models on the test dataset. QTN-VQC owns far fewer model parameters than Dense-VQC (328 vs. 6416), yet attains higher classification accuracy (91.43% vs. 88.54%) and a lower loss value (0.3090 vs. 0.4132). In contrast, PCA-VQC, with 144 trainable VQC parameters, attains the worst performance on all metrics, which implies that a trainable quantum embedding is significant for boosting experimental performance. Although our empirical results cannot reach the state-of-the-art classification performance of classical ML algorithms, they demonstrate the advantages of QTN-VQC over the PCA-VQC and Dense-VQC counterparts. With the development of more powerful quantum devices supporting more qubits, the representation power of QTN-VQC can be improved and better experimental results could be attained. Moreover, AlexNet-VQC achieves better results than QTN-VQC (92.81% vs. 91.43%), but it involves many more model parameters than QTN-VQC.

Experimental Results of Noisy Quantum Circuit
To empirically validate the effectiveness of our proposed approach, we proceed with simulations of practical experiments on noisy quantum circuits. More specifically, we follow an established noisy-circuit experiment on a NISQ device suggested by [25]. One major advantage of this setup is that it allows us to observe the robustness and preserve the quantum advantages of a deployed VQC with physical settings close to quantum processing unit (QPU) experiments, without the queuing time for machine access. As for the detailed setup, we first use an IBM Q 20-qubit machine to collect channel noise in a real scenario for a deployed VQC and load the machine noise into our PennyLane-Qiskit simulator (denoted as Acc_q20). We also provide a depolarizing noisy-circuit simulation (denoted as Acc_depo) based on a depolarizing channel obtained from [59] with a noise level of 0.1. As shown in Table 2, the quantum noise brings about performance degradation for all models, but our proposed QTN-VQC consistently outperforms PCA-VQC and Dense-VQC under noisy quantum circuits. In particular, QTN-VQC can even outperform the AlexNet-VQC counterpart in noisy circuit conditions. The above experimental results show the advantages of QTN-VQC over Dense-VQC and PCA-VQC in scenarios with both noiseless and noisy quantum circuits. Next, we further discuss the representation power of QTN-VQC based on two factors: (1) the activation function used in the TTN; (2) the number of qubits.

The activation function used in TTN
Table 3 compares the results of QTN-VQC with different activation functions. Our simulations on noiseless quantum circuits show that non-linear activation functions bring more performance gain than a linear one, and the Sigmoid function attains better performance than the Tanh and ReLU counterparts in our experiments. These experiments also accord with the universal approximation theory for QTN-VQC in Theorem 3.

Finally, we investigate the effect of the number of qubits on the performance of QTN-VQC by increasing the qubits from 8 to 12 and 16. Accordingly, the output of the TTN is configured in the tensor formats 2 × 3 × 2 and 2 × 4 × 2, and the model size is increased from 328 to 464 and 600 parameters, respectively. Our experiments show that the baseline performance of QTN-VQC can be further improved by increasing the number of qubits, which implies that more qubits are likely to yield higher accuracy.

A.1 Proof for Theorem 2

Proof. Theorem 2 is derived by modifying the universal approximation theory proposed in [55,54]. The universal approximation theory is given in Lemma 1, which states that a feed-forward neural network with enough neurons can approximate any continuous function with arbitrarily small error.

Lemma 1. Given a continuous target function f : R^I → R, we can employ a 2-layer neural network f_J with a non-linear activation such that

|| f_J − f ||_2 ≤ 2C_f / √J,

where J denotes the number of neurons and C_f is a constant associated with f. In particular, for r ≥ 1, C_f satisfies the condition

C_f ≤ ∫_{R^I} ||ω||_2 |f̂(ω)| dω,

where f̂(ω) denotes the Fourier transform of f.

To associate Lemma 1 with our Theorem 2, the target function is replaced with the target vector T_x^{-1}(H_θ^{-1}|z⟩); then there exists a neural network with a dense layer connected to D qubits such that

|| f_dense(x) − T_x^{-1}(H_θ^{-1}|z⟩) ||_2 ≤ 2C / √D,

where C is related to the target vector T_x^{-1}(H_θ^{-1}|z⟩).
A.2 Proof for Theorem 3

Proof. Assume that X̂ = f_TTN(y), X = T_x^{-1}(H_θ^{-1}|z⟩), and the TT decomposition of the target vector is {X_1, X_2, ..., X_K}; then we obtain

|| f_TTN(y) − X ||_2 = || ⊗_{k=1}^K X̂_k − ⊗_{k=1}^K X_k ||_2.

On the other hand, we denote vec(Y_k) and vec(X_k) as the vectorizations of the tensors Y_k and X_k, respectively. We also define ∏_{k=1}^K I_k = I, W_k ∈ R^{D_k × I_k × r_{k-1} × r_k} as the TTN parameters, and W̄_k ∈ R^{I_k × r_{k-1} r_k D_k} as the matricization of W_k. Moreover, σ refers to a non-linear activation function.
Since vec(X̂_k) = σ(W̄_k^T vec(Y_k)) corresponds to a dense layer with D_k output units, applying Lemma 1 to each core we can obtain

|| vec(X̂_k) − vec(X_k) ||_2 ≤ 2C_k / √(D_k).

In sum, we can further obtain

|| f_TTN(y) − T_x^{-1}(H_θ^{-1}|z⟩) ||_2 ≤ ∏_{k=1}^K (2C_k / √(D_k)) = 2C / √(∏_{k=1}^K D_k),

where C is a constant determined by the per-core constants C_k and associated with the target vector.

B Appendix
This section includes additional experimental simulations.First, we assess the settings of TT-ranks, and then we compare the convergence rates of QTN-VQC and Dense-VQC in the experiments.

B.1 Experiments on TT-ranks for QTN-VQC
Table 5 corresponds to the experiments of QTN-VQC with 8 qubits and the Sigmoid function. The empirical results suggest that larger TT-ranks do not yield better results than smaller ones. One possible reason is that the TT-ranks correspond to a manifold, and there may exist an optimal manifold with smaller TT-ranks that corresponds to the best performance.

Figure 2: Different paradigms for quantum embedding. (a) A dense layer is used to generate a low-dimensional vector x from a high-dimensional one y; (b) a TTN is used for dimension reduction.
The symbols v ∈ R^I and W ∈ R^{I×J} represent a vector and a matrix, respectively. For the notations of quantum computing, ∀ v ∈ R^I, the symbol |v⟩ denotes a quantum state associated with a 2^I-dimensional vector in a Hilbert space. In particular, |0⟩ = [1 0]^T and |1⟩ = [0 1]^T.

Figure 4: The framework of the variational quantum circuit.

Figure 5: An illustration of analyzing the representation power of QTN-VQC.

Table 1: Empirical results on the MNIST test dataset under the noiseless quantum circuit setting.

Table 2: Empirical results on the MNIST test dataset under the noisy quantum circuit setting.

Table 3: Comparing performance of QTN-VQC with different activation functions.

Table 4: Comparing performance of QTN-VQC with more qubits.

Conclusions

This work proposes a genuine end-to-end learning framework for quantum neural networks, QTN-VQC. The QTN consists of a TTN for dimension reduction and a TPE framework for generating quantum embeddings. The TTN model is a compact representation of a dense layer that classically simulates quantum machine learning algorithms. Our theorem on the representation power of QTN-VQC shows that the number of qubits is inversely related to the approximation error of QTN-VQC and that the non-linear activation plays an important role. Our experiments compare our proposed QTN-VQC with AlexNet-VQC, Dense-VQC, and PCA-VQC. The simulated results demonstrate that QTN-VQC obtains better experimental performance than Dense-VQC and PCA-VQC with both noiseless and noisy quantum circuits, and it achieves only marginally worse performance than AlexNet-VQC. Besides, our results justify our theorem on the representation power of QTN-VQC.