Interpreting variational quantum models with active paths in parameterized quantum circuits

Variational quantum machine learning (VQML) models based on parameterized quantum circuits (PQC) are expected to offer a potential quantum advantage for machine learning (ML) applications. However, comparing VQML models with their classical counterparts is hard due to the lack of interpretability of VQML models. In this study, we introduce a graphical approach to analyze the PQC and the corresponding operation of VQML models to deal with this problem. In particular, we utilize the Stokes representation of quantum states to treat VQML models as network models built from the corresponding representations of basic gates. From this approach, we suggest the notion of active paths in the networks and relate the expressivity of VQML models to it. We investigate the growth of active paths in VQML models and observe that their expressivity can be significantly limited in certain cases. We then construct classical models inspired by our graphical interpretation of VQML models and show that they can emulate or outperform the outputs of VQML models in these cases. Our result provides a new way to interpret the operation of VQML models and facilitates the interconnection between the quantum and classical ML areas.

VQML models are motivated by the non-linear feature map of input data into a quantum state and focus on finding an advantage by considering the quantum Hilbert space as a feature space [26, 27]. It has been conjectured that a potential quantum advantage can be achieved if the model uses a quantum embedding circuit that cannot be simulated efficiently on classical computers. However, as benchmarks of VQML models on practical ML problems are not available with current NISQ hardware, this remains an open question [48]. One of the main obstacles is the lack of interpretability [49] of VQML models. Just as a classical algorithm processing quantum data is not well defined [50], VQML models are not easily interpretable. The large freedom in the structure of the PQC that a VQML model utilizes makes it difficult to analyze how the classical data is embedded into a quantum state and how the model processes the data for its inference. It has been suggested that the output of VQML models with the data re-uploading scheme [16] can be considered as a partial Fourier series with limited expressivity [17]. This approach inspired the development of techniques to improve the expressivity of VQML models [18] or to approximate their outputs classically [51]. There have also been attempts to understand decohered versions of VQML models based on tensor network structures as classical probabilistic graphical models [52, 53]. However, the actual operation of VQML models outside such limiting situations remains obscure, and tools to evaluate the expressivity of VQML models have not yet been investigated much.
In this paper, we introduce a graphical approach to model the operation of the PQC and interpret VQML models as large network models. With this graphical modeling, we analyze the output of a VQML model as a sum of active classical paths whose weights are determined by the input data or model parameters. We then suggest some classical models inspired by our graphical interpretation of VQML models and test their performance on various classical datasets. We show that these classical models emulate VQML models well and can therefore reproduce or outperform their outputs. This gives a new way of constructing quantum-inspired classical ML models.
We also relate the active paths to the expressivity of VQML models. Specifically, we evaluate the growth of the number of active paths for various types of PQC and show that the tendency is well aligned with our expectations. Our tool can be used to analyze the expressive power of VQML models and evaluate their capacity as the size of the models is increased.
We start from the fact that any quantum algorithm can be represented with a universal gate set [54]. Throughout this paper, we consider a set of single-qubit rotation gates (R_{σ_x}, R_{σ_y}, R_{σ_z}) and a two-qubit gate (CNOT) as the basic gate set:

R_{σ_j}(θ_j) = e^{−iθ_j σ_j/2},   (1)

where j ∈ {x, y, z} and σ_x, σ_y, and σ_z are the Pauli matrices. In section 2, we offer a brief overview of ML tasks, classical ML models, and VQML models. In section 3, we suggest the graphical interpretation of VQML models. We then analyze how the output of VQML models can be described. We also show how the capability of VQML models can be evaluated with an increasing number of qubits and circuit depth. In section 4, we illustrate some classical methods that can approximate VQML models. We compare the performance of these classical models with VQML models for various datasets. Finally, we summarize our results and discuss potential areas of study in which our approach can be utilized in section 5.
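As a concrete reference for the later derivations, the basic gate set above can be written down and sanity-checked numerically. The numpy sketch below uses the identity e^{−i(θ/2)σ} = cos(θ/2) I − i sin(θ/2) σ, which holds because σ² = I for any Pauli matrix.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)     # sigma_x
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)  # sigma_y
Z = np.array([[1, 0], [0, -1]], dtype=complex)    # sigma_z

def rot(sigma, theta):
    """R_sigma(theta) = exp(-i theta sigma / 2); since sigma^2 = I this
    equals cos(theta/2) I - i sin(theta/2) sigma."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * sigma

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# Every gate in the basic set is unitary.
for U in (rot(X, 0.7), rot(Y, 1.3), rot(Z, -2.1), CNOT):
    assert np.allclose(U.conj().T @ U, np.eye(len(U)))
```

Any circuit built from this set is a product of such matrices (with Kronecker products inserting identities on untouched qubits).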

ML models
Classical ML models can be categorized into three broad applications: supervised, unsupervised, and reinforcement learning [55]. In this paper, we focus on the first one: supervised ML. It aims to find a mapping between the input feature space X and the target space Y. In general, the learning model is trained and tested on a given dataset consisting of multiple pairs of input and target vectors {(x^(i), y^(i))}. Throughout this paper, we take the input space to be the d-dimensional real vector space R^d. We also consider problems with a one-dimensional target space for simplicity, but this can be easily generalized to multi-dimensional cases.

Quantum embedding
QML models for supervised ML first require a process called quantum embedding that converts the data into a quantum state. In most cases, this is done by a PQC [19], which has a fixed structure and adjustable gate parameters. By utilizing the data as gate parameters, we obtain an embedded quantum state defined by the circuit structure. There are popular structures of PQC such as the instantaneous quantum polynomial (IQP) circuit [56] and the hardware-efficient ansatz [57].
Quantum embedding is a critical part of VQML models since it significantly affects their capability to approximate non-linear functions. For example, it has been shown that if we use Hamiltonian encoding [17, 51] with the data re-uploading method [16], the output of a VQML model can be considered as a partial Fourier series. The embedding circuit consists of multiple layers of the same structure, and the Fourier domain is determined by the depth of the circuit and the form of the Hamiltonian. While the dimension of the Fourier domain can be exponentially large [18], VQML models can only access a constrained region of it [17].

Variational quantum models

QK model
The concept of a kernel is often used to overcome the limitations of linear models. The kernel function is defined as an inner product in a higher-dimensional feature space [30]:

k(x, x′) = ⟨φ(x), φ(x′)⟩,

where x, x′ ∈ X are input vectors and φ(x) maps the data x into a new vector in the feature space in a non-linear way; φ is typically called a feature map. A linear model in this high-dimensional feature space can then be utilized to learn non-linear relations in the original space. Simple models like the Nadaraya-Watson model [55] make predictions using a weighted sum of the kernel function, but in general it is used with some regularization. Well-known kernel-based methods are kernel ridge regression and the kernel support vector machine (SVM) [58].
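To make the kernel-trick mechanics concrete, here is a minimal kernel ridge regression sketch in numpy. The RBF kernel and the toy target sin(2x) are our stand-ins for a generic non-linear feature map; nothing here is specific to the quantum setting.

```python
import numpy as np

# Kernel ridge regression: a linear model in feature space phi(x),
# accessed only through k(x, x') = <phi(x), phi(x')>.
def rbf_kernel(A, B, gamma=1.0):
    # squared distances between all rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-np.pi, np.pi, size=(200, 1))
y = np.sin(2 * X[:, 0])                                 # non-linear target

K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(X)), y)   # dual weights
yhat = rbf_kernel(X, X) @ alpha                         # train predictions
```

The ridge term 1e-3 regularizes the fit; the prediction at a new point x is simply the kernel row k(x, X) times the dual weights.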
The QK uses the quantum Hilbert space as its feature space [26, 27]. We will consider a kernel function that gives a real value [32]:

k(x, x′) = |⟨ψ(x′)|ψ(x)⟩|²,   (2)

where |ψ(x)⟩ is the quantum state generated by the embedding circuit U_E(x). The inner product can be obtained with the Hadamard test [31], the swap test [32], or the quantum kernel estimation (QKE) method [27, 30]. After estimating the QK, it is passed to a classical model like an SVM. The classical model is then trained with the QK to yield predictions for test data [25, 28, 30-32].
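For intuition, kernels of the form in equation (2) can be simulated directly from statevectors. The single-qubit R_y angle encoding below is a hypothetical embedding chosen only for illustration, replacing the Hadamard/swap-test estimation used on hardware.

```python
import numpy as np

def embed(x):
    """Statevector of the hypothetical embedding R_y(x)|0>."""
    return np.array([np.cos(x / 2), np.sin(x / 2)])

def qkernel(x, xp):
    """k(x, x') = |<psi(x')|psi(x)>|^2, as in equation (2)."""
    return abs(np.vdot(embed(xp), embed(x))) ** 2

# For this embedding the overlap is cos((x - x')/2), so the kernel
# reduces analytically to cos^2((x - x')/2).
x, xp = 0.4, 1.1
assert np.isclose(qkernel(x, xp), np.cos((x - xp) / 2) ** 2)
```

The resulting Gram matrix over a dataset can then be handed to any classical kernel method, exactly as described above.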

QNN
Unlike the QK model, which only uses the embedding circuit, the QNN model requires additional PQC layers with trainable parameters. Similar to the classical neural network (NN) model, these QNN parameters are trained to minimize a loss. The whole circuit structure can be repeated to enhance the model capacity [17, 41]. Although the QNN model is analogous to the classical NN model, they have distinct characteristics. For example, the non-linearity of an NN model comes from the activation function, whose choice is important for the model's expressivity. In contrast, the non-linearity of the QNN model comes from the embedding circuit. After the embedding circuit, the QNN model can be considered as a linear model in quantum Hilbert space, and therefore the QK and QNN models have a close relation [59].
Figure 1 shows a schematic of the overall VQML model setup. Once the embedding circuit is determined, the QK model utilizes it as a component in one of several estimation methods. The QNN model, on the other hand, attaches an additional PQC layer U_C(θ) to process the embedded quantum state. After the overall model structure is fixed, it is compiled into a circuit composed of basic gates. Based on the compiled structure, the model parameters and data are pre-processed to be used as inputs for the compiled circuit. Note that all these processes are done on a classical computer. The models are also trained on a classical computer, and therefore they are often called hybrid quantum-classical models [24].

Graphical interpretation of quantum circuits and models
In this section, we interpret the evolution of a quantum state under the action of basic gates as a large network model. Specifically, we employ the Stokes representation of a quantum state and derive the corresponding matrix representations of the basic gates as in [60-62]. Using these representations, we treat the quantum states before and after a gate operation as nodes in a network. The weights that connect these nodes are then determined by the action of the basic gates. With this approach, we show that the expressivity of VQML models is related to the active paths in the network. We also suggest and demonstrate a method to evaluate the growth in expressive power of a PQC structure with the number of qubits and the number of PQC layers.

Quantum circuit in Stokes representation
[Figure 1. A schematic of a VQML model setup. First, the quantum embedding circuit is constructed and used for either the QK or the QNN. The QK model further needs to select an estimation method, while the QNN model requires an additional PQC layer. The overall circuit structure can be repeated for the data re-uploading scheme. The entire circuit is compiled into a primitive circuit comprising the basic gate set. The data and QNN parameters are processed appropriately for the basic gate operations. All these steps are done on a classical computer; the quantum computer only runs the compiled circuits with the given data and parameters. This approach is often called a hybrid quantum-classical approach.]

While quantum embedding is an essential process for VQML models, the transformation of classical data x ∈ R^d into a quantum state |ψ(x)⟩ ∈ C^{2^n} in a complex Hilbert space cannot be described straightforwardly due to the large degrees of freedom. For example, a 1-dimensional real-valued data point can be embedded into a single-qubit state through its amplitude, phase, or a combination of both. To deal with this problem, we use the Stokes parameters [60] to represent a quantum state of n qubits, which can be obtained from the density matrix formalism:

ρ = (1/2^n) Σ_{i_1,...,i_n = 0}^{3} v_{i_1,...,i_n} σ_{i_1} ⊗ ⋯ ⊗ σ_{i_n},   (3)

where σ_0 is the identity matrix and σ_{1,2,3} are the Pauli matrices. The Stokes parameters v_{i_1,...,i_n} are real, and therefore a quantum state can be considered as a real-valued vector v ∈ R^{4^n}. With this expression, we are interested in how the state v is transformed under the basic gate operations. We thus need the representations of the basic gates in equation (1) acting on vectors in the Euclidean space R^{4^n}. Suppose we have a quantum circuit with n qubits and L basic gates U_i, applied in order from i = 1 to L. Let us denote the intermediate state after U_l as v^(l), whose elements are defined as in equation (3), where σ is a vector whose elements σ_i = ⊗_{k=1}^{n} σ_{i_k} are Pauli strings. First consider a single-qubit rotation gate U_l = R_a^k(θ) = e^{−i(θ/2)σ_a} acting on the kth qubit, with parameter θ and a ∈ {1, 2, 3}.
One can find the general expression of each element v_i as:

v_i^(l) = cos θ v_i^(l−1) − ε_{abc} sin θ v_{i_1⋯i_{k−1} c i_{k+1}⋯i_n}^(l−1)   for i_k = b ∉ {0, a},
v_i^(l) = v_i^(l−1)   for i_k ∈ {0, a},   (4)

where v_i ≡ v_{i_1 i_2 ⋯ i_n} and ε_{abc} is the Levi-Civita symbol. Equation (4) indicates that a single-qubit rotation gate R_a^k(θ) makes connections between the two elements of v^(l−1) with the same i_q for all q except k such that i_k ∉ {0, a}. Therefore, R_a^k(θ) can be represented as a 4^n × 4^n matrix W whose components belong to the set {0, 1, cos θ, sin θ, − sin θ}. Now consider the CNOT gate acting on control qubit q_1 and target qubit q_2, U_l = U_CNOT^{q_1 q_2}. We use the equality:

U_CNOT^{q_1 q_2} = e^{−iπ/4} e^{i(π/4)Z_{q_1}} e^{i(π/4)X_{q_2}} e^{−i(π/4)Z_{q_1}X_{q_2}}
               = e^{−iπ/4} R_3^{q_1}(−π/2) R_1^{q_2}(−π/2) e^{−i(π/4)Z_{q_1}X_{q_2}},   (5)

where σ_1 = X, σ_2 = Y, and σ_3 = Z. The second equality holds since Z_{q_1}X_{q_2}, Z_{q_1}, and X_{q_2} all commute with each other. From equation (4), we can find that R_3^{q_1} exchanges all the pairs of v elements that differ in index only at q_1, with i_{q_1} = 1 ↔ i_{q_1} = 2. In the same way, R_1^{q_2} exchanges the pairs with i_{q_2} = 2 ↔ i_{q_2} = 3. The two-qubit gate e^{−i(π/4)Z_{q_1}X_{q_2}} changes the values of four pairs of v_i's, which differ only in the indices (i_{q_1}, i_{q_2}):

(1, 0) ↔ (2, 1),  (1, 1) ↔ (2, 0),  (0, 2) ↔ (3, 3),  (0, 3) ↔ (3, 2).

Note that an additional (−1) factor can be multiplied, and these operations occur for all v_i's satisfying the conditions. See appendix A for detailed calculations of these representations. The graphical representations of equations (4) and (5) can be found in appendix B. We also include examples of two- and three-qubit circuits with corresponding graphical representations in appendix C.
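The single-qubit representation can be checked numerically. The sketch below builds W_ij = Tr(σ_i U σ_j U†)/2 for R_x(θ) and verifies that it reproduces ρ → UρU† in the Stokes picture; the mixed test state is an arbitrary choice for illustration.

```python
import numpy as np

# Pauli basis sigma_0..sigma_3 for one qubit.
sig = [np.eye(2, dtype=complex),
       np.array([[0, 1], [1, 0]], dtype=complex),     # sigma_1 = X
       np.array([[0, -1j], [1j, 0]], dtype=complex),  # sigma_2 = Y
       np.array([[1, 0], [0, -1]], dtype=complex)]    # sigma_3 = Z

def transfer(U):
    """Stokes representation W_ij = Tr(sigma_i U sigma_j U^dag) / 2."""
    return np.real(np.array([[np.trace(si @ U @ sj @ U.conj().T) / 2
                              for sj in sig] for si in sig]))

theta = 0.9
Rx = np.cos(theta / 2) * sig[0] - 1j * np.sin(theta / 2) * sig[1]
W = transfer(Rx)

# W leaves the i = 0 and i = 1 (rotation axis) components untouched and
# only mixes the remaining two, with entries 0, 1, cos, +-sin.
rho = 0.5 * (sig[0] + 0.3 * sig[1] + 0.4 * sig[2] + 0.5 * sig[3])
v = np.real([np.trace(rho @ s) for s in sig])
v_after = np.real([np.trace(Rx @ rho @ Rx.conj().T @ s) for s in sig])
assert np.allclose(W @ v, v_after)   # W implements rho -> U rho U^dag
```

The same `transfer` construction extends to n qubits by replacing `sig` with the 4^n Pauli strings and the normalization 2 with 2^n.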

Active paths in variational quantum models
Although it has been suggested that VQML models can be considered as partial Fourier series, and are therefore asymptotically universal, the full Fourier domain is not available to the models in practical setups. That is, the expressivity of VQML models is limited [17]. The expressive power of VQML models depends on the structure of the PQC they use [63]; however, there has been a lack of explanation of how the two are related. First we consider a general quantum circuit

U(θ) = U^(L) ⋯ U^(2) U^(1),

where L is the number of gates and each U^(i) is a single-qubit gate or a CNOT gate from equation (1). θ ∈ R^p is an arbitrary p-dimensional input parameter vector. Each single-qubit gate utilizes one element of θ as its unique input parameter, and therefore L ⩾ p. Assume we measure the expectation value of an observable M for the output quantum state v^(L):

⟨M⟩ = v_M ⋅ v^(L),

where each element of v_M is the coefficient of M decomposed in the basis of Pauli strings as in equation (3). Therefore, the expectation value of M is a weighted sum over the final nodes of the network that represents the quantum circuit.
For simplicity, assume that v^M_i = 1 for a single index i ∈ {0, 1, 2, 3}^n, where n is the number of qubits, and that all the other elements are zero. Then we have ⟨M⟩ = v^(L)_i. Additionally, the quantum state is initially prepared as |0⟩^⊗n, so that

v^(0)_j = 1 for j ∈ {0, 3}^n, and v^(0)_j = 0 otherwise.

Therefore, the expectation value can be described as a sum over active paths that connect v^(L)_i and v^(0)_j for j ∈ {0, 3}^n. We show an example of a 2-qubit quantum circuit with i_1 = 0 and i_2 = 3 in appendix D. Following this approach, it is straightforward to see that the final quantum state can be expressed as:

v^(L)_i (θ) = Σ_{i_1,...,i_p} c_{i_1⋯i_p} Π_{k=1}^{p} g_{i_k}(θ_k),   (6)

where c_{i_1⋯i_p} ∈ {−1, 0, 1}, g_0(θ) = 1, g_1(θ) = sin θ, and g_2(θ) = cos θ. The expression in equation (6) has the form of a partial Fourier series, and the coefficients c_{i_1⋯i_p} define the accessible Fourier region for the quantum circuit; they are determined by the active paths of v^(L)_i. Assume that the QNN model utilizes the expectation value of an observable for its inference. Then it is possible to evaluate the expressive power of the QNN model by analyzing its active paths as in appendix D.
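Equation (6) can be checked on a toy circuit. For the single-qubit sequence R_y(w_2) R_x(x) R_y(w_1)|0⟩ (w_1, w_2 are hypothetical trained weights), every active path contributes a factor 1, sin x, or cos x, so ⟨Z⟩ must be an exact degree-1 Fourier series in x; the least-squares fit below confirms this.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
rot = lambda s, t: np.cos(t / 2) * np.eye(2) - 1j * np.sin(t / 2) * s

def f(x, w1=0.4, w2=1.2):
    """<Z> of R_y(w2) R_x(x) R_y(w1) |0>."""
    psi = rot(Y, w2) @ rot(X, x) @ rot(Y, w1) @ np.array([1, 0], dtype=complex)
    return float(np.real(psi.conj() @ Z @ psi))

xs = np.linspace(-np.pi, np.pi, 50)
fx = np.array([f(x) for x in xs])
# Project onto the basis {1, cos x, sin x}; the fit is exact because the
# circuit output is a partial Fourier series of degree 1 in x.
A = np.stack([np.ones_like(xs), np.cos(xs), np.sin(xs)], axis=1)
coef, *_ = np.linalg.lstsq(A, fx, rcond=None)
```

Adding more re-uploads of x would extend the accessible basis to higher harmonics, in line with the partial-Fourier-series picture.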
While the QK model works in a different way from the QNN model, it can also be interpreted in terms of active paths. Consider the QK function in equation (2). Again, assume there are L basic gates in the embedding circuit with L ⩾ d, where d is the dimension of the data. For a given pair of data points x and x′, the quantum circuit first evolves with U_E(x) and then with U_E†(x′), starting from the initial state |0⟩. The kernel function is then the probability of returning to the initial state |0⟩ after the whole circuit. Accordingly, the value of the kernel function is a sum over paths that go from v^(0)_i to arbitrary intermediate nodes v^(L)_j using the data x and then return to the original node v^(2L)_i along the same path but with the different data x′, where i ∈ {0, 3}^n and j ∈ {0, 1, 2, 3}^n.
We illustrate the interpretation of the QK model with active paths in figure 2. The colored paths represent the active paths, where the paths connecting a single intermediate state v^(L)_i to the initial or final state v^(0), v^(2L) can be evaluated as in appendix D. The embedding circuit defines the amount of connection between the initial states v^(0)_i for i ∈ {0, 3}^n and the intermediate states v^(L)_j for j ∈ {0, 1, 2, 3}^n. The weights of these connections are determined by the data x. According to the form of the QK function in equation (2), the active paths are symmetric and therefore return to the nodes at v^(2L) from which they started. As the different data point x′ is used for this return, the weight of the path can be modified, which is indicated with a different color. One can expect the QK model to possess more expressive power if more intermediate nodes are connected to the initial nodes.

Expressivity of VQML models
VQML models typically use multiple layers of the same circuit, whose structure affects the expressive power of the models. An important question for VQML models is therefore how much the expressivity of a model is enhanced as the number of layers or the number of qubits increases. As we have related the number of active paths to the expressive power of VQML models, we suggest that this problem can be investigated with the graphical approach.
First we assume that the circuit uses the hardware-efficient ansatz [57] for simple illustration. Figure 3(a) shows the structure of the circuit, which has multiple layers consisting of single-qubit rotation gates followed by nearest-neighbor CNOT entangling gates. We consider the values of the nodes v_{00⋯03} and v_{33⋯03}. We track the number of active paths in this circuit, first with a fixed number of qubits and an increasing number of layers. The growth of the number of active paths for this case can be found in figure 3(b). The red dashed line shows the theoretical limit on the number of active paths. While the number of active paths increases exponentially with the number of layers, it does not reach the limit. That is, this type of ansatz can only use a limited portion of the full Fourier domain when used for VQML models. In figure 3(c) we plot the result with a fixed number of layers and an increasing number of qubits. While the number of parameters grows at the same rate as in figure 3(b), the increase in the number of active paths is much slower. We can therefore expect that, with a limited number of parameters, a deeper circuit with fewer qubits would be more expressive. This result can be expected from the fact that the circuit only utilizes nearest-neighbor interactions.
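The path counting behind this analysis can be sketched as follows: build the Stokes representation of each gate in a toy 2-qubit hardware-efficient-style layer, keep only its sparsity pattern as a 0/1 adjacency matrix, and multiply these as integer matrices so that entry (i, j) counts distinct paths from initial node j to final node i. The gate choices and angles here are our illustration, not the paper's exact circuit.

```python
import numpy as np
from itertools import product
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]

def transfer(U, n):
    """4^n x 4^n Stokes representation W_ij = Tr(P_i U P_j U^dag) / 2^n."""
    P = [reduce(np.kron, p) for p in product(paulis, repeat=n)]
    return np.real([[np.trace(pi @ U @ pj @ U.conj().T) / 2 ** n
                     for pj in P] for pi in P])

n = 2
rot = lambda s, t: np.cos(t / 2) * I2 - 1j * np.sin(t / 2) * s
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
rng = np.random.default_rng(1)
layer = [np.kron(rot(X, rng.uniform(1, 2)), rot(X, rng.uniform(1, 2))),
         np.kron(rot(Y, rng.uniform(1, 2)), rot(Y, rng.uniform(1, 2))),
         CNOT]                                     # rotations, then CNOT

# 0/1 adjacency per gate; integer products count paths, not weights.
adj = [(np.abs(transfer(U, n)) > 1e-12).astype(int) for U in layer]
counts = reduce(lambda a, b: b @ a, adj)           # B_3 @ B_2 @ B_1
init = [0, 3, 12, 15]          # indices of the {0,3}^2 Pauli strings
n_paths = int(counts[15, init].sum())   # paths ending at node v_33
```

Repeating the layer and re-running the count reproduces, in miniature, the layer-by-layer growth curves of figure 3.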
The IQP-type circuit shown in figure 4(a) has a similar structure to the hardware-efficient ansatz but utilizes only one type of single-qubit gate. This additional restriction gives it a distinct feature in the growth of the number of active paths, which can be found in figures 4(b) and (c). The number of active paths grows quickly as the number of layers increases, and therefore tracking all the active paths becomes intractable for L > 4. On the other hand, the rate is much slower when the number of qubits is increased, as we can find in figure 4(c). Unlike in figure 3, the number of active paths even saturates for n ⩾ 4. This might come from the fact that the IQP-type circuit utilizes layers mostly consisting of diagonal R_Z, R_ZZ gates, while we selected random single-qubit gates for each layer of the hardware-efficient ansatz.

[Figure 5. (a) We used an even number of qubits n. If n is not a power of 2, there exists a layer that passes an odd number of qubits. In this case we applied the QCNN block to all but the last qubit and used that qubit later, so the depth of the circuit is ⌈log_2 n⌉. Finally, we measured the first qubit to estimate the expectation value ⟨X_1⟩ (green) and ⟨Z_1⟩ (purple). (b) The growth of the number of active paths in a circuit consisting of n qubits.]
In figure 5, we plot the result for the QCNN-type circuit. Unlike for the hardware-efficient ansatz and the IQP-type ansatz, the number of active paths does not grow monotonically in this case. In addition, we can observe that the expressive power of the circuit correlates strongly with the observable we measure. Figure 5(b), however, shows that the pattern of the increase in the number of active paths is similar for both observables. There are some regions where adding qubits does not increase, and may even decrease, the number of active paths. Since the number of layers is defined as ⌈log_2 n⌉, increasing the number of qubits can change the pattern of reduction by the convolutional layers shown in figure 5(a). This characteristic of the QCNN-type circuit is reflected in the growth pattern of its active paths.
Note that the rate of growth is exponential in the depth of the circuit, and therefore predicting the measurement outcome of a deep circuit would in general be intractable for classical computers. However, if a VQML model has limited expressivity as above, we can expect that classical models may approximate it well. The question is then how to construct such classical models, which we address in the following section.
While we interpreted the operation of VQML models under some specific schemes, we believe that this graphical analysis can be widely adopted for various types of VQML models that follow the common approach. If a model utilizes a PQC with the data and model parameters as its input and makes predictions according to measurements of the output quantum state, its operation can still be investigated with our graphical tool. The utility of our method might depend on the specific conditions under which a model is built and run, but we expect the graphical interpretation to be applicable in most cases.

Classical models from graphical interpretation
ML models require an inductive bias [64] to achieve good generalization performance. This is also true for VQML models, as they cannot learn the target function well without a proper inductive bias for the data [38]. According to our results in section 3, VQML models require that the target function can be fitted well by the functional form in equation (6). The active paths in the models should cover the region necessary to approximate the target function. That is, VQML models should have expressive power that includes the form of the target function. However, as we found in figure 3, VQML models may retain less expressivity than they are expected to have. If that is the case, classical emulation of VQML models might be possible. In this section, we suggest examples of classical models, inspired by our interpretation, that can emulate the outputs of VQML models under this condition.

Classical models

DE model
Suppose we have data points x ∈ [−π, π]^d and each data feature is nearly independent of the others after proper pre-processing. We then assume that the target function can be approximated in the following form, similar to equation (6):

f(x) ≈ Σ_k c_k e^{i k⋅x}.   (7)

By using the orthogonality of the Fourier basis, the Fourier coefficient c_k can be estimated from the dataset D as:

c_k ≈ (C/N) Σ_{(x, y) ∈ D} y e^{−i k⋅x},   (8)

where N and C are proper normalizing constants. This is a simple estimation method that can be used to approximate VQML models when the important sub-region of the Fourier domain is known, and we call this model the direct estimation (DE) model. Although this is a clear and straightforward way of approximation, we need to take the following limitations into account. The model assumes that each feature of the data is sampled from a uniform distribution on [−π, π]. If we have some knowledge about the true distribution from which the data is sampled, the approximation in equation (8) can be improved by techniques such as importance sampling [65]. Obviously, if one needs the coefficients c_k for all k, exponentially many terms with respect to the data dimension must be estimated, which is intractable. However, this limitation also applies to VQML models.
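A minimal 1-d instance of the DE estimator in equation (8): with x uniform on [−π, π], each real Fourier coefficient of a toy target is recovered as an empirical average over the dataset. The target function and sample size here are ours for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50_000
x = rng.uniform(-np.pi, np.pi, N)                 # uniform inputs
y = 0.5 + 0.3 * np.cos(2 * x) - 0.2 * np.sin(x)   # known toy target

# Empirical Fourier-coefficient estimates; the normalizing constant is
# 2 for nonzero frequencies and 1 for the constant term, matching the
# orthogonality relations of cos/sin on [-pi, pi].
a0 = y.mean()                       # constant term, true value 0.5
a2 = 2 * (y * np.cos(2 * x)).mean() # cos(2x) coefficient, true value 0.3
b1 = 2 * (y * np.sin(x)).mean()     # sin(x) coefficient, true value -0.2
```

The Monte Carlo error shrinks as 1/√N, so with 50 000 samples the estimates match the true coefficients to a few parts in a thousand.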

Restricted Dirichlet kernel model
It is well known that the Dirichlet kernel utilizes Fourier space as its feature space. Therefore, with the Dirichlet kernel, classical models such as the SVM can be used to approximate functions that can be expressed through a Fourier series expansion:

D_m(x − x′) = Σ_{k=−m}^{m} e^{ik(x−x′)} = sin((m + 1/2)(x − x′)) / sin((x − x′)/2).

It is possible to add regularization to the Dirichlet kernel to make the model focus on a specific Fourier domain. For example, the Fourier coefficients can be multiplied by a factor q^m to put more weight on lower or higher frequency regions [66].
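A minimal sketch of the unrestricted Dirichlet kernel idea (our own toy version, before any restriction or reweighting): the closed-form D_m feeds a kernel ridge regression, which recovers a band-limited target essentially exactly.

```python
import numpy as np

def dirichlet(u, m):
    """D_m(u) = sum_{k=-m}^{m} e^{iku} = sin((m+1/2)u) / sin(u/2)."""
    u = np.where(np.isclose(u, 0.0), 1e-12, u)   # limit D_m(0) = 2m + 1
    return np.sin((m + 0.5) * u) / np.sin(u / 2)

rng = np.random.default_rng(0)
Xtr = rng.uniform(-np.pi, np.pi, 40)
ytr = np.cos(2 * Xtr) + 0.5 * np.sin(Xtr)        # band-limited, degree 2

# Kernel ridge regression with the degree-3 Dirichlet kernel; its RKHS
# is exactly the span of trigonometric polynomials up to degree 3.
K = dirichlet(Xtr[:, None] - Xtr[None, :], m=3)
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(Xtr)), ytr)

xte = rng.uniform(-np.pi, np.pi, 10)
yhat = dirichlet(xte[:, None] - Xtr[None, :], m=3) @ alpha
err = np.abs(yhat - (np.cos(2 * xte) + 0.5 * np.sin(xte))).max()
```

Restricting or reweighting the frequency components, as in the RDK-SVM described in the experiments, amounts to modifying the coefficients in the sum defining D_m.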

Neural network-based model
Suppose we use a QNN model that uses the expectation value of an observable for its inference, like the circuit in figure 3. The quantum convolutional neural network (QCNN) structure [22] is one example of such a model. It achieved high performance on various classical datasets with a small number of model parameters [21]. To approximate the QCNN model, we construct a classical model based on a neural network that learns the active paths using the form in equation (6). The approximated paths are combined after being multiplied by additional trainable parameters, and the final value is passed through a sigmoid function to make a prediction. To reduce the number of parameters, we employ the parameter-sharing scheme of the QCNN model. We denote the classical model as QCNN-C and the original QCNN model as QCNN-Q. See appendix E for more details about the QCNN-C model.
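As a loose stand-in for QCNN-C (the authors' actual architecture and parameter sharing are detailed in their appendix E), the sketch below trains a logistic model on equation-(6)-style Fourier features, including one product-of-paths feature; the toy labels and feature choice are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-np.pi, np.pi, (400, 2))
y = (np.sin(X[:, 0]) * np.sin(X[:, 1]) > 0).astype(float)  # toy labels

def features(V):
    """Degree-1 Fourier features per dimension plus one product feature
    sin(x1)sin(x2), mimicking a product of path weights."""
    return np.hstack([np.ones((len(V), 1)),
                      np.cos(V), np.sin(V),
                      np.sin(V[:, :1]) * np.sin(V[:, 1:])])

F = features(X)
w = np.zeros(F.shape[1])
for _ in range(2000):                  # plain gradient descent
    p = 1 / (1 + np.exp(-F @ w))       # sigmoid output
    w -= 0.1 * F.T @ (p - y) / len(y)

acc = ((F @ w > 0) == (y == 1)).mean()
```

The trainable weights play the role of the extra parameters multiplying the approximated paths before the sigmoid.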

Numerical experiments

Synthetic dataset
We first consider synthetic 2-dimensional datasets called checkerboard and symmetric donuts. The distribution of each dataset is plotted in figure 6. We compare the results of the quantum embedding kernel (QEK) model suggested in [28] with the DE model and the restricted Dirichlet kernel model with SVM (RDK-SVM). See appendix F for more details on the data and the models.
The output distribution of each model after training can be found in figure 7. By estimating only the 20 largest coefficients c_k in equation (7), the DE model outputs shown in figures 7(b) and (g) attain distributions similar to those of the QEK model shown in figures 7(a) and (f). In figures 7(d) and (i), we show the outputs of the RDK-SVM model with some additional restrictions. For the checkerboard dataset, we make the model consider only the Fourier functions that have the same frequency for each data feature. For the symmetric donuts dataset, we added regularization similar to equation (6.32) in [66] to the Dirichlet kernel so that the model gives more weight to the lower frequency components. In both cases, the output distributions of the SVM model in figures 7(d) and (i) become similar to those of the QEK model. However, if we remove the restrictions of the RDK-SVM model, it can approximate the true data distribution better, as shown in figures 7(e) and (j). While the classical models approximate the true distribution in figure 6, the output of the QEK model can also be reproduced by adding some restrictions to these classical models.

Image dataset
Now we consider real image datasets. We only consider binary classification problems, so 2 classes are used for each dataset as in [21]. For the QCNN-Q model, we first down-sampled the input image to 6 by 6 pixels and then embedded it into a quantum circuit using the IQP-type embedding. After that, the QCNN circuit is applied. Note that each QCNN layer shares 15 parameters for its SU(4) gates and passes only half of its qubits to the next layer. Finally, the predicted label for an input is determined by measuring the first qubit. The model uses 45 parameters in total, and the overall structure of the QCNN-Q model can be found in figure 8. The QCNN-C model is designed to have the same number of parameters as the QCNN-Q model and utilizes the same down-sampled images. A detailed description of the QCNN-C model's configuration can be found in appendix F. We applied these models to the modified National Institute of Standards and Technology (MNIST) dataset [67], the fashion-MNIST dataset [68], and the Canadian Institute for Advanced Research (CIFAR-10) dataset [69]. The CIFAR-10 images were converted to grey scale before down-sampling. The train and test set accuracies of the trained models are shown in table 1. The QCNN-C model outperformed the QCNN-Q model on all datasets, despite having the same number of parameters.
Although we suggested and tested simple classical models, there are many ways to construct classical models motivated by VQML models. Our graphical interpretation can be used as a tool to analyze the operation of VQML models and provides insights for building classical models that emulate their outputs. For example, it has been suggested that a more expressive quantum circuit can be built by employing the symmetry of the data [70]. If a powerful VQML model can be built with such methods, it might be possible to interpret it with our graphical tool and use the result to enhance the performance of classical models. That is, our approach shows a way to interconnect classical and quantum ML.

Discussion
In this work, we introduced a graphical approach to analyze the PQC and interpreted the operation of VQML models with it. By graphically expressing the PQC, we interpreted VQML models as large graphical models. We introduced the notion of active paths, which are the paths connecting the initial state to the final nodes that correspond to the model output. We illustrated the rate of growth of the number of active paths with the size of the quantum circuit and suggested that the notion of active paths can be utilized to evaluate the expressivity of VQML models. Observing the limited expressivity of VQML models in certain instances, we also introduced three classical models inspired by our interpretation. We then compared the performance of these classical models with VQML models on synthetic and image data.
In [71], the expressivity of a PQC was analyzed using the Kullback-Leibler divergence (KLD) between the fidelity distribution of the output states of the given PQC and that of Haar-random states. In that case, expressivity indicates the ability of the PQC to explore the quantum Hilbert space, and it saturates to distinct values for different PQC structures as the number of layers increases. While this is consistent with our observations in section 3, in the sense that the growth of PQC expressivity with the number of layers depends on the type of quantum circuit, the KLD of the fidelity distribution cannot directly relate the expressivity of the PQC to that of VQML models. Access to the full Hilbert space may be beneficial, but it carries no explicit meaning for the performance of VQML models as function approximators. That is, it is not obvious how the parameters are related to the measurement outcomes of a given PQC. On the other hand, the active paths in the PQC determine, through equation (6), the class of functions that the output of a VQML model can express.
To explore the parameter space of a PQC, the quantum Fisher information (QFI) was utilized in [41]. With a random set of parameters, the rank of the QFI was used as a metric for the number of effective parameters. The estimated effective dimension of a PQC, however, may differ according to the choice of random parameters and their range. Also, the QFI is related to the expressivity of the PQC through the variance of gradients for local observables, which shows low correlation for global costs [72]. Since the maximum number of active paths is limited by the number of parameters, the QFI might be related to the upper limit on the number of active paths. Like the QFI, the contribution of each active path to the measurement outcome depends on the specific values of the parameters, but the number of active paths can be used for both local and global observables, as in figure 3. This is because the active paths directly relate the parameters to the output of the PQC.
The main contribution of our work is that we provide a tool to interconnect VQML models with classical ML models. While several studies have focused on improving VQML models [18,70], there is still no evidence of an advantage of VQML models for practical ML tasks due to the current limitations of NISQ computers [48]. It is therefore important to have a way to interpret VQML models. Our result can serve as such a tool, as we demonstrated with toy examples. Although it would be hard to approximate the output of VQML models in general, we found that the expressivity of VQML models exhibits extremely slow growth under certain conditions. Based on our graphical interpretation, we constructed classical models that can emulate the operation of VQML models under these conditions, and showed that they can reproduce or outperform the outputs of VQML models. While we focused on approximating VQML models, it would also be possible to improve classical models using insights obtained from VQML models through our interpretation.
Further studies can be conducted on analyzing other types of ansatz or on interpreting VQML models for other ML problems with the graphical approach we developed in this paper. It can be utilized to design a more powerful circuit structure for the problem at hand. We also expect that it can be applied to other problems such as quantum circuit complexity [73] and entanglement growth [74]. The relation between the active paths and the amount of entanglement in the output quantum state may give us a way to evaluate the advantage of VQML models for given problems or circuit structures. However, whether the graphical approach yields new insights into these problems is an open question and remains for future research.

Appendix A. Matrix representation of basic gate set
In this section, we present the detailed calculations used to obtain equations (4) and (5) in the main text. Consider an n-qubit quantum state written in the Stokes representation of equation (3),

ρ^{(l−1)} = (1/2^n) Σ_i v^{(l−1)}_i σ_{i_1} ⊗ ⋯ ⊗ σ_{i_n}, with i = (i_1, …, i_n) and i_m ∈ {0, 1, 2, 3}.

Other notations follow the main text. First, assume a single-qubit gate U^{(l)} = R^k_a(θ) = e^{−i(θ/2)σ_a} at layer l acts on the k-th qubit of ρ^{(l−1)}, where a ∈ {1, 2, 3}. After U we have the quantum state ρ^{(l)} = U^{(l)} ρ^{(l−1)} U^{(l)†}. The action of U can then be represented by a real-valued weight matrix W that connects v^{(l−1)} and v^{(l)} as below,

v^{(l)}_i = Σ_j W^{(l)}_{ij} v^{(l−1)}_j. (A1)

Each element of W can be evaluated as

W^{(l)}_{ij} = (1/2^n) Tr[(σ_{i_1} ⊗ ⋯ ⊗ σ_{i_n}) U^{(l)} (σ_{j_1} ⊗ ⋯ ⊗ σ_{j_n}) U^{(l)†}]
 = (1/2^n) [cos²(θ/2) Tr(σ_i σ_j) + sin²(θ/2) Tr(σ_i σ^{(k)}_a σ_j σ^{(k)}_a)] − (i sin θ / 2^{n+1}) Tr(σ_i [σ^{(k)}_a, σ_j]), (A2)

where σ_i is shorthand for σ_{i_1} ⊗ ⋯ ⊗ σ_{i_n} and σ^{(k)}_a denotes σ_a acting on the k-th qubit. Figure A1 shows the graphical representation of equation (A2). With this expression, VQML models can be considered as 4^n × 4^n graphical models in which each weight is determined nonlinearly by the input data x and the model parameters θ. Now consider the first term in the last equality of equation (A2):

(1/2^n) [cos²(θ/2) Tr(σ_i σ_j) + sin²(θ/2) Tr(σ_i σ^{(k)}_a σ_j σ^{(k)}_a)] = δ_{i,j} if j_k ∈ {0, a}, and cos θ δ_{i,j} otherwise.
Note that δ_{i,j} = ∏^n_{m=1} δ_{i_m,j_m}. The first equality comes from the fact that σ^{(k)}_a σ_j σ^{(k)}_a = σ_j if j_k ∈ {0, a} and σ^{(k)}_a σ_j σ^{(k)}_a = −σ_j otherwise, together with Tr(σ_i σ_j) = 2^n δ_{i,j}. This means that the first term in the last equality of equation (A2) simply connects each v^{(l−1)}_i to itself with weight 1 or cos θ. The second term gives

−(i sin θ / 2^{n+1}) Tr(σ_i [σ^{(k)}_a, σ_j]) = sin θ ε_{a j_k i_k} ∏_{m≠k} δ_{i_m,j_m},

where ε is the Levi-Civita symbol and we used [σ_a, σ_{j_k}] = 2i ε_{a j_k c} σ_c. Therefore, the second term mixes v components with distinct a, i_k, j_k under the condition i_k ̸= 0 and j_k ̸= 0.
Combining these results, we obtain the weight

W^{(l)}_{ij} = (∏_{m≠k} δ_{i_m,j_m}) × { δ_{i_k,j_k} if i_k ∈ {0, a}; cos θ δ_{i_k,j_k} + sin θ ε_{a j_k i_k} otherwise }.

By inserting this result into equation (A1), we can derive equation (4) in the main text. Now consider the two-qubit CNOT gate. In the main text, we decomposed the CNOT gate as below,

CNOT = e^{−iπ/4} e^{i(π/4) Z_{q_1}} e^{i(π/4) X_{q_2}} e^{−i(π/4) Z_{q_1} X_{q_2}}. (A3)

Here, q_1 is the control qubit and q_2 is the target qubit, and we denote X = σ_1, Y = σ_2, and Z = σ_3 for the 2-qubit gate e^{−i(π/4) Z_{q_1} X_{q_2}} for convenience. Note that Z_{q_1} X_{q_2} = Z_{q_1} ⊗ X_{q_2}, and we omit additional tensor products with identity matrices for convenience. The representation of the 2-qubit gate is obtained in the same way, by evaluating the corresponding matrix elements W^{(l)}_{ij} through equation (A2).
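Both the single-qubit weight formula and the CNOT decomposition in equation (A3) can be checked numerically. The sketch below is our own verification script (the helper names are ours): it compares equation (A2) against the closed-form weights and reconstructs the CNOT gate from the three exponentials together with the global phase.

```python
import numpy as np
from itertools import product

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])
PAULIS = [I2, X, Y, Z]

def rot(sigma, theta):
    """R(theta) = exp(-i theta sigma / 2) for any Pauli string sigma."""
    return np.cos(theta/2)*np.eye(sigma.shape[0]) - 1j*np.sin(theta/2)*sigma

def levi_civita(a, b, c):
    """epsilon_{abc} for indices in {1, 2, 3}; zero on repeats or index 0."""
    if {a, b, c} != {1, 2, 3}:
        return 0
    return int((a - b)*(b - c)*(c - a) / 2)

# (1) Single-qubit case: numeric W_ij from (A2) vs the combined closed form.
theta, a = 0.7, 2                      # rotation about sigma_y
U = rot(PAULIS[a], theta)
for i, j in product(range(4), repeat=2):
    numeric = np.trace(PAULIS[i] @ U @ PAULIS[j] @ U.conj().T).real / 2
    if i in (0, a):
        closed = float(i == j)
    else:
        closed = np.cos(theta)*(i == j) + np.sin(theta)*levi_civita(a, j, i)
    assert np.isclose(numeric, closed)

# (2) CNOT decomposition (A3), including the global phase exp(-i pi/4).
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
decomp = (np.exp(-1j*np.pi/4)
          * rot(np.kron(Z, I2), -np.pi/2)
          @ rot(np.kron(I2, X), -np.pi/2)
          @ rot(np.kron(Z, X), np.pi/2))
assert np.allclose(decomp, CNOT)
print("weight formula and CNOT decomposition verified")
```

Since the three factors in (A3) mutually commute, their ordering in the transformed circuit is flexible; dropping the scalar phase factor does not affect the weight matrices.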

C.1. 2-qubit example circuit
Here, we illustrate an example analysis of the 2-qubit circuit in figure C1. Note that we first transform the CNOT gate into the right circuit in figure C1 using the equality suggested in equation (A3). Then we write each gate operation as U^{(l)} from l = 1 to l = 5. We set U^{(1)} = R_{σ_x}(x_1), U^{(2)} = R_{σ_x}(x_2), U^{(3)} and U^{(4)} as the single-qubit rotations appearing in equation (A3), and U^{(5)} = R_{σ_zσ_x}(π/2). Using the Stokes representation suggested in equation (3) in the main text, we use a 16-dimensional real vector v to represent the quantum state. Each U^{(l)} can be written as a 16 × 16 real matrix, but here we present the corresponding graphical representation instead. For example, the single-qubit gates U^{(1)} = R_{σ_x}(x_1) and U^{(2)} = R_{σ_x}(x_2) can be drawn as in figure C3. Note that edges without any weights indicate weight 1. We use v^{(1)}_{(i_1 i_2)} and v^{(2)}_{(i_1 i_2)} for i_1, i_2 ∈ {0, 1, 2, 3} to indicate the quantum state before and after the gate operation. The rest of the gate operations are shown in figure C4.
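The 16-dimensional picture can be checked numerically. The following sketch is our own (the input values x_1, x_2 are arbitrary, and only the data-encoding gates U^{(1)} and U^{(2)} of the example are used): it verifies that propagating the Stokes vector through the 16 × 16 transfer matrices agrees with direct density-matrix evolution.

```python
import numpy as np
from itertools import product

PAULIS = [np.eye(2), np.array([[0, 1], [1, 0]]),
          np.array([[0, -1j], [1j, 0]]), np.array([[1, 0], [0, -1]])]

def sigma(i1, i2):
    return np.kron(PAULIS[i1], PAULIS[i2])

def stokes(rho):
    """16-dimensional real Stokes vector v_{i1 i2} = Tr[rho sigma_{i1} x sigma_{i2}]."""
    return np.array([np.trace(rho @ sigma(i, j)).real
                     for i, j in product(range(4), repeat=2)])

def transfer(U):
    """16 x 16 real matrix W with v' = W v for rho -> U rho U^dag."""
    S = [sigma(i, j) for i, j in product(range(4), repeat=2)]
    return np.array([[np.trace(si @ U @ sj @ U.conj().T).real / 4
                      for sj in S] for si in S])

def rx(theta):
    """R_x(theta) = exp(-i theta sigma_x / 2)."""
    return np.cos(theta/2)*np.eye(2) - 1j*np.sin(theta/2)*PAULIS[1]

# U^(1) = R_x(x1) on the first qubit, U^(2) = R_x(x2) on the second qubit.
x1, x2 = 0.7, -0.3
U1 = np.kron(rx(x1), np.eye(2))
U2 = np.kron(np.eye(2), rx(x2))

rho0 = np.zeros((4, 4), dtype=complex); rho0[0, 0] = 1.0   # |00><00|
v0 = stokes(rho0)
v2 = transfer(U2) @ transfer(U1) @ v0                      # network picture
rho2 = U2 @ U1 @ rho0 @ U1.conj().T @ U2.conj().T          # direct evolution
print(np.allclose(v2, stokes(rho2)))  # the two pictures agree
```

The same routine extends to the remaining gates U^{(3)} to U^{(5)} of the example, at the cost of three more 16 × 16 matrix products.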

C.2. 3-qubit example circuit
Similarly, the graphical representations of U (l) from l = 1 to l = 9 for the circuit in figure C2 are provided as a supplementary image file.

Figure 1. A schematic of a VQML model setup. First, the quantum embedding circuit is constructed and used for either the QK or the QNN model. The QK model further needs a choice of estimation method, while the QNN model requires an additional PQC layer. The overall circuit structure can be repeated for the data re-uploading scheme. The entire circuit is compiled into a primitive circuit comprising a basic gate set, and the data and QNN parameters are processed appropriately for the basic gate operations. All of this is done on a classical computer; the quantum computer only runs the compiled circuits with the given data and parameters. This approach is often called a hybrid quantum-classical approach.

Figure 2. Active paths in a quantum kernel model. The paths connected by colored edges are active. The paths are symmetric, and the weights of the edges are determined by the data x and x′ on each side.

Figure 3. The hardware-efficient ansatz used to evaluate the growth of the number of active paths, and the results. (a) The structure of the circuit with L layers of single-qubit rotation gates and CNOT gates. The indices i, j in R_{b_{i,j}}(θ_{i,j}) indicate the layer and qubit, respectively. For each i and j, the axis of rotation b_{i,j} ∈ {σ_x, σ_y, σ_z} and the parameter θ_{i,j} ∈ [0, 2π] are sampled uniformly and independently. (b) The growth of the number of active paths connecting (green) v^{(0)}_k and v^{(L)}_{00⋯03} or (purple) v^{(0)}_k and v^{(L)}_{33⋯03}, where k ∈ {0, 3}^n. The number of qubits n is fixed at 3 and the number of layers L increases from 1 to 10. (c) The growth of the number of active paths connecting the same nodes as in (b). In this case, the number of layers L is fixed at 3, and the number of qubits n increases from 2 to 10. The red dashed line indicates the theoretical maximum number of active paths. The number of active paths grows much faster when one increases the number of layers instead of the number of qubits, since the circuit only utilizes nearest-neighbor interactions. It also depends on the observable we measure.

Figure 4. Evaluation of the growth of the number of active paths for the IQP-type circuit. (a) The structure of the circuit with 5 qubits and L layers. We measure the last qubit to estimate the expectation value ⟨X_n⟩ (green) and ⟨Z_n⟩ (purple), where n is the number of qubits. (b) The growth of the number of active paths in a circuit consisting of 3 qubits and L layers, from L = 1 to L = 4. (c) The growth of the number of active paths in a circuit consisting of 3 layers and n qubits, from n = 2 to n = 10. The inset shows the graph in the dashed box. (d) An enlarged plot of the points in the dashed box in (c).

Figure 5. Evaluation of the growth of the number of active paths for the QCNN-type circuit. (a) The structure of the QCNN-type circuit. Each grey box represents the QCNN block, which has the structure inside the dashed box in the lower right corner. We used an even number of qubits n. If n is not a power of 2, there exists a layer that passes an odd number of qubits; in this case, we applied the QCNN block to all but the last qubit and used that qubit later, so the depth of the circuit is ⌈log_2 n⌉. Finally, we measured the first qubit to estimate the expectation value ⟨X_1⟩ (green) and ⟨Z_1⟩ (purple). (b) The growth of the number of active paths in a circuit consisting of n qubits.

Figure 6. Distribution of the datasets. (a) Checkerboard dataset and (b) symmetric donuts dataset. The red and blue regions form the input space X, where each color indicates a different label. The green region in the symmetric donuts dataset is not included in the input space.

Figure 7. The results of the trained models. Red and blue points are training data samples with labels 1 and −1. The distribution plotted in the background is the model output over the input space X. (a)-(e) Checkerboard dataset and (f)-(j) symmetric donuts dataset. (a), (f) Outputs of the QEK method in [27]; the model output has a simpler form than the true ones in figure 6. Outputs of the DE model (b), (g) with restrictions and (c), (h) without restrictions. Outputs of SVM with (d), (i) the restricted Dirichlet kernel and (e), (j) the original Dirichlet kernel.

Figures 7(c) and (h) show that the DE model approximates the true distributions shown in figure 6 better when all the coefficients are estimated. In figures 7(d) and (i), we show the outputs of the RDK-SVM model with some additional restrictions. For the checkerboard dataset, we make the model consider only the Fourier functions that have the same frequency for each data feature. For the symmetric donuts dataset, we added regularization similar to equation (6.32) in [66] to the Dirichlet kernel so that the model gives more weight to the lower-frequency components. In both cases, the distributions of the SVM model outputs in figures 7(d) and (i) become similar to that of the QEK model. However, if we remove the restrictions of the RDK-SVM model, it approximates the true data distribution better, as shown in figures 7(e) and (j). Thus, while the classical models approximate the true distribution in figure 6, the output of the QEK model can also be reproduced by adding suitable restrictions to these classical models.
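For readers who want to experiment with the Dirichlet-kernel construction, the sketch below is a hedged stand-in: it uses the order-d Dirichlet kernel D_d(t) = 1 + 2 Σ_{k=1}^{d} cos(kt) as a product kernel over the input features, but replaces the SVM of the main text with a plain kernel ridge classifier so that only numpy is needed. The degree d = 3, the regularization strength, and the toy labels are our own choices, not those used for figure 7.

```python
import numpy as np

def dirichlet(t, d):
    """Order-d Dirichlet kernel, D_d(t) = 1 + 2 * sum_{k=1..d} cos(k t)."""
    return 1 + 2*sum(np.cos(k*t) for k in range(1, d + 1))

def gram(X, Y, d):
    """Gram matrix of the product Dirichlet kernel over the input features."""
    diff = X[:, None, :] - Y[None, :, :]
    return np.prod(dirichlet(diff, d), axis=-1)

# Toy data: labels from the sign of a low-frequency Fourier feature.
rng = np.random.default_rng(0)
X = rng.uniform(-np.pi, np.pi, size=(200, 2))
y = np.sign(np.sin(X[:, 0]) * np.sin(X[:, 1]))

# Kernel ridge classifier (a least-squares stand-in for the SVM in the text).
K = gram(X, X, d=3)
alpha = np.linalg.solve(K + 1e-3*np.eye(len(X)), y)
acc = np.mean(np.sign(K @ alpha) == y)
print(f"training accuracy: {acc:.2f}")
```

An actual RDK-SVM would instead pass this Gram matrix to an SVM solver (e.g. a precomputed-kernel SVC), with the frequency restrictions described above built into the kernel.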

Figure 8. Quantum embedding and QCNN circuit of the QCNN-Q model used for the image dataset. The down-sampled input image x has 36 features, which are used as the parameters of the R_1(θ) = R_{σ_Z}(θ) and R_2(θ) = R_{σ_Zσ_Z}(θ) gates. After the embedding circuit, the QCNN structure is implemented. Each two-qubit gate SU(4)_l indicates an SU(4) gate at layer l. All the SU(4) gates in the same layer share the same 15 parameters, so the model uses 45 parameters in total.

Figure A1. Graphical representation of equation (A2). VQML models can be treated as simple graphical models like the one above. The weight W^{(l)}_{ij} is determined by the action of the parameterized gate U^{(l)}. The input data x and model parameters θ are used in U^{(l)} and therefore determine the weight nonlinearly. This is different from neural network models, where the nonlinearity comes from the use of an activation function.

Figure C1. A circuit used to conduct the example analysis. (Left) The circuit consisting of a basic gate set and (right) the transformed circuit using the equality of the CNOT gate suggested in the main text. We omit the global phase and utilize the right circuit for our example analysis.

Figure C2. An example 3-qubit circuit. As in figure C1, we present the graphical representation of the transformed circuit on the right.

Figure F1. Training history of the (a)-(c) QCNN-Q and (d)-(f) QCNN-C models. The data used for (a) and (d) are the 0 and 1 images in the MNIST dataset. T-shirt and sneaker images from the fashion-MNIST dataset are used for (b) and (e). Cat and truck images from the CIFAR-10 dataset are used for (c) and (f). Both models are trained using gradient descent with the Adam optimizer and the same batch size. The solid line shows the mean value, and the shaded region indicates the standard deviation obtained over 20 iterations.

Table 1. Training and test accuracy of the QCNN-Q and QCNN-C models.