Information flow in parameterized quantum circuits

In this work, we introduce a new way to quantify information flow in quantum systems, especially for parameterized quantum circuits. We use a graph representation of the circuits and propose a new distance metric using the mutual information between gate nodes. We then present an optimization procedure for variational algorithms using paths based on the distance measure. We explore the features of the algorithm by means of the variational quantum eigensolver, in which we compute the ground state energies of the Heisenberg model. In addition, we employ the method to solve a binary classification problem using variational quantum classification. From numerical simulations, we show that our method can be successfully used for optimizing the parameterized quantum circuits primarily used in near-term algorithms. We further note that information-flow based paths can be used to improve convergence of existing stochastic gradient based methods.


I. INTRODUCTION
Parameterized quantum circuits (PQCs) are a central component of many variational quantum algorithms (VQAs) with applications in quantum chemistry and combinatorial optimization [1][2][3], such as the Variational Quantum Eigensolver [4] and the Quantum Approximate Optimization Algorithm [5]. In addition, variational algorithms have more recently been applied to a number of machine learning tasks, including data classification [6][7][8] and generative modeling [9][10][11][12][13][14]. The general concept behind VQAs is to employ a PQC to generate a trial wavefunction on a quantum device. The resulting state is then repeatedly measured to estimate expectation values of some Hermitian operator(s) using the current trial wavefunction. These expectation values are used to evaluate an objective function that a classical optimizer maximizes or minimizes by varying the parameter values of the PQC.
In recent years, significant progress has been made to better understand PQCs, their design, and their relationship to algorithm performance. For example, expressibility and entanglement capability have been proposed as meaningful metrics to compare different PQCs and exclude circuits with limited capabilities [15]. Moreover, expressibility has been correlated with other performance metrics of certain variational quantum algorithms [16], and high expressibility has been associated with the presence of barren plateaus in cost function landscapes [17]. However, there is still a general lack of understanding of how the parameterization of quantum circuits impacts algorithm performance.
In this work, we propose a novel method to characterize the information flow in PQCs to incorporate correlations between parameters in a quantum circuit. To quantify the information flow, we first consider a graph representation of quantum circuits to define paths between the two-qubit unitaries present in the quantum circuit. We then introduce a measure of distance between two points in a circuit by mapping local unitaries into a multi-qubit state and using mutual information between the resulting single-qubit states. Finally, using the distance metric, we propose a new method for stochastic optimization of variational algorithms using a subset of gate parameters from a selected (random or shortest) path.
We perform numerical simulations to analyze the performance of the proposed method for two different tasks: ground state energy estimation and binary classification. Applying our method for optimization of VQAs, we observe that using (shortest) paths consistently outperforms the stochastic gradient-based method.
The rest of the paper is organized as follows: we present some preliminary information required to understand the article in section II, and details of the method used in the article are presented in section III. Next, the results from numerical simulations are presented in section IV, and we finally present some concluding remarks in section V.

A. Mutual Information
For a bipartite quantum system on two subsystems A and B, the state space can be represented as the tensor product of the individual subspaces:
$$\mathcal{H}_{AB} = \mathcal{H}_A \otimes \mathcal{H}_B .$$
The quantum mutual information $I(A:B)$ is a measure of the amount of correlation between the two subsystems and is defined as
$$I(A:B) = S(\rho_A) + S(\rho_B) - S(\rho_{AB}), \tag{1}$$
where $\rho_{AB}$ represents the density matrix of the full system on $\mathcal{H}_{AB}$, $\rho_A = \mathrm{Tr}_B(\rho_{AB})$ and $\rho_B = \mathrm{Tr}_A(\rho_{AB})$ are the reduced states of $\rho_{AB}$ on systems A and B respectively, and $S(\rho) = -\mathrm{Tr}(\rho \log \rho)$ represents the von Neumann entropy of a density matrix $\rho$.
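As a minimal numerical sketch of this definition, the following NumPy snippet evaluates the mutual information of a two-qubit density matrix. The function names are illustrative choices rather than part of the original work, and the natural logarithm is assumed for the entropy.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log rho), using the natural logarithm."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # treat 0 log 0 as 0
    return float(-np.sum(evals * np.log(evals)))

def partial_trace(rho_ab, dim_a, dim_b, keep):
    """Reduced state of a bipartite density matrix; keep is 'A' or 'B'."""
    rho = rho_ab.reshape(dim_a, dim_b, dim_a, dim_b)
    if keep == "A":
        return np.einsum("ibjb->ij", rho)  # trace over subsystem B
    return np.einsum("aiaj->ij", rho)      # trace over subsystem A

def mutual_information(rho_ab, dim_a=2, dim_b=2):
    """I(A:B) = S(rho_A) + S(rho_B) - S(rho_AB)."""
    rho_a = partial_trace(rho_ab, dim_a, dim_b, "A")
    rho_b = partial_trace(rho_ab, dim_a, dim_b, "B")
    return (von_neumann_entropy(rho_a) + von_neumann_entropy(rho_b)
            - von_neumann_entropy(rho_ab))

# A maximally entangled Bell state and an uncorrelated product state.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_bell = np.outer(bell, bell.conj())
rho_product = np.diag([1.0, 0.0, 0.0, 0.0])
```

For the Bell state the two reduced states are maximally mixed while the joint state is pure, so the mutual information takes its maximal two-qubit value of 2 log 2; for the product state it vanishes.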

B. Quantum Circuits as Graphs
A quantum circuit is a sequence of operations that can be used to prepare a state of interest on a quantum device. It typically comprises simple unitary operations called quantum gates that depend on some parameters. The action of the circuit $U(\theta)$ on an initial state $|\psi_0\rangle$ prepares the state $|\psi(\theta)\rangle = U(\theta)|\psi_0\rangle$.
We develop a simple strategy for converting a quantum circuit into a directed graph by representing every quantum gate as edges in the graph and the states at different moments in the circuit as nodes. Using this picture, a single-qubit gate is represented as an edge that connects two nodes, while a two-qubit gate is represented as four edges between four nodes. The nodes are then arranged as per the time steps in the quantum circuit. The open edges represent the initial state and the resulting state of the qubits. A simple illustration of this conversion is shown in Fig. 1.
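The conversion described above can be sketched with networkx as follows. This is one possible reading of the construction, not the authors' implementation: the gate-list format, the node labels `(qubit, step)`, and the function name `circuit_to_graph` are assumptions made for illustration.

```python
import networkx as nx

def circuit_to_graph(num_qubits, gates):
    """Build a directed graph from a gate list.

    gates: list of (name, qubits) tuples in circuit order (assumed format).
    Nodes are (qubit, k), the state of a qubit after k gates have acted on it.
    """
    graph = nx.DiGraph()
    latest = {q: (q, 0) for q in range(num_qubits)}
    graph.add_nodes_from(latest.values())
    for name, qubits in gates:
        inputs = [latest[q] for q in qubits]
        outputs = [(q, latest[q][1] + 1) for q in qubits]
        # One edge for a single-qubit gate, four edges for a two-qubit gate.
        for src in inputs:
            for dst in outputs:
                graph.add_edge(src, dst, gate=name)
        for q, node in zip(qubits, outputs):
            latest[q] = node
    return graph

gates = [("ry", (0,)), ("ry", (1,)), ("cnot", (0, 1)), ("ry", (1,))]
graph = circuit_to_graph(2, gates)
```

In this small example the CNOT contributes four edges, including the two diagonal ones that connect the input node of one qubit to the output node of the other.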

C. Information Flow
Information flow is defined as the transfer of information from one variable to another in a process. In the context of quantum circuits, this can be regarded as the spread of correlation between qubits as a result of multi-qubit gates. Analogously, one can think of it as the spread of correlation between gates in their corresponding causal cones. The spread of correlation has been considered in previous studies [18,19] to design better ansätze for variational algorithms. In this work, we use the notion of information flow (spread of correlation) to define paths through the circuit, with which we can control the spread of correlation between qubits. We combine the concept of causal cones with paths and utilize the graph representation of a parameterized quantum circuit to identify and control the flow of information (correlation) between gates. An illustration of these concepts is depicted in Fig. 2.

D. Optimization of variational quantum algorithms
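In variational quantum algorithms we usually optimize the parameters of states prepared by a PQC, $U(\theta)$, in order to minimize an objective function of the form
$$f(\theta) = \langle \psi(\theta) | H | \psi(\theta) \rangle = \sum_{j=1}^{M} c_j \langle H_j \rangle_{\theta}, \tag{2}$$
where $H = \sum_{j=1}^{M} c_j H_j$ represents a problem Hamiltonian, which is a linear combination of M Pauli terms $H_j$ with coefficients $c_j$, and $\langle H_j \rangle_{\theta}$ is the expectation value of $H_j$ in the state $|\psi(\theta)\rangle$. The optimization of the objective function is carried out by iteratively updating the parameters of the quantum circuit as
$$\theta_i^{(t+1)} = \theta_i^{(t)} - \alpha \frac{\partial f(\theta)}{\partial \theta_i}, \tag{3}$$
where $\alpha$ is the learning rate and $\partial f(\theta) / \partial \theta_i$ denotes the partial derivative of the objective function with respect to the variable $\theta_i$. The analytical gradient can be calculated using a K-term parameter-shift rule [20][21][22][23] as
$$\frac{\partial f(\theta)}{\partial \theta_i} = \sum_{j=1}^{M} c_j \sum_{k=1}^{K} \gamma_{k,i} \langle H_j \rangle_{\theta_{k,i}}, \tag{4}$$
where $\gamma_{k,i}$ are the coefficients of the K-term shift rule and $\theta_{k,i}$ denotes the parameters with the k-th shift applied to $\theta_i$. The calculation of the analytical gradient requires a large number of measurements for a single parameter update, as every gradient calculation requires K objective evaluations, each of which in turn requires a large number of measurements (shots) for an accurate estimate. To overcome this, n-shot stochastic gradient descent was proposed in Refs. [24,25], where one uses n-shot estimators of the gradient instead of the exact ones,
$$g_i^{(n)}(\theta) = \sum_{j=1}^{M} c_j \sum_{k=1}^{K} \gamma_{k,i} \, \hat{h}^{(n)}_{j,k,i}, \tag{5}$$
where $\hat{h}^{(n)}_{j,k,i}$ is the n-sample mean estimator of $\langle H_j \rangle_{\theta_{k,i}}$. The parameters are then updated similarly to Eq. 3.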
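To make the shift rule and its n-shot variant concrete, the sketch below uses the simplest case we can write down: a single Ry rotation acting on |0⟩ with the observable Z, for which the two-term rule (K = 2, shifts of ±π/2, coefficients ±1/2) is exact. The single-qubit setting, the simulated sampling, and all function names are assumptions made for illustration and are not taken from the simulations in this paper.

```python
import numpy as np

Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def expectation(theta):
    """Exact <psi(theta)|Z|psi(theta)> for |psi> = Ry(theta)|0>, i.e. cos(theta)."""
    psi = ry(theta) @ np.array([1.0, 0.0])
    return float(psi @ Z @ psi)

def n_shot_expectation(theta, n, rng):
    """n-sample mean estimator of <Z> from simulated +1/-1 outcomes."""
    p_plus = float(np.clip(0.5 * (1.0 + expectation(theta)), 0.0, 1.0))
    outcomes = rng.choice([1.0, -1.0], size=n, p=[p_plus, 1.0 - p_plus])
    return float(outcomes.mean())

def parameter_shift_gradient(evaluate, theta, shift=np.pi / 2):
    """Two-term shift rule with coefficients +1/2 and -1/2."""
    return 0.5 * (evaluate(theta + shift) - evaluate(theta - shift))

rng = np.random.default_rng(0)
theta, alpha = 0.3, 0.1
exact_grad = parameter_shift_gradient(expectation, theta)
noisy_grad = parameter_shift_gradient(
    lambda t: n_shot_expectation(t, 100, rng), theta)
theta_new = theta - alpha * exact_grad  # one gradient-descent update
```

The exact gradient equals −sin(θ), while the n-shot version replaces each expectation value by a sample mean and therefore fluctuates around that value.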

E. Variational Quantum Classification
Variational quantum classifiers (VQC) are quantum circuits that are trained for supervised learning tasks [7,8]. There exist several strategies to design a quantum classifier, including well-known classical machine learning techniques such as artificial neural networks [26,27] or kernel methods [28,29]. In this study, the circuit-centric architecture in Fig. 3 is used to study a binary classification problem [8]. The general objective is to train the VQC on a data set $\{x_i, y_i\}_{\mathrm{train}}$ to find a mapping between input $x_i$ and label $y_i$. The trained parameterized quantum circuit can then be used as a black box to predict labels $\hat{y}_i$ for a given set of test inputs $\{x_i, y_i\}_{\mathrm{test}}$.
The circuit used for this classification task consists of three distinct parts: the state preparation circuit, the model circuit, and a measurement scheme. A quantum state that embeds some classical data is prepared by applying a static quantum routine $E(x)$ to the initial ground state, $|\Phi(x)\rangle = E(x)|0\rangle$. We use the so-called basis encoding method with $x \in \mathbb{B}^n$, which encodes the input data x as computational basis states. The prepared state $|\Phi(x)\rangle$ is then further processed with a given parameterized quantum circuit $U(\theta)$, resulting in the state $|\psi_{\mathrm{vqc}}(x,\theta)\rangle = U(\theta)E(x)|0\rangle$. The ansatz structure used in this study contains parameterized single- and two-qubit gates with trainable parameters. The circuit is shown in Fig. 4. The output state $|\psi_{\mathrm{vqc}}(x,\theta)\rangle$ is finally measured using the Pauli-Z operator on a single qubit (we chose the first qubit) to obtain the expectation value $E(\sigma_z)$. This yields the predicted label $\hat{y}_i$.
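The state preparation and readout steps can be illustrated with a small state-vector sketch. It covers only basis encoding and the Pauli-Z measurement on the first qubit; the trainable model circuit U(θ) is omitted, and the function names and the bit-ordering convention (first bit is the most significant) are assumptions made for this example.

```python
import numpy as np

def basis_encode(bits):
    """State vector of E(x)|0...0> = |x> for a bit string x."""
    index = int("".join(str(b) for b in bits), 2)
    state = np.zeros(2 ** len(bits))
    state[index] = 1.0
    return state

def z_expectation_first_qubit(state):
    """<sigma_z> on the first qubit: P(first bit = 0) - P(first bit = 1)."""
    probs = np.abs(state.reshape(2, -1)) ** 2
    return float(probs[0].sum() - probs[1].sum())

state = basis_encode([1, 0, 1])
readout = z_expectation_first_qubit(state)
```

Without a model circuit the readout simply reflects the first input bit; it is the trainable circuit applied between encoding and measurement that has to make this single-qubit expectation value depend on all input bits.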
The parameters of the variational block are trained by minimizing the square loss cost function
$$L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 ,$$
with n as the training set size and the predicted label output of the VQC as
$$\hat{y}_i = \langle \psi_{\mathrm{vqc}}(x_i, \theta) | \sigma_z | \psi_{\mathrm{vqc}}(x_i, \theta) \rangle .$$
The binary classification problem studied in this paper is the n-bit parity problem. The corresponding dataset contains $2^n$ distinct binary vectors, where each label indicates whether the sum of the n components of the binary vector is odd or even. The Boolean n-bit parity function to be modeled is
$$f(x) = x_1 \oplus x_2 \oplus \cdots \oplus x_n ,$$
with the property that $f(x) = 1$ if the number of ones in the vector $x \in \{0,1\}^n$ is odd and $f(x) = 0$ otherwise.
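A short sketch of the parity dataset and of a square loss is given below. The mean over the training set is an assumed normalization, and the function names are illustrative.

```python
import itertools
import numpy as np

def parity_dataset(n):
    """All 2^n binary vectors with label 1 if the number of ones is odd."""
    inputs = np.array(list(itertools.product([0, 1], repeat=n)))
    labels = inputs.sum(axis=1) % 2
    return inputs, labels

def square_loss(labels, predictions):
    """Mean squared difference between labels and predictions."""
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return float(np.mean((labels - predictions) ** 2))

inputs, labels = parity_dataset(4)
```

For any n, exactly half of the 2^n vectors have odd parity, so the dataset is balanced.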
In what follows, we describe the details of how to quantify information flow in order to design an algorithm for optimizing variational algorithms.

III. METHOD

A. Measure of distance
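A given unitary $\hat{U}$ on two qubits can be conveniently described using a 4 × 4 matrix $U_{(a,b),(c,d)}$ with entries
$$U_{(a,b),(c,d)} = \langle a\, b | \hat{U} | c\, d \rangle ,$$
where the indices (a, b) are thought of as a single unified row index, and (c, d) similarly plays the role of a unified column index. Using this description, a four-qubit state corresponding to the unitary transformation can be defined:
$$|U\rangle = \frac{1}{N} \sum_{a,b,c,d} U_{(a,b),(c,d)} \, |a\rangle \otimes |b\rangle \otimes |c\rangle \otimes |d\rangle ,$$
where N is a normalization constant.
Using this state, we can now define reduced density matrices of different subsystems of qubits. For instance, we can define the single- and two-qubit reduced density matrices
$$\rho_a = \mathrm{Tr}_{bcd}\left( |U\rangle \langle U| \right), \qquad \rho_{ab} = \mathrm{Tr}_{cd}\left( |U\rangle \langle U| \right),$$
where the subscripts on the trace indicate the degrees of freedom that are traced over.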
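The mapping from a two-qubit unitary to a four-qubit state, and the mutual information between pairs of its legs, can be sketched numerically as follows. Only the mutual information is evaluated here; the distance metric built from it is not reproduced. The leg ordering (a, b, c, d), the helper names, and the choice of the first qubit as control in the controlled-Ry gate are assumptions of this sketch.

```python
import numpy as np

def unitary_to_state(unitary):
    """Flatten U_{(a,b),(c,d)} into a normalized state on legs (a, b, c, d)."""
    psi = np.asarray(unitary, dtype=complex).reshape(16)
    return psi / np.linalg.norm(psi)

def reduced_density_matrix(psi, keep, n=4):
    """Trace out every leg that is not listed in keep."""
    keep = list(keep)
    rest = [q for q in range(n) if q not in keep]
    tensor = np.transpose(psi.reshape([2] * n), keep + rest)
    matrix = tensor.reshape(2 ** len(keep), -1)
    return matrix @ matrix.conj().T

def entropy(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))

def leg_mutual_information(unitary, i, j):
    """I(i:j) between two legs of the state |U>."""
    psi = unitary_to_state(unitary)
    s_i = entropy(reduced_density_matrix(psi, [i]))
    s_j = entropy(reduced_density_matrix(psi, [j]))
    s_ij = entropy(reduced_density_matrix(psi, [i, j]))
    return s_i + s_j - s_ij

def controlled_ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    gate = np.eye(4)
    gate[2:, 2:] = [[c, -s], [s, c]]  # Ry on the target when the control is 1
    return gate

A, B, C, D = 0, 1, 2, 3
identity = np.eye(4)
swap = np.eye(4)[[0, 2, 1, 3]]
```

For the identity, legs a and c (and likewise b and d) form maximally entangled pairs, so I(a:c) = 2 log 2 while the diagonal pair (a, d) shares no mutual information. A SWAP gate reverses the roles and gives I(a:d) = 2 log 2.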
Using this description of the unitary, Ref. [30] proposed the following distance metric across the legs of the unitary (see Fig. 5): where I(i : j) is the mutual information between the sites i and j as defined in Eq. 1.
Figure 5. An illustration of the definition of labels on the unitary operator, as well as the corresponding labeling of the metric distances. Image adapted from Ref. [30].
In this article we introduce a further modification to the distance metric as: else (13) where I(ij : kl) is defined as:
To develop some intuition about this metric, we present an illustration of the distance between the legs of a two-qubit gate, C-Ry(θ), in Fig. 6. As expected, we observe that the distance (weight of the diagonal edges) approaches infinity when θ equals zero, because the two-qubit gate acts as the identity at this value.

B. Paths
The objective functions $f(\theta)$ for commonly used quantum algorithms are constructed as a linear combination of expectation values as
$$f(\theta) = \sum_{j} c_j \langle H_j \rangle_{\theta} ,$$
where $H_j$ represent individual Pauli strings in the Hamiltonian defining the problem and $c_j$ are their coefficients. These individual Pauli strings usually act on the state of a subset of the qubits, and their expectation values thus depend only on the gate parameters in the causal cone of these qubits [31]. We define different paths in the causal cone of the qubits to be measured by converting the quantum circuits into graphs as described in section II B. An illustration of the choices of paths in the causal cone is shown in Fig. 7.
We further introduce the notion of shortest paths by adding weights to the edges in the graph representation of the circuit according to the distance measure proposed in section III A. This is inspired by Ref. [30], where the authors use the length of geodesics for disentangling a quantum state. We use the networkx package [32] to create the graph representation of the quantum circuit and to find different paths between nodes in the graph representation.
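As a rough illustration of how paths can be enumerated and ranked with networkx, the sketch below builds a small weighted graph for two consecutive two-qubit gates. The edge weights are made-up placeholders standing in for the distance measure, and the node labels are an assumed convention.

```python
import networkx as nx

graph = nx.DiGraph()
# Hypothetical weights: straight edges are "short", diagonal edges are "long".
graph.add_weighted_edges_from([
    # first two-qubit gate
    (("q0", 0), ("q0", 1), 0.2),
    (("q0", 0), ("q1", 1), 1.5),
    (("q1", 0), ("q0", 1), 1.5),
    (("q1", 0), ("q1", 1), 0.2),
    # second two-qubit gate
    (("q0", 1), ("q0", 2), 0.3),
    (("q0", 1), ("q1", 2), 0.6),
    (("q1", 1), ("q0", 2), 0.6),
    (("q1", 1), ("q1", 2), 0.3),
])

source, target = ("q0", 0), ("q1", 2)
# Every path from the input of q0 to the output of q1.
all_paths = list(nx.all_simple_paths(graph, source, target))
# The path that minimizes the summed edge weights (Dijkstra).
shortest = nx.shortest_path(graph, source, target, weight="weight")
length = nx.shortest_path_length(graph, source, target, weight="weight")
```

Here two paths connect the chosen nodes; the one that stays on q0 through the first gate and crosses over in the second gate has the smaller total weight.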

C. Optimization with paths
In this section, we present a strategy for optimizing variational algorithms based on paths in the causal cone of the individual Pauli strings used to define the objective function. We use two different strategies for choosing sets of parameters by randomly sampling paths from either the set of all possible paths or the set of shortest paths. The overall algorithm is presented in Algorithm 1.
One can also modify the presented algorithm to add more stochasticity by randomly sampling the Hamiltonian terms to optimize in each step [25,33] or by updating parameters before calculating paths for terms. We leave these variations for future studies.
Algorithm 1: An outline of the path optimization algorithm.
Input: choice of path, problem Hamiltonian and ansatz
Output: Ground state energy of the Hamiltonian
Initialize: generate the circuit graph
while not converged do
    for term in Hamiltonian do
        select a path based on the distance metric and the sampling strategy
    end
    for path in all paths do
        1. select the parameters in the path
        2. calculate the gradients w.r.t. every parameter using Eq. 4
        3. update the parameters using the gradients and learning rate using Eq. 3
    end
end
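The control flow of Algorithm 1 can be sketched on a toy problem. Everything problem-specific below is a stand-in: the objective is a classical sum of cosines (for which the two-term shift rule is exact) rather than a measured energy, the candidate paths per term are hypothetical lists of parameter indices rather than paths extracted from a circuit graph, and a fixed number of steps replaces the convergence check.

```python
import numpy as np

def objective(theta):
    """Classical stand-in for a sum of expectation values."""
    return float(np.sum(np.cos(theta)))

def shift_gradient(f, theta, i, shift=np.pi / 2):
    """Two-term shift rule for parameter i."""
    plus, minus = theta.copy(), theta.copy()
    plus[i] += shift
    minus[i] -= shift
    return 0.5 * (f(plus) - f(minus))

def path_optimize(f, theta, paths_per_term, lr=0.1, steps=300, seed=0):
    rng = np.random.default_rng(seed)
    theta = theta.copy()
    for _ in range(steps):
        # Select one path per Hamiltonian term (random sampling strategy).
        chosen = [paths[rng.integers(len(paths))] for paths in paths_per_term]
        for path in chosen:
            # Gradients only for the parameters that lie on the path.
            grads = {i: shift_gradient(f, theta, i) for i in path}
            for i, g in grads.items():
                theta[i] -= lr * g
    return theta

# Hypothetical candidate paths (lists of parameter indices) for two terms.
paths_per_term = [[[0, 1], [1, 2]], [[2, 3], [0, 3]]]
theta0 = np.full(4, 0.5)
theta_opt = path_optimize(objective, theta0, paths_per_term)
```

Each step updates only the parameters on the sampled paths, so the cost per step is set by the path length rather than by the total number of parameters. Restricting the candidates to shortest paths would only change how `chosen` is built.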

IV. SIMULATION AND RESULTS
In this section, we numerically demonstrate the applications of the proposed algorithm for training variational quantum algorithms. The training is implemented in Tequila [34], an open-source Python package, which uses Qulacs [35] as the backend for the execution of the numerical simulations. We also used the PennyLane [36] package for running some of the numerical simulations. We first present the details of the experiments for finding ground state energies.

A. VQE -XXZ-Heisenberg model
We use the VQE framework to find the ground state energy of the XXZ-Heisenberg model. The Hamiltonian of such a system can be written as follows:
$$H = \sum_{\langle i,j \rangle} \left( J_X \hat{X}_i \hat{X}_j + J_Y \hat{Y}_i \hat{Y}_j + J_Z \hat{Z}_i \hat{Z}_j \right) + h \sum_{i} \hat{Z}_i ,$$
where $\hat{X}$, $\hat{Y}$, $\hat{Z}$ are the Pauli matrices, $\langle i,j \rangle$ denotes all the pairs of adjacent lattice sites, $J_X$, $J_Y$, $J_Z$ are the coupling constants, and h on the right represents the external magnetic field. The coupling constants for the XXZ model follow the relation
$$J_X = J_Y = J, \qquad J_Z = \Delta .$$
For all the experiments, we fix the values of the different constants to h = 0 (no external magnetic field), Δ = −20.0 and J = 1.0. This corresponds to the model having a ferromagnetic ground state. We use this model because all the terms in the Hamiltonian depend on the state of only two qubits, e.g. $\hat{X}_i \otimes \hat{X}_j$, and thus our method can be very useful in reducing the cost of parameter updates in every iteration of the optimization.
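For small lattices the model can be checked by exact diagonalization. The sketch below assumes the Hamiltonian H = Σ_⟨i,j⟩ [J (X_i X_j + Y_i Y_j) + Δ Z_i Z_j] with h = 0 on an open rectangular lattice; this sign convention, the open boundaries, and the helper names are assumptions of the example and may differ from the setup used in the simulations.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def two_site_operator(op, i, j, n):
    """op acting on sites i and j, identity on the remaining n - 2 sites."""
    factors = [I2] * n
    factors[i] = op
    factors[j] = op
    out = factors[0]
    for factor in factors[1:]:
        out = np.kron(out, factor)
    return out

def lattice_edges(rows, cols):
    """Nearest-neighbour pairs of an open rows x cols lattice."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            site = r * cols + c
            if c + 1 < cols:
                edges.append((site, site + 1))
            if r + 1 < rows:
                edges.append((site, site + cols))
    return edges

def xxz_hamiltonian(n, edges, coupling=1.0, delta=-20.0):
    dim = 2 ** n
    ham = np.zeros((dim, dim), dtype=complex)
    for a, b in edges:
        ham += coupling * (two_site_operator(X, a, b, n)
                           + two_site_operator(Y, a, b, n))
        ham += delta * two_site_operator(Z, a, b, n)
    return ham

edges = lattice_edges(2, 2)
ground_energy = float(np.linalg.eigvalsh(xxz_hamiltonian(4, edges))[0])
```

Under these assumptions the fully aligned state is a ground state, and its energy is Δ times the number of bonds, which gives a simple reference value for a VQE run on the same lattice.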
We carried out simulations for five different lattice sizes: 3 × 2, 4 × 2, 5 × 2, 6 × 2, and 7 × 2 qubits (6, 8, 10, 12, and 14 qubits, respectively). For all the simulations we use the ansatz shown in Fig. 1(b) with a varying number of layers. The optimization is carried out using the algorithm presented in Algorithm 1 and the stochastic gradient descent algorithm, with a fixed learning rate of 0.1. The results from all the simulations are plotted in Fig. 8. We only plot the first 100 iterations of the trajectories for all the cases, as this is sufficient for comparing the different methods. All the simulations were repeated at least 5 times with random initialization of the gate parameters to collect the statistics for comparison.
We look at the optimization trajectories from the different simulations using a single layer of the ansatz plotted in Fig. 8. First, we observe that optimization using Algorithm 1 with either a random or shortest path always performs better when compared with stochastic gradient descent. Second, we observe that the optimization trajectories using the shortest path on average have steeper initial convergence; however, the final energies achieved by all optimization methods were very close. This indicates that choosing a set of parameters based on information transfer between qubits via gates can help accelerate the overall convergence of an algorithm. Also, it should be noted that forcing the flow of information along a particular path can be useful in cases where the spread of information can lead to convergence issues [37]. Finally, we point out that the spread in the trajectories corresponding to the runs with the shortest path tends to be smaller; however, all the methods have some runs (particularly in the case of the 12-qubit model) where they converge to a local minimum. This is a common occurrence in stochastic optimization methods and can be mitigated using different methods [38].
As we increase the number of layers of the ansatz used in the numerical simulations, we observe that the rate of convergence for all the methods increases. However, we point out that optimization trajectories with the shortest path are still the fastest converging among all the methods. This implies that using (shortest) paths for optimization of algorithms with objectives depending on only a subset of the qubits might be useful.

B. VQC -Binary classification
We use the VQC framework presented in Section II E for the n-bit parity classification problem. The corresponding dataset consists of $2^n$ distinct binary vectors, where each label indicates whether the sum of the n components of the binary vector is odd or even. The Boolean n-bit parity function to be modeled is
$$f(x) = x_1 \oplus x_2 \oplus \cdots \oplus x_n ,$$
with the property that $f(x) = 1$ if the number of ones in the vector $x \in \{0,1\}^n$ is odd. Binary classification is an interesting choice for this study because there is only one readout qubit, making the problem well-suited for our optimization method. Simulations were performed on the 4-qubit parity problem using the ansatz shown in Fig. 4 with 2, 3, and 4 layers. Again, optimization is performed using the algorithm presented in Algorithm 1 and Nesterov momentum with a fixed learning rate of 0.1. We selected the Nesterov optimizer for comparison, as using Adam failed to improve the model accuracies. The simulation results are shown in Fig. 9. We focus on the first 50 training epochs of optimization for all cases and plot the average of 5 instances with random initialization of the gate parameters to collect statistics for comparison.
As can be seen from the optimization trajectories in Fig. 9, the model is able to learn to successfully perform the classification task. We note that the optimization with random paths consistently outperforms the one with the Nesterov momentum method. However, we observe that the optimization with the shortest path performs poorly. We attribute this to the fact that the model has limited knowledge of the full data to compute the parity at any given step. To test this hypothesis and inform the model of the full input state, we perform numerical simulations with paths to all qubits instead of a single random path. The combined paths in the worst case can correspond to the causal cone of the observable, which has been considered for optimization in previous works [31]. Furthermore, we carry out the simulation with the paths alongside the combined paths with a reduced learning rate of 0.05 and plot the results in Fig. 10. The reason for training with a reduced learning rate is that we observe oscillations in the cost function with the combined paths. We note that the trajectories from the optimization with the combined paths consistently outperform the optimization with individual paths. This is the expected behavior, as the model has access to the full data for the classification, in contrast to the case with individual paths. We further point out that the optimization with the random path on average has a higher convergence rate than the one with the shortest path. This may be because the random path can have access to a larger number of parameters than the shortest path. While this suggests that one needs a larger number of parameters here, we point out that we can still find the optimal solution using single paths which only depend on a subset of the parameters.
Overall, we have presented empirical evidence that our method based on the path (defined using a distance measure) can be used to successfully optimize variational algorithms.

V. CONCLUSION
In this work, we have proposed a notion of information flow by defining a path in parameterized quantum circuits. We have presented a novel measure of distance between two points in the circuit by using mutual information between the quantum states that a local unitary acts on. The distance can be calculated efficiently, as it does not rely on global parameters but only on the local unitary operator. We also present a strategy for optimizing parameterized circuits using paths for variational quantum algorithms.
We performed numerical experiments to estimate the ground state energy of the XXZ-Heisenberg model and to perform n-bit binary classification, using parameterized circuits of varying size and depth. The results from the numerical simulations provide empirical evidence that our method can be successfully used for these tasks.
Our work is an initial step toward using path-based information flow for the optimization of quantum circuits. While we have demonstrated consistent improvement for smaller problem instances, a systematic investigation of the scaling of our method for sufficiently deep circuits could be worth exploring. Other questions, such as whether forcing information along paths can help mitigate the observed barren plateau phenomenon or remove redundant parameterization of quantum circuits, are left for future research. We believe that results from this study can be useful to researchers studying the optimization and design of quantum circuits.
Figure 1. A figure showing the graph representation of a given circuit. (a) An illustration of the graph representation of single- and two-qubit gates. (b) An illustration of the graph representation of the full circuit.

Figure 2. A figure showing the causal cone and the various paths in a graph representation of a circuit. The nodes present in the causal cone of the observable on qubit 4 are colored blue, and the color gradient represents the distance (darker implies smaller). The blue lines denote different paths within the causal cone.

Figure 3. Structure of an n-qubit variational quantum binary classifier: state preparation circuit E(x) encoding the input x into the amplitudes of a quantum system, a model circuit U (θ), and a single qubit measurement.The measurement retrieves the probability p(y) of the model predicting 0 or 1, from which the binary prediction can be inferred.The classification circuit parameters θ are trained by a variational scheme.

Figure 4. Gate composition of a single layer of the ansatz used in the model circuit U (θ) for the classification task.

Figure 6. A plot of the distance vs. the parameter value for a C-Ry(θ) gate.

Figure 7. A figure showing the different path options in the causal cone of an observable. The intensity of the colors (darker implies smaller) denotes the qualitative distance from the last point on the qubit to be measured.

Figure 8. Optimization trajectories from different VQE simulations of the different XXZ-Hamiltonians. The lines correspond to the mean of the trajectories from different runs, and the shadow represents the area between the best and worst values from the simulations.

Figure 9. Optimization trajectories from different VQC simulations of the n-bit parity problem using a 4-qubit ansatz. The lines correspond to the mean of the trajectories from different runs, and the shadow represents one standard deviation.