Scalable quantum measurement error mitigation via conditional independence and transfer learning

Mitigating measurement errors in quantum systems without relying on quantum error correction is of critical importance for the practical development of quantum technology. Deep learning-based quantum measurement error mitigation (QMEM) has exhibited advantages over the linear inversion method due to its capability to correct non-linear noise. However, scalability remains a challenge for both methods. In this study, we propose a scalable QMEM method that leverages the conditional independence (CI) of distant qubits and incorporates transfer learning (TL) techniques. By leveraging the CI assumption, we achieve an exponential reduction in the size of neural networks used for error mitigation. This enhancement also offers the benefit of reducing the number of training data needed for the machine learning model to successfully converge. Additionally, incorporating TL provides a constant speedup. We validate the effectiveness of our approach through experiments conducted on IBM quantum devices with 7 and 13 qubits, demonstrating excellent error mitigation performance and highlighting the efficiency of our method.


I. INTRODUCTION
Quantum computing offers computational advantages over classical algorithms across various problems, such as factoring integers, simulating quantum systems, solving linear systems of equations, machine learning, and simulating stochastic processes [1][2][3][4][5][6][7][8][9][10]. However, the susceptibility of quantum computing to noise and imperfections poses a significant challenge, limiting its ability to surpass classical capabilities in solving real-world problems. While the theory of quantum error correction (QEC) and fault tolerance holds the promise of scalable quantum computation [11,12], building a fault-tolerant quantum computer remains a long-term endeavor. In the ongoing effort to build full-fledged fault-tolerant quantum computers, there is a demand for techniques that improve the utility of quantum hardware in the presence of noise without relying solely on QEC.
Quantum error mitigation (QEM) refers to a set of techniques aimed at reducing the impact of errors on the outcomes of quantum computations [13][14][15][16]. Unlike QEC, which removes errors completely, QEM focuses on minimizing their effects on the final result of an algorithm. By relaxing the requirement for full recovery of the desired state, QEM techniques can be implemented without the need for additional physical qubits. This makes QEM particularly well-suited for near-term quantum computing, where the size of quantum circuits that can be reliably executed is limited. In fact, QEM plays a crucial role in the Noisy Intermediate-Scale Quantum (NISQ) era [17], as it maximizes the utilization of limited quantum resources and expands the capacity of quantum systems for solving real-world problems [18,19]. In this respect, developing the most efficient and scalable QEM techniques is an important task.
Measurement is an essential operation in quantum computing, but it is prone to errors. In certain quantum devices, measurement errors can severely damage the overall computation. For instance, IBM quantum devices available on the cloud typically exhibit measurement error rates on the order of 1%, with some cases reaching as high as 40%. Various methods have been proposed to mitigate measurement errors [20][21][22][23][24], all of which are based on fully characterizing the underlying noise model using techniques such as tomography and machine learning. However, the computational costs associated with these methods scale exponentially with the number of qubits, imposing limitations on both scalability and practicality.
In this paper, we present a scalable deep learning-based method for quantum measurement error mitigation (QMEM). Our method leverages the concepts of conditional independence and transfer learning [25] to significantly improve the efficiency compared to previous methods. Conditional independence assumes that the impact of measurement cross-talk between distant qubits is negligible. This assumption is especially relevant for quantum devices with limited connectivity among physical qubits, such as those constrained by nearest-neighbor couplings [26] or employing distributed modular architectures [27][28][29][30][31]. By incorporating this assumption, we are able to exponentially reduce the size of neural networks used for QMEM. Transfer learning assumes the existence of an error component that is shared across all qubits. This assumption facilitates a constant-factor reduction in training time by effectively leveraging pre-trained models. To validate our approach, we conducted proof-of-principle experiments on IBM quantum devices with 7 and 13 qubits. The results demonstrate that the underlying assumptions hold and affirm the effectiveness of our QMEM method in reducing measurement errors.
The remainder of the paper is organized as follows. We begin by setting up the problem and reviewing two common approaches to QMEM in Section II. Section III presents the theoretical framework of our work, describing how the concept of conditional independence and transfer learning techniques are incorporated into the proposed QMEM method. In Section IV, we provide detailed instructions on how to implement the proposed QMEM methods and describe experiments conducted through the IBM quantum cloud service. This section also includes a comprehensive performance comparison between the proposed QMEM methods and existing methods. Conclusions are drawn in Section V, along with discussions on directions for future work and open problems.

II. BACKGROUND
Many experimental setups for both the quantum circuit model and quantum annealing use projective measurement in the computational basis to perform readout of a quantum state. Moreover, positive operator-valued measurements can be realized through projective measurement with ancillary qubits [32,33]. Therefore, our primary focus is the development of error mitigation techniques to enhance projective measurement in the computational basis. An ideal measurement on n qubits results in a probability distribution, which can be represented as a vector p = [p_1, p_2, ..., p_{2^n}]. However, the probability distribution observed in experiments deviates from p due to measurement errors. We denote the observed probability vector as p̃ and the error map as N, such that p̃ = N(p). The goal of QMEM is to minimize the loss function L(p, p̂), where L is a distance measure that quantifies the discrepancy between the true probability distribution p and the error-mitigated distribution p̂.
The linear inversion method (LI-QMEM) assumes a noise model N(p) = Λp and aims to reconstruct the noise matrix Λ through tomography. It produces an error-mitigated probability vector p̂ = Λ⁻¹p̃ [20][21][22]. In contrast, QMEM can also be performed by training a deep neural network F to approximate the inverse noise function N⁻¹ [23,24]. The trained neural network produces an error-mitigated probability vector p̂ = F(p̃) ≈ N⁻¹(p̃) = p. This approach, referred to as NN-QMEM, is capable of correcting non-linear errors, which is not possible with LI-QMEM [24]. However, both LI-QMEM and NN-QMEM suffer from scalability limitations, as the memory and computation time grow exponentially with the number of qubits. Recent estimates suggest that current classical computational resources can only handle NN-QMEM for quantum systems of up to 16 qubits [24]. This work focuses on overcoming the scalability limitation of NN-QMEM, since it can effectively correct non-linear errors.

III. SCALABLE QMEM

A. Conditional Independence
Conditional independence (CI) is a fundamental concept in probability theory. It plays an important role in probabilistic models, simplifying the structure of the model and enabling an efficient analysis of the relationships between variables [34,35]. It is valuable when modeling a large set of variables, where directly representing the joint distribution becomes challenging or impractical. By utilizing conditional independence relationships between variables, the joint distribution can be decomposed into smaller, more manageable components. This decomposition allows for a more tractable representation and analysis of complex probabilistic models. To illustrate this, consider the example of two random variables, X and Y. We define X and Y as independent if and only if P(X, Y) = P(X)P(Y). Independence between these variables leads to a partitioning of the probability distribution into two parts. For instance, if X and Y each take 2^10 values, the full joint distribution P(X, Y) would involve 2^20 probabilities. Nevertheless, assuming independence between X and Y enables the decomposition of the joint distribution into the product of the individual distributions P(X) and P(Y). This decomposition significantly reduces the number of required probabilities to just 2^10 + 2^10 = 2^11. Moreover, the definition of conditional independence is as follows. Let X, Y, and Z be random variables. We say that X and Y are conditionally independent given Z if the joint probability of X and Y given Z can be expressed as P(X, Y|Z) = P(X|Z)P(Y|Z). This indicates that the dependence between X and Y can be accounted for solely through their relationship with Z.
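As a quick sanity check of this counting, the following sketch (with an arbitrary random seed) builds the joint distribution of two independent variables over 2^10 values each and compares the number of stored probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_dist(n):
    """Random probability vector of length n."""
    v = rng.random(n)
    return v / v.sum()

# X and Y independent, each taking 2^10 values.
p_x = random_dist(2 ** 10)
p_y = random_dist(2 ** 10)

# The explicit joint P(X, Y) = P(X) P(Y) needs 2^20 entries,
# while the factored form stores only 2^11 probabilities.
joint = np.outer(p_x, p_y)
assert joint.size == 2 ** 20
assert p_x.size + p_y.size == 2 ** 11
assert np.isclose(joint.sum(), 1.0)
```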
To understand how the concept of CI can be applied to QMEM, let us consider the 7-qubit system depicted in Fig. 1a, where q_i denotes the i-th qubit. The circles and the wires in the figure represent physical qubits and their connectivity, respectively. The sets of qubits A = {q_0, q_1, q_2} and B = {q_4, q_5, q_6} are connected only through C = {q_3}. Then, assuming conditional independence of the subsystems A and B given C, the joint probability distribution can be written as P(A, B, C) = P(A|C)P(B|C)P(C). In the naive NN-QMEM, which aims to directly correct the full joint probability distribution P(A, B, C), the number of input nodes of the neural network must grow exponentially with the number of qubits. Typically, the total number of nodes grows linearly with the number of input nodes, and the number of parameters grows quadratically. On the other hand, under the conditional independence assumption, one needs three machine learning models that correct for P(A|C), P(B|C), and P(C) independently. In this example, the number of input nodes is 2^3 for P(A|C) and P(B|C), and 2^1 for P(C). Since C can take on two values (0 or 1), each conditional probability distribution requires two distinct neural networks. Consequently, the total number of parameters to be trained is proportional to 2((2^3)^2 + (2^3)^2) + (2^1)^2 = 260. In contrast, the full model requires a parameter count proportional to (2^7)^2 = 16384. The reduced parameter count in neural networks also implies that a smaller amount of training data is needed for the models to converge successfully. Therefore, by capitalizing on the conditional independence assumption, the overall training time can be significantly decreased. Additionally, smaller networks result in faster inference runtimes. Hereinafter, we refer to the qubit corresponding to C as the conditional qubit.

FIG. 2. The partitioning of qubits according to the conditional independence assumption. S_{l,i} represents the i-th subsystem at partition level l. q_{lk} represents the conditional qubit that connects the subsystems S_{l,2k−1} and S_{l,2k}. This partitioning scheme assumes that the subsystems are independent given the state of q_{lk}. This process allows us to factorize a joint probability distribution into conditional probability distributions.
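The parameter counting above can be reproduced with a short script; the functions below are illustrative and count only the dominant quadratic input-size term, as in the text:

```python
# Parameter-count comparison for the 7-qubit example in the text
# (counts are proportional to the square of the input size).

def full_model_params(n_qubits):
    # One network correcting the full 2^n joint distribution.
    return (2 ** n_qubits) ** 2

def ci_model_params(subsys_sizes, n_cond_qubits):
    # One network per basis state of the conditional qubits for each
    # subsystem, plus one network for the conditional qubits themselves.
    per_state = 2 ** n_cond_qubits
    total = sum(per_state * (2 ** s) ** 2 for s in subsys_sizes)
    return total + (2 ** n_cond_qubits) ** 2

full = full_model_params(7)                    # (2^7)^2 = 16384
ci = ci_model_params([3, 3], n_cond_qubits=1)  # 2*(8^2 + 8^2) + 2^2 = 260
print(full, ci)
```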
The partitioning of the given quantum system based on the principle of conditional independence is illustrated in Fig. 2. In this figure, S_{l,i} represents the i-th subsystem at partition level l, while q_{lk} denotes the conditional qubit that connects the subsystems S_{l,2k−1} and S_{l,2k}. This partitioning scheme assumes that the subsystems are independent given the state of q_{lk}. Let us denote by C_{l,i} the set of conditional qubits that connect a leaf S_{l,i} and the root. For example, C_{3,3} = {q_{11}, q_{21}, q_{32}} and C_{3,6} = {q_{11}, q_{22}, q_{33}}. The partitioning process continues until the leaf nodes are reached, where each leaf node consists of a small number of qubits (e.g., fewer than 10). For instance, if the partitioning in Fig. 2 terminates at level 3, we would have eight leaf nodes. Each leaf node is associated with a conditional probability distribution P(S_{3,i}|C_{3,i}), where i ∈ {1, ..., 8}. The full joint probability distribution under CI is then computed as ∏_{i=1}^{8} P(S_{3,i}|C_{3,i}) ∏_{l=1}^{3} ∏_{k=1}^{2^{l−1}} P(q_{lk}). In this example, each conditional probability distribution requires the training of eight separate neural networks, taking into account all computational basis states of the conditional qubits in C_{3,i}. However, the number of conditional qubits grows with the depth of the tree, which grows logarithmically with the number of total qubits. Moreover, it is evident that the number of leaf nodes cannot exceed the total number of qubits. Therefore, the number of neural networks to be trained independently grows linearly with the total number of qubits. The size of each neural network is constant because the leaf nodes are designed to contain only a small constant number of qubits. This constitutes an efficient QMEM method, which we refer to as CI-QMEM, that is exponentially faster than previous methods that aim to correct the full joint probability distribution without conditional independence, for which the size of the neural network or the size of the linear response matrix grows exponentially with the number of qubits.
The general formula for computing the joint probability distribution of a total system S whose partitioning under CI terminates at level L is

P(S) = ∏_{i=1}^{2^L} P(S_{L,i} | C_{L,i}) ∏_{l=1}^{L} ∏_{k=1}^{2^{l−1}} P(q_{lk}).    (1)
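As an illustration of Eq. (1) at a single partition level, the following sketch recombines two 2-qubit leaf subsystems and one conditional qubit into the full 5-qubit joint distribution; the conditional distributions are toy, randomly generated placeholders standing in for the error-mitigated outputs:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_dist(n):
    """Random probability vector of length n."""
    v = rng.random(n)
    return v / v.sum()

# One-level example: two 2-qubit leaf subsystems A and B joined by a
# single conditional qubit c, so P(A, c, B) = P(A | c) P(c) P(B | c).
p_c = random_dist(2)
p_a_given_c = np.stack([random_dist(4) for _ in range(2)])  # shape (2, 4)
p_b_given_c = np.stack([random_dist(4) for _ in range(2)])  # shape (2, 4)

# Recombine into the full 5-qubit (2^5 = 32 outcome) joint distribution,
# with bit order (A, c, B).
joint = np.einsum('ca,c,cb->acb', p_a_given_c, p_c, p_b_given_c).reshape(32)
assert np.isclose(joint.sum(), 1.0)
```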

B. Transfer Learning
Transfer learning (TL) can further reduce the training runtime by leveraging pre-trained neural networks. Instead of training a new neural network from scratch on a new dataset, transfer learning reuses the parameters of a network pre-trained on a reference dataset that shares some similarities with the new dataset [36][37][38]. Typically, the lower (earlier) hidden layers of a neural network, which capture low-level features, are kept frozen, while only the upper (later) layers are fine-tuned or trained. This strategy eliminates the need to relearn the common low-level features, enabling faster convergence and reducing the overall training time.
To understand how TL works in QMEM, let us consider a source subsystem denoted as S_{s,i}, with its associated conditional qubits represented by C_{s,i}. CI-QMEM trains a neural network for each computational basis state of the conditional qubits C_{s,i} by utilizing the noisy and ideal conditional probability distributions, p̃(S_{s,i}|C_{s,i}) and P(S_{s,i}|C_{s,i}), as input and output, respectively. Now, consider a target subsystem S_{t,j} with the same number of qubits as S_{s,i}, also having its associated conditional qubits denoted by C_{t,j}. One might initially attempt to repeat the entire CI-QMEM procedure described above to train a neural network for this new system. However, if the noise characteristics experienced by these subsystems exhibit similarities that can be captured by the neural network trained for the source system, it is unnecessary to train a separate neural network for the target system from scratch. Instead, it is possible to transfer selected parameters learned from the source system to the target system, thereby reducing the number of parameters that need to be trained. Therefore, the application of TL is a justifiable approach, particularly when assuming the presence of systematic sources of noise that are common across different qubits within the quantum device. This concept is illustrated in Fig. 3. Importantly, a source model can be utilized for multiple target subsystems, as long as they share similar features with the source. Transfer learning typically reduces the number of parameters subject to training in the target model by a constant amount. Consequently, the reduction in trainable parameters also leads to a decrease in the amount of training data required for the model to converge. Henceforth, we refer to the QMEM technique that combines both CI and TL as CITL-QMEM.
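A minimal sketch of the parameter transfer, assuming a fully connected network of the size described later in the text (weights only, biases omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(2)

def init_net(sizes):
    """Weight matrices of a fully connected network with the given layer sizes."""
    return [rng.normal(size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]

# Source model, assumed already trained on p̃(S_s|C_s) -> P(S_s|C_s):
# 2^3 = 8 inputs/outputs for a 3-qubit subsystem, four hidden layers of 40.
sizes = [8, 40, 40, 40, 40, 8]
source = init_net(sizes)

# Target model: copy (freeze) all but the final layer, retrain only that.
target = [w.copy() for w in source]
trainable = [False] * (len(target) - 1) + [True]

frozen_params = sum(w.size for w, t in zip(target, trainable) if not t)
tuned_params = sum(w.size for w, t in zip(target, trainable) if t)
print(frozen_params, tuned_params)
```

Only the `tuned_params` portion would be updated during fine-tuning on the target subsystem, which is where the constant-factor training speedup comes from.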

IV. IMPLEMENTATION AND EXPERIMENTS

A. Data collection
Deep learning-based QMEM methods require sample data for training neural networks. When constructing a family of quantum circuits to generate the training dataset, an essential requirement is the efficient computation of the associated probability distribution on a classical computer. This is necessary because training involves both noisy and ideal measurement results. Moreover, it is crucial that the error introduced by the gates used to prepare quantum states for the training data is negligible compared to the measurement error. In modern quantum devices, single-qubit gate errors are typically insignificant compared to measurement errors. Therefore, using quantum circuits composed solely of single-qubit gates satisfies both conditions.
Since the objective of QMEM is to mitigate errors in projective measurements in the computational basis, defined by the eigenstates of σ_z, the relative phase between computational basis states is irrelevant. Consequently, the quantum circuits that generate training data employ only single-qubit R_y(θ) gates. Specifically, the unitary operation preparing the state is U = ⊗_{i=1}^{n} R_y^{(i)}(θ_i), where the superscript (i) denotes that the single-qubit rotation is applied to the i-th qubit. The rotation angles are sampled randomly in such a way that the resulting quantum states are distributed uniformly on the boundary of the xz-plane of the Bloch sphere. This is achieved by randomly generating values z_i ∈ [−1, 1] and computing θ_i = arccos(z_i). The noisy probability distribution p̃ generated by the quantum circuit serves as the input for the neural network, while the ideal probability distribution p represents the desired output. The computation of p can be expressed as

p_x = ∏_{i=1}^{n} [cos^2(θ_i/2)]^{1−x_i} [sin^2(θ_i/2)]^{x_i},

where x_i is the i-th bit of the binary string x.
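The angle sampling and ideal-distribution computation can be sketched as follows (a minimal NumPy version; the bit ordering here takes qubit 0 as the most significant bit, which is a convention choice for the example):

```python
import numpy as np

rng = np.random.default_rng(3)

def ideal_distribution(n_qubits):
    """Ideal measurement distribution of |psi> = (tensor_i R_y(theta_i)) |0...0>."""
    # Sample z_i uniformly in [-1, 1] and set theta_i = arccos(z_i).
    z = rng.uniform(-1.0, 1.0, n_qubits)
    theta = np.arccos(z)
    # Single-qubit outcome probabilities: P(0) = cos^2(t/2), P(1) = sin^2(t/2).
    p0 = np.cos(theta / 2) ** 2
    p1 = np.sin(theta / 2) ** 2
    # Tensor product over qubits gives p_x for every bitstring x
    # (qubit 0 ends up as the most significant bit).
    p = np.array([1.0])
    for i in range(n_qubits):
        p = np.kron(p, np.array([p0[i], p1[i]]))
    return p

p = ideal_distribution(7)
```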
In the LI-QMEM method, the training data comprise all computational basis states of the target qubit system, requiring 2^n circuit executions for an n-qubit system. The data acquisition and error mitigation process can be facilitated using the Qiskit Ignis package [39]. This package constructs a 2^n × 2^n calibration matrix for the n-qubit system based on 2^n pairs of noisy and ideal results.
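For illustration, the linear inversion step can be sketched as follows, using a fabricated two-qubit calibration matrix rather than device data (the tensor-product noise model and the flip rate are assumptions made only for this example):

```python
import numpy as np

# Sketch of LI-QMEM under the assumed linear noise model p_noisy = A @ p_ideal.
# Column x of the calibration matrix A holds the observed distribution when
# the basis state |x> is prepared; here we fabricate A from a toy
# uncorrelated single-qubit readout-flip model.
eps = 0.05  # toy single-qubit misassignment rate (assumption for the example)
single = np.array([[1 - eps, eps],
                   [eps, 1 - eps]])
A = np.kron(single, single)  # 4 x 4 calibration matrix for 2 qubits

p_ideal = np.array([0.5, 0.0, 0.0, 0.5])  # toy target distribution
p_noisy = A @ p_ideal                     # simulated noisy readout
p_mitigated = np.linalg.solve(A, p_noisy) # apply the inverse of A

assert np.allclose(p_mitigated, p_ideal)
```

In practice, inverting the calibration matrix can produce slightly negative quasi-probabilities from shot noise, which is one motivation for the constrained or learned approaches discussed above.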

B. Model construction
The neural network architecture comprises multiple layers, including the input, hidden, and output layers. For a joint probability distribution involving n qubits, there exist 2^n possible computational basis states. Hence, both the input and output layers consist of 2^n nodes. To achieve an optimal balance between convergence speed and accuracy, thorough experimentation was conducted to select appropriate hyperparameters. Specifically, we configured the network with 4 hidden layers, each containing 5 × 2^n nodes. All hidden layers are fully connected, and each hidden node employs the Scaled Exponential Linear Unit (SELU) as the activation function [23,40]. The output layer employs the softmax activation function, which normalizes the outputs into a probability distribution, ensuring that the output values sum to one. The weights and biases of the neural network are optimized using the categorical cross-entropy loss function, and the parameters are updated by the Adam optimizer [41]. As part of the hyperparameter tuning, we set the learning rate, batch size, and number of epochs to 0.0001, 16, and 300, respectively.
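A minimal NumPy sketch of the forward pass of this architecture (the initialization scale is illustrative; actual training would use categorical cross-entropy with Adam, as stated above):

```python
import numpy as np

rng = np.random.default_rng(4)

# Standard SELU constants.
ALPHA, SCALE = 1.6732632423543772, 1.0507009873554805

def selu(x):
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def build(n_qubits):
    """Weights and biases: 2^n inputs, four hidden layers of 5*2^n, 2^n outputs."""
    d, h = 2 ** n_qubits, 5 * 2 ** n_qubits
    sizes = [d, h, h, h, h, d]
    return [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(layers, p_noisy):
    """Map a noisy distribution to a mitigated one (softmax output layer)."""
    x = p_noisy
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        x = softmax(x) if i == len(layers) - 1 else selu(x)
    return x

layers = build(3)                      # one 3-qubit leaf subsystem
out = forward(layers, rng.random(8))   # arbitrary input for the demo
```

The softmax output guarantees a valid probability vector regardless of the input, which is one advantage of NN-QMEM over raw linear inversion.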

C. Experimental Results
To validate the effectiveness of the proposed QMEM methods and to compare their performance against existing ones, we conducted experiments on 7-qubit and 13-qubit quantum systems. We assessed the performance of different QMEM methods using three metrics: Mean Squared Error (MSE), Kullback-Leibler Divergence (KLD), and Infidelity (IF). These metrics quantify the dissimilarity between the ideal and mitigated probability distributions and are computed as follows:

MSE = (1/2^n) ∑_i (p_i − p̂_i)^2,
KLD = ∑_i p_i ln(p_i / p̂_i),
IF = 1 − (∑_i √(p_i p̂_i))^2.

Here, p_i and p̂_i represent the i-th elements of the vectors representing the ideal and the mitigated probability distributions, respectively. A lower value of these measures indicates better performance, as it signifies a closer match between the ideal and mitigated distributions. Additionally, to quantify the rate of error reduction, we use the rate of improvement for each loss function L_x, where the subscript x corresponds to MSE, KLD, or IF. This rate of improvement, denoted as R_x, is calculated as follows [24]:

R_x = (L_x^{unmitigated} − L_x^{mitigated}) / L_x^{unmitigated} × 100(%).
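These metrics and the improvement rate can be computed directly from the probability vectors; a short sketch (natural-log KLD, with zero-probability terms of the ideal distribution skipped):

```python
import numpy as np

def mse(p, q):
    """Mean squared error between ideal p and mitigated q."""
    return np.mean((p - q) ** 2)

def kld(p, q):
    """Kullback-Leibler divergence, skipping terms where p_i = 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def infidelity(p, q):
    """1 minus the classical (Bhattacharyya) fidelity of two distributions."""
    return 1.0 - np.sum(np.sqrt(p * q)) ** 2

def improvement_rate(loss_unmitigated, loss_mitigated):
    """Rate of improvement R_x in percent."""
    return (loss_unmitigated - loss_mitigated) / loss_unmitigated * 100.0

p = np.array([0.5, 0.5, 0.0, 0.0])      # toy ideal distribution
q = np.array([0.45, 0.45, 0.05, 0.05])  # toy mitigated distribution
assert mse(p, p) == 0 and kld(p, p) == 0 and abs(infidelity(p, p)) < 1e-12
```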
A higher value of R_x indicates better performance in reducing errors using QMEM. The 7-qubit experiments were conducted using two IBM quantum devices, namely ibmq_jakarta and ibm_lagos. To obtain the calibration matrix for LI-QMEM, we generated a set of 2^7 calibration circuits. For training the deep learning-based QMEM methods, we generated 7500 data points for each device, which were then split into 6000 samples for training and 1500 samples for testing. To estimate the probability distribution, each quantum circuit was repeated 3.2 × 10^4 times.
The results of the 7-qubit QMEM experiments evaluating the MSE between the ideal and error-mitigated probability distributions as a function of the number of training data are shown in Fig. 4. The results indicate that CI-QMEM achieves substantial error mitigation using less data compared to NN-QMEM. This observation is consistent with the well-known fact that the amount of data required for training deep neural networks depends on the model complexity [42][43][44][45]. As CI-QMEM significantly reduces the size of neural networks and the number of model parameters, it improves the error mitigation performance, even with fewer data. Specifically, the conventional NN-QMEM trains a neural network with about 1.8 × 10^6 parameters, while CI-QMEM uses only about 2.9 × 10^4 parameters. LI-QMEM, despite using only 128 data points, cannot effectively mitigate non-linear errors as the deep learning-based approaches can. Consequently, CI-QMEM achieves smaller measurement errors, underscoring its superior performance in error mitigation compared to other methods. Based on the observation that the improvement from LI-QMEM to CI-QMEM is more pronounced on ibmq_jakarta than on ibm_lagos, we speculate that the non-linear error is stronger in the former than in the latter. Moreover, it is important to note that the number of data points required in LI-QMEM grows exponentially with the number of qubits. CITL-QMEM, which incorporates TL in addition to CI, achieves a further reduction in the number of parameters to about 1.5 × 10^4. This is accomplished by selecting the neural network trained for learning the conditional probability distribution P(q_0, q_1, q_2 | q_3) as the source model. Subsequently, the target model learns the new conditional probability distribution P(q_4, q_5, q_6 | q_3). In this TL approach, only the last layer of the target neural network is fine-tuned (trained), while the rest of the hidden layers retain the parameters from the source model.
The overall results for the 7-qubit QMEM are reported in Table I, and Fig. 5 illustrates the effectiveness of transfer learning in reducing errors. Since the amounts of error mitigated by CI-QMEM and CITL-QMEM are comparable, as shown in the table, the figure only presents the results from the latter. The 13-qubit QMEM experiments were conducted using two IBM quantum devices, namely ibmq_mumbai and ibmq_kolkata. These devices consist of 27 qubits, of which we selected 13 for our experiment, as shown in Fig. 1b. We generated 6000 data points for each device and split them into 5950 for training and 50 for testing. To estimate the probability distribution, we repeated each quantum circuit 1.0 × 10^5 times, the maximum number the cloud service allows.
As shown in Fig. 1b, the selected qubits are partitioned into four leaf subsystems connected by three conditional qubits, so the joint probability distribution for the 13-qubit system decomposes as ∏_{i=1}^{4} P(S_{2,i} | C_{2,i}) ∏_{l=1}^{2} ∏_{k=1}^{2^{l−1}} P(q_{lk}), where each conditional probability distribution requires a separate neural network for every computational basis state of its conditional qubits. Consequently, the total number of neural networks to be trained is 4 × 4 + 3 = 19. For transfer learning, we selected S_{2,1} and S_{2,2} as source subsystems. The results are presented in Fig. 6 and Table II. For the 13-qubit system, we did not perform conventional NN-QMEM experiments, as doing so would require training a neural network with about 7.3 billion parameters, leading to unreasonably long data collection and training times. As evident from both the figures and the table, our methods significantly reduce errors in all performance measures. Interestingly, on the ibmq_mumbai device, CITL-QMEM performs slightly better than CI-QMEM. This can be attributed to the use of a pre-trained model, enabling the creation of a simpler neural network with enhanced generalization capability [46,47]. It is important to note that while LI-QMEM requires 8192 data points and NN-QMEM requires 7.3 billion parameters, our method is trained with only 5950 data points and 7.4 × 10^4 parameters for CI-QMEM, and 3.9 × 10^4 parameters for CITL-QMEM.

V. CONCLUSIONS AND DISCUSSION
We introduced a scalable quantum measurement error mitigation method that overcomes the limitations of existing approaches. By utilizing conditional independence and transfer learning techniques, we achieve exponential reductions in the size of neural networks while maintaining excellent error-mitigation capabilities. Our method not only reduces the size of neural networks and the number of parameters to optimize but also significantly decreases the amount of data required for effective training. Experimental results on four IBM quantum devices, featuring 7 and 13 qubits, provide strong evidence of the efficiency and effectiveness of our method in mitigating measurement errors. Notably, CI-QMEM reduces the number of neural network parameters by a factor of approximately 60 for 7-qubit systems and 10^5 for 13-qubit systems. Transfer learning allows for an additional reduction of parameters by approximately a factor of two in both cases, leading to even more efficient training. As a result, CI-QMEM and CITL-QMEM consistently outperformed the full NN-QMEM under similar training conditions in all experiments. Moreover, these deep learning-based methods outperformed LI-QMEM due to their ability to correct non-linear errors. In particular, for the 13-qubit experiments, these enhancements were achieved while using a smaller amount of data.
In the following, we discuss interesting future research directions and open problems. While the primary focus of this study is mitigating measurement errors, the techniques developed here can be combined with existing methods for mitigating gate errors [13][14][15][16][48], thereby enhancing overall quantum computing performance. Integrating gate error mitigation approaches with measurement error mitigation techniques holds great potential for significant improvements in the accuracy and reliability of quantum computations. Moreover, it would be intriguing to extend the concepts of CI and TL explored in this work to improve the efficiency and scalability of the existing machine learning-based gate error mitigation technique [15]. The QMEM methods discussed in this work operate on the probability distribution obtained as the final outcome of a quantum computation at the software level. Consequently, these approaches are particularly suitable for cloud-based quantum computing environments. An interesting future endeavor is to optimize performance by combining these approaches with hardware-level techniques that specifically address improving qubit-state-assignment fidelity by working directly with the readout signals [23]. Furthermore, the success of CI-QMEM implies that the underlying assumption of negligible measurement cross-talk among distant qubits holds. Exploring the reverse scenario and developing techniques to characterize measurement cross-talk by leveraging the concept of conditional independence presents an interesting avenue for future research. An important open problem in general NN-QMEM, including our approach, pertains to the potential impact of shot noise in large systems. For instance, when working with training data that is close to a uniformly distributed state, the number of shots required to obtain its noisy distribution grows exponentially with the number of qubits. Future research can explore the effectiveness of CI- and TL-based QMEM on a biased training set, where the smallest probability is in Ω(1/poly(n)), or develop tailored methods for such conditions.

FIG. 1. Layouts of the IBM quantum devices used in this work: (a) 7-qubit device and (b) 27-qubit device. Each circle labeled with q_i represents the i-th physical qubit on the device, while the connecting lines represent qubit connectivity. In (b), we selected 13 qubits (from q_4 to q_16) for the experiment. The shaded yellow qubits indicate conditional qubits, and the qubits grouped within T-shaped boundaries represent subsystems with their respective conditional qubits (i.e., the leaf node and its parent in Fig. 2).

FIG. 4. Experimental results showing the mean squared error (MSE) between the ideal and error-mitigated probability distributions as a function of the number of training data. In the figure legend, we label the results from LI-QMEM, NN-QMEM, and CI-QMEM as LI, NN, and CI, respectively. The experiments were conducted on two 7-qubit devices: (a) ibmq_jakarta and (b) ibm_lagos. To evaluate the performance, 1500 test data points were used after training the model on varying numbers of training data. Each experiment was repeated five times, with the data set randomly split into training and test sets for each run. For NN and CI, the plotted values represent the mean MSE, with error bars indicating one standard deviation. For the unmitigated and LI results, the smallest MSE obtained across the five experiments is shown.

FIG. 5. Comparison of the 7-qubit QMEM results obtained using different methods on ibmq_jakarta, measured by (a) MSE, (b) KLD, and (c) IF. The filled bars represent the unmitigated results, the unfilled bars the results obtained using LI-QMEM, the lighter hatched bars (third from the left) the results obtained using NN-QMEM, and the darker hatched bars (fourth from the left) the results obtained using CITL-QMEM.

FIG. 6. Comparison of the 13-qubit QMEM results obtained using different methods on ibmq_kolkata, measured by (a) MSE, (b) KLD, and (c) IF. The filled bars represent the unmitigated results, the unfilled bars the results obtained using LI-QMEM, the lighter hatched bars (third from the left) the results obtained using CI-QMEM, and the darker hatched bars (fourth from the left) the results obtained using CITL-QMEM.
FIG. 3. Transfer learning process for training neural networks in CI-QMEM. Initially, a neural network is trained to correct measurement errors of the source subsystem, denoted by S_{s,i}. The training utilizes the noisy and ideal conditional probability distributions, p̃(S_{s,i}|C_{s,i}) and P(S_{s,i}|C_{s,i}), as the input and the output, respectively. The parameters within the hidden layers, enclosed by a blue dashed box in the source model, are transferred to a new neural network designed to address measurement errors in a target subsystem, S_{t,j}. This transfer is achieved by initializing the dashed edges in the target neural network with the corresponding parameters from the source neural network. Subsequently, only the last hidden layer of the target model, represented by a red dashed box, undergoes training.

TABLE II. Performance evaluation of each QMEM method on the 13-qubit quantum devices, represented by the rate of improvement (R_x). The results for LI-QMEM, CI-QMEM, and CITL-QMEM are displayed in the first, second, and third columns, respectively.