Noise resilience of variational quantum compiling

Variational hybrid quantum-classical algorithms (VHQCAs) are near-term algorithms that leverage classical optimization to minimize a cost function, which is efficiently evaluated on a quantum computer. Recently VHQCAs have been proposed for quantum compiling, where a target unitary U is compiled into a short-depth gate sequence V. In this work, we report on a surprising form of noise resilience for these algorithms. Namely, we find one often learns the correct gate sequence V (i.e. the correct variational parameters) despite various sources of incoherent noise acting during the cost-evaluation circuit. Our main results are rigorous theorems stating that the optimal variational parameters are unaffected by a broad class of noise models, such as measurement noise, gate noise, and Pauli channel noise. Furthermore, our numerical implementations on IBM’s noisy simulator demonstrate resilience when compiling the quantum Fourier transform, Toffoli gate, and W-state preparation. Hence, variational quantum compiling, due to its robustness, could be practically useful for noisy intermediate-scale quantum devices. Finally, we speculate that this noise resilience may be a general phenomenon that applies to other VHQCAs such as the variational quantum eigensolver.


Introduction
Obtaining accurate answers from near-term quantum computers is a challenge with major scientific and technological implications. In these so-called noisy intermediate-scale quantum (NISQ) computers [1], errors arise, for example, due to decoherence processes, gate noise, and measurement noise. Clearly, error mitigation techniques will be necessary to make use of NISQ devices. Several promising error mitigation strategies have recently emerged, including zero-noise extrapolation [2], quasi-probability decomposition [2], post-selection [3,4], noise-aware compiling [5], and machine learning for circuit-depth compression [6]. Let us consider two other strategies for error mitigation in what follows.
Hybridizing a quantum algorithm by pushing some of the complexity onto a classical computer allows one to only run a portion of the computation on the (error-prone) quantum computer. Excellent examples of this strategy are variational hybrid quantum-classical algorithms (VHQCAs) [7]. VHQCAs only employ a quantum computer to evaluate a cost function that depends on the parameters of a quantum gate sequence and then leverage a classical optimization routine to minimize the cost and hence train the parameters. The most famous VHQCA is the variational quantum eigensolver (VQE) [8], where the cost function is the energy for some Hamiltonian and hence the goal is to prepare the ground state. VHQCAs have been proposed for many other applications [9][10][11][12][13][14][15][16][17][18][19][20][21][22].
Another strategy for error mitigation is to find quantum circuits or quantum algorithms that are inherently noise resilient. Circuits for quantum error correction [23,24], of course, have this property of inherent noise resilience, and in fact, such circuits are resilient to all types of noise on a subset of the qubits. More generally, one could ask whether a circuit is resilient to a particular kind of noise process. Hence, for every circuit, which aims to compute some quantity, one could ask what noise models do not affect the output of the circuit.
The two strategies just mentioned have an interesting intersection: researchers have observed that some VHQCAs have some inherent noise resilience. McClean et al. [7] noted that coherent errors (e.g., systematic gate biases) can lead to a situation where the formal unitary $V(\boldsymbol{\alpha})$ specified by the parameters $\boldsymbol{\alpha}$ is different from the actual unitary that is physically implemented, $\tilde{V}(\boldsymbol{\alpha})$. This error is correctable if there exists a vector $\boldsymbol{\beta}$ such that one can physically implement the unitary $\tilde{V}(\boldsymbol{\alpha}+\boldsymbol{\beta})$ within one's ansatz, with the condition that $\tilde{V}(\boldsymbol{\alpha}+\boldsymbol{\beta}) = V(\boldsymbol{\alpha})$. If this condition is satisfied, then one could still physically achieve the minimum value of the cost function, where the minimum value would be associated with different parameters than one would have in the noiseless case. We refer to this kind of noise resilience as Cost Value Resilience, since the value of the cost function at the global minimum is unaffected by the noise. Cost Value Resilience is important, e.g. if one is interested in estimating the ground state energy of a Hamiltonian with VQE.
In this work, we report on a different kind of noise resilience for VHQCAs. Instead of considering Cost Value Resilience, we consider the case where the optimal parameters are noise resilient, which we call Optimal Parameter Resilience. While Cost Value Resilience is related to coherent noise, we find that Optimal Parameter Resilience holds for certain kinds of incoherent noise, such as decoherence processes and readout errors. For certain applications, obtaining the correct optimal parameters is more important than obtaining the correct value of the cost function.
Quantum compiling [25][26][27] is one of these applications. Compiling refers to transforming a high-level algorithm into a low-level machine code. For quantum compiling, it is crucial to do this transformation optimally, i.e. to keep the low-level code as short as possible, since errors accumulate with circuit depth. VHQCAs offer a promising framework for (optimal) quantum compiling. Three recent works introduced VHQCAs for quantum compiling, henceforth referred to as variational quantum compiling (VQC) [19][20][21]. In VQC one trains the parameters $\boldsymbol{\alpha}$ of a short-depth gate sequence $V(\boldsymbol{\alpha})$ such that it is close to a target unitary $U$. Here, some distance measure between $V(\boldsymbol{\alpha})$ and $U$ serves as the cost function and is efficiently evaluated on a quantum computer, while a classical optimizer adjusts the parameters $\boldsymbol{\alpha}$ to minimize the cost. VQC could be an important tool for NISQ computing since it could optimally shrink the depth of quantum circuits. However, a potential issue is that one needs to put the target unitary $U$ on the NISQ device, and hence the target itself is noisy or defective. Furthermore, there are noise sources in other parts of the cost-evaluation circuit. All of these may lead to a defective optimal $V(\boldsymbol{\alpha})$, with the noise effectively compiled into $V(\boldsymbol{\alpha})$. Addressing these concerns, our main results are rigorous theorems stating that many different types of noise during cost evaluation do not affect the optimal $V(\boldsymbol{\alpha})$. For example, we show that VQC is resilient to measurement noise (readout error). We also show resilience to incoherent gate noise and decoherence processes, such as Pauli channels and non-unital Pauli channels, acting at specific times during the cost-evaluation circuit. In addition to these analytical results, we implement VQC on IBM's noisy quantum simulator [28] (which simulates their quantum hardware) for several quantum gates: quantum Fourier transform, Toffoli, and W-state preparation. In each case, we observed significant noise resilience (even more resilience than what is explained by our theorems) such that we effectively learned the true optimal values of $\boldsymbol{\alpha}$ despite the noise.
Finally, we speculate that the resilience phenomenon that we demonstrate for VQC may be more general, potentially applying to other VHQCAs. For example, we discuss the potential for seeing this resilience for VQE, and as a warm-up for the reader, we give a simple example in the next section where VQE exhibits Optimal Parameter Resilience. We also establish in the Discussion section that VQC is a special case of VQE, and hence our main results can be viewed as being relevant to VQE.

Warm-up: simple VQE example
Here we show that VQE [8] exhibits Optimal Parameter Resilience (OPR) to uncorrelated measurement noise for a special class of Hamiltonians. VQE may exhibit OPR more generally, although the proof would certainly be more involved. Hence we consider here this special case for illustration and leave the more general case for future work.
The ground state of $H$ is a tensor product of one-qubit states that are the eigenvectors of $\sigma_{w_j}$ … is associated with $|{+}w_j\rangle$ …. Therefore, despite the measurement noise, one still finds that the ground state is … This implies that one would still learn the correct optimal parameters of the state-preparation circuit if one implemented VQE for this Hamiltonian.
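The mechanism behind this warm-up can be checked numerically for a single term. The sketch below is a minimal illustration under stated assumptions: a one-qubit Hamiltonian $H = Z$, a trial state $R_y(\theta)|0\rangle$, and asymmetric readout-error probabilities $p_{kl}$ (probability of reporting $k$ when the ideal outcome is $l$). The function name and the numerical values are hypothetical and are not taken from the original work.

```python
import numpy as np

def noisy_z_expectation(theta, p00, p11):
    """Estimate <Z> for Ry(theta)|0> when the readout has asymmetric bit-flip errors."""
    # Ideal outcome probabilities of a standard-basis measurement.
    prob0 = np.cos(theta / 2) ** 2
    prob1 = 1.0 - prob0
    # p_kl = Pr(report k | ideal outcome l); each column sums to one.
    p10, p01 = 1.0 - p00, 1.0 - p11
    noisy0 = p00 * prob0 + p01 * prob1
    noisy1 = p10 * prob0 + p11 * prob1
    return noisy0 - noisy1

thetas = np.linspace(0.0, 2 * np.pi, 721)
ideal = np.cos(thetas)                                   # noiseless <Z>
noisy = noisy_z_expectation(thetas, p00=0.97, p11=0.90)  # hypothetical error rates

theta_ideal = thetas[np.argmin(ideal)]
theta_noisy = thetas[np.argmin(noisy)]
print(theta_ideal, theta_noisy, ideal.min(), noisy.min())
```

Under these assumptions the noisy energy is an affine function of the ideal one with positive slope $p_{00} + p_{11} - 1$, so the minimizing angle is unchanged even though the minimum value itself is shifted.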

Background: variational quantum compiling
Let us now move on to variational quantum compiling (VQC). VQC was first introduced in [19], under the name of quantum-assisted quantum compiling (QAQC). Two later works further investigated VQC [20,21] with slightly different approaches. Since we are attempting to unite these works [19][20][21] under one umbrella, we are proposing the name VQC (instead of QAQC) as a unifying term.
There are two overarching approaches to VQC. One is to compile the full unitary matrix U by considering the action of U on all input states (or an informationally complete set of states) [19,21]. The other is to compile only a particular column of the matrix U by considering the action of U on a fixed input state [19,20]. The benefit of the first approach is that it is fully general, applying even when one does not know what the input state to U will be (for example, if U occurs in the middle of one's quantum algorithm). The benefit of the second approach is that, when the input state is known, it could lead to a shorter-depth compilation since it does not require compilation of the entire unitary matrix.

Full unitary matrix compiling
Full unitary matrix compiling (FUMC) was treated in detail in [19]. This work introduced cost functions based on the entanglement fidelity and proposed quantum circuits to quantify the cost based on the overlap between maximally entangled states. A slightly different but equivalent approach was employed in [21]. We focus on the approach of [19] in what follows.
Two cost functions were considered in [19]. One cost function, $C_{\mathrm{HST}}$, quantifies the Hilbert-Schmidt inner product between the target unitary $U$ and the trainable gate sequence $V$, as follows:
$$C_{\mathrm{HST}} = 1 - \frac{1}{d^2}\left|\mathrm{Tr}(V^\dagger U)\right|^2, \tag{3}$$
where $d = 2^n$ is the Hilbert-space dimension and $n$ is the number of qubits that $U$ acts on, and where we write $V$ instead of $V(\boldsymbol{\alpha})$ for simplicity. The circuit for computing $C_{\mathrm{HST}}$ is called the Hilbert-Schmidt Test (HST) and is shown in figure 1(a). First, one prepares a maximally entangled state $|\Phi\rangle_{AB}$ by acting with a depth-two circuit $E$, then one applies $U$ followed by $V^\dagger$ on half of this maximally entangled state. Finally one measures the overlap with the original maximally entangled state $|\Phi\rangle_{AB}$ by applying $E^\dagger$ and quantifying the probability of the all-zeros measurement outcome. One can verify that this probability is equal to $F_{\mathrm{HST}} = \frac{1}{d^2}|\mathrm{Tr}(V^\dagger U)|^2$. This cost function is operationally meaningful since it is equivalent to the average fidelity $\bar{F}(U,V) = \int_\psi |\langle\psi|V^\dagger U|\psi\rangle|^2\, d\psi$ between states acted upon by $U$ versus those acted upon by $V$, as follows [29,30]:
$$C_{\mathrm{HST}} = \frac{d+1}{d}\left(1 - \bar{F}(U,V)\right). \tag{4}$$
Note that $C_{\mathrm{HST}}$ is faithful in that $C_{\mathrm{HST}} = 0$ iff $V = U$ (up to a global phase). An alternative cost function [19] is given by
$$C_{\mathrm{LHST}} = 1 - \frac{1}{n}\sum_{j=1}^{n} F^{(j)}_{\mathrm{LHST}}, \tag{5}$$
where $F^{(j)}_{\mathrm{LHST}}$ is the probability of the 00 measurement outcome in the local Hilbert-Schmidt test (LHST), which is the circuit shown in figure 1(b). Note that $F_{\mathrm{HST}}$ is the entanglement fidelity for the quantum channel defined by $V^\dagger U$. On the other hand, $F^{(j)}_{\mathrm{LHST}}$ is the entanglement fidelity for the quantum channel obtained from feeding into $V^\dagger U$ the maximally mixed state on $\bar{A}_j$ and then tracing over $\bar{A}_j$, where $\bar{A}_j$ consists of all qubits in $A$ other than $A_j$. As shown in [19],
$$C_{\mathrm{LHST}} \le C_{\mathrm{HST}} \le n\, C_{\mathrm{LHST}}, \tag{6}$$
which implies that $C_{\mathrm{LHST}}$ is also a faithful cost function, i.e. $C_{\mathrm{LHST}} = 0$ iff $V = U$ (up to a global phase). The overall cost function proposed by [19] was a convex combination of $C_{\mathrm{HST}}$ and $C_{\mathrm{LHST}}$:
$$C(q) = q\, C_{\mathrm{HST}} + (1-q)\, C_{\mathrm{LHST}}. \tag{7}$$
Here, $q$ is a free parameter with $0 \le q \le 1$. The definition of $C(q)$ was motivated in [19] by the fact that $C_{\mathrm{HST}}$ has a direct operational meaning (equation (4)) but it becomes difficult to train for large $n$ due to a vanishing gradient [31], whereas $C_{\mathrm{LHST}}$ is trainable but does not have a direct operational meaning. Hence one can take a weighted average of these two functions, where for small $n$ one can choose $q \approx 1$, while for large $n$ one can choose $q \approx 0$.
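The two FUMC cost functions can be evaluated classically for small $n$, which is a convenient way to sanity-check their faithfulness and their relative ordering. The sketch below is illustrative only: it assumes the form $C_{\mathrm{HST}} = 1 - |\mathrm{Tr}(V^\dagger U)|^2/d^2$ from [19], computes each local fidelity as the overlap of $(V^\dagger U \otimes \mathbb{1})|\Phi\rangle$ with a Bell pair on qubits $A_j B_j$, and uses helper names (`random_unitary`, `cost_hst`, `cost_lhst`) that are our own.

```python
import numpy as np

def random_unitary(d, rng):
    """Random d x d unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(z)
    return q

def cost_hst(U, V):
    """C_HST = 1 - |Tr(V^dagger U)|^2 / d^2."""
    d = U.shape[0]
    return 1.0 - abs(np.trace(V.conj().T @ U)) ** 2 / d**2

def cost_lhst(U, V):
    """C_LHST = 1 - (1/n) sum_j F_j, with F_j the Bell-pair overlap on (A_j, B_j)."""
    d = U.shape[0]
    n = int(round(np.log2(d)))
    # (W x 1)|Phi> has amplitudes W[a, b] / sqrt(d); split a and b into qubits,
    # giving tensor axes A_1..A_n followed by B_1..B_n.
    W = (V.conj().T @ U).reshape((2,) * (2 * n))
    fidelities = []
    for j in range(n):
        # Project qubits A_j and B_j onto (|00> + |11>)/sqrt(2).
        partial = np.trace(W, axis1=j, axis2=n + j)
        fidelities.append(np.sum(np.abs(partial) ** 2) / (2 * d))
    return 1.0 - float(np.mean(fidelities))

rng = np.random.default_rng(0)
n = 3
U = random_unitary(2**n, rng)
V = random_unitary(2**n, rng)
c_hst, c_lhst = cost_hst(U, V), cost_lhst(U, V)
q = 0.5  # weight of the convex combination C(q)
c_q = q * c_hst + (1 - q) * c_lhst
print(c_lhst, c_hst, c_q)
```

For any pair of unitaries the printed values should respect $C_{\mathrm{LHST}} \le C_{\mathrm{HST}} \le n\,C_{\mathrm{LHST}}$, and both costs vanish when $V$ equals $U$ up to a global phase.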

Compiling with a fixed input state
Fixed input state compiling (FISC) of a unitary matrix was introduced in [19,20] and treated in significant detail in [20]. In this case, the goal is to train a gate sequence $V$ so that it has the same effect as a target unitary $U$ when acting on a given input state $|\psi_0\rangle$. For simplicity and due to its technological relevance, we will consider the case where $|\psi_0\rangle = |\mathbf{0}\rangle$ is the all-zero state, so that we are interested in training $V$ to satisfy (up to a global phase):
$$V|\mathbf{0}\rangle = U|\mathbf{0}\rangle. \tag{8}$$
A natural cost function quantifies the overlap between these two states:
$$C_{\mathrm{LET}} = 1 - G_{\mathrm{LET}}, \tag{9}$$
$$G_{\mathrm{LET}} = \mathrm{Tr}\left[\Pi_{\mathbf{0}}\, V^\dagger U\, \Pi_{\mathbf{0}}\, U^\dagger V\right] = \left|\langle\mathbf{0}|V^\dagger U|\mathbf{0}\rangle\right|^2, \tag{10}$$
with $\Pi_{\mathbf{0}} = |\mathbf{0}\rangle\langle\mathbf{0}|$ the projector onto the all-zero state. We employed the LET subscript here since we refer to the circuit used to quantify (9) and (10) as the Loschmidt echo test (LET), shown in figure 2(a). The Loschmidt echo [32] refers to a forward and backward time evolution with the intent of recovering the initial state. This is analogous to the circuit in figure 2(a) where one first evolves forward with $U$ and then attempts to undo that evolution with $V^\dagger$, to recover the initial state $|\mathbf{0}\rangle$. Hence the probability of the all-zero measurement outcome in figure 2(a) is precisely $G_{\mathrm{LET}}$.
One can see that compiling with a fixed input state leads to more freedom and hence more solutions than full unitary matrix compiling. Note that $C_{\mathrm{HST}} = 0$ iff $W = e^{i\phi}\mathbb{1}$, with $W = V^\dagger U$ and where $\phi$ is a global phase. On the other hand, $C_{\mathrm{LET}} = 0$ iff $\langle z|W|\mathbf{0}\rangle = e^{i\phi}\delta_{z,\mathbf{0}}$ for all bit strings $z$. Hence, for $W$ that achieve $C_{\mathrm{LET}} = 0$, the remaining matrix elements $\langle z|W|z'\rangle$ (with $z' \neq \mathbf{0}$) remain completely arbitrary. This degeneracy of optima can simplify the optimization of $V$ as any of these optima will lead to $C_{\mathrm{LET}} = 0$.
Figure 1. Circuits for cost evaluation in full unitary matrix compiling. (a) The Hilbert-Schmidt test (HST). An entangling gate $E$, consisting of Hadamards and CNOTs, prepares a maximally entangled state between systems $A$ and $B$. Then a target unitary $U$ is applied on $A$, which is followed by a trainable unitary $V^\dagger$. Finally, a measurement in the Bell basis is performed by applying the adjoint of $E$, followed by a standard basis measurement. This circuit computes the Hilbert-Schmidt inner product between $U$ and $V$, as the probability to obtain the measurement outcome in which all $2n$ qubits are in the $|0\rangle$ state. (b) The local Hilbert-Schmidt test (LHST), which is the same as the HST circuit, except that the disentangling gate $E^\dagger$ is applied only on one $A_j B_j$ pair of qubits (depicted here for the $A_1 B_1$ pair) and, subsequently, the same two qubits are measured in the standard basis. The probability for the outcome associated with the $|00\rangle$ state is $F^{(j)}_{\mathrm{LHST}}$.
Analogous to the LHST cost for full unitary matrix compiling, one can define a cost function for fixed input state compiling that involves local observables:
$$C_{\mathrm{LLET}} = 1 - \frac{1}{n}\sum_{j=1}^{n} G^{(j)}_{\mathrm{LLET}}, \tag{11}$$
$$G^{(j)}_{\mathrm{LLET}} = \mathrm{Tr}\left[\left(\Pi_0^{A_j} \otimes \mathbb{1}_{\bar{A}_j}\right) V^\dagger U\, \Pi_{\mathbf{0}}\, U^\dagger V\right]. \tag{12}$$
Here, $\Pi_0^{A_j}$ is the projector onto the zero state on the $A_j$ qubit, $\mathbb{1}_{\bar{A}_j}$ denotes the identity on all qubits except $A_j$, and $n$ is the number of qubits. We call the circuit used to compute $C_{\mathrm{LLET}}$ the local Loschmidt echo test (LLET), and this circuit is shown in figure 2(b). $G^{(j)}_{\mathrm{LLET}}$ corresponds to the probability of the zero outcome for the circuit in figure 2(b). With a proof similar to that of (6) one can show that
$$C_{\mathrm{LLET}} \le C_{\mathrm{LET}} \le n\, C_{\mathrm{LLET}}. \tag{13}$$
Furthermore, one can define an overall cost function analogous to $C(q)$ in (7),
$$C'(q) = q\, C_{\mathrm{LET}} + (1-q)\, C_{\mathrm{LLET}}, \tag{14}$$
which again is motivated by the fact that $C_{\mathrm{LET}}$ has a direct operational meaning but is difficult to train for large $n$, whereas the opposite is true for $C_{\mathrm{LLET}}$. Hence one can take $q \approx 1$ for small $n$ and $q \approx 0$ for large $n$.
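The extra freedom of fixed input state compiling can be made concrete with a small numerical sketch. It assumes $C_{\mathrm{LET}} = 1 - |\langle\mathbf{0}|V^\dagger U|\mathbf{0}\rangle|^2$ and takes $C_{\mathrm{LLET}}$ to be one minus the average probability that each qubit of $V^\dagger U|\mathbf{0}\rangle$ reads 0, which is our reading of the LET and LLET circuits. The helper names and the construction of `V_degenerate` are hypothetical, chosen so that $V^\dagger U$ fixes $|\mathbf{0}\rangle$ but is otherwise random.

```python
import numpy as np

def random_unitary(d, rng):
    """Random d x d unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(z)
    return q

def cost_let(U, V):
    """C_LET = 1 - |<0|V^dagger U|0>|^2."""
    psi = V.conj().T @ U[:, 0]              # V^dagger U |0...0>
    return 1.0 - abs(psi[0]) ** 2

def cost_llet(U, V):
    """C_LLET = 1 - (1/n) sum_j Pr(qubit j reads 0)."""
    d = U.shape[0]
    n = int(round(np.log2(d)))
    probs = (np.abs(V.conj().T @ U[:, 0]) ** 2).reshape((2,) * n)
    zero_probs = [np.take(probs, 0, axis=j).sum() for j in range(n)]
    return 1.0 - float(np.mean(zero_probs))

def cost_hst(U, V):
    """C_HST = 1 - |Tr(V^dagger U)|^2 / d^2, for comparison."""
    d = U.shape[0]
    return 1.0 - abs(np.trace(V.conj().T @ U)) ** 2 / d**2

rng = np.random.default_rng(1)
n, d = 2, 4
U = random_unitary(d, rng)

# W fixes |0...0> but acts as a random unitary on the orthogonal subspace.
W = np.eye(d, dtype=complex)
W[1:, 1:] = random_unitary(d - 1, rng)
V_degenerate = U @ W.conj().T               # chosen so that V^dagger U = W

V_random = random_unitary(d, rng)
print(cost_let(U, V_degenerate), cost_hst(U, V_degenerate))
print(cost_llet(U, V_random), cost_let(U, V_random))
```

In this construction `V_degenerate` is a global optimum of both fixed-input costs even though it is not an optimum of $C_{\mathrm{HST}}$, which is the degeneracy described above.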

Noise processes
In this work, we consider three different types of noise [33,34]: (1) decoherence noise, (2) gate noise, and (3) measurement noise. We now discuss how we mathematically model these three types of noise. Let us start with decoherence. Physical models of decoherence often refer to $T_1$ and $T_2$ processes, which respectively pertain to thermal relaxation (energy dissipation) and dephasing (loss of phase coherence). These processes are typically modeled as local quantum channels acting independently on individual qubits. However, mathematically it is easier to deal with classes of quantum channels that act globally on sets of qubits (which can contain the independent local channels as a special case). In what follows, we define three types of global quantum channels: depolarizing noise, Pauli noise, and non-unital Pauli noise. It is worth noting that Pauli noise includes $T_2$ processes as a special case (i.e. the dephasing channel is a Pauli channel), and non-unital Pauli noise includes $T_1$ processes as a special case (i.e. the amplitude damping channel is a non-unital Pauli channel). Consider the following precise definitions.
Definition 1. We define depolarizing noise (DN) as a completely positive trace-preserving (CPTP) map that maps an $n$-qubit state $\rho$ to the state $p\rho + (1-p)\,\mathbb{1}/d$, where $d = 2^n$ and $0 \le p \le 1$.
Definition 2. We define Pauli noise (PN) as a CPTP map $\mathcal{P}$ whose superoperator is diagonal in the Pauli basis. In other words, its action on a Pauli operator $X^{\mathbf{l}}Z^{\mathbf{k}} := X^{l_1}Z^{k_1} \otimes \cdots \otimes X^{l_n}Z^{k_n}$ is $\mathcal{P}(X^{\mathbf{l}}Z^{\mathbf{k}}) = c_{lk}\, X^{\mathbf{l}}Z^{\mathbf{k}}$.
Definition 3. We define non-unital Pauli noise (NUPN) as a CPTP map $\mathcal{P}_{\mathrm{NU}}$ whose action on the identity is $\mathcal{P}_{\mathrm{NU}}(\mathbb{1}) = \mathbb{1} + \sum_{(\mathbf{l},\mathbf{k}) \neq (\mathbf{0},\mathbf{0})} d_{lk}\, X^{\mathbf{l}}Z^{\mathbf{k}}$, and whose action on all other Pauli operators $X^{\mathbf{l}}Z^{\mathbf{k}}$ with $(\mathbf{l},\mathbf{k}) \neq (\mathbf{0},\mathbf{0})$ is $\mathcal{P}_{\mathrm{NU}}(X^{\mathbf{l}}Z^{\mathbf{k}}) = c_{lk}\, X^{\mathbf{l}}Z^{\mathbf{k}}$. Furthermore, we assume that $c_{lk} \ge 0$ for all $\mathbf{l}$ and $\mathbf{k}$.
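The distinction between Pauli noise and non-unital Pauli noise is easiest to see in the Pauli transfer matrix of a single-qubit channel. The following sketch is an illustration under our own conventions: it uses the basis $\{\mathbb{1}, X, Y, Z\}$ (where $Y$ differs from $XZ$ only by a phase, so the scaling coefficients are the same), standard Kraus operators for dephasing and amplitude damping, and hypothetical function names and noise strengths.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def pauli_transfer_matrix(kraus_ops):
    """R[i, j] = Tr(P_i E(P_j)) / 2 for the channel E defined by the Kraus operators."""
    R = np.zeros((4, 4))
    for j, Pj in enumerate(PAULIS):
        out = sum(K @ Pj @ K.conj().T for K in kraus_ops)
        for i, Pi in enumerate(PAULIS):
            R[i, j] = np.real(np.trace(Pi @ out)) / 2
    return R

def dephasing_kraus(p):
    """Phase-flip channel (a T2-type process): apply Z with probability p."""
    return [np.sqrt(1 - p) * I2, np.sqrt(p) * Z]

def amplitude_damping_kraus(gamma):
    """Amplitude damping (a T1-type process) with decay probability gamma."""
    K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return [K0, K1]

R_deph = pauli_transfer_matrix(dephasing_kraus(0.1))
R_damp = pauli_transfer_matrix(amplitude_damping_kraus(0.2))
print(np.round(R_deph, 3))
print(np.round(R_damp, 3))
```

The dephasing matrix is diagonal, as required of a Pauli channel. The amplitude damping matrix is diagonal with non-negative entries except for its first column, which records the non-trivial action on the identity that makes the channel non-unital.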
Figure 2(b). The local Loschmidt echo test (LLET), which is the same as the LET except that only the $A_j$ qubit is measured. The probability that this qubit is in the $|0\rangle$ state is $G^{(j)}_{\mathrm{LLET}}$ in (12).
Next, we consider gate noise. While gate noise can involve coherent errors such as systematic gate bias, such errors are hardware-specific, and hence we focus on incoherent gate noise. We consider a simple model for gate noise in which every time a gate is implemented, a Pauli channel acts both before and after this gate. Furthermore, for generality, we allow these Pauli channels to act globally on all qubits, which serves as a model for cross-talk (where gates affect qubits on which they are intended to act trivially).
Definition 4. We define Pauli gate noise (PGN) as a simple noise model in which all gates are preceded and followed by global Pauli channels. In other words, for a gate $G$, instead of its action on a state $\rho$ being $G\rho G^\dagger$, we model its action as $\mathcal{P}'\left(G\,\mathcal{P}(\rho)\,G^\dagger\right)$, where $\mathcal{P}$ and $\mathcal{P}'$ are Pauli channels. Note that these Pauli channels act on all qubits, including qubits on which $G$ acts trivially.
Finally, we consider measurement noise, also known as readout error. For a single qubit, we model measurement noise as a classical bit-flip channel, where feeding in the standard basis state $|l\rangle$ leads to the $k$ outcome with probability $p_{kl}$. We allow for asymmetry in that one can have $p_{01} \neq p_{10}$, which is an important generality, e.g. when $T_1$ noise occurs during the measurement process. For multiple qubits, our measurement noise model is a tensor product of the aforementioned bit-flip channels, corresponding to uncorrelated measurement noise.
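Definition 5. We define measurement noise (MN) as a modification of the standard-basis POVM elements, which are $P_0 = |0\rangle\langle 0|$ and $P_1 = |1\rangle\langle 1|$ in the noiseless case, to $\tilde{P}_0 = p_{00}|0\rangle\langle 0| + p_{01}|1\rangle\langle 1|$ and $\tilde{P}_1 = p_{10}|0\rangle\langle 0| + p_{11}|1\rangle\langle 1|$, where $p_{00} + p_{10} = 1$, $p_{01} + p_{11} = 1$, and $p_{kl}$ is the probability of getting the $k$ outcome given the $l$ input. Furthermore we assume that $p_{kk} > p_{kl}$ for $l \neq k$. Hence, for an $n$-qubit standard-basis measurement with measurement noise, we write the POVM element associated with the bit string $z = z_1 z_2 \ldots z_n$ as $\tilde{P}_z = \tilde{P}_{z_1} \otimes \cdots \otimes \tilde{P}_{z_n}$, and we assume that …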
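Because the measurement noise in definition 5 is classical and uncorrelated, its effect on the outcome statistics is a product of single-qubit confusion matrices acting on the ideal bit-string distribution. The sketch below illustrates this, with hypothetical function names and hypothetical readout parameters, and with the first qubit taken as the most significant bit.

```python
import numpy as np
from functools import reduce

def confusion_matrix(p00, p11):
    """Single-qubit readout matrix M[k, l] = p_kl = Pr(report k | ideal outcome l)."""
    return np.array([[p00, 1 - p11], [1 - p00, p11]])

def apply_measurement_noise(ideal_probs, readout_params):
    """Map ideal bit-string probabilities to noisy ones under uncorrelated readout error."""
    M = reduce(np.kron, [confusion_matrix(p00, p11) for p00, p11 in readout_params])
    return M @ ideal_probs

readout = [(0.98, 0.93), (0.96, 0.90)]           # hypothetical (p00, p11) for each qubit
ideal_all_zero = np.array([1.0, 0.0, 0.0, 0.0])  # ideal outcome is always 00
rng = np.random.default_rng(2)
ideal_random = rng.dirichlet(np.ones(4))         # some other ideal distribution

noisy_all_zero = apply_measurement_noise(ideal_all_zero, readout)
noisy_random = apply_measurement_noise(ideal_random, readout)
print(noisy_all_zero[0], noisy_random[0])
```

Since $p_{00} > p_{01}$ on every qubit, the noisy probability of the all-zeros outcome is largest exactly when the ideal state is all zeros, which is the basic reason an all-zeros-probability cost can keep its optimum under this kind of readout error.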

Main results
Before proceeding to the main results we first define two versions of optimal parameter resilience (OPR), i.e. of learning the correct gate sequence V despite various sources of noise, which we refer to as strong-OPR and weak-OPR.
Definition 6. Let $\mathcal{U}_d$ be the set of $d \times d$ unitary matrices. Let $C_{\mathrm{QC}}(V)$ be a cost function of $V$ with $V \in \mathcal{U}_d$, and suppose that $C_{\mathrm{QC}}(V)$ can be evaluated using a quantum circuit denoted QC. Let $\tilde{C}_{\mathrm{QC}}(V)$ denote the noisy version of $C_{\mathrm{QC}}(V)$, i.e. the corresponding function whenever the circuit QC is run in the presence of some noise process $\mathcal{N}$. Let $\mathcal{U}_d^{\mathrm{opt}}$ and $\tilde{\mathcal{U}}_d^{\mathrm{opt}}$ respectively denote the sets of unitaries that optimize $C_{\mathrm{QC}}(V)$ …

Noise resilience of full unitary matrix compiling
Let us begin with full unitary matrix compiling (FUMC). Figure 3 shows the two noise models that we will consider for FUMC. As shown in this figure, $\tau_1$ and $\tau_2$ are respectively defined as the times just before and just after the application of $V^\dagger U$. We note that the noise models considered in figure 3 capture fairly well the physical noise that is present in, e.g. superconducting-qubit quantum computers, with the exception that only depolarizing noise is allowed during the action of $V^\dagger U$. We make this simplification for ease of analysis, although our numerics in section 6 relax this assumption.
Consider the following definition for the noise model depicted in figure 3(a).
Definition 7. We define Noise Model 1 to be the following noise process during the HST circuit: (1) global depolarizing noise acting continuously throughout the circuit, (2) global Pauli noise at times $\tau_1$ and $\tau_2$, (3) global depolarizing noise on system $A$ acting continuously in between $\tau_1$ and $\tau_2$, (4) global non-unital Pauli noise on system $B$ acting continuously in between $\tau_1$ and $\tau_2$, (5) Pauli gate noise during $E$ and $E^\dagger$, and (6) measurement noise. We also use the term Noise Model 1 when the same noise model acts during the LHST circuit, provided one replaces $E^\dagger$ with $(E^{(j)})^\dagger$.
We now state our first main result. The proof of this result is given in appendix D, with some useful preliminaries and lemmas given in appendices A-C.
Theorem 1. The cost functions $C_{\mathrm{HST}}$ and $C_{\mathrm{LHST}}$ exhibit strong-OPR to Noise Model 1 in definition 7.
Note that this theorem also implies that the optimal set in (16) and (17) is the same for the $C_{\mathrm{HST}}$ and $C_{\mathrm{LHST}}$ functions. Hence this same set is optimal for $C(q)$.
Consider the implications of theorem 1. First, this theorem implies that FUMC is resilient to the measurement noise model in definition 5. Second, FUMC is completely resilient to Pauli gate noise during the entangling and disentangling gates, $E$ and $E^\dagger$. Note that this Pauli gate noise is global and hence accounts for cross-talk. Third, FUMC is resilient to global depolarizing noise acting continuously throughout the circuit, as well as global Pauli noise acting at the specific times $\tau_1$ and $\tau_2$. Fourth, FUMC is resilient to depolarizing noise acting on system $A$ and non-unital Pauli noise acting on system $B$, provided that each of these processes acts (possibly continuously) during the time interval between $\tau_1$ and $\tau_2$. We emphasize that Pauli noise includes dephasing channels ($T_2$ noise) as a special case, while non-unital Pauli noise includes the amplitude damping channel ($T_1$ noise) as a special case. Importantly, theorem 1 states that FUMC is resilient to the general case where all of these noise processes occur together.
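The simplest ingredient of this resilience, global depolarizing noise, can be reproduced in a few lines. The sketch below is not a check of the full theorem: it assumes that the continuously acting global depolarizing noise is summarized by a single net channel $\rho \to p\rho + (1-p)\mathbb{1}/D$ on all $2n$ qubits (reasonable because this channel commutes with every unitary), it reads strong-OPR as "the set of optimal unitaries is the same with and without noise", and it uses a hypothetical one-parameter ansatz $V(\theta) = R_z(\theta)$ with a target chosen on the scan grid.

```python
import numpy as np

def rz(theta):
    """Single-qubit rotation about Z."""
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])

def hst_cost(U, V, p=1.0):
    """HST cost with a net global depolarizing channel rho -> p*rho + (1-p)*1/D."""
    d = U.shape[0]
    D = d * d
    phi = np.eye(d).reshape(D) / np.sqrt(d)          # maximally entangled state |Phi>
    psi = np.kron(V.conj().T @ U, np.eye(d)) @ phi   # apply V^dagger U on system A
    rho = p * np.outer(psi, psi.conj()) + (1 - p) * np.eye(D) / D
    return 1.0 - np.real(phi.conj() @ rho @ phi)     # 1 - Pr(all-zeros outcome)

thetas = np.linspace(0.0, 2 * np.pi, 361)
U = rz(thetas[40])                                   # target unitary, on the grid
ideal = np.array([hst_cost(U, rz(t)) for t in thetas])
noisy = np.array([hst_cost(U, rz(t), p=0.6) for t in thetas])

print(thetas[np.argmin(ideal)], thetas[np.argmin(noisy)])
print(ideal.min(), noisy.min())
```

In this toy setting the noisy cost equals $p\,C_{\mathrm{HST}} + (1-p)(1 - 1/d^2)$, so the landscape is rescaled and shifted but its minimizer does not move; the cost value at the optimum, by contrast, is no longer zero.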
We now state our second main result (proven in appendix E), which deals with the noise model in figure 3(b).
Definition 8. We define Noise Model 2 to be the following noise process during the HST circuit: (1) global depolarizing noise acting continuously throughout the circuit, (2) global Pauli noise at times $\tau_1$ and $\tau_2$, (3) global non-unital Pauli noise on system $A$ at time $\tau_1$, (4) global depolarizing noise on system $A$ acting continuously in between $\tau_1$ and $\tau_2$, (5) global Pauli noise on system $B$ acting continuously in between $\tau_1$ and $\tau_2$, (6) Pauli gate noise during $E$ and $E^\dagger$, and (7) measurement noise. We also use the term Noise Model 2 when the same noise model acts during the LHST circuit, provided one replaces $E^\dagger$ with $(E^{(j)})^\dagger$.
Theorem 2. The cost functions $C_{\mathrm{HST}}$ and $C_{\mathrm{LHST}}$ exhibit strong-OPR to Noise Model 2 in definition 8.
The implications of theorem 2 are similar to those of theorem 1. The main difference is that theorem 2 allows for non-unital Pauli noise on system $A$ at time $\tau_1$, at the expense of only allowing Pauli noise to act continuously on system $B$ between $\tau_1$ and $\tau_2$. The other aspects of the noise models treated by these two theorems are identical.
The above two theorems immediately imply several corollaries below. These corollaries establish resilience to noise models that are different and in some cases more general than the noise models previously considered, at the expense of possibly specializing the form of the unitary $W = V^\dagger U$. See appendix G for the proofs of all corollaries.
Corollary 1. The cost functions $C_{\mathrm{HST}}$ and $C_{\mathrm{LHST}}$ exhibit strong-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 1, as well as (2) a noise process during the implementation of $\mathcal{W}$ (in the time interval between $\tau_1$ and $\tau_2$) in which global Pauli channels act on system $A$, provided that the following condition is satisfied: … (18) Here … is also a Pauli channel, and the channels $\mathcal{U}$, $\mathcal{V}^\dagger$, and $\mathcal{W}$ correspond to conjugating the state by the unitaries $U$, $V^\dagger$, and $W$, respectively.
The condition in (18) implies that the overall channel consisting of global Pauli channels acting on system $A$ during the implementation of $\mathcal{W}$ is mathematically equivalent (although physically inequivalent) to a Pauli channel followed by $\mathcal{W}$. Therefore, corollary 1 follows from theorem 1.
Consider the following implications of corollary 1. Unitaries corresponding to the Clifford group necessarily satisfy the condition in (18), as shown in appendix A. Therefore, corollary 2 below holds for any Clifford unitary W. Moreover, tensor-product unitaries satisfy this same condition provided that the noise is local depolarizing noise, and hence corollary 3 below also follows from corollary 1.
Corollary 2. … $W_1$ composed only of Clifford gates. Then the cost functions $C_{\mathrm{HST}}$ and $C_{\mathrm{LHST}}$ exhibit strong-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 1, as well as (2) a noise process during the implementation of …
Corollary 3. … $W_1$ being a tensor product, i.e. $W$ is a tensor product up to a particular time. Then the cost functions $C_{\mathrm{HST}}$ and $C_{\mathrm{LHST}}$ exhibit strong-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 1, as well as (2) a noise process during the implementations of …
The following corollary follows from theorem 2 and is analogous to corollary 1.

Corollary 4. The cost functions $C_{\mathrm{HST}}$ and $C_{\mathrm{LHST}}$ exhibit strong-OPR to the following noise model: (1) all noise processes in Noise Model 2, as well as (2) a noise process during the implementation of $\mathcal{W}$ (in the time interval between $\tau_1$ and $\tau_2$) in which global non-unital Pauli channels …, provided that the following condition is satisfied: … Here … is also a non-unital Pauli channel.
Finally, we present a simple corollary of theorem 1 based on the ricochet property of the standard Bell state. Note that the noise model in the following corollary is fairly simple but nonetheless physically distinct from those considered in figure 3, since it allows for global non-unital Pauli noise to occur during the implementation of W.
Corollary 5. The cost function $C_{\mathrm{HST}}$ exhibits strong-OPR to the following noise model: (1) global depolarizing noise acting continuously throughout the circuit, (2) global non-unital Pauli noise on system $A$ at a fixed time in between $\tau_1$ and $\tau_2$.

Noise resilience of fixed input state compiling
Let us now consider fixed input state compiling (FISC). Recall that the cost-evaluation circuits, shown in figure 2, have less structure than the circuits in figure 1. As a result, the noise model that we consider in the FISC case is simpler than the previously considered noise models. In particular, we define the following noise model, which is depicted in figure 4. Note that, in this context, $\tau_1$ is defined as the time just before the application of $V^\dagger U$, and there is no need to consider a noisy quantum channel occurring after $V^\dagger U$ since the measurement occurs immediately after $V^\dagger U$.
Definition 9. We define Noise Model 3 to be the following noise process during the LET or the LLET: (1) global depolarizing noise acting continuously throughout the circuit, (2) global Pauli noise acting at time $\tau_1$, and (3) measurement noise.
We now state our main result for FISC, which is proven in appendix F.
Theorem 3. The cost functions $C_{\mathrm{LET}}$ and $C_{\mathrm{LLET}}$ exhibit weak-OPR to Noise Model 3 in definition 9.
This theorem implies that FISC is resilient to the measurement noise model in definition 5. Furthermore, it is resilient to Pauli noise acting at $\tau_1$ and global depolarizing noise acting continuously throughout the circuit.
We remark that while FUMC exhibits strong-OPR for the noise models considered (see the previous section), here FISC exhibits weak-OPR instead. The latter arises from the fact that the optimal set of unitaries $\mathcal{U}_d^{\mathrm{opt}}$ for FISC can be highly degenerate (i.e. can contain many unitaries) and the presence of noise could in general break such degeneracy. The 'weak' term in weak-OPR simply refers to the fact that the number of global optima is possibly reduced by noise, not that the noise resilience itself is weak. Hence, weak-OPR should still be viewed as noise resilience, since the global optima in the presence of noise correspond to global optima in the noiseless case. This implies that training in the presence of noise will lead one to find the correct optimal parameters for $V(\boldsymbol{\alpha})$.
Under certain conditions, theorem 3 implies that $C'(q)$ defined in (14) also exhibits weak-OPR: if …, then for any value of $q$, $C'(q) = q\,C_{\mathrm{LET}} + (1-q)\,C_{\mathrm{LLET}}$ will also exhibit weak-OPR to Noise Model 3, where the unitaries that optimize $C'(q)$ in the noisy case belong to $\mathcal{U}^{\mathrm{opt}}_{\mathrm{LET}} \cap \mathcal{U}^{\mathrm{opt}}_{\mathrm{LLET}}$.
Theorem 3 implies the following corollaries, which establish resilience to noise models that go beyond Noise Model 3 at the expense of specializing the form of $W$. Note that these corollaries are analogous to corollaries 1-3, and corollary 6 implies corollaries 7 and 8. See appendix G for the proofs.
Corollary 6. The cost functions $C_{\mathrm{LET}}$ and $C_{\mathrm{LLET}}$ exhibit weak-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 3, as well as (2) a noise process during the implementation of …, provided that the following condition is satisfied: … where … is also a Pauli channel.
Corollary 7. … $W_1$ composed only of Clifford gates. Then the cost functions $C_{\mathrm{LET}}$ and $C_{\mathrm{LLET}}$ exhibit weak-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 3, as well as (2) a noise process during the implementation of …
Corollary 8. … $W_1$ being a tensor product, i.e. $W$ is a tensor product up to a particular time. Then the cost functions $C_{\mathrm{LET}}$ and $C_{\mathrm{LLET}}$ exhibit weak-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 3, as well as (2) a noise process during the implementations of …

Implementations
In this section, we present the results of implementing VQC on the following three-qubit unitaries: the Toffoli gate, the three-qubit quantum Fourier transform (QFT), and a W-state preparation circuit. Each of these unitaries is of interest, e.g. the Toffoli gate when combined with the Hadamard gate provides a universal gate set for quantum computing [35], the QFT is a subroutine in Shor's algorithm [36], and W-state preparation is useful for the quantum approximate optimization algorithm [37,9]. Figure 5 shows gate sequences corresponding to these unitaries obtained from the literature. The Toffoli gate in figure 5(a) is decomposed into a gate sequence that contains nine one-qubit gates and six CNOTs [38]. For the QFT we employ its textbook circuit [33] in figure 5(b), while the circuit for W-state preparation in figure 5(c) was derived from [39,40]. Our VQC implementations were performed using IBM's noisy quantum simulator [28] with a noise model built from the reported noise parameters and connectivity of IBM's 14-qubit Melbourne quantum computer [41]. We remark that for VQC, we must have a target unitary $U$ that is written as a gate sequence in the native gate language and the native connectivity of the hardware. IBM's simulator for the Melbourne device has a square lattice connectivity and a native gate alphabet of CNOTs, arbitrary rotations around $Z$, and $\pi/2$ rotations around $X$. Hence, transforming the gate sequences in figure 5 for the native device will typically add an overhead of additional gates. Therefore, the target gate sequences in our implementations actually correspond to IBM's compilation (with this overhead included) of the circuits in figure 5.
In IBM's noise model [28,42], one-qubit gate errors are modeled as a single-qubit depolarizing error followed by a thermal relaxation error, where thermal relaxation refers to both $T_1$ and $T_2$ channels. Similarly, two-qubit gate errors consist of a two-qubit depolarizing error followed by single-qubit thermal relaxation errors on each qubit. Finally, the noise model includes single-qubit readout errors.
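The structure of the one-qubit gate error described above (ideal gate, then depolarizing error, then thermal relaxation) can be sketched numerically. The following is a simplified illustration only, not IBM's implementation: the function names, the error probability `p`, the gate duration `t`, and the values of `t1` and `t2` are all hypothetical, relaxation is assumed to drive the qubit toward $|0\rangle$, and $T_2 \le 2T_1$ is assumed.

```python
import numpy as np

def depolarize(rho, p):
    """Single-qubit depolarizing channel: mix rho with the maximally mixed state."""
    return (1.0 - p) * rho + p * np.trace(rho) * np.eye(2) / 2.0

def thermal_relaxation(rho, t, t1, t2):
    """Simplified T1/T2 relaxation toward |0> over a duration t (assumes t2 <= 2*t1)."""
    gamma = 1.0 - np.exp(-t / t1)                  # probability of decay |1> -> |0>
    k0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - gamma)]])
    k1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    out = k0 @ rho @ k0.conj().T + k1 @ rho @ k1.conj().T
    # Amplitude damping alone shrinks coherences by exp(-t / (2 * t1));
    # extra pure dephasing brings the total factor to exp(-t / t2).
    extra = np.exp(-t / t2 + t / (2.0 * t1))
    out[0, 1] *= extra
    out[1, 0] *= extra
    return out

def noisy_gate(rho, gate, p, t, t1, t2):
    """Ideal gate, followed by depolarizing error, followed by thermal relaxation."""
    ideal = gate @ rho @ gate.conj().T
    return thermal_relaxation(depolarize(ideal, p), t, t1, t2)

plus = np.full((2, 2), 0.5, dtype=complex)         # the state |+><+|
identity = np.eye(2, dtype=complex)
rho_out = noisy_gate(plus, identity, p=0.01, t=0.1, t1=50.0, t2=70.0)

# Relaxation is non-unital: the maximally mixed state is not a fixed point.
mixed_out = thermal_relaxation(np.eye(2, dtype=complex) / 2.0, t=0.1, t1=50.0, t2=70.0)
```

In this sketch the relaxation step moves population toward $|0\rangle$ even for the maximally mixed input, which is the sense in which such noise is non-unital.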
We employ two different ansatzes, shown in figure 6, and (as described below) we employ gradient-based optimization algorithms to train the gate sequence $V(\boldsymbol{\alpha})$. In figures 7-8, we plot the results of implementing VQC with IBM's noisy simulator for the three-qubit gates in figure 5. In each plot, we show the value of the noisy cost functions versus the number of iterations of the optimization algorithm. Additionally, we plot the corresponding value of the noiseless cost functions evaluated for the variational parameters $\boldsymbol{\alpha}$ obtained from the noisy optimization. These results allow us to verify whether the parameters obtained from the noisy optimization are indeed minimizing the noiseless cost functions. Before discussing the results, we first give details for our ansatzes and optimization methods.
Figure 5 (caption fragment). Here, $R_m$ stands for the controlled phase gate with a phase shift of $\phi = e^{2\pi i/2^m}$, and $V_k(\boldsymbol{\beta}_k)$ is given by (21). For the three-qubit W-state preparation circuit we have $\boldsymbol{\beta} = (2\arccos(1/\sqrt{3}), 0, 0)$.

Ansatzes and optimization methods
As previously mentioned, to implement VQC we consider two ansatzes for the trainable unitary $V(\boldsymbol{\alpha})$. The building block of our ansatzes is a dressed CNOT gate, which is a two-qubit gate composed of a CNOT preceded and followed by single-qubit gates $V_k(\boldsymbol{\alpha}_k)$ acting on each qubit, as shown in figure 6(a).
Figure caption fragment. Increasing the minimum number of shots iCANS employs to compute each partial derivative leads to smaller cost function values in both cases.

Let us now introduce our ansatzes. We note that our two ansatzes are fairly similar to the ones introduced in [19]. In our first ansatz, each layer is composed of n dressed CNOTs, where n is the number of qubits (in the special case of n = 2 each layer consists of one dressed CNOT), with the precise structure defined as follows.
Definition 10. We define the alternating-pair ansatz as a layered ansatz in which each layer consists of (parameterized) dressed CNOT gates acting on alternating pairs of neighboring qubits as illustrated in figure 6(b).
We remark that it is useful to distinguish between a complete ansatz, in which an exact compilation for U is contained inside the ansatz, versus an incomplete ansatz, where exact compilation is not possible. In general, a small number of layers can lead to an incomplete ansatz, where one can only reach approximate compilation. Hence, increasing the number of layers l could allow one to obtain better compilations of U. Note however that while a large number of layers can achieve a complete ansatz, it can also be harder to train and can lead to a longer-depth circuit.
The alternating-pair ansatz may not lead to the optimal depth compilation for U, particularly in the complete ansatz case. Our second ansatz attempts to fix the issue of introducing unnecessary depth by having a structure that depends on U.
Definition 11. We construct the target-inspired ansatz by taking the gate sequence for the target unitary U, expanding this gate sequence into single-qubit gates and CNOTs, removing all single-qubit gates that precede or follow a CNOT, and replacing each remaining CNOT in the gate sequence with a (parameterized) dressed CNOT. Finally, each remaining single-qubit gate is replaced by a parameterized single-qubit gate.
As schematically depicted in figure 6(c), each layer is now composed of one dressed CNOT. This ansatz will always be complete since its structure is inspired by U. While this ansatz is not useful to compress the number of CNOTs in $V(\boldsymbol{\alpha})$, it is useful as a proof-of-concept to demonstrate OPR for complete ansatzes. We remark that a simple modification of this ansatz, where the placements of the dressed CNOTs are optimized over instead of fixed, would actually be useful for circuit-depth compression. Furthermore, we have implemented this dressed CNOT placement optimization, and we find that we obtain similar noise resilience results as those for the target-inspired ansatz.
Let us now discuss the optimization methods. As previously mentioned, the trainable gate sequence $V(\boldsymbol{\alpha})$ is a function of a set of parameters $\boldsymbol{\alpha}$ corresponding to the collection of the internal gate angles in each dressed CNOT. To optimize these parameters, we employ a gradient-descent approach. This approach exploits the fact that the gradient with respect to $\boldsymbol{\alpha}$ of $C_{\mathrm{HST}}$, $C_{\mathrm{LHST}}$, $C_{\mathrm{LET}}$, and $C_{\mathrm{LLET}}$ can be computed by using the circuits for HST, LHST, LET, and LLET, respectively [43,19]. We remark that we used different gradient-based approaches for the shallow and deep ansatz cases, since the latter requires a more sophisticated and efficient optimizer.
Specifically, for the shallow ansatz cases where there are few parameters, we employ the simple gradient-based approach outlined in [19, appendix 4]. In this approach, the number of shots $N$ per iteration is fixed. (We choose $N = 50\,000$.) On the other hand, for deep ansatzes with larger numbers of parameters, we employ a more sophisticated gradient-based approach that improves efficiency by reducing the number of shots required. This approach is the individual coupled adaptive number of shots (iCANS) algorithm of [44], which is a measurement-frugal method that often outperforms other optimizers in the presence of noise. The iCANS optimizer frugally adjusts the number of shots both for a given iteration and for a given partial derivative in a stochastic gradient descent. When employing iCANS, one sets as input: (1) the total number of shots employed during the optimization, and (2) the minimum number of shots (denoted $N_{\min}$) employed to estimate the gradient for a given iteration. We set the latter to initially be $N_{\min} = 2$ and then later increase this to $N_{\min} = 250$, which empirically leads to good convergence.
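To make the fixed-shot setting concrete, the following toy sketch runs plain gradient descent on a one-parameter compiling problem in which every cost evaluation is estimated from a finite number of simulated shots. It is an illustration under stated assumptions, not the optimizer of [19] and not iCANS: the target $U = R_z(0.7)$, the ansatz $V(\alpha) = R_z(\alpha)$, the step size, the iteration count, and the use of the parameter-shift rule for the derivative are all choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
TARGET = 0.7        # hypothetical angle of the target U = Rz(TARGET)
SHOTS = 50_000      # fixed number of shots per cost estimate

def exact_cost(alpha):
    """Noiseless HST cost for U = Rz(TARGET), V = Rz(alpha): 1 - |Tr(V^dag U)|^2 / 4."""
    return 1.0 - np.cos((TARGET - alpha) / 2.0) ** 2

def sampled_cost(alpha, shots=SHOTS):
    """Estimate the cost from a finite number of simulated measurement outcomes."""
    p_zero = 1.0 - exact_cost(alpha)               # probability of the all-zeros outcome
    return 1.0 - rng.binomial(shots, p_zero) / shots

def sampled_gradient(alpha):
    """Parameter-shift estimate of dC/dalpha (unbiased for this sinusoidal cost)."""
    return 0.5 * (sampled_cost(alpha + np.pi / 2) - sampled_cost(alpha - np.pi / 2))

alpha = 0.0
for _ in range(100):
    alpha -= 1.0 * sampled_gradient(alpha)         # plain gradient descent, step size 1.0

print(alpha, exact_cost(alpha))
```

An adaptive scheme such as iCANS would instead vary the number of shots per partial derivative and per iteration; that logic is deliberately omitted here.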

Toffoli gate
The top panels in figure 7 show results of implementing VQC for the Toffoli gate. Figure 7 (top, a) corresponds to $V(\boldsymbol{\alpha})$ being given by a single layer of the alternating-pair ansatz of definition 10. Here, the noisy cost functions $\tilde{C}_{\mathrm{HST}}$ and $\tilde{C}_{\mathrm{LHST}}$ (blue and red curves, respectively) tend to decrease as the number of iterations increases and converge to non-zero values. We remark that the number of iterations can differ for $\tilde{C}_{\mathrm{HST}}$ and $\tilde{C}_{\mathrm{LHST}}$, since the termination condition of the optimization algorithm can be reached after a different number of iterations. Figure 7 (top, a) also depicts the cost functions $C_{\mathrm{HST}}$ and $C_{\mathrm{LHST}}$ evaluated for the variational parameters $\boldsymbol{\alpha}$ obtained from the noisy optimization (green and pink curves, respectively). These curves show that as the number of iterations increases, both $C_{\mathrm{HST}}$ and $C_{\mathrm{LHST}}$ tend to decrease too, indicating that the noisy training is indirectly training the noiseless cost functions, i.e. the adjustments to the parameters $\boldsymbol{\alpha}$ made by noisy training are reducing the noiseless cost functions. Note that $C_{\mathrm{HST}}$ and $C_{\mathrm{LHST}}$ do not converge to zero since a single layer of three dressed CNOTs forms an incomplete ansatz for the Toffoli gate.
In order to determine if the algorithm is reaching the minimum value achievable with just one layer, we have also implemented VQC to compile the Toffoli gate in a noise-free simulation. The minimum values achieved for $C_{\mathrm{HST}}$ and $C_{\mathrm{LHST}}$ are shown as blue and red dashed curves, respectively. Surprisingly, the cost functions evaluated with the parameters from the noisy training (green and pink curves) converge to the dashed lines. This suggests that the optimal parameters are noise resilient, since noisy training reaches the minimum value obtained by noise-free training. As a caveat, however, we note that it is not clear whether the minima reached are global or local optima.
Figure 7 (top, b) plots the VQC results for Toffoli with $V(\boldsymbol{\alpha})$ given by two layers of the alternating-pair ansatz. In this case, $C_{\mathrm{HST}}$ and $C_{\mathrm{LHST}}$ converge to values which are smaller than the ones obtained in the one-layer case. This indicates that two layers allow for a more complete compilation of the Toffoli gate, although it appears that the ansatz is not yet complete. Note that both the decomposition of the Toffoli gate in figure 5 and two layers of the alternating-pair ansatz consist of six CNOTs. However, the placement of the dressed CNOTs does not seem to be optimal. Finally, let us remark that the green and pink curves converge to the dashed blue and red lines, respectively. Hence, this once again shows that the optimal parameters are noise resilient. Similar to the previous case, it is not clear whether the minima reached are global or local minima.
Figure 7 (top, c) shows results for the target-inspired ansatz of definition 11. As the number of iterations increases, all curves tend to decrease, with the green and pink curves converging to values of the order of $10^{-4}$. We remark that we have verified that $W = V^\dagger U \approx \mathbb{1}$ for the parameters obtained. In this case, we do not plot dashed blue and red curves since the ansatz is complete and the minimum of the noiseless cost functions is zero.
These results indicate that optimizing $V(\boldsymbol{\alpha})$ in the presence of noise yields the correct variational parameters $\boldsymbol{\alpha}$, which minimize the noiseless cost function. Hence, both $C_{\mathrm{HST}}$ and $C_{\mathrm{LHST}}$ appear to exhibit OPR for the realistic noise model considered.
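The behaviour seen in these plots, a noisy cost that saturates at a non-zero value while its minimizer coincides with the noiseless one, can be reproduced in a deliberately simple model. The sketch below assumes only a global depolarizing channel of total strength $1 - q$ acting on the 2n-qubit HST register, which is far simpler than IBM's noise model; the target $U = R_z(0.7)$, the ansatz $V = R_z(\alpha)$, the grid of angles, and $q = 0.6$ are hypothetical choices.

```python
import numpy as np

def rz(theta):
    """Single-qubit rotation about Z."""
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])

def hst_overlap(u, v):
    """Noiseless probability of the all-zeros outcome: |Tr(V^dag U)|^2 / d^2."""
    d = u.shape[0]
    return abs(np.trace(v.conj().T @ u)) ** 2 / d**2

def cost(u, v, q=1.0):
    """HST-type cost when the ideal state survives with weight q.

    With weight (1 - q) the 2n-qubit register is replaced by the maximally
    mixed state, which gives the all-zeros outcome with probability 1/d^2.
    Setting q = 1 recovers the noiseless cost.
    """
    d = u.shape[0]
    return 1.0 - (q * hst_overlap(u, v) + (1.0 - q) / d**2)

target = rz(0.7)                                   # hypothetical target unitary
angles = np.linspace(-np.pi, np.pi, 2001)          # candidate parameters for V = Rz(angle)
noiseless = np.array([cost(target, rz(a)) for a in angles])
noisy = np.array([cost(target, rz(a), q=0.6) for a in angles])

best_noiseless = angles[np.argmin(noiseless)]
best_noisy = angles[np.argmin(noisy)]
print(best_noiseless, best_noisy, noiseless.min(), noisy.min())
```

Because the noisy cost in this model is an increasing affine function of the noiseless cost, the two share the same minimizer even though the noisy minimum value is far from zero.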

Quantum Fourier transform
We now discuss the VQC results for the three-qubit QFT. Figure 7 shows the results for $V(\boldsymbol{\alpha})$ consisting of: a single layer of the alternating-pair ansatz of definition 10 (bottom, a), two layers of the alternating-pair ansatz (bottom, b), and the target-inspired ansatz of definition 11 (bottom, c). As shown in these plots, most of the results for QFT are similar to the results for the Toffoli gate. In all cases the noiseless cost functions tended to decrease with iterations, indicating that noisy training indirectly trains the noiseless costs.
For the one-layer case of figure 7 (bottom, a) the green and pink curves (noiseless cost functions evaluated at the parameters obtained from noisy training) converge to the value obtained by training in a noise-free environment (dashed curve). Here, the non-zero value of the dashed curve indicates that a one-layer ansatz is incomplete. This is in contrast to figure 7 (bottom, b), where the dashed red line of $C_{\mathrm{LHST}}$ is of the order of $10^{-4}$, implying that the ansatz is complete. Once again, in figure 7 (bottom, b), the green and pink curves approximately converge to the dashed lines (noiseless training), indicating noise resilience. Finally, figure 7 (bottom, c) shows that both $C_{\mathrm{HST}}$ and $C_{\mathrm{LHST}}$ appear to exhibit OPR, as we can indirectly train the parameters in $V(\boldsymbol{\alpha})$ in the presence of noise.

W-state preparation
Finally, we discuss the results of implementing VQC for both FUMC and FISC of a W-state preparation circuit. We remark here that we did not perform FISC for the Toffoli gate and the QFT since those unitaries act trivially on the $|\mathbf{0}\rangle$ state. Moreover, we are only interested in comparing the FUMC and the FISC approaches with a complete ansatz, meaning that we only considered the target-inspired ansatz of definition 11.
As shown in figure 8, all cost functions $C_{\mathrm{HST}}$, $C_{\mathrm{LHST}}$, $C_{\mathrm{LET}}$, and $C_{\mathrm{LLET}}$ can be optimized indirectly via noisy training of $V(\boldsymbol{\alpha})$. Both for FUMC and FISC the cost functions go down to $\sim 10^{-4}$, while for FUMC one can even reach values of $\sim 10^{-5}$ when employing the LHST. Hence, our numerics indicate that $C_{\mathrm{HST}}$, $C_{\mathrm{LHST}}$, $C_{\mathrm{LET}}$, and $C_{\mathrm{LLET}}$ appear to exhibit OPR to IBM's realistic noise model.

VQC in the NISQ era
Our analytical and numerical results suggest that variational quantum compiling (VQC) could be a useful tool for near-term noisy quantum computing. While there are several intended uses for VQC [19], the main purpose is for circuit-depth compression of quantum algorithms. This depth compression arises because VQC could achieve optimal compiling, whereas classical methods for quantum compiling either scale exponentially (if they are aiming at optimal compiling) or are sub-optimal when they are restricted to local (instead of global) compiling of the circuit.
Suppose one is able to achieve depth compression with VQC. This implies that the target unitary U has a longer depth than the trained gate sequence $V(\boldsymbol{\alpha})$. Prior to our work, one may have been concerned that this depth compression might not reduce noise, because perhaps the noise occurring during U is somehow compiled into the gate sequence $V(\boldsymbol{\alpha})$. However, our work shows that this is not the case. Despite various sources of incoherent noise (e.g. see the noise model in figure 3), we find that one learns the correct optimal parameters $\boldsymbol{\alpha}$ for $V(\boldsymbol{\alpha})$. This means that, after performing VQC, if one were to implement the gate sequence $V(\boldsymbol{\alpha})$ instead of U, then one should see that $V(\boldsymbol{\alpha})$ really does achieve less noise than U, since the depth of $V(\boldsymbol{\alpha})$ is shorter.

Summary of results
In this work, we treated two different forms of VQC: Full Unitary Matrix Compiling (FUMC) and Fixed Input State Compiling (FISC). Our main analytical results were stated in theorems 1-3. We found that both FUMC and FISC are resilient to measurement noise. In addition, they are both resilient to global depolarizing noise acting continuously throughout the circuit and global Pauli noise occurring just prior to the implementation of $W = V^\dagger U$. For FUMC, we were able to prove resilience to additional sources of noise, such as Pauli gate noise during the entangling and disentangling gates as well as non-unital Pauli noise occurring at particular times in the circuit. The fact that our noise resilience results are more extensive for FUMC than for FISC may simply be due to the fact that the cost-evaluation circuit for FUMC is more complicated than that for FISC. Hence it is possible that this additional resilience is needed to make the two approaches have similar levels of noise resilience. Alternatively, it could be possible that either FUMC or FISC is more noise resilient than the other, although this remains to be established. (Note that our numerics did not show a significant difference in the noise resilience of FUMC versus FISC.) In addition, Corollaries 1-8 stated resilience results for noise models that go beyond the noise models considered in theorems 1-3, at the expense of possibly specializing the form of the unitary $W = V^\dagger U$ (for example, to Clifford unitaries or tensor-product unitaries). In particular, these corollaries considered noise that occurs during the implementation of $W$, which is certainly practically relevant.
Our numerical results were presented in figures 7-8. Generally speaking, these numerics agreed with our theoretical expectations and hinted at resilience beyond what is stated in our theorems, which we discuss in the next subsection. We emphasize that our implementations employed the noise model of IBM's 14-qubit Melbourne device, and hence this shows that VQC exhibits resilience for currently available hardware.

Noise resilience beyond our theorems
There are two senses in which VQC might exhibit resilience beyond the results stated in our theorems. The first sense is that VQC may be resilient to more general noise models than the ones we considered. The second sense is that VQC may be resilient even for the incomplete ansatz case, on which we elaborate below. Both of these possibilities appear to be supported by our numerical implementations.
For evidence supporting the idea that VQC may be resilient to more general noise models, consider the following. The noise model associated with IBM's 14-qubit Melbourne device is more general than the noise models depicted in figures 3 and 4, and the unitaries we considered in figure 5 do not fall into the special cases (e.g. Clifford or tensor product) treated by Corollaries 1-8. For example, IBM's noise model has non-unital Pauli noise associated with each gate and hence occurring throughout the implementation of $W = V^\dagger U$. Thus, our theorems and corollaries do not cover all of the noise processes occurring in IBM's noise model. Despite this, we were able to reduce the noiseless cost (via noisy training) to $\sim 10^{-4}$ for the Toffoli gate (figure 7 (top, c)) and QFT (figure 7 (bottom, c)), and to $\sim 10^{-5}$ for W-state preparation (figure 8).
Naturally, our theorems and corollaries have a bias towards noise models that are mathematically easy to work with, such as Pauli noise or depolarizing noise, since this makes it easier to formulate proofs. It is therefore important for future work to attempt to show resilience beyond these noise models.
As noted above, VQC may also have resilience beyond the complete ansatz case. Recall that we say an ansatz for $V(\boldsymbol{\alpha})$ is complete (incomplete) if it contains (does not contain) an exact compilation of U. Our theorems and corollaries are restricted to the complete ansatz case, whereas our numerics in figure 7 also consider the incomplete ansatz case. Interestingly, figure 7 showed that typically one can obtain the same value for the noiseless cost with either noisy or noiseless training. This surprising result suggests that perhaps the optimal values for $\boldsymbol{\alpha}$ may be resilient to noise even for the incomplete ansatz case, and future work should investigate this possibility.
In addition, it will be important to investigate the effect of noise on the parameter landscape and parameter trainability (e.g. [45]). Our work indicates that the global optimum of VQC may not change with noise, but does not address the difficulty of finding this optimum.

Coherent versus incoherent noise
In the Introduction, we emphasized the distinction between OPR and cost value resilience [7]. The latter is relevant to coherent noise, whereas OPR is relevant to incoherent noise. Intuitively, we anticipate that coherent noise (e.g. systematic gate biases) in VQC will often shift the location of the global minimum in parameter space, and hence we expect coherent noise to have a non-trivial effect on the optimal parameters in VQC. Because of this intuition, we have focused our paper and our definition of OPR solely on incoherent noise. We remark that our definition of OPR, which is stated in terms of unitaries (rather than parameters), would need to be modified if one is interested in studying parameter resilience for coherent noise. However, as noted, we do not anticipate resilience to coherent noise to hold. We also remark that other strategies exist to correct coherent noise [46]. Nevertheless, an interesting question for future work will be to see whether OPR holds partially whenever both coherent and incoherent noise are present. In addition, it will be interesting to combine the ideas of OPR and cost value resilience into a single framework.

Noise resilience of VQE
Finally, let us consider VHQCAs more generally. In particular, let us revisit the variational quantum eigensolver (VQE) that we discussed in section 2. As we now show, VQC is a special case of VQE. This idea was noted for FISC in [20]. However, the argument is more subtle for the FUMC case.
The key observation is that the various cost functions can be rewritten as expectation values of effective Hamiltonians, as in (22), where $|\psi(\boldsymbol{\alpha})\rangle \in \mathcal{H}_A$ and $|\chi(\boldsymbol{\alpha})\rangle \in \mathcal{H}_{AB}$ are $n$-qubit and $2n$-qubit states, respectively, given by (23). Here $\mathcal{H}_X$ denotes the Hilbert space of system $X$, and $|\Phi\rangle = E|\mathbf{0}\rangle$ is the standard maximally entangled state on $AB$. We remark that $|\chi(\boldsymbol{\alpha})\rangle$ is simply the Choi state associated with $V(\boldsymbol{\alpha})$. For the cost functions associated with FISC, the effective Hamiltonians are given by (24), where $P^{0}_{A_j}$ is the projector onto the zero state of $A_j$. For the cost functions associated with FUMC, the effective Hamiltonians are given by (25), where $|\Phi^{(j)}\rangle$ is the standard maximally entangled state on $A_jB_j$. With these Hamiltonians, one can verify that the expressions in (22) are equal to the original cost function definitions in section 3. Hence, we have just shown that VQC is a special case of VQE, where the goal is to prepare the ground state of one of the Hamiltonians in (24) or (25).
The fact that VQC is a special case of VQE implies that, for specific Hamiltonians, VQE is noise resilient. Namely, we have shown that VQE exhibits OPR when the Hamiltonian has the form in either (24) or (25). This naturally points to the question of whether VQE is resilient more generally. It is therefore a very interesting direction for future research to extend our noise resilience to Hamiltonians other than the ones we considered.
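The claim that a compiling cost can be read as a VQE energy can be checked numerically in the simplest global case. The sketch below makes its own assumptions rather than reproducing the paper's equations: it takes the HST-type cost $1 - |\mathrm{Tr}(V^\dagger U)|^2/d^2$, the trial state $(U \otimes V^*)|\Phi\rangle$, and the effective Hamiltonian $\mathbb{1} - |\Phi\rangle\langle\Phi|$, all of which are illustrative choices consistent with the description above. The helper `random_unitary` and the two-qubit size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unitary(d, rng):
    """Random d x d unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(z)
    return q

n = 2
d = 2**n
U = random_unitary(d, rng)                         # stand-in for the target unitary
V = random_unitary(d, rng)                         # stand-in for the trained unitary

# Maximally entangled state |Phi> = (1/sqrt(d)) * sum_i |i>_A |i>_B.
phi = np.eye(d).reshape(d * d) / np.sqrt(d)

# Trial state (assumed form) and effective Hamiltonian (assumed form).
chi = np.kron(U, V.conj()) @ phi
hamiltonian = np.eye(d * d) - np.outer(phi, phi.conj())

energy = np.real(chi.conj() @ hamiltonian @ chi)
hst_cost = 1.0 - abs(np.trace(V.conj().T @ U)) ** 2 / d**2
print(energy, hst_cost)
```

In this model the energy vanishes exactly when $V$ equals $U$ up to a global phase, so minimizing the energy and minimizing the compiling cost are the same task.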

Conclusions
In this work, we discovered a novel kind of noise resilience for variational hybrid quantum-classical algorithms (VHQCAs). We introduced the idea of optimal parameter resilience (OPR), where the variational parameters corresponding to the global optimum are unaffected by various types of incoherent noise. We showed that variational quantum compiling (VQC) exhibits OPR. This paves the way for VQC to be used in the era of noisy intermediate-scale quantum computing as a tool for circuit-depth compression. Important future research directions include: (1) extending our theorems to show resilience to more general noise models than the ones we considered (which our numerics suggest may be possible), (2) exploring noise resilience for the incomplete ansatz case (which our numerics indicate may also be resilient), (3) analyzing approximate noise resilience, (4) studying the effect of noise on the parameter training process, and (5) generalizing our resilience results to other Hamiltonians for the variational quantum eigensolver and exploring resilience for other VHQCAs (for example, some evidence of noise resilience was recently reported in [47]).
The main goal of the appendix is to provide the proofs of theorems 1-3 and Corollaries 1-8. For these proofs, we will need to first review some definitions and properties. We point readers to [33,34] for additional background.
Pauli Basis. In our proofs, we will work in the Pauli product basis, involving a tensor product of one-qubit Pauli operators. This is a natural basis to choose, given the qubit structure of quantum computers. Let The following properties are satisfied by the Pauli operators: In what follows, we consider the following maximally entangled states f f F ñáF = ñá . The aforementioned tensor product of maximally entangled states can be written in the Pauli basis as follows: All-zero state. Noting that s ñá = + 0 0 2 z  | | ( ) , then in the Pauli basis the all-zero state ñá = ñá Ä 0 0 Pauli channels. A Pauli noise channel corresponds to the action of random Pauli operators on a quantum state ρ according to a probability distribution. Let  A denote an n-qubit Pauli channel acting on system A=A 1 , ...A n . Then the action of  A on the state ρ is given by , and å = p 1 . Using the properties in (A2), we find that Non-unital Pauli noise channels. The action of a non-unital Pauli channel  NU on an n-qubit Pauli operators is We now prove the following lemma based on Clifford unitaries and Pauli channels.
Lemma 1. Let $W$ be a Clifford unitary and let $\mathcal{P}$ be a Pauli channel. Then for any state $\rho$, the following holds: $W \mathcal{P}(\rho) W^\dagger = \mathcal{P}'(W \rho W^\dagger)$, where $\mathcal{P}'$ is another Pauli channel.
Proof. The identity follows from (A6) through a chain of equalities, in which the third equality follows from the definition of a Clifford unitary (A3), while the last equality follows from (A6).
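A one-qubit numerical check illustrates the content of lemma 1, read here as the statement that pushing a Pauli channel through a Clifford unitary yields another Pauli channel. The example is a sketch under assumptions: the Clifford is taken to be the Hadamard gate, the probabilities are arbitrary illustrative values, and the helper `pauli_channel` is hypothetical. Since $HXH = Z$, $HZH = X$, and $HYH = -Y$, the new channel is expected to have the $X$ and $Z$ probabilities exchanged.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]
HADAMARD = (X + Z) / np.sqrt(2)                    # a Clifford unitary

def pauli_channel(rho, probs):
    """Apply I, X, Y, Z to rho with the given probabilities."""
    return sum(p * s @ rho @ s.conj().T for p, s in zip(probs, PAULIS))

# Random single-qubit density matrix.
rng = np.random.default_rng(1)
a = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = a @ a.conj().T
rho /= np.trace(rho).real

probs = [0.70, 0.15, 0.05, 0.10]                   # (p_I, p_X, p_Y, p_Z), illustrative
swapped = [0.70, 0.10, 0.05, 0.15]                 # X and Z probabilities exchanged

# Pauli channel first, then the Clifford ...
lhs = HADAMARD @ pauli_channel(rho, probs) @ HADAMARD.conj().T
# ... equals the Clifford first, then a different Pauli channel.
rhs = pauli_channel(HADAMARD @ rho @ HADAMARD.conj().T, swapped)
print(np.allclose(lhs, rhs))
```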

Appendix B. Noisy entangling and disentangling gates in FUMC
For the proofs given in appendices D-G, we will make use of some properties of the noisy versions of the entangling gate $E$ and the disentangling gate $E^\dagger$ that appear in FUMC. Hence, it is helpful to first state these properties in this appendix. Recall that, for Pauli gate noise acting during $E$ or $E^\dagger$, we assume that global Pauli channels act before and after each Hadamard, as well as before and after each CNOT. This noise model incorporates the case when there could be correlated Pauli noise acting on different qubits during $E$ and $E^\dagger$. We note that the noisy entangling gate is the same for both the HST and the LHST. Let $E = E_{AB}$ denote the ideal entangling gate, which can be split into a tensor product of two-qubit entangling gates $E_{A_jB_j}$ as
$$E_{AB} = \bigotimes_{j=1}^{n} E_{A_jB_j}. \qquad \text{(B1)}$$
Moreover, each E A B
j j consists of a Hadamard gate acting on A j followed by a CNOT gate acting on both A j and B j . In the quantum channel notation we write this as • ( ) where  A j are the quantum channels that implement the Hadamard gates and  X A B j j are the quantum channels that implement the CNOTs. The noisy version of  AB , which we denote by  AB  , is where  j AB ,  j AB , and  j AB are n 2 -qubit global Pauli channels for all Î i n 1 ,..., { }, as defined in (A8). Since both Hadamard and CNOT gates are Clifford unitaries, by using lemma 1 we find that where  AB is another Pauli channel.
We now apply  AB  on the all-zeros state ñá 0 0 0 0 , , AB | | . Consider the following chain of equalities: where we used (A5), (A8), and the following identities for all jä{1, Kn}: The noisy disentangling channel for the HST is given by the adjoint of the noisy entangling channel, as defined in (B2). On the other hand, since in the LHST only two qubits A B j j are measured for a given run of the experiment, the disentangling channel is applied only on the A B j j pair. However, we assume that global Pauli channels act on n 2 qubits before and after the Hadamard and CNOT gate. For each jä{1, K, n}, the disentangling channel is given by the adjoint of the following channel: where  j AB ,  j AB ,  j AB , and  j AB are n 2 -qubit global Pauli channels, as defined in (A8), and we used lemma 1. We remark that the Pauli channels are defined with a j subscript in (B7) to emphasize that for different runs of the experiment the Pauli channels that act could be different.
From arguments similar to those used to derive (B4), we find that

Appendix C. Measurement noise in FUMC
For the proofs given in appendices D-G, we will make use of some properties of measurement noise in FUMC. Hence, it is helpful to first state these properties in this appendix.
Let P 0 denote the POVM element associated with getting the all-zeros outcome in the noiseless HST, which can be expressed as = ñá = ñá = P 0 0 : 0 0 . . Moreover, we assume that for all j the following strict inequality holds: > p p .
01 00 1 00 1 n n n n 1 1 1 1 C.1. Effective noisy measurement operator for the HST In the noiseless HST, the measurement is preceded by the disentangling unitary E AB ( ) † , where E AB is defined in (B1). In the Heisenberg picture, this corresponds to the evolution of the measurement operator with respect to the unitary E AB . We now derive the effective noisy POVM element as the evolution of P 0  under the noisy entangling channel  AB  (defined in section B). ñá  a b a b , , AB | | can be expressed as follows: where we used the properties of the Pauli operators as defined in (A2). Then, from (B4) and the linearity of quantum channels, it follows that , and l p A ( ) and k p B ( ) are probability distributions as in (C2).

C.2. Effective noisy measurement operator for the LHST
In the LHST, a noisy measurement on two qubits A B j j is preceded by the disentangling unitary E A B j j ( ) † acting on the same two qubits. Similar to section C.1, we now derive the effective POVM element as the evolution of the operator Q j 00 ( ) (defined below) under the adjoint of the noisy disentangling channel, as defined in (B7). The noisy POVM for the qubits A B j j is given by which follows from (C2). Moreover, the overall noisy POVM for the LHST is defined as By using arguments similar to those used in (C3), (C4), and (C5), we find that . Therefore, the overall effective noisy POVM for the LHST is defined as Before providing a proof of theorem 1, we prove the following lemma.

Lemma 2. Let $C_{QC}(V)$ be a cost function of $V$ with $V \in \mathcal{U}_d$, and $\mathcal{U}_d$ the set of $d \times d$ unitary matrices. Additionally suppose that $C_{QC}(V)$ can be evaluated using a quantum circuit denoted QC as follows: where $\rho$ is a quantum state, $\Lambda$ denotes a POVM element and $\mathcal{N}_V$ denotes the noisy unital quantum channel describing the evolution of the state throughout the computation, which depends on the unitary $V$. Then $\tilde{C}_{QC}(V)$ exhibits strong-OPR to a noise model composed of $\mathcal{N}_V$ and global depolarizing channels acting continuously throughout the computation.
Proof. Without loss of generality let us decompose  V as k noisy unital quantum channels: In the presence of global depolarizing noise acting throughout the computation, the cost function can now be expressed as where we have interleaved the channels  V i with global depolarizing channels  i . From definition 1 and from the fact that , it follows that Then, from (D4) we have that any unitary in  d opt will also optimize C V QC ( )  . Hence C V QC ( )  exhibits strong-OPR to a noise model composed of  V and a global depolarizing channels acting throughout the computation. , We first find the action of the channel where we used the definition of a non-unital Pauli channel from (A9) and (A10). We note that the terms that are independent of W i do not affect the global optima. Therefore, the only relevant term in (D9) is ) †  is a unital channel. Therefore, the term that decides the global optima in the HST is given by where we have omitted the scaling factors. Let The second equality follows from (C5), where we set k b =  = m p q c : 1 2   a a b b  a a b b a b a a b b a . The last equality follows from (A2). Let  d opt denote the sets of unitaries that optimize F V HST ( )(and hence We remark that this set of unitaries also optimizes F V LHST ( )(and hence C V LHST ( )). Then, for Consider the following inequality: where we used the Cauchy-Schwarz inequality. Moreover, note that the inequality in (D20) is saturated for any matrix ¢ Î  V d if we assume that the coefficients k a a b b AB , , , characterizing the noise satisfy k  0 . Therefore, the set of unitaries that optimize F V HST ( )(and hence C V opt . According to definition 6, the latter means that C HST exhibits strong-OPR to Noise Model 1 in definition 7.
We now show that the cost function C LHST exhibits strong-OPR to Noise Model 1. The LHST corresponds to the optimization of the following function: where we replaced the disentangling and measurement channels in (D15) with (C9). Consider the following: where in (D24) we have split Tr B into a contribution from qubit B j and a contribution on all qubits except B j , and where . The first equality is derived from (C9), while the inequality follows from the arguments similar to (D20).
Here we remark that the inequality (D26) is saturated for any unitary matrix in the set of unitaries that optimize F V HST ( )(and hence C V LHST ( )) given by (D18). Hence, C LHST exhibits strong-OPR to Noise Model 1 in definition 7 if we assume that the coefficients x a a b b j , , , Proof. We break up the HST circuit into three time intervals similar to section D. We again assume that the global depolarizing noise occurs on system AB during all three time intervals and the global depolarizing noise occurs on system A during the implementation of   • † . Moreover, suppose that a global Pauli channel  AB followed by a global non-unital Pauli channel  A NU acts at time τ 1 . Furthermore, a global pauli channel  AB  acts at time τ 2 , while a global Pauli channel acts continuously on the system B in between τ 1 and τ 2 .
The state at τ 1 is given by  a a b b a a b b a The term that depends on W in (E3) is given by where we used the definition of Pauli channels from (A6) and (A8). By omitting the scaling factors, the relevant term after t 3 is given by  a a b b   b  a  a b  b  a a b   a b   a a b b   b a  a b   a b   a a b 1 2   a a b b  a a b b a b a a b b a . The inequality follows from the arguments similar to (D20). Here, the last inequality in (E9) is saturated for any matrix V in the set  d opt of unitaries that optimize F V HST ( )(and hence C V LHST ( )) given by (D18). On the other hand . We note that (E16) is similar to (D23). Therefore, from the proof in section D it follows that å å t , provided that the following condition is satisfied: Here  A  is also a Pauli channel, and the channels  ,  † , and  correspond to conjugating the state by the unitaries U , V † , and W , respectively.
Proof. This follows from the fact that the overall noisy channel acting during the implementation of  is mathematically equivalent to a Pauli channel followed by the unitary  , as described in the condition (G1) and by invoking theorem 1, which allows for Pauli channel noise at time τ 1 ., 1 being a tensor product, i.e., W is a tensor product up to a particular time. Then the cost functions C HST and C LHST exhibit strong-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 1, as well as (2) a , provided that the following condition is satisfied:  is also a Pauli channel.
Proof. This follows from the fact that the overall noisy channel acting during the implementation of $\mathcal{W}$ is mathematically equivalent to a non-unital Pauli channel followed by the unitary $\mathcal{W}$, as described in the condition (G7), and by invoking theorem 2, which allows for non-unital Pauli noise at time $\tau_1$.
Corollary 5. The cost function $C_{\mathrm{HST}}$ exhibits strong-OPR to the following noise model: (1) global depolarizing noise acting continuously throughout the circuit, (2) global non-unital Pauli noise on system $A$ at a fixed time in between $\tau_1$ and $\tau_2$.
where ¢ Î  V d opt , and where  d opt denote the sets of unitaries that optimize F V HST ( )(and hence C V HST ( )) as defined in (D18). The first and third equalities follow from the ricochet property. The last equality corresponds to the case when there is non-unital Pauli noise at time τ 1 and no other noise in the HST circuit, which is a special case of theorem 2. Therefore, the inequality follows from theorem 2. Moreover, by using the arguments similar to (E10)-(E12), we find that f V 2 ( ) is independent of W. This completes the proof. ,