Noise resilience of variational quantum compiling

Kunal Sharma; Sumeet Khatri; M Cerezo; Patrick J Coles

doi:10.1088/1367-2630/ab784c

1. Introduction

Obtaining accurate answers from near-term quantum computers is a challenge with major scientific and technological implications. In these so-called noisy intermediate-scale quantum (NISQ) computers [1], errors arise, for example, due to decoherence processes, gate noise, and measurement noise. Clearly, error mitigation techniques will be necessary to make use of NISQ devices. Several promising error mitigation strategies have recently emerged, including zero-noise extrapolation [2], quasi-probability decomposition [2], post-selection [3, 4], noise-aware compiling [5], and machine learning for circuit-depth compression [6]. Let us consider two other strategies for error mitigation in what follows.

Hybridizing a quantum algorithm by pushing some of the complexity onto a classical computer allows one to only run a portion of the computation on the (error-prone) quantum computer. Excellent examples of this strategy are variational hybrid quantum-classical algorithms (VHQCAs) [7]. VHQCAs only employ a quantum computer to evaluate a cost function that depends on the parameters of a quantum gate sequence and then leverage a classical optimization routine to minimize the cost and hence train the parameters. The most famous VHQCA is the variational quantum eigensolver (VQE) [8], where the cost function is the energy for some Hamiltonian and hence the goal is to prepare the ground state. VHQCAs have been proposed for many other applications [9–22].

Another strategy for error mitigation is to find quantum circuits or quantum algorithms that are inherently noise resilient. Circuits for quantum error correction [23, 24], of course, have this property of inherent noise resilience; in fact, such circuits are resilient to all types of noise on a subset of the qubits. More generally, one can ask whether a circuit is resilient to a particular kind of noise process. That is, for any circuit that aims to compute some quantity, one can ask which noise models leave the output of the circuit unaffected.

The two strategies just mentioned have an interesting intersection: researchers have observed that some VHQCAs have some inherent noise resilience. McClean et al [7] noted that coherent errors (e.g., systematic gate biases) can lead to a situation where the formal unitary $V({\boldsymbol{\alpha }})$ specified by the parameters ${\boldsymbol{\alpha }}$ is different from the actual unitary that is physically implemented $\widetilde{V}({\boldsymbol{\alpha }})$ . This error is correctable if there exists a vector ${\boldsymbol{\beta }}$ such that one can physically implement the unitary $\widetilde{V}({\boldsymbol{\alpha }}+{\boldsymbol{\beta }})$ within one's ansatz, with the condition that $\widetilde{V}({\boldsymbol{\alpha }}+{\boldsymbol{\beta }})=V({\boldsymbol{\alpha }})$ . If this condition is satisfied, then one could still physically achieve the minimum value of the cost function, where the minimum value would be associated with different parameters than one would have in the noiseless case. We refer to this kind of noise resilience as Cost Value Resilience, since the value of the cost function at the global minimum is unaffected by the noise. Cost Value Resilience is important, e.g. if one is interested in estimating the ground state energy of a Hamiltonian with VQE.

In this work, we report on a different kind of noise resilience for VHQCAs. Instead of considering Cost Value Resilience, we consider the case where the optimal parameters are noise resilient, which we call Optimal Parameter Resilience. While Cost Value Resilience is related to coherent noise, we find that Optimal Parameter Resilience holds for certain kinds of incoherent noise, such as decoherence processes and readout errors. For certain applications, obtaining the correct optimal parameters is more important than obtaining the correct value of the cost function.

Quantum compiling [25–27] is one of these applications. Compiling refers to transforming a high-level algorithm into low-level machine code. For quantum compiling, it is crucial to do this transformation optimally, i.e. to keep the low-level code as short as possible, since errors accumulate with circuit depth. VHQCAs offer a promising framework for (optimal) quantum compiling. Three recent works introduced VHQCAs for quantum compiling, henceforth referred to as variational quantum compiling (VQC) [19–21]. In VQC one trains the parameters ${\boldsymbol{\alpha }}$ of a short-depth gate sequence $V({\boldsymbol{\alpha }})$ such that it is close to a target unitary U. Here, some distance measure between $V({\boldsymbol{\alpha }})$ and U serves as the cost function and is efficiently evaluated on a quantum computer, while a classical optimizer adjusts the parameters ${\boldsymbol{\alpha }}$ to minimize the cost. VQC could be an important tool for NISQ computing since it could optimally shrink the depth of quantum circuits. However, a potential issue is that one must implement the target unitary U on the NISQ device, so the target itself is noisy or defective. Furthermore, there are noise sources in other parts of the cost-evaluation circuit. All of these may lead to a defective optimal $V({\boldsymbol{\alpha }})$ , with the noise effectively compiled into $V({\boldsymbol{\alpha }})$ .

Addressing these concerns, our main results are rigorous theorems stating that many different types of noise during cost evaluation do not affect the optimal $V({\boldsymbol{\alpha }})$ . For example, we show that VQC is resilient to measurement noise (readout error). We also show resilience to incoherent gate noise and decoherence processes, such as Pauli channels and non-unital Pauli channels, acting at specific times during the cost-evaluation circuit. In addition to these analytical results, we implement VQC on IBM's noisy quantum simulator [28] (which simulates their quantum hardware) for several quantum gates: quantum Fourier transform, Toffoli, and W-state preparation. In each case, we observed significant noise resilience (even more resilience than what is explained by our theorems) such that we effectively learned the true optimal values of ${\boldsymbol{\alpha }}$ despite the noise.

Finally, we speculate that the resilience phenomenon that we demonstrate for VQC may be more general, potentially applying to other VHQCAs. For example, we discuss the potential for seeing this resilience for VQE, and as a warm-up for the reader, we give a simple example in the next section where VQE exhibits Optimal Parameter Resilience. We also establish in the Discussion section that VQC is a special case of VQE, and hence our main results can be viewed as being relevant to VQE.

2. Warm-up: simple VQE example

Here we show that VQE [8] exhibits Optimal Parameter Resilience (OPR) to uncorrelated measurement noise for a special class of Hamiltonians. VQE may exhibit OPR more generally, although the proof would certainly be more involved. Hence we consider here this special case for illustration and leave the more general case for future work.

Consider a Hamiltonian that is a sum of local Pauli operators

$\begin{eqnarray}&&H=-\sum _{j=1}^{n}{c}^{\left(j\right)}{\sigma }_{w\left(j\right)}^{\left(j\right)},\end{eqnarray} \tag{ 1 }$

where ${\sigma }_{w\left(j\right)}^{\left(j\right)}={U}_{w\left(j\right)}^{\left(j\right)}{\sigma }_{z}^{\left(j\right)}{({U}_{w\left(j\right)}^{\left(j\right)})}^{\dagger }$ is a local operator on qubit j that is unitarily equivalent to the Pauli z operator ${\sigma }_{z}^{\left(j\right)}$ . Physically, this Hamiltonian arises for a system of n non-interacting spin-1/2 particles in a non-uniform (i.e. j-dependent) magnetic field. Without loss of generality, one can take the ${c}^{\left(j\right)}$ coefficients to be non-negative (i.e. absorb any negativity into the definition of the Pauli operator). The ground state $| {\psi }_{0}\rangle$ of H has a tensor product form: $| {\psi }_{0}\rangle ={\displaystyle \bigotimes }_{j=1}^{n}| w{(j)}_{+}\rangle$ , where $| w{(j)}_{+}\rangle$ is the eigenvector of ${\sigma }_{w\left(j\right)}^{\left(j\right)}$ with the +1 eigenvalue.

Now suppose there is measurement noise in the cost-evaluation circuit. In the ideal case, one measures $\langle H\rangle =-{\sum }_{j}{c}^{\left(j\right)}\langle {\sigma }_{w\left(j\right)}^{\left(j\right)}\rangle =-{\sum }_{j}{c}^{\left(j\right)}\langle {U}_{w\left(j\right)}^{\left(j\right)}{\sigma }_{z}^{\left(j\right)}{({U}_{w\left(j\right)}^{\left(j\right)})}^{\dagger }\rangle$ by applying ${({U}_{w\left(j\right)}^{\left(j\right)})}^{\dagger }$ on the jth qubit and measuring it in the standard basis to estimate $\langle {\sigma }_{w\left(j\right)}^{\left(j\right)}\rangle$ . Classical post-processing then computes the weighted sum in $\langle H\rangle$ . However, since $\langle {\sigma }_{z}^{\left(j\right)}\rangle$ is estimated as the probability of outcome 0 minus the probability of outcome 1, with measurement noise the ${\sigma }_{z}^{\left(j\right)}$ operator gets replaced by ${\widetilde{\sigma }}_{z}^{\left(j\right)}=({p}_{00}^{\left(j\right)}-{p}_{10}^{\left(j\right)})| 0\rangle \langle 0| -({p}_{11}^{\left(j\right)}-{p}_{01}^{\left(j\right)})| 1\rangle \langle 1|$ . Here, ${p}_{{kl}}^{\left(j\right)}$ is the probability to obtain the k outcome when feeding in the $| l\rangle$ state on the jth qubit. Hence, instead of measuring $\langle {\sigma }_{w\left(j\right)}^{\left(j\right)}\rangle$ , one measures $\langle {\widetilde{\sigma }}_{w\left(j\right)}^{\left(j\right)}\rangle$ with ${\widetilde{\sigma }}_{w\left(j\right)}^{\left(j\right)}={U}_{w\left(j\right)}^{\left(j\right)}{\widetilde{\sigma }}_{z}^{\left(j\right)}{({U}_{w\left(j\right)}^{\left(j\right)})}^{\dagger }$ . In other words, the Hamiltonian H gets replaced by an effective Hamiltonian:

$\begin{eqnarray}&&\widetilde{H}=-\sum _{j=1}^{n}{c}^{\left(j\right)}{\widetilde{\sigma }}_{w\left(j\right)}^{\left(j\right)}.\end{eqnarray} \tag{ 2 }$

The ground state of $\widetilde{H}$ is a tensor product of one-qubit states that are the eigenvectors of ${\widetilde{\sigma }}_{w\left(j\right)}^{\left(j\right)}$ with the largest eigenvalue. Suppose that ${p}_{00}^{\left(j\right)}+{p}_{11}^{\left(j\right)}\gt {p}_{01}^{\left(j\right)}+{p}_{10}^{\left(j\right)}$ for all j, which means that the probability of a correct readout, summed over the two inputs $| 0\rangle$ and $| 1\rangle$ , exceeds that of an incorrect readout. This assumption is equivalent to ${p}_{00}^{\left(j\right)}-{p}_{10}^{\left(j\right)}\gt {p}_{01}^{\left(j\right)}-{p}_{11}^{\left(j\right)}$ , so the largest eigenvalue of ${\widetilde{\sigma }}_{z}^{\left(j\right)}$ is associated with the $| 0\rangle$ state, and hence the largest eigenvalue of ${\widetilde{\sigma }}_{w\left(j\right)}^{\left(j\right)}$ is associated with $| w{(j)}_{+}\rangle$ . Therefore, despite the measurement noise, the ground state is still $| {\psi }_{0}\rangle ={\displaystyle \bigotimes }_{j=1}^{n}| w{(j)}_{+}\rangle$ . This implies that one would still learn the correct optimal parameters of the state-preparation circuit if one implemented VQE for this Hamiltonian.
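The warm-up argument can be checked numerically. The sketch below is a minimal illustration, assuming NumPy and three qubits with hypothetical coefficients ${c}^{\left(j\right)}$, Haar-random local rotations standing in for ${U}_{w\left(j\right)}^{\left(j\right)}$, and hypothetical asymmetric readout probabilities ${p}_{{kl}}^{\left(j\right)}$; none of these values come from the text. It builds $H$ and $\widetilde{H}$ as dense matrices and checks that both have the ground state ${\bigotimes }_{j}{U}_{w\left(j\right)}^{\left(j\right)}| 0\rangle$, while the ground-state energy itself shifts.

```python
import numpy as np

Z = np.diag([1.0, -1.0])

def noisy_z(p):
    # p[k, l] = probability of reading outcome k when the qubit is prepared in |l>
    # returns (p00 - p10)|0><0| - (p11 - p01)|1><1|
    return np.diag([p[0, 0] - p[1, 0], -(p[1, 1] - p[0, 1])])

def random_qubit_unitary(rng):
    # Haar-random 2x2 unitary via QR decomposition with phase correction
    z = (rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def embed(op, j, n):
    # op acting on qubit j, identity on the other n - 1 qubits
    mats = [np.eye(2)] * n
    mats[j] = op
    out = np.array([[1.0]])
    for m in mats:
        out = np.kron(out, m)
    return out

def ground_state(h):
    _, vecs = np.linalg.eigh(h)  # eigenvalues are returned in ascending order
    return vecs[:, 0]

n = 3
rng = np.random.default_rng(0)
coeffs = [0.7, 1.2, 0.4]                              # hypothetical c^(j) > 0
readout = [np.array([[0.92, 0.15], [0.08, 0.85]]),    # hypothetical p_kl^(j), each
           np.array([[0.97, 0.05], [0.03, 0.95]]),    # with p00 + p11 > p01 + p10
           np.array([[0.88, 0.20], [0.12, 0.80]])]
Us = [random_qubit_unitary(rng) for _ in range(n)]

H = sum(-c * embed(U @ Z @ U.conj().T, j, n)
        for j, (c, U) in enumerate(zip(coeffs, Us)))
H_tilde = sum(-c * embed(U @ noisy_z(p) @ U.conj().T, j, n)
              for j, (c, U, p) in enumerate(zip(coeffs, Us, readout)))

psi0 = np.array([1.0 + 0j])
for U in Us:
    psi0 = np.kron(psi0, U[:, 0])                     # |w(j)_+> = U|0>

print(abs(np.vdot(psi0, ground_state(H))) ** 2)        # ~1.0
print(abs(np.vdot(psi0, ground_state(H_tilde))) ** 2)  # ~1.0
print(np.linalg.eigvalsh(H)[0], np.linalg.eigvalsh(H_tilde)[0])  # approx. -2.3 and -2.02
```

The optimal state is unchanged, but the minimum energy is not, so this toy case exhibits Optimal Parameter Resilience without Cost Value Resilience.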

3. Background: variational quantum compiling

Let us now move on to variational quantum compiling (VQC). VQC was first introduced in [19], under the name of quantum-assisted quantum compiling (QAQC). Two later works further investigated VQC [20, 21] with slightly different approaches. Since we are attempting to unite these works [19–21] under one umbrella, we are proposing the name VQC (instead of QAQC) as a unifying term.

There are two overarching approaches to VQC. One is to compile the full unitary matrix U by considering the action of U on all input states (or an informationally complete set of states) [19, 21]. The other is to compile only a particular column of the matrix U by considering the action of U on a fixed input state [19, 20]. The benefit of the first approach is that it is fully general, applying even when one does not know what the input state to U will be (for example, if U occurs in the middle of one's quantum algorithm). The benefit of the second approach is that, when the input state is known, it could lead to a shorter-depth compilation since it does not require compilation of the entire unitary matrix.

3.1. Full unitary matrix compiling

Full unitary matrix compiling (FUMC) was treated in detail in [19]. This work introduced cost functions based on the entanglement fidelity and proposed quantum circuits to quantify the cost based on the overlap between maximally entangled states. A slightly different but equivalent approach was employed in [21]. We focus on the approach of [19] in what follows.

Two cost functions were considered in [19]. One cost function ${C}_{{\mathsf{HST}}}$ quantifies the Hilbert–Schmidt inner product between the target unitary U and the trainable gate sequence V, as follows:

$\begin{eqnarray}&&{C}_{{\mathsf{HST}}}=1-{F}_{{\mathsf{HST}}},\quad \mathrm{with}\quad {F}_{{\mathsf{HST}}}=| \mathrm{Tr}({V}^{\dagger }U){| }^{2}/{d}^{2},\end{eqnarray} \tag{ 3 }$

where d = 2ⁿ is the Hilbert-space dimension and n is the number of qubits that U acts on, and where we write V instead of $V({\boldsymbol{\alpha }})$ for simplicity. The circuit for computing ${C}_{{\mathsf{HST}}}$ is called the Hilbert–Schmidt Test (HST) and is shown in figure 1(a). First, one prepares a maximally entangled state $| {\rm{\Phi }}{\rangle }^{{AB}}$ by acting with a depth-two circuit E, then one applies U followed by ${V}^{\dagger }$ on half of this maximally entangled state. Finally one measures the overlap with the original maximally entangled state $| {\rm{\Phi }}{\rangle }^{{AB}}$ by applying ${E}^{\dagger }$ and quantifying the probability of the all-zeros measurement outcome. One can verify that this probability is equal to ${F}_{{\mathsf{HST}}}=| \mathrm{Tr}({V}^{\dagger }U){| }^{2}/{d}^{2}$ . This cost function is operationally meaningful since it is equivalent to the average fidelity $\overline{F}(U,V)=\int | \langle \psi | {V}^{\dagger }U| \psi \rangle {| }^{2}{\rm{d}}\psi$ between states acted upon by U versus those acted upon by V, as follows [29, 30]:

$\begin{eqnarray}&&{C}_{{\mathsf{HST}}}=\displaystyle \frac{d+1}{d}(1-\overline{F}(U,V)).\end{eqnarray} \tag{ 4 }$

Note that ${C}_{{\mathsf{HST}}}$ is faithful in that ${C}_{{\mathsf{HST}}}=0$ iff V = U (up to a global phase).
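The identities in (3) and (4) can be checked numerically. The sketch below, assuming NumPy and Haar-random test unitaries (hypothetical inputs), computes ${F}_{{\mathsf{HST}}}$ both by simulating the overlap of $({V}^{\dagger }U\otimes {\mathbb{1}})| {\rm{\Phi }}\rangle$ with $| {\rm{\Phi }}\rangle$ and from the trace formula. For a single qubit it evaluates the average fidelity exactly by averaging over the six Pauli eigenstates, which form a state 2-design; this averaging trick is an implementation choice for the check, not part of the original method.

```python
import numpy as np

def random_unitary(d, rng):
    # Haar-random unitary: QR of a complex Gaussian matrix, with phase fix
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def hst_circuit_probability(U, V):
    # |<Phi| (V^dag U (x) 1) |Phi>|^2 with |Phi> = sum_i |i>_A |i>_B / sqrt(d)
    d = U.shape[0]
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)
    out = np.kron(V.conj().T @ U, np.eye(d)) @ phi
    return abs(np.vdot(phi, out)) ** 2

def f_hst(U, V):
    # closed form from (3): |Tr(V^dag U)|^2 / d^2
    d = U.shape[0]
    return abs(np.trace(V.conj().T @ U)) ** 2 / d ** 2

def average_fidelity_qubit(U, V):
    # exact Haar average for d = 2: the six Pauli eigenstates form a state 2-design
    s = 1 / np.sqrt(2)
    states = [np.array([1, 0]), np.array([0, 1]), np.array([s, s]),
              np.array([s, -s]), np.array([s, 1j * s]), np.array([s, -1j * s])]
    W = V.conj().T @ U
    return np.mean([abs(np.vdot(psi, W @ psi)) ** 2 for psi in states])

rng = np.random.default_rng(42)
U4, V4 = random_unitary(4, rng), random_unitary(4, rng)   # two qubits
print(hst_circuit_probability(U4, V4), f_hst(U4, V4))     # agree up to rounding

U2, V2 = random_unitary(2, rng), random_unitary(2, rng)   # one qubit, d = 2
c_hst = 1 - f_hst(U2, V2)
c_from_avg = (2 + 1) / 2 * (1 - average_fidelity_qubit(U2, V2))  # right side of (4)
print(c_hst, c_from_avg)                                  # agree up to rounding
print(1 - f_hst(U2, np.exp(0.7j) * U2))                   # ~0: V differs from U by a phase
```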

**Figure 1.** Circuits for cost evaluation in full unitary matrix compiling. (a) The Hilbert–Schmidt test (HST). An entangling gate E, consisting of Hadamards and CNOTs, prepares a maximally entangled state between systems A and B. Then a target unitary U is applied on A, which is followed by a trainable unitary ${V}^{\dagger }$ . Finally, a measurement in the Bell basis is performed by applying the adjoint of E, followed by a standard basis measurement. This circuit computes the Hilbert–Schmidt inner product between U and V, as the probability to obtain the measurement outcome in which all $2n$ qubits are in the $| 0\rangle$ state is ${F}_{{\mathsf{HST}}}=(1/{2}^{2n})| \mathrm{Tr}({V}^{\dagger }U){| }^{2}$ . (b) The local Hilbert–Schmidt test (LHST), which is the same as the HST circuit, except that the disentangling gate ${E}^{\dagger }$ is applied only on one ${A}_{j}{B}_{j}$ pair of qubits (depicted here for the ${A}_{1}{B}_{1}$ pair) and subsequently, the same two qubits are measured in the standard basis. The probability for the outcome associated with the $| 00\rangle$ state is ${F}_{{\mathsf{LHST}}}^{\left(j\right)}$ in (5).

An alternative cost function [19] is given by

$\begin{eqnarray}&&{C}_{{\mathsf{LHST}}}=1-{F}_{{\mathsf{LHST}}},\quad \mathrm{with}\quad {F}_{{\mathsf{LHST}}}=\displaystyle \frac{1}{n}\sum _{j=1}^{n}{F}_{{\mathsf{LHST}}}^{\left(j\right)},\end{eqnarray} \tag{ 5 }$

where ${F}_{{\mathsf{LHST}}}^{\left(j\right)}$ is the probability of the 00 measurement outcome in the local Hilbert–Schmidt test (LHST), which is the circuit shown in figure 1(b). Note that ${F}_{{\mathsf{HST}}}$ is the entanglement fidelity for the quantum channel defined by ${V}^{\dagger }U$ . On the other hand, ${F}_{{\mathsf{LHST}}}^{\left(j\right)}$ is the entanglement fidelity for the quantum channel obtained from feeding into ${V}^{\dagger }U$ the maximally mixed state on ${\overline{A}}_{j}$ and then tracing over ${\overline{A}}_{j}$ , where ${\overline{A}}_{j}$ consists of all qubits in A other than A_j. As shown in [19],

$\begin{eqnarray}&&{C}_{{\mathsf{LHST}}}\leqslant {C}_{{\mathsf{HST}}}\leqslant {{nC}}_{{\mathsf{LHST}}},\end{eqnarray} \tag{ 6 }$

which implies that ${C}_{{\mathsf{LHST}}}$ is also a faithful cost function, i.e. ${C}_{{\mathsf{LHST}}}=0$ iff V = U (up to a global phase).
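The bounds in (6) can be checked numerically. The sketch below is a minimal statevector simulation, assuming NumPy and Haar-random three-qubit unitaries as hypothetical test cases. It uses the fact that applying ${E}^{\dagger }$ to the pair ${A}_{j}{B}_{j}$ and obtaining 00 is the same as projecting that pair onto the Bell state $(| 00\rangle +| 11\rangle )/\sqrt{2}$, which E prepares from $| 00\rangle$.

```python
import numpy as np

def random_unitary(d, rng):
    # Haar-random unitary: QR of a complex Gaussian matrix, with phase fix
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def hst_and_lhst_costs(U, V, n):
    d = 2 ** n
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)     # |Phi>_{AB}; A = first n qubits
    psi = np.kron(V.conj().T @ U, np.eye(d)) @ phi  # V^dag U acts on A only
    psi = psi.reshape([2] * (2 * n))                # axes: A_1..A_n, B_1..B_n
    bell = np.eye(2) / np.sqrt(2)                   # amplitudes of (|00> + |11>)/sqrt(2)
    f_local = []
    for j in range(n):
        # E^dag on A_j B_j followed by outcome 00 = projection onto the Bell pair
        rest = np.tensordot(psi, bell.conj(), axes=([j, n + j], [0, 1]))
        f_local.append(np.sum(np.abs(rest) ** 2))
    c_hst = 1 - abs(np.trace(V.conj().T @ U)) ** 2 / d ** 2
    c_lhst = 1 - np.mean(f_local)
    return c_hst, c_lhst

n = 3
rng = np.random.default_rng(5)
U = random_unitary(2 ** n, rng)
V_far = random_unitary(2 ** n, rng)
V_near = U @ np.diag(np.exp(0.1j * rng.normal(size=2 ** n)))  # small diagonal error

for V in (V_far, V_near, U):
    c_hst, c_lhst = hst_and_lhst_costs(U, V, n)
    print(f"C_LHST={c_lhst:.4f}  C_HST={c_hst:.4f}  n*C_LHST={n * c_lhst:.4f}")
```

For V = U all three numbers vanish, consistent with both costs being faithful.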

The overall cost function proposed by [19] was a convex combination of ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ :

$\begin{eqnarray}&&C(q)={{qC}}_{{\mathsf{HST}}}+(1-q){C}_{{\mathsf{LHST}}}.\end{eqnarray} \tag{ 7 }$

Here, q is a free parameter with $0\leqslant q\leqslant 1$ . The definition of C(q) was motivated in [19] by the fact that ${C}_{{\mathsf{HST}}}$ has a direct operational meaning (equation (4)) but it becomes difficult to train for large n due to a vanishing gradient [31], whereas ${C}_{{\mathsf{LHST}}}$ is trainable but does not have a direct operational meaning. Hence one can take a weighted average of these two functions, where for small n one can choose $q\approx 1$ , while for large n one can choose q ≈ 0.

3.2. Compiling with a fixed input state

Fixed input state compiling (FISC) of a unitary matrix was introduced in [19, 20] and treated in significant detail in [20]. In this case, the goal is to train a gate sequence V so that it has the same effect as a target unitary U when acting on a given input state $| {\psi }_{0}\rangle$ . For simplicity and due to its technological relevance, we will consider the case where $| {\psi }_{0}\rangle =| {\bf{0}}\rangle$ is the all-zero state, so that we are interested in training V to satisfy (up to a global phase):

$\begin{eqnarray}&&U| {\bf{0}}\rangle =V| {\bf{0}}\rangle ,\quad \mathrm{or}\ \mathrm{equivalently}\quad W| {\bf{0}}\rangle =| {\bf{0}}\rangle ,\end{eqnarray} \tag{ 8 }$

with $W={V}^{\dagger }U$ . To quantify how far $W| {\bf{0}}\rangle$ is from the state $| {\bf{0}}\rangle$ , one can define the cost function

$\begin{eqnarray}&&{C}_{{\mathsf{LET}}}=1-{G}_{{\mathsf{LET}}},\end{eqnarray} \tag{ 9 }$

where ${G}_{{\mathsf{LET}}}$ is the fidelity $F(\rho ,\sigma )={\left(\mathrm{Tr}[\sqrt{\sqrt{\rho }\sigma \sqrt{\rho }}]\right)}^{2}$ between these two states:

$\begin{eqnarray}&&{G}_{{\mathsf{LET}}}=F(| {\bf{0}}\rangle \langle {\bf{0}}| ,W| {\bf{0}}\rangle \langle {\bf{0}}| {W}^{\dagger })=| \langle {\bf{0}}| W| {\bf{0}}\rangle {| }^{2}=\mathrm{Tr}\left[{P}_{{\bf{0}}}W| {\bf{0}}\rangle \langle {\bf{0}}| {W}^{\dagger }\right],\end{eqnarray} \tag{ 10 }$

with ${P}_{{\bf{0}}}=| {\bf{0}}\rangle \langle {\bf{0}}|$ the projector onto the all-zero state. We use the LET subscript because we refer to the circuit used to quantify (9) and (10) as the Loschmidt echo test (LET), shown in figure 2(a). The Loschmidt echo [32] refers to a forward and backward time evolution with the intent of recovering the initial state. This is analogous to the circuit in figure 2(a), where one first evolves forward with U and then attempts to undo that evolution with ${V}^{\dagger }$ , to recover the initial state $| {\bf{0}}\rangle$ . Since obtaining the all-zero outcome when measuring all n qubits in the standard basis corresponds to the projector ${P}_{{\bf{0}}}$ , the probability of this outcome in figure 2(a) is precisely the last expression in (10), i.e. ${G}_{{\mathsf{LET}}}$ .

**Figure 2.** Circuits for cost evaluation in compiling with a fixed input state. (a) The Loschmidt echo test (LET). In this circuit, the probability of obtaining the measurement outcome in which all n qubits are in the $| 0\rangle$ state is ${G}_{{\mathsf{LET}}}=| \langle {\bf{0}}| {V}^{\dagger }U| {\bf{0}}\rangle {| }^{2}$ . (b) The local Loschmidt echo test (LLET), which is the same as the LET but only the A_j qubit is measured. The probability that this qubit is in the $| 0\rangle$ state is ${G}_{{\mathsf{LLET}}}^{\left(j\right)}$ in (12).

One can see that compiling with a fixed input state allows more freedom, and hence more solutions, than full unitary matrix compiling. Note that ${C}_{{\mathsf{HST}}}=0$ iff $W={{\rm{e}}}^{{\rm{i}}\phi }{\mathbb{1}}$ where ϕ is a global phase factor. On the other hand, ${C}_{{\mathsf{LET}}}=0$ iff $| \langle {\bf{0}}| W| {\boldsymbol{z}}\rangle | =| \langle {\boldsymbol{z}}| W| {\bf{0}}\rangle | ={\delta }_{{\boldsymbol{z}},{\bf{0}}}$ for all bit strings ${\boldsymbol{z}}$ . Hence, for W that achieve ${C}_{{\mathsf{LET}}}=0$ , the $({2}^{n}-1)\times ({2}^{n}-1)$ unitary principal submatrix of W with matrix elements $\langle {\boldsymbol{z}}| W| {\boldsymbol{z}}^{\prime} \rangle$ (such that ${\boldsymbol{z}},{\boldsymbol{z}}^{\prime} \ne {\bf{0}}$ ) remains completely arbitrary. This degeneracy of optima can simplify the optimization of V, since any of these optima leads to ${C}_{{\mathsf{LET}}}=0$ .
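This degeneracy is easy to exhibit. The sketch below, assuming NumPy, builds a hypothetical two-qubit W that maps $| {\bf{0}}\rangle$ to itself up to a phase and acts as a random unitary on the remaining ${2}^{n}-1$ basis states. Such a W achieves ${C}_{{\mathsf{LET}}}=0$, while ${C}_{{\mathsf{HST}}}=1-| \mathrm{Tr}\,W{| }^{2}/{d}^{2}$ is clearly nonzero.

```python
import numpy as np

def random_unitary(d, rng):
    # Haar-random unitary: QR of a complex Gaussian matrix, with phase fix
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

n = 2
d = 2 ** n
rng = np.random.default_rng(3)

# W = e^{i phi}|0><0| (+) arbitrary unitary on the span of the other 2^n - 1 basis states
W = np.zeros((d, d), dtype=complex)
W[0, 0] = np.exp(0.4j)
W[1:, 1:] = random_unitary(d - 1, rng)

c_let = 1 - abs(W[0, 0]) ** 2                 # (9)-(10): |<0|W|0>|^2 = 1
c_hst = 1 - abs(np.trace(W)) ** 2 / d ** 2    # W is not proportional to the identity
print(c_let, c_hst)                           # ~0 and clearly nonzero
```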

Analogous to the LHST cost for full unitary matrix compiling, one can define a cost function for fixed input state compiling that involves local observables:

$\begin{eqnarray}&&{C}_{{\mathsf{LLET}}}=1-{G}_{{\mathsf{LLET}}}=1-\displaystyle \frac{1}{n}\sum _{j=1}^{n}{G}_{{\mathsf{LLET}}}^{\left(j\right)},\qquad \mathrm{with}\qquad {G}_{{\mathsf{LLET}}}^{\left(j\right)}=\mathrm{Tr}\left[\left({P}_{0}^{{A}_{j}}\otimes {{\mathbb{1}}}^{{\overline{A}}_{j}}\right)W| {\bf{0}}\rangle \langle {\bf{0}}| {W}^{\dagger }\right].\end{eqnarray} \tag{ 11 }$

Here, ${P}_{0}^{{A}_{j}}$ is the projector onto the zero state of qubit A_j, ${{\mathbb{1}}}^{{\overline{A}}_{j}}$ denotes the identity on all qubits except A_j, and n is the number of qubits. We call the circuit used to compute ${C}_{{\mathsf{LLET}}}$ the local Loschmidt echo test (LLET); this circuit is shown in figure 2(b). Note that

$\begin{eqnarray}&&{G}_{{\mathsf{LLET}}}^{\left(j\right)}={\mathrm{Tr}}_{{A}_{j}}\left[{P}_{0}^{{A}_{j}}{\rho }^{\left(j\right)}\right]=\langle 0| {\rho }^{\left(j\right)}| 0\rangle =F(| 0\rangle \langle 0| ,{\rho }^{\left(j\right)}),\end{eqnarray} \tag{ 12 }$

where ${\rho }^{\left(j\right)}={\mathrm{Tr}}_{{\overline{A}}_{j}}\left[W| {\bf{0}}\rangle \langle {\bf{0}}| {W}^{\dagger }\right]$ . Hence ${G}_{{\mathsf{LLET}}}^{\left(j\right)}$ corresponds to the probability of the zero outcome for the circuit in figure 2(b). With a proof similar to that of (6) one can show that

$\begin{eqnarray}&&{C}_{{\mathsf{LLET}}}\leqslant {C}_{{\mathsf{LET}}}\leqslant {{nC}}_{{\mathsf{LLET}}},\end{eqnarray} \tag{ 13 }$

and hence ${C}_{{\mathsf{LLET}}}=0$ iff ${C}_{{\mathsf{LET}}}=0$ . Furthermore, one can define an overall cost function analogous to C(q) in (7)

$\begin{eqnarray}&&C^{\prime} (q)={{qC}}_{{\mathsf{LET}}}+(1-q){C}_{{\mathsf{LLET}}},\end{eqnarray} \tag{ 14 }$

which again is motivated by the fact that ${C}_{{\mathsf{LET}}}$ has a direct operational meaning but is difficult to train for large n, whereas the opposite is true for ${C}_{{\mathsf{LLET}}}$ . Hence one can take $q\approx 1$ for small n and q ≈ 0 for large n.
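Both LET-type costs, and the bounds in (13), can be evaluated directly from the state $W| {\bf{0}}\rangle$. The sketch below is a minimal statevector illustration, assuming NumPy and a Haar-random three-qubit W as a hypothetical test case. It computes ${G}_{{\mathsf{LET}}}$ from the all-zero amplitude and each ${G}_{{\mathsf{LLET}}}^{\left(j\right)}$ as the marginal probability that qubit A_j reads 0.

```python
import numpy as np

def random_unitary(d, rng):
    # Haar-random unitary: QR of a complex Gaussian matrix, with phase fix
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def let_and_llet_costs(W, n):
    out = W[:, 0]                                   # W|0...0>
    g_let = abs(out[0]) ** 2                        # all-zero probability, (10)
    psi = out.reshape([2] * n)                      # axis j <-> qubit A_j
    # G_LLET^(j): marginal probability that A_j reads 0, (11)-(12)
    g_local = [np.sum(np.abs(np.take(psi, 0, axis=j)) ** 2) for j in range(n)]
    return 1 - g_let, 1 - np.mean(g_local)

n = 3
rng = np.random.default_rng(11)
W_rand = random_unitary(2 ** n, rng)
c_let, c_llet = let_and_llet_costs(W_rand, n)
print(c_llet, c_let, n * c_llet)   # ordered as in (13)
```

The union-bound structure of (13) is visible here: ${C}_{{\mathsf{LET}}}$ is the probability that at least one qubit reads 1, whereas ${C}_{{\mathsf{LLET}}}$ averages the single-qubit probabilities.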

4. Noise processes

In this work, we consider three different types of noise [33, 34]: (1) decoherence noise, (2) gate noise, and (3) measurement noise. We now discuss how we mathematically model these three types of noise.

Let us start with decoherence. Physical models of decoherence often refer to T₁ and T₂ processes, which respectively pertain to thermal relaxation (energy dissipation) and dephasing (loss of phase coherence). These processes are typically modeled as local quantum channels acting independently on individual qubits. However, mathematically it is easier to deal with classes of quantum channels that act globally on sets of qubits (which can contain the independent local channels as a special case). In what follows, we define three types of global quantum channels: depolarizing noise, Pauli noise, and non-unital Pauli noise. It is worth noting that Pauli noise includes T₂ processes as a special case (i.e. the dephasing channel is a Pauli channel), and non-unital Pauli noise includes T₁ processes as a special case (i.e. the amplitude damping channel is a non-unital Pauli channel). Consider the following precise definitions.

Definition 1. We define depolarizing noise (DN) as a completely positive trace-preserving (CPTP) map that maps an $n$ -qubit state ρ to the state $p\rho +(1-p){\mathbb{1}}/({2}^{n})$ .

Definition 2. We define Pauli Noise (PN) as a CPTP map ${ \mathcal P }$ whose superoperator is diagonal in the Pauli basis. In other words, its action on a Pauli operator ${X}^{{\boldsymbol{l}}}{Z}^{{\boldsymbol{k}}}:={X}^{{l}_{1}}{Z}^{{k}_{1}}\ \otimes ...\otimes \ {X}^{{l}_{n}}{Z}^{{k}_{n}}$ is given by ${ \mathcal P }({X}^{{\boldsymbol{l}}}{Z}^{{\boldsymbol{k}}})={c}_{{\boldsymbol{lk}}}{X}^{{\boldsymbol{l}}}{Z}^{{\boldsymbol{k}}}$ , where ${c}_{{\bf{00}}}=1$ . Furthermore, we assume that ${c}_{{\boldsymbol{lk}}}\geqslant 0$ for all ${\boldsymbol{l}}$ and ${\boldsymbol{k}}$ , where ${l}_{1}$ , ..., ${l}_{n}$ , ${k}_{1}$ , ..., ${k}_{n}\in \{0,1\}$ .

Definition 3. We define non-unital Pauli noise (NUPN) as a CPTP map ${{ \mathcal P }}_{{\rm{NU}}}$ whose action on the identity is ${{ \mathcal P }}_{{\rm{NU}}}({\mathbb{1}})={\mathbb{1}}+{\sum }_{({\boldsymbol{l}},{\boldsymbol{k}})\ne ({\bf{0}},{\bf{0}})}{d}_{{\boldsymbol{lk}}}{X}^{{\boldsymbol{l}}}{Z}^{{\boldsymbol{k}}}$ , and whose action on all other Pauli operators ${X}^{{\boldsymbol{l}}}{Z}^{{\boldsymbol{k}}}$ with $({\boldsymbol{l}},{\boldsymbol{k}})\ne ({\bf{0}},{\bf{0}})$ is given by ${{ \mathcal P }}_{{\rm{NU}}}({X}^{{\boldsymbol{l}}}{Z}^{{\boldsymbol{k}}})={c}_{{\boldsymbol{lk}}}{X}^{{\boldsymbol{l}}}{Z}^{{\boldsymbol{k}}}$ . Furthermore, we assume that ${c}_{{\boldsymbol{lk}}}\geqslant 0$ for all ${\boldsymbol{l}}$ and ${\boldsymbol{k}}$ .
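The earlier remarks that dephasing is a Pauli channel and amplitude damping is a non-unital Pauli channel can be checked with single-qubit Pauli transfer matrices. The sketch below assumes NumPy, the standard Kraus operators for these two channels, and hypothetical strengths p = 0.1 and γ = 0.2. For dephasing, the requirement ${c}_{{\boldsymbol{lk}}}\geqslant 0$ of Definition 2 holds when p ≤ 1/2. The basis {1, X, Y, Z} is used since XZ = −iY, so it matches ${X}^{{\boldsymbol{l}}}{Z}^{{\boldsymbol{k}}}$ up to phases.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def apply_kraus(kraus, rho):
    return sum(K @ rho @ K.conj().T for K in kraus)

def pauli_transfer_matrix(kraus):
    # R[a, b] = Tr[P_a E(P_b)] / 2: column b holds the Pauli components of E(P_b)
    R = np.zeros((4, 4))
    for a, Pa in enumerate(PAULIS):
        for b, Pb in enumerate(PAULIS):
            R[a, b] = np.real(np.trace(Pa @ apply_kraus(kraus, Pb))) / 2
    return R

def dephasing(p):
    return [np.sqrt(1 - p) * I2, np.sqrt(p) * Z]

def amplitude_damping(gamma):
    K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return [K0, K1]

print(np.round(pauli_transfer_matrix(dephasing(0.1)), 6))
# diagonal (1, 0.8, 0.8, 1): a Pauli channel in the sense of Definition 2
print(np.round(pauli_transfer_matrix(amplitude_damping(0.2)), 6))
# column 0 is (1, 0, 0, 0.2), i.e. E(1) = 1 + 0.2 Z; the other columns are diagonal
# with entries sqrt(0.8), sqrt(0.8), 0.8: non-unital Pauli noise as in Definition 3
```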

Next, we consider gate noise. While gate noise can involve coherent errors such as systematic gate bias, such errors are hardware-specific, and hence we focus on incoherent gate noise. We consider a simple model for gate noise in which every time a gate is implemented, a Pauli channel acts both before and after this gate. Furthermore, for generality, we allow these Pauli channels to act globally on all qubits, which serves as a model for cross-talk (where gates affect qubits on which they are intended to act trivially).

Definition 4. We define Pauli gate noise (PGN) as a simple noise model in which all gates are preceded and followed by global Pauli channels. In other words, for a gate $G$ , instead of its action on a state ρ being $G\rho {G}^{\dagger }$ , we model its action as ${ \mathcal P }^{\prime} (G{ \mathcal P }(\rho ){G}^{\dagger })$ where ${ \mathcal P }$ and ${ \mathcal P }^{\prime}$ are Pauli channels. Note that these Pauli channels act on all qubits, including qubits on which $G$ acts trivially.

Finally, we consider measurement noise, also known as readout error. For a single qubit, we model measurement noise as a classical bit-flip channel, where feeding in the standard basis state $| l\rangle$ leads to the k outcome with probability p_kl. We allow for asymmetry in that one can have ${p}_{01}\ne {p}_{10}$ , which is an important generality, e.g. when T₁ noise occurs during the measurement process. For multiple qubits, our measurement noise model is a tensor product of the aforementioned bit-flip channels, corresponding to uncorrelated measurement noise.

Definition 5. We define measurement noise (MN) as a modification of the standard-basis POVM elements, which are $\{{P}_{0}=| 0\rangle \langle 0| ,{P}_{1}=| 1\rangle \langle 1| \}$ for a noiseless single qubit. With measurement noise, this POVM gets replaced by $\{{\widetilde{P}}_{0},{\widetilde{P}}_{1}\}$ , with ${\widetilde{P}}_{0}={p}_{00}| 0\rangle \langle 0| +{p}_{01}| 1\rangle \langle 1|$ and ${\widetilde{P}}_{1}={p}_{10}| 0\rangle \langle 0| +{p}_{11}| 1\rangle \langle 1|$ , where ${p}_{00}+{p}_{10}=1$ , ${p}_{01}+{p}_{11}=1$ , and ${p}_{{kl}}$ is the probability of getting the $k$ outcome given the $l$ input. Furthermore we assume that ${p}_{{kk}}\gt {p}_{{kl}}$ for $l\ne k$ . Hence, for an $n$ -qubit standard-basis measurement with measurement noise, we write the POVM element associated with the bit string ${\boldsymbol{z}}=({z}_{1},\,\ldots ,\,{z}_{n})$ as

$\begin{eqnarray}&&{\widetilde{P}}_{{\boldsymbol{z}}}=\underset{j=1}{\overset{n}{\displaystyle \bigotimes }}\left({p}_{{z}_{j}0}^{\left(j\right)}| 0\rangle \langle 0| +{p}_{{z}_{j}1}^{\left(j\right)}| 1\rangle \langle 1| \right),\end{eqnarray} \tag{ 15 }$

with ${\sum }_{{z}_{j}}{p}_{{z}_{j}0}^{\left(j\right)}=1$ and ${\sum }_{{z}_{j}}{p}_{{z}_{j}1}^{\left(j\right)}=1$ , and we assume that ${p}_{{z}_{j}{z}_{j}}^{\left(j\right)}\gt {p}_{{z}_{j}l}^{\left(j\right)}$ for $l\ne {z}_{j}$ .
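Equation (15) is equivalent to a classical confusion matrix acting on the ideal outcome distribution. The sketch below, assuming NumPy and hypothetical two-qubit readout probabilities, builds the noisy POVM elements of (15) and checks that they sum to the identity. It then compares the resulting outcome probabilities with the confusion-matrix picture.

```python
import itertools
from functools import reduce
import numpy as np

def noisy_povm_element(readout, z):
    # (15): tensor product over qubits of p_{z_j 0}|0><0| + p_{z_j 1}|1><1|
    op = np.array([[1.0]])
    for M, zj in zip(readout, z):
        op = np.kron(op, np.diag([M[zj, 0], M[zj, 1]]))
    return op

def random_density_matrix(d, rng):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho)

# hypothetical readout matrices M[k, l] = p_{kl}^{(j)}; columns sum to 1, p_kk > p_kl
readout = [np.array([[0.95, 0.10], [0.05, 0.90]]),
           np.array([[0.90, 0.07], [0.10, 0.93]])]
n = len(readout)
rng = np.random.default_rng(1)
rho = random_density_matrix(2 ** n, rng)
bitstrings = list(itertools.product([0, 1], repeat=n))  # same ordering as np.kron

povm = [noisy_povm_element(readout, z) for z in bitstrings]
p_povm = np.array([np.real(np.trace(P @ rho)) for P in povm])
# equivalent classical picture: confusion matrix acting on the ideal distribution
p_ideal = np.real(np.diag(rho))
p_classical = reduce(np.kron, readout) @ p_ideal
print(p_povm, p_classical)   # the two vectors agree up to rounding
```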

5. Main results

Before proceeding to the main results we first define two versions of optimal parameter resilience (OPR), i.e. of learning the correct gate sequence V despite various sources of noise, which we refer to as strong-OPR and weak-OPR.

Definition 6. Let ${{\mathbb{V}}}_{d}$ be the set of $d\times d$ unitary matrices. Let ${C}_{{\mathsf{QC}}}(V)$ be a cost function of $V$ with $V\in {{\mathbb{V}}}_{d}$ , and suppose that ${C}_{{\mathsf{QC}}}(V)$ can be evaluated using a quantum circuit denoted ${\mathsf{QC}}$ . Let ${\widetilde{C}}_{{\mathsf{QC}}}(V)$ denote the noisy version of ${C}_{{\mathsf{QC}}}(V)$ , i.e. the corresponding function whenever the circuit ${\mathsf{QC}}$ is run in the presence of some noise process ${ \mathcal N }$ . Let ${{\mathbb{V}}}_{d}^{{\rm{opt}}}$ and ${\widetilde{{\mathbb{V}}}}_{d}^{{\rm{opt}}}$ respectively denote the sets of unitaries that optimize ${C}_{{\mathsf{QC}}}(V)$ and ${\widetilde{C}}_{{\mathsf{QC}}}(V)$ , i.e.

$\begin{eqnarray}&&{{\mathbb{V}}}_{d}^{{\rm{opt}}}=\{V^{\prime} \in {{\mathbb{V}}}_{d}:{C}_{{\mathsf{QC}}}(V^{\prime} )=\mathop{\min }\limits_{V\in {{\mathbb{V}}}_{d}}{C}_{{\mathsf{QC}}}(V)\},\end{eqnarray} \tag{ 16 }$

$\begin{eqnarray}&&{\widetilde{{\mathbb{V}}}}_{d}^{{\rm{opt}}}=\{V^{\prime} \in {{\mathbb{V}}}_{d}:{\widetilde{C}}_{{\mathsf{QC}}}(V^{\prime} )=\mathop{\min }\limits_{V\in {{\mathbb{V}}}_{d}}{\widetilde{C}}_{{\mathsf{QC}}}(V)\}.\end{eqnarray} \tag{ 17 }$

We say that ${C}_{{\mathsf{QC}}}(V)$ exhibits strong-OPR to ${ \mathcal N }$ if ${\widetilde{{\mathbb{V}}}}_{d}^{{\rm{opt}}}={{\mathbb{V}}}_{d}^{{\rm{opt}}}$ . We say that ${C}_{{\mathsf{QC}}}(V)$ exhibits weak-OPR to ${ \mathcal N }$ if ${\widetilde{{\mathbb{V}}}}_{d}^{{\rm{opt}}}\subseteq {{\mathbb{V}}}_{d}^{{\rm{opt}}}$ .
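To make the two notions concrete, the sketch below compares optimal sets over a small finite list of candidate unitaries rather than all of ${{\mathbb{V}}}_{d}$, which is a simplification. It assumes NumPy, and the cost values are hypothetical toy numbers, not results. An increasing affine distortion of the cost gives strong-OPR. Noise that lifts a degeneracy gives only weak-OPR. Noise that moves the optimum gives neither.

```python
import numpy as np

def optimal_set(costs, tol=1e-12):
    # indices of candidates attaining the minimum (to within a small tolerance)
    costs = np.asarray(costs, dtype=float)
    return set(np.flatnonzero(costs <= costs.min() + tol).tolist())

def classify_opr(ideal_costs, noisy_costs):
    ideal, noisy = optimal_set(ideal_costs), optimal_set(noisy_costs)
    if noisy == ideal:
        return "strong"      # noisy optima coincide with the noiseless optima
    if noisy <= ideal:
        return "weak"        # every noisy optimum is a noiseless optimum
    return "none"

# hypothetical toy costs for four candidate unitaries V_0, ..., V_3
ideal = [0.0, 0.0, 0.4, 0.9]                    # two degenerate optima: V_0 and V_1
affine = [0.1 + 0.8 * c for c in ideal]         # increasing affine distortion
tie_broken = [0.05, 0.20, 0.45, 0.90]           # noise lifts the degeneracy
shifted = [0.30, 0.30, 0.10, 0.90]              # noise moves the optimum

for name, noisy in [("affine", affine), ("tie_broken", tie_broken), ("shifted", shifted)]:
    print(name, classify_opr(ideal, noisy))     # strong, weak, none
```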

5.1. Noise resilience of full unitary matrix compiling

Let us begin with full unitary matrix compiling (FUMC). Figure 3 shows the two noise models that we will consider for FUMC. As shown in this figure, τ₁ and τ₂ are respectively defined as the times just before and just after the application of ${V}^{\dagger }U$ . We note that the noise models considered in figure 3 capture fairly well the physical noise that is present in, e.g. superconducting-qubit quantum computers, with the exception that only depolarizing noise is allowed during the action of ${V}^{\dagger }U$ . We make this simplification for ease of analysis, although our numerics in section 6 relax this assumption.

**Figure 3.** Schematic diagram of: (a) Noise Model 1 of definition 7, and (b) Noise Model 2 of definition 8. The following acronyms are employed: depolarizing noise (DN), Pauli gate noise (PGN), Pauli noise (PN), non-unital Pauli noise (NUPN), and measurement noise (MN). Red dashed boxes indicate the time period and the qubits on which the noise process acts. Time τ₁ (τ₂) corresponds to the time immediately before (after) the action of the unitary ${V}^{\dagger }U$ . While both panels show the HST, these noise models are also applicable to the LHST, provided one replaces ${E}^{\dagger }$ with ${({E}^{\left(j\right)})}^{\dagger }$ .

Consider the following definition for the noise model depicted in figure 3(a).

Definition 7. We define Noise Model 1 to be the following noise process during the HST circuit: (1) global depolarizing noise acting continuously throughout the circuit, (2) global Pauli noise at times ${\tau }_{1}$ and ${\tau }_{2}$ , (3) global depolarizing noise on system A acting continuously in between ${\tau }_{1}$ and ${\tau }_{2}$ , (4) global non-unital Pauli noise on system $B$ acting continuously in between ${\tau }_{1}$ and ${\tau }_{2}$ , (5) Pauli gate noise during $E$ and ${E}^{\dagger }$ , and (6) measurement noise. We also use the term Noise Model 1 when the same noise model acts during the LHST circuit, provided one replaces ${E}^{\dagger }$ with ${({E}^{\left(j\right)})}^{\dagger }$ .

We now state our first main result. The proof of this result is given in appendix D, with some useful preliminaries and lemmas given in appendices A–C.

Theorem 1. The cost functions ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ exhibit strong-OPR to Noise Model 1 in definition 7.

Note that this theorem also implies that $C(q)={{qC}}_{{\mathsf{HST}}}+(1-q){C}_{{\mathsf{LHST}}}$ exhibits strong-OPR to Noise Model 1, for all values of q. This is because the set ${{\mathbb{V}}}_{d}^{{\rm{opt}}}={\widetilde{{\mathbb{V}}}}_{d}^{{\rm{opt}}}$ defined in (16) and (17) is the same for ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ , so both terms of C(q) are minimized simultaneously on this common set, with or without noise. Hence this same set is optimal for C(q).

Consider the implications of theorem 1. First, this theorem implies that FUMC is resilient to the measurement noise model in definition 5. Second, FUMC is completely resilient to Pauli gate noise during the entangling and disentangling gates, E and ${E}^{\dagger }$ . Note that this Pauli gate noise is global and hence accounts for cross talk. Third, FUMC is resilient to global depolarizing noise acting continuously throughout the circuit, as well as global Pauli noise acting at the specific times τ₁ and τ₂. Fourth, FUMC is resilient to depolarizing noise acting on system A and non-unital Pauli noise acting on system B, provided that each of these processes acts (possibly continuously) during the time interval between τ₁ and τ₂. We emphasize that Pauli noise includes dephasing channels (T₂ noise) as a special case, while non-unital Pauli noise includes the amplitude-damping channel ( ${T}_{1}$ noise) as a special case. Importantly, theorem 1 states that FUMC is resilient to the general case where all of these noise processes occur together.
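The depolarizing part of theorem 1 can be seen in a one-qubit toy model. The sketch below computes the HST overlap for $(U\otimes {V}^{* })|{\Phi }^{+}\rangle$ with a density matrix. As a simplifying assumption, it replaces the continuously acting global depolarizing noise by a single global depolarizing channel of strength `p`, applied just before the Bell-basis measurement. It then scans a hypothetical family $V(\theta )={R}_{y}(\theta )$. For this noise, the noisy cost equals $(1-p){C}_{{\mathsf{HST}}}+p(1-1/{d}^{2})$, so it has the same minimizer as the noiseless cost.

```python
import numpy as np


def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def hst_costs(U, V, p):
    """Return (noiseless, noisy) HST costs for d x d unitaries U and V.

    Noise model (assumption): one global depolarizing channel of strength p
    on AB, applied just before the Bell-basis measurement.
    """
    d = U.shape[0]
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)   # |Phi+> = sum_i |ii> / sqrt(d)
    psi = np.kron(U, V.conj()) @ phi              # U on A, V* on B
    rho = np.outer(psi, psi.conj())
    rho_noisy = (1 - p) * rho + p * np.eye(d * d) / d**2
    overlap = lambda r: np.real(phi.conj() @ r @ phi)   # probability of the Phi+ outcome
    return 1 - overlap(rho), 1 - overlap(rho_noisy)


U = ry(0.7)                                       # illustrative target unitary
thetas = 0.7 + np.linspace(-np.pi, np.pi, 361)    # candidates V(theta) = Ry(theta)
p = 0.2
costs = np.array([hst_costs(U, ry(t), p) for t in thetas])
clean, noisy = costs[:, 0], costs[:, 1]

print(np.argmin(clean), np.argmin(noisy))                     # 180 180
print(np.allclose(noisy, (1 - p) * clean + p * (1 - 1 / 4)))  # True
```

This covers only the global depolarizing component. The Pauli, non-unital, gate and measurement noise in Noise Model 1 require the argument of appendix D.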

We now state our second main result (proven in appendix E), which deals with the noise model in figure 3(b).

Definition 8. We define Noise Model 2 to be the following noise process during the HST circuit: (1) global depolarizing noise acting continuously throughout the circuit, (2) global Pauli noise at times ${\tau }_{1}$ and ${\tau }_{2}$ , (3) global non-unital Pauli noise on system $A$ at time ${\tau }_{1}$ , (4) global depolarizing noise on system $A$ acting continuously in between ${\tau }_{1}$ and ${\tau }_{2}$ , (5) global Pauli noise on system $B$ acting continuously in between ${\tau }_{1}$ and ${\tau }_{2}$ , (6) Pauli gate noise during $E$ and ${E}^{\dagger }$ , and (7) measurement noise. We also use the term Noise Model 2 when the same noise model acts during the LHST circuit, provided one replaces ${E}^{\dagger }$ with ${({E}^{\left(j\right)})}^{\dagger }$ .

Theorem 2. The cost functions ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ exhibit strong-OPR to Noise Model 2 in definition 8.

The implications of theorem 2 are similar to those of theorem 1. The main difference is that theorem 2 allows for non-unital Pauli noise on system A at time τ₁, at the expense of only allowing Pauli noise to act continuously on system B between τ₁ and τ₂. The other aspects of the noise models treated by these two theorems are identical.

The above two theorems immediately imply the corollaries below. These corollaries establish resilience to noise models that are different from, and in some cases more general than, the noise models previously considered, at the expense of possibly specializing the form of the unitary $W={V}^{\dagger }U$ . See appendix G for the proofs of all corollaries.

Corollary 1. The cost functions ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ exhibit strong-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 1, as well as (2) a noise process during the implementation of ${ \mathcal W }={{ \mathcal W }}_{k}\ \circ \cdots \circ \ {{ \mathcal W }}_{1}={{ \mathcal V }}^{\dagger }\,\circ \,{ \mathcal U }$ ( ${\rm{i}}.{\rm{e}}.$ in the time interval between ${\tau }_{1}$ and ${\tau }_{2}$ ) in which global Pauli channels $\{{{ \mathcal P }}_{1}^{A}$ , ..., ${{ \mathcal P }}_{k}^{A}\}$ act on system $A$ , such that the overall channel on $A$ is ${{ \mathcal P }}_{k}^{A}\,\circ \,{{ \mathcal W }}_{k}\cdots \circ \ {{ \mathcal P }}_{1}^{A}\,\circ \,{{ \mathcal W }}_{1}$ , provided that the following condition is satisfied:

$\begin{eqnarray}&&({{ \mathcal P }}_{k}^{A}\,\circ \,{{ \mathcal W }}_{k}\cdots \circ \,{{ \mathcal P }}_{1}^{A}\,\circ \,{{ \mathcal W }}_{1})(\cdot )=({{ \mathcal W }}_{k}\,\circ \,{{ \mathcal W }}_{k-1}\cdots \circ \,{{ \mathcal W }}_{1}\,\circ \,{\widehat{{ \mathcal P }}}^{A})(\cdot ).\end{eqnarray} \tag{ 18 }$

Here ${\widehat{{ \mathcal P }}}^{A}$ is also a Pauli channel, and the channels ${ \mathcal U }$ , ${{ \mathcal V }}^{\dagger }$ , and ${ \mathcal W }$ correspond to conjugating the state by the unitaries $U$ , ${V}^{\dagger }$ , and $W$ , respectively.

The condition in (18) implies that the overall channel consisting of global Pauli channels acting on system A during the implementation of ${ \mathcal W }$ is mathematically equivalent (although physically inequivalent) to a Pauli channel acting on A at time τ₁ followed by ${ \mathcal W }$ . Since such a channel is a special case of the global Pauli noise at τ₁ already included in Noise Model 1, corollary 1 follows from theorem 1.

Consider the following implications of corollary 1. Clifford unitaries necessarily satisfy the condition in (18), as shown in appendix A, because conjugation by a Clifford unitary maps Pauli operators to Pauli operators. Therefore, corollary 2 below holds for any Clifford unitary W. Moreover, tensor-product unitaries satisfy this same condition provided that the noise is local depolarizing noise, since a depolarizing channel commutes with any unitary acting on the same subsystem; hence corollary 3 below also follows from corollary 1.
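For a single gate ($k=1$), condition (18) holds with ${\widehat{{ \mathcal P }}}^{A}={{ \mathcal W }}^{\dagger }\circ {{ \mathcal P }}^{A}\circ { \mathcal W }$. The question is therefore whether this conjugated channel is again a Pauli channel. The one-qubit sketch below tests this by checking whether the channel's Pauli transfer matrix is diagonal. The Pauli error probabilities and the choice of gates are illustrative.

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)
PAULIS = [I, X, Y, Z]


def pauli_channel(px, py, pz):
    """Kraus operators of a one-qubit Pauli channel."""
    return [np.sqrt(1 - px - py - pz) * I, np.sqrt(px) * X,
            np.sqrt(py) * Y, np.sqrt(pz) * Z]


def conjugated_channel(W, kraus):
    """rho -> W^dag P(W rho W^dag) W, i.e. hat-P with P o W = W o hat-P."""
    Wd = W.conj().T
    return lambda rho: Wd @ sum(K @ W @ rho @ Wd @ K.conj().T for K in kraus) @ W


def ptm(channel):
    """Pauli transfer matrix R_ij = Tr[P_i channel(P_j)] / 2."""
    return np.array([[np.real(np.trace(Pi @ channel(Pj))) / 2 for Pj in PAULIS]
                     for Pi in PAULIS])


def is_pauli_channel(channel):
    R = ptm(channel)
    return np.allclose(R, np.diag(np.diag(R)))


H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j])
T = np.diag([1, np.exp(1j * np.pi / 4)])
noise = pauli_channel(0.10, 0.05, 0.02)

print(is_pauli_channel(conjugated_channel(H, noise)))  # True  (Clifford)
print(is_pauli_channel(conjugated_channel(S, noise)))  # True  (Clifford)
print(is_pauli_channel(conjugated_channel(T, noise)))  # False (non-Clifford)
```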

Corollary 2. Let the $W={V}^{\dagger }U$ gate sequence have the form $W={W}_{2}^{A}{W}_{1}^{A}$ with ${W}_{1}^{A}$ composed only of Clifford gates. Then the cost functions ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ exhibit strong-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 1, as well as (2) a noise process during the implementation of ${{ \mathcal W }}_{1}^{A}={{ \mathcal W }}_{1,k}\,\circ \,\cdots \circ \ {{ \mathcal W }}_{\mathrm{1,1}}$ , in which global Pauli channels $\{{{ \mathcal P }}_{1}^{A}$ , ..., ${{ \mathcal P }}_{k}^{A}\}$ act on system $A$ , such that the overall channel on $A$ is ${{ \mathcal P }}_{k}^{A}\,\circ \,{{ \mathcal W }}_{1,k}\cdots \circ \ {{ \mathcal P }}_{1}^{A}\ \circ \ {{ \mathcal W }}_{\mathrm{1,1}}$ .

Corollary 3. Let the $W={V}^{\dagger }U$ gate sequence have the form $W={W}_{2}^{A}{W}_{1}^{A}$ with ${W}_{1}^{A}={W}_{1}^{A^{\prime} }\otimes {W}_{1}^{A^{\prime\prime} }$ being a tensor product, ${\rm{i}}.{\rm{e}}.$ $W$ is a tensor product up to a particular time. Then the cost functions ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ exhibit strong-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 1, as well as (2) a noise process during the implementations of ${{ \mathcal W }}_{1}^{A^{\prime} }={{ \mathcal W }}_{1,k}^{A^{\prime} }\ \circ \cdots \circ \ {{ \mathcal W }}_{1,1}^{A^{\prime} }$ and ${{ \mathcal W }}_{1}^{A^{\prime\prime} }={{ \mathcal W }}_{1,l}^{A^{\prime\prime} }\ \circ \cdots \circ \ {{ \mathcal W }}_{1,1}^{A^{\prime\prime} }$ in which local depolarizing channels $\{{{ \mathcal D }}_{1,1}^{A^{\prime} }$ , ..., ${{ \mathcal D }}_{1,k}^{A^{\prime} }\}$ and $\{{{ \mathcal D }}_{1,1}^{A^{\prime\prime} }$ , ..., ${{ \mathcal D }}_{1,l}^{A^{\prime\prime} }\}$ act on subsystems $A^{\prime}$ and $A^{\prime\prime}$ , respectively, such that the overall channel on $A=A^{\prime} A^{\prime\prime}$ is $({{ \mathcal D }}_{1,k}^{A^{\prime} }\,\circ \,{{ \mathcal W }}_{1,k}^{A^{\prime} }...{{ \mathcal D }}_{1,1}^{A^{\prime} }\,\circ \,{{ \mathcal W }}_{1,1}^{A^{\prime} })\otimes ({{ \mathcal D }}_{1,l}^{A^{\prime\prime} }\,\circ \,{{ \mathcal W }}_{1,l}^{A^{\prime\prime} }...{{ \mathcal D }}_{1,1}^{A^{\prime\prime} }\,\circ \,{{ \mathcal W }}_{1,1}^{A^{\prime\prime} })$ .
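Local depolarizing noise is harmless here because a depolarizing channel commutes with every unitary on the same subsystem. A sequence of gates interleaved with depolarizing channels on $A^{\prime}$ (or $A^{\prime\prime}$) can therefore be rewritten as a single depolarizing channel followed by the ideal gates. The one-qubit sketch below checks this numerically. The random gates, the seed and the strengths in `ps` are arbitrary choices. Applying the same argument factor by factor covers the tensor product on $A=A^{\prime} A^{\prime\prime}$.

```python
import numpy as np


def depolarize(rho, p):
    """Depolarizing channel: rho -> (1 - p) rho + p Tr(rho) I / d."""
    d = rho.shape[0]
    return (1 - p) * rho + p * np.trace(rho) * np.eye(d) / d


def random_unitary(d, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(z)
    return q


rng = np.random.default_rng(7)
gates = [random_unitary(2, rng) for _ in range(3)]   # plays the role of W_{1,1}, W_{1,2}, W_{1,3}
ps = [0.01, 0.02, 0.03]                              # depolarizing strengths (illustrative)

a = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = a @ a.conj().T
rho /= np.trace(rho)                                 # random one-qubit density matrix

# Physical order: each gate is followed by its own depolarizing channel.
noisy = rho
for W, p in zip(gates, ps):
    noisy = depolarize(W @ noisy @ W.conj().T, p)

# Equivalent order: one depolarizing channel first, then the ideal gates.
W_total = gates[2] @ gates[1] @ gates[0]
p_eff = 1 - np.prod([1 - p for p in ps])
reordered = W_total @ depolarize(rho, p_eff) @ W_total.conj().T

print(np.allclose(noisy, reordered))  # True
```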

The following corollary follows from theorem 2 and is analogous to corollary 1.

Corollary 4. The cost functions ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ exhibit strong-OPR to the following noise model: (1) all noise processes in Noise Model 2, as well as (2) a noise process during the implementation of ${ \mathcal W }={{ \mathcal W }}_{k}\ \circ \cdots \circ \ {{ \mathcal W }}_{1}={{ \mathcal V }}^{\dagger }\,\circ \,{ \mathcal U }$ (i.e. in the time interval between ${\tau }_{1}$ and ${\tau }_{2}$ ) in which global non-unital Pauli channels $\{{{ \mathcal P }}_{{\rm{NU}},1}^{A}$ , ..., ${{ \mathcal P }}_{{\rm{NU}},k}^{A}\}$ act on system $A$ such that the overall channel on $A$ is ${{ \mathcal P }}_{{\rm{NU}},k}^{A}\,\circ \,{{ \mathcal W }}_{k}\cdots \circ \,{{ \mathcal P }}_{{\rm{NU}},1}^{A}\,\circ \,{{ \mathcal W }}_{1}$ , provided that the following condition is satisfied:

$\begin{eqnarray}&&({{ \mathcal P }}_{{\rm{NU}},k}^{A}\,\circ \,{{ \mathcal W }}_{k}\cdots \circ \,{{ \mathcal P }}_{{\rm{NU}},1}^{A}\circ {{ \mathcal W }}_{1})(\cdot )=({{ \mathcal W }}_{k}\circ {{ \mathcal W }}_{k-1}\cdots {{ \mathcal W }}_{1}\circ {\widehat{{ \mathcal P }}}_{{\rm{NU}}}^{A})(\cdot ),\end{eqnarray} \tag{ 19 }$

where ${\widehat{{ \mathcal P }}}_{{\rm{NU}}}^{A}$ is also a non-unital Pauli channel.

Finally, we present a simple corollary of theorem 1 based on the ricochet property of the standard Bell state. Note that the noise model in the following corollary is fairly simple but nonetheless physically distinct from those considered in figure 3, since it allows for global non-unital Pauli noise to occur during the implementation of W.

Corollary 5. The cost function ${C}_{{\mathsf{HST}}}$ exhibits strong-OPR to the following noise model: (1) global depolarizing noise acting continuously throughout the circuit, and (2) global non-unital Pauli noise on system $A$ at a fixed time in between ${\tau }_{1}$ and ${\tau }_{2}$ .
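The ricochet property used above states that, for the standard Bell state $|{\Phi }^{+}\rangle ={\sum }_{i}|ii\rangle /\sqrt{d}$, any operator $M$ applied to one half can be moved to the other half as its transpose: $(M\otimes I)|{\Phi }^{+}\rangle =(I\otimes {M}^{T})|{\Phi }^{+}\rangle$. A quick numerical check is given below. The matrix is arbitrary and need not be unitary, and the dimension and seed are illustrative.

```python
import numpy as np

d = 4                                              # dimension of each half (arbitrary)
rng = np.random.default_rng(1)
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
phi = np.eye(d).reshape(d * d) / np.sqrt(d)        # |Phi+> = sum_i |ii> / sqrt(d)

lhs = np.kron(M, np.eye(d)) @ phi                  # M acting on the first half
rhs = np.kron(np.eye(d), M.T) @ phi                # M^T acting on the second half
print(np.allclose(lhs, rhs))  # True
```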

5.2. Noise resilience of fixed input state compiling

Let us now consider fixed input state compiling (FISC). Recall that the cost-evaluation circuits, shown in figure 2, have less structure than the circuits in figure 1. As a result, the noise model that we consider in the FISC case is simpler than the previously considered noise models. In particular, we define the following noise model, which is depicted in figure 4. Note that, in this context, τ₁ is defined as the time just before the application of ${V}^{\dagger }U$ , and there is no need to consider a noisy quantum channel occurring after ${V}^{\dagger }U$ since the measurement occurs immediately after ${V}^{\dagger }U$ .

**Figure 4.** Schematic diagram of Noise Model 3 of definition 9 for: (a) the LET circuit, and (b) the LLET circuit. Global depolarizing noise (DN) acts continuously throughout the circuit, global Pauli noise (PN) acts at time τ₁, and measurement noise (MN) occurs during readout.

Definition 9. We define Noise Model 3 to be the following noise process during the LET or the LLET: (1) global depolarizing noise acting continuously throughout the circuit, (2) global Pauli noise acting at time ${\tau }_{1}$ , and (3) measurement noise.

We now state our main result for FISC, which is proven in appendix F.

Theorem 3. The cost functions ${C}_{{\mathsf{LET}}}$ and ${C}_{{\mathsf{LLET}}}$ exhibit weak-OPR, as defined in definition 6, to Noise Model 3 in definition 9.

This theorem implies that FISC is resilient to the measurement noise model in definition 5. Furthermore, it is resilient to Pauli noise acting at τ₁ and global depolarizing noise acting continuously throughout the circuit.

We remark that while FUMC exhibits strong-OPR for the noise models considered (see the previous section), FISC exhibits weak-OPR instead. This difference arises because the optimal set of unitaries ${{\mathbb{V}}}_{d}^{{\rm{opt}}}$ for FISC can be highly degenerate (i.e. it can contain many unitaries), and the presence of noise could in general break this degeneracy. The word 'weak' in weak-OPR refers only to the fact that noise may reduce the number of global optima, not to any weakness of the noise resilience itself. Hence, weak-OPR should still be viewed as noise resilience, since the global optima in the presence of noise are also global optima in the noiseless case. This implies that if training in the presence of noise reaches a global optimum of the noisy cost, the resulting parameters for $V({\boldsymbol{\alpha }})$ are also optimal for the noiseless cost.
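A one-qubit toy case shows where the FISC degeneracy comes from. For this illustration, the fixed-input cost is taken to be $1-|\langle 0|{V}^{\dagger }U|0\rangle {|}^{2}$ with input state $|0\rangle$; this single-qubit form is an assumption made for the sketch. Every candidate $V=U\,{\rm{diag}}(1,{{\rm{e}}}^{{\rm{i}}\theta })$ prepares the same state $V|0\rangle =U|0\rangle$, so all of them reach zero fixed-input cost. The full-unitary HST cost, ${\sin }^{2}(\theta /2)$, is zero only for $\theta =0$.

```python
import numpy as np


def fixed_input_cost(U, V):
    """1 - |<0|V^dag U|0>|^2 for the one-qubit input state |0>."""
    return 1 - abs((V.conj().T @ U)[0, 0]) ** 2


def hst_cost(U, V):
    """1 - |Tr(V^dag U)|^2 / d^2."""
    d = U.shape[0]
    return 1 - abs(np.trace(V.conj().T @ U)) ** 2 / d**2


U = np.array([[np.cos(0.9), -np.sin(0.9)],
              [np.sin(0.9), np.cos(0.9)]], dtype=complex)   # illustrative target


def candidate(theta):
    """V = U diag(1, e^{i theta}): acts like U on |0>, differs on |1>."""
    return U @ np.diag([1, np.exp(1j * theta)])


for theta in [0.0, np.pi / 3, np.pi / 2, np.pi]:
    V = candidate(theta)
    print(f"theta={theta:.3f}  fixed-input cost={fixed_input_cost(U, V):.1e}  "
          f"HST cost={hst_cost(U, V):.3f}")
```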

Under certain conditions, theorem 3 implies that $C^{\prime} (q)$ defined in (14) will also exhibit weak-OPR to Noise Model 3. Let ${{\mathbb{V}}}_{d,{\mathsf{LET}}}^{{\rm{opt}}}$ and ${{\mathbb{V}}}_{d,{\mathsf{LLET}}}^{{\rm{opt}}}$ denote the sets of unitaries that optimize ${C}_{{\mathsf{LET}}}$ and ${C}_{{\mathsf{LLET}}}$ , respectively. In the absence of noise we have ${{\mathbb{V}}}_{d,{\mathsf{LET}}}^{{\rm{opt}}}={{\mathbb{V}}}_{d,{\mathsf{LLET}}}^{{\rm{opt}}}$ , while in the presence of noise, theorem 3 implies ${\widetilde{{\mathbb{V}}}}_{d,{\mathsf{LET}}}^{{\rm{opt}}}\subseteq {{\mathbb{V}}}_{d,{\mathsf{LET}}}^{{\rm{opt}}}$ and ${\widetilde{{\mathbb{V}}}}_{d,{\mathsf{LLET}}}^{{\rm{opt}}}\subseteq {{\mathbb{V}}}_{d,{\mathsf{LLET}}}^{{\rm{opt}}}$ . Hence, if ${\widetilde{{\mathbb{V}}}}_{d,{\mathsf{LET}}}^{{\rm{opt}}}\cap {\widetilde{{\mathbb{V}}}}_{d,{\mathsf{LLET}}}^{{\rm{opt}}}\ne \varnothing$ , then for any value of q, $C^{\prime} (q)={{qC}}_{{\mathsf{LET}}}+(1-q){C}_{{\mathsf{LLET}}}$ will also exhibit weak-OPR to Noise Model 3, where the unitaries that optimize $C^{\prime} (q)$ in the noisy case belong to ${\widetilde{{\mathbb{V}}}}_{d,{\mathsf{LET}}}^{{\rm{opt}}}\cap {\widetilde{{\mathbb{V}}}}_{d,{\mathsf{LLET}}}^{{\rm{opt}}}$ .
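On a finite candidate grid, the argument above reduces to a set intersection. When the noisy optimal sets of ${C}_{{\mathsf{LET}}}$ and ${C}_{{\mathsf{LLET}}}$ share an element, the minimizers of $C^{\prime} (q)$ for $0\lt q\lt 1$ are exactly the shared candidates. The cost values in the sketch below are made up for illustration.

```python
import numpy as np


def argmin_set(costs, tol=1e-9):
    """Indices of the candidates whose cost is within `tol` of the minimum."""
    costs = np.asarray(costs, dtype=float)
    return set(np.flatnonzero(costs <= costs.min() + tol).tolist())


def combined_optima(c_let, c_llet, q, tol=1e-9):
    """Optimal candidates of q * C_LET + (1 - q) * C_LLET on a finite grid."""
    combined = q * np.asarray(c_let, dtype=float) + (1 - q) * np.asarray(c_llet, dtype=float)
    return argmin_set(combined, tol)


# Hypothetical noisy cost values on four candidate unitaries.
noisy_let = [0.1, 0.1, 0.3, 0.1]    # optimal set {0, 1, 3}
noisy_llet = [0.2, 0.4, 0.2, 0.2]   # optimal set {0, 2, 3}

common = argmin_set(noisy_let) & argmin_set(noisy_llet)
print(common)                                                   # {0, 3}
print(combined_optima(noisy_let, noisy_llet, q=0.3) == common)  # True
```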

Theorem 3 implies the following corollaries, which establish resilience to noise models that go beyond Noise Model 3 at the expense of specializing the form of W. Note that these corollaries are analogous to corollaries 1–3, and corollary 6 implies corollaries 7 and 8. See appendix G for the proofs.

Corollary 6. The cost functions ${C}_{{\mathsf{LET}}}$ and ${C}_{{\mathsf{LLET}}}$ exhibit weak-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 3, as well as (2) a noise process during the implementation of ${ \mathcal W }={{ \mathcal W }}_{k}\ \circ \cdots \circ \ {{ \mathcal W }}_{1}\,={{ \mathcal V }}^{\dagger }\,\circ \,{ \mathcal U }$ in which global Pauli channels $\{{{ \mathcal P }}_{1}$ , ..., ${{ \mathcal P }}_{k}\}$ act, such that the overall channel is ${{ \mathcal P }}_{k}\,\circ \,{{ \mathcal W }}_{k}\cdots \circ \,{{ \mathcal P }}_{1}\,\circ \,{{ \mathcal W }}_{1}$ , provided that the following condition is satisfied:

$\begin{eqnarray}&&({{ \mathcal P }}_{k}\,\circ \,{{ \mathcal W }}_{k}\cdots \circ \,{{ \mathcal P }}_{1}\,\circ \,{{ \mathcal W }}_{1})(\cdot )=({{ \mathcal W }}_{k}\,\circ \,{{ \mathcal W }}_{k-1}\cdots \circ \ {{ \mathcal W }}_{1}\,\circ \,\widehat{{ \mathcal P }})(\cdot ),\end{eqnarray} \tag{ 20 }$

where $\widehat{{ \mathcal P }}$ is also a Pauli channel.

Corollary 7. Let the $W={V}^{\dagger }U$ gate sequence have the form $W={W}_{2}^{A}{W}_{1}^{A}$ with ${W}_{1}^{A}$ composed only of Clifford gates. Then the cost functions ${C}_{{\mathsf{LET}}}$ and ${C}_{{\mathsf{LLET}}}$ exhibit weak-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 3, as well as (2) a noise process during the implementation of ${{ \mathcal W }}_{1}^{A}={{ \mathcal W }}_{1,k}\ \circ \cdots \circ \ {{ \mathcal W }}_{\mathrm{1,1}}$ , in which global Pauli channels $\{{{ \mathcal P }}_{1}^{A}$ , ..., ${{ \mathcal P }}_{k}^{A}\}$ act on system $A$ , such that the overall channel on $A$ is ${{ \mathcal P }}_{k}^{A}\,\circ \,{{ \mathcal W }}_{1,k}\cdots \circ \,{{ \mathcal P }}_{1}^{A}\circ {{ \mathcal W }}_{\mathrm{1,1}}$ .

Corollary 8. Let the $W={V}^{\dagger }U$ gate sequence have the form $W={W}_{2}^{A}{W}_{1}^{A}$ with ${W}_{1}^{A}={W}_{1}^{A^{\prime} }\otimes {W}_{1}^{A^{\prime\prime} }$ being a tensor product, i.e. $W$ is a tensor product up to a particular time. Then the cost functions ${C}_{{\mathsf{LET}}}$ and ${C}_{{\mathsf{LLET}}}$ exhibit weak-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 3, as well as (2) a noise process during the implementations of ${{ \mathcal W }}_{1}^{A^{\prime} }={{ \mathcal W }}_{1,k}^{A^{\prime} }\ \circ \cdots \circ \ {{ \mathcal W }}_{1,1}^{A^{\prime} }$ and ${{ \mathcal W }}_{1}^{A^{\prime\prime} }={{ \mathcal W }}_{1,l}^{A^{\prime\prime} }\ \circ \cdots \circ \ {{ \mathcal W }}_{1,1}^{A^{\prime\prime} }$ in which local depolarizing channels $\{{{ \mathcal D }}_{1,1}^{A^{\prime} }$ , ..., ${{ \mathcal D }}_{1,k}^{A^{\prime} }\}$ and $\{{{ \mathcal D }}_{1,1}^{A^{\prime\prime} }$ , ..., ${{ \mathcal D }}_{1,l}^{A^{\prime\prime} }\}$ act on subsystems $A^{\prime}$ and $A^{\prime\prime}$ , respectively, such that the overall channel on $A=A^{\prime} A^{\prime\prime}$ is $({{ \mathcal D }}_{1,k}^{A^{\prime} }\,\circ \,{{ \mathcal W }}_{1,k}^{A^{\prime} }...{{ \mathcal D }}_{1,1}^{A^{\prime} }\,\circ \,{{ \mathcal W }}_{1,1}^{A^{\prime} })\otimes ({{ \mathcal D }}_{1,l}^{A^{\prime\prime} }\,\circ \,{{ \mathcal W }}_{1,l}^{A^{\prime\prime} }...{{ \mathcal D }}_{1,1}^{A^{\prime\prime} }\,\circ \,{{ \mathcal W }}_{1,1}^{A^{\prime\prime} })$ .

6. Implementations

In this section, we present the results of implementing VQC on the following three-qubit unitaries: the Toffoli gate, the three-qubit quantum Fourier transform (QFT), and a W-state preparation circuit. Each of these unitaries is of interest, e.g. the Toffoli gate when combined with the Hadamard gate provides a universal gate set for quantum computing [35], the QFT is a subroutine in Shor's algorithm [36], and W-state preparation is useful for the quantum approximate optimization algorithm [37, 9]. Figure 5 shows gate sequences corresponding to these unitaries obtained from the literature. The Toffoli gate in figure 5(a) is decomposed into a gate sequence that contains nine one-qubit gates and six CNOTs [38]. For the QFT we employ its textbook circuit [33] in figure 5(b), while the circuit for W-state preparation in figure 5(c) was derived from [39, 40].
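As a sanity check on the target circuits, the sketch below multiplies out two standard textbook circuits. The first is the textbook decomposition of the Toffoli gate: nine one-qubit gates (two Hadamards and seven $T$ or ${T}^{\dagger }$ gates) plus six CNOTs. The second is the textbook three-qubit QFT with its final swap. Both are assumed to match figures 5(a) and 5(b) up to drawing conventions. Qubit 0 is taken as the most significant bit, and `rm(m)` implements ${R}_{m}={\rm{diag}}(1,{{\rm{e}}}^{2\pi {\rm{i}}/{2}^{m}})$ on its target.

```python
import numpy as np

N = 3                                             # qubit 0 is the most significant bit
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])
TDG = T.conj().T
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])  # projectors onto |0> and |1>


def embed(ops):
    """Kronecker product with ops[q] on qubit q and identity elsewhere."""
    out = np.array([[1.0 + 0j]])
    for q in range(N):
        out = np.kron(out, ops.get(q, I2))
    return out


def one(gate, q):
    return embed({q: gate})


def ctrl(gate, c, t):
    """Controlled-`gate` with control qubit c and target qubit t."""
    return embed({c: P0}) + embed({c: P1, t: gate})


def rm(m):
    """R_m = diag(1, exp(2 pi i / 2^m))."""
    return np.diag([1, np.exp(2j * np.pi / 2**m)])


def circuit(gates):
    """Multiply gate matrices in time order (the first gate acts first)."""
    U = np.eye(2**N, dtype=complex)
    for g in gates:
        U = g @ U
    return U


# Toffoli with controls 0, 1 and target 2: 6 CNOTs and 9 one-qubit gates.
toffoli_gates = [
    one(H, 2), ctrl(X, 1, 2), one(TDG, 2), ctrl(X, 0, 2), one(T, 2),
    ctrl(X, 1, 2), one(TDG, 2), ctrl(X, 0, 2), one(T, 1), one(T, 2),
    one(H, 2), ctrl(X, 0, 1), one(T, 0), one(TDG, 1), ctrl(X, 0, 1),
]
TOFFOLI = np.eye(8, dtype=complex)
TOFFOLI[[6, 7]] = TOFFOLI[[7, 6]]                 # swap |110> and |111>

# Textbook QFT: H and controlled-R_m on each qubit, then swap qubits 0 and 2.
qft_gates = [
    one(H, 0), ctrl(rm(2), 1, 0), ctrl(rm(3), 2, 0),
    one(H, 1), ctrl(rm(2), 2, 1),
    one(H, 2),
    ctrl(X, 0, 2), ctrl(X, 2, 0), ctrl(X, 0, 2),   # SWAP built from three CNOTs
]
DFT = np.array([[np.exp(2j * np.pi * j * k / 8) for k in range(8)]
                for j in range(8)]) / np.sqrt(8)

print(np.allclose(circuit(toffoli_gates), TOFFOLI))  # True
print(np.allclose(circuit(qft_gates), DFT))          # True
```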

**Figure 5.** Quantum circuits for: (a) the Toffoli gate, (b) the three-qubit quantum Fourier transform, and (c) three-qubit W-state preparation. Here, ${R}_{m}$ denotes the controlled phase gate that multiplies the $|11\rangle$ component by the phase factor $\phi ={{\rm{e}}}^{2\pi {\rm{i}}/{2}^{m}}$ , and ${V}_{k}({{\boldsymbol{\beta }}}_{k})$ is given by (21). For the three-qubit W-state preparation circuit we have ${{\boldsymbol{\beta }}}_{1}=(2\arccos (\sqrt{1/3}),0,0)$ and ${{\boldsymbol{\beta }}}_{2}=(\pi /2,0,0)$ .

Our VQC implementations were performed using IBM's noisy quantum simulator [28] with a noise model built from the reported noise parameters and connectivity of IBM's 14-qubit Melbourne quantum computer [41]. We remark that for VQC, we must have a target unitary U that is written as a gate sequence in the native gate language and the native connectivity of the hardware. IBM's simulator for the Melbourne device has a square lattice connectivity and native gate alphabet of CNOTs, arbitrary rotation around Z and $\pi /2$ rotation around X. Hence, transforming the gate sequences in figure 5 for the native device will typically add an overhead of additional gates. Therefore, the target gate sequences in our implementations actually correspond to IBM's compilation (with this overhead included) of the circuits in figure 5.

In IBM's noise model [28, 42], one-qubit gate errors are modeled as a single-qubit depolarizing error followed by a thermal relaxation error, where thermal relaxation refers to both T₁ and T₂ channels. Similarly, two-qubit gate errors consist of a two-qubit depolarizing error followed by single-qubit thermal relaxation errors on each qubit. Finally, the noise model includes single-qubit readout errors.
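The thermal-relaxation error in this model combines a non-unital ${T}_{1}$ (amplitude-damping) part with ${T}_{2}$ dephasing. The sketch below builds one standard Kraus representation for a single qubit idling for time `t`. It applies amplitude damping with $\gamma =1-{{\rm{e}}}^{-t/{T}_{1}}$, followed by phase flips chosen so that coherences decay as ${{\rm{e}}}^{-t/{T}_{2}}$. The construction assumes ${T}_{2}\leqslant 2{T}_{1}$ and is a textbook one, not IBM's implementation. The times used are arbitrary.

```python
import numpy as np


def thermal_relaxation_kraus(t, t1, t2):
    """Kraus operators for amplitude damping (T1) plus pure dephasing, duration t."""
    if t2 > 2 * t1:
        raise ValueError("a physical channel needs T2 <= 2 * T1")
    gamma = 1 - np.exp(-t / t1)
    damping = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
               np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]
    # Amplitude damping alone shrinks coherences by exp(-t / (2 T1)); a phase
    # flip with probability pz supplies the remaining decay down to exp(-t / T2).
    residual = np.exp(-t / t2 + t / (2 * t1))
    pz = (1 - residual) / 2
    dephasing = [np.sqrt(1 - pz) * np.eye(2), np.sqrt(pz) * np.diag([1, -1])]
    return [D @ A for D in dephasing for A in damping]   # damping first, then dephasing


def apply(kraus, rho):
    return sum(K @ rho @ K.conj().T for K in kraus)


t, t1, t2 = 0.5, 50.0, 70.0                       # e.g. microseconds (illustrative)
kraus = thermal_relaxation_kraus(t, t1, t2)
plus = np.full((2, 2), 0.5, dtype=complex)        # |+><+|
out = apply(kraus, plus)

print(np.isclose(out[1, 1].real, 0.5 * np.exp(-t / t1)))        # True: T1 decay
print(np.isclose(abs(out[0, 1]), 0.5 * np.exp(-t / t2)))        # True: T2 decay
print(np.allclose(apply(kraus, np.eye(2) / 2), np.eye(2) / 2))  # False: non-unital
```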

We employ two different ansatzes, shown in figure 6, and (as described below) we employ gradient-based optimization algorithms to train the gate sequence $V({\boldsymbol{\alpha }})$ . In figures 7 and 8, we plot the results of implementing VQC with IBM's noisy simulator for the three-qubit gates in figure 5. In each plot, we show the value of the noisy cost functions versus the number of iterations of the optimization algorithm. Additionally, we plot the corresponding value of the noiseless cost functions evaluated for the variational parameters ${\boldsymbol{\alpha }}$ obtained from the noisy optimization. These results allow us to verify whether the parameters obtained from the noisy optimization are indeed minimizing the noiseless cost functions. Before discussing the results, we first give details for our ansatzes and optimization methods.

**Figure 6.** (a) The dressed CNOT is composed of a CNOT preceded and followed by single-qubit gates ${V}_{k}({{\boldsymbol{\alpha }}}_{k})$ , where ${V}_{k}({{\boldsymbol{\alpha }}}_{k})$ is given by (21). (b) Two layers of the alternating-pair ansatz in the case of four qubits. Each layer is composed of dressed CNOTs acting on alternating pairs of neighboring qubits. (c) Schematic representation of the target-inspired ansatz. In this approach, the gate sequence of dressed CNOTs is obtained from the gate sequence of the target unitary U.

**Figure 7.** VQC implementations for the Toffoli gate (top) and three-qubit QFT (bottom). The ansatz for $V({\boldsymbol{\alpha }})$ is: (a) one layer of the alternating-pair ansatz, (b) two layers of the alternating-pair ansatz, (c) the target-inspired ansatz. The blue and red curves respectively plot the values of ${\widetilde{C}}_{{\mathsf{HST}}}$ and ${\widetilde{C}}_{{\mathsf{LHST}}}$ obtained by training $V({\boldsymbol{\alpha }})$ in the presence of noise. The green and pink curves respectively plot the values of ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ evaluated at the variational parameters ${\boldsymbol{\alpha }}$ obtained from the noisy optimization of $V({\boldsymbol{\alpha }})$ . Curves are plotted as a function of the number of iterations in the gradient-descent algorithm, and the y-axis is in log-scale. The blue and red dashed lines in (a) and (b) correspond to the minimum value of ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ , respectively, determined by optimizing $V({\boldsymbol{\alpha }})$ in a noise-free environment. Top: in both (a) and (b), the green and pink curves converge to the dashed blue and red lines, respectively. Bottom: while in (a) the green and pink curves converge to the dashed lines, in (b) the termination condition for the optimization algorithm was reached before the pink curve could converge. The number of shots per iteration was N = 50 000 for (a) and (b). For (c) we employed the iCANS optimizer [44], where the total number of shots was $1.4\times {10}^{7}$ and the minimum number of shots per iteration was initially ${N}_{\min }=2$ . The thick dashed vertical line in (c) indicates the point where we set ${N}_{\min }=250$ , which helped to further reduce the cost function.

**Figure 8.** VQC implementations for the three-qubit W-state preparation circuit for (a) the FUMC approach, and (b) the FISC approach. The trainable gate sequence $V({\boldsymbol{\alpha }})$ is given by the target-inspired ansatz. In the left (right) panel the blue and red curves plot respectively the values of ${\widetilde{C}}_{{\mathsf{HST}}}$ ( ${\widetilde{C}}_{{\mathsf{LET}}}$ ) and ${\widetilde{C}}_{{\mathsf{LHST}}}$ ( ${\widetilde{C}}_{{\mathsf{LLET}}}$ ) obtained by noisy training of $V({\boldsymbol{\alpha }})$ . Similarly, in the left (right) panel the green and pink curves give respectively the values of ${C}_{{\mathsf{HST}}}$ ( ${C}_{{\mathsf{LET}}}$ ) and ${C}_{{\mathsf{LHST}}}$ ( ${C}_{{\mathsf{LLET}}}$ ) evaluated at the variational parameters ${\boldsymbol{\alpha }}$ obtained from the noisy optimization of $V({\boldsymbol{\alpha }})$ . Curves are plotted as a function of the number of gradient-descent iterations, with the y-axis in log-scale. Via noisy training, the noiseless cost functions go down to $\sim {10}^{-4}$ . Initially we set ${N}_{\min }=2$ , and the thick dashed vertical lines show the point where we increased this value to ${N}_{\min }=250$ . Increasing the minimum number of shots iCANS employs to compute each partial derivative leads to smaller cost function values in both cases.

6.1. Ansatzes and optimization methods

As previously mentioned, to implement VQC we consider two ansatzes for the trainable unitary $V({\boldsymbol{\alpha }})$ . The building block of our ansatzes is a dressed CNOT gate, which is a two-qubit gate composed of a CNOT preceded and followed by single-qubit gates ${V}_{k}({{\boldsymbol{\alpha }}}_{k})$ acting on each qubit, as shown in figure 6(a). Each single-qubit gate ${V}_{k}({{\boldsymbol{\alpha }}}_{k})$ is decomposed (up to a global phase) into three elementary rotations parameterized by three angles in the vector ${{\boldsymbol{\alpha }}}_{k}=({\alpha }_{k,1},{\alpha }_{k,2},{\alpha }_{k,3})$ as

$\begin{eqnarray}&&{V}_{k}({{\boldsymbol{\alpha }}}_{k})={{\rm{e}}}^{-{\rm{i}}{\alpha }_{k,3}{\sigma }_{z}/2}{{\rm{e}}}^{-{\rm{i}}{\alpha }_{k,2}{\sigma }_{y}/2}{{\rm{e}}}^{-{\rm{i}}{\alpha }_{k,1}{\sigma }_{z}/2}.\end{eqnarray} \tag{ 21 }$
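Equation (21) and the dressed CNOT of figure 6(a) translate directly into matrices. In the sketch below, the dressed CNOT uses four independent copies of (21), one on each qubit before and after the CNOT, for 12 angles in total. The control is the first tensor factor, and the ordering of the angles is an assumption made for illustration.

```python
import numpy as np


def rz(a):
    return np.diag([np.exp(-0.5j * a), np.exp(0.5j * a)])     # exp(-i a Z / 2)


def ry(a):
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)          # exp(-i a Y / 2)


def v_k(alpha):
    """Equation (21): V_k = exp(-i a3 Z/2) exp(-i a2 Y/2) exp(-i a1 Z/2)."""
    a1, a2, a3 = alpha
    return rz(a3) @ ry(a2) @ rz(a1)


CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)


def dressed_cnot(params):
    """Dressed CNOT with 12 angles: V_k on both qubits before and after the CNOT."""
    p = np.asarray(params, dtype=float).reshape(4, 3)
    before = np.kron(v_k(p[0]), v_k(p[1]))        # (control, target)
    after = np.kron(v_k(p[2]), v_k(p[3]))
    return after @ CNOT @ before


H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
print(np.allclose(v_k([np.pi, np.pi / 2, 0.0]), -1j * H))   # True: Hadamard up to a phase
print(np.allclose(dressed_cnot(np.zeros(12)), CNOT))         # True: bare CNOT
```

The Hadamard check illustrates that (21) reproduces a given one-qubit gate only up to a global phase.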

Let us now introduce our ansatzes. We note that our two ansatzes are fairly similar to the ones introduced in [19]. In our first ansatz, each layer is composed of n dressed CNOTs, where n is the number of qubits (in the special case of n = 2 each layer consists of one dressed CNOT), with the precise structure defined as follows.

Definition 10. We define the alternating-pair ansatz as a layered ansatz in which each layer consists of (parameterized) dressed CNOT gates acting on alternating pairs of neighboring qubits as illustrated in figure 6(b).
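One way to enumerate the dressed-CNOT placements of definition 10 is sketched below. The wiring is an assumption. Pairs (0,1), (2,3), ... come first, followed by (1,2), (3,4), ..., and the ring is closed with (n−1, 0). This periodic choice reproduces the count stated above of n dressed CNOTs per layer for n ≥ 3 (and one for n = 2). However, figure 6(b) may wire the pairs differently, and (n−1, 0) need not be a neighboring pair on hardware.

```python
def alternating_pair_layer(n):
    """Qubit pairs (control, target) for one layer; hypothetical periodic wiring."""
    if n < 2:
        raise ValueError("need at least two qubits")
    if n == 2:
        return [(0, 1)]
    even = [(q, q + 1) for q in range(0, n - 1, 2)]   # (0,1), (2,3), ...
    odd = [(q, q + 1) for q in range(1, n - 1, 2)]    # (1,2), (3,4), ...
    return even + odd + [(n - 1, 0)]                  # close the ring


def alternating_pair_ansatz(n, layers):
    """Dressed-CNOT placements for `layers` layers and the number of angles."""
    pairs = [pair for _ in range(layers) for pair in alternating_pair_layer(n)]
    return pairs, 12 * len(pairs)      # 4 single-qubit gates x 3 angles per dressed CNOT


print(alternating_pair_layer(3))          # [(0, 1), (1, 2), (2, 0)]
print(alternating_pair_ansatz(3, 2)[1])   # 72
```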

We remark that it is useful to distinguish between a complete ansatz, in which an exact compilation for U is contained inside the ansatz, versus an incomplete ansatz, where exact compilation is not possible. In general, a small number of layers can lead to an incomplete ansatz, where one can only reach approximate compilation. Hence, increasing the number of layers l could allow one to obtain better compilations of U. Note, however, that while a large number of layers can achieve a complete ansatz, it can also be harder to train and can lead to a deeper circuit.

The alternating-pair ansatz may not lead to an optimal-depth compilation of U, particularly in the complete ansatz case. Our second ansatz attempts to avoid introducing unnecessary depth by having a structure that depends on U.

Definition 11. We construct the target-inspired ansatz by taking the gate sequence for the target unitary $U$ , expanding this gate sequence into single-qubit gates and CNOTs, removing all single-qubit gates that precede or follow a CNOT, and replacing each remaining CNOT in the gate sequence with a (parameterized) dressed CNOT. Finally, each remaining single-qubit gate is replaced by a parameterized single-qubit gate.
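A minimal sketch of definition 11 is given below. It uses a hypothetical gate-list format: `("cx", control, target)` for a CNOT and `(name, qubit)` for any one-qubit gate. Here 'precede or follow a CNOT' is read as applying along each qubit's wire, and a run of consecutive one-qubit gates is treated as a single gate. Under this reading, any run that touches a CNOT is absorbed into that CNOT's dressing. This interpretation is an assumption.

```python
def target_inspired_ansatz(gates, n_qubits):
    """Replace CNOTs by dressed CNOTs ('dcx'); keep only one-qubit gates that never
    meet a CNOT along their wire, each as one parameterized gate ('v')."""
    after_cnot = [False] * n_qubits   # wire has already met a CNOT
    pending = [False] * n_qubits      # unabsorbed run of one-qubit gates on the wire
    ansatz = []
    for gate in gates:
        if gate[0] == "cx":
            _, c, t = gate
            for q in (c, t):
                pending[q] = False    # a run preceding this CNOT is absorbed
                after_cnot[q] = True
            ansatz.append(("dcx", c, t))
        else:
            q = gate[1]
            if not after_cnot[q]:     # otherwise it follows a CNOT and is absorbed
                pending[q] = True
    # Surviving runs sit on wires that never meet a CNOT, so they commute with
    # everything else and can be appended at the end.
    ansatz += [("v", q) for q in range(n_qubits) if pending[q]]
    return ansatz


# Usage with the textbook Toffoli decomposition (controls 0, 1; target 2).
toffoli = [("h", 2), ("cx", 1, 2), ("tdg", 2), ("cx", 0, 2), ("t", 2),
           ("cx", 1, 2), ("tdg", 2), ("cx", 0, 2), ("t", 1), ("t", 2),
           ("h", 2), ("cx", 0, 1), ("t", 0), ("tdg", 1), ("cx", 0, 1)]
print(target_inspired_ansatz(toffoli, 3))   # six dressed CNOTs, no standalone gates
```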

As schematically depicted in figure 6(c), each layer is now composed of one dressed CNOT. This ansatz will always be complete since its structure is inspired by U. While this ansatz is not useful for compressing the number of CNOTs in $V({\boldsymbol{\alpha }})$ , it is useful as a proof of concept to demonstrate OPR for complete ansatzes. We remark that a simple modification of this ansatz, where the placements of the dressed CNOTs are optimized over instead of fixed, would actually be useful for circuit-depth compression. Furthermore, we have implemented this dressed CNOT placement optimization, and we find that we obtain similar noise resilience results as those for the target-inspired ansatz.

Let us now discuss the optimization methods. As previously mentioned, the trainable gate sequence $V({\boldsymbol{\alpha }})$ is a function of a set of parameters ${\boldsymbol{\alpha }}$ corresponding to the collection of the internal gate angles in each dressed CNOT. To optimize these parameters, we employ a gradient-descent approach. This approach exploits the fact that the gradient with respect to ${\boldsymbol{\alpha }}$ of ${C}_{{\mathsf{HST}}}$ , ${C}_{{\mathsf{LHST}}}$ , ${C}_{{\mathsf{LET}}}$ , and ${C}_{{\mathsf{LLET}}}$ can be computed by using the circuits for HST, LHST, LET, and LLET, respectively [43, 19]. We remark that we used different gradient-based approaches for the shallow and deep ansatz cases, since the latter requires a more sophisticated and efficient optimizer.
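To make the gradient step concrete, the sketch below trains a single $V({\boldsymbol{\alpha }})$ of the form (21) to compile a one-qubit target. It uses gradient descent on ${C}_{{\mathsf{HST}}}=1-|{\rm{Tr}}({V}^{\dagger }U){|}^{2}/{d}^{2}$. Partial derivatives come from the parameter-shift rule, which is one standard way to obtain exact gradients from circuits in which each angle enters as ${{\rm{e}}}^{-{\rm{i}}\alpha \sigma /2}$. The cost is evaluated exactly rather than by sampling. The target, starting point, learning rate and iteration count are illustrative choices, not the settings of [19] or [44].

```python
import numpy as np


def rz(a):
    return np.diag([np.exp(-0.5j * a), np.exp(0.5j * a)])


def ry(a):
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def v_k(alpha):
    return rz(alpha[2]) @ ry(alpha[1]) @ rz(alpha[0])   # equation (21)


def hst_cost(alpha, U):
    d = U.shape[0]
    return 1 - abs(np.trace(v_k(alpha).conj().T @ U)) ** 2 / d**2


def parameter_shift_grad(alpha, U):
    """dC/da_j = [C(a_j + pi/2) - C(a_j - pi/2)] / 2 for each angle."""
    grad = np.zeros(len(alpha))
    for j in range(len(alpha)):
        shift = np.zeros(len(alpha))
        shift[j] = np.pi / 2
        grad[j] = (hst_cost(alpha + shift, U) - hst_cost(alpha - shift, U)) / 2
    return grad


U = v_k(np.array([0.3, 1.1, -0.7]))                  # illustrative target
alpha = np.array([0.3, 1.1, -0.7]) + np.array([0.4, -0.3, 0.5])   # perturbed start
for _ in range(300):
    alpha = alpha - 0.5 * parameter_shift_grad(alpha, U)   # learning rate 0.5

print(hst_cost(alpha, U) < 1e-8)                      # True
```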

Specifically, for the shallow ansatz cases, which have few parameters, we employ the simple gradient-based approach outlined in [19, appendix 4]. In this approach, the number of shots N per iteration is fixed (we choose N = 50 000). For deep ansatzes with larger numbers of parameters, we instead employ the individual coupled adaptive number of shots (iCANS) algorithm of [44], a measurement-frugal gradient-based method that improves efficiency by reducing the number of shots required and often outperforms other optimizers in the presence of noise. The iCANS optimizer frugally adjusts the number of shots both for a given iteration and for a given partial derivative in a stochastic gradient descent. When employing iCANS, one sets as input (1) the total number of shots employed during the optimization and (2) the minimum number of shots, denoted $N_{\min}$, employed to estimate the gradient in a given iteration. We initially set $N_{\min}=2$ and later increase it to $N_{\min}=250$, which empirically leads to good convergence.
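The fixed-shot gradient descent used for the shallow ansatzes can be illustrated on a one-parameter toy problem. This is a sketch under stated assumptions, not the exact update rule of [19, appendix 4]: we assume a single-qubit target $U=R_Y(\theta_0)$, an ansatz $V(\theta)=R_Y(\theta)$, the parameter-shift rule for the gradient, and binomial sampling with a fixed $N=50\,000$ shots per cost evaluation. The names `THETA_TARGET`, `shot_gradient`, the learning rate and the iteration count are arbitrary, hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
THETA_TARGET = 1.2  # hypothetical target: U = R_Y(THETA_TARGET)


def exact_cost(theta):
    """Noiseless toy cost 1 - |<0|R_Y(theta)^dagger U|0>|^2 = sin^2((theta - THETA_TARGET)/2)."""
    return np.sin((theta - THETA_TARGET) / 2) ** 2


def sampled_cost(theta, shots):
    """Finite-shot estimate: fraction of shots that do not return the all-zero outcome."""
    return rng.binomial(shots, exact_cost(theta)) / shots


def shot_gradient(theta, shots):
    """Parameter-shift estimate of dC/dtheta (unbiased for this R_Y toy model)."""
    return 0.5 * (sampled_cost(theta + np.pi / 2, shots) - sampled_cost(theta - np.pi / 2, shots))


theta, learning_rate, N = 0.0, 1.0, 50_000  # N shots per cost evaluation, fixed throughout
for _ in range(100):
    theta -= learning_rate * shot_gradient(theta, N)

print(theta, exact_cost(theta))
```

With these settings θ settles close to `THETA_TARGET`, and the residual error is set by shot noise; iCANS differs mainly in choosing the number of shots adaptively per partial derivative.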

6.2. Toffoli gate

The top panels in figure 7 show results of implementing VQC for the Toffoli gate. Figure 7 (top, a) corresponds to $V({\boldsymbol{\alpha }})$ being given by a single layer of the alternating-pair ansatz of definition 10. Here, the noisy cost functions ${\widetilde{C}}_{{\mathsf{HST}}}$ and ${\widetilde{C}}_{{\mathsf{LHST}}}$ (blue and red curves, respectively) tend to decrease as the number of iterations increases and converge to non-zero values. We remark that the number of iterations can differ between ${\widetilde{C}}_{{\mathsf{HST}}}$ and ${\widetilde{C}}_{{\mathsf{LHST}}}$, since the termination condition of the optimization algorithm can be reached after different numbers of iterations.

Figure 7 (top, a) also depicts the cost functions ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ evaluated for the variational parameters ${\boldsymbol{\alpha }}$ obtained from the noisy optimization (green and pink curves, respectively). These curves show that as the number of iterations increases, both ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ tend to decrease too, indicating that the noisy training is indirectly training the noiseless cost functions, i.e. the adjustments to the parameters ${\boldsymbol{\alpha }}$ made by noisy training are reducing the noiseless cost functions. Note that ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ do not converge to zero since a single layer of three dressed CNOTs forms an incomplete ansatz for the Toffoli gate.

In order to determine whether the algorithm reaches the minimum value achievable with just one layer, we have also implemented VQC to compile the Toffoli gate in a noise-free simulation. The minimum values achieved for ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ are shown as blue and red dashed curves, respectively. Surprisingly, the cost functions evaluated with the parameters from the noisy training (green and pink curves) converge to the dashed lines. This suggests that the optimal parameters are noise resilient, since noisy training reaches the minimum value obtained by noise-free training. As a caveat, however, it is not clear whether the minima reached are global or local optima.

Figure 7 (top, b) plots the VQC results for Toffoli with $V({\boldsymbol{\alpha }})$ given by two layers of the alternating-pair ansatz. In this case, ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ converge to values smaller than those obtained in the one-layer case. This indicates that two layers allow for a more accurate compilation of the Toffoli gate, although the ansatz still appears to be incomplete. Note that both the decomposition of the Toffoli gate in figure 5 and two layers of the alternating-pair ansatz consist of six CNOTs. However, the placement of the dressed CNOTs does not seem to be optimal. Finally, the green and pink curves converge to the dashed blue and red lines, respectively, which once again shows that the optimal parameters are noise resilient. As in the previous case, it is not clear whether the minima reached are global or local.

Figure 7 (top, c) shows results for the target-inspired ansatz of definition 11. As the number of iterations increases, all curves tend to decrease, with the green and pink curves converging to values of the order of 10⁻⁴. We remark that we have verified that $W={V}^{\dagger }U\approx {\mathbb{1}}$ for the parameters obtained. In this case, we do not plot dashed blue and red curves since the ansatz is complete and the minimum of the noiseless cost functions is zero.

These results indicate that optimizing $V({\boldsymbol{\alpha }})$ in the presence of noise yields the correct variational parameters ${\boldsymbol{\alpha }}$ , which minimize the noiseless cost function. Hence, both ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ appear to exhibit OPR for the realistic noise model considered.

6.3. Quantum Fourier transform

We now discuss the VQC results for the three-qubit QFT. Figure 7 shows the results for $V({\boldsymbol{\alpha }})$ consisting of: a single layer of the alternating-pair ansatz of definition 10 (bottom, a), two layers of the alternating-pair ansatz (bottom, b), and the target-inspired ansatz of definition 11 (bottom, c). As shown in these plots, most of the results for QFT are similar to the results for the Toffoli gate. In all cases the noiseless cost functions tend to decrease with iterations, indicating that noisy training indirectly trains the noiseless costs.

For the one-layer case of figure 7 (bottom, a) the green and pink curves (noiseless cost functions evaluated at the parameters obtained from noisy training) converge to the value obtained by training in a noise-free environment (dashed curve). Here, the non-zero value of the dashed curve indicates that a one-layer ansatz is incomplete. This is in contrast to figure 7 (bottom, b), where the dashed red line of ${C}_{{\mathsf{LHST}}}$ is of the order of 10⁻⁴, implying that the ansatz is complete. Once again, in figure 7 (bottom, b), the green and pink curves approximately converge to the dashed lines (noiseless training), indicating noise resilience. Finally, figure 7 (bottom, c) shows that both ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ appear to exhibit OPR, as we can indirectly train the parameters in $V({\boldsymbol{\alpha }})$ in the presence of noise.

6.4. W-state preparation

Finally, we discuss the results of implementing VQC for both FUMC and FISC of a W-state preparation circuit. We remark here that we did not perform FISC for the Toffoli gate and the QFT since those unitaries act trivially on the $| {\bf{0}}\rangle$ state. Moreover, we are only interested in comparing the FUMC and the FISC approach with a complete ansatz, so we only considered the target-inspired ansatz of definition 11.

As shown in figure 8, all cost functions ${C}_{{\mathsf{HST}}}$ , ${C}_{{\mathsf{LHST}}}$ , ${C}_{{\mathsf{LET}}}$ , and ${C}_{{\mathsf{LLET}}}$ can be optimized indirectly via noisy training of $V({\boldsymbol{\alpha }})$ . Both for FUMC and FISC the cost functions go down to $\sim {10}^{-4}$ , while for FUMC one can even reach values of $\sim {10}^{-5}$ when employing the LHST. Hence, our numerics indicate that ${C}_{{\mathsf{HST}}}$ , ${C}_{{\mathsf{LHST}}}$ , ${C}_{{\mathsf{LET}}}$ , and ${C}_{{\mathsf{LLET}}}$ appear to exhibit OPR to IBM's realistic noise model.

7. Discussion

7.1. VQC in the NISQ era

Our analytical and numerical results suggest that variational quantum compiling (VQC) could be a useful tool for near-term noisy quantum computing. While there are several intended uses for VQC [19], its main purpose is circuit-depth compression of quantum algorithms. This depth compression arises because VQC could achieve optimal compiling, whereas classical methods for quantum compiling either scale exponentially (if they are aiming at optimal compiling) or are sub-optimal when they are restricted to local (instead of global) compiling of the circuit.

Suppose one is able to achieve depth compression with VQC. This implies that the target unitary U has a longer depth than the trained gate sequence $V({\boldsymbol{\alpha }})$. Prior to our work, one may have been concerned that this depth compression might not reduce noise, because the noise occurring during U might somehow be compiled into the gate sequence $V({\boldsymbol{\alpha }})$. However, our work shows that this is not the case. Despite various sources of incoherent noise (e.g. see the noise model in figure 3), we find that one learns the correct optimal parameters ${\boldsymbol{\alpha }}$ for $V({\boldsymbol{\alpha }})$. This means that, after performing VQC, if one were to implement the gate sequence $V({\boldsymbol{\alpha }})$ instead of U, then $V({\boldsymbol{\alpha }})$ should indeed accumulate less noise than U, since its depth is shorter.

7.2. Summary of results

In this work, we treated two different forms of VQC: Full Unitary Matrix Compiling (FUMC) and Fixed Input State Compiling (FISC). Our main analytical results were stated in theorems 1–3. We found that both FUMC and FISC are resilient to measurement noise. In addition, they are both resilient to global depolarizing noise acting continuously throughout the circuit and global Pauli noise occurring just prior to the implementation of $W={V}^{\dagger }U$ .
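A minimal numerical illustration of why global depolarizing noise leaves the optimal parameters unchanged: if a single depolarizing channel of strength $p$ acts just before measurement (a simplification, assumed here, of the continuous noise treated in the theorems), the noisy cost equals $(1-p)C+p(1-1/d)$, an increasing affine function of the noiseless cost $C$, so both share the same minimizer. The single-qubit $R_Y$ target and ansatz, and the helper names, are assumptions chosen for brevity.

```python
import numpy as np


def ry(theta):
    """Single-qubit rotation R_Y(theta)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def cost(theta, theta_target, p_depol):
    """1 - Pr(0) for W|0>, W = V(theta)^dagger U, with global depolarizing noise before measurement."""
    W = ry(theta).conj().T @ ry(theta_target)
    out = W @ np.array([1.0, 0.0])
    rho = np.outer(out, out.conj())
    d = rho.shape[0]
    rho_noisy = (1 - p_depol) * rho + p_depol * np.eye(d) / d
    return 1 - np.real(rho_noisy[0, 0])


thetas = np.linspace(-np.pi, np.pi, 2001)
target = 0.7  # hypothetical target angle: U = R_Y(0.7)
clean = np.array([cost(t, target, 0.0) for t in thetas])
noisy = np.array([cost(t, target, 0.3) for t in thetas])
# noisy = (1 - p) * clean + p * (1 - 1/d): a shifted, rescaled copy of the noiseless cost.
print(thetas[np.argmin(clean)], thetas[np.argmin(noisy)])
```

The noisy cost never reaches zero, yet its argmin coincides with that of the noiseless cost; this is the distinction between optimal parameter resilience and cost value resilience.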

For FUMC, we were able to prove resilience to additional sources of noise, such as Pauli gate noise during the entangling and disentangling gates as well as non-unital Pauli noise occurring at particular times in the circuit. The fact that our noise resilience results are more extensive for FUMC than for FISC may simply reflect that the cost-evaluation circuit for FUMC is more complicated than that for FISC. Hence, it is possible that this additional resilience is needed for the two approaches to have similar levels of noise resilience. Alternatively, either FUMC or FISC could be more noise resilient than the other, although this remains to be established. (Note that our numerics did not show a significant difference in the noise resilience of FUMC versus FISC.)

In addition, Corollaries 1–8 stated resilience results for noise models that go beyond the noise models considered in theorems 1–3, at the expense of possibly specializing the form of the unitary $W={V}^{\dagger }U$ (for example, to Clifford unitaries or tensor-product unitaries). In particular, these corollaries considered noise that occurs during the implementation of W, which is certainly practically relevant.

Our numerical results were presented in figures 7–8. Generally speaking, these numerics agreed with our theoretical expectations and hinted at resilience beyond what is stated in our theorems, which we discuss in the next subsection. We emphasize that our implementations employed the noise model of IBM's 14-qubit Melbourne device, and hence suggest that VQC exhibits resilience under noise representative of currently available hardware.

7.3. Noise resilience beyond our theorems

There are two senses in which VQC might exhibit resilience beyond the results stated in our theorems. The first sense is that VQC may be resilient to more general noise models than the ones we considered. The second sense is that VQC may be resilient even for the incomplete ansatz case, on which we elaborate below. Both of these possibilities appear to be supported by our numerical implementations.

For evidence supporting the idea that VQC may be resilient to more general noise models, consider the following. The noise model associated with IBM's 14-qubit Melbourne device is more general than the noise models depicted in figures 3 and 4, and the unitaries we considered in figure 5 do not fall into the special cases (e.g. Clifford or tensor product) treated by Corollaries 1–8. For example, IBM's noise model has non-unital Pauli noise associated with each gate and hence occurring throughout the implementation of $W={V}^{\dagger }U$. Thus, our theorems and corollaries do not cover all of the noise processes occurring in IBM's noise model. Despite this, we were able to reduce the noiseless cost (via noisy training) to $\sim {10}^{-4}$ for the Toffoli gate (figure 7 (top, c)) and QFT (figure 7 (bottom, c)), and to $\sim {10}^{-5}$ for W state preparation (figure 8).

Naturally, our theorems and corollaries have a bias towards noise models that are mathematically easy to work with, such as Pauli noise or depolarizing noise, since this makes it easier to formulate proofs. It is therefore important for future work to attempt to show resilience beyond these noise models.

As noted above, VQC may also have resilience beyond the complete ansatz case. Recall that we say an ansatz for $V({\boldsymbol{\alpha }})$ is complete (incomplete) if it contains (does not contain) an exact compilation of U. Our theorems and corollaries are restricted to the complete ansatz case, whereas our numerics in figure 7 also consider the incomplete ansatz case. Interestingly, figure 7 showed that typically one can obtain the same value for the noiseless cost with either noisy or noiseless training. This surprising result suggests that perhaps the optimal values for ${\boldsymbol{\alpha }}$ may be resilient to noise even for the incomplete ansatz case, and future work should investigate this possibility.

In addition, it will be important to investigate the effect of noise on the parameter landscape and parameter trainability (e.g. [45]). Our work indicates that the global optimum of VQC may not change with noise, but does not address the difficulty of finding this optimum.

7.4. Coherent versus incoherent noise

In the Introduction, we emphasized the distinction between OPR and cost value resilience [7]. The latter is relevant to coherent noise, whereas OPR is relevant to incoherent noise. Intuitively, we anticipate that coherent noise (e.g. systematic gate biases) in VQC will often shift the location of the global minimum in parameter space, and hence we expect coherent noise to have a non-trivial effect on the optimal parameters in VQC. Because of this intuition, we have focused our paper and our definition of OPR solely on incoherent noise. We remark that our definition of OPR, which is stated in terms of unitaries (rather than parameters), would need to be modified if one is interested in studying parameter resilience for coherent noise. However, as noted, we do not anticipate resilience to coherent noise to hold. We also remark that other strategies exist to correct coherent noise [46]. Nevertheless, an interesting question for future work will be to see whether OPR holds partially whenever both coherent and incoherent noise are present. In addition, it will be interesting to combine the ideas of OPR and cost value resilience into a single framework.
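To make the intuition about coherent noise concrete, consider a toy model (our own assumption, not the noise model used in the numerics) in which the device systematically implements $R_Y(\theta+\epsilon)$ instead of $R_Y(\theta)$. The minimizer of the cost then shifts by $-\epsilon$, so parameters learned on the biased device would be off by $\epsilon$ on an ideal device. The target angle and bias values are arbitrary.

```python
import numpy as np


def cost(theta, theta_target, bias):
    """Toy cost when the device applies R_Y(theta + bias) instead of R_Y(theta).

    For R_Y rotations, 1 - |<0|R_Y(a)^dagger R_Y(b)|0>|^2 = sin^2((a - b)/2).
    """
    return np.sin((theta + bias - theta_target) / 2) ** 2


thetas = np.linspace(-np.pi, np.pi, 4001)
target, bias = 0.7, 0.2  # hypothetical target angle and systematic over-rotation
clean_opt = thetas[np.argmin(cost(thetas, target, 0.0))]
biased_opt = thetas[np.argmin(cost(thetas, target, bias))]
# The minimizer moves from target to target - bias, so the learned parameters
# would under-rotate by bias when run on an ideal (unbiased) device.
print(clean_opt, biased_opt)
```

Unlike the depolarizing case, no affine rescaling relates the two costs here: the landscape is translated, so the optimal parameters themselves change.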

7.5. Noise resilience of VQE

Finally, let us consider VHQCAs more generally. In particular, let us revisit the variational quantum eigensolver (VQE) that we discussed in section 2. As we now show, VQC is a special case of VQE. This idea was noted for FISC in [20]. However, the argument is more subtle for the FUMC case.

The key observation is that the various cost functions can be rewritten as the expectation values for some effective Hamiltonians:

$\begin{eqnarray}\begin{array}{rcl}{C}_{{\mathsf{LET}}} & = & \langle \psi ({\boldsymbol{\alpha }})| {H}_{{\mathsf{LET}}}| \psi ({\boldsymbol{\alpha }})\rangle ,\quad {C}_{{\mathsf{LLET}}}=\langle \psi ({\boldsymbol{\alpha }})| {H}_{{\mathsf{LLET}}}| \psi ({\boldsymbol{\alpha }})\rangle ,\\ {C}_{{\mathsf{HST}}} & = & \langle \chi ({\boldsymbol{\alpha }})| {H}_{{\mathsf{HST}}}| \chi ({\boldsymbol{\alpha }})\rangle ,\quad {C}_{{\mathsf{LHST}}}=\langle \chi ({\boldsymbol{\alpha }})| {H}_{{\mathsf{LHST}}}| \chi ({\boldsymbol{\alpha }})\rangle .\end{array}\end{eqnarray} \tag{ 22 }$

Here $| \psi ({\boldsymbol{\alpha }})\rangle \in {{ \mathcal H }}^{A}$ and $| \chi ({\boldsymbol{\alpha }})\rangle \in {{ \mathcal H }}^{{AB}}$ are n-qubit and $2n$ -qubit states, respectively, given by

$\begin{eqnarray}&&| \psi ({\boldsymbol{\alpha }})\rangle =V({\boldsymbol{\alpha }})| {\bf{0}}\rangle ,\qquad | \chi ({\boldsymbol{\alpha }})\rangle =(V({\boldsymbol{\alpha }})\otimes {{\mathbb{1}}}^{B})| {\rm{\Phi }}\rangle ,\end{eqnarray} \tag{ 23 }$

where ${{ \mathcal H }}^{X}$ denotes the Hilbert space of system X, and $| {\rm{\Phi }}\rangle =E| {\bf{0}}\rangle$ is the standard maximally entangled state on AB. We remark that $| \chi ({\boldsymbol{\alpha }})\rangle$ is simply the Choi state associated with $V({\boldsymbol{\alpha }})$ .

For the cost functions associated with FISC, the effective Hamiltonians are given by

$\begin{eqnarray}&&{H}_{{\mathsf{LET}}}={{\mathbb{1}}}^{A}-U| {\bf{0}}\rangle \langle {\bf{0}}| {U}^{\dagger },\qquad {H}_{{\mathsf{LLET}}}={{\mathbb{1}}}^{A}-\displaystyle \frac{1}{n}\sum _{j=1}^{n}U({P}_{0}^{{A}_{j}}\otimes {{\mathbb{1}}}^{{\overline{A}}_{j}}){U}^{\dagger },\end{eqnarray} \tag{ 24 }$

where ${P}_{0}^{{A}_{j}}$ is the projector onto the zero state of ${A}_{j}$. For the cost functions associated with FUMC, the effective Hamiltonians are given by

$\begin{eqnarray}\begin{array}{rcl}{H}_{{\mathsf{HST}}} & = & {{\mathbb{1}}}^{{AB}}-(U\otimes {{\mathbb{1}}}^{B})| {\rm{\Phi }}\rangle \langle {\rm{\Phi }}| ({U}^{\dagger }\otimes {{\mathbb{1}}}^{B}),\\ {H}_{{\mathsf{LHST}}} & = & {{\mathbb{1}}}^{{AB}}-\displaystyle \frac{1}{n}\sum _{j=1}^{n}(U\otimes {{\mathbb{1}}}^{B})(| {{\rm{\Phi }}}^{\left(j\right)}\rangle \langle {{\rm{\Phi }}}^{\left(j\right)}| \otimes {{\mathbb{1}}}^{{\overline{A}}_{j}{\overline{B}}_{j}})({U}^{\dagger }\otimes {{\mathbb{1}}}^{B}),\end{array}\end{eqnarray} \tag{ 25 }$

where $| {{\rm{\Phi }}}^{\left(j\right)}\rangle$ is the standard maximally entangled state on ${A}_{j}{B}_{j}$ . With these Hamiltonians, one can verify that the expressions in (22) are equal to the original cost function definitions in section 3. Hence, we have just shown that VQC is a special case of VQE, where the goal is to prepare the ground state of one of the Hamiltonians in (24) or (25).
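The identities (22)–(25) can be spot-checked numerically. The sketch below draws random two-qubit unitaries (the sampler `haar_unitary` and helper `on_qubit` are our own) and compares the Hamiltonian expectation values with the direct expressions $C_{\mathsf{LET}}=1-|\langle\mathbf{0}|V^{\dagger}U|\mathbf{0}\rangle|^2$ and $C_{\mathsf{HST}}=1-|\mathrm{Tr}(V^{\dagger}U)|^2/d^2$ with $d=2^n$, which we assume match the definitions of section 3. For the LLET it only checks $C_{\mathsf{LLET}}\leqslant C_{\mathsf{LET}}\leqslant nC_{\mathsf{LLET}}$, which follows from (24) because each $P_0^{A_j}\otimes\mathbb{1}$ dominates $|\mathbf{0}\rangle\langle\mathbf{0}|$, together with a union bound.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2
d = 2 ** n


def haar_unitary(dim):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))


def on_qubit(op, j):
    """op acting on qubit j of n, identity elsewhere."""
    out = np.eye(1)
    for k in range(n):
        out = np.kron(out, op if k == j else np.eye(2))
    return out


U, V = haar_unitary(d), haar_unitary(d)
zero = np.zeros(d)
zero[0] = 1.0

# FISC, (23)-(24): |psi> = V|0>.
psi = V @ zero
u0 = U @ zero
H_LET = np.eye(d) - np.outer(u0, u0.conj())
P0 = np.diag([1.0, 0.0])
H_LLET = np.eye(d) - sum(U @ on_qubit(P0, j) @ U.conj().T for j in range(n)) / n
C_LET_ham = np.real(psi.conj() @ H_LET @ psi)
C_LLET_ham = np.real(psi.conj() @ H_LLET @ psi)
C_LET_direct = 1 - abs(zero @ V.conj().T @ U @ zero) ** 2

# FUMC, (23) and (25): |chi> = (V x 1)|Phi>, ordering all A qubits before all B qubits.
phi = np.eye(d).reshape(d * d) / np.sqrt(d)
chi = np.kron(V, np.eye(d)) @ phi
u_phi = np.kron(U, np.eye(d)) @ phi
H_HST = np.eye(d * d) - np.outer(u_phi, u_phi.conj())
C_HST_ham = np.real(chi.conj() @ H_HST @ chi)
C_HST_direct = 1 - abs(np.trace(V.conj().T @ U)) ** 2 / d ** 2

print(C_LET_ham, C_LET_direct, C_LLET_ham, C_HST_ham, C_HST_direct)
```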

The fact that VQC is a special case of VQE implies that, for specific Hamiltonians, VQE is noise resilient. Namely, we have shown that VQE exhibits OPR when the Hamiltonian has the form in either (24) or (25). This naturally raises the question of whether VQE is resilient more generally. It is therefore an interesting direction for future research to extend our noise-resilience results to Hamiltonians other than the ones we considered.

8. Conclusions

In this work, we discovered a novel kind of noise resilience for variational hybrid quantum-classical algorithms (VHQCAs). We introduced the idea of optimal parameter resilience (OPR), where the variational parameters corresponding to the global optimum are unaffected by various types of incoherent noise. We showed that variational quantum compiling (VQC) exhibits OPR. This paves the way for VQC to be used in the era of noisy intermediate-scale quantum computing as a tool for circuit-depth compression. Important future research directions include: (1) extending our theorems to show resilience to more general noise models than the ones we considered (which our numerics suggest may be possible), (2) exploring noise resilience for the incomplete ansatz case (which our numerics indicate may also be resilient), (3) analyzing approximate noise resilience, (4) studying the effect of noise on the parameter training process, and (5) generalizing our resilience results to other Hamiltonians for the variational quantum eigensolver and exploring resilience for other VHQCAs (for example, some evidence of noise resilience was recently reported in [47]).

Acknowledgments

We thank Lukasz Cincio and Mark M Wilde for helpful discussions. KS acknowledges support from the US Department of Energy (DOE) through a quantum computing program sponsored by the LANL Information Science & Technology Institute. SK acknowledges support from the National Science Foundation and the Natural Sciences and Engineering Research Council of Canada Postgraduate Scholarship. MC was supported by the Center for Nonlinear Studies at Los Alamos National Laboratory (LANL). PJC acknowledges support from the LANL ASC Beyond Moore's Law project. MC and PJC also acknowledge support from the LDRD program at LANL. This work was also supported by the US DOE, Office of Science, Office of Advanced Scientific Computing Research.

Appendix A.: Preliminaries

The main goal of the appendix is to provide the proofs of theorems 1–3 and Corollaries 1–8. For these proofs, we will need to first review some definitions and properties. We point readers to [33, 34] for additional background.

Pauli Basis. In our proofs, we will work in the Pauli product basis, whose elements are tensor products of one-qubit Pauli operators. This is a natural basis to choose, given the qubit structure of quantum computers. Let

$\begin{eqnarray}&&{X}^{{\boldsymbol{l}}}:={\sigma }_{x}^{{l}_{1}}\otimes {\sigma }_{x}^{{l}_{2}}\,\otimes \cdots \otimes \,{\sigma }_{x}^{{l}_{n}},\qquad {Z}^{{\boldsymbol{k}}}:={\sigma }_{z}^{{k}_{1}}\,\otimes {\sigma }_{z}^{{k}_{2}}\,\otimes \cdots \otimes \,{\sigma }_{z}^{{k}_{n}},\end{eqnarray} \tag{ A1 }$

where ${l}_{1},{l}_{2},\,\ldots ,\,{l}_{n}\in \{0,1\}$, ${k}_{1},{k}_{2},\,\ldots ,\,{k}_{n}\in \{0,1\}$, ${\boldsymbol{l}}=({l}_{1},\ldots ,{l}_{n})$, and ${\boldsymbol{k}}=({k}_{1},\ldots ,{k}_{n})$. The following properties are satisfied by the Pauli operators:

$\begin{eqnarray}&&{X}^{{{\boldsymbol{l}}}_{1}}{X}^{{{\boldsymbol{l}}}_{2}}={X}^{{{\boldsymbol{l}}}_{1}\oplus {{\boldsymbol{l}}}_{2}},\quad {Z}^{{{\boldsymbol{k}}}_{1}}{Z}^{{{\boldsymbol{k}}}_{2}}={Z}^{{{\boldsymbol{k}}}_{1}\oplus {{\boldsymbol{k}}}_{2}},\quad {X}^{{\boldsymbol{l}}}{Z}^{{\boldsymbol{k}}}={\left(-1\right)}^{{\boldsymbol{l}}\cdot {\boldsymbol{k}}}{Z}^{{\boldsymbol{k}}}{X}^{{\boldsymbol{l}}},\quad \mathrm{Tr}[{X}^{{\boldsymbol{l}}}{Z}^{{\boldsymbol{k}}}]={2}^{n}{\delta }_{{\boldsymbol{l}},{\bf{0}}}{\delta }_{{\boldsymbol{k}},{\bf{0}}},\end{eqnarray} \tag{ A2 }$

which follow from the properties of the single-qubit Pauli operators.
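The relations in (A2) are easy to confirm by brute force for small n. The sketch below checks all of them for n = 2; the helper `pauli_string` is our own and builds the operators of (A1).

```python
import itertools

import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def pauli_string(base, bits):
    """base^{b_1} (x) ... (x) base^{b_n}, as in (A1)."""
    out = np.eye(1, dtype=complex)
    for b in bits:
        out = np.kron(out, base if b else I2)
    return out


def xor(u, v):
    return tuple(x ^ y for x, y in zip(u, v))


n = 2
strings = list(itertools.product([0, 1], repeat=n))
n_checked = 0
for l1, l2, k1, k2 in itertools.product(strings, repeat=4):
    X1, X2 = pauli_string(SX, l1), pauli_string(SX, l2)
    Z1, Z2 = pauli_string(SZ, k1), pauli_string(SZ, k2)
    # Products of X (or Z) strings add their exponents modulo 2.
    assert np.allclose(X1 @ X2, pauli_string(SX, xor(l1, l2)))
    assert np.allclose(Z1 @ Z2, pauli_string(SZ, xor(k1, k2)))
    # Reordering X^l and Z^k picks up the sign (-1)^{l.k}.
    assert np.allclose(X1 @ Z1, (-1) ** int(np.dot(l1, k1)) * (Z1 @ X1))
    # Only the identity (l = k = 0) has non-zero trace, equal to 2^n.
    expected = 2 ** n if not any(l1) and not any(k1) else 0
    assert np.isclose(np.trace(X1 @ Z1), expected)
    n_checked += 1
print(f"all (A2) relations hold for {n_checked} cases")
```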

Pauli group. The Pauli group of n qubits is ${{\mathbb{G}}}_{n}:=\{\pm 1,\pm i\}\times {\{I,{\sigma }_{x},{\sigma }_{y},{\sigma }_{z}\}}^{\otimes n}$ .

Clifford group. The Clifford group on n qubits is the set of unitaries that normalize the Pauli group, i.e.

$\begin{eqnarray}&&{{\mathbb{C}}}_{n}:=\{U:U{{\mathbb{G}}}_{n}{U}^{\dagger }={{\mathbb{G}}}_{n}\}.\end{eqnarray} \tag{ A3 }$
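A brute-force test of the Clifford condition (A3) for one and two qubits: conjugate every Pauli string by U and check that the result is a Pauli string up to a phase in $\{\pm 1,\pm i\}$. The helper names are hypothetical; the sketch only shows that the Hadamard and CNOT, the gates used in the entangling gate E, are Clifford while the T gate is not.

```python
import itertools

import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def pauli_strings(n):
    """All 4^n tensor products of I, X, Y, Z."""
    for factors in itertools.product(PAULIS, repeat=n):
        op = np.eye(1, dtype=complex)
        for f in factors:
            op = np.kron(op, f)
        yield op


def in_pauli_group(M, n):
    """True if M = c * P for a Pauli string P and a phase c in {+1, -1, +i, -i}."""
    for P in pauli_strings(n):
        c = np.trace(P.conj().T @ M) / 2 ** n  # Hilbert-Schmidt overlap with P
        if np.isclose(abs(c), 1.0):
            return bool(np.allclose(M, c * P) and np.isclose(c ** 4, 1.0))
    return False


def is_clifford(U, n):
    """Brute-force check of (A3): U P U^dagger lies in the Pauli group for every Pauli string P."""
    return all(in_pauli_group(U @ P @ U.conj().T, n) for P in pauli_strings(n))


H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
T = np.diag([1, np.exp(1j * np.pi / 4)])
print(is_clifford(H, 1), is_clifford(CNOT, 2), is_clifford(T, 1))
```

Checking every Pauli string rather than a generating set is wasteful, but it keeps the test a literal transcription of (A3).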

Maximally entangled states. In what follows, we consider the following maximally entangled states $| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}| =| {\phi }^{+}\rangle \langle {\phi }^{+}{| }^{\otimes n}$ , where $| {\phi }^{+}\rangle =(| 0,0\rangle +| 1,1\rangle )/\sqrt{2}$ . The aforementioned tensor product of maximally entangled states can be written in the Pauli basis as follows:

$\begin{eqnarray}&&| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}{| }_{{AB}}=\displaystyle \frac{1}{{2}^{2n}}\sum _{{\boldsymbol{l}},{\boldsymbol{k}}}{X}_{A}^{{\boldsymbol{l}}}{Z}_{A}^{{\boldsymbol{k}}}\otimes {X}_{B}^{{\boldsymbol{l}}}{Z}_{B}^{{\boldsymbol{k}}}=\displaystyle \frac{1}{{2}^{2n}}\sum _{{\boldsymbol{l}},{\boldsymbol{k}}}{Z}_{A}^{{\boldsymbol{k}}}{X}_{A}^{{\boldsymbol{l}}}\otimes {Z}_{B}^{{\boldsymbol{k}}}{X}_{B}^{{\boldsymbol{l}}}.\end{eqnarray} \tag{ A4 }$

All-zero state. Noting that $| 0\rangle \langle 0| =({\mathbb{1}}+{\sigma }_{z})/2$ , then in the Pauli basis the all-zero state $| {\bf{0}}\rangle \langle {\bf{0}}| =| 0\rangle \langle 0{| }^{\otimes n}$ is

$\begin{eqnarray}&&| {\bf{0}}\rangle \langle {\bf{0}}| =\displaystyle \frac{1}{{2}^{n}}{\left({\mathbb{1}}+{\sigma }_{z}\right)}^{\otimes n}=\displaystyle \frac{1}{{2}^{n}}\sum _{{\boldsymbol{l}}}{Z}^{{\boldsymbol{l}}}.\end{eqnarray} \tag{ A5 }$
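Expansions (A4) and (A5) can be checked numerically for n = 2. One caveat: (A4) groups all A qubits before all B qubits, so the sketch writes $|\Phi^+\rangle$ as $\sum_i|i\rangle_A|i\rangle_B/\sqrt{d}$, which is $|\phi^+\rangle^{\otimes n}$ with its tensor factors reordered from $A_1B_1A_2B_2$ to $A_1A_2B_1B_2$. The helper `pauli_string` is our own.

```python
import itertools

import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def pauli_string(base, bits):
    """base^{b_1} (x) ... (x) base^{b_n}, as in (A1)."""
    out = np.eye(1, dtype=complex)
    for b in bits:
        out = np.kron(out, base if b else I2)
    return out


n = 2
d = 2 ** n
strings = list(itertools.product([0, 1], repeat=n))

# |Phi+> with all A qubits first, then all B qubits: sum_i |i>_A |i>_B / sqrt(d).
phi = np.eye(d, dtype=complex).reshape(d * d) / np.sqrt(d)
rho_phi = np.outer(phi, phi.conj())

# Right-hand side of (A4).
rhs_A4 = sum(
    np.kron(pauli_string(SX, l) @ pauli_string(SZ, k), pauli_string(SX, l) @ pauli_string(SZ, k))
    for l in strings
    for k in strings
) / 2 ** (2 * n)

# (A5): |0...0><0...0| = 2^{-n} sum_l Z^l.
zero_state = np.zeros((d, d), dtype=complex)
zero_state[0, 0] = 1.0
rhs_A5 = sum(pauli_string(SZ, l) for l in strings) / 2 ** n

print(np.allclose(rho_phi, rhs_A4), np.allclose(zero_state, rhs_A5))
```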

Pauli channels. A Pauli noise channel corresponds to the action of random Pauli operators on a quantum state ρ according to a probability distribution. Let ${{ \mathcal P }}^{A}$ denote an n-qubit Pauli channel acting on system $A={A}_{1}\cdots {A}_{n}$. Then the action of ${{ \mathcal P }}^{A}$ on the state ρ is given by

$\begin{eqnarray}&&{{ \mathcal P }}^{A}(\rho )=\sum _{{\boldsymbol{l}},{\boldsymbol{k}}}{p}_{{\boldsymbol{l}},{\boldsymbol{k}}}^{A}{X}_{A}^{{\boldsymbol{l}}}{Z}_{A}^{{\boldsymbol{k}}}\rho {\left({X}_{A}^{{\boldsymbol{l}}}{Z}_{A}^{{\boldsymbol{k}}}\right)}^{\dagger },\end{eqnarray} \tag{ A6 }$

where $0\leqslant {p}_{{\boldsymbol{l}},{\boldsymbol{k}}}^{A}\leqslant 1$ , and ${\sum }_{{\boldsymbol{l}},{\boldsymbol{k}}}{p}_{{\boldsymbol{l}},{\boldsymbol{k}}}^{A}=1$ . Using the properties in (A2), we find that

$\begin{eqnarray}&&{{ \mathcal P }}^{A}({X}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}})=\sum _{{\boldsymbol{l}},{\boldsymbol{k}}}{p}_{{\boldsymbol{l}},{\boldsymbol{k}}}^{A}{X}_{A}^{{\boldsymbol{l}}}{Z}_{A}^{{\boldsymbol{k}}}{X}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{Z}_{A}^{{\boldsymbol{k}}}{X}_{A}^{{\boldsymbol{l}}}=\sum _{{\boldsymbol{l}},{\boldsymbol{k}}}{\left(-1\right)}^{{\boldsymbol{a}}\cdot {\boldsymbol{k}}}{\left(-1\right)}^{{\boldsymbol{b}}\cdot {\boldsymbol{l}}}{p}_{{\boldsymbol{l}},{\boldsymbol{k}}}^{A}{X}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}={p}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{A}{X}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}},\end{eqnarray} \tag{ A7 }$

where ${p}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{A}:={\sum }_{{\boldsymbol{l}},{\boldsymbol{k}}}{\left(-1\right)}^{{\boldsymbol{a}}\cdot {\boldsymbol{k}}}{\left(-1\right)}^{{\boldsymbol{b}}\cdot {\boldsymbol{l}}}{p}_{{\boldsymbol{l}},{\boldsymbol{k}}}^{A}$ and $-1\leqslant {p}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{A}\leqslant 1$ for all ${\boldsymbol{a}},{\boldsymbol{b}}\in \{0,1\}{}^{n}$. Similarly, the action of a global Pauli channel ${{ \mathcal P }}^{{AB}}$ acting on systems $A={A}_{1}\cdots {A}_{n}$ and $B={B}_{1}\cdots {B}_{n}$ is defined as

$\begin{eqnarray}&&{{ \mathcal P }}^{{AB}}({X}_{A}^{{{\boldsymbol{a}}}_{1}}{Z}_{A}^{{{\boldsymbol{b}}}_{1}}\otimes {X}_{B}^{{{\boldsymbol{a}}}_{2}}{Z}_{B}^{{{\boldsymbol{b}}}_{2}})={p}_{{{\boldsymbol{a}}}_{1},{{\boldsymbol{a}}}_{2},{{\boldsymbol{b}}}_{1},{{\boldsymbol{b}}}_{2}}^{{AB}}{X}_{A}^{{{\boldsymbol{a}}}_{1}}{Z}_{A}^{{{\boldsymbol{b}}}_{1}}\otimes {X}_{B}^{{{\boldsymbol{a}}}_{2}}{Z}_{B}^{{{\boldsymbol{b}}}_{2}}.\end{eqnarray} \tag{ A8 }$
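A numerical check of (A6) and (A7): a two-qubit Pauli channel with randomly drawn probabilities $p_{\boldsymbol{l},\boldsymbol{k}}$ multiplies every Pauli product $X^{\boldsymbol{a}}Z^{\boldsymbol{b}}$ by the coefficient defined below (A7). The random distribution and the helper names `pauli_channel` and `eigenvalue` are illustrative only.

```python
import itertools

import numpy as np

rng = np.random.default_rng(3)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def pauli_string(base, bits):
    """base^{b_1} (x) ... (x) base^{b_n}, as in (A1)."""
    out = np.eye(1, dtype=complex)
    for b in bits:
        out = np.kron(out, base if b else I2)
    return out


n = 2
strings = list(itertools.product([0, 1], repeat=n))
labels = [(l, k) for l in strings for k in strings]
probs = dict(zip(labels, rng.dirichlet(np.ones(len(labels)))))


def pauli_channel(rho):
    """Equation (A6): sum_{l,k} p_{l,k} (X^l Z^k) rho (X^l Z^k)^dagger."""
    out = np.zeros_like(rho, dtype=complex)
    for (l, k), p in probs.items():
        K = pauli_string(SX, l) @ pauli_string(SZ, k)
        out += p * K @ rho @ K.conj().T
    return out


def eigenvalue(a, b):
    """The coefficient p_{a,b} defined below (A7)."""
    return sum(p * (-1) ** (int(np.dot(a, k)) + int(np.dot(b, l))) for (l, k), p in probs.items())


max_err = 0.0
for a in strings:
    for b in strings:
        P = pauli_string(SX, a) @ pauli_string(SZ, b)
        max_err = max(max_err, np.max(np.abs(pauli_channel(P) - eigenvalue(a, b) * P)))
print(max_err)
```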

Non-unital Pauli noise channels. The action of a non-unital Pauli channel ${{ \mathcal P }}_{{\rm{NU}}}$ on n-qubit Pauli operators is

$\begin{eqnarray}&&{{ \mathcal P }}_{{\rm{NU}}}({X}^{{\boldsymbol{a}}}{Z}^{{\boldsymbol{b}}})={c}_{{\boldsymbol{a}},{\boldsymbol{b}}}{X}^{{\boldsymbol{a}}}{Z}^{{\boldsymbol{b}}}\quad \forall \,({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}}),\,\end{eqnarray} \tag{ A9 }$

$\begin{eqnarray}&&{{ \mathcal P }}_{{\rm{NU}}}({X}^{{\bf{0}}}{Z}^{{\bf{0}}})={{ \mathcal P }}_{{\rm{NU}}}({\mathbb{1}})={\mathbb{1}}+\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{d}_{{\boldsymbol{a}},{\boldsymbol{b}}}{X}^{{\boldsymbol{a}}}{Z}^{{\boldsymbol{b}}}.\end{eqnarray} \tag{ A10 }$
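For a concrete instance of (A9)–(A10), single-qubit amplitude damping with damping probability $\gamma$ has this form: it rescales $X$ and $XZ$ by $\sqrt{1-\gamma}$ and $Z$ by $1-\gamma$, and maps $\mathbb{1}$ to $\mathbb{1}+\gamma\sigma_z$, so $d_{0,1}=\gamma$ is its only non-zero $d$ coefficient. The sketch computes these coefficients from the standard amplitude-damping Kraus operators; the value of $\gamma$ and the helper names are arbitrary choices.

```python
import numpy as np

gamma = 0.3  # arbitrary damping probability
# Standard Kraus operators of single-qubit amplitude damping.
K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
XZ = SX @ SZ  # the product X^1 Z^1 (= -i sigma_y)


def amp_damp(op):
    return K0 @ op @ K0.conj().T + K1 @ op @ K1.conj().T


def coefficient(P, op):
    """Component of op along the Pauli product P (Hilbert-Schmidt overlap)."""
    return np.trace(P.conj().T @ op) / 2


# c_{a,b} in (A9): non-identity Pauli products are only rescaled.
c = {name: coefficient(P, amp_damp(P)) for name, P in [("X", SX), ("Z", SZ), ("XZ", XZ)]}
# d_{a,b} in (A10): the identity acquires a sigma_z component, so the channel is non-unital.
d_z = coefficient(SZ, amp_damp(I2))
print(c, d_z)
```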

We now prove the following lemma based on Clifford unitaries and Pauli channels.

Lemma 1. Let $W$ be a Clifford unitary with associated channel ${ \mathcal W }(\cdot )=W(\cdot ){W}^{\dagger }$, and let ${ \mathcal P }$ be a Pauli channel. Then for any state ρ, the following holds:

$\begin{eqnarray}&&({ \mathcal W }\ \circ \ { \mathcal P })(\rho )=({ \mathcal Q }\ \circ \ { \mathcal W })(\rho ),\end{eqnarray} \tag{ A11 }$

where ${ \mathcal Q }$ is another Pauli channel.

Proof. From (A6) it follows that

$\begin{eqnarray}&&{ \mathcal W }\circ { \mathcal P }(\rho )=W\left(\sum _{{\boldsymbol{l}},{\boldsymbol{k}}}{p}_{{\boldsymbol{l}},{\boldsymbol{k}}}{X}^{{\boldsymbol{l}}}{Z}^{{\boldsymbol{k}}}\rho {Z}^{{\boldsymbol{k}}}{X}^{{\boldsymbol{l}}}\right){W}^{\dagger }=\sum _{{\boldsymbol{l}},{\boldsymbol{k}}}{p}_{{\boldsymbol{l}},{\boldsymbol{k}}}({{WX}}^{{\boldsymbol{l}}}{Z}^{{\boldsymbol{k}}}{W}^{\dagger })(W\rho {W}^{\dagger })({{WZ}}^{{\boldsymbol{k}}}{X}^{{\boldsymbol{l}}}{W}^{\dagger })\end{eqnarray} \tag{ A12 }$

$\begin{eqnarray}&&=\ \sum _{{\boldsymbol{l}},{\boldsymbol{k}}}{p}_{{\boldsymbol{l}},{\boldsymbol{k}}}{X}^{{\boldsymbol{m}}({\boldsymbol{l}},{\boldsymbol{k}})}{Z}^{{\boldsymbol{n}}({\boldsymbol{l}},{\boldsymbol{k}})}W\rho {W}^{\dagger }{Z}^{{\boldsymbol{n}}({\boldsymbol{l}},{\boldsymbol{k}})}{X}^{{\boldsymbol{m}}({\boldsymbol{l}},{\boldsymbol{k}})}\,\end{eqnarray} \tag{ A13 }$

$\begin{eqnarray}&&=\ ({ \mathcal Q }\ \circ \ { \mathcal W })(\rho ).\,\end{eqnarray} \tag{ A14 }$

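The third equality follows from the definition of a Clifford unitary (A3): $W{X}^{{\boldsymbol{l}}}{Z}^{{\boldsymbol{k}}}{W}^{\dagger }$ is an element of the Pauli group, i.e. a Pauli product ${X}^{{\boldsymbol{m}}({\boldsymbol{l}},{\boldsymbol{k}})}{Z}^{{\boldsymbol{n}}({\boldsymbol{l}},{\boldsymbol{k}})}$ up to a phase, and this phase cancels against its complex conjugate in $W{Z}^{{\boldsymbol{k}}}{X}^{{\boldsymbol{l}}}{W}^{\dagger }={(W{X}^{{\boldsymbol{l}}}{Z}^{{\boldsymbol{k}}}{W}^{\dagger })}^{\dagger }$. The last equality follows from (A6), with ${ \mathcal Q }$ the Pauli channel that assigns probability ${p}_{{\boldsymbol{l}},{\boldsymbol{k}}}$ to ${X}^{{\boldsymbol{m}}({\boldsymbol{l}},{\boldsymbol{k}})}{Z}^{{\boldsymbol{n}}({\boldsymbol{l}},{\boldsymbol{k}})}$; this is well defined because conjugation by $W$ permutes the Pauli products.□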
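Lemma 1 can be illustrated with $W$ the Hadamard gate: since $HXH=Z$, $HZH=X$ and $HYH=-Y$, pushing a single-qubit Pauli channel through $H$ simply swaps its X and Z probabilities. The sketch checks this on a random mixed state. It uses $\sigma_y$ in place of $XZ=-i\sigma_y$, which is harmless because the phase cancels in $K\rho K^{\dagger}$; the probabilities and helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)


def pauli_channel(rho, probs):
    """Single-qubit Pauli channel with probabilities (p_I, p_X, p_Y, p_Z)."""
    return sum(p * P @ rho @ P.conj().T for p, P in zip(probs, (I2, SX, SY, SZ)))


probs = rng.dirichlet(np.ones(4))
# H X H = Z, H Y H = -Y, H Z H = X: the channel Q of lemma 1 swaps p_X and p_Z.
q_probs = probs[[0, 3, 2, 1]]

# Random mixed state rho = A A^dagger / Tr(A A^dagger).
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = A @ A.conj().T
rho /= np.trace(rho)

lhs = H @ pauli_channel(rho, probs) @ H.conj().T  # (W o P)(rho)
rhs = pauli_channel(H @ rho @ H.conj().T, q_probs)  # (Q o W)(rho)
print(np.allclose(lhs, rhs))
```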

Appendix B.: Noisy entangling and disentangling gates in FUMC

For the proofs given in appendices D–G, we will make use of some properties of the noisy versions of entangling E and disentangling ${E}^{\dagger }$ gates that appear in FUMC. Hence, it is helpful to first state these properties in this appendix. Recall that, for Pauli gate noise acting during E or ${E}^{\dagger }$ , we assume that global Pauli channels act before and after each Hadamard, as well as before and after each CNOT. This noise model incorporates the case when there could be correlated Pauli noise acting on different qubits during E and ${E}^{\dagger }$ . We note that the noisy entangling gate is the same for both the HST and the LHST.

Let $E={E}^{{AB}}$ denote the ideal entangling gate, which can be split into a tensor product of two-qubit entangling gates ${E}^{{A}_{j}{B}_{j}}$ as

$\begin{eqnarray}&&{E}^{{AB}}={E}^{{A}_{1}{B}_{1}}\otimes {E}^{{A}_{2}{B}_{2}}\ \otimes \cdots \otimes \ {E}^{{A}_{n}{B}_{n}}=\underset{j=1}{\overset{n}{\displaystyle \bigotimes }}{E}^{{A}_{j}{B}_{j}}.\end{eqnarray} \tag{ B1 }$

Moreover, each ${E}^{{A}_{j}{B}_{j}}$ consists of a Hadamard gate acting on A_j followed by a CNOT gate acting on both A_j and B_j. In the quantum channel notation we write this as ${{ \mathcal E }}^{{A}_{j}{B}_{j}}={{ \mathcal C }}_{X}^{{A}_{j}{B}_{j}}\,\circ \,({{ \mathcal H }}^{{A}_{j}}\otimes {{ \mathcal I }}^{{B}_{j}}),$ where ${{ \mathcal H }}^{{A}_{j}}$ are the quantum channels that implement the Hadamard gates and ${{ \mathcal C }}_{X}^{{A}_{j}{B}_{j}}$ are the quantum channels that implement the CNOTs. The noisy version of ${{ \mathcal E }}^{{AB}}$ , which we denote by ${\widetilde{{ \mathcal E }}}^{{AB}}$ , is

$\begin{eqnarray}&&{\widetilde{{ \mathcal E }}}^{{AB}}:=\mathop{\bigcirc }\limits_{j=1}^{n}\left[{{ \mathcal R }}_{j}^{{AB}}\,\circ \,({{ \mathcal C }}_{X}^{{A}_{j}{B}_{j}}\otimes {{ \mathcal I }}^{{\overline{A}}_{j}{\overline{B}}_{j}})\,\circ \,{{ \mathcal Q }}_{j}^{{AB}}\,\circ \,({{ \mathcal H }}^{{A}_{j}}\otimes {{ \mathcal I }}^{{B}_{j}}\otimes {{ \mathcal I }}^{{\overline{A}}_{j}{\overline{B}}_{j}})\,\circ \,{{ \mathcal P }}_{j}^{{AB}}\right],\end{eqnarray} \tag{ B2 }$

where $\bigcirc_{j=1}^{n}$ denotes the sequential composition of the bracketed channels over $j=1,\ldots ,n$, and ${{ \mathcal P }}_{j}^{{AB}}$, ${{ \mathcal Q }}_{j}^{{AB}}$, and ${{ \mathcal R }}_{j}^{{AB}}$ are $2n$-qubit global Pauli channels for all $j\in \{1,\ldots ,n\}$, as defined in (A8). Since both Hadamard and CNOT gates are Clifford unitaries, by using lemma 1 we find that

$\begin{eqnarray}&&{\widetilde{{ \mathcal E }}}^{{AB}}={{ \mathcal M }}^{{AB}}\,\circ \,\underset{j=1}{\overset{n}{\displaystyle \bigotimes }}{{ \mathcal C }}_{X}^{{A}_{j}{B}_{j}}\,\circ \,\underset{j=1}{\overset{n}{\displaystyle \bigotimes }}({{ \mathcal H }}^{{A}_{j}}\otimes {{ \mathcal I }}^{{B}_{j}}),\end{eqnarray} \tag{ B3 }$

where ${{ \mathcal M }}^{{AB}}$ is another Pauli channel.

We now apply ${\widetilde{{ \mathcal E }}}^{{AB}}$ on the all-zeros state $| {\bf{0}},{\bf{0}}\rangle \langle {\bf{0}},{\bf{0}}{| }^{{AB}}$ . Consider the following chain of equalities:

$\begin{eqnarray}&&{\widetilde{{ \mathcal E }}}^{{AB}}(| {\bf{0}},{\bf{0}}\rangle \langle {\bf{0}},{\bf{0}}{| }^{{AB}})={\widetilde{{ \mathcal E }}}^{{AB}}\left(\displaystyle \frac{1}{{2}^{2n}}\sum _{{\boldsymbol{a}},{\boldsymbol{b}}}{Z}_{A}^{{\boldsymbol{a}}}\otimes {Z}_{B}^{{\boldsymbol{b}}}\right)=\displaystyle \frac{1}{{2}^{2n}}\sum _{{\boldsymbol{a}},{\boldsymbol{b}}}{m}_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}}{X}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}\otimes {X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}},\end{eqnarray} \tag{ B4 }$

where we used (A5), (A8), and the following identities for all $j\in \{1,\ldots ,n\}$ :

$\begin{eqnarray}\begin{array}{rcl}({{ \mathcal H }}^{{A}_{j}}\otimes {{ \mathcal I }}^{{B}_{j}})({Z}_{{A}_{j}}^{{a}_{j}}\otimes {Z}_{{B}_{j}}^{{b}_{j}}) & = & {X}_{{A}_{j}}^{{a}_{j}}\otimes {Z}_{{B}_{j}}^{{b}_{j}},\\ {{ \mathcal C }}_{X}^{{A}_{j}{B}_{j}}({X}_{{A}_{j}}^{{a}_{j}}\otimes {{\mathbb{1}}}_{{B}_{j}}) & = & {X}_{{A}_{j}}^{{a}_{j}}\otimes {X}_{{B}_{j}}^{{a}_{j}},\\ {{ \mathcal C }}_{X}^{{A}_{j}{B}_{j}}({{\mathbb{1}}}_{{A}_{j}}\otimes {Z}_{{B}_{j}}^{{b}_{j}}) & = & {Z}_{{A}_{j}}^{{b}_{j}}\otimes {Z}_{{B}_{j}}^{{b}_{j}}.\end{array}\end{eqnarray} \tag{ B5 }$
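For a single pair ($n=1$), the identities (B5) and the gate-noise-free special case of (B4) can be checked directly with $4\times 4$ matrices. The sketch below assumes the ordering $A_j\otimes B_j$ and the standard CNOT matrix with $A_j$ as control. It takes the noiseless case to mean that every coefficient $m$ equals 1, in which case (B4) reduces to the Bell-state projector.

```python
import itertools
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
# CNOT with control A_j (first tensor factor) and target B_j (second factor).
CX = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]], dtype=float)
E = CX @ np.kron(H, I2)                  # Hadamard on A_j, then CNOT
mp = np.linalg.matrix_power              # mp(P, 0) is the identity


def conj(U, O):
    """Unitary channel O -> U O U^dagger."""
    return U @ O @ U.conj().T


checks = []
for a, b in itertools.product((0, 1), repeat=2):
    # First identity in (B5): the Hadamard maps Z^a to X^a on A_j.
    checks.append(np.allclose(conj(np.kron(H, I2), np.kron(mp(Z, a), mp(Z, b))),
                              np.kron(mp(X, a), mp(Z, b))))
    # Second and third identities: CNOT copies X forward and Z backward.
    checks.append(np.allclose(conj(CX, np.kron(mp(X, a), I2)),
                              np.kron(mp(X, a), mp(X, a))))
    checks.append(np.allclose(conj(CX, np.kron(I2, mp(Z, b))),
                              np.kron(mp(Z, b), mp(Z, b))))

# Noiseless special case of (B4) for n = 1 (every m coefficient equal to 1).
ket00 = np.zeros((4, 1))
ket00[0, 0] = 1.0
lhs = conj(E, ket00 @ ket00.T)
rhs = sum(np.kron(mp(X, a) @ mp(Z, b), mp(X, a) @ mp(Z, b))
          for a, b in itertools.product((0, 1), repeat=2)) / 4
print(all(checks), np.allclose(lhs, rhs))
```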

The noisy disentangling channel for the HST is the adjoint of the noisy entangling channel defined in (B2). In the LHST, by contrast, only the two qubits ${A}_{j}{B}_{j}$ are measured in a given run of the experiment, so the disentangling channel is applied only to the ${A}_{j}{B}_{j}$ pair. We nevertheless assume that global Pauli channels act on all $2n$ qubits before and after the Hadamard and CNOT gates. For each j ∈ {1, ..., n}, the disentangling channel is given by the adjoint of the following channel:

$\begin{eqnarray}&&{\widetilde{{ \mathcal E }}}_{j}^{{\prime} {AB}}:={{ \mathcal R }}_{j}^{{AB}}\,\circ \,({{ \mathcal C }}_{X}^{{A}_{j}{B}_{j}}\otimes {{ \mathcal I }}^{{\overline{A}}_{j}{\overline{B}}_{j}})\,\circ \,{{ \mathcal Q }}_{j}^{{AB}}\,\circ \,({{ \mathcal H }}^{{A}_{j}}\otimes {{ \mathcal I }}^{{B}_{j}}\otimes {{ \mathcal I }}^{{\overline{A}}_{j}{\overline{B}}_{j}})\,\circ \,{{ \mathcal P }}_{j}^{{AB}},\end{eqnarray} \tag{ B6 }$

$\begin{eqnarray}&&=\,{{ \mathcal M }}_{j}^{{AB}}\,\circ \,({{ \mathcal C }}_{X}^{{A}_{j}{B}_{j}}\otimes {{ \mathcal I }}^{{\overline{A}}_{j}{\overline{B}}_{j}})\,\circ \,({{ \mathcal H }}^{{A}_{j}}\otimes {{ \mathcal I }}^{{B}_{j}}\otimes {{ \mathcal I }}^{{\overline{A}}_{j}{\overline{B}}_{j}}),\end{eqnarray} \tag{ B7 }$

where ${{ \mathcal P }}_{j}^{{AB}}$ , ${{ \mathcal Q }}_{j}^{{AB}}$ , ${{ \mathcal R }}_{j}^{{AB}}$ , and ${{ \mathcal M }}_{j}^{{AB}}$ are $2n$ -qubit global Pauli channels, as defined in (A8), and we used lemma 1. The Pauli channels carry the subscript j in (B6) and (B7) to emphasize that different Pauli channels may act in different runs of the experiment.

From arguments similar to those used to derive (B4), we find that

$\begin{eqnarray}&&{\widetilde{{ \mathcal E }}}_{j}^{{\prime} {AB}}(| 0,0\rangle \langle 0,0{| }^{{A}_{j}{B}_{j}}\otimes {{\mathbb{1}}}_{{\overline{A}}_{j}{\overline{B}}_{j}})=\displaystyle \frac{1}{{2}^{2}}\sum _{{a}_{j},{b}_{j}=0}^{1}{m}_{{a}_{j},{a}_{j},{b}_{j},{b}_{j}}^{{AB}}({X}_{{A}_{j}}^{{a}_{j}}{Z}_{{A}_{j}}^{{b}_{j}}\otimes {X}_{{B}_{j}}^{{a}_{j}}{Z}_{{B}_{j}}^{{b}_{j}}\otimes {{\mathbb{1}}}_{{\overline{A}}_{j}{\overline{B}}_{j}}).\end{eqnarray} \tag{ B8 }$

Appendix C.: Measurement noise in FUMC

The proofs in appendices D–G use several properties of measurement noise in FUMC, which we collect in this appendix.

Let ${P}_{{\bf{0}}}$ denote the POVM element associated with obtaining the all-zeros outcome in the noiseless HST, ${P}_{{\bf{0}}}:=| {\bf{0}}\rangle \langle {\bf{0}}| ={\displaystyle \bigotimes }_{j=1}^{2n}| 0\rangle \langle 0| .$ We model measurement noise as follows. For each qubit j, where j ∈ {1, ..., 2n}, the ideal projector $| 0\rangle \langle 0|$ is replaced by ${p}_{00}^{\left(j\right)}| 0\rangle \langle 0| +{p}_{01}^{\left(j\right)}| 1\rangle \langle 1|$ . We further assume that the strict inequality ${p}_{00}^{\left(j\right)}\gt {p}_{01}^{\left(j\right)}$ holds for all j. Below, the $2n$ qubits are labeled ${A}_{1},\ldots ,{A}_{n},{B}_{1},\ldots ,{B}_{n}$ , and the corresponding parameters are written ${p}_{00}^{{A}_{j}}$ , ${p}_{01}^{{A}_{j}}$ , and so on.

Let ${\widetilde{P}}_{{\bf{0}}}$ denote the noisy POVM element. Then the following equalities hold:

$\begin{eqnarray}&&{\widetilde{P}}_{{\bf{0}}}=\underset{j=1}{\overset{n}{\displaystyle \bigotimes }}\left({p}_{00}^{{A}_{j}}| 0\rangle \langle 0{| }^{{A}_{j}}+{p}_{01}^{{A}_{j}}| 1\rangle \langle 1{| }^{{A}_{j}}\right)\otimes \underset{j=1}{\overset{n}{\displaystyle \bigotimes }}\left({p}_{00}^{{B}_{j}}| 0\rangle \langle 0{| }^{{B}_{j}}+{p}_{01}^{{B}_{j}}| 1\rangle \langle 1{| }^{{B}_{j}}\right)\end{eqnarray} \tag{ C1 }$

$\begin{eqnarray}&&=\sum _{{\boldsymbol{a}},{\boldsymbol{b}}}{p}^{A}({\boldsymbol{a}}){p}^{B}({\boldsymbol{b}})| {\boldsymbol{a}},{\boldsymbol{b}}\rangle \langle {\boldsymbol{a}},{\boldsymbol{b}}{| }^{{AB}},\,\end{eqnarray} \tag{ C2 }$

with ${p}^{A}({\boldsymbol{a}})={\left({p}_{01}^{{A}_{1}}\right)}^{{a}_{1}}\cdots {\left({p}_{01}^{{A}_{n}}\right)}^{{a}_{n}}{\left({p}_{00}^{{A}_{1}}\right)}^{1-{a}_{1}}\cdots {\left({p}_{00}^{{A}_{n}}\right)}^{1-{a}_{n}}$ and ${p}^{B}({\boldsymbol{b}})={\left({p}_{01}^{{B}_{1}}\right)}^{{b}_{1}}\cdots {\left({p}_{01}^{{B}_{n}}\right)}^{{b}_{n}}{\left({p}_{00}^{{B}_{1}}\right)}^{1-{b}_{1}}\cdots {\left({p}_{00}^{{B}_{n}}\right)}^{1-{b}_{n}}$ .
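A short numerical check of (C1) and (C2) for $n=2$ (four qubits): the tensor product of single-qubit noisy POVM elements is diagonal, with the product distribution ${p}^{A}({\boldsymbol{a}}){p}^{B}({\boldsymbol{b}})$ on the diagonal. The readout parameters are randomly drawn placeholders satisfying ${p}_{00}\gt {p}_{01}$ , and the qubit ordering ${A}_{1},\ldots ,{A}_{n},{B}_{1},\ldots ,{B}_{n}$ follows (C1).

```python
import itertools
import numpy as np

n = 2                                     # pairs A_j B_j, so 2n = 4 qubits
rng = np.random.default_rng(1)
# Hypothetical readout parameters for qubits A_1..A_n, B_1..B_n (p00 > p01 by construction).
p00 = rng.uniform(0.80, 1.00, size=2 * n)
p01 = rng.uniform(0.00, 0.20, size=2 * n)

proj0 = np.diag([1.0, 0.0])               # |0><0|
proj1 = np.diag([0.0, 1.0])               # |1><1|

# (C1): tensor product of the single-qubit noisy POVM elements, A qubits first.
P0_tilde = np.array([[1.0]])
for j in range(2 * n):
    P0_tilde = np.kron(P0_tilde, p00[j] * proj0 + p01[j] * proj1)


def prob(bits, q00, q01):
    """p(bits) = prod_j q01[j]**bits[j] * q00[j]**(1 - bits[j]), as stated below (C2)."""
    return np.prod([q01[j] if x else q00[j] for j, x in enumerate(bits)])


# (C2): sum over bit strings a, b of p^A(a) p^B(b) |a,b><a,b|.
C2 = np.zeros((4 ** n, 4 ** n))
for bits in itertools.product((0, 1), repeat=2 * n):
    a, b = bits[:n], bits[n:]
    idx = int("".join(map(str, bits)), 2)     # basis index of |a,b> (first qubit most significant)
    C2[idx, idx] = prob(a, p00[:n], p01[:n]) * prob(b, p00[n:], p01[n:])

print(np.allclose(P0_tilde, C2))
```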

C.1. Effective noisy measurement operator for the HST

In the noiseless HST, the measurement is preceded by the disentangling unitary ${\left({E}^{{AB}}\right)}^{\dagger }$ , where E^AB is defined in (B1). In the Heisenberg picture, this corresponds to evolving the measurement operator under the unitary E^AB. We now derive the effective noisy POVM element as the evolution of ${\widetilde{P}}_{{\bf{0}}}$ under the noisy entangling channel ${\widetilde{{ \mathcal E }}}^{{AB}}$ defined in appendix B.

Using (A5), $| {\boldsymbol{a}},{\boldsymbol{b}}\rangle \langle {\boldsymbol{a}},{\boldsymbol{b}}{| }^{{AB}}$ can be expressed as follows:

$\begin{eqnarray}\begin{array}{rcl}| {\boldsymbol{a}},{\boldsymbol{b}}\rangle \langle {\boldsymbol{a}},{\boldsymbol{b}}{| }^{{AB}} & = & ({X}_{A}^{{\boldsymbol{a}}}\otimes {X}_{B}^{{\boldsymbol{b}}})\left(\displaystyle \frac{1}{{2}^{2n}}\sum _{{\boldsymbol{l}},{\boldsymbol{k}}}{Z}_{A}^{{\boldsymbol{l}}}\otimes {Z}_{B}^{{\boldsymbol{k}}}\right)({X}_{A}^{{\boldsymbol{a}}}\otimes {X}_{B}^{{\boldsymbol{b}}})\\ & = & \displaystyle \frac{1}{{2}^{2n}}\sum _{{\boldsymbol{l}},{\boldsymbol{k}}}{\left(-1\right)}^{{\boldsymbol{a}}\cdot {\boldsymbol{l}}}{\left(-1\right)}^{{\boldsymbol{b}}\cdot {\boldsymbol{k}}}{Z}_{A}^{{\boldsymbol{l}}}\otimes {Z}_{B}^{{\boldsymbol{k}}},\end{array}\end{eqnarray} \tag{ C3 }$

where we used the properties of the Pauli operators as defined in (A2). Then, from (B4) and the linearity of quantum channels, it follows that

$\begin{eqnarray}&&{\widetilde{{ \mathcal E }}}^{{AB}}(| {\boldsymbol{a}},{\boldsymbol{b}}\rangle \langle {\boldsymbol{a}},{\boldsymbol{b}}{| }^{{AB}})=\displaystyle \frac{1}{{2}^{2n}}\sum _{{\boldsymbol{l}},{\boldsymbol{k}}}{m}_{{\boldsymbol{l}},{\boldsymbol{l}},{\boldsymbol{k}},{\boldsymbol{k}}}^{{AB}}{\left(-1\right)}^{{\boldsymbol{a}}\cdot {\boldsymbol{l}}}{\left(-1\right)}^{{\boldsymbol{b}}\cdot {\boldsymbol{k}}}{X}_{A}^{{\boldsymbol{l}}}{Z}_{A}^{{\boldsymbol{k}}}\otimes {X}_{B}^{{\boldsymbol{l}}}{Z}_{B}^{{\boldsymbol{k}}}.\end{eqnarray} \tag{ C4 }$
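The expansion (C3) of a computational-basis projector in $Z$-strings can be verified by brute force for small $n$. In the sketch below the bit strings ${\boldsymbol{a}},{\boldsymbol{b}}$ and ${\boldsymbol{l}},{\boldsymbol{k}}$ are concatenated, so that ${\boldsymbol{a}}\cdot {\boldsymbol{l}}+{\boldsymbol{b}}\cdot {\boldsymbol{k}}$ becomes a single dot product; $n=2$ is an arbitrary choice.

```python
import itertools
from functools import reduce

import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])


def z_string(bits):
    """Z^bits = Z^{bits_1} (x) ... (x) Z^{bits_N}."""
    return reduce(np.kron, [Z if x else I2 for x in bits])


n = 2
N = 2 * n                                  # qubits A_1..A_n, B_1..B_n
all_ok = True
for ab in itertools.product((0, 1), repeat=N):      # ab = (a, b) concatenated
    proj = np.zeros((2 ** N, 2 ** N))
    idx = int("".join(map(str, ab)), 2)
    proj[idx, idx] = 1.0                               # |a,b><a,b|
    # Right-hand side of (C3), with lk = (l, k) so that a.l + b.k = ab . lk.
    expansion = sum((-1) ** int(np.dot(ab, lk)) * z_string(lk)
                    for lk in itertools.product((0, 1), repeat=N)) / 2 ** N
    all_ok = all_ok and np.allclose(proj, expansion)
print(all_ok)
```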

Therefore, from (C2) and (C4) it follows that

$\begin{eqnarray}&&{\widetilde{{ \mathcal E }}}^{{AB}}({\widetilde{P}}_{{\bf{0}}})=\displaystyle \frac{1}{{2}^{2n}}\sum _{{\boldsymbol{a}},{\boldsymbol{b}}}{m}_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}}{\widehat{p}}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{A}{Z}_{A}^{{\boldsymbol{b}}}{X}_{A}^{{\boldsymbol{a}}}\otimes {Z}_{B}^{{\boldsymbol{b}}}{X}_{B}^{{\boldsymbol{a}}},\end{eqnarray} \tag{ C5 }$

where ${\widehat{p}}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{A}={\sum }_{{\boldsymbol{l}},{\boldsymbol{k}}}{\left(-1\right)}^{{\boldsymbol{a}}\cdot {\boldsymbol{l}}}{\left(-1\right)}^{{\boldsymbol{b}}\cdot {\boldsymbol{k}}}{p}^{A}({\boldsymbol{l}}){p}^{B}({\boldsymbol{k}})$ , and ${p}^{A}({\boldsymbol{l}})$ and ${p}^{B}({\boldsymbol{k}})$ are probability distributions as in (C2).
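To see (C5) concretely, one can switch off the gate noise (all $m$ coefficients equal to 1, an assumption made only for this check) while keeping the readout noise, and compare ${E}^{{AB}}{\widetilde{P}}_{{\bf{0}}}{\left({E}^{{AB}}\right)}^{\dagger }$ with the right-hand side of (C5) for $n=1$. The readout parameters below are hypothetical.

```python
import itertools
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
CX = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]], dtype=float)
E = CX @ np.kron(H, I2)                    # ideal E^{AB} for n = 1
mp = np.linalg.matrix_power

# Hypothetical readout parameters: p(0) = p00, p(1) = p01, with p00 > p01.
pA = {0: 0.95, 1: 0.03}
pB = {0: 0.90, 1: 0.08}

# (C2) for n = 1: diagonal noisy POVM element.
P0_tilde = np.kron(np.diag([pA[0], pA[1]]), np.diag([pB[0], pB[1]]))


def p_hat(a, b):
    """Coefficient defined below (C5): sum_{l,k} (-1)^{a l + b k} p^A(l) p^B(k)."""
    return sum((-1) ** (a * l + b * k) * pA[l] * pB[k]
               for l, k in itertools.product((0, 1), repeat=2))


lhs = E @ P0_tilde @ E.T                   # E is real, so E.T is its adjoint
rhs = sum(p_hat(a, b) * np.kron(mp(Z, b) @ mp(X, a), mp(Z, b) @ mp(X, a))
          for a, b in itertools.product((0, 1), repeat=2)) / 4
print(np.allclose(lhs, rhs))
```

The ordering ${Z}^{{\boldsymbol{b}}}{X}^{{\boldsymbol{a}}}$ in (C5) and ${X}^{{\boldsymbol{a}}}{Z}^{{\boldsymbol{b}}}$ in (C4) give the same operator here, because the sign $(-1)^{{\boldsymbol{a}}\cdot {\boldsymbol{b}}}$ from reordering appears once on $A$ and once on $B$ and cancels.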

C.2. Effective noisy measurement operator for the LHST

In the LHST, a noisy measurement on the two qubits ${A}_{j}{B}_{j}$ is preceded by the disentangling unitary ${\left({E}^{{A}_{j}{B}_{j}}\right)}^{\dagger }$ acting on the same two qubits. As in section C.1, we now derive the effective POVM element as the evolution of the noisy POVM element ${\widetilde{Q}}_{00}^{\left(j\right)}$ (defined below) under the adjoint of the noisy disentangling channel, i.e. under the channel defined in (B7). The noisy POVM element for the qubits ${A}_{j}{B}_{j}$ is given by

$\begin{eqnarray}&&{\widetilde{Q}}_{00}^{\left(j\right)}=\sum _{a^{\prime} ,b^{\prime} =0}^{1}{p}^{{A}_{j}}(a^{\prime} ){p}^{{B}_{j}}(b^{\prime} )| a^{\prime} ,b^{\prime} \rangle \langle a^{\prime} ,b^{\prime} {| }^{{A}_{j}{B}_{j}},\end{eqnarray} \tag{ C6 }$

which follows from (C2). Moreover, the overall noisy POVM for the LHST is defined as

$\begin{eqnarray}&&{\widetilde{Q}}_{00}=\displaystyle \frac{1}{n}\sum _{j=1}^{n}{\widetilde{Q}}_{00}^{\left(j\right)}\otimes {{\mathbb{1}}}_{{\overline{A}}_{j}{\overline{B}}_{j}}.\end{eqnarray} \tag{ C7 }$

By using arguments similar to those used in (C3), (C4), and (C5), we find that

$\begin{eqnarray}&&{\widetilde{{ \mathcal E }}}_{j}^{{\prime} {AB}}({\widetilde{Q}}_{00}^{\left(j\right)}\otimes {{\mathbb{1}}}_{{\overline{A}}_{j}{\overline{B}}_{j}})=\displaystyle \frac{1}{{2}^{2}}\sum _{{a}_{j},{b}_{j}}{m}_{{a}_{j},{a}_{j},{b}_{j},{b}_{j}}^{{AB}}{\widehat{p}}_{{a}_{j},{b}_{j}}^{{A}_{j}}{Z}_{{A}_{j}}^{{b}_{j}}{X}_{{A}_{j}}^{{a}_{j}}\otimes {Z}_{{B}_{j}}^{{b}_{j}}{X}_{{B}_{j}}^{{a}_{j}}\otimes {{\mathbb{1}}}_{{\overline{A}}_{j}{\overline{B}}_{j}},\end{eqnarray} \tag{ C8 }$

where ${\widetilde{{ \mathcal E }}}_{j}^{{\prime} {AB}}$ is given by (B7) and ${\widehat{p}}_{{a}_{j},{b}_{j}}^{{A}_{j}}={\sum }_{a^{\prime} ,b^{\prime} =0}^{1}{\left(-1\right)}^{{a}_{j}\cdot a^{\prime} }{\left(-1\right)}^{{b}_{j}\cdot b^{\prime} }{p}^{{A}_{j}}(a^{\prime} ){p}^{{B}_{j}}(b^{\prime} )$ .
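Because ${p}^{{A}_{j}}(a^{\prime} ){p}^{{B}_{j}}(b^{\prime} )$ is a product, the coefficient defined above factorizes as ${\widehat{p}}_{{a}_{j},{b}_{j}}^{{A}_{j}}=({p}_{00}^{{A}_{j}}+{(-1)}^{{a}_{j}}{p}_{01}^{{A}_{j}})({p}_{00}^{{B}_{j}}+{(-1)}^{{b}_{j}}{p}_{01}^{{B}_{j}})$ , using ${p}^{{A}_{j}}(0)={p}_{00}^{{A}_{j}}$ and ${p}^{{A}_{j}}(1)={p}_{01}^{{A}_{j}}$ from (C2). Under the strict inequality ${p}_{00}\gt {p}_{01}$ assumed at the start of this appendix, every such coefficient is therefore strictly positive. The sketch below checks the factorization for hypothetical readout parameters.

```python
import itertools


def p_hat_pair(a, b, pA, pB):
    """Definition below (C8): sum over a', b' of (-1)^(a a' + b b') p^{A_j}(a') p^{B_j}(b')."""
    return sum((-1) ** (a * a2 + b * b2) * pA[a2] * pB[b2]
               for a2, b2 in itertools.product((0, 1), repeat=2))


# Hypothetical readout parameters: index 0 holds p00 = p(0), index 1 holds p01 = p(1).
pA = (0.93, 0.05)
pB = (0.88, 0.10)

results = {}
for a, b in itertools.product((0, 1), repeat=2):
    direct = p_hat_pair(a, b, pA, pB)
    # The double sum factorizes into single-qubit terms p00 +/- p01.
    factored = (pA[0] + (-1) ** a * pA[1]) * (pB[0] + (-1) ** b * pB[1])
    results[(a, b)] = (direct, factored)
    print(a, b, round(direct, 6), round(factored, 6))
```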

Therefore, the overall effective noisy POVM element for the LHST is given by

$\begin{eqnarray}&&{\widetilde{{ \mathcal E }}}^{{\prime} {AB}}({\widetilde{Q}}_{00})=\displaystyle \frac{1}{{2}^{2}}\displaystyle \frac{1}{n}\sum _{j=1}^{n}\sum _{{a}_{j},{b}_{j}=0}^{1}{m}_{{a}_{j},{a}_{j},{b}_{j},{b}_{j}}^{{AB}}{\widehat{p}}_{{a}_{j},{b}_{j}}^{{A}_{j}}{Z}_{{A}_{j}}^{{b}_{j}}{X}_{{A}_{j}}^{{a}_{j}}\otimes {Z}_{{B}_{j}}^{{b}_{j}}{X}_{{B}_{j}}^{{a}_{j}}\otimes {{\mathbb{1}}}_{{\overline{A}}_{j}{\overline{B}}_{j}}.\end{eqnarray} \tag{ C9 }$

Appendix D.: Proof of theorem 1

Before providing a proof of theorem 1, we prove the following lemma.

Lemma 2. Let ${C}_{{\mathsf{QC}}}(V)$ be a cost function of $V$ with $V\in {{\mathbb{V}}}_{d}$ , and ${{\mathbb{V}}}_{d}$ the set of $d\times d$ unitary matrices. Additionally suppose that ${C}_{{\mathsf{QC}}}(V)$ can be evaluated using a quantum circuit denoted ${\mathsf{QC}}$ as follows:

$\begin{eqnarray}&&{C}_{{\mathsf{QC}}}(V)\ :=\mathrm{Tr}[{\rm{\Lambda }}{{ \mathcal E }}_{V}(\rho )],\end{eqnarray} \tag{ D1 }$

where $\rho$ is a quantum state, ${\rm{\Lambda }}$ is a POVM element, and ${{ \mathcal E }}_{V}$ is the noisy unital quantum channel, depending on the unitary $V$ , that describes the evolution of the state throughout the computation. Then the noisy cost function ${\widetilde{C}}_{{\mathsf{QC}}}(V)$ exhibits strong-OPR to a noise model composed of ${{ \mathcal E }}_{V}$ and global depolarizing channels acting continuously throughout the computation.

Proof. Without loss of generality, let us decompose ${{ \mathcal E }}_{V}$ into k noisy unital quantum channels, ${{ \mathcal E }}_{V}={{ \mathcal E }}_{V}^{k}\,\circ \ldots \circ \,{{ \mathcal E }}_{V}^{1}$ . In the presence of global depolarizing noise acting throughout the computation, the cost function can then be expressed as

$\begin{eqnarray}&&{\widetilde{C}}_{{\mathsf{QC}}}(V)=\mathrm{Tr}\left[{\rm{\Lambda }}({{ \mathcal D }}^{k+1}\,\circ \,{{ \mathcal E }}_{V}^{k}\,\circ \ldots \circ \,{{ \mathcal D }}^{2}\,\circ \,{{ \mathcal E }}_{V}^{1}\,\circ \,{{ \mathcal D }}^{1})(\rho )\right],\end{eqnarray} \tag{ D2 }$

where we have interleaved the channels ${{ \mathcal E }}_{V}^{i}$ with global depolarizing channels ${{ \mathcal D }}^{i}$ . From definition 1 and from the fact that ${{ \mathcal E }}_{V}^{i}({\mathbb{1}})={\mathbb{1}}$ , it follows that

$\begin{eqnarray}&&{\widetilde{C}}_{{\mathsf{QC}}}(V)=\mathrm{Tr}\left[{\rm{\Lambda }}({{ \mathcal D }}^{k+1}\circ {{ \mathcal E }}_{V}^{k}\,\circ \ldots \circ \,{{ \mathcal D }}^{2}\,\circ \,{{ \mathcal E }}_{V}^{1}\circ {{ \mathcal D }}^{1})(\rho )\right]=p\mathrm{Tr}\left[{\rm{\Lambda }}({{ \mathcal E }}_{V}^{k}\,\circ \ldots \circ \,{{ \mathcal E }}_{V}^{2}\,\circ \,{{ \mathcal E }}_{V}^{1})(\rho )\right]+(1-p)\mathrm{Tr}\left[{\rm{\Lambda }}{\mathbb{1}}\right]/{2}^{n}\end{eqnarray} \tag{ D3 }$

$\begin{eqnarray}&&=\,{{pC}}_{{\mathsf{QC}}}(V)+(1-p)\mathrm{Tr}\left[{\rm{\Lambda }}\right]/{2}^{n},\end{eqnarray} \tag{ D4 }$

where $p={p}_{k+1}\ldots {p}_{1}$ . Let ${{\mathbb{V}}}_{d}^{{\rm{opt}}}$ denote the set of unitaries that optimize ${C}_{{\mathsf{QC}}}(V)$ , i.e.,

$\begin{eqnarray}&&{{\mathbb{V}}}_{d}^{{\rm{opt}}}=\{V^{\prime} \in {{\mathbb{V}}}_{d}:{C}_{{\mathsf{QC}}}(V^{\prime} )=\mathop{\min }\limits_{V\in {{\mathbb{V}}}_{d}}{C}_{{\mathsf{QC}}}(V)\}.\end{eqnarray} \tag{ D5 }$

Then, from (D4), since the additive term is independent of $V$ and $p\gt 0$ (i.e. none of the depolarizing channels is completely depolarizing), ${\widetilde{C}}_{{\mathsf{QC}}}(V)$ is minimized by exactly the unitaries in ${{\mathbb{V}}}_{d}^{{\rm{opt}}}$ . Hence ${\widetilde{C}}_{{\mathsf{QC}}}(V)$ exhibits strong-OPR to a noise model composed of ${{ \mathcal E }}_{V}$ and global depolarizing channels acting throughout the computation.□
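The identity (D3) can be checked numerically. The sketch below takes the unital channels ${{ \mathcal E }}_{V}^{i}$ to be random unitary channels on two qubits (a special case of unital channels) and assumes the global depolarizing channel has the form $\rho \mapsto p\rho +(1-p)\mathrm{Tr}[\rho ]{\mathbb{1}}/{2}^{n}$ , consistent with how it enters (D3) and (D7). The depolarizing parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4                                      # dimension, i.e. 2^n with n = 2 qubits


def random_unitary(dim):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)))
    return q * (np.diag(r) / np.abs(np.diag(r)))


def depolarize(rho, p):
    """Global depolarizing channel rho -> p rho + (1 - p) Tr[rho] 1/d."""
    return p * rho + (1 - p) * np.trace(rho) * np.eye(d) / d


k = 3
Us = [random_unitary(d) for _ in range(k)]     # unital channels E_V^i (unitary here)
ps = rng.uniform(0.7, 1.0, size=k + 1)         # hypothetical p_1, ..., p_{k+1}
rho = np.zeros((d, d), dtype=complex)
rho[0, 0] = 1.0
Lam = np.zeros((d, d))
Lam[0, 0] = 1.0                                # POVM element |0><0|

# Noiseless cost Tr[Lam E_V(rho)].
sigma = rho
for U in Us:
    sigma = U @ sigma @ U.conj().T
C = np.trace(Lam @ sigma).real

# Noisy cost (D2): D^{k+1} o E^k o ... o D^2 o E^1 o D^1 applied to rho.
sigma = depolarize(rho, ps[0])
for U, p_i in zip(Us, ps[1:]):
    sigma = depolarize(U @ sigma @ U.conj().T, p_i)
C_noisy = np.trace(Lam @ sigma).real

p = np.prod(ps)
print(np.isclose(C_noisy, p * C + (1 - p) * np.trace(Lam) / d))   # relation (D3)
```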

By lemma 2, if a quantity exhibits OPR to a noise model ${ \mathcal N }$ that does not include global depolarizing noise acting continuously throughout the computation, then it also exhibits OPR when such global depolarizing noise is added to ${ \mathcal N }$ .

We now provide a proof for theorem 1.

Theorem 1. The cost functions ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ exhibit strong-OPR to Noise Model 1 in definition 7.

Proof. We begin by dividing the HST circuit into three time intervals. In the first time interval, the noisy entangling channel ${\widetilde{{ \mathcal E }}}^{{AB}}$ is applied. In the second, the quantum channel ${{ \mathcal V }}^{\dagger }\,\circ \,{ \mathcal U }$ implements the unitaries U and ${V}^{\dagger }$ . In the third, ${\left({\widetilde{{ \mathcal E }}}^{{AB}}\right)}^{\dagger }$ is applied. We assume that global depolarizing noise acts on systems AB during all three time intervals and that additional global depolarizing noise acts on system A during the implementation of ${{ \mathcal V }}^{\dagger }\,\circ \,{ \mathcal U }$ . Moreover, suppose that two different global Pauli channels ${{ \mathcal Q }}^{{AB}}$ and ${\widehat{{ \mathcal Q }}}^{{AB}}$ act at times τ₁ and τ₂, respectively, and that global non-unital Pauli channels act continuously on system B between τ₁ and τ₂.

Let ${\rho }^{\left(0\right)}=| {\bf{0}},{\bf{0}}\rangle \langle {\bf{0}},{\bf{0}}{| }^{{AB}}$ denote the initial state of the HST circuit. At τ₁ the state is

$\begin{eqnarray}&&{\rho }^{\left(1\right)}={{ \mathcal Q }}^{{AB}}({{ \mathcal D }}_{{p}^{\left(1,k\right)}}^{{AB}}\,\circ \,{\widetilde{{ \mathcal E }}}_{k}^{{AB}}...{{ \mathcal D }}_{{p}^{\left(\mathrm{1,1}\right)}}^{{AB}}\,\circ \,{\widetilde{{ \mathcal E }}}_{1}^{{AB}}({\rho }^{\left(0\right)})),\end{eqnarray} \tag{ D6 }$

where we have divided the first time interval, which ends at τ₁, into k time increments, and ${\widetilde{{ \mathcal E }}}_{k}^{{AB}}\,\circ \,...{\widetilde{{ \mathcal E }}}_{1}^{{AB}}$ is the channel that implements the noisy entangling channel ${\widetilde{{ \mathcal E }}}^{{AB}}$ defined in (B2). Each ${\widetilde{{ \mathcal E }}}_{i}^{{AB}}$ is followed by a global depolarizing channel ${{ \mathcal D }}_{{p}^{\left(1,i\right)}}^{{AB}}$ , where ${p}^{\left(r,s\right)}$ denotes the depolarizing probability for the sth time increment of the rth time interval. Then ${\rho }^{\left(1\right)}$ reduces to

$\begin{eqnarray}&&{\rho }^{\left(1\right)}={{ \mathcal Q }}^{{AB}}\left({{ \mathcal D }}_{{p}^{\left(1,k\right)}}^{{AB}}\,\circ \,{\widetilde{{ \mathcal E }}}_{k}^{{AB}}\,\circ \cdots \circ \,{\widetilde{{ \mathcal E }}}_{2}^{{AB}}\left({p}^{\left(\mathrm{1,1}\right)}{\widetilde{{ \mathcal E }}}_{1}^{{AB}}({\rho }^{\left(0\right)})+(1-{p}^{\left(\mathrm{1,1}\right)}){\mathbb{1}}/{2}^{2n}\right)\right)\end{eqnarray} \tag{ D7 }$

$\begin{eqnarray}&&=\ {p}^{\left(1\right)}{{ \mathcal Q }}^{{AB}}\circ {\widetilde{{ \mathcal E }}}^{{AB}}({\rho }^{\left(0\right)})+(1-{p}^{\left(1\right)}){\mathbb{1}}/{2}^{2n}={p}^{\left(1\right)}\left[\displaystyle \frac{1}{{2}^{2n}}\sum _{{\boldsymbol{a}},{\boldsymbol{b}}}{\beta }_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}{X}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}\otimes {X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}\right]+(1-{p}^{\left(1\right)}){\mathbb{1}}/{2}^{2n},\end{eqnarray} \tag{ D8 }$

where ${p}^{\left(1\right)}={p}^{\left(\mathrm{1,1}\right)}...{p}^{\left(1,k\right)}$ . The second equality follows from the argument in the proof of lemma 2: ${\widetilde{{ \mathcal E }}}^{{AB}}$ consists only of unitary and Pauli channels, so each ${\widetilde{{ \mathcal E }}}_{i}^{{AB}}$ , with i ∈ {1, ..., k}, is a unital channel. The last equality follows from (B4) and (A8), where ${\beta }_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}={m}_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}}{q}_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}}$ .

Similarly, the state at τ₂ is given by

$\begin{eqnarray}&&{\rho }^{\left(2\right)}={\widehat{{ \mathcal Q }}}^{{AB}}({{ \mathcal D }}_{{p}^{\left(2,l\right)}}^{{AB}}\,\circ \,{{ \mathcal D }}_{{s}^{\left(2,l\right)}}^{A}\,\circ \,({{ \mathcal W }}_{l}\otimes {{ \mathcal P }}_{{\rm{NU}},l}^{B})\,...{{ \mathcal D }}_{{p}^{\left(\mathrm{2,1}\right)}}^{{AB}}\,\circ \,{{ \mathcal D }}_{{s}^{\left(\mathrm{2,1}\right)}}^{A}\circ ({{ \mathcal W }}_{1}\otimes {{ \mathcal P }}_{{\rm{NU}},1}^{B})({\rho }^{\left(1\right)})).\end{eqnarray} \tag{ D9 }$

We first find the action of the channel ${{ \mathcal W }}_{1}\otimes {{ \mathcal P }}_{{\rm{NU}},1}^{B}$ on ${\rho }^{\left(1\right)}$ . Consider that

$\begin{eqnarray}&&({{ \mathcal W }}_{1}\otimes {{ \mathcal P }}_{{\rm{NU}},1}^{B})({\rho }^{\left(1\right)})=\displaystyle \frac{1}{{2}^{2n}}({{ \mathcal W }}_{1}\otimes {{ \mathcal P }}_{{\rm{NU}},1}^{B})\left[{p}^{\left(1\right)}\left(\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\beta }_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}{X}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}\otimes {X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}\right)+{{\mathbb{1}}}_{{AB}}\right]\end{eqnarray} \tag{ D10 }$

$\begin{eqnarray}&&=\displaystyle \frac{1}{{2}^{2n}}\left[{p}^{\left(1\right)}\left(\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\beta }_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}{c}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(1\right)}{W}_{1}{X}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}_{1}^{\dagger }\otimes {X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}\right)+{{\mathbb{1}}}_{{AB}}+\sum _{({\boldsymbol{g}},{\boldsymbol{h}})\ne ({\bf{0}},{\bf{0}})}{d}_{{\boldsymbol{g}},{\boldsymbol{h}}}^{\left(1\right)}{{\mathbb{1}}}_{A}\otimes {X}_{B}^{{\boldsymbol{g}}}{Z}_{B}^{{\boldsymbol{h}}}\right],\end{eqnarray} \tag{ D11 }$

where we used the definition of a non-unital Pauli channel from (A9) and (A10). Terms that are independent of the W_i contribute only a $V$-independent offset to the cost and therefore do not affect its global optima. Hence, the only relevant term in (D9) is

$\begin{eqnarray}&&{\widetilde{\rho }}^{\left(2\right)}=\displaystyle \frac{{p}^{\left(2\right)}{s}^{\left(2\right)}{p}^{\left(1\right)}}{{2}^{2n}}{\widehat{{ \mathcal Q }}}^{{AB}}\left(\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\beta }_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}\left(\prod _{i=1}^{m}{c}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(i\right)}\right){{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes {X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}\right),\end{eqnarray} \tag{ D12 }$

where ${p}^{\left(2\right)}={p}^{\left(\mathrm{2,1}\right)}...{p}^{\left(2,l\right)}$ and ${s}^{\left(2\right)}={s}^{\left(\mathrm{2,1}\right)}...{s}^{\left(2,l\right)}$ , and where we have used (A9) and lemma 2.

Finally, the relevant term after the action of the noisy disentangling channel is

$\begin{eqnarray}&&{\widetilde{\rho }}^{\left(3\right)}={{ \mathcal D }}_{{p}^{\left(3,m\right)}}^{{AB}}\circ {\left({\widetilde{{ \mathcal E }}}_{m}^{{AB}}\right)}^{\dagger }...{{ \mathcal D }}_{{p}^{\left(\mathrm{3,1}\right)}}^{{AB}}\circ {\left({\widetilde{{ \mathcal E }}}_{1}^{{AB}}\right)}^{\dagger }({\widetilde{\rho }}^{\left(2\right)})={p}^{\left(3\right)}{\left({\widetilde{{ \mathcal E }}}^{{AB}}\right)}^{\dagger }({\widetilde{\rho }}^{\left(2\right)})+(1-{p}^{\left(3\right)}){\mathbb{1}}/{2}^{2n},\end{eqnarray} \tag{ D13 }$

where ${p}^{\left(3\right)}={p}^{\left(3,m\right)}...{p}^{\left(\mathrm{3,1}\right)}$ . The last equality follows from the fact that ${\left({\widetilde{{ \mathcal E }}}^{{AB}}\right)}^{\dagger }$ consists of unitary and Pauli channels, so each ${\left({\widetilde{{ \mathcal E }}}_{i}^{{AB}}\right)}^{\dagger }$ is a unital channel. Therefore, the term that determines the global optima of the HST cost is

$\begin{eqnarray}&&{\sigma }^{\left(3\right)}={\left({\widetilde{{ \mathcal E }}}^{{AB}}\right)}^{\dagger }\,\circ \,{\widehat{{ \mathcal Q }}}^{{AB}}\left(\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\beta }_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}\left(\prod _{i=1}^{m}{c}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(i\right)}\right){{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes {X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}\right),\end{eqnarray} \tag{ D14 }$

where we have omitted the scaling factors. Let ${\widetilde{F}}_{{\mathsf{HST}}}(V)\propto f(V)\ :=\mathrm{Tr}\left[{\widetilde{P}}_{{\bf{0}}}{\sigma }^{\left(3\right)}\right]$ . Then

$\begin{eqnarray}&&f(V)=\mathrm{Tr}\left[({\widehat{{ \mathcal Q }}}^{{AB}}\,\circ \,{\widetilde{{ \mathcal E }}}^{{AB}})({\widetilde{P}}_{{\bf{0}}})\left(\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\beta }_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}\left(\prod _{i=1}^{m}{c}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(i\right)}\right){{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes {X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}\right)\right]\end{eqnarray} \tag{ D15 }$

$\begin{eqnarray}&&=\,\mathrm{Tr}\left[\sum _{\displaystyle \genfrac{}{}{0em}{}{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\widetilde{{\boldsymbol{a}}},\widetilde{{\boldsymbol{b}}}}}{\kappa }_{{\boldsymbol{a}},\widetilde{{\boldsymbol{a}}},{\boldsymbol{b}},\widetilde{{\boldsymbol{b}}}}^{{AB}}{Z}_{A}^{\widetilde{{\boldsymbol{b}}}}{X}_{A}^{\widetilde{{\boldsymbol{a}}}}{{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes {Z}_{B}^{\widetilde{{\boldsymbol{b}}}}{X}_{B}^{\widetilde{{\boldsymbol{a}}}}{X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}\right]\,\end{eqnarray} \tag{ D16 }$

$\begin{eqnarray}&&=\,{\mathrm{Tr}}_{A}\left[\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\kappa }_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}}{Z}_{A}^{{\boldsymbol{b}}}{X}_{A}^{{\boldsymbol{a}}}{{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\right].\,\end{eqnarray} \tag{ D17 }$

The second equality follows from (C5), where we set ${\kappa }_{{\boldsymbol{a}},\widetilde{{\boldsymbol{a}}},{\boldsymbol{b}},\widetilde{{\boldsymbol{b}}}}^{{AB}}:=(1/{2}^{2n}){\widetilde{m}}_{\widetilde{{\boldsymbol{a}}},\widetilde{{\boldsymbol{a}}},\widetilde{{\boldsymbol{b}}},\widetilde{{\boldsymbol{b}}}}^{{AB}}{\widehat{p}}_{\widetilde{{\boldsymbol{a}}},\widetilde{{\boldsymbol{b}}}}^{A}{\widehat{q}}_{\widetilde{{\boldsymbol{a}}},\widetilde{{\boldsymbol{a}}},\widetilde{{\boldsymbol{b}}},\widetilde{{\boldsymbol{b}}}}^{{AB}}{\beta }_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}\left({\prod }_{i=1}^{m}{c}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(i\right)}\right)$ . The last equality follows from (A2). Let ${{\mathbb{V}}}_{d}^{{\rm{opt}}}$ denote the set of unitaries that optimize ${F}_{{\mathsf{HST}}}(V)$ (and hence ${C}_{{\mathsf{HST}}}(V)$ ), namely

$\begin{eqnarray}&&{{\mathbb{V}}}_{d}^{{\rm{opt}}}=\{V^{\prime} \in {{\mathbb{V}}}_{d}:W={(V^{\prime} )}^{\dagger }U={{\rm{e}}}^{{\rm{i}}\phi }{\mathbb{1}},\quad \mathrm{for}\ \mathrm{some}\quad \phi \in [0,2\pi ]\}.\end{eqnarray} \tag{ D18 }$

We remark that this set of unitaries also optimizes ${F}_{{\mathsf{LHST}}}(V)$ (and hence ${C}_{{\mathsf{LHST}}}(V)$ ). Then, for $V^{\prime} \in {{\mathbb{V}}}_{d}^{{\rm{opt}}}$ we find $f(V^{\prime} )={\sum }_{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\kappa }_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}}$ . Let

$\begin{eqnarray}&&T(V):=\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}\sqrt{{\kappa }_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}}}{X}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes | {\boldsymbol{a}},{\boldsymbol{b}}\rangle ,\quad S(V):=\sum _{({\boldsymbol{a}}^{\prime} ,{\boldsymbol{b}}^{\prime} )\ne ({\bf{0}},{\bf{0}})}\sqrt{{\kappa }_{{\boldsymbol{a}}^{\prime} ,{\boldsymbol{a}}^{\prime} ,{\boldsymbol{b}}^{\prime} ,{\boldsymbol{b}}^{\prime} }^{{AB}}}{W}^{\dagger }{X}_{A}^{{\boldsymbol{a}}^{\prime} }{Z}_{A}^{{\boldsymbol{b}}^{\prime} }\otimes | {\boldsymbol{a}}^{\prime} ,{\boldsymbol{b}}^{\prime} \rangle .\end{eqnarray} \tag{ D19 }$

Consider the following inequality:

$\begin{eqnarray}&&f(V)=\left\langle S(V),T(V)\right\rangle \leqslant | \left\langle S(V),T(V)\right\rangle | \leqslant \sqrt{\mathrm{Tr}\left(S{\left(V\right)}^{\dagger }S\left(V\right)\right)}\sqrt{\mathrm{Tr}\left(T{\left(V\right)}^{\dagger }T\left(V\right)\right)}=\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\kappa }_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}},\end{eqnarray} \tag{ D20 }$

where $\left\langle S,T\right\rangle :=\mathrm{Tr}({S}^{\dagger }T)$ denotes the Hilbert–Schmidt inner product and we used the Cauchy–Schwarz inequality. Here we assume that the coefficients ${\kappa }_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}}$ characterizing the noise satisfy ${\kappa }_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}}\geqslant 0$ , so that the square roots in (D19) are real. Moreover, the inequality in (D20) is saturated for every $V^{\prime} \in {{\mathbb{V}}}_{d}^{{\rm{opt}}}$ . Therefore, the set of unitaries that optimize ${F}_{{\mathsf{HST}}}(V)$ (and hence ${C}_{{\mathsf{HST}}}(V)$ ) is ${\widetilde{{\mathbb{V}}}}_{d}^{{opt}}={{\mathbb{V}}}_{d}^{{opt}}$ . According to definition 6, the latter means that ${C}_{{\mathsf{HST}}}$ exhibits strong-OPR to Noise Model 1 in definition 7.
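The bound (D20) and its saturation can be illustrated numerically for $n=2$. The sketch below draws hypothetical nonnegative coefficients ${\kappa }_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}}$ , evaluates the right-hand side of (D17) divided by ${2}^{n}$ (a normalization we choose so that the bound reads $\sum \kappa$ ), and checks that $W={{\rm{e}}}^{{\rm{i}}\phi }{\mathbb{1}}$ attains the bound while random unitaries do not exceed it.

```python
import itertools
from functools import reduce

import numpy as np

rng = np.random.default_rng(5)
n = 2
d = 2 ** n
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
mp = np.linalg.matrix_power


def pauli(a, b):
    """n-qubit operator X^a Z^b for bit tuples a and b."""
    return reduce(np.kron, [mp(X, ai) @ mp(Z, bi) for ai, bi in zip(a, b)])


def random_unitary(dim):
    q, r = np.linalg.qr(rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)))
    return q * (np.diag(r) / np.abs(np.diag(r)))


bits = list(itertools.product((0, 1), repeat=n))
# Hypothetical nonnegative coefficients kappa_{a,a,b,b} for (a, b) != (0, 0).
kappa = {(a, b): rng.uniform(0.0, 1.0) for a in bits for b in bits if any(a) or any(b)}


def f(W):
    """(D17) divided by 2^n: sum over (a,b) of kappa Tr[(X^a Z^b)^dag W X^a Z^b W^dag] / 2^n."""
    total = 0.0
    for (a, b), coeff in kappa.items():
        P = pauli(a, b)
        total += coeff * np.trace(P.conj().T @ W @ P @ W.conj().T).real
    return total / d


bound = sum(kappa.values())                                    # right-hand side of (D20)
print(np.isclose(f(np.exp(0.3j) * np.eye(d)), bound))          # W = e^{i phi} 1 saturates
print(max(f(random_unitary(d)) for _ in range(200)) <= bound + 1e-9)
```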

We now show that the cost function ${C}_{{\mathsf{LHST}}}$ exhibits strong-OPR to Noise Model 1. The LHST corresponds to the optimization of the following function:

$\begin{eqnarray}&&{\widetilde{F}}_{{\mathsf{LHST}}}(V)\propto g(V)=\mathrm{Tr}\left[({\widehat{{ \mathcal Q }}}^{{AB}}\,\circ \,{\widetilde{{ \mathcal E }}}^{{\prime} {AB}})({\widetilde{Q}}_{00})\left(\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\beta }_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}\left(\prod _{i=1}^{m}{c}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(i\right)}\right){{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes {X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}\right)\right],\end{eqnarray} \tag{ D21 }$

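where we replaced the effective HST measurement operator $({\widehat{{ \mathcal Q }}}^{{AB}}\,\circ \,{\widetilde{{ \mathcal E }}}^{{AB}})({\widetilde{P}}_{{\bf{0}}})$ in (D15) by its LHST counterpart built from (C9). Consider the following: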

$\begin{eqnarray}&&\,\begin{array}{l}g(V)=\mathrm{Tr}\left[\left(\displaystyle \frac{1}{{2}^{2}}\displaystyle \frac{1}{n}\sum _{j=1}^{n}\sum _{{a}_{j}^{{\prime} },{b}_{j}^{{\prime} }=0}^{1}{\widetilde{m}}_{{a}_{j}^{{\prime} },{a}_{j}^{{\prime} },{b}_{j}^{{\prime} },{b}_{j}^{{\prime} }}^{{AB}}{\widehat{p}}_{{a}_{j}^{{\prime} },{b}_{j}^{{\prime} }}^{{A}_{j}}{\widehat{q}}_{{a}_{j}^{{\prime} },{a}_{j}^{{\prime} },{b}_{j}^{{\prime} },{b}_{j}^{{\prime} }}{Z}_{{A}_{j}}^{{b}_{j}^{{\prime} }}{X}_{{A}_{j}}^{{a}_{j}^{{\prime} }}\otimes {Z}_{{B}_{j}}^{{b}_{j}^{{\prime} }}{X}_{{B}_{j}}^{{a}_{j}^{{\prime} }}\otimes {{\mathbb{1}}}_{{\overline{A}}_{j}{\overline{B}}_{j}}\right)\right.\\ \,\left.\times \,\left(\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\beta }_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}\left(\prod _{i=1}^{m}{c}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(i\right)}\right){{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes {X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}\right)\right]\end{array}\,\end{eqnarray} \tag{ D22 }$

$\begin{eqnarray}&&=\ \mathrm{Tr}\left[\sum _{j=1}^{n}\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}\sum _{{a}_{j}^{{\prime} },{b}_{j}^{{\prime} }=0}^{1}{\xi }_{{\boldsymbol{a}},{a}_{j}^{{\prime} },{\boldsymbol{b}},{b}_{j}^{{\prime} }}^{\left(j\right)}({Z}_{{A}_{j}}^{{b}_{j}^{{\prime} }}{X}_{{A}_{j}}^{{a}_{j}^{{\prime} }}\otimes {{\mathbb{1}}}_{{\overline{A}}_{j}}){{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes {Z}_{{B}_{j}}^{{b}_{j}^{{\prime} }}{X}_{{B}_{j}}^{{a}_{j}^{{\prime} }}{X}_{{B}_{j}}^{{a}_{j}}{Z}_{{B}_{j}}^{{b}_{j}}{X}_{{\overline{B}}_{j}}^{{\overline{a}}_{j}}{Z}_{{\overline{B}}_{j}}^{{\overline{b}}_{j}}\right]\,\end{eqnarray} \tag{ D23 }$

$\begin{eqnarray}&&\,=\ {\mathrm{Tr}}_{A}\left[\sum _{j=1}^{n}\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}\sum _{{a}_{j}^{{\prime} },{b}_{j}^{{\prime} }=0}^{1}{\xi }_{{\boldsymbol{a}},{a}_{j}^{{\prime} },{\boldsymbol{b}},{b}_{j}^{{\prime} }}^{\left(j\right)}({Z}_{{A}_{j}}^{{b}_{j}^{{\prime} }}{X}_{{A}_{j}}^{{a}_{j}^{{\prime} }}\otimes {{\mathbb{1}}}_{{\overline{A}}_{j}}){{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes {\mathrm{Tr}}_{{B}_{j}}\left({Z}_{{B}_{j}}^{{b}_{j}^{{\prime} }}{X}_{{B}_{j}}^{{a}_{j}^{{\prime} }}{X}_{{B}_{j}}^{{a}_{j}}{Z}_{{B}_{j}}^{{b}_{j}}\right){\mathrm{Tr}}_{{\overline{B}}_{j}}\left({X}_{{\overline{B}}_{j}}^{{\overline{a}}_{j}}{Z}_{{\overline{B}}_{j}}^{{\overline{b}}_{j}}\right)\right]\end{eqnarray} \tag{ D24 }$

$\begin{eqnarray}&&=\ {\mathrm{Tr}}_{A}\left[\sum _{j=1}^{n}\sum _{({a}_{j},{b}_{j})\ne (0,0)}{\xi }_{{a}_{j},{a}_{j},{b}_{j},{b}_{j}}^{\left(j\right)}({Z}_{{A}_{j}}^{{b}_{j}}{X}_{{A}_{j}}^{{a}_{j}}\otimes {{\mathbb{1}}}_{{\overline{A}}_{j}})(W({X}_{{A}_{j}}^{{a}_{j}}{Z}_{{A}_{j}}^{{b}_{j}}\otimes {{\mathbb{1}}}_{{\overline{A}}_{j}}){W}^{\dagger })\right]\,\end{eqnarray} \tag{ D25 }$

$\begin{eqnarray}&&\leqslant \,\sum _{j=1}^{n}\sum _{({a}_{j},{b}_{j})\ne (0,0)}{\xi }_{{a}_{j},{a}_{j},{b}_{j},{b}_{j}}^{\left(j\right)},\,\end{eqnarray} \tag{ D26 }$

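where in (D24) we have split ${\mathrm{Tr}}_{B}$ into a contribution from qubit B_j and a contribution from all qubits except B_j, and where ${\xi }_{{\boldsymbol{a}},{a}_{j}^{{\prime} },{\boldsymbol{b}},{b}_{j}^{{\prime} }}^{\left(j\right)}=\displaystyle \frac{1}{4n}{\widetilde{m}}_{{a}_{j}^{{\prime} },{a}_{j}^{{\prime} },{b}_{j}^{{\prime} },{b}_{j}^{{\prime} }}^{{AB}}{\widehat{p}}_{{a}_{j}^{{\prime} },{b}_{j}^{{\prime} }}^{{A}_{j}}{\widehat{q}}_{{a}_{j}^{{\prime} },{a}_{j}^{{\prime} },{b}_{j}^{{\prime} },{b}_{j}^{{\prime} }}{\beta }_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}\left({\prod }_{i=1}^{m}{c}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(i\right)}\right)$ . In passing from (D24) to (D25) we used that ${\mathrm{Tr}}_{{B}_{j}}({Z}_{{B}_{j}}^{{b}_{j}^{{\prime} }}{X}_{{B}_{j}}^{{a}_{j}^{{\prime} }}{X}_{{B}_{j}}^{{a}_{j}}{Z}_{{B}_{j}}^{{b}_{j}})$ vanishes unless ${a}_{j}^{{\prime} }={a}_{j}$ and ${b}_{j}^{{\prime} }={b}_{j}$ , and that ${\mathrm{Tr}}_{{\overline{B}}_{j}}({X}_{{\overline{B}}_{j}}^{{\overline{a}}_{j}}{Z}_{{\overline{B}}_{j}}^{{\overline{b}}_{j}})$ vanishes unless ${\overline{a}}_{j}$ and ${\overline{b}}_{j}$ are all zeros. The first equality follows from (C9), while the inequality follows from arguments similar to those leading to (D20).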

Here we remark that the inequality (D26) is saturated for any unitary matrix in the set, given by (D18), of unitaries that optimize ${F}_{{\mathsf{HST}}}(V)$ (and hence ${C}_{{\mathsf{LHST}}}(V)$). Hence, ${C}_{{\mathsf{LHST}}}$ exhibits strong-OPR to Noise Model 1 in definition 7, provided that the coefficients ${\xi }_{{a}_{j},{a}_{j},{b}_{j},{b}_{j}}^{\left(j\right)}$ characterizing the noise satisfy ${\xi }_{{a}_{j},{a}_{j},{b}_{j},{b}_{j}}^{\left(j\right)}\geqslant 0$.□

Appendix E.: Proof of theorem 2

Theorem 2. The cost functions ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ exhibit strong-OPR to Noise Model 2 in definition 8.

Proof. We break up the HST circuit into three time intervals, as in appendix D. We again assume that global depolarizing noise acts on system AB during all three time intervals, and that global depolarizing noise acts on system A during the implementation of ${{ \mathcal V }}^{\dagger }\ \circ \ { \mathcal U }$. Moreover, suppose that at time τ₁ a global Pauli channel ${{ \mathcal Q }}^{{AB}}$ acts, followed by a global non-unital Pauli channel ${{ \mathcal P }}_{{\rm{NU}}}^{A}$. Furthermore, a global Pauli channel ${\widehat{{ \mathcal Q }}}^{{AB}}$ acts at time τ₂, while a global Pauli channel acts continuously on system B between τ₁ and τ₂.

The state at τ₁ is given by

$\begin{eqnarray}&&{\rho }^{\left(1\right)}={p}^{\left(1\right)}{{ \mathcal P }}_{{\rm{NU}}}^{A}\ \circ \ {{ \mathcal Q }}^{{AB}}\circ {\widetilde{{ \mathcal E }}}^{{AB}}({\rho }^{\left(0\right)})+(1-{p}^{\left(1\right)}){{ \mathcal P }}_{{\rm{NU}}}^{A}({\mathbb{1}}/{2}^{2n})\,\end{eqnarray} \tag{ E1 }$

$\begin{eqnarray}&&=\ {p}^{\left(1\right)}\left[\displaystyle \frac{1}{{2}^{2n}}\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\widetilde{\beta }}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}{X}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}\otimes {X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}\right]+\displaystyle \frac{1}{{2}^{2n}}{\mathbb{1}}+\displaystyle \frac{1}{{2}^{2n}}\sum _{({\boldsymbol{g}},{\boldsymbol{h}})\ne ({\bf{0}},{\bf{0}})}{d}_{{\boldsymbol{g}},{\boldsymbol{h}}}{X}_{A}^{{\boldsymbol{g}}}{Z}_{A}^{{\boldsymbol{h}}}\otimes {{\mathbb{1}}}_{B}.\end{eqnarray} \tag{ E2 }$

The first equality follows from arguments similar to those used to derive (D6)–(D8). The last equality follows from (B4), (A9), and (A10), where ${\widetilde{\beta }}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}={m}_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}}{q}_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}}{c}_{{\boldsymbol{a}},{\boldsymbol{b}}}$ .

At τ₂ the state is

$\begin{eqnarray}&&{\rho }^{\left(2\right)}={\widehat{{ \mathcal Q }}}^{{AB}}({{ \mathcal D }}_{{p}^{\left(2,l\right)}}^{{AB}}\circ {{ \mathcal D }}_{{s}^{\left(2,l\right)}}^{A}\circ ({{ \mathcal W }}_{l}\otimes {\widehat{{ \mathcal P }}}_{l}^{B})...{{ \mathcal D }}_{{p}^{\left(\mathrm{2,1}\right)}}^{{AB}}\ \circ \ {{ \mathcal D }}_{{s}^{\left(\mathrm{2,1}\right)}}^{A}\circ ({{ \mathcal W }}_{1}\otimes {\widehat{{ \mathcal P }}}_{1}^{B})({\rho }^{\left(1\right)})).\end{eqnarray} \tag{ E3 }$

The term that depends on W in (E3) is given by

$\begin{eqnarray}&&{\widetilde{\rho }}^{\left(2\right)}=\displaystyle \frac{1}{{2}^{2n}}{\widehat{{ \mathcal Q }}}^{{AB}}\left[{p}^{\left(2\right)}{s}^{\left(2\right)}{p}^{\left(1\right)}\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\widetilde{\beta }}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}\left(\prod _{i}^{l}{\widehat{p}}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(i\right)}\right){{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes {X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}+\sum _{({\boldsymbol{g}},{\boldsymbol{h}})\ne ({\bf{0}},{\bf{0}})}{d}_{{\boldsymbol{g}},{\boldsymbol{h}}}{{WX}}_{A}^{{\boldsymbol{g}}}{Z}_{A}^{{\boldsymbol{h}}}{W}^{\dagger }\otimes {{\mathbb{1}}}_{B}\right],\end{eqnarray} \tag{ E4 }$

where we used the definition of Pauli channels from (A6) and (A8). Omitting the overall factor $1/{2}^{2n}$, the relevant term after ${\tau }_{3}$ is

$\begin{eqnarray}\begin{array}{rcl}{\widetilde{\rho }}^{\left(3\right)} & = & {\left({\widetilde{{ \mathcal E }}}^{{AB}}\right)}^{\dagger }\ \circ \ {\widehat{{ \mathcal Q }}}^{{AB}}\left({p}^{\left(2\right)}{s}^{\left(2\right)}{p}^{\left(1\right)}\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\widetilde{\beta }}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}\left(\prod _{i}^{l}{\widehat{p}}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(i\right)}\right){{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes {X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}\right)\\ & & +\,{\left({\widetilde{{ \mathcal E }}}^{{AB}}\right)}^{\dagger }\ \circ \ {\widehat{{ \mathcal Q }}}^{{AB}}\left(\sum _{({\boldsymbol{g}},{\boldsymbol{h}})\ne ({\bf{0}},{\bf{0}})}{d}_{{\boldsymbol{g}},{\boldsymbol{h}}}{{WX}}_{A}^{{\boldsymbol{g}}}{Z}_{A}^{{\boldsymbol{h}}}{W}^{\dagger }\otimes {{\mathbb{1}}}_{B}\right).\end{array}\end{eqnarray} \tag{ E5 }$
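The scalar factors that multiply each Pauli string in (E4) and (E5), such as ${p}^{\left(2\right)}$, ${s}^{\left(2\right)}$ and the products of ${\widehat{p}}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(i\right)}$, appear because depolarizing and Pauli channels map every Pauli string $X^{\boldsymbol{a}}Z^{\boldsymbol{b}}$ to a multiple of itself. The following minimal NumPy sketch checks this for two qubits. The Pauli-channel weights are randomly generated placeholders (not parameters from the paper), and the eigenvalue of each Pauli string is extracted numerically rather than from the closed forms in (A6)–(A8).

```python
import itertools
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

n = 2
d = 2 ** n
bits = list(itertools.product([0, 1], repeat=n))


def xz(a, b):
    """Pauli string X^a Z^b = (X^{a_1} Z^{b_1}) (x) ... (x) (X^{a_n} Z^{b_n})."""
    out = np.eye(1, dtype=complex)
    for aj, bj in zip(a, b):
        out = np.kron(out, np.linalg.matrix_power(X, aj) @ np.linalg.matrix_power(Z, bj))
    return out


rng = np.random.default_rng(0)
q = rng.random((d, d))
q /= q.sum()  # placeholder Pauli-channel weights q_{l,k}


def pauli_channel(rho):
    """rho -> sum_{l,k} q_{l,k} (X^l Z^k) rho (X^l Z^k)^dagger."""
    out = np.zeros_like(rho)
    for li, l in enumerate(bits):
        for ki, k in enumerate(bits):
            P = xz(l, k)
            out += q[li, ki] * P @ rho @ P.conj().T
    return out


def depolarize(rho, p):
    """Global depolarizing channel: rho -> p rho + (1 - p) Tr(rho) 1/d."""
    return p * rho + (1 - p) * np.trace(rho) * np.eye(d) / d


checks = []
eigenvalues = {}
for a in bits:
    for b in bits:
        sigma = xz(a, b)
        out = pauli_channel(sigma)
        lam = np.trace(sigma.conj().T @ out) / d  # coefficient of sigma in the output
        eigenvalues[(a, b)] = lam
        checks.append(np.allclose(out, lam * sigma))  # output is lam * sigma
        if any(a) or any(b):  # traceless strings are just rescaled by p
            checks.append(np.allclose(depolarize(sigma, 0.9), 0.9 * sigma))
```

Because every term in the expansion of the state is rescaled independently, the composition of several such channels only multiplies the coefficients, which is how the products over $i$ in (E4) arise.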

Let ${\widetilde{F}}_{{\mathsf{HST}}}(V)\propto f(V)\ :=\mathrm{Tr}\left[{\widetilde{P}}_{{\bf{0}}}{\widetilde{\rho }}^{\left(3\right)}\right]$. Then

$\begin{eqnarray}\begin{array}{rcl}f(V) & = & \mathrm{Tr}\left[({\widehat{{ \mathcal Q }}}^{{AB}}\ \circ \ {\widetilde{{ \mathcal E }}}^{{AB}})({\widetilde{P}}_{{\bf{0}}})\left({p}^{\left(2\right)}{s}^{\left(2\right)}{p}^{\left(1\right)}\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\widetilde{\beta }}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}\left(\prod _{i}^{l}{\widehat{p}}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(i\right)}\right){{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes {X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}\right)\right]\\ & & +\,\mathrm{Tr}\left[({\widehat{{ \mathcal Q }}}^{{AB}}\ \circ \ {\widetilde{{ \mathcal E }}}^{{AB}})({\widetilde{P}}_{{\bf{0}}})\left(\sum _{({\boldsymbol{g}},{\boldsymbol{h}})\ne ({\bf{0}},{\bf{0}})}{d}_{{\boldsymbol{g}},{\boldsymbol{h}}}{{WX}}_{A}^{{\boldsymbol{g}}}{Z}_{A}^{{\boldsymbol{h}}}{W}^{\dagger }\otimes {{\mathbb{1}}}_{B}\right)\right].\end{array}\end{eqnarray} \tag{ E6 }$

Moreover, for simplicity we denote

$\begin{eqnarray}&&{f}_{1}(V)\ :=\mathrm{Tr}\left[({\widehat{{ \mathcal Q }}}^{{AB}}\,\circ \,{\widetilde{{ \mathcal E }}}^{{AB}})({\widetilde{P}}_{{\bf{0}}})\left(\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\widetilde{\beta }}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}\left(\prod _{i}^{l}{\widehat{p}}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(i\right)}\right){{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes {X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}\right)\right],\end{eqnarray} \tag{ E7 }$

$\begin{eqnarray}&&{f}_{2}(V)\ :=\mathrm{Tr}\left[({\widehat{{ \mathcal Q }}}^{{AB}}\,\circ \,{\widetilde{{ \mathcal E }}}^{{AB}})({\widetilde{P}}_{{\bf{0}}})\left(\sum _{({\boldsymbol{g}},{\boldsymbol{h}})\ne ({\bf{0}},{\bf{0}})}{d}_{{\boldsymbol{g}},{\boldsymbol{h}}}{{WX}}_{A}^{{\boldsymbol{g}}}{Z}_{A}^{{\boldsymbol{h}}}{W}^{\dagger }\otimes {{\mathbb{1}}}_{B}\right)\right].\,\end{eqnarray} \tag{ E8 }$

We now treat ${f}_{1}(V)$ and ${f}_{2}(V)$ separately. For ${f}_{1}(V)$ we have

$\begin{eqnarray}\begin{array}{rcl}{f}_{1}(V) & = & \mathrm{Tr}\left[\displaystyle \sum _{\displaystyle \genfrac{}{}{0em}{}{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{{\boldsymbol{a}}^{\prime} ,{\boldsymbol{b}}^{\prime} }}{\vartheta }_{{\boldsymbol{a}},{\boldsymbol{a}}^{\prime} ,{\boldsymbol{b}},{\boldsymbol{b}}^{\prime} }^{{AB}}{Z}_{A}^{{\boldsymbol{b}}^{\prime} }{X}_{A}^{{\boldsymbol{a}}^{\prime} }{{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes {Z}_{B}^{{\boldsymbol{b}}^{\prime} }{X}_{B}^{{\boldsymbol{a}}^{\prime} }{X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}\right]\\ & = & \mathrm{Tr}\left[\displaystyle \sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\vartheta }_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}}{Z}_{A}^{{\boldsymbol{b}}}{X}_{A}^{{\boldsymbol{a}}}{{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\right]\\ & \leqslant & \displaystyle \sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\vartheta }_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}}.\end{array}\end{eqnarray} \tag{ E9 }$

The first equality follows from (C5), where ${\vartheta }_{{\boldsymbol{a}},{\boldsymbol{a}}^{\prime} ,{\boldsymbol{b}},{\boldsymbol{b}}^{\prime} }^{{AB}}=(1/{2}^{2n}){\widetilde{m}}_{{\boldsymbol{a}}^{\prime} ,{\boldsymbol{a}}^{\prime} ,{\boldsymbol{b}}^{\prime} ,{\boldsymbol{b}}^{\prime} }^{{AB}}{\widehat{\widetilde{p}}}_{{\boldsymbol{a}}^{\prime} ,{\boldsymbol{b}}^{\prime} }^{A}{\widehat{q}}_{{\boldsymbol{a}}^{\prime} ,{\boldsymbol{a}}^{\prime} ,{\boldsymbol{b}}^{\prime} ,{\boldsymbol{b}}^{\prime} }^{{AB}}{\widetilde{\beta }}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}\left({\prod }_{i}^{l}{\widehat{p}}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(i\right)}\right)$. The second equality follows from taking the trace over B, since ${\mathrm{Tr}}_{B}({Z}_{B}^{{\boldsymbol{b}}^{\prime} }{X}_{B}^{{\boldsymbol{a}}^{\prime} }{X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}})$ vanishes unless ${\boldsymbol{a}}^{\prime}={\boldsymbol{a}}$ and ${\boldsymbol{b}}^{\prime}={\boldsymbol{b}}$. The inequality follows from arguments similar to those used for (D20). The inequality in (E9) is saturated for any matrix V in the set ${{\mathbb{V}}}_{d}^{{\rm{opt}}}$ of unitaries that optimize ${F}_{{\mathsf{HST}}}(V)$ (and hence ${C}_{{\mathsf{HST}}}(V)$), given by (D18).
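The trace over B in (E9), and the vanishing of (E12), both rest on the orthogonality of Pauli strings under the trace: $\mathrm{Tr}(Z^{\boldsymbol{b}'}X^{\boldsymbol{a}'}X^{\boldsymbol{a}}Z^{\boldsymbol{b}})={2}^{n}\,\delta_{\boldsymbol{a},\boldsymbol{a}'}\delta_{\boldsymbol{b},\boldsymbol{b}'}$, and setting $\boldsymbol{a}'=\boldsymbol{b}'=\mathbf{0}$ gives $\mathrm{Tr}(X^{\boldsymbol{a}}Z^{\boldsymbol{b}})=0$ for $(\boldsymbol{a},\boldsymbol{b})\ne(\mathbf{0},\mathbf{0})$. A brute-force NumPy check for $n=2$ qubits (an arbitrary illustrative size) could look as follows.

```python
import itertools
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

n = 2
d = 2 ** n
bits = list(itertools.product([0, 1], repeat=n))


def x_pow(a):
    """X^a = X^{a_1} (x) ... (x) X^{a_n}."""
    out = np.eye(1, dtype=complex)
    for aj in a:
        out = np.kron(out, np.linalg.matrix_power(X, aj))
    return out


def z_pow(b):
    """Z^b = Z^{b_1} (x) ... (x) Z^{b_n}."""
    out = np.eye(1, dtype=complex)
    for bj in b:
        out = np.kron(out, np.linalg.matrix_power(Z, bj))
    return out


def overlap(a_p, b_p, a, b):
    """Tr(Z^{b'} X^{a'} X^{a} Z^{b})."""
    return np.trace(z_pow(b_p) @ x_pow(a_p) @ x_pow(a) @ z_pow(b))


# Collect every index combination that violates Tr(...) = d * delta * delta.
violations = [
    (a_p, b_p, a, b)
    for a_p, b_p, a, b in itertools.product(bits, repeat=4)
    if not np.isclose(overlap(a_p, b_p, a, b), d if (a_p, b_p) == (a, b) else 0)
]
```

An empty `violations` list confirms the orthogonality relation for every pair of two-qubit Pauli strings.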

On the other hand,

$\begin{eqnarray}&&{f}_{2}(V)=\mathrm{Tr}\left[\sum _{\displaystyle \genfrac{}{}{0em}{}{({\boldsymbol{g}},{\boldsymbol{h}})\ne ({\bf{0}},{\bf{0}})}{{\boldsymbol{a}}^{\prime} ,{\boldsymbol{b}}^{\prime} }}{\varsigma }_{{\boldsymbol{g}},{\boldsymbol{a}}^{\prime} ,{\boldsymbol{h}},{\boldsymbol{b}}^{\prime} }^{{AB}}{Z}_{A}^{{\boldsymbol{b}}^{\prime} }{X}_{A}^{{\boldsymbol{a}}^{\prime} }{{WX}}_{A}^{{\boldsymbol{g}}}{Z}_{A}^{{\boldsymbol{h}}}{W}^{\dagger }\otimes {Z}_{B}^{{\boldsymbol{b}}^{\prime} }{X}_{B}^{{\boldsymbol{a}}^{\prime} }\right]\end{eqnarray} \tag{ E10 }$

$\begin{eqnarray}&&\,=\,{\mathrm{Tr}}_{A}\left[\sum _{\displaystyle \genfrac{}{}{0em}{}{({\boldsymbol{g}},{\boldsymbol{h}})\ne ({\bf{0}},{\bf{0}})}{{\boldsymbol{a}}^{\prime} ,{\boldsymbol{b}}^{\prime} }}{\varsigma }_{{\boldsymbol{g}},{\boldsymbol{a}}^{\prime} ,{\boldsymbol{h}},{\boldsymbol{b}}^{\prime} }^{{AB}}{Z}_{A}^{{\boldsymbol{b}}^{\prime} }{X}_{A}^{{\boldsymbol{a}}^{\prime} }{{WX}}_{A}^{{\boldsymbol{g}}}{Z}_{A}^{{\boldsymbol{h}}}{W}^{\dagger }\otimes {\mathrm{Tr}}_{B}\left({Z}_{B}^{{\boldsymbol{b}}^{\prime} }{X}_{B}^{{\boldsymbol{a}}^{\prime} }\right)\right]\end{eqnarray} \tag{ E11 }$

$\begin{eqnarray}&&=\sum _{({\boldsymbol{g}},{\boldsymbol{h}})\ne ({\bf{0}},{\bf{0}})}{\varsigma }_{{\boldsymbol{g}},{\bf{0}},{\boldsymbol{h}},{\bf{0}}}^{{AB}}{\mathrm{Tr}}_{A}\left({X}_{A}^{{\boldsymbol{g}}}{Z}_{A}^{{\boldsymbol{h}}}\right)\,=\ 0,\end{eqnarray} \tag{ E12 }$

where ${\varsigma }_{{\boldsymbol{g}},{\boldsymbol{a}}^{\prime} ,{\boldsymbol{h}},{\boldsymbol{b}}^{\prime} }^{{AB}}=(1/{2}^{2n}){\widetilde{m}}_{{\boldsymbol{a}}^{\prime} ,{\boldsymbol{a}}^{\prime} ,{\boldsymbol{b}}^{\prime} ,{\boldsymbol{b}}^{\prime} }^{{AB}}{\widehat{\widetilde{p}}}_{{\boldsymbol{a}}^{\prime} ,{\boldsymbol{b}}^{\prime} }^{A}{\widehat{q}}_{{\boldsymbol{a}}^{\prime} ,{\boldsymbol{a}}^{\prime} ,{\boldsymbol{b}}^{\prime} ,{\boldsymbol{b}}^{\prime} }^{{AB}}{d}_{{\boldsymbol{g}},{\boldsymbol{h}}}$. In (E12) we used that ${\mathrm{Tr}}_{B}({Z}_{B}^{{\boldsymbol{b}}^{\prime} }{X}_{B}^{{\boldsymbol{a}}^{\prime} })$ vanishes unless ${\boldsymbol{a}}^{\prime}={\boldsymbol{b}}^{\prime}={\bf{0}}$, and that ${X}_{A}^{{\boldsymbol{g}}}{Z}_{A}^{{\boldsymbol{h}}}$ is traceless for $({\boldsymbol{g}},{\boldsymbol{h}})\ne ({\bf{0}},{\bf{0}})$. Hence ${f}_{2}(V)$ vanishes identically; in particular, it is independent of W (and hence of V) and does not affect the global optima. Therefore, from (E9) it follows that the set of unitaries that optimize ${\widetilde{F}}_{{\mathsf{HST}}}(V)$ (and hence ${\widetilde{C}}_{{\mathsf{HST}}}(V)$) is ${\widetilde{{\mathbb{V}}}}_{d}^{{\rm{opt}}}={{\mathbb{V}}}_{d}^{{\rm{opt}}}$. By definition 6, this implies that ${C}_{{\mathsf{HST}}}$ exhibits strong-OPR to Noise Model 2 in definition 8, provided that the coefficients ${\vartheta }_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}}$ characterizing the noise satisfy ${\vartheta }_{{\boldsymbol{a}},{\boldsymbol{a}},{\boldsymbol{b}},{\boldsymbol{b}}}^{{AB}}\geqslant 0$.

We now show that the cost function ${C}_{{\mathsf{LHST}}}$ also exhibits strong-OPR to Noise Model 2. In the LHST, the function to be optimized is

$\begin{eqnarray}&&\begin{array}{l}{\widetilde{F}}_{{\mathsf{LHST}}}(V)\propto g(V)\\ \quad =\mathrm{Tr}\left[({\widehat{{ \mathcal Q }}}^{{AB}}\,\circ \,{\widetilde{{ \mathcal E }}}^{{\prime} {AB}})({\widetilde{Q}}_{00})\left({p}^{\left(2\right)}{s}^{\left(2\right)}{p}^{\left(1\right)}\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\widetilde{\beta }}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}\left(\prod _{i}^{l}{\widehat{p}}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(i\right)}\right){{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes {X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}\right)\right]\\ \,+\mathrm{Tr}\left[({\widehat{{ \mathcal Q }}}^{{AB}}\,\circ \,{\widetilde{{ \mathcal E }}}^{{\prime} {AB}})({\widetilde{Q}}_{00})\left(\sum _{({\boldsymbol{g}},{\boldsymbol{h}})\ne ({\bf{0}},{\bf{0}})}{d}_{{\boldsymbol{g}},{\boldsymbol{h}}}{{WX}}_{A}^{{\boldsymbol{g}}}{Z}_{A}^{{\boldsymbol{h}}}{W}^{\dagger }\otimes {{\mathbb{1}}}_{B}\right)\right]\,,\end{array}\end{eqnarray} \tag{ E13 }$

where we replaced the disentangling and measurement channels in (E6) with those in (C9). As before, we split g(V) into two functions:

$\begin{eqnarray}&&{g}_{1}(V)\ :=\mathrm{Tr}\left[({\widehat{{ \mathcal Q }}}^{{AB}}\,\circ \,{\widetilde{{ \mathcal E }}}^{{\prime} {AB}})({\widetilde{Q}}_{00})\left(\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}{\widetilde{\beta }}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}\left(\prod _{i}^{l}{\widehat{p}}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(i\right)}\right){{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes {X}_{B}^{{\boldsymbol{a}}}{Z}_{B}^{{\boldsymbol{b}}}\right)\right],\end{eqnarray} \tag{ E14 }$

$\begin{eqnarray}&&{g}_{2}(V)\ :=\mathrm{Tr}\left[({\widehat{{ \mathcal Q }}}^{{AB}}\,\circ \,{\widetilde{{ \mathcal E }}}^{{\prime} {AB}})({\widetilde{Q}}_{00})\left(\sum _{({\boldsymbol{g}},{\boldsymbol{h}})\ne ({\bf{0}},{\bf{0}})}{d}_{{\boldsymbol{g}},{\boldsymbol{h}}}{{WX}}_{A}^{{\boldsymbol{g}}}{Z}_{A}^{{\boldsymbol{h}}}{W}^{\dagger }\otimes {{\mathbb{1}}}_{B}\right)\right].\,\end{eqnarray} \tag{ E15 }$

Using (C9) together with arguments similar to those used to derive (E10)–(E12), it follows that ${g}_{2}(V)$ is independent of W (and hence of V). Therefore, to prove the noise resilience of the LHST, it suffices to consider ${g}_{1}(V)$, which reads

$\begin{eqnarray}&&{g}_{1}(V)=\mathrm{Tr}\left[\sum _{j=1}^{n}\sum _{({\boldsymbol{a}},{\boldsymbol{b}})\ne ({\bf{0}},{\bf{0}})}\sum _{{a}_{j}^{{\prime} },{b}_{j}^{{\prime} }=0}^{1}{\tau }_{{\boldsymbol{a}},{a}_{j}^{{\prime} },{\boldsymbol{b}},{b}_{j}^{{\prime} }}^{\left(j\right)}({Z}_{{A}_{j}}^{{b}_{j}^{{\prime} }}{X}_{{A}_{j}}^{{a}_{j}^{{\prime} }}\otimes {{\mathbb{1}}}_{{\overline{A}}_{j}}){{WX}}_{A}^{{\boldsymbol{a}}}{Z}_{A}^{{\boldsymbol{b}}}{W}^{\dagger }\otimes {Z}_{{B}_{j}}^{{b}_{j}^{{\prime} }}{X}_{{B}_{j}}^{{a}_{j}^{{\prime} }}{X}_{{B}_{j}}^{{a}_{j}}{Z}_{{B}_{j}}^{{b}_{j}}{X}_{{\overline{B}}_{j}}^{{\overline{a}}_{j}}{Z}_{{\overline{B}}_{j}}^{{\overline{b}}_{j}}\right],\end{eqnarray} \tag{ E16 }$

where ${\tau }_{{\boldsymbol{a}},{a}_{j}^{{\prime} },{\boldsymbol{b}},{b}_{j}^{{\prime} }}^{\left(j\right)}=(1/(4n)){\widetilde{m}}_{{a}_{j}^{{\prime} },{a}_{j}^{{\prime} },{b}_{j}^{{\prime} },{b}_{j}^{{\prime} }}^{{AB}}{\widehat{\widetilde{p}}}_{{a}_{j}^{{\prime} },{b}_{j}^{{\prime} }}^{{A}_{j}}{\widehat{q}}_{{a}_{j}^{{\prime} },{a}_{j}^{{\prime} },{b}_{j}^{{\prime} },{b}_{j}^{{\prime} }}^{{AB}}{\widetilde{\beta }}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{{AB}}\left({\prod }_{i}^{l}{\widehat{p}}_{{\boldsymbol{a}},{\boldsymbol{b}}}^{\left(i\right)}\right)$. Since (E16) has the same form as (D23), the argument of appendix D gives

$\begin{eqnarray}&&{g}_{1}(V)\leqslant \sum _{j=1}^{n}\sum _{({a}_{j},{b}_{j})\ne (0,0)}{\tau }_{{a}_{j},{a}_{j},{b}_{j},{b}_{j}}^{\left(j\right)},\end{eqnarray} \tag{ E17 }$

where the inequality is saturated for unitaries $V^{\prime}$ in the set ${{\mathbb{V}}}_{d}^{{\rm{opt}}}$ of unitaries that optimize ${F}_{{\mathsf{LHST}}}(V)$ (and hence ${C}_{{\mathsf{LHST}}}(V)$ ) given by (D18). This further implies that

$\begin{eqnarray}&&g(V)\leqslant g(V^{\prime} ),\quad \mathrm{for}\ \mathrm{all}\quad V^{\prime} \in {{\mathbb{V}}}_{d}^{{\rm{opt}}}={\widetilde{{\mathbb{V}}}}_{d}^{{\rm{opt}}}.\end{eqnarray} \tag{ E18 }$

Thus ${C}_{{\mathsf{LHST}}}$ exhibits strong-OPR to Noise Model 2 if we assume that the coefficients ${\tau }_{{a}_{j},{a}_{j},{b}_{j},{b}_{j}}^{\left(j\right)}$ characterizing the noise satisfy ${\tau }_{{a}_{j},{a}_{j},{b}_{j},{b}_{j}}^{\left(j\right)}\geqslant 0$ .□

Appendix F.: Proof of theorem 3

Theorem 3. The cost functions ${C}_{{\mathsf{LET}}}$ and ${C}_{{\mathsf{LLET}}}$ exhibit weak-OPR, as defined in definition 6, to Noise Model 3 in definition 9.

Proof. To show weak-OPR to Noise Model 3, it suffices to consider the Pauli noise acting at τ₁ and the measurement noise, since resilience to global depolarizing noise follows from lemma 2.

We first consider the ${C}_{{\mathsf{LET}}}$ cost function. From equations (A5) and (A6), the Pauli channel acting at time τ₁ maps the input state to

$\begin{eqnarray}&&{ \mathcal P }(| {\bf{0}}\rangle \langle {\bf{0}}| )=\sum _{{\boldsymbol{l}},{\boldsymbol{k}}}{q}_{{\boldsymbol{l}},{\boldsymbol{k}}}{X}^{{\boldsymbol{l}}}{Z}^{{\boldsymbol{k}}}| {\bf{0}}\rangle \langle {\bf{0}}| {Z}^{{\boldsymbol{k}}}{X}^{{\boldsymbol{l}}}=\sum _{{\boldsymbol{l}}}{q}_{{\boldsymbol{l}}}| {\boldsymbol{l}}\rangle \langle {\boldsymbol{l}}| ,\end{eqnarray} \tag{ F1 }$

where ${q}_{{\boldsymbol{l}}}={\sum }_{{\boldsymbol{k}}}{q}_{{\boldsymbol{l}},{\boldsymbol{k}}}$ . Similarly, we can express the noisy measurement POVM from definition 5 as

$\begin{eqnarray}&&{\widetilde{P}}_{{\bf{0}}}=\underset{j=1}{\overset{n}{\displaystyle \bigotimes }}\left({p}_{00}^{\left(j\right)}| 0\rangle \langle 0| +{p}_{01}^{\left(j\right)}| 1\rangle \langle 1| \right)=\sum _{{\boldsymbol{i}}}{p}_{{\boldsymbol{i}}}| {\boldsymbol{i}}\rangle \langle {\boldsymbol{i}}| ,\end{eqnarray} \tag{ F2 }$

with ${\boldsymbol{i}}={i}_{1}{i}_{2}\ldots {i}_{n}$ a bit string and ${p}_{{\boldsymbol{i}}}={p}_{0{i}_{1}}^{\left(1\right)}{p}_{0{i}_{2}}^{\left(2\right)}\ldots {p}_{0{i}_{n}}^{\left(n\right)}$ . For the present noise model we are interested in determining the optimum of the function

$\begin{eqnarray}&&{\widetilde{G}}_{{\mathsf{LET}}}(V)=\mathrm{Tr}\left[{\widetilde{P}}_{{\bf{0}}}({ \mathcal W }\,\circ \,{ \mathcal P })(| {\bf{0}}\rangle \langle {\bf{0}}| )\right],\end{eqnarray} \tag{ F3 }$

with ${ \mathcal W }={{ \mathcal V }}^{\dagger }\,\circ \,{ \mathcal U }$ the channel that implements U followed by ${V}^{\dagger }$ . Then, by means of (F1) and (F2) we find

$\begin{eqnarray}&&{\widetilde{G}}_{{\mathsf{LET}}}(V)=\mathrm{Tr}\left[\left(\sum _{{\boldsymbol{i}}}{p}_{{\boldsymbol{i}}}| {\boldsymbol{i}}\rangle \langle {\boldsymbol{i}}| \right)\left(\sum _{{\boldsymbol{l}}}{q}_{{\boldsymbol{l}}}W| {\boldsymbol{l}}\rangle \langle {\boldsymbol{l}}| {W}^{\dagger }\right)\right]=\sum _{{\boldsymbol{i}},{\boldsymbol{l}}}{p}_{{\boldsymbol{i}}}{q}_{{\boldsymbol{l}}}{w}_{{\boldsymbol{il}}},\end{eqnarray} \tag{ F4 }$

where ${w}_{{\boldsymbol{il}}}=| \langle {\boldsymbol{i}}| W| {\boldsymbol{l}}\rangle {| }^{2}$ are the entries of a doubly stochastic matrix: since W is unitary, ${\sum }_{{\boldsymbol{i}}}{w}_{{\boldsymbol{il}}}={\sum }_{{\boldsymbol{l}}}{w}_{{\boldsymbol{il}}}=1$.
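Equations (F1), (F2) and (F4) can be checked numerically. The following NumPy sketch uses randomly generated, hypothetical noise parameters (a Pauli-channel weight table $q_{\boldsymbol{l},\boldsymbol{k}}$ and single-qubit readout parameters with ${p}_{00}^{(j)}>{p}_{01}^{(j)}$) and a random unitary W obtained from a QR decomposition. It confirms that the noisy state and the noisy POVM element are both diagonal in the computational basis, that ${p}_{\bf{0}}$ is the largest POVM weight, that the $w_{\boldsymbol{il}}$ form a doubly stochastic matrix, and that the trace in (F3) equals the double sum in (F4).

```python
import itertools
import numpy as np

X = np.array([[0, 1], [1, 0]])
Z = np.diag([1, -1])
n = 3
d = 2 ** n
bits = list(itertools.product([0, 1], repeat=n))
rng = np.random.default_rng(1)


def xz(l, k):
    """X^l Z^k as a tensor product of single-qubit factors X^{l_j} Z^{k_j}."""
    out = np.eye(1)
    for lj, kj in zip(l, k):
        out = np.kron(out, np.linalg.matrix_power(X, lj) @ np.linalg.matrix_power(Z, kj))
    return out


# Placeholder noise parameters (not values from the paper).
q_lk = rng.random((d, d))
q_lk /= q_lk.sum()                 # Pauli-channel weights q_{l,k}
p00 = rng.uniform(0.6, 0.99, n)    # p_00^{(j)}, always > p_01^{(j)}
p01 = rng.uniform(0.0, 0.4, n)     # p_01^{(j)}

# (F1): the Pauli channel maps |0><0| to a diagonal state with q_l = sum_k q_{l,k}.
rho0 = np.zeros((d, d))
rho0[0, 0] = 1.0
noisy = sum(q_lk[li, ki] * xz(l, k) @ rho0 @ xz(l, k).T
            for li, l in enumerate(bits) for ki, k in enumerate(bits))
q_l = q_lk.sum(axis=1)

# (F2): the noisy POVM element is diagonal with p_i = prod_j p_{0 i_j}^{(j)}.
P0 = np.eye(1)
for j in range(n):
    P0 = np.kron(P0, np.diag([p00[j], p01[j]]))
p_i = np.array([np.prod([p00[j] if ij == 0 else p01[j] for j, ij in enumerate(i)])
                for i in bits])

# (F4): for a random unitary W, Tr[P0 W rho W^dag] = sum_{i,l} p_i q_l w_il.
z = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
W, _ = np.linalg.qr(z)      # a random unitary from a QR decomposition
w = np.abs(W) ** 2          # w[i, l] = |<i|W|l>|^2
G_trace = np.trace(P0 @ W @ noisy @ W.conj().T).real
G_sum = p_i @ w @ q_l
```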

Let us now denote by ${{\boldsymbol{q}}}^{\downarrow }$ the vector with elements ${q}_{{\boldsymbol{l}}}$ arranged in decreasing order. Similarly, we denote by ${{\boldsymbol{p}}}^{\downarrow }$ the vector with elements ${p}_{{\boldsymbol{i}}}$ arranged in decreasing order. Additionally, let $\{| {q}_{r}\rangle \}$ and $\{| {p}_{s}\rangle \}$ be the orderings of the computational basis in which ${{\boldsymbol{q}}}^{\downarrow }$ and ${{\boldsymbol{p}}}^{\downarrow }$ are sorted, respectively, i.e.

$\begin{eqnarray}&&{ \mathcal P }(| {\bf{0}}\rangle \langle {\bf{0}}| )=\sum _{r}{q}_{r}^{\downarrow }| {q}_{r}\rangle \langle {q}_{r}| ,\qquad \mathrm{and}\qquad {\widetilde{P}}_{{\bf{0}}}=\sum _{s}{p}_{s}^{\downarrow }| {p}_{s}\rangle \langle {p}_{s}| .\end{eqnarray} \tag{ F5 }$

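Then, since the doubly stochastic matrix with entries ${w}_{{\boldsymbol{il}}}$ is a convex combination of permutation matrices (Birkhoff's theorem), the permutation (or rearrangement) inequality [48] gives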

$\begin{eqnarray}&&{\widetilde{G}}_{{\mathsf{LET}}}(V)=\sum _{{\boldsymbol{i}},{\boldsymbol{l}}}{p}_{{\boldsymbol{i}}}{q}_{{\boldsymbol{l}}}{w}_{{\boldsymbol{il}}}\leqslant {{\boldsymbol{p}}}^{\downarrow }\cdot {{\boldsymbol{q}}}^{\downarrow }.\end{eqnarray} \tag{ F6 }$

The inequality in (F6) is saturated for $W\in {\mathbb{S}}$, where ${\mathbb{S}}$ is the set of permutation matrices (up to phases) that match the two orderings rank by rank, i.e. that map each $| {q}_{r}\rangle$ to $| {p}_{s}\rangle$ with $s=r$. If ${{\boldsymbol{q}}}^{\downarrow }$ (or ${{\boldsymbol{p}}}^{\downarrow }$) has equal components, the ordering is not unique, and ${\mathbb{S}}$ contains more than one such permutation. Moreover, note that

$\begin{eqnarray}&&{p}_{{\bf{0}}}\geqslant {p}_{{\boldsymbol{i}}},\quad \mathrm{and}\quad {q}_{{\bf{0}}}\geqslant {q}_{{\boldsymbol{i}}},\quad \forall {\boldsymbol{i}}\ne {\bf{0}},\end{eqnarray} \tag{ F7 }$

where the second inequality follows from definition 2, while the first inequality always holds since ${p}_{{\bf{0}}}={\prod }_{j=1}^{n}{p}_{00}^{\left(j\right)}$ , and since we have assumed that ${p}_{00}^{\left(j\right)}\gt {p}_{01}^{\left(j\right)}$ $\forall j$ .
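The bound (F6) and its saturation can be illustrated numerically. In the NumPy sketch below, the weight vectors `p` and `q` are random placeholders standing in for $\{{p}_{\boldsymbol{i}}\}$ and $\{{q}_{\boldsymbol{l}}\}$; random unitaries never exceed ${{\boldsymbol{p}}}^{\downarrow }\cdot {{\boldsymbol{q}}}^{\downarrow }$, while the permutation sending the $r$th-ranked state of the q-ordering to the $r$th-ranked state of the p-ordering attains it.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
# Placeholder diagonal weights standing in for {p_i} (POVM) and {q_l} (noisy state).
p = rng.random(d)
q = rng.random(d)
q /= q.sum()


def G(W):
    """sum_{i,l} p_i q_l |<i|W|l>|^2, as in (F4)."""
    return p @ (np.abs(W) ** 2) @ q


def random_unitary(rng, d):
    """Unitary factor of the QR decomposition of a complex Gaussian matrix."""
    z = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return np.linalg.qr(z)[0]


bound = np.sort(p)[::-1] @ np.sort(q)[::-1]   # p_down . q_down

# Permutation with W|q_r> = |p_r>: the basis state carrying the r-th largest
# q-weight is sent to the basis state carrying the r-th largest p-weight.
order_p, order_q = np.argsort(-p), np.argsort(-q)
W_star = np.zeros((d, d))
W_star[order_p, order_q] = 1.0

samples = [G(random_unitary(rng, d)) for _ in range(500)]
```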

We now recall that ${{\mathbb{V}}}_{d}^{{\rm{opt}}}$ denotes the set of unitaries that optimize ${C}_{{\mathsf{LET}}}(V)$ and ${C}_{{\mathsf{LLET}}}(V)$, i.e. for all $V^{\prime} \in {{\mathbb{V}}}_{d}^{{\rm{opt}}}$ we have $W^{\prime} | {\bf{0}}\rangle ={(V^{\prime} )}^{\dagger }U| {\bf{0}}\rangle =| {\bf{0}}\rangle$ (up to a global phase), which entails ${w}_{{\boldsymbol{i}}{\bf{0}}}^{{\prime} }={w}_{{\bf{0}}{\boldsymbol{i}}}^{{\prime} }={\delta }_{{\boldsymbol{i}},{\bf{0}}}$ (the second equality by double stochasticity), and hence equation (F4) becomes

$\begin{eqnarray}&&{\widetilde{G}}_{{\mathsf{LET}}}(V^{\prime} )={p}_{{\bf{0}}}{q}_{{\bf{0}}}+\sum _{{\boldsymbol{i}},{\boldsymbol{l}}\ne {\bf{0}}}{p}_{{\boldsymbol{i}}}{q}_{{\boldsymbol{l}}}{w}_{{\boldsymbol{il}}}^{{\prime} }.\end{eqnarray} \tag{ F8 }$

Since ${p}_{{\bf{0}}}\geqslant {p}_{{\boldsymbol{i}}}$ and ${q}_{{\bf{0}}}\geqslant {q}_{{\boldsymbol{i}}}$ for all ${\boldsymbol{i}}$, the first term in (F8) coincides with the first term of the sum ${{\boldsymbol{p}}}^{\downarrow }\cdot {{\boldsymbol{q}}}^{\downarrow }={\sum }_{r}{{\boldsymbol{q}}}_{r}^{\downarrow }{{\boldsymbol{p}}}_{r}^{\downarrow }$. Hence, to saturate (F6) we additionally need $W^{\prime} \in {\mathbb{S}}$, i.e. the $(d-1)\times (d-1)$ principal submatrix of $W^{\prime}$ (with $d={2}^{n}$) with matrix elements $\langle {\boldsymbol{z}}| W^{\prime} | {\boldsymbol{z}}^{\prime} \rangle$ (such that ${\boldsymbol{z}},{\boldsymbol{z}}^{\prime} \ne {\bf{0}}$) must match the remaining orderings $\{| {q}_{r}\rangle \}$ and $\{| {p}_{s}\rangle \}$ (with $r\ne 0$ and $s\ne 0$) rank by rank. Combining this result with (F6), we have that for any matrix V in ${{\mathbb{V}}}_{d}$ (the set of d × d unitary matrices)

$\begin{eqnarray}&&{\widetilde{G}}_{{\mathsf{LET}}}(V)\leqslant {{\boldsymbol{p}}}^{\downarrow }\cdot {{\boldsymbol{q}}}^{\downarrow }={\widetilde{G}}_{{\mathsf{LET}}}(V^{\prime} ),\end{eqnarray} \tag{ F9 }$

where $V^{\prime} \in {\widetilde{{\mathbb{V}}}}_{d}^{{\rm{opt}}}$ and where

$\begin{eqnarray}&&{\widetilde{{\mathbb{V}}}}_{d}^{{\rm{opt}}}=\{V^{\prime} \in {{\mathbb{V}}}_{d}:W={(V^{\prime} )}^{\dagger }U\in {\mathbb{S}}\}.\end{eqnarray} \tag{ F10 }$

Not every $V^{\prime} \in {{\mathbb{V}}}_{d}^{{\rm{opt}}}$ yields $W^{\prime} ={(V^{\prime} )}^{\dagger }U\in {\mathbb{S}}$, so only a subset of the noise-free optima remains optimal in the presence of noise: ${\widetilde{{\mathbb{V}}}}_{d}^{{\rm{opt}}}\subseteq {{\mathbb{V}}}_{d}^{{\rm{opt}}}$. This means that ${C}_{{\mathsf{LET}}}$ exhibits weak-OPR to Noise Model 3 according to definition 6.
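The gap between ${\widetilde{{\mathbb{V}}}}_{d}^{{\rm{opt}}}$ and ${{\mathbb{V}}}_{d}^{{\rm{opt}}}$ can be seen in a two-qubit toy example. The readout parameters and the weights ${q}_{\boldsymbol{l}}$ below are hand-picked, hypothetical values chosen so that (F7) holds. Both permutations fix $|\mathbf{0}\rangle$, so both correspond to noise-free optima, but only the one that matches the two orderings attains the bound in (F6).

```python
import itertools
import numpy as np

# Hypothetical noise parameters for n = 2 qubits (d = 4).
p00 = [0.9, 0.8]
p01 = [0.2, 0.1]
bits = list(itertools.product([0, 1], repeat=2))
p = np.array([np.prod([p00[j] if b == 0 else p01[j] for j, b in enumerate(i)])
              for i in bits])             # p_i from (F2): [0.72, 0.09, 0.16, 0.02]
q = np.array([0.7, 0.15, 0.1, 0.05])      # hypothetical q_l, with q_0 largest


def G_let(W):
    """Noisy LET objective sum_{i,l} p_i q_l |<i|W|l>|^2 from (F4)."""
    return p @ (np.abs(W) ** 2) @ q


def perm_matrix(images):
    """Permutation matrix with W|l> = |images[l]>."""
    d = len(images)
    W = np.zeros((d, d))
    W[images, np.arange(d)] = 1.0
    return W


bound = np.sort(p)[::-1] @ np.sort(q)[::-1]
W_identity = perm_matrix([0, 1, 2, 3])   # V' = U: a noise-free optimum
W_aligned = perm_matrix([0, 2, 1, 3])    # also fixes |00>, and matches the orderings
```

With these numbers the reordered permutation attains the bound 0.538, while the identity (that is, $V=U$) gives 0.5345. For these parameters $V=U$ is therefore not among the noisy optima, even though every noisy optimum is still a noise-free optimum, which is the weak (rather than strong) form of OPR.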

We now consider the noise resilience of the LLET to Noise Model 3 in definition 9. Here we are interested in the optimum of

$\begin{eqnarray}&&{\widetilde{G}}_{{\mathsf{LLET}}}(V)=\displaystyle \frac{1}{n}\sum _{j=1}^{n}\mathrm{Tr}\left[\left(\left({p}_{00}^{\left(j\right)}| 0\rangle \langle 0| +{p}_{01}^{\left(j\right)}| 1\rangle \langle 1| \right)\otimes {{\mathbb{1}}}^{{\overline{A}}_{j}}\right)({ \mathcal W }\circ { \mathcal P })(| {\bf{0}}\rangle \langle {\bf{0}}| )\right]\end{eqnarray} \tag{ F11 }$

$\begin{eqnarray}&&\,=\displaystyle \frac{1}{n}\sum _{j=1}^{n}\mathrm{Tr}\left[\left(\left({p}_{00}^{\left(j\right)}| 0\rangle \langle 0| +{p}_{01}^{\left(j\right)}| 1\rangle \langle 1| \right)\otimes {{\mathbb{1}}}^{{\overline{A}}_{j}}\right)\left(\sum _{{\boldsymbol{l}}}{q}_{{\boldsymbol{l}}}W| {\boldsymbol{l}}\rangle \langle {\boldsymbol{l}}| {W}^{\dagger }\right)\right].\end{eqnarray} \tag{ F12 }$

For any matrix $V^{\prime} \in {{\mathbb{V}}}_{d}^{{\rm{opt}}}$ we have $W^{\prime} | {\bf{0}}\rangle ={(V^{\prime} )}^{\dagger }U| {\bf{0}}\rangle =| {\bf{0}}\rangle$ (up to a global phase) and ${\sum }_{{\boldsymbol{l}}}{q}_{{\boldsymbol{l}}}W^{\prime} | {\boldsymbol{l}}\rangle \langle {\boldsymbol{l}}| {(W^{\prime} )}^{\dagger }={q}_{{\bf{0}}}| {\bf{0}}\rangle \langle {\bf{0}}| +{\sum }_{{\boldsymbol{l}}\ne {\bf{0}}}{q}_{{\boldsymbol{l}}}W^{\prime} | {\boldsymbol{l}}\rangle \langle {\boldsymbol{l}}| {(W^{\prime} )}^{\dagger }$, which leads to

$\begin{eqnarray}&&{\widetilde{G}}_{{\mathsf{LLET}}}(V^{\prime} )=\displaystyle \frac{1}{n}\sum _{j=1}^{n}{p}_{00}^{\left(j\right)}{q}_{{\bf{0}}}+\displaystyle \frac{1}{n}\sum _{j=1}^{n}\mathrm{Tr}\left[\left(\left({p}_{00}^{\left(j\right)}| 0\rangle \langle 0| +{p}_{01}^{\left(j\right)}| 1\rangle \langle 1| \right)\otimes {{\mathbb{1}}}^{{\overline{A}}_{j}}\right)\left(\sum _{{\boldsymbol{l}}\ne {\bf{0}}}{q}_{{\boldsymbol{l}}}W^{\prime} | {\boldsymbol{l}}\rangle \langle {\boldsymbol{l}}| {\left(W^{\prime} \right)}^{\dagger }\right)\right].\end{eqnarray} \tag{ F13 }$

On the other hand, for any unitary matrix $V\in {{\mathbb{V}}}_{d}$ we have

$\begin{eqnarray}\begin{array}{rcl}{\widetilde{G}}_{{\mathsf{LLET}}}(V) & = & \displaystyle \frac{1}{n}\sum _{j=1}^{n}\mathrm{Tr}\left[\left(\left({p}_{00}^{\left(j\right)}| 0\rangle \langle 0| +{p}_{01}^{\left(j\right)}| 1\rangle \langle 1| \right)\otimes {{\mathbb{1}}}^{{\overline{A}}_{j}}\right){q}_{{\bf{0}}}W| {\bf{0}}\rangle \langle {\bf{0}}| {W}^{\dagger }\right]\\ & & +\,\displaystyle \frac{1}{n}\sum _{j=1}^{n}\mathrm{Tr}\left[\left(\left({p}_{00}^{\left(j\right)}| 0\rangle \langle 0| +{p}_{01}^{\left(j\right)}| 1\rangle \langle 1| \right)\otimes {{\mathbb{1}}}^{{\overline{A}}_{j}}\right)\left(\sum _{{\boldsymbol{l}}\ne {\bf{0}}}{q}_{{\boldsymbol{l}}}W| {\boldsymbol{l}}\rangle \langle {\boldsymbol{l}}| {W}^{\dagger }\right)\right]\\ & \leqslant & \displaystyle \frac{1}{n}\sum _{j=1}^{n}\left(\mathrm{Tr}\left[{p}_{00}^{\left(j\right)}{q}_{{\bf{0}}}{\mathbb{1}}W| {\bf{0}}\rangle \langle {\bf{0}}| {W}^{\dagger }\right]+\mathrm{Tr}\left[\left(\left({p}_{00}^{\left(j\right)}| 0\rangle \langle 0| +{p}_{01}^{\left(j\right)}| 1\rangle \langle 1| \right)\otimes {{\mathbb{1}}}^{{\overline{A}}_{j}}\right)\left(\sum _{{\boldsymbol{l}}\ne {\bf{0}}}{q}_{{\boldsymbol{l}}}W| {\boldsymbol{l}}\rangle \langle {\boldsymbol{l}}| {W}^{\dagger }\right)\right]\right)\\ & = & \displaystyle \frac{1}{n}\sum _{j=1}^{n}{p}_{00}^{\left(j\right)}{q}_{{\bf{0}}}+\displaystyle \frac{1}{n}\sum _{j=1}^{n}\mathrm{Tr}\left[\left(\left({p}_{00}^{\left(j\right)}| 0\rangle \langle 0| +{p}_{01}^{\left(j\right)}| 1\rangle \langle 1| \right)\otimes {{\mathbb{1}}}^{{\overline{A}}_{j}}\right)\left(\sum _{{\boldsymbol{l}}\ne {\bf{0}}}{q}_{{\boldsymbol{l}}}W| {\boldsymbol{l}}\rangle \langle {\boldsymbol{l}}| {W}^{\dagger }\right)\right],\end{array}\end{eqnarray} \tag{ F14 }$

where the inequality follows from the fact that ${p}_{00}^{\left(j\right)}\gt {p}_{01}^{\left(j\right)}$ , and hence

$\begin{eqnarray}&&({p}_{00}^{\left(j\right)}| 0\rangle \langle 0| +{p}_{01}^{\left(j\right)}| 1\rangle \langle 1| )\otimes {{\mathbb{1}}}^{{\overline{A}}_{j}}\leqslant ({p}_{00}^{\left(j\right)}| 0\rangle \langle 0| +{p}_{00}^{\left(j\right)}| 1\rangle \langle 1| )\otimes {{\mathbb{1}}}^{{\overline{A}}_{j}}={p}_{00}^{\left(j\right)}{\mathbb{1}}.\end{eqnarray} \tag{ F15 }$

We can then simplify equation (F14) as

$\begin{eqnarray}&&{\widetilde{G}}_{{\mathsf{LLET}}}(V)\leqslant \displaystyle \frac{1}{n}\sum _{j=1}^{n}{p}_{00}^{\left(j\right)}{q}_{{\bf{0}}}+\displaystyle \frac{1}{n}\sum _{j=1}^{n}\sum _{{\boldsymbol{l}}\ne {\bf{0}},{\boldsymbol{k}}\ne {\bf{0}}}{q}_{{\boldsymbol{l}}}{p}_{{\boldsymbol{k}}}^{\left(j\right)}{w}_{{\boldsymbol{kl}}}=\displaystyle \frac{1}{n}\sum _{j=1}^{n}{p}_{00}^{\left(j\right)}{q}_{{\bf{0}}}+\sum _{{\boldsymbol{l}}\ne {\bf{0}},{\boldsymbol{k}}\ne {\bf{0}}}{q}_{{\boldsymbol{l}}}{\tilde{p}}_{{\boldsymbol{k}}}{w}_{{\boldsymbol{kl}}},\end{eqnarray} \tag{ F16 }$

where ${p}_{{\boldsymbol{k}}}^{\left(j\right)}={p}_{00}^{\left(j\right)}$ if $k_j=0$ and ${p}_{{\boldsymbol{k}}}^{\left(j\right)}={p}_{01}^{\left(j\right)}$ if $k_j=1$, with $k_j$ the jth bit of ${\boldsymbol{k}}$. In the equality in (F16) we have defined ${\tilde{p}}_{{\boldsymbol{k}}}=\tfrac{1}{n}{\sum }_{j=1}^{n}{p}_{{\boldsymbol{k}}}^{\left(j\right)}$. Finally, applying the rearrangement inequality once more gives

$\begin{eqnarray}&&{\widetilde{G}}_{{\mathsf{LLET}}}(V)\leqslant \displaystyle \frac{1}{n}\sum _{j=1}^{n}{p}_{00}^{\left(j\right)}{q}_{{\bf{0}}}+\sum _{r=1}^{d-1}{q}_{r}^{\downarrow }{\tilde{p}}_{r}^{\downarrow },\end{eqnarray} \tag{ F17 }$

which is saturated for matrices $W\in {\mathbb{S}}^{\prime}$, where ${\mathbb{S}}^{\prime}$ is a subset of the permutation group such that ${\sum }_{{\boldsymbol{l}}\ne {\bf{0}},{\boldsymbol{k}}\ne {\bf{0}}}{q}_{{\boldsymbol{l}}}{\tilde{p}}_{{\boldsymbol{k}}}{w}_{{\boldsymbol{kl}}}={\sum }_{r=1}^{d-1}{q}_{r}^{\downarrow }{\tilde{p}}_{r}^{\downarrow }$. Here ${q}^{\downarrow }$ and ${\tilde{p}}^{\downarrow }$ are the vectors with components ${q}_{{\boldsymbol{l}}}$ and ${\tilde{p}}_{{\boldsymbol{k}}}$ (for ${\boldsymbol{l}},{\boldsymbol{k}}\ne {\bf{0}}$), respectively, arranged in decreasing order. Hence, we can define the set of matrices which saturate (F17) as

$\begin{eqnarray}&&{\widetilde{{\mathbb{V}}}}_{d}^{{\rm{opt}}}=\{V^{\prime} \in {{\mathbb{V}}}_{d}:W={(V^{\prime} )}^{\dagger }U\in {{\mathbb{S}}}^{\prime}\}.\end{eqnarray} \tag{ F18 }$

While any matrix in ${{\mathbb{V}}}_{d}^{{\rm{opt}}}$ saturates the inequality in (F14), only a subset will also saturate (F17). Hence, ${\widetilde{{\mathbb{V}}}}_{d}^{{\rm{opt}}}\subseteq {{\mathbb{V}}}_{d}^{{\rm{opt}}}$ , and ${C}_{{\mathsf{LLET}}}$ exhibits weak-OPR to Noise Model 3 according to definition 6.□
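For a noise-free optimum ($W^{\prime}|\mathbf{0}\rangle=|\mathbf{0}\rangle$) the inequality in (F14) is tight, and ${\widetilde{G}}_{\mathsf{LLET}}$ reduces to $\tfrac{1}{n}\sum_j {p}_{00}^{(j)}{q}_{\bf{0}}+\sum_{\boldsymbol{k},\boldsymbol{l}\ne\bf{0}}{q}_{\boldsymbol{l}}{\tilde{p}}_{\boldsymbol{k}}{w}_{\boldsymbol{kl}}$, with the averaged weights ${\tilde{p}}_{\boldsymbol{k}}$ defined after (F16). The NumPy sketch below evaluates (F11) directly and compares it with this expression; the readout parameters, the weights ${q}_{\boldsymbol{l}}$, and the random permutation W fixing $|\mathbf{0}\rangle$ are hypothetical placeholders.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
n = 3
d = 2 ** n
bits = list(itertools.product([0, 1], repeat=n))
p00 = rng.uniform(0.6, 0.99, n)   # placeholder readout parameters, p00 > p01
p01 = rng.uniform(0.0, 0.4, n)
q = rng.random(d)
q /= q.sum()                      # placeholder diagonal of P(|0><0|)


def local_povm(j):
    """(p00^(j)|0><0| + p01^(j)|1><1|) on qubit j, identity on the other qubits."""
    out = np.eye(1)
    for m in range(n):
        out = np.kron(out, np.diag([p00[j], p01[j]]) if m == j else np.eye(2))
    return out


# p_k^{(j)} and the averaged weights p~_k = (1/n) sum_j p_k^{(j)} used in (F16).
p_kj = np.array([[p00[j] if k[j] == 0 else p01[j] for j in range(n)] for k in bits])
p_tilde = p_kj.mean(axis=1)

# A permutation W with W|0...0> = |0...0>, i.e. a W' arising from some V' in V_d^opt.
images = np.concatenate(([0], 1 + rng.permutation(d - 1)))
W = np.zeros((d, d))
W[images, np.arange(d)] = 1.0
w = W ** 2                        # w[k, l] = |<k|W|l>|^2

rho = W @ np.diag(q) @ W.T        # sum_l q_l W|l><l| W^dagger
G_direct = np.mean([np.trace(local_povm(j) @ rho) for j in range(n)])   # (F11)
G_formula = p00.mean() * q[0] + p_tilde[1:] @ w[1:, 1:] @ q[1:]
```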

Appendix G.: Proof of corollaries 1–8

Corollary 1. The cost functions ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ exhibit strong-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 1, as well as (2) a noise process during the implementation of ${ \mathcal W }={{ \mathcal W }}_{k}\circ \cdots \circ {{ \mathcal W }}_{1}={{ \mathcal V }}^{\dagger }\circ { \mathcal U }$ (i.e. in the time interval between ${\tau }_{1}$ and ${\tau }_{2}$) in which global Pauli channels $\{{{ \mathcal P }}_{1}^{A},\ldots ,{{ \mathcal P }}_{k}^{A}\}$ act on system $A$, such that the overall channel on $A$ is ${{ \mathcal P }}_{k}^{A}\circ {{ \mathcal W }}_{k}\circ \cdots \circ {{ \mathcal P }}_{1}^{A}\circ {{ \mathcal W }}_{1}$, provided that the following condition is satisfied:

$\begin{eqnarray}&&({{ \mathcal P }}_{k}^{A}\,\circ \,{{ \mathcal W }}_{k}\cdots \circ \,{{ \mathcal P }}_{1}^{A}\,\circ \,{{ \mathcal W }}_{1})(\cdot )=({{ \mathcal W }}_{k}\,\circ \,{{ \mathcal W }}_{k-1}\cdots \circ \ {{ \mathcal W }}_{1}\,\circ \,{\widehat{{ \mathcal P }}}^{A})(\cdot ).\end{eqnarray} \tag{ G1 }$

Here ${\widehat{{ \mathcal P }}}^{A}$ is also a Pauli channel, and the channels ${ \mathcal U }$ , ${{ \mathcal V }}^{\dagger }$ , and ${ \mathcal W }$ correspond to conjugating the state by the unitaries $U$ , ${V}^{\dagger }$ , and $W$ , respectively.

Proof. By condition (G1), the overall noisy channel acting during the implementation of ${ \mathcal W }$ is mathematically equivalent to a Pauli channel ${\widehat{{ \mathcal P }}}^{A}$ followed by the ideal unitary channel ${ \mathcal W }$. The result then follows from theorem 1, which allows for Pauli channel noise at time τ₁.□
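Condition (G1) can be checked numerically for small examples. The sketch below is our own illustration, not code from the paper: it represents single-qubit channels as superoperators (column-stacking convention), forms the candidate ${\widehat{{ \mathcal P }}}^{A}={{ \mathcal W }}^{-1}\circ (\text{noisy sequence})$, and tests whether it is a Pauli channel using the fact that a channel is a Pauli channel exactly when its Pauli transfer matrix is diagonal. The gate choices (H, S, T) and the bit-flip probabilities are hypothetical; the Clifford case anticipates corollary 2.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j])
T = np.diag([1, np.exp(1j * np.pi / 4)])

def unitary_superop(U):
    # Column-stacking convention: vec(U rho U^dagger) = (conj(U) kron U) vec(rho)
    return np.kron(U.conj(), U)

def pauli_superop(probs):
    # Pauli channel rho -> sum_a probs[a] * sigma_a rho sigma_a
    return sum(w * np.kron(P.conj(), P) for w, P in zip(probs, PAULIS))

def ptm(superop):
    # Pauli transfer matrix R_ij = Tr[sigma_i E(sigma_j)] / 2
    R = np.zeros((4, 4))
    for j, Pj in enumerate(PAULIS):
        out = (superop @ Pj.reshape(-1, order="F")).reshape(2, 2, order="F")
        for i, Pi in enumerate(PAULIS):
            R[i, j] = np.real(np.trace(Pi @ out)) / 2
    return R

def satisfies_g1(gates, noises, tol=1e-10):
    """Check P_k∘W_k∘...∘P_1∘W_1 = W_k∘...∘W_1∘P_hat with P_hat a Pauli channel."""
    noisy = np.eye(4, dtype=complex)
    ideal = np.eye(4, dtype=complex)
    for W, probs in zip(gates, noises):
        noisy = pauli_superop(probs) @ unitary_superop(W) @ noisy
        ideal = unitary_superop(W) @ ideal
    p_hat = np.linalg.inv(ideal) @ noisy   # candidate P_hat = W^{-1} ∘ noisy sequence
    R = ptm(p_hat)
    # A CPTP map is a Pauli channel iff its Pauli transfer matrix is diagonal
    return np.allclose(R, np.diag(np.diag(R)), atol=tol)

bit_flip = [0.9, 0.1, 0.0, 0.0]   # hypothetical noise: X with probability 0.1
print(satisfies_g1([H, S, H], [bit_flip] * 3))  # True: Clifford gates only
print(satisfies_g1([T, H], [bit_flip] * 2))     # False: T is not Clifford
```

The T-gate failure is specific to the noise: pure Z dephasing commutes with the diagonal T gate, so (G1) can still hold for some non-Clifford sequences.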

Corollary 2. Let the $W={V}^{\dagger }U$ gate sequence have the form $W={W}_{2}^{A}{W}_{1}^{A}$ with ${W}_{1}^{A}$ composed only of Clifford gates. Then the cost functions ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ exhibit strong-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 1, as well as (2) a noise process during the implementation of ${{ \mathcal W }}_{1}^{A}={{ \mathcal W }}_{1,k}\circ \cdots \circ {{ \mathcal W }}_{\mathrm{1,1}}$, in which global Pauli channels $\{{{ \mathcal P }}_{1}^{A},\ldots ,{{ \mathcal P }}_{k}^{A}\}$ act on system $A$, such that the overall channel on $A$ is ${{ \mathcal P }}_{k}^{A}\circ {{ \mathcal W }}_{1,k}\circ \cdots \circ {{ \mathcal P }}_{1}^{A}\circ {{ \mathcal W }}_{\mathrm{1,1}}$.

Proof. From lemma 1 it follows that Clifford unitaries satisfy the condition in (G1). Therefore, corollary 2 is a special case of corollary 1. □

Corollary 3. Let the $W={V}^{\dagger }U$ gate sequence have the form $W={W}_{2}^{A}{W}_{1}^{A}$ with ${W}_{1}^{A}={W}_{1}^{A^{\prime} }\otimes {W}_{1}^{A^{\prime\prime} }$ being a tensor product, i.e. $W$ is a tensor product up to a particular time. Then the cost functions ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ exhibit strong-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 1, as well as (2) a noise process during the implementations of ${{ \mathcal W }}_{1}^{A^{\prime} }={{ \mathcal W }}_{1,k}^{A^{\prime} }\circ \cdots \circ {{ \mathcal W }}_{1,1}^{A^{\prime} }$ and ${{ \mathcal W }}_{1}^{A^{\prime\prime} }={{ \mathcal W }}_{1,l}^{A^{\prime\prime} }\circ \cdots \circ {{ \mathcal W }}_{1,1}^{A^{\prime\prime} }$ in which local depolarizing channels $\{{{ \mathcal D }}_{1,1}^{A^{\prime} },\ldots ,{{ \mathcal D }}_{1,k}^{A^{\prime} }\}$ and $\{{{ \mathcal D }}_{1,1}^{A^{\prime\prime} },\ldots ,{{ \mathcal D }}_{1,l}^{A^{\prime\prime} }\}$ act on subsystems $A^{\prime}$ and $A^{\prime\prime}$, respectively, such that the overall channel on $A^{\prime} A^{\prime\prime}$ is $({{ \mathcal D }}_{1,k}^{A^{\prime} }\circ {{ \mathcal W }}_{1,k}^{A^{\prime} }\circ \cdots \circ {{ \mathcal D }}_{1,1}^{A^{\prime} }\circ {{ \mathcal W }}_{1,1}^{A^{\prime} })\otimes ({{ \mathcal D }}_{1,l}^{A^{\prime\prime} }\circ {{ \mathcal W }}_{1,l}^{A^{\prime\prime} }\circ \cdots \circ {{ \mathcal D }}_{1,1}^{A^{\prime\prime} }\circ {{ \mathcal W }}_{1,1}^{A^{\prime\prime} })$.

Proof. Let ρ denote a quantum state. Consider the following chain of equalities:

$\begin{eqnarray}&&({{ \mathcal D }}_{p}^{A^{\prime} }\otimes {{ \mathcal D }}_{q}^{A^{\prime\prime} })({{ \mathcal W }}^{A^{\prime} }\otimes {{ \mathcal W }}^{A^{\prime\prime} })(\rho )=({{ \mathcal I }}^{A^{\prime} }\otimes {{ \mathcal D }}_{q}^{A^{\prime\prime} })\left(p\,({{ \mathcal W }}^{A^{\prime} }\otimes {{ \mathcal W }}^{A^{\prime\prime} })(\rho )+(1-p)\,{\pi }^{A^{\prime} }\otimes {\mathrm{Tr}}_{A^{\prime} }\left[({{ \mathcal W }}^{A^{\prime} }\otimes {{ \mathcal W }}^{A^{\prime\prime} })(\rho )\right]\right)\end{eqnarray} \tag{ G2 }$

$\begin{eqnarray}&&\,=\,({{ \mathcal I }}^{A^{\prime} }\otimes {{ \mathcal D }}_{q}^{A^{\prime\prime} })\left(p\,({{ \mathcal W }}^{A^{\prime} }\otimes {{ \mathcal W }}^{A^{\prime\prime} })(\rho )+(1-p)\,{\pi }^{A^{\prime} }\otimes {\mathrm{Tr}}_{A^{\prime} }\left[({{ \mathcal I }}^{A^{\prime} }\otimes {{ \mathcal W }}^{A^{\prime\prime} })(\rho )\right]\right)\end{eqnarray} \tag{ G3 }$

$\begin{eqnarray}&&\,=\,({{ \mathcal I }}^{A^{\prime} }\otimes {{ \mathcal D }}_{q}^{A^{\prime\prime} })({{ \mathcal W }}^{A^{\prime} }\otimes {{ \mathcal W }}^{A^{\prime\prime} })\left(p\rho +(1-p)\,{\pi }^{A^{\prime} }\otimes {\mathrm{Tr}}_{A^{\prime} }(\rho )\right)\end{eqnarray} \tag{ G4 }$

$\begin{eqnarray}&&\,=\,({{ \mathcal I }}^{A^{\prime} }\otimes {{ \mathcal D }}_{q}^{A^{\prime\prime} })({{ \mathcal W }}^{A^{\prime} }\otimes {{ \mathcal W }}^{A^{\prime\prime} })({{ \mathcal D }}_{p}^{A^{\prime} }(\rho ))\end{eqnarray} \tag{ G5 }$

$\begin{eqnarray}&&\,=\,({{ \mathcal W }}^{A^{\prime} }\otimes {{ \mathcal W }}^{A^{\prime\prime} })({{ \mathcal D }}_{p}^{A^{\prime} }\otimes {{ \mathcal D }}_{q}^{A^{\prime\prime} })(\rho ),\end{eqnarray} \tag{ G6 }$

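where ${\pi }^{A^{\prime} }$ is the maximally mixed state on system $A^{\prime}$, so that ${{ \mathcal D }}_{p}^{A^{\prime} }(\sigma )=p\sigma +(1-p){\pi }^{A^{\prime} }\otimes {\mathrm{Tr}}_{A^{\prime} }(\sigma )$. Equation (G3) holds because the partial trace over $A^{\prime}$ is invariant under unitaries acting on $A^{\prime}$, and (G4) holds because ${{ \mathcal W }}^{A^{\prime} }({\pi }^{A^{\prime} })={\pi }^{A^{\prime} }$; the last step repeats the same argument for ${{ \mathcal D }}_{q}^{A^{\prime\prime} }$. Hence local depolarizing channels commute with tensor-product unitaries. Applying (G6) repeatedly moves every depolarizing channel to the beginning of ${{ \mathcal W }}_{1}^{A}$, and since a tensor product of depolarizing channels is a Pauli channel, condition (G1) is satisfied. The result then follows from corollary 1.□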
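The key identity (G6), that local depolarizing channels commute with tensor-product unitaries, can be sanity-checked numerically. The sketch below is our own illustration: it takes $A^{\prime}$ and $A^{\prime\prime}$ to be single qubits, draws hypothetical random local unitaries and a random state, and uses ${{ \mathcal D }}_{p}^{A^{\prime} }(\sigma )=p\sigma +(1-p){\pi }^{A^{\prime} }\otimes {\mathrm{Tr}}_{A^{\prime} }(\sigma )$ as in (G2).

```python
import numpy as np

def random_unitary(d, rng):
    # The Q factor of a complex Gaussian matrix is unitary
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(z)
    return q

def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho)

def depolarize_first(rho, p, d1, d2):
    # D_p on A': p*rho + (1-p) * pi^{A'} ⊗ Tr_{A'}(rho)
    reduced = np.trace(rho.reshape(d1, d2, d1, d2), axis1=0, axis2=2)
    return p * rho + (1 - p) * np.kron(np.eye(d1) / d1, reduced)

def depolarize_second(rho, q, d1, d2):
    # D_q on A'': q*rho + (1-q) * Tr_{A''}(rho) ⊗ pi^{A''}
    reduced = np.trace(rho.reshape(d1, d2, d1, d2), axis1=1, axis2=3)
    return q * rho + (1 - q) * np.kron(reduced, np.eye(d2) / d2)

def local_depolarize(rho, p, q, d1, d2):
    # D_p^{A'} ⊗ D_q^{A''}
    return depolarize_second(depolarize_first(rho, p, d1, d2), q, d1, d2)

rng = np.random.default_rng(7)
d1, d2, p, q = 2, 2, 0.7, 0.4   # hypothetical dimensions and noise strengths
W = np.kron(random_unitary(d1, rng), random_unitary(d2, rng))   # W' ⊗ W''
rho = random_state(d1 * d2, rng)

lhs = local_depolarize(W @ rho @ W.conj().T, p, q, d1, d2)   # noise after W' ⊗ W''
rhs = W @ local_depolarize(rho, p, q, d1, d2) @ W.conj().T   # noise before W' ⊗ W''
print(np.allclose(lhs, rhs))  # True
```

Replacing ${W}^{A^{\prime} }\otimes {W}^{A^{\prime\prime} }$ by an entangling gate such as CNOT generally breaks this equality, which is why corollary 3 requires ${W}_{1}^{A}$ to be a tensor product.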

Corollary 4. The cost functions ${C}_{{\mathsf{HST}}}$ and ${C}_{{\mathsf{LHST}}}$ exhibit strong-OPR to the following noise model: (1) all noise processes in Noise Model 2, as well as (2) a noise process during the implementation of ${ \mathcal W }={{ \mathcal W }}_{k}\circ \cdots \circ {{ \mathcal W }}_{1}={{ \mathcal V }}^{\dagger }\circ { \mathcal U }$ (i.e. in the time interval between ${\tau }_{1}$ and ${\tau }_{2}$) in which global non-unital Pauli channels $\{{{ \mathcal P }}_{{\rm{NU}},1}^{A},\ldots ,{{ \mathcal P }}_{{\rm{NU}},k}^{A}\}$ act on system $A$ such that the overall channel on $A$ is ${{ \mathcal P }}_{{\rm{NU}},k}^{A}\circ {{ \mathcal W }}_{k}\circ \cdots \circ {{ \mathcal P }}_{{\rm{NU}},1}^{A}\circ {{ \mathcal W }}_{1}$, provided that the following condition is satisfied:

$\begin{eqnarray}&&({{ \mathcal P }}_{{\rm{NU}},k}^{A}\ \circ \ {{ \mathcal W }}_{k}\cdots \circ \,{{ \mathcal P }}_{{\rm{NU}},1}^{A}\,\circ \,{{ \mathcal W }}_{1})(\cdot )=({{ \mathcal W }}_{k}\,\circ \,{{ \mathcal W }}_{k-1}\cdots {{ \mathcal W }}_{1}\circ \,{\widehat{{ \mathcal P }}}_{{\rm{NU}}}^{A})(\cdot ),\end{eqnarray} \tag{ G7 }$

where ${\widehat{{ \mathcal P }}}_{{\rm{NU}}}^{A}$ is also a non-unital Pauli channel.

Proof. By condition (G7), the overall noisy channel acting during the implementation of ${ \mathcal W }$ is mathematically equivalent to a non-unital Pauli channel ${\widehat{{ \mathcal P }}}_{{\rm{NU}}}^{A}$ followed by the ideal unitary channel ${ \mathcal W }$. The result then follows from theorem 2, which allows for non-unital Pauli noise at time τ₁.□

Corollary 5. The cost function ${C}_{{\mathsf{HST}}}$ exhibits strong-OPR to the following noise model: (1) global depolarizing noise acting continuously throughout the circuit, and (2) global non-unital Pauli noise on system A at a fixed time between ${\tau }_{1}$ and ${\tau }_{2}$.
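The proof below collapses continuous global depolarizing noise into a single depolarizing channel whose parameter is the product of the per-step parameters, as in (G8) and (G10). A minimal numerical sketch of that step, assuming the normalization $\rho \mapsto p\rho +(1-p)\mathrm{Tr}(\rho ){\mathbb{1}}/D$ with $D$ the total dimension of the system the channel acts on (our choice for illustration), and hypothetical random unitaries and parameters:

```python
import numpy as np

def depolarize(rho, p):
    # Global depolarizing channel: rho -> p*rho + (1-p) * Tr(rho) * 1/D
    D = rho.shape[0]
    return p * rho + (1 - p) * np.trace(rho) * np.eye(D) / D

def random_unitary(D, rng):
    # The Q factor of a complex Gaussian matrix is unitary
    z = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
    q, _ = np.linalg.qr(z)
    return q

rng = np.random.default_rng(1)
D = 4
ps = [0.95, 0.9, 0.85]                  # hypothetical per-step depolarizing parameters
Us = [random_unitary(D, rng) for _ in ps]
psi = rng.normal(size=D) + 1j * rng.normal(size=D)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

# Noisy evolution: depolarizing noise after every layer
noisy = rho
for U, p in zip(Us, ps):
    noisy = depolarize(U @ noisy @ U.conj().T, p)

# Ideal evolution followed by one depolarizing channel with the product parameter
ideal = rho
for U in Us:
    ideal = U @ ideal @ U.conj().T
collapsed = depolarize(ideal, np.prod(ps))
print(np.allclose(noisy, collapsed))  # True
```

The collapse works because the maximally mixed state is invariant under every unitary, so each depolarizing step only rescales the unitarily evolved part.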

Proof. Let us decompose ${ \mathcal W }$ as ${ \mathcal W }={{ \mathcal W }}_{2}\circ {{ \mathcal W }}_{1}$ such that the non-unital Pauli channel ${{ \mathcal P }}_{{\rm{NU}}}^{A}$ acts at time $\tau ^{\prime}$ between ${{ \mathcal W }}_{1}$ and ${{ \mathcal W }}_{2}$, with the overall channel between τ₁ and τ₂ given by ${{ \mathcal W }}_{2}\circ {{ \mathcal P }}_{{\rm{NU}}}^{A}\circ {{ \mathcal W }}_{1}$. The state at time τ₁ is

$\begin{eqnarray}&&{\rho }^{\left(1\right)}={p}^{\left(1\right)}| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}| +(1-{p}^{\left(1\right)}){\mathbb{1}}/d,\end{eqnarray} \tag{ G8 }$

where ${p}^{\left(1\right)}={p}^{\left(k,1\right)}\cdots {p}^{\left(\mathrm{1,1}\right)}$ corresponds to the continuous depolarizing channel as discussed in appendix D. We divide the time interval between τ₁ and $\tau ^{\prime}$ into l steps. The state at time $\tau ^{\prime}$ is given by

$\begin{eqnarray}&&{\widetilde{\rho }}^{\left(2\right)}={{ \mathcal P }}_{{\rm{NU}}}^{A}\circ {{ \mathcal D }}_{{q}^{\left(2,l\right)}}^{{AB}}\circ {{ \mathcal W }}_{1}^{l}\cdots \circ \ {{ \mathcal D }}_{{q}^{\left(\mathrm{2,1}\right)}}^{{AB}}\circ {{ \mathcal W }}_{1}^{1}({\rho }^{\left(1\right)})\,\end{eqnarray} \tag{ G9 }$

$\begin{eqnarray}&&={{ \mathcal P }}_{{\rm{NU}}}^{A}({p}^{\left(1\right)}{q}^{\left(2\right)}{{ \mathcal W }}_{1}(| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}| )+(1-{p}^{\left(1\right)}{q}^{\left(2\right)}){\mathbb{1}}/d)\,\end{eqnarray} \tag{ G10 }$

$\begin{eqnarray}&&\,={p}^{\left(1\right)}{q}^{\left(2\right)}{{ \mathcal P }}_{{\rm{NU}}}^{A}({{ \mathcal W }}_{1}(| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}| ))+(1-{p}^{\left(1\right)}{q}^{\left(2\right)}){\mathbb{1}}/d+(1-{p}^{\left(1\right)}{q}^{\left(2\right)})\displaystyle \frac{1}{d}\sum _{({\boldsymbol{g}},{\boldsymbol{h}})\ne ({\bf{0}},{\bf{0}})}{d}_{{\boldsymbol{g}},{\boldsymbol{h}}}{X}_{A}^{{\boldsymbol{g}}}{Z}_{A}^{{\boldsymbol{h}}}\otimes {{\mathbb{1}}}_{B},\end{eqnarray} \tag{ G11 }$

where ${q}^{\left(2\right)}={q}^{\left(2,l\right)}\cdots {q}^{\left(\mathrm{2,1}\right)}$ and ${{ \mathcal W }}_{1}={{ \mathcal W }}_{1}^{l}\circ \cdots \circ {{ \mathcal W }}_{1}^{1}$. Similarly, we divide the time interval between $\tau ^{\prime}$ and τ₂ into m steps and denote by ${r}^{\left(2\right)}$ the product of the corresponding m depolarizing parameters. The term that depends on ${ \mathcal W }$ at time τ₂ is given by

$\begin{eqnarray}&&{\widetilde{\sigma }}^{\left(2\right)}={p}^{\left(1\right)}{q}^{\left(2\right)}{r}^{\left(2\right)}{{ \mathcal W }}_{2}\,\circ \,{{ \mathcal P }}_{{\rm{NU}}}^{A}\,\circ \,{{ \mathcal W }}_{1}(| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}| )+{r}^{\left(2\right)}(1-{p}^{\left(1\right)}{q}^{\left(2\right)})\displaystyle \frac{1}{d}\sum _{({\boldsymbol{g}},{\boldsymbol{h}})\ne ({\bf{0}},{\bf{0}})}{d}_{{\boldsymbol{g}},{\boldsymbol{h}}}{W}_{2}{X}_{A}^{{\boldsymbol{g}}}{Z}_{A}^{{\boldsymbol{h}}}{W}_{2}^{\dagger }\otimes {{\mathbb{1}}}_{B}.\end{eqnarray} \tag{ G12 }$

Let

$\begin{eqnarray}&&{\widetilde{F}}_{{\mathsf{HST}}}(V)\propto f(V)\ :=\mathrm{Tr}[| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}| {\widetilde{\sigma }}^{\left(2\right)}].\end{eqnarray} \tag{ G13 }$

Moreover, for simplicity we denote

$\begin{eqnarray}&&{f}_{1}(V)\ :=\mathrm{Tr}\left[| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}| ({{ \mathcal W }}_{2}\,\circ \,{{ \mathcal P }}_{{\rm{NU}}}^{A}\circ {{ \mathcal W }}_{1})(| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}| )\right],\end{eqnarray} \tag{ G14 }$

$\begin{eqnarray}&&{f}_{2}(V)\,:=\mathrm{Tr}\left[| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}| ({W}_{2}{X}_{A}^{{\boldsymbol{g}}}{Z}_{A}^{{\boldsymbol{h}}}{W}_{2}^{\dagger }\otimes {{\mathbb{1}}}_{B})\right].\end{eqnarray} \tag{ G15 }$

Consider the following chain of relations:

$\begin{eqnarray}&&{f}_{1}(V)=\mathrm{Tr}\left[| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}| ({{ \mathcal W }}_{2}^{A}\,\circ \,{{ \mathcal P }}_{{\rm{NU}}}^{A})\left(\left({{ \mathcal I }}_{A}\otimes {\left({{ \mathcal W }}_{1}^{T}\right)}^{B}\right)\left(| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}{| }_{{AB}}\right)\right)\right]\end{eqnarray} \tag{ G16 }$

$\begin{eqnarray}&&\,=\,\mathrm{Tr}\left[\left({{ \mathcal I }}_{A}\otimes {\left({{ \mathcal W }}_{1}^{* }\right)}^{B}\right)\left(| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}| \right)\left({{ \mathcal W }}_{2}^{A}\,\circ \,{{ \mathcal P }}_{{\rm{NU}}}^{A}\right)\left(| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}| \right)\right]\end{eqnarray} \tag{ G17 }$

$\begin{eqnarray}&&\,=\,\mathrm{Tr}\left[\left({\left({{ \mathcal W }}_{1}^{\dagger }\right)}^{A}\otimes {{ \mathcal I }}_{B}\right)\left(| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}| \right)\left({{ \mathcal W }}_{2}^{A}\,\circ \,{{ \mathcal P }}_{{\rm{NU}}}^{A}\right)\left(| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}| \right)\right]\end{eqnarray} \tag{ G18 }$

$\begin{eqnarray}&&=\,\mathrm{Tr}\left[| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}| ({{ \mathcal W }}_{1}^{A}\,\circ \,{{ \mathcal W }}_{2}^{A}\,\circ \,{{ \mathcal P }}_{{\rm{NU}}}^{A})(| {{\rm{\Phi }}}^{+}\rangle \langle {{\rm{\Phi }}}^{+}| )\right]\,\end{eqnarray} \tag{ G19 }$

$\begin{eqnarray}&&\leqslant \,{f}_{1}(V^{\prime} ),\,\end{eqnarray} \tag{ G20 }$

where $V^{\prime} \in {{\mathbb{V}}}_{d}^{{\rm{opt}}}$, and ${{\mathbb{V}}}_{d}^{{\rm{opt}}}$ denotes the set of unitaries that optimize ${F}_{{\mathsf{HST}}}(V)$ (and hence ${C}_{{\mathsf{HST}}}(V)$) as defined in (D18). The first and third equalities follow from the ricochet property, and the second and fourth from the cyclicity of the trace. The expression in (G19) corresponds to the case of non-unital Pauli noise at time τ₁ and no other noise in the HST circuit, which is a special case of theorem 2; the inequality (G20) therefore follows from theorem 2. Moreover, by arguments similar to those in (E10)–(E12), we find that ${f}_{2}(V)$ is independent of W. This completes the proof.□
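The manipulations in (G16)–(G19) use only the ricochet property $(W\otimes {\mathbb{1}})| {{\rm{\Phi }}}^{+}\rangle =({\mathbb{1}}\otimes {W}^{T})| {{\rm{\Phi }}}^{+}\rangle$ and the cyclicity of the trace, so the identity between (G14) and (G19) holds for any channel acting on $A$. The sketch below checks both facts for $d=2$; the amplitude-damping channel is a hypothetical non-unital stand-in for ${{ \mathcal P }}_{{\rm{NU}}}^{A}$, chosen only for illustration, and the unitaries are random.

```python
import numpy as np

def random_unitary(d, rng):
    # The Q factor of a complex Gaussian matrix is unitary
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(z)
    return q

def on_A(kraus_ops, rho, d):
    # Apply rho -> sum_K (K ⊗ 1) rho (K ⊗ 1)^dagger, i.e. a map acting on A of rho_AB
    I = np.eye(d)
    return sum(np.kron(K, I) @ rho @ np.kron(K, I).conj().T for K in kraus_ops)

d = 2
rng = np.random.default_rng(5)
phi = np.eye(d).reshape(-1) / np.sqrt(d)   # |Φ+> = sum_i |ii> / sqrt(d)
Phi = np.outer(phi, phi.conj())
W1, W2 = random_unitary(d, rng), random_unitary(d, rng)

# Ricochet property: (W ⊗ 1)|Φ+> = (1 ⊗ W^T)|Φ+>
ricochet_ok = np.allclose(np.kron(W1, np.eye(d)) @ phi, np.kron(np.eye(d), W1.T) @ phi)

# Hypothetical non-unital stand-in channel on A: amplitude damping
gamma = 0.3
kraus = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]]),
         np.array([[0, np.sqrt(gamma)], [0, 0]])]

# Tr[Φ (W2 ∘ E ∘ W1)(Φ)], the form of (G14)
f_lhs = np.trace(Phi @ on_A([W2], on_A(kraus, on_A([W1], Phi, d), d), d))
# Tr[Φ (W1 ∘ W2 ∘ E)(Φ)], the form of (G19)
f_rhs = np.trace(Phi @ on_A([W1], on_A([W2], on_A(kraus, Phi, d), d), d))
print(ricochet_ok, np.isclose(f_lhs, f_rhs))  # True True
```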

Corollary 6. The cost functions ${C}_{{\mathsf{LET}}}$ and ${C}_{{\mathsf{LLET}}}$ exhibit weak-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 3, as well as (2) a noise process during the implementation of ${ \mathcal W }={{ \mathcal W }}_{k}\circ \cdots \circ {{ \mathcal W }}_{1}={{ \mathcal V }}^{\dagger }\circ { \mathcal U }$ in which global Pauli channels $\{{{ \mathcal P }}_{1},\ldots ,{{ \mathcal P }}_{k}\}$ act, such that the overall channel is ${{ \mathcal P }}_{k}\circ {{ \mathcal W }}_{k}\circ \cdots \circ {{ \mathcal P }}_{1}\circ {{ \mathcal W }}_{1}$, provided that the following condition is satisfied:

$\begin{eqnarray}&&({{ \mathcal P }}_{k}\,\circ \,{{ \mathcal W }}_{k}\cdots \circ \,{{ \mathcal P }}_{1}\,\circ \,{{ \mathcal W }}_{1})(\cdot )=({{ \mathcal W }}_{k}\,\circ \,{{ \mathcal W }}_{k-1}\cdots \circ \ {{ \mathcal W }}_{1}\,\circ \,\widehat{{ \mathcal P }})(\cdot ),\end{eqnarray} \tag{ G21 }$

where $\widehat{{ \mathcal P }}$ is also a Pauli channel.

Proof. This follows from arguments similar to those in the proof of corollary 1 and by invoking theorem 3.□

Corollary 7. Let the $W={V}^{\dagger }U$ gate sequence have the form $W={W}_{2}^{A}{W}_{1}^{A}$ with ${W}_{1}^{A}$ composed only of Clifford gates. Then the cost functions ${C}_{{\mathsf{LET}}}$ and ${C}_{{\mathsf{LLET}}}$ exhibit weak-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 3, as well as (2) a noise process during the implementation of ${{ \mathcal W }}_{1}^{A}={{ \mathcal W }}_{1,k}\circ \cdots \circ {{ \mathcal W }}_{\mathrm{1,1}}$, in which global Pauli channels $\{{{ \mathcal P }}_{1}^{A},\ldots ,{{ \mathcal P }}_{k}^{A}\}$ act on system $A$, such that the overall channel on $A$ is ${{ \mathcal P }}_{k}^{A}\circ {{ \mathcal W }}_{1,k}\circ \cdots \circ {{ \mathcal P }}_{1}^{A}\circ {{ \mathcal W }}_{\mathrm{1,1}}$.

Proof. This corollary is a special case of corollary 6, since lemma 1 implies that Clifford unitaries satisfy (G21).□

Corollary 8. Let the $W={V}^{\dagger }U$ gate sequence have the form $W={W}_{2}^{A}{W}_{1}^{A}$ with ${W}_{1}^{A}={W}_{1}^{A^{\prime} }\otimes {W}_{1}^{A^{\prime\prime} }$ being a tensor product, i.e. $W$ is a tensor product up to a particular time. Then the cost functions ${C}_{{\mathsf{LET}}}$ and ${C}_{{\mathsf{LLET}}}$ exhibit weak-OPR to a noise model that includes the following: (1) all noise processes in Noise Model 3, as well as (2) a noise process during the implementations of ${{ \mathcal W }}_{1}^{A^{\prime} }={{ \mathcal W }}_{1,k}^{A^{\prime} }\circ \cdots \circ {{ \mathcal W }}_{1,1}^{A^{\prime} }$ and ${{ \mathcal W }}_{1}^{A^{\prime\prime} }={{ \mathcal W }}_{1,l}^{A^{\prime\prime} }\circ \cdots \circ {{ \mathcal W }}_{1,1}^{A^{\prime\prime} }$ in which local depolarizing channels $\{{{ \mathcal D }}_{1,1}^{A^{\prime} },\ldots ,{{ \mathcal D }}_{1,k}^{A^{\prime} }\}$ and $\{{{ \mathcal D }}_{1,1}^{A^{\prime\prime} },\ldots ,{{ \mathcal D }}_{1,l}^{A^{\prime\prime} }\}$ act on subsystems $A^{\prime}$ and $A^{\prime\prime}$, respectively, such that the overall channel on $A^{\prime} A^{\prime\prime}$ is $({{ \mathcal D }}_{1,k}^{A^{\prime} }\circ {{ \mathcal W }}_{1,k}^{A^{\prime} }\circ \cdots \circ {{ \mathcal D }}_{1,1}^{A^{\prime} }\circ {{ \mathcal W }}_{1,1}^{A^{\prime} })\otimes ({{ \mathcal D }}_{1,l}^{A^{\prime\prime} }\circ {{ \mathcal W }}_{1,l}^{A^{\prime\prime} }\circ \cdots \circ {{ \mathcal D }}_{1,1}^{A^{\prime\prime} }\circ {{ \mathcal W }}_{1,1}^{A^{\prime\prime} })$.

Proof. This follows from arguments similar to the proof of corollary 3 and by invoking corollary 6.□
