Constructing a virtual two-qubit gate by sampling single-qubit operations

Kosuke Mitarai; Keisuke Fujii

doi:10.1088/1367-2630/abd7bc

1. Introduction

Quantum computers have attracted much attention recently, mainly due to the rapid development of actual hardware [1–3]. The quantum computers expected to appear in the near term are called noisy intermediate-scale quantum devices, or NISQ devices for short [4]. We expect NISQ devices to have ∼100 qubits with non-negligible noise. Such devices are believed not to be simulatable by classical computers when the control precision of the qubits is sufficiently high [5–8]. In this sense, NISQ devices have computational power that exceeds that of classical computers. Many researchers are actively developing ways to exploit their power for practical applications [9–15]. However, we still suffer from the limited number of qubits available on actual devices and the limited depth of circuits that can be run while keeping the resulting quantum state meaningful.

If techniques to decompose a quantum circuit into smaller ones are developed, they can extend the applicability of such devices. Here, smaller quantum circuits are ones with a smaller number of qubits or gates. Peng et al recently proposed a clustering approach based on a tensor network representation of a quantum circuit [16], which greatly advanced the technical development. They showed that we can 'cut' an identity gate by sampling measure-and-prepare channels on a qubit according to a certain quasi-probability distribution. In reference [17], we proposed methods to construct quantum circuits equivalent to the Hadamard test, which successfully reduce the depth of certain quantum circuits. These techniques share the same idea: they reconstruct the result of a coherent quantum operation by combining the results obtained from certain incoherent operations.

An approach with the same flavor as the above has been utilized in the context of memory-efficient classical simulation of quantum circuits. Since the direct simulation of a quantum circuit with over 50 qubits breaks down due to the need to store 2⁵⁰ complex numbers in memory, the classical simulator must decompose the given quantum circuit into smaller ones, especially in terms of the number of qubits. References [18, 19] have provided one way to perform such a decomposition, which 'cuts' controlled-Z gates by separately simulating the two cases where the control qubit is |0⟩ or |1⟩ and then combining them, and they performed classical simulation of quantum circuits with over 50 qubits. A similar technique has been utilized by Bravyi et al in reference [20] to remove a relatively small number of qubits from a large quantum circuit by replacing the qubits with a classical simulator. Their approach can be viewed as a 'space-like' cut rather than the 'time-like' cut proposed by Peng et al [16]. However, their techniques are intended to run on a classical computer and cannot be utilized for simulating a large quantum circuit with a small quantum computer.

In this work, we present a technique to perform a 'space-like' cut on a quantum computer. More specifically, we present a way to decompose a controlled gate into a sequence of single-qubit operations consisting of projective measurements of the Pauli X, Y, and Z operators and single-qubit rotations around the x, y, and z axes. We note that our method does not generate any entanglement between the qubits, as it is impossible to do so with such single-qubit operations. Our method only 'simulates' the effects of entanglement using classical post-processing and sampling. More concretely, although it is widely known that entangling gates cannot be performed with local operations and classical communication in single-shot experiments [21], we show that it is possible to perform the computational task of evaluating expectation values of the output of entangling circuits by sampling certain sets of gates and applying classical post-processing. The overhead required for our proposed technique, which scales exponentially with the number of decompositions performed, gives a characterization of the entangling gates from a computational viewpoint, which is different from the existing theories of entanglement quantification in e.g. [22].

The method proposed here can be considered as a generalization of our previous work [17] and a variant of the quantum circuit decomposition presented in reference [16]. It can also be viewed as a fully quantum version of the technique utilized in efficient classical simulation schemes [18–20]. In some cases, our method provides better scaling than that of reference [16] when simulating a large quantum circuit with smaller ones. The proposed technique is also useful when we want to apply two-qubit gates between a distant pair of qubits, which would otherwise require many swap operations. This work extends the applicability of NISQ devices whose circuit depth and connectivity are limited.

2. Gate decomposition

2.1. Tensor network representation of quantum circuits

Quantum computation is completely specified with a quantum circuit, U, an initial state with its density matrix representation, ρ, and an observable, O, measured at the output. Given U, ρ, and O, any quantum computation can be represented by a tensor network [23–25]. We define the tensor representation of U, ρ, and O in the following manner.

Suppose that our quantum computer has n qubits. We define a complete basis of the space of 2 × 2 complex matrices and its dual as ${\left\{\vert {e}_{i}\rangle \rangle \right\}}_{i=1}^{4}$ and ${\left\{\langle \langle {e}_{i}\vert \right\}}_{i=1}^{4}$ respectively, and assume orthonormality, 〈〈e_i|e_j〉〉 = δ_ij, under the trace inner product, that is, 〈〈A|B〉〉 = Tr(A^† B) for matrices A and B. A density matrix ρ can be decomposed into a sum of $\vert {e}_{{j}_{1}}\rangle \rangle \otimes \vert {e}_{{j}_{2}}\rangle \rangle \otimes \cdots \otimes \vert {e}_{{j}_{n}}\rangle \rangle =\vert {e}_{{j}_{1}}{e}_{{j}_{2}}\dots {e}_{{j}_{n}}\rangle \rangle$ as

$\begin{equation}\vert \rho \rangle \rangle =\sum\limits _{{j}_{1},\dots ,{j}_{n}}{\rho }_{\boldsymbol{j}}\vert {e}_{{j}_{1}}{e}_{{j}_{2}}\dots {e}_{{j}_{n}}\rangle \rangle ,\end{equation} \tag{ 1 }$

where j = (j₁, j₂, ..., j_n). We refer to the elements ${\rho }_{\boldsymbol{j}}=\langle \langle {e}_{{j}_{1}}{e}_{{j}_{2}}\dots {e}_{{j}_{n}}\vert \rho \rangle \rangle$ as the tensor representation of ρ. An observable O can also be decomposed into the same form. Note that we can naturally assume that the tensor representations of observables and density matrices consist of real numbers, because they are always Hermitian and we can choose the basis ${\left\{\vert {e}_{i}\rangle \rangle \right\}}_{i=1}^{4}$ to be Hermitian, e.g. we can use the Pauli matrices {I, X, Y, Z}, normalized by 1/√2, as the basis. Therefore, we assume ρ_j and O_j are real henceforth. The quantum circuit, U, transforms ρ into UρU^†. We define a corresponding superoperator $\mathcal{S}\left(U\right)$ whose action is defined by $\mathcal{S}\left(U\right)\rho =U\rho {U}^{{\dagger}}$. The superoperator can be decomposed as

$\begin{equation}\mathcal{S}\left(U\right)=\sum\limits _{{j}_{1},\dots ,{j}_{n}}\sum\limits _{{k}_{1},\dots ,{k}_{n}}\mathcal{S}{\left(U\right)}_{\boldsymbol{j},\boldsymbol{k}}\vert {e}_{{j}_{1}}\dots {e}_{{j}_{n}}\rangle \rangle \langle \langle {e}_{{k}_{1}}\dots {e}_{{k}_{n}}\vert .\end{equation} \tag{ 2 }$

Note that this decomposition is not limited to superoperators of unitary matrices, but is also applicable to any linear operator that acts on a density matrix. We call $\mathcal{S}{\left(U\right)}_{\boldsymbol{j},\boldsymbol{k}}=\langle \langle {e}_{{j}_{1}}\dots {e}_{{j}_{n}}\vert \mathcal{S}\left(U\right)\vert {e}_{{k}_{1}}\dots {e}_{{k}_{n}}\rangle \rangle$ the tensor representation of $\mathcal{S}\left(U\right)$. When we use the Pauli operators as the basis set, $\mathcal{S}{\left(U\right)}_{\boldsymbol{j},\boldsymbol{k}}$ is referred to as the Pauli transfer matrix.

Quantum computation ends with measuring the observable O. This output can be written down as,

$\begin{equation}\langle \langle O\vert \mathcal{S}\left(U\right)\vert \rho \rangle \rangle =\mathrm{T}\mathrm{r}\left(OU\rho {U}^{{\dagger}}\right)\end{equation} \tag{ 3 }$

$\begin{equation}=\sum\limits _{{j}_{1},\dots ,{j}_{n}}\sum\limits _{{k}_{1},\dots ,{k}_{n}}{O}_{\boldsymbol{j}}\mathcal{S}{\left(U\right)}_{\boldsymbol{j},\boldsymbol{k}}{\rho }_{\boldsymbol{k}}.\end{equation} \tag{ 4 }$
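The contraction in equations (3) and (4) can be checked numerically on a single qubit. The following is a minimal sketch, assuming the normalized Pauli basis e_i = P_i/√2; the helper names (`vectorize`, `transfer_matrix`) and the choice of a Hadamard gate, the input |0⟩⟨0|, and the observable X are illustrative assumptions, not part of the original text.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

# Orthonormal basis e_i = P_i / sqrt(2), so that Tr(e_i^dagger e_j) = delta_ij.
BASIS = [[[x / math.sqrt(2) for x in row] for row in p] for p in (I2, X, Y, Z)]

def vectorize(op):
    """Tensor representation <<e_j|op>> = Tr(e_j^dagger op)."""
    return [trace(matmul(dagger(e), op)) for e in BASIS]

def transfer_matrix(u):
    """Pauli transfer matrix S(U)_{jk} = Tr(e_j^dagger U e_k U^dagger)."""
    return [[trace(matmul(dagger(ej), matmul(matmul(u, ek), dagger(u))))
             for ek in BASIS] for ej in BASIS]

h = 1 / math.sqrt(2)
HADAMARD = [[h, h], [h, -h]]   # illustrative gate U
rho = [[1, 0], [0, 0]]         # illustrative input state |0><0|
obs = X                        # illustrative observable O

rho_vec = vectorize(rho)
obs_vec = vectorize(obs)
ptm = transfer_matrix(HADAMARD)

# Right-hand side of equation (4): sum_jk O_j S(U)_jk rho_k.
contracted = sum(obs_vec[j] * ptm[j][k] * rho_vec[k]
                 for j in range(4) for k in range(4))
# Right-hand side of equation (3): Tr(O U rho U^dagger).
direct = trace(matmul(obs, matmul(matmul(HADAMARD, rho), dagger(HADAMARD))))
print(contracted, direct)
```

Both numbers should agree, and the coefficient vectors come out real, as assumed in the text for Hermitian operators in a Hermitian basis.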

In many cases, U is a product of elementary gates ${\left\{{U}_{i}\right\}}_{i=1}^{L}$ , that is, U = U_L...U₁. The tensor representation of the overall gate, $\mathcal{S}\left(U\right)$ , is also a product of $\mathcal{S}\left({U}_{i}\right)$ ; $\mathcal{S}\left(U\right)=\mathcal{S}\left({U}_{L}\right)\dots \mathcal{S}\left({U}_{1}\right)$ . An important note is that as long as the tensor representation of each element is unchanged, the result of the overall computation is also unchanged. If $\mathcal{S}\left(U\right)$ can be represented by a sum of some simple operations as $\mathcal{S}\left(U\right)={\sum }_{i}{c}_{i}\mathcal{S}\left({V}_{i}\right)$ with coefficients {c_i}, the expectation value of an observable O can be computed with the following equality,

$\begin{equation}\langle \langle O\vert \mathcal{S}\left(U\right)\vert \rho \rangle \rangle =\sum\limits _{i}{c}_{i}\langle \langle O\vert \mathcal{S}\left({V}_{i}\right)\vert \rho \rangle \rangle .\end{equation} \tag{ 5 }$

Note that c_i can, in general, depend on the state |ρ〉〉. We use this scheme to perform the 'decomposition' of a circuit in this work.

It is noteworthy that, because we decompose a superoperator rather than an operator such as U itself, the method is well suited to realistic quantum devices. A direct decomposition of U into some simple operators {V_i}, i.e. U = ∑_i c_i V_i, can also be utilized for the same task; however, as expectation values are calculated as ⟨0|U^† OU|0⟩ where |0⟩ is an initial state, this approach requires us to evaluate ${\sum }_{i,j}{c}_{i}{c}_{j}^{{\ast}}\langle 0\vert {V}_{j}^{{\dagger}}O{V}_{i}\vert 0\rangle$, which is rather hard for NISQ devices. This fact demonstrates the advantage of using the above formalism. The tensor network representation of the superoperator formalism also allows us to understand the decompositions graphically.

2.2. Virtual two-qubit gate

We can show the following, which can then be utilized to decompose any two-qubit gate into a sequence of single-qubit operations.

Lemma 1. For operators A₁ and A₂ such that ${A}_{1}^{2}=I$ and ${A}_{2}^{2}=I$ ,

$\begin{align}\hfill \mathcal{S}\left({\text{e}}^{\text{i}\theta {A}_{1}\otimes {A}_{2}}\right)& ={\mathrm{cos}}^{2}\enspace \theta \mathcal{S}\left(I\otimes I\right)+{\mathrm{sin}}^{2}\enspace \theta \mathcal{S}\left({A}_{1}\otimes {A}_{2}\right)\hfill \\ \hfill & \quad +\frac{1}{8}\enspace \mathrm{cos}\enspace \theta \enspace \mathrm{sin}\enspace \theta \sum\limits _{\left({\alpha }_{1},{\alpha }_{2}\right)\in {\left\{{\pm}1\right\}}^{2}}{\alpha }_{1}{\alpha }_{2}\left[\mathcal{S}\left(\left(I+{\alpha }_{1}{A}_{1}\right)\otimes \left(I+i{\alpha }_{2}{A}_{2}\right)\right)\right.\hfill \\ \hfill & \quad \left.+\mathcal{S}\left(\left(I+i{\alpha }_{1}{A}_{1}\right)\otimes \left(I+{\alpha }_{2}{A}_{2}\right)\right)\right]\hfill \end{align} \tag{ 6 }$

To prove this, we can directly check that the tensor representations of both sides are equal. For the detailed calculation, see appendix A1. This lemma is schematically depicted in figure 1(a). Notice that the operations proportional to I ± A and I ± iA for A ∈ {X, Y, Z} can be performed by a projective measurement and a single-qubit rotation, respectively.
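As a sanity check of lemma 1, both sides of equation (6) can be applied to a test state and compared numerically. The sketch below is only an illustration under stated assumptions: the test state, the angle, and the helper names (`S`, `shifted`, `lemma1_lhs`, `lemma1_rhs`) are our own choices, and the gate is built as cos θ I + i sin θ A₁ ⊗ A₂, which is valid because (A₁ ⊗ A₂)² = I.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def kron(a, b):
    """Kronecker product of two matrices."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def lincomb(terms):
    """Sum of coefficient * matrix over a list of (coefficient, matrix) pairs."""
    n = len(terms[0][1])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(n)]
            for i in range(n)]

def S(m, rho):
    """Superoperator action S(M) rho = M rho M^dagger."""
    return matmul(matmul(m, rho), dagger(m))

def shifted(a, coeff):
    """Return I + coeff * A for a single-qubit operator A."""
    return lincomb([(1, I2), (coeff, a)])

def lemma1_lhs(theta, a1, a2, rho):
    gate = lincomb([(math.cos(theta), kron(I2, I2)),
                    (1j * math.sin(theta), kron(a1, a2))])
    return S(gate, rho)

def lemma1_rhs(theta, a1, a2, rho):
    c, s = math.cos(theta), math.sin(theta)
    terms = [(c * c, S(kron(I2, I2), rho)), (s * s, S(kron(a1, a2), rho))]
    for al1 in (1, -1):
        for al2 in (1, -1):
            w = c * s * al1 * al2 / 8
            terms.append((w, S(kron(shifted(a1, al1), shifted(a2, 1j * al2)), rho)))
            terms.append((w, S(kron(shifted(a1, 1j * al1), shifted(a2, al2)), rho)))
    return lincomb(terms)

def max_diff(m1, m2):
    return max(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))

# Arbitrary (hypothetical) two-qubit pure state used as a test input.
amps = [1, 2j, -1, 0.5 + 0.5j]
norm = sum(abs(x) ** 2 for x in amps)
rho = [[x * complex(y).conjugate() / norm for y in amps] for x in amps]

error_xz = max_diff(lemma1_lhs(0.37, X, Z, rho), lemma1_rhs(0.37, X, Z, rho))
error_zz = max_diff(lemma1_lhs(1.1, Z, Z, rho), lemma1_rhs(1.1, Z, Z, rho))
print(error_xz, error_zz)
```

Both differences should vanish up to floating-point error, for any angle θ.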

**Figure 1.** Decomposition of (a) a non-local gate and (b) a non-local non-destructive measurement into a sequence of local operations. A₁ and A₂ are operators such that ${A}_{1}^{2}=I$ and ${A}_{2}^{2}=I$.

The correspondence with a single-qubit rotation is clear from the formula, ${\mathrm{e}}^{{\pm}\mathrm{i}\pi A/4}=\frac{1}{\sqrt{2}}\left(I{\pm}iA\right)$ , which is the rotation of angle π/2 around the A axis. Let ${\mathcal{M}}_{A}$ be the projective measurement on the A basis (A ∈ {X, Y, Z}), that is, ${\mathcal{M}}_{A}$ acts on a density matrix ρ as,

$\begin{equation}{\mathcal{M}}_{A}\rho =\frac{1}{\mathrm{T}\mathrm{r}\left(\rho \frac{I+\alpha A}{2}\right)}\left(\frac{I+\alpha A}{2}\right)\rho \left(\frac{I+\alpha A}{2}\right),\end{equation} \tag{ 7 }$

depending on the result of the measurement α ∈ {1, −1}. This is equivalent to $\mathcal{S}\left(I{\pm}A\right)$ up to the factor of $4\enspace \mathrm{T}\mathrm{r}\left(\rho \frac{I+\alpha A}{2}\right)$ , that is,

$\begin{equation}\mathcal{S}\left(I+\alpha A\right)=4\enspace \mathrm{T}\mathrm{r}\left(\rho \frac{I+\alpha A}{2}\right){\mathcal{M}}_{A,\alpha },\end{equation} \tag{ 8 }$

where ${\mathcal{M}}_{A,\alpha }$ is a measurement operation postselected on the measurement outcome α, and $\mathrm{T}\mathrm{r}\left(\rho \frac{I+\alpha A}{2}\right)$ is the probability of getting the result α by measuring ρ in the A basis. Lemma 1 together with this fact implies that the gate ${\text{e}}^{\text{i}\theta {A}_{1}\otimes {A}_{2}}$ can be decomposed, in the sense of equation (5), into a sum of I ⊗ I, A₁ ⊗ A₂, ${\mathcal{M}}_{{A}_{1}}\otimes {\mathrm{e}}^{{\pm}\mathrm{i}\pi {A}_{2}/4}$, and ${\mathrm{e}}^{{\pm}\mathrm{i}\pi {A}_{1}/4}\otimes {\mathcal{M}}_{{A}_{2}}$, which can be stated as the lemma below. Notably, this technique can be applied for any θ, which enables us to perform continuous two-qubit gates.
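The two single-qubit identities used here, equation (8) and $\mathcal{S}\left(I{\pm}iA\right)=2\mathcal{S}\left({\mathrm{e}}^{{\pm}\mathrm{i}\pi A/4}\right)$, can be checked with a short script. This is a minimal sketch; the test density matrix and the helper names are illustrative assumptions.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def lincomb(terms):
    n = len(terms[0][1])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(n)]
            for i in range(n)]

def S(m, rho):
    """Superoperator action S(M) rho = M rho M^dagger."""
    return matmul(matmul(m, rho), dagger(m))

def max_diff(m1, m2):
    return max(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))

# Hypothetical single-qubit test state (Hermitian, unit trace, positive).
rho = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]

def check(a, alpha):
    proj = lincomb([(0.5, I2), (0.5 * alpha, a)])      # (I + alpha A) / 2
    prob = trace(matmul(rho, proj)).real               # probability of outcome alpha
    post = lincomb([(1 / prob, S(proj, rho))])         # postselected state, equation (7)
    lhs_meas = S(lincomb([(1, I2), (alpha, a)]), rho)  # S(I + alpha A) rho
    rhs_meas = lincomb([(4 * prob, post)])             # right-hand side of equation (8)
    quarter = math.pi / 4
    rot = lincomb([(math.cos(quarter), I2),
                   (1j * alpha * math.sin(quarter), a)])  # exp(i alpha pi A / 4)
    lhs_rot = S(lincomb([(1, I2), (1j * alpha, a)]), rho)  # S(I + i alpha A) rho
    rhs_rot = lincomb([(2, S(rot, rho))])
    return max(max_diff(lhs_meas, rhs_meas), max_diff(lhs_rot, rhs_rot))

worst = max(check(a, alpha) for a in (X, Y, Z) for alpha in (1, -1))
print(worst)
```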

Lemma 2. A quantum gate ${\text{e}}^{\text{i}\theta {A}_{1}\otimes {A}_{2}}$ with operators A₁ and A₂ such that ${A}_{1}^{2}=I$ and ${A}_{2}^{2}=I$ can be decomposed into six single-qubit operations. For any quantum state |ρ〉〉, to achieve an error ε of the decomposition with respect to the trace distance with probability at least 1 − δ, the required number of circuit runs is O(log(1/δ)/ε²).

The detailed proof is given in appendix B. Intuitively, the error comes from the probabilistic part of the decomposition, that is, the renormalization factor $\mathrm{T}\mathrm{r}\left(\rho \frac{I+\alpha A}{2}\right)$ in equation (8); if we want to estimate $\mathrm{T}\mathrm{r}\left(\rho \frac{I+\alpha A}{2}\right)$ within error ε, O(1/ε²) repetitions would suffice.
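Substituting equation (8) and the rotation identity into lemma 1 gives an operational form of the six-term decomposition: the identity, A₁ ⊗ A₂, and four measure-and-rotate circuits whose measurement outcome enters the post-processing as a sign. The sketch below checks this form numerically. The function names and the quantity `sampling_overhead` (the sum of the absolute values of the coefficients, 1 + 2|sin 2θ|) are our own bookkeeping rather than notation from the text; at θ = π/4 the overhead equals 3, consistent with the factor ${3}^{{M}_{s}}$ that appears in the cost analysis.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def kron(a, b):
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def lincomb(terms):
    n = len(terms[0][1])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(n)]
            for i in range(n)]

def S(m, rho):
    return matmul(matmul(m, rho), dagger(m))

def projector(a, outcome):
    """Projector (I + outcome * A) / 2 onto the measurement result `outcome`."""
    return lincomb([(0.5, I2), (0.5 * outcome, a)])

def rotation(a, sign):
    """exp(i * sign * pi/4 * A) = (I + i * sign * A) / sqrt(2)."""
    r = 1 / math.sqrt(2)
    return lincomb([(r, I2), (1j * sign * r, a)])

def six_term_decomposition(theta, a1, a2, rho):
    c, s = math.cos(theta), math.sin(theta)
    terms = [(c * c, rho), (s * s, S(kron(a1, a2), rho))]
    for outcome in (1, -1):      # measurement result, used as a sign afterwards
        for sign in (1, -1):     # direction of the pi/2 rotation
            w = c * s * outcome * sign
            # Unnormalized projections carry the outcome probability implicitly.
            terms.append((w, S(kron(projector(a1, outcome), rotation(a2, sign)), rho)))
            terms.append((w, S(kron(rotation(a1, sign), projector(a2, outcome)), rho)))
    return lincomb(terms)

def sampling_overhead(theta):
    """Sum of |coefficients| of the six operations (our own bookkeeping)."""
    c, s = math.cos(theta), math.sin(theta)
    return c * c + s * s + 4 * abs(c * s)

def max_diff(m1, m2):
    return max(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))

amps = [1, 2j, -1, 0.5 + 0.5j]   # hypothetical test state
norm = sum(abs(x) ** 2 for x in amps)
rho = [[x * complex(y).conjugate() / norm for y in amps] for x in amps]

theta = 0.61
gate = lincomb([(math.cos(theta), kron(I2, I2)),
                (1j * math.sin(theta), kron(X, Z))])
error = max_diff(S(gate, rho), six_term_decomposition(theta, X, Z, rho))
print(error, sampling_overhead(math.pi / 4))
```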

Let us finally mention the case of the controlled-Z gate, which we denote by CZ. CZ can be decomposed into

$\begin{equation}\text{C}\text{Z}={\text{e}}^{\text{i}\pi I\otimes Z/4}\enspace {\text{e}}^{\text{i}\pi Z\otimes I/4}\enspace {\text{e}}^{-\text{i}\pi Z\otimes Z/4},\end{equation} \tag{ 9 }$

ignoring the global phase. This means we can decompose a CZ gate using lemma 2. The decomposition is shown in figure 2. Similar decompositions can be performed on other basic two-qubit gates such as CNOT. Endo et al [26] also provide such a decomposition (reference [26], appendix B). However, our protocol above is slightly advantageous in that it requires six single-qubit operations, whereas theirs requires nine.
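Equation (9) is easy to confirm numerically. The following sketch multiplies the three exponentials and divides out the global phase; the helper `exp_i` relies on the operator squaring to the identity, and all names are illustrative.

```python
import cmath
import math

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def exp_i(phi, a):
    """exp(i * phi * A) = cos(phi) I + i sin(phi) A, valid when A squared is I."""
    n = len(a)
    return [[(math.cos(phi) if i == j else 0) + 1j * math.sin(phi) * a[i][j]
             for j in range(n)] for i in range(n)]

quarter = math.pi / 4
product = matmul(matmul(exp_i(quarter, kron(I2, Z)),
                        exp_i(quarter, kron(Z, I2))),
                 exp_i(-quarter, kron(Z, Z)))

# Remove the global phase by normalizing to the |00><00| entry.
phase = product[0][0]
normalized = [[x / phase for x in row] for row in product]

CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
error = max(abs(normalized[i][j] - CZ[i][j]) for i in range(4) for j in range(4))
print(error, cmath.phase(phase))
```

The residual global phase is e^{iπ/4}, which has no effect on expectation values.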

**Figure 2.** Decomposition of a controlled-Z gate into a sequence of single-qubit operations.

2.3. Virtual non-destructive measurement of two-qubit operators

In the previous subsection, we showed that any two-qubit rotation can be decomposed into a sum of single-qubit operations. Here, we extend the strategy to construct a virtual non-destructive measurement of two-qubit operators. Similarly to the previous subsection, we can show the following lemma, which is schematically shown in figure 1(b).

Lemma 3. For operators A₁ and A₂ such that ${A}_{1}^{2}=I$ and ${A}_{2}^{2}=I$ ,

$\begin{align}\hfill \mathcal{S}\left(I+{A}_{1}\otimes {A}_{2}\right)& =\mathcal{S}\left(I\otimes I\right)+\mathcal{S}\left({A}_{1}\otimes {A}_{2}\right)+\frac{1}{8}\sum\limits _{\left({\alpha }_{1},{\alpha }_{2}\right)\in {\left\{{\pm}1\right\}}^{2}}{\alpha }_{1}{\alpha }_{2}\left[\mathcal{S}\left(\left(I+{\alpha }_{1}{A}_{1}\right)\otimes \left(I+{\alpha }_{2}{A}_{2}\right)\right)\right.\hfill \\ \hfill & \quad \left.-\mathcal{S}\left(\left(I+i{\alpha }_{1}{A}_{1}\right)\otimes \left(I+i{\alpha }_{2}{A}_{2}\right)\right)\right].\hfill \end{align} \tag{ 10 }$

This can also be shown by direct calculation of both sides. See appendix A2 for the detailed calculation.
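Equation (10) can be checked in the same way as lemma 1. The sketch below compares both sides on a test state; the state and the helper names are illustrative assumptions. Note that the projection $\frac{I+{A}_{1}\otimes {A}_{2}}{2}$ corresponds to one quarter of the superoperator checked here.

```python
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def kron(a, b):
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def lincomb(terms):
    n = len(terms[0][1])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(n)]
            for i in range(n)]

def S(m, rho):
    return matmul(matmul(m, rho), dagger(m))

def shifted(a, coeff):
    """Return I + coeff * A for a single-qubit operator A."""
    return lincomb([(1, I2), (coeff, a)])

def lemma3_lhs(a1, a2, rho):
    return S(lincomb([(1, kron(I2, I2)), (1, kron(a1, a2))]), rho)

def lemma3_rhs(a1, a2, rho):
    terms = [(1, rho), (1, S(kron(a1, a2), rho))]
    for al1 in (1, -1):
        for al2 in (1, -1):
            w = al1 * al2 / 8
            terms.append((w, S(kron(shifted(a1, al1), shifted(a2, al2)), rho)))
            terms.append((-w, S(kron(shifted(a1, 1j * al1), shifted(a2, 1j * al2)), rho)))
    return lincomb(terms)

def max_diff(m1, m2):
    return max(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))

amps = [1, 2j, -1, 0.5 + 0.5j]   # hypothetical test state
norm = sum(abs(x) ** 2 for x in amps)
rho = [[x * complex(y).conjugate() / norm for y in amps] for x in amps]

error_zz = max_diff(lemma3_lhs(Z, Z, rho), lemma3_rhs(Z, Z, rho))
error_yx = max_diff(lemma3_lhs(Y, X, rho), lemma3_rhs(Y, X, rho))
print(error_zz, error_yx)
```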

The above lemma can be utilized to show the following.

Lemma 4. A non-local projection $\frac{I+{A}_{1}\otimes {A}_{2}}{2}$ with operators A₁ and A₂ such that ${A}_{1}^{2}=I$ and ${A}_{2}^{2}=I$ can be decomposed into six single-qubit operations. For any quantum state |ρ〉〉, to achieve an error ε of the decomposition with respect to the trace distance with probability at least 1 − δ, the required number of circuit runs is O(log(1/δ)/ε²).

This can be shown with exactly the same approach taken to prove lemma 2, which is provided in appendix B.

3. Application

3.1. Simulation of large quantum circuits

The idea of simulating a large quantum circuit by a small quantum computer has been put forward in reference [16]. Peng et al utilized the equivalence shown in figure 3. In the figure,

$\begin{equation}\begin{aligned}\hfill {O}_{1}=I,& {\rho }_{1}=\vert 0\rangle \langle 0\vert ,\hfill & \hfill {c}_{1}=+1/2,\\ \hfill {O}_{2}=I,& {\rho }_{2}=\vert 1\rangle \langle 1\vert ,\hfill & \hfill {c}_{2}=+1/2,\\ \hfill {O}_{3}=X,& {\rho }_{3}=\vert +\rangle \langle +\vert ,\hfill & \hfill {c}_{3}=+1/2,\\ \hfill {O}_{4}=X,& {\rho }_{4}=\vert -\rangle \langle -\vert ,\hfill & \hfill {c}_{4}=-1/2,\\ \hfill {O}_{5}=Y,& {\rho }_{5}=\vert +i\rangle \langle +i\vert ,\hfill & \hfill {c}_{5}=+1/2,\\ \hfill {O}_{6}=Y,& {\rho }_{6}=\vert -i\rangle \langle -i\vert ,\hfill & \hfill {c}_{6}=-1/2,\\ \hfill {O}_{7}=Z,& {\rho }_{7}=\vert 0\rangle \langle 0\vert ,\hfill & \hfill {c}_{7}=+1/2,\\ \hfill {O}_{8}=Z,& {\rho }_{8}=\vert 1\rangle \langle 1\vert ,\hfill & \hfill {c}_{8}=-1/2,\end{aligned}\end{equation} \tag{ 11 }$

where $\vert {\pm}\rangle =\left(\vert 0\rangle {\pm}\vert 1\rangle \right)/\sqrt{2}$ and $\vert {\pm}i\rangle =\left(\vert 0\rangle {\pm}i\vert 1\rangle \right)/\sqrt{2}$. The symbols ⊳ and ⊲ denote the measurement of a certain observable and the preparation of a certain state, respectively. To contrast this technique with ours, we refer to the former as a 'time-like' cut and to the latter as a 'space-like' cut. More concretely, a time-like cut of a quantum channel can be defined as a decomposition of the channel in the sense of equation (5) using measure-and-prepare channels only. In contrast, a space-like cut of a non-local quantum channel is a decomposition of the channel using local quantum channels only.
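The eight measure-and-prepare terms of equation (11) reproduce the identity channel, ρ = ∑_i c_i Tr(O_i ρ) ρ_i. A minimal numerical check is sketched below; the test state and the names `CUT_TERMS` and `cut_identity` are illustrative assumptions.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def lincomb(terms):
    n = len(terms[0][1])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(n)]
            for i in range(n)]

r = 1 / math.sqrt(2)
KETS = {"0": [1, 0], "1": [0, 1], "+": [r, r], "-": [r, -r],
        "+i": [r, 1j * r], "-i": [r, -1j * r]}

def proj(ket):
    """Density matrix |ket><ket|."""
    return [[x * complex(y).conjugate() for y in ket] for x in ket]

# (observable O_i, prepared state rho_i, coefficient c_i) from equation (11).
CUT_TERMS = [(I2, "0", 0.5), (I2, "1", 0.5),
             (X, "+", 0.5), (X, "-", -0.5),
             (Y, "+i", 0.5), (Y, "-i", -0.5),
             (Z, "0", 0.5), (Z, "1", -0.5)]

def cut_identity(rho):
    """Apply sum_i c_i Tr(O_i rho) rho_i, which should return rho itself."""
    return lincomb([(c * trace(matmul(o, rho)), proj(KETS[k]))
                    for o, k, c in CUT_TERMS])

rho = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]   # hypothetical test state
rebuilt = cut_identity(rho)
error = max(abs(rebuilt[i][j] - rho[i][j]) for i in range(2) for j in range(2))
print(error)
```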

**Figure 3.** Time-like cut employed in reference [16].

The decomposition presented in the previous section can also be used in this direction. Let us compare the cost scaling of our decomposition scheme with that of Peng et al using a simple example. We consider the case where we have an n-qubit quantum computer to simulate the 2n-qubit quantum circuit of figure 4, which has only one CZ gate between two n-qubit 'clusters'. The task is to estimate the expectation value of a final observable O_f by measuring it in the computational basis. To simplify the discussion, we assume O_f is a string of Pauli Z's.

**Figure 4.** The two decomposition approaches compared in the main text. The top-right approach is the one presented here, and the bottom-right approach is that of reference [16].

Let v be the desired variance of the estimate of the expectation value of O_f. We can show that a naive algorithm, which runs an equal number of circuits for each term appearing in the decomposition, requires in the worst case 2048/v runs of n-qubit circuits with time-like cuts, while the space-like cut approach takes $\frac{15}{2v}$ runs. The analysis of this simple example is given in appendix D. Although the analysis given here is based on a naive algorithm and there is room to improve it, it gives some indication of the improvement provided by our space-like cut protocol.

General case

We can consider a general case where we perform the time-like and space-like cuts simultaneously to make a given m-qubit quantum circuit runnable on an n-qubit quantum computer. Let the number of time-like and space-like cuts be M_t and M_s, respectively. See figure 5 for a schematic illustration. For space-like cuts, we assume they are performed only on CZ gates. The input state ρ is initialized in |0⟩⟨0|^⊗m and O_f is an output (diagonal) observable calculated from some output function f : {0, 1}^m → [−1, 1]. Our task here is to estimate the expectation $\mathbb{E}\left[f\left(y\right)\right]$ for a random bitstring y ∈ {0, 1}^m sampled from the original circuit. This model is adopted from reference [16] which originates in reference [20]. With this definition, we can get the following.

**Figure 5.** Schematic illustration of performing the space-like cut and the time-like cut simultaneously.

Theorem 5. The number of n-qubit circuit runs required to estimate $\mathbb{E}\left[f\left(y\right)\right]$ within accuracy ε with probability at least 1 − δ is $O\left(\frac{{9}^{{M}_{s}}1{6}^{{M}_{t}}}{{{\epsilon}}^{2}}\enspace \mathrm{log}\left(\frac{1}{2\delta }\right)\right)$.

This implies that the decomposition of the circuit should be performed so as to minimize ${9}^{{M}_{s}}1{6}^{{M}_{t}}$. A detailed proof is given in appendix E; however, the above can roughly be explained as follows. At each space-like cut, we get six different sets of single-qubit operations, so M_s cuts induce ${6}^{{M}_{s}}$ terms. Likewise, M_t time-like cuts induce ${8}^{{M}_{t}}$ terms, which makes the total number of circuits in the decomposition ${6}^{{M}_{s}}{8}^{{M}_{t}}$. With this decomposition, we can take a Monte-Carlo approach to estimate the sum, that is, we randomly choose circuits to run and average their results. Hoeffding's inequality can be used to bound the error of such a protocol: it states that if the magnitude of a random variable is always bounded by some constant a, then O(a²/ε²) samples suffice to obtain an accuracy of ε. In this case, we are to estimate $\mathbb{E}\left[f\left(y\right)\right]={\sum }_{i=1}^{{6}^{{M}_{s}}{8}^{{M}_{t}}}{c}_{i}\langle \langle {O}_{f}\vert \mathcal{S}\left({V}_{i}\right)\vert \rho \rangle \rangle$ with i randomly drawn from $\left\{1,\dots ,{6}^{{M}_{s}}{8}^{{M}_{t}}\right\}$ and $\vert {c}_{i}\vert =1/{2}^{{M}_{s}+{M}_{t}}$, that is, $\mathbb{E}\left[f\left(y\right)\right]$ is estimated by ${\mathbb{E}}_{i}\left[{6}^{{M}_{s}}{8}^{{M}_{t}}{c}_{i}\langle \langle {O}_{f}\vert \mathcal{S}\left({V}_{i}\right)\vert \rho \rangle \rangle \right]$. The magnitude of the random variable ${6}^{{M}_{s}}{8}^{{M}_{t}}{c}_{i}\langle \langle {O}_{f}\vert \mathcal{S}\left({V}_{i}\right)\vert \rho \rangle \rangle$ is at most ${6}^{{M}_{s}}{8}^{{M}_{t}}/{2}^{{M}_{s}+{M}_{t}}={3}^{{M}_{s}}{4}^{{M}_{t}}$, thus we can apply the Hoeffding bound to get the result.
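The counting argument above can be turned into a small calculator. The sketch below is an illustration only: it uses the standard two-sided Hoeffding bound N ≥ 2a² ln(2/δ)/ε² for a variable bounded in [−a, a], whose constants are our assumption and are not meant to match the big-O expression of theorem 5 exactly. The function names are hypothetical.

```python
import math

def num_terms(m_s, m_t):
    """Number of circuits in the decomposition: 6^{M_s} 8^{M_t}."""
    return 6 ** m_s * 8 ** m_t

def magnitude_bound(m_s, m_t):
    """Bound a on the sampled variable: 6^{M_s} 8^{M_t} / 2^{M_s + M_t}."""
    return 3 ** m_s * 4 ** m_t

def hoeffding_runs(m_s, m_t, eps, delta):
    """Runs sufficient for accuracy eps with probability >= 1 - delta.

    Uses the textbook two-sided Hoeffding bound (an assumption about constants).
    """
    a = magnitude_bound(m_s, m_t)
    return math.ceil(2 * a * a * math.log(2 / delta) / eps ** 2)

base = hoeffding_runs(0, 0, eps=0.1, delta=0.05)
one_space_cut = hoeffding_runs(1, 0, eps=0.1, delta=0.05)
one_time_cut = hoeffding_runs(0, 1, eps=0.1, delta=0.05)
print(num_terms(1, 1), base, one_space_cut, one_time_cut)
```

Each space-like cut multiplies the run count by about 9 and each time-like cut by about 16, which is the ${9}^{{M}_{s}}1{6}^{{M}_{t}}$ scaling of theorem 5.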

3.2. Distant two-qubit gates

The decomposition introduced above can be utilized to 'virtually' perform a two-qubit gate between distant qubits. Figure 6 shows an example of such a virtual two-qubit gate. Notice that this protocol works irrespective of the distance between the qubits. Many swap gates are otherwise necessary for performing such gates, which makes them impractical on NISQ devices due to the non-negligible decoherence and gate errors of such devices.

**Figure 6.** Decomposition of a distant two-qubit gate on a square lattice. Each vertex of the graph represents a qubit and each edge represents the connectivity of the qubits. S is the set of pairs of single-qubit operations which appear in the formula in lemma 1, and c_s is the corresponding coefficient for each pair.

This protocol might be useful for variational algorithms such as the variational quantum eigensolver (VQE) [9] and the quantum approximate optimization algorithm (QAOA) [12]. Here, we describe an example in the QAOA. In the QAOA, we seek a ground state of a Hamiltonian H on n qubits which is a sum of Pauli Z's and their products. For example, a Hamiltonian may have the form

$\begin{equation}H=\sum\limits _{ij}{J}_{ij}{Z}_{i}{Z}_{j}.\end{equation} \tag{ 12 }$

The QAOA tries to solve the problem by converting it to an optimization problem over continuous variables β and γ. The optimization of β and γ is performed so as to minimize the function

$\begin{equation}\langle H\left(\boldsymbol{\beta },\boldsymbol{\gamma }\right)\rangle =\langle +{\vert }^{\otimes n}{U}^{{\dagger}}\left(\boldsymbol{\beta },\boldsymbol{\gamma }\right)HU\left(\boldsymbol{\beta },\boldsymbol{\gamma }\right)\vert {+\rangle }^{\otimes n},\end{equation} \tag{ 13 }$

where,

$\begin{equation}U\left(\boldsymbol{\beta },\boldsymbol{\gamma }\right)={\text{e}}^{\text{i}{\boldsymbol{\beta }}_{p}\sum\limits _{i}{X}_{i}}\enspace {\text{e}}^{\text{i}{\boldsymbol{\gamma }}_{p}H}\dots {\text{e}}^{\text{i}{\boldsymbol{\gamma }}_{2}H}\enspace {\text{e}}^{\text{i}{\boldsymbol{\beta }}_{1}\sum\limits _{i}{X}_{i}}\enspace {\text{e}}^{\text{i}{\boldsymbol{\gamma }}_{1}H}.\end{equation} \tag{ 14 }$
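To make equations (13) and (14) concrete, the sketch below evaluates the cost function for the smallest possible instance: two qubits, p = 1, and H = Z₁Z₂. These choices, the function names, and the closed form sin(2γ) sin(4β) used for comparison (worked out by hand for this toy case only) are our own assumptions and do not come from the text.

```python
import math

X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def exp_i(phi, a):
    """exp(i * phi * A) = cos(phi) I + i sin(phi) A, valid when A squared is I."""
    n = len(a)
    return [[(math.cos(phi) if i == j else 0) + 1j * math.sin(phi) * a[i][j]
             for j in range(n)] for i in range(n)]

def qaoa_unitary(beta, gamma):
    """p = 1 instance of equation (14) with H = Z1 Z2."""
    mixer_1q = exp_i(beta, X)
    mixer = kron(mixer_1q, mixer_1q)   # exp(i beta (X1 + X2)); the two terms commute
    cost = exp_i(gamma, kron(Z, Z))    # exp(i gamma H)
    return matmul(mixer, cost)

def energy(beta, gamma):
    """Expectation value of H = Z1 Z2 in the state U |+>|+>, as in equation (13)."""
    plus_plus = [0.5, 0.5, 0.5, 0.5]
    u = qaoa_unitary(beta, gamma)
    psi = [sum(u[i][k] * plus_plus[k] for k in range(4)) for i in range(4)]
    zz_eigenvalues = [1, -1, -1, 1]
    return sum(z * abs(amp) ** 2 for z, amp in zip(zz_eigenvalues, psi))

sample = energy(0.3, 0.2)
closed_form = math.sin(2 * 0.2) * math.sin(4 * 0.3)
minimum = energy(-math.pi / 8, math.pi / 4)
print(sample, closed_form, minimum)
```

In this toy case the two-qubit factor exp(iγ Z₁Z₂) is exactly the kind of gate that lemma 1 decomposes.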

This algorithm has been experimentally demonstrated [13] with the connectivity of the target Hamiltonian being equivalent to the connectivity of the actual device.

The equivalence of the connectivity is almost necessary because of the requirement to perform e^{iγH}. This requirement can be somewhat relaxed by our protocol, which enables qubits to virtually interact irrespective of the distance between them. Let us now assume that an available device has the square-lattice connectivity of figure 6, and that the Hamiltonian of the QAOA which we aim to solve has an interaction between one pair of qubits that is not included in the hardware connectivity graph. In this case, to execute the QAOA circuit (equation (14)), we can use our space-like technique p times to virtually apply the unitary. The scaling of the cost can be bounded by setting M_t = 0 and M_s = p in theorem 5, which gives us a scaling of 9^p ε⁻² log[1/(2δ)]. The time-like cut approach of Peng et al [16] can also be utilized in this direction. However, as this approach would require 4 cuts per gate, the cost scaling is bounded by 16^{4p} ε⁻² log[1/(2δ)], obtained by setting M_t = 4p and M_s = 0 in theorem 5. This demonstrates an advantage, albeit in this special setting, of our technique over the previous result.
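The gap between the two prefactors grows quickly with the QAOA depth p. The following sketch simply tabulates them; the function name `overhead_prefactor` is hypothetical, and only the factor ${9}^{{M}_{s}}1{6}^{{M}_{t}}$ from theorem 5 is used.

```python
def overhead_prefactor(m_s, m_t):
    """Prefactor 9^{M_s} 16^{M_t} from theorem 5 (constants and logs omitted)."""
    return 9 ** m_s * 16 ** m_t

for p in (1, 2, 3):
    space_like = overhead_prefactor(p, 0)      # one space-like cut per layer
    time_like = overhead_prefactor(0, 4 * p)   # four time-like cuts per layer
    print(p, space_like, time_like, time_like // space_like)
```

For p = 1 the prefactors are 9 and 16⁴ = 65536, respectively.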

In the context of the VQE, which is also an algorithm to find a ground state of a Hamiltonian but mainly targets a concrete physical system such as molecules, it has been proposed to use the same kind of quantum circuits as the QAOA [27, 28]. Our result may also be applicable in constructing such circuits.

4. Discussion and conclusion

We described a technique to decompose non-local operations into a sequence of local operations. As single-qubit operations are generally more accurate on NISQ devices, the proposed technique can be used to enhance their capability. We believe intrinsic noise on single-qubit operations can be compensated for by recent sophisticated error mitigation techniques [26]. In particular, our technique of the space-like cut of two-qubit gates can improve the simulation of a large quantum circuit with a small quantum computer in some cases. It would be interesting to investigate the best strategy to perform 'cuts' so as to reduce the number of qubits to one compatible with an available device. Also, the algorithm we have given to bound the cost scaling is rather straightforward, and we believe it can be improved with a more sophisticated strategy.

The proposed algorithm can also be compared to the classical simulation strategy that splits a large circuit by decomposing two-qubit gates. For example, a controlled-NOT gate can be split using a tensor network based technique [29]. However, such techniques generally do not focus on decompositions of $\mathcal{S}\left(U\right)$ considered in this work but rather on the two-qubit unitary U itself, which makes them difficult to use on NISQ devices as equation (5) can no longer be utilized.

Our technique can induce an entanglement-like effect without performing any two-qubit gate, at the cost mentioned in lemmas 2 and 4. This connects this work to areas like quantum communication. This 'virtual' entanglement creation could be done with the time-like cut proposed by Peng et al, but our work lowers the cost of performing the task. It would be interesting to know whether ours is the optimal protocol or whether a more efficient one exists.

To summarize, our technique allows qubits to virtually interact irrespective of the physical distance between them. The result is useful for applying a two-qubit gate to a distant pair of qubits. In particular, when applied to NISQ devices, this may be employed to enhance their power. A future direction is to explore whether we can lower the resources required to perform such virtual operations.

Acknowledgments

KM thanks the METI and IPA for their support through the MITOU Target program. KM is also supported by JSPS KAKENHI No. 19J10978 and No. 20K22330, and JST PRESTO JPMJPR2019. KF is supported by KAKENHI No. 16H02211, JST PRESTO JPMJPR1668, JST ERATO JPMJER1601, and JST CREST JPMJCR1673. The authors thank Suguru Endo for fruitful discussions and letting us become aware of reference [26]. This work is supported by MEXT Quantum Leap Flagship Program (MEXT Q-LEAP) Grant Nos. JPMXS0118067394 and JPMXS0120319794.

Appendix A.: Proof of lemmas 1 and 3

A tensor representation of $\mathcal{S}\left(\left(I+{\alpha }_{1}{A}_{1}\right)\otimes \left(I+{\alpha }_{2}{A}_{2}\right)\right)$ on a set of basis ${\left\{\vert {e}_{i}{e}_{j}\rangle \rangle \right\}}_{i,j=1}^{4}$ is as follows.

$\begin{align}\hfill & \langle \langle {e}_{i}{e}_{j}\vert \mathcal{S}\left(\left(I+{\alpha }_{1}{A}_{1}\right)\otimes \left(I+{\alpha }_{2}{A}_{2}\right)\right)\vert {e}_{k}{e}_{l}\rangle \rangle \hfill \\ \hfill & \quad =\mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left(I+{\alpha }_{1}{A}_{1}\right)\otimes \left(I+{\alpha }_{2}{A}_{2}\right){e}_{k}\otimes {e}_{l}\left(I+{\alpha }_{1}^{{\ast}}{A}_{1}\right)\otimes \left(I+{\alpha }_{2}^{{\ast}}{A}_{2}\right)\right)\hfill \\ \hfill & \quad =\mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\enspace {e}_{k}\otimes {e}_{l}\right)+{\alpha }_{1}\enspace \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left({A}_{1}\otimes I\right){e}_{k}\otimes {e}_{l}\right)+{\alpha }_{1}^{{\ast}}\enspace \mathrm{T}\mathrm{r}\left({e}_{k}\otimes {e}_{l}\left({A}_{1}\otimes I\right){e}_{i}\otimes {e}_{j}\right)\hfill \\ \hfill & \qquad +{\alpha }_{2}\enspace \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left(I\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\right)+{\alpha }_{2}^{{\ast}}\enspace \mathrm{T}\mathrm{r}\left({e}_{k}\otimes {e}_{l}\left(I\otimes {A}_{2}\right){e}_{i}\otimes {e}_{j}\right)\hfill \\ \hfill & \qquad +{\alpha }_{1}{\alpha }_{2}\enspace \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left({A}_{1}\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\right)+{\alpha }_{1}^{{\ast}}{\alpha }_{2}^{{\ast}}\enspace \mathrm{T}\mathrm{r}\left({e}_{k}\otimes {e}_{l}\left({A}_{1}\otimes {A}_{2}\right){e}_{i}\otimes {e}_{j}\right)\hfill \\ \hfill & \qquad +{\alpha }_{1}{\alpha }_{2}^{{\ast}}\enspace \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left({A}_{1}\otimes I\right){e}_{k}\otimes {e}_{l}\left(I\otimes {A}_{2}\right)\right)+{\alpha }_{1}^{{\ast}}{\alpha }_{2}\enspace \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left(I\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\left({A}_{1}\otimes I\right)\right)\hfill \\ \hfill & \qquad +{\alpha }_{1}\enspace \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left({A}_{1}\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\left(I\otimes {A}_{2}\right)\right)+{\alpha }_{1}^{{\ast}}\enspace \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left(I\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\left({A}_{1}\otimes {A}_{2}\right)\right)\hfill \\ \hfill & \qquad +{\alpha }_{2}\enspace \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left({A}_{1}\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\left({A}_{1}\otimes I\right)\right)+{\alpha }_{2}^{{\ast}}\enspace \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left({A}_{1}\otimes I\right){e}_{k}\otimes {e}_{l}\left({A}_{1}\otimes {A}_{2}\right)\right)\hfill \\ \hfill & \qquad +\mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left({A}_{1}\otimes I\right){e}_{k}\otimes {e}_{l}\left({A}_{1}\otimes I\right)\right)+\mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left(I\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\left(I\otimes {A}_{2}\right)\right)\hfill \\ \hfill & \qquad +\mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left({A}_{1}\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\left({A}_{1}\otimes {A}_{2}\right)\right).\hfill \end{align} \tag{ A1 }$

Let,

$\begin{equation}{\left\{{\alpha }_{1},{\alpha }_{2}\right\}}_{ij,kl}{:=}\mathcal{S}{\left(\left(I+{\alpha }_{1}{A}_{1}\right)\otimes \left(I+{\alpha }_{2}{A}_{2}\right)\right)}_{ij,kl},\end{equation} \tag{ A2 }$

$\begin{equation}\left(\begin{matrix}\hfill {a}_{1,ijkl}\hfill \\ \hfill {a}_{2,ijkl}\hfill \\ \hfill {a}_{3,ijkl}\hfill \\ \hfill {a}_{4,ijkl}\hfill \\ \hfill {a}_{5,ijkl}\hfill \\ \hfill {a}_{6,ijkl}\hfill \\ \hfill {a}_{7,ijkl}\hfill \\ \hfill {a}_{8,ijkl}\hfill \\ \hfill {a}_{9,ijkl}\hfill \\ \hfill {a}_{10,ijkl}\hfill \\ \hfill {a}_{11,ijkl}\hfill \\ \hfill {a}_{12,ijkl}\hfill \\ \hfill {a}_{13,ijkl}\hfill \\ \hfill {a}_{14,ijkl}\hfill \\ \hfill {a}_{15,ijkl}\hfill \\ \hfill {a}_{16,ijkl}\hfill \end{matrix}\right){:=}\left(\begin{matrix}\hfill \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\enspace {e}_{k}\otimes {e}_{l}\right)\hfill \\ \hfill \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left({A}_{1}\otimes I\right){e}_{k}\otimes {e}_{l}\right)\hfill \\ \hfill \mathrm{T}\mathrm{r}\left({e}_{k}\otimes {e}_{l}\left({A}_{1}\otimes I\right){e}_{i}\otimes {e}_{j}\right)\hfill \\ \hfill \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left(I\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\right)\hfill \\ \hfill \mathrm{T}\mathrm{r}\left({e}_{k}\otimes {e}_{l}\left(I\otimes {A}_{2}\right){e}_{i}\otimes {e}_{j}\right)\hfill \\ \hfill \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left({A}_{1}\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\right)\hfill \\ \hfill \mathrm{T}\mathrm{r}\left({e}_{k}\otimes {e}_{l}\left({A}_{1}\otimes {A}_{2}\right){e}_{i}\otimes {e}_{j}\right)\hfill \\ \hfill \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left({A}_{1}\otimes I\right){e}_{k}\otimes {e}_{l}\left(I\otimes {A}_{2}\right)\right)\hfill \\ \hfill \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left(I\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\left({A}_{1}\otimes I\right)\right)\hfill \\ \hfill \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left({A}_{1}\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\left(I\otimes {A}_{2}\right)\right)\hfill \\ \hfill \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left(I\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\left({A}_{1}\otimes {A}_{2}\right)\right)\hfill \\ \hfill \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left({A}_{1}\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\left({A}_{1}\otimes I\right)\right)\hfill \\ \hfill \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left({A}_{1}\otimes I\right){e}_{k}\otimes {e}_{l}\left({A}_{1}\otimes {A}_{2}\right)\right)\hfill \\ \hfill \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left({A}_{1}\otimes I\right){e}_{k}\otimes {e}_{l}\left({A}_{1}\otimes I\right)\right)\hfill \\ \hfill \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left(I\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\left(I\otimes {A}_{2}\right)\right)\hfill \\ \hfill \mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left({A}_{1}\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\left({A}_{1}\otimes {A}_{2}\right)\right)\hfill \end{matrix}\right).\end{equation} \tag{ A3 }$

The relation can be summarized in matrix form,

$\begin{equation}\left(\begin{matrix} {\left\{+1,+1\right\}}_{ij,kl} \\ {\left\{+1,+i\right\}}_{ij,kl} \\ {\left\{+1,-1\right\}}_{ij,kl} \\ {\left\{+1,-i\right\}}_{ij,kl} \\ {\left\{+i,+1\right\}}_{ij,kl} \\ {\left\{+i,+i\right\}}_{ij,kl} \\ {\left\{+i,-1\right\}}_{ij,kl} \\ {\left\{+i,-i\right\}}_{ij,kl} \\ {\left\{-1,+1\right\}}_{ij,kl} \\ {\left\{-1,+i\right\}}_{ij,kl} \\ {\left\{-1,-1\right\}}_{ij,kl} \\ {\left\{-1,-i\right\}}_{ij,kl} \\ {\left\{-i,+1\right\}}_{ij,kl} \\ {\left\{-i,+i\right\}}_{ij,kl} \\ {\left\{-i,-1\right\}}_{ij,kl} \\ {\left\{-i,-i\right\}}_{ij,kl} \end{matrix}\right)=\left(\begin{array}{cccccccccccccccc} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & i & -i & i & -i & -i & i & 1 & 1 & i & -i & 1 & 1 & 1 \\ 1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & 1 \\ 1 & 1 & 1 & -i & i & -i & i & i & -i & 1 & 1 & -i & i & 1 & 1 & 1 \\ 1 & i & -i & 1 & 1 & i & -i & i & -i & i & -i & 1 & 1 & 1 & 1 & 1 \\ 1 & i & -i & i & -i & -1 & -1 & 1 & 1 & i & -i & i & -i & 1 & 1 & 1 \\ 1 & i & -i & -1 & -1 & -i & i & -i & i & i & -i & -1 & -1 & 1 & 1 & 1 \\ 1 & i & -i & -i & i & 1 & 1 & -1 & -1 & i & -i & -i & i & 1 & 1 & 1 \\ 1 & -1 & -1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & 1 \\ 1 & -1 & -1 & i & -i & -i & i & i & -i & -1 & -1 & i & -i & 1 & 1 & 1 \\ 1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 \\ 1 & -1 & -1 & -i & i & i & -i & -i & i & -1 & -1 & -i & i & 1 & 1 & 1 \\ 1 & -i & i & 1 & 1 & -i & i & -i & i & -i & i & 1 & 1 & 1 & 1 & 1 \\ 1 & -i & i & i & -i & 1 & 1 & -1 & -1 & -i & i & i & -i & 1 & 1 & 1 \\ 1 & -i & i & -1 & -1 & i & -i & i & -i & -i & i & -1 & -1 & 1 & 1 & 1 \\ 1 & -i & i & -i & i & -1 & -1 & 1 & 1 & -i & i & -i & i & 1 & 1 & 1 \end{array}\right)\left(\begin{matrix} {a}_{1} \\ {a}_{2} \\ {a}_{3} \\ {a}_{4} \\ {a}_{5} \\ {a}_{6} \\ {a}_{7} \\ {a}_{8} \\ {a}_{9} \\ {a}_{10} \\ {a}_{11} \\ {a}_{12} \\ {a}_{13} \\ {a}_{14} \\ {a}_{15} \\ {a}_{16} \end{matrix}\right).\end{equation} \tag{ A4 }$
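The coefficient matrix in equation (A4) can be regenerated from the expansion in equation (A1). The following is a minimal sketch, assuming that α₁ and α₂ are restricted to the unit-modulus values {+1, +i, −1, −i}; the names `coefficient_row`, `PHASES` and `matrix` are hypothetical and introduced only for illustration.

```python
import numpy as np

def coefficient_row(alpha1: complex, alpha2: complex) -> np.ndarray:
    """Coefficients of a_1..a_16 in {alpha1, alpha2}, assuming |alpha1| = |alpha2| = 1."""
    c1, c2 = np.conj(alpha1), np.conj(alpha2)
    return np.array([
        1,                          # a1
        alpha1, c1,                 # a2, a3
        alpha2, c2,                 # a4, a5
        alpha1 * alpha2, c1 * c2,   # a6, a7
        alpha1 * c2, c1 * alpha2,   # a8, a9
        alpha1, c1,                 # a10, a11 (uses |alpha2|^2 = 1)
        alpha2, c2,                 # a12, a13 (uses |alpha1|^2 = 1)
        1, 1, 1,                    # a14, a15, a16
    ], dtype=complex)

# Row order follows (A4): alpha1 is the slow index, alpha2 the fast index.
PHASES = [1, 1j, -1, -1j]
matrix = np.array([coefficient_row(p, q) for p in PHASES for q in PHASES])
```

Building the rows programmatically makes it easy to form differences such as `matrix[1] - matrix[3]` and read off which aₙ survive in a given linear combination.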

A.1. Proof of lemma 1

The tensor representation of $\mathcal{S}\left({\text{e}}^{\text{i}\theta {A}_{1}\otimes {A}_{2}}\right)$ is

$\begin{align}\hfill & \langle \langle {e}_{i}{e}_{j}\vert \mathcal{S}\left({\text{e}}^{\text{i}\theta {A}_{1}\otimes {A}_{2}}\right)\vert {e}_{k}{e}_{l}\rangle \rangle =\langle \langle {e}_{i}{e}_{j}\vert \mathcal{S}\left(\mathrm{cos}\enspace \theta +i\enspace \mathrm{sin}\enspace \theta {A}_{1}\otimes {A}_{2}\right)\vert {e}_{k}{e}_{l}\rangle \rangle \hfill \\ \hfill & \qquad =\mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left(\mathrm{cos}\enspace \theta I+i\enspace \mathrm{sin}\enspace \theta {A}_{1}\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\left(\mathrm{cos}\enspace \theta I-i\enspace \mathrm{sin}\enspace \theta {A}_{1}\otimes {A}_{2}\right)\right)\hfill \\ \hfill & \qquad ={\mathrm{cos}}^{2}\enspace \theta {a}_{1,ijkl}+i\enspace \mathrm{sin}\enspace \theta \enspace \mathrm{cos}\enspace \theta \left({a}_{6,ijkl}-{a}_{7,ijkl}\right)+{\mathrm{sin}}^{2}\enspace \theta {a}_{16,ijkl}.\hfill \end{align} \tag{ A5 }$

Observe that,

$\begin{equation}\begin{cases}\left\{+1,+i\right\}-\left\{+1,-i\right\}\quad \hfill \\ =2i\left({a}_{4}-{a}_{5}\right)+2i\left({a}_{6}-{a}_{7}\right)-2i\left({a}_{8}-{a}_{9}\right)+2i\left({a}_{12}-{a}_{13}\right),\quad \hfill \\ \left\{-1,+i\right\}-\left\{-1,-i\right\}\quad \hfill \\ =2i\left({a}_{4}-{a}_{5}\right)-2i\left({a}_{6}-{a}_{7}\right)+2i\left({a}_{8}-{a}_{9}\right)+2i\left({a}_{12}-{a}_{13}\right),\quad \hfill \\ \left\{+i,+1\right\}-\left\{-i,+1\right\}\quad \hfill \\ =2i\left({a}_{2}-{a}_{3}\right)+2i\left({a}_{6}-{a}_{7}\right)+2i\left({a}_{8}-{a}_{9}\right)+2i\left({a}_{10}-{a}_{11}\right),\quad \hfill \\ \left\{+i,-1\right\}-\left\{-i,-1\right\}\quad \hfill \\ =2i\left({a}_{2}-{a}_{3}\right)-2i\left({a}_{6}-{a}_{7}\right)-2i\left({a}_{8}-{a}_{9}\right)+2i\left({a}_{10}-{a}_{11}\right),\quad \hfill \end{cases}\end{equation} \tag{ A6 }$

where we abbreviated the subscripts ijkl. We can solve the above for i(a₆ − a₇), and obtain

$\begin{align}\hfill 8i\left({a}_{6}-{a}_{7}\right)& =\left\{+1,+i\right\}-\left\{+1,-i\right\}-\left\{-1,+i\right\}+\left\{-1,-i\right\}\hfill \\ \hfill & \quad +\left\{+i,+1\right\}-\left\{-i,+1\right\}-\left\{+i,-1\right\}+\left\{-i,-1\right\}\hfill \end{align} \tag{ A7 }$

$\begin{equation}=\sum\limits _{\boldsymbol{\alpha }\in {\left\{{\pm}1\right\}}^{2}}{\alpha }_{1}{\alpha }_{2}\left[\mathcal{S}\left(\left(I+{\alpha }_{1}{A}_{1}\right)\otimes \left(I+i{\alpha }_{2}{A}_{2}\right)\right)+\mathcal{S}\left(\left(I+i{\alpha }_{1}{A}_{1}\right)\otimes \left(I+{\alpha }_{2}{A}_{2}\right)\right)\right].\end{equation} \tag{ A8 }$

Combining this with equation (A5) completes the proof.
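As a numerical sanity check, substituting equations (A7) and (A8) into equation (A5) gives an operator identity that can be tested directly on a random state. The sketch below assumes A₁ and A₂ are Pauli operators (so that A² = I); the helper names `conj_by` and `lemma1_rhs`, the choice A₁ = Z, A₂ = X and the angle θ = 0.3 are illustrative assumptions, not part of the original text.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def conj_by(K, rho):
    """Apply S(K) to rho, i.e. K rho K^dagger."""
    return K @ rho @ K.conj().T

def lemma1_rhs(theta, A1, A2, rho):
    """cos^2 S(I) + sin^2 S(A1 x A2) + (cos sin / 8) * (sum appearing in (A8))."""
    A = np.kron(A1, A2)
    total = np.cos(theta) ** 2 * rho + np.sin(theta) ** 2 * conj_by(A, rho)
    weight = np.cos(theta) * np.sin(theta) / 8
    for s1 in (1, -1):
        for s2 in (1, -1):
            K1 = np.kron(I2 + s1 * A1, I2 + 1j * s2 * A2)
            K2 = np.kron(I2 + 1j * s1 * A1, I2 + s2 * A2)
            total = total + weight * s1 * s2 * (conj_by(K1, rho) + conj_by(K2, rho))
    return total

# Random two-qubit density matrix (seeded for reproducibility).
rng = np.random.default_rng(0)
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = G @ G.conj().T
rho = rho / np.trace(rho)

theta = 0.3
A = np.kron(Z, X)
# exp(i theta A) = cos(theta) I + i sin(theta) A because A @ A = I.
U = np.cos(theta) * np.eye(4) + 1j * np.sin(theta) * A
lhs = conj_by(U, rho)
rhs = lemma1_rhs(theta, Z, X, rho)
max_error = np.abs(lhs - rhs).max()
```

Under these assumptions `max_error` should be at the level of floating-point round-off.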

A.2. Proof of lemma 3

We first write down the tensor representation of the projective measurement, I + β A₁ ⊗ A₂ for β = ±1.

$\begin{align}\hfill \langle \langle {e}_{i}{e}_{j}\vert \mathcal{S}\left(I+\boldsymbol{\beta }{A}_{1}\otimes {A}_{2}\right)\vert {e}_{k}{e}_{l}\rangle \rangle & =\mathrm{T}\mathrm{r}\left({e}_{i}\otimes {e}_{j}\left(I+\boldsymbol{\beta }{A}_{1}\otimes {A}_{2}\right){e}_{k}\otimes {e}_{l}\left(I+\boldsymbol{\beta }{A}_{1}\otimes {A}_{2}\right)\right)\hfill \\ \hfill & ={a}_{1,ijkl}+\boldsymbol{\beta }\left({a}_{6,ijkl}+{a}_{7,ijkl}\right)+{a}_{16,ijkl}\hfill \end{align} \tag{ A9 }$

Similarly to the previous proof, observe that,

$\begin{equation}\begin{cases}\left\{+1,+1\right\}-\left\{+1,-1\right\}\quad \hfill \\ =2\left({a}_{4}+{a}_{5}\right)+2\left({a}_{6}+{a}_{7}\right)+2\left({a}_{8}+{a}_{9}\right)+2\left({a}_{12}+{a}_{13}\right),\quad \hfill \\ \left\{-1,+1\right\}-\left\{-1,-1\right\}\quad \hfill \\ =2\left({a}_{4}+{a}_{5}\right)-2\left({a}_{6}+{a}_{7}\right)-2\left({a}_{8}+{a}_{9}\right)+2\left({a}_{12}+{a}_{13}\right),\quad \hfill \\ \left\{+i,+i\right\}-\left\{+i,-i\right\}\quad \hfill \\ =2i\left({a}_{4}-{a}_{5}\right)-2\left({a}_{6}+{a}_{7}\right)+2\left({a}_{8}+{a}_{9}\right)+2i\left({a}_{12}-{a}_{13}\right),\quad \hfill \\ \left\{-i,+i\right\}-\left\{-i,-i\right\}\quad \hfill \\ =2i\left({a}_{4}-{a}_{5}\right)+2\left({a}_{6}+{a}_{7}\right)-2\left({a}_{8}+{a}_{9}\right)+2i\left({a}_{12}-{a}_{13}\right),\quad \hfill \end{cases}\end{equation} \tag{ A10 }$

We can solve the above for a₆ + a₇, and obtain,

$\begin{align}\hfill 8\left({a}_{6}+{a}_{7}\right)& =\left\{+1,+1\right\}-\left\{+1,-1\right\}-\left\{-1,+1\right\}+\left\{-1,-1\right\}\hfill \\ \hfill & \quad -\left\{+i,+i\right\}+\left\{+i,-i\right\}+\left\{-i,+i\right\}-\left\{-i,-i\right\}\hfill \end{align} \tag{ A11 }$

$\begin{equation}=\sum\limits _{\boldsymbol{\alpha }\in {\left\{{\pm}1\right\}}^{2}}{\alpha }_{1}{\alpha }_{2}\left[\mathcal{S}\left(\left(I+{\alpha }_{1}{A}_{1}\right)\otimes \left(I+{\alpha }_{2}{A}_{2}\right)\right)-\mathcal{S}\left(\left(I+i{\alpha }_{1}{A}_{1}\right)\otimes \left(I+i{\alpha }_{2}{A}_{2}\right)\right)\right].\end{equation} \tag{ A12 }$
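Equations (A9), (A11) and (A12) can be checked numerically in the same spirit: the unnormalized projective-measurement map S(I + βA₁ ⊗ A₂) should equal S(I) + S(A₁ ⊗ A₂) plus β/8 times the sum in equation (A12). This is a minimal sketch assuming Pauli A₁ and A₂; the helper names and the choice A₁ = Y, A₂ = Z are hypothetical.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def conj_by(K, rho):
    """Apply S(K) to rho, i.e. K rho K^dagger."""
    return K @ rho @ K.conj().T

def lemma3_rhs(beta, A1, A2, rho):
    """a1 + a16 + (beta / 8) * (sum appearing in (A12)), applied to rho."""
    A = np.kron(A1, A2)
    total = rho + conj_by(A, rho)
    for s1, s2 in itertools.product((1, -1), repeat=2):
        K_real = np.kron(I2 + s1 * A1, I2 + s2 * A2)
        K_imag = np.kron(I2 + 1j * s1 * A1, I2 + 1j * s2 * A2)
        total = total + (beta / 8) * s1 * s2 * (conj_by(K_real, rho) - conj_by(K_imag, rho))
    return total

rng = np.random.default_rng(3)
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = G @ G.conj().T
rho = rho / np.trace(rho)

A = np.kron(Y, Z)
errors = []
for beta in (1, -1):
    lhs = conj_by(np.eye(4) + beta * A, rho)
    errors.append(np.abs(lhs - lemma3_rhs(beta, Y, Z, rho)).max())
```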

A.3. Relation with reference [20]

Bravyi et al considered removing k qubits from a given (n + k)-qubit circuit at the cost of O(kd2^k) classical computation, where d is defined to be proportional to the number of gates applied to the k-qubit system. The technique used in their work, in particular figure 2 of reference [20], can also provide a derivation of the above lemmas when combined with our recent technique developed in reference [28].

Appendix B.: Proof of lemma 2

Suppose that we are applying $\mathcal{S}\left({\text{e}}^{\text{i}\theta {A}_{1}\otimes {A}_{2}}\right)$ to some state ρ and want to decompose the gate. We name each operation in the decomposition as,

$\begin{align}\hfill {{\Phi}}_{1,\boldsymbol{\beta }}& =\mathcal{S}\left(I\otimes I\right),\hfill \\ \hfill {{\Phi}}_{2,\boldsymbol{\beta }}& =\mathcal{S}\left({A}_{1}\otimes {A}_{2}\right),\hfill \\ \hfill {{\Phi}}_{3,\boldsymbol{\beta }}& =\boldsymbol{\beta }{\mathcal{M}}_{{A}_{1},\boldsymbol{\beta }}\otimes \mathcal{S}\left({\text{e}}^{\text{i}\pi {A}_{2}/4}\right),\hfill \\ \hfill {{\Phi}}_{4,\boldsymbol{\beta }}& =\boldsymbol{\beta }{\mathcal{M}}_{{A}_{1},\boldsymbol{\beta }}\otimes \mathcal{S}\left({\text{e}}^{-\text{i}\pi {A}_{2}/4}\right),\hfill \\ \hfill {{\Phi}}_{5,\boldsymbol{\beta }}& =\boldsymbol{\beta }\mathcal{S}\left({\text{e}}^{\text{i}\pi {A}_{1}/4}\right)\otimes {\mathcal{M}}_{{A}_{2},\boldsymbol{\beta }},\hfill \\ \hfill {{\Phi}}_{6,\boldsymbol{\beta }}& =\boldsymbol{\beta }\mathcal{S}\left({\text{e}}^{-\text{i}\pi {A}_{1}/4}\right)\otimes {\mathcal{M}}_{{A}_{2},\boldsymbol{\beta }}.\hfill \end{align} \tag{ B1 }$

These maps are not physical when β = −1 for i = 3, 4, 5, 6, but they are achievable with classical post-processing. ${\mathcal{M}}_{{A}_{i},\boldsymbol{\beta }}$ is the postselective measurement operation introduced in the main text. For convenience, we define coefficients ${\left\{{c}_{i}\right\}}_{i=1}^{6}$ as

$\begin{align}\hfill {c}_{1}& ={\mathrm{cos}}^{2}\enspace \theta ,\hfill \\ \hfill {c}_{2}& ={\mathrm{sin}}^{2}\enspace \theta ,\hfill \\ \hfill {c}_{3}& =-{c}_{4}={c}_{5}=-{c}_{6}=\mathrm{cos}\enspace \theta \enspace \mathrm{sin}\enspace \theta ,\hfill \end{align} \tag{ B2 }$

Then,

$\begin{equation}\mathcal{S}\left({\text{e}}^{\text{i}\theta {A}_{1}\otimes {A}_{2}}\right)\vert \rho \rangle \rangle \end{equation} \tag{ B3 }$

$\begin{equation}=\left[{c}_{1}{{\Phi}}_{1,\boldsymbol{\beta }}+{c}_{2}{{\Phi}}_{2,\boldsymbol{\beta }}\right.\end{equation} \tag{ B4 }$

$\begin{equation}\quad +\sum\limits _{\boldsymbol{\beta }\in \left\{1,-1\right\}}\enspace \mathrm{T}\mathrm{r}\left(\rho \frac{I+\boldsymbol{\beta }{A}_{1}}{2}\right)\left({c}_{3}{{\Phi}}_{3,\boldsymbol{\beta }}+{c}_{4}{{\Phi}}_{4,\boldsymbol{\beta }}\right)\end{equation} \tag{ B5 }$

$\begin{equation}\quad \left.+\sum\limits _{\boldsymbol{\beta }\in \left\{1,-1\right\}}\enspace \mathrm{T}\mathrm{r}\left(\rho \frac{I+\boldsymbol{\beta }{A}_{2}}{2}\right)\left({c}_{5}{{\Phi}}_{5,\boldsymbol{\beta }}+{c}_{6}{{\Phi}}_{6,\boldsymbol{\beta }}\right)\right]\vert \rho \rangle \rangle .\end{equation} \tag{ B6 }$

We take a naive algorithm to bound the error of the decomposition. We define a probabilistic map below

$\begin{align}\hfill {{\Phi}}_{1}& =\mathcal{S}\left(I\otimes I\right),\hfill \\ \hfill {{\Phi}}_{2}& =\mathcal{S}\left({A}_{1}\otimes {A}_{2}\right),\hfill \\ \hfill {{\Phi}}_{3}& ={\mathcal{M}}_{{A}_{1}}^{\prime }\otimes \mathcal{S}\left({\text{e}}^{\text{i}\pi {A}_{2}/4}\right),\hfill \\ \hfill {{\Phi}}_{4}& ={\mathcal{M}}_{{A}_{1}}^{\prime }\otimes \mathcal{S}\left({\text{e}}^{-\text{i}\pi {A}_{2}/4}\right),\hfill \\ \hfill {{\Phi}}_{5}& =\mathcal{S}\left({\text{e}}^{\text{i}\pi {A}_{1}/4}\right)\otimes {\mathcal{M}}_{{A}_{2}}^{\prime },\hfill \\ \hfill {{\Phi}}_{6}& =\mathcal{S}\left({\text{e}}^{-\text{i}\pi {A}_{1}/4}\right)\otimes {\mathcal{M}}_{{A}_{2}}^{\prime },\hfill \end{align} \tag{ B7 }$

where ${\mathcal{M}}_{{A}_{i}}^{\prime }$ acts on a state ρ probabilistically as,

$\begin{equation}{\mathcal{M}}_{{A}_{i}}^{\prime }\left(\rho \right)\to b{\mathcal{M}}_{{A}_{i},b}\left(\rho \right)\end{equation} \tag{ B8 }$

where b is a random variable with probability distribution $p\left(b={\pm}1\right)=\mathrm{T}\mathrm{r}\left(\rho \frac{I{\pm}{A}_{i}}{2}\right)$ . Again, when b = −1 this map is non-physical but can be realized with classical post-processing. Φ_i becomes Φ_i,b with probability p(b), and therefore,

$\begin{equation}\mathbb{E}\left[{{\Phi}}_{i}\right]=\mathrm{T}\mathrm{r}\left(\rho \frac{I+{A}_{1}}{2}\right){{\Phi}}_{i,+1}+\mathrm{T}\mathrm{r}\left(\rho \frac{I-{A}_{1}}{2}\right){{\Phi}}_{i,-1}\end{equation} \tag{ B9 }$

for i = 3, 4. A similar equality holds for i = 5, 6. This yields,

$\begin{equation}\mathcal{S}\left({\text{e}}^{\text{i}\theta {A}_{1}\otimes {A}_{2}}\right)\vert \rho \rangle \rangle =\sum\limits _{i=1}^{6}{c}_{i}\mathbb{E}\left[{{\Phi}}_{i}\vert \rho \rangle \rangle \right]\end{equation} \tag{ B10 }$
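Equation (B10) can be checked exactly, without sampling, by averaging each probabilistic map over b = ±1. The sketch below assumes Pauli A₁ and A₂, writes the postselected state as P ρ P / p(b) with P = (I + bA)/2, and uses the fact that the weight p(b) cancels the normalization; the function names `rot`, `proj` and `expected_phi` are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rot(A, sign):
    """exp(sign * i * pi * A / 4) for an operator with A @ A = I."""
    return (I2 + sign * 1j * A) / np.sqrt(2)

def proj(A, b):
    """Projector (I + b A) / 2 onto the eigenvalue-b subspace of A."""
    return (I2 + b * A) / 2

def expected_phi(i, A1, A2, rho):
    """E[Phi_i] applied to rho, with Phi_i as in (B7)."""
    if i == 1:
        return rho
    if i == 2:
        A = np.kron(A1, A2)
        return A @ rho @ A.conj().T
    out = np.zeros_like(rho)
    for b in (1, -1):
        if i in (3, 4):
            K = np.kron(proj(A1, b), rot(A2, 1 if i == 3 else -1))
        else:
            K = np.kron(rot(A1, 1 if i == 5 else -1), proj(A2, b))
        # p(b) * b * (K rho K^dagger / p(b)) = b * K rho K^dagger
        out = out + b * (K @ rho @ K.conj().T)
    return out

rng = np.random.default_rng(4)
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = G @ G.conj().T
rho = rho / np.trace(rho)

theta = 0.7
cs = np.cos(theta) * np.sin(theta)
coeffs = [np.cos(theta) ** 2, np.sin(theta) ** 2, cs, -cs, cs, -cs]  # (B2)
rhs = sum(c * expected_phi(i, Z, X, rho) for i, c in zip(range(1, 7), coeffs))

A = np.kron(Z, X)
U = np.cos(theta) * np.eye(4) + 1j * np.sin(theta) * A  # exp(i theta A)
lhs = U @ rho @ U.conj().T
max_error = np.abs(lhs - rhs).max()
```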

Suppose that we take N samples for each i = 1, ..., 6 to estimate $\mathbb{E}\left[{{\Phi}}_{i}\vert \rho \rangle \rangle \right]$ . The i = 1, 2 cases are not probabilistic and hence do not introduce error. We are left to consider the error induced by i = 3, 4, 5, 6. In this case, we can estimate $\vert {\mu }_{i}\rangle \rangle =\mathbb{E}\left[{{\Phi}}_{i}\vert \rho \rangle \rangle \right]$ by

$\begin{equation}\vert \bar{{\mu }_{i}}\rangle \rangle =\frac{1}{N}\sum\limits _{j=1}^{N}{{\Phi}}_{i,{b}_{ij}}\vert \rho \rangle \rangle ,\end{equation} \tag{ B11 }$

where ${\left\{{b}_{ij}\right\}}_{j=1}^{N}$ are samples drawn from the distribution which is identical to the above mentioned b. Now the difference between the true state $\mathcal{S}\left({\text{e}}^{\text{i}\theta {A}_{1}\otimes {A}_{2}}\right)\vert \rho \rangle \rangle$ and the estimated ${\sum }_{i=1}^{6}{c}_{i}\vert \bar{{\mu }_{i}}\rangle \rangle$ is,

$\begin{align}\hfill \mathcal{S}\left({\text{e}}^{\text{i}\theta {A}_{1}\otimes {A}_{2}}\right)\vert \rho \rangle \rangle -\sum\limits _{i=1}^{6}{c}_{i}\vert \bar{{\mu }_{i}}\rangle \rangle & =\left(\mathrm{T}\mathrm{r}\left(\rho \frac{I+{A}_{1}}{2}\right)-\frac{1}{2N}\sum\limits _{j=1}^{N}\left({b}_{3j}+1\right)\right){c}_{3}{{\Phi}}_{3,+1}\vert \rho \rangle \rangle \hfill \\ \hfill & \quad +\left(\mathrm{T}\mathrm{r}\left(\rho \frac{I-{A}_{1}}{2}\right)-\frac{1}{2N}\sum\limits _{j=1}^{N}\left(1-{b}_{3j}\right)\right){c}_{3}{{\Phi}}_{3,-1}\vert \rho \rangle \rangle \hfill \\ \hfill & \quad +\left(\mathrm{T}\mathrm{r}\left(\rho \frac{I+{A}_{1}}{2}\right)-\frac{1}{2N}\sum\limits _{j=1}^{N}\left({b}_{4j}+1\right)\right){c}_{4}{{\Phi}}_{4,+1}\vert \rho \rangle \rangle \hfill \\ \hfill & \quad +\left(\mathrm{T}\mathrm{r}\left(\rho \frac{I-{A}_{1}}{2}\right)-\frac{1}{2N}\sum\limits _{j=1}^{N}\left(1-{b}_{4j}\right)\right){c}_{4}{{\Phi}}_{4,-1}\vert \rho \rangle \rangle \hfill \\ \hfill & \quad +\left(\mathrm{T}\mathrm{r}\left(\rho \frac{I+{A}_{2}}{2}\right)-\frac{1}{2N}\sum\limits _{j=1}^{N}\left({b}_{5j}+1\right)\right){c}_{5}{{\Phi}}_{5,+1}\vert \rho \rangle \rangle \hfill \\ \hfill & \quad +\left(\mathrm{T}\mathrm{r}\left(\rho \frac{I-{A}_{2}}{2}\right)-\frac{1}{2N}\sum\limits _{j=1}^{N}\left(1-{b}_{5j}\right)\right){c}_{5}{{\Phi}}_{5,-1}\vert \rho \rangle \rangle \hfill \\ \hfill & \quad +\left(\mathrm{T}\mathrm{r}\left(\rho \frac{I+{A}_{2}}{2}\right)-\frac{1}{2N}\sum\limits _{j=1}^{N}\left({b}_{6j}+1\right)\right){c}_{6}{{\Phi}}_{6,+1}\vert \rho \rangle \rangle \hfill \\ \hfill & \quad +\left(\mathrm{T}\mathrm{r}\left(\rho \frac{I-{A}_{2}}{2}\right)-\frac{1}{2N}\sum\limits _{j=1}^{N}\left(1-{b}_{6j}\right)\right){c}_{6}{{\Phi}}_{6,-1}\vert \rho \rangle \rangle .\hfill \end{align} \tag{ B12 }$

$\frac{1{\pm}{b}_{ij}}{2}$ is a Bernoulli random variable with expectation $\mathrm{T}\mathrm{r}\left(\rho \frac{I{\pm}{A}_{1}}{2}\right)$ for i = 3, 4 and $\mathrm{T}\mathrm{r}\left(\rho \frac{I{\pm}{A}_{2}}{2}\right)$ for i = 5, 6. This means that, for example, the difference between $\frac{1}{2N}{\sum }_{j=1}^{N}\left({b}_{3j}+1\right)$ and $\mathrm{T}\mathrm{r}\left[\rho \frac{I+{A}_{1}}{2}\right]$ is bounded by ε > 0, that is, $\left\vert \frac{1}{2N}{\sum }_{j=1}^{N}\left({b}_{3j}+1\right)-\mathrm{T}\mathrm{r}\left[\rho \frac{I+{A}_{1}}{2}\right]\right\vert {\leqslant}{\epsilon}$ , with probability at least 1 − exp(−2ε² N) from Hoeffding's inequality. The same bound holds for every term in equation (B12). Note that if $\left\vert \frac{1}{2N}{\sum }_{j=1}^{N}\left({b}_{3j}+1\right)-\mathrm{T}\mathrm{r}\left[\rho \frac{I+{A}_{1}}{2}\right]\right\vert {\leqslant}{\epsilon}$ holds, then $\left\vert \frac{1}{2N}{\sum }_{j=1}^{N}\left(1-{b}_{3j}\right)-\mathrm{T}\mathrm{r}\left[\rho \frac{I-{A}_{1}}{2}\right]\right\vert {\leqslant}{\epsilon}$ also holds, because the two empirical frequencies and the two probabilities each sum to one. Hence the probability that at least one of the differences in equation (B12) is larger than ε is at most 4 exp(−2ε² N), by the union bound. Therefore, with probability at least 1 − 4 exp(−2ε² N),

$\begin{align}\hfill {\Vert}\mathcal{S}\left({\text{e}}^{\text{i}\theta {A}_{1}\otimes {A}_{2}}\right)\vert \rho \rangle \rangle -\sum\limits _{i=1}^{6}{c}_{i}\vert \bar{{\mu }_{i}}\rangle \rangle {\Vert}& {\leqslant}{\epsilon}{\Vert}\sum\limits _{i=3}^{6}\sum\limits _{\boldsymbol{\beta }\in \left\{1,-1\right\}}{c}_{i}{{\Phi}}_{i,\boldsymbol{\beta }}\vert \rho \rangle \rangle {\Vert}\hfill \\ \hfill & {\leqslant}{\epsilon}\sum\limits _{i=3}^{6}\sum\limits _{\boldsymbol{\beta }\in \left\{1,-1\right\}}{\Vert}{c}_{i}{{\Phi}}_{i,\boldsymbol{\beta }}\vert \rho \rangle \rangle {\Vert},\hfill \end{align} \tag{ B13 }$

holds for any norm ||⋅||. The second inequality follows from the triangle inequality. Considering the trace norm ||⋅||₁, which gives ||Φ_{i,β}|ρ⟩⟩||₁ = 1, and taking |c_i| ⩽ 1 into account, we get

$\begin{equation}{\Vert}\mathcal{S}\left({\text{e}}^{\text{i}\theta {A}_{1}\otimes {A}_{2}}\right)\vert \rho \rangle \rangle -\sum\limits _{i=1}^{6}{c}_{i}\vert \bar{{\mu }_{i}}\rangle \rangle {\Vert}{\leqslant}8{\epsilon}.\end{equation} \tag{ B14 }$

With this, we conclude that, given the desired error ε and a probability 1 − δ by which we wish to upper-bound the probability of getting an error larger than ε, we can take $N=-\frac{32}{{{\epsilon}}^{2}}\enspace \mathrm{ln}\left(1-\delta \right)$ .
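The sampling step behind equations (B8) and (B11) can be illustrated with a small simulation. This is only a sketch: the outcome probability `p_plus` stands in for Tr(ρ(I + A₁)/2) and is a made-up value, and `hoeffding_failure_bound` simply evaluates the expression 4 exp(−2ε²N) quoted above rather than deriving it.

```python
import numpy as np

def hoeffding_failure_bound(eps: float, n: int) -> float:
    """Evaluate 4 * exp(-2 * eps**2 * n), the union-bound expression used in the text."""
    return 4 * np.exp(-2 * eps ** 2 * n)

rng = np.random.default_rng(1)
p_plus = 0.7      # hypothetical value of Tr(rho (I + A_1) / 2)
n = 20000         # number of samples N

# Draw b = +1 with probability p_plus and b = -1 otherwise, as in (B8).
b = rng.choice([1, -1], size=n, p=[p_plus, 1 - p_plus])

# Empirical frequency (1 / 2N) * sum_j (b_j + 1), the quantity compared with p_plus in (B12).
freq_plus = (b + 1).sum() / (2 * n)
deviation = abs(freq_plus - p_plus)
bound = hoeffding_failure_bound(0.02, n)
```

With these illustrative numbers the bound on the failure probability for ε = 0.02 is far below one, and the observed deviation is expected to be well inside ε.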

Appendix C.: Time-like cut for identity channel

The time-like cut approach proposed in reference [16] can be derived in the following manner. Let us consider an identity channel ${\mathcal{I}}_{a}$ on the ath qubit. It can be expanded as,

$\begin{equation}{\mathcal{I}}_{a}=\sum\limits _{{i}_{a}=0}^{3}\sum\limits _{{j}_{a}=0}^{3}\vert {e}_{{i}_{a}}\rangle \rangle \langle \langle {e}_{{j}_{a}}\vert \langle \langle {e}_{{i}_{a}}\vert {\mathcal{I}}_{a}\vert {e}_{{j}_{a}}\rangle \rangle .\end{equation} \tag{ C1 }$

Since we assumed |e_i〉〉 are orthonormal to each other and $\mathcal{I}\vert \rho \rangle \rangle =\vert \rho \rangle \rangle$ for any ρ,

$\begin{equation}{\mathcal{I}}_{a}=\sum\limits _{{i}_{a}=0}^{3}\vert {e}_{{i}_{a}}\rangle \rangle \langle \langle {e}_{{i}_{a}}\vert \end{equation} \tag{ C2 }$

If we apply this to an n-qubit density matrix $\vert \rho \rangle \rangle ={\sum }_{{j}_{1},\dots ,{j}_{n}}{\rho }_{\boldsymbol{j}}\vert {e}_{{j}_{1}}{e}_{{j}_{2}}\dots {e}_{{j}_{n}}\rangle \rangle$ in this form, we see,

$\begin{equation}\mathcal{I}\vert \rho \rangle \rangle =\sum\limits _{{i}_{a}=0}^{3}\vert {e}_{{i}_{a}}\rangle \rangle \langle \langle {e}_{{i}_{a}}\vert \rho \rangle \rangle \end{equation} \tag{ C3 }$

$\begin{equation}=\sum\limits _{{i}_{a}=0}^{3}{\mathrm{T}\mathrm{r}}_{a}\left({e}_{{i}_{a}}\rho \right)\otimes \vert {e}_{{i}_{a}}\rangle \rangle .\end{equation} \tag{ C4 }$

Choosing $\vert {e}_{{i}_{a}}\rangle \rangle$ to be the normalized Pauli matrices $\left\{I,X,Y,Z\right\}/\sqrt{2}$ , we conclude,

$\begin{equation}\mathcal{I}\vert \rho \rangle \rangle =\frac{1}{2}\sum\limits _{A\in \left\{I,X,Y,Z\right\}}{\mathrm{T}\mathrm{r}}_{a}\left({A}_{a}\rho \right)\otimes {A}_{a}.\end{equation} \tag{ C5 }$

This equation implies that we can first measure the expectation values of X, Y, Z at the ath qubit and then re-input each of their eigenstates.
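Equation (C5) can be verified numerically for a small case. The sketch below assumes a two-qubit state with the cut placed on the first qubit; `trace_out_first` is a hypothetical helper for the partial trace Tr_a.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def trace_out_first(M):
    """Partial trace over the first qubit of a two-qubit operator."""
    return np.einsum('abad->bd', M.reshape(2, 2, 2, 2))

# Random two-qubit density matrix (seeded for reproducibility).
rng = np.random.default_rng(5)
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = G @ G.conj().T
rho = rho / np.trace(rho)

# (1/2) * sum_A Tr_a(A_a rho) (x) A_a, with qubit a being the first qubit.
rebuilt = np.zeros((4, 4), dtype=complex)
for A in PAULIS:
    reduced = trace_out_first(np.kron(A, I2) @ rho)  # operator on the remaining qubit
    rebuilt = rebuilt + 0.5 * np.kron(A, reduced)

max_error = np.abs(rebuilt - rho).max()
```

The factor 1/2 is the product of the two 1/√2 normalizations of the basis elements.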

Appendix D.: Analysis of the simple example given in section 3.1

D.1. Cost of time-like cut

First, we consider the 'time-like' cut approach. Let us name the two qubits on which the CZ gate acts a and b, and let σ_a and σ_b be their 1-qubit reduced density matrices after the gate U₁ ⊗ V₁.

We assume,

One n-qubit device is available.
Qubits that are measured in the basis of an observable O_i can be reused to prepare the input state ρ_j.

We take the following naive approach to estimate ⟨O_f⟩. First, we split the allowed number of circuit runs N into two halves of N/2 so that each of the two divided circuits is run the same number of times. The N/2 runs are further divided into N/128 runs for each pair of O_i and ρ_j with (i, j) ∈ {1, 2, ..., 8}² [30]. With the N/128 runs for a pair (O_i, ρ_j), we estimate the value of the tensor network below. Since we assumed O_f is a tensor product of Pauli Z's and O_i is drawn from I, X, Y, Z, from each run we obtain a measurement result o_ij,r = ±1, where r is the index to distinguish the runs. Using o_ij,r, we estimate the above tensor network by ${\tilde {o}}_{ij}=\frac{1}{N/128}{\sum }_{r=1}^{N/128}{o}_{ij,r}$ . Since o_ij,r is a random variable which takes values in {+1, −1}, this estimator ${\tilde {o}}_{ij}$ approximately follows a normal distribution with expectation $\mathbb{E}\left[{o}_{ij}\right]$ and variance $\frac{1}{N/128}\left(1-\mathbb{E}{\left[{o}_{ij}\right]}^{2}\right){\leqslant}\frac{1}{N/128}$ , for sufficiently large N. Therefore, with N/128 runs of a quantum circuit, we can estimate the value of the above tensor network with variance at most $\frac{128}{N}$ . We can obtain the same result for the other cluster.
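The variance formula for the ±1 estimator can be illustrated by simulation. In this sketch, `mean` is a made-up expectation value for o_ij and `n` plays the role of N/128; neither value comes from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
mean = 0.4        # hypothetical E[o_ij]
n = 200           # plays the role of N/128 runs per (O_i, rho_j) pair
trials = 20000    # independent repetitions used to measure the estimator's spread

# o = +1 with probability (1 + mean) / 2, so that E[o] = mean.
p_plus = (1 + mean) / 2
samples = rng.choice([1, -1], size=(trials, n), p=[p_plus, 1 - p_plus])
estimates = samples.mean(axis=1)          # one estimate per repetition

empirical_var = estimates.var()
predicted_var = (1 - mean ** 2) / n       # (1 - E[o]^2) / (N/128)
upper_bound = 1 / n                       # the bound 128/N used in the text
```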

For each i, j, k, l, the tensor in the sum of figure 4 is estimated by the product of the above estimators because the measurement results of the two clusters are independent. The variance of each tensor network can be evaluated because it is a product of two random variables approximately drawn from normal distributions with variance at most $\frac{128}{N}$ , and it is at most $\frac{64}{N}$ . We have that c_ijkl ∈ {±1/2⁴}. This reduces the variance of each term in the summation to 1/(2N). However, since each term can be approximated by a normal distribution with variance at most 1/(2N) and we sum 8⁴ = 4096 terms, the result has variance at most 2048/N.

D.2. Cost of space-like cut

For the space-like cut, we divide the allowed number of circuit runs, N, into portions of N/6 [31]. First, we run eight circuits that do not involve the measurement in the middle and obtain estimators for these four tensor networks. Each of the estimators has variance at most 6/N. Let us now move on to the circuits with the measurement. For an arbitrary density matrix ρ', the Z measurement with outcome α = ±1 produces the density matrix $\left(\frac{I+\alpha Z}{2}\right){\rho }^{\prime }\left(\frac{I+\alpha Z}{2}\right)/{p}_{\alpha }$ with probability p_α. Therefore, to obtain the above decomposition, we need to know the normalization factor p_α. With N/6 circuit runs, p_α is estimated with variance p_α(1 − p_α)/(N/6), which is at most 6/(4N) = 3/(2N). Note that when N is large, the distribution of the estimator ${\tilde {p}}_{\alpha }$ can be regarded as a normal distribution. Conditioned on α, we construct an estimator of O_f. Since α is obtained with probability p_α, for each α we have p_α N/6 samples to estimate O_f. Therefore, for each α, the estimator of O_f has variance 6/(p_α N). Since the estimator of the tensor network is the product of the estimators of the conditioned O_f and p_α, its variance is $\frac{6}{N}\frac{1}{{p}_{\alpha }+1/\left[{p}_{\alpha }\left(1-{p}_{\alpha }\right)\right]}{\leqslant}\frac{3}{2N}$ . Each pair of tensor networks in figure 2 is multiplied together, and if we perform this with the estimators obtained above, the variance of each term is at most $\frac{3}{N}$ . We further multiply each term by ±1/2, which reduces the variance to $\frac{3}{4N}$ . Finally, the summation of 10 such terms leads to a variance of $\frac{15}{2N}$ .
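The final bookkeeping steps of the two cost estimates can be reproduced with exact fractions. The sketch below only restates the per-term numbers quoted in D.1 and D.2 (in units of 1/N) and does not re-derive them; the variable names are hypothetical.

```python
from fractions import Fraction

# Time-like cut (D.1): 8**4 terms, each with variance at most 1/(2N).
per_term_time = Fraction(1, 2)
terms_time = 8 ** 4
var_time = per_term_time * terms_time            # in units of 1/N

# Space-like cut (D.2): each product has variance at most 3/N,
# the coefficient +-1/2 scales the variance by (1/2)**2, and 10 terms are summed.
per_pair_space = Fraction(3)
per_term_space = per_pair_space * Fraction(1, 2) ** 2
var_space = 10 * per_term_space                  # in units of 1/N

ratio = var_time / var_space                     # how much larger the time-like bound is
```

With the numbers quoted in the text, the time-like bound is 2048/N and the space-like bound is 15/(2N), so the former is larger by a factor of 4096/15 for the same total number of runs N.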

Appendix E.: Proof of theorem 5

We follow the approach taken in reference [16]. The task here is to decompose an m-qubit circuit like the one shown in figure 5 so that the original quantum circuit can be approximated with an n-qubit quantum computer, where the input state ρ is initialized in |0⟩⟨0|^⊗m and O_f is an output (diagonal) observable calculated from some output function f : {0, 1}^m → [−1, 1]. We want to estimate ${\mathbb{E}}_{y}\left[f\left(y\right)\right]$ for n-bit measurement outcomes y to some accuracy ε > 0 with some high probability 1 − δ.

Let the number of space-like cuts and time-like cuts performed in the decomposition be M_s and M_t respectively. We assume the space-like cuts are performed only on CZ gates. We redefine the probabilistic map Φ'_i that is used to decompose $\mathcal{S}\left({\text{e}}^{\text{i}\theta A\otimes B}\right)$ as,

$\begin{align}\hfill {{\Phi}}_{1}^{\prime }& =\mathcal{S}\left(I\otimes I\right),\hfill \\ \hfill {{\Phi}}_{2}^{\prime }& =\mathcal{S}\left(A\otimes B\right),\hfill \\ \hfill {{\Phi}}_{3}^{\prime }& ={\mathcal{M}}_{A}\otimes \mathcal{S}\left({\text{e}}^{\text{i}\pi B/4}\right),\hfill \\ \hfill {{\Phi}}_{4}^{\prime }& ={\mathcal{M}}_{A}\otimes \mathcal{S}\left({\text{e}}^{-\text{i}\pi B/4}\right),\hfill \\ \hfill {{\Phi}}_{5}^{\prime }& =\mathcal{S}\left({\text{e}}^{\text{i}\pi A/4}\right)\otimes {\mathcal{M}}_{B},\hfill \\ \hfill {{\Phi}}_{6}^{\prime }& =\mathcal{S}\left({\text{e}}^{-\text{i}\pi A/4}\right)\otimes {\mathcal{M}}_{B},\hfill \end{align} \tag{ E1 }$

where ${\mathcal{M}}_{A}$ and ${\mathcal{M}}_{B}$ are the projective measurements of A and B. Let s_k ∈ {1, ..., 6} be the index of the probabilistic map ${{\Phi}}_{{s}_{k}}^{\prime }$ applied to the kth space-like cut, k ∈ {1, ..., M_s}, and let t_l be the index of the observable-state pair $\left({O}_{{t}_{l}},{\rho }_{{t}_{l}}\right)$ in equation (11) applied to the lth time-like cut, l ∈ {1, ..., M_t}. The coefficients associated with a space-like cut (equation (B2)) and a time-like cut (equation (11)) are redefined as ${c}_{{s}_{k}}^{\text{space}}$ and ${c}_{{t}_{l}}^{\text{time}}$, respectively. With one set of indices, $s={\left\{{s}_{k}\right\}}_{k=1}^{{M}_{s}}\in {\left\{1,\dots ,6\right\}}^{{M}_{s}}$ and $t={\left\{{t}_{l}\right\}}_{l=1}^{{M}_{t}}\in {\left\{1,\dots ,8\right\}}^{{M}_{t}}$, we can define a corresponding quantum circuit, which is obtained by replacing every cut two-qubit gate by ${{\Phi}}_{{s}_{k}}^{\prime }$ and every cut qubit line by the measurement of ${O}_{{t}_{l}}$ and the preparation of ${\rho }_{{t}_{l}}$.

When we run this circuit on an n-qubit quantum device, we get the measurement outcomes at each cut, which form a string of ±1 from ${{\Phi}}_{{s}_{k}}^{\prime }$ and the measurement of ${O}_{{t}_{l}}$, and the ones at the output qubits, which form a bitstring of length n. Let such outcomes from the kth space-like cut, the lth time-like cut and the output qubits be ${b}_{{s}_{k}}^{\text{space}}\in \left\{+1,-1\right\}$, ${b}_{{t}_{l}}^{\text{time}}\in \left\{+1,-1\right\}$ and y_(s,t) ∈ {0, 1}ⁿ, respectively. Since s_k = 1, 2 do not involve measurement, we define ${b}_{1}^{\text{space}}={b}_{2}^{\text{space}}=1$. With the definitions above and the equalities used for the decomposition (equations (B10) and (11), figures 1 and 3), notice that,

$\begin{equation}{\mathbb{E}}_{y}\left[f\left(y\right)\right]=\sum\limits _{s\in {\left\{1,\dots ,6\right\}}^{{M}_{s}}}\sum\limits _{t\in {\left\{1,\dots ,8\right\}}^{{M}_{t}}}\prod\limits _{k=1}^{{M}_{s}}{c}_{{s}_{k}}\prod\limits _{l=1}^{{M}_{t}}{c}_{{t}_{l}}{\mathbb{E}}_{\left(\left\{{b}_{{s}_{k}}^{\text{space}}\right\},\left\{{b}_{{t}_{l}}^{\text{time}}\right\},{y}_{\left(s,t\right)}\right)}\left[\prod\limits _{k=1}^{{M}_{s}}{b}_{{s}_{k}}^{\text{space}}\prod\limits _{l=1}^{{M}_{t}}{b}_{{t}_{l}}^{\text{time}}f\left({y}_{\left(s,t\right)}\right)\right],\end{equation} \tag{ E2 }$

where the expectation on the right-hand side is defined over a distribution of $\left\{{b}_{{s}_{k}}^{\text{space}}\right\},\left\{{b}_{{t}_{l}}^{\text{time}}\right\}$ and y_(s,t) for a quantum circuit induced by a given set of indices (s, t).

We can take a Monte-Carlo approach to estimate the sum of the right-hand side of equation (E2). If we sample s and t from a uniform distribution on ${\left\{1,\dots ,6\right\}}^{{M}_{s}}$ and ${\left\{1,\dots ,8\right\}}^{{M}_{t}}$ respectively, equation (E2) can be rewritten as,

$\begin{equation}{\mathbb{E}}_{y}\left[f\left(y\right)\right]={\mathbb{E}}_{\left(s,t,\left\{{b}_{{s}_{k}}^{\text{space}}\right\},\left\{{b}_{{t}_{l}}^{\text{time}}\right\},{y}_{\left(s,t\right)}\right)}\left[{6}^{{M}_{s}}{8}^{{M}_{t}}\prod\limits _{k=1}^{{M}_{s}}{c}_{{s}_{k}}{b}_{{s}_{k}}^{\text{space}}\prod\limits _{l=1}^{{M}_{t}}{c}_{{t}_{l}}{b}_{{t}_{l}}^{\text{time}}f\left({y}_{\left(s,t\right)}\right)\right].\end{equation} \tag{ E3 }$

Let us define a random variable

$\begin{equation}{X}_{\left(s,t\right)}={6}^{{M}_{s}}{8}^{{M}_{t}}\prod\limits _{k=1}^{{M}_{s}}{c}_{{s}_{k}}{b}_{{s}_{k}}^{\text{space}}\prod\limits _{l=1}^{{M}_{t}}{c}_{{t}_{l}}{b}_{{t}_{l}}^{\text{time}}f\left({y}_{\left(s,t\right)}\right).\end{equation} \tag{ E4 }$
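The step from a sum over all index tuples to an expectation over uniformly sampled indices can be illustrated with a small classical toy model. The sketch below assumes one space-like and one time-like cut, uses a made-up sign pattern for the coefficients (only their magnitude 1/2 matches the text), and replaces the circuit outcome by a made-up deterministic value in [−1, 1]; none of these stand-ins come from the paper. It compares the exact sum with the average of rescaled samples.

```python
import itertools
import math
import random

# Toy stand-ins (assumptions): one space-like and one time-like cut.
SPACE_INDICES = range(1, 7)  # s_1 in {1, ..., 6}
TIME_INDICES = range(1, 9)   # t_1 in {1, ..., 8}


def coeff(index):
    """Placeholder coefficient of magnitude 1/2; the sign pattern is made up."""
    return 0.5 if index % 2 else -0.5


def toy_expectation(s, t):
    """Made-up value in [-1, 1] standing in for E[b_space * b_time * f(y)]."""
    return math.cos(s + 2 * t)


def term(s, t):
    """One summand of the decomposition for the index pair (s, t)."""
    return coeff(s) * coeff(t) * toy_expectation(s, t)


# Exact sum over all 48 index pairs, in the spirit of (E2).
exact = sum(term(s, t) for s, t in itertools.product(SPACE_INDICES, TIME_INDICES))


def monte_carlo_estimate(num_samples, rng):
    """Average of rescaled uniform samples, in the spirit of (E3) and (E4)."""
    scale = len(SPACE_INDICES) * len(TIME_INDICES)  # 6^{M_s} 8^{M_t} = 48 here
    total = 0.0
    for _ in range(num_samples):
        s = rng.choice(SPACE_INDICES)
        t = rng.choice(TIME_INDICES)
        total += scale * term(s, t)
    return total / num_samples


rng = random.Random(0)
estimate = monte_carlo_estimate(20000, rng)
print(exact, estimate)
```

In an actual experiment each sample would come from a single run of the circuit induced by (s, t), so the measured ±1 outcomes and f(y) would add further statistical fluctuation on top of the index sampling shown here.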

Let ${\left({s}^{\left(i\right)},{t}^{\left(i\right)}\right)}_{i=1}^{N}$ be N randomly sampled (s, t) pairs. Then, ${\mathbb{E}}_{y}\left[f\left(y\right)\right]$ can be estimated by $\frac{1}{N}{\sum }_{i=1}^{N}{X}_{\left({s}^{\left(i\right)},{t}^{\left(i\right)}\right)}$. We use Hoeffding's inequality to bound the error of this Monte-Carlo approach. The magnitude of X_(s,t) is bounded by,

$\begin{equation}\left\vert {6}^{{M}_{s}}{8}^{{M}_{t}}\prod\limits _{k=1}^{{M}_{s}}{c}_{{s}_{k}}{b}_{{s}_{k}}^{\text{space}}\prod\limits _{l=1}^{{M}_{t}}{c}_{{t}_{l}}{b}_{{t}_{l}}^{\text{time}}f\left({y}_{\left(s,t\right)}\right)\right\vert {\leqslant}{3}^{{M}_{s}}{4}^{{M}_{t}},\end{equation} \tag{ E5 }$

because $\vert {c}_{{t}_{l}}\vert =1/2$, |f(y_(s,t))| ⩽ 1, $\vert {b}_{{s}_{k}}^{\text{space}}\vert =1$, $\vert {b}_{{t}_{l}}^{\text{time}}\vert =1$, and $\vert {c}_{{s}_{k}}\vert =1/2$, the last of which follows from the assumption that the space-like cuts are performed only on CZ gates. With the above bound on the magnitude, Hoeffding's inequality guarantees that,

$\begin{equation}\mathrm{Pr}\left[\left\vert \frac{1}{N}\sum\limits _{i=1}^{N}{X}_{\left({s}^{\left(i\right)},{t}^{\left(i\right)}\right)}-{\mathbb{E}}_{y}\left[f\left(y\right)\right]\right\vert {\leqslant}{\epsilon}\right]\end{equation} \tag{ E6 }$

$\begin{equation}{\geqslant}1-2\enspace \mathrm{exp}\left(-\frac{N{{\epsilon}}^{2}}{2\cdot {9}^{{M}_{s}}\cdot 1{6}^{{M}_{t}}}\right).\end{equation} \tag{ E7 }$
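For reference, the exponent in (E7) can be recovered from the standard form of Hoeffding's inequality for N independent samples that each lie in an interval of width 2B. The symbol B is introduced here only for this worked step, with B = 3^{M_s}4^{M_t} taken from (E5):

```latex
\begin{align}
\Pr\left[\left|\frac{1}{N}\sum_{i=1}^{N}X_{(s^{(i)},t^{(i)})}-\mathbb{E}_{y}[f(y)]\right|\geq\epsilon\right]
&\leq 2\exp\left(-\frac{2N^{2}\epsilon^{2}}{N(2B)^{2}}\right)
= 2\exp\left(-\frac{N\epsilon^{2}}{2B^{2}}\right),\\
B^{2}&=\left(3^{M_{s}}4^{M_{t}}\right)^{2}=9^{M_{s}}\cdot 16^{M_{t}}.
\end{align}
```

Taking the complement of the event on the left-hand side gives the lower bound stated in (E6) and (E7).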

Therefore, for a given accuracy ε and a failure probability δ, that is, to ensure that an error larger than ε occurs with probability at most δ, it suffices to take $N=\frac{2\cdot {9}^{{M}_{s}}\cdot 1{6}^{{M}_{t}}}{{{\epsilon}}^{2}}\enspace \mathrm{ln}\left(\frac{2}{\delta }\right)$, which follows from requiring the exponential term in (E7) to be at most δ.
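To get a feeling for how quickly the sampling cost grows with the number of cuts, the sketch below solves 2 exp(−Nε²/(2·9^{M_s}·16^{M_t})) ⩽ δ for N and rounds up to an integer. The function and argument names are hypothetical and chosen for this illustration only.

```python
import math


def hoeffding_failure_bound(num_samples, m_space, m_time, eps):
    """Failure term 2*exp(-N eps^2 / (2 * 9^Ms * 16^Mt)) appearing in (E7)."""
    return 2.0 * math.exp(-num_samples * eps**2 / (2 * 9**m_space * 16**m_time))


def required_samples(m_space, m_time, eps, delta):
    """Integer N that makes the failure term above at most delta."""
    if not (eps > 0 and 0 < delta < 1):
        raise ValueError("need eps > 0 and 0 < delta < 1")
    n = 2 * 9**m_space * 16**m_time / eps**2 * math.log(2.0 / delta)
    return math.ceil(n)


# Each extra space-like cut multiplies N by 9, each extra time-like cut by 16.
for m_space, m_time in [(1, 0), (0, 1), (2, 1)]:
    n = required_samples(m_space, m_time, eps=0.1, delta=0.05)
    print(m_space, m_time, n, hoeffding_failure_bound(n, m_space, m_time, 0.1))
```

This is a worst-case count derived from the magnitude bound (E5) alone; the number of samples needed in practice may be smaller when the variance of X_(s,t) is well below that bound.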