A noise-robust quantum dynamics learning protocol based on Choi–Jamiolkowski isomorphism: theory and experiment

With the rapid development of quantum technology, the growing Hilbert space under manipulation makes learning the dynamics of a quantum system a significant challenge. Machine learning techniques have brought apparent advantages to some learning strategies; we therefore introduce them to indirect learning in this paper. Based on the Choi–Jamiolkowski isomorphism, we propose a protocol that learns the dynamics of an inaccessible quantum system using a quantum device at hand. For an n-qubit system, the learning task can be done iteratively, with operational complexity O(poly(n, L)/ε²) in each iteration, where L is the circuit depth and ε is the measurement error. We then theoretically prove its resilience to global depolarization, state preparation and measurement noise, and unitary noise in gate implementation, where we find that the learned dynamics stay invariant. Finally, we investigate the protocol experimentally on a nitrogen-vacancy center system with a natural noise source. The results show that the behavior of a relatively intractable nuclear spin can be learned through an easily accessible electron spin under different noise models, demonstrating the protocol's feasibility.


Introduction
The rapid development of quantum technology permits manipulating larger and larger quantum systems [1,2]. However, the exponentially growing Hilbert space brings great challenges to learning the dynamics of quantum systems, which hinders applications such as characterization, verification, and validation [3]. It is understood that exponential computational resources are required to fully understand a large quantum system: the overhead for standard quantum tomography is O(2^n), with n the qubit number [4]. To mitigate this overhead, efforts have been dedicated to designing feasible methods, even with limitations at times: low rank, sparsity, and operation symmetry are introduced to reduce learning consumption [5,6]. Besides, adaptive measurement protocols and entangled bases are helpful but come with stringent realization conditions [7,8]. Machine learning methods applied to quantum physics have brought apparent advantages as approximate methods [9–11]. For example, Bayes' theorem and quantum neural networks are applied to parameter estimation and quantum tomography [12,13].
In this paper, inspired by quantum machine learning, we combine parameterized quantum circuits (PQCs) and optimization methods for quantum dynamics learning. The Choi-Jamiolkowski isomorphism, which establishes a correspondence between quantum states and channels [14,15], is employed to construct our indirect protocol, in which we can learn the dynamics of a relatively intractable quantum system with the one at hand. The learning problem for an n-qubit system is thus transformed into an iterative process that optimizes L-depth PQCs. When the number of iterative steps is limited to some finite constant, the operational complexity is O(poly(n, L)/ε²), with ε the statistical measurement error of each intermediate output. Moreover, noise resilience is proved for several noise models, including global depolarization, state preparation and measurement (SPAM) noise, and unitary noise in gate implementation, as defined below, which supports the feasibility on current noisy devices [16–18]. Finally, we demonstrate the protocol with experiments on a spin-based quantum device with or without equivalent SPAM noise, and with numerical simulations involving other noise.
Concerning the experiments, we employ a single nitrogen-vacancy (NV) center system in diamond, which hybridizes electron and nuclear spins [19–21]. The NV electron spin can be controlled with rapid and high-accuracy operations, and nuclear spins have long coherence times [22–24]. In our demonstrations, behaviors of the ¹⁴N nuclear spin are imitated and learned by the electron spin. The background dephasing source caused by the spin bath of ¹³C nuclei (1.1% natural abundance) is used to study the protocol's resilience to noise. Experimental results show that although values of the optimized loss function vary with noise strength, the learned unitary dynamics remain invariant, supporting the validation of this protocol.
The structure of this paper is as follows: section 2 provides a detailed theoretical framework of the learning protocol. Section 3 delves into a theoretical analysis, establishing the algorithm's robustness against various prevalent noise types. In section 4, the application of the algorithm is demonstrated through a single-qubit experiment within an NV center system, validating its feasibility and noise resilience. Section 5 extends the validation through both single-qubit and two-qubit numerical simulations, incorporating errors of different types and magnitudes to further affirm the resistance of the protocol to noise. Finally, section 6 presents an analysis of our methodology against existing approaches, and a concise summary of potential future research directions.

Indirect learning protocol
Complete information of an unknown unitary U with dimension d = 2^n can be learned by equal-size PQCs V(θ) if, up to a global phase,
U V†(θ)|ψ⟩ = |ψ⟩    (1)
for any state |ψ⟩ in the d-dimensional Hilbert space H. By the Choi-Jamiolkowski isomorphism [25], the sequentially applied quantum channels U • V†(θ) can be mapped to a Choi matrix M through a maximally entangled state,
M = (U V†(θ) ⊗ I)|I⟩⟩⟨⟨I|(U V†(θ) ⊗ I)†,
in which I is the d-dimensional identity. Equation (1) is thus equivalent to the contraction condition Tr[M • |I⟩⟩⟨⟨I|] = 1. Based on the matrix product representation [26], if we denote
|I⟩⟩ = (1/√d) Σ_{i=0}^{d−1} |i⟩ ⊗ |i⟩,
there will be
Tr[M • |I⟩⟩⟨⟨I|] = |⟨⟨I|U ⊗ V*(θ)|I⟩⟩|².
That is, the output probability of the state U ⊗ V*(θ)|I⟩⟩ on |I⟩⟩ measures the fidelity between U and V(θ) and can act as a loss function in learning the unitary U.
The theory of the Choi-Jamiolkowski isomorphism can be illustrated in figure 1(a) with the graphical language of tensor networks [26]. The closed line linking V†(θ) and U in the middle sub-figure is dubbed a contraction and formulated as an Einstein summation. If U and V*(θ) satisfy equation (1), the action of U ⊗ V*(θ) on |I⟩⟩ will keep |I⟩⟩ invariant, as shown in the right sub-figure of figure 1(a).
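The relation between the overlap on |I⟩⟩ and the trace overlap of U and V(θ) can be checked numerically. The following is a minimal sketch, assuming |I⟩⟩ is the normalized state Σ_i |i⟩|i⟩/√d; the helper names `haar_unitary` and `hst_overlap` are hypothetical and not taken from the paper.

```python
import numpy as np

def haar_unitary(d, rng):
    """Sample a d x d Haar-random unitary via QR of a complex Gaussian matrix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # fix the column phases so the distribution is Haar

def hst_overlap(U, V):
    """Return |<<I| U (x) V* |I>>|^2 for |I>> = sum_i |i>|i> / sqrt(d)."""
    d = U.shape[0]
    bell = np.eye(d).reshape(d * d) / np.sqrt(d)
    out = np.kron(U, V.conj()) @ bell
    return abs(np.vdot(bell, out)) ** 2

rng = np.random.default_rng(0)
d = 4  # two qubits
U = haar_unitary(d, rng)
V = haar_unitary(d, rng)

# The overlap equals |Tr(U V^dagger)|^2 / d^2 and reaches 1 when V matches U.
trace_form = abs(np.trace(U @ V.conj().T)) ** 2 / d**2
print(hst_overlap(U, V), trace_form, hst_overlap(U, U))
```

In this sketch the overlap is insensitive to a global phase on V, which is why the loss function can only determine U up to such a phase.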
Based on the discussion above, we propose a learning protocol involving an ancillary quantum system (denoted as A, ancillary) that learns the dynamics of an equal-size principal system (denoted as P, principal); the quantum circuit is specified in figure 1(b). The measurement of the entangled state |I⟩⟩ can be transformed into the local measurement of the product state |0⟩^⊗n by the SPAM circuit, as depicted in the red boxes. Furthermore, V*(θ) can usually be set as a layered module with alternating parameterized single-qubit gate layers and fixed two-qubit gate layers, as shown in the inset. The projection measurement probability of the output state is
f(θ) = |⟨⟨I|U ⊗ V*(θ)|I⟩⟩|².
On the condition that 1 is targeted, U will be fully learned by V*(θ). Specifically, the learning task is equivalent to solving the following optimization problem,
θ_o = arg max_θ f(θ),
where we can employ the gradient-based method to optimize the parameters, as shown in appendix A.1.
Figure 1(b). The circuit of learning one system with PQCs. We denote the circuit in red dashed boxes as the SPAM procedure.
Remarkably, V*(θ) can be efficiently converted into V(θ), and therefore the learned U, by taking the complex conjugate of each gate, that is, reversing the rotation angles of the σ_x and σ_z single-qubit rotations and conjugating all the two-qubit gates (appendix A.2). The circuit in figure 1(b) is known as the Hilbert-Schmidt test (HST) circuit, which has been employed in quantum compiling and simulation algorithms [27,28].
In this part, we further discuss the meaning of 'inaccessible'. Traditionally, an 'inaccessible' system refers to a system whose evolution process is unknown and cannot be directly controlled or manipulated. However, in our protocol, the term does not suggest that the system is entirely unmanageable throughout the entire learning process. Instead, it indicates that there exists a period during which direct control is not feasible. Specifically, in the context of the described protocol, even though the evolution process U of system P remains unknown (thus making it 'inaccessible'), there still exists the capability of making systems P and A interact. During the initial entanglement preparation and the final entanglement decoding and measurement, system P still allows for specific interactions, such as a CNOT operation with system A, and can be effectively initialized and measured. This can be encountered in practical applications. A prime example is entanglement distribution. When one photon from an entangled pair is transmitted through an unknown quantum channel, the photon navigating this channel can be called an 'inaccessible' system. Another example is encountered in the NV center system. Though the coupling between the electron and nuclear spins is known, the environment information to be detected by the nuclear spin, such as the couplings between the nuclear spin and the other nuclear spins, is unknown. However, the influence from the other nuclear spins is weak and can be ignored during the entanglement generation. In this scenario, the 'inaccessible' system specifically refers to the nuclear spin, which is situated within a partially known environment [29].
The complexity of executing the protocol is linear in the number of iterations. Within each iteration, the number of gates queried in the PQCs V*(θ) is usually set to be polynomial in the qubit number n and the circuit depth L. As the gradient-based method is used for optimizing f(θ), evaluating each component of the gradient with accuracy 1 − ε requires repeating the circuits about 1/ε² times by the central limit theorem; therefore, the overall complexity of a single iteration is O(poly(n, L)/ε²).
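The 1/ε² shot scaling can be illustrated with a small Monte Carlo experiment. This is only a sketch under stated assumptions: each shot is modeled as an independent binary outcome, and the worst-case variance at probability 1/2 is used to bound the number of shots; `estimate_probability`, `shots_for_error`, and the value of `p_true` are hypothetical.

```python
import numpy as np

def estimate_probability(p_true, shots, rng):
    """Estimate a measurement probability from a finite number of shots."""
    return rng.binomial(shots, p_true) / shots

def shots_for_error(eps):
    """Shots giving a standard error of at most eps (worst case p = 1/2)."""
    return int(np.ceil(0.25 / eps**2))

rng = np.random.default_rng(1)
p_true = 0.8  # hypothetical value of the loss function at some theta
for eps in (0.1, 0.03, 0.01):
    n_shots = shots_for_error(eps)
    estimates = [estimate_probability(p_true, n_shots, rng) for _ in range(2000)]
    print(f"eps={eps:5.2f}  shots={n_shots:6d}  observed std={np.std(estimates):.4f}")
```

Reducing ε by a factor of ten multiplies the required shots by roughly one hundred, which is the origin of the 1/ε² factor in the per-iteration complexity.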
The limited expressive capability of PQCs is one factor causing the learning error and has prompted much discussion [30,31]. Most physical systems primarily exhibit short-range correlations and can be well captured by circuits such as the layered one shown in figure 1(b). For a more general situation, we denote the PQCs' expressive error near U as ε_e, that is, the optimal θ_o satisfies 1 − f(θ_o) = ε_e. Besides, we denote ε_t as the training truncation threshold, that is, the optimization ends up with θ′ and 1 − f(θ′) < ε_t. The protocol will finally output a result with a reasonable accuracy 1 − ε that meets ε_e < ε < ε_t.

Noise resilience analysis
Recent research implies that hybrid methods sometimes exhibit robustness to specific types of noise [32,33], and our protocol is no exception. We analyze its resilience to three types of noise. We denote E as the noisy quantum channel and ρ as a d-dimensional quantum state that E acts on.

Definition 1.
Global depolarization E_D that can occur anywhere in the circuits. By 'global', we mean that every Pauli term P_i ∈ {I, σ_x, σ_y, σ_z}^⊗n − {I^⊗n} occurs with equal probability. Mathematically, it acts as
E_D(ρ) = (1 − p)ρ + p I/d,
where I = I^⊗n is the d-dimensional identity and p measures the depolarization strength.
Definition 2. SPAM noise, which can occur in both P and A in the red boxes of figure 1(b). In general, Pauli X noise is the dominant noise in the SPAM procedure. Here, we generalize it to any specific Pauli noise and define it as SPAM noise. The SPAM noise E_P acts as
E_P(ρ) = (1 − p)ρ + p PρP†,
where P is a specified and fixed Pauli operation, that is, P remains unchanged in each experiment. Usually, it describes the noise from the quantum device or quantum control in the SPAM procedure.
Pauli X noise generally dominates the SPAM procedure because gate errors are usually much smaller than SPAM errors (especially measurement errors) in many quantum computing systems, such as superconducting circuits [1,2], neutral atoms [34], and the NV center [35] used in this work.
Definition 3. Random unitary noise (RUN) E_RP that occurs between each two adjacent layers in the PQCs V*(θ). We model this type of noise as
E_RP(ρ) = Σ_{i=0}^{K} p_i E_i ρ E_i†,
where E_i is unitary, K can be any integer between 0 and 4^n − 1, and Σ_{i=0}^{K} p_i = 1 for normalization.
Assume that θ_o is the parameter configuration that optimizes f(θ) under the noise-free condition, and θ′ is the one under the noisy condition. The identity between θ_o and θ′ for the noise models, (i) the global depolarization and SPAM noise, and (ii) RUN accompanied by the random compiling technique in V*(θ), is stated in propositions 1 and 2, respectively.
Proposition 1. Global depolarization will not change the optimal point of the loss function of the circuit in figure 1(b). Besides, assuming the SPAM noise occurs with strength p_1 for the preparation procedure and p_2 for the measurement procedure, the optimal point is invariant if
Proposition 2. The RUN between layers in V*(θ) can be mitigated by introducing extra Haar random unitary operations U_r, U_r† between adjacent layers in V*(θ) and evaluating the loss function in an average way.
That is, we can get the same optimal point and, therefore, the right learning result of the circuit in figure 1
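The statement that global depolarization leaves the optimal point unchanged can be illustrated with a single-qubit toy model. The sketch below assumes the depolarization acts as ρ → (1 − p)ρ + p I/D on the full two-register state, takes a hypothetical target U = R_x(α) with R_x(θ) = exp(−iθσ_x/2), and scans θ on a grid; none of these choices come from the paper's proof.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)

def rx(theta):
    """Single-qubit rotation exp(-i * theta * sigma_x / 2)."""
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * SX

def noisy_loss(U, V, p):
    """Probability of finding (U (x) V*)|I>> in |I>> after depolarization p."""
    d = U.shape[0]
    bell = np.eye(d).reshape(d * d) / np.sqrt(d)
    rho = np.outer(bell, bell.conj())
    W = np.kron(U, V.conj())
    rho = W @ rho @ W.conj().T
    rho = (1 - p) * rho + p * np.eye(d * d) / (d * d)  # assumed noise form
    return float(np.real(bell.conj() @ rho @ bell))

alpha = 0.7  # hypothetical target rotation angle
U = rx(alpha)
thetas = np.linspace(-np.pi, np.pi, 721)
best = {
    p: thetas[np.argmax([noisy_loss(U, rx(t), p) for t in thetas])]
    for p in (0.0, 0.3, 0.6, 0.9)
}
print(best)
```

In this model the noisy loss is an affine function of the ideal one, (1 − p) f + p/d², so its maximum value shrinks with p while the maximizing θ stays where it was.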

Experiments with an NV center in diamond
To demonstrate our protocol, we perform experiments with a spin system associated with a single NV center in diamond. As shown in figure 2(a), an NV center consists of a substitutional nitrogen atom and an adjacent vacancy in the diamond crystal lattice. With a magnetic field B_z applied along the NV symmetry axis, the Hamiltonian of the electron spin and the ¹⁴N nuclear spin is
H = D S_z² + γ_e B_z S_z + Q I_z² + γ_n B_z I_z + A S_z I_z,
where D = 2.87 GHz is the electronic zero-field splitting, Q = −4.95 MHz is the nuclear quadrupolar interaction, S_z (I_z) is the spin-1 operator of the electron (nuclear) spin, γ_e (γ_n) is the gyromagnetic ratio of the electron (nuclear) spin, and A = −2.16 MHz is the hyperfine interaction of the two spins.
The NV center can be seen as a natural testbed for our protocol. Firstly, different physical spins are involved. The electron spin can be manipulated by fast microwave (MW) pulses with high accuracy, while directly controlling the ¹⁴N nuclear spin requires radio-frequency (RF) pulses with long duration and low accuracy. Secondly, coupling exists between the nuclear and electron spins, and readout of both the nuclear and electron spins can be realized by detecting only the electron spin (appendix A.2). Lastly, apart from the electron and ¹⁴N nuclear spin, there are ¹³C nuclei with spin 1/2 randomly distributed around the NV center. Due to the hyperfine interaction, these nuclei will dephase the electron spin and act as a natural dephasing source for the noise test (appendix C.3). In our experiments, we use the RF pulse to mimic the environment to be detected by the nuclear spin.
Figure 2. (b) The operational circuit based on the NV center, in which operations in orange (green) are applied to the ¹⁴N nuclear (electron) spin. The single-qubit gate R_n(θ) denotes a rotation about the n-axis by an angle θ (in this experiment, n was fixed as the x-axis), and the controlled gates are rotations with a direction conditioned on the state of the electron spin [19]. The green dashed box is used to obtain the gradients, and the gray box is a process to dephase the electron spin. (c) The energy levels of the two-qubit system, where the arrow in orange (green) represents the RF (MW) pulse for the nucleus (electron).
Two sets of experiments are conducted: (I) a typical noise-free learning task and (II) a test of noise resilience; that is, the power of the RF pulse applied to the ¹⁴N nuclear spin (marked as n) is learned by the electron spin (marked as e) without or with the natural noise source. Figure 2(b) provides an operational circuit in the NV platform, where a subspace of the system is selected to form a two-qubit system, consisting of four energy levels |m_I, m_S⟩ as depicted in figure 2(c). The ¹⁴N nuclear and electron spins are effectively polarized to |1, 0⟩ by optical pumping via the excited-state level anti-crossing [36] at a magnetic field strength of 455 G. The single-qubit gates on the electron spin are realized by hard MW pulses, whose frequency is the average of the two electron resonance frequencies shown in figure 2(c). For quantum gates on the nuclear spin, decoherence-protected gates are employed, where the dephasing of the electron spin is suppressed by dynamical decoupling and the nuclear spin is driven by RF during the time interval between π pulses [19]. For experimental simplicity, we employ an alternative method to generate the maximally entangled state, which differs from that depicted in figure 1(b). The approach involves applying a rotation gate R_x(π/2) followed by a conditional gate R_x(±π/2), which is a rotation by π/2 when the electron spin state is |0⟩ and by −π/2 when the electron spin state is |−1⟩. After implementing the pulse sequence, the value of the loss function f can be obtained by measuring the probability of the final state on |1, 0⟩. See appendices C.2 and C.4 for more details about the decoherence-protected gates and measurement, respectively.
The experimental process follows the sequence Ω ← α ← θ ← v. On the nucleus side, Ω represents the power of the RF pulse, which is encoded into a local rotation R_x(α), that is, α ∝ Ω; on the electron side, a parameterized R_x(θ) is realized by an MW pulse, and θ can be tuned and optimized by the voltage v on an IQ modulator. Once the optimization is done, R_x(α) = R_x*(θ), and we can infer Ω from v through Ω = h_1 v + h_2, with h_1, h_2 device-dependent parameters; see appendices C.1 and C.5 for more information. The parameter update is implemented by a gradient-based method, where analytical gradients are evaluated by running two modified operational circuits, one with R_x(π/2) inserted in the green dashed box of figure 2(b) and the other with R_x(−π/2). We denote the outputs of running the two modified circuits as o_1 = f(θ + π/2) and o_2 = f(θ − π/2), and parameters can be updated as θ ← θ − η∂f(θ)/∂θ, where ∂f(θ)/∂θ = (o_1 − o_2)/2, with η = 1.5 being the learning rate; see appendix C.6 and [37,38] for more information on the analytical gradients.
(I) We first investigate the case in the absence of noise, which is represented by green diamonds in figures 3(a)-(c). For calculating gradients, o_1 and o_2 are measured and plotted in figure 3(b). After about 10 iterations, o_1 ≈ o_2, which implies vanishing gradients. This is also evidenced by figure 3(c), where the output fidelity of |1, 0⟩ (loss function) and the voltage v on the IQ modulator both converge with iterations. In this scenario, Ω is learned as 6.1(2) kHz.
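The parameter-shift update described above can be simulated for a single rotation angle. This is a minimal sketch, not the experimental procedure: it assumes the convention R_x(θ) = exp(−iθσ_x/2), takes the minimized quantity to be one minus the overlap of the two rotations, and uses hypothetical values for `alpha` and the initial `theta`.

```python
import numpy as np

def fidelity(theta, alpha):
    """Overlap |Tr(Rx(alpha) Rx(theta)^dagger)|^2 / 4 of two x-rotations."""
    return np.cos((theta - alpha) / 2) ** 2

def loss(theta, alpha):
    # Assumption: gradient descent is run on 1 - fidelity.
    return 1.0 - fidelity(theta, alpha)

def parameter_shift_grad(theta, alpha):
    """Analytic gradient from two shifted evaluations, (o1 - o2) / 2."""
    o1 = loss(theta + np.pi / 2, alpha)
    o2 = loss(theta - np.pi / 2, alpha)
    return (o1 - o2) / 2

alpha, theta, eta = 1.2, -0.8, 1.5  # hypothetical target, start, learning rate
history = []
for _ in range(15):
    theta -= eta * parameter_shift_grad(theta, alpha)
    history.append(loss(theta, alpha))

print(theta, history[-1])
```

Because the loss is sinusoidal in θ, the two shifted evaluations give the exact derivative rather than a finite-difference approximation, and with η = 1.5 the toy model settles within roughly ten iterations.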
(II) To investigate the effect of noise on our protocol, a dephasing noise is introduced into the gray box of figure 2(b). The dephasing noise naturally comes from the couplings between the electron spin and the randomly distributed ¹³C nuclear spins. Since the electron spin is driven into the x−y plane of the Bloch sphere by the first π/2 pulse, the dephasing noise with strength p acts in the same way as depolarization with strength p′ = 2p on this state (appendix C.7). Besides, depolarization with strength p′ = 2p in the gray box can be equivalently moved out as dephasing noise with strength p″ = p′/2 = p based on the property of Clifford operations, as explained in appendix B.1. The dephasing noise in the gray box of figure 2(b) therefore plays the role of an effective global depolarization that occurs in the same location, and is equivalent to another dephasing noise that occurs before the PQCs V(θ). A free induction decay signal of the electron spin is shown in figure 3(a). The dephasing noise level p can be regulated by the idle time and can be quantified by the distance r between the state and the center of the Bloch sphere; that is, as explained in appendix C.7, we have p = (1 − r)/2 and an effective depolarization noise level p′ = 1 − r, or, from another viewpoint, a dephasing noise with level p″ = (1 − r)/2 that occurs before the PQCs. Two non-zero dephasing levels are selected, partial and complete dephasing, with idle times of 0.93 µs (r = 0.27, p = p″ = 0.365 and p′ = 0.73) and 6 µs (r = 0, p = p″ = 0.5 and p′ = 1), respectively. They are represented by red triangles and blue circles, and the corresponding states are plotted in the inset Bloch sphere. We select 0.93 µs because it equals 1/(1.08 MHz), with 1.08 MHz the detuning of the applied MW pulse. For reference, no dephasing (zero idle time, green diamonds) is demonstrated in experiment (I).
The experimental results are shown in figures 3(b) and (c). Though the converged values of o_1 and o_2 (and therefore the gradient) in figure 3(b), and the noisy output fidelity of |1, 0⟩ (loss function) in figure 3(c), decrease as the dephasing level increases, the optimal voltage v stays invariant, albeit with slower convergence, as shown by the solid line in figure 3(c). The learned pulse powers Ω of the two noisy experiments are estimated as 6.2(4) kHz and 6.8(4) kHz (the reference value in the noise-free experiment (I) is 6.1(2) kHz). As a comparison, an additional benchmarking experiment is conducted (appendix C.5), from which we extract v = 0.1792(14) V (black dashed line) and Ω = 6.52(7) kHz. Therefore, the learning accuracy can always be kept over 90% with a depolarization that occurs in the gray box in figure 2(b), with noise strength ranging widely from 0 to even 1. Remarkably, the depolarization noise here is equivalent to a dephasing noise before the PQCs, which is a particular SPAM noise. That is, this experiment verifies the high robustness of our protocol to SPAM noise, as discussed in appendix B.1.
Notice that there are still experimental errors between the pulse power from learning and that from benchmarking. These errors are accounted for by two effects. The principal one comes from the ∼96.4% imperfect polarization of the electron spin via laser pumping. Next, throughout the experiments of ∼133 µs, the electron spin suffers a longitudinal relaxation characterized by T_1 ∼ 4.5 ms. These effects cause the fidelity to be less than 1, and a ∼95.0% fidelity is simulated according to the data, as shown by the pink dashed line in figure 3(c). As for the remaining ∼2.3% (max ∼92.7% in the experiment), we attribute it to errors such as the detuning of the MW (1.08 MHz), the magnetic field drift, and the leakage of MW and RF.
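The equivalence between dephasing and depolarization on an equatorial state, and the quoted noise levels, can be reproduced with a few lines. The sketch assumes the dephasing channel ρ → (1 − p)ρ + p σ_z ρ σ_z and the depolarizing channel ρ → (1 − p′)ρ + p′ I/2; it also assumes the 1.08 MHz detuning is half of the hyperfine splitting |A| = 2.16 MHz, which is consistent with the quoted numbers but is not stated explicitly in the text.

```python
import numpy as np

SZ = np.diag([1.0, -1.0]).astype(complex)

def dephase(rho, p):
    """Dephasing channel (1 - p) * rho + p * Z rho Z."""
    return (1 - p) * rho + p * (SZ @ rho @ SZ)

def depolarize(rho, p):
    """Depolarizing channel (1 - p) * rho + p * I / 2."""
    return (1 - p) * rho + p * np.eye(2) / 2

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)  # a state in the x-y plane
rho = np.outer(plus, plus.conj())

for r in (0.27, 0.0):
    p = (1 - r) / 2
    # On this state, dephasing with p matches depolarization with 2p.
    assert np.allclose(dephase(rho, p), depolarize(rho, 2 * p))
    print(f"r={r:.2f}  p={p:.3f}  p'={2 * p:.2f}")

detuning_mhz = 2.16 / 2  # assumed: half of |A|
idle_time_us = 1 / detuning_mhz
print(f"1/detuning = {idle_time_us:.3f} us")
```

The equivalence holds only because the state has no σ_z component; for a general state the two channels differ.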
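The claim that the learning accuracy stays above 90% can be checked against the quoted central values. The accuracy definition used here, one minus the relative deviation from the benchmark, is an assumption made for illustration, and the measurement uncertainties are ignored.

```python
benchmark_khz = 6.52  # benchmarking value quoted in the text
learned_khz = {
    "noise-free": 6.1,
    "noisy (first)": 6.2,
    "noisy (second)": 6.8,
}

def accuracy(learned, reference):
    """Assumed definition: 1 - |learned - reference| / reference."""
    return 1.0 - abs(learned - reference) / reference

accuracies = {
    label: accuracy(value, benchmark_khz) for label, value in learned_khz.items()
}
for label, acc in accuracies.items():
    print(f"{label:15s} accuracy = {acc:.3f}")
```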

Numerical simulations
Besides the experiment, we also conduct numerical simulations to complete the verification of propositions 1 and 2. First of all, we assume that all the noise occurs outside the quantum gates. That is, for an L-depth quantum circuit acting on the quantum state ρ, noise channels are interleaved with the gate layers, where E_i denotes the noise channel between the ith and (i − 1)th layers, U_i denotes the super-operator of the ith layer of quantum gates, and • denotes the sequential action of quantum gates.
To verify the noise resilience to the global depolarization and SPAM noise in proposition 1, we numerically simulate the circuit in figure 4(a). Both one- and two-qubit cases are simulated, where the target dynamics U are chosen randomly from the Haar distribution. For the one-qubit case, the layered PQC is set as e^{−iσ_y α} e^{−iσ_z β} e^{−iσ_y γ}. For the two-qubit case, the PQC is set as a three-layer circuit, where each layer is configured as one e^{−iσ_z^1 σ_z^2 γ} and parameterized single-qubit gates. We suppose the global depolarization (DN) occurs throughout the entire circuit with probability p_D, and the noise happens outside the quantum gates. The SPAM noise, which is in the form of Pauli operators, can be moved in or out of the dashed red boxes in figure 1(b) to become another Pauli operator, as there are only Clifford gates within the red boxes. Here, we denote the two parts of the SPAM noise as SPAM_1 with probability p_1 and SPAM_2 with probability p_2, and both of them take the Pauli X form.
Specifically, we simulate the circuit in figure 4(a) with three different noise scenarios and three different noise level settings in each scenario. To verify the noise resilience to the global depolarization in the first scenario, p_D is set as 0.01, 0.03 and 0.05, while p_1 and p_2 are set as zero. To verify the noise resilience to SPAM noise, the second and third scenarios are considered. In the second scenario, p_1 is set to the same value as p_2, which is chosen to be 0.02, 0.04 and 0.06, while p_D is set as a relatively small value of 0.02. In the third scenario, p_1 = 0 and p_2 is set as 0.02, 0.04 and 0.06, while p_D is also set as 0.02. In the simulation of each noise level in the three scenarios, we run the protocol 20 times. By evaluating the noisy loss function f, which is defined as the probability of the output state on the initial state, and the learned gate infidelity, we plot the results of these three scenarios in figures 5(a)-(c), respectively. The subscripts 1, 2 of the figures refer to the system size. From figures 5(a)-(c), we can see that although the converged values of the noisy loss function f decay as the noise level increases (dashed lines), the learned gate infidelity always approaches 0 (solid lines). This result is in accord with the analysis in proposition 1.
To verify the noise resilience to the RUN in proposition 2, we numerically simulate the circuit in figure 4(b). For the one-qubit simulation, the circuit is set as in the previous case. We consider three different scenarios, in which we specify the RUN as σ_x, σ_y, and σ_z, respectively. The RUN level of all three scenarios is set as 0.2, while other noise sources are ignored in this case. For the two-qubit simulation, the unitary is specified as e^{−iσ_z^1 σ_z^2 π/2} and a two-layer circuit is employed. The errors are formulated as exp(−i(π/10) σ_x^1 ⊗ I^2), exp(−i(π/10) σ_y^1 ⊗ I^2) and exp(−i(π/10) σ_z^1 ⊗ I^2), respectively. As for other parameters, the number of iterations is set as 50, and the learning rate is set as 0.1 in both one- and two-qubit cases. In our simulation, we optimize the circuit in figure 4(b) with 10 different configurations of initial parameters, with each parameter chosen randomly from [0, 2π). According to proposition 2, we run the circuits many times for each configuration and insert Haar random unitary operators between adjacent layers each time. By averaging the loss function over these runs, the noisy loss function and gate infidelity of the 10 configurations are plotted in figure 5(d). The x-axis of figure 5(d) denotes the number of runs involved in the average (labeled as 'Evaluation No.'). The results show that the learned gate infidelity decays with the Evaluation No., which is understandable since more runs mean a smaller deviation from the Haar average. In other words, inserting Haar random unitary operators and evaluating the loss function in an average way can help us reduce the influence of RUN, therefore verifying our analysis in proposition 2.
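The one- and two-qubit ansätze described above can be written down explicitly. This is a sketch under assumptions: the ordering of the σ_z σ_z rotation and the single-qubit gates within a layer, the use of a full Y-Z-Y gate on each qubit, and treating γ as a free parameter are illustrative choices, and all function names are hypothetical.

```python
import numpy as np

SY = np.array([[0, -1j], [1j, 0]])
SZ = np.diag([1.0, -1.0]).astype(complex)

def expm_pauli(P, angle):
    """exp(-i * angle * P) for an operator P that squares to the identity."""
    return np.cos(angle) * np.eye(P.shape[0]) - 1j * np.sin(angle) * P

def pqc_one_qubit(a, b, c):
    """One-qubit ansatz exp(-i sy a) exp(-i sz b) exp(-i sy c)."""
    return expm_pauli(SY, a) @ expm_pauli(SZ, b) @ expm_pauli(SY, c)

def layer_two_qubit(layer_params, gamma):
    """Assumed layer: a ZZ rotation followed by one-qubit gates on each qubit."""
    zz = expm_pauli(np.kron(SZ, SZ), gamma)
    singles = np.kron(pqc_one_qubit(*layer_params[0]), pqc_one_qubit(*layer_params[1]))
    return singles @ zz

def pqc_two_qubit(params, gammas):
    """Stack the layers; params has shape (layers, 2 qubits, 3 angles)."""
    V = np.eye(4, dtype=complex)
    for layer_params, gamma in zip(params, gammas):
        V = layer_two_qubit(layer_params, gamma) @ V
    return V

def loss(U, V):
    """Noise-free loss |Tr(U V^dagger)|^2 / d^2."""
    d = U.shape[0]
    return abs(np.trace(U @ V.conj().T)) ** 2 / d**2

rng = np.random.default_rng(7)
params = rng.uniform(0, 2 * np.pi, size=(3, 2, 3))
gammas = rng.uniform(0, 2 * np.pi, size=3)
V = pqc_two_qubit(params, gammas)
print(loss(V, V), loss(V, pqc_two_qubit(params + 0.3, gammas)))
```

The loss equals 1 when the ansatz parameters reproduce the target and drops below 1 once the parameters are perturbed, which is the quantity the optimizer drives back toward 1.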
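The averaging over Haar random unitaries can be illustrated on a single coherent error. The sketch below twirls the one-qubit error exp(−i(π/10)σ_x) and compares the averaged output with a depolarizing channel whose weight follows from the standard twirling relation λ = (|Tr E|² − 1)/(d² − 1); the sample counts and helper names are hypothetical, and the full layered circuit is not simulated.

```python
import numpy as np

def haar_unitary(d, rng):
    """Sample a d x d Haar-random unitary via QR of a complex Gaussian matrix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def twirled_output(E, rho, samples, rng):
    """Average of (Ur^dagger E Ur) rho (Ur^dagger E Ur)^dagger over Haar samples."""
    d = rho.shape[0]
    acc = np.zeros_like(rho)
    for _ in range(samples):
        Ur = haar_unitary(d, rng)
        K = Ur.conj().T @ E @ Ur
        acc += K @ rho @ K.conj().T
    return acc / samples

SX = np.array([[0, 1], [1, 0]], dtype=complex)
E = np.cos(np.pi / 10) * np.eye(2) - 1j * np.sin(np.pi / 10) * SX
rho = np.array([[1, 0], [0, 0]], dtype=complex)

lam = (abs(np.trace(E)) ** 2 - 1) / 3          # predicted weight for d = 2
target = lam * rho + (1 - lam) * np.eye(2) / 2

rng = np.random.default_rng(3)
for samples in (10, 100, 2000):
    out = twirled_output(E, rho, samples, rng)
    print(samples, np.linalg.norm(out - target))
```

The deviation from the depolarized state shrinks as more random unitaries are averaged, mirroring the decay of the gate infidelity with the Evaluation No. reported above.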

Conclusion and discussion
In conclusion, we introduced an indirect learning protocol based on the Choi-Jamiolkowski isomorphism, with operational complexity O(poly(n, L)/ε²) for each iteration. This protocol is suitable for learning the dynamics of an inaccessible system with the help of another easily controlled system. We theoretically proved that it is robust against several noise models. In proposition 1, the effect of global depolarization and SPAM noise is addressed, while in proposition 2, the RUN in gate implementations can be mitigated by a random-compiling-like technique [39]. These noise models are commonly encountered in typical experimental settings. Moreover, to verify the effectiveness of the theory, we demonstrate the protocol experimentally on an NV center system and with one-qubit and two-qubit numerical simulations, and the results show the robustness to the above three kinds of noise. Through theoretical and experimental results, our work is expected to benefit applications such as characterization, validation, and verification for current noisy quantum devices.
Some requirements for the validity of the protocol are as follows. Firstly, U is unitary and time-invariant, so it is reliable when repeatedly queried. Secondly, the term 'inaccessible' should be interpreted dialectically. Systems P and A can be strongly coupled for some time, laying the foundation for using the Choi-Jamiolkowski based method. A review of techniques for coupling two different quantum systems can be found in [40].
The scalability of our protocol is highly related to the expressivity and trainability of PQCs, which have been extensively discussed in current research on variational quantum algorithms and quantum neural networks [30,31,41–45]. Despite the difficulties of scaling general PQCs, some prior knowledge about U, such as the sparsity, locality and symmetry of the generator Hamiltonian, is usually accessible and can be used to enhance the PQCs' performance near U while preserving a low circuit depth [8,46]. It is worth mentioning that a recent work [47] introduces a data quantum Fisher information metric to measure the model performance, which may be adopted here as an indicator for adaptively designing the PQCs' architecture even in the more general case. Furthermore, reinforcement learning, which has been used to design parameterized circuits for ground energy learning (see [48] and appendix A.1), may also benefit our protocol.
This study raises several open questions that warrant further investigation. To begin with, calibration of initial states and measurements, a necessity extensively explored in standard process tomography [49], is equally crucial here. Specifically, systematic errors in preparing and measuring the state |I⟩⟩ can lead to inaccuracies in learning the evolution process U. Developing a calibration-free Choi-Jamiolkowski based learning protocol is a pivotal challenge. Insights into this problem may be gleaned from the recently proposed gate set tomography [50–52]. Secondly, a methodology is needed for learning the process U when the ancillary dimension d_a does not equal the principal dimension d_p. The case where d_a < d_p has been addressed in prior research [53], while the converse case d_a > d_p remains unexplored. Intriguingly, the additional dimensions in the ancillary system for d_a > d_p can potentially facilitate the learning of a more general completely positive and trace-preserving map, extending beyond mere unitary processes.
Remarkably, we observe that several existing studies have successfully integrated quantum circuits with machine learning techniques to address challenges in quantum tomography. This includes advancements in quantum state tomography [54,55], quantum process tomography [53,56], and gate set tomography [52]. Notably, the concurrent studies of [53,56] propose an innovative learning protocol for unitary processes utilizing entangled states. Our research contributes to this field by focusing on the learning of relatively intractable systems. Furthermore, we delve into the effects of noise occurring in current quantum devices, which is often encountered in practical applications.
In each iteration, all the n × L components of the gradient have to be evaluated. The numerical method, which treats the ansatz as a 'black box', is common for PQCs' optimization. By querying the system with different input parameters, the gradient can be obtained by a differential formula. For each component ∂f(α_{i,j})/∂α_{i,j}, the central limit theorem suggests that ∼1/ε² runs of PQCs are required to evaluate it, with ε the desired error level. Therefore, the complexity of each iteration is limited to O(poly(n, L)/ε²). Besides, the circuit depth L is usually set as poly(n) in practice, which makes the execution of the protocol feasible.
In addition, the typical algorithm above can be replaced by other methods, such as stochastic gradient descent, mini-batch gradient descent, and momentum-based gradient descent, which are introduced in classical machine learning research. They are variants of conventional gradient descent and can be found in most machine learning tutorials.
With respect to the generation of PQCs, we can leverage a recent protocol based on reinforcement learning [48], as shown in figure A1. The agent and environment play two interacting roles in the field of reinforcement learning. The agent refers to the learning algorithm, while the environment provides rewards and states to the agent after receiving actions. For circuit construction, the current circuit is encoded as a state, and the gate being added to the circuit is encoded as an action. They are represented in a format suitable for neural networks, typically a list of numbers. Starting with an empty list, the agent is supposed to construct the circuit step by step. At each time step, it receives the state and reward from the environment and provides the corresponding action. In [48], the learning process is illustrated using a typical Q-learning framework (algorithm 2). The agent is realized by a neural network, utilizing a double deep-Q network with an ε-greedy policy and an ADAM optimizer. Their approach, which focuses on learning the ground-state energy of lithium hydride, can serve as a proposal for generating PQCs in our work.
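The black-box gradient evaluation and its cost can be sketched as follows. The central-difference form and the step size `delta` are assumptions (the exact differential formula is not reproduced here), and `circuit_runs_per_iteration` is a hypothetical helper that counts two loss evaluations per parameter, each repeated about 1/ε² times.

```python
import numpy as np

def finite_difference_gradient(f, params, delta=1e-3):
    """Central-difference gradient of a black-box loss f over all parameters."""
    params = np.asarray(params, dtype=float)
    grad = np.zeros_like(params)
    for idx in np.ndindex(params.shape):
        shift = np.zeros_like(params)
        shift[idx] = delta
        grad[idx] = (f(params + shift) - f(params - shift)) / (2 * delta)
    return grad

def circuit_runs_per_iteration(n, L, eps):
    """Two loss evaluations per parameter, each repeated about 1/eps^2 times."""
    return 2 * n * L * int(np.ceil(1 / eps**2))

# Toy loss over an n x L grid of angles (a stand-in for the circuit output).
n, L = 2, 3
toy_loss = lambda a: float(np.sum(np.sin(a)))
angles = np.linspace(0.1, 1.0, n * L).reshape(n, L)

grad = finite_difference_gradient(toy_loss, angles)
print(grad)
print(circuit_runs_per_iteration(n, L, eps=0.05))
```

The run count grows linearly in the number of parameters n × L and quadratically in 1/ε, in line with the O(poly(n, L)/ε²) estimate.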

Algorithm 2
Input: a policy Q(s, a), where s is a state and a is an action.

A.2. Remarks on the protocol
Here, we re-emphasize three points that make our protocol more practical and efficient: (i) the efficiency of the PQC architecture with alternating layers of single-qubit and entangling gates, (ii) the existence of strong coupling in the composite quantum system of A and P, and (iii) how to obtain the approximate U from the optimized PQCs $V^*(\theta)$.
Remark. The finite expressivity of PQCs has been widely discussed in recent research [30,31]. Fortunately, most gapped physical systems involve only short-range correlations [57]; they can be well approximated by matrix product states (MPS) or projected entangled pair states (PEPS) and can therefore be efficiently captured by PQCs [58].
Remark. The assumption that A and P can be entangled is grounded in observations of real physical systems. For example, strong coupling exists in hybrid systems such as cavity-QED and nucleus-electron systems, and distributed computing has been realized between two quantum chips [59][60][61]. Moreover, even if the unknown system P on which U acts is hard to access, we can measure the output $\rho_o$ by swapping the states of P and A before the measurement. This can be realized by a SWAP gate, which is guaranteed by the coupling assumption between P and A, and the operational complexity is simply doubled. The NV center in diamond is a natural platform for implementing this scheme, as it permits a SWAP and a subsequent measurement on the electron spin.
Remark. U can be reconstructed once $V^*(\theta) \sim U^*$ is determined. The parameterized quantum circuits are constructed from only fixed two-qubit gates and single-parameter rotation gates, as expressed in equation (A2), where $T_k$ denotes the fixed two-qubit gates in the kth layer and $S_{k,j}$ denotes the parameterized single-qubit rotation gate in the kth layer that acts on the jth qubit with rotation angle $\theta_{k,j}$. In general, the two-qubit gates are chosen as CONTROL-NOT, CONTROL-Z, SWAP, or iSWAP. The first three are real matrices and thus equal their own complex conjugates; the complex conjugate of iSWAP differs from iSWAP only by an extra minus sign on its imaginary entries, which can be easily eliminated by introducing a π phase in the corresponding two qubits. Besides, since $(e^{-i\sigma_x\theta})^* = e^{i\sigma_x\theta}$, $(e^{-i\sigma_y\theta})^* = e^{-i\sigma_y\theta}$, and $(e^{-i\sigma_z\theta})^* = e^{i\sigma_z\theta}$, the single-qubit gate $S_{k,j}(\theta_{k,j})$ can also be easily realized from the optimized $S^*_{k,j}(\theta_{k,j})$. Overall, U can be reconstructed from $V^*(\theta)$ by replacing each gate in equation (A2) with its complex conjugate.
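The conjugation rules in this remark can be checked numerically with a short sketch. The matrix conventions below (basis ordering |00⟩, |01⟩, |10⟩, |11⟩ and the iSWAP matrix with entries i on the off-diagonal) are assumptions of this sketch, and conjugating iSWAP by Z on one qubit is one possible reading of the "π phase" correction, not necessarily the authors' implementation.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]])
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def rot(sigma, theta):
    """exp(-i * theta * sigma) for a Pauli matrix sigma (sigma @ sigma = I)."""
    return np.cos(theta) * I2 - 1j * np.sin(theta) * sigma

CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
CZ = np.diag([1, 1, 1, -1]).astype(complex)
SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=complex)
ISWAP = np.array([[1, 0, 0, 0], [0, 0, 1j, 0], [0, 1j, 0, 0], [0, 0, 0, 1]])

# A pi phase (Pauli Z) on the first qubit, applied before and after iSWAP.
Z_FIRST = np.kron(SZ, I2)

def conjugated_angle(axis, theta):
    """Angle that realizes the complex conjugate of exp(-i * theta * sigma_axis)."""
    return theta if axis == "y" else -theta

real_gates_unchanged = all(np.allclose(g.conj(), g) for g in (CNOT, CZ, SWAP))
iswap_fixed_by_phase = np.allclose(ISWAP.conj(), Z_FIRST @ ISWAP @ Z_FIRST)
```

In this convention only the sign of the x and z rotation angles changes under conjugation, so the reconstructed circuit has the same depth and gate count as the optimized one.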

Appendix B. Proofs of noise resilience
We suppose $\theta'$ and $\theta_o$ to be the optimal points of $f'(\theta)$ and $f(\theta)$, the loss functions with and without noise, respectively. In this section, we prove the two propositions in the main text.

B.1. The proof of proposition 1
This proposition considers global depolarization and SPAM noise. In the following, we provide proofs of resilience to the two noise models separately.

Result 1.
Global depolarization has no effect on optimizing the loss function.
We write the L-depth PQCs without noise as the composition of the layer superoperators $\mathcal{V}_L(\theta_L) \circ \cdots \circ \mathcal{V}_1(\theta_1)$, and the noisy PQCs as the same composition with the noise channel $\mathcal{E}$ acting between layers. The superoperator $\mathcal{V}_i(\theta_i)$ denotes the ith layer of the parameterized circuits shown in figure 1(b) of the main text, which acts as $\mathcal{V}_i(\theta_i)(\rho) = V_i(\theta_i)\,\rho\,V_i^\dagger(\theta_i)$. Similarly, the superoperator of the target unitary U is represented as $\mathcal{U}$, which acts as $\mathcal{U}(\rho) = U\rho U^\dagger$. With this notation, the ideal loss function for the L-depth PQCs at θ is given by equation (B1), in which $\mathrm{aver}_\rho$ denotes averaging over pure quantum states selected uniformly at random from the n-qubit Hilbert space. Notice that the loss function in equation (B1) is exactly the loss function defined in equation (1) and is equivalent to the loss function in equation (4). So here and in the rest of this appendix, we focus on equation (B1).

The noisy loss function $f'(\theta)$ under global depolarization noise is given by equation (B2), where in the third and fourth lines we have explicitly written down the action of the Lth noise channel, equation (5). The term in the fourth line equals the term in the sixth line because all the $\mathcal{E}$ and $\mathcal{V}_i(\theta_i)$ are unital channels that keep I invariant. Similarly, we get the seventh line and the result in the last line.
We can see that, as long as the output retains its quantum signal and remains detectable, $f'(\theta)$ and $f(\theta)$ share the same optimum point. Quantitatively, if $\epsilon_M$ is the measurement error of the cost function and $\epsilon$ is the required error tolerance of $\theta_o$, the output signal of $f'$ will determine $\theta_o$ with accuracy $1 - \epsilon$.
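Result 1 can be illustrated on a toy single-qubit model. The sketch below is an assumption-laden stand-in rather than the circuit of figure 1(b): the target is a hypothetical x rotation, the ansatz has three identical layers, a depolarizing channel of strength p follows every layer, and the loss is the state overlap averaged over a fixed sample of random pure states.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)

def rx(theta):
    """Rotation exp(-i * theta * sigma_x / 2)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * SX

def depolarize(rho, p):
    """Global depolarizing channel: keep rho with weight 1 - p, else maximally mixed."""
    return (1 - p) * rho + p * np.trace(rho) * I2 / 2

def overlap_loss(theta, target, states, depth, p):
    """Average overlap between the target output and the noisy ansatz output."""
    total = 0.0
    for psi in states:
        rho = np.outer(psi, psi.conj())
        ideal = target @ rho @ target.conj().T
        out = rho
        for _ in range(depth):                  # noise acts after every layer
            layer = rx(theta / depth)
            out = depolarize(layer @ out @ layer.conj().T, p)
        total += np.real(np.trace(ideal @ out))
    return total / len(states)

rng = np.random.default_rng(0)
states = rng.normal(size=(20, 2)) + 1j * rng.normal(size=(20, 2))
states /= np.linalg.norm(states, axis=1, keepdims=True)
target = rx(0.8)                                # hypothetical target unitary
thetas = np.linspace(-np.pi, np.pi, 201)
clean = np.array([overlap_loss(t, target, states, 3, 0.0) for t in thetas])
noisy = np.array([overlap_loss(t, target, states, 3, 0.2) for t in thetas])
```

In this toy model the noisy curve is the clean curve rescaled by (1 − p)^L plus a constant, so the position of the maximum is unchanged while the contrast shrinks with depth.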
Result 2. For the SPAM noise with $p_1$ the noise probability in the state preparation procedure and $p_2$ the noise probability in the state measurement procedure, the optimal point of the loss function of the n-qubit circuit in figure 1(b) stays invariant as long as $p_1$ and $p_2$ satisfy the bound derived below.

The SPAM procedure involves the state preparation procedure and the state measurement procedure, as shown in the two red boxes in figure 1(b), and it consists of only Clifford gates. If we denote C as a Clifford operation and P as a Pauli operator, we have

$$CP = P'C, \qquad \text{(B3)}$$

with $P' = CPC^\dagger$ being another Pauli operator. That is, a Pauli operator that occurs in the SPAM procedure can be moved out and replaced by another Pauli operator that lies between the SPAM procedure and the PQCs, and vice versa. So the SPAM noise resilience condition within the SPAM procedure also applies to the specific Pauli noise between the SPAM procedure and the PQCs. For the same reason, the noise we have introduced in our experiment in figure 2(b) can be equivalently moved out and thus plays the role of global depolarization in the entire circuit.

First of all, we suppose that both P and A are single-qubit systems, and then we generalize the result to the multi-qubit case. If we denote $\mathcal{E}_1(\cdot)$ and $\mathcal{E}_2(\cdot)$ as the noisy state preparation and state measurement channels, their action on an arbitrary single-qubit state ρ can be specified as

$$\mathcal{E}_k(\rho) = (1 - p_k)\,\rho + p_k\, P_k\,\rho\, P_k, \quad k = 1, 2, \qquad \text{(B4)}$$

where $P_1$ and $P_2$ are Pauli operators. For the single-qubit case, the noisy loss function $f'(\theta)$ is given by equation (B5), with d = 2. In the second line of equation (B5), we have used the fact in equation (B6), where $P_2$ is Hermitian. In the third line of equation (B5), we have used the results of unitary designs in equation (B7), with $\{\xi_k(\theta)\}$ specified in equation (B8). In fact, the SPAM noise can be modeled as some specific Pauli operators. Without loss of generality, we suppose it to be $\sigma_x$; the result also holds for other Pauli operators. That is, we can take $P_1 = P_2 = \sigma_x$ and thus $\xi_1 = \xi_4$, $\xi_2 = \xi_3$. Then we have

$$f'(\theta) = (1 - p_1 - p_2 + 2p_1p_2)\, f(\theta) + (p_1 + p_2 - 2p_1p_2)\, g(\theta), \qquad \text{(B9)}$$

where $g(\theta)$ is the term arising from $\xi_2 = \xi_3$. At the noise-free optimal point $\theta_o$, $f(\theta_o) = 1$ is at its maximum and $g(\theta_o) = 0$ is at its minimum. To ensure that $\theta_o$ is still the optimal point of the noisy $f'(\theta)$, the first term in equation (B9) must decrease faster than the second term increases when θ deviates from $\theta_o$. To quantify the behavior of $f'(\theta)$ in the vicinity of $\theta_o$, we can always suppose that the residual rotation takes the form

$$e^{-i\beta\,\mathbf{n}\cdot\boldsymbol{\sigma}/2}, \qquad \text{(B10)}$$

where $\mathbf{n} = n_x\hat{e}_x + n_y\hat{e}_y + n_z\hat{e}_z$ is a unit vector and $\beta = \theta - \theta_o$ is small. Therefore, $f(\theta)$ can be simplified as

$$f(\theta) = \cos^2(\beta/2), \qquad \text{(B11)}$$

and $g(\theta)$ can be simplified as

$$g(\theta) = n_x^2 \sin^2(\beta/2), \qquad \text{(B12)}$$

which varies fastest when $\mathbf{n} = \hat{e}_x$. In other words, deviation along the x-axis makes both f and g vary in the fastest way. Besides, in this direction the gradients satisfy $\partial_\theta f = -\partial_\theta g$, so f and g behave in exactly opposite ways near $\beta \sim 0$. To ensure that $\theta_o$ is still the optimal point of the noisy $f'(\theta)$, we only require that the weight of $f(\theta)$ is larger than that of $g(\theta)$, that is,

$$1 - p_1 - p_2 + 2p_1p_2 > p_1 + p_2 - 2p_1p_2. \qquad \text{(B13)}$$

Approximately, we require that $p_1 \sim p_2 < \frac{1}{2n}$ or $p_1 \sim p_2 > 1 - \frac{1}{2n}$, which means that $p_1$ and $p_2$ both being smaller than 1/2, or both being larger than 1/2, are acceptable. This is easy to understand: the first case, in which both $p_1$ and $p_2$ are smaller than 1/2, corresponds to a weak-noise limit, while the latter case, in which both are larger than 1/2, approaches a purely coherent $\sigma_x$ noise limit, in which $p_1 = p_2 = 1$ and $f'$ is equivalent to the noise-free f.
The multi-qubit SPAM noise can be analyzed in the same way. We suppose that all the qubits share the same single-qubit SPAM noise probability $p_1$ for state preparation and $p_2$ for state measurement; that is, for an n-qubit state ρ, the noise acts as specified in equation (B14). Similar to the single-qubit case, the multi-qubit noisy loss function is given by equation (B15), in which the first term is proportional to $f(\theta)$ and each remaining term $g_i(\theta)$ has the same form as $g(\theta)$, with $\Sigma_{i1}$ and $\Sigma_{i2}$ being some Pauli terms. Similarly, at the noise-free optimal point $\theta_o$, we have $f(\theta_o) = 1$ and $g_i(\theta_o) = 0$. In the fastest deviation direction of each $g_i(\theta)$, $g_i(\theta)$ behaves in exactly the opposite way to $f(\theta)$, so we have the approximation $f \sim \cos^2(\beta/2)$ and $g_i \sim \sin^2(\beta/2)$ near $\beta \sim 0$. To ensure that $\theta_o$ is still the optimal point of the multi-qubit cost function $f'(\theta)$ under SPAM noise, we require that the weight of the first term in equation (B15) is larger than the sum of the weights of all other terms, that is,

$$(1 - p_1 - p_2 + 2p_1p_2)^n > \frac{1}{2}.$$

Consequently, when the parameters vary from the ideal optimal point, all the $g_i$ increase at a speed no greater than the decreasing speed of f. In the large-n limit, we have $1 - p_1 - p_2 + 2p_1p_2 > 2^{-1/n} \approx 1 - \ln 2/n$.

Combining the two results above, proposition 1 is proven.

B.2. Remarks on proposition 1
In this section, we provide a detailed discussion of our numerical and experimental validations for proposition 1.
The proposed method is resilient to depolarization error with strength smaller than $1 - (\epsilon_M/\Delta_o)^{1/L}$. This implies that as long as the depolarization strength is less than 1, so that the quantum state is not entirely destroyed, accurate unitary operations can be learned. This follows from the nature of depolarization as an averaged form of noise. In the numerical simulations, f can be measured with high accuracy ($\epsilon_M \sim 0$) and we set $\Delta_o$ to be finite, so the robustness bound of p can be close to 1. In detail, we have tested the cases p = 0.01, 0.03 and 0.05, and all of them show robustness to depolarization noise, as proven above.
For the SPAM noise, we have set $p_1 = p_2 = 0.02, 0.04, 0.04$ and $p_1 = 0$, $p_2 = 0.02, 0.04, 0.06$ for both the one-qubit and two-qubit cases in our numerical simulations in figure 5 of the main text. Based on the proofs above, we can derive that the error tolerance is $1 - p_1 - p_2 + 2p_1p_2 > 1/2$ for the single-qubit learning task and $1 - p_1 - p_2 + 2p_1p_2 > 1/\sqrt{2} \approx 0.707$ for the two-qubit task. In the cases with $p_1 = p_2$, the robustness bound is $p_1 = p_2 < 0.5$ for the single-qubit task and $p_1 = p_2 < 0.178$ for the two-qubit task. In the cases with $p_1 = 0$, the robustness bound for $p_2$ is $p_2 < 1 - 2^{-1/n}$, that is, $p_2 < 0.5$ for the single-qubit task and $p_2 < 0.292$ for the two-qubit task. All the settings in our numerical simulations are below the corresponding bounds. Therefore, all the results show robustness to SPAM noise, as shown in figure 5 of the main text.
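The numerical bounds quoted above can be reproduced with a few lines. The sketch assumes that the n-qubit tolerance reads 1 − p1 − p2 + 2·p1·p2 > 2^(−1/n), which is read off from the single-qubit, two-qubit and p1 = 0 statements in this paragraph; the function names are illustrative.

```python
import numpy as np

def spam_weight(p1, p2):
    """Weight of the noise-free term for one qubit: 1 - p1 - p2 + 2 * p1 * p2."""
    return 1 - p1 - p2 + 2 * p1 * p2

def threshold(n):
    """Assumed n-qubit tolerance threshold, 2^(-1/n)."""
    return 2.0 ** (-1.0 / n)

def bound_equal(n):
    """Largest p with p1 = p2 = p (weak-noise branch): 1 - 2p + 2p^2 = threshold."""
    return (1 - np.sqrt(2 * threshold(n) - 1)) / 2

def bound_measurement_only(n):
    """Largest p2 when p1 = 0: 1 - p2 = threshold."""
    return 1 - threshold(n)

for n in (1, 2):
    print(n, round(bound_equal(n), 3), round(bound_measurement_only(n), 3))
```

For n = 2 this gives approximately 0.178 and 0.293, matching the quoted values up to rounding, and every simulated setting listed above lies below the corresponding bound.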
In our NV-based experiments, we test the robustness to the depolarization noise that occurs in the gray box of figure 2 of the main text. Given the equivalence between depolarization and dephasing noise for quantum states in the x-y plane, and considering that dephasing noise commutes with the subsequent CNOT gate, the noise can be equivalently moved in front of the PQCs and interpreted as a specific type of SPAM noise. Therefore, we equivalently demonstrate a learning task that shows resilience to this particular SPAM noise. Notably, our experiments reveal that the system's dynamics can be accurately learned even at a noise level p = 0.5, as illustrated in figure 3(c). This robustness follows from the bound condition in equation (B13), even in the most adverse scenarios. In our experiments, the dephasing noise takes the form of Pauli $\sigma_z$ while $V(\theta)$ is a rotation about x; therefore, the corresponding $g(\theta) = |\mathrm{Tr}[e^{-i\theta\sigma_x/2} \cdot \sigma_z]|^2 = 0$. In this case, we have $f'(\theta) = (1 - p)f(\theta)$, and the correct results can be learned as long as p < 1.

B.3. The proof of proposition 2
Proof. The noise-free loss function is again $f(\theta)$ in equation (B1), with $\theta_o$ the noise-free optimum point. Its noisy counterpart $f'(\theta)$ involves $p^{(k)}_{i_k}$, the probability that the unitary $E_{i_k}$ occurs in the kth layer. In the third line, we denote $\mathcal{E}_{i_k}(\rho)$ as the superoperator $E_{i_k}\rho E^\dagger_{i_k}$, and in the fourth line, we re-express the trace of the density matrix as the square of the quantum-state overlap. When we take $K = 4^n - 1$ and $p^{(k)}_K = p_0$ for each k, we reproduce global depolarization and, therefore, the ideal loss function $f(\theta)$.
For more general probabilities $p^{(k)}_{i_k}$, Haar-random unitary operations $U_r$ and $U_r^\dagger$ are introduced between each pair of adjacent layers in $V^*(\theta)$, which modifies $f'(\theta)$ accordingly. Averaging over the Haar-random unitaries amounts, mathematically, to integrating over $U_{r_1}, U_{r_2}, \ldots, U_{r_L}$ on the unitary group with respect to the Haar measure $d\mu$. Using the standard Haar-integration identity, we find that optimizing $f'(\theta)$ produces the same optimal $\theta_o$ as $f(\theta)$, and proposition 2 is proven. In fact, inserting Haar-random unitary operators transforms the unitary noise into an effective global depolarization, so it shares the same noise resilience condition as global depolarization.
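The closing statement, that Haar-random unitaries turn a unitary error into an effective depolarization, can be checked by Monte Carlo on a single qubit. This is a sketch under stated assumptions: the error is a hypothetical z rotation, the twirl is applied as U†E(UρU†)U, and the comparison uses the standard twirling result that the effective depolarizing weight is (|Tr E|² − 1)/(d² − 1).

```python
import numpy as np

def haar_unitary(d, rng):
    """Sample a Haar-random d x d unitary via QR of a complex Gaussian matrix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases           # rescale column j by phases[j] to fix the QR phase freedom

def twirl(noise, rho, samples, rng):
    """Monte Carlo average of U^dag E (U rho U^dag) E^dag U over Haar-random U."""
    acc = np.zeros_like(rho)
    for _ in range(samples):
        u = haar_unitary(rho.shape[0], rng)
        e = u.conj().T @ noise @ u
        acc += e @ rho @ e.conj().T
    return acc / samples

rng = np.random.default_rng(7)
phi = 1.0                                           # hypothetical over-rotation angle
noise = np.diag([np.exp(-0.5j * phi), np.exp(0.5j * phi)])
rho = np.array([[0.7, 0.3], [0.3, 0.3]], dtype=complex)
twirled = twirl(noise, rho, samples=4000, rng=rng)

# Expected effective channel: lam * rho + (1 - lam) * I / d
lam = (abs(np.trace(noise)) ** 2 - 1) / (2**2 - 1)
expected = lam * rho + (1 - lam) * np.eye(2) / 2
```

The agreement between `twirled` and `expected` is only statistical, with fluctuations that shrink as the number of sampled unitaries grows.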

C.1. Experimental setup
The experiments are performed with a single NV center in a high-purity diamond sample from Element Six on a home-built confocal microscopy platform at room temperature. A solid immersion lens (SIL) is etched above the NV center to enhance the fluorescence collection efficiency, and an Ω-type coplanar waveguide line is fabricated around the SIL to deliver the MW and RF pulses efficiently. A green laser (532 nm) is switched on or off by an acousto-optic modulator (AOM) in the double-pass configuration and focused on the NV center through an objective with a numerical aperture of 0.9 and a magnification of 100 (MPLFLN100X). The fluorescence above 650 nm is detected by an avalanche photodiode (SPCM-AQRH-45-FC). A permanent magnet is used to generate an external magnetic field along the NV symmetry axis. The strength of the magnetic field is adjusted to about 455 G, where the electron spin and the $^{14}$N nuclear spin can be efficiently polarized when a laser pulse is applied [36]. The MW signals are generated by an analog signal generator (Keysight N5181B) and are modulated by an IQ modulator (HENGWEI MICROWAVE HWIQ0102). The voltages applied to the IQ modulator, which are set by an arbitrary waveform generator (Tektronix AWG5014C), determine the amplitudes of the MW. In comparison, the RF signals with different frequencies, phases, and amplitudes are generated directly by the AWG.

C.2. Decoherence-protected quantum gates
We use the subspace of $m_S = 0, -1$ and $m_I = 1, 0$ to construct a two-qubit system. In the doubly rotating frame, with the rotating frequencies denoted as 'MW' and 'RF' in figure 2(c), the Hamiltonian in equation (8) can be rewritten accordingly. Applying an RF pulse to drive the $^{14}$N nuclear spin directly adds a driving term whose Rabi frequency satisfies $\omega_1 \ll |A|$. However, the electron spin decoheres much faster than the nuclear spin can be controlled. To overcome this problem, [19] proposed decoherence-protected gates to realize conditional and unconditional nuclear spin rotations. The pulse sequence is shown in figure C1: the dephasing of the electron spin, caused by the $^{13}$C nuclear spin bath, is suppressed by dynamical decoupling, while the nuclear spin is driven by RF pulses between the π pulses. On the one hand, when the time interval τ in the dynamical decoupling satisfies $\tau = 2n\pi/|A|$, the evolution of the two-qubit system is $U_{\mathrm{uncon}} = I \otimes \exp(-i 2\tau\omega_1\sigma_x)$, which is an unconditional rotation. On the other hand, when $\tau = (2n+1)\pi/|A|$, the evolution is a conditional rotation.

C.3. Dephasing of the electron spin
Around the NV center, spin-1/2 $^{13}$C nuclear spins are randomly distributed in the diamond lattice with 1.1% natural abundance. These $^{13}$C nuclear spins constitute a pure dephasing environment for the NV electron spin. When the $^{14}$N nuclear spin is in the state |1⟩, the NV electron spin interacts with the $^{13}$C nuclear spin bath through a Hamiltonian in which $\gamma_C$ is the gyromagnetic ratio of the nuclear spin, $A^j_i$ is the hyperfine interaction strength between the electron spin and the jth nuclear spin, $I^j_i$ is the nuclear spin operator, and $\Delta = |A|/2$ is the detuning frequency in the rotating frame described in appendix C.2. The weak interactions between nuclear spins are neglected on the timescales considered here. To measure the dephasing of the electron spin, we perform a Ramsey measurement; the readout signal is given by equation (C4), where $U^j_0(\tau)$ and $U^j_1(\tau)$ are the evolution operators of the jth $^{13}$C nuclear spin when $m_S = 0$ and $m_S = -1$, and τ is the free evolution time. The cumulative product in equation (C4) contributes a decay envelope $e^{-(\tau/T_2^*)^2}$, with $T_2^*$ the dephasing time [62]. If there is a strongly coupled $^{13}$C nuclear spin, denoted as 's', equation (C4) is modified accordingly, where we have used $U^s_1(\tau) \approx \exp\{-i(\gamma_C B_z I^s_z - A^s_z I^s_z)\tau\}$ at strong magnetic field. We can see that the strongly coupled $^{13}$C nuclear spin contributes a beat to the signal.

C.4. Measurement
Since we want to obtain the probability of finding the final state in |1, 0⟩, we only need the diagonal elements of the final density matrix, labeled as $n_{10}$, $n_{11}$, $n_{01}$, and $n_{00}$, respectively. First, we determine the fluorescence intensities of the four energy levels, PL$_{10}$, PL$_{11}$, PL$_{01}$, and PL$_{00}$, with the experiment sequences shown in figure C2(a). Here we assume that the polarization of the electron spin and the nuclear spin is perfect. Then we obtain the measurement results M1, M2, M3, and M4 by implementing the four experiment sequences shown in figure C2(b), from which we can calculate the populations of the four energy levels.

C.5. Benchmarking experiments
To evaluate the learning effect, we implement additional experiments to calibrate the power of the RF pulse applied to the nucleus. We directly drive the nuclear spin, and the Rabi oscillation (see figure C3(a)) is fitted by the function $A\cos(2\pi\Omega t + \phi)e^{-t/T} + C$. The power is estimated to be Ω = 6.52(7) kHz, where the uncertainty is calculated from the fitting residuals. We next calibrate the MW power $\omega_e$ on the electron spin against the voltage v applied to the IQ modulator. For every data point in figure C3(b), a Rabi measurement of the electron spin is performed, and we fit the data linearly as $\omega_e = p_1 v + p_2$, with $p_1$ and $p_2$ device-dependent parameters. Considering the durations of $R_x(\alpha)$ and $R_x(\theta)$ in figure 2(b), that is, $\alpha = \Omega T_n$ with $T_n$ the duration of $R_x(\alpha)$ and $\theta = \omega_e T_e$ with $T_e$ the duration of $R_x(\theta)$, we obtain the relation between Ω and the optimized v as $\Omega = (p_1 T_e/T_n)\,v + p_2 T_e/T_n = h_1 v + h_2$, with $h_1 = 51.55$ kHz/V and $h_2 = -2.72$ kHz. Based on this relation, we in turn calculate the objective voltage as v = 0.1792(14) V, and we denote it with the gray horizontal line in figure 3(c).
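The objective voltage follows from inverting the linear relation Ω = h1·v + h2. The short sketch below redoes this arithmetic with the quoted numbers; propagating only the uncertainty of Ω, and treating h1 and h2 as exact, is an assumption of this sketch that happens to reproduce the quoted error.

```python
h1 = 51.55          # slope, kHz/V
h2 = -2.72          # offset, kHz
omega = 6.52        # nuclear Rabi power, kHz
omega_err = 0.07    # 1 s.d. uncertainty of omega, kHz

# Invert omega = h1 * v + h2 for the objective voltage.
v_target = (omega - h2) / h1

# First-order error propagation from omega only (h1 and h2 assumed exact).
v_err = omega_err / abs(h1)

print(f"v = {v_target:.4f} V +/- {v_err:.4f} V")
```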
We measure the polarization of the NV electron spin in figure C4 with the method described in [24]. We measure the amplitude of the nuclear Rabi oscillation between |1, 0⟩ and |0, 0⟩, with (blue) and without (green) a π pulse on the NV electron spin, respectively. The polarization is extracted as 0.964(14). $T_1$ of the NV electron spin is measured in figure C5 and fitted as 4.5(3) ms.

C.6. Gradients
For simplicity, we assume the loss function is $f(\theta) = \frac{1}{d^2}|\mathrm{Tr}(U_{\mathrm{obj}} R_x^\theta)|^2$, in which the other quantum gates in the PQCs are absorbed into $U_{\mathrm{obj}}$. To obtain the gradient with respect to θ experimentally, we can approximate the derivative by a finite difference, as mentioned in appendix A.1, so the gradient can be calculated as $[f(\theta+\delta) - f(\theta-\delta)]/2\delta$, with an error of $O(\delta^2)$. On the one hand, δ must be small enough to keep the approximation accurate. On the other hand, δ has to be large enough to make $f(\theta+\delta)$ and $f(\theta-\delta)$ distinguishable above the experimental noise. To avoid this constraint on δ, we use an analytical gradient, in the spirit of the parameter-shift rule [37,38]. We run the circuit twice, with two differently shifted rotation gates, and record the outputs $o_1$ and $o_2$. Comparing them with the derivative of the loss function in equation (C8), we see that $o_1 - o_2 = \partial f/2\partial\theta$, so the method produces precisely the differential of the cost function.
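The contrast between a finite difference and an analytical gradient can be sketched for the loss form assumed above. The example uses the textbook parameter-shift rule with shifts of ±π/2 and a prefactor 1/2, which is exact for a loss that is sinusoidal in θ; the shifts and the normalization of o1 and o2 used in the experiment may differ, and `U_OBJ` is a hypothetical objective unitary.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def rx(theta):
    """Rotation exp(-i * theta * sigma_x / 2)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * SX

def rz(theta):
    """Rotation exp(-i * theta * sigma_z / 2)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * SZ

U_OBJ = rz(0.4) @ rx(-0.9)      # hypothetical objective unitary

def loss(theta, d=2):
    """f(theta) = |Tr(U_obj R_x(theta))|^2 / d^2."""
    return abs(np.trace(U_OBJ @ rx(theta))) ** 2 / d**2

def parameter_shift_gradient(theta):
    """Exact derivative from two evaluations with macroscopic shifts of pi/2."""
    return 0.5 * (loss(theta + np.pi / 2) - loss(theta - np.pi / 2))

def central_difference_gradient(theta, delta=1e-5):
    """Finite-difference reference, accurate only for small delta."""
    return (loss(theta + delta) - loss(theta - delta)) / (2 * delta)
```

Because the two shifted evaluations differ by a macroscopic amount, their difference stays distinguishable from measurement noise, unlike the small-δ finite difference.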

C.7. Equivalence between dephasing and depolarization in the experiments
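In our experiments, the noise acts on a non-trivial quantum state ρ that lies in the x-y plane of the Bloch sphere,

$$\rho = \frac{1}{2}\left(I + r_x\sigma_x + r_y\sigma_y\right). \qquad \text{(C9)}$$

When ρ passes through a dephasing channel with strength p, we have

$$(1-p)\,\rho + p\,\sigma_z\rho\,\sigma_z = \frac{1}{2}\left[I + (1-2p)(r_x\sigma_x + r_y\sigma_y)\right].$$

On the other hand, when ρ is injected into the depolarization channel with strength $p'$, we have

$$(1-p')\,\rho + p'\,\frac{I}{2} = \frac{1}{2}\left[I + (1-p')(r_x\sigma_x + r_y\sigma_y)\right].$$

That is, by taking $p' = 2p$, the effect of dephasing on this non-trivial state ρ is equivalent to that of depolarization.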
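The equivalence can be verified numerically. The sketch assumes the channel conventions (1 − p)ρ + p·σzρσz for dephasing and (1 − p′)ρ + p′·I/2 for depolarization, which are the conventions compatible with p′ = 2p; the Bloch components are arbitrary illustrative values.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]])
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_state(rx_, ry_, rz_=0.0):
    """Density matrix (I + r . sigma) / 2 for a Bloch vector r."""
    return 0.5 * (I2 + rx_ * SX + ry_ * SY + rz_ * SZ)

def dephase(rho, p):
    """Dephasing channel of strength p (assumed convention)."""
    return (1 - p) * rho + p * SZ @ rho @ SZ

def depolarize(rho, p_prime):
    """Depolarizing channel of strength p' (assumed convention)."""
    return (1 - p_prime) * rho + p_prime * np.trace(rho) * I2 / 2

equatorial = bloch_state(0.6, 0.8)              # state in the x-y plane
tilted = bloch_state(0.6, 0.0, 0.8)             # state with a z component

matches_on_equator = all(
    np.allclose(dephase(equatorial, p), depolarize(equatorial, 2 * p))
    for p in (0.0, 0.365, 0.5)
)
matches_off_equator = np.allclose(dephase(tilted, 0.365), depolarize(tilted, 0.73))
```

The second check fails by design: dephasing leaves the z component untouched, so the equivalence holds only for states confined to the x-y plane.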

Figure 1. Depiction of the indirect learning. (a) The method of Choi-Jamiolkowski isomorphism. (b) The circuit for learning one system with PQCs. We denote the circuits in the red dashed boxes as the SPAM procedure.
(b) even under the noise conditions defined above.The proofs are shown in appendices B.1 and B.3.

Figure 2. Experimental setups. (a) The structure of the NV center in the diamond crystal lattice (gray), where cyan (green) represents the nitrogen (vacancy). (b) The operational circuit based on the NV center, in which operations in orange (green) are applied to the $^{14}$N nuclear (electron) spin. The single-qubit gate $R_n(\theta)$ denotes a rotation about the n-axis by an angle θ (in this experiment, n is fixed to the x-axis), and the controlled gates are rotations with a direction conditioned on the state of the electron spin [19]. The green dashed box is used to obtain the gradients, and the gray box is a process that dephases the electron spin. (c) The energy levels of the two-qubit system, where the arrow in orange (green) represents the RF (MW) pulse for the nucleus (electron).

Figure 3. Experiment results. (a) Free induction decay signal of the electron spin. The dephasing time, $T_2^*$, is estimated as 2.27(13) µs, and the beat pattern is due to a nearby strongly coupled $^{13}$C nuclear spin. The green diamond, red triangle, and blue circle represent three idle times, i.e. three different noise levels (p = p′′ = 0, 0.365 and 0.5 and p′ = 0, 0.73 and 1, respectively). The corresponding states in the Bloch sphere are sketched in the inset, where the dashed blue line is a schematic description of dephasing. (b) The iterative optimization procedure under three different noise levels. The three noise levels are represented by green diamonds, red triangles and blue circles, respectively, and the dashed and solid lines denote $o_1$ and $o_2$, which are used to calculate the gradients. (c) The fidelity (dashed lines) and the voltage applied to the IQ modulator (solid lines) at each iteration. The gray horizontal line denotes the target voltage obtained from the benchmarking experiments in appendix C.5. All error bars are 1 s.d.

Figure 4. Noisy circuits. DN denotes the global depolarization. SPAM1 and SPAM2 denote the SPAM noise that occurs in the state preparation procedure and in the state measurement procedure, respectively. SPAM1 and SPAM2 are set as Pauli X channels in the simulations. RUN denotes the random unitary noise occurring in the parameterized quantum circuit.

Figure 5. Numerical simulations for the global depolarization, SPAM noise, and random unitary noise for single-qubit (the first row) and two-qubit (the second row) cases. We simulate three different noise levels in each scenario, denoted by green, red and blue lines, respectively. We use the Monte Carlo method to simulate the convergence of the loss function and the learned gate infidelity. In noise-free situations, the loss function and the gate infidelity behave in the same way, but in noisy situations, they behave differently. (a) Simulations for the global depolarization. (b) Simulations for the SPAM noise with $p_1 = p_2$. (c) Simulations for the SPAM noise with $p_1 = 0$ and $p_2$ nonzero. (d) Simulations for random unitary noise by inserting random operators. For (a)-(c), the simulations are repeated 20 times; the solid lines denote the median values according to the converging speed, and the shaded areas cover 60% of the simulation runs. For (d), the simulations are repeated 10 times; the solid lines denote the median values and the shaded areas cover 20%.

1: Initialize Q(s, a)
2: while in the training do
3:   choose a according to s by Q(s, a),
4:   take action, and observe the reward and state
5:   update Q(s, a)
6: end while
7: return s
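The steps of algorithm 2 can be sketched in runnable form. The example below is a deliberately simplified stand-in: it uses a tabular Q function instead of the double deep-Q network of [48], and a hypothetical five-state chain environment instead of circuit construction, so every name and hyperparameter here is an assumption made for illustration.

```python
import numpy as np

N_STATES, GOAL = 5, 4           # hypothetical chain environment: states 0..4
ACTIONS = (-1, +1)              # action 0 moves left, action 1 moves right

def step(state, action):
    """Environment: return (next_state, reward, done)."""
    nxt = min(max(state + ACTIONS[action], 0), GOAL)
    return nxt, float(nxt == GOAL), nxt == GOAL

def q_learning(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, len(ACTIONS)))           # 1: initialize Q(s, a)
    for _ in range(episodes):                        # 2: training loop
        s = 0
        for _ in range(100):                         # cap on the episode length
            if rng.random() < epsilon:               # 3: epsilon-greedy choice
                a = int(rng.integers(len(ACTIONS)))
            else:                                    # greedy, ties broken at random
                a = int(rng.choice(np.flatnonzero(q[s] == q[s].max())))
            s_next, r, done = step(s, a)             # 4: act, observe reward and state
            target = r + gamma * (0.0 if done else q[s_next].max())
            q[s, a] += alpha * (target - q[s, a])    # 5: update Q(s, a)
            s = s_next
            if done:
                break
    return q

q_table = q_learning()
greedy_policy = q_table[:GOAL].argmax(axis=1)        # learned action in states 0..3
```

In the circuit-generation setting, the state would encode the current gate list and each action would append a gate, with the reward derived from the loss of the resulting circuit.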

Figure C1. The pulse sequence of decoherence-protected gates.

Figure C2. Tomography of the diagonal elements. (a) Determination of the fluorescence intensity of the four energy levels. MW represents an unconditional π pulse on the electron spin, and RF is a conditional π pulse on the nuclear spin, with the frequency shown in figure 2(c). (b) Experiment sequences for tomography of the diagonal elements. U denotes the sequence shown in figure 2(b).

Figure C3. Benchmarking experiments. (a) Rabi oscillation of the $^{14}$N nuclear spin, from which the Rabi power is fitted to be 6.52(7) kHz. (b) The Rabi power of the electron spin is approximately linear in the voltage applied to the IQ modulator near the convergence results. The error bars are ±1 s.d.

Figure C4. Measuring the polarization of the NV electron spin.