Machine learning for continuous quantum error correction on superconducting qubits

Ian Convy; Haoran Liao; Song Zhang; Sahil Patel; William P Livingston; Ho Nam Nguyen; Irfan Siddiqi; K Birgitta Whaley

doi:10.1088/1367-2630/ac66f9

1. Introduction

The prevalence of errors acting upon quantum states, either as a result of imperfect quantum operations or decoherence arising from interactions with the environment, severely limits the implementation of quantum computation on physical qubits. A variety of methods have been proposed to suppress the frequency of these errors, such as dynamic decoupling [1], application of a penalty Hamiltonian [2], decoherence-free subspace encoding [3], and near-optimal recovery based on process tomography [4, 5]. In addition to these tools for error prevention, there exist many schemes for quantum error correction (QEC) that are able to return the system to its proper configuration after an error occurs [6]. The ability to correct errors rather just suppress them is vital to the development of fault-tolerant quantum computation [7].

An essential feature of QEC is the measurement of certain error syndrome operators, which provides information about errors on the physical qubits without collapsing the logical quantum state. In the canonical approach, QEC is conducted in a discrete manner, using quantum logic gates to transfer the qubit information to ancilla qubits and subsequently making projective measurements on these to extract the error syndromes. However, in contrast to this theoretical idealization of instantaneous projections of the quantum state, experimental implementation of such measurements inherently involves performing weak measurements over finite time intervals [8], with the dispersive readouts in superconducting qubit architectures constituting the prime example of this in today's quantum technologies [9–12]. This has motivated the development of continuous quantum error correction (CQEC) [13–22], where the error syndrome operators are measured weakly in strength and continuously in time.

CQEC operates by directly coupling the data qubits to continuous readout devices. This avoids the ancilla qubits and periodic entangling gates found in discrete QEC, reducing hardware resources. Additionally, the presence of these entangling gate sequences and ancillas introduces additional error mechanisms, occurring in-between entangling gates or on ancillas, that can cause logical errors [20, 22]. On noisy quantum hardware, multiple rounds of entangling gates and ancilla readouts are required to accurately identify the system state⁸ . All of this is also avoided by measuring data qubits directly, as in CQEC.

In addition to quantum memory, CQEC naturally lends itself to modes of quantum computation involving continuous evolution under time-dependent Hamiltonians, such as adiabatic quantum computing [23] and quantum simulation [24]. Given that the Hamiltonians considered generally do not commute with the error operators, the action of an error induces spurious Hamiltonian evolution within the corresponding error subspace until the error is ultimately diagnosed and corrected, resulting in the accrual of logical errors [21]. CQEC can effectively shorten the spurious evolution time in the error subspaces, and therefore increase the target state fidelity in quantum annealing.

Previous theoretical work on CQEC has focused primarily on measurement signals that behave in an idealized manner [19–21], such that each sample is assumed to be i.i.d. Gaussian with a mean given by one of the syndrome eigenvalues. However, in real dispersive readout signals we observe a wide variety of 'imperfections' caused by hardware limitations and post-processing effects, which can lead to more complicated syndrome dynamics or significant alterations to the noise distribution. A well-calibrated CQEC protocol should be designed to take into account any significant non-ideal behavior for a given architecture. However, it is often difficult to generate a precise mathematical description of the imperfections present in real measurement signals.

Machine learning algorithms offer a solution to this problem, as they can be optimized to solve a task by looking directly at the relevant data instead of relying on hard-coded decision rules. Highly expressive models involving multiple neural network layers have proven to be particularly effective at solving complex tasks such as image recognition and language translation [25]. The recurrent neural network (RNN) is a popular sequential learning model, because it operates on inputs of varying length and provides an output at each step. After being trained on a set of non-ideal measurement signals, an RNN can function as a CQEC algorithm by generating probabilities which describe the likelihood of an error at a given time step. Most importantly, the flexibility of the algorithm allows it to handle imperfections in the signal that would otherwise be impractical to model.

In this paper we investigate the performance of an RNN-based CQEC algorithm which acts on measurement signals with non-ideal behavior. We emphasize here active correction, in which errors are corrected during the experiment as soon as they are observed. To quantify the benefits of using a neural network, we compare the RNN to a conventional double threshold scheme as well as to a discrete Bayesian classifier. The first threshold scheme for CQEC was by Sarovar et al [16], who used the sign of the averaged measurement signals (i.e., a threshold at zero) to identify the error subspace. This filter was improved upon in Atalaya et al [19] and Atalaya et al [21], as well as in Mohseninia et al [20], by adding a second threshold to better detect errors that affect multiple syndromes. We chose to compare our RNN model to the threshold scheme in [21], since it had superior performance in numerical tests (see appendix F).

The remainder of the paper is structured as follows. Section 2 reviews the three-qubit bit-flip code that will be used to evaluate the three models, and outlines the idealized mathematical formulation of CQEC. In section 3 we use physical experimental data to characterize the imperfections that are present in typical superconducting qubit signals. We find that the noise possesses a significant amount of auto-correlation, while the syndromes demonstrate complex transient behavior after every bit-flip, as well as drift of the mean values over time. Section 4 then describes in detail the double threshold, discrete Bayesian, and RNN-based models that we will be comparing. In section 5 we test the error correction capabilities of the models using four different sets of synthetic data, each displaying a different characteristic feature or set of features of non-ideal behavior. We show that the RNN is able to outperform the double threshold across all synthetic experiments, achieving results comparable to those of the Bayesian model. Section 6 summarizes our findings and proposes directions for future work.

2. Background

We exemplify our CQEC protocol by operating it on the three-qubit bit-flip stabilizer code; in general, the protocol works with any QEC codes. The three-qubit bit-flip stabilizer code encodes the logical states |0⟩ and |1⟩ into |0⟩_L = |000⟩ and |1⟩_L = |111⟩, respectively, where the stabilizer generators are chosen to be S₁ = Z₁ Z₂ and S₂ = Z₂ Z₃, which also serve as the error syndrome operators. The states |000⟩ and |111⟩ span the code subspace, in which the syndromes have values (S₁ = +1, S₂ = +1). The (S₁ = −1, S₂ = +1), (S₁ = −1, S₂ = −1), (S₁ = +1, S₂ = −1) subspaces are known as the error subspaces, which are spanned by the basis states {|011⟩, |100⟩}, {|010⟩, |101⟩} and {|001⟩, |110⟩}, respectively. A logical error in quantum memory, i.e., when there is no Hamiltonian evolution, is an error attributed to the logical X operator, X_L = X₁ X₂ X₃.

In the continuous operation of the three-qubit bit-flip code, the error syndrome operators S_k, k = {1, 2} are continuously and simultaneously measured to yield the following idealized signals for each S_k as a function of time t:

$\begin{equation}{I}_{k}(t)=\sqrt{{{\Gamma}}_{m}^{k}}\,\text{tr}[{S}_{k}\rho (t)]+{\xi }_{k}(t).\end{equation} \tag{ 1 }$

Here ρ(t) is the density matrix of the three physical qubits and ${{\Gamma}}_{m}^{k}$ is the measurement strength that determines the time to sufficiently resolve the mean values of the syndromes under constant variance. Specifically, $1/{{\Gamma}}_{m}^{k}$ is the time needed to distinguish between the eigenvalues of S_k with a signal-to-noise ratio (SNR) of 1⁹ . In the Markovian approximation, ξ_k(t) is Gaussian white noise, i.e., $\xi (t)=\dot{W}(t)$ where W(t) is a Wiener process, with a two-time correlation function ⟨ξ_k(t)ξ_k'(t')⟩ = δ_kk' δ(t − t'), where the ⟨⋅⟩ denotes average over an ensemble of noise realizations. In the continuous operation, the observer receives noisy voltage traces with means proportional to the syndrome operator eigenvalues and variances that determine the continuous measurement collapse timescales. Monitoring both error syndromes with streams of noisy signals represents a gradual gain of knowledge of the measurement outcome to diagnose bit-flip errors that occur. We shall refer to the parity of I_k(t) as even or odd depending on whether the mean value of I_k(t) is positive or negative. In an actual experiment we will only have access to the averaged signals taken at discrete time steps separated by Δt, which we denote by I_k,t at time step t :

$\begin{equation}{I}_{k,t}=\sqrt{{{\Gamma}}_{m}^{k}}\;\text{tr}[{S}_{k}\rho (t)]+\frac{{\Delta}W}{{\Delta}t},\end{equation} \tag{ 2 }$

where ${\Delta}W\sim \mathcal{N}(0,{\Delta}t)$ . We shall assume that ρ(t) only changes due to bit-flips at the beginning of each time step Δt for very small Δt.

In previous work, reference [20] compared the performance of a linear approximate Bayesian classifier and the double threshold model with one threshold fixed at y = 0 and another threshold at y > 0 in correcting the three-qubit bit-flip code for quantum memory. Reference [21] analyzed the double threshold model with two varying thresholds in correcting the three-qubit bit-flip code, and applied it to quantum annealing under bit-flip errors X_q with which the chosen annealing Hamiltonian does not commute. In the current work, we shall study the performance of machine learning algorithms both in quantum memory and in quantum annealing.

The stochastic master equation (SME) [8] governing the evolution of ρ(t) under measurements with a finite rate of information extraction implied by equation (1) in the presence of bit-flip errors is given by [16, 21]

$\begin{equation}\dot{\rho }(t)=-\mathrm{i}[H(t),\rho ]+\sum\limits _{k=1,2}\left[\frac{{{\Gamma}}_{\phi }^{k}}{2}\left({S}_{k}\rho {S}_{k}-\rho \right)+\sqrt{{{\Gamma}}_{m}^{k}}{\xi }_{k}(t)\left(\frac{{S}_{k}\rho +\rho {S}_{k}}{2}-\rho {\langle {S}_{k}\rangle }_{\rho }\right)\right]+\sum\limits _{q=1,2,3}{\gamma }_{q}\left({X}_{q}\rho {X}_{q}-\rho \right).\end{equation} \tag{ 3 }$

The first term describes coherent evolution of the three-qubit state under a Hamiltonian H(t), which can, for instance, be a quantum annealing Hamiltonian. The second term describes the back-action induced by the simultaneous continuous measurement of the error syndrome operators S₁ and S₂ on the three-qubit state, where ${{\Gamma}}_{\phi }^{k}$ is the measurement-induced ensemble dephasing rate of the corresponding error syndrome operator S_k. The measurement strength ${{\Gamma}}_{m}^{k}$ , is related to the detector efficiency η_k as ${{\Gamma}}_{m}^{k}=2{{\Gamma}}_{\phi }^{k}{\eta }_{k}$ . The first two terms can be obtained by substituting operators c_k ∝ S_k into the general SME $\mathrm{d}\rho =-\mathrm{i}[H,\rho ]\mathrm{d}t+{\sum }_{k}(\mathcal{D}[{c}_{k}]\rho \,\mathrm{d}t+\sqrt{{\eta }_{k}}\mathcal{H}[{c}_{k}]\rho \,\mathrm{d}W)$ . The third term describes the decoherence of the three-qubit state in the presence of bit-flip errors, with γ_q, q = {1, 2, 3} denoting the bit-flip error rate of the qth physical qubit. While the idealized measurement signals mentioned above assume no effect induced by physical experimental apparatus in the qubit readouts, there are various imperfections of the measurement signals in practice that make the error diagnosis more challenging. We shall first present the characteristics of these measurement signals from physical experiments below and explain their implications for our purpose.

3. Problem setup

3.1. Characteristics of CQEC measurement signals

The superconducting qubits are monitored using voltage signals from homodyne measurements of the parity operators that are derived from tones reflected off the resonator (see appendix A). The resonator signal is fed into a Josephson parametric amplifier (JPA) in order to increase the signal strength without adding a significant amount of noise. The amplified radio frequency signals are then demodulated and digitized. After a further digital demodulation, the signals are processed with an exponential anti-aliasing filter with a time constant of 32 ns. This filtered signal, which is averaged in Δt = 32 ns bins, is then streamed from the digitizer card to the computer.

Due to the effects of the amplifier and resonator, we expect that measurements performed on such real superconducting devices will deviate from the idealized behavior predicted by equation (1). In particular, we can anticipate the following three imperfections:

(a)
The noise will possess a high degree of positive auto-correlation at short temporal lags due to the narrow low-pass bandwidth of the JPA and anti-aliasing filter.
(b)
When a bit-flip occurs, the syndrome means will change gradually rather than instantaneously as the resonator reaches its new steady state. These periods are referred to as resonator transients to stress their temporary nature, and arise because of time-dependent changes in the measurement strength ${{\Gamma}}_{m}^{k}$ (see appendix D).
(c)
The values of the syndromes will drift over time due to small changes in experimental conditions (e.g. temperature). Unlike the other imperfections, this effect is only noticeable when comparing across quantum trajectories rather than within them.

These non-ideal behaviors in the measurement signals extracted from our typical physical experiments will be incorporated into our simulated experiments in section 5.

Figure 1 shows experimental dispersive readouts taken from three transmon qubits [26] over the span of 6 μs [22]. The blue and orange lines are a record of the outputs from the two resonators, each measuring a different pair of qubits for their syndromes. The top figure shows the measurement signals from a single experiment, which contain large amounts of auto-correlated noise. During the experiment an X₂ error was injected at 3.0 μs, flipping the system from |100⟩ to |110⟩, but the weak-measurement noise largely obscures its effect on the syndrome values.

To reveal these underlying syndromes, the bottom figure of figure 1 shows an average over the measurements from roughly 47 500 experiments, each initialized to |100⟩ and injected with an X₂ error at 3.0 μs. It takes approximately 2 μs after initialization for the syndromes to reach their steady-state values for |100⟩, as the number of photons in each resonator increases from zero gradually. We ignore this effect in our analysis, as it will only occur once at the start of an experiment. After the X₂ error is injected, the syndromes do not instantaneously jump to a new pair of values but instead enter a transitory period which can include significant oscillations. These transients derive from the time-dependent changes in the measurement rate ${{\Gamma}}_{m}^{k}(t)$ analyzed in appendix D. This period lasts for roughly 2 μs, after which the syndromes stabilize at their new steady-state values for |110⟩.

Depending on the underlying hardware, a measurement signal may be generated on a wide variety of different scales, such as the arbitrary voltage scale in figure 1. To denote a signal generically on any scale, we write the measurement samples as

$\begin{equation}{I}_{k,t}={\bar{S}}_{k,t}+\sqrt{{\tau }_{k}}{\varepsilon }_{t},\end{equation} \tag{ 4 }$

where ${\bar{S}}_{k,t}$ is the scaled mean of the kth resonator at step t, τ_k is the scaled variance of the kth resonator, and ${\varepsilon }_{t}\sim \mathcal{N}(0,1)$ . In this notation, the physical quantities Γ_m and Δt from equation (2) have been absorbed into ${\bar{S}}_{k,t}$ and τ_k.

3.2. Impact of auto-correlations

Unlike the other imperfections, the challenge posed by auto-correlated signal noise can be characterized theoretically. If the Gaussian noise in I_k,t is correlated, then the distribution of noise samples can be parameterized in terms of a covariance matrix Σ whose off-diagonal elements determine the degree of correlation. For simplicity we restrict our analysis to dependencies that are Markovian, such that I_k,t depends only on the preceding measurement I_k,t−1, though our conclusions are not limited to this regime. Using a correlation coefficient of 0 < ρ < 1, the joint Gaussian log-density describing I_k,t and I_k,t−1 is

$\begin{equation*}\begin{aligned}\hfill \mathrm{log}\,p({I}_{k,t}{I}_{k,t-1}\vert {\bar{S}}_{k,t})=-\frac{1}{2{\tau }_{k}(1-{\rho }^{2})}\left[\begin{matrix}\hfill {\tilde{I}}_{k,t}\hfill & \hfill {\tilde{I}}_{k,t-1}\hfill \end{matrix}\right]\left[\begin{matrix}\hfill 1\hfill & \hfill -\rho \hfill \\ \hfill -\rho \hfill & \hfill 1\hfill \end{matrix}\right]\left[\begin{matrix}\hfill {\tilde{I}}_{k,t}\hfill \\ \hfill {\tilde{I}}_{k,t-1}\hfill \end{matrix}\right]+A,\end{aligned}\end{equation*}$

where ${\tilde{I}}_{k,j}\equiv {I}_{k,j}-{\bar{S}}_{k,j}$ denotes the centered signal sample at step j and A is the log of the normalization constant. We shall assume hereafter that the signal has been rescaled such that ${\bar{S}}_{k,j}=\pm 1$ .

The effect of auto-correlations on error correction is best characterized in terms of how it impacts the usefulness of the syndrome measurements. To be more precise, we know that the purpose of each measurement is to provide some information about whether the underlying syndrome value of the state is 1 or −1. When framed in these terms, we can formalize and quantify a notion of measurement 'usefulness' using Bayesian theory, specifically a ratio called the Bayes factor which we denote as ϕ [27]. This factor can be written in log form as

$\begin{equation}\mathrm{l}\mathrm{o}\mathrm{g}\;{\phi }_{k,t}=\mathrm{l}\mathrm{o}\mathrm{g}\;\;p({I}_{k,t}\vert {I}_{k,t-1},{\bar{S}}_{k,t}=1)-\mathrm{l}\mathrm{o}\mathrm{g}\;p({I}_{k,t}\vert {I}_{k,t-1},{\bar{S}}_{k,t}=-1),\end{equation} \tag{ 5 }$

and quantifies how much evidence I_k,t gives about the underlying syndrome value if we have already seen the previous measurement I_k,t−1. The larger the magnitude of log ϕ_k,t the more useful I_k,t is for our task, with its sign simply indicating whether the evidence supports a value of 1 or −1.

Let Q = Σ⁻¹. By making the substitutions σ⁻¹ = Q₂₂ and $\mu ={\bar{S}}_{k,t}-{Q}_{12}/{Q}_{22}({I}_{k,t-1}-{\bar{S}}_{k,t})$ in the unconditional log-densities $-{({I}_{k,t}-\mu )}^{2}/(2\sigma )+A$ , each of the conditional log-densities in equation (5) can be written as

$\begin{equation*}\begin{aligned}\hfill \mathrm{log}\,p({I}_{k,t}\vert {I}_{k,t-1},{\bar{S}}_{k,t})=-\frac{{[{I}_{k,t}-{\bar{S}}_{k,t}-\rho ({I}_{k,t-1}-{\bar{S}}_{k,t})]}^{2}}{2{\tau }_{k}(1-{\rho }^{2})}+A,\end{aligned}\end{equation*}$

where A is again the normalization constant [28]. Expanding the numerator and keeping only the terms that depend on ${\bar{S}}_{k,t}$ gives

$\begin{equation*}\mathrm{log}\,p({I}_{k,t}\vert {I}_{k,t-1},{\bar{S}}_{k,t})\to \frac{{S}_{k}^{2}(\rho -1)+2{\bar{S}}_{k,t}({I}_{k,t}-\rho {I}_{k,t-1})}{2{\tau }_{k}(1+\rho )},\end{equation*}$

where we ignore the other terms since they will cancel when computing log ϕ_k,t. After substituting this representation back into equation (5) we get

$\begin{equation}\mathrm{log}\,{\phi }_{k,t}=\frac{2({I}_{k,t}-\rho {I}_{k,t-1})}{{\tau }_{k}(1+\rho )},\end{equation} \tag{ 6 }$

where the value of log ϕ_k,t depends not only on I_k,t and I_k,t−1 but also on the variance and auto-correlation of the measurements.

To see the impact of the auto-covariance more clearly, we compute the expectation value $\mathbb{E}[\mathrm{log}\,{\phi }_{k,t}]$ with respect to a Gaussian distribution centered on the true syndrome value ${S}_{k,t}^{\prime }=\pm 1$ . Since equation (6) is linear, we can simply substitute in ${S}_{k,t}^{\prime }$ for I_k,t and I_k,t−1 to get $\mathbb{E}[\mathrm{log}\,{\phi }_{k,t}]$ . After taking its magnitude, we have

$\begin{equation}\vert \mathbb{E}[\mathrm{log}\,{\phi }_{k,t}]\vert =\frac{2(1-\rho )}{{\tau }_{k}(1+\rho )},\end{equation} \tag{ 7 }$

which decreases as the value of ρ increases. Equation (7) shows that positive auto-correlation (ρ > 0) in the signal makes each of our measurements less useful than if the noise had been uncorrelated (ρ = 0), which means that it will take longer for us to determine the value of ${\bar{S}}_{k,t}$ at a given measurement strength.

This result can be understood by imagining that ${\bar{S}}_{k,t}$ and I_k,t−1 are competing to determine the value of I_k,t, with smaller ρ favoring ${\bar{S}}_{k,t}$ . The more that ${\bar{S}}_{k,t}$ affects the measurement, the more that the measurement in turn tells us about ${\bar{S}}_{k,t}$ and thus the more useful it is to us. When ρ is large, the value of I_k,t tends to lie very close to the value of I_k,t−1 regardless of whether ${\bar{S}}_{k,t}$ is 1 or −1, and therefore the measurement does not reveal much new information about the syndrome.

4. Models

4.1. Double thresholds

The double threshold protocol from [21] uses two standard signal processing methods, filtering and thresholding, to identify errors. The raw measurement signal is first passed through an exponential filter to smooth out oscillations, and then this averaged value is compared to a pair of adjustable threshold values to determine the state of the system. A slightly different double threshold protocol was proposed in [20], which used boxcar averaging and fixed one of the thresholds at zero.

To estimate the definite error syndromes from the noisy measurements, we first filter the raw signals I_k(t) to obtain corresponding filtered signals ${\mathcal{I}}_{k}(t)$ according to

$\begin{equation*}{\dot{\mathcal{I}}}_{k}(t)=-\frac{{\mathcal{I}}_{k}(t)}{\tau }+\frac{{I}_{k}(t)}{\tau },\end{equation*}$

where τ is the averaging time parameter, and whose discretized version is similar. In the regime where t − t₀ ≫ τ where t₀ is at the last filtered signal reset, ${\mathcal{I}}_{k}(t)$ reads as

$\begin{equation*}{\mathcal{I}}_{k}(t)={\int }_{{t}_{0}}^{t}\mathrm{d}{t}^{\prime }\frac{{\mathrm{e}}^{-\frac{t-{t}^{\prime }}{\tau }}}{\tau }{I}_{k}\left({t}^{\prime }\right).\end{equation*}$

Thresholds for error correction

After filtering the measurement signals, we then apply a double thresholding protocol to the filtered signals ${\mathcal{I}}_{1}(t)$ and ${\mathcal{I}}_{2}(t)$ that is parameterized by the two thresholds Θ₁ and Θ₂, where Θ₁ is the threshold for the −1 value of the error syndromes and Θ₂ is the threshold for the +1 value of the error syndromes. If at least one of ${\mathcal{I}}_{1}(t)$ or ${\mathcal{I}}_{2}(t)$ is found to lie within the interval (Θ₁, Θ₂), we declare to be uncertain of the error syndromes and do not perform any error correction operation. Otherwise, we apply the following procedure, in accordance with the standard approach for error diagnosis and correction. If both ${\mathcal{I}}_{1}(t) > {{\Theta}}_{2}$ and ${\mathcal{I}}_{2}(t) > {{\Theta}}_{2}$ , then we diagnose the error syndromes as (S₁ = +1, S₂ = +1) and accordingly perform no error correction operation. If ${\mathcal{I}}_{1}(t)< {{\Theta}}_{1}$ and ${\mathcal{I}}_{2}(t) > {{\Theta}}_{2}$ , then we diagnose the error syndromes as (S₁ = −1, S₂ = +1) and accordingly perform the error correction operation C_op = X₁. If both ${\mathcal{I}}_{1}(t)< {{\Theta}}_{1}$ and ${\mathcal{I}}_{2}(t)< {{\Theta}}_{1}$ , then we diagnose the error syndromes as (S₁ = −1, S₂ = −1) and accordingly perform the error correction operation C_op = X₂. If ${\mathcal{I}}_{1}(t) > {{\Theta}}_{2}$ and ${\mathcal{I}}_{2}(t)< {{\Theta}}_{1}$ , then we diagnose the error syndromes as (S₁ = +1, S₂ = −1) and accordingly perform the error correction operation C_op = X₃.

In quantum annealing, we note that the error correction operations are applied immediately after the error syndromes are diagnosed to minimize the aforementioned spurious Hamiltonian evolution. The action of an error correction operation C_op, assumed to be instantaneous, changes the three-qubit state ρ(t) according to

$\begin{equation*}\rho (t)\to {C}_{\text{op}}\rho (t){C}_{\text{op}},\end{equation*}$

which applies to other models in our work as well. We note that the parameters {τ, Θ₁, Θ₂} constitutes the minimal set of tunable parameters. When the measurement signals I_k have white noise, their optimal values in minimizing the logical error rate can be obtained by equation (43) in [21] together with numerical optimizations.

We further reset the filtered signals ${\mathcal{I}}_{k}(t)$ to the corresponding initial syndrome value, at the same instant to avoid the transient delay in the filtered signals to reflect the application of the error correction operation on the state. Inherent within any error correction protocol, however, is the implicit assumption that the correction properly removes the error, which may not necessarily be the case if the error was misdiagnosed.

We note that the ${\mathcal{I}}_{k}(t)$ used by the double threshold model in CQEC consists of weighted contributions from every raw signal taken prior to t and after the last correction. The discrete Bayesian model and the RNN-based model that we discuss in this work can both be operated on raw signals, using all historical signals taken prior to a given t. This is in contrast to the projective measurement on ancilla superconducting qubits in discrete QEC that applies a matched filter [29] on raw signals taken only within each detection round.

4.2. Discrete Bayesian classifier

One weakness of the double-threshold scheme is that its predictions are essentially all-or-nothing, since there is no in-built quantity that expresses the model's confidence. This contrasts with probabilistic classifiers, which generate probability values for each prediction class instead of only a single guess. By framing the classification problem in terms of probabilities, we can incorporate our knowledge of the error and noise distributions into our model in a mathematically rigorous manner.

Since each qubit in our system will experience either one or zero net flips after every time step, there are eight different ways that a state can be altered by bit-flips and therefore eight different classes that our classifier must track. We denote each of the possible bit-flip configuration using the state that |000⟩ is taken to by the error, such that |001⟩ denotes a flip on the third qubit, |110⟩ denotes a flip on the first and second qubits, and so on. The goal of a probabilistic error corrector is to accurately determine the probability of all eight 'error states' at time step t given the measurement histories ${\mathcal{M}}_{t}^{k}\equiv {\left\{{I}_{k,{t}^{\prime }}\right\}}_{{t}^{\prime }=0}^{{t}^{\prime }=t}$ . We write this posterior probability as

$\begin{equation}\hat{p}({s}_{t})\equiv p({s}_{t}\vert {\mathcal{M}}_{t}^{1}{\mathcal{M}}_{t}^{2}),\end{equation} \tag{ 8 }$

where s_t ∈ {0, ..., 7} denotes the digital representation of the error state at step t.

In the remainder of this subsection we consider a probabilistic classifier constructed using Bayes' theorem, which makes prediction based on the posterior probabilities of the different basis states at each time step [30]. Starting with the knowledge of the initial state, this model uses a Markov chain and a set of Gaussian likelihoods to update our beliefs about the system conditioned on the specific measurement values that we observe.

The Bayesian algorithm described in this section is derived by assuming that the mean of a given measurement I_k,t is always determined by the state of the system at the end of the time step. This is equivalent to assuming that errors always happen at the beginning of each time step (see section 2). Since our method for generating quantum trajectories follows this assumption, the Bayesian model is theoretically optimal for the numerical tests carried out in section 5 without mean drift or resonator transients. As the length of the step Δt between measurements goes to zero, this algorithm converges to the Wonham filter [31], which is known to be optimal for continuous quantum filtering of error syndromes [32]. This filter is similar to the discretized, linear Wonham filter derived in [20], except that our filter does not rely on first-order approximations of the Markov evolution or Gaussian functions.

Model structure

Using Bayes' theorem, the posterior probability of equation (8) can be rearranged into the recursive form

$\begin{equation}\hat{p}({s}_{t})\propto p({I}_{1,t}{I}_{2,t}\vert {s}_{t}{\mathcal{M}}_{t-1}^{1}{\mathcal{M}}_{t-1}^{2})\sum\limits _{i=0}^{7}p({s}_{t}\vert {s}_{t-1}=i)\hat{p}({s}_{t-1}=i),\end{equation} \tag{ 9 }$

where we assume that the occurrence of an error is independent of any previous measurements and that I_k,t depends on the error state at time t along with past signal values due to auto-correlations.

This recursive expression describes a Bayesian filter which takes prior information about the error state of the system and updates it based on the transition probabilities p(s_t|s_t−1) and measurement likelihoods $p({I}_{1,t}{I}_{2,t}\vert {s}_{t}{\mathcal{M}}_{t-1}^{1}{\mathcal{M}}_{t-1}^{2})$ . The filter can be easily implemented once we have functional forms for these two terms, which we describe next.

Markovian state transitions

The Markovian assumption inherent in p(s_t|s_t−1) is reasonable, given that the net effect of an additional bit-flip error depends only on the error state the system before the error. We assume hereafter that the error rate γ_q is identical for all three qubits, i.e., γ_q = γ. This allows us to model the errors as a Markov chain [33] with an 8 × 8 rate matrix Q given by

$\begin{equation}{Q}_{ij}=\begin{cases}-3\gamma \quad \text{if}\enspace j=i\hfill \\ \ \ \gamma \quad \text{if}\enspace j\oplus i\in \left\{1,2,4\right\}\hfill \\ \ \ 0\quad \text{otherwise},\hfill \end{cases}\end{equation} \tag{ 10 }$

where we define our basis such that index i ∈ {0, ..., 7} corresponds to the error state whose classical binary representation is equal to i, e.g. 5 → |101⟩.

Since Q only gives the rate of transition per unit time, we need to compute the transition matrix J in order to get probabilities for a finite step. This matrix can be derived from Q as

$\begin{equation*}J={\mathrm{e}}^{Q{\Delta}t},\end{equation*}$

where Δt is the length of the time step. Element J_ij gives the probability of transitioning from error state i to error state j across the time step, so we can relate p(s_t|s_t−1) to J as p(s_t = j|s_t−1 = i) = J_ij. Using J, the sum in equation (9) can be evaluated to give probabilities $\tilde{p}({s}_{t})$

$\begin{equation}\tilde{p}({s}_{t}=j)\equiv \sum\limits _{i=0}^{7}\hat{p}({s}_{t-1}=i){J}_{ij},\end{equation} \tag{ 11 }$

which take into account the transitions induced by bit-flip errors during the time step.

Measurement likelihoods

The measurement likelihood $p({I}_{1,t}{I}_{2,t}\vert {s}_{t}{\mathcal{M}}_{t-1}^{1}{\mathcal{M}}_{t-1}^{2})$ describes the probability of generating signal values I_1,t and I_2,t given that the system is in error state s_t and that we had previously measured the values in ${\mathcal{M}}_{t-1}^{1}$ and ${\mathcal{M}}_{t-1}^{2}$ . Since the noise from each syndrome is independent, we can factor the likelihood as

$\begin{equation*}p({I}_{1,t}{I}_{2,t}\vert {s}_{t}{\mathcal{M}}_{t-1}^{1}{\mathcal{M}}_{t-1}^{2})=p({I}_{1,t}\vert {s}_{t}{\mathcal{M}}_{t-1}^{1})p({I}_{2,t}\vert {s}_{t}{\mathcal{M}}_{t-1}^{2})\end{equation*}$

with I_1,t and I_2,t contributing independently to the probability.

If the noise source is assumed to be Gaussian, then the probability density for each I_k,t has the form

$\begin{equation*}p({I}_{k,t}\vert {s}_{t}{\mathcal{M}}_{t-1}^{k})=\frac{1}{2\pi {\sigma }^{2}}\,\mathrm{exp}\left[\frac{-{({I}_{k,t}-{\mu }_{k,t})}^{2}}{2{\sigma }^{2}}\right],\end{equation*}$

where μ_k,t and σ² are the mean and variance of the signal conditioned on the past measurements ${\mathcal{M}}_{t-1}^{k}$ . In practice the auto-correlations rapidly decay, so we only need to condition on a small number of recent measurements. Hence, we let m_k,t−1 be the vector of these measurements, and let c be the vector of their corresponding covariance values. Then

$\begin{equation}{\mu }_{k,t}={\bar{S}}_{k,t}+{c}^{T}{{\Sigma}}^{-1}({m}_{k,t-1}-{\bar{S}}_{k,t}\vec{1}),\end{equation} \tag{ 12 }$

$\begin{equation}{\,\sigma }^{2}=\frac{\tau }{{\Delta}t}-{c}^{T}{{\Sigma}}^{-1}c,\end{equation} \tag{ 13 }$

where $\vec{1}$ is a vector of ones with the same dimension as m_k,t−1, Σ is the covariance matrix of the variables in m_k,t−1, and ${\bar{S}}_{k,t}$ is the mean corresponding to error state s_t [28]. Since the system always begins in the coding subspace, each error state maps to a definite error subspace and therefore has definite syndrome values regardless of how the logical state was initialized.

After the measurement pair I_k,t is received, the Gaussian likelihood functions are used to convert the probabilities from equation (11) into the next posteriors $\hat{p}({s}_{t})$ as

$\begin{equation}\hat{p}({s}_{t})\propto \tilde{p}({s}_{t})\cdot p({I}_{1,t}\vert {s}_{t}{\mathcal{M}}_{t-1}^{1})p({I}_{2,t}\vert {s}_{t}{\mathcal{M}}_{t-1}^{2}),\end{equation} \tag{ 14 }$

which will become probabilities after normalization.

Procedure for error correction

The probabilities from equation (14) can be understood as describing how likely it is that the system is in each of the eight error states based on the judgment of the model. Whenever |000⟩ does not have the highest probability, we can infer that at least one error has occurred and take the appropriate action to correct it. This procedure, which effectively takes the argmax of the posteriors, can be altered if certain forms of misclassification are more costly than others, or if the act of making a correction itself carries some cost. The procedure can also be modified so that it is more robust to imperfections in the signal, as we do in section 4.3 by introducing the τ_ignore and τ_streak hyperparameters.

Whenever any correction is made, we must update the model with this information by permuting its probabilities to reflect the applied bit-flip. In our example, a correction on the second qubit would lead us to swap the probabilities between pairs of error states which differ in only the second qubit, e.g., |010⟩ ⇌ |000⟩. Without this update the model will continue to recommend the same correction repeatedly, as it does not realize that the state of system has been changed.

A connection can be made between the Bayesian algorithm described here and the maximum likelihood decoder (MLD) commonly used in discrete error correction [34]. Given a specific noise channel and qubit encoding, the MLD is the protocol with the greatest probability of successfully correcting an error, assuming that we have access to projective measurements of the syndromes. The Bayesian model can be viewed as an extension of the MLD to the continuous measurement regime, where the syndrome measurements provide us with incomplete knowledge of the error subspace. As the variance of the Gaussian measurement noise goes to zero, the Bayesian model reduces to the standard MLD protocol for the three-qubit bit-flip code.

Impact of signal imperfections

Compared to thresholding schemes, the Bayesian classifier described here is far more sensitive to the assumptions we make about the noise and error distributions. Such sensitivity can be an advantage, since it allows for near optimal performance when our knowledge of these distributions is accurate.

Of course, when our assumptions about the distributions are wrong, the accuracy of the model can suffer significantly. Out of the three imperfections described in section 3, only the auto-correlation of neighboring samples is directly accounted for in the model. The resonator transients occur over relatively short time intervals, so they are likely to have only a modest impact on the model's performance. The syndrome drift also has a negative impact, as the mean values of the Gaussian distributions are key parameters in the model. If there is a discrepancy between the actual signal means and our pre-programmed values, then every measurement likelihood calculation will be biased.

We explore the size and significance of these effects for all three of our models in section 5.

4.3. Recurrent neural network (RNN)

Neural networks are a subset of the broader family of machine learning methods based on acquiring a learned representation of the data, which consists of parameterized layers of linear transformations and nonlinear activation functions. RNNs are a class of neural network in which the layers connect temporally, combining the previous time step and a hidden representation into the representation for the current time step. They are thus well suited for representation of the time-dependence of continuously measured error syndromes over discrete time steps. Using a training set of labeled signals, the RNN can learn the properties of the weak measurement signal and the structure of the underlying bit-flip channel, which allows it to accurately detect errors as they occur.

The dynamics of a simple recurrent neural network can be expressed by the following equations:

$\begin{equation*}\begin{aligned}\hfill & {h}_{t}={\sigma }_{h}\left({W}_{h}{x}_{t}+{U}_{h}{h}_{t-1}+{b}_{h}\right),\hfill \\ \hfill & {\,y}_{t}={\sigma }_{y}\left({W}_{y}{h}_{t}+{b}_{y}\right).\hfill \end{aligned}\end{equation*}$

For each time step t, the network accepts the input vector x_t and, along with the hidden state vector from the previous time step h_t−1, performs a linear transformation parameterized by the weight matrices W_h and U_h and the bias vector b_h before applying a nonlinear activation function given by σ_h. The result is the hidden state vector for the current time step h_t, which is acted upon by an analogous series of operations defined by W_y, b_y and σ_y to produce the output vector y_t. We note that the hidden state h_t effectively encodes a description of the history of inputs ${\left\{{x}_{{t}^{\prime }}\right\}}_{{t}^{\prime }=0}^{{t}^{\prime }=t}$ , which therefore allows the network to extract temporal, non-Markovian features from the data.

In our context, we consider the input at each time step to be the vector of measurement signals plus the initial basis state,

$\begin{equation}{x}_{t}=\left[\begin{matrix}\hfill {I}_{1,t}\hfill \\ \hfill {I}_{2,t}\hfill \\ \hfill {s}_{0}\hfill \end{matrix}\right].\end{equation} \tag{ 15 }$

Moreover, instead of the standard recurrent neural network architecture, we use a long short-term memory network (LSTM) [35], which is a particular type of recurrent neural network that involves cell states and various gates to evade the vanishing gradient problem of standard RNN architecture [36]. Nevertheless, the same principle underlying the standard function of RNN applies. The output y_t of the LSTM layer is subsequently passed through a dense layer and a softmax activation to produce the posterior probabilities of the eight basis states $p({s}_{t}\vert {\mathcal{M}}_{t}^{k})$ , and we select the basis state with the highest posterior as the prediction ${\hat{s}}_{t}$ .

Training

Training samples for the RNN require accurate labeling of the states corresponding to the measurement signals at every time step. However, in reality, decoherence effects such as amplitude damping and thermal excitation prevent us from knowing the correct state of the system at some arbitrary time. As a result, to train the RNN, we have to resort to measurement signals with a well defined underlying quantum state. This can be achieved by simulating the measurement signals on states in the absence of unwanted decoherence effects, which will be described in details in section 5. In the simulations, we provide the measurement strength, the single-qubit bit-flip error rate and the initial quantum state as input parameters, and the simulation produces a large number of quantum trajectories to be the training samples of the RNN. We then train the RNN to diagnose bit-flip errors on the three-qubit system, and the trained RNN can be subsequently used to actively correct for errors that occurred. That said, the same information used to generate the training samples is also provided as prior knowledge to the double threshold and the Bayesian model. The two models both require an explicit estimation of the measurement strength as well as the assumption of a certain error rate.

We maximize the likelihood of the RNN parameters on the training set by minimizing the cross entropy batch total loss function, which is defined as

$\begin{equation}\mathcal{L}=-\frac{1}{NT}\sum\limits _{n=1}^{N}\sum\limits _{t=1}^{T}\mathrm{log}\,{p}_{n}({s}_{t}),\end{equation} \tag{ 16 }$

where p_n(s_t) stands for the posterior probability of the true basis state s_t at time step t in the nth sample, while N denotes the mini-batch size and T denotes the total number of steps in each training sample.

To update the parameters to minimize the loss, we perform an iterative training procedure where for each step and parameter w, one applies a gradient descent update of the form $w{\leftarrow}w-\eta (\partial \mathcal{L}/\partial w)$ , where the gradients $\partial \mathcal{L}/\partial w$ are computed via backpropagation through the computation graph of the network.

In our experiments, the gradient descent update is preformed using the ADAM optimizer [37]. We adopt a two-layer stacked LSTM with a hidden state size of 32. This small hidden size limits the largest matrix-vector multiplication in computations, hence the memory required, and also limits the number of parameters, facilitating the implementation of the network in real-time experiments. We further provide a comparison test on the performance of different hidden state sizes in appendix C and show that both smaller LSTM and gated recurrent unit (GRU) architecture [38] offer comparable performance for our purpose. The number of stacked layers of the LSTM/GRU and the hyperparameters, such as the batch size in training, are tuned with the assistance of Ray Tune [39].

Re-calibration method for error correction

When performing active error correction, we once again wish to avoid the delay in the posterior probabilities output by the network to reflect the application of an error correction operation C_op on the system. In the case of the Bayesian classifier, we permute the elements of the vector of posterior probabilities, which encodes the state of the model, in accordance with the error correction operation. For the RNN, however, we cannot apply a particular transformation to the hidden state such that the vector of posterior probabilities outputted by the network is permuted in analogous manner, since the function mapping the hidden state to the output vector of posterior probabilities is highly nontrivial.

Any such delay in the network remaining unaware of the quantum state having been corrected is harmful, because another error X_q occurring during this delay, compounding with the correction C_op on the first error, will induce a logical error at the next error correction operation. To see this clearly, considering that the physical qubits are initially in |000⟩, and the first error X₁ results in the state |100⟩. After detecting the error, the model makes a correction that instantly returns the state back to |000⟩. However, the RNN still has the knowledge of the qubits being in |100⟩ until some time later at t_realize before accepting sufficient number of x_t's that allows it to predict |000⟩. If a second error X₂ occurs before t_realize, the syndromes become (S₁ = −1, S₂ = −1) because the state becomes |010⟩, whereas the RNN, only knowing the state in |100⟩, will eventually predict |101⟩ that has the same syndromes, which is then equivalent to diagnosing an X₃ error. After applying a second error correction C_op = X₃, the physical qubits are now in |111⟩, constituting a logical error. In other words, since we are not capable of injecting the knowledge of a correction operation into the RNN, a correction operation is equivalent to an error seen by the RNN and active correction effectively increases the bit-flip error rate γ in the eyes of the network. Although the correction is correlated with the detected error, the network is generally trained on quantum trajectories with uncorrelated random bit-flip error instances. As will be explained in section 5.1 that a greater γ will induce more logical errors, we conclude that the naive approach of active correction with the RNN suffers from more logical errors.

Therefore, we propose the following re-calibration protocol to effectively hide the action of any error correction operation from the network, so that there is no longer any delay in the posterior probabilities to begin with.

We specifically keep track of all the error correction operations that has been applied up to the present t,

$\begin{equation*}{N}_{q,t}=\text{Number}\;\;\text{of}\;\;{X}_{q}\;\;\text{corrections}\;\;\text{applied}.\end{equation*}$

When the measurement signals I_1,t and I_2,t have symmetric noise around their respective mean values and the possible means of I_k,t are always equal and opposite, each C_op correction changes the mean of I_1,t by a factor of −1 if C_op = X₁, changes the mean of I_2,t by a factor of −1 if C_op = X₃, and changes the mean of both I_k,t by a factor of −1 if C_op = X₂. To hide all the corrections done in the past, the measurement signals that are provided as input to the network for all subsequent time steps are then flipped according to N_q,t,

$\begin{equation*}\begin{aligned}\hfill {I}_{1,t}^{\prime }={(-1)}^{{N}_{1,t}+{N}_{2,t}}{I}_{1,t},\\ \hfill {I}_{2,t}^{\prime }={(-1)}^{{N}_{2,t}+{N}_{3,t}}{I}_{2,t},\end{aligned}\end{equation*}$

which we called the re-calibrated signals. From the perspective of the RNN when taking in ${I}_{k,t}^{\prime }$ , it appears as if no error correction operation has been applied to the physical qubits.

Given that at some time step we predict a different state ${\hat{s}}_{t}$ , we now perform our error correction operation relative to the previous predicted state ${\hat{s}}_{t-1}$ .

Adaption to resonator transients for probabilistic models

When the possible means of I_k,t are not equal and opposite, as occurs in the resonator transients upon applying C_op, the re-calibration method breaks down, because flipping the means of either or both I_k,t does not produce the means as if there was no correction applied. A solution to this is to impose an ignore time period τ_ignore right after the correction is applied at some t. During (t, t + τ_ignore], no input x_t is fed into the RNN. As a result, the hidden state of the network is frozen until the ignore time period ends. The re-calibrated signals are accepted by the network only after t + τ_ignore, which reduces the risk of getting incorrect predictions during the transients, but effectively increases the detection time of any error that occurs during the ignore period.

Imposing τ_ignore should be accompanied by a measure to ensure that the RNN diagnoses any error with sufficiently high confidence so that fewer false alarms of error will be followed by an ignore period τ_ignore upon correction. A feasible measure in practice is to determine an error correction operation only if the RNN predicts the same state ${\left\{{\hat{s}}_{{t}^{\prime }}\right\}}_{{t}^{\prime }=t}^{{t}^{\prime }=t+{\tau }_{\text{streak}}}$ for a streak of time steps τ_streak that is different from the old state ${\hat{s}}_{t-1}$ , which is a discrete quantity that is easy to optimize. The {τ_ignore, τ_streak} then constitutes a minimal set of tunable hyperparameters for the task of active correction in the presence of resonator transients, which applies to the Bayesian classifier explained in section 4.2 as well.

5. Simulated experiments

To evaluate the effectiveness of the three models described in section 4, we test their error correction capabilities on a large number of synthetic measurement sequences. The motivation for using artificial data instead of real data is twofold. First, by using artificial data we can precisely control the underlying measurement distribution, which allows us to separate out the effects of the different imperfections identified in section 3. Second, it is important that we know the true state of the system at every time step, as this is necessary both to train the RNN and to calculate intermediate fidelity values. Such knowledge would not be possible on a near-term quantum computer due to strong undesirable decoherence.

To ensure that our simulations are grounded in reality, we model them on data taken from a superconducting qubit device. Figure 1 shows measurements taken from this reference data, which consists of approximately 1.6 × 10⁶ sequences lasting 6 μs each¹⁰ . The sequences are comprised of 192 measurement pairs (one for each resonator), sampled every 32 ns. The data contains both 'flat' sequences, in which no bit-flip occurs, as well as sequences in which a bit-flip is deliberately applied to one of the three qubits to induce a state transition. Since these bit-flips are all applied at precisely the same time, we are able to track how the signal mean changes during the transient period.

Across all of our tests we employ four different simulation schemes, each of which is described below. The schemes are designated with letters A–D in order of how much non-ideal behavior they include, with scheme A having no imperfections and scheme D having all three imperfections. In all schemes, we ignore the thermal excitation for each qubit, since a typical excitation rate is on the order of 1 ms⁻¹.

Scheme A: idealized behavior

In our first scheme, the simulated signal simply conforms to the idealized behavior given by equation (1). At the beginning of each measurement sequence the system is set to a specified initial state in the coding subspace, and then the state of the next time step is determined by sampling a number n_q of bit-flips X_q for each qubit from the Poisson distribution, such that ${n}_{q}=\mathrm{exp}(-\gamma {\Delta}t){(\gamma {\Delta}t)}^{{n}_{q}}/{n}_{q}!$ where Δt is the time step size. These errors are applied to the corresponding qubits to get the next state. This cycle of sampling and propagating errors is repeated until we have generated a sufficiently long sequence of states.

To create the corresponding I_k,t, we sample a uni-variate Gaussian distribution at each time step with variance ${({{\Gamma}}_{m}^{k}{\Delta}t)}^{-1}$ and a mean of ±1 determined by the syndrome eigenvalue at that step. Our reference data has

$\begin{equation*}{{\Gamma}}_{m}^{k}\approx 4.7\times 1{0}^{6}\ {\mathrm{s}}^{-1},\quad {\Delta}t=32\times 1{0}^{-9}\ \text{s},\quad {\eta }_{k}\approx 0.5,\end{equation*}$

where ${{\Gamma}}_{m}^{k}$ needed to be estimated from the measurement signals while Δt was known to us in advance. This sequence of Gaussian samples plus the underlying states provides a complete description of a system in the context of our error correction task.

Scheme B: auto-correlations

As a first step away from ideal behavior, we consider noise that is correlated across time. The data generation process for this scheme is effectively the same as that of scheme A, except that the noise must be sampled sequentially in order to correctly capture the auto-correlations. In our reference data we find that significant auto-correlations extend back roughly four steps, with covariance given by

$\begin{equation*}{c}_{k}^{T}\approx 5.94\cdot \left[\begin{matrix}\hfill 0.61\hfill & \hfill 0.25\hfill & \hfill 0.1\hfill & \hfill 0.05\hfill \end{matrix}\right]\end{equation*}$

whose ith element is at lag-i. These values were found by taking every contiguous subsequence of length five in our reference data and using them all to compute a covariance matrix. We can simulate Gaussian noise with these auto-correlations one step at a time using equations (12) and (13).

Scheme C: auto-correlations with resonator transients

For our third scheme, we keep the auto-correlations from scheme B but alter the behavior of the syndrome values so that they include the resonator transients seen in figure 1 and explained in appendix D. To incorporate these patterns into our simulation, we first extract the mean values of the transient patterns from our reference data, consisting of 94 steps in total, for each of the twenty-four different single-flip transitions. Our sequence generation process is then identical to scheme B, except that after an error occurs the next 94 measurements are sampled from Gaussians centered on the transient means instead of the syndrome eigenvalues. The pattern that we use is matched to the state of the system before and after the error. After the transient period has elapsed, the means are set back to ±1 and further samples are generated as usual until another error occurs.

Scheme D: all Imperfections

Our final simulation scheme takes the auto-correlations and resonator transients from scheme C and adds an underlying drift term to the syndrome means. Since our reference data contains over a million trajectories collected over the span of multiple hours, it is possible to observe significant differences in the syndrome means between trajectories that are separated by large amounts of time, possibly due to temperature fluctuations.

For our experiments we elected to apply a linear drift Δ_i governed by

$\begin{equation*}{{\Delta}}_{i}=\frac{0.4}{N}\cdot i,\end{equation*}$

where i is an index that arbitrarily orders the different measurement sequences that we generate and N is the total number of these sequences. This drift term is added to every measurement in the ith sequence, resulting in a uniform shift of the overall signal means. The net drift across all runs represents a 40% change, which is consistent with the magnitude of the drift observed in our reference data.

5.1. Quantum memory state tracking

In quantum memory, it suffices to track the basis states in response to the bit-flip errors that have occurred and only apply error correction operations when needed. We generated 30 000 trajectories of length T = 20 μs from all four simulation schemes with a pre-defined single-qubit error rate as our testing samples, among which are equal portions of trajectories initialized in one of the eight basis states. While the RNN model employed here is trained on 100 000 quantum trajectories from the corresponding simulation scheme, the error rate, noise variance and auto-correlations input to the Bayesian model are also estimated from those quantum trajectories. The tunable parameters in the double threshold model are numerically optimized in schemes with imperfections; the filtering time τ typically lies in the range 0.3–1.6 μs, with larger τ for smaller γ.

In figure 2, we compare the final fidelity $\mathcal{F}={\vert \langle {\psi }_{T}\vert {\psi }_{0}\rangle \vert }^{2}$ against the initial state of the three models in tracking these quantum trajectories subject to bit-flips. The trend is that the final fidelity decreases as a function of the single-qubit error rate γ. This is because the higher the error rate is, the more chances there will be two different bit-flips before the correction to the first bit-flip is made, resulting in a logical error upon the correction, and therefore a lower final fidelity. For instance, a state starting at |000⟩ is flipped to |001⟩ at t₁ and is later also flipped to |011⟩ at t₂ > t₁, such that t₂ is smaller than t₁ + t_detect where t_detect is the detection time of the first error. Subsequently, the model perceiving syndromes with (S₁ = −1, S₂ = +1) will eventually make a C_op = X₁ correction and change the state to |111⟩, leading to a logical error. From the above argument, it is also evident that a shorter detection time is beneficial.

From figure 2, we see that the RNN and the Bayesian classifier outperform the double threshold in all simulation schemes, whereas the RNN approximates the Bayesian classifier in all schemes. As discussed in section 4.2, the Bayesian classifier is the optimal model of the three in schemes A and B where there are only auto-correlations in the signals, which is validated in this task. The fact that their performances in schemes C and D are very similar to that in scheme B indicates that the resonator transient pattern and the drifting of the means do not have a significant effect on all three models.

It is reasonable that the drift has a small negative effect to the two probabilistic models, since the drift is usually on the order of the separation of mean values of the two parities, which is in turn one order of magnitude smaller than the standard deviation of the noise. The large noise variance obscures the drifting means, making the drifted signals appear like more noisy signals with fixed means.

5.2. Extending T₁ time of the logical qubit

Although the models are motivated by correcting bit-flip errors, they can also be exploited in extending the T₁ time of the logical qubit in |1⟩_L = |111⟩. For this task, actively correcting the state is required as opposed to merely tracking the state. While for practical purpose the RNN model is trained on 30 000 quantum trajectories under bit-flips with a length of T = 120 μs, the Bayesian model, whose parameters are estimated from the same set of trajectories, uses a different transition matrix generated by Q' shown in equation (E1) which takes into account the asymmetric probabilities of transitions between the ground and excited state. The parameters for the double threshold model is numerically optimized on the same set of quantum trajectories.

For the three-qubit system initialized to the fully excited state |111⟩, we inspect the population within a Hamming distance 1 away from the initial state, i.e., the population P_exc of the set of basis states {|111⟩, |110⟩, |101⟩, |011⟩}, since these states can be recovered to the initial state by a majority vote. We compare this P_exc against the population of the excited state |1⟩ of a bare qubit as a function of time in all four simulation schemes, and the results are shown in figure 3. In all schemes, the encoded three-qubit system P_exc decays much slower under active correction by any of the three models than the bare qubit excited state population. In all schemes, both the Bayesian and the RNN-based model outrun the double threshold model.

**Figure 3.** The population of the excited states {|111⟩, |110⟩, |101⟩, |011⟩} as a function of time, obtained from simulated experiments with the four different schemes at a single-qubit decay rate of γ = 0.04 μs⁻¹. Each data point is averaged over 3000 independent quantum trajectories. The three-qubit system is initialized to |1⟩_L = |111⟩. As a comparison, the bare qubit (purple curve) is initialized to the |1⟩ state and is subject to amplitude damping with a time constant of T₁ = 25 μs, i.e., a decay rate of 0.04 μs⁻¹. For reference, the uncorrected three-qubit system decay curve is shown in red (see appendix E). For all schemes, the RNN-based model outperforms the double threshold model.
Download figure:
Standard image High-resolution image

5.3. Protecting against bit-flip errors

Similar to the task of extending the T₁ time of the state |1⟩_L, here we employ the three models to protect the initial state |1⟩_L from bit-flips. Shown in figure 4, we compare the population P_exc of the three-qubit system against the excited population of the bare qubit in time. For schemes A and B, both the Bayesian and the RNN-based model have an advantage over the double threshold. Furthermore, in figure 4 we extract the initial logical error rate Γ_L as a function of γ by computing the time derivative of P_exc at 9.6 μs at each γ. In either scheme with any of the three models, Γ_L scales approximately quadratic in γ, and we can see a strong suppression of Γ_L relative to a bare qubit or the uncorrected three qubits. We remark that, by introducing feedback based on noisy weak measurements, any correction protocol can underperform a majority vote on the encoded qubits without error correction at sufficiently small γ or runtime.

**Figure 4.** Left: the population of the excited states {|111⟩, |110⟩, |101⟩, |011⟩} as a function of time, obtained from simulated experiments under schemes A and B at a single-qubit bit-flip rate of 0.04 μs⁻¹. Each data point is averaged over 3000 independent quantum trajectories. The three-qubit system is initialized to |1⟩_L = |111⟩. As a comparison, the bare qubit (purple curve) is initialized to |1⟩ and is subject to a bit-flip rate of γ = 0.04 μs⁻¹. As a reference, the uncorrected three-qubit system decay curve is shown in red (see appendix E). In schemes A and B, the Bayesian model is the best among the three, and the Bayesian and RNN-based model both outrun the double threshold model. Right: the initial logical error rate Γ_L at 9.6 μs as a function of the single-qubit error rate γ. The fitted quadratic curves show a strong suppression of Γ_L for all three models in both schemes.
Download figure:
Standard image High-resolution image

To better understand the performance of the models in this important task, we analyze the detection time spent in true positive detection as well as the number of false alarms when the three-qubit system is in |1⟩_L. The difference between a true positive and a false alarm is illustrated in figure 5, which shows the actual and predicted states of the system when an X₃ error occurs and when the model falsely detects an X₁ error. When a true error occurs, the system remains in the corresponding error subspace for a duration determined by the detection time of the model, after which the error is corrected. By contrast, when the model falsely detects that an error has occurred due to measurement noise, it improperly applies a bit-flip to the system and thus pushes it out of the code subspace. After more measurements are recorded, the model determines that the system is in an error subspace and fixes its mistake by applying another bit-flip.

**Figure 5.** Response of the system basis state and model to a true bit-flip error and a false alarm as a function of time. At 1.0 μs an X₃ error is applied to the system, and after a small delay the error is detected and corrected. At 3.0 μs the model falsely detects and then 'corrects' for an X₁ error, which results in the system being temporarily pushed into an error subspace before the mistake is recognized and corrected. There are visible small constant offsets between the prediction and the system state at the false alarm due to the streak time period imposed in the correction protocol.
Download figure:
Standard image High-resolution image

As explained in section 5.1, a shorter detection is favorable and will lead to better error corrections, whereas here we can expect more frequent false alarms arises for models with a shorter detection time as a trade off, since the model is prone to make a correction. This is demonstrated in figure 6, where we can see that the best two models, the Bayesian and the RNN-based, both have a shorter detection time and more frequent false alarms at the same time. Nevertheless, for both of these two models, the overall frequency of all false positive detection remains low and is on the order of 0.1 μs⁻¹.

**Figure 6.** The distribution of detection time (with the left y-axis) and the distribution of false alarms of bit-flips (with the right y-axis) when the state is originally in |111⟩, over 100 000 quantum trajectories with an operation time T = 120 μs and with a single-qubit bit-flip rate γ = 0.04 μs⁻¹. The three qubits are initialized to |111⟩. The overall frequencies of all false alarms for the RNN-based, Bayesian, and double threshold models are 0.155(5), 0.117(2), 0.0022(2) μs⁻¹, respectively.
Download figure:
Standard image High-resolution image

5.4. Quantum annealing with time-dependent Hamiltonians

Having demonstrated a clear advantage using the RNN-based protocol for tasks in the quantum memory setting over the double threshold protocol, we now study the performance of our protocol for quantum annealing, using a time-dependent Hamiltonian that does not non-commute with the bit-flip errors. We note that the protocol is also applicable to evolution under quantum gate operations.

In quantum annealing, it is imperative to perform error diagnosis and correction in a manner that is both fast and accurate, in order to avoid accruing these logical errors while single bit-flip errors are being diagnosed and corrected. This is because the action of an error X_q effectively transforms the Hamiltonian from H(t) to X_q H(t)X_q in the Heisenberg picture. Until the error is properly diagnosed and corrected, subsequent coherent evolution of the logical state in the code subspace is due to the modified Hamiltonian X_q H(t)X_q. If the original Hamiltonian does not commute with the error, i.e. X_q H(t)X_q ≠ H(t), then such evolution will be spurious rather than as originally intended, causing logical errors to accrue.

For this simulated experiment (see appendix B), the annealing Hamiltonian with a strength Ω₀ evolving ${\rho }_{0}=\vert {\psi }_{0}\rangle \langle {\psi }_{0}\vert ,\vert {\psi }_{0}\rangle =(\vert {0\rangle }_{\text{L}}+\vert {1\rangle }_{\text{L}})/\sqrt{2}$ is chosen to be

$\begin{equation}H(t)=-{{\Omega}}_{0}\left[a(t){X}_{1}{X}_{2}{X}_{3}+b(t)\frac{{Z}_{1}+{Z}_{2}+{Z}_{3}}{3}\right],\end{equation} \tag{ 17 }$

where a(t) = 1 − t/T and b(t) = t/T. In the code subspace, it is equal to

$\begin{equation}h(t)=-{{\Omega}}_{0}\left[a(t){\sigma }_{x}+b(t){\sigma }_{z}\right],\end{equation} \tag{ 18 }$

whereas in any error subspace it is equal to the spurious Hamiltonian,

$\begin{equation*}{h}_{\text{spurious}}(t)=-{{\Omega}}_{0}\left[a(t){\sigma }_{x}+b(t)\frac{{\sigma }_{z}}{3}\right].\end{equation*}$

We adopt the reduction factor [21] as the metric for evaluating the model performance, which is defined as,

$\begin{equation}\mathcal{R}=\frac{1-{\mathcal{F}}_{\text{une}}}{1-\mathcal{F}},\end{equation} \tag{ 19 }$

whose numerator is the final infidelity of an unencoded bare qubit initialized to |0⟩ under the annealing Hamiltonian equation (18), and whose denominator is the final infidelity of the three-qubit encoded state in the code subspace with respect to the target quantum state. As $\dot{a}(t),\dot{b}(t)\to 0$ , the target quantum state becomes the ground state of the target Hamiltonian.

As shown in figure 7, at relatively low γ, the Bayesian model achieves the highest reduction factor in scheme A, while both the Bayesian and the RNN-based model outperform the double threshold. However at sufficiently high error rates γ, the encoded qubits under active correction using any of the three models show no improvement over a single unencoded qubit, as expected.

6. Discussion

We have proposed an RNN-based CQEC algorithm that is able to outperform the popular double threshold algorithm across all tasks for each of the four simulation schemes tested in section 5. This result holds regardless of whether the algorithms are protecting a system from bit-flip errors or from amplitude damping, and applies in the case of both quantum memory and quantum annealing. The relative performance of the three models does not depend significantly on the underlying error rate or the duration of the experiment, unless either of these values is exceptionally large.

The mathematical simplicity of equation (1) is a product of many idealized assumptions, so we can expect that measurements taken from real quantum devices will not necessarily be as easy to describe. Our analysis of superconducting qubit measurements in section 3 reveals several examples of non-ideal behavior in both the syndrome and noise distributions, and we expect similar findings in the outputs of other devices. While some signal imperfections can be accounted for in traditional CQEC algorithms, such as the incorporation of auto-correlations into the Bayesian classifier, most of them will not be easy to precisely characterize. It is in these situations that neural networks can best demonstrate their advantage, since they do not require any a priori description of the patterns within the measurement signals, but instead learn them directly from the training data. An interesting direction for further study is the extension of the RNN-based CQEC algorithm to correlated and leakage errors.

A CQEC algorithm should be practical to run on a sub-microsecond timescale, typically using an FPGA or other programmable, low-latency device. The Bayesian model requires division to normalize the posteriors, which is a very costly operation on FPGAs. This makes it challenging to efficiently implement the Bayesian model, although a more practical log-Bayesian approach has recently been developed [40]. The RNN-based model, by contrast, does not require division and avoids this problem. There are many precedents for running RNNs on FPGAs (see e.g. [41]). Since the RNN architecture used in our paper is small in size (more simplifications are discussed in appendix C), its computational latency is sub-microsecond. Nevertheless, more work will be needed in order to determine how best to interface the RNN with the quantum computer in a feedback loop. For supervised learning, there is the need for generating a sufficient amount of training data that incorporates the error information and the signal features. Further work could focus on determining the minimum amount and type of data that the RNN needs to train effectively, and understand how these needs change as the number of physical qubits in the error code increases.

Given low-latency implementations of the Bayesian and RNN-based models, an obvious next step for future work would be a direct comparison between these CQEC protocols and existing discrete QEC protocols on quantum hardware. Ristè et al [42] have already demonstrated discrete QEC for a three-qubit bit-flip code on transmons, and recent work by Livingston et al [22] has implemented a triple threshold CQEC protocol on similar hardware. By running experiments on a given physical device, a full comparison between discrete and continuous CQEC can be made under realistic conditions. Due to the lack of both entangling gates and ancillas, we are optimistic that CQEC could significantly improve the speed and fidelity of many QEC codes.

Acknowledgment

We would like to thank Philippe Lewalle, John Preskill, Kai-Isaak Ellers, and John Paul Marceaux for helpful discussions. HL and IC were supported by the National Aeronautics and Space Administration under Grant/Contract/Agreement No. 80NSSC19K1123 issued through the Aeronautics Research Mission Directorate. SZ, HNN, and KBW were supported by the US Department of Energy, Office of Science, National Quantum Information Science Research Centers, Quantum Systems Accelerator. WPL and IS were supported by the US Army Research Laboratory and the US Army Research Office under Contract/Grant No. W911NF-17-S-0008. Publication made possible in part by support from the Berkeley Research Impact Initiative sponsored by the UC Berkeley Library.

Data availability statement

The source code that support the findings of this study are openly available at the following URL/DOI: https://colab.research.google.com/drive/1NSwz4Qy3SlfE-fptz59-hj3880QJ7DVj?usp=sharing. The physical experiment data are available upon request.

Appendix A.: Homodyne measurements

The interaction Hamiltonian for the transmission line and the cavity field is given by

$\begin{equation}H=\mathrm{i}\sqrt{\frac{\gamma }{{\Delta}t}}(b{a}^{{\dagger}}-{b}^{{\dagger}}a),\end{equation} \tag{ A1 }$

where γ is the coupling strength and Δt is some coarse-grained time-scale in the collision model (see equations (14), (16) and (17) in [43]), b and a are the lowering operators of the cavity field and the transmission line, respectively.

The original Hamiltonian in equation (A1) then generates a unitary which we keep up to order Δt:

$\begin{equation*}U={\mathrm{e}}^{-\mathrm{i}H{\Delta}t}\approx 1+\sqrt{\gamma {\Delta}t}(b{a}^{{\dagger}}-{b}^{{\dagger}}a)-\frac{\gamma }{2}{(b{a}^{{\dagger}}-{b}^{{\dagger}}a)}^{2}{\Delta}t.\end{equation*}$

The homodyne measurement readouts the quadrature basis of the probe, in-phase I, quadrature Q, or some linear combination thereof, and can be implemented by a variety of devices. In our physical experiments, we use JPAs. For our analysis, we will measure in the I quadrature, in which we construct the quadrature operator R = a + a^†. Measuring in this basis, the output is a continuous variable r with associated Kraus operators [44]

$\begin{equation*}\begin{aligned}\hfill {{\Omega}}_{r}& =\langle r\vert U\vert 0\rangle \hfill \\ \hfill & =\langle r\vert 0\rangle +\langle r\vert 1\rangle \sqrt{\gamma {\Delta}t}b-\frac{\gamma }{2}{\Delta}t\left(\langle r\vert 0\rangle {b}^{{\dagger}}b+\langle r\vert 2\rangle \sqrt{2}{b}^{2}\right)\hfill \\ \hfill & =\langle r\vert 0\rangle \left[1+r\sqrt{\gamma {\Delta}t}b-\frac{\gamma }{2}{\Delta}t({b}^{{\dagger}}b-({r}^{2}-1){b}^{2})\right],\hfill \end{aligned}\end{equation*}$

where $\langle r\vert 0\rangle ={(2\pi )}^{-1/4}\mathrm{exp}(-{r}^{2}/4)=\sqrt{{P}_{0}(r)}$ is the probes's ground state in the position basis and P₀(r) is the probability of measuring r when the probe is in the ground state. In the last line, we have used the Hermite polynomials to express the harmonic oscillator's first and second excited states in terms of its ground state.

We determine the probability of measuring a particular outcome r as

$\begin{equation*}\begin{aligned}\hfill {p}_{r}& ={\langle {{\Omega}}_{r}^{{\dagger}}{{\Omega}}_{r}\rangle }_{\rho }\hfill \\ \hfill & ={P}_{0}(r)\left[1+r\sqrt{\gamma {\Delta}t}{\langle b+{b}^{{\dagger}}\rangle }_{\rho }+\gamma {\Delta}t({r}^{2}-1){\langle {b}^{{\dagger}}b\rangle }_{\rho }\right],\hfill \end{aligned}\end{equation*}$

where the average is taken over the states ρ of the cavity field coupled to the transmons [45].

If we approximate r as a Gaussian variable, we then want to determine the mean and variance of this:

$\begin{equation*}\begin{aligned}\hfill & {\langle r\rangle }_{\rho }={\int }_{-\infty }^{\infty }r{p}_{r}\,\mathrm{d}r=\sqrt{\gamma {\Delta}t}{\langle b+{b}^{{\dagger}}\rangle }_{\rho },\hfill \\ \hfill & {\langle {r}^{2}\rangle }_{\rho }={\int }_{-\infty }^{\infty }{r}^{2}{p}_{r}\,\mathrm{d}r=1.\hfill \end{aligned}\end{equation*}$

Let ΔW be drawn from a Gaussian distribution with variance Δt. The statistics of the measurement record of r can be reproduced by

$\begin{equation}r\sqrt{{\Delta}t}=\sqrt{\gamma }{\langle b+{b}^{{\dagger}}\rangle }_{\rho }{\Delta}t+{\Delta}W.\end{equation} \tag{ A2 }$

The voltage operator to be measured will be of the form

$\begin{equation*}\hat{V}\propto \frac{a+{a}^{{\dagger}}}{\sqrt{{\Delta}t}},\end{equation*}$

resulting in a classical voltage

$\begin{equation*}V=A\frac{r}{\sqrt{{\Delta}t}},\end{equation*}$

where A is a constant scaling factor in units of V ⋅ s^1/2 characterising the physical noise power in a certain bandwidth. Using equation (A2), the measured voltage V, which is written in terms of

$\begin{equation}V{\Delta}t=A\left(\sqrt{\gamma }{\langle b+{b}^{{\dagger}}\rangle }_{\rho }{\Delta}t+{\Delta}W\right),\end{equation} \tag{ A3 }$

has variance that scales as Δt⁻¹. The state of the transmons can be inferred from the homodyne measurement voltage in equation (A3) [45].

To implement a single parity measurement on two qubits, we dispersively couple two qubits to the same readout resonator. We tune the qubits to have the same dispersive coupling to the resonator so that the states |01⟩ and |10⟩ are indistinguishable on the I–Q plane. By making the dispersive shift χ much larger than the linewidth κ of the resonator, we can make the reflected phase of |00⟩ (close to π) and |11⟩ (close to −π) overlap with one another, making them indistinguishable as well. The reflected phase response is shown in figure 9. Altogether we implement a full parity measurement of odd excitations vs even excitations by measuring the I quadrature. In our experiment, we implement two of these full parity measurements—one between qubits 1 and 2 and the other between qubits 2 and 3 [22].

Appendix B.: Quantum annealing simulations

We adopt the jump/no-jump method for bit-flip errors. In this method, gradual decoherence due to the third term in equation (3) is described as the average effect of bit-flip errors X_q occurring at random times. At a finite time interval [t, t + Δt], a bit-flip error X_q occurs with probability γ_qΔt. If this error occurs, the quantum state jumps from ρ_t to ρ_t+Δt = X_q ρ_t X_q. Otherwise, the quantum state continuously evolves without environmental decoherence. On averaging over many instances of the bit-flip errors, the jump/no-jump approach reduces to the open quantum system model, where errors continuously change the mixed system state ρ(t).

In simulating the coherent evolution, we use the first-order Magnus expansion [46] of the annealing Hamiltonian H(t) in equation (17) at every finite time interval [t, t + Δt], ${\tilde{U}}_{t}=\mathrm{exp}\left[-\mathrm{i}H({t}^{\prime }){\Delta}t\right]$ where t' = t + Δt/2, such that the quantum state evolves as ${\rho }_{t+{\Delta}t}={\tilde{U}}_{t}{\rho }_{t}{\tilde{U}}_{t}^{{\dagger}}$ .

We average over 10 000 quantum trajectories obtained through the above-mentioned steps to simulate the ensemble density states ρ_t.

Appendix C.: RNN hidden state size vs performance

It is desirable to limit the size of the RNN to achieve sufficiently low computational latency in real-time experiments. We present the performance in state tracking in quantum memory as described in section 5.1 for the LSTM and GRU architectures with different hidden sizes in table 1. In examining the performance, we see that although we used LSTM with a hidden size 32 in our simulated experiments, it is possible to shrink the size of the network to 16 without harming the performance. We note that a smaller hidden size means smaller matrix-vector multiplications in computing the model, which then requires fewer memory resources in practice. The possible simplification is also suggested by the fact that the learning curves with a hidden size of 16 is very similar to that with a hidden size of 32, as shown in figure 8. Additionally, it is viable to use the GRU architecture to achieve the same performance. These results suggest that the RNN-based model may have a simpler structure and an even faster computation speed in real-time implementation on programmables like FPGAs.

Table 1. The testing performance of LSTM (top) and GRU (bottom) with different hidden sizes and the corresponding number of trainable parameters. The testing performance is measured by the final excited states population P_exc. The hidden size determines the largest matrix-vector multiplication operation performed when computing the model.

Hidden size	8	16	32	64
Parameter count	1064	3256	13 448	51 464
Final P_exc(±0.002)	0.851	0.880	0.884	0.882

Hidden size	8	16	32	64
Parameter count	816	2776	10 152	38 728
Final P_exc(±0.002)	0.816	0.880	0.879	0.881

**Figure 8.** The learning curves of LSTMs with hidden sizes of 16 and 32, and of GRUs with hidden sizes of 16 and 32, on the state tracking task in quantum memory as described in section 5.1. The accuracy is defined to be the fidelity with respect to the initial state averaged across all time steps, and the loss is computed by equation (16).
Download figure:
Standard image High-resolution image

We note that the size of the RNN can be further reduced, if assuming a fixed initial state so that the input to the RNN shown in equation (15) can be replaced by $x={[{I}_{1,t},{I}_{2,t}]}^{\mathrm{T}}$ .

Appendix D.: Resonator transients

The resonator transients are manifested from the varying SNR before the qubit-state-dependent coherent states |α_ζη(t)⟩ of the microwave field in the cavity reach their steady states when the resonator linewidth κ is small, where ζ, η ∈ {e, g} and e/g denotes the excited/ground state. The complex field amplitude ${\langle \hat{a}\rangle }_{\zeta \eta }={\alpha }_{\zeta \eta }$ given that the qubits are in state ζη satisfies [10, 45, 47]

$\begin{equation}\begin{cases}{\dot{\alpha }}_{\mathrm{e}\mathrm{e}}(t)=-\mathrm{i}\varepsilon -\mathrm{i}({\delta }_{r}+2\chi ){\alpha }_{\mathrm{e}\mathrm{e}}(t)-\frac{\kappa }{2}{\alpha }_{\mathrm{e}\mathrm{e}}(t),\quad \hfill \\ {\dot{\alpha }}_{\mathrm{g}\mathrm{g}}(t)=-\mathrm{i}\varepsilon -\mathrm{i}({\delta }_{r}-2\chi ){\alpha }_{\mathrm{g}\mathrm{g}}(t)-\frac{\kappa }{2}{\alpha }_{\mathrm{g}\mathrm{g}}(t),\quad \hfill \\ {\dot{\alpha }}_{\mathrm{e}\mathrm{g}}(t)=-\mathrm{i}\varepsilon -\mathrm{i}{\delta }_{r}{\alpha }_{\mathrm{e}\mathrm{g}}(t)-\frac{\kappa }{2}{\alpha }_{\mathrm{e}\mathrm{g}}(t),\quad \hfill \\ {\dot{\alpha }}_{\mathrm{g}\mathrm{e}}(t)=-\mathrm{i}\varepsilon -\mathrm{i}{\delta }_{r}{\alpha }_{\mathrm{g}\mathrm{e}}(t)-\frac{\kappa }{2}{\alpha }_{\mathrm{g}\mathrm{e}}(t),\quad \hfill \end{cases}\end{equation} \tag{ D1 }$

where ɛ is the amplitude of the driving tone, χ is the dispersive shift and δ_r = ω_r − ω_d is the detuning of the measurement drive to the bare cavity frequency.

The steady state $({\dot{\alpha }}_{\zeta \eta }=0)$ solutions to the above equations are

$\begin{equation*}\begin{cases}{\alpha }_{\mathrm{e}\mathrm{e}/\mathrm{g}\mathrm{g}}=\frac{-2\varepsilon }{2({\delta }_{r}\pm 2\chi )-\mathrm{i}\kappa },\quad \hfill \\ {\alpha }_{\mathrm{e}\mathrm{g}}={\alpha }_{\mathrm{g}\mathrm{e}}=\frac{-2\varepsilon }{2{\delta }_{r}-\mathrm{i}\kappa }\quad \hfill \end{cases}\end{equation*}$

with + for ee and − for gg.

In our parity measurement, we probe at the shared odd excitation resonance, which is also the same as the bare cavity frequency, i.e., δ_r = 0. The cavity resonance when the qubits are in |11⟩ is shifted from the bare cavity resonance by 2χ/2π = −4 MHz, while the resonance when the qubits are in |00⟩ is shifted from the bare frequency by −2χ/2π = 4 MHz (see figure 9). This results in an asymmetry between the paths in phase-space leading up to the steady states when the qubit pair changes parity.

**Figure 9.** The reflected phase response of different two-qubit states. χ₀/2π ≈ χ₁/2π ≈ −2 MHz, where χ₀ and χ₁ correspond to the two resonators for the two parity measurements. We set the probing frequency to be at the center of the odd-parity resonance. Reproduced with permission from [22].
Download figure:
Standard image High-resolution image

**Figure 10.** The pointer state paths leading up to the steady state in the phase space, with κ/2π = 800 kHz, χ/2π = −2 MHz, δ_r = 0 and ɛ set to 1. When the qubit pair goes from an even parity to an odd parity, e.g., |00⟩ → |10⟩, the blue line is the path of α_eg(t) while the blue cross shows the steady state of α_gg, obtained from equation (D2). When the qubit pair goes from an odd parity to an even parity, e.g., |10⟩ → |00⟩, the orange spiral curve is the path of α_gg while the orange cross shows the steady state of α_eg, obtained from equation (D3).
Download figure:
Standard image High-resolution image

When the qubits go from an even-parity state to an odd-parity state, e.g., |00⟩ → |10⟩, solving ${\dot{\alpha }}_{\mathrm{e}\mathrm{g}}(t)$ in equation (D1) with the initial coherent state at α_gg yields the path α_eg(t) specified by

$\begin{equation}\begin{cases}{\alpha }_{\mathrm{g}\mathrm{g}}(t)={\alpha }_{\mathrm{g}\mathrm{g}}\quad \hfill \\ {\alpha }_{\mathrm{e}\mathrm{g}}(t)=\left({\alpha }_{\mathrm{g}\mathrm{g}}+\frac{2\mathrm{i}\varepsilon }{\kappa +2\mathrm{i}{\delta }_{r}}\right){\text{e}}^{-\text{i}{\delta }_{r}t-\frac{\kappa }{2}t}-\frac{2\mathrm{i}\varepsilon }{\kappa +2\mathrm{i}{\delta }_{r}}.\quad \hfill \end{cases}\end{equation} \tag{ D2 }$

When the qubits go from an odd-parity state to an even-parity state, e.g., |10⟩ → |00⟩, solving ${\dot{\alpha }}_{\mathrm{g}\mathrm{g}}(t)$ in equation (D1) with the initial coherent state at α_gg yields the path α_gg(t) specified by

$\begin{equation}\begin{cases}{\alpha }_{\mathrm{g}\mathrm{g}}(t)=\left({\alpha }_{\mathrm{e}\mathrm{g}}+\frac{2\mathrm{i}\varepsilon }{\kappa +2\mathrm{i}({\delta }_{r}-2\chi )}\right){\text{e}}^{-\text{i}({\delta }_{r}-2\chi )t-\frac{\kappa }{2}t}-\frac{2\mathrm{i}\varepsilon }{\kappa +2\mathrm{i}({\delta }_{r}-2\chi )}\quad \hfill \\ {\alpha }_{\mathrm{e}\mathrm{g}}(t)={\alpha }_{\mathrm{e}\mathrm{g}}.\quad \hfill \end{cases}\end{equation} \tag{ D3 }$

These paths are shown in figure 10 . Strictly speaking, the two sets of solutions apply when there are no dynamics apart from the dispersive measurements.

**Figure 11.** The measurement rate Γ(t) on a pair of qubits with a bit-flip transition at t = 0, with κ/2π = 800 kHz, χ/2π = −2 MHz, δ_r = 0 and ɛ set to 1. The upper figure corresponds to the qubit pair transitioning from an even parity to an odd parity, obtained from equation (D2). The lower figure corresponds to the qubit pair transitioning from an odd parity to an even parity, obtained from equation (D3).
Download figure:
Standard image High-resolution image

The measurement strength is defined as [45, 48]

$\begin{equation*}{\Gamma}(t)=\frac{1}{2}\kappa {\vert {\alpha }_{\mathrm{g}\mathrm{g}}(t)-{\alpha }_{\mathrm{e}\mathrm{g}}(t)\vert }^{2},\end{equation*}$

which scales the separation of the two parity signal means under constant noise variance (see equation (1)). In the odd-to-even parity transition, the path in phase-space leading up to the steady states forms a tighter spiral as the ratio |χ/κ| gets larger. A tighter spiral translates to a more oscillatory Γ(t), thus leading to a more oscillatory signal mean [10].

Shown in figure 11, the ring-up transient without clear oscillations is manifested in the measurement strength corresponding to the even-to-odd parity transition in equation (D2), whereas the ring-down transient with oscillations is manifested in the measurement strength corresponding to the odd-to-even parity transition in equation (D3). They show good agreement with experimental observations, such as those in figure 1.

**Figure 12.** The final fidelity with respect to the initial state |000⟩ in schemes A and D with the double threshold exponential filter (DT) in [21], and the double threshold boxcar filter (DT boxcar) in [20], as a function of single qubit bit-flip rate γ at an operation time T = 20 μs with a measurement strength Γ_m = 4.7 μs⁻¹. Each data point is averaged over 30 000 quantum trajectories.
Download figure:
Standard image High-resolution image

Appendix E.: Population of states subject to amplitude damping or bit-flips

We recall that the population of the excited states P_exc is the ensemble population of the states that are at most one bit-flip away from the fully excited state |111⟩, i.e., P_exc = P(|111⟩) + P(|110⟩) + P(|101⟩) + P(|011⟩) = P₇ + P₆ + P₅ + P₃.

Under T₁ decay at zero temperature, the transition matrix evolving the states for time T is J'(T) = exp(Q'T), where Q' is defined as,

$\begin{equation}{Q}^{\prime }=\gamma \left[\begin{matrix}\hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 1\hfill & \hfill -1\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 1\hfill & \hfill 0\hfill & \hfill -1\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill 1\hfill & \hfill -2\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 1\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill -1\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill & \hfill -2\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill & \hfill -2\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill & \hfill 1\hfill & \hfill 1\hfill & \hfill -3\hfill \end{matrix}\right].\end{equation} \tag{ E1 }$

The state probabilities under the Markov chain are given by P(T) = J'(T)P(0), which yields

$\begin{equation*}{P}_{\text{exc}}(T)=\left(3\,{\mathrm{e}}^{\gamma T}-2\right){\mathrm{e}}^{-3\gamma T}.\end{equation*}$

Under only bit-flip errors X_q, the transition matrix evolving the states for time T is J(T) = exp(QT), where Q is defined in equation (10). The resultant population of excited states is

$\begin{equation*}{P}_{\text{exc}}(T)={\mathrm{e}}^{-3\gamma T}\,{\mathrm{c}\mathrm{o}\mathrm{s}\mathrm{h}}^{2}(\gamma T)\left[3\,\mathrm{sinh}(\gamma T)+\mathrm{cosh}(\gamma T)\right].\end{equation*}$

Appendix F.: Performance comparison between the double threshold method and the double threshold boxcar filter

The double threshold boxcar filter in [20] employs a boxcar averaging of the measurement signals and two thresholds, one fixed at zero and the other at a variable position above zero. We compare the performance of this against the double threshold model (with exponential filter and two variable thresholds) from [21] that was used in this work, by running the state tracking task as described in section 5.1 on schemes A and D, as shown in figure 12. The double threshold method outperforms double threshold boxcar in both schemes at relatively low error rates.

Machine learning for continuous quantum error correction on superconducting qubits

Article metrics

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Abstract

1. Introduction

2. Background

3. Problem setup

3.1. Characteristics of CQEC measurement signals

3.2. Impact of auto-correlations

4. Models

4.1. Double thresholds

4.2. Discrete Bayesian classifier

4.3. Recurrent neural network (RNN)

5. Simulated experiments

5.1. Quantum memory state tracking

5.2. Extending T₁ time of the logical qubit

5.3. Protecting against bit-flip errors

5.4. Quantum annealing with time-dependent Hamiltonians

6. Discussion

Acknowledgment

Data availability statement

Appendix A.: Homodyne measurements

Appendix B.: Quantum annealing simulations

Appendix C.: RNN hidden state size vs performance

Appendix D.: Resonator transients

Appendix E.: Population of states subject to amplitude damping or bit-flips

Appendix F.: Performance comparison between the double threshold method and the double threshold boxcar filter

Footnotes

Machine learning for continuous quantum error correction on superconducting qubits

Article metrics

Share this article

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Abstract

1. Introduction

2. Background

3. Problem setup

3.1. Characteristics of CQEC measurement signals

3.2. Impact of auto-correlations

4. Models

4.1. Double thresholds

4.2. Discrete Bayesian classifier

4.3. Recurrent neural network (RNN)

5. Simulated experiments

5.1. Quantum memory state tracking

5.2. Extending T1 time of the logical qubit

5.3. Protecting against bit-flip errors

5.4. Quantum annealing with time-dependent Hamiltonians

6. Discussion

Acknowledgment

Data availability statement

Appendix A.: Homodyne measurements

Appendix B.: Quantum annealing simulations

Appendix C.: RNN hidden state size vs performance

Appendix D.: Resonator transients

Appendix E.: Population of states subject to amplitude damping or bit-flips

Appendix F.: Performance comparison between the double threshold method and the double threshold boxcar filter

Footnotes

5.2. Extending T₁ time of the logical qubit