Bulk fault-tolerant quantum information processing with boundary addressability

We present a fault-tolerant (FT) semi-global control strategy for universal quantum computers. We show that an N-dimensional array of qubits where only (N−1)-dimensional addressing resolution is available is compatible with FT universal quantum computation. What is more, we show that measurements and individual control of qubits are required only at the boundaries of the FT computer. Our model alleviates the heavy physical conditions on current qubit candidates imposed by addressability requirements and represents an option for improving their scalability.


Introduction
In recent years, much effort has been devoted to the realization of what was first envisioned by Shor [1]: a quantum computer. To achieve this, one basically requires the following: a set of qubits and the ability to prepare, manipulate (through gates) and measure quantum information stored in those qubits [2]. In the face of errors, one must also execute fault-tolerant (FT) quantum error correction (QEC) to enable large-scale quantum information processing [3]. For fault-tolerant universal quantum computing (FTUQC) [4], one typically needs a high degree of parallelism or simultaneity. In particular, traditional schemes require (i) complete addressability, (ii) sufficiently low error rates for operations and (iii) simultaneity of such operations. Such tough requirements imply that the experimental execution of FT schemes will be technically very challenging. For example, ion traps have fairly low error rates ($\sim 10^{-3}$) and good addressability, but scaling ion traps to a large number of ions where many gates can be executed simultaneously is highly challenging [5]. On the other hand, neutral atoms trapped in optical lattices offer large numbers of qubits and simultaneous control, but individual addressing is very challenging [6]. In general, there is a technological trade-off between the capability to perform individual addressing and large-scale simultaneous addressing. This raises the question: which of the requirements (i)-(iii) can be relaxed while still achieving FTUQC with a reasonable threshold? Relaxing the requirements may pave the way for implementations to scale up the number of qubits and achieve FT quantum computation.
In a previous paper [7], we found ways to overcome one challenge inherent in criteria (ii) and (iii), specifically faulty measurements that are difficult to apply simultaneously, by removing measurement from QEC. It had been known for some time that this was possible [4], but we were able to show that the associated threshold error rate for FT quantum computation was competitive with strategies using measurement ([13] and references therein). One can reduce, in principle, the addressing requirements using global control techniques [9,10].
In global control, gates do not have to be individually addressed and one can achieve universal quantum computing using global operations, such as always-on interactions plus homogeneous pulses. The preparation of specific quantum states can be replaced with a resetting process plus a gate; moreover, the resetting process can be executed in parallel, greatly reducing the addressing requirements. Only measurements seem to require addressing. This apparent requirement poses heavy experimental constraints when one demands fault tolerance: error correction (EC) in every logical qubit simultaneously, at every computational time step, which means that measurement-aided syndrome extraction would require one to simultaneously and distinguishably measure the state of many qubits. Distinguishable measurements with good accuracy may be possible in small qubit registers, but they will be very challenging to scale up [8]. Consequently, to avoid these and other issues, in [7] we introduced circuits to remove measurements from QEC protocols and greatly reduce the role of measurements in FTUQC. It should be noted that we used resetting, or fresh ancillas, in order to flush entropy from the system; resetting is, however, experimentally friendlier than measurements. Our results showed two things: (I) gates must be quite accurate, while preparation and measurements can have error rates as high as 33%; and (II) because measurements are removed, simultaneous and parallel FT QEC can be executed over many logical qubits.
In this paper, we explore in detail the consequences of (II) and propose a novel concept with the potential for not only greatly alleviating the measurement-related issues mentioned above, but also reducing some of the addressability constraints that typically burden FT computer designs. We show for the first time that FT bulk universal quantum computing is possible where only the boundaries are addressable. We will argue that such a 'holographic'-like design not only represents a way of reducing the number of individually addressable qubits, but also inherits the results we obtained in [7] regarding the tolerable high error rates for fault tolerance.
This paper is organized as follows. In section 2, we discuss the addressing required in typical FTUQC designs and describe the schematic idea of a semi-global, with boundary addressing only, control scheme, which reduces the number of controls and the amount of addressing required while maintaining fault tolerance. In section 3, we formalize the scheme and explicitly describe the necessary tools. In particular, we describe FT routines that make minimal use of measurements; these measurements can be carried out exclusively on the boundary (or offline). We then show how our model can be mapped to a lower dimensionality array but now requiring additional next-to-nearest neighbor gates. We discuss in section 4 the error threshold for this design and, finally, in section 5 we proceed to make an analysis of the resources required and show that the number of controls in our strategy has only a weak dependence on the number of computational qubits, as opposed to traditional fully addressable architectures.

A semi-globally controlled quantum computer
We initially consider a three-dimensional (3D) array of qubits with nearest-neighbor interactions, although we urge the reader to keep in mind that our real goal is to show that an N-dimensional array, N = 2, 3, can execute FT quantum computation when only (N − 1)-dimensional individual addressing is available. We will first develop the 3D model and then, in section 3.1, argue that the model can work in a 2D array with 1D individual addressing. A 3D array of qubits makes efficient use of space; however, in a 3D spatial array, it will typically be hard to measure/manipulate individual qubits in the bulk of the array, i.e. to achieve 3D addressing resolution. Instead, we assume a 2D addressing resolution, i.e. the ability to address lines of qubits in the array.
Figure 1. Schematic of the addressability requirements and qubit distribution of our semi-global architecture. Vertical global pulses are capable of executing the single-qubit (green) and two-qubit gates (red) described in the text. Computation is achieved through the vertical nearest-neighbor-independent $\tilde{T}$ pulses, which can be decomposed into two subroutines (black and brown $\tilde{T}$ pulses), by virtue of the ABAB addressability, so as to ensure fault tolerance. The end-planes (darker blue) are fully addressable independently of the bulk of the computer, and must have extra space to accommodate an $|H\rangle_L$ state encoded at the highest level of concatenation in order to execute the non-Clifford encoded gates. All planes contain enough qubits to hold an encoded qubit and the ancillas required for its unitary quantum error correction (UQEC). We also require that every plane has physical qubits that can be reset (simultaneously in all planes) for use as a resource for algorithmic cooling; note that the resetting operation assumes no single-plane addressability.
We label an indexed array of 3D locations by the coordinates $(x, y, z)$, where each coordinate $s \in [1, N_s]$ for $s \in \{x, y, z\}$, and we label the addressable lines in the 3D array by $(x, y)$. The action of a single-qubit gate addressing the line $(x, y)$ is given by $U_{(x,y)} = \prod_z U_{(x,y,z)}$, whereas two-qubit gates between addresses $(x, y)$ and $(x', y')$ are given by $V_{((x,y),(x',y'))} = \prod_z V_{(x,y,z),(x',y',z)}$, with the obvious generalization for multi-qubit gates. Approaching measurements this way is not practical as it does not allow one to discriminate individual qubit measurement results along z. We shall assume that all measurements are carried out at the boundaries. In fact, we allow the possibility of executing any operation on the z-boundaries, i.e. $O_{(x,y,1)}$ and $O_{(x,y,N_z)}$ for any $(x, y)$. The addressing limits described above impose a constraint on the type of gates we can execute in the z-direction: for example, we cannot directly execute a gate of the form $V_{(x,y,z),(x,y,z')}$ (figure 1). We note that we allow long-range interactions within every z-plane, but that one can restrict to nearest-neighbor gates with a slight reduction of the FT threshold due to the introduction of intermediate SWAP gates. We require the ability to execute nearest-neighbor CZ gates in the z-direction along $(x, y)$ columns. We will show that this limited addressability, where we can only address columns of qubits in the 3D array, yields universal quantum computing and, more importantly, fault tolerance.
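The column addressing described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that every site holds an independent single-qubit state vector (a product state), and it uses 0-based indices rather than the 1-based coordinates of the text. The names `apply_column_gate` and `state` are hypothetical and are not taken from the paper.

```python
import numpy as np

def apply_column_gate(state, gate, x, y):
    """Apply the same single-qubit `gate` to every qubit along z in column (x, y).

    `state` has shape (Nx, Ny, Nz, 2): one single-qubit state vector per site
    (product-state assumption, for illustration only).
    """
    new_state = state.copy()
    # einsum applies the 2x2 gate to each of the Nz state vectors in the column
    new_state[x, y] = np.einsum("ab,zb->za", gate, state[x, y])
    return new_state

X = np.array([[0, 1], [1, 0]], dtype=complex)

# A 2 x 2 x 3 array with every qubit initialised in |0>
state = np.zeros((2, 2, 3, 2), dtype=complex)
state[..., 0] = 1.0

# One semi-global pulse flips the whole column (x=0, y=1) and nothing else
flipped = apply_column_gate(state, X, x=0, y=1)
```

A single call acts on all $N_z$ qubits of the chosen column at once, which is why a result along z cannot be resolved qubit by qubit with this kind of control.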

Global control
Let us consider the global control model introduced in [9] (which is closely related to [10]). In [9], the authors consider a 1D array of qubits, e.g. consider the single column $x = 1 = y$ in our 3D array, and show that with the global operators and non-Clifford gates executed at the edges of the 1D array ($z = 1, N_z$), universal quantum computation is possible within this 1D array. Their basic idea uses the global gate $T = CZ \cdot H$ to propagate information in a controlled way within the 1D array. $T^{N_z+1}$ is equivalent to executing a spatial reflection of the information stored in the 1D sub-array along the z-direction, $T^{N_z+1}: \rho_{(1,1,z)} \to \rho_{(1,1,N_z+1-z)}$, where $\rho$ is the density matrix of the array. By executing Clifford and non-Clifford gates at the boundaries of the 1D array and between the T-pulses, it was shown that universal quantum computing is possible. See [9] for details. The problem with global control is its apparent incompatibility with fault tolerance. The use of global pulses, usually in the form of nearest-neighbor gates, leads to two serious obstacles towards implementing FT QEC: (i) traditional global control techniques may give rise to correlated errors in a codeword, which although not necessarily lethal for fault tolerance, are known to reduce the error correcting capabilities of a code [11] and, perhaps most relevant, (ii) the uncontrolled propagation of errors to multiple locations, e.g. due to the interaction of a faulty qubit with others.
Although some evidence for the existence of an FT threshold using global addressing exists [14], no such threshold has been calculated to date. In any case, as global control uses many global operations to implement a single logical gate, globally simulating traditional EC circuits can only lead to a worse threshold.
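The reflection property of the global pulse can be checked numerically on short chains. The sketch below assumes that T means a Hadamard on every qubit followed by CZ gates between all nearest neighbors of an open chain; the helper names (`global_T`, `reflection`, `equal_up_to_phase`) are illustrative only. It compares $T^{N_z+1}$ with the permutation that mirrors the chain, up to a global phase.

```python
import numpy as np
from functools import reduce

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def bits_of(b, n):
    """Bits of basis index b, most significant first (qubit 0 first)."""
    return [(b >> (n - 1 - q)) & 1 for q in range(n)]

def global_T(n):
    """T = CZ . H on an open chain of n qubits (H on all, then CZ on all neighbours)."""
    hadamards = reduce(np.kron, [H] * n)
    phases = np.ones(2 ** n)
    for b in range(2 ** n):
        bits = bits_of(b, n)
        for q in range(n - 1):
            if bits[q] and bits[q + 1]:
                phases[b] *= -1          # each CZ contributes -1 on |11>
    return np.diag(phases) @ hadamards

def reflection(n):
    """Permutation matrix sending qubit z to qubit n+1-z."""
    R = np.zeros((2 ** n, 2 ** n))
    for b in range(2 ** n):
        mirrored = bits_of(b, n)[::-1]
        rb = sum(bit << (n - 1 - q) for q, bit in enumerate(mirrored))
        R[rb, b] = 1
    return R

def equal_up_to_phase(A, B, tol=1e-9):
    idx = np.unravel_index(np.argmax(np.abs(B)), B.shape)
    phase = A[idx] / B[idx]
    return abs(abs(phase) - 1) < tol and np.allclose(A, phase * B, atol=tol)

def reflects(n):
    """True if T^(n+1) mirrors an n-qubit chain (up to a global phase)."""
    Tn = np.linalg.matrix_power(global_T(n), n + 1)
    return equal_up_to_phase(Tn, reflection(n))
```

For example, `reflects(3)` checks that four T pulses swap the two end qubits of a three-qubit chain while leaving the middle one in place.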
Rather than propose a fully globally addressed architecture, we here propose to use a hybrid strategy where, while the addressing requirements are reduced, fault tolerance is still possible. The central concept of our hybrid design is to trap any possible correlated error so that it would not affect more than one qubit in every logical qubit (or more than one depending on the QEC code we are using; for simplicity, we will assume in this paper distance 3 codes). More importantly, we restrict the direction in which errors can propagate, ensuring that correlated errors propagate within separate codewords. To arrange for this, we consider a 3D cubical array of physical qubits, where each x-y plane contains a logical qubit encoded in a CSS code, e.g. the Bacon-Shor (BS) code [17], the Steane code [15], etc. The encoded T gate is then a set of vertical, nearest-neighbor gates in the z-direction that, given the choice of code, are transversal, i.e. bitwise, where $T_{(x,y)} = \prod_z CZ_{(x,y,z),(x,y,z+1)} H_{(x,y,z)}$. We note that a single faulty physical $T_{(x,y)}$, i.e. any combination of errors in the gates composing $T_{(x,y)}$, can generate a correlated error, but such an error will only affect one qubit in every plane, i.e. in every logical/encoded qubit. Furthermore, to avoid simultaneous CZ gates targeting or controlling more than one logical plane, we execute T in two steps to get an FT version, $\tilde{T}$, in which the CZ gates between planes $2n-1$ and $2n$, $CZ^{(o-e)}$, and those between planes $2n$ and $2n+1$, $CZ^{(e-o)} = \prod_{x,y,n} CZ_{(x,y,2n),(x,y,2n+1)}$, are applied in separate steps. This would require an ABAB plane addressability where we are able to execute AB and BA nearest (plane) neighbor operations. Moreover, if we use the BS code, the CZ is not completely transversal, but requires an extra π/2 physical rotation of alternate x-y planes in the 3D array.
To avoid this, particularly for the BS code, we benefit from the ABAB-plane addressability: (i) even planes will be encoded in the BS code, while odd planes are encoded in a π/2 rotated BS code and (ii) even and odd planes will have independent ECs according to the standard or rotated BS encoding. Other CSS codes, such as the Steane code, would not have this issue as the CZ gates are transversal in this code, but still require the ABAB addressability for the execution of the FT T .
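The two ingredients of the hybrid design, splitting the vertical CZ gates into two layers and confining a column fault to one qubit per plane, can be sketched as follows. Planes are numbered from 1 as in the text; the function names and the worst-case assumption that a faulty column pulse corrupts every qubit it addresses are illustrative choices, not the authors' implementation.

```python
import numpy as np

def cz_layers(n_planes):
    """Split the nearest-neighbour plane pairs (z, z+1) into two layers."""
    odd_even = [(z, z + 1) for z in range(1, n_planes, 2)]   # (1,2), (3,4), ...
    even_odd = [(z, z + 1) for z in range(2, n_planes, 2)]   # (2,3), (4,5), ...
    return odd_even, even_odd

def each_plane_in_one_gate(layer):
    """True if no plane takes part in two CZ gates within the same layer."""
    planes = [z for pair in layer for z in pair]
    return len(planes) == len(set(planes))

def errors_per_plane(n_x, n_y, n_planes, faulty_column):
    """Worst case: the faulty pulse corrupts every qubit of its column."""
    errors = np.zeros((n_x, n_y, n_planes), dtype=bool)
    x, y = faulty_column
    errors[x, y, :] = True
    # Count corrupted qubits inside each x-y plane (i.e. each logical qubit)
    return errors.sum(axis=(0, 1))

odd_even, even_odd = cz_layers(5)
counts = errors_per_plane(3, 3, 5, faulty_column=(1, 2))
```

With a 3 x 3 code in each plane, one faulty column pulse leaves at most a single error per logical qubit, which a distance 3 code can correct.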

Quantum error correction without measurements (unitary quantum error correction)
We now consider how to execute QEC routines given the limited in-plane addressability. Our restriction to columnar semi-global operations forbids the execution of parallel individual measurements along the z-direction. This would indicate that parallel, measurement-aided, EC and thus fault tolerance is not possible. The straightforward solution is to remove measurements from quantum error correcting routines and use suitably designed circuits instead. In [7], we introduced such unitary error correcting routines for the BS code (figure 2(c)) based on the so-called M-gate: a majority voting gadget (figure 2(a)). Although the specific design we use here for the M-gate is tailored for the BS code, we stress the measurement-free scheme below applies independently of this choice of QEC code (see supplementary material in [7]). Armed with the appropriate unitary quantum error correction (UQEC) gadgets and certain transversality properties (which we outline below), the semi-global scheme can be applied to a variety of CSS [15] codes.
The EC scheme used here is based on two routines: an EC routine for the QR code, M, and a mapping between CSS and QR codes, using a gadget we dub the N-gate. The BS code is composed of X- and Z-basis quantum repetition (QR) codes and is defined by the following stabilizer set on a 2D array of 3 × 3 = 9 qubits, $S = \langle X_{*,j} X_{*,j+1},\ Z_{i,*} Z_{i+1,*} \mid i, j \in \{1, 2\} \rangle$, where $X_{*,j} = \prod_i X_{i,j}$ acts on every qubit of column $j$ and $Z_{i,*} = \prod_j Z_{i,j}$ acts on every qubit of row $i$. For this code, the logical Pauli operators are given by $X_L = X_{*,j}$ and $Z_L = Z_{i,*}$, modulo stabilizer operations and gauge operations, i.e. $X_L$ ($Z_L$) acts on a column (row) of the array. This code is a subsystem code and is invariant under pairs of X (Z) operators along any given row (column) because they act only on gauge degrees of freedom, i.e. gauge operations. Given the subsystem structure of the code, one is able to correct by acting on only one row (for X-errors) and on only one column (for Z-errors). An encoded M-gate provides a way of executing BS EC. Here is where the N-gate comes into play: as an interface between the BS and QR codes. Let us explain how the UQEC protocol operates. For illustration, let us assume that we are executing the X-correction stage, i.e. the lower part of figure 2(c). We can extract the syndrome information using BS encoded gates at any level of concatenation; then we use the N-gate to take a BS encoded syndrome to a QR encoding. This is possible because the syndrome used in a particular EC stage only needs protection against one type of error, e.g. during the X-stage, the syndrome must be protected against X errors and not Z errors. After this step, we have three syndrome strings, one for each column of the encoded 3 × 3 array, $(s_1, s_2, s_3)$. If we tried to correct every column, we would destroy the gauge freedom available for the BS code, so in order to maintain gauge freedom we use these strings to vote into a fourth, $s_4 = s_1 \oplus s_2 \oplus s_3$, which will control the final correction. This step is achieved through the $V_N$ routine described in figure 2(b). Note that the $s_4$ string is encoded in the QR code, so it may seem that we would not be able to execute the correction on a BS encoded state. However, the interplay between CSS and QR codes is very interesting: a CNOT gate and, more importantly, a TOFFOLI gate can be executed using QR encoded controls and can target a subset of the CSS encoded state, e.g. targeting one column of the BS encoded state.
Figure 2. (a) The M-gate: an X-encoded majority voting gadget at the (k + 1)th level of concatenation. Here all CNOTs are bitwise, i.e. each CNOT depicted corresponds to three CNOT(k), and the subscript R corresponds to a cyclic k-encoded rotation of the targets of the corresponding gate. In the QR code, the TOFFOLI gate depicted is bitwise. The M-gate can also be designed for a Z-encoded quantum majority voting, with $|\bar{+}^{(k)}\rangle$ ancillas and the obvious Hadamard conjugation of gates. When the need to distinguish them arises, we shall denote X- and Z-encoded majority votings by $M^{(X)}$ and $M^{(Z)}$, respectively. (b) A subroutine acting on ancillas for processing error syndrome information extracted from the data. The circuit shows one row of the fully contracted exRec $V_N(k)$, representing a collection of k-level protected gates acting on row i of ancillas which take part in an EC(k + 1) step. Note that in this circuit the output top lines are discarded, so no EC gadget must protect them. In our circuits, G(k) denotes the implementation of gate G, in terms of (k − 1)-level gates, without the prepending or appending EC(k). (c) Full EC gadget for the BS code. The orange and pink boxes represent the syndrome extraction stage. The control of the gates in boxes is always the top input of the gate. The W gate is a wait (identity) gate, and the last gate on the upper and the lower half is a transversal bTOFFOLI.
To actually execute the $s_4$ correction, we just copy it with a cyclic rotation in order to execute the correction through a bitwise TOFFOLI gate. To see how the voting respects the gauge freedom, consider the following scenarios: a single error, in e.g. column 1, leads to $s_4 = s_1$, which would correctly execute the correction by virtue of the gauge freedom; on the other hand, a gauge-like operation, two X-errors in the same row, leads to $s_4 = s \oplus s = 0$, which correctly implies an identity correction operation. An analogous analysis holds for the Z-EC. This completes our schematic description of the BS UQEC gadget.
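The two scenarios above can be reproduced with a small classical sketch of the syndrome vote. It assumes, as an illustration rather than the paper's circuit, that each column syndrome is the pair of parities between adjacent rows within that column; the function names are hypothetical.

```python
import numpy as np

def column_syndromes(x_errors):
    """Per-column syndrome strings for a 3x3 pattern of X errors.

    x_errors[row, col] = 1 marks an X error. Each column syndrome holds the
    parities of rows (1,2) and rows (2,3) within that column (an assumption).
    """
    e = np.asarray(x_errors) % 2
    return [np.array([e[0, c] ^ e[1, c], e[1, c] ^ e[2, c]]) for c in range(3)]

def voted_syndrome(x_errors):
    """s4 = s1 XOR s2 XOR s3, the string that controls the final correction."""
    s1, s2, s3 = column_syndromes(x_errors)
    return s1 ^ s2 ^ s3

# Scenario 1: a single X error in column 1 (top row)
single = np.zeros((3, 3), dtype=int)
single[0, 0] = 1

# Scenario 2: a gauge-like operation, two X errors in the same row
gauge = np.zeros((3, 3), dtype=int)
gauge[1, 0] = 1
gauge[1, 2] = 1
```

In the first scenario the vote returns the syndrome of the affected column; in the second the two identical column syndromes cancel and no correction is triggered.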
To complete a UQEC scheme capable of achieving fault tolerance, we must provide some fresh ancillas in every x-y plane. Fresh ancillas can be generated via a semi-global reset, which in contrast with measurement does not need to output a result and thus imposes no addressability constraints. Note that by virtue of the $(x, y)$ column addressability, one can reset the same group of ancillas in every plane. A noisy version of this operation that fails with probability $\tilde{p}^{(p)}$ can be modeled by $\rho \to (1 - \tilde{p}^{(p)})\,|0\rangle\langle 0| + \tilde{p}^{(p)}\,\eta$, where $\rho$ is an arbitrary density matrix (of a qubit in each x-y plane), ideally mapped to the $|0\rangle\langle 0|$ state but mapped to an arbitrary density matrix $\eta$ with probability $\tilde{p}^{(p)}$. If we require an even lower error rate, then we can further use an algorithmic cooling protocol [7] in parallel in order to distill colder $|0\rangle$ states and effectively reduce it, $\tilde{p}^{(p)} \to p^{(p)} < \tilde{p}^{(p)}$. With this fresh ancilla, we can also unitarily prepare the k-level BS encoded states, $|0^{(k)}\rangle_L$ and $|+^{(k)}\rangle_L$, and the k-level QR encoded states, $|\bar{0}^{(k)}\rangle$ and $|\bar{+}^{(k)}\rangle$, needed for EC at every level of concatenation, using the following routines:
• Preparation of $|\bar{0}^{(k)}\rangle$ and $|\bar{+}^{(k)}\rangle$. Using three copies of $|0\rangle$, we execute an $M^{(X)}$ gate to ensure one has a $|\bar{0}\rangle$. In reality, the M gate is taking the role of the EC gadget. This routine can be concatenated any number of times to get $|\bar{0}^{(k)}\rangle$. To get a Z-encoded QR code, we instead start with three copies of $|+\rangle$ and execute an $M^{(Z)}$ gate.
This allows us to continuously execute massively parallel unitary FT QEC on all logical planes, which provides an FT memory.
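To get a feeling for why a majority vote over three fresh ancillas helps, the following sketch treats the vote classically and assumes perfect gates and independent reset failures with probability p. These are simplifying assumptions made here for illustration and not the error analysis of [7]; the function names are hypothetical.

```python
def majority_failure(p):
    """Probability that at least two of three independent copies are faulty."""
    return 3 * p**2 * (1 - p) + p**3

def concatenated_failure(p, levels):
    """Apply the ideal majority vote recursively `levels` times."""
    for _ in range(levels):
        p = majority_failure(p)
    return p

# A reset that fails 1% of the time, voted once with ideal gates
after_one_vote = majority_failure(0.01)
```

In this idealised picture the failure probability drops from p to roughly 3p² per level, and p = 1/2 is a fixed point of the map, so voting cannot help a completely random reset.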

Error considerations
So far, we have described a way of executing simultaneous UQEC in every plane of the array. However, to truly achieve fault tolerance, we must have a consistent error model that takes into account that the vertical pulses may introduce correlated errors. We are considering every vertical pulse as a single error location: each possibly faulty vertical addressing pulse admits any error, correlated or uncorrelated, in any of the qubits it addresses. The semi-global character of our design demands also that errors induced by an $s$-qubit semi-global operation addressing columns $\{(x_1, y_1), \ldots, (x_s, y_s)\}$ are independent of errors induced by an $s'$-qubit semi-global operation addressing columns $\{(x'_1, y'_1), \ldots, (x'_{s'}, y'_{s'})\}$. Thus, in practice, we have an adversarial (local) stochastic error model in every plane (in 2D).
When we extend our dimensionality, every $s$-qubit gate addresses $s$ strings of $N_z$ qubits. Let us now assume that our semi-global columnar controls are not applied homogeneously along the individual columns and model an inhomogeneous version of a gate $U$, acting on some column, as $\tilde{U} = U \prod_z e^{i\theta_z H_z}$, where $e^{i\theta_z H_z}$ accounts for a potential inhomogeneity of the pulse generated by $H_z$, and $H_z$ indicates that each location $z$ of the column may undergo a different undesired rotation. Clearly a gate can fail in other ways due to the coupling of the qubits with their environment, but in general we can parameterize such an error by some parameter $\theta_z$. So we can model an error in a semi-global $s$-qubit gate as the error implied by the stochastic error model on an $N_z \times s$ qubit columnar gate. For example, a single-qubit semi-global gate can be modeled as an $N_z$-qubit physical gate where we assume that any type of error is possible: single-qubit errors, unwanted correlations arising among the $N_z$ qubits, etc. Now the condition for fault tolerance, under the stochastic error model, is translated into a constraint for every $z$, i.e. that $\theta_z$ is below some threshold $\forall z$, i.e. in every plane. Note that these types of errors are potentially correlated. However, we note that these correlations are spatially bounded, i.e. in the case of an $s$-qubit semi-global gate, they are always confined to $s$ qubits in every logical plane, which makes our model effectively a 2D local stochastic adversarial error model. We next extend these error considerations to the vertical $CZ^{(o-e)}$ and $CZ^{(e-o)}$ operations we use to execute universal quantum computation, and we assume that an error in $T_{(x,y)}$ is independent of an error in $T_{(x',y')}$.
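The per-plane condition on $\theta_z$ can be made concrete with a small numerical sketch. It assumes, for illustration only, that the unwanted generator is $H_z = X$ and that the error is quantified by the probability of flipping a qubit prepared in $|0\rangle$; the threshold value passed in is a free, hypothetical parameter.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
ZERO = np.array([1, 0], dtype=complex)

def per_plane_error(thetas):
    """Flip probability in each plane caused by exp(i*theta_z*X) acting on |0>."""
    probs = []
    for theta in thetas:
        # exp(i theta X) = cos(theta) I + i sin(theta) X, since X^2 = I
        rotation = np.cos(theta) * np.eye(2) + 1j * np.sin(theta) * X
        probs.append(abs((rotation @ ZERO)[1]) ** 2)
    return np.array(probs)

def all_planes_below(thetas, threshold):
    """Fault-tolerance condition: the error must be small in every plane."""
    return bool(np.all(per_plane_error(thetas) < threshold))

# Over-rotation angles (radians) seen by three planes along one column
probs = per_plane_error([0.0, 0.001, 0.01])
```

A single plane with a large over-rotation is enough to violate the condition, even if the pulse is accurate everywhere else along the column.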
As should be expected, our concept to encode separate logical qubits in separate x-y-planes to protect them from correlated errors will eventually fail if the error rates are high enough. When enough errors accumulate such that one qubit at the highest level of concatenation is compromised, all logical qubits at this level of concatenation in the register are compromised since they will eventually couple to it via the spatial reflection protocol, and the computation fails. This may seem a deal breaker, but this is no different from what happens in individually addressed fault tolerance. To show this, let us consider a quantum computer running a specific circuit. If at some point enough errors accumulate such that one of the logical qubits at the highest concatenation level is compromised, then whatever circuit you operate with this logical qubit will be faulty because of error propagation. Thus, losing one or all logical qubits at the highest level of concatenation to accumulated errors is equivalent.
Additionally, since we plan to use the semi-global control scheme to perform a computation, every logical gate (at the highest concatenation level) is a sequence composed of at least $4(N_z + 1)$ T pulses plus non-Clifford gates executed at the ends (see the next section for details). Thus, the computational size of any given circuit we want to simulate is increased by a factor $\sim 4N_z$, i.e. its complexity is increased by one order in $N_z$, as compared to the circuit with full addressability. This, as we will see below, has consequences, not for the threshold value, but for the degree of concatenation required to achieve some fixed accuracy in the outcome of a given quantum circuit to be simulated.

Bulk fault-tolerant universal quantum computing
The previous section described how to execute FT QEC using semi-global control and a compatible error model. Using the scheme in [9], we can achieve universal QC if we can execute non-Clifford gates at the z end-planes of our 3D array ($z = 1$ and $z = N_z$). So, under the error considerations and using the UQEC tools described above, if we can execute encoded FT Clifford operations at the boundaries, we will achieve bulk semi-global FTUQC via T pulse sequences. Note that after each of the stages in the T steps (see (3)), we are able to execute UQEC. We now show how to achieve the execution of the non-Clifford gates at the end z-planes. Recall that we allowed the execution of any operation at the boundaries of the array, including measurement; thus we can execute the following routines that will yield the sought FTUQC:
(i) $Z^{1/4}$ (non-Clifford) gate aided by encoded measurements. To execute a $Z^{1/4}$ gate, we use the circuit in figure 3(b). This circuit assumes the ability to prepare a magic state ancilla, $|H\rangle_L = (|0\rangle_L + e^{i\pi/4}|1\rangle_L)/\sqrt{2}$, at the highest level of concatenation with an error rate of the same order as that of the gates at such a level of concatenation. To do so, we use a two-step routine. First we prepare a physical $|H\rangle$ state and use the encoder circuit (described below in (iii)) to get $|H\rangle_L$. This encoded ancilla will typically have an error rate higher than the physical error rate and thus is not yet useful. However, it can be shown [7] that if the effective error rate of such a preparation is below $p^{(H\text{-anc})} = 14.6\%$, then these ancillas can be used as a resource for magic state distillation (MSD) [16]. The MSD protocol, composed exclusively of Clifford operations, has two requirements: (A) Clifford operations are perfect, an approximation justified because we are executing encoded gates at the highest level of concatenation L and thus Clifford gates can be made arbitrarily noiseless provided we choose L large enough, and (B) a source of noisy copies of $|H\rangle_L$ ancillas with an error rate below $\sin^2(\pi/8) \approx 14.6\%$.
(ii) $Z^{1/2}$ (Clifford) gate aided by encoded measurements. Clifford operations can be generated by the gate set {CNOT, H, $Z^{1/2}$}. CNOT and H are transversal, and the $Z^{1/2}$ gate can be implemented using the circuit in figure 3(a), provided that one can prepare a logical ancilla in $|{\pm i}\rangle_L = (|0\rangle_L \pm i|1\rangle_L)/\sqrt{2}$. Since the $Z^{1/2}$ gate is not part of the EC routines, it is only needed at the highest level of concatenation. There are two ways of implementing it: (A) since it is the only complex gate, it can be shown that by always using the same logical ancilla prepared in $|0\rangle_L = (|{+i}\rangle_L + |{-i}\rangle_L)/\sqrt{2}$ to activate the circuit in figure 3(a), the entire quantum computation splits into two non-interfering paths (evolution by $U_{\rm comp}$ and $U^{*}_{\rm comp}$) and the measurements of real, Hermitian operators at the end have the same expectation values as for evolution by $U_{\rm comp}$ alone [12]; or, alternatively, (B) one can use the distillation circuit in [13] at the highest level provided that one can prepare noisy $|{+i}\rangle_L$ with an error rate below $p^{(i\text{-anc})} = 1/2$.
(iii) Encoding of arbitrary ancillas. Both (i) and (ii) use the preparation of particular encoded states, but they do not require them to have an extremely low error rate. So we will now show a routine to encode any desired state to any level of concatenation; we showed in [7] and summarize in section 4 that the error rate of the resulting encoded state, $p_{\rm anc}$, is below the required values $p^{(H\text{-anc})}$ and $p^{(i\text{-anc})}$ and thus can be used as a resource in the routines (i) and (ii) described above. Thus, to encode an arbitrary state, we use the following algorithm: (A) we start with the level-0 state $|\phi\rangle$ we want to encode and 8 $|0\rangle$ states and then (B) we use CNOT gates, including waiting times such that never in one step does one qubit interact with more than one qubit, to create the state $|\bar{\phi}\rangle_{3\times3} = a|\bar{0}\rangle_{3\times3} + b|\bar{1}\rangle_{3\times3}$. Finally, (C) we execute an $M^{(Z)}$ gate in every row, to create the state $|\phi\rangle_L = a|0\rangle_L + b|1\rangle_L$. We can recursively use the same algorithm to create the state at any level of concatenation k.
Thus, combining the above elements (i)-(iii), we can execute any Clifford and non-Clifford encoded operation in the boundaries. This, in addition to the encoded T , yields a method for executing FT semi-global universal quantum computation.
(iv) The last element of our scheme is the readout stage: to extract the result of a computation. The challenge is to perform the readout without violating the addressability constraints.
To do so, we use the semi-global FTUQC described above to execute SWAP gates at the highest level of concatenation to move any x-y logical plane to one of the end-planes ($z = 1, N_z$) where measurements are allowed. We can repeat this process until all of the desired logical qubits are read out.
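The distillation condition used in routine (i) is easy to evaluate numerically. The short sketch below computes the bound $\sin^2(\pi/8)$ and checks a candidate ancilla error rate against it; the function name `usable_for_distillation` is illustrative, and the sample value $8.3 \times 10^{-3}$ is the encoder output error rate quoted later in the threshold discussion.

```python
import math

# Bound on the error rate of noisy |H>_L ancillas for magic state distillation
MSD_BOUND = math.sin(math.pi / 8) ** 2

def usable_for_distillation(p_anc):
    """True if an ancilla with error rate p_anc can feed the MSD protocol."""
    return p_anc < MSD_BOUND

# Encoder output error rate quoted in the threshold section of the text
encoder_ok = usable_for_distillation(8.3e-3)
```

The bound equals $(1 - 1/\sqrt{2})/2$, so an encoder output at the percent level leaves more than an order of magnitude of margin.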

Restriction to two-dimensional (2D) arrays
The tools presented above were based on a 3D array of qubits. Although a 3D array is natural for many physical systems, it is not even possible for many others. For these latter systems, a 2D architecture would be more desirable and we now discuss such a structure. We can start by reducing the dimensionality of every plane to a 1D linear array such that the 3D array is reduced to a 2D planar array. Although this is, in principle, possible, the question is whether fault tolerance is respected. This depends on the interactions at our disposal in the dimensionally reduced architecture. After the reduction of spatial dimensionality, we now require that we can execute FT EC in every line: this is not possible in general if we restrict ourselves to nearest-neighbor interactions only [18], although see [19] for a special case, but is always possible if we also allow next-to-nearest-neighbor interactions. The inclusion of a next-to-nearest-neighbor interaction can be achieved either via a quasi-1D array [20] or by demanding an ABAB addressability structure in every 1D array where we have AA and AB nearest-neighbor interactions. We essentially need to show that a SWAP gate between two qubits containing data/information can be executed in an FT way, i.e. such that one gate error does not generate more than one error in the data qubits. To achieve an FT SWAP gate, we introduce placeholder qubits, that is, qubits that we require specifically to physically move information around, but which hold no computationally valuable information. To differentiate them from placeholder qubits, we shall refer to the qubits involved in the computation, i.e. data plus ancilla qubits, as information-holding qubits, or info-qubits. Consider a 1D line of info-qubits, for example encoding a logical qubit, denoted by $\rho_i$, with interspersed placeholder ancilla qubits, denoted by $\eta_j$, in between every nearest-neighbor pair of info-qubits (e.g. any horizontal line of figure 4).
Using nearest-neighbor and next-to-nearest-neighbor interactions, we can now SWAP two info-qubits containing relevant information ($\rho$ and $\tilde{\rho}$, respectively), in such a way that a single failure in a SWAP gate does not generate two errors in the info-qubits, through the following routine, where the subscripts in $\rho$ and $\eta$ denote the physical locations in the sample four-qubit chain.
Step | Gate
This routine executes the FT SWAP gate between site 1 and site 3, since qubits containing the relevant information ($\rho$ and $\tilde{\rho}$) never interact directly and a SWAP gate does not propagate errors. Since we can reproduce this structure in a longer chain and achieve SWAP gates between any two qubits in a chain, it is possible to execute FT semi-global quantum computation on a 2D array with a reduced number of controls. We must bear in mind that, although the threshold value will be reduced because of the need to swap qubits around to execute the desired gates, a threshold value does exist [18]. It will remain true that preparation and measurement can be very noisy, as the argument behind such a result is that gate error rates are sufficiently below the threshold value. Moreover, the advantage in terms of the number of controls is also maintained in this reduced dimensional design.
Figure 5. Decomposition of a physical TOFFOLI gate into two-qubit interactions when one wants to constrain the problem to at most two-qubit interactions. Note that it requires only three time steps as the first two $CX^{1/2}$ gates can be executed simultaneously. The last CNOT gate is not necessary as we will typically discard the controls of such a TOFFOLI, and thus it does not count towards our threshold estimation. These gates are not encoded gates but always physical gates.
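The idea behind the FT SWAP can be illustrated with a classical bookkeeping sketch. The three-step sequence used below (a nearest-neighbor SWAP, a next-to-nearest-neighbor SWAP, and another nearest-neighbor SWAP) is one possible routine chosen here for illustration; it is an assumption and not necessarily the sequence tabulated by the authors. Sites are 0-based in the code, and all names are hypothetical.

```python
INFO = {"rho", "rho~"}          # labels of the information-holding qubits

def ft_swap(chain, steps=((0, 1), (0, 2), (1, 2))):
    """Exchange the info-qubits at sites 0 and 2 of a four-site chain.

    Every step swaps one info-qubit with one placeholder, so a single faulty
    SWAP can damage at most one info-qubit. Raises if a step would couple two
    info-qubits directly.
    """
    chain = list(chain)
    for a, b in steps:
        if abs(a - b) > 2:
            raise ValueError("only nearest and next-to-nearest neighbours allowed")
        if chain[a] in INFO and chain[b] in INFO:
            raise ValueError("two info-qubits would interact directly")
        chain[a], chain[b] = chain[b], chain[a]
    return chain

# Info-qubits at sites 0 and 2, placeholders at sites 1 and 3
start = ["rho", "eta2", "rho~", "eta4"]
end = ft_swap(start)
```

After the three steps the two info-qubits have exchanged places and the placeholders are back at their original sites, without any gate having acted on both info-qubits at once.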

Bulk semi-global fault-tolerant threshold
Let us now discuss the required error rates for achieving FT semi-global universal quantum computation. Because we are using the tools developed in [7], we have essentially the same threshold analysis.
There we assumed that at the physical level we had at our disposal three-qubit gates, in the form of TOFFOLI gates. We showed that, armed with three-body interactions (in every plane), one can achieve FTUQC if gates and preparation have error rates below $p^{(p,g)}_{\mathrm{thresh}} = 3.76 \times 10^{-5}$. Measurements are only required at the boundaries and only at the highest level of concatenation, and thus fault tolerance is possible when measurement error rates are below $p^{(m)}_{\mathrm{thresh}} = 1/3$. If we implement a gate library at the physical level that does not include the TOFFOLI, then we decompose the TOFFOLI into one- and two-qubit gates, as in figure 5. In this case, the threshold value for gates and preparation becomes $p^{(p,g)}_{\mathrm{thresh}} = 2.68 \times 10^{-5}$, again with measurements as noisy as $p^{(m)} = 1/3$.
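As a sanity check on decomposing the TOFFOLI into two-qubit gates, the sketch below verifies the standard textbook construction based on controlled-$\sqrt{X}$ gates, arranged in three time steps plus an optional final CNOT. It is assumed here, not confirmed, that figure 5 uses this construction; the function and variable names are hypothetical. Every gate in the sequence maps a computational-basis state of the two controls to another basis state, so it suffices to track the controls as classical bits and the target as a two-component vector; linearity extends the result to arbitrary inputs.

```python
# Check of a Toffoli built from controlled-sqrt(X) and CNOT gates.
# ASSUMPTION: standard textbook construction; figure 5 may differ in detail.

X = ((0, 1), (1, 0))
V = ((0.5 + 0.5j, 0.5 - 0.5j), (0.5 - 0.5j, 0.5 + 0.5j))      # V = sqrt(X), V*V = X
V_DAG = ((0.5 - 0.5j, 0.5 + 0.5j), (0.5 + 0.5j, 0.5 - 0.5j))  # Hermitian conjugate of V


def apply(m, v):
    """Apply a 2x2 matrix m to a 2-component vector v."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def close(u, v, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(u, v))


def toffoli_two_qubit(a, b, target, drop_last_cnot=False):
    """Controls a, b are basis bits; target is a 2-component state vector."""
    # Time step 1: two controlled-V gates on the target (they commute).
    if a:
        target = apply(V, target)
    if b:
        target = apply(V, target)
    # Time step 2: CNOT from a to b.
    b ^= a
    # Time step 3: controlled-V^dagger from b to the target.
    if b:
        target = apply(V_DAG, target)
    # Optional final CNOT, which only restores control b.
    if not drop_last_cnot:
        b ^= a
    return a, b, target


def check(drop_last_cnot=False):
    """Compare against the ideal Toffoli on all eight basis states."""
    for a in (0, 1):
        for b in (0, 1):
            for psi in ((1, 0), (0, 1)):
                a2, b2, out = toffoli_two_qubit(a, b, psi, drop_last_cnot)
                want = apply(X, psi) if (a and b) else psi
                want_b = (b ^ a) if drop_last_cnot else b
                if (a2, b2) != (a, want_b) or not close(out, want):
                    return False
    return True


print(check(), check(drop_last_cnot=True))
```

Dropping the final CNOT leaves the target exactly as the ideal TOFFOLI would and only leaves control `b` replaced by `a XOR b`, which is harmless when the controls are discarded afterwards.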
Furthermore, we showed in [7] that, provided the gate error rate $p^{(g)}$ is sufficiently below $p^{(g)}_{\mathrm{thresh}}$, one can relax the demands on the measurement and preparation error rates, $p^{(m)}$ and $p^{(p)}$, respectively. Following that argument, and assuming that our gate library contains the TOFFOLI, we find that $p^{(g)} = 1.3 \times 10^{-6}$, $p^{(p)} = 1\%$ and $p^{(m)} = 33\%$ yield, with e.g. $k = 6$ degrees of concatenation, an effective error rate at concatenation level $k = 6$ of $p^{(6)} \sim 10^{-13}$, and an effective error rate for the output of an encoded state of $p^{(6)}_{\mathrm{anc}} = 8.3 \times 10^{-3}$. Note that $p^{(6)}_{\mathrm{anc}}$, denoting the output error rate of the $|H\rangle_L$ encoder circuit, is safely below the $1.46 \times 10^{-1}$ needed for executing MSD and achieving FTUQC.
This implies that the massively parallel mapping required for refreshing the ancillas in our design can have error rates $p^{(p)}$ as high as 1%, or even 33%, at the expense of demanding even lower values of $p^{(g)}$ ($\sim 10^{-6}$). If, to accommodate more physically realistic interactions, we assume only nearest-neighbor interactions in the x-y planes, then one can expect a decrease in the threshold. However, this is a characteristic shared by any addressable or non-addressable FT scheme, and previous work has shown that restricting a long-range addressable FT scheme to nearest-neighbor interactions can decrease the threshold by less than an order of magnitude [21]. Further, we observe that the dynamical decoupling (DD) protection of gates still applies, and such DD protection can be made compatible with our reduced addressability [22]. It follows that the extra demand placed on gate error rates ($p^{(g)} \sim 10^{-5}$ to $10^{-6}$) can, in principle, be greatly alleviated by open quantum system control techniques [23].

General strategy
The particular strategy for achieving semi-globally addressed FT QC that we presented above may not be unique, so here we summarize the essential requirements of our design. Our scheme relies on two properties: (i) every plane can execute EC in an FT manner and (ii) the z-direction is in charge of the computational aspect of our array via some interaction capable of coupling different planes. We have chosen the $T$ pulse, but in principle other scenarios inspired by other global control strategies could be implemented.
Let us say we have a global control scheme in 1D with some (not necessarily nearest-neighbor) interactions $T$. We consider a 3D array as a 2D collection of 1D arrays, such that at every 1D array we can execute $T$ independently. Every horizontal plane of our 3D array will constitute a logical qubit, encoded in some QEC code with a set of stabilizers $\{S^{(z)}_i\}$, together with the necessary ancillas for its measurement-free EC. This means that our computer will be initially stabilized by $\bigotimes$ […]. This fixes a relationship between $T$ and $\{S^{(z)}_i\}$. We must guarantee that (i) at all times all logical qubits in the computer are stabilized by some (possibly changing) code $\{S^{(z)}_i\}$, i.e. $T$ never leaves the information unprotected, and (ii) $T$ can propagate an error in some plane to only one qubit in any number of planes (nearest-neighbor interactions restrict the propagation of errors to nearest-neighbor planes). The above requirements are enough to achieve parallel FT UQEC in all qubits and transport of information. Extra requirements for FT universal computation depend on the type of global control scheme one considers.

Scaling of the number of controls
To see that the semi-global architecture saves on resources, we now investigate the spatial addressing efficiency of our design. We first count the number of controls required for executing an FT semi-global quantum computer architecture and compare it with that of a fully addressable quantum computer architecture simulating the same quantum circuit to the same overall accuracy. As a first step, we assume the same measurement-free EC routines but with full addressing, since we want to know whether we gain something by using the 3D layout instead of just a 2D one, and compare the number of controls required ($N^{[\mathrm{uAdd}]}$ and $N^{[\mathrm{sg}]}$, respectively). We then proceed to compare the 3D layout to a fully addressable, measurement-capable architecture using the same QEC code (which will be labeled by [mAdd]).
We define the following parameters: $N_C$ logical (computational) qubits, each encoded into $N_{EC}$ physical qubits by a QEC code with $k$ degrees of concatenation, and an EC gadget using $N_A$ encoded ancilla qubits and $N_B$ classically encoded ancilla qubits. A naive count gives the number of controls in the fully addressable measurement-free (labeled [uAdd]) and semi-global measurement-free (labeled [sg]) architectures as

[…] (7)

Using the EC protocols described before, $N_{EC} = 9$, $N_A = 18$ and $N_B = 6$. From (7), it would seem that $N^{[\mathrm{sg}]}$ does not depend on the number of computational qubits, and that $N^{[\mathrm{uAdd}]}/N^{[\mathrm{sg}]} = N_C$; however, that is not the case, as $k^{[\mathrm{sg}]}$ is a function of $N_C$, albeit with a weaker dependence.
To see this, we consider the result of the threshold theorem [4]: given a circuit that we wish to simulate to an accuracy $\epsilon$, whose size is a polynomial in the number of computational qubits, […], where $p^{(k)} = A\,(p^{(k-1)})^2$ is the error probability of an operation at level $k$ of concatenation in terms of $(k-1)$-level error rates and $A$ counts the number of pairs of possible $(k-1)$-level errors (in the largest exRec, defined in [13] as a gate with appended and prepended EC($k$) routines). Assuming that the polynomial scaling of the circuit size with the number of computational qubits is with power $t$, i.e. $f^{[\mathrm{uAdd}]}$ […]. In general, because of the discreteness of $k$, we will have an effective discrete difference between degrees of concatenation, $\Delta k \geq 0$, instead of a continuous value for $\delta k$. This discrete difference oscillates as the number of computational qubits increases. In any case, we will have that […], where $\Delta k = k^{[\mathrm{sg}]} - k^{[\mathrm{uAdd}]}$. So, in general, the semi-global architecture will have a gain $\propto N_C$ in the number of controls, i.e. $N^{[\mathrm{uAdd}]}/N^{[\mathrm{sg}]} > 1$. More generally, we determine that there is a significant advantage, expressed as a condition on $\Delta k$ and on $N^{[\mathrm{uAdd}]}/N^{[\mathrm{sg}]}$ relative to $N_C$, under a condition relating $\log 4$ to $\log(\beta N_C^{t-1} p_{\mathrm{thresh}}/\epsilon)$ […].

As an example, let us consider the Shor factorization algorithm. To factor an $N = 768$ bit integer would require $N_C = 2N + 4 = 1540$ qubits and a circuit size of $8000N^4 = 2.8 \times 10^{15}$ logical gates. If we want a 97% overall success rate, using $p^{(0)} = 10^{-6} < p_{\mathrm{thresh}}$ we find that $\Delta k = 1$, and thus we gain a factor of $N^{[\mathrm{uAdd}]}$ […].

We can go a step further and compare a fully addressable model with EC gadgets that admit measurements with our semi-global unitary QEC model. We will consider the gadgets and threshold values for our same QEC code obtained in [13], with a corresponding set of parameters $N_A = 18$, $p'_{\mathrm{thresh}} = 1.2 \times 10^{-4}$.
Because such EC gadgets would typically use a different number of ancillas $N'_A < N_A$ and have different threshold values $p'_{\mathrm{thresh}}$, we get […] with […]. The difference with the previous case is that now $\Delta k'$ is more likely to be 1 and not 0, as follows from the observation that typically $k' < k^{[\mathrm{uAdd}]}$. Assuming that we have a physical error rate good enough for both threshold values, $p_{\mathrm{thresh}}$ and $p'_{\mathrm{thresh}}$, e.g. $p^{(0)} = 1 \times 10^{-6}$, we get, for Shor's protocol on the strings used in the previous example, $\Delta k' = 1$ and $N^{[\mathrm{mAdd}]}$ […], which implies a slightly smaller gain than in the previous case, but still a significant one, $\sim N_C\,(\ldots)$. Even compared to gadgets where measurements are allowed, the semi-global architecture yields an $O(N_C)$ gain in terms of the number of controls.
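The arithmetic behind the Shor example, and the way the recursion $p^{(k)} = A\,(p^{(k-1)})^2$ fixes a number of concatenation levels, can be sketched as follows. Writing $p_{\mathrm{thresh}} = 1/A$, the recursion has the closed form $p^{(k)} = p_{\mathrm{thresh}}\,(p^{(0)}/p_{\mathrm{thresh}})^{2^k}$. The sketch is a simplified estimate under stated assumptions: it requires the level-$k$ error rate to be below $\epsilon$ divided by the logical circuit size, and it ignores the architecture-dependent overhead in circuit size, so the level it returns should not be read as the value of $k^{[\mathrm{sg}]}$ or $k^{[\mathrm{uAdd}]}$ in the text. The function names are hypothetical.

```python
# Simplified concatenation estimate for the Shor example.
# ASSUMPTION: target per-gate error = epsilon / (logical circuit size);
# architecture-dependent circuit-size overheads are ignored.


def shor_resources(n_bits):
    """Logical qubits and logical circuit size quoted in the text."""
    n_c = 2 * n_bits + 4
    size = 8000 * n_bits ** 4
    return n_c, size


def level_error(p0, p_thresh, k):
    """Closed form of p(k) = A * p(k-1)**2 with p_thresh = 1/A."""
    return p_thresh * (p0 / p_thresh) ** (2 ** k)


def min_levels(p0, p_thresh, target):
    """Smallest k such that the level-k error rate is at most target."""
    if not p0 < p_thresh:
        raise ValueError("physical error rate must be below threshold")
    k = 0
    while level_error(p0, p_thresh, k) > target:
        k += 1
    return k


n_c, size = shor_resources(768)
epsilon = 0.03                      # 97% overall success rate
target = epsilon / size             # allowed error per logical gate
k_est = min_levels(1e-6, 3.76e-5, target)
print(n_c, f"{size:.1e}", f"{target:.1e}", k_est)
```

The first two printed values reproduce $N_C = 1540$ and the circuit size of about $2.8 \times 10^{15}$ quoted above. Because the level error falls doubly exponentially in $k$, a modest change in threshold or circuit size shifts the required level by at most one or two, which is why the differences $\Delta k$ discussed in the text are small integers.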

Experimental considerations
In terms of an experimental realization, given a 3D array of qubits, the addressability requirements can be translated into: (i) Massively parallel nearest-neighbor CZ gates in the z-direction: $\bigotimes_{(x,y)} T_{(x,y)}$.
One candidate technology for quantum computation well suited for semi-global control is neutral atoms trapped in a 3D optical lattice. In this architecture, one has the advantage of massive parallelism, but since the lattice spacing is typically of the order of an optical wavelength, it is difficult to accommodate imaging lenses that could resolve individual qubit measurement outcomes in the bulk. To get some idea of the size of a computation that could be realized in such a system, in [24] the authors find that $10^6$ $^{133}$Cs atoms could be trapped in a 100 × 100 × 100 blue-detuned lattice with an achievable single-qubit gate error rate of $10^{-5}$ using Raman-based gates. Here the lattice spacing would be 10 µm, and it has already been demonstrated [25] that single 1D tubes of $^{87}$Rb atoms trapped in a 2D lattice can be addressed with better than 1 µm resolution. Furthermore, using an architecture such as the 3D retroreflected lattice used in [27], it is possible to have ABAB-type addressability along 1D or 2D, enabling addressability of every other plane. If we make the reasonable assumption of a single-qubit reset error rate of $10^{-5}$ and the (extremely optimistic) assumption of the same error rate for two-qubit gates, then, restricting to nearest-neighbor interactions, a computation with $\sim 10^5$ sequential logical gates could be achieved on 100 logical qubits using semi-global control with an overall circuit simulation error of 1/3. This would require three levels of concatenation, which could be accommodated in each 100 × 100 plane with room left over for resettable ancillas to shuttle quantum information. Massively parallel two-qubit CPHASE gates have been realized in optical lattices but with rather low fidelity [26].
The best reported two-qubit gate using a parallel exchange blockade mechanism between neighboring trapped atoms in an optical lattice realized an error for the $\sqrt{\mathrm{SWAP}}$ gate of 0.31 [27], although theory predicts that error rates as low as 1% could be achieved [28]. There are several other proposals for high-fidelity entangling gates in optical lattices, e.g. using fast Rydberg gates [29], but to achieve an error below our threshold would likely require another approach, such as using DD pulses to boost the effective two-qubit gate fidelity [23].
We note that the energy required by a control pulse increases with the number of x-y planes it must control, and this may pose a limit to the size of the computation the architecture can implement. However, if the physical model permits, one could place several logical qubits per plane, i.e. one plane = several tiles of logical qubits, such that a pulse with limited addressing capacity can still be used to build a computer that is, in principle, arbitrarily large with the tools described in this paper. Universality follows from the fact that we can achieve SWAP gates between tiles within different planes and thus execute any gate on two logical qubits.

Conclusions
We have shown that in an N-dimensional (N = 2, 3) qubit array, fault tolerance is achievable when only (N − 1)-dimensional addressability and fixed short-range interactions are available. The scheme has implications for the design of scalable quantum computers and shows an advantage in the number of controls required for manipulating the array of qubits. More specifically, the number of controls required depends only weakly on the number of computational qubits, as opposed to fully addressable designs, where it grows linearly; equivalently, we have a gain factor $O(N_C)$ in the number of controls. The design is suitable for 3D optical lattices and for 2D arrays with only nearest- and next-to-nearest-neighbor couplings.