Calculable lower bounds on the efficiency of universal sets of quantum gates

Currently available quantum computers, so-called Noisy Intermediate-Scale Quantum (NISQ) devices, are characterized by a relatively low number of qubits and moderate gate fidelities. In such a scenario, the implementation of quantum error correction is impossible and the performance of those devices is quite modest. In particular, the depth of circuits implementable with reasonably high fidelity is limited, and the minimization of circuit depth is required. Such depths depend on the efficiency of the universal set of gates $S$ used in the computation, and can be bounded using the Solovay–Kitaev theorem. However, it is known that much better, asymptotically tight bounds of the form $O(\log(1/\varepsilon))$ can be obtained for specific $S$. Those bounds are controlled by the so-called spectral gap, denoted $\mathrm{gap}(S)$. Yet, the computation of $\mathrm{gap}(S)$ is not possible for general $S$, and in practice one considers the spectral gap at a certain scale $r(\varepsilon)$, denoted $\mathrm{gap}_r(S)$. This turns out to be sufficient to bound the efficiency of $S$, provided that one is interested in the physically feasible case in which the error $\varepsilon$ is bounded from below. In this paper we derive lower bounds on $\mathrm{gap}_r(S)$ and, as a consequence, on the efficiency of universal sets $S$ of $d$-dimensional quantum gates satisfying an additional condition. The condition is naturally met for generic quantum gates, such as Haar random gates. Our bounds are explicit in the sense that all parameters can be determined by numerical calculations on existing computers, at least for small $d$. This is in contrast with known lower bounds on $\mathrm{gap}_r(S)$, which involve parameters with ambiguous values.


Introduction and main results
Universal, scalable and fault-tolerant quantum computers are the holy grail of quantum computing. Such devices require quantum error correction, which, due to the quantum threshold theorem, can be implemented if the levels of gate errors are small enough [1,2,3]. However, recent quantum hardware, so-called Noisy Intermediate-Scale Quantum (NISQ) devices, does not offer the gate fidelities required for quantum error correction, and its performance is heavily affected by gate imperfections [4,5,6]. Because of error accumulation effects, the depth of circuits feasible for NISQ devices is very modest. Hence it is imperative to find ways to minimize such depths. One way to address this issue is to focus on the efficiency of the universal sets [7,8] of gates $S$ used for the computations. The spectral gap is a useful measure of the efficiency of universal sets of quantum gates $S \subset SU(d)$. The value of the gap for a chosen $S$, denoted $\mathrm{gap}(S)$, lies between 0 (no gap) and some optimal value $\mathrm{gap}_{\mathrm{opt}} < 1$ that depends only on the number of gates $|S|$ [9]. The higher the value of $\mathrm{gap}(S)$, the better the upper bound on the minimal length (circuit depth) $\ell_\varepsilon$ of a sequence of gates from $S$ required to $\varepsilon$-approximate any unitary operation from $SU(d)$. Recall that the Solovay-Kitaev theorem [10] provides such a bound for the depth,
$$\ell_\varepsilon = O\big(A(S)\cdot\log^{3+\delta}(1/\varepsilon)\big), \qquad \delta > 0. \qquad (1)$$
However, the existence of the gap, i.e. $\mathrm{gap}(S) > 0$, implies that
$$\ell_\varepsilon = A\cdot\log(1/\varepsilon) + B \qquad (2)$$
is enough, with the constants $A$ and $B$ proportional to $\log^{-1}\big(1/(1-\mathrm{gap}(S))\big)$ [11]. In fact, $\ell_\varepsilon = O(\log(1/\varepsilon))$ is optimal, which can be seen from a simple volumetric argument. One should note that some properties of sets $S$ with a spectral gap are known. For instance, if the gates from the universal set $S$ have algebraic entries, then the gap exists [12,13]. Moreover, it has been conjectured that any universal $S$ has a gap, and there are explicit constructions of sets $S$ with the optimal spectral gap for $SU(2)$ with $|S| = p - 1$ for $p \equiv 1 \bmod 4$ [14,15]. Finally, some commonly used one-qubit universal sets turned out to have the optimal spectral gap [16,17,18,19]. However, the construction of many-qubit gates with the optimal spectral gap remains an open problem.
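The difference between the two regimes, and the volumetric argument for optimality, can be made concrete with a small arithmetic sketch. It rests on two hypothetical inputs. The first is that an $\varepsilon$-ball in $G_d$ has normalized volume at most $(\varepsilon/c)^{d^2-1}$, which yields the volumetric lower bound on the word length. The second is the textbook Solovay-Kitaev recursion $\varepsilon_n = \kappa\,\varepsilon_{n-1}^{3/2}$, $\ell_n = 5\,\ell_{n-1}$, with a hypothetical constant $\kappa$. None of these constants or function names come from this paper.

```python
import math

def volumetric_min_length(eps, n_gates, d, c_ball=1.0):
    """Smallest l with n_gates**(l + 1) >= (c_ball / eps)**(d**2 - 1).

    Hypothetical model: an eps-ball in G_d has normalized volume at most
    (eps / c_ball)**(d**2 - 1), so at least (c_ball / eps)**(d**2 - 1) balls
    are needed to cover G_d, while there are at most n_gates**(l + 1) words
    of length <= l. No gate set can reach precision eps with shorter words.
    """
    dim = d * d - 1  # dim G_d = d^2 - 1
    needed = dim * math.log(c_ball / eps) / math.log(n_gates)
    return max(math.ceil(needed) - 1, 0)

# Exponent of the Solovay-Kitaev bound, c = log 5 / log(3/2), about 3.97
SK_EXPONENT = math.log(5) / math.log(3 / 2)

def sk_word_length(ell0, eps0, eps, kappa):
    """Word length and number of levels of the textbook Solovay-Kitaev recursion.

    Assumed recursion: each level maps the precision e to kappa * e**1.5 and
    multiplies the word length by 5 (a group commutator V W V^-1 W^-1 times the
    previous approximation). kappa is a hypothetical constant.
    """
    if kappa * math.sqrt(eps0) >= 1:
        raise ValueError("the recursion only contracts if kappa * sqrt(eps0) < 1")
    levels, e = 0, eps0
    while e > eps:
        e = kappa * e ** 1.5
        levels += 1
    return ell0 * 5 ** levels, levels

for eps in (1e-2, 1e-4, 1e-8):
    lower = volumetric_min_length(eps, n_gates=6, d=2)
    sk_len, lv = sk_word_length(ell0=10, eps0=1e-2, eps=eps, kappa=4.0)
    print(f"eps={eps:g}: volumetric lower bound {lower}, SK word length {sk_len} ({lv} levels)")
```

With these toy parameters the volumetric bound grows linearly in $\log(1/\varepsilon)$. The Solovay-Kitaev word length grows roughly like $\log^{c}(1/\varepsilon)$ with $c = \log 5/\log(3/2) \approx 3.97$.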
The calculation of $\mathrm{gap}(S)$ is challenging, and in practice one often considers the gap up to a certain scale $r$, denoted $\mathrm{gap}_r(S)$, such that $\mathrm{gap}(S)$ is the infimum of $\mathrm{gap}_r(S)$ over all scales $r$. Since it is impossible to implement gates without any error, in practice $\varepsilon$ can be bounded from below. In such a case, to bound $\ell_\varepsilon$ it is sufficient to know $\mathrm{gap}_r(S)$ at some scale $r(\varepsilon)$ instead of $\mathrm{gap}(S)$. This is due to the existence of Solovay-Kitaev-like theorems involving $\mathrm{gap}_r(S)$. Specifically, it is known that for any universal $S$ one can bound $\ell_\varepsilon \propto \mathrm{gap}_r^{-1}(S)\cdot\log(1/\varepsilon)$ at some scale $r(\varepsilon)$ (see the first part of Lemma 5 in [20], and the improved version with $r(\varepsilon) = O(\varepsilon^{-1}\log(1/\varepsilon))$ in Proposition 2 of [21]). Thus, bounding $\mathrm{gap}_r(S)$ is imperative. From the seminal paper [20] it is known, in the more general setting of semisimple compact connected Lie groups, that there exist group constants $c$, $A$ and $r_0$ such that $\mathrm{gap}_r(S)$ satisfies the lower bound (3) for any $r \ge r_0$. Thus, knowing the gap at a certain scale $r_0$ makes it possible to bound the rate at which $\mathrm{gap}_r(S)$ vanishes with growing $r \ge r_0$. However, the magnitude of the minimal scale $r_0$ from which the bound (3) holds is unclear, even for $SU(2)$. Our preliminary analysis of this bound suggests that the value of $r_0$ for $SU(2)$ resulting from the proof is enormous: orders of magnitude larger than the scale at which the numerical calculation of $\mathrm{gap}_{r_0}(S)$ is remotely possible.
In this paper we exploit Bourgain's argument for bounding $\mathrm{gap}_r(S)$ by the diameter of $S$, which was given in the proof of the second part of Lemma 5 in [20]. By introducing an additional assumption on $S$, we obtain calculable bounds on $\mathrm{gap}_r(S)$ for universal sets of quantum gates. Our additional assumption on $S$ is satisfied, e.g., by generic quantum gates, such as Haar random gates (with probability 1). The main result of the paper is the following.
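Theorem 1. Let $S$ be a finite symmetric set of $d$-dimensional quantum gates satisfying the assumptions of Section 4. Then, for any $t \ge t_0$, the spectral gap at scale $t$, $\mathrm{gap}_t(S)$, obeys the lower bound (4), where $c = \log(5)/\log(3/2) \approx 4$, $\alpha$ and $\beta$ are known constants, and $g_{t_0}(S)$ can be determined by numerical calculations of gaps, at a known scale $t_0$, of certain universal sets that can be derived from $S$.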
The quantity $g_{t_0}(S)$ is defined in equation (137); see also (138) and (96). Crucially, we provide explicit formulas (136), (139) and (78) for $\alpha = \alpha(d, \varepsilon_0)$, $\beta = \beta(d)$ and $t_0 = t_0(d, \varepsilon_0)$, where $\varepsilon_0$ is the parameter in the construction that leads to different bounds. The value of $t_0$ is small enough to enable numerical calculations of $g_{t_0}(S)$, at least for $d = 2$, 3 and 4. Hence, our bounds can be made explicit by numerical experiments for a fixed $S$. We provide examples of specific values of $t_0$, $\alpha$ and $\beta$ for $d = 2$, 3 and 4 in Tables 1 and 2. The minimal possible values of $t_0$ are indicated in bold font. They are given by an explicit formula in terms of $\tau(\varepsilon, d)$, which is defined in (78). We present the values of $t_0$ up to those giving $\alpha$ of around 1. The value of $\alpha$ grows quickly with $t_0$, as can be seen in Fig. 1. The values of $\beta$ and $c$ do not depend on $t_0$. On the other hand, the value of $g_{t_0}(S)$ can decrease with increasing $t_0$. To check the behaviour of our bound (4) and to demonstrate that it can be calculated on existing hardware, we performed a numerical simulation on a supercomputer. For this simulation, we chose 1000 Haar random sets $S$ for $d = 2$, each consisting of three gates and their inverses. The computations took approximately two weeks and used 1008 CPU cores. We calculated the values of the lower bound for $t_0$ ranging from 550 to 900 (with increment 10) and plotted the bounds for $t$ from $t_0$ to 1000. We also calculated the ratio of our bound to the true value of the gap at a given $t$. We present those results, averaged over all sets $S$, in Fig. 2. The value of the lower bound looks qualitatively the same as the ratio in Fig. 2, rescaled by a constant. This is because the true value of the gap is practically constant for any chosen $S$ in the inspected range of $t$. From Fig. 2 it is clear that our bound is far from tight, at least in the tested range. However, the obtained results are not far from our expectations, given the generality of our bounds. Moreover, the lower bounds evidently improve quickly with $t_0$, due to the rising value of the constant $\alpha$, which dominates the possible deterioration of $g_{t_0}(S)$. In fact, the value of $g_{t_0}(S)$ is also constant for any $S$ in the inspected range. Needless to say, such an improvement cannot continue indefinitely, since the ratio must be at most 1. Unfortunately, we did not have enough resources to push our simulations further.
The structure of this paper is as follows. In Section 2 we introduce the mathematics used in the paper, such as the averaging operators and their relevant spectral gaps. In Section 3 we provide an alternative proof of the efficiency bound (2) from [11], with $A$ and $B$ proportional to $\mathrm{gap}^{-1}(S)$. In Section 4 we present the proof of our main result, Theorem 1.

Averaging operators and their spectral gaps
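By $G_d$ we denote the projective unitary group $PU(d)$, which is the quotient of the unitary group $U(d)$ by its center,
$$G_d = PU(d) = U(d)/Z(U(d)), \qquad Z(U(d)) = \{e^{i\varphi} I : \varphi \in \mathbb{R}\} \cong U(1).$$
By $\mu$ we denote the normalized Haar measure on $G_d$. Consider the space $L^2(G_d)$ of square integrable complex functions on $G_d$ with respect to $\mu$, equipped with the standard scalar product $\langle\cdot,\cdot\rangle$ (linear in the second slot). Since $G_d$ is compact, we consider only unitary representations. A group element $g \in G_d$ acts on $L^2(G_d)$ via the left regular representation,
$$(\mathrm{Reg}(g)f)(h) = f(g^{-1}h),$$
so the regular representation acts on functions by shifts. The regular representation is not irreducible. In fact, due to the Peter-Weyl theorem, it decomposes into an orthogonal direct sum of all the irreducible unitary representations (irreps), with multiplicities equal to their dimensions,
$$L^2(G_d) = \widehat{\bigoplus_{\lambda \in \Lambda}} V_\lambda,$$
where $\Lambda$ is the set of highest weights of $G_d$ (enumerating all irreps up to isomorphism), $V_\lambda$ is the space of functions spanned by the matrix elements of the irrep $\pi_\lambda$ with highest weight $\lambda$ and dimension $d_\lambda$, and the hat denotes the closure of an infinite direct sum. Moreover, for each $\lambda \in \Lambda$, the set
$$\big\{\sqrt{d_\lambda}\,(\pi_\lambda)_{ij} : 1 \le i, j \le d_\lambda\big\}$$
is an orthonormal basis of $V_\lambda$, where the matrix elements $(\pi_\lambda)_{ij}$ are functions in $L^2(G_d)$ given by
$$(\pi_\lambda)_{ij}(g) = \langle e_i, \pi_\lambda(g)e_j\rangle$$
for some fixed orthonormal basis $\{e_1, \ldots, e_{d_\lambda}\}$ of the representation space of $\pi_\lambda$. Clearly, the union of all such bases forms an orthonormal basis of $L^2(G_d)$. Hence any function $f \in V_\lambda$, as a linear combination of matrix elements, is given by
$$f(g) = \mathrm{Tr}\big(A\,\pi_\lambda(g)\big)$$
for some complex $d_\lambda \times d_\lambda$ matrix $A$. The regular representation restricted to functions in $V_\lambda$ is isomorphic to the representation $\pi_\lambda$. If $\pi$ is any (possibly reducible) representation of $G_d$, then $\pi$ is isomorphic to a direct sum of irreps,
$$\pi \cong \bigoplus_{i=1}^{k} \pi_{\lambda_i}^{\oplus m_i},$$
which can be identified with function spaces from the Peter-Weyl decomposition, for some $k \ge 1$ and multiplicities $m_i \ge 1$. If $m_i \le d_{\lambda_i}$ for all $1 \le i \le k$, then the representation $\pi$ appears as a subrepresentation of $L^2(G_d)$. The corresponding space of functions consists of all functions obtained in this way, for all matrices $A$.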
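The orthonormality of the scaled matrix elements can be sanity-checked for $d = 2$. Here $G_2 = PU(2) \cong SO(3)$, and the adjoint irrep ($d_\lambda = 3$) is realized by $3 \times 3$ rotation matrices. The Monte Carlo sketch below assumes NumPy and uses a hypothetical helper `haar_so3`. It estimates the Gram matrix of the nine functions $\sqrt{3}\,R_{ij}$ with respect to $\mu$, which should be close to the identity, as the Schur orthogonality relations predict.

```python
import numpy as np

def haar_so3(rng):
    """Haar-random rotation in SO(3): QR of a Gaussian matrix with a sign fix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))   # column sign fix: q is now Haar on O(3)
    return q * np.linalg.det(q)    # -I is central in O(3), so this is Haar on SO(3)

rng = np.random.default_rng(7)
n = 20_000
samples = np.array([haar_so3(rng) for _ in range(n)])
# The 9 functions f_ij(g) = sqrt(d_lambda) * pi_lambda(g)_ij with d_lambda = 3 (real-valued)
f = np.sqrt(3) * samples.reshape(n, 9)
# Monte Carlo estimate of the Gram matrix <f_ij, f_kl> in L^2(G_2, mu)
gram = f.T @ f / n
print(np.round(gram, 2))  # close to the 9 x 9 identity matrix
```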
We now comment on how one may naturally choose a scale up to which one would like to consider irreps of G d .
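The Lie algebra of $G_d$ is isomorphic to $\mathfrak{su}(d)$, since
$$G_d \cong SU(d)/Z(SU(d)),$$
where the center of $SU(d)$, $Z(SU(d)) \cong \mathbb{Z}_d$, is discrete. The adjoint representation $\mathrm{Ad}$ of $U(d)$ descends to the quotient group $G_d$, forming the adjoint representation $\mathrm{Ad}$ of $G_d$, which acts on its representation space $\mathfrak{su}(d)$ via
$$\mathrm{Ad}_{[U]}(X) = UXU^\dagger, \qquad X \in \mathfrak{su}(d),$$
where $[U] \in G_d$ denotes the class of $U \in U(d)$. Moreover,
$$U \otimes \overline{U} \cong \mathrm{Ad} \oplus I,$$
with $\mathrm{Ad}$ understood as the complexified adjoint action on traceless matrices, and where by $I$ we denote the one-dimensional trivial representation.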
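Since $U \otimes \overline{U}$ contains the trivial representation (spanned by the identity matrix), $(U \otimes \overline{U})^{\otimes(s+1)} = (U \otimes \overline{U})^{\otimes s} \otimes (U \otimes \overline{U})$ contains $(U \otimes \overline{U})^{\otimes s}$ as a subrepresentation. Applying this reasoning inductively, we see that all irreps of $(U \otimes \overline{U})^{\otimes s}$ appear in $(U \otimes \overline{U})^{\otimes t}$ for $s \le t$. Thus, as $t$ increases, the representation $(U \otimes \overline{U})^{\otimes t}$ contains more and more irreps of $G_d$, and each irrep of $G_d$ is contained in it for $t$ large enough. In the language of the Peter-Weyl theorem, the corresponding functions in $L^2(G_d)$ are
$$f([U]) = \mathrm{Tr}\big(A\,(U \otimes \overline{U})^{\otimes t}\big),$$
so they are balanced polynomials in $U$ and $\overline{U}$ of degree $t$. Thus, increasing $t$ corresponds to considering polynomials of higher degrees. This motivates us to consider the following function spaces in $L^2(G_d)$:
$$L^2_t(G_d) := \bigoplus_{\lambda \in \Lambda_t} V_\lambda,$$
where $\Lambda_t$ is the set of unique (i.e. without repetitions and up to isomorphism) highest weights of irreps of $G_d$ appearing in $(U \otimes \overline{U})^{\otimes t}$. In the case $t = 0$, we set $\Lambda_0 := \{\lambda_0\}$, where $\lambda_0$ is the highest weight of the trivial representation, so that $L^2_0(G_d) = V_{\lambda_0}$ consists of the constant functions. Additionally, we define the following related symbols: the set $\widetilde{\Lambda}_t$, which equals $\Lambda_t$ without the weight of the trivial representation, and the set of all unique highest weights $\Lambda = \bigcup_{t \ge 0} \Lambda_t$. Fortunately, the weights $\Lambda_t$ have a nice description in terms of sequences of integers.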
Lemma 2. The set $\Lambda_t$ consists precisely of the weights indexed by nonincreasing integer sequences $\lambda$ of length $d$ such that $|\lambda| = 0$ and $|\lambda^+| \le t$, where $|\lambda|$ denotes the sum of the entries and $\lambda^+$ is the subsequence of positive entries.
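Lemma 2 translates directly into an enumeration. The plain-Python sketch below uses hypothetical function names. It lists $\Lambda_t$ and attaches to each weight the dimension $d_\lambda$ from the standard Weyl dimension formula for $U(d)$. That formula depends only on differences of entries, so it is unaffected by adding a constant sequence. For $d = 3$, $t = 2$ it returns five weights with dimensions 27, 10, 10, 8 and 1. These are the distinct irreps occurring in $(\mathbf{3} \otimes \bar{\mathbf{3}})^{\otimes 2}$ for $SU(3)$.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def highest_weights(d, t):
    """Lambda_t as in Lemma 2: nonincreasing length-d integer sequences
    with entry sum 0 and sum of positive entries at most t."""
    values = range(t, -t - 1, -1)  # all entries lie in [-t, t], listed in decreasing order
    weights = []
    for lam in combinations_with_replacement(values, d):  # tuples come out nonincreasing
        if sum(lam) == 0 and sum(x for x in lam if x > 0) <= t:
            weights.append(lam)
    return weights

def weyl_dimension(lam):
    """Weyl dimension formula for U(d): prod_{i<j} (lam_i - lam_j + j - i) / (j - i)."""
    d = len(lam)
    dim = Fraction(1)
    for i in range(d):
        for j in range(i + 1, d):
            dim *= Fraction(lam[i] - lam[j] + j - i, j - i)
    return int(dim)

for lam in highest_weights(3, 2):  # d = 3, t = 2
    print(lam, weyl_dimension(lam))
```

For $d = 2$ the same enumeration gives the weights $(a, -a)$ with $0 \le a \le t$, of dimensions $2a + 1$.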
Each sequence $\lambda = (\lambda_1, \ldots, \lambda_d) \in \Lambda_t$ corresponds to a weight (a linear functional on the Cartan subalgebra $\mathfrak{h} \subset \mathfrak{su}(d)$)
$$\lambda = \sum_{i=1}^{d} \lambda_i L_i,$$
where $L_i$ are the standard basis elements. Since $L_1 + \ldots + L_d = 0$ in $\mathfrak{h}^*$, adding a constant sequence $(c, \ldots, c)$, $c \in \mathbb{Z}$, to $\lambda$ does not change the weight.
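Example 1. Consider the system of two qutrits $\mathbb{C}^3 \otimes \mathbb{C}^3$ and $t = 2$. Then, from Lemma 2, we have
$$\Lambda_2 = \{(0,0,0),\ (1,0,-1),\ (2,-1,-1),\ (1,1,-2),\ (2,0,-2)\},$$
which is equivalent to
$$\Lambda_2 = \{(0,0,0),\ (2,1,0),\ (3,0,0),\ (3,3,0),\ (4,2,0)\},$$
and, for example, $\lambda = (2,1,0)$ corresponds to the highest weight $2L_1 + L_2$, i.e. to the adjoint representation. Similarly, we can represent $\widetilde{\Lambda}_2$ as
$$\widetilde{\Lambda}_2 = \{(2,1,0),\ (3,0,0),\ (3,3,0),\ (4,2,0)\}.$$
We introduce the following norm on the space of weights of $G_d$:
$$\|\lambda\|_1 := \sum_{i=1}^{d} |\lambda_i|.$$
It is clear that for each $\lambda \in \Lambda_t$ written as in Lemma 2, $\|\lambda\|_1 = 2|\lambda^+| \le 2t$. From now on we represent each irrep $\lambda$ by the sequence with the smallest $\|\lambda\|_1$. In particular, the trivial representation is given by $\lambda = (0, 0, \ldots, 0)$. By choosing the orthonormal basis (11) of the function spaces, we associate with $L^2_t(G_d)$ the Hilbert space $H_t := \bigoplus_{\lambda \in \Lambda_t} H_\lambda$, where $H_\lambda$ is the representation space of $\pi_\lambda$. Analogously, we define $H_\infty$. By $H$ we denote the vector space isomorphic to $L^2(G_d)$. For any representation $\pi$ of $G_d$ and any finite Borel measure $\nu$ on $G_d$, we define the operator
$$\pi(\nu) := \int_{G_d} \pi(g)\, d\nu(g) \qquad (31)$$
acting on the representation space of $\pi$. We can use (31) to define various averaging operators. By $S$ we denote a finite set of generators of $G_d$, and $\nu_S = \frac{1}{|S|}\sum_{g \in S}\delta_g$ is the normalized counting measure of $S$ on $G_d$. The $t$-averaging operator with respect to $S$, $T_{\nu_S,t} : H_t \to H_t$, is
$$T_{\nu_S,t} := \bigoplus_{\lambda \in \Lambda_t} \pi_\lambda(\nu_S) \qquad (32)$$
and can be represented as a block-diagonal matrix. Analogously, we define the $\infty$-averaging operator with respect to $S$, $T_{\nu_S,\infty} : H_\infty \to H_\infty$, as $T_{\nu_S,\infty} := \bigoplus_{\lambda \in \Lambda} \pi_\lambda(\nu_S)$. Finally, the (global) averaging operator with respect to $S$, $T_{\nu_S} : H \to H$, is $T_{\nu_S} := \mathrm{Reg}(\nu_S)$. In the language of functions, the introduced averaging operators correspond to restrictions of $\mathrm{Reg}(\nu_S)$ to the corresponding function subspaces. We denote such isomorphic averaging operators by the same symbols. For example, the global averaging operator is
$$(T_{\nu_S} f)(h) = \frac{1}{|S|}\sum_{g \in S} f(g^{-1}h). \qquad (36)$$
The justification for the name averaging operator is clear from (36). Indeed, $T_{\nu_S}$ replaces the function $f$ with the averaged function, whose value at $h$ is the average of the values of $f$ over all translates of $h$ by the elements of $S$.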
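Similarly, the $t$-averaging operator is
$$(T_{\nu_S,t} f)(h) = \frac{1}{|S|}\sum_{g \in S} f(g^{-1}h), \qquad f \in L^2_t(G_d).$$
The subspace $H_{\lambda_0}$ corresponds to the subspace of constant functions $L^2_0(G_d) = V_{\lambda_0}$, whose orthogonal complement is the space of functions with Haar average zero. Let $T_\mu$ denote the projector onto $H_{\lambda_0}$. At the level of function spaces, $T_\mu$ is the projector onto $L^2_0(G_d)$ that assigns to each function $f$ the constant function whose value is the Haar average of $f$,
$$(T_\mu f)(h) = \int_{G_d} f(g)\, d\mu(g).$$
To assess how quickly the words in $S$ fill the group $G_d$, we compare the averaging operator $T_{\nu_S}$ with $T_\mu$ by considering the operator norm of their difference. Since $T_{\nu_S}|_{H_{\lambda_0}} = T_\mu|_{H_{\lambda_0}}$ and $T_\mu$ vanishes on $H_{\lambda_0}^\perp$, the norm $\|T_{\nu_S} - T_\mu\|_{op}$ equals the norm of the operator
$$\widetilde{T}_{\nu_S} := T_{\nu_S}\big|_{H_{\lambda_0}^\perp}.$$
Similarly, we define $\widetilde{T}_{\nu_S,t}$ and $\widetilde{T}_{\nu_S,\infty}$. Clearly, $\|\widetilde{T}_{\nu_S,t}\|_{op} \le \|\widetilde{T}_{\nu_S}\|_{op}$ for every $t$, since $\widetilde{T}_{\nu_S,t}$ is a restriction of $\widetilde{T}_{\nu_S}$ to an invariant subspace. This motivates us to define the spectral gap of $S$ as
$$\mathrm{gap}(S) := 1 - \|\widetilde{T}_{\nu_S}\|_{op}. \qquad (42)$$
The spectral gap is a useful numerical value describing the set $S$ via the properties of the corresponding averaging operator.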
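Similarly, we define the spectral gap of $S$ at scale $t$ as
$$\mathrm{gap}_t(S) := 1 - \|\widetilde{T}_{\nu_S,t}\|_{op}. \qquad (43)$$
In general, we can define analogous gaps for any finite Borel measure $\nu$ on $G_d$. For example,
$$\mathrm{gap}_t(\nu) := 1 - \|\widetilde{T}_{\nu,t}\|_{op}, \qquad (44)$$
where $\widetilde{T}_{\nu,t}$ is defined as in (32) with $\nu_S$ replaced by $\nu$. It is clear that $\mathrm{gap}_t(S)$ is nonincreasing in $t$ and that $\mathrm{gap}(S) = \inf_t \mathrm{gap}_t(S)$, and that the gaps (42)-(44) belong to $[0, 1]$.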
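For $d = 2$, Lemma 2 gives $\widetilde{\Lambda}_t = \{(a, -a) : 1 \le a \le t\}$. These are the integer spins $j = 1, \ldots, t$ of $G_2 \cong SO(3)$, with $d_\lambda = 2j + 1$, so $\mathrm{gap}_t(S)$ can be computed block by block. The sketch below is a toy version of this computation for three Haar random gates and their inverses, the setting of the simulation described in the introduction. It assumes NumPy, the helper names are hypothetical, it is not the authors' code, and it is only meant for small $t$. It builds $\pi_j(U) = \exp(-i\,\theta\,\mathbf{n}\cdot\mathbf{J})$ from spin matrices and returns $1 - \max_{1 \le j \le t}\|\pi_j(\nu_S)\|_{op}$.

```python
import numpy as np

def spin_matrices(spin):
    """Angular momentum matrices (J_x, J_y, J_z), basis m = spin, spin - 1, ..., -spin."""
    m = np.arange(spin, -spin - 1, -1, dtype=float)
    jp = np.zeros((len(m), len(m)))
    for k in range(1, len(m)):
        # J_+ |m> = sqrt(spin(spin+1) - m(m+1)) |m+1>; |m+1> sits at index k - 1
        jp[k - 1, k] = np.sqrt(spin * (spin + 1) - m[k] * (m[k] + 1))
    jx = (jp + jp.T) / 2
    jy = (jp - jp.T) / 2j          # 2j is the complex literal 2i here
    jz = np.diag(m)
    return jx, jy, jz

def haar_rotation_vector(rng):
    """theta * n for a Haar-random U = exp(-i theta n.sigma / 2) in SU(2)."""
    q = rng.standard_normal(4)
    q /= np.linalg.norm(q)                       # uniform unit quaternion = Haar on SU(2)
    s = np.linalg.norm(q[1:])
    theta = 2 * np.arccos(np.clip(q[0], -1.0, 1.0))
    return theta * q[1:] / s if s > 0 else np.zeros(3)

def spin_rep(omega, spin):
    """pi_j(U) = exp(-i omega . J); for integer spin this is an irrep of PU(2)."""
    jx, jy, jz = spin_matrices(spin)
    gen = omega[0] * jx + omega[1] * jy + omega[2] * jz   # Hermitian generator
    evals, evecs = np.linalg.eigh(gen)
    return (evecs * np.exp(-1j * evals)) @ evecs.conj().T

def gap_t(omegas, t):
    """1 - max_{1 <= j <= t} ||pi_j(nu_S)||_op for S = given gates plus inverses."""
    worst = 0.0
    for spin in range(1, t + 1):
        mats = [spin_rep(w, spin) for w in omegas]
        mats += [m.conj().T for m in mats]       # add inverses, so S is symmetric
        worst = max(worst, np.linalg.norm(sum(mats) / len(mats), 2))
    return 1.0 - worst

rng = np.random.default_rng(1)
omegas = [haar_rotation_vector(rng) for _ in range(3)]  # three gates and their inverses
gaps = [gap_t(omegas, t) for t in (1, 2, 4, 8, 16)]
print(np.round(gaps, 4))  # nonincreasing in t
```

Taking the maximum over blocks is legitimate because $\widetilde{T}_{\nu_S,t}$ is block-diagonal over $\widetilde{\Lambda}_t$. For larger $d$ the spin matrices would have to be replaced by the corresponding $G_d$ irreps.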
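We argue that we can assume that $S$ is symmetric without loss of generality. For a measure $\nu$ on $G_d$ we define its conjugate $\overline{\nu}$ via the property
$$\int_{G_d} f(g)\, d\overline{\nu}(g) = \int_{G_d} f(g^{-1})\, d\nu(g)$$
for all continuous functions $f$ on $G_d$. We say a measure $\nu$ is symmetric if $\overline{\nu} = \nu$. For two measures $\nu_1$ and $\nu_2$ on $G_d$, their convolution $\nu_1 * \nu_2$ is the measure on $G_d$ defined via
$$\int_{G_d} f(g)\, d(\nu_1 * \nu_2)(g) = \int_{G_d}\int_{G_d} f(gh)\, d\nu_1(g)\, d\nu_2(h).$$
Going back to the definition (31), we have $\pi(\nu_1 * \nu_2) = \pi(\nu_1)\pi(\nu_2)$. It is easy to see that $\pi(\overline{\nu}) = \pi(\nu)^*$. In particular, if $\nu$ is symmetric, then $\pi(\nu)$ is self-adjoint and hence $\sigma(\pi(\nu))$ is real. Note also that $\overline{\nu} * \nu$ is automatically symmetric. We can write
$$\|\pi(\overline{\nu} * \nu)\|_{op} = \|\pi(\nu)^*\pi(\nu)\|_{op} = \|\pi(\nu)\|_{op}^2,$$
which means that $1 - \mathrm{gap}(\overline{\nu} * \nu) = (1 - \mathrm{gap}(\nu))^2$, and analogously for $\mathrm{gap}_t$. Finally, because the gap of $\nu$ is determined by the gap of the symmetric measure $\overline{\nu} * \nu$, we may restrict our attention to symmetric measures and, in particular, to symmetric sets $S$. Since $S$ is symmetric, $T_{\nu_S,t}$ is Hermitian, so its spectrum $\sigma(T_{\nu_S,t})$ is contained in $[-1, 1]$. The same is true for $T_{\nu_S,\infty}$ and $T_{\nu_S}$. Note that since the subspace $H_{\lambda_0}$ is excluded, the question whether $\mathrm{gap}(S) > 0$ is non-trivial. The gap exists, i.e. $\mathrm{gap}(S) > 0$, if and only if $\sigma(\widetilde{T}_{\nu_S}) \subset [-1+\delta, 1-\delta]$ for some $\delta > 0$. In particular, 1 must not be an accumulation point of $\sigma(T_{\nu_S})$. In such a case we say that $T_{\nu_S}$ has a spectral gap.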
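The identities $\pi(\overline{\nu}) = \pi(\nu)^*$, $\pi(\overline{\nu} * \nu) = \pi(\nu)^*\pi(\nu)$ and $\|\pi(\nu)^*\pi(\nu)\|_{op} = \|\pi(\nu)\|_{op}^2$ can be checked on a concrete block. The sketch below assumes NumPy. It uses the uniform measure $\nu$ on three Haar random rotations, which is not symmetric, in the 3-dimensional adjoint irrep of $G_2 \cong SO(3)$. The helper `haar_so3` is hypothetical.

```python
import numpy as np

def haar_so3(rng):
    """Haar-random rotation in SO(3): QR of a Gaussian matrix with a sign fix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))
    return q * np.linalg.det(q)

def op_norm(a):
    """Operator (spectral) norm: largest singular value."""
    return np.linalg.norm(a, 2)

rng = np.random.default_rng(3)
gens = [haar_so3(rng) for _ in range(3)]         # support of nu (uniform weights, not symmetric)
pi_nu = sum(gens) / len(gens)                     # pi(nu) in the spin-1 (adjoint) block
pi_nubar = sum(g.T for g in gens) / len(gens)     # nu-bar lives on the inverses g^{-1} = g^T
# nu-bar * nu is the law of g^{-1} h with g, h drawn independently from nu
pi_conv = sum(g.T @ h for g in gens for h in gens) / len(gens) ** 2

print(op_norm(pi_conv), op_norm(pi_nu) ** 2)     # the two numbers agree
```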
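Let us denote by $S^\ell$ the set of words in $G_d$ of length $\ell$ built from elements of $S$,
$$S^\ell := \{g_1 g_2 \cdots g_\ell \mid g_1, g_2, \ldots, g_\ell \in S\}. \qquad (52)$$
The corresponding averaging operator is $T_{\nu_S}^\ell$. Indeed,
$$T_{\nu_S}^\ell = \mathrm{Reg}(\nu_S)^\ell = \mathrm{Reg}(\nu_S^{*\ell}),$$
and since $\nu_S^{*\ell}$ is the law of a random word from $S^\ell$ (with letters drawn independently and uniformly from $S$), $T_{\nu_S}^\ell$ is the averaging operator with respect to $S^\ell$. At the level of functions we have
$$(T_{\nu_S}^\ell f)(h) = \frac{1}{|S|^\ell}\sum_{g_1, \ldots, g_\ell \in S} f\big((g_1 \cdots g_\ell)^{-1}h\big).$$
Importantly, $\mathrm{gap}(S)$ can be interpreted as the exponential rate of convergence of the (global) averaging operator $T_{\nu_S}^\ell$ to $T_\mu$ in the operator norm as $\ell$ increases. Indeed, due to the invariance of the Haar measure, $T_{\nu_S}T_\mu = T_\mu T_{\nu_S} = T_\mu$, so we have
$$T_{\nu_S}^\ell - T_\mu = (T_{\nu_S} - T_\mu)^\ell,$$
and, using the notion of a spectral gap (42), we have
$$\|T_{\nu_S}^\ell - T_\mu\|_{op} = (1 - \mathrm{gap}(S))^\ell \le e^{-\ell\cdot\mathrm{gap}(S)}.$$
Thus, if the gap exists, then $T_{\nu_S}^\ell$ converges to $T_\mu$ exponentially fast in the operator norm as $\ell \to \infty$. Moreover, the rate of convergence improves exponentially with increasing $\mathrm{gap}(S)$. This motivates us to study $\mathrm{gap}(S)$.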
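The statement that $T_{\nu_S}^\ell$ averages over words of length $\ell$, and that its distance from $T_\mu$ decays as $(1 - \mathrm{gap})^\ell$, can be checked on a single non-trivial block. The sketch below uses NumPy and a hypothetical helper. It averages the spin-1 representation over all $|S|^\ell$ words for a small symmetric $S$ and compares the result with the matrix power. On a non-trivial block $T_\mu$ vanishes, so the norm of the power is exactly that block's contribution to $\|T_{\nu_S}^\ell - T_\mu\|_{op}$.

```python
import numpy as np
from itertools import product

def haar_so3(rng):
    """Haar-random rotation in SO(3): QR of a Gaussian matrix with a sign fix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))
    return q * np.linalg.det(q)

rng = np.random.default_rng(5)
gens = [haar_so3(rng) for _ in range(2)]
gens += [g.T for g in gens]                      # symmetric S: add the inverses
T = sum(gens) / len(gens)                        # spin-1 block of T_{nu_S} (symmetric matrix)
ell = 4
# pi(nu_S^{*ell}): uniform average over all |S|^ell words g_1 g_2 ... g_ell
words_avg = sum(np.linalg.multi_dot(w) for w in product(gens, repeat=ell)) / len(gens) ** ell
T_pow = np.linalg.matrix_power(T, ell)

gap_block = 1 - np.linalg.norm(T, 2)             # the gap restricted to this block
print(np.allclose(words_avg, T_pow), np.linalg.norm(T_pow, 2), np.exp(-ell * gap_block))
```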
Using (64), we get
$$e^{-\ell\cdot\mathrm{gap}(S)} \ge \mathrm{Vol}(\Omega), \qquad (69)$$
so for $\ell$ large enough that (69) is violated we get a contradiction, which means that $S^\ell$ is an $\varepsilon$-net. On the other hand, $\mathrm{Vol}(\Omega)$ can be bounded from below in terms of $\varepsilon$ and $\dim G_d$, where $C_V$ is some group constant. Thus, we obtain a bound of the form (2), with $A$ and $B$ proportional to $\mathrm{gap}^{-1}(S)$. We have $\dim G_d = d^2 - 1$, and in the case of $G_d$ we can put $C_V = (9.5)$. The values of the constant $C_V$ bounding the volume of a ball in various groups can be obtained by the techniques from [22]. Note that Theorem 3 cannot be stated in an analogous form for the $t$-averaging operators $T_{\nu_S,t}$, since the normalized indicator function (65) does not belong to $L^2_t(G_d)$ for any $t$, so we cannot write (68) for $T_{\nu_S,t}$ instead of $T_{\nu_S}$. However, by considering appropriate approximations of the Dirac delta by polynomials from $L^2_t(G_d)$, we can show that the resulting error is sufficiently small and hence obtain analogous results. In particular, it is known that $\ell_\varepsilon \le C\cdot\mathrm{gap}_r^{-1}(S)\cdot\log(1/\varepsilon)$, where $r = D/\varepsilon^{2(d^2-1)+2}$ and $C$, $D$ are some constants [20]. This result has been improved in the case of $U(d)$ in [21], where an analogous bound holds for some absolute constant $C_b$ and $t \ge 5d^{5/2}/\varepsilon \cdot \tau(\varepsilon, d)$, with $\tau(\varepsilon, d)$ defined in (78).

Calculable lower bound on spectral gap
In this section, we derive lower bounds on the spectral gap at scale $t$ for $S \subset G_d$ such that any two pairs in $S$ (of a gate with its inverse) form a universal set themselves. This condition can be verified numerically by known universality criteria; see e.g. [23].
Our bound for any $t$ can be calculated from the knowledge of certain gaps up to some fixed scale $t_0 = t_0(d, \varepsilon_0)$ and is of the form (4), where $\alpha, \beta, c > 0$ are specific calculable constants and $g_{t_0}(S)$ can be determined numerically by calculating gaps of certain sets derived from $S$ up to the calculable scale $t_0$.
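We study the action of the $t$-averaging operator with respect to $S$, acting on the Hilbert space $H_t$. By $S(H_\lambda)$ we denote the unit sphere in $H_\lambda$,
$$S(H_\lambda) := \{w \in H_\lambda : \|w\| = 1\}.$$
We choose the orthonormal basis of $H_t$ induced by the basis (11). Clearly, $\|\widetilde{T}_{\nu_S,t}\|_{op} \le 1$, and our goal is to improve this bound. The irreps $\Lambda_t$ of $G_d$ can be divided into three disjoint sets, based on the type of the representation of $U(d)$ they come from:
$$\Lambda_t = \Lambda_{t,\mathbb{H}} \cup \Lambda_{t,\mathbb{R}} \cup \Lambda_{t,\mathbb{C}},$$
where $\mathbb{H}$, $\mathbb{R}$ and $\mathbb{C}$ stand for quaternionic, real and complex representations. In fact, $\Lambda_{t,\mathbb{H}} = \emptyset$, since quaternionic representations of $U(d)$ do not contribute to projective representations. Since
$$\|\widetilde{T}_{\nu_S,t}\|_{op} = \max_{\lambda \in \widetilde{\Lambda}_t}\|\pi_\lambda(\nu_S)\|_{op},$$
we fix any $\lambda \in \widetilde{\Lambda}_t$ and consider $\|\pi_\lambda(\nu_S)\|_{op}$.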
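The reduction to a single $\lambda$ rests on the fact that the operator norm of a block-diagonal operator is the maximum of the norms of its blocks. Below is a minimal NumPy check. Random Hermitian matrices with norms 0.5, 0.9 and 0.7 serve as hypothetical stand-ins for the blocks $\pi_\lambda(\nu_S)$, and the helper name is illustrative.

```python
import numpy as np

def direct_sum(*blocks):
    """Block-diagonal matrix representing the direct sum of square matrices."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n), dtype=complex)
    k = 0
    for b in blocks:
        size = b.shape[0]
        out[k:k + size, k:k + size] = b
        k += size
    return out

rng = np.random.default_rng(2)
blocks = []
for size, target in ((3, 0.5), (5, 0.9), (7, 0.7)):
    a = rng.standard_normal((size, size)) + 1j * rng.standard_normal((size, size))
    herm = (a + a.conj().T) / 2
    blocks.append(target * herm / np.linalg.norm(herm, 2))  # Hermitian block with norm `target`
full = direct_sum(*blocks)
print(np.linalg.norm(full, 2), max(np.linalg.norm(b, 2) for b in blocks))
```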
Additionally we assume

Strategy of the proof
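Our strategy is to show that for any $\lambda \in \widetilde{\Lambda}_t$, any $w \in S(H_\lambda)$ and any generator $U_m \in S$, except for at most one, say $U_k$, a lower bound holds with some coefficients $b_i^2 = b_i^2(\lambda) > 0$. These coefficients can be bounded by gaps of certain subsets of the set $S^2$, and the lower bound implies an upper bound on $\|\pi_\lambda(\nu_S)\|_{op}$ strictly below 1. This means that we can obtain a non-trivial lower bound on $\mathrm{gap}_t(S)$ for any $t \ge t_0$. Crucially, the value of $t_0$ can be easily determined and is not large, so the numerical calculation of the bound is feasible.

The main reasoning
Since $\pi_\lambda(g_i)$ is unitary, we have
$$\|(\pi_\lambda(g_i) - \pi_\lambda(e))w\|^2 = 2 - 2\,\mathrm{Re}\langle \pi_\lambda(g_i)w, w\rangle$$
for any $w \in S(H_\lambda)$. Let $\iota_\lambda$ denote the Frobenius-Schur indicator of $\pi_\lambda$,
$$\iota_\lambda := \int_{G_d} \chi_\lambda(g^2)\, d\mu(g) \in \{1, 0, -1\},$$
where $\chi_\lambda$ is the character of $\pi_\lambda$. Note that
$$\int_{G_d} \pi_\lambda(g^2)\, d\mu(g) = \frac{\iota_\lambda}{d_\lambda}\, I,$$
since the LHS is a self-intertwiner.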
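For $d = 2$, all irreps of $G_2 \cong SO(3)$ have integer spin and are real, so their Frobenius-Schur indicators should all equal 1. The sketch below assumes NumPy and uses hypothetical helper names. It evaluates the indicator with the Weyl integration formula for $SO(3)$, under which class functions are integrated against $(1 - \cos\theta)/\pi$ on $[0, \pi]$, with $\theta$ the rotation angle. Since $g^2$ is a rotation by $2\theta$, $\iota_j = \frac{1}{\pi}\int_0^\pi \chi_j(2\theta)(1 - \cos\theta)\,d\theta$. For this trigonometric-polynomial integrand, the midpoint rule is exact up to rounding.

```python
import numpy as np

def so3_character(spin, angle):
    """chi_j(rotation by angle) = 1 + 2 * sum_{m=1}^{j} cos(m * angle), integer spin j."""
    m = np.arange(1, spin + 1)
    return 1 + 2 * np.cos(np.outer(angle, m)).sum(axis=1)

def fs_indicator_so3(spin, n=4096):
    """iota_j = int chi_j(g^2) dmu(g). Class functions on SO(3) integrate against
    (1 - cos theta) / pi on [0, pi]; g^2 is a rotation by 2 * theta."""
    theta = (np.arange(n) + 0.5) * np.pi / n          # midpoint rule on [0, pi]
    integrand = so3_character(spin, 2 * theta) * (1 - np.cos(theta)) / np.pi
    return integrand.sum() * np.pi / n

print([round(fs_indicator_so3(j), 10) for j in range(1, 6)])
```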
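Observe that since $\|\lambda\|_1 > 0$, the representation $\pi_\lambda$ is non-trivial, and averaging over the Haar measure gives, for any $w \in S(H_\lambda)$,
$$\int_{G_d} \mathrm{Re}\langle \pi_\lambda(g^2)w, w\rangle\, d\mu(g) = \frac{\iota_\lambda}{d_\lambda}.$$
Hence, for any $\lambda \in \widetilde{\Lambda}_t$, there exists $h = h(w) \in G_d$ such that $\mathrm{Re}\langle \pi_\lambda(h^2)w, w\rangle < \iota_\lambda/d_\lambda$ and
$$\|(\pi_\lambda(h^2) - \pi_\lambda(e))w\|^2 > 2 - \frac{2\iota_\lambda}{d_\lambda}. \qquad (94)$$
Note that if $\lambda$ is quaternionic, the bound is even better. We want to connect $h^2$ with the square $g_i^2$ of some generator, so that the large value of the norm (94) propagates to a large value of $\|(\pi_\lambda(g_i^2) - \pi_\lambda(e))w\|$. Let
$$S^2 := \{U_1^{\pm 2}, \ldots, U_k^{\pm 2}\},$$
and by $S^2_{j_1\ldots j_m}$ we denote the set $S^2$ without the elements $U_{j_1}^2, \ldots, U_{j_m}^2$ (and their inverses). By assumption, each set $S^2_{j_1\ldots j_m}$ is universal for $1 \le m \le k-2$. Consider any $S^2_{j_1\ldots j_m}$ (we allow $S^2$ as the special case $m = 0$). We find an $\varepsilon_{j_1\ldots j_m}$-approximation $\widetilde{h}$ of $h^2$ in terms of squares of generators. Namely, we write $\widetilde{h} = g_1^2 g_2^2 \cdots g_{\ell_{j_1\ldots j_m}}^2$, where each $g_i^2 \in S^2_{j_1\ldots j_m}$, so that $D(h^2, \widetilde{h}) < \varepsilon_{j_1\ldots j_m}$; we specify $1 > \varepsilon_{j_1\ldots j_m} > 0$ later. From the unitary invariance of the operator norm,
$$\|\pi_\lambda(h^2) - \pi_\lambda(\widetilde{h})\|_{op} = \|\pi_\lambda(\hat{h}) - \pi_\lambda(e)\|_{op},$$
where $\hat{h} = \widetilde{h}^{-1}h^2$ and $D(e, \hat{h}) < \varepsilon_{j_1\ldots j_m}$. Let us fix a maximal torus $T \subset G_d$ with Lie algebra $\mathfrak{t} \subset \mathfrak{su}(d)$. We can write $\hat{h} = g t g^{-1}$, where $t \in T$ and $g \in G_d$. Clearly, $\|\pi_\lambda(\hat{h}) - \pi_\lambda(e)\|_{op} = \|\pi_\lambda(t) - \pi_\lambda(e)\|_{op}$. Let $\{e^{i\gamma_1}, \ldots, e^{i\gamma_{d_\lambda}}\}$ be the spectrum of $\pi_\lambda(t)$ and $\{w_1, \ldots, w_{d_\lambda}\}$ an orthonormal basis of $H_\lambda$ in which
$$\pi_\lambda(t)w_j = e^{i\gamma_j}w_j \qquad (101)$$
for $1 \le j \le d_\lambda$. By the definition of a (real) weight, each phase $\gamma_j$ is determined by some weight $\mu_j$ of the irrep $\pi_\lambda$ evaluated on $H = \log(t) = \mathrm{diag}(i\theta_1, \ldots, i\theta_d) \in \mathfrak{t}$. We assume $\theta_i \in (-\pi, \pi]$ for each $i$. Since $D(e, t) < \varepsilon_{j_1\ldots j_m}$, each $|\theta_i|$ is bounded in terms of $\varepsilon_{j_1\ldots j_m}$, and so are the phases $\gamma_j$. Finally, these estimates lead to a bound on $\|\pi_\lambda(t) - \pi_\lambda(e)\|_{op}$ involving the constant $C = \pi/2$. We use the triangle inequality to propagate the result to some generator.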
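Applying it to the product of squares approximating $h^2$, we find that there exists $i$ such that the $i$-th term of the resulting sum is bounded from below. From the unitary invariance of the operator norm, this bound transfers to the corresponding factor $g_i^2$. Since $g_i^2$ is an element of $S^2_{j_1\ldots j_m}$, there exists $i_q$, where $1 \le q \le m$, for which the analogous bound holds, with the constant given by the bound for the worst choice of $i_1, \ldots, i_m$; we denote the corresponding set by $S^2_m$. The set $S^2_m$ has the corresponding $\varepsilon_m$ and $\ell_m$ via (110).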
We proceed as follows. First, we consider the above procedure for $S^2$ and obtain a bound for some $U_{i_1}^2 \in S^2$. Next, we repeat the argument for $S^2_{i_1}$ and get a bound for some $U_{i_2}^2 \in S^2_{i_1}$. We proceed in this manner until $m = k-2$, which gives a bound for some $U_{i_{k-1}}^2 \in S^2_{i_1\ldots i_{k-2}}$. This way we obtain bounds for each pair of generators except for one pair $\{U_{i_k}^2, U_{i_k}^{-2}\}$, where $1 \le i_k \le k$ is the remaining index. Thus, using (89), we obtain lower bounds on $\|(\pi_\lambda(U_{i_m}^2) - \pi_\lambda(e))w\|^2$ for all $i_m$ with $m \in \{1, \ldots, k-1\}$. For $\varepsilon_{m-1} \ll 1$, these bounds admit a good approximation, from which the bound on $\|\pi_\lambda(\nu_S)\|_{op}$ follows. Using a similar argument, by considering only $S^2_{i_1\ldots i_{k-2}}$, we obtain a weaker bound. Indeed, comparing (121) and (122), we have the inequality (123). Moreover, the ratio between the LHS and the RHS of (123) is expressed through the average of the nonincreasing sequence $b_0^2, b_1^2, \ldots$, and it can be weakened to a simplified bound. We have a trade-off between the contribution of $\varepsilon_m$ to the numerator of the multiplicative term (the smaller $\varepsilon_m$, the better) and its contribution to the diameter (the larger $\varepsilon_m$, the better).
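The averaging step behind (94) can be illustrated numerically. Take the spin-1 irrep of $G_2 \cong SO(3)$, for which $d_\lambda = 3$ and $\iota_\lambda = 1$. The sketch below uses NumPy and a hypothetical helper `haar_so3`. It estimates $\int \pi_\lambda(g^2)\,d\mu(g) \approx I/3$ by Monte Carlo and picks a sampled $h$ with small $\mathrm{Re}\langle\pi_\lambda(h^2)w, w\rangle$ for a random unit vector $w$. It then checks that $\|(\pi_\lambda(h^2) - \pi_\lambda(e))w\|^2$ exceeds $2 - 2\iota_\lambda/d_\lambda = 4/3$.

```python
import numpy as np

def haar_so3(rng):
    """Haar-random rotation in SO(3): QR of a Gaussian matrix with a sign fix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))
    return q * np.linalg.det(q)

rng = np.random.default_rng(11)
n, d_lam, iota = 20_000, 3, 1            # spin-1 (adjoint) irrep of G_2: real, so iota = 1
squares = np.array([np.linalg.matrix_power(haar_so3(rng), 2) for _ in range(n)])

# Schur: int pi(g^2) dmu(g) = (iota / d_lambda) * I, estimated by Monte Carlo
avg_sq = squares.mean(axis=0)

w = rng.standard_normal(d_lam)
w /= np.linalg.norm(w)                                  # w in the unit sphere S(H_lambda)
re_vals = np.einsum("i,nij,j->n", w, squares, w)        # Re <pi(g^2) w, w> for each sample
h = np.argmin(re_vals)                                  # a sampled h with small overlap
moved = np.linalg.norm(squares[h] @ w - w) ** 2         # ||(pi(h^2) - pi(e)) w||^2
print(np.round(avg_sq, 2), re_vals[h], moved, 2 - 2 * iota / d_lam)
```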

Figure 1: The value of $\alpha$ as a function of $t_0$ for $d = 2$.

Figure 2: Ratio of the lower bound (4) to the true value of the gap as a function of $t \in [t_0, 1000]$. The ratio was averaged over 1000 Haar random sets $S$ with $d = 2$, each consisting of 3 gates together with their inverses. Each line corresponds to the lower bound calculated for a different value of $t_0 \in [550, 900]$, with increment 10.
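Because we do not know how $\mathrm{diam}_{\varepsilon_m}(G, S^2_m)$ depends on $\varepsilon_m$, in order to proceed we can use the Solovay-Kitaev theorem for $S^2_m$. It bounds $\mathrm{diam}_{\varepsilon_m}(G, S^2_m)$ from above by $A_m$ times the $c$-th power of a logarithm depending on $\varepsilon_m$, where $c = \log(5)/\log(3/2) \approx 4$, $c_s$ is some constant ($c_s = d + 2 + O(\varepsilon)$), $\varepsilon_{0,m}$ is the precision of the initial approximation in the Solovay-Kitaev algorithm and $\ell_{0,m}$ is the word length of this approximation. Thus,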

Table 1: Examples of values of $t_0$, $\alpha$ and $\beta$ for $d = 2$ (left) and $d = 3$ (right). The parameter $\varepsilon_0$ is an element of the construction that determines $t_0$ (along with $d$). Bold font indicates the choice of the smallest possible $t_0$.

Table 2: Examples of values of $t_0$, $\alpha$ and $\beta$ for $d = 4$. The parameter $\varepsilon_0$ is an element of the construction that determines $t_0$ (along with $d$). Bold font indicates the choice of the smallest possible $t_0$.
Importantly, $\mathrm{Ad}$ is faithful; hence every representation of $G_d$ is realized inside $\mathrm{Ad}^{\otimes n}$ for $n$ large enough. The defining representation $U$ of $U(d)$ does not descend to a well-defined representation of $G_d$, but $U \otimes \overline{U}$ does, where $\overline{U}$ is the complex conjugate of $U$. The $t$-averaging operator $T_{\nu_S,t}$ acts just like $T_{\nu_S}$, but on a restricted domain of functions. Since $T_{\nu_S,t}$ is a sum of $|S|$ left shift operators normalized by $1/|S|$, and each such operator is unitary on $L^2_t(G_d)$ due to the left-invariance of the Haar measure, we see that $\|T_{\nu_S,t}\|_{op} \le 1$, where $\|\cdot\|_{op}$ denotes the operator norm. On the other hand, $T_{\nu_S,t}$ acts trivially on $H_{\lambda_0}$, where $\lambda_0 = (0, 0, \ldots, 0)$, so $\|T_{\nu_S,t}\|_{op} \ge 1$ and hence $\|T_{\nu_S,t}\|_{op} = 1$.
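Two claims can be checked directly. First, $U \otimes \overline{U}$, unlike $U$, does not see a global phase and is therefore a representation of $G_d$. Second, it splits into a trivial part, spanned by the identity matrix, and the complexified adjoint action on traceless matrices. Below is a NumPy sketch with hypothetical helper names. It uses the row-major identity $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(A X B^{T})$, so with $B = \overline{U}$ the operator $U \otimes \overline{U}$ acts as $X \mapsto U X U^\dagger$.

```python
import numpy as np

def haar_unitary(d, rng):
    """Haar-random U in U(d): QR of a complex Gaussian matrix with a phase fix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def u_ubar(u):
    """The representation U (x) conj(U) on C^d (x) C^d."""
    return np.kron(u, u.conj())

rng = np.random.default_rng(0)
d = 3
u = haar_unitary(d, rng)
phase = np.exp(0.83j)                       # an element of the center Z(U(d))

vec_id = np.eye(d).reshape(-1)              # row-major vec of the identity matrix
x = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
x -= np.trace(x) / d * np.eye(d)            # traceless X, i.e. an element of su(d) (x) C

print(np.allclose(u_ubar(phase * u), u_ubar(u)))                                  # phase drops out
print(np.allclose(u_ubar(u) @ vec_id, vec_id))                                    # trivial summand I
print(np.allclose(u_ubar(u) @ x.reshape(-1), (u @ x @ u.conj().T).reshape(-1)))  # adjoint action
```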