
Is quantum computing green? An estimate for an energy-efficiency quantum advantage


Published 19 January 2023 © 2023 The Author(s). Published by IOP Publishing Ltd
Citation: Daniel Jaschke and Simone Montangero 2023 Quantum Sci. Technol. 8 025001. DOI: 10.1088/2058-9565/acae3e


Abstract

The quantum advantage threshold determines when a quantum processing unit (QPU) is more efficient with respect to classical computing hardware in terms of algorithmic complexity. The 'green' quantum advantage threshold—based on a comparison of energetic efficiency between the two—is going to play a fundamental role in the comparison between quantum and classical hardware. Indeed, its characterization would enable better decisions on energy-saving strategies, e.g. for distributing the workload in hybrid quantum–classical algorithms. Here, we show that the green quantum advantage threshold crucially depends on (a) the quality of the experimental quantum gates and (b) the entanglement generated in the QPU. Indeed, for noisy intermediate-scale quantum hardware and algorithms requiring a moderate amount of entanglement, a classical tensor network emulation can be more energy-efficient at equal final state fidelity than quantum computation. We compute the green quantum advantage threshold for a few paradigmatic examples in terms of algorithms and hardware platforms, and identify algorithms with a power-law decay of the singular values of bipartitions, characterized by the power-law exponent α, as those setting the green quantum advantage threshold in the near future.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

The exascale computing era is making the energy consumption of high-performance computing (HPC) a crucial aspect of modern facilities: energy-saving optimizations in the computational cluster workflow play a fundamental role [1–3]. The rise of graphics processing units for HPC applications demonstrated the benefit of specialized hardware accelerating certain tasks, e.g. for machine learning applications [4, 5]. The integration of quantum computers in HPC environments can potentially provide another accelerating hardware serving as quantum processing units (QPUs) [6–9]. QPUs might be beneficial either for speeding up the overall computation—exploiting the quantum advantage—or drastically reducing the energy consumption of a task, i.e. displaying a green quantum advantage. Both advantages influence the decisions of how to split the workload in a hybrid quantum–classical algorithm or infrastructure and can occur at the same time. The estimation of resource constraints for quantum computing from compilation to running the hardware has already attracted interest, even beyond the noisy intermediate-scale quantum (NISQ) regime for fault-tolerant applications [10–12].

The quantum advantage is expected to induce also a green quantum advantage because the beneficial scaling in time of a bounded-error quantum polynomial time (BQP) problem on a quantum machine versus a nondeterministic polynomial time (NP) problem on a classical computer will dominate for large enough problem sizes [13]. But two questions remain: do algorithms exist that have no quantum advantage but nonetheless show a green quantum advantage, and to which extent is the beneficial scaling of BQP versus NP observable with current system sizes? After recent breakthroughs in QPU development shifted the border for a quantum advantage in favor of QPUs [14, 15], classical algorithms and tensor network (TN) simulations have followed up on the results and improved the classical simulation with novel approaches [16, 17]. The benefit of TN methods is the ability to allow an error comparable in magnitude to the infidelity introduced by QPUs [18–20]. As a consequence, classical emulators of quantum circuits based on TN methods serve as a bridge to quantum algorithms [20–23]. In agreement with recent results discussing the aspect of entanglement in quantum algorithms [24], the amount of generated entanglement—controllable via the distribution of the entanglement entropy—is another aspect to be considered in addition to the finite fidelity: the two examples studied hereafter underline that the green quantum advantage depends on both aspects.

In the following, we compare the energy consumption of different quantum computing architectures with a classical emulator. The classical numerical simulation methods considered are an evolution of the exact state [25] and TN methods [23, 26–30]. TN methods are widely applied to a variety of quantum many-body problems and benefit from their entanglement-based wave-function compression scheme. For pure quantum states, TN methods offer direct access to the bipartite entanglement via singular value decomposition (SVD). These features allow for an efficient simulation for states with a limited amount of entanglement, e.g. for states following the area law of entanglement [31]. Matrix product state (MPS) simulations for quantum circuits have convincingly demonstrated the close connection between the entanglement generated by quantum circuits and the ability to emulate the computation with an MPS in the Google quantum advantage demonstration [7, 14, 18]. Beyond MPS methods obtaining the final state, TN methods relying on an optimized contraction scheme have been introduced and successfully applied to the question of quantum advantage [19].

Our approach allows a direct prediction of the green quantum advantage threshold, i.e. the equality of fidelity and energy point (EFEP) between an MPS and the quantum hardware, via a criterion that can be calculated a priori as a function of the number of qubits, the generated entanglement, and the QPU's gate properties, to name the most important parameters. In more detail, the final EFEPs depend on various factors: the classical cost of a single two-qubit gate application on an HPC system, the gate fidelities and energy consumption of the quantum hardware, the circuit depth, and the number of qubits, as well as the assumption of how much entanglement is generated within the algorithm. Improvements in the efficiency per elementary classical operation have been pursued in the last decades [32]; likewise, methods that reduce the energy of a pulse sequence driving an elementary quantum gate have already been developed [33]. Although these improvements are highly desirable, they do not change the scaling found hereafter, which depends only on the gate fidelities and the entanglement generated during the quantum computation.

Large-scale simulations are valuable to prove or disprove the quantum advantage for particular settings [7, 30], but every single simulation requires substantial HPC resources, and exploring a vast parameter landscape of high dimensionality is unfeasible. Moreover, small changes in the system parameters, such as gate fidelities, quickly add up to a drastic change in the final outcome, thus a fine-grained grid in the parameter landscape is desirable. Thus, here we develop an efficient way of mimicking the MPS simulation, i.e. an mps-oracle, which allows exploring parameter regimes otherwise beyond computational resources or only accessible with substantial computational costs and at a low resolution in the parameter space. We demonstrate the agreement of the exact MPS simulation with the mps-oracle in the relevant regimes for system sizes where the simulations are possible, i.e. the few-qubit case. In conclusion, the developed methodology can be efficiently applied to specific large-scale use-cases: we target NISQ platforms and estimate the possible benefits for common quantum circuits, e.g. the quantum Fourier transformation (QFT) [34].

We estimate the classical computation's energy consumption in section 1. Then, we move on to the energy consumption of a Rydberg simulator, an ion platform, and superconducting qubits (SCs) in section 2, before comparing the two scenarios in different use-cases in section 3. We present our conclusion and outlook in section 4.

1. Classical emulation of quantum computations

The simulation of quantum computing systems via classical hardware can be grouped into different classes: exact state simulations and approximate methods, e.g. via TN methods exploiting the truncation of entanglement via the Schmidt decomposition [21, 23, 35, 36]. These simulations are performed either at the digital level, i.e. directly simulating quantum circuits as a sequence of quantum gates, or at the analog level of the computation, that is, solving the Schrödinger equation via careful scheduling of the Hamiltonian terms and control fields driving the qubit dynamics in the hardware. An additional complexity level can be added by considering the interaction with the environment, i.e. the full open-quantum-system dynamics [37]. In this case, quantum gates turn into quantum channels and the Schrödinger equation into a master equation for open quantum systems, e.g. the Lindblad equation [38]. Here, we do not pursue the path of open quantum systems because the computational effort changes considerably; we only point out how our considerations carry over to this scenario via either the purification of the system state or alternative strategies: the simulation of the Lindblad equation via purification requires the same amount of resources as the simulation of a pure state with twice as many qubits. Alternatively, simulations of the Lindblad equation are often performed via the quantum trajectories approach [39–41]: an unraveling of the classical probabilities performed via an average over many simulations of pure-state dynamics. In conclusion, hereafter we analyze the scaling of the energy cost of exact state and MPS-based digital simulations of pure states.

1.1. Computational cost for exact state evolution and MPSs

We first estimate the energy cost of the exact state evolution of quantum circuits, setting the reference for all other cases: as shown below, the number of floating-point operations (FLOP) is easily estimated, and we extract the energy consumption of the most energy-efficient clusters from the Green500 list [42]. The cluster efficiency has improved significantly over the past years: top-tier clusters in 2012 [1] were already far more efficient than both standard and energy-efficient designs of 2003 [43]. In the following, we set today's HPC energy consumption per FLOP according to the most energy-efficient clusters of the Green500 list [42].

The wave function of L qubits in a pure quantum state is represented by $2^{L}$ complex numbers. The most straightforward application of a single-qubit gate, i.e. exploiting tensor notation, is equivalent to a matrix–matrix multiplication of dimensions $2 \times 2$ and $2 \times 2^{L-1}$. These dimensions lead to $2^{L+1}$ FLOP, which already includes the required multiplications and summations. Similarly, a two-qubit gate scales as $2^{L+2}$, and a generic gate on η qubits scales as $2^{L+\eta}$. A FLOP refers here to complex-valued arithmetic. Thus, the application of a single one-qubit or two-qubit gate costs approximately $2^{46}$ or $2^{47}$ FLOP for a system of 45 qubits, respectively. In general, we obtain for the η-qubit gate

$\mathrm{FLOP}_{\mathrm{exact}}(\eta) = 2^{L+\eta}.$
This estimate covers the evolution via quantum gates but disregards the energy needed to compute measurements and non-necessary permutations [44]. Notice that the costs for communication, e.g. if using the message passing interface, in first approximation are included in the energy consumption of the cluster per FLOP as the performance tests for HPC clusters are using parallel applications, e.g. a parallel version of the LINPACK benchmark [45].
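The scaling above can be turned into a rough per-gate estimator. A minimal sketch follows; the cluster efficiency used below is a placeholder assumption, not a value from the Green500 list.

```python
# Sketch: FLOP and energy estimate for exact state evolution of quantum
# circuits. GFLOPS_PER_WATT is an assumed placeholder efficiency.

GFLOPS_PER_WATT = 50.0  # assumed cluster efficiency (placeholder)

def flop_exact(num_qubits: int, gate_qubits: int) -> int:
    """FLOP count for an eta-qubit gate on L qubits, counting one complex
    multiply-add as one FLOP: 2**(L + eta)."""
    return 2 ** (num_qubits + gate_qubits)

def energy_exact(num_qubits: int, gate_qubits: int) -> float:
    """Energy in joules for one gate at the assumed cluster efficiency."""
    return flop_exact(num_qubits, gate_qubits) / (GFLOPS_PER_WATT * 1e9)

# A two-qubit gate on 45 qubits touches 2**47 FLOP:
print(flop_exact(45, 2))
```

The exponential growth in L makes the per-gate energy of the exact simulation explode well before memory limits are reached.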

We now repeat the previous calculation for an approximate simulation performed by evolving an MPS. The approximation introduced by the MPS approach consists of truncating the lowest singular values of any Schmidt decomposition which separates the system into a bipartition of two chains of qubits of length L − M and M, respectively; the number of remaining singular values is the bond dimension χ. The MPS simulation becomes exact once χ reaches $d^{\min(M, L-M)}$, where d is the Hilbert space dimension of a single site. For each two-qubit gate, the MPS algorithm (i) permutes sites until they are nearest neighbors, (ii) contracts neighboring sites, (iii) contracts the gate, and (iv) splits the tensor related to the two involved qubits via an SVD. Moreover, step (i) contains the sub-steps (a) contract two sites, then (b) split the sites with swapped site indices via an SVD. There are L − 1 nearest-neighbor pairs, L − 2 next-nearest-neighbor pairs, and so on, until we reach the single pair of sites at maximal distance. Assuming a random choice of the two qubits involved in the gates during a generic algorithm, the average distance is

$\bar{\delta} = \frac{2}{L(L-1)} \sum_{k=1}^{L-1} k\,(L-k) = \frac{L+1}{3}.$
In conclusion, step (i) is composed on average of $\mathrm{FLOP}_{\mathrm{(i)}}$ operations with

$\mathrm{FLOP}_{\mathrm{(i)}} \approx \left(\bar{\delta} - 1\right)\left(d^{2}\chi^{3} + d^{3}\chi^{3}\right),$
where the first term accounts for the contraction and the second for the SVD [46]; we recall that the system size is L, the dimension of the local Hilbert space is d, and the bond dimension is χ. Moreover, permutations of the memory are neglected as they scale linearly on classical computers and, for simplicity, we assume that the bond dimension χ is constant throughout the qubit chain. Similar estimates result in the following computational effort for the steps (ii) to (iv):

$\mathrm{FLOP}_{\mathrm{(ii-iv)}} \approx d^{2}\chi^{3} + d^{4}\chi^{2} + d^{3}\chi^{3}.$
In conclusion, we obtain a total number of FLOP per two-qubit gate of

$\mathrm{FLOP}_{\mathrm{MPS}} = \mathrm{FLOP}_{\mathrm{(i)}} + \mathrm{FLOP}_{\mathrm{(ii-iv)}}.$
The resulting average energy consumption per gate, under the cluster efficiency assumed above, is reported in figure 1(a) as a function of the number of qubits for two different bond dimensions and compared with the exact state simulation. We observe the bounded energy consumption of the MPS simulation in comparison to the exact state simulation: the FLOP count for the MPS increases in a controlled fashion as a function of the system size L for a fixed bond dimension χ, and the computational resources are well below the exact state simulation in the limit of large system sizes. When keeping only the terms with the highest-order scaling, the MPS approximately consumes more energy than the exact simulation if

$\mathrm{FLOP}_{\mathrm{MPS}} \gtrsim \mathrm{FLOP}_{\mathrm{exact}}(2) = 2^{L+2}.$
The inequality yields a threshold for the bond dimension of χ ≈ 1500 for 40 qubits and confirms that lower-order terms play only an insignificant role. This estimate is very coarse-grained and an upper bound for the MPS effort [47], but nonetheless, the trends are valid including the fact that an MPS becomes more expensive than the full state. We improve the approach of estimating the effort through an mps-oracle in the next subsection.
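As a cross-check of these estimates, the combinatorial average distance and the per-gate MPS cost can be sketched in a few lines. The prefactors of the contraction and SVD terms below follow standard tensor-network scalings and are assumptions, not the article's exact expressions.

```python
# Sketch of the MPS per-gate cost model. Step costs use the standard
# asymptotic scalings (contraction ~ d**2 * chi**3, SVD ~ d**3 * chi**3);
# the prefactors are assumptions.

def avg_distance(L: int) -> float:
    """Average distance |i - j| over all distinct qubit pairs.
    Equals (L + 1) / 3 in closed form."""
    pairs = [(i, j) for i in range(L) for j in range(i + 1, L)]
    return sum(j - i for i, j in pairs) / len(pairs)

def flop_mps_gate(L: int, chi: int, d: int = 2) -> float:
    """Assumed FLOP count for one generic two-qubit gate on an MPS."""
    swaps = avg_distance(L) - 1                     # step (i): move qubits together
    per_swap = d**2 * chi**3 + d**3 * chi**3        # contraction + SVD per swap
    steps_ii_iv = d**2 * chi**3 + d**4 * chi**2 + d**3 * chi**3
    return swaps * per_swap + steps_ii_iv

# The brute-force average distance matches the closed form (L + 1) / 3:
assert abs(avg_distance(40) - 41 / 3) < 1e-12
```

The cubic growth in χ against the exponential growth in L for the exact simulation is what produces the crossover discussed above.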


Figure 1. Tools for extracting the equal fidelity and energy points (EFEP). (a) The starting point is the cost of applying a single two-qubit gate for the classical matrix product state emulator (MPS) for different bond dimensions χ, see the label 'MPS(χ)', and the exact state simulation, see label 'Exact'. (b) Instead of actually executing the classical MPS simulation, we refine this approach by sampling the values λi of the Schmidt decomposition according to a postulated distribution after every two-qubit gate. This method allows us to tune the entanglement via different distributions, e.g. a power law with a parameter α. Then, we calculate an error based on the bond dimension χ. The generic circuit with one layer, i.e. m = 1, acts on each pair of nearest neighbors once.


1.2. Refined computational cost for MPSs

In the following, we refine the FLOP and fidelity estimate via the introduction of the mps-oracle—without the need of running the full simulation—exploiting one key property of an MPS simulation: the bond dimension grows dynamically over time and the bond dimension determines the numerical cost of the computation. Thus, to obtain an energy consumption estimate, we perform an mps-oracle simulation that avoids evolving the state but—given a specific algorithm—tracks only the bond dimension of each link and the corresponding fidelity. The parameters of the mps-oracle are the number of sites L, the dimension of the local Hilbert space d, the maximally allowed bond dimension $\chi_{\max}$, and a distribution of singular values λi in the Schmidt decomposition. The sequence of gates is set by the algorithm of interest, and the initial bond dimensions along the MPS are determined at the start.

While iterating through the sequence of gates, we consider each gate acting on two nearest neighbors and keep track of whether the MPS has to truncate, based on the bond dimensions of the previous step. Therefore, we compare the parameter $\chi_{\max}$ with the number of singular values yielded by the SVD, i.e. the minimum of the two dimensions of the matrix being split. If this minimum is larger than $\chi_{\max}$, we assume that a truncation takes place according to the postulated distribution of singular values, leading to a fidelity $f_k$ for the kth gate in the algorithm. Figures 1(b) and 2 highlight these steps, the latter in a minimal example of four qubits and three gates to be applied. As shown in figure 2, no error is introduced when $\chi_{\max}$ is not reached. Combining the fidelities of each step k, we obtain

$\mathcal{F} = \prod_{k} f_{k}, \qquad (7)$
determining the fidelity of the complete algorithm via the fidelity of its final state. Later, we use the same equation to calculate the fidelity of an algorithm on each quantum hardware, where we plug in the fidelity of a single hardware gate as $f_k$. This cumulative fidelity is a conservative estimate, which represents a worst-case scenario for both MPS and QPU. We use the infidelity $1 - \mathcal{F}$ to characterize the error.
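The bookkeeping of equation (7) amounts to a product of per-gate fidelities, applied identically to the MPS truncations and to a QPU with uniform gate fidelity. A minimal sketch (function names are illustrative):

```python
# Cumulative fidelity of equation (7): the algorithm fidelity is the
# product of the per-gate fidelities, a conservative worst-case estimate.
import math

def algorithm_fidelity(gate_fidelities):
    """F = prod_k f_k over the per-gate fidelities f_k."""
    return math.prod(gate_fidelities)

def qpu_fidelity(gate_fidelity: float, num_gates: int) -> float:
    """Uniform per-gate fidelity on quantum hardware: F = f**n."""
    return gate_fidelity ** num_gates

# e.g. 100 gates at an assumed per-gate fidelity of 0.999:
print(qpu_fidelity(0.999, 100))
```

Because the product decays exponentially in the number of gates, even per-gate infidelities of 10^-3 limit the useful circuit depth to a few hundred gates.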


Figure 2. Four-qubit example for the mps-oracle. While applying three two-qubit gates to an initial product state, one observes a growing bond dimension of the MPS. While the first two gates in steps 1 and 2 are applied without exceeding the maximal bond dimension, the maximal bond dimension is exceeded after the third step and gate, as indicated by the warning sign. The mps-oracle samples from one of the available distributions of singular values λi, as indicated by the sketch in the box after the application of the third gate; this information is sufficient to estimate the truncation and reduce the bond dimension again to the maximally allowed value.


Hereafter, we make the following assumptions about the algorithm and on the initial state and explain a few restrictions which come with the mps-oracle:

  • (a)  
    We work with a quantum circuit that is uncompiled with regard to the MPS connectivity; SWAPs are automatically executed by the mps-oracle until the two sites of a two-qubit gate are nearest neighbors. The ordering of the two sites matters and is always enforced via SWAPs in the mps-oracle, i.e. we neither swap the sites in the gate represented as a tensor nor keep track of whether gates are symmetric and swaps would be superfluous.
  • (b)  
    The new bond dimension is obtained via the parameter $\chi_{\max}$ and the neighboring bond dimensions. Special properties of states cannot be tracked, as our approach has no knowledge of the state; for example, the application of a SWAP gate to a GHZ state increases the entanglement predicted by the mps-oracle, even though the entanglement is equal before and after the application of the SWAP gate. Another example is the application of a gate followed by its inverse: although the overall entanglement in the system remains constant, the mps-oracle increases its entanglement estimate.
  • (c)  
    The fidelity is computed per SVD as $f = \sum_{i=1}^{\chi_{\max}} \lambda_i^{2} / \sum_{i} \lambda_i^{2}$ for singular values λi in descending order, i.e. $\lambda_1 \geqslant \lambda_2 \geqslant \cdots$. We propagate the fidelities through the algorithm via equation (7) while assuming one of the following unnormalized distributions for the singular values:

    $\lambda_i \propto \mathrm{e}^{-i}, \qquad (8)$
    $\lambda_i \propto i^{-\alpha}, \qquad (9)$
    $\lambda_i \sim \mathrm{MP}. \qquad (10)$

    The distributions follow an exponential decay in equation (8), a power-law behavior with the tunable parameter α as shown in equation (9), and the Marchenko–Pastur (MP) distribution for the eigenvalues of random matrices [48], see equation (10). Figure 1(b) represents this step of choosing a distribution when splitting two neighboring sites of an MPS into two: the index i runs from 1 to the number of singular values and is cut to $\chi_{\max}$ once the MPS exceeds $\chi_{\max}$ at any position.
  • (d)  
    As typical in most quantum algorithms, we assume in all the following examples that we start in a product state, i.e. χ = 1. Thus, the MPS emulator has a low computational cost at the beginning of the algorithm, where the truncation error is also small.
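Under assumptions (a)–(d), the mps-oracle bookkeeping reduces to tracking one bond dimension per link and one fidelity per truncation. The following minimal sketch assumes the power-law distribution of singular values; the function names and the normalization of the squared weights are illustrative.

```python
# Sketch of the mps-oracle: track only bond dimensions and truncation
# fidelities, assuming power-law distributed singular values
# lambda_i ~ i**(-alpha). Names and normalization are illustrative.

def truncation_fidelity(num_singvals: int, chi_max: int, alpha: float) -> float:
    """Fidelity of keeping the chi_max largest of num_singvals power-law
    distributed singular values (squared weights, normalized)."""
    weights = [i ** (-2.0 * alpha) for i in range(1, num_singvals + 1)]
    return sum(weights[:chi_max]) / sum(weights)

def oracle_two_qubit_gate(bonds, link, chi_max, alpha, d=2):
    """Apply a nearest-neighbor gate at `link`; return the gate fidelity.
    `bonds[k]` is the bond dimension between sites k and k + 1."""
    left = bonds[link - 1] if link > 0 else 1
    right = bonds[link + 1] if link + 1 < len(bonds) else 1
    new_bond = min(left * d, right * d)   # singular values after the SVD
    if new_bond <= chi_max:
        bonds[link] = new_bond
        return 1.0                        # no truncation, no error
    bonds[link] = chi_max
    return truncation_fidelity(new_bond, chi_max, alpha)

# Four qubits, three links, two layers, starting from a product state:
bonds = [1, 1, 1]
fidelity = 1.0
for _layer in range(2):
    for link in (0, 1, 2):
        fidelity *= oracle_two_qubit_gate(bonds, link, chi_max=2, alpha=1.0)
print(bonds, fidelity)
```

In the second layer the central link exceeds χ_max and triggers a truncation, so the accumulated fidelity drops below one while the bond dimensions stay capped at 2.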

This mps-oracle can be used to estimate the energy consumption as a function of the target fidelity, while the bond dimension required for the EFEP is evaluated during this process. When used to evaluate the EFEP, the target fidelity is the value achieved by the currently available NISQ quantum hardware.

2. Landscape analysis for quantum hardware

In the following paragraphs, we first explicitly describe our assumptions, e.g. on the measurement statistics, for the Rydberg platform. Then, we keep the same methods and assumptions used for the Rydberg hardware and analyze the trapped ions and superconducting platforms.

We neglect the different requirements in terms of compilation for each platform, where the compilation converts the algorithm into a gate set and connectivity suitable for each QPU. For example, a limited connectivity introduces additional SWAP gates for two-qubit gates unavailable in the connectivity map. This question becomes even more relevant once the hardware allows one to modify the connectivity at run-time with shift operations. We instead take into account the number of gates in a general formulation of the algorithm and in this way avoid the problem of finding the best available compiler for each platform.

2.1. Rydberg quantum simulator

The estimate of the energy consumption on a Rydberg quantum simulator for a complete algorithm depends on the energy consumption of each gate $E_{\mathrm{gate}}$, the number of repetitions N required to obtain the measurement statistics, and the number of gates n for the algorithm of interest:

$E = N \, n \, E_{\mathrm{gate}}. \qquad (11)$
Herein, the number of gates is determined at a later stage after selecting the algorithm. A first approximation of the number of repetitions is N = 1000, which eventually depends on the fidelity and observable. The energy consumed on a per-gate level requires knowledge of the power consumption and the duration of each gate as explained next.

The consumption of each gate is estimated as follows [49, 50]: we account for the power consumption of the experiment plus the contribution of approximately ten control computers. Although the pulse duration itself is short, we consider the idle time in between the gates; thus, the frequency at which gates are applied yields a better estimate of the effective gate time. In conclusion, the product of the total power and the effective gate time yields a conservative estimate of $E_{\mathrm{gate}}$.
Thus, we obtain for the complete algorithm a fidelity according to equation (7), where n is the number of gates. The fidelity of an optimally controlled pulse for a two-qubit gate, i.e. a theoretical value obtained through numerical simulations, is taken from [51, 52]. A difference between one-qubit and two-qubit gates is not included; furthermore, we assume the connectivity does not lower the fidelity.
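Equation (11) combines repetitions, gate count, and per-gate energy into a total budget. A minimal sketch follows; the numbers in the usage line are placeholder assumptions, not experimental values.

```python
# Back-of-the-envelope budget in the spirit of equation (11):
# E = N * n * E_gate. The values below are placeholder assumptions.

def qpu_energy(num_repetitions: int, num_gates: int,
               energy_per_gate: float) -> float:
    """Total energy (joules) of an algorithm sampled N times."""
    return num_repetitions * num_gates * energy_per_gate

# Assumed values: N = 1000 repetitions, n = 500 gates, E_gate = 10 J:
print(qpu_energy(1000, 500, 10.0))
```

The linearity in N and n is what makes the per-gate energy the single decisive hardware parameter in this comparison.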

We point out how variables scale to identify possibilities to fine-tune our estimate for the Rydberg hardware. The laser energy scales linearly with the number of qubits, i.e. more powerful lasers are required to scale up the system [53]. The selected Rabi frequency changes the gate time: the pulse becomes half as long if we double the Rabi frequency, while increasing the Rabi frequency by a factor of two induces a four-fold power consumption of the laser. Thus, from the pure viewpoint of the driving laser, the energy consumption doubles when reducing the gate time from τ to τ/2. Although NISQ hardware benefits from shorter gate times, the Rabi frequency cannot be increased arbitrarily without introducing other errors. These considerations can contribute to a finely tuned estimate adapted to each specific hardware in the future.
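The Rabi-frequency trade-off described above (four-fold power, half the gate time, hence doubled pulse energy) can be checked directly in arbitrary units:

```python
# Scaling check for the driving laser: power grows as the square of the
# Rabi frequency while the gate time shrinks linearly, so the pulse
# energy E = P * tau grows linearly in the Rabi frequency.

def pulse_energy(rabi: float, p0: float = 1.0, tau0: float = 1.0) -> float:
    """Energy of one pulse at Rabi frequency `rabi` (arbitrary units)."""
    power = p0 * rabi ** 2     # four-fold power for doubled Rabi frequency
    gate_time = tau0 / rabi    # half as long for doubled Rabi frequency
    return power * gate_time

# Halving the gate time (doubling the Rabi frequency) doubles the energy:
assert pulse_energy(2.0) == 2.0 * pulse_energy(1.0)
```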

Before continuing with the following platforms, we explain why certain contributions listed in table 1 are not reflected in equation (11). The first step before running an algorithm on the QPU is a compilation step. Compiling requires classical computation and, depending on the algorithm and the optimization steps of the compiler, consumes a non-negligible computation time and a corresponding energy $E_{\mathrm{compile}}$. However, this step is a one-time effort, e.g. a QFT will be precompiled for any platform; nonetheless, this one-time effort can be considerable [12]. This aspect changes if the algorithm needs to be recompiled for every run due to changing parameters, e.g. for variational quantum eigensolvers. In the case of the Rydberg system, there is also an initialization step loading the atoms into the optical tweezers; this loading step is necessary for each run of the algorithm and consumes an energy $E_{\mathrm{load}}$. Another neglected contribution is the energy consumed in the startup phase of the experiment, e.g. for cooling down the apparatus or installing a vacuum. This step consumes an energy $E_{\mathrm{startup}}$ and has to be repeated after a time T, e.g. after maintenance. The fraction of this energy associated with each repetition of an algorithm is then $E_{\mathrm{startup}} \, t_{\mathrm{alg}} / T$, where $t_{\mathrm{alg}}$ contains all steps from loading the atoms into the optical tweezers to the measurements. Comparing the expected T of multiple weeks to the duration of one algorithm, the contribution is insignificant. The final step is the measurement: this contribution is not considered either, as the measurement follows the same logic as the execution of gates; its power consumption times its duration leads to a contribution $E_{\mathrm{meas}}$ which can be added to more detailed models. In addition, a more detailed model can tailor the number of repetitions N to obtain measurement statistics individually for each algorithm.

Thus, we suggest for future improved estimates, beyond the scope of this work, the following form of the energy consumption for a Rydberg-QPU:

$E = E_{\mathrm{compile}} + \frac{t_{\mathrm{alg}}}{T} E_{\mathrm{startup}} + N \left( E_{\mathrm{load}} + n \, E_{\mathrm{gate}} + E_{\mathrm{meas}} \right).$
Table 1. Energy consumption at different steps. We consider the steps compile, startup, loading a trap, running the algorithm, and measurements for sampling N times from the algorithm. We distinguish between negligible contributions of steps reused over a long time (➘), neglected contributions (✗), not applicable contributions (−), and the considered contributions of the actual circuit (✓). Startup contains for example the process of creating a vacuum or cooling the superconducting hardware. For standard algorithms, a compiled instance of the circuit can be used multiple times; an equal argument applies to the contribution for startup. Loading the trap scales with the number of qubits, but not the number of gates.

Step            Repetitions   Rydberg   Ions   Superconducting
Compile         1             ➘         ➘      ➘
Startup         1             ➘         ➘      ➘
Loading trap    N             ✗         ✗      −
Run algorithm   N             ✓         ✓      ✓
Measurements    N             ✗         ✗      ✗

2.2. Trapped ion simulator

There are different trapped ion simulators available. For example, the setup from Alpine Quantum Technologies has the following specifications [54]: the gate fidelity of a single Mølmer–Sørensen gate is 0.9983 at a given pulse time; the single-qubit Clifford gates reach a fidelity of 0.9983 at a shorter pulse time for the single-qubit rotations; the energy consumption is bounded by the power of the two computing racks. We combine these numbers and insert them into equation (11), leading to the energy cost of this trapped ion platform:

$E = N \left( n_1 \, E_{\mathrm{gate},1} + n_2 \, E_{\mathrm{gate},2} \right).$
We allow different parameters for single-qubit and two-qubit gates, where n1 and n2 are the numbers of single-qubit and two-qubit gates, respectively. The total number of gates of the algorithm is $n = n_1 + n_2$. The gate fidelities are comparable to the IonQ platform, whose reported average one-qubit and two-qubit gate fidelities are of a similar order [55].

2.3. Superconducting quantum computers

The low energy consumption of Google's quantum computer has already been discussed in [7, 8], which state 15 kW and 26 kW power consumption for the complete experiment, respectively. Reference [7] also mentions that the energy cost does not scale significantly with the number of qubits.

The pulse duration of the two-qubit gate is of the order of 12 ns [14]. We use the single-qubit and two-qubit gate fidelities reported in [56]. We plug these numbers into equation (11) for the following analysis. We expect similar energy consumption and performance for any other competitive superconducting platform.

3. Green quantum advantage

We first analyze a generic algorithm, then focus on two different specific algorithms, the randomizing circuits used in Google's Sycamore architecture [14] and the QFT: we first analyze the entanglement generated on a small system and then study its scaling on larger system sizes via the mps-oracle.

3.1. Generic algorithm

We continue with the identification of the EFEPs. Hereafter, we tune the MPS parameters via the mps-oracle such that the fidelity and energy consumption of the classical emulator match those of the quantum hardware, i.e. as defined in the EFEP. Furthermore, we define one layer of the generic algorithm acting on L qubits as two-qubit gates acting on all nearest-neighbor pairs in a one-dimensional chain. If the gate $G_{k,k'}$ acts on the qubits k and kʹ, one layer of the algorithm for L = 5 is $G_{1,2} G_{3,4} G_{2,3} G_{4,5}$, as sketched in figure 1(b) for the bulk of a system. To tune the circuit depth and number of gates, we apply m of these layers; thus, the total number of gates is $n = m (L - 1)$. With the odd–even and even–odd pairs, we have a circuit depth of 2m. We tune the entanglement via α and the power-law distribution in equation (9), where a lower (higher) value of α indicates more (less) entanglement. This choice of a power-law decay is clearly the least trivial and most relevant scenario, as the other two distributions, i.e. the exponential and MP behaviors, are clearly highly favorable for the classical and quantum computation, respectively.
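The layer construction for the generic algorithm can be sketched in a few lines; qubit labels are 1-indexed for illustration:

```python
# Sketch of the generic brick-wall circuit: each layer acts first on the
# odd-even and then on the even-odd nearest-neighbor pairs, giving
# n = m * (L - 1) gates and a circuit depth of 2 * m.

def generic_circuit(num_qubits: int, num_layers: int):
    """Return the list of (k, k + 1) qubit pairs acted on, layer by layer."""
    gates = []
    for _ in range(num_layers):
        gates += [(k, k + 1) for k in range(1, num_qubits, 2)]  # odd-even pairs
        gates += [(k, k + 1) for k in range(2, num_qubits, 2)]  # even-odd pairs
    return gates

circuit = generic_circuit(5, 1)
print(circuit)  # [(1, 2), (3, 4), (2, 3), (4, 5)]
assert len(circuit) == 1 * (5 - 1)
```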

Figure 3(a) indicates the two regimes for a fixed gate fidelity of the Rydberg quantum hardware and different system sizes. The parameters of the MPS emulator suggested by the mps-oracle, i.e. the bond dimension χ and the decay coefficient α for the singular values of equation (9), lead to the EFEP. We easily verify two expected trends: firstly, decreasing the system size L extends the regime of the MPS emulator. Secondly, increasing the number of layers m favors quantum hardware, because the MPS has to operate for more steps at a saturated bond dimension. The intriguing outcome of figure 3(a) is the boundary around α = 0.75, which changes only insignificantly for large L. For clarity, and to include the necessary approximations and limits of our estimates with respect to the experimental parameters and performances, we also highlight the boundary where each approach consumes a factor of 1000 less energy than its counterpart: as clearly shown, this factor only slightly shifts the boundary without introducing a change in the scaling. For large system sizes, a significant fraction is still not accessible by the classical emulator, i.e. the area with α < 0.6. From this analysis, we propose α = 1 as a safe threshold to run algorithms on an MPS emulator, e.g. within hybrid quantum–classical algorithms. A fit of the EFEP boundary further supports this hypothesis: we approximate the EFEP curve with a fitting function, shown in figure 3(a); an extrapolation yields a narrow range of boundary values of α for system sizes reaching from 80 to 2000 qubits.


Figure 3. Generic algorithm regimes and distributions. (a) We calculate the EFEPs for different power-law distributions as a function of the system size L. From the viewpoint of the minimal energy consumption, the blue and green regions favor quantum processing units (QPUs) and MPS, respectively. The regimes do not shift significantly with the circuit depth, as shown for m layers, i.e. m = 12 (purple circles) and m = 16 (red squares). The curves of the same color as the data points indicate a factor of 1000 difference in the energy consumption of quantum hardware and classical emulators. During the procedure, we vary the bond dimension χ and the entanglement via α to find the curve of EFEPs. Here, each of the m layers acts first on the odd sites and their nearest neighbor and then on the even sites and their nearest neighbor. (b) We calculate the EFEPs for different power-law distributions α as a function of the gate infidelity. The QPU and MPS regimes are in the same color as for (a) and we show the layers m = 12 (purple circles) and m = 24 (red symbols). (c) We add alternative distributions for the singular values in addition to the power law for L = 64 and conclude that the MP distribution (MP) is extremely expensive as expected, while exponential decays are favorable for classical emulators. Moreover, we compare the exact state (Exact), an MPS at full bond dimension, and the experimental platforms using trapped ions (Ion, experimental fidelities [E]), superconducting qubits (SC), or Rydberg atoms (Rydb, theoretical fidelities [T]).


The entropy of a quantum system can be accessible to experiments even when the distribution of singular values is out of reach for measurements [57]. Area laws for entanglement offer a common tool to describe and characterize many-body quantum systems; an area law predicts how the entropy of a subsystem's density matrix ρ grows as a function of the system size. The EFEP does not follow an area law of entanglement: table 2 contains data for the EFEP of the generic algorithm for different system sizes, where the power-law decay α and the bond dimension χ are chosen to match the EFEP; the last column contains the entropy, which decreases with system size. MPS emulators for larger system sizes compensate for the advantage of the QPU with faster-decaying power laws, i.e. α increases. As the bond dimension decreases at the same time, the entropy reduces for larger systems at the EFEP. We point out that the data for L = 24 in table 2 has no decay and therefore generates more entropy even at a lower bond dimension. In summary, table 2 serves as a rule of thumb for the EFEP if only the entropy is available.

Table 2. Entropy as a function of the mps-oracle parameters. We run the mps-oracle for different system sizes L to obtain the decay α of the power-law distribution and the bond dimension χ at the EFEP for a generic algorithm. The resulting von Neumann entropy S of the subsystem for a bipartition into equal halves serves as a guideline for the MPS vs. QPU decision in experiments where the entropy is available, but not the distribution of singular values. In line with the other results, the entropy shrinks with increasing system size and shifts the EFEP in favor of QPUs for larger L.

System size L   Decay α   Bond dimension χ   Entropy S
24              —         3935               8.28
32              0.55      24021              6.16
48              0.68      17137              3.88
64              0.70      15925              3.60
80              0.72      15177              3.48
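The qualitative trend of the entropy column in table 2 can be reproduced under the simplifying assumption of a pure power-law Schmidt spectrum truncated at bond dimension χ; the function name and normalization below are ours, not the paper's.

```python
import numpy as np

def entropy_power_law(alpha, chi):
    """Von Neumann entropy (in bits) of an assumed normalized power-law
    Schmidt spectrum lam_k**2 ~ k**(-2*alpha) truncated at chi values."""
    k = np.arange(1, chi + 1)
    p = k ** (-2.0 * alpha)
    p /= p.sum()                       # squared Schmidt values sum to 1
    return float(-(p * np.log2(p)).sum())

# A flat spectrum (alpha = 0, no decay) gives the maximal entropy log2(chi);
# at fixed chi, a faster decay lowers the entropy, as in table 2.
assert abs(entropy_power_law(0.0, 1024) - 10.0) < 1e-9
assert entropy_power_law(0.7, 1024) < entropy_power_law(0.55, 1024)
```

This matches the observation that the L = 24 row, with no decay, carries more entropy than the rows with larger α despite a lower bond dimension.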

For the next generations of quantum hardware, we are interested in how much an improvement in fidelity shifts the border in favor of quantum hardware, see figure 3(b). Starting at a single-gate infidelity of approximately 10−3 and the boundary at α ≈ 0.7, the boundary shifts towards α ≈ 1.3 for a single-gate infidelity around 10−7 for a system size of L = 64 at a fixed number of layers, i.e. m = 24. This shift is significant; e.g. we have the two following sets of parameters at the EFEP with and . We observe that the bond dimension stays almost equal, as the energy consumption of the hardware does not change in this scenario. As the fidelity of the quantum hardware improves, the MPS has to capture the entropy S more precisely, which leads to a faster-decaying distribution of singular values and, thus, much less remaining entropy. A numerical fit delivers an estimate of the infidelity of a single gate .
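Why the single-gate error dominates can be illustrated with the standard multiplicative estimate of the final-state fidelity; we use it here as a hedged stand-in for the conservative estimate of equation (7), whose exact form may differ, and the gate count is an invented example.

```python
def final_state_fidelity(gate_fidelity, n_gates):
    """Conservative multiplicative estimate: errors of individual gates
    accumulate as F_final = f_gate ** n_gates (a stand-in for eq. (7))."""
    return gate_fidelity ** n_gates

# Example: 24 layers of 63 nearest-neighbor gates on a 64-qubit chain.
n_gates = 24 * 63
f_today = final_state_fidelity(1 - 1e-3, n_gates)   # ~0.22
f_future = final_state_fidelity(1 - 1e-7, n_gates)  # ~0.9998
assert f_today < f_future
```

An MPS emulator only has to match this final-state fidelity, so a poor gate fidelity lowers the bar for the truncation error it may commit.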

The generic algorithm provides many degrees of freedom in terms of tuning the generated entanglement, the system size, and the gate fidelity on a QPU. In contrast, figure 3(c) fixes one specific system size of L = 64 to compare the three quantum platforms to the exact state, the exact MPS at full bond dimension, and the different distributions via the mps-oracle. The data underlines our previous statement that the MP distribution is extremely costly on a classical emulator, independent of the target fidelity. At the other extreme, exponential and fast-decaying power-law distributions are favorable for an MPS scenario. The difference in the energy costs of the hardware platforms originates from the time to apply a single gate, which differs by almost five orders of magnitude between the platforms; in contrast, the power consumption lies within a factor of three. The infidelity of the QPU platforms has to be considered with regard to the conservative estimate of the fidelity of the final state according to equation (7): quantum error correcting codes requiring a ten-fold or hundred-fold overhead in the number of gates will change the fidelity significantly, while an additional two orders of magnitude for the quantum error correcting operations will not change the energy significantly on the y-scale.

3.2. Randomizing circuits

We now examine the randomizing circuits used in Google's Sycamore architecture [14]. The quantum circuit acting on 53 qubits contains a single-qubit gate on each qubit followed by 20 to 24 two-qubit gates, which are selected randomly out of four subgroups with compatible connectivity. The complete sequence contains repetitions of these gates. The same sequence is also used in the experiment in [58]. For the mps-oracle, the algorithm is not optimized beforehand via a transpiler for the connectivity of a one-dimensional chain; instead, the MPS sites are ordered in a zig-zag scheme. The zig-zag scheme maps a two-dimensional grid to a one-dimensional system by walking along each row of the matrix, always starting at the first column and moving towards the last column. With a fixed algorithm at hand, we first answer the question of which distribution of singular values to choose for the randomizing circuits. We compare the exact simulation with the distributions from equations (8) and (10) with twenty layers of the random gates in figure 4(a); we reduce the number of qubits to L = 23 to perform fast exact state simulations and compare the actual singular values at time τ, i.e. the end of the evolution. According to the data at the end of the gate sequence, we have to consider an MP distribution for the singular values. Additional data points at and demonstrate how the entanglement grows over time and remains below the MP distribution at the intermediate points. Figure 4(b) compares the fidelity calculated from the final states of an actual MPS simulation [36] between and with the fidelity predicted by the mps-oracle. We define the bond dimension . The data indicates two regimes: in the first regime, a high truncation due to the MP distribution leads to a final infidelity close to 1.0 despite covering a considerable fraction of . The second regime of low infidelities is of particular interest for future hardware with increasing fidelities, and there the mps-oracle provides a reasonable prediction. A more sophisticated error model is beyond the scope of this work.
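The zig-zag ordering described above amounts to a plain row-by-row traversal of the grid; a minimal sketch (the function name is ours):

```python
def zigzag_order(n_rows, n_cols):
    """Map the qubits of an n_rows x n_cols grid onto a 1D MPS chain by
    walking every row from the first column towards the last column."""
    return [(r, c) for r in range(n_rows) for c in range(n_cols)]

# On a 2 x 3 grid, the chain visits row 0 left to right, then row 1.
assert zigzag_order(2, 3) == [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

Two-qubit gates between vertical neighbors then act on MPS sites a full row apart, which is why a transpiler pass could, in principle, reduce the emulation cost further.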


Figure 4. Random gates circuit. We analyze the sequence of random gates used for Google's Sycamore platform. The number of layers is . (a) The singular values for a reduced version of the Sycamore architecture with 23 qubits indicate which distribution of singular values we should expect for this algorithm: the Marchenko-Pastur (MP) distribution is the closest of the given options. The exponential decay (exp) and the power law (pow3) are far away from the true singular values. A power law (pow1) still underestimates the singular values. In addition to the final distribution of the exact state simulation at time τ (Exact(τ)), we show the distribution after a quarter and after half of the evolution. (b) For low bond dimensions, we compare the prediction of the mps-oracle (oracle) for the fidelity with the actual MPS simulation; the mps-oracle yields a reasonable estimate. The large truncation error even for high bond dimensions is related to the Marchenko-Pastur distribution of the singular values. (c) The energy consumption of the random gates used for Google's Sycamore platform in as a function of the infidelity yields a clear separation in energy. The choice of the MP distribution for the singular values in an MPS has a significant effect on the FLOP count; here, we observe that the MPS emulator becomes unfavorable due to the high entanglement within this approach. The reference points are the Rydberg system (Rydb.) with the theoretical ([T]) fidelity from the optimal control simulations, the upper bound of energy consumption for trapped ions (Ion) with experimental ([E]) gate fidelities, the superconducting qubits (SC), and the complete state vector, see the label 'Exact'.


Figure 4(c) captures the effects of the MP distribution for the singular values with L = 53, which requires a bond dimension almost at the saturation value of . Recall that the estimate for the Rydberg hardware uses theoretical predictions for the gate fidelity. We emphasize that the ion data points and the results for the SCs refer to experimental fidelities and therefore cannot be compared to the Rydberg platform on an equal footing. On the Rydberg hardware, experimental gate fidelities of have been reached [59]. In summary, the data underlines why this experiment has been considered a demonstration of quantum advantage, due to the amount of entanglement present in the system.

3.3. QFT

Now, we turn to the second example and investigate the QFT algorithm. We consider repetitions of the algorithm, which allow us to tune the generated entanglement while starting in a product state. A single QFT executed on a product state does not generate enough entanglement for a meaningful comparison, and we can tune the number of QFTs more easily than the entanglement in the initial state. In the following, we set the number of layers to , where each layer contains the complete QFT algorithm.

We evaluate in figure 7 in appendix B which distribution from equations (8)–(10) best represents the QFT algorithm and choose the power-law distribution. Unlike for the randomizing circuit, where the MP distribution fits well, the power-law distribution matches the singular values of the QFT less closely. The power-law distribution with a coefficient of α = 3 leads to a regime where the MPS is more competitive than for the MP distribution, i.e. we expect bond dimensions χ below .

We compare across different system sizes from L = 16 to L = 64 for the example of the QFT; the comparisons between the mps-oracle and quantum hardware are shown in figure 5. We plot the energy consumption of the TN emulator as a function of the fidelity. Of interest is the order of magnitude of the energy consumption, e.g. the vertical difference between the Rydberg platform data points and the data points for the mps-oracle using the power-law decay. On the horizontal axis, the improvement in fidelity for the complete sequence of gates is the horizontal difference to the closest data point of the mps-oracle prediction. For comparison, we show the Rydberg reference points when using a theoretical ([T]) gate fidelity of , the trapped-ion upper bound with experimental ([E]) fidelity, the SCs with experimental fidelities, and the exact state vector. The exact state vector has by default an infidelity of zero and is therefore visualized as a constant line. The second point of reference and constant line with fidelity is the MPS at a bond dimension where the MPS becomes exact.


Figure 5. Quantum Fourier transformation. We apply multiple iterations of the quantum Fourier transformation and compare the predicted energy consumption of classical emulators with the Rydberg platform (Rydb., theoretical gate fidelity), trapped ions (Ion, experimental gate fidelity [E]), and superconducting qubits (SCs). We consider three different system sizes. (a) For L = 16 sites, the emulator via MPS or the exact state is favorable in comparison to the quantum hardware; the singular values decay as a power law for the QFT. (b) For L = 32, the quantum hardware becomes more energy-efficient than the complete state vector. (c) For L = 64, the range of energies spans beyond twenty orders of magnitude. The relative difference between the mps-oracle with a power-law distribution and the quantum hardware decreases slightly.


We recognize different regimes for the different system sizes. As expected, small systems favor classical emulators, and figure 5(a) for L = 16 represents the first regime, where the quantum hardware has a higher predicted energy consumption than any emulated simulation. In fact, even the worst-case scenario of an MPS at full bond dimension is slightly better than the quantum platforms. The method of choice is either the exact state or the MPS, depending on the target fidelity of the full algorithm. For L = 32, the situation changes and the quantum hardware is, from an energy standpoint, distributed around the exact state method depending on the platform. The scenario with L = 64 changes the situation again: both the emulator and the quantum hardware are orders of magnitude cheaper than the exact state method. The ratio of slightly decreases, which is not visible in figure 5 covering several orders of magnitude in energy.
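How these orders of magnitude arise can be sketched with a deliberately coarse energy model: gate time times power for the QPU versus FLOP count over compute efficiency for the MPS. All hardware numbers below are invented placeholders, not the measured values used in the paper, and constant factors in the FLOP count are dropped.

```python
def qpu_energy(n_gates, gate_time_s, power_w):
    """E = P * t_gate * N_gates; ignores readout, idle power, and
    cryogenic overheads."""
    return power_w * gate_time_s * n_gates

def mps_energy(n_gates, chi, power_w, flops_per_s, d=2):
    """Dominant MPS cost per two-qubit gate scales as O(d**3 * chi**3)
    FLOPs (two-site contraction plus SVD, constants dropped)."""
    flops = n_gates * (d ** 3) * chi ** 3
    return power_w * flops / flops_per_s

# Placeholder numbers: a 25 kW HPC node at 10 TFLOP/s vs. a 10 kW QPU
# with 1 us gates, for 10,000 two-qubit gates.
e_qpu = qpu_energy(10_000, 1e-6, 10_000)
e_mps_small = mps_energy(10_000, 16, 25_000, 1e13)    # weakly entangled
e_mps_large = mps_energy(10_000, 4096, 25_000, 1e13)  # near saturation
assert e_mps_small < e_qpu < e_mps_large
```

The cubic dependence on χ is what flips the ordering between the weakly and strongly entangled cases, mirroring the regimes in figure 5.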

Moreover, we predict with figure 5(c) that the Rydberg platform amongst others has to improve by several orders of magnitude with regard to the energy consumption and one order of magnitude in the gate fidelity to outperform the TN simulation for the QFT algorithm. The singular values in the simulation follow a power law with a cubic decay, which is favorable to TN simulations.

As a side remark for the QFT, we point out that the algorithm is rarely run on its own, e.g. the QFT is used within Shor's algorithm. If executed after a highly entangling part of the circuit, the computational scaling of the QFT also increases.

4. Conclusion

We have compared the energy consumption of quantum hardware with that of a classical emulator of the quantum algorithm. We address the problem from the viewpoint of reaching the EFEP of the two platforms. Thus, the fidelity of a single gate is crucial for both the quantum computer and the MPS emulator: low gate fidelities accumulate to a large error on the quantum side, while the distribution of singular values is the key parameter for any MPS emulator. The mps-oracle enables us to predict the computational effort of quantum circuit emulators running on HPC systems and serves as a decision-making rule for possible applications in hybrid quantum–classical algorithms and architectures.

The randomizing circuit generates an amount of entanglement which makes the quantum hardware favorable; the main reason for this characteristic is that the Schmidt values follow the MP distribution. In contrast, if the singular values follow a power-law distribution, TN simulations via MPSs are competitive for values of α greater than 0.75 to 1.0, depending on the fidelity of a single gate. The latter case of a power-law distribution appears, for example, in an isolated QFT. We provide fits for the EFEP curve, which underline the aforementioned range, where, in our example, the value of α = 1.0 is reached at around 2000 qubits according to the extrapolation. The fit of the EFEP curve as a function of the single-gate fidelity predicts α ≈ 1.5 once the infidelity of a single gate reaches the 10−7 level.

The presented methodology evaluates the relative energy consumption of TN emulators and quantum hardware. Several open questions lead to a more detailed picture. We stress that quantum computers can also improve their energy consumption by performing the gate sequence with the same fidelity and power consumption in less time, e.g. by performing gates in parallel or combining gates into a shorter pulse time [60]. One can go beyond the fidelity point of view and compare the measurement statistics of classical and quantum simulations. The application to adiabatic quantum computing represents an additional direction, but the error calculations on the classical simulation side and the quench on the quantum hardware have to be taken into account for the mps-oracle. Another question is whether open systems should be included in the considerations; they increase the computational cost of the classical simulation, although in the scenario of a classical emulator the absence of decoherence might be favorable. Finally, one can move towards a more and more detailed picture where, in the end, our work and its perspective on energy consumption might directly influence the future development of hybrid quantum–classical applications.

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.

Acknowledgment

We gratefully appreciate the contributions from Florian Meinert and Tilman Pfau on the energy consumption of the Rydberg platform. We thank Marco Ballarin, Phila Rembold, and Pietro Silvi for useful discussions and feedback.

We acknowledge funding from the German Federal Ministry of Education and Research (BMBF) under the funding program quantum technologies − from basic research to market − with the grant QRydDemo, the Italian PRIN 2017, the EU-QuantERA projects QuantHEP and T-NISQ, the EU project TEXTAROSSA, the WCRI-Quantum Computing and Simulation Center of Padova University, and the Italian National Centre on HPC, Big Data and Quantum Computing.

Appendix A: Marchenko-Pastur distribution for the Schmidt decomposition

We use the Marchenko-Pastur distribution [48] to generate singular values according to random quantum states, because this distribution generates samples of singular values most precisely, in comparison to the other approaches, in the case of the Sycamore random gate sequence. As the Marchenko-Pastur distribution describes the eigenvalue distribution of a matrix whose entries are distributed randomly according to a Gaussian distribution with zero mean, we have to make additional adaptations. We start with real-valued wave functions and generate a vector r of random numbers with . We choose , which leads to a normalized wave function with in the limit of . To ensure normalization, we actually define . We follow the standard procedure to obtain the Schmidt decomposition of the quantum state for the two subsystems of dimension d1 and d2 and define the corresponding matrix for the SVD. The SVD yields singular values λk ; the values are distributed according to the Marchenko-Pastur distribution. The original statement is in terms of the eigenvalues of the matrix ; it holds due to the unitary properties of the SVD.
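A minimal numerical sketch of this construction, with illustrative subsystem dimensions and our own function names, reproduces the MP-distributed singular values from a random real Gaussian state:

```python
import numpy as np

rng = np.random.default_rng(7)

def schmidt_values_random_state(d1, d2):
    """Singular values of a normalized random real Gaussian state
    reshaped into a d1 x d2 matrix; the squared values follow a
    Marchenko-Pastur law in the large-dimension limit."""
    psi = rng.normal(0.0, 1.0, size=d1 * d2)
    psi /= np.linalg.norm(psi)        # explicit normalization of the state
    return np.linalg.svd(psi.reshape(d1, d2), compute_uv=False)

lam = schmidt_values_random_state(2**6, 2**6)   # half-chain cut, 12 qubits
assert abs((lam**2).sum() - 1.0) < 1e-12        # Schmidt weights sum to 1
```

The eigenvalues of the reduced density matrix are the squares of these singular values, which connects the SVD statement to the original eigenvalue form of the MP law.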

The real-valued wave function restricts the generality of quantum mechanics; therefore, we move towards complex-valued wave functions. We generate two random vectors and combine them into a complex-valued random vector as . The wave function is defined analogously to the real-valued case with . We empirically find the standard deviation σ to be used in the Marchenko-Pastur distribution for a Schmidt decomposition in the center of the system with

The validation for different system sizes is shown in figure 6, where we consider the Schmidt decomposition at the center of the system for system sizes averaged over 20 samples. We show the cumulative probabilities as a function of the singular values and find good agreement; despite the agreement, small singular values are underestimated in probability for even system sizes. The odd–even effects are significant and we therefore dedicate subplots (a) and (b) of figure 6 to the even and odd cases, respectively.


Figure 6. Marchenko-Pastur distribution. We compare the average cumulative probabilities for randomly generated quantum states to the cumulative probabilities of the Marchenko-Pastur distribution for the empirically chosen parameter σ, which are shown as black dashed lines. Due to their different shapes, we show even system sizes in (a) and odd system sizes in (b). Note that the deviations in (a) are in a range of 10−5 to 10−2 on the x-axis, which are not present for the odd system sizes in (b).


Based on these results, we extract two more requirements. On the one hand, we need to sample values according to the distribution . We draw random numbers from and return the corresponding singular value for the cumulative probability value. On the other hand, we need to estimate the truncated singular values; we keep only singular values in the case of an already saturated bond dimension. We obtain the discretized probabilities according to the Marchenko-Pastur distribution defined as

The values of λj are sorted in ascending order and lead to figure 6. Finally, the truncation error is given by one sample of singular values for the given dimension.
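Both requirements, inverse-transform sampling and a discretized MP density, can be sketched as follows; the parameters σ = c = 1 and the grid size are illustrative defaults, not the empirically fitted values of this appendix, and the function names are ours.

```python
import numpy as np

def mp_pdf(x, sigma=1.0, c=1.0):
    """Marchenko-Pastur density for eigenvalues, with variance sigma**2
    and aspect ratio c = d1/d2 <= 1 of the bipartition matrix."""
    lo = sigma**2 * (1.0 - np.sqrt(c))**2
    hi = sigma**2 * (1.0 + np.sqrt(c))**2
    out = np.zeros_like(x)
    inside = (x > lo) & (x < hi)
    xi = x[inside]
    out[inside] = np.sqrt((hi - xi) * (xi - lo)) / (2.0 * np.pi * sigma**2 * c * xi)
    return out

def sample_mp(n, sigma=1.0, c=1.0, grid=20_000, seed=3):
    """Inverse-transform sampling: draw u ~ U(0, 1) and map it through
    the discretized cumulative distribution (singular values follow as
    square roots of these eigenvalue samples)."""
    hi = sigma**2 * (1.0 + np.sqrt(c))**2
    edges = np.linspace(0.0, hi, grid + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])   # midpoints tame the x->0 singularity
    cdf = np.cumsum(mp_pdf(mids, sigma, c))
    cdf /= cdf[-1]
    u = np.random.default_rng(seed).uniform(size=n)
    return np.interp(u, cdf, mids)

eigs = sample_mp(5000)
assert eigs.min() >= 0.0 and eigs.max() <= 4.0   # support [0, 4] for sigma = c = 1
```

A sorted sample of such values plays the role of the discretized probabilities used to estimate the truncation error above.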

Appendix B: Singular value distribution for QFTs

We evaluate the distribution of the singular values for the QFT as we did before for the randomizing circuit. Figure 7 relies on a system size of L = 23, as did the previous results for the distribution. Although we start in a W-state , a single QFT does not generate much entanglement. Thus, we apply a series of QFTs, i.e. eight in this case. The distribution of the singular values most resembles a power law with cubic decay, which makes the QFT use case more favorable for classical emulators.
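A reduced version of this check (L = 10 qubits instead of 23, for speed) can use the fact that the QFT acts on the amplitude vector as the unitary discrete Fourier transform. Whether the final qubit swaps are included matters for repeated application (with them, four QFTs compose to the identity), so the swap-free circuit variant below is our assumption, not necessarily the paper's convention.

```python
import numpy as np

def w_state(n):
    """W state: equal superposition of all single-excitation basis states."""
    psi = np.zeros(2**n, dtype=complex)
    for k in range(n):
        psi[1 << k] = 1.0
    return psi / np.sqrt(n)

def qft_no_swaps(psi, n):
    """Swap-free circuit QFT: unitary DFT on the amplitudes followed by
    the qubit bit reversal that the textbook circuit leaves implicit."""
    out = np.fft.fft(psi) / np.sqrt(psi.size)
    rev = [int(format(i, f"0{n}b")[::-1], 2) for i in range(psi.size)]
    return out[rev]

def schmidt_values(psi, n_left, n_right):
    """Schmidt values for a bipartition into n_left | n_right qubits."""
    return np.linalg.svd(psi.reshape(2**n_left, 2**n_right), compute_uv=False)

n = 10
psi = w_state(n)
for _ in range(8):          # eight QFT repetitions, as in the text
    psi = qft_no_swaps(psi, n)
lam = schmidt_values(psi, n // 2, n - n // 2)
assert abs(np.linalg.norm(psi) - 1.0) < 1e-10    # evolution stays unitary
```

The resulting half-chain spectrum stays far from the flat, MP-like spectrum of the random circuit, consistent with the modest entanglement reported for the QFT.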


Figure 7. Singular value distribution for QFTs. A series of eight QFTs is applied to a W-state. The amount of generated entanglement is much smaller than for the Marchenko-Pastur distribution. The cubic power-law decay comes closest to the distribution out of our set of distributions.
