Stochastic emulation of quantum algorithms

Daniel Braun; Ronny Müller

doi:10.1088/1367-2630/ac4b0f

1. Introduction

Schrödinger is often quoted with the statement 'I would not call [entanglement] one but rather the characteristic trait of quantum mechanics, the one that enforces its entire departure from classical lines of thought.' [1]. Nevertheless, there are many classical examples of 'entanglement'. Consider for example an elastic membrane of rectangular shape with lengths a, b in directions x, y. It has eigenmodes of vibration that can be found by solving the wave equation with appropriate boundary conditions for the amplitude field ψ( x , t) with x = (x, y), x ∈ [0, a], y ∈ [0, b]. A general solution of the wave equation can be written as superposition, ψ( x , t) = ∑_n,m c_nm(t)sin(k_x,n x)sin(k_y,m y), where ${c}_{nm}(t)\in \mathbb{R}$ are arbitrary real coefficients, and the k-vector components are discrete, k_x,n = πn/a, k_y,m = πm/b. Clearly, if there is only one component c_nm(t), the state is a product-state, but most states are entangled in the same sense as in quantum mechanics, i.e. they cannot be written as a product of two wavefunctions. The wavefunction lives in a Hilbert space ${\mathbb{R}}^{\infty }\times {\mathbb{R}}^{\infty }$ , where the two directions x, y play the role of two different subsystems. The membrane is capable of full interference in this Hilbert space, owing to the positive or negative values that the wavefunction can take. However, we have only two subsystems, which might be extended to three by going to a vibrating cube, but clearly this is the maximum number of different subsystems that we can manipulate independently. The fact that in principle we can code any state of a quantum computer with an arbitrary number of qubits in the Hilbert-space of the membrane or the cube is of no use, as the known quantum algorithms rely on the fact that they can be decomposed into quantum gates that act at the same time on one or two qubits only.

An arbitrary number N of subsystems that can be manipulated independently is available when we consider instead of a wavefunction a joint probability distribution P(x₁, ..., x_N) of N random variables x_i. Product states can then be defined as $P({x}_{1},\dots ,{x}_{N})={\prod }_{i=1}^{N}\enspace {P}_{i}({x}_{i})$ —which of course just means that there are no correlations between the random variables. Any probability distribution that cannot be written in this form is formally equivalent to a pure entangled state in quantum mechanics, meaning just this: it cannot be written as a product of local probability distributions, just as in quantum mechanics the wave function of an entangled state cannot be written as a product of local wavefunctions. There is also a (restricted) superposition principle: if p_a(x₁, ..., x_N) and p_b(x₁, ..., x_N) are two valid probability distribution, so is their convex combination, p(x₁, ..., x_N) = qp_a(x₁, ..., x_N) + (1 − q)p_b(x₁, ..., x_N) with 0 ⩽ q ⩽ 1. A local probability distribution might be expanded in an orthonormal set of basis functions β_i (e.g. for x ∈ [a, b], define β_i(x) = 0 unless x ∈ [i, (i + 1)] where β_i(x) = 1/). A scalar product between two probability states denoted as |p₁⟩⟩ and |p₂⟩⟩ can be defined as $\langle \langle {p}_{1}\vert {p}_{2}\rangle \rangle \equiv {\int }_{a}^{b}\;{p}_{1}(x){p}_{2}(x)\mathrm{d}x$ , and a one-norm as ${\Vert}\enspace \vert {p}_{1}\rangle \rangle {\Vert}\equiv {\int }_{a}^{b}\;{p}_{1}(x)\mathrm{d}x$ , where is some chosen width of a basis function, leading to a finite-dimensional Banach space of dimension (b − a)/). A probability state such as (|00⟩⟩ + |11⟩⟩)/2 with probability distribution p(x₁, x₂) = (β₀(x₁)β₀(x₂) + β₁(x₁)β₁(x₂))/2 then corresponds to perfect classical correlations. Clearly, here we are not limited by the number of subsystems—but we are, of course, limited by the type of superpositions that we can create: only convex combinations are possible, a state as e.g. (|00⟩⟩ − |11⟩⟩)/2 is in general forbidden due to the need of positivity of the probability distribution. This implies that 'probability states' are not capable of destructive interference, which would require compensating some positive amplitudes by negative ones (or, as in quantum mechanics, complex ones by other complex ones). Such direct manipulation of probability distributions with classical stochastic maps is therefore not sufficient for running quantum algorithms, even though the modular structure of Hilbert space (or Banach) space is there, as well as a restricted superposition principle both locally as well as in the full Banach space of all subsystems.

It is known that for pure state quantum computing unbound entanglement must arise, otherwise the quantum algorithm can be simulated efficiently classically [2]. But creating unbound entanglement is not sufficient for having a computational advantage over a classical computer, as is well-known on a much deeper level through the Gottesmann–Knill theorem [3]. From the present example, we clearly see that interference is a necessary ingredient for a quantum computer that cannot be efficiently simulated classically (see also [4, 5]).

From the above considerations we can distill the following necessary requirements for a system that is supposed to function the way a quantum computer does:

(a)
Composition of a system in terms of individual subsystems that can be manipulated independently (in sync with the first criterion of DiVincenzo [6] for building a quantum computer, i.e. the need of a scalable number of well-characterized qubits).
(b)
Interference on the level of a single subsystem.
(c)
A linear vector space with a tensor product structure, allowing one to manipulate states by local operations (or by acting on at most two subsystems).
(d)
Interference on the level of the full vector space.
(e)
Physical reality.

The last point becomes obvious when we want to actually build a computer. Mathematically, one can easily imagine all sorts of spaces that satisfy the first four criteria. But they have to describe physical reality if we want to do something with them in real life. The question is then whether besides the quantum mechanical wavefunction other objects exist in nature that allow one to satisfy all of the above criteria, and hence, ultimately, run quantum algorithms using those objects. Surprisingly, the answer is yes.

2. A new type of 'wavefunction' and the grabit

The last example already goes a long way in the direction of what we need. In particular, with probabilistic bits, and the full probability distribution propagated by a stochastic map, we get the same parallelism of propagation in an exponentially large (Banach)-space that is often quoted as the origin of the 'quantum supremacy'. Recently, probabilistic bits ('pbits') and a physical realization via spintronics were suggested in [7]. The only problem with probability distributions is the impossibility of interference. So the question is: can we somehow transform the probability distribution in another way from what we essentially do in quantum mechanics, namely taking its square root and providing it with a complex phase factor, in order to get an object that can at least take negative values, and still represent a certain physical reality?

2.1. Higher-order derivatives of probability distributions

Consider as a many-particle wavefunction of N particles on the real line the Nth derivative of the probability distribution, i.e. define

$\begin{equation}\psi ({x}_{1},\dots ,{x}_{N})\equiv {\partial }_{{x}_{1}}\dots {\partial }_{{x}_{N}}P({x}_{1},\dots ,{x}_{N}),\end{equation} \tag{ 1 }$

where the rhs will be abbreviated as ∂^N P as well. In particular, a single-particle wavefunction is just the (1D)-gradient ψ(x) = ∂_x P(x) of a single-particle probability distribution. With x ∈ [a, b], P(x) lives in the space ${\mathcal{L}}^{1}[a,b]$ of integrable real functions on [a, b], which implies that ψ(x) lives in the space of twice-integrable real functions on [a, b]. We can expand ψ(x) = ∑_n c_n β_n(x), where ${c}_{n}\in \mathbb{R}$ , and β_n(x) is a set of basis functions of twice-integrable real functions on [a, b]. A scalar product of two wavefunctions ψ(x) and ϕ(x) can be defined as $\langle \psi \vert \phi \rangle ={\int }_{a}^{b}\;\psi (x)\phi (x)\enspace \mathrm{d}x$ , under which we can orthonormalize the set of basis functions {β_n(x)}. A norm of the wave-function can be derived as usual from the scalar product, i.e. the Hilbert–Schmidt norm ${\Vert}\psi {{\Vert}}_{2}=\sqrt{\langle \psi \vert \psi \rangle }$ , under which we can normalize the wavefunctions for making a connection to the usual quantum mechanical wavefunctions. The one-norm ${\Vert}\psi {{\Vert}}_{1}={\int }_{a}^{b}\vert \psi (x)\vert \enspace \mathrm{d}x$ will also be used.

In the many-body case, if there are no correlations, then

$\begin{equation}\psi ({x}_{1},\dots ,{x}_{N})={\partial }_{{x}_{1}}\dots {\partial }_{{x}_{N}}{P}_{1}({x}_{1})\dots {P}_{N}({x}_{N})\end{equation} \tag{ 2 }$

$\begin{equation}=\prod\limits _{i=1}^{N}\enspace {\partial }_{{x}_{i}}{P}_{i}({x}_{i})=\prod\limits _{i=1}^{N}\enspace {\psi }_{i}({x}_{i}),\end{equation} \tag{ 3 }$

where the ψ_i(x_i) are local 'wavefunctions'. I.e. in this case we obtain a product state for the wavefunction rather than just for the probability distribution. We can arbitrarily superpose such product states of basis functions, leading to a general state

$\begin{equation}\psi ({x}_{1},\dots ,{x}_{N})=\sum\limits _{\boldsymbol{n}}\enspace {c}_{\boldsymbol{n}}{\beta }_{{n}_{1}}({x}_{1})\cdot \dots \cdot {\beta }_{{n}_{N}}({x}_{N}),\end{equation} \tag{ 4 }$

where n ≡ (n₁, n₂, ..., n_N) and ${c}_{\boldsymbol{n}}\in \mathbb{R}$ . One easily checks that with this the criteria (a)–(d) from the introduction are satisfied. We will come back to criterion (e) in more detail later, but it is clear that the 'wavefunction' defined in (1) has a clear physical meaning. First-order spatial derivatives of concentrations enter for example in Fick's law of diffusion, and osmotic pressure is driven by concentration differences. Reaction–diffusion systems were analyzed by Turing as a possible mechanism of pattern formation in biology [8], and there is even a branch of physics trying to establish universal computing based on such systems [9]. For large particle numbers, concentrations can be seen as proxies of corresponding probabilities. Higher-order derivatives known to play a role in physics are third-order time-derivatives that appear in electrodynamics when trying to build it entirely on the motion of charged particles [10]. Also, in quantum mechanics we can get derivatives of arbitrarily high order from the time-evolution operator $\mathrm{exp}(-\mathrm{i}\hat{H}t/\hslash )$ if the kinetic energy in the Hamiltonian $\hat{H}$ is expressed as a second derivative in position-representation, but it appears that derivatives of probability distributions of a given large order have not been considered as physical objects in their own right yet. Here we show that by doing so one can emulate any pure-state quantum algorithm as stochastic classical algorithm.

2.2. Discretization

For numerical implementations it is necessary to discretize coordinates x, x = iΔ, $i\in \mathbb{N}$ , and approximate derivatives by quotients of finite differences. Hence, for a single particle, we have ψ(x) = ∂_x P(x) ≃ (P((i + 1)Δ) − P(iΔ))/Δ ≡ P(i + 1) − P(i), where in the last step and from now one we set Δ = 1. I.e. one can code one value of ψ using two values of P. For realizing a single continuous bit, one needs four values of P [28]. Counting the values i from i = 0, we can define a two-state system with continuous real coefficients ψ₀ and ψ₁ as

$\begin{equation}\left(\genfrac{}{}{0pt}{}{{\psi }_{0}}{{\psi }_{1}}\right)=-\left(\genfrac{}{}{0pt}{}{P({x}_{1})-P({x}_{0})}{P({x}_{3})-P({x}_{2})}\right)\equiv \left(\genfrac{}{}{0pt}{}{{P}_{0}-{P}_{1}}{{P}_{2}-{P}_{3}}\right).\end{equation} \tag{ 5 }$

For convenience we have introduced an (irrelevant) global minus-sign. Apart from the fact that the ψ_i are real, and possibly a different normalization, the state defined by (5) emulates a qubit in a pure state with amplitudes ψ_i for computational basis states |i⟩. Owing to the origin as a gradient of the probability distribution, a system with a two-dimensional state space as defined in (5) will be called a 'gradient-bit', or 'grabit' for short. Equation (5) can be written compactly as

$\begin{equation}{\psi }_{i}=\sum\limits _{\sigma =0,1}{(-1)}^{\sigma }{P}_{2i+\sigma },\quad i\in \left\{0,1\right\}.\end{equation} \tag{ 6 }$

We will call i the 'byte logical value' ('blv' or 'logical value' for short), as it labels the grabit states |0⟩ or |1⟩, σ the 'gradient value', and I = 2i + σ the 'byte4 value' ('b4v' for short). They will be denoted with small roman, small Greek, and capital roman letters, respectively. We can think of i and σ really as two independent coordinates that label four different bins, see figure 1. A probabilistic realization of the grabit is realized via a particle that is found with probability P_I in one of the four bins. If just one of the 4 bins is realized with probability 1, we have states |0⟩, −|0⟩, |1⟩, and −|1⟩, respectively, for I = 0, 1, 2, 3, or equivalently (i, σ) = (0, 0), (0, 1), (1, 0), (1, 1). So two classical stochastic bits code one (real) grabit.

**Figure 1.** Left: a single grabit realized as a discretized gradient of a probability distribution leads to four bins in which a ball is found with probabilities P_I, I = 0, 1, 2, 3. The four bins are arranged on a square such that I = 2i + σ where the index i = 0, 1 labels the logical value and σ = 0, 1 the gradient value. The (unnormalized) grabit state is given by the amplitudes ψ₀ = P₀ − P₁ and ψ₁ = P₂ − P₃. Middle: example of a grabit state |ψ⟩ = 0.3|0⟩ − 0.2|1⟩, realized via P = (0.5, 0.2, 0.05, 0.25). Right: state space of a single grabit. Allowed pure states are on the surface of the tetrahedron, whose vertices give the signed computational basis states. Convex combinations of the b4v probability states lead to grabit states capable of interference.
Download figure:
Standard image High-resolution image

2.3. Single grabit interference

Let us verify at the hand of the simplest example that interference on the single-grabit level works correctly, by checking that a state |0⟩ and a state −|0⟩ can indeed be superposed to give 0|0⟩:

$\begin{equation}\vert 0\rangle +(-\vert 0\rangle )=0\vert 0\rangle .\end{equation} \tag{ 7 }$

According to (6), the two states involved are represented in terms of probabilities as P₀ = 1 and P₁ = 1, respectively, or, in terms of the full 4D probability vectors, as P _A = (1,0,0,0)^t, P _B = (0,1,0,0)^t. The linear superposition (7) translates to an equal weight convex combination of the two probability distributions, P = ( P _A + P _B)/2 = (1, 1, 0, 0)/2. Inserting this into (6) leads immediately to ψ₀ = ψ₁ = 0. So interference on the single-grabit level works indeed.

All other examples of single-grabit interference work the same way: convex combinations of the probability values P_I (I = 0, 1, 2, 3) translate into real superpositions of the ψ_i with coefficients of either sign and vice versa. The values of the coefficients in the superposition are only limited by the fact that 0 ⩽ P_I ⩽ 1 and ${\sum }_{I=0}^{3}\enspace {P}_{I}=1$ . Hence, the state space of the single grabit has the geometry of a tetrahedron, see figure 1. Its vertices correspond to the states |0⟩, −|0⟩, |1⟩, −|1⟩. Points on the edges interpolating between vertices with different logical values give linear superpositions of |0⟩ and |1⟩ with signs of the coefficients given by the signs of the vertices, whereas points on edges interpolating between vertices with the same logical values still represent states |0⟩ or |1⟩, respectively, with amplitudes varying from −1 to +1. Only points on the surface of the tetrahedron are allowed for pure states. Since only real linear combinations are allowed, the surface of the tetrahedron can be seen as the analogue of a great circle of the Bloch-sphere in quantum mechanics in the xz-plane.

Since in quantum mechanics states are described by points in a projective Hilbert space (i.e. state vectors are only defined up to a complex prefactor), the relevant quantum information in a grabit is only the ratio of the amplitudes of states |0⟩ and |1⟩. This motivates a parameterization of the probability vector P of a single grabit in the form

$\begin{equation}\boldsymbol{P}=\frac{1}{2}\left(\begin{matrix}\hfill 0\hfill \\ \hfill 0\hfill \\ \hfill 1\hfill \\ \hfill 1\hfill \end{matrix}\right)+\frac{r}{2}\left(\begin{matrix}\hfill 1\hfill \\ \hfill 1\hfill \\ \hfill -1\hfill \\ \hfill -1\hfill \end{matrix}\right)+\frac{b}{2}\left(\begin{matrix}\hfill q\hfill \\ \hfill -q\hfill \\ \hfill 1\hfill \\ \hfill -1\hfill \end{matrix}\right),\end{equation} \tag{ 8 }$

where ψ₁ = P₂ − P₃ ≡ b ≠ 0 is assumed. The grabit state is then coded in q = ψ₀/ψ₁, whereas r can be seen as a gauge-degree of freedom that does not change the state of the grabit. In the case of ψ₁ = b = 0, we can take ψ₀ = a as independent variable instead of q, giving

$\begin{equation}\boldsymbol{P}=\frac{1}{2}\left(\begin{matrix}\hfill 0\hfill \\ \hfill 0\hfill \\ \hfill 1\hfill \\ \hfill 1\hfill \end{matrix}\right)+\frac{r}{2}\left(\begin{matrix}\hfill 1\hfill \\ \hfill 1\hfill \\ \hfill -1\hfill \\ \hfill -1\hfill \end{matrix}\right)+\frac{a}{2}\left(\begin{matrix}\hfill 1\hfill \\ \hfill -1\hfill \\ \hfill 0\hfill \\ \hfill 0\hfill \end{matrix}\right).\end{equation} \tag{ 9 }$

While r, b, q (or r, a for b = 0) can be seen as new independent variables that replace P₀, P₁, P₂, their ranges are not independent. Indeed, from 0 ⩽ P_I ⩽ 1, for I ∈ {0, 1, 2, 3}, it follows for q > 0 that

$\begin{equation}\vert b\vert \leqslant \left\{\begin{matrix}\hfill r/q\hfill & \hfill \quad 0\leqslant r\leqslant \frac{q}{1+q}\hfill \\ \hfill 1-r\hfill & \hfill \quad \frac{q}{1+q}\leqslant r\leqslant 1.\hfill \end{matrix}\right.\end{equation} \tag{ 10 }$

This implies that argmax_r|b| = q/(1 + q), and max_r|b| = 1/(1 + q). The value of a is maximized for the same value of r and gives max_r|a| = q/(1 + q). The probability distribution with maximum contrast, meaning the largest maximum amplitudes |ψ₀| and |ψ₁| for given value of q > 0, obtained from ψ₀ > 0 and ψ₁ > 0, is given by

$\begin{equation}{\boldsymbol{P}}_{\mathrm{max}}\enspace (q)=\left(\begin{matrix}\hfill \frac{q}{1+q}\hfill \\ \hfill 0\hfill \\ \hfill \frac{1}{1+q}\hfill \\ \hfill 0\hfill \end{matrix}\right).\end{equation} \tag{ 11 }$

The correct ratio between ψ₀ and ψ₁ is obvious from this expression, as is the fact that there is no 'socket' of center-of-mass probabilities, in the sense that P₀ − P₁ = P₀ + P₁ and P₂ − P₃ = P₂ + P₃, i.e. all the probability in bins 0 or 1 goes into the difference that determines ψ₀, and all probability in bins 2 or 3 goes into the difference that determines ψ₁. This corresponds to a gauge choice, namely the maximum possible value of r. We see that in this case the physical probability ${\tilde{p}}_{0}={P}_{0}+{P}_{1}$ of finding a logical value 0 (or physical probability ${\tilde{p}}_{1}={P}_{2}+{P}_{3}$ for finding a logical value 1) when marginalizing over the gradient value is given directly by ${\tilde{p}}_{0}=\vert {\psi }_{0}\vert$ (respectively ${\tilde{p}}_{1}=\vert {\psi }_{1}\vert$ ). This marks a deviation from standard quantum mechanics and the Born rule that probabilities are given by the squared absolute value of the amplitude, unless a single probability equals 1. Useful quantum algorithms are mostly constructed such that there are one or relatively few peaks in the final probability distribution on logical values that allow one to obtain the result of the calculation by efficient classical post-processing. In that case, it is irrelevant whether the first or second power of the absolute amplitude determines the probability of the outcome. Interestingly, Born initially advocated the use of the power one [11]. We will therefore refer to the rule p_i = |ψ_i| as the 'Born-1' rule.

Most importantly, however, we see that in this situation of maximum contrast we have ${\tilde{p}}_{i}={p}_{i}\equiv \vert {\psi }_{i}\vert$ , i.e. the actual physical probability to find the particle in a bin with logical value i is given by the Born-1 rule. This means that if we manage to propagate ψ according to a quantum algorithm that ends with a peak in |ψ| AND we can enforce the gauge choice of maximum contrast, the particle will be found with correspondingly high probability in the bin that corresponds to the maximum of |ψ|²—just as if the algorithm was run on a quantum computer.

For q < 0 obtained through ψ₀ < 0, ψ₁ > 0, equation (11) reads correspondingly

$\begin{equation}{\boldsymbol{P}}_{\mathrm{max}}\enspace (q)=\left(\begin{matrix}\hfill 0\hfill \\ \hfill \frac{q}{1+\vert q\vert }\hfill \\ \hfill \frac{1}{1+\vert q\vert }\hfill \\ \hfill 0\hfill \end{matrix}\right),\end{equation} \tag{ 12 }$

and similarly for the other combinations of signs of ψ₀ and ψ₁. The rule is that always either P₀ or P₁ equals |q|/(1 + |q|) (and the other one equals 0) if the sign of ψ₀ is positive or negative, respectively; and similarly for P₂ and P₃ which always equals 1/(1 + |q|) or 0, depending on the sign of ψ₁.

2.4. Multi-grabit states

The discretization procedure introduced for a single grabit also works for any number of grabits. A general N-grabit state is given by

$\begin{equation}{\psi }_{\boldsymbol{i}}=\sum\limits _{\boldsymbol{\sigma }\in {\left\{0,1\right\}}^{\otimes N}}{(-1)}^{\sum\limits _{i=1}^{N}{\sigma }_{i}}\enspace {P}_{2\boldsymbol{i}+\boldsymbol{\sigma }},\end{equation} \tag{ 13 }$

where i = (i₁, ..., i_N) and σ = (σ₁, ..., σ_N) label the logical values and gradient values 0, ..., 2^N − 1, respectively. The map (13), P ↦ ψ, will be denoted symbolically as ψ = ∂^N P, recalling the origin as a (discretized) Nth order derivative of a probability distribution.

As in quantum mechanics, the sign of the state can now arise from any grabit, e.g. a two-grabit state |00⟩ can be represented by P₁₁ = 1, or equivalently by P₀₀ = 1. The latter case is really a product state (−|0⟩) ⊗ (−|0⟩). Two-particle interference works just as nicely as single-particle interference, e.g. (|00⟩ + (−|00⟩))/2 can be represented by the convex combination of two probability vectors with components ${({\boldsymbol{P}}_{1})}_{IJ}={\delta }_{IJ,00}$ , ${({\boldsymbol{P}}_{2})}_{IJ}={\delta }_{IJ,01}$ , which correspond to states |00⟩ and |0⟩(−|0⟩). The convex combination P = ( P ₁ + P ₂)/2 has then components P_IJ = δ_I,0((δ_J,0 + δ_J,1)/2), which translates to ${\psi }_{ij}={({\psi }_{1})}_{i}{({\psi }_{2})}_{j}$ with ψ₂ = (0,0)^t, hence |ψ⟩ = 0, indeed.

We can also easily create entangled states. It is useful to visualize the addition of probabilities and resulting grabit states. Let us denote a probability distribution with just one byte4 value of I realized for a grabit number 1, e.g. I = 0, as , where the squares are labeled according to figure 1. So this state is +|0⟩ of the first grabit. Similarly, if there are two grabits with a single combination of bins realized, we denote it as e.g. , which represents state (−|0⟩)(−|0⟩) = |00⟩, or which represents the state −|01⟩. The Bell states (|00⟩ ± |11⟩)/2 are then realized e.g. by

To summarize, a 4^N-component probability vector is used for coding the 2^N components of a real-valued ψ (see section 3.2.2 for complex states). This means that two classical bits are required to represent one real-valued qubit. As in standard quantum mechanics, where the quantum state is never explicitly stored but miraculously kept track of by nature, the probability vector is never stored explicitly either. It is implicitly stored in the correlations of the particles realizing different grabits, or, more precisely, in the derivatives of order N_bit for N_bit grabits of those correlations. The derivatives are approximated as finite differences of those correlations. In order to emulate a quantum algorithm, we have to sample from these probability distributions and then act on the realizations in order to emulate quantum gates for implementing the corresponding stochastic maps, as will be seen in the next section. Sampling from the probability distributions gives us N_ball realizations of a set of N_bit little billard balls. When N_ball is large enough, we can estimate the probability distribution P over the byte4 values from the observed relative frequencies R. We will continue to use upper case P (or P_I for a single component) for probability vectors over the byte4 values I. Column vectors of components P_I may also be represented in vector notation, P . Relative frequencies (i.e. histograms) over b4v's will be denoted by upper case R_I, relative frequencies over blv's by lower case r_i. Instead of i and I running from 0 to ${2}^{{N}_{\text{bit}}}-1$ or ${4}^{{N}_{\text{bit}}}-1$ , respectively, we also use i and I when we think of them as strings of single grabit values, $\boldsymbol{i}=({i}_{1},\dots ,{i}_{{N}_{\text{bit}}})$ and $\boldsymbol{I}=({I}_{1},\dots ,{I}_{{N}_{\text{bit}}})$ , and correspondingly for σ . All probabilities and relative frequencies will be normalized to 1, ${\sum }_{\boldsymbol{i}\in {\left\{0,1\right\}}^{{N}_{\text{bit}}}}\enspace {r}_{\boldsymbol{i}}=1$ and ${\sum }_{\boldsymbol{I}\in {\left\{0,1,2,3\right\}}^{{N}_{\text{bit}}}}\enspace {R}_{\boldsymbol{I}}=1$ . The symbol ψ will be used for a grabit state (see equation (13)). A hat is reserved for estimators. In particular, an estimator of a real-valued multi-grabit state ψ based on relative frequencies is

$\begin{equation}{\hat{\psi }}_{\boldsymbol{i}}=\sum\limits _{\boldsymbol{\sigma }\in {\left\{0,1\right\}}^{{N}_{\text{bit}}}}{(-1)}^{\sum\limits _{k=1}^{{N}_{\text{bit}}}{\sigma }_{k}}{R}_{2\boldsymbol{i}+\boldsymbol{\sigma }}.\end{equation} \tag{ 16 }$

We also use histograms H_I normalized to ∑_I H_I = N_ball. The physical probability distribution is the one with the gradient degree of freedom 'traced-out' (or marginalized),

$\begin{equation}{\tilde{p}}_{\boldsymbol{i}}=\sum\limits _{\boldsymbol{\sigma }\in {\left\{0,1\right\}}^{{N}_{\text{bit}}}}\enspace {P}_{2\boldsymbol{i}+\boldsymbol{\sigma }},\end{equation} \tag{ 17 }$

and its estimator based on the relative frequencies reads

$\begin{equation}{\hat{\tilde{p}}}_{\boldsymbol{i}}=\sum\limits _{\boldsymbol{\sigma }\in {\left\{0,1\right\}}^{{N}_{\text{bit}}}}\enspace {R}_{2\boldsymbol{i}+\boldsymbol{\sigma }}.\end{equation} \tag{ 18 }$

Ψ will be reserved for the true quantum mechanical state with complex amplitudes.

3. Grabit gates

As we have seen, superpositions with real coefficients of either sign of states of N grabits are generated through convex combinations of probability vectors over the 4^N byte4 values. Such convex combinations are created physically as usual via stochastic maps. Physically, we will act on the drawn realizations in such a way that the derivatives (respectively, differences under the approximations made) of the probability distributions are propagated in the same way as the multi-particle quantum mechanical wave function. In the special case of stochastic maps occurring with probability one, we have deterministic gates. Let us now determine the stochastic gates corresponding to the most important single-qubit and two-qubit gates.

3.1. Single grabit gates

3.1.1. The X gate

The X-gate flips the logical value i while keeping the phase, i.e. the gradient value σ. I.e. it maps ±|0⟩ ↔ ±|1⟩. This is achieved by flipping with probability 1 the byte4 values 0 ↔ 2 and 1 ↔ 3, which translates to a physical operation of flipping the corresponding particle positions. Hence, the stochastic map of a single grabit's vector of probabilities $\boldsymbol{P}={({P}_{0},{P}_{1},{P}_{2},{P}_{3})}^{\mathrm{T}}$ over the four byte4 values is given by

$\begin{equation}{S}_{x}=\left(\begin{matrix}\hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill \\ \hfill 1\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill & \hfill 0\hfill \end{matrix}\right).\end{equation} \tag{ 19 }$

Note that, just as for qubits, it is enough to specify the stochastic map for computational basis states. Due to the linearity of the stochastic propagation, it then works for any grabit state.

3.1.2. The Z gate

The Z gate flips the phase of the |1⟩-state and leaves the phase of |0⟩ untouched, i.e. ±|0⟩ ↔ ±|0⟩ and ±|1⟩ ↔ ∓|1⟩. In terms of the particle's bins, this corresponds to flipping the byte4 values as 0 ↔ 0, 1 ↔ 1, 2 ↔ 3, 3 ↔ 2, all with probability 1. Hence, the stochastic map reads

$\begin{equation}{S}_{z}=\left(\begin{matrix}\hfill 1\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill \end{matrix}\right),\end{equation} \tag{ 20 }$

which happens to be the same as the representation of the ordinary CNOT for two qubits.

3.1.3. The Hadamard gate

The Hadamard gate is maybe the most important gate of all. Almost all known quantum algorithms begin by applying the Hadamard gate to all qubits. It is the decisive gate for creating the massive interference that characterizes a quantum computer. A single Hadamard gate creates one i-bit of interference [4, 5, 12] by realizing the transformations ±|0⟩ ↔ ±(|0⟩ + |1⟩)/2 and ±|1⟩ ↔ ±(|0⟩ − |1⟩)/2. Here we have already used a normalization that is adapted to propagating probabilities, but as mentioned earlier, only ratios between amplitudes matter, such that the normalization by dividing by 2 instead of the usual $\sqrt{2}$ is irrelevant. The mapping of states implies that we have to flip byte4 values for the particle according to 0 → 2 or 0, 1 → 3 or 1, 2 → 3 or 0, 3 → 1 or 3 → 2 with all flips performed with probability 1/2. The stochastic map that describes this propagation reads

$\begin{equation}{S}_{H}=\frac{1}{2}\left(\begin{matrix}\hfill 1\hfill & \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill & \hfill 1\hfill \\ \hfill 1\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill 1\hfill & \hfill 0\hfill \end{matrix}\right).\end{equation} \tag{ 21 }$

We can see that S_H = (S_X + S_Z)/2, just as for unitary single-qubit gates $H=(X+Z)/\sqrt{2}$ .

A universal quantum computer can be constructed once we can implement the Hadamard gate, the π/8-gate (the unitary diag(1, e^iπ/4) in the computational basis), and a controlled-NOT ('CNOT') gate. A single-grabit gate representing a unitary with finite imaginary parts can be realized with a stochastic map acting on two grabits at the time, and also the CNOT is a two-qubit gate to which we turn now.

3.2. Two-grabit gates

3.2.1. The CNOT

The CNOT flips a target qubit if a control qubit is in state |1⟩. I.e. if we have two qubits, with the first one the control-qubit and the second one the target-qubit, then the CNOT performs a permutation of computational basis states according to ±|00⟩ ↔ ±|00⟩, ±|01⟩ ↔ ±|01⟩, ±|10⟩ ↔ ±|11⟩. This is easily translated to a stochastic map for byte4 values of grabits by keeping the local signs, just as in the case of the single-grabit gates. For the CNOT this translates to a 16 × 16 permutation matrix S_CNOT that will not be written down here, but as an example, (−|1⟩)(+|1⟩) ↔ (−|1⟩)(+|0⟩) translates to flipping joint b4v's of particle positions (3, 2) ↔ (3, 0) with probability 1.

3.2.2. Complex states and gates

Up to now we have considered interference only on the basis of positive and negative amplitudes. However, the state space of quantum mechanics is a (projective) Hilbert-space over the complex numbers. It is well-known that quantum mechanics can be formulated completely equivalently with state vectors (and hence wavefunctions) over the field of real numbers only (see however [13] for recent work considering imaginarity as a resource). To this end, one splits the wave function into its real and imaginary part, i.e. it becomes a two-component spinor propagated by a Hamiltonian that is also split into real and imaginary parts. In quantum algorithms this corresponds to spending a single additional qubit that codes whether we deal with the real- or imaginary part of the state [14]. We formalize the concept by the following definition:

Definition 3.1. The 'realification' Φ of a quantum state Ψ is a map ${\Phi}:{\mathbb{C}}^{{2}^{{N}_{\text{bit}}}}\to {\mathbb{R}}^{{2}^{{N}_{\text{bit}}+1}}$ , ${{\Psi}}_{i}{\mapsto}{{\Phi}}_{i}=(\mathfrak{R}{{\Psi}}_{i},\mathfrak{I}{{\Psi}}_{i})\enspace \forall \enspace i=0,\dots ,{2}^{{N}_{\text{bit}}}-1$ , which obviously constitutes a bijective map. Its inverse mapping Φ⁻¹ is called 'complexification'.

The additional grabit (or qubit) will be called 'ReIm grabit' (or 'ReIm qubit'). N_bit will denote the number of grabits including the ReIm grabit, i.e. N_bit = n_bit + 1, where n_bit is the number of qubits with complex-valued amplitudes.

Every pure quantum state Ψ of n_bit qubits can be represented as gradient state, as is made precise by the following lemma:

Lemma 3.2. For all ${\Psi}\in \mathcal{H}={\mathbb{C}}^{({2}^{{n}_{\text{bit}}})}$ , there exists a probability distribution $P\in {({\mathbb{R}}^{+})}^{({2}^{{N}_{\text{bit}}})}$ over b4v's such that $\psi ={\partial }^{{N}_{\text{bit}}}P=a{\Phi}$ , where $\mathbb{R}\ni a\ne 0$ and ${\Phi}\in {(\mathbb{R})}^{({2}^{{N}_{\text{bit}}})}$ with N_bit = n_bit + 1 is the realification of Ψ.

Proof. The lemma is proven by providing a trivial representation: set Q_{2
i} = Φ_i if Φ_i ⩾ 0, Q_{2
i
+(0...01)} = |Φ_i| if Φ_i < 0, and all other Q_{2
i
+
σ} = 0. Then ${\tilde{\psi }}_{\boldsymbol{i}}={\partial }^{{N}_{\text{bit}}}{\tilde{P}}_{\boldsymbol{i}}=\mathrm{sign}({{\Phi}}_{\boldsymbol{i}})\vert {{\Phi}}_{\boldsymbol{i}}\vert ={{\Phi}}_{\boldsymbol{i}}$ . Now normalize, P = Q/||Q||₁ = Q/∑_i|Φ_i|. Then P is a valid probability distribution and due to linearity

$\begin{equation}{\psi }_{\boldsymbol{i}}={({\partial }^{{N}_{\text{bit}}}P)}_{\boldsymbol{i}}=\frac{{({\partial }^{{N}_{\text{bit}}}Q)}_{\boldsymbol{i}}}{{\sum }_{\boldsymbol{j}}\vert {{\Phi}}_{\boldsymbol{j}}\vert }=\frac{{{\Phi}}_{\boldsymbol{i}}}{{\Vert}{\Phi}{{\Vert}}_{1}},\end{equation} \tag{ 22 }$

i.e. ψ = aΦ with a > 0. □

Of course, the representation used in the proof is not unique, as the Φ_i fix only a linear combination of P_{2
i
+
σ} with alternating signs for different σ , which generalizes the gauge choice observed for a single grabit. The prefactor a will, in general, depend on the gauge choice and on the state Φ.

Let U be a complex unitary matrix with matrix elements U_ij in the computational basis that propagates a complex quantum state Ψ with elements Ψ_j, ${{\Psi}}_{i}^{\prime }={U}_{ij}{{\Psi}}_{j}$ (with summation convention over pairs of indices). Then, coding real- and imaginary part of Ψ as in definition 3.1, $({{\Phi}}_{i0},{{\Phi}}_{i1})\equiv (\mathfrak{R}{{\Psi}}_{i},\mathfrak{I}{{\Psi}}_{i})$ , the real state $\tilde{{\Phi}}$ is propagated according to ${{\Phi}}_{i}^{\prime }={\tilde{U}}_{ij}{{\Phi}}_{j}$ where each complex matrix element U_ij is replaced by a real 2 × 2 matrix,

$\begin{equation}{U}_{ij}{\mapsto}{({\tilde{U}}_{i\nu ,j\mu })}_{\nu ,\mu \in \left\{0,1\right\}}=\left(\begin{matrix}\hfill \mathfrak{R}{U}_{ij}\hfill & \hfill -\mathfrak{I}{U}_{ij}\hfill \\ \hfill \mathfrak{I}{U}_{ij}\hfill & \hfill \mathfrak{R}{U}_{ij}\hfill \end{matrix}\right).\end{equation} \tag{ 23 }$

We always choose the ReIm qubit as least significant bit, and numbering of basis states is always in increasing order of the binary labels.

3.2.3. General phase gate

As simplest example consider the gate that adds a relative phase ϕ to the state |1⟩ of a qubit, while leaving the phase of |0⟩ untouched, i.e.

$\begin{equation}{U}_{\phi }=\left(\begin{matrix}\hfill 1\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill {\text{e}}^{\text{i}\phi }\hfill \end{matrix}\right){\mapsto}{\tilde{U}}_{\phi }=\left(\begin{matrix}\hfill {1}_{2}\hfill & \hfill {0}_{2}\hfill \\ \hfill {0}_{2}\hfill & \hfill \begin{matrix}\hfill \mathrm{cos}\enspace \phi \hfill & \hfill -\mathrm{sin}\enspace \phi \hfill \\ \hfill \mathrm{sin}\enspace \phi \hfill & \hfill \mathrm{cos}\enspace \phi \hfill \end{matrix}\hfill \end{matrix}\right).\end{equation} \tag{ 24 }$

I.e. written as a real gate acting on real state vectors, the general phase gate can be seen as a controlled real phase gate acting as a rotation in the state space of the second (target-)qubit (=ReIm qubit) if the first (control-)qubit is in the state |1⟩.

For translation to a grabit gate, consider hence first the action of the rotation gate on the second qubit. Assume 0 < ϕ ⩽ π/2 first, i.e. cos ϕ ⩾ 0, sin ϕ > 0. As before, we find the corresponding stochastic map by studying its action on the computational basis states, e.g. |0⟩ ↦ cos ϕ|0⟩ + sin ϕ|1⟩, and using the linearity of convex combinations. Requesting maximum possible contrast we then have that the byte4 probability vector is mapped as (1,0,0,0)^t ↦ (q, 0, 1, 0)/(1 + q), where q = |cot(ϕ)|. Altogether we get the stochastic map

$\begin{equation}{S}_{{R}_{\phi }}^{(1)}=\frac{1}{1+q}\left(\begin{matrix}\hfill q\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill \\ \hfill 0\hfill & \hfill q\hfill & \hfill 1\hfill & \hfill 0\hfill \\ \hfill 1\hfill & \hfill 0\hfill & \hfill q\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill & \hfill q\hfill \end{matrix}\right)\text{,}\;0< \phi \leqslant \pi /2.\end{equation} \tag{ 25 }$

The stochastic process is implemented by acting on the drawn realizations: e.g. if the b4v is 0, we flip it with probability 1/(1 + q) to 2 and keep it with probability q/(1 + q), and accordingly for the other b4v's. The stochastic process is defined even for a single realization. In the other quadrants, the signs of cos ϕ or sin ϕ change. A sign change of cos ϕ is implemented by moving q in ${S}_{{R}_{\phi }}^{(1)}$ one row up or down while keeping the same blv. E.g. for π/2 < ϕ ⩽ π we have

$\begin{equation}{S}_{{R}_{\phi }}^{(2)}=\frac{1}{1+q}\left(\begin{matrix}\hfill 0\hfill & \hfill q\hfill & \hfill 0\hfill & \hfill 1\hfill \\ \hfill q\hfill & \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill \\ \hfill 1\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill q\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill q\hfill & \hfill 0\hfill \end{matrix}\right)\text{,}\;\pi /2< \phi \leqslant \pi ,\end{equation} \tag{ 26 }$

and in the remaining quadrants

$\begin{equation}{S}_{{R}_{\phi }}^{(3)}=\frac{1}{1+q}\left(\begin{matrix}\hfill 0\hfill & \hfill q\hfill & \hfill 1\hfill & \hfill 0\hfill \\ \hfill q\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill & \hfill q\hfill \\ \hfill 1\hfill & \hfill 0\hfill & \hfill q\hfill & \hfill 0\hfill \end{matrix}\right)\text{,}\;\qquad {S}_{{R}_{\phi }}^{(4)}=\frac{1}{1+q}\left(\begin{matrix}\hfill q\hfill & \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill q\hfill & \hfill 0\hfill & \hfill 1\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill q\hfill & \hfill 0\hfill \\ \hfill 1\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill q\hfill \end{matrix}\right)\end{equation} \tag{ 27 }$

for −π < ϕ ⩽ −π/2 and −π/2 < ϕ ⩽ 0, respectively.

The question of normalization becomes relevant for controlled gates. E.g. $\vert {\Phi}\rangle =\vert c,t\rangle ={(1,0,1,0)}^{t}/\sqrt{2}\stackrel{{\tilde{U}}_{\phi }}{{\mapsto}}{(1,0,\mathrm{cos}\enspace \phi ,\mathrm{sin}\enspace \phi )}^{t}/\sqrt{2}$ . I.e. the two-norm of the vector in the c = 1 sub-space {|10⟩, |11⟩} is conserved, whereas the stochastic map does not conserve the two-norm, and hence modifies the ratio of the two-norms in the c = 1 vs the c = 0 subspace. Because the amplitude of the c = 1 subspace cannot be increased beyond what the stochastic matrices ${S}_{{R}_{\phi }}^{(i)}$ deliver, in order to remedy this we reduce the amplitude of the c = 0 subspace instead using the gauge degree of freedom. For this we make an ansatz for an amplitude reduction gate acting as stochastic map on the control-grabit c, but activated only if c = 0. It can hence be restricted to the |0⟩, −|0⟩ subspace, and therefore represented as

$\begin{equation}{R}_{2}=\left(\begin{matrix}\hfill 1-{r}_{0}\hfill & \hfill {r}_{0}\hfill \\ \hfill {r}_{0}\hfill & \hfill 1-{r}_{0}\hfill \end{matrix}\right).\end{equation} \tag{ 28 }$

The reduction of the two-norm of states in the c = 0 subspace must be by the same amount as of those in the c = 1 subspace due to the controlled rotation, which is ${\Vert}{\psi }^{\prime }{{\Vert}}_{2}=\mathcal{N}{\Vert}\psi {{\Vert}}_{2}$ with $\mathcal{N}=\sqrt{1+{q}^{2}}/(1+\vert q\vert )$ , and gives ${r}_{0}=(1-\mathcal{N})/2$ . Altogether the general-phase gate U_ϕ on a single qubit becomes a real gate ${\tilde{U}}_{\phi }$ on two qubits that is emulated with a 16 × 16 stochastic map on two grabits of the form

$\begin{equation}{S}_{{R}_{\phi }}^{\text{full},i}={R}_{2}\otimes {1}_{4}\oplus {1}_{2}\otimes {S}_{R\phi }^{(i)},\end{equation} \tag{ 29 }$

It has the clear interpretation of reducing the amplitude of grabit components in the ±|0⟩ subspace of the first grabit when the conditional rotation of the second grabit is not activated, while leaving components of the ±1 subspace of the first grabit alone but activating the conditional rotation of the second grabit iff the first grabit is in state ±|1⟩. Since r₀ < 1/2, the sign of the components of the grabit-state is kept. The amplitude reduction gate (28) is implemented as a stochastic process where 0 ↔ 1 with probability r₀, whereas with probability 1 − r₀ the two b4v's are not flipped, and correspondingly for 2 ↔ 3.

If we have more than two grabits (i.e. more than one complex qubit) this construction is still sufficient, i.e. no additional amplitude reductions need to be introduced: the full Hilbert space decomposes into the two subspaces with the control grabit either in states ±|0⟩ or ±|1⟩. If we have controlled unitaries controlled by more than one control, the construction works accordingly, i.e. in the subspace where the unitary is not activated the two-norm of the components of the state vector shall be reduced to the same value resulting from the unitary applied in the complementary subspace. For this it is enough to only act with the amplitude reduction gate on the grabits involved in the control.

3.2.4. Controlled general phase gate

The controlled general phase gate on two complex qubits translates first to a doubly controlled rotation gate on the Re/Im qubit if c = t = 1, i.e. a three-qubit gate. The stochastic map ${S}_{c-{R}_{\phi }}$ on three grabits that emulates it has an ${S}_{{R}_{\phi }}$ block that acts on the Re/Im grabit in the four subspaces corresponding to ±|1⟩_c ± |1⟩_t of the control and target grabit, and amplitude reductions in the complementary subspaces. I.e. we apply the rotation gate to the ReIm-grabit iff the blv's c = t = 1, and apply an amplitude reduction gate to the states |c, t⟩ otherwise. It is enough to implement the amplitude reduction on a single grabit, e.g. for c = 0 implement it on c, which will reduce the amplitudes of all states |c, t⟩ = |±0, ±0⟩ and |±0, ±1⟩. For the state |10⟩ it can be implemented on |t⟩. Altogether we get for the controlled general phase gate the three-grabit stochastic map

$\begin{equation}{S}_{c-{R}_{\phi }}^{\text{full},i}={R}_{2}\otimes {1}_{16}\oplus {R}_{2}\otimes {1}_{4}\oplus {1}_{2}\otimes {S}_{{R}_{\phi }}^{(i)}\oplus {R}_{2}\otimes {1}_{4}\oplus {1}_{2}\otimes {S}_{{R}_{\phi }}^{(i)}.\end{equation} \tag{ 30 }$

Non-trivial stochastic matrices, i.e. matrices that cannot be reduced to a permutation of b4vs, will be called 'interference-generating stochastic matrices' in the following, and their set denoted by $\mathcal{I}$ . From the stochastic matrices considered so far, only the Hadamard gate, the general phase gate, and the controlled general phase gate (for phases φ ≠ kπ, $k\in \mathbb{Z}$ ) are in $\mathcal{I}$ .

3.3. Emulation of a quantum algorithm

As is well-known, a universal gate set $\mathcal{G}$ that allows one to approximate any unitary on ${\mathbb{C}}^{{2}^{{n}_{\text{bit}}}}$ with arbitrary precision is given by the Hadamard-gate, the T-gate (which is the gate (24) with ϕ = π/4), and the CNOT gate (see e.g. [15] p 189ff). From the stochastic emulation of the elementary gates in the previous subsection, we obtain the following theorem:

Theorem 3.3. Let U denote a pure-state quantum algorithm, i.e. a unitary transformation on the Hilbert space ${\mathbb{C}}^{{2}^{{n}_{\text{bit}}}}$ composed from a sequence of gates U⁽ⁱ⁾ from the universal gate set $\mathcal{G}$ , i.e. $U={U}^{({N}_{\text{g}})}{U}^{({N}_{\text{g}}-1)}\dots {U}^{(1)}$ . Let Ψ⁽⁰⁾ be the input state, and ${{\Psi}}^{({N}_{g})}$ the output state, ${{\Psi}}^{({N}_{g})}=U{{\Psi}}^{(0)}$ . Denote with Φ⁽⁰⁾ and ${{\Phi}}^{({N}_{g})}$ the realifications of Ψ⁽⁰⁾ and ${{\Psi}}^{({N}_{g})}$ , respectively, in ${\mathbb{R}}^{{2}^{{N}_{\text{bit}}}}$ with N_bit = n_bit + 1. Let P⁽⁰⁾ be a probability distribution over b4vs of N_bit grabits, such that ${\psi }^{(0)}={\partial }^{{N}_{\text{bit}}}{P}^{(0)}={a}^{(0)}{{\Phi}}^{(0)}$ with ${a}^{(i)}\in \mathbb{R}{\backslash}\left\{0\right\}$ . Then ${\psi }^{({N}_{g})}\equiv {\partial }^{{N}_{\text{bit}}}{P}^{({N}_{g})}$ with ${P}^{({N}_{g})}=S\enspace {P}^{(0)}$ and $S={S}^{({N}_{g})}{S}^{({N}_{g}-1)}\dots {S}^{(1)}$ , where S⁽ⁱ⁾ are the stochastic matrices corresponding to gates U⁽ⁱ⁾ (cf equations (21) and (30) and S_CNOT described in section 3.2.1), satisfies ${\psi }^{({N}_{g})}={a}^{({N}_{g})}{{\Phi}}^{({N}_{g})}$ with ${a}^{({N}_{g})}\in \mathbb{R}{\backslash}\left\{0\right\}$ .

Proof. For all these gates, we have found corresponding stochastic maps that we can now iterate in the sequence defined by the quantum algorithm. I.e. for an arbitrary ${U}^{(i)}\in \mathcal{G}$ with realification ${\tilde{U}}^{(i)}$ there is a corresponding S⁽ⁱ⁾ such that with ${\psi }^{(i-1)}={a}^{(i-1)}{{\Phi}}^{(i-1)}={\partial }^{{N}_{\text{bit}}}{P}^{(i-1)}$ (and a⁽ⁱ⁻¹⁾ ≠ 0) and P⁽ⁱ⁾ = S⁽ⁱ⁾ P⁽ⁱ⁻¹⁾, we have ${\psi }^{(i)}={\partial }^{{N}_{\text{bit}}}{P}^{(i)}={a}^{(i)}{\tilde{U}}^{(i)}{{\Phi}}^{(i-1)}=({a}^{(i)}/{a}^{(i-1)}){\tilde{U}}^{(i)}{\psi }^{(i-1)}$ , i = 1, ..., N_g. Iterating this, we get

$\begin{align*}\hfill {\psi }^{({N}_{\text{g}})}& ={\partial }^{{N}_{\text{bit}}}{P}^{({N}_{\text{g}})}={a}^{({N}_{\text{g}})}{{\Phi}}^{({N}_{\text{g}})}={a}^{({N}_{\text{g}})}{\tilde{U}}^{({N}_{\text{g}})}{{\Phi}}^{({N}_{\text{g}}-1)}\hfill \\ \hfill & =\frac{{a}^{({N}_{\text{g}})}}{{a}^{({N}_{\text{g}}-1)}}{\tilde{U}}^{({N}_{\text{g}})}{\psi }^{({N}_{\text{g}}-1)}\hfill \\ \hfill & =\frac{{a}^{({N}_{\text{g}})}}{{a}^{({N}_{\text{g}}-1)}}\frac{{a}^{({N}_{\text{g}}-1)}}{{a}^{({N}_{\text{g}}-2)}}{\tilde{U}}^{({N}_{\text{g}})}{\tilde{U}}^{({N}_{\text{g}}-1)}{\psi }^{({N}_{\text{g}}-2)}\hfill \\ \hfill & =\dots =\frac{{a}^{({N}_{\text{g}})}}{{a}^{(0)}}{\tilde{U}}^{({N}_{\text{g}})}{\tilde{U}}^{({N}_{\text{g}}-1)}\dots {\tilde{U}}^{(1)}{\psi }^{(0)},\hfill \end{align*}$

i.e. up to an overall prefactor, the grabit state ψ⁽⁰⁾ is propagated with exactly the same quantum algorithm $\tilde{U}={\tilde{U}}^{({N}_{\text{g}})}{\tilde{U}}^{({N}_{\text{g}}-1)}\dots {\tilde{U}}^{(1)}$ as the realification Φ⁽⁰⁾: ${{\Phi}}^{(Ng)}={\tilde{U}}^{({N}_{\text{g}})}{\tilde{U}}^{({N}_{\text{g}}-1)}\dots {\tilde{U}}^{(1)}{{\Phi}}^{(0)}$ . At the same time we have ${P}^{({N}_{\text{g}})}={S}^{({N}_{\text{g}})}\dots {S}^{(1)}{P}^{(0)}$ , i.e. the stochastic map can indeed be composed from the stochastic maps corresponding to the universal gates discussed before. □

3.4. Demonstration with two simple quantum algorithms

We first illustrate the basic functioning of the gradient-state approach at the example of two quantum algorithms, before coming back to more advanced ones in section 5.

3.4.1. Deutsch–Josza algorithm

The Deutsch–Josza algorithm (DJA) is maybe the simplest quantum algorithm that demonstrates a quantum advantage over classical algorithms. In addition, one can clearly point to the origin of the advantage, namely many-body interference. The algorithm allows one to decide whether a Boolean function $f:{\left\{0,1\right\}}^{\otimes {N}_{\text{bit}}}\to \left\{0,1\right\}$ is constant or balanced, assuming that it has only these two options. Here, balanced means that f = 1 for as many inputs x as for which f = 0. Functions can be evaluated reversibly by keeping the inputs, and making the map bijective, via f: x, y ↦ x, y ⊕ f(x). Classically, one has to evaluate f at least once and at the most ${2}^{{N}_{\text{bit}}-1}+1$ times in order to decide, whereas the algorithm by Deutsch–Josza succeeds with probability one with a single reversible evaluation of f, applied to a superposition of all computational basis states. The algorithm has 5 steps: 0. Preparation of input states $\vert {0\rangle }^{\otimes {N}_{\text{bit}}}\otimes \vert 1\rangle$ . 1. Application of Hadamard gates on all qubits. 2. Reversible calculation of f, with the first N_bit qubits untouched, the last one receiving y ⊕ f(x). 3. Application of Hadamard gates on the first N_bit qubits and 4. Their measurement in the computational basis.

We illustrate the algorithm for the simplest case of two grabits and f(x) = x, i.e. f is balanced. The probability distribution over the 16 possible byte4-values after the first two steps above which we denote with |I⟩⟩ for each grabit, and |IJ⟩⟩ for two grabits, are

$\begin{equation}\vert {P}^{(0)}\rangle \rangle =\vert 02\rangle \rangle \enspace ,\end{equation} \tag{ 31 }$

$\begin{equation}\vert {P}^{(1)}\rangle \rangle =(\vert 00\rangle \rangle +\vert 03\rangle \rangle +\vert 20\rangle \rangle +\vert 23\rangle \rangle )/4.\end{equation} \tag{ 32 }$

The function evaluation f(x) permutes byte4 values with the condition of (i) giving the correct logical values and (ii) preserving the sign of the state. The concrete example f(x) = x translates to the logical map x, y ↦ x, y ⊕ x. So e.g. logical 10 is mapped to 11, which means that byte4 values 20 are mapped to 22, and 23 to 21. Thus, after the function evaluation, we have

$\begin{equation}\vert {P}^{(2)}\rangle \rangle =\left(\vert 00\rangle \rangle +\vert 03\rangle \rangle +\vert 22\rangle \rangle +\vert 21\rangle \rangle \right)\hspace{-2pt}/4.\end{equation} \tag{ 33 }$

After the last Hadamard gate on the first grabit, we find

$\begin{equation}\vert {P}^{(3)}\rangle \rangle =\left(\vert 00\rangle \rangle +\vert 20\rangle \rangle +\vert 03\rangle \rangle +\vert 23\rangle \rangle +\vert 32\rangle \rangle +\vert 02\rangle \rangle +\vert 31\rangle \rangle +\vert 01\rangle \rangle \right)\hspace{-2pt}/8,\end{equation} \tag{ 34 }$

which translates to a two-grabit wavefunction

$\begin{align}\hfill \vert {\psi }^{(3)}\rangle & =\left(\right.\vert 00\rangle +\vert 10\rangle +\vert 0\rangle (-\vert 1\rangle )+\vert 1\rangle (-\vert 1\rangle )+(-\vert 1\rangle )\vert 1\rangle +\vert 0\rangle \vert 1\rangle +(-\vert 1\rangle )(-\vert 0\rangle )+\vert 0\rangle (-\vert 0\rangle )/8\hfill \\ \hfill & =(0\vert 00\rangle +0\vert 01\rangle +2\vert 10\rangle -2\vert 11\rangle )/8\hfill \\ \hfill & =(1/2)\vert 1\rangle \vert -\rangle ,\hfill \end{align} \tag{ 35 }$

with |−⟩ = (|0⟩ − |1⟩)/2. So indeed the state is propagated correctly, and we can extract ψ⁽³⁾ from the final probability distribution. For numerical implementation the ReIm grabit is not needed, as the state remains real in the computational basis at all times, i.e. we can set N_bit = n_bit here. A measurement that gives logical values distributed according to a Born rule (be it Born-1 rule or the usual Born-2 rule) would lead to a measurement outcome '1' with probability one for the first grabit. In this sense, the stochastic emulation gives the answer 'f(x) is balanced' with a single application of the gate that evaluates the function f. However, in the numerical implementation, f(x) has to be evaluated for each realization (i.e. N_ball many times!), and, furthermore, N_ball has to be large in order to have near-perfect cancellation of the amplitude on logical state |0⟩ of the first grabit. Secondly, the physical probability distribution at the end of the algorithm is a uniform distribution over all logical states,

$\begin{equation}\vert {\tilde{p}}_{3}\rangle \rangle =\left(\vert 00\rangle \rangle +\vert 01\rangle \rangle +\vert 10\rangle \rangle +\vert 11\rangle \rangle \right)\hspace{-2pt}/4.\end{equation} \tag{ 36 }$

These issues will be considered in section 4, but before we make them more quantitative with a numerical emulation of the Bernstein–Vazirani algorithm.

3.4.2. Bernstein–Vazirani algorithm

The Bernstein–Vazirani algorithm can be considered a generalization of the DJA. It allows one to determine a linear function f(x) = a ⋅ x mod 2 in a single shot, where a is a binary string on n_bit qubits. The quantum circuit is exactly as for the DJA. At the end of the algorithm, the first register of n_bit qubits contains a (indeed the shown example of the DJA corresponds to a = 1, and we saw that the first qubit is in state |1⟩ at the end of the algorithm). One might still apply a final Hadamard on the last grabit, in which case we expect a single peak in the final state, |ψ⟩₃ = |a⟩|1⟩. Figure 2 shows a stochastic emulation with n_bit = 3, N_ball = 10 000 for a = 01₂ (the subscript ₂ indicates binary representation). We hence expect a peak of |ψ⟩₃ at logical values 011₂ = 3, which is found indeed. The figure also show an analysis of the error-probability as function of N_ball for different values of n_bit, based on identifying the logical value i with the largest absolute amplitude |ψ_i| at the end of the algorithm as output a' of the algorithm. 500 different realizations of the algorithm were run with random initial a at given n_bit (the n_bit indicated in the legend refers to the total number of grabits, including the last one initialized in |1⟩, so the largest integer a for n_bit = 7 is 64), and the error probability estimated as relative frequency over all realizations of the algorithm and initial values of a where in the end a' ≠ a. We see that the probability that this decision rule leads to the wrong value of a decreases roughly exponentially with the number of realizations once N_ball is sufficiently large.

**Figure 2.** Stochastic emulation of Bernstein–Vazirani algorithm. Left: estimate ${\hat{\psi }}_{i}$ of final state |ψ⁽³⁾⟩ for n_bit = 3 (without using a ReIm grabit), N_ball = 10⁵, a = 01₂ = 1, including a final Hadamard gate on the third grabit initialized in 1, i.e. the expected output peak of the three-grabit |ψ⟩ is at 011₂ = 3. Right: error probability as estimated from 100 random values of a as function of N_ball for n_bit = 2, ..., 10 from bottom to top in a lin-log plot. Missing values correspond to vanishing error of the estimate (=relative frequency); lines are guides to the eye.
Download figure:
Standard image High-resolution image

**Figure 2.** Stochastic emulation of Bernstein–Vazirani algorithm. Left: estimate ${\hat{\psi }}_{i}$ of final state |ψ⁽³⁾⟩ for n_bit = 3 (without using a ReIm grabit), N_ball = 10⁵, a = 01₂ = 1, including a final Hadamard gate on the third grabit initialized in 1, i.e. the expected output peak of the three-grabit |ψ⟩ is at 011₂ = 3. Right: error probability as estimated from 100 random values of a as function of N_ball for n_bit = 2, ..., 10 from bottom to top in a lin-log plot. Missing values correspond to vanishing error of the estimate (=relative frequency); lines are guides to the eye.
Download figure:
Standard image High-resolution image

4. Refreshments

The examples of the Deutsch–Josza and the Bernstein–Vazirani quantum algorithm illustrate that while the propagation of the multi-grabit wave function works correctly, theorem 3.3 is not enough for obtaining a useful stochastic algorithm from a quantum algorithm. Indeed, extracting the final state as $\psi ={\partial }^{{N}_{\text{bit}}}\;P$ from the final probability distribution and finding its maximum is clearly not efficient when the number of qubits becomes large. Rather we want that the flow of physical probability follows the quantum mechanical wavefunction and leads to a peak on the correct result in the end, such that the correct result is found with high probability, just as with a quantum computer. Already at the example of the two-grabit Deutsch-algorithm we can see, however, that the physical probability ${\tilde{p}}_{1}$ for finding the first grabit in state 1 is not 1. In fact, the final physical probability distribution is evenly distributed over all computational basis states.

This issue can be seen most clearly already at the example of the simplest non-trivial quantum algorithm showing interference: U = H² = 1₂. Starting with |0⟩, the final state is |0⟩ as well. The byte4-value probability distribution of the grabit-emulation of this circuit is easily found to be P = (1/2,0,1/4,1/4)^t, from which results the final grabit-state |ψ⟩ = |0⟩/2, i.e. up to normalization the correct quantum state is perfectly reproduced. But the physical probability distribution is $\tilde{\boldsymbol{p}}=(1,1)/2$ , i.e. there is no contrast at all.

This can be generalized to n iterations of H² and reveals that there is another issue: with S_H from (21) we find that the initial byte4-probability distribution P ₀ = (1, 0, 0, 0) corresponding to initial state |0⟩ is propagated to

$\begin{equation}{\boldsymbol{P}}^{{H}^{2n}}=\frac{1}{4}\left(\begin{matrix}\hfill 1\hfill \\ \hfill 1\hfill \\ \hfill 1\hfill \\ \hfill 1\hfill \end{matrix}\right)+\frac{1}{{2}^{n+1}}\left(\begin{matrix}\hfill 1\hfill \\ \hfill -1\hfill \\ \hfill 0\hfill \\ \hfill 0\hfill \end{matrix}\right),\end{equation} \tag{ 37 }$

which translates to a grabit-state $\vert {\psi }^{{H}^{2n}}\rangle =\frac{1}{{2}^{n}}\vert 0\rangle$ . So while the correct quantum state is reproduced up to normalization, its amplitude decreases exponentially with the number of Hadamard-gates. If the estimation of the quantum state is based on a finite number of realizations, this means that the number of realizations that effectively contribute to $\vert {\psi }^{{H}^{2n}}\rangle$ decreases exponentially, and hence the statistical error, measured as the deviation in two-norm of the multi-grabit state estimated from the histogram of realized byte4 values (and normalized to two-norm equals one) from the correct quantum mechanical state, increases exponentially with the number of gates n_gate. This is confirmed numerically in figure 3(a). In appendix B we show that the reduction of the effectively contributing samples is in fact a measure of destructive interference and that for all quantum gates in the universal set considered here and all input states, a reduction by a factor 1/2 is the worst case.

**Figure 3.** Error ${\Vert}\hat{\psi }-{\Psi}{{\Vert}}_{2}$ of the final state estimate compared to the exact quantum state Ψ after a sequence of Hadamard gates Hⁿ on the first grabit as function of the number n_H of Hadmards for N_bit = 2 grabits, starting in the initial state |00⟩. $\hat{\psi }$ is renormalized to ${\Vert}\hat{\psi }{{\Vert}}_{2}=1$ for comparison with Ψ. The second (ReIm-)grabit remains in |0⟩. Left: without refreshment gates the error increases exponentially with the number of gates. The different curves are averaged over N_r = 50 realizations and correspond to different values of N_ball ranging from N_ball = 50 to N_ball = 5000 in 10 linear steps (from top to bottom at n_H = 1). Right: same as on the left but with an additional refreshment gate after each Hadamard gate for a sequence of up to 200 Hadamard gates on the first grabit in a log–log plot, i.e. $\mathrm{ln}{\Vert}\hat{\psi }-{\Psi}{{\Vert}}_{2}$ as function of ln n_H, where n_H now counts the total number of gates including the refreshment gates. Emulation with N_ball = 10 000 and N_r = 100 realizations. The fit gives a power law $\mathrm{ln}{\Vert}\hat{\psi }-{\Psi}{{\Vert}}_{2}\simeq -5.084\enspace 13+0.532\enspace 838\mathrm{ln}\enspace n$ , i.e. the error increase roughly as the square root of the number of gates.
Download figure:
Standard image High-resolution image

**Figure 3.** Error ${\Vert}\hat{\psi }-{\Psi}{{\Vert}}_{2}$ of the final state estimate compared to the exact quantum state Ψ after a sequence of Hadamard gates Hⁿ on the first grabit as function of the number n_H of Hadmards for N_bit = 2 grabits, starting in the initial state |00⟩. $\hat{\psi }$ is renormalized to ${\Vert}\hat{\psi }{{\Vert}}_{2}=1$ for comparison with Ψ. The second (ReIm-)grabit remains in |0⟩. Left: without refreshment gates the error increases exponentially with the number of gates. The different curves are averaged over N_r = 50 realizations and correspond to different values of N_ball ranging from N_ball = 50 to N_ball = 5000 in 10 linear steps (from top to bottom at n_H = 1). Right: same as on the left but with an additional refreshment gate after each Hadamard gate for a sequence of up to 200 Hadamard gates on the first grabit in a log–log plot, i.e. $\mathrm{ln}{\Vert}\hat{\psi }-{\Psi}{{\Vert}}_{2}$ as function of ln n_H, where n_H now counts the total number of gates including the refreshment gates. Emulation with N_ball = 10 000 and N_r = 100 realizations. The fit gives a power law $\mathrm{ln}{\Vert}\hat{\psi }-{\Psi}{{\Vert}}_{2}\simeq -5.084\enspace 13+0.532\enspace 838\mathrm{ln}\enspace n$ , i.e. the error increase roughly as the square root of the number of gates.
Download figure:
Standard image High-resolution image

These issues will be addressed now. A partial solution consists in using the gauge-degree of freedom in the parameterization of the grabit-states to enforce after each interference-generating gate the Born-1 rule, a process that will be called 'refreshing'.

4.1. General strategy

We are looking for a process that maps P to P ' (implying a map ψ ↦ ψ') such that the following two conditions (called 'refreshment conditions') are satisfied:

(a)

$\begin{equation}{\psi }^{\prime }=d\cdot \psi ,\end{equation} \tag{ 38 }$
i.e. up to a global factor d ≠ 0 the multi-grabit state remains unchanged. This is crucial for being able to continue propagating the correct multi-grabit state by the quantum algorithm;
(b)
And
$\begin{equation}{\tilde{p}}^{\prime }=c\vert \psi \vert =(c/\vert \mathrm{d}\vert )\vert {\psi }^{\prime }\vert ,\end{equation} \tag{ 39 }$
with c > 0, which ensures that a peaked ψ at the end of the quantum algorithm implies a peaked physical probability distribution ${\tilde{p}}^{\prime }$ .

As mentioned before, from a quantum mechanics perspective, one would prefer ${\tilde{p}}^{\prime }=\vert {\psi }^{\prime }{\vert }^{2}$ , but enforcing the Born-1 rule turns out to be simpler and is sufficient for having an emulation of the quantum algorithm that succeeds with high probability if the version with the Born-2 rule does.

In terms of the number of free parameters, we have ${4}^{{N}_{\text{bit}}}-1$ free byte4 probabilities P_I. Equations (38) and (39) are ${2}^{{N}_{\text{bit}}}$ equations each for the amplitudes, but also introduce two new real variables c, d. So all together there are ${4}^{{N}_{\text{bit}}}-{2}^{{N}_{\text{bit}}+1}+1$ free variables. Summing (39) over all i, we find $c=1/\bar{\psi }$ , with $\bar{\psi }\equiv {\sum }_{i}\vert {\psi }_{i}\vert$ . For N_bit = 1 there is only one free variable.

4.2. Refreshing a single grabit

In the case of a single grabit, equations (38) and (39) can be solved immediately. First consider the case ψ_i ⩾ 0 (i = 0, 1). Adding and subtracting equation (38) for i = 0, 1 then leads to

$\begin{equation}{P}_{2i}^{\prime }=\frac{1}{2}\left(\frac{1}{\bar{\psi }}+d\right){\psi }_{i},\end{equation} \tag{ 40 }$

$\begin{equation}{P}_{2i+1}^{\prime }=\frac{1}{2}\left(\frac{1}{\bar{\psi }}-d\right){\psi }_{i},\end{equation} \tag{ 41 }$

where d is the remaining free parameter. Positivity of the probabilities implies a range for d,

$\begin{equation}\mathrm{max}\left(-\frac{1}{\bar{\psi }},-\left(\frac{2}{{\psi }_{i}}-\frac{1}{\bar{\psi }}\right)\right)\leqslant d\leqslant \mathrm{min}\left(\frac{1}{\bar{\psi }},\left(\frac{2}{{\psi }_{i}}-\frac{1}{\bar{\psi }}\right)\right).\end{equation} \tag{ 42 }$

Hence, ${d}_{\mathrm{max}}=1/\bar{\psi }=1/({\psi }_{0}+{\psi }_{1})$ . In the case of a ψ_i < 0, it is easily checked that we can just replace ψ_i → |ψ_i| everywhere in the calculation of d_max, which leads to the general result ${d}_{\mathrm{max}}=1/\bar{\psi }$ , and

$\begin{equation}{\tilde{p}}_{i}^{\prime }=\frac{\vert {\psi }_{i}\vert }{\vert {\psi }_{0}\vert +\vert {\psi }_{1}\vert }=\frac{\vert {P}_{2i}-{P}_{2i+1}\vert }{\vert {P}_{0}-{P}_{1}\vert +\vert {P}_{2}-{P}_{3}\vert }.\end{equation} \tag{ 43 }$

Maximizing d under the constraint of positivity of the byte4 probabilities corresponds exactly to the gauge choice of maximum contrast. We see from (43) that for any ψ_i ≠ 0 this corresponds to having always either P_2i = 0 or P_2i+1 = 0. This has the important physical interpretation that refreshing means to make the state 'interference-free', i.e. the amplitude corresponds not to the difference of two byte4 probabilities anymore, but just to one of them, the other being zero. This has the desired consequence that the physical probability is equal to the probability obtained from the Born-1 rule.

For the numerical implementation, it suffices to refresh after each gate $\in \mathcal{I}$ , i.e. each gate that can potentially generate interference. We analyzed two different versions of a refreshment gate. Both of them need the estimation of the grabit state before refreshment and base the subsequent transformations on it, which renders refreshments non-linear in P . The non-linearity in P is also implied by the fact that refreshments map many different b4v probability distributions to the same interference-free b4v probability distribution. It is a step that reduces entropy.

A first implementation works in three steps:

(a)
Estimate the state from the histogram R of realized b4v's, i.e. evaluate ${\hat{\psi }}_{i}$ from (16). As the individual R_I converge for N_ball → ∞ to the P_I that determine their distribution, i.e. ${R}_{I}\enspace \stackrel{{N}_{\text{ball}}\to \infty }{\to }\enspace {P}_{I}$ , also $\hat{\psi }\enspace \stackrel{{N}_{\text{ball}}\to \infty }{\to }\enspace \psi$ .
(b)
For each logical value i, determine $\mathrm{sign}({\hat{\psi }}_{i})$ . A b4v I = 2i + 1 is called majority value (and I = 2i minority value) if R_2i+1 > R_2i, corresponding to $\mathrm{sign}({\hat{\psi }}_{i})< 0$ . For $\mathrm{sign}({\hat{\psi }}_{i})\geqslant 0$ , I = 2i is the majority value and I = 2i + 1 the minority value. Move all realizations corresponding to minority values to the corresponding majority values, a step called 'minority relocation' (MIRE). MIRE changes the probability distribution over logical values ${\tilde{p}}_{i}{\mapsto}{{\tilde{p}}^{M}}_{i}$ , and the state ψ to ψ^M but enforces ${\tilde{p}}_{i}^{M}=\vert {{\psi }^{M}}_{i}\vert$ .
(c)
Move realizations between the remaining byte4 values I = 2i + 1 (or I = 2i) until the original estimated ratios between probability amplitudes are restored as closely as possible, i.e. ${\hat{\psi }}_{i}^{M}{\mapsto}{\psi }_{i}^{\prime }\simeq d\enspace {\hat{\psi }}_{i}$ (a step called ROAR = 'restoration of original amplitude ratios').

Note that a perfect restoration of the original amplitude ratios may not always be possible, as the histograms lead to integer-ratios for the estimates of ψ_i. E.g. for N_ball = 11 and a histogram R = (4,0,4,3)^t/11, we estimate |ψ⟩ ∝ (4, 1). After MIRE, R ^M = (4,0,7,0)^t/11. With ROAR, the closest we can get to the estimated |ψ⟩ is R ^M = (9,0,2,0)^t/11, giving |ψ⟩ ∝ (4.5, 1). However, this type of error is of order 1/N_ball and hence becomes negligible for N_ball → ∞. The stochastic map that implements this first type of refreshments will be denoted by Rf₁.

In figure 3 we see at the example of ${H}^{{n}_{H}}$ on a single qubit that when the refreshment gate is implemented after each Hadamard gate, the error increases only $\propto \sqrt{{n}_{H}}$ compared to the exponential increase without the refreshment gates. As the Hadamard gates keep the state real at all times, we can compare with Ψ directly and do not need to consider the realification Φ. The decrease of the overall error compared to exact unitary propagation is $\propto 1/\sqrt{{N}_{\text{ball}}}$ , for both using the refreshment gates or not. Both behaviors can be understood analytically, see appendix A.

4.3. Multi-grabit refreshments

Since entangled many-grabit states are based on byte4-value correlations between different grabits it is clear that refreshing grabits independently is not sufficient. In fact, if correlations exist it is counterproductive, as the example of the quantum circuit in figure 4 shows. The byte-4 probability distributions |P_n⟩⟩ after step n (with the first step the first Hadamard gate) are |P₁⟩⟩ = (1/2)(|0⟩⟩ + |2⟩⟩)|0⟩⟩, |P₂⟩⟩ = (1/2)(|00⟩⟩ + |22⟩⟩), |P₃⟩⟩ = (1/4)(|00⟩⟩ + |20⟩⟩ + |02⟩⟩ + |32⟩⟩). To these correspond physical probability distributions $\vert {\tilde{p}}_{1}\rangle \rangle =(1/2)\vert 00\rangle \rangle +\vert 10\rangle \rangle$ , $\vert {\tilde{p}}_{2}\rangle \rangle =(1/2)\vert 00\rangle \rangle +\vert 11\rangle \rangle$ , $\vert {\tilde{p}}_{3}\rangle \rangle =(1/4)\vert 00\rangle \rangle +\vert 10\rangle \rangle +\vert 01\rangle \rangle +\vert 11\rangle \rangle$ , and grabit states |ψ₁⟩ = (1/2)(|00⟩ + |10⟩), |ψ₂⟩ = (1/2)(|00⟩ + |11⟩), |ψ₃⟩ = (1/4)(|00⟩ + |01⟩ + |10⟩ − |11⟩). We see that at each stage the Born-1 rule is satisfied, i.e. $\langle \langle i\vert {\tilde{p}}_{n}\rangle \rangle =\vert \langle i\vert {\psi }_{n}\rangle \vert$ . Compared to the simple sequence H², the CNOT between the Hadamard gates prevents interference, so the final state is interference-free and the Born-1 rule automatically fulfilled. Hence, there is no need for a refreshment gate. The reduced byte-4 probability distribution of the first grabit alone, however, looks as if the CNOT was not there, i.e. gives $\vert {P}_{3}^{(1)}\rangle \rangle =(1/2)\vert 0\rangle \rangle +(1/4)\vert 2\rangle \rangle +(1/4)\vert 3\rangle \rangle$ . If a refreshment of the first grabit is performed, it ends up in $\vert {P}_{4}^{(1)}\rangle \rangle =\vert 0\rangle \rangle$ . But since the correlations are kept during the refreshment process, this means that |20⟩⟩ ↦ |00⟩⟩ and |32⟩⟩ ↦ |02⟩⟩. So the physical probability distribution of the two grabits becomes $\vert {\tilde{p}}_{4}\rangle \rangle =(1/2)(\vert 00\rangle \rangle +\vert 01\rangle \rangle )$ , and the two-grabit wavefunction |ψ₄⟩ = (1/2)(|00⟩ + |01⟩). So while the Born-1 rule is satisfied, we end up with the wrong state and the wrong physical probability distribution.

**Figure 4.** Exemplary quantum circuit to demonstrate the need for multi-grabit refreshments.
Download figure:
Standard image High-resolution image

The above example clearly shows that when correlations are present, single-grabit refreshments are inappropriate. However, the algorithm presented for single-grabit refreshments can be easily generalized to multi-grabit refreshments, by determining majority and minority contributions to each byte logical value of all grabits combined, rather than for each grabit separately. Since all grabit states in the example given are interference-free, the refreshment algorithm will terminate immediately without doing anything in that case. In general, many different byte-4 values will contribute to the majority- and minority-components of a multi-grabit byte logical value. Indeed, equation (13) can be written as

$\begin{equation}{\hat{\psi }}_{\boldsymbol{i}}=\sum\limits _{\boldsymbol{\sigma },{\pi }_{\sigma }=0}\enspace {R}_{2\boldsymbol{i}+\boldsymbol{\sigma }}-\sum\limits _{\boldsymbol{\sigma },{\pi }_{\sigma }=1}\enspace {R}_{2\boldsymbol{i}+\boldsymbol{\sigma }},\end{equation} \tag{ 44 }$

where ${\pi }_{\boldsymbol{\sigma }}={\sum }_{i=1}^{{N}_{\text{bit}}}\enspace {\sigma }_{i}\enspace \text{mod}\;2$ is the parity of a given gradient-value σ . The majority contributions to the estimate ${\hat{\psi }}_{\boldsymbol{i}}$ are the ones with a parity that determines the overall sign of ${\hat{\psi }}_{\boldsymbol{i}}$ . The fact that different majority and minority components can contribute to the same ${\hat{\psi }}_{\boldsymbol{i}}$ is owed to the possibility of shifting signs between different factors in a multi-grabit state. The generalization of Rf₁ in section 4.2 is the following:

(a)
Estimate the multi-grabit state from (16), yielding ${\hat{\psi }}_{\boldsymbol{i}}$ .
(b)
For each logical value i , determine $\mathrm{sign}({\hat{\psi }}_{\boldsymbol{i}})$ , which in turn determines what are the majority and minority contributions to ${\hat{\psi }}_{\boldsymbol{i}}$ . In MIRE move all realizations corresponding to minority values to gradient-value σ = 0 if the majority parity is even, or σ = (0, ..., 0, 1) if the majority parity is odd, i.e. one chooses a single canonical majority byte4 value for each I by always locating the sign in the same grabit (a step we call 'sign concentration'). This changes the probability distribution over logical values, ${\tilde{p}}_{i}{\mapsto}{{\tilde{p}}^{\text{M}}}_{i}$ and ${\psi }_{i}{\mapsto}{\psi }_{i}^{\text{M}}$ , modifying in general the amplitude ratios, but keeps the signs of the ${\hat{\psi }}_{\boldsymbol{i}}$ and enforces ${\hat{\tilde{p}}}_{i}^{\text{M}}=\vert {\hat{{\psi }_{i}}}^{\text{M}}\vert$ .
(c)
Move realizations between the remaining byte4 values until the original ratios between estimated probability amplitudes are restored as closely as possible, i.e. ${\hat{\psi }}_{i}^{\text{M}}{\mapsto}{\hat{\psi }}_{i}^{\prime }\simeq d\enspace {\hat{\psi }}_{i}$ (ROAR).

The whole algorithm can be implemented rather efficiently with an effort ∝N_ball, by working on the realized support of the histogram R only, and keeping a table of realizations after MIRE corresponding to majority and minority byte-4 values for each i , as well as a deviation table δ_i of the new estimate $\hat{\psi }$ from the stored one before MIRE (and both normalized to one-norm = 1). In ROAR one then flips successively realizations of i with δ_i > 0 to ones of j with δ_j < 0, updating both the lists of realizations and the deviation table (which only needs a counting up and down of the histogram values, no re-evaluation based on all realizations), till for the involved i or j the sign of the deviation-table entry changes, or no more realizations to be flipped are available for the given i or j , in which case one moves on to the next i or j . Alternatively, one can implement ROAR with a Monte-Carlo algorithm based on random maps I ↦ J with a target function ${\Vert}{\hat{\psi }}^{M}-\hat{\psi }{{\Vert}}_{2}$ to be minimized.

An alternative refreshment algorithm is this one (Rf₂, 'removal of socket'):

(a)
Remove all pairs of b4vs that differ only by their gradient value. This removes the 'socket' of realizations that do not contribute to the ${\hat{\psi }}_{i}$ and reduces N_ball to a new value ${{N}_{\text{ball}}}^{\mathrm{R}\mathrm{S}}$ .
(b)
Replicate all remaining b4vs an integer number of times, such that the new total number ${{N}_{\text{ball}}}^{\prime }$ of realizations is mapped to ${{N}_{\text{ball}}}^{\prime }=k{{N}_{\text{ball}}}^{\mathrm{R}\mathrm{S}}\geqslant {N}_{\text{ball}}$ with $k\in \mathbb{N}$ such that ${{N}_{\text{ball}}}^{\prime }$ is the smallest such integer $\geqslant {N}_{\text{ball}}$ .

While Rf₂ appears to be simpler than Rf₁, it leads to a diffusive process of ${{N}_{\text{ball}}}^{\prime }$ , which requires the allocation of sufficient memory such that values ${{N}_{\text{ball}}}^{\prime } > {N}_{\text{ball}}$ can be accommodated, or the reinitialization of the memory array containing all b4vs after the refreshment. However, ${{N}_{\text{ball}}}^{\prime }\leqslant 2{N}_{\text{ball}}$ at all times, due to the fact that the second step is executed only if ${{N}_{\text{ball}}}^{\mathrm{R}\mathrm{S}}$ drops below the initial N_ball. At first sight, the 2nd implementation also seems to avoid estimation of ψ, but an equivalent implementation is given by simply recreating a new set of $k{{N}_{\text{ball}}}^{\mathrm{R}\mathrm{S}}$ realizations based on $\hat{\psi }$ , by replicating each set of existing i k times, where due to ${\hat{\psi }}_{i}\in \mathbb{Q}$ we can choose the smallest k such that $k{\hat{\psi }}_{i}\in \mathbb{N}\enspace \forall \enspace i$ , and ${{N}_{\text{ball}}}^{\prime }\geqslant {N}_{\text{ball}}$ . A practical modification of Rf₂ (called Rf₃) consists in allocating a fixed memory space for 2N_ball realizations. After removing the socket and calculating the estimate $\hat{\psi }$ , we create ${{N}_{\text{ball}}}^{\prime }$ realizations by assigning $\lfloor 2{\hat{\psi }}_{j}{N}_{\text{ball}}\rfloor$ realizations to j and then distribute all remaining samples ${N}_{\text{ball,}\;\text{rest}}=2{N}_{\text{ball}}-{{N}_{\text{ball}}}^{\prime }$ onto those j that have the highest fractional value $2{\hat{\psi }}_{j}{N}_{\text{ball}}-\lfloor 2{\hat{\psi }}_{j}{N}_{\text{ball}}\rfloor$ remaining, thereby minimizing the deviation ${\Vert}\hat{\psi }-{\hat{\psi }}^{\prime }{{\Vert}}_{1}$ between the state before refreshment $\hat{\psi }$ and the state after refreshment ${\hat{\psi }}^{\prime }$ while using all the available memory space to represent the grabit state.

4.4. Efficiency of refreshments

The following theorem shows that the grabit approach can perfectly emulate a quantum algorithm via a stochastic classical algorithm in the sense that for N_ball → ∞ and refreshments implemented, the final physical probability distribution is the same as the quantum mechanical one, up to the different Born rule:

Theorem 4.1. Let U be a unitary quantum algorithm of finite depth N_g, Φ⁽⁰⁾, ${{\Phi}}^{({N}_{g})}$ the realifications of the states Ψ⁽⁰⁾, and ${{\Psi}}^{({N}_{g})}$ as in theorem 3.3, and $S={S}^{({N}_{g})}{S}^{({N}_{g}-1)}\dots {S}^{(1)}$ , the corresponding stochastic map on byte-4 values that emulates U. Let ${S}^{(R,{N}_{g})}={R}^{({N}_{g})}{S}^{({N}_{g})}{R}^{({N}_{g}-1)}{S}^{({N}_{g}-1)}\dots {R}^{(1)}{S}^{(1)}$ be a stochastic map on byte-4 values with each stochastic matrix ${S}^{(j)}\in \mathcal{I}$ followed by an N_bit-grabit refreshment gate, i.e. R^(j) = Rf₁ for all j with ${S}^{(j)}\in \mathcal{I}$ (alternatively: R^(j) = Rf₂ for all j with ${S}^{(j)}\in \mathcal{I}$ ), and R^(j) = Id else, where Id is the identity operation on all byte-4 values. Let ${P}_{I}^{(R,{N}_{g})}$ be the byte-4 probabilities produced by the stochastic map ${S}^{(R,{N}_{g})}$ acting on an initial P⁽⁰⁾ with Φ⁽⁰⁾ = a⁽⁰⁾∂_N P⁽⁰⁾, ${P}^{(R,{N}_{g})}={S}^{(R,{N}_{g})}{P}^{(0)}$ . Then the physical probabilities in the final state ${\tilde{p}}_{\boldsymbol{i}}^{(R,{N}_{g})}={\sum }_{\boldsymbol{\sigma }\in {\left\{0,1\right\}}^{{N}_{\text{bit}}}}\enspace {P}_{2\boldsymbol{i}+\boldsymbol{\sigma }}^{(R,{N}_{g})}$ , satisfy

$\begin{equation}{\tilde{p}}_{i}^{(R,{N}_{g})}=\underset{{N}_{\text{ball}}\to \infty }{\mathrm{lim}}{\hat{\tilde{p}}}_{i}^{(R,{N}_{g},{N}_{\text{ball}})}=\vert {{\Phi}}_{i}^{({N}_{g})}\vert /{\Vert}\enspace {{\Phi}}^{({N}_{g})}\enspace {{\Vert}}_{1},\end{equation} \tag{ 45 }$

where the estimator ${\hat{\tilde{p}}}_{i}^{(R,{N}_{g},{N}_{\text{ball}})}$ is given by (18) with ${R}_{2\boldsymbol{i}+\boldsymbol{\sigma }}={R}_{2\boldsymbol{i}+\boldsymbol{\sigma }}^{(R,{N}_{g},{N}_{\text{ball}})}$ , the relative frequencies of the byte-4 values for N_ball realizations.

In other words, for N_ball → ∞, the final physical probability distribution on byte-logical values will be given by the Born-1 rule based on the true quantum-mechanical (realified) state Φ, up to the different normalization.

Proof. Let

$\begin{equation}{\hat{\tilde{p}}}_{i}^{(R,{N}_{g},{N}_{\text{ball}})}\equiv \sum\limits _{\boldsymbol{\sigma }}\enspace {R}_{2\boldsymbol{i}+\boldsymbol{\sigma }}^{(R,{N}_{g},{N}_{\text{ball}})}\end{equation} \tag{ 46 }$

be the estimate of the physical probabilities after N_g gates based on the relative frequencies for the b4vs with N_ball realizations and including the refresh operations. By definition of the refresh operations, we have

$\begin{equation}{\hat{\tilde{p}}}_{i}^{(R,{N}_{g},{{N}_{\text{ball}}}^{\prime })}=\vert {\hat{\psi }}_{i}^{(R,{N}_{g},{N}_{\text{ball}})}\vert ,\end{equation} \tag{ 47 }$

where in general ${{N}_{\text{ball}}}^{\prime }\geqslant {N}_{\text{ball}}$ . Now take the limit ${\mathrm{lim}}_{{N}_{\text{ball}}\to \infty }$ on both sides of equation (47). On the rhs we obtain

$\begin{equation}\underset{{N}_{\text{ball}}\to \infty }{\mathrm{lim}}\vert {\hat{\psi }}_{\boldsymbol{i}}^{(R,{N}_{g},{N}_{\text{ball}})}\vert =\vert \underset{{N}_{\text{ball}}\to \infty }{\mathrm{lim}}{\hat{\psi }}_{\boldsymbol{i}}^{(R,{N}_{g},{N}_{\text{ball}})}\vert \end{equation} \tag{ 48 }$

$\begin{equation}=\vert \underset{{N}_{\text{ball}}\to \infty }{\mathrm{lim}}\sum\limits _{\boldsymbol{\sigma }}{(-1)}^{\sum\limits _{j}\;{\sigma }_{j}}{R}_{2\boldsymbol{i}+\boldsymbol{\sigma }}^{(R,{N}_{g},{N}_{\text{ball}})}\vert \end{equation} \tag{ 49 }$

$\begin{equation}=\vert \sum\limits _{\boldsymbol{\sigma }}{(-1)}^{\sum\limits _{j}\;{\sigma }_{j}}{P}_{2\boldsymbol{i}+\boldsymbol{\sigma }}^{(R,{N}_{g})}\vert \end{equation} \tag{ 50 }$

$\begin{equation}=\vert {\psi }_{\boldsymbol{i}}^{(R,{N}_{g})}\vert \end{equation} \tag{ 51 }$

$\begin{equation}={c}^{({N}_{g})}\vert {\psi }_{\boldsymbol{i}}^{({N}_{g})}\vert \end{equation} \tag{ 52 }$

$\begin{equation}={d}^{({N}_{g})}\vert {\phi }_{\boldsymbol{i}}^{({N}_{g})}\vert .\end{equation} \tag{ 53 }$

The crucial step is the one from (49) and (50), which follows from the consistency of the maximum likelihood estimator R_I of the multinomial distribution P_I, i.e. ${\mathrm{lim}}_{{N}_{\text{ball}}\to \infty }{R}_{I}={P}_{I}$ [16], and the finite sum over σ that commutes with ${\mathrm{lim}}_{{N}_{\text{ball}}\to \infty }$ . Hence, the limit in the rhs of (48) exists and is equal to the lhs of (48). Equations (50) and (51) uses the definition (13). The step from (51) and (52) is based on the fact that for N_ball → ∞, ${\hat{\psi }}_{i}\to {\psi }_{i}\enspace \forall \enspace i$ , such that the refreshment based on reconstructing the state from $\hat{\psi }$ leaves the amplitude ratios of state ψ unaltered. The last step is the statement of theorem 3.3. On the lhs in (47) we obtain

$\begin{equation}\underset{{N}_{\text{ball}}\to \infty }{\mathrm{lim}}\vert {\hat{\tilde{p}}}_{i}^{(R,{N}_{g},{{N}_{\text{ball}}}^{\prime })}\vert =\underset{{{N}_{\text{ball}}}^{\prime }\to \infty }{\mathrm{lim}}\vert {\hat{\tilde{p}}}_{i}^{(R,{N}_{g},{{N}_{\text{ball}}}^{\prime })}\vert \end{equation} \tag{ 54 }$

$\begin{equation}=\vert \underset{{{N}_{\text{ball}}}^{\prime }\to \infty }{\mathrm{lim}}{\hat{\tilde{p}}}_{i}^{(R,{N}_{g},{{N}_{\text{ball}}}^{\prime })}\vert \end{equation} \tag{ 55 }$

$\begin{equation}=\vert \underset{{{N}_{\text{ball}}}^{\prime }\to \infty }{\mathrm{lim}}\sum\limits _{\boldsymbol{\sigma }}\;{R}_{2\boldsymbol{i}+\boldsymbol{\sigma }}^{(R,{N}_{g},{{N}_{\text{ball}}}^{\prime })}\vert \end{equation} \tag{ 56 }$

$\begin{equation}=\vert \underset{{{N}_{\text{ball}}}^{\prime }\to \infty }{\mathrm{lim}}\sum\limits _{\boldsymbol{\sigma }}\;{P}_{2\boldsymbol{i}+\boldsymbol{\sigma }}^{(R,{N}_{g},{{N}_{\text{ball}}}^{\prime })}\vert \end{equation} \tag{ 57 }$

$\begin{equation}={\tilde{p}}_{\boldsymbol{i}}^{(R,{N}_{g})}.\end{equation} \tag{ 58 }$

Comparison of equations (53), (54) and (58) leads to equation (45) when taking into account that from the normalization ${\Vert}{\hat{\tilde{p}}}^{(R,{N}_{g},{{N}_{\text{ball}}}^{\prime })}{{\Vert}}_{1}=1$ the constant ${d}^{({N}_{g})}$ is found as $1/{\Vert}\enspace {{\Phi}}^{({N}_{g})}\enspace {{\Vert}}_{1}$ . This completes the proof. □

While theorem 4.1 shows that the grabit approach combined with refreshments leads to a stochastic emulation of any finite-depth and finite-size quantum algorithm in the sense that at the end of the stochastic process the physical probabilities for any outcome are given by the absolute values of the amplitudes of the true quantum mechanical state (i.e. the Born-1 rule, implying that we find with high probability an outcome that also has a high probability for the corresponding quantum algorithm), the limit N_ball → ∞ is of course impractical. It is clear that for completely delocalized ψ, N_ball must be at least of the order ${2}^{{N}_{\text{bit}}}$ to, on average, even have pairs that annihilate in the refreshment algorithm, or, equivalently, to estimate ψ. The refreshments hence cannot, in the limit of large N_bit, mend the problem of the exponentially decreasing prefactor of ψ if the contributions to at least one blv are spread over an exponential number of b4vs, without exponentially increasing N_ball. Unfortunately, it appears that the majority of useful quantum algorithms use an exponentially large amount of interference [4, 5, 12]: they start with creating a superposition of all computational basis states and only close the such opened interferometer at the very end, e.g. with an inverse QFT, where amplitudes on all other blvs will contribute with different phases to any ψ_i.

We remark that N_g does not explicitly enter in the proof of theorem 4.1. In fact, one could do the refreshments at the very end of the algorithm, based simply on theorem 3.3, dealing with the prefactor that decreases exponentially with N_g by exponentially increasing N_ball. This has the advantage of allowing massive parallelization of the emulation until the very end, with one thread per realization, minimal memory of just 2N_bit bits, and basic Boolean operations on those (plus a random number generator that generates a random bit for each gate in $\mathcal{I}$ ). At the example of Hⁿ on a single qubit, we have seen, however, that in practice it is worthwhile to refresh after each gate $\in \mathcal{I}$ , even for substantial oversampling, in order to prevent even faster accumulation of errors in ψ. For parallelization this requires re-synchronization of all threads for each refreshment gate. Rather than calculating analytical bounds on N_ball required for a certain precision of $\tilde{p}$ , we study in the next section the required scaling of N_ball numerically for the paradigmatic quantum Fourier transform and the quantum approximate optimization algorithm (QAOA).

5. Quantum algorithms

We now show how the above emulation of a many-body quantum state and elementary quantum gates leads naturally to the stochastic emulation of quantum algorithms on a classical computer.

5.1. Quantum Fourier transform

The quantum Fourier transform on N_bit qubits realizes a unitary transformation with matrix elements U_ij given in the computational basis as

$\begin{equation}{U}_{jk}^{\text{QFT}}=\frac{1}{{2}^{{N}_{\text{bit}}/2}}{e}^{2\pi j\enspace k/{2}^{{N}_{\text{bit}}}}.\end{equation} \tag{ 59 }$

It is a fundamental subroutine for some of the most important quantum algorithms, such as order finding, Shor's factoring algorithm, and the hidden-subgroup problem. It can be efficiently realized with a series of Hadamard gates followed by a sequence of phase-gates R_K with

$\begin{equation}{R}_{k}=\left(\begin{matrix}\hfill 1\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill {e}^{2\pi i/{2}^{k}}\hfill \end{matrix}\right)\end{equation} \tag{ 60 }$

on each qubit, see [15] section 5.1, followed by a sequence of swap gates that exchange qubit labels 1 and N_bit, 2 and N_bit − 1, ..., N_bit/2 and N_bit/2 + 1 in the case N_bit is even (or (N_bit − 1)/2 and (N_bit − 1)/2 + 1 in the case N_bit is odd).

The most basic use of the QFT is to perform a discrete Fourier transform of amplitudes of computational basis states, i.e. a state of the form $\vert \psi \rangle ={\sum }_{j=0}^{{2}^{{N}_{\text{bit}}}-1}\enspace {x}_{j}\vert j\rangle$ is mapped to $\vert \tilde{\psi }\rangle ={\sum }_{j,k=0}^{{2}^{{N}_{\text{bit}}}-1}\enspace {x}_{j}{U}_{jk}\vert k\rangle$ . As such, the QFT is not very useful yet, as the state collapses to a random computational state when read out, rather than the register containing a desired Fourier coefficient y_k = x_j U_jk, but it becomes useful if the state is periodic. In that case, the Fourier coefficients will be highly peaked on some values k, and the state will collapse to one of those with high probability when measured, which allows one to recover the periodicity of the state.

Since R_k is a complex phase gate, we need to spend a ReIm-grabit to emulate the QFT, in which case the R_k gate becomes a controlled-rotation gate of the form (24) with ϕ = 2π/2^k. Figure 5 shows that the QFT can indeed be emulated stochastically using grabits. Panel (a) shows the output state for a periodic input state of the form |ψ⟩ = (|0⟩ + |p⟩ + |2p⟩ + ⋯) with p = 3 for N_bit = 5 (including the ReIm grabit), N_ball = 10⁵, compared to the exact state obtained from propagating the input state with the unitary (59). One expects a peak at 'wavenumber' 2 × 16/3, (where the 2 is from mapping to a real wavefunction by using the ReIm grabit) and integer multiples, but since this is not an integer, the peak gets distributed over neighboring wavenumbers. Panel (b) is for p = 4 for N_bit = 6, N_ball = 10⁵. Here one clearly sees the expected peaks at wavenumber 2 × 32/4 and integer multiples. An important question is how the N_ball needed to find the correct period of a periodic function via the QFT scales with n_bit. This is analysed in figure 6, where the inverse QFT is used to calculate the period of a state $\vert \tilde{k}\rangle =\frac{1}{\sqrt{{2}^{{n}_{\text{bit}}}}}{\bigotimes}_{j=1}^{{n}_{\text{bit}}}(\vert 0\rangle +\mathrm{exp}(2\pi ik/{2}^{j})\vert 1\rangle )$ prepared in the Fourier basis, so that ${\text{QFT}}^{-1}\vert \tilde{k}\rangle =\vert k\rangle$ , where $k > {2}^{{n}_{\text{bit}}-1}$ was chosen [29]. If ${\text{argmax}}_{l}\enspace {\hat{\tilde{p}}}_{l}=k$ , after running the algorithm that run is considered a success. The figure shows that for a fixed success ratio (i.e. ratio of number of successful runs to the total number of runs), N_ball scales exponentially with n_bit. A fit to N_ball = a exp(b n_bit) with a, b as fit parameters generates the scaling law ${N}_{\text{ball}}=3.46\cdot \mathrm{exp}(0.7{n}_{\text{bit}})\simeq 3.46\cdot {2}^{{n}_{\text{bit}}}$ .

**Figure 5.** Comparison of the final state estimate $\hat{\psi }$ of QFT from a stochastic emulation of the algorithm (blue dots) with exact propagation Φ (orange lines; values only defined on integer support; computational states counted here from 1 instead from 0). $\hat{\psi }$ is renormalized to ${\Vert}\hat{\psi }{{\Vert}}_{2}=1$ for the comparison. Left: N_bit = 5, N_ball = 10⁵, initial periodic state with period 3 (over the first 4 grabits). Right: N_bit = 6 including the ReIm-grabit, N_ball = 10⁵, initial periodic state with period 4 (over the first 5 grabits).
Download figure:
Standard image High-resolution image

**Figure 5.** Comparison of the final state estimate $\hat{\psi }$ of QFT from a stochastic emulation of the algorithm (blue dots) with exact propagation Φ (orange lines; values only defined on integer support; computational states counted here from 1 instead from 0). $\hat{\psi }$ is renormalized to ${\Vert}\hat{\psi }{{\Vert}}_{2}=1$ for the comparison. Left: N_bit = 5, N_ball = 10⁵, initial periodic state with period 3 (over the first 4 grabits). Right: N_bit = 6 including the ReIm-grabit, N_ball = 10⁵, initial periodic state with period 4 (over the first 5 grabits).
Download figure:
Standard image High-resolution image

**Figure 6.** Minimum N_ball needed in order to successfully run an inverse QFT in at least 10% of trials as function of the size of the circuit in terms of n_bit. The number of gates in each circuit scales like $\mathcal{O}({{n}_{\text{bit}}}^{2})$ . The blue line with crosses shows a fit of the form N_ball = a exp(*b n*_bit) with refreshment Rf₃ after each gate while the red one with points shows a similar fit without refreshment. The fits result in an exponential scaling of N_ball ∝ exp(0.7n_bit) for the case with refreshment and N_ball ∝ exp(2.9n_bit) without refreshment, i.e. refreshments improve the scaling of the minimum N_ball significantly, but still require exponentially large N_ball. The inset shows the same data in a lin-log plot.
Download figure:
Standard image High-resolution image

**Figure 6.** Minimum N_ball needed in order to successfully run an inverse QFT in at least 10% of trials as function of the size of the circuit in terms of n_bit. The number of gates in each circuit scales like $\mathcal{O}({{n}_{\text{bit}}}^{2})$ . The blue line with crosses shows a fit of the form N_ball = a exp(*b n*_bit) with refreshment Rf₃ after each gate while the red one with points shows a similar fit without refreshment. The fits result in an exponential scaling of N_ball ∝ exp(0.7n_bit) for the case with refreshment and N_ball ∝ exp(2.9n_bit) without refreshment, i.e. refreshments improve the scaling of the minimum N_ball significantly, but still require exponentially large N_ball. The inset shows the same data in a lin-log plot.
Download figure:
Standard image High-resolution image

The culprit for this scaling is the estimation of the quantum state needed in the refreshment. So while refreshments help to fight the exponential decrease of the amplitude of the state with the number of gates and quickly become necessary when the number of interference-creating gates increases with the depth of the algorithm, they still require in general an exponentially large number of realizations. For algorithms that are p-blocked one might envisage refreshments only for the correlated blocks by keeping track of the blocks of correlated qubits as proposed in [2]. In that case the grabit emulation should be efficient, but not more so than the method in [2].

5.2. Quantum approximate optimization algorithm

The quantum approximate optimization algorithm [17] (QAOA) offers approximate solutions for combinatorial optimization problems. The cost function of the optimization problem can be translated to the expectation value of a cost Hamiltonian H_C. This Hamiltonian combined with a mixer Hamiltonian H_B creates the variational ansatz of the QAOA,

$\begin{equation*}\vert {\psi }_{p}(\boldsymbol{\gamma },\boldsymbol{\beta })\rangle ={\text{e}}^{-\text{i}{\beta }_{p}{H}_{\mathrm{B}}}{\text{e}}^{-\text{i}{\gamma }_{p}{H}_{\mathrm{C}}}\dots {\text{e}}^{-\text{i}{\beta }_{1}{H}_{\mathrm{B}}}{\text{e}}^{-\text{i}{\gamma }_{1}{H}_{\mathrm{C}}}\vert {+\rangle }^{\otimes {n}_{\text{bit}}},\end{equation*}$

where p is an input parameter that sets the depth of the ansatz. A classical optimization over ( γ , β ) aims at minimizing the expectation value ⟨ψ_p( γ , β )|H_C|ψ_p( γ , β )⟩, the resulting 'cost function' of the optimization problem. For p = 1 and some regular graphs of bounded degree, the cost function can be calculated analytically [17, 18]. In all other cases, one needs to estimate ⟨ψ_p( γ , β )|H_C|ψ_p( γ , β )⟩ by sufficiently large sampling, i.e. a large number of 'shots' of the quantum algorithm is required for many settings in order to perform the classical optimization step. Additional samplings are required in the end to estimate the ground state energy or to find the ground state with a given fixed success probability. It is then interesting to compare the total number of shots needed with N_ball in the grabit approach, as N_ball has a similar meaning as the number of samples drawn. This comparison is shown in figure 7 for QAOA designed to solve a portfolio optimization problem (see appendix C), where we plot the probability P_gs to find the system in the true ground state of H_C after the optimization as function of the number of shots used, and the same quantity for the grabit approach as function of N_ball. The number of shots used was calculated by simulating the quantum algorithm in form of a quantum circuit using the density matrix simulation software QASM [19], see figure 8 for an example circuit with p = 1. Each data point is the mean of 100 optimization runs. The spread in the QASM case is due to different chosen values for the number of shots per step and the number of steps in total. We further showcase the dependency of the estimation error of the final state compared to the exact quantum state when simulating an exemplary QAOA circuit for different values of N_ball as function of n_gate. The latter figure shows that the estimate of the grabit state indeed approaches the true state for high enough N_ball. However, we also see in the left figure that for N_ball equal to the number of shots in the quantum algorithm, the latter still outperforms the grabit approach slightly. For larger p the circuit depth increases and even larger values N_ball are needed for high accuracy, whereas in the quantum case the number of shots increases only due to additional optimization steps. It also becomes clear that both quantum circuit and grabit approach are by far outperformed by the simplest classical approach of going through all possible combinations, of which there are just 2⁵ = 32 in the present problem.

**Figure 7.** Comparison of the performance of the grabit approach and direct implementation of QAOA (as simulated with QASM) for a portfolio optimization problem on n_bit = 5 qubits with p = 1 (see appendix C). Left: the ground state probability P_gs after running the QAOA circuit as a function of the total number of shots for the QASM simulator, and of the value of N_ball for the grabit approach. The points show the results for the QASM simulator while the crosses represent the grabit approach. Each shot/sample resembles one evaluation of the cost function. The horizontal blue line shows the result of a statevector simulation. Right: ${\Vert}\hat{\psi }-\bar{{\Psi}}{{\Vert}}_{1}$ , with $\bar{{\Psi}}=(\vert {{\Psi}}_{0}{\vert }^{2},\dots ,\vert {{\Psi}}_{{2}^{{n}_{\text{bit}}}-1}{\vert }^{2})$ , as a function of the number of executed gates n_gate in a single QAOA circuit for p = 1, see figure 8, for different values of N_ball = 10, 100, 500, 1000, 2000, 3000, 10 000, 100 000 (top to bottom). ${N}_{\text{ball}}\gg {2}^{{n}_{\text{bit}}}=32$ is required to get a result with a low deviation from the true state.
Download figure:
Standard image High-resolution image

**Figure 7.** Comparison of the performance of the grabit approach and direct implementation of QAOA (as simulated with QASM) for a portfolio optimization problem on n_bit = 5 qubits with p = 1 (see appendix C). Left: the ground state probability P_gs after running the QAOA circuit as a function of the total number of shots for the QASM simulator, and of the value of N_ball for the grabit approach. The points show the results for the QASM simulator while the crosses represent the grabit approach. Each shot/sample resembles one evaluation of the cost function. The horizontal blue line shows the result of a statevector simulation. Right: ${\Vert}\hat{\psi }-\bar{{\Psi}}{{\Vert}}_{1}$ , with $\bar{{\Psi}}=(\vert {{\Psi}}_{0}{\vert }^{2},\dots ,\vert {{\Psi}}_{{2}^{{n}_{\text{bit}}}-1}{\vert }^{2})$ , as a function of the number of executed gates n_gate in a single QAOA circuit for p = 1, see figure 8, for different values of N_ball = 10, 100, 500, 1000, 2000, 3000, 10 000, 100 000 (top to bottom). ${N}_{\text{ball}}\gg {2}^{{n}_{\text{bit}}}=32$ is required to get a result with a low deviation from the true state.
Download figure:
Standard image High-resolution image

**Figure 8.** QAOA circuit for p = 1 used to analyze the development of error ${\Vert}\hat{\psi }-{\Phi}{{\Vert}}_{1}$ as a function of the number of gates in figure 7. The R_Z gates and CNOTs implement the cost Hamiltonian H_c while the R_x gates realize the mixer H_B = −∑_i X_i.
Download figure:
Standard image High-resolution image

**Figure 8.** QAOA circuit for p = 1 used to analyze the development of error ${\Vert}\hat{\psi }-{\Phi}{{\Vert}}_{1}$ as a function of the number of gates in figure 7. The R_Z gates and CNOTs implement the cost Hamiltonian H_c while the R_x gates realize the mixer H_B = −∑_i X_i.
Download figure:
Standard image High-resolution image

6. Conclusions and outlook

We have shown that any multi-qubit state Ψ of n_bit qubits can be represented as a discretized partial derivative ψ of order N_bit = n_bit + 1 of a classical probability distribution over N_bit 'byte-4 values', each represented by 2 classical stochastic bits. The state ψ is fully capable of interference in the exponentially large Hilbert space spanned by all ψ and can be propagated with a stochastic algorithm that fully reproduces the quantum state evolution of any pure-state quantum algorithm up to a prefactor linked to different normalization. Complex Ψ can be mapped to real ones by adding a single additional qubit (resp. grabit). We have identified the stochastic maps for all gates in a universal gate set, and hence found an automated translation of any quantum algorithm represented by a quantum circuit to a classical stochastic algorithm that becomes exact in the limit of N_ball → ∞ (see theorem 3.3). Based on this, we emulated several key quantum algorithms. The norm of the estimator $\hat{\psi }$ of the grabit state typically decays exponentially with the depth of the quantum algorithm. This can be remedied to some extent with a 'refreshment procedure' that exploits a gauge degree of freedom in the representation of ψ as a grabit state, but the number of realizations needed for given success probability of the studied grabit-algorithms still scales exponentially with N_bit. This is due to the fact that in the refreshment procedure the quantum state is estimated, and it can only be estimated based on the drawn realizations. This prevents automated turning efficient quantum algorithms into efficient stochastic emulations with this method and highlights the need for massive sampling. Rapidly fluctuating bits in spin-tronics, as were used in [7], or special-purpose classical high-performance samplers, introduced e.g. in the context of massively paralel probabilistic computing with sparse ising machines [20] offer themselves as natural hardware frameworks for an implementation, but it is clear that even extremely high sampling rates can ultimately not beat the need for exponentially growing sample numbers.

From a foundational perspective, the grabit approach demonstrates a fundamental difference between how probabilities are propagated in quantum mechanics and in classical mechanics: in the latter, we can only do so by acting on drawn realizations, whereas in quantum mechanics it appears that we can propagate states that fully determine the measurement statistics in the limit of very many measurements when manipulating a single quantum system—i.e. apparently without the material support of all the actual instances that would be needed classically to precisely implement the stochastic gates (see also [21] for an assessment of the 'immaterial reality'of a quantum mechanical state). Fundamentally it is unclear how nature stores a state then, and where the apparent true randomness comes from.

Thus, while the grabit approach has, so far, not led to new efficient classical algorithms, it might give new insight into the nature of quantum states, which almost 100 years after the discovery of quantum mechanics is still subject of debate and research (see e.g. [22]). In particular, our new approach has proven that at least one other physical object besides the familiar many-particle quantum states of quantum mechanics exists that allows for quantum interference in an exponentially large tensor-product Hilbert space, combined with the possibility of local operations, and a probability interpretation attached. Hopefully this will lead to the discovery of other such objects and fresh insights into possible 'roads to reality' in quantum mechanics and quantum mechanical states in particular.

While finishing the manuscript, a preprint with some ideas similar to those presented here appeared [23].

Author contributions

DB conceived the idea, developed the theory, implemented basic simulations of the quantum algorithms with the exception of QAOA, and wrote most of the manuscript. RM implemented the QFT and QAOA with QISKIT and as grabit algorithms and performed the scaling analysis of the required N_ball.

Acknowledgments

DB thanks Michael Hall for introducing him to the Interacting Many World (IMW) picture, and Uzy Smilansky for encouragements to think about classical entanglement more than 20 years ago. He thanks Roderich Tumulka for interesting discussions and pointing out additional references to the (IMW) picture of Quantum Mechanics. We thank Gerhard Hellstern and Thomas Wellens for sharing unpublished work. We acknowledge use of the IBM Q for this work. The views expressed here are those of the authors and do not reflect the official policy or position of IBM or the IBM Q team. We acknowledge support by Open Access Publishing Fund of University of Tübingen.

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.

Appendix A.: Analytical results for the error after H² Rf₁

In the following we study the scaling of the error as function of N_ball including refreshments after each H² sequence on a single qubit. Starting with initial state |0⟩ of a single qubit, H²ⁿ leaves the state real in the computational basis, and it is enough to analyse the propagation of a single grabit. P_I denote here the probabilities for b4vs after the first H², i.e. P = (1/2,0,1/4,1/4)^t. At that stage, the unnormalized grabit-state estimate is $\hat{\psi }={({R}_{0},{R}_{2}-{R}_{3})}^{t}$ . In the limit of large N, a refreshment gate preserves the amplitude ratios of $\hat{\psi }$ but scales up the prefactor by making the state interference-free. Hence, after H² Rf₁, the physical probabilities are estimated as

$\begin{equation}{\hat{\tilde{p}}}_{0}^{({H}^{2}R{f}_{1})}=\frac{{R}_{0}}{{R}_{0}+\vert {R}_{3}-{R}_{2}\vert }.\end{equation} \tag{ A1 }$

To be specific we focus on Rf₁, but any other refreshment routine that fulfills (38) and (39) is covered. We now show that the estimate (A1) is efficient, i.e. $\langle {\hat{\tilde{p}}}_{0}^{({H}^{2}R{f}_{1})}\rangle \enspace \stackrel{{N}_{\text{ball}}\to \infty }{\to }\enspace {\tilde{p}}_{0}^{({H}^{2}R{f}_{1})}$ , where the expectation value is over the statistical ensemble created by the stochastic propagation, and estimate the error as function of N_ball.

Since we expect |R₃ − R₂| ≪ R₀ for N_ball → ∞, the expectation value can be expanded as $1-\langle \frac{\vert {R}_{2}-{R}_{3}\vert }{{R}_{0}}\rangle +\mathcal{O}({\frac{\vert {R}_{2}-{R}_{3}\vert }{{R}_{0}}}^{2})$ , and the second order will be neglected. The probability for realizing a certain histogram R is given by the multinomial distribution,

$\begin{equation}{p}_{\boldsymbol{R}}=N!\prod\limits _{I=0}^{3}\frac{{P}_{I}^{N{R}_{I}}}{(N{R}_{I})!},\end{equation} \tag{ A2 }$

where N ≡ N_ball for short, and R₃ = 1 − R₀ − R₁ − R₂. Now use the Stirling approximation for large N, $N!\simeq \mathrm{exp}(N\mathrm{ln}N-N)\sqrt{2\pi N}$ , consider R_I to vary continuously, and change summation to integrals, ${\sum }_{{n}_{i}=0}^{N}\to N{\int }_{0}^{1}\mathrm{d}{R}_{i}$ . Thus, for large N, the probability of realizing a given histogram R ≡ (R₀, R₁, R₂, 1 − R₀ − R₁ − R₂) reads

$\begin{equation}{p}_{\boldsymbol{R}}\simeq \frac{1}{2\pi N}\frac{1}{\sqrt{{R}_{0}{R}_{2}(1-{R}_{0}-{R}_{2})}}\enspace \mathrm{exp}\left[N\left\{\sum\limits _{I=0,2}{R}_{I}\enspace \mathrm{ln}\left(\frac{{P}_{I}}{{R}_{I}}\right)+(1-{R}_{0}-{R}_{2})\mathrm{ln}\left(\frac{1-{P}_{0}-{P}_{2}}{1-{R}_{0}-{R}_{2}}\right)\right\}\right].\end{equation} \tag{ A3 }$

The expectation value $\langle Q(\boldsymbol{R})\rangle \equiv \langle \frac{\vert {R}_{3}-{R}_{2}\vert }{{R}_{0}}\rangle$ then becomes

$\begin{equation}\langle Q(\boldsymbol{R})\rangle =\frac{N}{2\pi }{\int }_{0}^{1}\mathrm{d}{R}_{0}\left[{\int }_{0}^{(1-{R}_{0})/2}\mathrm{d}{R}_{2}-{\int }_{(1-{R}_{0})/2}^{1-{R}_{0}}\mathrm{d}{R}_{2}\right]\frac{1-{R}_{0}-2{R}_{2}}{{R}_{0}}\frac{{e}^{NR(\boldsymbol{R})}}{\sqrt{{R}_{0}{R}_{2}(1-{R}_{0}-{R}_{2})}},\end{equation} \tag{ A4 }$

where the 'action' NR( R ) is given by the argument of exp in (A3). We solve the second integral by saddle point approximation (SPA) in the limit of large N. This requires finding the maximum of R( x ), i.e. solving ${\partial }_{{R}_{2}}R(\boldsymbol{R})=0$ , with the solution ${R}_{2}^{s}=({P}_{2}/(1-{P}_{0}))(1-{R}_{0})$ . After the gates H², P₂/(1 − P₀) = 1/2. Therefore, the saddle point ${R}_{2}^{s}=(1-{R}_{0})/2$ is exactly on the integration boundary for R₂ in both integrals. In addition the pre-exponential factor (considered here as function of R₂ only for the purpose of the integral over R₂) ${\Phi}({R}_{2})\equiv (1-{R}_{0}-2{R}_{2})/\sqrt{{R}_{0}{R}_{2}(1-{R}_{0}-{R}_{2})}$ vanishes at ${R}_{2}={R}_{2}^{s}$ . This is an unusual situation for the SPA. The 1st problem is still treated in text books such as [24], see p 37. We generalize this method to treating also ${\Phi}({R}_{2}^{s})=0$ by expanding it. Looking at the 1D case with x ≡ R₂ as integration variable, consider the integral

$\begin{equation}I={\int }_{\alpha }^{\beta }\mathrm{d}x{\Phi}(x){e}^{Nh(x)}.\end{equation} \tag{ A5 }$

Define t by h(x) = h(β) − t². So at x = β, t = 0, and more generally, x = h⁻¹(h(β) − t²), and dx = −2t/h'(x) dt. By the mean value theorem, there exist values ξ, ξ₁ such that $\mathrm{d}x=\sqrt{-2{h}^{{\prime\prime}}(\xi )}/(-{h}^{{\prime\prime}}({\xi }_{1}))\mathrm{d}t$ [24]. For N → ∞, these move arbitrarily close to β, such that in this limit $\mathrm{d}x=-\mathrm{d}t/\sqrt{-{h}^{{\prime\prime}}(\beta )/2}$ , where a maximum of h(x) at x = β was assumed, i.e. h(x) increases monotonically sufficiently close to x = β for x < β. Together with h'(β) = 0, this allows one to write t² = h(β) − h(x) ≃ −h''(β)(β − x)²/2 and Φ(x) ≃ Φ'(β)(x − β). Inserting this into the integral, we find

$\begin{equation}I\simeq {\int }_{0}^{\tau }\mathrm{d}t\frac{{{\Phi}}^{\prime }(\beta )(x-\beta )}{\sqrt{\frac{\vert {h}^{{\prime\prime}}(\beta )\vert }{2}}}{e}^{N(h(\beta )-{t}^{2})},\end{equation} \tag{ A6 }$

where the value of τ > 0 is irrelevant, and $x-\beta =\sqrt{-2/{h}^{{\prime\prime}}(\beta )}t$ . With this we finally obtain from the lower boundary at t = 0

$\begin{equation}I\simeq \frac{{{\Phi}}^{\prime }(\beta ){e}^{Nh(\beta )}}{N{h}^{{\prime\prime}}(\beta )}.\end{equation} \tag{ A7 }$

Applied to (A4), and considering that both integrals over R₂ give the same value when the minus sign of the second is included, we obtain

$\begin{equation}\langle Q(\boldsymbol{R})\rangle =\frac{1}{\pi }{\int }_{0}^{1}\mathrm{d}{R}_{0}\frac{{e}^{N\tilde{R}({R}_{0})}}{{R}_{0}^{3/2}}\enspace ,\enspace \enspace \enspace \tilde{R}({R}_{0})=R({R}_{0},{R}_{2}^{s}({R}_{0})),\end{equation} \tag{ A8 }$

which leads to

$\begin{equation}\langle {\hat{\tilde{p}}}_{0}\rangle \simeq 1-\frac{2}{\sqrt{\pi N}}+\mathcal{O}\left(\langle {Q}^{2}(\boldsymbol{R})\rangle ,\frac{1}{N}\right).\end{equation} \tag{ A9 }$

The expectation of the estimator of the logical probability vector after the sequence H² Rf₁ hence deviates from the correct value (1, 0) as

$\begin{equation}{\Vert}\left(\begin{matrix}\hfill \langle {\hat{\tilde{p}}}_{0}\rangle \hfill \\ \hfill \langle {\hat{\tilde{p}}}_{1}\rangle \hfill \end{matrix}\right)-\left(\begin{matrix}\hfill 1\hfill \\ \hfill 0\hfill \end{matrix}\right){{\Vert}}_{2}\simeq 2\sqrt{\frac{2}{\pi }}\frac{1}{\sqrt{N}}\simeq \frac{1.596}{\sqrt{N}},\end{equation} \tag{ A10 }$

in reasonable agreement with the numerically estimated behavior $1.4/\sqrt{N}$ , with the error averaged over 10 realizations of the algorithm and N_ball ranging from 50 to 5000, when considering the relatively large statistical uncertainty from the 10 runs. When iterating the sequence as ${({H}^{2}R{f}_{1})}^{n}$ , the random deviation of $\hat{\tilde{\boldsymbol{p}}}$ from the exact value (1, 0) is expected to lead to a diffusive behavior of the two-norm (A10) as function of n, i.e. a slow increase $\simeq \sqrt{1.596n/N}$ . Since in figure 3 n_H includes refreshment gates after each H, which are, however, irrelevant after the 1st Hadamard and all following ones with odd number, thus diluting the increase of the two-norm by a factor $\sqrt{3/4}$ , one expects a behavior $\simeq \sqrt{1.596\times (3/4){n}_{H}/N}\simeq 0.01\sqrt{{n}_{H}}$ . Given the relatively large statistical uncertainty, this is again in reasonable agreement with the numerically found $\mathrm{exp}(-5.084\enspace 13+0.532\enspace 838\mathrm{ln})\simeq 0.006\enspace 4{n}_{H}^{0.538}$ .

Appendix B.: Measure and bounds of destructive interference

B.1. Cramér–Rao bound for the estimation of the grabit state

After concentration of signs, ψ_i and ${\tilde{p}}_{\boldsymbol{i}}$ can be written as

$\begin{equation}{\psi }_{\boldsymbol{i}}={P}_{2\boldsymbol{i}}-{P}_{2\boldsymbol{i}+1}\end{equation} \tag{ B1 }$

$\begin{equation}{\tilde{p}}_{\boldsymbol{i}}={P}_{2\boldsymbol{i}}+{P}_{2\boldsymbol{i}+1},\end{equation} \tag{ B2 }$

where 2 i + 1 means 2 i + (0, ..., 0, 1). Hence, the ψ_{2
i
+1} parameterize the b4v probabilities as

$\begin{equation}{P}_{2\boldsymbol{i}}=\frac{1}{2}({\tilde{p}}_{2\boldsymbol{i}+1}+{\psi }_{\boldsymbol{i}}),\enspace \enspace \enspace {P}_{2\boldsymbol{i}+1}=\frac{1}{2}({\tilde{p}}_{2\boldsymbol{i}+1}-{\psi }_{\boldsymbol{i}}).\end{equation} \tag{ B3 }$

The Fisher information with respect to the ψ_i is given by

$\begin{equation}{I}_{{\psi }_{\boldsymbol{i}}}=\sum\limits _{\boldsymbol{j},\sigma }\enspace {P}_{2\boldsymbol{j}+\sigma }{\left(\frac{\partial \mathrm{ln}{P}_{2\boldsymbol{j}+\sigma }}{\partial {\psi }_{\boldsymbol{i}}}\right)}^{2}\end{equation} \tag{ B4 }$

$\begin{equation}=\frac{1}{4}\sum\limits _{\boldsymbol{j},\sigma }^{\prime }\enspace {P}_{2\boldsymbol{j}+\sigma }^{-1},\end{equation} \tag{ B5 }$

where the sum ${\sum }_{\boldsymbol{j},\sigma }^{\prime }$ is over all j , σ with P_{2
j
+σ} ≠ 0. Hence the Cramér–Rao bound for the smallest possible standard deviation $\text{sdv}({\hat{\psi }}_{\boldsymbol{i}})$ of an arbitrary unbiased estimator $\hat{\psi }$ of ψ based on N_ball samples reads

$\begin{equation}\sum\limits _{\boldsymbol{i}}^{{\prime\prime}}\enspace \text{sdv}({\hat{\psi }}_{\boldsymbol{i}})\geqslant \sum\limits _{\boldsymbol{i}}^{{\prime\prime}}\frac{2}{\sqrt{{N}_{\text{ball}}\left(\frac{1}{{P}_{2\boldsymbol{i}}}+\frac{1}{{P}_{2\boldsymbol{i}+1}}\right)}},\end{equation} \tag{ B6 }$

where the ${\sum }_{\boldsymbol{j}}^{{\prime\prime}}$ is over all j with P_{2
j} P_{2
j
+1} ≠ 0. This demonstrates the importance of large N_ball for a good estimate of the grabit state. Without any refresh, the effective N_ball that contributes to an estimate of ψ_i decays due to destructive interference. Indeed, that decay can be considered a measure of destructive interference. Let H be a histogram normalized to ∑_I H_I = N_ball. After the action of any gate or even an entire algorithm, we have a new histogram H '. The estimate ${\hat{\psi }}^{\prime }$ from that histogram is identical to the one obtained from H ' by sign concentration and pair annihilation, called H '', even if these two operations are not performed. It leads to a new effective ${{N}_{\text{ball}}}^{\prime }={\sum }_{\boldsymbol{I}}\enspace {H}_{\boldsymbol{I}}^{{\prime\prime}}$ . Its mean value is given by

$\begin{equation}\langle {{N}_{\text{ball}}}^{\prime }\rangle =\sum\limits _{{\boldsymbol{H}}^{\prime }}\enspace {P}_{{\boldsymbol{H}}^{\prime }}\sum\limits _{\boldsymbol{i}}\vert \sum\limits _{\boldsymbol{\sigma },\text{even}}\enspace {H}_{2\boldsymbol{i}+\boldsymbol{\sigma }}^{\prime }-\sum\limits _{\boldsymbol{\sigma },\text{odd}}\enspace {H}_{2\boldsymbol{i}+\boldsymbol{\sigma }}^{\prime }\vert \end{equation} \tag{ B7 }$

$\begin{equation}={N}_{\text{ball}}\sum\limits _{{\boldsymbol{H}}^{\prime }}\enspace {P}_{{\boldsymbol{H}}^{\prime }}\sum\limits _{\boldsymbol{i}}\vert \sum\limits _{\boldsymbol{\sigma },\text{even}}\enspace {R}_{2\boldsymbol{i}+\boldsymbol{\sigma }}^{\prime }-\sum\limits _{\boldsymbol{\sigma },\text{odd}}\enspace {R}_{2\boldsymbol{i}+\boldsymbol{\sigma }}^{\prime }\vert ,\end{equation} \tag{ B8 }$

where P_{H
'} is the probability for a histogram H ' given by a multinomial distribution of the b4v probability distribution ${P}_{I}^{\prime }$ after the gate or algorithm. In the limit of N_ball → ∞, ${R}_{I}^{\prime }\to {P}_{I}^{\prime }$ , and the distribution of histograms R_I becomes arbitrarily sharp, concentrated on ${P}_{I}^{\prime }$ . This leads to $\langle {{N}_{\text{ball}}}^{\prime }\rangle \to {N}_{\text{ball}}{\sum }_{\boldsymbol{i}}\vert {\psi }_{\boldsymbol{i}}^{\prime (u)}\vert ={N}_{\text{ball}}{\Vert}{\psi }^{\prime (u)}{{\Vert}}_{1}$ , where the unnormalized grabit state ${\psi }_{\boldsymbol{i}}^{\prime (u)}$ is defined as

$\begin{equation}{\psi }_{\boldsymbol{i}}^{\prime (u)}\equiv \sum\limits _{\boldsymbol{\sigma },\text{even}}\enspace {P}_{2\boldsymbol{i}+\boldsymbol{\sigma }}^{\prime }-\sum\limits _{\boldsymbol{\sigma },\text{odd}}\enspace {P}_{2\boldsymbol{i}+\boldsymbol{\sigma }}^{\prime }.\end{equation} \tag{ B9 }$

The reduction of ${N}_{\text{ball}}\to \langle {{N}_{\text{ball}}}^{\prime }\rangle$ depends on the initial state. E.g. a first Hadamard on a state |0⟩ does not introduce interference yet, and ${{N}_{\text{ball}}}^{\prime }={N}_{\text{ball}}$ in this case. Only when the interferometer closes, i.e. after H², probability amplitudes are brought to overlap and interfere away the amplitude of |1⟩. By considering the worst case over all input states ψ, the following definition of a measure of destructive interference is motivated:

$\begin{equation}\mathcal{D}\equiv \underset{\psi }{\mathrm{max}}(1-{\Vert}{\psi }^{\prime (u)}{{\Vert}}_{1}/{\Vert}\psi {{\Vert}}_{1}),\end{equation} \tag{ B10 }$

which is only a function of the grabit gate (or algorithm) applied. We calculate it in the next subsubsection for the Hadamard gate and the phase gate.

B.2. Destructive interference and the decay of the effective N_ball

After applying a Hadamard gate on a single grabit (taken as the first one in the following, w.l.g.), the one-norm ||ψ'||₁ of the grabit state reads

$\begin{equation}{\Vert}{\psi }^{\prime }{{\Vert}}_{1}={\Vert}H\enspace \psi {{\Vert}}_{1}=\sum\limits _{{i}_{1},\dots ,{i}_{n}}\vert {\psi }_{{i}_{1},\dots ,{i}_{n}}^{\prime }\vert \end{equation} \tag{ B11 }$

$\begin{equation}=\frac{1}{2}\sum\limits _{\bar{\boldsymbol{i}}}\left(\vert {\psi }_{0\bar{\boldsymbol{i}}}+{\psi }_{1\bar{\boldsymbol{i}}}\vert +\vert {\psi }_{0\bar{\boldsymbol{i}}}-{\psi }_{1\bar{\boldsymbol{i}}}\vert \right)\end{equation} \tag{ B12 }$

$\begin{equation}=\sum\limits _{\bar{\boldsymbol{i}}}\enspace \mathrm{max}\left\{\vert {\psi }_{0\bar{\boldsymbol{i}}}\vert ,\vert {\psi }_{1\bar{\boldsymbol{i}}}\vert \right\},\end{equation} \tag{ B13 }$

where $\bar{\boldsymbol{i}}\equiv ({i}_{2},\dots ,{i}_{n})$ , ${i}_{1}\bar{\boldsymbol{i}}\equiv ({i}_{1},{i}_{2},\dots ,{i}_{n})$ , and we used |a + b| + |a − b| = max{|a|, |b|} for all $a,b\in \mathbb{R}$ . Then

$\begin{equation}\underset{\psi }{\mathrm{min}}{\Vert}{\psi }^{\prime (u)}{{\Vert}}_{1}=\underset{\psi }{\mathrm{min}}\sum\limits _{\bar{\boldsymbol{i}}}\enspace \mathrm{max}\left\{\vert {\psi }_{0\bar{\boldsymbol{i}}}\vert ,\vert {\psi }_{1\bar{\boldsymbol{i}}}\vert \right\}\end{equation} \tag{ B14 }$

$\begin{equation}\geqslant \underset{\psi }{\mathrm{min}}\mathrm{max}\left\{\sum\limits _{\bar{\boldsymbol{i}}}\;\vert {\psi }_{0\bar{\boldsymbol{i}}}\vert ,\sum\limits _{\bar{\boldsymbol{i}}}\;\vert {\psi }_{1\bar{\boldsymbol{i}}}\vert \right\}\end{equation} \tag{ B15 }$

$\begin{equation}\geqslant \underset{\psi }{\mathrm{min}}\mathrm{max}\left\{\sum\limits _{\bar{\boldsymbol{i}}}\;\vert {\psi }_{0\bar{\boldsymbol{i}}}\vert ,{\Vert}\psi {{\Vert}}_{1}-\sum\limits _{\bar{\boldsymbol{i}}}\;\vert {\psi }_{0\bar{\boldsymbol{i}}}\vert \right\}\end{equation} \tag{ B16 }$

$\begin{equation}\geqslant \underset{0\leqslant x\leqslant {\Vert}\psi {{\Vert}}_{1}}{\mathrm{min}}\enspace \mathrm{max}\left\{x,{\Vert}\psi {{\Vert}}_{1}-x\right\}\end{equation} \tag{ B17 }$

$\begin{equation}=\frac{1}{2}{\Vert}\psi {{\Vert}}_{1}.\end{equation} \tag{ B18 }$

Hence, ${\mathcal{D}}_{H}=1/2$ for an arbitrary number of grabits (with H applied to one of them). Similarly, one shows for the phase gate that ${\mathcal{D}}_{{R}_{\varphi }}=1-\frac{1+{q}^{2}}{{(1+q)}^{2}}$ with q = |cot φ|. Note that above we assumed a fixed initial N_ball. After the first gate that introduces destructive interference, the effective ${{N}_{\text{ball}}}^{\prime }$ will be distributed. However, since for N_ball → ∞ the distribution of histograms becomes arbitrarily sharp, almost all histograms will have ${{N}_{\text{ball}}}^{\prime }=\langle {{N}_{\text{ball}}}^{\prime }\rangle$ . In that limit the effective N_ball will hence decay over an entire quantum algorithm such that the final ${{N}_{\text{ball}}}^{\prime }$ satisfies

$\begin{equation}\langle {{N}_{\text{ball}}}^{\prime }\rangle \enspace \gtrsim \enspace {N}_{\text{ball}}\prod\limits _{g=1}^{{N}_{g}}{\mathcal{D}}_{g},\end{equation} \tag{ B19 }$

where the product is over all gates of the algorithm. Hence, without any refresh the effective remaining ${{N}_{\text{ball}}}^{\prime }$ decays exponentially with the numer of gates that lead to destructive inteference. It should be kept in mind, however, that (B19) is the worst case scenario, pessimized over all intermediate states. From the experience with H²ⁿ one expects that this exponential decay (and corresponding deterioriation of an estimate of the final ψ) can be avoided with refreshs after each interference-generating gate g, ${S}_{g}\in \mathcal{I}$ , but only for ${N}_{\text{ball}}\gg {2}^{{N}_{\text{bit}}}$ .

Appendix C.: Portfolio optimization using QAOA

Mean-variance portfolio analysis as introduced by Markowitz [25] can be used to choose assets out of a stock market in a way that minimizes risk. The following description is based on [26, 27]. Consider a portfolio consisting of n assets. The return μⁱ of each asset i can be calculated via

$\begin{equation*}{\mu }^{i}={({{\Pi}}_{k=1}^{M-1}(1+{r}_{k}^{i}))}^{252/(M-1)},\end{equation*}$

where ${r}_{k}^{i}=({p}_{k}^{i}-{p}_{k+1}^{i})/{p}_{k}^{i}$ , ${p}_{k}^{i}$ is the historical daily price of asset i, M is the number of available price data points, and 252 is the number of trading days in a year. Correlations between different assets can be analysed using the annualized covariance matrix σ defined as

$\begin{equation*}{\sigma }_{ij}=\frac{252}{M-1}\sum\limits _{k=1}^{M-1}({r}_{k}^{i}-{\bar{r}}^{i})({r}_{k}^{j}-{\bar{r}}^{j}),\end{equation*}$

where ${\bar{r}}^{i}=1/(M-1){\sum }_{k=1}^{M-1}\enspace {r}_{k}^{i}$ is the mean daily price change. The problem of finding a set of B assets out of a portfolio containing n stocks that minimizes risk while maximizing return can be thought of as minimizing the following cost function

$\begin{equation*}f({x}_{1},\dots ,{x}_{n})=q\sum\limits _{i,j=1}^{n}\enspace {x}_{j}{x}_{i}{\sigma }_{ij}-(1-q)\sum\limits _{i=1}^{n}\enspace {x}_{i}{\mu }^{i}\end{equation*}$

under the constraint ∑_i x_i = B, where x_i = 1, 0, and q is a factor to scale the weight of return and risk. The lower and upper index i is used to label the assets in consistency with previous equations and is not representing any summation convention. By replacing x_i → (1 − Z_i)/2 and absorbing constant terms into σ' and μ', we can transform f into the quantum operator

$\begin{equation*}{H}_{\mathrm{c}}=\sum\limits _{i > j}\enspace {Z}_{i}{Z}_{j}{\sigma }_{ij}^{\prime }+\sum\limits _{i}\enspace {Z}_{i}{\mu }_{i}^{\prime }.\end{equation*}$

Choosing the mixer operator to be H_B = −∑_i X_i results in the QAOA Ansatz state

$\begin{equation*}\vert {\psi }_{p}(\boldsymbol{\gamma },\boldsymbol{\beta })\rangle ={\text{e}}^{-\text{i}{\beta }_{p}{H}_{\mathrm{B}}}{\text{e}}^{-\text{i}{\gamma }_{p}{H}_{\mathrm{C}}}\dots {\text{e}}^{-\text{i}{\beta }_{1}{H}_{\mathrm{B}}}{\text{e}}^{-\text{i}{\gamma }_{1}{H}_{\mathrm{C}}}\vert {+\rangle }^{\otimes n}.\end{equation*}$

The explicit data used in figure 7 are stock prices of 5 randomly selected stocks in the DAX-30 averaged over a period of 5 years.