On the capacity of a quantum perceptron for storing biased patterns

Although different architectures of quantum perceptrons have recently been put forward, the capabilities of such quantum devices versus their classical counterparts remain debated. Here, we consider random patterns and targets independently distributed with biased probabilities and investigate the storage capacity of a continuous quantum perceptron model that admits a classical limit, thus facilitating the comparison of performances. This more general setting extends a previous study of the quantum storage capacity where, using statistical mechanics techniques in the limit of a large number of inputs, it was proved that no quantum advantage is to be expected concerning the storage properties. This outcome is due to the fuzziness inevitably introduced by the intrinsic stochasticity of quantum devices. We strengthen such an indication by showing that the possibility of indefinitely enhancing the storage capacity for highly correlated patterns, as occurs in a classical setting, is instead prevented at the quantum level.


Introduction
Machine learning aims at building methods that are able to make predictions or decisions based on sample data, without being explicitly programmed to do so. Quantum information theory studies the storage and transmission of information encoded in quantum states. Nowadays these two disciplines are becoming intertwined, giving rise to the field of quantum machine learning.
The flow of ideas runs both ways: on the one hand, applications of machine learning techniques are envisaged to analyze quantum systems [1,2]; on the other hand, the implementation of machine learning concepts on quantum hardware is also actively investigated [3,4,5]. Along this latter avenue, quantum advantages are expected in terms of higher storage capabilities and an increased information processing power [6,7,8,9].
The task of precisely comparing the power of quantum and classical neural networks as probabilistic models for information processing and storage is thus becoming pressing. In particular, the issue of determining precisely the storage capacity of the most elementary constituent of a neural network, namely the perceptron [10], has been addressed in the classical scenario without referring to any specific learning rule, using several approaches ranging from combinatorics [11,12] to statistical mechanics methods [13,14,15,16]. The latter have recently been used to generalize the calculation to some models of quantum perceptrons [17,18,19]. However, the results depend on the specific model used (see e.g. [18] and [19], based respectively on the models of [5] and [20]).
Here, by referring to the continuous variable quantum perceptron model introduced in [20], we study the storage capacity of random classical binary patterns. The components of the patterns and their assigned output classification are taken to be independent and identically distributed (i.i.d.) according to probabilities with a bias $-1 \le m_{\rm in} \le 1$ for the patterns and $-1 \le m_{\rm out} \le 1$ for their classification. Such a model admits the classical perceptron as a classical limit, thus enabling a direct comparison of the storage performances in the two cases. For classical perceptrons, simultaneously large biases for patterns and output classification allow one to greatly enhance the storage capacity, which diverges when $m_{\rm in} = m_{\rm out} = m \to 1$ [13,21,22,23]. We show that this possibility is prevented at the quantum level. Moreover, we also find that, when the biases $m_{\rm in}$ and $m_{\rm out}$ are varied separately, the quantum storage capacity depends on both of them, unlike in the classical case, where the storage capacity is a function only of the output bias. However, also in the quantum setting, when the output correlations are maximal, that is when $m_{\rm out} \to 1$, the asymptotic behaviour no longer depends on $m_{\rm in}$, exactly as in the classical case; the dependence on $m_{\rm in}$ only enters through the rate at which the limiting behaviour is reached. Overall, the performances of the continuous quantum model remain below the classical ones. These results thus corroborate those found in Ref. [19] with unbiased patterns. They confirm that, at the level of a simple, that is one-layer, quantum perceptron, the uncertainties brought about by pattern encoding via Gaussian states and homodyne measurements cannot be counteracted by linear superpositions of pattern states.

A continuous quantum perceptron model
The continuous variable model of a quantum perceptron proposed in Ref. [20] is characterized by $N$ bosonic input modes and one bosonic output mode. The components $x_j$ of an input pattern $\mathbf{x} \in \mathbb{R}^N$ are encoded by Gaussian-weighted, normalized superpositions of pseudo-eigenstates $|q\rangle_j$ of position-like operators $q_j$, centered around the pattern components $x_j$ with widths $\sigma_j$; the pattern $\mathbf{x}$ is thus encoded into the product of such single-mode states. This state is then given as input to a quantum circuit which first operates with a series of independent squeezing operators $S_j(r_j) = e^{i r_j (q_j p_j + p_j q_j)}$, $r_j \in \mathbb{R}$, $e^{-2 r_j} = w_j$, where $p_j$ is a momentum-like operator conjugated to $q_j$ ($[q_j, p_j] = i$) and $r_j$ is the squeezing parameter implementing the weight $w_j$. Then, the circuit consists of entangling Controlled Addition gates $CX$ on pairs of consecutive modes: $CX_{j,j+1} := \exp(-i\, q_j \otimes p_{j+1})$, acting as $CX_{j,j+1}|q_j, q_{j+1}\rangle = |q_j, q_j + q_{j+1}\rangle$.
Their combined action on the attenuated multi-mode position eigenstates accumulates the weighted sum of the pattern components on the last mode. In this way, the amplitude associated with the last mode position eigenstate has the form of a Gaussian distribution centered around $\mathbf{w} \cdot \mathbf{x}^\mu = \sum_{j=1}^N w_j x^\mu_j$, where, for the sake of simplicity, we have set $\sigma_j^2 = \sigma^2$ for all $j$ and thus encoded the input patterns by Gaussian states of the same width. Finally, homodyne detection operated on the last mode position-like quadrature yields a value $s$ with probability density $P_{\mathbf{w},\mathbf{x}^\mu,\sigma}(s)$.

Remark 1 With a slight modification of the above protocol, it is possible to obtain a description of such a continuous variable quantum perceptron as a controlled unitary acting on the tensor product $\mathcal{H} = \mathcal{H}_{\rm in} \otimes \mathcal{H}_{\rm out}$, where $\mathcal{H}_{\rm in}$ is the Hilbert space of the $N$ bosonic modes encoding the input, while $\mathcal{H}_{\rm out}$ is the Hilbert space of the additional ancilla mode storing the output. The present model can then be connected with other models investigated in the literature, in particular [24], where it was pointed out that a perceptron acting as a controlled unitary includes as particular cases the models considered in [25,26]. Indeed, the action of the continuous variable quantum perceptron investigated here can be described by a controlled unitary built from the squeezers $S_j(r_j)$ as in Eq. (3) and the controlled addition gates $CX_{j,\mathrm{out}} = \exp(-i\, q_j \otimes p_{\mathrm{out}})$ involving the $j$-th bosonic mode of the input and the output mode.
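As a quick numerical illustration of the measurement statistics just described, the following sketch samples homodyne outcomes. It assumes, consistently with the effective threshold $\bar\kappa$ derived later, that the outcome $s$ is Gaussian with mean $\mathbf{w} \cdot \mathbf{x}$ and standard deviation $\sigma\|\mathbf{w}\|$; function names and parameter values are illustrative and not part of the protocol of Ref. [20].

```python
# Minimal sketch (assumption: the homodyne outcome s is Gaussian with
# mean w.x and standard deviation sigma*||w||).
import numpy as np

rng = np.random.default_rng(0)

def homodyne_samples(w, x, sigma, n_shots=10_000):
    """Sample homodyne outcomes for a pattern x encoded with width sigma."""
    return rng.normal(w @ x, sigma * np.linalg.norm(w), size=n_shots)

N = 50
w = rng.normal(size=N)
w *= np.sqrt(N) / np.linalg.norm(w)          # enforce ||w||^2 = N
x = rng.choice([-1.0, 1.0], size=N)

for sigma in (0.3, 1e-6):                    # sigma -> 0: classical limit
    s = homodyne_samples(w, x, sigma)
    print(f"sigma={sigma:g}: mean s = {s.mean():+.3f} (w.x = {w @ x:+.3f})")
```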
Classically, the classification of a pattern $\mathbf{x}^\mu$ as $\pm 1$ by a weight vector $\mathbf{w}$ is obtained by checking the sign of $\mathbf{w} \cdot \mathbf{x}^\mu$. Then, a correct classification relative to a prescribed target $\xi^\mu = \pm 1$ is obtained when $\xi^\mu\, \mathbf{w} \cdot \mathbf{x}^\mu \ge \kappa \|\mathbf{w}\|$, where $\kappa$ is a stabilizing threshold. It renders the classification more robust against noise affecting the weights which, when $\kappa = 0$, might make $\xi^\mu\, \mathbf{w} \cdot \mathbf{x}^\mu$ jump from positive to negative values and vice versa. In the case of the quantum perceptron model outlined above, a pattern $\mathbf{x}^\mu$ is classified as $\xi^\mu = +1$ (resp. $\xi^\mu = -1$) if the measurement outcome is above the threshold $\kappa \|\mathbf{w}\|$ (resp. below $-\kappa \|\mathbf{w}\|$), while the pattern is not classified when the measurement outcome falls within $(-\kappa\|\mathbf{w}\|, \kappa\|\mathbf{w}\|)$. Therefore, even when classically $\mathrm{sign}(\mathbf{w} \cdot \mathbf{x}^\mu - \kappa\|\mathbf{w}\|) = +1$, quantumly the pattern is classified as $-1$ if $s < -\kappa\|\mathbf{w}\|$, and such errors occur with probability density $P_{\mathbf{w},\mathbf{x}^\mu,\sigma}(s)$.
Consequently, the inherent randomness due to the quantum encoding of the patterns is such that the correct classification of pattern $\mu$ becomes a binary stochastic variable, with probability
$$R_{\kappa,\sigma}(\mathbf{w},\mathbf{x}^\mu,\xi^\mu) = \int ds\, \Theta\!\left(\xi^\mu s - \kappa\|\mathbf{w}\|\right) P_{\mathbf{w},\mathbf{x}^\mu,\sigma}(s)\ ,$$
where $\Theta(\cdot)$ denotes the Heaviside function. Finally, an ancilla mode is appended to the initialized $N$ ones and its state is changed according to the actual outcome of a suitable homodyne measurement.
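Under the Gaussian outcome model assumed in the previous sketch, this probability reduces to a normal cumulative distribution function; a minimal, illustrative sketch:

```python
# Sketch: R = Phi((xi * w.x - kappa*||w||) / (sigma*||w||)), with Phi the
# standard normal CDF (valid under the Gaussian outcome assumption above).
import numpy as np
from scipy.stats import norm

def R_correct(w, x, xi, kappa, sigma):
    """Probability that the outcome lands on the correct side, xi*s >= kappa*||w||."""
    nw = np.linalg.norm(w)
    return norm.cdf((xi * (w @ x) - kappa * nw) / (sigma * nw))

rng = np.random.default_rng(1)
N = 50
w = rng.normal(size=N); w *= np.sqrt(N) / np.linalg.norm(w)
x = rng.choice([-1.0, 1.0], size=N)
xi = np.sign(w @ x)                    # target agreeing with the classical sign
print(R_correct(w, x, xi, kappa=0.0, sigma=0.3))   # < 1: quantum fuzziness
```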
The result of the measurement can then be used, for instance, to implement the non-linear ReLU activation function as in Ref. [20].
One of the advantages of the continuous quantum model just presented is that it allows one to recover the functioning of the classical perceptron when $\sigma \to 0$, i.e. by encoding a pattern $\mathbf{x}^\mu$ into the position-like pseudo-eigenkets $|x^\mu_1, x^\mu_2, \ldots, x^\mu_N\rangle$. Indeed, in this limit the Gaussian probability density in Eq. (10) becomes a Dirac delta centered around $\mathbf{w} \cdot \mathbf{x}^\mu$.

Gardner's approach
According to Gardner's statistical approach [13], the optimal storage capacity of a simple perceptron can be obtained from the fraction of weights which correctly reproduce the desired input-output relations, normalized to the total volume of allowed vectors $\mathbf{w}$. Indeed, the storage capacity is defined as the critical value $\alpha_c$ of the ratio of the number of patterns $p$ to the dimension of the input space $N$, beyond which the storage condition can no longer be satisfied.
In fact, by increasing the number of patterns, the volume of vectors $\mathbf{w}$ realizing the condition (12) typically shrinks, and the relative volume of such weights eventually vanishes. It is exactly the limit of vanishing relative volume that leads to the storage capacity of the perceptron.
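For orientation, the classical statement can be probed numerically: feasibility of the storage conditions for random patterns can be tested with a linear program, and the success probability drops around $\alpha = p/N = 2$ (the classical result for $\kappa = 0$), with the transition smoothed at finite $N$. A rough sketch with illustrative sizes:

```python
# Classical check (sketch): test via linear programming whether some w
# achieves xi^mu * w.x^mu > 0 for all mu; success drops near alpha = 2.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

def storable(p, N):
    X = rng.choice([-1.0, 1.0], size=(p, N))
    xi = rng.choice([-1.0, 1.0], size=p)
    # Variables (w_1..w_N, t): maximize t s.t. xi*X w >= t, |w_j| <= 1, t <= 1.
    A_ub = np.hstack([-(xi[:, None] * X), np.ones((p, 1))])
    c = np.zeros(N + 1); c[-1] = -1.0
    bounds = [(-1, 1)] * N + [(None, 1)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(p), bounds=bounds, method="highs")
    return res.status == 0 and res.x[-1] > 1e-9

N = 40
for alpha in (1.0, 2.0, 3.0):
    trials = [storable(int(alpha * N), N) for _ in range(20)]
    print(f"alpha={alpha}: storable fraction ~ {np.mean(trials):.2f}")
```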
We shall consider weights for which $\|\mathbf{w}\|^2 = N$, so that their components are typically of order 1. Then, the fraction of weights $\mathbf{w}$ of length $\sqrt{N}$ in $\mathbb{R}^N$ that classify $p$ binary patterns $\mathbf{x}^\mu \in \{+1, -1\}^N$, up to an error $\epsilon$, is given by
$$V^Q_N(\mathbf{x}^\mu, \xi^\mu, \kappa, \sigma, \epsilon) = \frac{1}{V_N} \int_{\|\mathbf{w}\|^2 = N} d\mathbf{w}\, \prod_{\mu=1}^{p} \Theta\Big(R_{\kappa,\sigma}(\mathbf{w},\mathbf{x}^\mu,\xi^\mu) - (1-\epsilon)\Big)\ ,$$
where the total volume of the space of the weights is $V_N$, namely the volume of the sphere of radius $\sqrt{N}$ in $\mathbb{R}^N$. The relation to the classification of the pattern $\mathbf{x}^\mu$ is due to the fact that Eq. (8) represents the probability distribution of the measurement outcomes of the quantum perceptron encoding the patterns $\mathbf{x}^\mu$ into Gaussian states of width $\sigma$.
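The fraction $V^Q_N$ can be estimated by brute-force Monte Carlo for small sizes. The following sketch draws weights uniformly on the sphere $\|\mathbf{w}\|^2 = N$ and applies the acceptance condition $R_{\kappa,\sigma} \ge 1 - \epsilon$ under the Gaussian outcome model assumed earlier; all sizes and parameter values are illustrative.

```python
# Monte Carlo sketch of the fraction V^Q_N.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def volume_fraction(p, N, kappa, sigma, eps, n_weights=20_000):
    X = rng.choice([-1.0, 1.0], size=(p, N))
    xi = rng.choice([-1.0, 1.0], size=p)
    W = rng.normal(size=(n_weights, N))
    W *= np.sqrt(N) / np.linalg.norm(W, axis=1, keepdims=True)   # ||w||^2 = N
    R = norm.cdf((xi * (W @ X.T) / np.sqrt(N) - kappa) / sigma)  # success prob.
    return (R >= 1.0 - eps).all(axis=1).mean()

for p in (2, 5, 8):                  # the fraction shrinks as p grows
    print(p, volume_fraction(p, N=30, kappa=0.0, sigma=0.3, eps=0.2))
```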

Remark 2 The probability $R_{\kappa,\sigma}(\mathbf{w},\mathbf{x}^\mu,\xi^\mu)$ depends on the pattern $\mathbf{x}^\mu$, on the target classification $\xi^\mu$, on the weights $\mathbf{w}$, on the threshold parameter $\kappa$ and on the Gaussian width $\sigma$. Therefore, the fraction of volume $V^Q_N$ depends on patterns, targets, threshold, width and also on the allowed statistical error $\epsilon$. When the width $\sigma$ vanishes, from the distributional limit one recovers the expression for the fraction of weights of the classical perceptron. Notice that this expression does not depend on the statistical error $\epsilon$ that needs to be introduced in the quantum setting. Indeed, in this latter case, the measured parameter $s$ is statistically distributed around the classical scalar product $\mathbf{w} \cdot \mathbf{x}^\mu / \|\mathbf{w}\|$; therefore, $R_{\kappa,\sigma}(\mathbf{w},\mathbf{x}^\mu,\xi^\mu)$ cannot be equal to 1 unless the Gaussian distribution becomes a Dirac delta peaked around it. Note that the statistical error $\epsilon$ is an upper bound to the perceptron's allowed errors. The value $\epsilon = 1/2$ for this bound is a special one: in such a case, weights can provide classifications with equal probability of being right or wrong. Therefore, at $\epsilon = 1/2$, weights are selected without further constraints besides the classical ones. Then, as far as the storage capacity is concerned, the quantum perceptron is expected to behave classically at $\epsilon = 1/2$, in spite of the quantum pattern encoding.
In analogy with the partition function of statistical mechanics, we take $\log V^Q_N(\mathbf{x}^\mu, \xi^\mu, \kappa, \sigma, \epsilon)$ as the relevant quantity, since it has the important property of being self-averaging, i.e. its average $\overline{\log V^Q_N(\mathbf{x}^\mu, \xi^\mu, \kappa, \sigma, \epsilon)}$ is a good representative of its typical behaviour for random choices of input patterns and targets [14,15,16]. In particular, this average will be computed considering the components of the input patterns, as well as the targets, to be binary stochastic variables distributed according to
$$\Pr\left(x^\mu_j = \pm 1\right) = \frac{1 \pm m_{\rm in}}{2}\ , \qquad \Pr\left(\xi^\mu = \pm 1\right) = \frac{1 \pm m_{\rm out}}{2}\ .$$
The parameters $-1 \le m_{\rm in}, m_{\rm out} \le 1$ measure the bias between the binary values of patterns and targets, respectively, and thus their correlations; the smaller the bias, the closer the two possible values are to being equiprobable. Following the classical approach by Gardner, we will derive a critical value $\alpha^Q_c$ such that for $\alpha < \alpha^Q_c$ we obtain a finite (possibly vanishing) value for the limit
$$\lim_{N\to\infty} \frac{1}{N}\, \overline{\log V^Q_N(\mathbf{x}^\mu, \xi^\mu, \kappa, \sigma, \epsilon)}\ .$$
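Sampling from the biased ensemble of Eq. (17) is straightforward; a minimal sketch (sizes are illustrative):

```python
# Biased binary sampling: Pr(x = +1) = (1 + m)/2, so that <x> = m.
import numpy as np

rng = np.random.default_rng(3)

def biased_binary(m, size):
    return np.where(rng.random(size) < (1.0 + m) / 2.0, 1.0, -1.0)

m_in, m_out = 0.6, 0.4
X = biased_binary(m_in, size=(1000, 50))    # p = 1000 patterns, N = 50
xi = biased_binary(m_out, size=1000)
print(X.mean(), xi.mean())                  # ~ 0.6 and ~ 0.4
```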

Replica method and saddle point equations
The quenched average appearing in the limit (18) can be computed by means of the replica trick [13,27,28]:
$$\overline{\log V^Q_N} = \lim_{n \to 0} \frac{\overline{\left[V^Q_N\right]^n} - 1}{n}\ .$$
The relevant quantity $\overline{[V^Q_N(\mathbf{x}^\mu, \xi^\mu, \kappa, \sigma, \epsilon)]^n}$ involves $n$ replicas indexed by the subscript $\gamma$, where, for the sake of compactness, we introduce the symbol $W = (\mathbf{w}_1, \ldots, \mathbf{w}_n) \in \mathbb{R}^{nN}$. Moreover, in Eq. (21), it is made explicit that the mean value is computed with respect to the patterns $\mathbf{x}^\mu$ and targets $\xi^\mu$, $\mu = 1, \ldots, p$.
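The replica trick itself can be sanity-checked numerically on a toy random variable (a log-normal stand-in, not the actual Gardner volume $V^Q_N$):

```python
# Numerical illustration of E[log V] = lim_{n -> 0} (E[V^n] - 1) / n.
import numpy as np

rng = np.random.default_rng(4)
V = np.exp(rng.normal(0.5, 1.0, size=1_000_000))   # E[log V] = 0.5

for n in (0.1, 0.01, 0.001):
    print(n, (np.mean(V**n) - 1.0) / n)            # -> 0.5 as n -> 0
print("direct:", np.mean(np.log(V)))
```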
The lengthy calculations of the mean value in Eq. (21) by means of the replica-symmetric ansatz and of the saddle point approximation are reported in Appendix A.
The replica method introduces several order parameters, the most important one being the average overlap of two randomly chosen weights $\mathbf{w}_\gamma$ and $\mathbf{w}_\delta$ in different replicas,
$$q_{\gamma\delta} = \frac{1}{N}\, \overline{\mathbf{w}_\gamma \cdot \mathbf{w}_\delta}\ , \qquad \gamma \neq \delta\ .$$
In the replica symmetric ansatz it is assumed that, for the solution of the saddle point equations, the average overlap is the same for each pair of replicas, i.e. $q_{\gamma\delta} = q$ for all $\gamma \neq \delta$. Notice that, by increasing the ratio $p/N$, the number of weights satisfying (12) diminishes, hence their average overlap increases. The critical value $\alpha_c$, both in the classical and the quantum scenario, is then obtained in the limit of maximal overlap $q \to 1$.
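The monotonic growth of the typical overlap with $p$ can already be seen in small-scale Monte Carlo, reusing the acceptance rule of the earlier volume-fraction sketch (illustrative parameters; $\epsilon$ is taken large so that solutions survive the sampling):

```python
# Sketch: average overlap q of accepted weights grows as p increases.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)

def mean_overlap(p, N, kappa, sigma, eps, n_weights=50_000):
    X = rng.choice([-1.0, 1.0], size=(p, N))
    xi = rng.choice([-1.0, 1.0], size=p)
    W = rng.normal(size=(n_weights, N))
    W *= np.sqrt(N) / np.linalg.norm(W, axis=1, keepdims=True)
    R = norm.cdf((xi * (W @ X.T) / np.sqrt(N) - kappa) / sigma)
    sols = W[(R >= 1.0 - eps).all(axis=1)][:200]   # cap for the pairwise step
    if len(sols) < 2:
        return None
    q = (sols @ sols.T) / N
    return q[np.triu_indices(len(sols), k=1)].mean()

for p in (2, 5, 8):
    print(p, mean_overlap(p, N=25, kappa=0.0, sigma=0.3, eps=0.2))
```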
Eventually, one arrives at the equation, Eq. (24), that must be satisfied by the critical ratio $\alpha^Q_c$ of the number of patterns to the weight dimension which, according to (11), defines the quantum storage capacity. It involves the quantities $a_\pm(M)$ defined in (25), the effective threshold $\bar\kappa = \kappa + \sigma\,\Phi^{-1}(1-\epsilon)$ of Eq. (26), and the inverse $\Phi^{-1}$ of the Gaussian cumulative distribution function
$$\Phi(x) = \int_{-\infty}^{x} \frac{dt}{\sqrt{2\pi}}\, e^{-t^2/2}\ ,$$
while the quantity $M$ satisfies the consistency condition (28). Thus, in order to compute $\alpha^Q_c$ from (24), one has first to solve (28) in terms of $M$.
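Only the combination $\bar\kappa = \kappa + \sigma\Phi^{-1}(1-\epsilon)$ enters; taking $\Phi$ to be the standard normal CDF as in (27), it is easily evaluated, and the special role of $\epsilon = 1/2$ noted in Remark 2 is manifest:

```python
# Sketch: effective threshold kappa_bar = kappa + sigma * Phi^{-1}(1 - eps).
from scipy.special import ndtri      # inverse of the standard normal CDF

def kappa_bar(kappa, sigma, eps):
    return kappa + sigma * ndtri(1.0 - eps)

print(kappa_bar(0.0, 0.3, 0.01))     # > 0: quantum fuzziness acts as extra stability
print(kappa_bar(0.0, 0.3, 0.5))      # = 0: the classical threshold is recovered
```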
Remark 3 Notice that when $m_{\rm in} = 0$, that is when the patterns are unbiased, we have $a_+(M) = a_-(M)$, so that Eq. (28) can be satisfied only for $m_{\rm out} = 0$, and the storage capacity is fixed by (24) alone, which coincides with the expression found in [19]. This is due to the fact that a perceptron cannot match unbiased patterns with biased classifications. In fact, considering the mapping realized by a perceptron with weights $\mathbf{w}$, one finds that for a random input $\mathbf{x} \in \mathbb{R}^N$ with independent components distributed according to $\Pr(x_j = \pm 1) = 1/2$ for each $j = 1, \ldots, N$, the output $\sigma = \mathrm{sign}(\mathbf{w} \cdot \mathbf{x})$ satisfies $\Pr(\sigma = \pm 1) = 1/2$ (note that $\Pr(\mathbf{w} \cdot \mathbf{x} = 0) = 0$ for all $\mathbf{w} \in \mathbb{R}^N$ except for those belonging to a set with zero Lebesgue measure on the sphere with radius $\sqrt{N}$). In practice, the combined effect of quantum pattern encoding and measurement is to replace the classical stabilizing threshold $\kappa$ with $\bar\kappa$ defined in (26). Then, the classical storage capacity is recovered not only by eliminating the errors due to the quantum pattern encoding, that is by letting $\sigma \to 0$, but also, confirming the argument in Remark 2, when $\sigma \neq 0$, so that the pattern encoding is not sharp and carries quantum fuzziness, but $\epsilon = 1/2$, so that $\Phi^{-1}(1-\epsilon) = 0$ and $\bar\kappa = \kappa$.
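The probabilistic statement in Remark 3 is easily checked by simulation (illustrative sizes; odd $N$ avoids ties in the sign):

```python
# Sketch: for unbiased +-1 inputs, sign(w.x) is unbiased for generic fixed w.
import numpy as np

rng = np.random.default_rng(6)
N = 51
w = rng.normal(size=N)
x = rng.choice([-1.0, 1.0], size=(100_000, N))
out = np.sign(x @ w)
print((out == 1).mean(), (out == -1).mean())   # both ~ 0.5
```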

Results
The numerical results obtained by solving equations (24) and (28) for several values of $m_{\rm in}$, $m_{\rm out}$ and $\kappa$ are shown in Figure 1 and Figure 2. Since the storage capacity depends solely on $\bar\kappa = \kappa + \sigma\Phi^{-1}(1-\epsilon)$, we kept fixed the values $\epsilon = 0.01$ and $\kappa = 0$ and considered different values of $\sigma$, which also allows us to recover the classical limit for $\sigma = 0$. A striking feature distinguishing the quantum perceptron from the classical one is the dependence of the storage capacity on the bias $m_{\rm in}$, which is absent in the classical case with zero stability $\kappa = 0$ (see the left panel of Figure 1). More precisely, as soon as $\sigma > 0$, increasing the value of $|m_{\rm in}|$ while keeping fixed the value of $m_{\rm out}$ always decreases the storage capacity $\alpha^Q_c$. On the other hand, a common feature with the classical case is the divergence of the storage capacity when $m_{\rm out} \to 1$, for each fixed value of $m_{\rm in}$ (see Figure 2). The asymptotic behaviour in this limit, given in (30) and derived analytically in Appendix B (see also Figure 3), confirms the result obtained in the classical scenario.

Remark 4 The asymptotic behaviour when $m_{\rm out} \to 1^-$ does not depend on the pattern bias $m_{\rm in}$, for each fixed $0 \le m_{\rm in} < 1$.
Even if the asymptotic behaviour (30) does not depend on the pattern bias $m_{\rm in}$, one can see from Figure 2 and Figure 3 that, as the input bias $m_{\rm in}$ increases, higher values of $m_{\rm out}$ are required to observe the asymptotic behaviour for the quantum perceptron. This is in contrast with the behaviour of the classical perceptron, where there is no dependence at all on $m_{\rm in}$ (see again Figure 2). In other words, values of $m_{\rm in}$ closer to 1 slow down the attainment of the asymptotic behaviour in the quantum case, which motivates the investigation of the joint limit $m_{\rm in} = m_{\rm out} = m \to 1$. The results obtained (see Figure 4) show another striking difference between the classical and the quantum perceptron: while classically the storage capacity diverges when $m \to 1$, in the quantum case this divergence is suppressed. In particular, from the analytic asymptotic expressions derived in Appendix C, we obtain that the asymptotic behaviour in the quantum scenario reads
$$\alpha^Q_c \sim \frac{1}{\bar\kappa^2}\ ,$$
which is finite for all values of $\sigma > 0$ and $0 \le \epsilon < 1/2$, while the classical divergence (for $\kappa = 0$) is recovered in the classical limit $\sigma = 0$.
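For the parameter values used in the figures, the suppressed asymptote is a simple number; a sketch of the evaluation, using the reading $\alpha^Q_c \sim 1/\bar\kappa^2$ above:

```python
# Quantum asymptote 1/kappa_bar^2 in the joint limit m -> 1 (kappa = 0);
# sigma -> 0 restores the classical divergence.
from scipy.special import ndtri

eps = 0.01
for sigma in (1.0, 0.3, 0.1, 0.01):
    kb = sigma * ndtri(1.0 - eps)
    print(f"sigma={sigma}: asymptotic alpha ~ {1.0 / kb**2:.2f}")
```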
Remark 5 Figure 1 shows that, for $\kappa = 0$, the separation between curves corresponding to different values of $m_{\rm out}$ is reduced in the quantum regime. This is in contrast to what happens for curves corresponding to different values of $m_{\rm in}$, as shown in Figure 2.

Discussion and conclusion
Summarizing, we studied the storage capacity of the continuous variable model of quantum perceptron presented in Ref. [20] in the presence of a bias in the distribution of the patterns and their corresponding classifications. Besides the advantage of allowing an almost entirely analytical study, such a model admits the classical perceptron as a classical limit, thus allowing for a direct comparison of the storage performances in the two cases. We found that the additional randomness introduced in the quantum model gives rise to an effective increment of the stability parameter used in Gardner's statistical approach, $\kappa \to \bar\kappa$, which produces several peculiar features that are not observed in the classical case with zero stability. For instance, the possibility of indefinitely enhancing the storage capacity by increasing the bias of the patterns and their classifications is prevented at the quantum level. Moreover, in contrast to the classical case, when the bias of the patterns and the bias of their corresponding classifications are varied separately, a dependence of the storage capacity on the input patterns' bias appears even when the stability parameter vanishes. Overall, however, the performance of the quantum perceptron model remains below that of the classical one. This is likely due to the fact that the considered quantum model introduces two sources of randomness: one due to the encoding of patterns by means of non-zero width Gaussian states, and another due to the final measurement operation implementing the classical non-linear activation function. It is worth stressing that the modification of the effective threshold, $\bar\kappa - \kappa = \sigma\,\Phi^{-1}(1-\epsilon)$, contains both the contribution of the randomness coming from the width $\sigma$ of the Gaussian encoding of the patterns and the statistical error $\epsilon$ due to the quantum measurement: as a consequence, the worsening of the quantum storage capacity with respect to the classical one cannot be ascribed to only one of them. These results clearly point to the necessity of considering multi-layer quantum perceptrons in order to hope for quantum advantages of the sort coming from linear superpositions and entanglement.

A Replica calculation

Starting from the replicated average in Eq. (21), one introduces integral representations of the constraints, so that the average factorizes over the pattern index; in doing so, auxiliary symbols and the pattern-dependent quantity $C(\Omega, W)$ are introduced. To proceed, it is convenient to use (27) and rewrite (10) accordingly. Then, using the exponential representation of the Dirac delta and the statistical independence of patterns and targets with different indices, the average splits into single-pattern contributions. Since the patterns have components $x^\mu_j = \pm 1$ which are statistically independent and identically distributed, using (17) one computes the mean over the patterns in $C(\Omega, W)$. When $N \gg 1$, only the leading order expansion in $1/N$ of each factor in the product over the index $\mu$ needs to be retained, the remaining terms vanishing as $O(1/\sqrt{N})$. Using $\|\mathbf{w}_\gamma\|^2 = N$ and introducing the overlaps for $\gamma, \beta = 1, \ldots, n$, $\gamma > \beta$, we neglect corrections of order $O(1/\sqrt{N})$. Therefore, to leading order in $1/N$, Eq. (40) simplifies and, inserting the result into (32), one finally arrives at an explicit integral expression. Regrouping together the integrals over $w_{\gamma,j}$ with different $j$ and the same $\gamma$, and noting that in the large $N$ limit the contribution $\frac{1}{\sqrt{N}}\sum_{\gamma=1}^{n} L_\gamma M_\gamma$ can be neglected, one gets, using (13) and at leading order in $N \gg 1$, the saddle-point form of the average in terms of the functions specified below.

A.1 Saddle-point approximation
When $N$ is large, the behaviour of $\overline{[V^Q_N(\mathbf{x}^\mu, \xi^\mu, \kappa, \sigma, \epsilon)]^n}$ can be obtained using the saddle-point approximation, as follows. Setting $z \equiv (E, F, L, M, Q)$ and considering it as a vector in $\mathbb{R}^{n^2+2n}$, let $\left(\frac{\partial^2 G}{\partial z_j \partial z_k}\right)$ be the Hessian $(n^2+2n) \times (n^2+2n)$ matrix at the stationary point. If such a matrix is negative semi-definite, by suitably deforming the integration paths into the complex domain, one can perform $n^2+2n$ Gaussian integrations by rescaling the corresponding integration variables with $\sqrt{N}$, and thus approximate the integral by its saddle-point value, Eq. (58). From Eqs. (20) and (18), one needs to control the behaviour of the ratio $\frac{1}{nN} \log \overline{[V^Q_N(\mathbf{x}^\mu, \xi^\mu, \kappa, \sigma, \epsilon)]^n}$ for $n \to 0^+$ and $N \to +\infty$; its expression follows from Eq. (58) together with Eq. (14).
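The mechanism of the saddle-point (Laplace) approximation can be illustrated on a one-dimensional toy integral, $(1/N)\log \int dx\, e^{N g(x)} \to \max_x g(x)$ as $N \to \infty$; a minimal sketch (the function $g$ here is arbitrary, not the $G$ of the main calculation):

```python
# Toy check of the Laplace (saddle-point) approximation.
import numpy as np

def g(t):
    return -(t - 1.0) ** 2 + 0.3 * np.sin(3.0 * t)   # smooth, single maximum

x = np.linspace(-2.0, 4.0, 200_001)
dx = x[1] - x[0]
for N in (10, 100, 1000):
    I = np.exp(N * g(x)).sum() * dx                  # brute-force integral
    print(N, np.log(I) / N)                          # approaches max g
print("max g =", g(x).max())
```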

A.2 Replica symmetric ansatz
Making use of the replica-symmetric ansatz, which states that the stationary point $z_0$ is replica-insensitive, one seeks it by setting all replica-dependent parameters equal, so that (48) simplifies accordingly. Notice that the argument of the logarithm in (55) is the average of a quantity $\Delta(n)$; using the Gaussian representation of the quadratic terms and straightforward Gaussian integration over the variables $\omega_\gamma$, one checks that the argument of the logarithm in (55) amounts to $\overline{\Delta(n)}^{\,\xi}$. Then, one sees that $\Delta(n)$ consists of $n$ independent integrals with respect to $d\eta_\gamma$, $d\lambda_\gamma$ and $dy_\gamma$. Due to its monotonicity, the function $\Phi(x)$ in (27) is invertible; then, changing the integration variable $\eta$ into $\lambda = \sigma\eta + \kappa$ and using again (27), one reduces $\Delta(n)$ to a single-replica expression raised to the power $n$, with $\bar\kappa = \kappa + \sigma\Phi^{-1}(1-\epsilon)$. Since the replica trick lets $n$ vanish as a continuous quantity, we can use the first-order approximation $z^n \simeq 1 + n\log z$, valid for $n \to 0$. Finally, we notice that the replica-symmetric ansatz makes all matrix and vector entries equal. Then, averaging over the target parameter $\xi$ according to the distribution in (17) yields the following leading behaviour for the function $G_1(M, q)$ when $n \to 0^+$:
$$G_1(M, q) \simeq \log\big(1 + n\, g(M, q)\big) \simeq n\, g(M, q)\ ,$$
with $g(M, q)$ the single-replica average resulting from the previous steps. Because of the replica-symmetric ansatz, $G_2(E, F, L)$ in (56) can be recast as a Gaussian integral over $w = (w_1, \ldots, w_n)$; rewriting the double sum $\sum_{\gamma<\beta=1}^{n}$ accordingly and using (63), after straightforward Gaussian manipulations and integration one obtains its explicit form.
At leading order in $n \to 0^+$, the replica-symmetric ansatz yields the leading order behaviour (71) for (57). Using (65), (70) and (71) in (54), we get, at leading order in $n \to 0^+$, the expression (72) for $G(z)$. The stationary point $z_0$ is then found by requiring
$$\frac{\partial G(z)}{\partial E} = \frac{\partial G(z)}{\partial F} = \frac{\partial G(z)}{\partial L} = 0\ ,$$
which yields the stationary point components; from (72) one then obtains the reduced expression (75). The sought stationary point of $G(z)$ is finally obtained by also setting to zero the derivatives with respect to the remaining parameters; using (66), one explicitly computes them with respect to $\beta = q, M$. As observed at the end of Section 3, the optimal storage capacity is obtained when $q$ in (44), namely the average overlap of random weights, tends to 1. The condition $q \to 1$ implies that the arguments of the functions $\Phi$ in (66) tend to $\pm\infty$. Then, one can use the asymptotic behaviour of the error function, together with (27), to obtain (77). Therefore, for $q \to 1^-$, the vanishing Gaussian terms in (76) can only be compensated for $x\sqrt{q} \ge a_\pm(M)$, so that the corresponding integration domains are restricted accordingly. Furthermore, using (67), one obtains (79) which, together with (75), yields (28). On the other hand, (74) and (75) yield (81). Thus, from (81) and (79) one retrieves (24) as the leading term in $(1-q)^{-1}$ in the limit $q \to 1^-$.

B Large output bias limit
In order to extract the leading order behaviour of the quantum critical storage capacity when the target bias $m_{\rm out} \to 1^-$, we distinguish two possibilities. Firstly, in this Appendix, we keep the input bias $m_{\rm in}$ fixed and let $m_{\rm out} \to 1^-$; then, in the next one, we treat the case $m_{\rm in} = m_{\rm out} = m \to 1^-$. Consider the equation satisfied by $M$, with $a_\pm(M)$ as in (25). If $m_{\rm out} \to 1^-$ while $m_{\rm in}$ is kept fixed, $M$ must diverge to $+\infty$: otherwise, the left-hand side could not vanish, being the integral of a positive function. Then, in the following, we shall consider $m_{\rm out}$ close to 1, so that $a_+(M) < 0$ and $a_-(M) > 0$. In the classical case $\bar\kappa = \kappa$, and the limit behaviour of the critical classical storage capacity is known. We proceed by recasting the two storage capacity defining equations, (24) and (28), in terms of Gaussian and error functions, Eqs. (83) and (84). Equation (84) can be satisfied in the limit $m_{\rm out} \to 1^-$ only if its right-hand side vanishes, which can happen only for values of $M$ such that $a_\pm(M) \to \mp\infty$. For small but finite values of $1 - m_{\rm out}$, the solution of (84) is obtained for $\mp a_\pm(M) \gg 1$. Using the asymptotic behaviour in (77) with $\mp a_\pm(M) \gg 1$, one gets (89), from which the limit of strongly correlated targets, $m_{\rm out} \to 1^-$, is as in (30), for all pattern biases $0 \le m_{\rm in} \le 1$. However, it must be emphasized that, because of (82), the limit is reached with possibly quite different slopes, depending both on the quantum parameter $\sigma\Phi^{-1}(1-\epsilon)$ and on the degree of independence of the input patterns, measured by $m_{\rm in}$.
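The tail asymptotics invoked here (presumably the standard expansion $1 - \Phi(t) \simeq e^{-t^2/2}/(t\sqrt{2\pi})$ for $t \gg 1$) can be checked directly:

```python
# Check of the Gaussian tail asymptotics: 1 - Phi(t) ~ exp(-t^2/2)/(t*sqrt(2*pi)).
import numpy as np
from scipy.stats import norm

for t in (2.0, 4.0, 6.0):
    exact = norm.sf(t)                                 # 1 - Phi(t)
    asym = np.exp(-t * t / 2.0) / (t * np.sqrt(2.0 * np.pi))
    print(t, exact, asym, asym / exact)                # ratio -> 1
```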

C Simultaneously large input and output bias
Setting $m_{\rm in} = m_{\rm out} = m$, the quantity $M$ has to be chosen such that Eq. (90) holds, with the shorthand notation introduced in (91). In the limit $m \to 1^-$, the quantities $a_\pm(M)$ in (25) both diverge unless $M = \bar\kappa$. Notice, however, that inserting $M = \bar\kappa$ into (90) and taking the limit, the equality (90) cannot be satisfied, since $a_-(\bar\kappa) = 0$. The functional dependence of $M$ on $m$ when $m \to 1^-$, implicitly determined by (100), cannot be given in terms of simple functions and can be obtained only numerically; however, (100) implies the limits (101) and (102). Finally, using (101) and (102), (86) and (87), together with (92), from (83) one gets (103).

Figure 1: Storage capacity $\alpha^Q_c$ vs $m_{\rm in}$, for $\kappa = 0$ and $\epsilon = 0.01$. Only values $0 < m_{\rm in} < 1$ are shown, but the results are symmetric with respect to $m_{\rm in} = 0$. (Left) The bias on the target classification is fixed to $m_{\rm out} = 0.6$, the curves corresponding to different values of $\sigma$. In the classical case ($\sigma = 0$), $\alpha^Q_c$ does not depend on $m_{\rm in}$; in the quantum case ($\sigma > 0$), increasing the value of $|m_{\rm in}|$ decreases $\alpha^Q_c$. Furthermore, increasing $\sigma$ always decreases the storage capacity. (Center) The value $\sigma = 0.1$ is fixed, the curves corresponding to different values of $m_{\rm out}$; increasing the value of $|m_{\rm out}|$ always increases the storage capacity. (Right) As before, with $\sigma = 0.3$, showing the lowering of the perceptron performance with increasing quantum fuzziness in the pattern encoding.

Figure 2: (Left) Storage capacity $\alpha^Q_c$ vs $m_{\rm out}$ for different values of $m_{\rm in}$, with $\kappa = 0$, $\sigma = 1$ and $\epsilon = 0.01$. Increasing $|m_{\rm out}|$ always increases $\alpha^Q_c$, while increments of $|m_{\rm in}|$ have the opposite effect. There is a divergence for $|m_{\rm out}| \to 1$ for each value of $m_{\rm in}$, although higher values of $|m_{\rm out}|$ are required to observe the divergence if $|m_{\rm in}|$ is increased. (Right) Storage capacity in the classical limit, obtained for $\sigma = 0$ (all other parameter values are unchanged). All the curves corresponding to different values of $m_{\rm in}$ collapse onto each other, since in this case there is no dependence on $m_{\rm in}$ (recall that here $\kappa = 0$).

Figure 3: Limit $\alpha_c$ for $m_{\rm out} \to 1$, showing that the convergence to the asymptotic behaviour (30), represented here by the line $y = x + q$, holds for several values of $m_{\rm in}$, although the onset of the asymptotic behaviour appears later for $m_{\rm in}$ close to one.

Figure 4: Storage capacity in the joint limit $m_{\rm in} = m_{\rm out} = m$. The case $\bar\kappa = 0$ corresponds to the classical case with $\kappa = 0$, where the storage capacity diverges in the limit; in the quantum scenario ($\bar\kappa > 0$) the divergence is suppressed. The dashed lines correspond to the asymptotic value $\alpha_c \sim 1/\bar\kappa^2$.