Quantum-enhanced learning of rotations about an unknown direction

We design machines that learn how to rotate a quantum bit about an initially unknown direction, encoded in the state of a spin-$j$ particle. We show that a machine equipped with a quantum memory of $O(\log j)$ qubits can outperform all machines with purely classical memory, even if the size of their memory is arbitrarily large. The advantage is present for every finite $j$ and persists as long as the quantum memory is accessed no more than $O(j)$ times. We establish these results by deriving the ultimate performance achievable with purely classical memories, thus providing a benchmark that can be used to experimentally demonstrate the implementation of quantum-enhanced learning.

black box, the state of the probe is $|\phi_{\theta,g}\rangle = U^{(j)}_g |\phi_\theta\rangle$, where $U^{(j)}_g$ is the unitary matrix representing the action of the rotation $g$.
Scenario 2 is also relevant to the study of quantum reference frames [28]. Our learning task can be translated into a distributed quantum protocol involving two distant parties, Alice and Bob, who do not share a reference frame for spatial directions. The goal of the protocol is to allow Bob to rotate a target particle by a desired angle θ about the direction of Alice's z-axis, encoded in the state of a spin-j particle prepared by Alice and sent to Bob as a token of her reference frame. In this setting, the unknown rotation g describes the mismatch between Alice's and Bob's Cartesian axes, and the optimal learning strategy provides the optimal protocol for encoding the direction of Alice's z-axis in a spin-j particle and for rotating Bob's target particle accordingly.
A key difference between Scenarios 1 and 2 is that the initial state of the probe is irrelevant in Scenario 1 (every initial state is reset to the state $|j, j\rangle_n$), while it can be optimised in Scenario 2. More generally, the optimal probe state in Scenario 2 could be an entangled state involving, in addition to the spin-$j$ particle, an auxiliary system stored in the internal memory of the machine. Nevertheless, we will show that such an auxiliary system does not increase the accuracy in the execution of the desired rotation, and therefore it can be omitted without loss of generality.
In this paper we establish the optimal learning strategies for both Scenarios 1 and 2, focussing on the case where the target particle is a qubit. A summary of the key results is as follows. For $j > 1$, we find that the optimal strategies for Scenarios 1 and 2 coincide. In both cases, the optimal learning strategy consists in
1. preparing the probe in the initial state $|\phi_\theta\rangle = |j, j\rangle$, the eigenstate of $J_z$ with maximum eigenvalue $j$,
2. imprinting the direction in the probe, and storing the resulting state in a quantum memory of $\log(2j + 1)$ qubits,
3. retrieving the probe's state from the memory and letting it interact with the target through the isotropic Heisenberg interaction $H \propto \sigma_x J_x + \sigma_y J_y + \sigma_z J_z$, where $(\sigma_i)_{i=x,y,z}$ are the Pauli matrices for the target qubit.
Notably, the structure of the optimal learning machine is independent of the desired rotation angle θ: a single probe state and a single interaction Hamiltonian work optimally for all possible angles. The rotation angle only affects the interaction time between the probe and the target.
For every $j > 1$, we prove that the optimal machine with quantum memory outperforms every machine with purely classical memory. We determine the optimal fidelity over all machines with purely classical memory, providing a benchmark that can be used to demonstrate the advantage of quantum memories in realistic experiments. For example, we show that the optimal classical strategy for $j = 3/2$ and $\theta = \pi$ has fidelity 64%, while the optimal quantum strategy has fidelity 71%. As a consequence, every experimental fidelity above 64% guarantees the demonstration of quantum-enhanced learning. In general, we show that a non-zero quantum advantage is present for every rotation angle $\theta \neq 0$ and for every $j > 1$. We also prove that the advantage persists even if the memory is accessed multiple times, as long as the number of accesses to the memory is $O(j)$. In Scenario 1, we find that the quantum advantage persists at non-zero temperature $T$, as long as the magnetic energy $\mu B$ is large compared to the thermal fluctuation $k_B T$, $k_B$ being the Boltzmann constant.
For $j = 1$, we find a striking difference between Scenarios 1 and 2. In Scenario 1, the quantum memory offers an advantage for all possible rotation angles. In Scenario 2, the advantage disappears when the rotation angle approaches $\pi$. In that regime, the optimal strategy consists in 1. preparing the probe in the initial state $|\phi_\theta\rangle = |1, 0\rangle$, the eigenstate of $J_z$ with eigenvalue $m = 0$, 2. sending the probe to the unknown gate U
For $j = 1/2$, the optimal strategies for Scenarios 1 and 2 coincide, and the availability of a quantum memory offers advantages for all rotation angles except $\theta = 0$ and $\theta = \pi$.
The paper is structured as follows. In Section II we introduce the problem of learning a rotation about an unknown direction, considering two alternative ways of imprinting the direction into the state of a spin-j probe. We derive the optimal quantum strategy in Section III, and the corresponding quantum benchmark in Section IV. In Section V, we show that the advantage persists even if the memory state is accessed multiple times, and in Section VI, we show that the optimal learning strategy for Scenario 1 is robust to thermal noise. In Section VII, we extend our results from qubits to systems of arbitrary dimensions. The conclusions are drawn in Section VIII.

II. LEARNING HOW TO ROTATE ABOUT AN UNKNOWN AXIS
In this section we introduce the task of learning how to rotate a quantum particle about an initially unknown axis. We consider two scenarios, in which the unknown axis is imprinted in a quantum probe via two physically different processes: (1) spin relaxation, and (2) action of an unknown rotation gate. We formalise the optimisation problems corresponding to these scenarios and establish a relation between the corresponding solutions.
A. Scenario 1: learning from a relaxation process
Suppose that a static magnetic field $B = (B_x, B_y, B_z)$ is turned on for a limited amount of time in a bounded region of space. While the field is turned on, a spin-$j$ particle is placed in the region and undergoes a relaxation process, whereby its spin becomes aligned with the field's direction. After the alignment has taken place, the state of the particle is stored in the internal memory of a quantum machine, which will later use it to rotate a target particle by a desired angle $\theta$ about the direction $n = B/\|B\|$.
We denote the spin-$j$ particle as $P_j$, and let $J_x$, $J_y$ and $J_z$ be its spin operators, satisfying the commutation relations $[J_x, J_y] = iJ_z$, $[J_y, J_z] = iJ_x$, and $[J_z, J_x] = iJ_y$. Throughout the paper we use the standard notation $|j, m\rangle$ (respectively, $|j, m\rangle_n$) for the eigenstate of the operator $J_z$ (respectively, $n \cdot J$) with eigenvalue $m$.
The alignment of the magnet with the external magnetic field can be described by a thermalisation process, whereby the initial state of the magnet converges to the thermal state of the magnetic Hamiltonian $H = -\mu\, B \cdot J = -\mu (B_x J_x + B_y J_y + B_z J_z)$, where $\mu > 0$ is a suitable constant. For simplicity, we will assume that the temperature of the bath is low enough that the thermal state is approximately the ground state of $H$. Explicitly, the ground state is the spin coherent state $|j, j\rangle_n$.
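As a rough illustration of the low-temperature assumption, the following Python sketch computes the Boltzmann populations of the levels $m = -j, \dots, j$ of $H = -\mu\, B \cdot J$ (energies $-\mu \|B\| m$ along the field direction) and the resulting weight of the ground state $|j, j\rangle_n$. The function names and the dimensionless parameter `mu_B_over_kT` (standing for $\mu \|B\| / k_B T$) are illustrative choices, not notation taken from the paper.

```python
import math


def thermal_populations(j, mu_B_over_kT):
    """Boltzmann populations of the levels m = -j..j for energies E_m = -mu*B*m.

    `mu_B_over_kT` is the (hypothetical) dimensionless ratio mu*B / (k_B*T).
    """
    two_j = round(2 * j)
    ms = [-j + k for k in range(two_j + 1)]
    # Weight exp(-E_m / k_B T) = exp(x * m); shift by j to avoid overflow.
    weights = [math.exp(mu_B_over_kT * (m - j)) for m in ms]
    z = sum(weights)
    return {m: w / z for m, w in zip(ms, weights)}


def ground_state_population(j, mu_B_over_kT):
    """Population of the level m = j, i.e. of the spin coherent state |j, j>_n."""
    return thermal_populations(j, mu_B_over_kT)[j]
```

For instance, `ground_state_population(1.5, 10.0)` is very close to 1, while at infinite temperature (`mu_B_over_kT = 0`) all four levels of a spin-3/2 particle are equally populated.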
Overall, the alignment process can be modelled as a quantum channel (completely positive trace-preserving map) $\mathcal{T}_n$ that resets every state of the probe to the state $|j, j\rangle_n$. In Section VI we will extend our discussion to the finite-temperature scenario, where the channel $\mathcal{T}_n$ resets the probe state to the thermal state of the magnetic dipole Hamiltonian.
The goal of the quantum machine is to rotate a target particle $S$ by a given angle $\theta$ about the direction $n$. We will mostly focus on the case where the target is a spin-1/2 particle, regarded as a qubit. In this case, we denote the target rotation by
$$V_{\theta,n} := \cos\frac{\theta}{2}\, I - i \sin\frac{\theta}{2}\; n \cdot \sigma,$$
where $\sigma = (\sigma_x, \sigma_y, \sigma_z)$ are the three Pauli matrices, and $n \cdot \sigma := n_x \sigma_x + n_y \sigma_y + n_z \sigma_z$.
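The definition of the target rotation can be checked numerically. The sketch below, written with the Python standard library only, builds the 2-by-2 matrix $\cos(\theta/2) I - i \sin(\theta/2)\, n \cdot \sigma$ for a given angle and axis and provides helpers to verify that it is unitary and that rotations about the same axis compose additively. The helper names (`rotation_gate`, `matmul`, `dagger`) are illustrative and do not appear in the paper.

```python
import math


def rotation_gate(theta, n):
    """Return cos(theta/2) I - i sin(theta/2) n.sigma as a nested 2x2 list.

    The axis `n` is normalised internally, so any non-zero vector is accepted.
    """
    nx, ny, nz = n
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    nx, ny, nz = nx / norm, ny / norm, nz / norm
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    # n.sigma = [[nz, nx - i ny], [nx + i ny, -nz]]
    return [[c - 1j * s * nz, -1j * s * (nx - 1j * ny)],
            [-1j * s * (nx + 1j * ny), c + 1j * s * nz]]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix given as a nested list."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]
```

With these helpers, `matmul(V, dagger(V))` is the identity up to rounding for any `V = rotation_gate(theta, n)`, and `matmul(rotation_gate(a, n), rotation_gate(b, n))` agrees with `rotation_gate(a + b, n)`.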
To learn how to implement the target rotation, the machine will transfer information from the magnet to its internal memory $M$. Mathematically, this operation is described by a quantum channel (completely positive trace-preserving map) $\mathcal{E}_\theta$ transforming states of $P_j$ into states of $M$. To be completely general, we allow the channel to depend on the desired angle $\theta$. If the memory is classical, the channel $\mathcal{E}_\theta$ represents a measurement on the magnet, followed by the storage of the outcome in the memory. If the memory is quantum, the channel $\mathcal{E}_\theta$ can be any process transforming states of the magnet into states of the memory.
When asked to perform the target rotation, the machine will retrieve information from its internal memory, and will use such information to control the evolution of the target system, hereafter denoted by $S$. If the memory is classical, the control amounts to a conditional operation on the target depending on the classical data stored in the memory. If the memory is quantum, the control can be any general interaction between the memory and the target system. In both cases, the control operation can be described by a quantum channel $\mathcal{R}_\theta$ transforming joint states of the composite system $M \otimes S$ into states of $S$.
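Overall, the structure of the learning process is depicted in Figure 1. If the initial state of the target is $|\psi\rangle$, then the final state is
$$\rho_{\theta,n} = \mathcal{C}_\theta\big(|j, j\rangle\langle j, j|_n \otimes |\psi\rangle\langle\psi|\big),$$
where $\mathcal{C}_\theta := \mathcal{R}_\theta \circ (\mathcal{E}_\theta \otimes \mathcal{I}_S)$ is the effective quantum channel transforming joint states of the probe and the target into states of the target alone.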
To evaluate the accuracy of the learning process, we compare the output state $\rho_{\theta,n}$ with the desired output $V_{\theta,n}|\psi\rangle\langle\psi|V^\dagger_{\theta,n}$. As a figure of merit, we use the average input-output fidelity [29]
$$F_1(j, \theta) := \int dn \int d\psi\; \langle\psi| V^\dagger_{\theta,n}\, \mathcal{C}_\theta\big(|j, j\rangle\langle j, j|_n \otimes |\psi\rangle\langle\psi|\big)\, V_{\theta,n} |\psi\rangle, \qquad (2)$$
where $dn$ is the rotationally-invariant probability distribution on the unit sphere, $|\psi\rangle$ is the initial state of the target qubit, and $d\psi$ is the unitarily invariant probability distribution on the pure states. The associated optimisation problem is:
Problem 1 Find the quantum channel $\mathcal{C}_\theta$ that maximises the fidelity $F_1(j, \theta)$ in Equation (2).
FIG. 1. Learning from a relaxation process. A spin-$j$ particle, initially in the state $|\phi\rangle$, undergoes a relaxation process $\mathcal{T}_n$, which aligns its spin with the direction $n$ of an external magnetic field. Information about the direction is then transferred from the spin-$j$ particle into the machine's internal memory $M$. The task of the machine is to rotate a target qubit $S$ by an angle $\theta$ about the direction $n$. To this purpose, the machine will perform a joint operation $\mathcal{R}_\theta$ on its internal memory and on the target, designed to approximate the desired unitary gate $V_{\theta,n}$.
The optimisation can be performed with different constraints on the channel $\mathcal{C}_\theta$, corresponding to different assumptions on the machine's internal memory. In this paper, we will consider two cases:
1. The machine is equipped with a quantum memory of $\log(2j + 1)$ qubits. In this case, the channel $\mathcal{C}_\theta$ is an arbitrary completely positive trace-preserving map.
2. The machine is equipped with a classical memory of arbitrary size. In this case, the channel $\mathcal{C}_\theta$ must be decomposable into a measurement on the probe followed by a conditional operation on the target.
We will carry out both optimisations and compare the maximum fidelity achievable with a quantum memory with the maximum fidelity achievable with classical memories of arbitrary size.
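To make the size of the quantum memory concrete, the sketch below counts the qubits needed to hold the state of a spin-$j$ probe, whose Hilbert space has dimension $2j + 1$. Rounding $\log_2(2j + 1)$ up to the next integer is an assumption made here for values of $j$ where $2j + 1$ is not a power of two; the function name `memory_qubits` is hypothetical.

```python
from fractions import Fraction


def memory_qubits(j):
    """Smallest number of qubits whose Hilbert space can hold a spin-j state.

    `j` may be an int, a Fraction, or a float such as 1.5; 2j must be a
    non-negative integer.
    """
    two_j = 2 * Fraction(j)
    if two_j.denominator != 1 or two_j < 0:
        raise ValueError("2j must be a non-negative integer")
    dim = int(two_j) + 1
    # ceil(log2(dim)) computed exactly with integer arithmetic
    return (dim - 1).bit_length()
```

For example, a spin-3/2 probe fits in 2 qubits, and a spin-100 probe (dimension 201) fits in 8 qubits, illustrating the logarithmic growth of the memory with $j$.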
B. Scenario 2: learning from a rotation gate
Consider the following general problem. A quantum machine has access to one use of a black box implementing some unknown unitary gate $U_x$, randomly drawn from some set $(U_x)_{x \in X}$. By interacting with the black box, the machine has to learn how to perform another unitary gate $V_x$, acting on a target system $S$. Typically, the gate learning problems considered so far correspond to the case $V_x = U_x$ (the machine attempts to emulate the gate $U_x$ [14][15][16]), or to the case $V_x = U^\dagger_x$ (the machine attempts to invert the gate $U_x$ [14,27]). In general, the relation between $U_x$ and $V_x$ can be arbitrary.
To learn the target gate, the machine sends a probe $P$ to the black box. In general, the probe can be entangled with an auxiliary system $A$, stored in the machine's internal memory. If the initial state of the composite system $P \otimes A$ is $|\phi\rangle$, then the state after the action of the black box is
$$|\phi_x\rangle := (U_x \otimes I_A)|\phi\rangle,$$
where $I_A$ denotes the identity operator on the auxiliary system. After the black box has acted, the probe returns to the machine, which transfers information from the state $|\phi_x\rangle$ to its internal memory $M$. The transfer of information is described by a quantum channel $\mathcal{E}$ with input system $P \otimes A$ and output system $M$. Overall, the imprinting of the parameter $x$ in the machine's memory is called the training phase. Accordingly, we call $U_x$ the training gate.
After the training phase has been concluded, the machine will be asked to perform the gate V x on the target system. We call this phase the execution phase. The machine will access its internal memory and use it to control the evolution of the target system. The control mechanism is described by a quantum channel R with input system M ⊗ S and output system S.
We call the above scenario the $U_x$-to-$V_x$ learning problem. Its overall structure is summarised in Figure 2. The temporal separation between the training phase and the execution phase makes the $U_x$-to-$V_x$ learning problem distinct from the problem of simulating the gate $V_x$ using the gate $U_x$ as a resource [30][31][32][33][34][35]. In that problem, the gate $V_x$ is simulated by an arbitrary circuit using the gate $U_x$, not necessarily a circuit of the form depicted in Figure 2.
FIG. 2. Learning from a unitary gate: the $U_x$-to-$V_x$ learning problem. A machine learns to perform a target gate $V_x$ by probing a training gate $U_x$. In the training phase, a probe is prepared together with an auxiliary system in a joint state $|\phi\rangle$. The probe undergoes the gate $U_x$, and the joint state of the probe and auxiliary system becomes $|\phi_x\rangle := (U_x \otimes I_A)|\phi\rangle$. Information from the state $|\phi_x\rangle$ is then transferred to the machine's internal memory through an encoding channel $\mathcal{E}$. In general, the memory can be quantum, classical, or a hybrid quantum-classical system. In the execution phase, the machine performs a joint operation $\mathcal{R}$ on its internal memory and on the target system, in an attempt to perform $V_x$ on the target.
A general result by Bisio et al. concerns the case where the set $X$ is a group and the mappings $x \mapsto U_x$ and $x \mapsto V_x$ (or $x \mapsto V^\dagger_x$) are two unitary representations of the group $X$. In this scenario, the authors showed that the optimal learning performance can be achieved with a purely classical memory [14]. In this paper, we present an instance of the $U_x$-to-$V_x$ learning problem that evades the no-go theorem of Bisio et al. In our scenario, $x$ is a rotation $g \in SO(3)$, the probe is a spin-$j$ particle $P_j$, the training gate is the unitary gate $U^{(j)}_g$ that implements the rotation $g$ on the probe, the target is a qubit, and the target gate is the rotation $V_{\theta,g}$ defined by
$$V_{\theta,g} := U_g V_\theta U^\dagger_g,$$
where $\theta \in [0, 2\pi)$ is a fixed, but otherwise arbitrary angle, $U_g$ is the 2-by-2 unitary matrix representing the rotation $g$, and $V_\theta = \cos\frac{\theta}{2}\, I - i \sin\frac{\theta}{2}\, \sigma_z$ is the 2-by-2 matrix representing a rotation by $\theta$ about the $z$-axis. Since the rotation angle is fixed, the target operations do not form a group, and therefore our learning problem falls outside the hypotheses of the no-go theorem of Bisio et al.

FIG. 3. Learning from a rotation gate.
A machine has access to a rotation gate $U^{(j)}_g$, which implements a rotation $g$ on a quantum particle of spin $j$, denoted by $P_j$. The goal is to learn how to perform rotations on a target qubit, denoted by $S$. Specifically, the machine is designed to perform the qubit gate $V_{\theta,g} = U_g V_\theta U^\dagger_g$, where $\theta$ is a generic angle, and $V_\theta$ is a rotation by $\theta$ about the $z$ axis. In the training phase, the machine probes the gate $U^{(j)}_g$ by preparing the spin-$j$ particle and an auxiliary system $A$ in a joint state $|\phi_\theta\rangle$. The output state $(U^{(j)}_g \otimes I_A)|\phi_\theta\rangle$ is then stored in the internal memory of the machine and is retrieved in the execution phase, when the machine performs a joint operation $\mathcal{R}_\theta$ designed to approximate the action of the target gate $V_{\theta,g}$.
The $U_g$-to-$V_{\theta,g}$ learning problem is also relevant to the study of quantum reference frames [28]. Suppose that two distant parties, Alice and Bob, do not share a reference frame for directions. This means that Bob's Cartesian axes $n_i$, for all $i \in \{x, y, z\}$, are related to Alice's Cartesian axes by an unknown rotation $g$. Now, imagine that Bob wants to rotate a qubit by an angle $\theta$ about the direction of Alice's $z$-axis. To assist Bob in this task, Alice will send him a quantum system carrying information about her reference frame. If the transmitted system is a spin-$j$ particle, prepared by Alice in the state $|\phi_\theta\rangle$, then Bob will receive the particle in the state $U^{(j)}_g |\phi_\theta\rangle$, owing to the mismatch of their reference frames. Using the state $U^{(j)}_g |\phi_\theta\rangle$ as a resource, Bob can attempt to execute the desired rotation, corresponding to the unitary gate $V_{\theta,g} = U_g V_\theta U^\dagger_g$. More generally, Alice could send Bob a spin-$j$ particle together with an auxiliary particle $A$ whose state space is invariant under rotations. In this case, Bob will receive the state $(U^{(j)}_g \otimes I_A)|\phi_\theta\rangle$, where $|\phi_\theta\rangle$ is the initial state of the spin-$j$ particle and the auxiliary particle. In this setting, the search for the optimal communication protocol between Alice and Bob is equivalent to the search for the optimal learning strategy for the $U_g$-to-$V_{\theta,g}$ learning problem.
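A diagrammatic representation of the $U_g$-to-$V_{\theta,g}$ learning problem is provided in Figure 3. The spin-$j$ probe and the auxiliary system $A$ start off in the state $|\phi_\theta\rangle$. Then, the probe is sent through the gate $U^{(j)}_g$. After the action of the gate $U^{(j)}_g$, the probe and system $A$ will be in the state
$$|\phi_{\theta,g}\rangle := \big(U^{(j)}_g \otimes I_A\big)|\phi_\theta\rangle,$$
where $I_A$ is the identity on the auxiliary system's Hilbert space. Then, the state $|\phi_{\theta,g}\rangle$ is encoded in the machine's memory via a channel $\mathcal{E}_\theta$. In the execution phase, the machine will perform a quantum channel $\mathcal{R}_\theta$, transforming the input state of the memory and the target into the output state of the target. The average fidelity for the $U_g$-to-$V_{\theta,g}$ learning task is
$$F_2(j, \theta) := \int dg \int d\psi\; \langle\psi| V^\dagger_{\theta,g}\, \mathcal{C}_\theta\big(|\phi_{\theta,g}\rangle\langle\phi_{\theta,g}| \otimes |\psi\rangle\langle\psi|\big)\, V_{\theta,g} |\psi\rangle, \qquad (6)$$
where $dg$ is the normalized Haar measure over the rotation group, and $\mathcal{C}_\theta := \mathcal{R}_\theta \circ (\mathcal{E}_\theta \otimes \mathcal{I}_S)$ is the effective channel transforming states of the composite system $P_j \otimes A \otimes S$ into states of $S$. This leads to the following optimisation problem:
Problem 2 Find the auxiliary system $A$, the input state $|\phi_\theta\rangle$, and the channel $\mathcal{C}_\theta$ that maximise the fidelity $F_2(j, \theta)$ in Equation (6).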

III. OPTIMAL QUANTUM STRATEGIES
Here we determine the optimal quantum strategies for learning rotations around an unknown direction. We solve Problems 1 and 2 defined in the previous section for all values of the spin $j$ and for all values of the rotation angle $\theta$. For $j > 1$, we show that the optimal state for Problem 2 is the spin coherent state $|j, j\rangle$, and therefore the optimal fidelity coincides with the optimal fidelity for Problem 1. In both problems, the best approximation of the target rotation is realised by setting up an isotropic Heisenberg interaction between the target and the probe. For $j = 1/2$ and $j = 1$, we find some curious features of the optimal strategies. Notably, the optimal solution of Problem 2 deviates from the optimal solution of Problem 1 for $j = 1$ when the rotation angle approaches $\pi$.
A. Structure of the optimal solution of Problem 2
Here we focus on Problem 2 and determine the structure of its optimal solution. The main result is the following theorem:
Theorem 1 The optimal strategy for learning the target gate $V_{\theta,g} = U_g V_\theta U^\dagger_g$ from the training gate $U^{(j)}_g$ has the following features:
1. no auxiliary system is needed,
2. the optimal input state is an eigenstate of $J_z$,
3. the optimal quantum channel is rotationally covariant, namely
$$\mathcal{C}_\theta \circ \big(\mathcal{U}^{(j)}_g \otimes \mathcal{U}_g\big) = \mathcal{U}_g \circ \mathcal{C}_\theta \qquad \forall g \in SO(3),$$
where $\mathcal{U}_g$ and $\mathcal{U}^{(j)}_g$ are the quantum channels induced by the unitary gates $U_g$ and $U^{(j)}_g$, respectively.
The theorem follows from two lemmas:
Lemma 1 No auxiliary system is needed in the optimal strategy for learning the gate $V_{\theta,g}$ from the gate $U^{(j)}_g$. The optimal input is an eigenstate of the $z$-component of the angular momentum.
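Proof. Note that the target gate $V_{\theta,g}$ satisfies the relation $V_{\theta,g} = V_{\theta,gh}$ for every rotation $h$ around the $z$ axis. Then, the fidelity (6) can be rewritten as
$$F_2(j, \theta) = \int dh \int dg \int d\psi\; \langle\psi| V^\dagger_{\theta,gh}\, \mathcal{C}_\theta(\phi_{\theta,g} \otimes \psi)\, V_{\theta,gh} |\psi\rangle = \int dh \int dk \int d\psi\; \langle\psi| V^\dagger_{\theta,k}\, \mathcal{C}_\theta(\phi_{\theta,kh^{-1}} \otimes \psi)\, V_{\theta,k} |\psi\rangle,$$
where $dh$ is the normalized invariant measure over the rotations about the $z$ axis, we used the shorthand notation $\chi := |\chi\rangle\langle\chi|$, and we derived the second equality from the invariance of the Haar measure under the change of variables $k = gh$. Defining the average state
$$\langle\phi_\theta\rangle := \int dh\; \big(U^{(j)}_h \otimes I_A\big)\, \phi_\theta\, \big(U^{(j)}_h \otimes I_A\big)^\dagger$$
and its rotated version $\langle\phi_{\theta,g}\rangle := \big(U^{(j)}_g \otimes I_A\big) \langle\phi_\theta\rangle \big(U^{(j)}_g \otimes I_A\big)^\dagger$, the fidelity can be expressed as
$$F_2(j, \theta) = \int dg \int d\psi\; \langle\psi| V^\dagger_{\theta,g}\, \mathcal{C}_\theta\big(\langle\phi_{\theta,g}\rangle \otimes \psi\big)\, V_{\theta,g} |\psi\rangle.$$
Since $\langle\phi_\theta\rangle$ is the average of $\phi_\theta$ over all rotations about the $z$ axis, it can be expressed as
$$\langle\phi_\theta\rangle = \sum_m p_m\, |j, m\rangle\langle j, m| \otimes |\alpha_m\rangle\langle\alpha_m|,$$
where $\{p_m\}$ is a probability distribution and $|\alpha_m\rangle$ is a pure state of the auxiliary system. Since the fidelity is linear in the input state, the optimal choice is to pick one of the terms in the mixture, such as $|j, m\rangle\langle j, m| \otimes |\alpha_m\rangle\langle\alpha_m|$. Moreover, the state of the auxiliary system can be absorbed in the definition of the channel $\mathcal{C}_\theta$. This concludes the proof that the optimal input state can be chosen to be $|j, m\rangle$ without loss of generality and that no auxiliary system is needed.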
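The averaging step in the proof of Lemma 1 rests on a simple fact: a rotation by an angle $\varphi$ about the $z$ axis multiplies the coherence $|j, m\rangle\langle j, m'|$ by the phase $e^{-i(m - m')\varphi}$, so that averaging over $\varphi$ removes every term with $m \neq m'$. The following sketch checks this numerically by averaging the phase over equally spaced angles; the function name and the number of samples are illustrative choices.

```python
import cmath
import math


def z_average_factor(m, m_prime, samples=64):
    """Average of exp(-i (m - m') phi) over equally spaced phi in [0, 2 pi).

    For spin-j eigenvalues the difference m - m' is always an integer, so the
    discrete average coincides with the continuous one whenever
    |m - m'| < samples.
    """
    delta = m - m_prime
    total = sum(cmath.exp(-1j * delta * 2 * math.pi * k / samples)
                for k in range(samples))
    return total / samples
```

The factor equals 1 for $m = m'$ and vanishes (up to rounding) otherwise, which is why only the diagonal terms in the $J_z$ eigenbasis survive the average.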
Consistently with the above result, we will omit the auxiliary system A from now on.

Lemma 2
The optimal channel $\mathcal{C}_\theta$ for learning the gate $V_{\theta,g}$ from the gate $U^{(j)}_g$ can be chosen to be covariant without loss of generality.
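Proof. The optimality of covariant channels follows from the following chain of equalities:
$$F_2(j, \theta) = \int dg \int d\psi\; \langle\psi| U_g V^\dagger_\theta U^\dagger_g\; \mathcal{C}_\theta\big(\mathcal{U}^{(j)}_g(\phi_\theta) \otimes \psi\big)\; U_g V_\theta U^\dagger_g |\psi\rangle$$
$$= \int dg \int d\psi'\; \langle\psi'| V^\dagger_\theta\; \big[\mathcal{U}^\dagger_g \circ \mathcal{C}_\theta \circ \big(\mathcal{U}^{(j)}_g \otimes \mathcal{U}_g\big)\big](\phi_\theta \otimes \psi')\; V_\theta |\psi'\rangle$$
$$= \int d\psi'\; \langle\psi'| V^\dagger_\theta\; \mathcal{C}'_\theta(\phi_\theta \otimes \psi')\; V_\theta |\psi'\rangle,$$
having defined $|\psi'\rangle := U^\dagger_g |\psi\rangle$ in the second equality, and $\mathcal{C}'_\theta := \int dg\; \mathcal{U}^\dagger_g \circ \mathcal{C}_\theta \circ \big(\mathcal{U}^{(j)}_g \otimes \mathcal{U}_g\big)$ in the third equality. Since $\mathcal{C}'_\theta$ is covariant, the above equality shows that every channel can be replaced by a covariant channel with exactly the same fidelity.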
Covariant channels have the same performance for all possible training gates. Hence, for a covariant channel $\mathcal{C}_\theta$ the fidelity can be rewritten as
$$F_2(j, \theta) = \int d\psi\; \langle\psi| V^\dagger_\theta\, \mathcal{C}_\theta(\phi_\theta \otimes \psi)\, V_\theta |\psi\rangle.$$
B. Choi operator formulation
Theorem 1 guarantees that the optimal input state for learning the gate $V_{\theta,g}$ from the gate $U^{(j)}_g$ is an eigenstate of $J_z$. Let us denote it generically as $|j, m_\theta\rangle$, for some $m_\theta$ between $-j$ and $+j$, possibly depending on the rotation angle $\theta$. In the following we will search for the optimal value $m_\theta$ and for the optimal covariant channel $\mathcal{C}_\theta$.
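First of all, we rewrite the average fidelity as
$$F_2(j, \theta) = \frac{2 F^{(e)}_2(j, \theta) + 1}{3},$$
where $F^{(e)}_2$ is the entanglement fidelity [37], defined as
$$F^{(e)}_2(j, \theta) := \langle\Phi^+| (V^\dagger_\theta \otimes I_R)\, \big[(\mathcal{C}_\theta \otimes \mathcal{I}_R)\big(|j, m_\theta\rangle\langle j, m_\theta| \otimes |\Phi^+\rangle\langle\Phi^+|\big)\big]\, (V_\theta \otimes I_R) |\Phi^+\rangle,$$
$|\Phi^+\rangle = (|0\rangle \otimes |0\rangle + |1\rangle \otimes |1\rangle)/\sqrt{2}$ being the canonical maximally entangled state and $R$ denoting a reference qubit, entangled with the target qubit. In turn, the entanglement fidelity can be expressed as
where $|\Phi_\theta\rangle := (V_\theta \otimes I_R)|\Phi^+\rangle$ is the rotated maximally entangled state and $C_\theta$ is the Choi operator [38]
$R_j$ being a reference system of dimension $2j + 1$, $\mathcal{I}_{R_j}$ ($\mathcal{I}_R$) being the identity map on the reference system $R_j$ ($R$), and $|\Phi^+_j\rangle = \sum_m |j, m\rangle \otimes |j, m\rangle / \sqrt{2j + 1}$ being the canonical maximally entangled state in dimension $2j + 1$. The problem is to maximise the fidelity (17) over all Choi operators of covariant channels. The set of the possible Choi operators is characterised by the following three conditions:
1. Covariance (here $\overline{U}^{(j)}_g$ and $\overline{U}_g$ denote the entry-wise complex conjugates of the matrices $U^{(j)}_g$ and $U_g$, respectively).
2. Positivity: $C_\theta$ is positive semidefinite, denoted as $C_\theta \ge 0$.
3. Trace preservation: $\mathrm{Tr}_{\rm out}[C_\theta] = I_{\rm in}$, where $\mathrm{Tr}_{\rm out}$ denotes the trace over the output, and $I_{\rm in}$ denotes the identity over the input.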
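The relation between the average fidelity and the entanglement fidelity of a qubit channel, $F = (2F^{(e)} + 1)/3$, can be checked on a simple example. The sketch below considers the special case of a unitary qubit channel (an assumption made only for illustration; the optimal channels in this paper are generally not unitary), described by the mismatch unitary $G = V^\dagger W$ between a target gate $V$ and an implemented gate $W$. The average over pure states is computed exactly from the six eigenstates of the Pauli matrices, which form a state 2-design and therefore reproduce the Haar average of any fidelity of this form. All function names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
# Eigenstates of sigma_z, sigma_x and sigma_y (a 2-design for one qubit)
SIX_STATES = [(1, 0), (0, 1), (S, S), (S, -S), (S, 1j * S), (S, -1j * S)]


def rotation(angle, n):
    """Qubit rotation by `angle` about the unit vector `n` (nested 2x2 list)."""
    nx, ny, nz = n
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c - 1j * s * nz, -1j * s * (nx - 1j * ny)],
            [-1j * s * (nx + 1j * ny), c + 1j * s * nz]]


def overlap(g, psi):
    """Matrix element <psi| g |psi> for a 2x2 matrix g."""
    a, b = complex(psi[0]), complex(psi[1])
    return (a.conjugate() * (g[0][0] * a + g[0][1] * b)
            + b.conjugate() * (g[1][0] * a + g[1][1] * b))


def average_fidelity(g):
    """Average of |<psi| g |psi>|^2 over the six-state 2-design."""
    return sum(abs(overlap(g, psi)) ** 2 for psi in SIX_STATES) / 6


def entanglement_fidelity(g):
    """|Tr g|^2 / 4 for a unitary mismatch g on a qubit."""
    return abs(g[0][0] + g[1][1]) ** 2 / 4
```

For any mismatch `g = rotation(angle, n)`, the value `average_fidelity(g)` agrees with `(2 * entanglement_fidelity(g) + 1) / 3` up to rounding.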
We now put the above conditions in a form that is convenient for optimization.
Covariance. The covariance condition can be further simplified using the fact that complex conjugate representations of the rotation group are unitarily equivalent. Defining the operator
the covariance condition becomes
At this point, the total Hilbert space can be decomposed into orthogonal subspaces, corresponding to different values of the total angular momentum. Specifically, the angular momentum takes values $j - 1$, $j$, and $j + 1$, and the total Hilbert space is decomposed as
$$\mathcal{H} = \mathcal{H}_{j+1} \oplus \big(\mathcal{H}_j \otimes \mathbb{C}^2\big) \oplus \mathcal{H}_{j-1}. \qquad (22)$$
Relative to this decomposition, using Schur's lemma and the covariance condition (21), the operator $C^*_\theta$ can be written as
where $P_l$ is the projection on the factor with total angular momentum $l$, $\alpha$ and $\beta$ are complex coefficients, and $M$ is a complex 2-by-2 matrix.
Positivity. The positivity of the operator C θ is equivalent to the positivity of the coefficients α, β and of the matrix M .
Trace preservation. The condition of trace preservation can be conveniently expressed in terms of the real coefficients $\alpha$, $\beta$ and of the complex matrix $M$. Indeed, tracing over the output, we obtain
for a suitable choice of basis $\{|+\rangle, |-\rangle\}$. Using Eq. (24), the trace preservation condition $\mathrm{Tr}_{\rm out}[C_\theta] = I_{\rm in}$ becomes
Figure of merit. In terms of the operator $C^*_\theta$, the entanglement fidelity can be expressed as
The expression can be further simplified by decomposing the state $|j, -m_\theta\rangle \otimes |\Phi^*_\theta\rangle$ on the subspaces of Eq. (22). After a bit of labor with the Clebsch-Gordan coefficients, we find the decomposition
with
Using the above decomposition, the entanglement fidelity can be expressed as
to be maximized over all positive coefficients $\alpha$ and $\beta$, and over all non-negative matrices $M$ satisfying the constraint (25).
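The decomposition of the total Hilbert space into sectors of total angular momentum $j - 1$, $j$ and $j + 1$ can be sanity-checked by coupling the spin $j$ with two spin-1/2 systems, one at a time, and counting dimensions. The sketch below does this with exact rational arithmetic; the function names are illustrative, and the only input assumed is the standard rule that coupling a spin $k$ with a spin 1/2 yields total spins $k \pm 1/2$ (only $k + 1/2$ when $k = 0$).

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def total_spin_sectors(j):
    """Multiplicity of each total spin l in spin-1/2 (x) spin-j (x) spin-1/2."""
    j = Fraction(j)
    sectors = {}
    # Couple spin j with the first spin-1/2, then with the second one.
    for k in (j - HALF, j + HALF):
        if k < 0:
            continue
        for l in (k - HALF, k + HALF):
            if l < 0:
                continue
            sectors[l] = sectors.get(l, 0) + 1
    return sectors


def total_dimension(sectors):
    """Sum of multiplicity * (2l + 1) over all sectors."""
    return sum(mult * int(2 * l + 1) for l, mult in sectors.items())
```

For $j \ge 1$ the sectors $j - 1$ and $j + 1$ appear once and the sector $j$ appears twice, which is the origin of the 2-by-2 matrix $M$; the dimensions add up to $4(2j + 1)$. For $j = 1/2$ the sector $j - 1$ is absent.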

Lemma 3
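The matrix $M$ can be chosen to be rank-one without loss of generality, namely $M = |v\rangle\langle v|$ for some suitable vector $|v\rangle$.
Proof. The entanglement fidelity depends on the matrix $M$ through the matrix element $\langle c|M|c\rangle$. Now, writing $|c\rangle = c_+ |+\rangle + c_- |-\rangle$, one has the chain of inequalities
$$\langle c|M|c\rangle = |c_+|^2 \langle +|M|+\rangle + |c_-|^2 \langle -|M|-\rangle + 2\, \mathrm{Re}\big(\overline{c}_+ c_- \langle +|M|-\rangle\big)$$
$$\le |c_+|^2 \langle +|M|+\rangle + |c_-|^2 \langle -|M|-\rangle + 2 |c_+| |c_-|\, |\langle +|M|-\rangle|$$
$$\le \Big(|c_+| \sqrt{\langle +|M|+\rangle} + |c_-| \sqrt{\langle -|M|-\rangle}\Big)^2,$$
the second inequality following from the fact that $M$ is positive. The first inequality holds with the equality sign when the phase of the complex number $\langle +|M|-\rangle$ is equal to the phase of the complex number $c_+ \overline{c}_-$. The second inequality holds with the equality sign if $M$ is rank-one. In particular, the upper bound is attained by the rank-one matrix $M' = |v\rangle\langle v|$ with $|v\rangle = \sqrt{\langle +|M|+\rangle}\, \frac{c_+}{|c_+|} |+\rangle + \sqrt{\langle -|M|-\rangle}\, \frac{c_-}{|c_-|} |-\rangle$. Since the normalization constraint (25) involves only the diagonal matrix elements of $M$, the matrix $M$ can be replaced by the matrix $M'$ without loss of generality.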
The proof of the above lemma shows that the optimal entanglement fidelity has the form
with $|v_\pm| = \sqrt{\langle \pm|M|\pm\rangle}$. The maximum of the fidelity (31) under the constraints (25) can be determined with the method of Lagrange multipliers. In the following we present the result of the maximization, leaving the details to Appendix A.
C. Optimal quantum strategy for j > 1
For $j > 1$, it turns out that Problems 1 and 2 have the same optimal solution:
Theorem 2 When $j > 1$, the optimal probe state for learning the gate $V_{\theta,g} = U_g V_\theta U^\dagger_g$ from the gate $U^{(j)}_g$ is $|j, j\rangle$ for every value of $\theta$. For both Problems 1 and 2, the optimal average fidelity over all pure input states is
and has the asymptotic expression
The optimality of the probe state $|j, j\rangle$ is in agreement with a result by Holevo on the optimal estimation of directions, cf. Section 4.10 of [36]. In other words, the optimal probe state for learning how to rotate about an unknown direction coincides with the optimal probe state for producing a classical estimate of such direction, as long as $j$ is larger than 1. It is worth stressing, however, that the optimal quantum strategy for rotating about an unknown direction is not based on estimation: in Section IV we will show that no estimation-based strategy can achieve the optimal quantum fidelity (32).
The exact values of the average fidelity are plotted in Figure 4 for various values of $j$ from $j = 2$ to $j = 100$. Note that the fidelity decreases monotonically with the rotation angle $\theta$. Intuitively, rotating by smaller angles is easier, because the uncertainty about the rotation axis has less influence on the performance. The easiest rotation is the identity ($\theta = 0$), which is independent of the rotation axis and therefore can be implemented without error. The hardest rotation is the spin flip, corresponding to $\theta = \pi$. In this case, the average fidelity has the simple form
Note that, since the optimal probe state is $|j, j\rangle$, the optimal channel $\mathcal{C}_\theta$ for Problem 1 coincides with the optimal channel $\mathcal{C}_\theta$ for Problem 2. In Appendix B, we show that an optimal channel $\mathcal{C}_\theta$ can be attained by setting up an isotropic Heisenberg interaction between the memory spin and the target spin. Explicitly, we show that the maximum fidelity (32) is attained by the channel
where $\mathrm{Tr}_{P_j}$ denotes the partial trace over the probe, and $U_\theta$ is the unitary operator
in which $\sigma = (\sigma_x, \sigma_y, \sigma_z)$ is the vector of the three Pauli matrices, $J \cdot \sigma = \sum_{i=x,y,z} J_i \otimes \sigma_i$ is the Heisenberg coupling, and $f(\theta)$ is the function
where $s(\theta) = 0$ for $\theta \in [0, \pi]$, and $s(\theta) = \pi$ for $\theta \in (\pi, 2\pi)$. Note that $f(\theta)$ is approximately equal to $\theta$ in the large $j$ limit. Physically, the unitary evolution (36) can be realized by setting up an isotropic Heisenberg interaction, described by the Hamiltonian $H = \alpha\, J \cdot \sigma$, for some suitable coupling constant $\alpha$, and by letting the two spins evolve for a time depending on the angle $\theta$ of the target rotation. Remarkably, the same probe state and the same interaction can be used to control the full time evolution of the target system: one has only to adjust the interaction time [determined by the angle $f(\theta)$] based on the evolution time in the target dynamics [determined by the angle $\theta$].
For example, we can set $\theta = \omega t$ and simulate the precession of a spin-1/2 particle around the direction indicated by the memory state. An important feature of the optimal strategy is that the optimal probe state is independent of the rotation angle $\theta$. Since the operation of storing the state $U^{(j)}_g |j, j\rangle$ in the quantum memory is also independent of $\theta$, it follows that all the operations in the training phase can be accomplished without knowing the rotation angle. This offers the possibility to decide the value of $\theta$ at later times. In fact, the machine can optimally approximate the full continuous-time dynamics of the target particle, because the optimal operations for different $\theta$ correspond to unitary evolutions with the same Hamiltonian, just with different evolution times.
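The statement that only the interaction time needs to be adjusted reflects the simple spectrum of the Heisenberg coupling. Writing $J \cdot \sigma = J_{\rm tot}^2 - J^2 - S^2$ with $S = \sigma/2$, the coupling has eigenvalue $l(l+1) - j(j+1) - 3/4$ on the sector of total angular momentum $l = j \pm 1/2$, that is, $j$ and $-(j+1)$. The sketch below tabulates these eigenvalues and their degeneracies with exact fractions; it is a standard angular-momentum computation included for illustration, with a hypothetical function name, and it does not reproduce the function $f(\theta)$ of the paper.

```python
from fractions import Fraction


def heisenberg_spectrum(j):
    """Eigenvalues of J.sigma on spin-j (x) spin-1/2, keyed by total spin l.

    Returns {l: (eigenvalue, degeneracy)} using
    J.sigma = J_tot^2 - J^2 - S^2 with S = sigma / 2.
    """
    j = Fraction(j)
    half = Fraction(1, 2)
    spectrum = {}
    for l in (j + half, j - half):
        if l < 0:
            continue
        eigenvalue = l * (l + 1) - j * (j + 1) - Fraction(3, 4)
        spectrum[l] = (eigenvalue, int(2 * l + 1))
    return spectrum
```

Since the two eigenvalues differ by $2j + 1$, an evolution under $H = \alpha\, J \cdot \sigma$ for a time $t$ imprints a relative phase $\alpha t (2j + 1)$ between the two sectors, so the evolution depends on $\alpha$ and $t$ only through their product.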
The optimality of the Heisenberg interaction is not limited to the average fidelity. In terms of scaling with $j$, the unitary gate (36) is optimal also for the worst-case fidelity, defined as
$$F_w(j, \theta) := \min_{g, \psi} F(j, \theta, g, \psi),$$
where $F(j, \theta, g, \psi)$ is the fidelity for the simulation of $V_{\theta,g}$ on the specific input state $|\psi\rangle$. Indeed, in Appendix C, we show that the worst-case fidelity of the unitary gate (36) is
Hence, the worst-case infidelity $1 - F_{w,{\rm He}}(j, \theta)$ has the scaling $1/j$. This is the best scaling one can hope for, because the average infidelity cannot vanish faster than $1/j$ [as shown by Eq. (33)], and the average infidelity is a lower bound to the worst-case infidelity.
The optimality of the Heisenberg interaction answers in the affirmative a question raised by Marvian and Mann [40], who assumed the Heisenberg interaction and showed that it can be used to approximate arbitrary rotations in the limit of large $j$. In the conclusion of their work, Marvian and Mann asked whether the Heisenberg interaction achieves the best scaling of the error with the spin size. Our results provide an affirmative answer, showing that the Heisenberg interaction maximizes the average fidelity and has the optimal error scaling $O(1/j)$ in the worst-case scenario.
D. Optimal quantum strategy for j = 1/2
For $j = 1/2$, the optimal probe state for Problem 2 is still the coherent state $|j, j\rangle$ for every rotation angle $\theta$, and the optimal solutions of Problems 1 and 2 still coincide.
For $|\theta - \pi| \le \delta_{1/2}$, instead, the optimal fidelity becomes
and is achieved by the following strategy:
$P_l$ being the projector on the subspace with total angular momentum $l$, with $l \in \{0, 1\}$.
2. If the measurement yields outcome "yes", then apply the unitary gate (36), corresponding to the Heisenberg interaction, and discard the memory. If the measurement yields outcome "no", then perform the optimal 2-to-1 universal NOT channel [41], namely the channel $\mathcal{C}_{\rm UNOT}$ defined by
The probability of the outcome "no", corresponding to the universal NOT, depends on the parameter $\alpha$ in Eq. (A9). At the critical distance $|\theta - \pi| = \arccos[(4 + \sqrt{7})/9]$, one has $\alpha = 0$, and the optimal strategy is realized through the Heisenberg interaction. As the rotation angle gets closer to $\pi$, the coefficient $\alpha$ increases, reaching its maximum value $\alpha = 2/3$ for $\theta = \pi$. At this point, the weight of the universal NOT is maximum. Notably, the value $\alpha = 1$ is never reached, meaning that the optimal joint measurement on the input qubits is never projective.
E. Optimal quantum strategy for j = 1
The $j = 1$ case is the only case where Problems 1 and 2 yield different solutions. The difference appears when the rotation angle is within a critical distance $\delta_1 \approx 0.23\pi$ from $\pi$.
For |π − θ| > δ 1 , the optimal probe state for Problem 2 is |1, 1⟩, and therefore the optimal solutions for Problems 1 and 2 still coincide. The optimal average fidelity is still given by Equation (32) and the optimal channel C θ is still given by Equation (35).
For |θ − π| ≤ δ 1 , the optimal average fidelity for Problem 1 is corresponding to Equation (32) with j = 1. The optimal channel C θ is still given by Equation (35). Instead, the optimal fidelity for Problem 2 is and is attained with the probe state |1, 0⟩, the p-orbital aligned in the direction of the z-axis. In Subsection IV E, we will show that the optimal quantum fidelity (45) is achievable with a purely classical memory. Specifically, we will see that the optimal strategy is to perform a projective measurement on the probe, with the three measurement outcomes corresponding to the three Cartesian axes. The measurement outcome is then stored in a classical memory of 2 bits. In the execution phase, the machine rotates the target qubit by an angle π about the axis corresponding to the measurement outcome.
F. Optimal fidelities for j = 1/2 and j = 1

The dependence of the fidelity on the rotation angle is plotted in Figure 5 for j = 1 and j = 1/2. The value of the optimal quantum fidelity is contrasted with the maximum fidelity achievable with a purely classical memory, which will be derived in Section IV.
FIG. 5. Optimal quantum fidelities and benchmarks for j = 1/2 and j = 1. Solid curves show the maximum of the fidelity over all quantum machines, while dashed curves provide the corresponding benchmarks, equal to the maximum fidelity over all machines equipped with a purely classical memory (derivation provided in the next Section). For j = 1/2, the optimal strategies for Problems 1 and 2 coincide. The fidelity of the optimal quantum strategy is higher than the benchmark (blue dashed line) for all values of θ except θ = 0 and θ = π (although the difference in the transition region is so small that it cannot be read off the plot). A transition in the optimal quantum channel C θ occurs at the critical distance |θ − π| = δ 1/2 := arccos[(4 + √7)/9] ≈ 0.236π. For j = 1, the optimal strategies for Problems 1 and 2 coincide for |θ − π| > δ 1 ≈ 0.23π, but become different for |θ − π| < δ 1 . The optimal fidelity for Problem 1 (black solid curve) is higher than the benchmark (black dashed line) for every θ ≠ 0. The optimal fidelity for Problem 2 (red solid curve) deviates from the optimal fidelity for Problem 1 when the distance |θ − π| goes below the critical value δ 1 . At the critical distance, the optimal input state changes discontinuously from |1, 1⟩ to |1, 0⟩. In this region, the optimal quantum fidelity becomes equal to the benchmark (red dashed curve).
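The numerical value of the critical distance quoted in the caption can be checked directly. The short sketch below evaluates arccos[(4 + √7)/9] in units of π; the function name `critical_distance_over_pi` is an illustrative choice and does not appear in the paper.

```python
import math


def critical_distance_over_pi() -> float:
    """Return arccos[(4 + sqrt(7)) / 9] expressed as a fraction of pi."""
    return math.acos((4 + math.sqrt(7)) / 9) / math.pi


if __name__ == "__main__":
    # Prints the critical distance for j = 1/2 in units of pi, to three decimals.
    print(f"delta_1/2 = {critical_distance_over_pi():.3f} pi")
```

The result agrees with the rounded value 0.236π used in the text.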

IV. THE QUANTUM BENCHMARK
In this section we derive the maximum fidelity achievable by learning machines with a purely classical memory of arbitrarily large size. Such fidelity provides a benchmark that can be used to certify the experimental demonstration of quantum-enhanced learning. We consider the two learning tasks corresponding to Problem 1 (learning from a spin coherent state) and Problem 2 (learning from a rotation gate). The quantum benchmarks for these two problems coincide for all values of j except j = 1. For j = 1, the two benchmarks become different when the desired rotation angle approaches π.
A. Measure-and-operate (MO) channels

Here we consider learning strategies where the memory M in Figures 1 and 3 is purely classical. In this case, the transfer of information from the probe to the memory is described by a quantum-to-classical channel E θ , of the form where {|y⟩} y∈Y is a set of orthogonal states of the memory, and (P θ,y ) y∈Y is a Positive Operator-Valued Measure (POVM), describing a quantum measurement on system P j in the case of Figure 1, or a quantum measurement on system P j ⊗ A in the case of Figure 3. The execution phase consists in reading out the index y from the classical memory and performing a conditional operation O θ,y on the system. Hence, the channel R θ has the form The operations performed by machines with purely classical memory will be called measure-and-operate (MO) strategies. Combined together, the "measure" channel E θ and the "operate" channel R θ give a single quantum channel C θ,MO , of the form where Tr S denotes the partial trace over all systems except system S. In the following, we will solve the optimisations in Problems 1 and 2 under the constraint that the channel C θ is of the MO form (48). By definition, the optimal MO fidelities are no larger than the optimal quantum fidelities derived in the previous Section.

B. Structure of the optimal MO strategy for Problem 2

The structure of the optimal MO strategy for Problem 2 is summarized by the following Theorem, proven in Appendix D.
Theorem 3 The optimal MO strategy for learning the gate V θ,g = U g V θ U † g from the gate U (j) g has the following features:
1. no auxiliary system is needed
2. the optimal probe state is an eigenstate of J z , denoted as |j, m θ⟩
3. the outcome of the optimal POVM is an element of the rotation group SO(3), denoted as ĝ
4. the optimal POVM (P θ,ĝ ) ĝ∈SO(3) is rotationally covariant [36], and has the form P θ,ĝ = (2j + 1) U (j) ĝ |ξ θ⟩⟨ξ θ| U (j)† ĝ , where |ξ θ⟩ is a unit vector
5. the optimal conditional operation has the form O θ,ĝ = U ĝ ∘ O θ ∘ U † ĝ , where O θ is a fixed channel acting on the target qubit.
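The measure-and-operate structure introduced in Subsection A can be made explicit using the standard forms of a quantum-to-classical channel and of a classically controlled operation. The LaTeX sketch below is offered as an orientation aid under that assumption, not as a transcription of the paper's displayed equations: the symbols ρ (state of the probe, including the auxiliary system when present), σ (state of the memory) and τ (state of the target) are labels introduced here for illustration.

```latex
\begin{align}
  % "measure": the POVM outcome y is written into the classical memory
  \mathcal{E}_\theta(\rho)
    &= \sum_{y\in\mathsf{Y}} \operatorname{Tr}\!\left[P_{\theta,y}\,\rho\right]\,
       |y\rangle\langle y| , \\
  % "operate": the index y is read out and the conditional operation is applied
  \mathcal{R}_\theta(\sigma\otimes\tau)
    &= \sum_{y\in\mathsf{Y}} \langle y|\sigma|y\rangle\;
       \mathcal{O}_{\theta,y}(\tau) , \\
  % composition of the two steps on a product input
  \mathcal{C}_{\theta,\mathrm{MO}}(\rho\otimes\tau)
    &= \sum_{y\in\mathsf{Y}} \operatorname{Tr}\!\left[P_{\theta,y}\,\rho\right]\,
       \mathcal{O}_{\theta,y}(\tau) .
\end{align}
```

Written this way, the only information passed from the training phase to the execution phase is the classical index y, which is the restriction that the benchmark is meant to capture.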
In the following we will maximise the gate fidelity over all MO strategies with the features described by Theorem 3. For convenience we will express the gate fidelity in terms of the entanglement fidelity [cf. Equation (15)].

C. Choi operator formulation
For an optimal strategy as in Theorem 3, the entanglement fidelity takes the form where O θ,ĝ is the Choi operator of the channel O θ,ĝ , O θ is the Choi operator of the channel O θ , and |Φ + θ,g⟩ := (V θ,g ⊗ I R ) |Φ +⟩.
Our goal is to maximise the entanglement fidelity (50) over all values of m θ , over all unit vectors |ξ θ⟩, and over all Choi operators O θ . To this purpose, the key observation (Proposition 1) is that the Choi operator O θ can be chosen to be real in a suitable basis.

Proof. Every unitary V θ,g = U g V θ U † g is a real linear combination of the matrices I, iσ x , iσ y , and iσ z . Hence, every vector |Φ + θ,g⟩ = (V θ,g ⊗ I)|Φ +⟩ is a real linear combination of the vectors |Φ +⟩, i|Ψ +⟩ = (iσ x ⊗ I)|Φ +⟩, |Ψ −⟩ = (iσ y ⊗ I)|Φ +⟩, and i|Φ −⟩ = (iσ z ⊗ I)|Φ +⟩. Since the fidelity depends on the Choi operator O θ only through the matrix elements ⟨Φ + θ,g |O θ |Φ + θ,g⟩, the optimal Choi operator can be chosen to be real in the same basis as the vectors |Φ + θ,g⟩.
Thanks to Proposition 1, the maximization of the fidelity can be restricted to the set of Choi operators that are real in the Bell basis. This set of Choi operators can be equivalently characterized as the set of Choi operators of unital channels, i.e. quantum channels mapping the identity operator to itself. Indeed, we have the following

Proposition 2 A qubit channel is unital if and only if its Choi operator is real in the Bell basis.
Proof. If a qubit channel is unital, then it is a convex combination of unitary channels [42]. For every unitary channel, the corresponding Choi operator is real in the Bell basis. Indeed, every unitary channel has a Kraus decomposition with a single unitary operator of the form U = cos(τ/2) I − i sin(τ/2) n · σ, with τ ∈ [0, 2π) and n a unit vector in R^3 . Hence, the Choi operator 2 (U ⊗ I)|Φ +⟩⟨Φ +|(U ⊗ I) † is real in the Bell basis. Since the set of real Choi operators is convex, every unital channel is contained in it.
Conversely, suppose that a channel C has a Choi operator C that is real in the Bell basis, i.e. C = Σ k,l C kl |Φ k⟩⟨Φ l|, for some real symmetric matrix (C kl ). Then, one has

Since the fidelity is a linear function, its maximization can be restricted to the extreme points of the set of unital channels. For qubits, such extreme points are unitary channels [42]. Hence, we obtain the following

Theorem 4 The quantum channel O θ maximizing the fidelity (50) can be chosen to be unitary without loss of generality.
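The role of unitality in Proposition 2 can be illustrated numerically. The sketch below, written with the Python standard library only, expresses the Choi operator of a qubit channel in the basis {|Φ+⟩, i|Ψ+⟩, |Ψ−⟩, i|Φ−⟩} used in the proof of Proposition 1 and reports the largest imaginary part of its matrix elements. The helper names (`choi_in_bell_basis`, `max_imag`, `rotation`) and the choice of the amplitude damping channel as a non-unital example are assumptions made here for illustration.

```python
import math

SQ = 1 / math.sqrt(2)
# Basis vectors with components ordered as |00>, |01>, |10>, |11>.
BASIS = [
    [SQ, 0, 0, SQ],              # |Phi+>
    [0, 1j * SQ, 1j * SQ, 0],    # i|Psi+> = (i sigma_x (x) I)|Phi+>
    [0, SQ, -SQ, 0],             # |Psi->  = (i sigma_y (x) I)|Phi+>
    [1j * SQ, 0, 0, -1j * SQ],   # i|Phi-> = (i sigma_z (x) I)|Phi+>
]


def choi_in_bell_basis(kraus):
    """Matrix elements <b_r|C|b_s> of C = sum_k 2 (K_k (x) I)|Phi+><Phi+|(K_k (x) I)^dagger."""
    coeffs = []
    for K in kraus:
        # sqrt(2) (K (x) I)|Phi+> has components K[a][b] on |a b>.
        vec = [K[0][0], K[0][1], K[1][0], K[1][1]]
        coeffs.append([sum(b[i].conjugate() * vec[i] for i in range(4)) for b in BASIS])
    return [[sum(c[r] * c[s].conjugate() for c in coeffs) for s in range(4)]
            for r in range(4)]


def max_imag(matrix):
    """Largest absolute imaginary part among the entries of a matrix."""
    return max(abs(complex(x).imag) for row in matrix for x in row)


def rotation(tau, n):
    """U = cos(tau/2) I - i sin(tau/2) n.sigma for a unit vector n = (nx, ny, nz)."""
    c, s = math.cos(tau / 2), math.sin(tau / 2)
    nx, ny, nz = n
    return [[c - 1j * s * nz, -1j * s * (nx - 1j * ny)],
            [-1j * s * (nx + 1j * ny), c + 1j * s * nz]]


if __name__ == "__main__":
    unitary = [rotation(1.3, (0.6, 0.0, 0.8))]
    g = 0.5  # damping strength of the (non-unital) amplitude damping channel
    damping = [[[1, 0], [0, math.sqrt(1 - g)]], [[0, math.sqrt(g)], [0, 0]]]
    print("unitary channel  :", max_imag(choi_in_bell_basis(unitary)))
    print("amplitude damping:", max_imag(choi_in_bell_basis(damping)))
```

For the unitary channel the imaginary parts vanish up to rounding, while for the amplitude damping channel they do not, in line with the statement that only unital channels have a real Choi operator in this basis.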
Thanks to Theorem 4, the optimal entanglement fidelity (50) can be expressed as where W θ is a suitable unitary and |Φ + W θ := (W θ ⊗ I R ) |Φ + . The optimization can be further simplified using the following observation: The unitary gate W θ maximizing the fidelity (50) can be chosen without loss of generality to be a rotation about the z axis.

Proof. Every unitary W θ can be written as W θ = U h V θ′ U † h ,
where V θ′ is a rotation about the z axis by an angle θ′, and h is the rotation that transforms the z axis into the rotation axis of W θ . Hence, the corresponding state can be written as |Φ + W θ⟩ = (U h V θ′ U † h ⊗ I R ) |Φ +⟩. Using this fact, the optimal MO fidelity can be rewritten as (here U T h denotes the transpose of the matrix U h ). The last equation shows that the maximisation of the fidelity can be reduced to rotations about the z axis.
At this point, it remains to maximise the fidelity (55) over m θ , ξ θ , and V θ . The result of the optimization is summarised in the following, while the details are provided in Appendix E.

D. Optimal MO strategy for j ≠ 1
For j ≠ 1, it turns out that the quantum benchmarks for Problems 1 and 2 coincide.
Note that the probe state and the measurement are both independent of the rotation angle θ. This means that the machine can be trained optimally even before the value of the rotation angle has been decided. The operations in the training phase coincide with the optimal estimation strategy for directions, derived in the classic work by Holevo [36].
The optimal MO strategy can be implemented by a learning machine with a purely classical memory. The size of the classical memory can be chosen without loss of generality to be 2 log(2j + 1) bits. This is because the fidelity is a linear function of the POVM, and therefore its maximum is attained by an extreme point of the convex set of all POVMs with outcomes in SO(3). The extreme points of such set consist of POVMs that assign non-zero probability to at most (2j + 1)^2 rotations [43]. Hence, the optimal POVM in Theorem 5 can be replaced by another, equally optimal POVM with at most (2j + 1)^2 outcomes, which can be stored in a classical memory of 2 log(2j + 1) bits. A plot of the MO fidelity and of the optimal quantum fidelity is provided in Figure 6. Note that the error (one minus fidelity) goes to zero in both cases, but the rate for quantum strategies is twice as fast, as one can see by comparing Equations (33) and (57).
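The memory count above can be made concrete with a small sketch. It assumes that the logarithm is taken in base 2 and that the number of bits is rounded up to the next integer; both conventions, and the function name `classical_memory_bits`, are illustrative choices rather than statements from the paper.

```python
from fractions import Fraction


def classical_memory_bits(j) -> int:
    """Bits needed to label one of at most (2j + 1)^2 POVM outcomes.

    ASSUMPTION: base-2 logarithm, rounded up to an integer number of bits.
    """
    dim = int(2 * j + 1)          # dimension of the spin-j probe
    outcomes = dim ** 2           # upper bound on the number of outcomes
    return (outcomes - 1).bit_length()


if __name__ == "__main__":
    for j in (Fraction(1, 2), 1, Fraction(3, 2), 2):
        print(f"j = {j}: at most {int(2 * j + 1) ** 2} outcomes, "
              f"{classical_memory_bits(j)} bits")
```

The integer `bit_length` call avoids floating-point rounding when the number of outcomes is an exact power of two.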
corresponding to Equation (56) with j = 1. The MO strategy is still the one described in Theorem 5. For Problem 2, the optimal probe state transitions from |1, 1⟩ to |1, 0⟩, and the optimal fidelity becomes F 2,MO,opt (j = 1, θ) = 1 3

The optimal MO strategy consists of
1. Measuring the memory with the POVM operators P ĝ = (2j + 1) U (j) ĝ |1, 0⟩⟨1, 0| U (j)† ĝ
2. Rotating the target qubit about the axis n = ĝ e z by an angle π, independently of θ.
Physically, the optimal POVM can be interpreted as a randomisation of the projective measurement that projects the spin-1 particle along the three Cartesian axes x, y, and z [43]. This projective measurement corresponds to the orthonormal basis {|x⟩, |y⟩, |z⟩} for C^3 defined by |z⟩ := |1, 0⟩, |x⟩ := (|1, 1⟩ + |1, −1⟩)/√2, and |y⟩ := (|1, 1⟩ − |1, −1⟩)/√2. In the language of atomic physics, |x⟩, |y⟩, and |z⟩ are the p-orbitals aligned in the directions x, y, and z, respectively. Since the fidelity is a linear function of the POVM, the optimal POVM P ĝ = (2j + 1) U (j) ĝ |1, 0⟩⟨1, 0| U (j)† ĝ can be replaced by an equally optimal POVM based on the projective measurement of {|x⟩, |y⟩, |z⟩}, followed by a rotation by π about the Cartesian axis identified by the measurement outcome. In this discretised version of the MO strategy, the learning machine only needs a classical memory of 2 bits.
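The discretised strategy can be sketched in a few lines. The code below checks that the three p-orbital vectors defined above are orthonormal, assigns each outcome a 2-bit label, and builds the corresponding rotation by π of the target qubit. The 2-bit encoding `MEMORY_CODE`, the helper names, and the use of the entanglement fidelity |Tr(V†W)|²/4 between two qubit unitaries as a figure of merit are assumptions made for illustration; the fidelity is evaluated only for a target rotation about the z axis, not averaged over the unknown rotation g, so it does not reproduce the benchmark itself.

```python
import cmath
import math

S = 1 / math.sqrt(2)
# Components in the |1,1>, |1,0>, |1,-1> basis, following the definitions in the text.
P_ORBITALS = {"x": (S, 0.0, S), "y": (S, 0.0, -S), "z": (0.0, 1.0, 0.0)}
# Hypothetical 2-bit encoding of the three outcomes (one code word stays unused).
MEMORY_CODE = {"x": 0b00, "y": 0b01, "z": 0b10}
PAULI = {
    "x": [[0, 1], [1, 0]],
    "y": [[0, -1j], [1j, 0]],
    "z": [[1, 0], [0, -1]],
}


def inner(u, v):
    """Inner product <u|v> of two vectors given as tuples of components."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def pi_rotation(axis):
    """Rotation by pi about a Cartesian axis: exp(-i pi sigma/2) = -i sigma."""
    return [[-1j * e for e in row] for row in PAULI[axis]]


def z_rotation(theta):
    """Target gate: rotation by theta about the z axis."""
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


def entanglement_fidelity(V, W):
    """|Tr(V^dagger W)|^2 / 4 for two qubit unitaries V and W."""
    tr = sum(complex(V[a][b]).conjugate() * W[a][b]
             for a in range(2) for b in range(2))
    return abs(tr) ** 2 / 4


if __name__ == "__main__":
    for axis, code in MEMORY_CODE.items():
        f = entanglement_fidelity(z_rotation(math.pi), pi_rotation(axis))
        print(f"outcome {axis}: memory {code:02b}, fidelity with target = {f:.3f}")
```

When the measurement outcome matches the true axis and θ = π, the stored rotation reproduces the target gate exactly, while the other two outcomes give an orthogonal gate.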

V. PERSISTENCE OF THE QUANTUM ADVANTAGE
We have seen that a machine equipped with a quantum memory can outperform every classical machine at the task of learning rotations about an unknown axis. Still, our analysis was restricted to the scenario where the quantum process accesses its memory only once, with the goal of reproducing a single use of the target gate. In the following we will study how the performance depends on the number of required executions of the target gate.
Let us focus on the regular case j > 1, where the optimal strategies for Problems 1 and 2 coincide, and the channel is realised by setting up a Heisenberg interaction between the memory and the target qubit. An important question is how many times the memory can be accessed before the accuracy drops below a certain threshold. In the context of quantum reference frames, the maximum number of accesses such that the fidelity is above threshold was called the longevity in Ref. [44]. Another important question is how many times the memory can be accessed before the quantum advantage is lost. The maximum number of accesses for which the fidelity is above the quantum benchmark (56) will be called persistence of the quantum advantage in the following.
Suppose that the joint evolution of memory and target is described by the same unitary gate at every step. Assuming the gate to be of the form of Eq. (36) for some fixed function f (θ), we obtain the closed-form expression F Hei (j, θ, n) = 1 − [(1 − cos θ)/(3j)] · [n(1 − cos θ) + j]/j , quantifying the average fidelity at the leading order in j (see Appendix F for the derivation). From this expression one can see that the longevity grows as j^2 . However, the persistence of the quantum advantage is much shorter: comparing the fidelity (62) with the MO fidelity (57), we find that the quantum advantage disappears when the number of repetitions is larger than N (j, θ) = j/(1 − cos θ).

One could also consider more elaborate strategies where the interaction time between memory and target is optimised at every step. However, we find that these strategies increase neither the longevity nor the persistence of the quantum advantage in the large j limit.
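The relation between the leading-order fidelity and the persistence derived in Appendix F can be explored with the following sketch. The function `mo_benchmark` rests on an assumption: it takes the MO error to be twice the single-use quantum error, i.e. 2(1 − cos θ)/(3j), which is how the text describes Equation (57) without the formula being reproduced here. All function names are illustrative.

```python
import math


def heisenberg_fidelity(j: float, theta: float, n: float) -> float:
    """Leading-order fidelity 1 - (1 - cos t)/(3j) * (n (1 - cos t) + j)/j."""
    a = 1 - math.cos(theta)
    return 1 - a / (3 * j) * (n * a + j) / j


def mo_benchmark(j: float, theta: float) -> float:
    """ASSUMPTION: MO error equal to twice the single-use quantum error."""
    return 1 - 2 * (1 - math.cos(theta)) / (3 * j)


def persistence(j: float, theta: float) -> float:
    """Asymptotic persistence N(j, theta) = j / (1 - cos theta), for theta != 0."""
    return j / (1 - math.cos(theta))


if __name__ == "__main__":
    j, theta = 20, math.pi
    n_star = persistence(j, theta)
    print(f"persistence N = {n_star}")
    for n in (0, n_star / 2, n_star, 2 * n_star):
        gap = heisenberg_fidelity(j, theta, n) - mo_benchmark(j, theta)
        print(f"n = {n:5.1f}: quantum minus benchmark = {gap:+.5f}")
```

Under these assumptions the gap changes sign exactly at n = N(j, θ), and for θ = π the persistence reduces to j/2.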

VI. ROBUSTNESS TO THERMAL NOISE
In Problem 1, we made the simplifying assumption that the unknown direction n is imprinted into the pure spincoherent state |j, j n , regarded as the low-temperature approximation of the thermal state of the magnetic dipole Hamiltonian. An interesting question is how this approximation affects our discussion of the quantum advantage. In the following we will address this question in the large j limit, showing that quantum memories are useful whenever the magnetic energy is sufficiently large compared to the thermal fluctuations.
The thermal states of the Hamiltonian H = −µ B · J can be written as ρ γ,n = (sinh γ / sinh[(2j + 1)γ]) Σ m e^(2γm) |j, m⟩⟨j, m| n , with γ = µ|B|/(2 k B T), where T is the temperature and k B is the Boltzmann constant. The spin coherent state |j, j⟩ n is retrieved in the low temperature (γ → ∞) limit, as one has lim γ→∞ ρ γ,n = |j, j⟩⟨j, j| n . Now, suppose that the learning strategy designed for the spin coherent state |j, j⟩ n is adopted for the mixed state ρ γ,n . In Appendix G, we show that the average fidelity has the asymptotic expression

The above fidelity can be compared with the benchmark in Equation (57), which quantifies the maximum fidelity achievable with classical memories. Note that Equation (57) provides the benchmark for both Problems 1 and 2, meaning that the benchmark applies to every pure probe state of the form |ψ θ,g⟩ = U (j) g |ψ⟩, and by convexity, to every mixed probe state of the form ρ g = U (j) g ρ U (j)† g . In particular, it applies to the thermal states ρ γ,n , as the average fidelity over all directions n is equal to the average fidelity over all rotations g. Comparing the fidelity (65) with the benchmark in Equation (57), we obtain that the quantum strategy outperforms all classical strategies whenever tanh γ is larger than 1/2, corresponding to the condition γ > (1/2) ln 3 ≈ 0.55. Hence, the quantum advantage persists whenever the magnetic energy µ|B| is larger than 1.1 times the thermal energy k B T .
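Two numerical facts used in this paragraph are easy to verify: the normalisation of the thermal populations and the threshold on γ. The sketch below assumes the relation γ = µ|B|/(2 k B T), which is the reading consistent with the factor 1.1 quoted in the text; the function names and the convention of passing the integer 2j as `two_j` are illustrative choices.

```python
import math


def thermal_weights(two_j: int, gamma: float) -> list:
    """Populations of |j, m>, m = -j, ..., j, in the thermal state (two_j = 2j)."""
    norm = math.sinh(gamma) / math.sinh((two_j + 1) * gamma)
    # m = k - j for k = 0, ..., 2j
    return [norm * math.exp(2 * gamma * (k - two_j / 2)) for k in range(two_j + 1)]


def gamma_threshold() -> float:
    """Solution of tanh(gamma) = 1/2, equal to (1/2) ln 3."""
    return math.atanh(0.5)


if __name__ == "__main__":
    print("sum of populations for j = 3/2:", sum(thermal_weights(3, 0.7)))
    g = gamma_threshold()
    # ASSUMPTION: gamma = mu|B| / (2 k_B T), so mu|B| / (k_B T) = 2 gamma.
    print(f"gamma threshold = {g:.4f}, mu|B| / (k_B T) = {2 * g:.4f}")
```

The ratio 2γ evaluated at the threshold is what the text rounds to 1.1.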
Note that the quantum benchmark in Equation (57) is the optimal fidelity achievable with arbitrary probe states. If one further enforces the condition that the probe state be thermal, then the value of the benchmark would be even lower, thereby extending the set of temperatures for which the quantum memory offers an advantage.
Note also that the above discussion applies to a variant of Problem 2 where the probe is subject to thermal noise before the action of the training gate U (j) g , resulting in a mixed input state ρ γ := ρ γ,ez . Also in this setting, the quantum memory offers a provable advantage when the parameter γ is larger than (1/2) ln 3.

VII. LEARNING HIGHER DIMENSIONAL GATES
Our result establishes the existence of a quantum advantage for learning single-qubit rotations about an unknown axis. This finding is conceptually important, because the advantage for single qubits implies an advantage of coherent learning for quantum systems of arbitrary dimension. Indeed, one can immediately prove the advantage by using the qubit benchmark for gates that act nontrivially only in a fixed two-dimensional subspace.
Our results also give a heuristic for the problem of learning rotation gates on higher dimensional spins. The idea is to encode the rotation axis in a spin coherent state and to let the memory and target spin interact as a closed system. Explicitly, we make two spin systems undergo the Heisenberg interaction U in the large j limit. Remarkably, the error grows quadratically, rather than linearly, with the size of the target spin: in order to ensure high fidelity, the size of the memory must be large compared to the square of the size of the target system. The same conclusion holds for the worst-case fidelity, which has the asymptotic expression with c(k) = 0 for even k and c(k) = 1/4 for odd k.
The quantum strategy exhibits an advantage over the MO strategy consisting in measuring the direction n from the spin coherent state pointing in direction n and performing a rotation based on the outcome. Again, we find that the error of the quantum strategy vanishes in the macroscopic limit of large memory systems, at a rate twice as fast as the error of the classical strategy (see Appendix H for more details). It is an open question whether the above quantum and MO strategies are optimal for arbitrary k > 1/2.

VIII. CONCLUSIONS
We determined the ultimate accuracy for the task of learning a rotation of a desired angle θ about an unknown axis, imprinted in the state of a spin-j particle. In this task, we found that quantum memories enhance the learning performance for every j > 1 and for every rotation angle θ ≠ 0. Specifically, we found that a quantum machine with a memory of log(2j + 1) qubits outperforms all learning machines with classical memory of arbitrarily large size.
We found that the advantage of the quantum memory persists even when the memory is accessed multiple times, as long as the total number of accesses is at most linear in the spin size. Quite interestingly, we observe a relation between the persistence and the size of the advantage: in the large j limit, the quantum advantage is of size O(1/j) and persists when the memory is accessed for O(j) times. Our results indicate that, as the memory size grows, the quantum advantage is spread over a larger amount of time. This tradeoff achieves the classical limit for spins of infinite size, for which the advantage disappears and the memory can be accessed infinitely many times.
At the fundamental level, our results provide the first example of a quantum memory advantage in a deterministic learning task involving unitary gates as the target operations. Advantages of quantum memories have been known for a longer time for non-deterministic learning tasks, where the learning machine has a non-zero probability of aborting. For example, Refs. [22][23][24][25][26][27] provide examples of machines that learn an unknown unitary gate without errors, albeit with a non-unit probability of success. In all these examples, a quantum memory is necessary in order to achieve error-free learning. In practice, however, no real machine is error-free, and in order to experimentally demonstrate the advantage of the quantum memory one needs a benchmark that quantifies the best performance achievable with classical machines. No such benchmark has been derived for the non-deterministic learning tasks considered in Refs. [22][23][24][25][26][27], and a rigorous demonstration of the advantage of the quantum memory has not been possible so far. A promising direction of future research is to apply the techniques developed in this paper to the derivation of quantum benchmarks for non-deterministic learning of unitary gates.
Our work calls for the experimental demonstration of quantum-enhanced learning of rotations around an unknown direction. For small values of the spin, a possible testbed is provided by NMR systems, where spin-spin interactions are naturally available [45]. Another possibility is to use quantum dots, where one can engineer a coupling between a single spin and an assembly of spins effectively behaving as a single spin j particle [46]. This scenario, named the box model, can be achieved through a uniform coupling of a central spin to the neighbouring sites. No matter what platform is adopted, our results provide the rigorous benchmark that can be used to validate the successful demonstration of quantum-enhanced unitary gate learning in realistic scenarios where the implementation is subject to noise and experimental imperfections.
and is attained by the Choi operator The maximum of the fidelity is attained by m θ = j, independently of θ. Explicitly, the maximum fidelity is Note that the fidelity converges to 1 in the large j limit, meaning that the learning becomes nearly perfect for large spins. Comparison with Cases 2,3, and 4 in the following shows that the fidelity (A4) is optimal for every angle θ whenever the spin is larger than 1.
Case 2: x ≠ 0, y = 0. In this case, the Lagrangian method yields the fidelity F (e) 2 (j, θ) = j + 1 2j + 3 |a| 2 1 + j j + 1 achieved by setting and x according to Eq. (25). The fidelity does not tend to 1 in the large j limit, indicating that the Case 2 strategy is suboptimal for large j. Still, it turns out that for j = 1/2 this strategy is optimal for some values of the angle θ around θ = π. In this case, the entanglement fidelity becomes F (e) and the optimal Choi operator is The transition from the Case 1 strategy to the Case 2 strategy occurs when the distance |π − θ| is below the critical value δ c = arccos[(4 + √7)/9] ≈ 0.236π.
Case 3: x ≠ 0, y ≠ 0. Note that a strategy with y ≠ 0 can only exist for j > 1/2, because for j = 1/2 there is no subspace with spin j − 1, and therefore the coefficient y is not present. The method of Lagrange multipliers implies that, among the strategies with x ≠ 0 and y ≠ 0, the maximum fidelity is attained when x and y take their maximum values. The corresponding Choi operator C * θ is and its fidelity is The maximum, attained for m θ = 0, is .
The fidelity does not reach 1 in the large j limit, indicating that the Case 3 strategy is suboptimal for large j.
Nevertheless, we find out that for j = 1 the Case 3 strategy is optimal for rotation angles around θ = π. For j = 1, the entanglement fidelity is A numerical comparison with the fidelity for Case 1 indicates that the above fidelity is optimal for |θ − π| ≤ δ c , with δ c = 0.23π. For |π − θ| > δ c , instead, the Case 1 strategy is optimal.
Case 4: x = 0, y ≠ 0. This case is similar to Case 3, and the fidelity has the expression By comparison with the other cases, we find that the Case 4 fidelity is never optimal.

Appendix C: Worst-case fidelity
Here we show that learning to perform the target gate V θ,g using the Heisenberg interaction of Eqs. (35) and (36) has an error scaling as 1/j in terms of the worst-case fidelity [defined by Eq. (39)].
The worst-case fidelity is the minimum over all training gates g and over all input target states ψ: where

F Hei (j, θ, g, ψ) = ⟨ψ| V † θ,g C θ,Hei (|j, j⟩⟨j, j| g ⊗ ψ) V θ,g |ψ⟩

is the fidelity for the simulation of V θ,g on the specific input state |ψ⟩, and

C θ,Hei (|j, j⟩⟨j, j| g ⊗ ψ) = Tr Pj [U θ (|j, j⟩⟨j, j| g ⊗ ψ) U † θ ]

is calculated according to the optimal physical realization. Since the trace is invariant under cyclic permutations and V θ,g = U g V θ U † g , we can rewrite Eq. (C2) as:

By expanding U † g |ψ⟩ in the basis {|1/2, 1/2⟩, |1/2, −1/2⟩}, namely U † g |ψ⟩ = cos(α/2) |1/2, 1/2⟩ + e^(iβ) sin(α/2) |1/2, −1/2⟩, we find that:

By inserting Eq. (C5) into Eq. (C2), we can get showing that

Appendix D: Proof of Theorem 3

The proof of the first two items of Theorem 3 is identical to the proof of Lemma 2. It remains to prove that there exists an optimal MO strategy consisting of a covariant POVM (P θ,ĝ ) and of conditional operations O θ,ĝ = U ĝ ∘ O θ ∘ U † ĝ . The MO fidelity for Problem 2 can be expressed as

For every y ∈ Y, we define the probability q θ,y = Tr[P θ,y ]/(2j + 1), the POVM and the quantum channels

Note that the operators (P (y) θ,ĝ ) ĝ∈G satisfy the normalization condition ∫ dĝ P (y) θ,ĝ = I, following from Schur's lemma.
In terms of the above probabilities, POVMs, and channels, the expression (D1) can be rewritten as Since the fidelity is a convex combination, we have the upper bound It is immediate to check that the bound is attained by the MO strategy consisting of the POVM P (y * ) θ,g g∈G and of the conditional operations O θ,y * ,g , where y * is the outcome that maximizes the expression in the right-hand-side of Equation (D7).

Appendix E: Optimization of the MO strategy
Our goal is to maximize the fidelity over all values of m, all unit vectors ξ θ , and all unitary gates V θ . Using the relation U g = σ y U g σ y , we can rewrite the fidelity as having defined | ξ θ = e iπJy |ξ θ . We now insert Equation (E3) into the above expression, taking advantage of the orthogonality relation dg l, n|U |l, n; j 1 , j 2 l, n ; j 1 , j 2 | .
In this way, the fidelity becomes Expanding |ξ θ as |ξ θ = n ξ θ,n |j, n , we obtain the second inequality following from the Cauchy-Schwarz inequality applied to the vectors Γ + − Γ − |j, m ⊗ |j, −m and Γ + + Γ − |j, n ⊗ |j, −n . We will discuss the attainability of the bound (E8) in the end of the proof.
The same approach works for j = 1/2, in which case |m| = j is the only possible choice, and the optimization over θ yields again the optimal value (E11).
For j = 1, the optimal MO strategy is determined by a brute-force approach, by setting m = 0 and m = 1 and optimizing the right-hand side of Eq. (E10) over θ′. When |π − θ| > 0.303π, the optimal MO strategy is the same as for j ≠ 1. When |π − θ| ≤ 0.303π, the optimal m is m = 0, and the optimal angle θ′ becomes θ′ = π. Also in this case, the operator Γ is positive, and therefore the inequality (E8) is attained by choosing |ξ θ⟩ = |j, m⟩.
Appendix F: Persistence of the quantum advantage

The state of the memory spin after the interaction can be obtained by application of the complementary channel C θ , defined by where Tr S denotes the partial trace over the target spin, and U θ is the unitary operator in Eq. (36).
To evaluate this state, it is convenient to look at the evolution of the basis states |j, m⟩ g := U (j) g |j, m⟩. By explicit calculation, we obtain the relation

C θ (|j, m⟩⟨j, m| g ) = Σ_{i=−1}^{1} c m+i,m |j, m + i⟩⟨j, m + i| g ,

where the coefficients c m+i,m are given by

At the first step, the memory starts in the state |j, j⟩ g . By repeatedly applying Eq. (F1), we then obtain the memory state at every step. Explicitly, the memory state for the n-th usage is given by where p(n − 1, m, θ) is the probability distribution after n − 1 usages, which is given by we get the asymptotic expression

p(n, m, θ) = [2j / (n(1 − cos θ) + 2j)] · [n(1 − cos θ) / (n(1 − cos θ) + 2j)]^(j−m) .

Now, Equation (F3) gives us the memory state at the n-th iteration. The fidelity obtained by using this state is given by where F Hei (j, θ, m) is the average fidelity when the probe is in the state |j, m⟩ g , namely

F Hei (j, θ, m) = ∫ dg ∫ dψ ⟨ψ| V † θ,g Tr Pj [U θ (|j, m⟩⟨j, m| g ⊗ ψ) U † θ ] V θ,g |ψ⟩ .

The average over the input states can be easily computed using the relation with the entanglement fidelity, Equation (15). Using Equation (35) for the gate U θ , we obtain the asymptotic expression One can see directly that, asymptotically, F (j, θ, m) is an arithmetic progression and p(n, m, θ) is a geometric progression. Inserting the above expressions into Eq. (F7) we obtain

F Hei (j, θ, n) = 1 − [(1 − cos θ)/(3j)] · [n(1 − cos θ) + j]/j .

Comparing with the MO fidelity in Eq. (57), we obtain that the persistence of the quantum advantage tends to N (j, θ) = j/(1 − cos θ).
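The geometric structure of the memory populations can be checked with a short sketch. It assumes that the asymptotic distribution is geometric in j − m, with ratio r = n(1 − cos θ)/[n(1 − cos θ) + 2j] and prefactor 1 − r = 2j/[n(1 − cos θ) + 2j]; the exponent j − m is the choice that makes the distribution normalised and is an assumption of this sketch. The function names are illustrative.

```python
import math


def usage_distribution(j: int, theta: float, n: int) -> list:
    """Asymptotic populations p(n, m, theta) for m = j, j - 1, ..., -j.

    ASSUMPTION: geometric law (1 - r) * r**(j - m) with
    r = n (1 - cos theta) / (n (1 - cos theta) + 2j).
    """
    a = n * (1 - math.cos(theta))
    r = a / (a + 2 * j)
    return [(1 - r) * r ** k for k in range(2 * j + 1)]


def mean_lowering(j: int, theta: float, n: int) -> float:
    """Mean of j - m for the untruncated geometric law: r / (1 - r) = n (1 - cos theta) / (2j)."""
    return n * (1 - math.cos(theta)) / (2 * j)


if __name__ == "__main__":
    j, theta, n = 50, math.pi, 5
    p = usage_distribution(j, theta, n)
    print("total probability:", sum(p))
    print("mean of j - m    :", sum(k * pk for k, pk in enumerate(p)))
    print("closed form      :", mean_lowering(j, theta, n))
```

For large j the truncation at m = −j is negligible, so the populations sum to one and the mean depletion of the memory grows linearly with the number of uses, consistent with the linear dependence of the fidelity on n.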
The exact dependence of the fidelity on n is shown in Figure 7 for different values of the spin and for rotation angle θ = π. Interestingly, the persistence of the quantum advantage is exactly equal to the asymptotic value j/2 for all the values of j shown in the figure.
We showed the explicit calculation of F (j, θ, m) and p(n − 1, m, θ) when the interaction time is fixed at every step. More general strategies where the interaction time is optimized at every step can be studied in the same way. In the large j limit, we find that such step-by-step optimization is not needed: the fidelity tends to the same value, no matter whether the interaction time is optimized at every step or once and for all. As a result, the persistence of the quantum advantage is the same in both scenarios.

with U θ as in Equation (36). Inserting the expression for the state ρ γ,n into the above equation, we obtain

F Hei (j, θ, γ) = (sinh γ / sinh[(2j + 1)γ]) Σ m e^(2γm) F Hei (j, θ, m) ,

with F Hei (j, θ, m) defined as in Equation (F8). The asymptotic expression for F Hei (j, θ, m) was computed in Equation (F9). Inserting this expression in the above equation, we obtain

Appendix H: Learning higher dimensional rotations for spin-k particle

Following the structure of the optimal learning mechanism for spin 1/2, we choose the memory state to be |j, j⟩ g and we let the two spins undergo the Heisenberg interaction where K = (K x , K y , K z ) are the spin operators of the target spin.
Using the above strategy, we can explicitly compute the entanglement fidelity, given by where |Φ (k)+⟩ = (2k + 1)^(−1/2) Σ_{m=−k}^{k} |k, m⟩ ⊗ |k, m⟩ is the canonical maximally entangled state of two spin-k particles, R denotes a reference system, entangled with the target spin-k particle, and V (k) θ is a rotation by θ about the z axis in the (2k + 1)-dimensional representation.
By comparing with Eq. (H5), we again see that the error is exactly twice the error of the coherent quantum learning strategy.