How to enhance quantum generative adversarial learning of noisy information

Quantum machine learning is where machine learning meets quantum information science. In order to implement this new paradigm for novel quantum technologies, we still need a much deeper understanding of its underlying mechanisms before proposing new algorithms that feasibly address real problems. In this context, quantum generative adversarial learning is a promising strategy for employing quantum devices in quantum estimation or generative machine learning tasks. However, the convergence behaviour of its training process, which is crucial for its practical implementation on quantum processors, has not yet been investigated in detail. Here we show how different training problems may occur during the optimization process, such as the emergence of limit cycles. The latter may remarkably extend the convergence time in the scenario of mixed quantum states, which play a crucial role in the already available noisy intermediate-scale quantum devices. Then, we propose new strategies to achieve faster convergence in any operating regime. Our results pave the way for new experimental demonstrations of such hybrid classical-quantum protocols, allowing one to evaluate their potential advantages over classical counterparts.


Introduction
Machine learning (ML) techniques, besides transforming the way we approach huge data-processing problems, are starting to permeate even non-computer-science research and applied sectors, leading to new (big) data-driven strategies, including several concrete applications in our everyday life such as domotic systems, autonomous cars, face/voice recognition, and medical diagnostics. One of the most outstanding ML results is provided by generative adversarial networks (GANs) [1], which are models exploiting game-theory theorems [2] to learn how to reproduce some given data distribution as closely as possible. More specifically, two agents, named the generator and the discriminator, compete against each other in a zero-sum game, i.e. they play in turns, each turn trying to improve their own strategy in order, respectively, to fool the discriminator and to correctly distinguish real data from generated ones. Under some reasonable assumptions ‡ the game has a unique (Nash) equilibrium point, where the generator is able to exactly reproduce the wanted (real) data distribution.
On the other side, in the last few decades quantum information science [3] has focused on how quantum systems store and process information, and how they can be exploited to implement more efficient protocols than classical ones. This field is currently leading to the first prototypes of quantum devices, with some of them already reaching the commercial market, especially in the context of quantum communication and quantum sensing protocols. Over the last few years, quantum machine learning (QML) [4], combining ML with quantum information tools, has emerged as one of the most promising applications of near-term quantum devices. Nowadays quantum processors belong to the so-called NISQ (Noisy Intermediate-Scale Quantum) era [5]. Circuits running on such devices are characterized by limited size and depth, and the absence of exact error correction protocols makes them still unsuitable for general-purpose computation. However, they can already be employed for variational algorithms lying at the core of ML. Indeed, their resilience against noise, together with the assistance of classical algorithms performing the optimization of some variational parameters (which is why such algorithms are dubbed hybrid quantum-classical [6,7,8]), is what makes them feasible on NISQ hardware. Most efforts in QML research are hence devoted to exploiting quantum resources such as superposition and entanglement to achieve quantum (possibly exponential) speedups in classical ML tasks. In fact, once loaded on a quantum computer, classical data can be processed with linear operations, such as those happening in a neural network, exponentially faster than what is possible on classical computers [9]. Clearly, working in a fully quantum landscape is interesting as well, particularly from the perspective of improving quantum simulation [10], control [11], metrology [12], and communication [13] strategies.
In this context, GANs can be successfully generalized to the quantum domain, leading to the so-called Quantum Generative Adversarial Networks (QGANs) [14,15]. The aim of QGANs is to learn to reproduce the state of a quantum system, usually a register of qubits. A way to achieve this goal, as well as to implement learning algorithms on quantum computers, is to leverage the representation power of parametrized quantum circuits (PQCs). Examples of learning algorithms realized via PQCs are the quantum approximate optimization algorithm [16], and the variational autoencoders [17] and eigensolvers [18]. A detailed review of PQCs and their features, as well as applications, can be found in Ref. [19]. Since PQCs are circuits composed of quantum gates controlled by real tunable parameters, they allow us to steer the output state at will, by adapting the gate parameters to the measurement outcomes. As for classical neural networks, we can set up a gradient-based strategy to optimally update the parameters. Moreover, QGANs have been exploited to learn classical distributions of data [20,21] and to provide a new tool for learning pure states [15,22], whereas mixed states (i.e. noisy information) have been addressed only as ensembles of pure data [23]. However, the latter play a crucial role in the coming NISQ technologies, since the environmental noise is unavoidable and usually partially destroys the quantum features, such as entanglement, that have no classical counterpart and that are instead mainly responsible for the predicted quantum speedups.

‡ The strategy spaces of the agents are compact and convex.
For these reasons we strongly believe that, in order to more deeply understand the performance of QGANs on real hardware, it is remarkably relevant to investigate the scenario of learning mixed quantum states. This is the main focus of this work. The paper is organized as follows. In Sec. 2 we review the mathematical formalism for QGANs, hence showing why learning mixed states could be an issue. Then, Sec. 3 discusses how to implement a QGAN game on PQCs, where we find the emergence of limit-cycle-like behaviours (around the equilibrium point) in single-qubit mixed-state learning, slowing down the optimization process. Hence, we explore a so-called optimistic algorithm that allows us to achieve convergence, i.e. it destroys the limit cycles. After a discussion about convex optimization over PQCs (Sec. 4), conclusions and outlooks are drawn in Sec. 5.

Quantum adversarial game
In GANs the goal of the discriminator (D) is to discriminate real (R) data from the fake ones produced by the generator (G), while the goal of the latter is to fool the discriminator as much as possible with its generated data. Here both real and generated data are modeled as quantum states, respectively described by their density operators ρ_R and ρ_G. To generate mixed quantum states, one can create a generic pure state (living in a larger Hilbert space) via a quantum circuit applying quantum gates to a given pure (ground) state, and then trace out half of the qubit register. The same procedure can be exploited to generate the fake data, but in terms of a PQC whose gate parameters can be tuned. In turn, D applies another PQC to the real or fake state at hand, entangling it with an additional (ancilla) qubit that is later measured in order to perform the discrimination (see Fig. 1).
As in the classical case, without any restriction on the operations performed by both agents, the game should end when G is able to perfectly reproduce the real data and, accordingly, D is unable to correctly discriminate fake data from real ones. Even in the quantum domain, this corresponds to the unique Nash equilibrium of the underlying game [14]. Let us point out that, before reaching the equilibrium point, both D and G try to iteratively update their strategy to win the game. The D action is modeled via a two-outcome positive operator-valued measure (POVM) {Π^D_i} whose outcome i ∈ {R, G} judges whether the state is real or fake, i.e. generated by G. Therefore, at each iteration, D has to solve a binary quantum state discrimination task. The error in such a discrimination process is given by the conditional probability of judging real a generated state, i.e. p(R|G) = Tr[Π^D_R ρ_G], and by that of judging fake a real one, i.e. p(G|R) = Tr[Π^D_G ρ_R].

Figure 1: QGAN circuit structure for generic n-qubit mixed states. The R/G/D blocks are PQCs that are exploited to create real/generated data, and to implement the discrimination process, respectively. The discriminator has access to an ancilla qubit a, while the auxiliary qubits A are used by R/G to create mixed states. The × symbol over a wire means tracing out the degrees of freedom associated to it.
Assuming equal a priori probabilities, and that G's current state is known by D, we may define the discrimination error as [p(R|G) + p(G|R)]/2. The discriminator's strategy during their turn can be formalized as a minimization of the discrimination error that, without loss of generality, can be written as

Discriminator:  min_{Π^D} (1/2) ( Tr[Π^D ρ_G] + Tr[(1 − Π^D) ρ_R] ),   (1)

where we set Π^D ≡ Π^D_R to simplify the notation. An analytic solution to the above optimization is provided by Helstrom's theorem [24,25], which states that the optimal POVM {Π^D_i} is formed by the projectors onto the positive (Π^D_R) and negative (Π^D_G) subspaces of the operator ρ_R − ρ_G. On the other hand, the generator's strategy is to fool the discriminator as much as possible by reducing their ability to distinguish the real and generated states. This results in the following optimal strategy for G:

Generator:  min_{ρ_G} Tr[(1 − Π^D) ρ_G].   (2)

This strategy has a formal analytic solution ρ_G = |π_max⟩⟨π_max|, where |π_max⟩ is the eigenvector of Π^D with maximum eigenvalue. If D always plays the optimal Helstrom measurement, then ρ_G is a projection onto an eigenstate of ρ_R − ρ_G with positive eigenvalue. However, it is simple to show that D and G cannot and should not solve the optimization problems (1) and (2) at each iteration. Firstly, they cannot find the optimal solution without perfectly knowing, at each iteration, ρ_R and the other player's strategy, which contradicts the original scope of the game. Secondly, they should not perform such a difficult optimization at each round: if D and G iteratively play using the solutions of Eqs. (1)-(2), then they never reach the equilibrium for mixed states ρ_R. This is summarized by the following theorem, whose proof is straightforward: as discussed above, the solution of (2) is always a pure state ρ_G = |π_max⟩⟨π_max|, and as such ρ_G ≠ ρ_R in general.
Theorem 1 For mixed states ρ R , the adversarial game fails to converge when D and G iteratively use the strategies (1)-(2).
To achieve convergence, each player must slightly update their strategy at each iteration [14], rather than using Eqs. (1)-(2). Moreover, in the language of Nash equilibria, each player is unaware of the adversary's move, and the best they can do is to assume that the opponent is playing with the optimal strategy and fight against it. Setting the score as the bilinear function

S(Π^D, ρ_G) = Tr[Π^D (ρ_R − ρ_G)],   (3)

we see that G increases its score whenever D loses the same amount, making this a zero-sum game. Both the states ρ_G and the measurement operators Π^D form a convex set in their respective spaces. § Therefore, the Nash equilibrium is the result of the minimax

min_{ρ_G} max_{Π^D} S(Π^D, ρ_G) = min_{ρ_G} (1/2) ‖ρ_R − ρ_G‖_1 = 0,

where the first equality follows from the Helstrom theorem and the definition of the 1-norm [3]. As a result, the Nash equilibrium is reached when the generator is able to perfectly reproduce ρ_R, as originally shown in [14]. However, how to achieve this equilibrium configuration in practice is far from trivial.
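The Helstrom solution above is easy to check numerically. The following sketch (plain NumPy, with an arbitrarily chosen ρ_R) builds the optimal POVM element as the projector onto the positive subspace of ρ_R − ρ_G and verifies that the resulting minimal error equals 1/2 − ‖ρ_R − ρ_G‖_1/4:

```python
import numpy as np

def helstrom_povm(rho_R, rho_G):
    """Optimal POVM element Pi^D_R: projector onto the positive
    subspace of rho_R - rho_G (Helstrom's theorem)."""
    w, v = np.linalg.eigh(rho_R - rho_G)
    vpos = v[:, w > 0]
    return vpos @ vpos.conj().T

def min_error(rho_R, rho_G):
    """Minimal discrimination error [p(R|G) + p(G|R)] / 2."""
    Pi = helstrom_povm(rho_R, rho_G)
    p_RG = np.trace(Pi @ rho_G).real                         # fake judged real
    p_GR = np.trace((np.eye(len(rho_R)) - Pi) @ rho_R).real  # real judged fake
    return (p_RG + p_GR) / 2

rho_R = np.array([[0.8, 0.1], [0.1, 0.2]])   # an arbitrary mixed target
rho_G = np.eye(2) / 2                        # maximally mixed guess
err = min_error(rho_R, rho_G)
tnorm = np.abs(np.linalg.eigvalsh(rho_R - rho_G)).sum()  # ||rho_R - rho_G||_1
# Helstrom bound: minimal error = 1/2 - ||rho_R - rho_G||_1 / 4
assert np.isclose(err, 0.5 - tnorm / 4)
```

The assertion is exactly the statement that max_{Π^D} S equals half the trace distance, up to the affine rescaling between score and error.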
Inspired by the success of gradient-based training of generative adversarial networks [26], the most natural approach to play the quantum adversarial game is to use suitable parametrizations of ρ_G and Π^D, see e.g. Fig. 1, and then iteratively update these parameters, e.g. via gradient descent [14]. Using these methods, convergence with pure target states ρ_R = |ψ_R⟩⟨ψ_R| has been obtained in several scenarios [15,22], while for mixed states convergence was observed with a few extra steps, e.g. by setting ρ_G as a random superposition of pure states [23]. In spite of these successful examples, gradient-based training may be problematic, as we numerically investigate in the next section. This is due to the bilinear nature of the score function (3). Indeed, it has been shown that the adversarial optimization of bilinear score functions may display limit cycles when trained with standard gradient descent rules, or even a "chaotic" behaviour, see e.g. [27,28] and references therein.
More precisely, let us consider the simplest case where ρ_R is a single-qubit mixed state. The most natural parametrization of ρ_R is via the Bloch vector r, namely ρ_R = (1 + r·σ)/2, where σ is the vector of Pauli matrices and |r| ≤ 1. Similarly we parametrize ρ_G with the Bloch vector g, and Π^D via Π^D = (d_0 1 + d·σ)/2, obtaining the score function

S(d, g) = d · (r − g) / 2.   (4)

The above score function has been extensively investigated in Refs. [29,30], where the emergence of limit cycles in classical GANs training was shown. Nonetheless, Refs. [29,30] focus on bilinear problems with linear constraints, while Bloch vectors satisfy a non-linear constraint, since they live in the Bloch ball. This difference may be the reason behind the good performance of quantum adversarial learning for pure states [15,22], as pure states lie at the boundary of the Bloch sphere, where such non-linear constraints are important. However, when dealing with the optimization of highly mixed states, which lie well inside the Bloch ball, the presence of the boundary may not affect the optimization, and limit cycles may emerge. We summarise this aspect in the following theorem, whose proof, adapted from [29], can be found in Appendix A:

Theorem 2 (informal statement): Gradient descent/ascent applied to the problem min_{ρ_G} max_{Π^D} S(Π^D, ρ_G) diverges for states far from the surface of the Bloch sphere.

Figure 2: The score function (4) is optimized via gradient descent/ascent, as described in the main text. Here the learning rate of both agents is η = 0.1, and one training turn consists of 5 discriminator's steps followed by a single generator's one.
We bring evidence to the previous statement by running a QGAN game in a single-qubit scenario where both D and G are parametrized via their Bloch vectors. We employ gradient descent/ascent (GDA) — namely gradient descent for g and gradient ascent for d — on the score function (4), with an added penalty term to enforce the constraints on the Bloch vectors, i.e. |g| ≤ 1 and |d| ≤ d_0 ≤ 2 − |d|. Results are shown in Fig. 2, where the limit-cycle behaviour in the trajectory of g is evident.
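The divergent spiral of Fig. 2 can be reproduced in a few lines of NumPy. The sketch below is a simplification of that experiment: simultaneous (rather than alternating) updates, Bloch-ball constraints ignored, and an arbitrarily chosen highly mixed target r. In this unconstrained setting the squared distance from the Nash equilibrium grows by an exact constant factor per step:

```python
import numpy as np

# Bilinear score S(d, g) = d . (r - g) / 2 for single-qubit Bloch vectors.
# Simultaneous gradient descent (for g) / ascent (for d), no constraints.
rng = np.random.default_rng(0)
r = np.array([0.0, 0.0, 0.2])          # a highly mixed target state
g = rng.normal(size=3) * 0.1           # generator's Bloch vector
d = rng.normal(size=3) * 0.1           # discriminator's Bloch vector
eta = 0.1

dist = []
for _ in range(500):
    grad_d = (r - g) / 2               # dS/dd (ascent direction for D)
    grad_g = -d / 2                    # dS/dg (descent direction for G)
    d, g = d + eta * grad_d, g - eta * grad_g
    dist.append(np.linalg.norm(g - r) ** 2 + np.linalg.norm(d) ** 2)

# Squared distance from the equilibrium (g = r, d = 0) grows by the
# exact factor (1 + eta^2/4) at every step: an outward spiral.
assert dist[-1] > dist[0]
ratios = np.array(dist[1:]) / np.array(dist[:-1])
assert np.allclose(ratios, 1 + eta ** 2 / 4)
```

The growth factor differs from the (1 + η²) of Appendix A only because of the overall 1/2 in this parametrization of the score.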
An algorithm dubbed "optimistic mirror descent" (OMD) has been proposed in Ref. [29] to escape from the limit cycles that emerge in the minimax optimization of bilinear cost functions (4). In the next section we show that, although perfect limit cycles may not exist for more complex parametrizations of ρ G and Π D , a simple gradient descent/ascent update may produce a "chaotic" behaviour, where convergence is not observed. We find instead that convergence is obtained via OMD.

Training with parametric quantum circuits
Motivated by the capabilities of current noisy intermediate-scale quantum hardware [5], common PQCs are based on single-qubit gates controlled by tunable real parameters, e.g. qubit rotations around a fixed axis with variable angle, and non-parametric two-qubit gates, typically CNOTs. Sequences of single- and two-qubit gates are then stacked in a layered fashion. When sketched, it is easy to see a certain resemblance with classical neural networks, with the parameters playing the role of the weights and biases of the latter. Indeed, it turns out that, just as a neural network can represent any function given the proper depth and structure, an accurately built PQC can approximate any unitary mapping over the input quantum register. Indeed, CNOT gates and single-qubit rotations are universal for quantum computation, i.e. they can be composed to simulate any unitary evolution to the desired accuracy [31]. PQCs can be designed to comply with nowadays NISQ hardware, by adapting the two-qubit (entangling) gates to the connectivity of the experimental realization of the quantum processor, and by limiting the circuit's depth to fight decoherence. Since learning tasks ultimately boil down to the problem of minimizing a certain loss/score function of the model parameters, we can employ PQCs as quantum learning models and tune them through a feedback loop with a classical optimizer. This is the standard framework of hybrid quantum-classical variational approaches [7], and we will use this scheme in our analysis.
Here, we will ultimately be concerned with the problem of learning a mixed state via a QGAN game. Since every mixed state can be written as a pure state in a larger Hilbert space (Fig. 1), we build the generator via the following PQC with classical parameters θ_G:

ρ_G(θ_G) = Tr_A[ U(θ_G) |0⟩⟨0|^{⊗2n} U†(θ_G) ],   (5)

where both A and ρ_G contain n qubits, and U(θ_G) is the unitary operator corresponding to the PQC. Similarly, since every measurement operator can be written as a projective measurement onto a larger Hilbert space (Fig. 1), we define the discriminator's POVM with classical parameters θ_D as

Π^D(θ_D) = (1 ⊗ ⟨0|_a) U†(θ_D) (1 ⊗ |0⟩⟨0|_a) U(θ_D) (1 ⊗ |0⟩_a),   (6)

where a is a single auxiliary qubit. This measurement can be interpreted as follows: first apply a PQC U(θ_D) entangling the system with an auxiliary qubit a, then measure the qubit a in the computational basis. If the outcome 0 is detected, then we guess that the state is the real one; otherwise (outcome 1) the state is judged as fake.
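The purification idea behind the generator can be illustrated with a minimal NumPy sketch, using a Haar-random unitary in place of the trainable PQC U(θ_G): apply a unitary on 2n qubits to the ground state and trace out the n auxiliary qubits, which generically yields a mixed n-qubit state.

```python
import numpy as np

def random_unitary(dim, rng):
    """Haar-random unitary via QR decomposition (stand-in for the PQC)."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, rmat = np.linalg.qr(z)
    return q * (np.diag(rmat) / np.abs(np.diag(rmat)))

def generate_mixed(U, n):
    """Mixed n-qubit state: U acting on |0...0> over 2n qubits,
    then partial trace over the n auxiliary qubits A."""
    psi = U[:, 0]                        # U |0>^{2n}: first column of U
    psi = psi.reshape(2 ** n, 2 ** n)    # rows: system, columns: register A
    return psi @ psi.conj().T            # Tr_A |psi><psi|

n, rng = 2, np.random.default_rng(1)
U = random_unitary(2 ** (2 * n), rng)
rho_G = generate_mixed(U, n)
assert np.isclose(np.trace(rho_G).real, 1)           # unit trace
assert np.all(np.linalg.eigvalsh(rho_G) > -1e-9)     # positive semidefinite
purity = np.trace(rho_G @ rho_G).real
assert 2 ** -n - 1e-9 <= purity <= 1 + 1e-9          # generically mixed
```

Replacing the random unitary with the layered ansatz of the next subsection gives the actual trainable generator.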

Circuits Ansatz
Following Refs. [22,32], the G and D circuits are built by repeating a two-qubit block which implements a generic unitary U ∈ SU(4). As shown in Fig. 3, this building block is composed of 15 single-qubit rotations and 3 CNOT gates. One block allows one to generate every two-qubit pure state. For larger registers, we apply this block to each pair of consecutive qubits, thus obtaining a layer. Layers are then concatenated in a staggered pattern.
The Controlled-NOT (CNOT) operation applies Pauli's X to the target qubit if the control one is found in |1⟩ and does nothing otherwise. Throughout this manuscript, the gradients needed to train the QGAN are analytically computed through the parameter-shift rule [33,34,35]. Since the parametrized part of our circuit consists of single-qubit rotations, the gradient of a function f(θ) = ⟨O(θ)⟩ is obtained as

∂f/∂θ_i = [ f(θ + (π/2) e_i) − f(θ − (π/2) e_i) ] / 2,

where e_i is the unit vector in the i-th direction. We first focus on learning pure states, for which it is known that QGANs converge. Indeed, Fig. 4 further confirms this in terms of the relevant figures of merit: the score function value, the probability p(R|G) of D labelling fake data as true, the trace distance d = (1/2)‖ρ_G − ρ_R‖_1 between the generated state and the target one, and their fidelity F = (Tr√(√ρ_G ρ_R √ρ_G))². In our simulations the target real data are random pure states of n qubits, with n = 1, 2, 3, obtained via a PQC with the same structure as the one used for G, but with random fixed parameters. Here, training is carried out by alternately updating D and G via a single gradient descent/ascent step. We have tried different optimizers, always observing a qualitatively similar convergence behaviour. ¶
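The parameter-shift rule can be verified on the smallest possible example: a single R_y rotation applied to |0⟩ and measured in the Z basis, for which f(θ) = cos θ and the two shifted evaluations reproduce the exact derivative −sin θ.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation about the y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

Z = np.diag([1.0, -1.0])

def f(theta):
    """Expectation <0| Ry(theta)^dag Z Ry(theta) |0> = cos(theta)."""
    psi = ry(theta) @ np.array([1.0, 0.0])
    return psi @ Z @ psi

theta = 0.7
# Parameter-shift rule: the exact gradient from two shifted evaluations.
grad = (f(theta + np.pi / 2) - f(theta - np.pi / 2)) / 2
assert np.isclose(grad, -np.sin(theta))   # d/dtheta cos(theta)
```

Unlike finite differences, the shift of π/2 is not a small parameter: the rule is exact for gates generated by operators with eigenvalues ±1/2, as is the case for single-qubit rotations.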

Emergence of limit cycles
We now generalize the above analysis to the more interesting case of learning mixed states. They have so far been addressed as an ensemble of orthogonal pure states [23], while here they are created by tracing out half of the qubit register. Our results are summarized in Fig. 5, where we show the learning process for mixed states of the form ρ_R = 1/2 + (a/(2√2))(σ_x + σ_y), with purity P = Tr[ρ²] = (1 + a²)/2, ranging from the completely mixed one (P = 1/2) to P = 3/4. The selected optimizers are the previously defined GDA and Adam, i.e. one of the best-performing optimization algorithms for ML [36]. As we can see in Fig. 5, none of the chosen optimizers allows us to reach convergence, even by changing the values of the optimization hyperparameters. ¶ However, comparing these results with the ones in Fig. 2, we can see that for PQCs the limit-cycle behaviour disappears, because the score function is no longer bilinear. Let us point out that in Fig. 5 we have an overparametrization, because D and G use 15 parameters each, whereas a general single-qubit POVM has only 4 real degrees of freedom, and a single-qubit mixed state has 3. For this reason we devise two tailored circuits achieving a minimal parametrization for both D and G (see figure 6), with the shorthand cos(θ) ≡ c(θ) and sin(θ) ≡ s(θ).

Figure 5: Learning mixed states with GDA and Adam. None of them displays convergence, although the latter has a less pronounced oscillating behaviour. In all these cases the initial configurations of G and D correspond to the same random parameters. These trajectories have been obtained by running the QGAN for 250 total turns, where each turn comprises 10 optimization steps for D followed by 1 for G. Lastly, the learning rate of GDA is set to 0.1, whereas that of Adam is 0.05.

¶ Hyperparameters such as the learning rates might be further fine-tuned in order to improve the convergence speed, but this is beyond the aim of this manuscript.
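As a quick numerical check of the target family used above, the purity formula P = (1 + a²)/2 follows from (σ_x + σ_y)² = 2·1 and the tracelessness of the Pauli matrices:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])

def rho_R(a):
    """Target family: rho_R = 1/2 + a/(2*sqrt(2)) * (sigma_x + sigma_y)."""
    return np.eye(2) / 2 + a / (2 * np.sqrt(2)) * (sx + sy)

# a = 0 gives the completely mixed state (P = 1/2);
# a = 1/sqrt(2) gives P = 3/4, the largest purity considered here.
for a in [0.0, 0.5, 1 / np.sqrt(2)]:
    rho = rho_R(a)
    purity = np.trace(rho @ rho).real
    assert np.isclose(purity, (1 + a ** 2) / 2)
```

The Bloch vector of ρ_R(a) has length a, so the whole family stays well inside the Bloch ball for the values of a considered, which is exactly the regime where Theorem 2 predicts trouble.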
Even with these tailored circuits, convergence is not achieved, as numerically reported in Fig. 7. Moreover, using a simple gradient descent method, we still observe limit cycles (not shown in figure).

Training with optimism
In standard GANs, competing players are usually unaware of the opponent's strategy. However, each player may try to guess the opponent's move in order to improve their own strategy. This is the idea underlying the Optimistic Mirror Descent (OMD) optimization algorithm [37], which was shown to fix convergence issues, namely limit cycles, of classical GANs with bilinear score functions (see Ref. [29]). There, it was also used to enhance convergence in the case of non-bilinear score functions. Motivated by these results, we now show that OMD works successfully also for our QGANs (see Fig. 8). More specifically, the OMD-based update rule for the score function S reads

θ_D^{t+1} = θ_D^t + 2 η_D ∇_{θ_D} S(θ_D^t, θ_G^t) − η_D ∇_{θ_D} S(θ_D^{t−1}, θ_G^{t−1}),
θ_G^{t+1} = θ_G^t − 2 η_G ∇_{θ_G} S(θ_D^{t+1}, θ_G^t) + η_G ∇_{θ_G} S(θ_D^t, θ_G^{t−1}),

where η_{D/G} are the learning rates for D and G. Notice that this rule corresponds to the scenario where D is optimized first.
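The effect of optimism is easiest to see on the bilinear single-qubit score of Eq. (4). The sketch below is a simplified setup (simultaneous updates, no Bloch-ball constraints, arbitrary target r): each player extrapolates with twice the current gradient minus the previous one, and the same game that spirals outward under plain gradient descent/ascent now contracts to the Nash equilibrium.

```python
import numpy as np

# Optimistic updates for the bilinear game S(d, g) = d . (r - g) / 2:
# theta_{t+1} = theta_t -/+ 2*eta*grad_t +/- eta*grad_{t-1}.
rng = np.random.default_rng(0)
r = np.array([0.0, 0.0, 0.2])
g, d = rng.normal(size=3) * 0.1, rng.normal(size=3) * 0.1
eta = 0.2
grad_g_prev, grad_d_prev = np.zeros(3), np.zeros(3)

dist = []
for _ in range(3000):
    grad_g, grad_d = -d / 2, (r - g) / 2
    g_new = g - 2 * eta * grad_g + eta * grad_g_prev   # descent for G
    d_new = d + 2 * eta * grad_d - eta * grad_d_prev   # ascent for D
    g, d, grad_g_prev, grad_d_prev = g_new, d_new, grad_g, grad_d
    dist.append(np.linalg.norm(g - r) ** 2 + np.linalg.norm(d) ** 2)

# OMD spirals *inward*: convergence to the Nash equilibrium (g = r, d = 0).
assert dist[-1] < 1e-8 * dist[0]
```

The "memory" term using the previous gradient is what damps the rotation responsible for the limit cycle; dropping it recovers the divergent GDA dynamics.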

Convex optimization
Since PQCs are not the only way of modelling quantum states, here we present a non-parametric method, hereafter dubbed ConvexQGAN, to solve the minimax problem min_{ρ_G} max_{Π^D} S(ρ_G, Π^D) using the formalism of convex optimization presented in Ref. [38]. Since {ρ_G} and {Π^D} are both convex sets and S(ρ_G, Π^D) is bilinear, when we iteratively fix either ρ_G or Π^D we always obtain a convex function over a convex set. Therefore, by adapting the Frank-Wolfe algorithm from Ref. [38], we may write the following update rules at the k-th step:

Π^D_{k+1} = (1 − α_k) Π^D_k + α_k |D_k⟩⟨D_k|,   (12)
ρ^{k+1}_G = (1 − β_k) ρ^k_G + β_k |G_k⟩⟨G_k|,   (13)

where α_k and β_k are decaying learning rates, e.g. typically α_k = β_k = 2/(k+2), the state |D_k⟩ is the eigenvector of ρ_R − ρ^k_G with largest eigenvalue, and the state |G_k⟩ is the eigenvector with smallest eigenvalue of 1 − Π^D_{k+1}. Although the update rules directly follow from the Frank-Wolfe algorithm, we highlight here an interesting result from the physics point of view. The states |D_k⟩ are elements of the Helstrom measurement to optimally distinguish the real state ρ_R from the current fake state ρ^k_G. As such, it is tempting to consider a different strategy with α_k = 1 at each iteration step. The downside of the latter approach is that the measurement operator highly fluctuates between different steps. However, for α_k = 1 we get |D_k⟩ = |G_k⟩, so Eq. (13) acquires a clear operational meaning: the generator's state is iteratively updated with one of the states entering the Helstrom optimal measurement. This is reminiscent of the original optimization from Eq. (2), but without its convergence issues for mixed states. Indeed, the update rule of Eq. (13) allows the generation of mixed states, unlike Eq. (2).

Figure 9: ConvexQGAN: Learning different random mixed states using either the Frank-Wolfe iteration (12)-(13), the imaginary time evolution (14) or the quantum circuit update from (17). The cases with α_k = 1 in (12) correspond to Helstrom measurements. For each algorithm, 20 lines are plotted for different random initial states. All simulations are for 5-qubit states.
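A toy NumPy version of the Frank-Wolfe iteration is sketched below; note that the rank-1 choices for |D_k⟩ and |G_k⟩ (the leading eigenvectors of ρ_R − ρ^k_G and of Π^D_{k+1}, respectively) are our reading of the update rules. The assertions check only the feasibility invariants that convexity guarantees at every step: Π^D remains a valid POVM element and ρ_G a valid density matrix.

```python
import numpy as np

def random_density(dim, rng):
    """Random full-rank density matrix."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def top_eigvec(H):
    """Eigenvector belonging to the largest eigenvalue."""
    w, v = np.linalg.eigh(H)
    return v[:, -1]

dim, rng = 4, np.random.default_rng(0)     # 2 qubits for brevity
rho_R = random_density(dim, rng)
rho_G = np.eye(dim) / dim                  # start from the maximally mixed state
Pi_D = np.eye(dim) / 2

for k in range(500):
    alpha = beta = 2 / (k + 2)             # decaying learning rates
    D = top_eigvec(rho_R - rho_G)          # Helstrom-like direction for D
    Pi_D = (1 - alpha) * Pi_D + alpha * np.outer(D, D.conj())
    G = top_eigvec(Pi_D)                   # best rank-1 response for G
    rho_G = (1 - beta) * rho_G + beta * np.outer(G, G.conj())

# Convex combinations keep the iterates feasible at every step:
eig_Pi = np.linalg.eigvalsh(Pi_D)
assert np.all(eig_Pi > -1e-9) and np.all(eig_Pi < 1 + 1e-9)  # 0 <= Pi_D <= 1
assert np.isclose(np.trace(rho_G).real, 1)
assert np.all(np.linalg.eigvalsh(rho_G) > -1e-9)             # rho_G is a state
```

Because every update is a convex mixture with a rank-1 projector, mixed generated states arise naturally, in contrast with the pure-state solutions of Eq. (2).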
Finally, we propose a physics-inspired alternative by observing that, for small β_k, Eq. (13) can be interpreted as an imaginary time evolution

ρ^{k+1}_G = e^{−β_k H_k} ρ^k_G e^{−β_k H_k} / Tr[ e^{−β_k H_k} ρ^k_G e^{−β_k H_k} ],  with H_k = 1 − |G_k⟩⟨G_k|,   (14)

where after the imaginary evolution we need to normalize the state such that Tr[ρ^{k+1}_G] = 1. The gradient-based Frank-Wolfe algorithm (12)-(13) and the imaginary time iteration (14) are numerically studied in Fig. 9 for full-rank random 5-qubit states. For the imaginary time iteration we use α_k = 1, so |G_k⟩ = |D_k⟩ in (14). We observe in Fig. 9 that the imaginary time evolution, together with the optimal Helstrom measurement at each step, significantly outperforms the Frank-Wolfe iteration, both in terms of speed and accuracy.
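A toy implementation of the imaginary-time iteration is sketched below, under our assumptions: H_k = 1 − |G_k⟩⟨G_k| with |G_k⟩ the leading eigenvector of ρ_R − ρ^k_G (the α_k = 1 Helstrom choice), and a small constant β in place of a decaying schedule. Since H_k annihilates |G_k⟩, the map e^{−βH_k} acts as the identity on |G_k⟩ and damps its orthogonal complement, boosting the population of the currently most deficient direction.

```python
import numpy as np

def random_density(dim, rng):
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def trace_distance(a, b):
    return 0.5 * np.abs(np.linalg.eigvalsh(a - b)).sum()

dim, beta, rng = 4, 0.05, np.random.default_rng(1)
rho_R = random_density(dim, rng)
rho_G = np.eye(dim) / dim

dists = [trace_distance(rho_R, rho_G)]
for _ in range(400):
    w, v = np.linalg.eigh(rho_R - rho_G)
    Gk = v[:, -1]                             # alpha_k = 1 Helstrom direction
    P = np.outer(Gk, Gk.conj())
    # e^{-beta H} with H = 1 - |G><G|: identity on |G>, damping e^{-beta}
    # on the orthogonal complement.
    E = np.exp(-beta) * (np.eye(dim) - P) + P
    rho_G = E @ rho_G @ E
    rho_G = rho_G / np.trace(rho_G).real      # renormalize the state
    dists.append(trace_distance(rho_R, rho_G))

assert dists[-1] < dists[0]                   # the iteration approaches rho_R
```

One provable feature of this update is that the population ⟨G_k|ρ_G|G_k⟩ never decreases in a step, since the numerator is untouched while the normalization can only shrink.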
ConvexQGAN methods show fast convergence towards the equilibrium configuration, but they require eigendecompositions of the state at each step. This operation is simple for classical computers as long as the Hilbert space is sufficiently small. To extend this operation to larger systems, we now discuss how to implement an update like that of Eq. (14) using a quantum circuit. For this purpose, we use the following quantum map, which is at the heart of the quantum density matrix exponentiation algorithm [39]:

E_t(ρ) = Tr_2[ e^{−i SWAP t} (ρ ⊗ σ) e^{i SWAP t} ] = cos²(t) ρ + sin²(t) σ − i sin(t) cos(t) [σ, ρ],   (15)

where SWAP is the swap operator. Applying this map twice with different signs, one has

E_{−t}(E_t(ρ)) = cos⁴(t) ρ + sin²(t)(1 + cos²(t)) σ + sin²(t) cos²(t) [σ, [σ, ρ]].   (16)

Therefore, setting σ = |G_k⟩⟨G_k| and t_k such that cos⁴(t_k) = 1 − β_k, namely β_k ≈ 2t_k², we get

ρ^{k+1}_G = (1 − β_k) ρ^k_G + sin²(t_k)(1 + cos²(t_k)) |G_k⟩⟨G_k| + sin²(t_k) cos²(t_k) [H_k, [H_k, ρ^k_G]],   (17)

where H_k was defined in (14). The latter update rule is akin to a mixture of (13) and (14), but it has the advantage that it can be explicitly evaluated as a quantum circuit applied to ρ^k_G and two copies of the state |G_k⟩. As shown in Fig. 9, the performance obtained with the update rule (17) is similar to that of the imaginary time evolution. Therefore, if the states |G_k⟩ can be efficiently prepared, for instance via strategies like the Helstrom classifier circuit from [40], then the above update rule can be used for QGAN training in a quantum computer.
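The map at the heart of density matrix exponentiation can be verified directly in NumPy: since SWAP² = 1, one has e^{−i SWAP t} = cos(t) 1 − i sin(t) SWAP, and the partial trace identities Tr_2[SWAP(A ⊗ B)] = BA, Tr_2[(A ⊗ B)SWAP] = AB yield the closed form of Eq. (15).

```python
import numpy as np

# SWAP on two qubits: SWAP |i, j> = |j, i>
SWAP = np.zeros((4, 4))
for i in range(2):
    for j in range(2):
        SWAP[2 * i + j, 2 * j + i] = 1

def dme_map(rho, sigma, t):
    """E_t(rho) = Tr_2[ e^{-i SWAP t} (rho x sigma) e^{+i SWAP t} ]."""
    U = np.cos(t) * np.eye(4) - 1j * np.sin(t) * SWAP   # since SWAP^2 = 1
    big = U @ np.kron(rho, sigma) @ U.conj().T
    return np.einsum('ikjk->ij', big.reshape(2, 2, 2, 2))  # trace out qubit 2

def rand_rho(rng, dim=2):
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    m = a @ a.conj().T
    return m / np.trace(m).real

rng = np.random.default_rng(3)
rho, sigma, t = rand_rho(rng), rand_rho(rng), 0.3
lhs = dme_map(rho, sigma, t)
rhs = (np.cos(t) ** 2 * rho + np.sin(t) ** 2 * sigma
       - 1j * np.sin(t) * np.cos(t) * (sigma @ rho - rho @ sigma))
assert np.allclose(lhs, rhs)   # Eq. (15) holds
```

Composing `dme_map` at +t and −t cancels the commutator term and reproduces the sign-free combination of Eq. (16) used in the update rule (17).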

Conclusions
In this work we have studied the convergence of quantum generative adversarial learning for mixed states. We have first shown that "learning" via simple gradient descent/ascent updates, or even via more advanced methods such as the Adam optimizer, may be problematic when the target state is mixed. We have attributed such convergence issues to the bilinear nature of the QGAN score function. Indeed, it is known from the classical GAN literature that the optimization of such score functions leads to limit cycles, in which the generator gets stuck, cycling around the target solution without ever reaching it. We have observed that states obtained via PQCs, such as those commonly implemented in nowadays available NISQ devices, are less affected by the emergence of limit cycles, but may nonetheless display a "chaotic" behaviour during training, without achieving convergence. We have then proposed new algorithms for reliable training of QGANs, which always achieve convergence in our numerical simulations. The first algorithm, suitable for PQCs, is based on the adaptation of optimistic mirror descent, i.e. a gradient-based technique allowing provable convergence with bilinear score functions. The second algorithm is based on convex optimization techniques, and is especially suited for non-parametric states that are iteratively updated via a suitably designed, yet non-parametric, quantum circuit.
Thanks to our theoretical and numerical analysis, we believe that the proposed algorithms should work better than previously used techniques for QGAN training, especially when highly mixed states are involved. Having good training heuristics for learning mixed states will help in leveraging their higher representation power, as well as in providing us with a way to study noisy quantum maps. Indeed, a next necessary step in the characterization of QGAN capabilities is the analysis of their performance against noise. This path has been paved in [41], where it was shown that adversarial schemes share the noise robustness of other known hybrid quantum-classical variational algorithms [42,43,44,45]. Lastly, from the physics point of view, since QGANs perform an implicit state tomography, we believe that, by further endowing our scheme with the ability to process entangled copies of the target state, performance will be enhanced.
It is an open question whether an adversarial strategy may take over some current metrology schemes [12], by providing faster and more efficient strategies for sensing and system certification. Our results shed new light on how hybrid classical-quantum QML protocols might be exploited in already available experimental platforms, with potentially promising applications in quantum computing and noise sensing.

Appendix A (fragment)

... and accordingly

∆_{t+1} = (1 + η²) ∆_t,   (A.5)

namely the distance from the equilibrium point increases at each iteration. We point out that the above proof is valid only when we neglect the physical constraints on the Bloch vectors. The latter are however important for pure states or for states near the surface of the Bloch sphere.