Elementary proof of QAOA convergence

The quantum alternating operator ansatz (QAOA) and its predecessor, the quantum approximate optimization algorithm, are among the most widely used quantum algorithms for solving combinatorial optimization problems. However, as there is yet no rigorous proof of convergence for the QAOA, we provide one in this paper. The proof involves retracing the connection between the quantum adiabatic algorithm and the QAOA, and naturally suggests a refined definition of the ‘phase separator’ and ‘mixer’ keywords.


I. INTRODUCTION
In the current era of gate-based noisy quantum computers, the class of variational quantum algorithms (VQAs) is at the center of research. First and foremost, the quantum approximate optimization algorithm [1] receives enormous scientific as well as industrial attention. Like many other VQAs, it was developed for the purpose of solving combinatorial optimization problems (COPs) (maximize f : {0, 1}^n → ℝ subject to some constraints) on quantum computers with the aid of classical optimizers. It is, to some extent, a discretized and gate-based version of the quantum adiabatic algorithm (QAA, [2]), which is itself a continuous-time algorithm. The QAA and the closely related quantum annealing [3] rely on slowly evolving a quantum system (resp. some external parameters) in order to transition a well-known initial state into some state representing an optimal solution. Due to their analog structure, they are not executable on gate-based architectures, but on quantum annealers (see [4] for an overview), which constitute the second large family of quantum computer architectures.
In its original formulation, the quantum approximate optimization algorithm is only suited for unconstrained problems. A common technique for enlarging its scope to constrained problems is softcoding the constraints. That is, the constraints enter the objective function as additional terms, penalizing infeasible inputs. However, for several instances, this approach was observed to produce unfavorable output distributions which suffer from poor optimization quality or feasibility violation (see, e.g., [5][6][7]). In order to improve the treatment of constrained problems, Hadfield et al. extended the quantum approximate optimization algorithm to the quantum alternating operator ansatz (QAOA, [8]), which also allows for hardcoding the constraints. That is, the objective function is left unchanged and feasibility preservation is instead enforced strictly.
In a nutshell, a QAOA-circuit consists of parametrized phase separator gates $U_P$ and mixer gates $U_M$. Both types of gates should preserve feasibility such that, in an ideal setting, feasible states are mapped to feasible states again. Classically and iteratively optimizing the circuit parameters then should yield a good approximation of an optimal solution. This heuristic argument goes through only if every feasible state can be reached: The QAOA-circuit is, given the right parameter values, able to (approximately) produce every feasible state. Typically, the reachability of feasible states only depends on the properties of the mixer $U_M$.
A more or less rigorous argument for why the quantum approximate optimization algorithm should converge for every (unconstrained) COP with only one optimal solution was already given in [1]. This sketch of a proof, in turn, builds on the close connection to the QAA and the underlying principle of adiabatic evolution/quantum annealing (see [9] for a mathematical treatment). However, neither is the proof carried out in great mathematical detail, nor does it attempt to be as general as possible. Moreover, since the QAOA comprises similar principles as the quantum approximate optimization algorithm, it stands to reason to extend this result, once suitably formalized, to the QAOA; a task that, surprisingly, has not yet been tackled. With this paper we address this issue and come up with refined definitions for the phase separator and mixer gates which make the connection to the quantum approximate optimization algorithm more visible.
First, we prove the convergence of the QAA with suitable initial Hamiltonian and initial state in Section III. The proof is built on the aforementioned proof sketch in [1]. We extract the underlying principles and already obtain a precise definition for a mixer Hamiltonian. However, by invoking a version of the adiabatic theorem without gap condition, we obtain a more general result which does not require the considered optimization problem to have only a single optimal solution.
Second, we prove the convergence of the QAOA with suitable initial state in Section IV. For this, we generalize all the properties of the original mixer proposed in the quantum approximate optimization algorithm. We define our versions of simultaneous and sequential mixers which directly make use of the just generalized properties. The convergence proof is then built on the convergence of the QAA instance which admits the respective mixer Hamiltonian as initial Hamiltonian. The underlying idea is again due to Farhi et al., but suitably generalized for constrained problems and sequential mixers.

II. PRELIMINARIES

Combinatorial Optimization Problems
In the following, we restrict to maximization problems, as minimization tasks may be considered analogously. This choice of the optimization direction simply allows us to state the convergence proofs more compactly. A generic COP of size N is of the form

$$\max_{z \in Z(N)} f(z) \quad \text{s.t.} \quad z \in S,$$

where Z(N) denotes the set of bit strings of length N, f : Z(N) → ℝ is the objective function, and S ⊆ Z(N) is the set of feasible bit strings or the solution set. The problem is called unconstrained if S = Z(N). Moreover, we denote the set of all solutions maximizing f by $S_{\max}$.
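To make the notation concrete, the following minimal sketch brute-forces a small COP and returns the feasible set S, the set of maximizers S_max, and the optimal value. The function `solve_cop` and the toy instance (`toy_objective`, `toy_constraint`) are hypothetical illustrations and do not appear in the original text.

```python
from itertools import product

def solve_cop(n, f, is_feasible):
    """Brute-force a COP of size n: return S, S_max, and the maximal value of f on S."""
    bit_strings = list(product((0, 1), repeat=n))             # Z(N)
    feasible = [z for z in bit_strings if is_feasible(z)]     # solution set S
    best = max(f(z) for z in feasible)
    maximizers = [z for z in feasible if f(z) == best]        # S_max
    return feasible, maximizers, best

# Hypothetical toy instance: N = 3, at most one bit may be set.
def toy_objective(z):
    return z[0] + 2 * z[1] + 2 * z[2]

def toy_constraint(z):
    return sum(z) <= 1

S, S_max, f_max = solve_cop(3, toy_objective, toy_constraint)
print("S     =", S)
print("S_max =", S_max, "with f =", f_max)
```

The toy instance is chosen so that S_max contains two bit strings, which is the situation of multiple optimal solutions that the convergence results in this paper are meant to cover.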

Problem Encoding on Quantum Computers
In order to treat a COP with the help of quantum computers, the problem first has to be translated into a quantum-mechanical language. The standard encoding procedure identifies each bit string z with a computational basis state $|z\rangle$ of the N-qubit space $\mathcal{H} := \mathbb{C}^{2^N}$. The classical objective function f is further considered as an objective Hamiltonian C via

$$C := \sum_{z \in Z(N)} f(z)\, |z\rangle\langle z|.$$

In this setting, the (optimal) solution bit strings span the (optimal) solution space

$$\mathcal{S} := \operatorname{span}\{|z\rangle : z \in S\}, \qquad \mathcal{S}_{\max} := \operatorname{span}\{|z\rangle : z \in S_{\max}\}.$$

The maximization task is now equivalent to finding a computational basis state in $\mathcal{S}_{\max}$. By construction, $\mathcal{S}_{\max}$ is the eigenspace of $C|_{\mathcal{S}}$ corresponding to its largest eigenvalue. In the following, we will slightly relax the quantum optimization task as we will consider any highest energy state of $C|_{\mathcal{S}}$ an optimal solution.

Quantum Adiabatic Algorithm
In a nutshell, the continuous-time quantum adiabatic algorithm (QAA, [2]) tackles the eigenstate search via quasi-adiabatic evolution of an initial state $|\iota\rangle$ with respect to a time-dependent Hamiltonian H(t) which interpolates between an initial Hamiltonian $H_I$ and the objective Hamiltonian C. In case of a maximization task, $|\iota\rangle$ should be a highest energy state of $H_I$. The interpolating Hamiltonian is typically given by the convex combination [10]

$$H_{\mathrm{lin}(H_I, C)}(t) := (1 - t)\, H_I + t\, C, \qquad t \in [0, 1].$$

The evolution speed is controlled via a parameter T > 0: The actual time evolution is with respect to H(s/T), s ∈ [0, T]. The intuition behind the QAA is that evolving a highest energy state of H(0) sufficiently slowly (i.e., T ≫ 1) yields a highest energy state of C if the energy levels stay separated. Mathematical rigor is granted by the adiabatic theorems (see Section III).
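The intuition can be checked numerically on the smallest possible instance. The sketch below is an illustration under stated assumptions, not part of the original analysis: it takes a single qubit, the initial Hamiltonian X (Pauli-x), and the hypothetical objective C = diag(0, 1), and it approximates the evolution under H(s/T) by a product of short steps with piecewise constant Hamiltonian. The helper names `step_unitary` and `qaa_success_probability` are made up for this example.

```python
import cmath
import math

def step_unitary(h, tau):
    """exp(-i * h * tau) for a real symmetric 2x2 matrix h = [[a, b], [b, d]]."""
    a, b, d = h[0][0], h[0][1], h[1][1]
    mean, delta = (a + d) / 2, (a - d) / 2      # h = mean*I + delta*Z + b*X
    r = math.hypot(delta, b)
    phase = cmath.exp(-1j * mean * tau)
    c = math.cos(r * tau)
    s = math.sin(r * tau) / r if r > 0 else tau
    return [[phase * (c - 1j * s * delta), phase * (-1j * s * b)],
            [phase * (-1j * s * b), phase * (c + 1j * s * delta)]]

def qaa_success_probability(T, steps=2000):
    """Probability of measuring the maximizer (bit 1) after evolving |+> for total time T."""
    psi = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # |+>, highest energy state of X
    tau = T / steps
    for k in range(steps):
        t = (k + 0.5) / steps                   # midpoint of the k-th time slice
        h = [[0.0, 1.0 - t], [1.0 - t, t]]      # (1 - t) * X + t * diag(0, 1)
        u = step_unitary(h, tau)
        psi = [u[0][0] * psi[0] + u[0][1] * psi[1],
               u[1][0] * psi[0] + u[1][1] * psi[1]]
    return abs(psi[1]) ** 2

for T in (1, 10, 50):
    print(T, qaa_success_probability(T))
```

In this toy setting the two energy levels never cross, so the success probability should approach one as T grows; the number of steps is an arbitrary choice that only controls the discretization error.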

Quantum Approximate Optimization Algorithm
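The quantum approximate optimization algorithm [1] can, in some sense, be seen as a discrete version of the QAA with fixed initial state and initial Hamiltonian

$$|\iota\rangle = |+\rangle := \frac{1}{\sqrt{2^N}} \sum_{z \in Z(N)} |z\rangle, \qquad B := \sum_{n=1}^{N} \sigma_x^{(n)}. \tag{6}$$

Note that $|+\rangle$ is the non-degenerate highest energy state of B. B and C are incorporated into parametrized gates:

$$U_B(\beta) := e^{-i\beta B}, \qquad U_C(\gamma) := e^{-i\gamma C}.$$

Specifying a depth p ∈ ℕ, the parametrized trial states are constructed via

$$|\vec{\beta}, \vec{\gamma}\rangle := U_B(\beta_p)\, U_C(\gamma_p) \cdots U_B(\beta_1)\, U_C(\gamma_1)\, |+\rangle.$$

In an iterative process, the parameters are updated by a classical optimization rule in order to maximize the expectation value

$$\langle \vec{\beta}, \vec{\gamma} |\, C \,| \vec{\beta}, \vec{\gamma} \rangle.$$

Measuring the final state $|\vec{\beta}_{\mathrm{opt}}, \vec{\gamma}_{\mathrm{opt}}\rangle$ in the computational basis then yields a distribution of optimal solution approximations.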
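The construction of the trial states can be sketched as a small state-vector simulation. The code below is an illustration only: the objective `cut_value` (MaxCut on a path with three vertices) is a hypothetical example, and the exhaustive grid search merely stands in for the classical optimization rule, which the text does not specify.

```python
import cmath
import math
from itertools import product

def qaoa_state(n, f, betas, gammas):
    """Amplitudes of U_B(b_p) U_C(g_p) ... U_B(b_1) U_C(g_1) |+> and the diagonal of C."""
    values = [f(z) for z in product((0, 1), repeat=n)]     # diagonal entries of C
    dim = 2 ** n
    psi = [complex(1 / math.sqrt(dim))] * dim              # |+>
    for beta, gamma in zip(betas, gammas):
        # U_C(gamma) = exp(-i * gamma * C) acts diagonally.
        psi = [cmath.exp(-1j * gamma * v) * a for v, a in zip(values, psi)]
        # U_B(beta) factorizes into exp(-i*beta*X) = cos(beta)*I - i*sin(beta)*X per qubit.
        c, s = math.cos(beta), math.sin(beta)
        for q in range(n):
            psi = [c * psi[k] - 1j * s * psi[k ^ (1 << q)] for k in range(dim)]
    return psi, values

def expectation(psi, values):
    """Expectation value of the diagonal Hamiltonian C in the state psi."""
    return sum(abs(a) ** 2 * v for a, v in zip(psi, values))

def cut_value(z):
    """Hypothetical toy objective: number of cut edges on the path 0 - 1 - 2."""
    return (z[0] ^ z[1]) + (z[1] ^ z[2])

grid = 40  # arbitrary resolution of the depth-1 parameter grid
best = max(
    (expectation(*qaoa_state(3, cut_value, [math.pi * i / grid], [2 * math.pi * j / grid])), i, j)
    for i in range(grid) for j in range(grid)
)
print("best depth-1 expectation on the grid:", round(best[0], 4))
```

For β = γ = 0 the trial state is |+⟩ and the expectation value equals the average of f over all bit strings, so any improvement found on the grid is due to the alternating application of the two gates.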

Quantum Alternating Operator Ansatz
Building on the ideas of the quantum approximate optimization algorithm, the quantum alternating operator ansatz (QAOA, [8]) extends its design to general constrained problems. Given a COP with objective Hamiltonian C and solution space $\mathcal{S}$, the parametrized gate $U_B(\beta)$ is substituted with problem-specific 'mixer' gates. For simplicity, we will focus on the case where the same mixers are used in every iteration. Thereby, we can collect them again in a single mixer gate $U_M(\beta)$. It is demanded to fulfill two important properties:
• Feasibility preservation: For all parameter values β ∈ ℝ, $U_M(\beta)(\mathcal{S}) \subseteq \mathcal{S}$ should hold.
• Full mixing of solutions: For all feasible computational basis states $|z\rangle, |z'\rangle \in \mathcal{S}$, there should exist a power r ∈ ℕ and a parameter value β ∈ ℝ so that $\langle z|\, U_M^r(\beta)\, |z'\rangle \neq 0$.
Furthermore, the parametrized gate $U_C(\gamma)$ could be replaced by a more general 'phase separator' gate $U_P(\gamma)$ which resembles the classical objective function's behavior. In order to be more concrete, we will further focus on the case where $U_M(\beta)$ and $U_P(\gamma)$ are given by (products of) exponentials of Hamiltonians.
The correct definition of $U_M(\beta)$ follows naturally from the following convergence considerations and is given in Section IV. We define the phase separator already now:
Definition 1. Given a COP with solution space $\mathcal{S}$ and optimal solution space $\mathcal{S}_{\max}$, a Hamiltonian H is called a phase separator Hamiltonian iff it fulfills the following two conditions:
(i) H is diagonal in the computational basis.
(ii) The eigenspace of $H|_{\mathcal{S}}$ corresponding to its largest eigenvalue is $\mathcal{S}_{\max}$.
The corresponding (parametrized) phase separator is given by $U_P(\gamma) := e^{-i\gamma H}$.
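Definition 1 admits Hamiltonians other than the objective Hamiltonian itself. The following sketch checks condition (ii) for Hamiltonians that are specified by their diagonal, so that condition (i) holds by construction. The helper `is_phase_separator` and the toy instance (at most one bit set, f(z) = z_0 + 2 z_1 + 2 z_2) are hypothetical and only serve as an illustration.

```python
from itertools import product

def is_phase_separator(diagonal, feasible, maximizers):
    """Check condition (ii) of Definition 1 for a Hamiltonian given by its diagonal.

    `diagonal` maps bit strings to real diagonal entries in the computational basis.
    """
    top = max(diagonal[z] for z in feasible)
    top_states = {z for z in feasible if diagonal[z] == top}
    return top_states == set(maximizers)

# Hypothetical toy instance: N = 3, at most one bit set.
bit_strings = list(product((0, 1), repeat=3))
feasible = [z for z in bit_strings if sum(z) <= 1]
f = {z: z[0] + 2 * z[1] + 2 * z[2] for z in bit_strings}
f_max = max(f[z] for z in feasible)
maximizers = [z for z in feasible if f[z] == f_max]

objective = f                                              # the objective Hamiltonian C
indicator = {z: 1 if f[z] == f_max else 0 for z in bit_strings}
first_bit = {z: z[0] for z in bit_strings}

print(is_phase_separator(objective, feasible, maximizers))   # C itself
print(is_phase_separator(indicator, feasible, maximizers))   # indicator of the maximizers
print(is_phase_separator(first_bit, feasible, maximizers))   # unrelated diagonal operator
```

The indicator Hamiltonian forgets all values of f except for the location of its maximum on the feasible set, which is all that Definition 1 asks for.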

III. CONVERGENCE PROOF FOR THE QAA
We first examine the convergence behavior of the QAA. Although originally stated for unconstrained problems, we can easily extend the idea to a COP with a non-trivial solution space $\mathcal{S}$: The initial Hamiltonian $H_I$ should preserve feasibility, i.e., $H_I(\mathcal{S}) \subseteq \mathcal{S}$, and the initial state $|\iota\rangle$ should lie within $\mathcal{S}$. In addition, we substitute the objective Hamiltonian C with a more general phase separator Hamiltonian $H_P$ which trivially preserves feasibility. Then, for every t ∈ [0, 1], the time evolution with respect to $H_{\mathrm{lin}(H_I, H_P)}(t)$ applied to $|\iota\rangle$ will again give a feasible state. Thus, we effectively restrict ourselves to the subspace $\mathcal{S} \subseteq \mathcal{H}$.
The underlying concept of the QAA is captured by the adiabatic theorem. For our analysis, we use a more general version than Farhi et al. did in [2].
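Theorem 2 (Adiabatic Theorem, [11]). Let {H(t) : 0 ≤ t ≤ 1} ⊆ $L(\mathcal{H})$ be a family of self-adjoint operators such that $H(\cdot) \in C^2([0, 1], L(\mathcal{H}))$. For T > 0, let $\tilde{U}_T$ be the solution of

$$i\, \frac{d}{ds} \tilde{U}_T(s) = H(s/T)\, \tilde{U}_T(s), \qquad \tilde{U}_T(0) = 1, \qquad s \in [0, T],$$

and set $U_T(t) := \tilde{U}_T(tT)$ for t ∈ [0, 1]. Furthermore, let λ(t) be an eigenvalue of H(t) for every t ∈ [0, 1], and let $P(\cdot) \in C^2([0, 1], L(\mathcal{H}))$ be a curve of orthogonal projections such that P(t) is the spectral projection of H(t) onto the eigenspace of λ(t) for almost every t. Then

$$\lim_{T \to \infty} (1 - P(t))\, U_T(t)\, P(0) = 0$$

uniformly in t ∈ [0, 1].
Theorem 2 essentially states that, in the adiabatic limit, starting within (a subspace of) the eigenspace of H(0) corresponding to the eigenvalue λ(0), one stays within the eigenspace of H(t) corresponding to the eigenvalue λ(t), 0 ≤ t ≤ 1, if one follows the time evolution generated by H, and the curve of spectral projections P can be $C^2$-continued through all potential level crossings. In contrast, Farhi et al. used a version of the adiabatic theorem that prohibits any level crossing (see [12]).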
A sketch of a convergence proof for the QAA was given in [1] as an intermediate step to argue the convergence of the quantum approximate optimization algorithm. Besides the adiabatic theorem, the proof is mainly based on the Perron-Frobenius Theorem. First recall the definition of irreducibility in the context of matrices.
Definition 3. A matrix $A \in \mathbb{C}^{n \times n}$ is called irreducible iff there are no proper A-invariant coordinate subspaces of $\mathbb{C}^n$. That is, the only coordinate subspaces left invariant by A are {0} and $\mathbb{C}^n$.
Theorem 4 (Perron-Frobenius). Let $A \in \mathbb{C}^{n \times n}$ be component-wise non-negative and irreducible. Then A admits a non-degenerate largest eigenvalue.
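Irreducibility in the sense of Definition 3 can be tested combinatorially: a coordinate subspace span{e_j : j ∈ J} is A-invariant exactly if no non-zero entry A_ij connects an index j ∈ J to an index i ∉ J, so A is irreducible exactly if the directed graph with an edge j → i for every non-zero off-diagonal A_ij is strongly connected. The following sketch implements this standard reformulation; the function name and the two example matrices are illustrative assumptions.

```python
def is_irreducible(a, tol=0.0):
    """Test irreducibility of a square matrix via strong connectivity of its graph."""
    n = len(a)

    def reachable(adjacency):
        # Depth-first search starting from vertex 0.
        seen, stack = {0}, [0]
        while stack:
            v = stack.pop()
            for w in adjacency[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    # forward[j]: all i with an edge j -> i; backward[i]: all j with an edge j -> i.
    forward = [[i for i in range(n) if i != j and abs(a[i][j]) > tol] for j in range(n)]
    backward = [[j for j in range(n) if j != i and abs(a[i][j]) > tol] for i in range(n)]
    return len(reachable(forward)) == n and len(reachable(backward)) == n

# Sum of Pauli-x operators on two qubits: entries are 1 where the indices differ in one bit.
b_two_qubits = [[1 if bin(i ^ j).count("1") == 1 else 0 for j in range(4)] for i in range(4)]

# A block-diagonal matrix leaves span{e_0, e_1} invariant and is therefore reducible.
block_diagonal = [[0, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]]

print(is_irreducible(b_two_qubits))
print(is_irreducible(block_diagonal))
```

Diagonal entries are ignored by the test, since they never map a basis vector out of a coordinate subspace.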
The crucial observation is that the matrix representation of the initial Hamiltonian (6) in the computational basis fulfills both requirements of the Perron-Frobenius Theorem. As this will also play an essential role throughout our convergence proof, we use these very properties for giving a first definition of a mixer (Definition 5). The idea is now to apply the Perron-Frobenius Theorem to the linear interpolation $H_{\mathrm{lin}(B,C)}(t)$ at every time 0 ≤ t < 1 to conclude the existence of an eigenvalue curve $\lambda_{\max}$ that connects the largest eigenvalues of $H_{\mathrm{lin}(B,C)}|_{\mathcal{S}}(0) = B|_{\mathcal{S}}$ and $H_{\mathrm{lin}(B,C)}|_{\mathcal{S}}(1) = C|_{\mathcal{S}}$. For this, we need the following immediate result (Corollary 6), which can be proven quite easily.
In Theorem 7, $U_T$ denotes the quasi-adiabatic evolution w.r.t. the linear interpolation between B and C.
In the following proof, we directly identify all appearing operators in $L(\mathcal{S})$ with their matrix representations in the computational basis.
Following the above proof, one realizes that the eigenvalue curve $\lambda_{\max}$ does not cross any other eigenvalue curve of $H_{\mathrm{lin}(B,C)}|_{\mathcal{S}}$ except, possibly, at t = 1. In [1], even a level crossing at t = 1 is avoided by assuming that the COP only has one optimal solution, implying that $\lambda_{\max}(1)$ is non-degenerate. However, by invoking a more general version of the adiabatic theorem, we were able to get rid of this assumption.

IV. CONVERGENCE PROOF FOR THE QAOA
We next examine the convergence behavior of the QAOA, which contains the quantum approximate optimization algorithm as a special case. Its ingredients are basically the same as for our generalized version of the QAA. However, the decomposition of the mixer Hamiltonian into local Hamiltonians is extremely valuable from an application-oriented point of view and is also introduced by the QAOA. In the spirit of Definition 5, we propose the following adaptation of Hadfield et al.'s definition.
Definition 8. Given a COP with solution space $\mathcal{S}$, a family of Hamiltonians $\{B_i\}_{i \in I} \subset L(\mathcal{H})$ is called a mixing family iff for every i ∈ I, $B_i(\mathcal{S}) \subseteq \mathcal{S}$, $B_i|_{\mathcal{S}}$ is component-wise non-negative in the computational basis, and any coordinate subspace of $\mathcal{S}$ that is left invariant under every $B_i$ is already trivial.
Figure 1. The eigenvalue curve $\lambda_{\max}$ stays separated from all the other eigenvalue curves for 0 ≤ t < 1. If the corresponding COP has exactly one optimal solution, the separation extends to t = 1 (left plot). However, if the COP has multiple optimal solutions, $\lambda_{\max}$ intersects with at least one other eigenvalue curve at t = 1 (right plot).

That Definition 8 really is a decomposed version of Definition 5 can be argued as follows: Consider the matrix representation of each of the operators $B_i|_{\mathcal{S}}$ in the computational basis as adjacency matrix of a graph whose vertices are identified with feasible computational basis states. Starting from the graph represented by $B_1$, adding another operator $B_i$ corresponds to adding edges represented by non-zero entries of $B_i$'s matrix representation. The actual weights (i.e., values of the entries) are not important, but the condition of component-wise non-negativity implies that no entries are cancelled during the summation, that is, the edge set of the graph $G_I$ with adjacency matrix $B_I|_{\mathcal{S}}$, where $B_I := \sum_{i \in I} B_i$, really is the union of all the edge sets of the graphs $G_i$ with respective adjacency matrix $B_i|_{\mathcal{S}}$, i ∈ I. The imposed condition of triviality of mutual invariant coordinate subspaces then is equivalent to the fact that $G_I$ is connected which, in turn, is equivalent to its adjacency matrix being irreducible. Thus, we have concluded
Proposition 9. Given a COP with solution space $\mathcal{S}$, a family of Hamiltonians $\{B_i\}_{i \in I} \subset L(\mathcal{H})$ is a mixing family iff $B_I$ is a mixer Hamiltonian.
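The graph argument can be replayed on a small example by brute force. In the sketch below, all matrices are assumed to be given already restricted to the feasible subspace (so feasibility preservation is taken for granted), and the example family is hypothetical: the feasible states are the three one-hot bit strings on three qubits, and each operator hops between two neighboring positions.

```python
from itertools import combinations

def leaves_invariant(b, subset):
    """True if span{e_j : j in subset} is invariant under the matrix b."""
    n = len(b)
    return all(b[i][j] == 0 for j in subset for i in range(n) if i not in subset)

def common_invariant_subsets(family):
    """All proper, non-empty index sets whose coordinate subspace is invariant under every matrix."""
    n = len(family[0])
    found = []
    for size in range(1, n):
        for subset in combinations(range(n), size):
            if all(leaves_invariant(b, set(subset)) for b in family):
                found.append(subset)
    return found

def is_mixing_family(family):
    """Component-wise non-negativity plus triviality of common invariant coordinate subspaces."""
    non_negative = all(x >= 0 for b in family for row in b for x in row)
    return non_negative and not common_invariant_subsets(family)

# Basis order on the feasible subspace: |100>, |010>, |001>.
b1 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]   # hop between the first two positions
b2 = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]   # hop between the last two positions
b_sum = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(b1, b2)]

print(common_invariant_subsets([b1]))     # b1 alone is reducible
print(is_mixing_family([b1, b2]))         # the family has no common invariant subspace
print(common_invariant_subsets([b_sum]))  # hence the sum is irreducible
```

Neither hopping term is irreducible on its own, but the two graphs together form a connected path, which is the statement of Proposition 9 for this example.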
Utilizing our definition of a mixing family, we now introduce our version of 'simultaneous' and 'sequential' mixers.
Definition 10. Let $\{H_i\}_{i \in I} \subset L(\mathcal{H})$ be a mixing family for a given COP. The corresponding (parametrized) simultaneous mixer is defined as

$$U_M(\beta) := e^{-i\beta \sum_{i \in I} H_i}. \tag{16}$$

Specifying a permutation σ ∈ S(I), the corresponding (parametrized) sequential mixer is defined as

$$U_M(\beta) := \prod_{i \in I} e^{-i\beta H_{\sigma(i)}}. \tag{17}$$

From their definition it immediately follows that both (16) and (17) fulfill the original QAOA demands: feasibility preservation and full mixing of solutions. However, due to our refined definition, we can now extend the sketch of a convergence proof in [1] to the general QAOA setting. The procedure is as follows:
1. discretize the quasi-adiabatic time evolution $U_T$
2. decompose $H_{\mathrm{lin}(B,C)}$ using a (multivariate) Lie product formula
3. exploit the convergence of the corresponding QAA instance
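The second step of this procedure relies on the Lie product formula, which states that $(e^{A/n} e^{B/n})^n$ converges to $e^{A+B}$ as n grows. The following sketch illustrates this for a single qubit with the hypothetical choice A = −iaX and B = −ibZ, for which all exponentials are available in closed form; the function names and the values a = b = 1 are assumptions made for the example.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def exp_x(theta):
    """exp(-i * theta * X) = cos(theta) * I - i * sin(theta) * X."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -1j * s], [-1j * s, c]]

def exp_z(theta):
    """exp(-i * theta * Z) is diagonal with entries exp(-i*theta) and exp(+i*theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[complex(c, -s), 0], [0, complex(c, s)]]

def exp_xz(a, b):
    """exp(-i * (a * X + b * Z)), using (a*X + b*Z)^2 = (a^2 + b^2) * I."""
    r = math.hypot(a, b)
    c, s = math.cos(r), math.sin(r) / r
    return [[c - 1j * s * b, -1j * s * a], [-1j * s * a, c + 1j * s * b]]

def trotter_error(a, b, n):
    """Frobenius distance between exp(-i(aX + bZ)) and (exp(-iaX/n) exp(-ibZ/n))^n."""
    step = matmul(exp_x(a / n), exp_z(b / n))
    approx = [[1, 0], [0, 1]]
    for _ in range(n):
        approx = matmul(approx, step)
    exact = exp_xz(a, b)
    return math.sqrt(sum(abs(approx[i][j] - exact[i][j]) ** 2
                         for i in range(2) for j in range(2)))

for n in (10, 100, 1000):
    print(n, trotter_error(1.0, 1.0, n))
```

In the convergence proof, the same mechanism is applied to each factor of the discretized evolution, with the mixer and phase separator Hamiltonians in place of X and Z.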
We start with a simple statement about the distance of products of operators with factors being close together.
Lemma 11. For ε > 0 and m ∈ ℕ, let $\{V_j\}_{j=1}^m, \{W_j\}_{j=1}^m \subset L(\mathcal{H})$ be families of unitary operators so that

$$\|V_j - W_j\| \le \varepsilon \tag{18}$$

holds for all j ∈ [m]. Then the following estimate is valid:
Proof. Since (18) holds, one can find linear operators $R_j \in L(\mathcal{H})$ with $\|R_j\| \le 1$ and $V_j = W_j + \varepsilon R_j$ for each j ∈ [m]. (19) clearly holds for m = 1. Therefore, it remains to show that if (19) holds for an m ∈ ℕ, then it also holds for m + 1:
Theorem 12 (Convergence of QAOA). Consider a COP with solution space $\mathcal{S} \subseteq \mathcal{H}$, optimal solution space $\mathcal{S}_{\mathrm{opt}} \subseteq \mathcal{S}$, phase separator Hamiltonian C, and mixing family $\{B_i\}_{i \in I}$. Let $U_P$ and $U_M$ be the corresponding phase separator and (simultaneous or sequential) mixer. Furthermore, let $|\iota\rangle \in \mathcal{S}$ be a highest energy state of $B_I|_{\mathcal{S}}$. Then, for every ε > 0, one can choose finitely many parameters $\vec{\beta}$ and $\vec{\gamma}$ such that where
Proof. Let $U_T$, T > 0, denote the quasi-adiabatic evolution w.r.t. $H_{\mathrm{lin}(B_I, C)}$. By Proposition 9, $B_I$ is a mixer Hamiltonian in the sense of Definition 5. Therefore, for any ε > 0, Theorem 7 implies the existence of a T > 0 so that where $P_1$ is the $C^2$-continuation of the curve of spectral projections onto the highest energy eigenspaces of $H_{\mathrm{lin}(B_I, C)}|_{\mathcal{S}}$. W.l.o.g. assume that dim($\mathcal{S}$) > 1 as the statement would be trivial otherwise. Then, $\alpha := \|1 - P_1(1)\|_{L(\mathcal{S})} > 0$ since $P_1(1)$ has rank one by continuity. In the following, set $W_j := e^{-i H_{\mathrm{lin}(B_I, C)}(j/m)\, T/m}$ and distinguish between the two possibilities to choose a mixer.
Simultaneous mixer: The Lie product formula implies that for all j ∈ [m], there exist $n_j \in \mathbb{N}$ such that for all $\tilde{n} \ge n_j$ it holds that respectively. Taking $n := \max\{n_j : j \in [m]\}$, this estimate holds for all j ∈ [m] and (especially) $\tilde{n} = n$.
Sequential mixer: W.l.o.g. choose the permutation $\sigma = \mathrm{id}_I$. The multivariate Lie product formula [15, Problem IX.8.5] implies that for all j ∈ [m], there exist $n_j \in \mathbb{N}$ so that for all $\tilde{n} \ge n_j$ it holds that respectively.
In both cases, choose q = nm parameter values $\vec{\beta} = (\vec{\beta}_1, \ldots, \vec{\beta}_m)$ and $\vec{\gamma} = (\vec{\gamma}_1, \ldots, \vec{\gamma}_m)$ as Thus, by (22) and Lemma 11, it follows that In summary, it follows that Then, $\operatorname{im}(P_1(1)) \subseteq \mathcal{S}_{\mathrm{opt}}$ proves the assertion.

V. CONCLUSION AND OUTLOOK
In this paper we presented an elementary proof for the convergence of the QAOA. This proof can be regarded as a discretized and carefully extended version of the Adiabatic Theorem, building on the ideas of Farhi et al. Besides another core theorem (Perron-Frobenius), this extension is merely based on elementary matrix inequalities. Most importantly, our proof builds on fewer assumptions (multiple optimal solutions are allowed) and extends to nontrivial feasibility structures ($\mathcal{S} \subsetneq \mathcal{H}$).
Furthermore, the proof canonically gave rise to refined definitions of the QAOA-mixer and QAOA-phase separator concepts.Most notably, exactly the same notions arise when properly recreating classical feasibility symmetries within the framework of QAOA (see [16]).This strongly indicates that the definitions we gave in this paper optimally capture the overall principle the QAOA is based on.
We essentially showed that irreducibility and component-wise non-negativity of the mixer Hamiltonian B, restricted to the feasible subspace $\mathcal{S}$, are sufficient criteria for the convergence of the QAA and the QAOA. Moreover, one can readily verify that irreducibility is also a necessary condition in the following sense: Given an arbitrary initial state $|\iota\rangle$ and the existence of a non-trivial $B|_{\mathcal{S}}$-invariant coordinate subspace, there always exists an objective Hamiltonian C such that the QAA and the QAOA will not be able to approximate any state in $\mathcal{S}_{\max}$ to arbitrary precision. On the other hand, the condition that $B|_{\mathcal{S}}$ should be component-wise non-negative is not necessary. In our convergence proof, we imposed this condition in order to apply the Perron-Frobenius Theorem. However, there also exist more general versions of this theorem (see, e.g., [17]) which substitute this condition and irreducibility with the more general properties of preserving a given cone and permuting its faces, respectively. Unfortunately, the cones in question are merely given by all the orthants in $\mathcal{S}$ since every coordinate subspace of $\mathcal{S}$ should be represented on their faces. This yields again, up to some additionally allowed matrix signatures, the same conditions. Therefore, we do not see many possibilities for relaxing the assumptions made in Theorem 7 and Theorem 12.
An interesting open question is whether one can also characterize the rate of convergence of the QAA for the case of multiple optimal solutions. In this case, the spectral gap necessarily vanishes for t → 1 (see Figure 1), but stays finite throughout the interval [0, 1). That is, even though a level crossing occurs, it only happens once and at a predictable time. There are some results on the rate of convergence in the Adiabatic Theorem which are valid for all kinds of (allowed) level crossings (see, e.g., [11,18]). Fine-tuning these results with respect to the particular situation of the QAA promises to be an insightful future project.

Definition 5. A Hamiltonian $B \in L(\mathcal{H})$ is called a mixer for a COP with solution space $\mathcal{S}$ iff $B(\mathcal{S}) \subseteq \mathcal{S}$ and $B|_{\mathcal{S}} \in L(\mathcal{S})$ is component-wise non-negative and irreducible in the computational basis.

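Corollary 6. Let $A \in \mathbb{C}^{n \times n}$ be diagonal and let $B \in \mathbb{C}^{n \times n}$ be irreducible. Then also A + B is irreducible.

Theorem 7 (Convergence of QAA). Consider a COP with solution space $\mathcal{S} \subseteq \mathcal{H}$, optimal solution space $\mathcal{S}_{\mathrm{opt}} \subseteq \mathcal{S}$, and phase separator Hamiltonian C. If $B \in L(\mathcal{H})$ is a mixer Hamiltonian in the sense of Definition 5 and $|\iota\rangle \in \mathcal{S}$ is a highest energy state of $B|_{\mathcal{S}}$, then $\lim_{T \to \infty} U_T(1)\, |\iota\rangle \in \mathcal{S}_{\mathrm{opt}}$,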

$P_1(1)$ has rank one by continuity. Discretizing the quasi-adiabatic time evolution $U_T(1)$ yields the existence of an m ∈ ℕ such that $\prod_{j=1}^{m} e^{-i H_{\mathrm{lin}(B_I, C)}}$