MAX 2-SAT with up to 108 qubits

We experimentally study the performance of a programmable quantum annealing processor, the D-Wave One (DW1) with up to 108 qubits, on maximum satisfiability problem with 2 variables per clause (MAX 2-SAT) problems. We consider ensembles of random problems characterized by a fixed clause density, an order parameter which we tune through its critical value in our experiments. We demonstrate that the DW1 is sensitive to the critical value of the clause density. The DW1 results are verified and compared with akmaxsat, an exact, state-of-the-art algorithm. We study the relative performance of the two solvers and how they correlate in terms of problem hardness. We find that the DW1 performance scales more favorably with problem size and that problem hardness correlation is essentially non-existent. We discuss the relevance and limitations of such a comparison.


I. INTRODUCTION
Adiabatic quantum computation (AQC) is a model of solving computational problems, in particular hard optimization problems, by evolving a closed system in the ground state manifold of an adiabatic Hamiltonian H(t) with t ∈ [0, t f ] [1,2]. The ground state of the beginning Hamiltonian H B = H ad (0) is assumed to be easily prepared, while the ground state of the problem Hamiltonian H P = H ad (t f ), represents the solution to the computational problem. AQC has been proven to be polynomially equivalent to standard, closedsystem, circuit model QC [3][4][5][6][7], but so far it is unclear whether this equivalence extends to the open system, nonzero temperature setting. There is some theoretical evidence of inherent robustness of open system AQC [8][9][10][11][12][13] and scalability using currently available technology. A case in point are the D-Wave processors, comprised of superconducting rf SQUID flux qubits [14]. Recent experimental evidence [15][16][17][18][19] suggests that the first commercial generation D-Wave One (DW1) "Rainier" processor (with up to 128 qubits) implements physical quantum annealing (QA) [20], a non-zero temperature, non-universal form of AQC, whose algorithmic performance has been extensively discussed in the literature [21][22][23][24][25][26][27][28]. Quantum annealing can also be understood as the quantum-mechanical version of the simulated annealing (SA) [29] algorithm for optimization problems. While SA employs the slow annealing of (classical) thermal fluctuations to converge on the ground state manifold, QA additionally uses quantum fluctuations. There is extensive numerical [23][24][25] as well as analytical [28] evidence which shows that QA can be more efficient than SA for the problem of finding ground states of classical Ising-type Hamiltonians.
In this work we experimentally study the performance of physical QA, using the DW1 processor, on MAX 2-SAT optimization problems (maximum satisfiability problem with two variables per clause) [30]. We examine both the scaling with problem size and the classical phase transition in problem hardness as a function of the clause density, i.e., the ratio of the number of clauses to variables. The clause density is related to computational complexity, is associated with rigorous bounds, and is a natural order parameter for random MAX 2-SAT, as the problem exhibits a "hardness" phase transition at a critical value α c = 1 [31,32]. We present evidence for this transition in our DW1 experiments. Thus the clause density serves as a tunable hardness parameter for analyzing performance that is specific to the MAX 2-SAT problem.
One might hope to be able to detect a quantum speedup by comparing physical QA to highly optimized classical solvers. While recent work [33] attempted to show that the latest generation of D-Wave processors (the D-Wave Two "Vesuvius" processor, with up to 512 qubits) could outperform the best classical solvers on random instances of MAX 2-SAT [34], concurrent results already demonstrated a classical stochastic solver outperforming the DW1 processor for an ensemble of random Ising spin glass problems with a native embedding on the Chimera graph [18] (see Fig. 1). Nevertheless, the competitive nature of the results along with the mounting evidence of quantum phenomena [15][16][17][18][19] suggests the intriguing but currently unproven possibility that the D-Wave quantum annealing architecture may some day be capable of outperforming any classical solver for some ensembles of problems, though it seems inevitable that some form of error correction will eventually be required [35][36][37][38][39]. Our study attempts to shed light on this possibility by studying a random ensemble of MAX 2-SAT problems characterized by a given clause density α. We verify the empirical solutions of the MAX 2- The D-Wave One Rainer DW1 processor consists of 4 × 4 unit cells of eight qubits (circles), connected by programmable inductive couplers (lines). The 108 green (grey) circles denote functional (inactive) qubits. Most qubits connect to six other qubits. In the ideal case, where all qubits are functional and all couplers are present (as in the central four unit cells), one obtains the non-planar "Chimera" connectivity graph of the DW1 processor [41,42].
SAT problem obtained by the DW1 using akmaxsat [40], a state-of-the-art, exact branch and bound algorithm, which we also use as a performance benchmark. We find that the DW1 and akmaxsat exhibit not only distinct scaling, but also very different sensitivities to problem hardness. In fact, we show that over the random ensemble that is characterized by a fixed clause density and is compatible with the Chimera graph, the DW1 has a scaling with problem size that is better than akmaxsat's, and there is no correlation between the two solvers in terms of problem hardness.
However, we hasten to point out that our comparison is not unproblematic. The same work that most definitively established that the DW1 performs quantum annealing [18] also found that simulated annealing [29] was significantly faster than the DW1 on its ensemble of random Ising spin glass instances. While our ensemble is different, this does suggest that stochastic classical algorithms such as simulated annealing, rather than exact and deterministic ones such as akmaxsat, are the better benchmark. Moreover, Ref. [18] also found that other exact, deterministic classical solvers scale better than akmaxsat for its ensemble of random spin glass problems [43]. Our study does therefore certainly not settle the question of a quantum speedup for physical QA, but should rather be seen as a first indication that MAX 2-SAT is an interesting candidate for such a speedup, when perceived through the lens of fixed clause density ensembles. Follow-up studies using the 512-qubit D-Wave Two will shed more light on the scaling question.
The structure of the paper is as follows: in Section II we provide pertinent background on the MAX 2-SAT optimization problem and its phase transition as a function of the clause density. In Section III we briefly review the DW1, describe the procedure for mapping MAX 2-SAT instances into Ising problems, and define the restricted ensemble of DW1-compatible problems. Section IV presents our results: we compare the MAX 2-SAT success probabilities and time to solution using the DW1 processor and akmaxsat, discuss the evidence for a transition at α c , and study the problem hardness correlation between the DW1 and akmaxsat. We present our conclusions and discuss scope for future work in Section V. Various supplementary details are presented in the Appendices.

II. MAX 2-SAT BACKGROUND
In this section we briefly review pertinent theoretical background concerning the random MAX 2-SAT problem and its solution methods. We focus on random ensembles characterized by a fixed clause density.

A. 2-SAT and MAX 2-SAT
Many multivariate problems of practical interest involve determining values of variables {x i } N i=1 , collectively called a configuration x, that extremise the value of some objective. Satisfiability (SAT) problems are one such class of problems defined in terms of N Boolean variables and a set of M constraints between them where each constraint takes the special form of a clause. An example of a problem instance of a 2-SAT problem, written in Conjunctive Normal Form (CNF), and also called a formula, is: which is the logical AND of M clauses, where each clause itself is a logical OR of exactly two literals, defined as a variable or its negation. A clause that evaluates to TRUE (FALSE) is called SAT (UNSAT). The 2-SAT problem is a decision problem of determining whether there exists a collection of Boolean values for x such that Eq. (1) evaluates to TRUE, in which case the formula is said to be satisfiable. A related optimization problem known as MAX 2-SAT is to find the variable assignment which maximizes the number of satisfied clauses. While 2-SAT is in the complexity class P, i.e., it admits a polynomial (in fact linear [44]) time solution, MAX 2-SAT, like 3-SAT, is NP-complete [45].

B. Clause density and phase transition
In this work we are interested in random MAX 2-SAT, where 2-SAT problem instances are generated by choosing uniformly at random, with fixed clause density, from among all possible clauses, without clause repetition (the same clause may not appear twice). The clause density is α = M N. (2) As the number of clauses grows at a fixed number of variables, it becomes harder to satisfy all clauses. Indeed, the probability of satisfiability versus the clause density α = M N exhibits a phase transition at a critical value α c = 1 in the thermodynamic limit (N → ∞), whose finite-size scaling has also been studied [46,47]. Thus, the clause density is an order parameter. An intuitive explanation for the specific value of α c is that for each clause only one of the two variables needs to be TRUE. Therefore when there are N variables and N clauses, one essentially uses up one variable to satisfy each clause. The phase structure of MAX 2-SAT has also been extensively studied. Coppersmith et al. proved that MAX 2-SAT exhibits a phase transition at the same critical clause density as 2-SAT [31]. Namely, in the large N limit, to the left of the critical clause density the maximum fraction of clauses that are satisfiable is almost always 1, while to the right a fraction of all clauses are unsatisfiable. In the large clause density limit, the fraction of clauses that are satisfiable is almost surely the same as that expected from a random assignment, 3 4. Near the phase transition, we have the most uncertainty about the correct fraction of satisfied clauses. For k-SAT with k > 2, this type of uncertainty near the phase transition has been linked to the appearance of an easy-hard-easy pattern of computational complexity for backtracking solvers [48]. The relevance of the phase structure for the difficulty of the decision version of MAX 2-SAT has been empirically confirmed by Shen and Zhag [49]. The finite-size scaling window of the MAX 2-SAT phase transition has a clause density width of Θ(N −1 3 ) [31].
These and other results [30] suggest that it is natural to explore random MAX 2-SAT ensembles characterized by a fixed clause density, and this is our focus here.

C. Polynomial time approximation scheme
Many NP-complete problems can be approximately solved to arbitrary precision in polynomial time. A ρ-polynomial time approximation scheme (ρ-PTAS) for MAX 2-SAT is an algorithm that provides an assignment of variables that provably satisfies a number of clauses within at least a fraction ρ of the maximum number of clauses that can be satisfied for any formula. Goemans and Williamson [50] demonstrated a ρ-PTAS for MAX 2-SAT with ρ ≈ 0.87, and an improved version of their result achieves ρ ≈ 0.94 [51]. On the other hand, it has also been shown that no ρ-PTAS exists for ρ > 21 22 ≈ 0.95 unless P=NP [52]. Thus, in the worst case, it is not only difficult to find an assignment that satisfies the maximum number of clauses, it is also difficult to find an assignment that comes close. We shall return to the ρ-PTAS issue from an experimental perspective in Section V.

D. Optimized classical numerical solvers
Optimized exact classical MAX 2-SAT solvers have been extensively studied and regularly compete in an annual competition [53]. Here by "exact" we mean that the solver is guaranteed to eventually return a correct answer. The basic idea behind the most successful exact solvers is to combine a branch and bound algorithm that searches the (exponentially large) tree of possible assignments [54] with heuristics to improve performance. Improvements come about in two ways. First, branches of the search space are avoided by intelligently upper bounding the maximum number of clauses that can be satisfied in that branch. Second, heuristics are used to simplify a formula, reducing the number of clauses or variables. In this work we benchmark the DW1 processor against a recent MAX SAT competition winner, akmaxsat [40], that incorporates all of these techniques. We motivate this choice in Section IV.

III. ENSEMBLES OF 2-SAT PROBLEMS AND THEIR RESTRICTION TO THE DW1 PROCESSOR
We focus here on the average complexity for an ensemble of MAX 2-SAT problems characterized by a fixed clause density α. The behavior of algorithms with respect to an ensemble may be taken to signify the typical behavior when given a specific problem instance. We note that it is not known whether this typical behavior implies anything about the worst case complexity, i.e., MAX 2-SAT is not known to be random selfreducible, unlike certain NP problems [55,56].
A. Quantum annealing using the DW1 The DW1 implements the quantum annealing Hamiltonian: where the "annealing schedules" A(t) and B(t) (shown in Appendix A) are, respectively, monotonically decreasing and increasing functions of time, satisfying A(0) ≫ max(k B T, B(0)) and B(t f ) ≫ A(t f ). The beginning and problem Hamiltonians implemented on the DW1 correspond to a transverse-field, non-planar Ising model, i.e., where σ x(z) j represent the spin-1/2 Pauli matrices for the jth qubit. Thus to solve MAX 2-SAT problems using the DW1 we map these problems to the Ising model. In the DW1 the N rf SQUID flux qubits occupy the vertices V of the so-called "Chimera" graph (see Fig. 1), with maximum degree 6, and are coupled inductively along the edges E of this graph. The local fields h j and the couplers J ij are programmable and once chosen they define a "problem instance". Each "annealing run" corresponds to evolving H(t), with a preprogrammed and fixed set of local fields and couplers, from t = 0 to a predetermined annealing time t f , followed by a projective measurement of all qubits in the computational basis, i.e., the eigenbasis of the Ising Hamiltonian H P . Each such measurement results in a spin configuration {s 1 , . . . , s N }, where s j = ±1 is the eigenvalues of σ z j . By repeating these annealing runs many times one builds up statistics of spin configurations for a given problem instance. The processor can then be reprogrammed to generate statistics for a new problem instance.

B. Mapping MAX 2-SAT to equivalent Ising problems
In order to solve instances of MAX 2-SAT on the DW1, we must construct the problem Hamiltonian H P of Eq. (4b), such that the ground state configuration encodes the satisfying assignment for the problem instance. Following the prescription of [1] for the conversion of SAT problems to finding the ground state(s) of quantum Hamiltonians, as a first step we transform from Boolean to binary variables, letting TRUE=0 and FALSE=1, so that the truth table of the OR function becomes the multiplication of the binary variables. We next identify the binary variables {x j } N j=1 of a 2-SAT formula with the ±1 eigenstates of the Pauli spin operator σ z j acting on qubit j, i.e., We also define variables v k j ∈ {−1, 0, 1}, where the indices j = {1, 2, ..., N } and k = {1, 2, ..., M } label the variables and clauses respectively, with v k j = −1 (+1) if x j appears negated (unnegated) in the kth clause and v k j = 0 for all clauses that x j does not appear in. Each two-variable clause, Ω k , k ∈ {1, 2, . . . , M }, in an arbitrary 2-SAT formula F = Ω 1 ∧ Ω 2 ∧ ... ∧ Ω M is then translated into a corresponding 2-local term in the problem Hamiltonian of the form It is easy to check that in this manner if {x j1 , x j2 } ∈ {0, 1} 2 violate the clause then H Ω k is associated with an energy penalty of 1, and zero otherwise. Rather than taking the logical AND of all clauses as in the original 2-SAT problem, the problem Hamiltonian is now constructed as i.e., the sum of the energies of all M clauses contained in the 2-SAT instance F . This means that the ground state of H P corresponds to the bit assignment that violates the minimal number of clauses, i.e., the ground state is the solution to the MAX 2-SAT problem for the problem instance F . A generic computational basis state of the system can be written as ψ⟩ = In the case that {x * j } is a satisfying assignment for the formula F we have the correspondence where ψ * ⟩ = x * 1 x * 2 . . . x * N ⟩, while all non-satisfying assignments correspond to computational states with positive energy. In the case of MAX 2-SAT the ground state might have a positive energy E min > 0 and the question becomes to determine the assignment ψ⟩ such that H P ψ⟩ = E min ψ⟩.
Written out in detail, Eq. (7) becomes and upon equating with the problem Hamiltonian in Eq. (4b), after rescaling by a factor of 4 and dropping the constant term, we obtain the local fields h j and the couplings J ij in terms of the parameters of the given MAX 2-SAT instance: where i ∈ {1, 2} and the indices j 1 , j 2 are the qubit indices on the Chimera graph.

C. Restricted ensemble of DW1 processor-compatible 2-SAT problems
The DW1 process-compatible problem instances must satisfy a number of constraints, namely N ≤ 108 and the DW1 Chimera graph connectivity. To account for these constraints we generated restricted ensembles E DW (N, α) with 13 different numbers of variables and 20 different clause density values, as follows: • We define DW1 processor-compatible problem instances as those instances whose clauses are formed by two literals x j1 , x j2 which correspond to qubits j 1 , j 2 on the Chimera graph G = (E, V ) that are active as well as coupled, i.e., {j 1 , j 2 } ∈ V and (j 1 , j 2 ) ∈ E. Recall that not all qubits are active (see Fig. 1).
• When α < 1 2 there are more variables than can fit into the clauses. For α < 1 2, any variable that did not appear in a clause was not used in H P .
• At each value of N and α we generated an ensemble, E DW (N, α), of 500 DW1 processor-compatible random 2-SAT problem instances. We excluded all instances involving identical clauses. In our ensembles we applied negation to each of the two variables representing the qubits uniformly at random.
To cover a range of interesting clause density values we used a maximum value of 2α c = 2, thus ensuring that our instances were all well within the range of the the finitesize scaling window of the phase transition, whose width is Θ(N −1 3 ) [31]: four our range 16 ≤ N ≤ 108 we have 0.40 ≥ N −1 3 ≥ 0.21. The maximum value of α supported by the DW1 processor is discussed in Appendix B. We note that having enforced an equal probability for negated and unnegated variables somewhat restricts the hardness as it is known that an unbalanced probability of negation can lead to harder instances [57].
To test how well our DW1-restricted instances approximate an unrestricted random ensemble, we also generated ensembles, E(N, α), of 1000 random 2-SAT problems with a given number of variables and clause density and no constraint on the literals that comprise a single clause, except that no clauses are repeated. A comparison between E DW (N, α) and E(N, α) is presented in the next section.

IV. EXPERIMENTAL AND NUMERICAL RESULTS
In this section we report on our experimental and numerical results for the ensembles of MAX 2-SAT problems described above. A complete description of the settings under which we ran the akmaxsat algorithm is given in Appendix C.
We compare the scaling of the solution time required by akmaxsat to the empirical probability of the correct ground state found by the DW1 processor, from which we compute an extrapolated time required to achieve a certain solution threshold accuracy. This is, of course, not entirely an "apples-toapples" comparison, since it compares an exact algorithm with a probabilistic machine. Moreover, there exist faster stochastic classical algorithms [18]. However, to check the proposed DW1 solutions for correctness an exact algorithm is required, and we decided to use akmaxsat due to its excellent performance on the benchmark problem sets of MAX 2-SAT used at the MAXSAT-2009 and MAXSAT-2010 evaluations for stateof-the-art MAX SAT solvers [53]. We make no claims here as to the significance of our results in the larger context of whether experimental quantum annealing can outperform all classical algorithms. Rather, we focus on the scaling comparison with a state-of-the-art exact classical algorithm, and on whether there is any correlation in problem hardness between this classical solver and the DW1. We find that the DW1 has a scaling that is better than that of akmaxsat, a performance gap which increases with the clause density, and that there is rapidly decreasing correlation between the DW1 and akmaxsat for the hardness of problem instances, as α increases. In addition we closely examine the behavior in the vicinity of the critical clause density.
A. Comparison of akmaxsat for the random and DW1-compatible ensembles As noted above, the DW1 processor-compatible problem instances are restricted by the Chimera graph connectivity (see Fig. 1). To test how the resulting ensemble E DW (N, α) compares with an unrestricted random ensemble E(N, α), we analyzed the performance of akmaxsat for instances drawn from each ensemble.  2. (color online) Computational effort comparison between an ensemble of random (blue surface) and DW1 processor-compatible (red surface) MAX 2-SAT instances for akmaxsat. The time to solution is generally greater for the random ensemble; however, substantial differences between T soln for DW1 processor-compatible and random instances are not observed for a majority of the α and N values considered. All results are averaged over 500 instances of each ensemble and 10 akmaxsat runs per data point.
In Fig. 2, we present results for such an analysis for the time to solution as a function of α and N for each ensemble. The clause density α and problem size N are varied as described in Sec. III C with 1000 instances generated for each ensemble for a given α and N . The akmaxsat solver is deterministic, yet the solution time for a given instance may deviate due to variations in the initial starting point of the algorithm; hence, all instances are averaged over 10 algorithm implementations. (Another view of this data is offered in Appendix D; see Fig. 18.) We find that the solution time T soln is generally somewhat larger for the unrestricted random ensemble E(N, α), implying (unsurprisingly) that on average the unrestricted instances are somewhat harder for akmaxsat than the DW1 processorcompatible case. The differences between the solution times of the two ensembles are relatively small, differing by at most 2ms, around N = 67 with α = 2.0. Solution times are essentially identical for α ≲ 1 and N ≲ 40, with small regions of the (α, N )-space where the E DW (N, α) ensemble requires somewhat larger average solution times. The solution times differ somewhat more for N > 40 and α > 1. It is important to remember in any case that hardness is also dependent upon the solution method. Indeed, we show below that instances which are hard for akmaxsat need not be hard for the DW1. Moreover, we show later (see Fig. 7) that according to another hardness measure, processor-compatible instances are harder than random ones.

B. Experimental Results
In this subsection we describe our DW1 results. Details about the experimental settings are given in Appendix A. . The probability decreases as the number of clauses increases for a given N . The α-dependence becomes more pronounced as the problem size N increases, indicating that problems become harder. All data points are averages over 500 random DW1 processor-compatible instances with each instance averaged over 100 annealing runs. Error bars denote standard error. Note the absence of a specific feature at the critical clause density αc = 1. However, see  Fig. 3. While we found that δ = 3 2 works well, we do not have an explanation for this particular value.

Mean success probability as a function of N and α
In Fig. 3, we present experimental DW1 processor results for the mean success probability as a function of α for various N values. The average for each data point is over the number of annealing runs and instances. Each set of data points includes a least squares fit using the following function as an ansatz: where we normalized p(0, 0) = 1, and we find A = 9.28[6] × 10 −5 , γ = 2.40 [9] (the number in square brackets is the roundoff of the remaining digits), and δ = 3 2 for all α and N considered. To test the universality of this ansatz we show a data collapse in Fig. 4, which indicates a reasonable fit to the data, yielding a good collapse for larger values of N (N ≥ 75) and slightly overestimating the success probability at low N values, as can also be seen from Fig. 3. Note that the success probability decreases more rapidly with increasing α than with increasing N . Several fundamental factors contribute to the decrease of the success probability: the inherent increase in problem difficulty as N increases results in a smaller ground state gap, which in turn enhances coupling to the finite-temperature environment in the form of thermal excitations, and nonadiabaticity due to the finite annealing time [58]. Added to this is the qubit approximation to the rf SQUID [59]. Control errors such as miscalibration and the finite digital-to-analog (DAC) converter resolution contribute as well. A larger number of these occur as both N and the number of clauses increase. A detailed discussion of the effect of control errors and their contribution to the observed success probabilities is presented in Appendix E.

Sorted success probabilities as a function of N and α
A fine-grained characterization of the ensemble of problem instances for fixed N is given in Fig. 5, where we present contour plots of the success probability for each of the 500 problem instances as a function of α. That is, for each value of α we first sort the instances by success probability, and then plot the probabilities of the sorted instance set for that value of α (thus the instance number varies as one moves horizontally through each panel). The predominance of the color red in the panels indicates that most problem instances had a probability of success close to unity. As suggested by the mean case results shown in Fig. 3, the success probabilities decrease with increasing α and N . Perhaps most interesting is the appearance of a soft transition around α ≈ 1. That is, for all N most of the problems have success probability very close to 1 for α < 1, while lower success probabilities appear for α > 1. This can be interpreted as an experimental indication of the critical clause density.
We take a closer look at the ensembles of problem instances with N = 108 in Fig. 6, where we show the success probability distribution for all values of α we studied. The distribution has a peak that moves from p = 1 to p = 0.9 as α increases, and develops a broad wing at lower success probabilities. Its unimodality is a feature that persists for all values of α, N we tried [60].

Transition at the critical clause density
While the phase transition discussed in Section II B becomes sharp in the limit N → ∞, analytic bounds provide a more nuanced characterization for finite N . The bounds of Coppersmith et al. [31] provide useful intuition but should be interpreted cautiously as they only apply for finite, but very large N . In the so-called "finite-scaling window," α ∈ [1 − N −1 3 , 1 + N −1 3 ], the probability that a random formula   window by estimating the width of the region in which the probability of a formula being satisfied is far from 0 or 1 [49]. Fig. 7 compares the probability a formula is satisfiable for DW1-compatible and random instances for various N as a function of α. Analytically, the width of the finitescaling window should be 2N −1 3 . We estimate the value of the clause density α l (α r ) at which the probability of a random formula being satisfiable drops below 0.98 (0.3), as demonstrated in Fig. 7. Then we plot our empirical estimate of the width of window, α r −α l as a function of N in Fig. 8. We confirm that the scaling of the size of this window is indeed proportional to N −1 3 . Note that the observed width of the scaling window is larger than the analytic predictions (valid only for asymptotically large N ) for both random instances and processor-compatible instances. Referring again to Figs. 7 and 8, we note that the DW1 processor-compatible instances have a slightly wider window whose center is shifted slightly towards lower values of α. In other words, at a given value of α a processor-compatible instance will tend to be harder to satisfy. This is in interesting contrast to the situation for the empirical success probabilities we found for akmaxsat in Fig. 2. We now show that the phase transition has computational consequences for both akmaxsat and DW1. Again, we use p(α, N ), the probability that DW1 produced the correct solution, as a proxy for problem difficulty. In Fig. 9, we consider a plot analogous to Fig. 8, except now we estimate for each N the value, ∆α DW1 = α DW1,r − α DW1,l where α DW1,r (α DW1,l ) indicates the value of α at which the probability of successfully solving a problem for DW1 falls below the threshold 0.97 (0.99). The threshold 0.97 was set high enough that the p(α, N )-α curve passes through it for every N . The largest minimum success probability was 0.967 for N = 16, as seen in Fig. 3. For very small N , many factors including control errors may affect this curve. While it seems plausible that the N −1 3 scaling will continue to hold for larger N , solving larger problems on next-generation processors (such as the 512 qubit DW2) will be required to verify this hypothesis.
We also see the effect of the phase transition in Fig. 10, in which we compare p(α, N = 108) to α using a density histogram. A sharp change in the number of difficult instances clearly occurs around α = α c = 1. Furthermore, the variance in problem difficulty is much higher to the right of the transition. We compare the time to solution for akmaxsat to α in Fig. 11. While the difficulty clearly increases with clause density, we see again that the variance in difficulty increases at the phase transition. While the appearance of difficult instances is linked to the phase transition in both cases, surprisingly the instances that are difficult appear unrelated, as we will see in Sec. IV D. We discuss further evidence for a transition at α c in Section IV D (see in particular Fig. 15).

C. Scaling of the time to solution: DW1 vs akmaxsat
Given a success probability p, assuming no correlation between successive annealing runs on the DW1 (an assumption that is satisfied to a good approximation [18]), the probability of not getting a single correct answer in k runs is (1 − p) k . We define p desired as the threshold probability of getting one or more correct answers, i.e., p desired = 1 − (1 − p) k . For a run with annealing time t f the time required to reach the ground state at least once in k runs with probability p desired is: where t f = 1ms for our experiments. Taking p as the mean success probability reported in Fig. 3, the extrapolated time to solution for p desired = 0.99 for α = 0.1 (right-triangles) and α = 2.0 (up-triangles) is plotted in Fig. 12 versus problem size N (for complete data see Appendix F). Note that for α = 0.1 ∀N , the DW1 obtained the correct solution with p ≥ 99% on average. Clearly, this subensemble of problems was too easy for the chosen value of the annealing time, since this translates into a single repetition (k = 1), i.e., a constant time to solution equal to the (unoptimized) annealing time of 1ms. On the other hand, as N grows the time to solution must of course eventually start to grow as well. This illustrates the danger of extrapolating to large N values from experimental data based on a single (unoptimized) annealing time. In fact this conclusion also applies to the other values of α shown, since for a fixed annealing time the time to solution must eventually blow up as a function of N , due to the inevitable reduction in success probability resulting from restricting the time to settle on an optimal solution in an increasingly growing configuration space [18]. This extrapolation caveat does not apply to our akmaxsat data, since there we simply let the algorithm run until it finds a solution.

Extrapolation for the fixed-α ensemble
With the just-stated caveat in mind, we present a best fit to the entire DW1 data set, and separately for the akmaxsat timing dataset. We find that the data is well fit by the following   function: where the values of the various parameters are given in Table I. The numerical values were obtained from a least squares fit followed by a data collapse of the data shown in Fig. 12 for both akmaxsat and the DW1. More details regarding the data collapse are given in Appendix F, where we also present a second fit with more free parameters; this does not change our conclusions below.
Comparing the numerical values of the exponents γ and δ of, respectively, the clause density and the number of variables in Table I we see that while akmaxsat has the smaller value of the clause density exponent, the DW1 has the smaller exponent for the number of variables. This explains the better scaling for the DW1 as a function of N seen in Fig. 12, in spite of the fact that the DW1 has a much larger value of the exponent B.

The limit of large N and constant M
Departing momentarily from our emphasis on the fixedα ensemble, note that Eq. (13) can also be written as A exp BM γ N δ−γ . While viewed in this way akmaxsat has a better scaling with the number of clauses M since it has the smaller γ value, our fit yields a negative value for the DW1 exponent δ − γ of N , while akmaxsat's exponent is positive (see Table I). Now consider the limit of large N and small α, while keeping the number of clauses M constant. In this limit the probability of repeated variables vanishes, so that it becomes possible to satisfy each clause independently. A parallel processor capable of updating all clauses simultaneously would therefore solve the MAX 2-SAT problem in constant time in this limit. This is indeed the prediction of Eq. (13) for the DW1 time to solution, given that δ − γ < 0: In contrast, the predicted scaling of the time to solution for akmaxsat diverges with N even in this limit, which clearly shows its suboptimality. An independent check of this is shown in Fig. 13, where we plot the time to solution for akmaxsat, for the unrestricted ensemble E(N, α) and the DW1-compatible ensemble E DW (N, α). It can be seen that indeed, akmaxsat solution times do not seem to converge to a constant as N increases, and only a mild improvement is seen as M decreases. The fact that the DW1 seems capable of "recognizing" that the fixed-M , large N limit is easy, while akmaxsat does not, is interesting. It suggests that the DW1 naturally acts as a parallel processor, making "cluster moves" that simultaneously find SAT solutions for multiple clauses.

Time to solution for different levels of problem hardness
In Fig. 14 (a) we plot, for α = 2, the time taken by akmaxsat to solve the problems at 8 different percentiles of the 500 instances at each N , and compare this in Fig. 14 (b) to the estimated time to solution for DW1 for the same percentiles requiring p desired = .99. Note that the same percentile value in the two figures generally represent different, possibly overlapping sets of problem instances. In Fig. 14(a) lower and upper percentiles correspond to shorter and longer solution times, respectively, with the easiest (hardest) problems being in the 0.01 (0.99) percentile. The scaling for akmaxsat is approximately exponential (note the logarithmic time axis), with the higher percentiles having larger exponents. The solution time for the hardest (0.99) percentile is approximately twice that for the easiest (0.01) for N = 108. As seen in Fig. 14(b), the range of DW1 solution times varies significantly more between different percentiles than for akmaxsat. Disregarding fluctuations due to the control errors and the small sample size, the scaling of the DW1 solution times appears to be more favorable than for akmaxsat for all percentiles, and to match an exponential of √ N rather than N , in agreement with the scaling of the tree-width of the Chimera graph [18,41,42]. Once again we point out that extrapolation to larger N values are unreliable due to our suboptimal annealing time. We also note that the procedure whose results are shown in Fig. 14 corresponds to first computing the percentiles, then estimating the time to solution; in Appendix G we show that the order of these operations does not change the results.  Fig. 6. The easiest and hardest sets of problem instances are indicated by the 0.01 and 0.99 percentiles, respectively. While the performance of akmaxsat appears fairly uniform across the different percentiles, the performance of the DW1 is significantly better for the lower (i.e., easier) percentiles, with a noticeable deterioration seen at the 90th percentile.
FIG. 15. Scatter plot of success probability of DW1 vs. time taken by akmaxsat for each problem instance with N = 108 and α ∈ {0.1, 0.2, . . . , 2.0} (500 instances per α value). While for very small clause densities all problems are easy for the DW1 and take roughly the same amount of time for akmaxsat, as α approaches αc = 1 from below harder problems start to appear for the DW1 and akmaxsat timings start to spread. Above αc there is a significant spread in both the DW1 success probabilities and the akmaxsat timings. As a result the correlation between these two variables steadily diminishes as α grows.
around their mean and the clear majority of problem instances are easy for the DW1. For α ≥ 1 hard problem instances start to appear and the akmaxsat timing results spread around their mean. The DW1 probabilities are much more scattered; while they still cluster somewhat near p = 1, there is an in-creasingly large spread across the entire range of possible values of p. The correlation between the the DW1 success probability and the akmaxsat time to solution thus steadily diminishes with α; in particular we see that for α > α c there is a large spread of p values for any fixed solution time.  Fig. 15. Clearly there is essentially no correlation for problem difficulty, even for small values of the clause density..
A more direct measure of correlations is given in Fig. 16. Shown in this figure is a ranked difficulty comparison between the DW1 and akmaxsat, where we rank the difficulty of each instance from 1 (easiest) to 500 (hardest) for both solvers. Perfect correlation would then be evidenced as all instances falling on the 45°diagonal. While for α = 0.1 there is a slight tendency for clustering of the instances along this diagonal, such evidence of correlation is entirely absent for all higher values of the clause density.
If it is true that QA offers an advantage for certain ensembles of problems, then we also expect that for the random ensemble (which is, in a sense, a uniform average over possible ensembles), we should see specific problem instances for which QA offers an advantage over classical solvers. That we see no correlation in problem difficulty between QA and one particular classical solver is an interesting, though not conclusive, piece of evidence in this direction.

V. CONCLUSION AND FUTURE WORK
In this work we undertook a study of the performance of the DW1 quantum annealing processor on MAX 2-SAT problems, and compared it to akmaxsat, a competitive exact classical solver. We focused on the random problem ensemble characterized by a fixed clause density, as the latter is a well defined order parameter which allows "tuning" the problem hardness. After showing how MAX 2-SAT problems can be mapped to the DW1, we studied three main experimental questions: 1. Is there experimental evidence for a hardness transition at or near the critical clause density?
2. Is there a correlation between the DW1 and akmaxsat in terms of problem hardness?
3. How does the time to solution scale for the two solvers, and is there evidence of a significant difference?
Our answer to the first question is a qualified "yes". Within the limitations of our relatively small number of variables we did indeed find evidence for a significant decrease in DW1 success probability starting around the critical clause density. Moreover, we found that the width of the finite-size scaling window follows the theoretical expectation. This suggests that QA is sensitive to the most essential feature of problem hardness. Our answer to the second question is a resounding 'no'. This means that within the ensemble of hard random MAX 2-SAT problems there are likely to be found problems for which QA has an advantage over exact classical solvers, and vice versa. However, one cannot exclude on the basis of our data that there might exist strong correlations between QA as encapsulated by the DW1 and stochastic classical solvers. Indeed, Ref. [18] showed that in terms of random spin glass problem instance hardness, simulated quantum annealing correlates very well with the DW1 (essentially as well as the DW1 correlates with itself). Since simulated quantum annealing has an efficient classical implementation using quantum Monte Carlo algorithms (by mapping to a classical spin problem in one extra dimension), it is undoubtedly desirable to follow up our work with a simulated quantum annealing study of the same set of MAX 2-SAT problem instances. This remark and more applies also to the third question: we found that the DW1 scaling with problem size is clearly better than akmaxsat's over the range of problem sizes and clause densities we studied, and this is encouraging for QA, but at the same time additional study with classical stochastic solvers such as simulated annealing are needed in order to establish whether the DW1 advantage persists.
There are several other interesting directions for future research (see also Ref. [18]). We focused exclusively on the probability of finding the actual ground state, meaning that even a single-qubit error disqualifies a state as a correct solution; this criterion could be relaxed and one could instead focus on the distribution of excited states or Hamming distances from the ground state. The connection between problem hardness and the minimum energy gap between the ground and lowest excited state encountered during the annealing evolution is another question of great interest. Finally, it is obviously interesting to extend the results presented here to larger problem sizes and clause densities using the DW2 processor and its successors.
We conclude with a suggestion for future research that is related to the ρ-PTAS discussed in Section II C, which highlights a particularly interesting aspect of MAX 2-SAT. Recall that a ρ-PTAS is an algorithm that provides an assignment of variables that provably satisfies a number of clauses within at least a fraction ρ of the maximum number of clauses that can be satisfied for any formula, and that no ρ-PTAS exists for ρ > 21 22 unless P=NP [52]. This is a rather tight bound and it is tempting to try to probe it empirically. Of course an experiment cannot satisfy the conditions of rigorous proof required by the definition of a ρ-PTAS, but suppose the data is interpreted as a means to estimate an empirical ρ, as follows: when the processor finds an incorrect solution one counts the number of satisfied clauses n e for the excited state it found and compares to the correct solution for that instance, i.e., the true maximal number of clauses n t that can be satisfied. The ratio ρ ′ = n e n t is the empirical ρ for that instance. One can then analyze the distribution of empirical ρ values over all instances, and compare it to existing classical bounds. Note that ρ ′ cannot be used in a straightforward manner to infer anything about the P versus NP question, since even if ρ ′ > 21 22 the inapproximability result states that this violates P≠NP only if the inequality can be achieved in poly(N ) time. Even if we find that ρ ′ appears to be constant as N increases, it could be that we had not picked the "worst case" distribution, and if we had, it would reduce ρ ′ below 21 22 (at least asymptotically). Still, we believe this is an interesting question. We suspect that instances near α c are "hard to approximate", and if one considers the output of the best ρ-PTAS available [51], ρ ′ for random instances will be distributed in the range [0.94, 1], while we expect that QA will yield a distribution of ρ ′ values peaked closer to 1. The question for the future is then whether this may be used to infer anything about the asymptotic computational efficiency of QA.
Our experiments were performed using the D-Wave One Rainier processor at the USC Information Sciences Institute, comprising 16 unit cells of 8 superconducting flux qubits each, with a total of 108 functional qubits. The couplings are programmable superconducting inductances. Fig. 1 is a schematic of the device, showing the allowed couplings between the qubits which form a "Chimera" graph [41,42]. The qubits and unit cell, readout, and control have been described in detail elsewhere [14,61,62]. The processor performs a quantum annealing protocol to find the ground state of a classical Ising Hamiltonian, as described by the transverse Ising Hamiltonian in Eq. (3). The initial energy scale for the transverse field is 33.7GHz (the A function in Fig. 17), ensuring that the initial state is to an excellent approximation a uniform superposition in the computational basis, with any deviations mainly due to control errors resulting in non-uniformity in the values of the local transverse fields. The final energy scale for the Ising Hamiltonian (the B function) is 33.6GHz, about 15 times the experimental temperature of 17mK ≈ 2.3GHz. We performed 100 runs for each problem instance on the DW1. Each run returns a state measured in the computational basis (eigenvectors of σ z ), i.e., a proposed solution. Applying H P as given in Eq. (9) to this state yields the corresponding energy. The success probability, p(α, N ), is defined as fraction of times (out of 100) the measured state is the ground state, i.e., is the correct solution as verified against the guaranteed-correct solution returned by akmaxsat.
We used default settings for the DW1, including programming and thermalization times of 1ms, and an annealing time of 1ms, which we did not attempt to optimize. Moreover, we did not average over different choices of subsets of active qubits, nor did we consider different "gauges", i.e., reassignments of qubit up/down values which leave the spectrum invariant [17]. Thus our results may have been affected by systematic flux and coupler biases. However, removing such biases via averaging and gauges would have only improved the DW1 performance, so that our results can be viewed as lower performance bounds. What is the maximum α allowing for a meaningful ensemble of problems to be generated?
Consider a single edge between two nodes x 1 , x 2 . Then the largest formula we can generate has 4 clauses: Of course this problem instance is UNSAT, but is in principle allowed. If we only allow M = 3 clauses then in the example above there are N M = 4 possible formulae. More generally, given a total of C edges between a total of N variables, we can use each edge to generate 4 clauses as in Eq. (B1), and thus have one unique formula with M max = 4C clauses. Again, this particular formula will be UNSAT, but is allowed. Now, for any n ≤ N we can always choose a subgraph (of the Chimera graph) that has c edges between the n nodes and contributes up to 4 unique clauses. Thus α max (n) = 4c n, but for any n ≤ N there are N n choices of the subgraph, each of which might have a different number of edges. We claim that the ratio c n is maximized for n = N which lets us use all c = C edges. To see this suppose we start with all N = 108 qubits and C = 255 edges. Removing any qubit would result in the loss of at most 6 edges, and 5 edges on average given the actual connectivity of the Chimera graph. Thus we should check whether (C−5) (N −1) < C N , which holds at N = 108 and C = 255. Also, for any number of qubits n < N and edges c < 255 between them, we can check that the same inequality holds: (c − 5) (n − 1) < c n. This means that the ratio c n increases with n and thus will be maximal at the end point n = N = 108. This establishes that the maximum possible clause density is α * max = 4C N corresponding to one unique UNSAT problem.
However, a unique formula is unsuitable when an ensemble is desired, as in our case. Thus consider clause densities 3C N < α < 4C N . For formulae with α in this range we note that there will necessarily be a combination of the type of Eq. (B1)). This can be proven by the pigeon hole principle: Let α = 3C+c N . If we choose 3 possible clauses from each of the C edges the formula may still be SAT; however we may choose the remaining c clauses only from the unchosen remaining clauses from c edges. This means that the problem is guaranteed to be UNSAT.
This brings us to the range α < 3C N . In this range we can have ensembles with 4C 3C−a unique formulae corresponding to problems with α = 3C−a N . In practice we chose α max = 2, which guaranteed that some of our problem instances were SAT and that we had large enough ensembles (at least 500 instances for each value of α and N ). us focus on the variable x 1 that appears the most, n 1 times, in the formula. After we convert the clauses in the formula to local terms in the problem Hamiltonian, suppose that from each of the n 1 clauses that x 1 participates in, the field contributions are h i , with i ∈ {1, 2, . . . , n 1 }. The value of the local field for x 1 , h F 1 , corresponding to F is: Now we note that n 1 ≤ 24, since the Chimera graph degree of x 1 is ≤ 6 and for each edge x 1 can appear in at most 4 clauses. But if n 1 = 24 then h F 1 = 0 because it appears negated the same number of times as unnegated. The local field of x 1 is maximized if it appears in clauses unnegated for all the couplings that participate in the formula. This means that n 1 = 2 × 6 with h 1 = h 2 = ⋅ ⋅ ⋅ = h n1 = +1, and the maximum possible local field for any of our problems is h F 1 = 12. Thus rescaling involve division by at most 6, resulting in fractional coupler and field values, and in such cases the uncertainty of 0.1 in setting the couplers and local fields could have caused the 2SAT formula to be unfaithfully represented. Such cases probably contributed to lowering the success probability, simply because the "wrong" problem was solved by the DW1 processor. However, a priori for our ensemble of problems the situation is somewhat better, since we imposed a maximum value of α = 2 which means that, on an average, each qubit participated in two clauses and the maximum local field strength is 2, where rescaling is not required.
To investigate the contribution of such control errors we plot in the left panel of Fig. 19 the mean success probability at N = 108, and for all values of α, as a function of the scale factor required to force all local field and coupler values into their allowed ranges. The scale factor is seen to significantly impact the success probability, with an impact that grows with the clause density. At the high end of α values the decrease in success probability is roughly linear with the scale factor. The middle and right panels Figure 19 show the impact of the scale factor at α = 1.0 and 2.0 for all values of N . From this perspective too it is seen that the larger the scale factor the smaller the success probability, an effect that increases with N . Thus rescaling, which results in the ±0.1 uncertainty in fields and couplings becoming important, explains part of the reduction in the experimental success probability.
FIG. 21. Data collapse for solution time vs problem size N for akmaxsat (left) and DW1 (right) for all α using the fitting function given in Eq. (13). The data collapses well for larger α values, as can be seen from the inset of each plot where the collapse for α ≥ 1 is shown. The outlier for akmaxsat is α = 0.1. Data collapse for the DW1 data is poor for α < αc.  Table II.

Appendix F: Data collapse analysis
In Fig. 12 we display best fits for the solution time as a function of problem size for a range of clause densities. Figure 20 shows the complete data set, where the fit parameters are determined via an initial least squares fit and then a data collapse analysis. As a result, we find that for akmaxsat and DW1 γ = 3 4 and γ = 3 2, respectively, correspond to a strong collapse of the data particularly for α ≥ α c . In Fig. 21, the result of the data collapse is shown for akmaxsat (left) and the DW1 (right) with the collapse for α ≥ α c shown in the insets.
To complement the fit given in Eq. (13) and Table I which use a restricted set of fitting parameters, we present a second, less constrained fit, which includes a purely α-dependent part: T soln (α, N ) = A exp Bα γ N δ + C exp Dα ζ + Eα + F (F1) We find the results given in Table II, and the fit is shown in Fig. 22. Note that in this case we did not use a data collapse.  Fig. 14(b)).
The coefficient of determination for the akmaxsat data is identical to that for the first fit (Table I), while this coefficient improves from 0.991 for the DW1 data, so the second fit is slightly better. However, this improvement comes at the expense of introducing many more free parameters. Note that again δ − γ is positive for akmaxsat and negative for the DW1 (recall the discussion in Section IV C 2). TABLE II. Numerical values obtained from a least squares fit of the data shown in Fig. 12 for both akmaxsat and the DW1. The parameters are the ones appearing in Eq. (F1).
Appendix G: Comparison of different procedures for extracting the time to solution In Fig. 23 we compare the results of extracting the time to solution in two different orders for N = 108 and α = 2.0. On the left: we first use Eq. (12) to compute the time to solution given the success probability p for each instance, create a histogram of the resulting times, and then extract the percentiles. On the right: we first create a histogram of success probabilities for all instances, then calculate the probability p associated with a given percentile, and then use Eq. (12) to compute the time to solution for that percentile. It can be seen that the two procedures give essentially identical results.