
Diabatic quantum annealing for the frustrated ring model


Published 6 October 2023 © 2023 The Author(s). Published by IOP Publishing Ltd
Citation: Jeremy Côté et al 2023 Quantum Sci. Technol. 8 045033. DOI: 10.1088/2058-9565/acfbaa


Abstract

Quantum annealing (QA) is a continuous-time heuristic quantum algorithm for solving or approximately solving classical optimization problems. The algorithm uses a schedule to interpolate between a driver Hamiltonian with an easy-to-prepare ground state and a problem Hamiltonian whose ground state encodes solutions to an optimization problem. The standard implementation relies on the evolution being adiabatic: keeping the system in the instantaneous ground state with high probability and requiring a time scale inversely related to the minimum energy gap between the instantaneous ground and excited states. However, adiabatic evolution can lead to evolution times that scale exponentially with the system size, even for computationally simple problems. Here, we study whether non-adiabatic evolutions with optimized annealing schedules can bypass this exponential slowdown for one such class of problems called the frustrated ring model. For sufficiently optimized annealing schedules and system sizes of up to 39 qubits, we provide numerical evidence that we can avoid the exponential slowdown. Our work highlights the potential of highly-controllable QA to circumvent bottlenecks associated with the standard implementation of QA.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

The ubiquity of optimization problems across disciplines and the incredible computational resources they consume continue to motivate researchers to explore new and more efficient approaches to tackle them. Though quantum computing has so far offered limited provable computational advantages for combinatorial optimization problems [1–14], we expect that the library of quantum approaches for tackling different classes of optimization problems will expand as quantum information processors mature and become more readily available.

Quantum annealing (QA) is a generic quantum approach to tackling combinatorial optimization problems. Starting from an efficiently prepared ground state of an initial Hamiltonian $\hat{H}_d$, the system is evolved according to an interpolating Hamiltonian between $\hat{H}_d$ and a problem Hamiltonian $\hat{H}_p$, whose ground state encodes the solution to the optimization problem. At the end of the evolution, the algorithm is deemed successful if the quantum state has high overlap with the ground state of $\hat{H}_p$.

A sufficient but not necessary condition for QA to succeed is for the evolution to satisfy the adiabatic condition, which requires the total evolution time T, also called the annealing time, to scale as an inverse power of the minimum gap encountered along the interpolation [15–17]. When it does, QA is sometimes referred to as quantum adiabatic optimization (QAO) [18, 19], and it belongs to the adiabatic paradigm of quantum computing [20, 21]. The adiabatic theorem of quantum mechanics [15–17] provides a guarantee that if the evolution is sufficiently slow, then the state at the end of the evolution will have high overlap with the ground state of $\hat{H}_p$. The efficiency of the algorithm is thus given by the scaling of the minimum gap encountered along the interpolating schedule with the system size.

However, enforcing adiabaticity for all realizations of QA can result in exponentially slow evolutions even for simple problems [22, 23]. Furthermore, the only provable exponential speedup currently known for QA involves an evolution that is not adiabatic [8]. This suggests that it might be meaningful to consider heuristic continuous-time quantum optimization algorithms that are not necessarily adiabatic, and for which we can design the interpolation to take advantage of non-adiabatic transitions. This approach is sometimes called 'diabatic quantum annealing' (DQA) [24]. The added flexibility of DQA comes at the cost of identifying a suitable interpolation schedule, which is a non-trivial optimization problem in itself. This is in contrast to other approaches to shortcuts to adiabaticity [25], such as counterdiabatic driving [26], whereby additional terms are added to the Hamiltonian. If our goal is to search for interpolating schedules that improve on the evolution times of the adiabatic approach, we must ensure that the computational resources to identify these schedules also scale favorably with the system size.

In this work, we consider the performance of DQA on a simple class of optimization problems, known to be exponentially hard for QAO [27]. The problem is a one-dimensional frustrated ring model whose ground state can be found analytically, but poses a challenge for standard implementations of QAO because it exhibits a perturbative crossing [28], leading to exponentially slow evolution times. We will show that we can identify interpolating schedules that solve the optimization problem more efficiently than QAO.

The paper is organized as follows. In section 2, we review the frustrated ring model and why it exhibits an exponentially closing gap with the system size. In section 3, we present our algorithm for optimizing the annealing schedules and annealing times. In section 4, we present our numerical results for our optimized schedules and the performance of the DQA algorithm. We conclude and highlight some future directions in section 5.

2. Model

For our problem Hamiltonian, we focus on the frustrated ring model of [27] defined on N Ising spins arranged in a ring. The problem Hamiltonian is given by:

Equation (1): $\hat{H}_p = -\sum_{j = 1}^{N} J_j\, \hat{Z}_j \hat{Z}_{j+1}$

where $J_j$ denotes the coupling between spins j and j + 1, $\hat{Z}_j$ is the Pauli operator $\hat{Z}$ acting on spin j, and $\hat{Z}_{N+1} \equiv \hat{Z}_1$. We denote the +1 eigenstate of $\hat{Z}$ by $|0\rangle$, and the −1 eigenstate by $|1\rangle$. For simplicity, we restrict ourselves to the case where N is odd. There are three types of couplings; using the indexing convention in figure 1, they are:

Equation (2): $J_j = \begin{cases} J, & \text{the } N-3 \text{ strong ferromagnetic bonds,} \\ J_L, & \text{the two weak ferromagnetic bonds,} \\ -J_R, & \text{the single antiferromagnetic bond,} \end{cases}$ with the bond positions as indicated in figure 1.

For the rest of the paper, we take the couplings to satisfy $0 \lt J_R \lt J_L \lt J = 1$. With these choices, the couplings associated with $J_L$ and J are ferromagnetic, and the coupling associated with $J_R$ is antiferromagnetic.


Figure 1. A visualization of the frustrated ring model with a total of N (odd) spins with our indexing convention for the sites.


The doubly degenerate ground states of this Hamiltonian are given by the states $|0\rangle^{\otimes N}$ and $|1\rangle^{\otimes N}$, which satisfy all the ferromagnetic couplings but violate the one antiferromagnetic coupling of the Hamiltonian. The ground state energy is thus given by:

Equation (3): $E_0 = -(N - 3)J - 2 J_L + J_R$

On the other hand, the first excited states satisfy the antiferromagnetic coupling but violate one of the two $J_L$ ferromagnetic couplings, giving rise to a four-fold degenerate first excited eigenspace with energy:

Equation (4): $E_1 = -(N - 3)J - J_R$

Thus, the difference $J_L - J_R$ determines the ground state energy gap of the problem Hamiltonian, $E_1 - E_0 = 2(J_L - J_R)$.
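As a quick sanity check on (3) and (4), the classical spectrum of (1) can be enumerated by brute force for a small ring. The sketch below is ours (not from the paper's repository), and the specific placement of the two $J_L$ bonds and of the antiferromagnetic bond is only our reading of figure 1; the energies and degeneracies do not depend on where exactly those bonds sit.

```python
import itertools
import numpy as np

# Frustrated ring couplings J_j for H_p = -sum_j J_j Z_j Z_{j+1} (N odd).
N, J, J_L, J_R = 7, 1.0, 0.5, 0.45
couplings = np.full(N, J)
couplings[(N - 1) // 2 - 1] = J_L    # first weak ferromagnetic bond (assumed position)
couplings[(N + 1) // 2 - 1] = J_L    # second weak ferromagnetic bond (assumed position)
couplings[N - 1] = -J_R              # antiferromagnetic bond closing the ring

def energy(spins):
    """Classical Ising energy of a spin configuration (spins are +/-1)."""
    return -sum(couplings[j] * spins[j] * spins[(j + 1) % N] for j in range(N))

energies = sorted(energy(s) for s in itertools.product([1, -1], repeat=N))
E0 = energies[0]
E1 = next(e for e in energies if e > E0 + 1e-9)
print(E0, -(N - 3) * J - 2 * J_L + J_R)   # ground energy, equation (3): -4.55
print(E1, -(N - 3) * J - J_R)             # first excited energy, equation (4): -4.45
print(energies.count(E0), sum(abs(e - E1) < 1e-9 for e in energies))  # degeneracies 2 and 4
```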

We now consider a continuous-time algorithm with a time-dependent Hamiltonian $\hat{H}(t)$ that interpolates between a uniform transverse field Hamiltonian $\hat{H}_d = - \sum_{j = 1}^N \hat{X}_j$, where $\hat{X}_j$ is the Pauli operator $\hat{X}$ acting on site j, and the problem Hamiltonian. That is,

Equation (5): $\hat{H}(t) = \left(1 - A(t)\right) \hat{H}_d + A(t)\, \hat{H}_p$

where A(t) is the annealing schedule of our algorithm. For the remainder of the paper, we require $A(0) = 0$ and $A(T) = 1$, where T is the total annealing time for the protocol.
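For concreteness, a minimal dense-matrix sketch of this interpolating Hamiltonian for small N follows; the helper names (`op_on`, `hamiltonians`, `H_of`) are ours, and this brute-force construction is only practical at small sizes (the large-N results use the methods of appendix A).

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def op_on(site_ops, N):
    """Kronecker product placing the operators in {site: op} on an N-spin register."""
    return reduce(np.kron, [site_ops.get(j, I2) for j in range(N)])

def hamiltonians(N, couplings):
    """H_d = -sum_j X_j and H_p = -sum_j J_j Z_j Z_{j+1}, as in equations (1) and (5)."""
    H_d = -sum(op_on({j: X}, N) for j in range(N))
    H_p = -sum(couplings[j] * op_on({j: Z, (j + 1) % N: Z}, N) for j in range(N))
    return H_d, H_p

def H_of(A, H_d, H_p):
    """Interpolating Hamiltonian (5) at schedule value A = A(t)."""
    return (1 - A) * H_d + A * H_p
```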

An important feature of this combination of problem Hamiltonian and mixer Hamiltonian is that it gives rise to a 'perturbative crossing' [28] in the energy spectrum. To see this, we first express the ground states and first excited states as symmetric and anti-symmetric combinations (for simplicity we write this for the case of N = 7):

Equation (6a): $\big|\Phi_0^{(\pm)}\big\rangle = \frac{1}{\sqrt{2}}\left( |0\rangle^{\otimes 7} \pm |1\rangle^{\otimes 7} \right)$

Equation (6b): $\big|\Phi_1^{(\pm)}\big\rangle = \frac{1}{\sqrt{2}}\left( |s_1\rangle \pm |\bar{s}_1\rangle \right)$

Equation (6c): $\big|\Phi_{1'}^{(\pm)}\big\rangle = \frac{1}{\sqrt{2}}\left( |s_{1'}\rangle \pm |\bar{s}_{1'}\rangle \right)$

Here $|s_1\rangle$ and $|s_{1'}\rangle$ denote the two first excited configurations (each satisfying the antiferromagnetic bond while violating one of the two $J_L$ bonds), and $|\bar{s}\rangle$ denotes the global spin flip of $|s\rangle$.

Because the Hamiltonian $\hat{H}(t)$ commutes with the operator $\hat{X}^{\otimes N}$, the energy eigenstates can be labeled by the two eigenvalues ±1 of this operator. We only need to concern ourselves with the +1 eigenvalue states (the reason for this will become apparent when we discuss the initial state of our continuous-time algorithm), which in turn means we only need to consider the symmetric combinations of states in (6). The two first excited states $\Phi_1^{(+)}$ and $\Phi_{1'}^{(+)}$ are coupled by the transverse field, so the degeneracy of the first excited state of the problem Hamiltonian is broken when we turn on the parameter $1-A$. At first order in perturbation theory, the unique instantaneous first excited state has energy

Equation (7): $E_1(A) = A\, E_1 - (1 - A)$

The ground state energy is corrected to $ E_0(A) = A E_0$ at the same order. As A decreases from 1, first order perturbation theory predicts an energy level crossing at:

Equation (8): $A_\ast = \frac{1}{1 + 2\left(J_L - J_R\right)}$

which will be corrected to an exponentially small (in N) avoided level crossing in the full quantum theory [27], with (8) being approximately the position of the minimum gap. For example, with $J_R = 0.45, J_L = 0.5$ we find $A_\ast \approx 0.9091$.
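The location of this avoided crossing can be checked numerically at small N by diagonalizing $\hat{H}(A)$ in the +1 symmetry sector of $\hat{X}^{\otimes N}$ (see the discussion around (6)). The following sketch is ours, with the same assumed bond placement as before; for N = 7 the numerically located minimum-gap position should land close to the perturbative estimate (8).

```python
import numpy as np
from functools import reduce

N, J, J_L, J_R = 7, 1.0, 0.5, 0.45
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
kron_all = lambda ops: reduce(np.kron, ops)

couplings = np.full(N, J)
couplings[[(N - 1) // 2 - 1, (N + 1) // 2 - 1]] = J_L   # weak bonds (assumed placement)
couplings[N - 1] = -J_R                                  # antiferromagnetic bond

H_d = -sum(kron_all([X if k == j else I2 for k in range(N)]) for j in range(N))
H_p = -sum(couplings[j] * kron_all([Z if k in (j, (j + 1) % N) else I2 for k in range(N)])
           for j in range(N))

# Projector onto the +1 eigenspace of X^{(x)N}; H(t) commutes with this symmetry.
# Projecting introduces spurious zero eigenvalues from the discarded sector, but
# the two lowest (negative) levels in this window belong to the +1 sector.
P = (np.eye(2 ** N) + kron_all([X] * N)) / 2

grid = np.linspace(0.85, 0.95, 201)
gaps = [np.diff(np.linalg.eigvalsh(P @ ((1 - A) * H_d + A * H_p) @ P)[:2])[0]
        for A in grid]
print(grid[int(np.argmin(gaps))], 1 / (1 + 2 * (J_L - J_R)))  # both close to 0.9091
```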

3. Optimizing the annealing schedules

Starting from the state $|{\psi(0)}\rangle = |{+}\rangle^{\otimes N}$, which is the ground state of $\hat{H}(0)$, our objective is to prepare a state $|{\psi(T)}\rangle = \mathrm{Texp}( -\mathrm{i}\int_0^T \hat{H}(t)\, \mathrm{d}t ) |{\psi(0)}\rangle$, with $\mathrm{Texp}$ being the time-ordered exponential, that has a large overlap with the (relevant) ground state of $\hat{H}_p$. We provide details of how we simulate the dynamics in appendix A. To avoid unfairly steering the optimization of the annealing schedule towards the known solution, we choose the cost function to minimize to be the energy $E(T) = \langle{\psi(T)}| \hat{H}_p |{\psi(T)} \rangle$. We note that this choice is general and can be applied to any problem Hamiltonian. To determine the success of our optimization, we choose a threshold value $\Delta_E$ and consider any state $|{\psi(T)}\rangle$ with $E(T) \leqslant E_0 + \Delta_E$ to be a success. This choice does use our knowledge of the true ground state energy, and we adopt it to remove any ambiguity when assessing the scaling performance of our optimization method. In more realistic situations, where the ground state energy of the problem Hamiltonian is not known, one can use more sophisticated stopping criteria such as the optimal stopping discussed in [29], where a further cost associated with every call to the algorithm is introduced.

Our goal is to identify the lowest time T, and a corresponding schedule, that ensures the success of the optimization for a given target energy threshold. Other objectives besides the lowest time could be used, but we do not pursue them in this work. For example, if our objective is to minimize the total time to find a state with energy at or below a target $E_0 + \Delta_E$, a more suitable metric might be the time-to-solution [30], which balances spending more time per independent run of the algorithm for a higher success probability against performing multiple fast runs of the algorithm to boost a lower success probability.

As a baseline, we consider the linear annealing schedule $A(t) = t/T$, with T large enough to satisfy the adiabatic condition. As we discussed in section 2, the Hamiltonian $\hat{H}(t)$ exhibits, along its interpolation, a minimum energy gap between the ground and first excited states that closes exponentially with the system size. This implies that an adiabatic protocol requires an evolution time that scales exponentially with the system size, meaning the protocol is exponentially slow in finding the ground state (with high probability) of the one-dimensional frustrated ring Ising problem, despite its computational simplicity.

Our objective is to optimize the annealing time and schedule of DQA to avoid this exponential slowdown. To do so, we divide our optimization procedure into an inner and outer routine. Within the inner routine, which we visualize in figure 2, the annealing time T is fixed and we optimize the schedule A(t) to minimize the energy of the prepared state. We discuss the details of this in section 3.1.


Figure 2. The inner routine of our optimization algorithm for optimizing the annealing schedule A(t). It shows the control flow for algorithm 1.


In addition, the outer routine (figure 4) aims at minimizing the annealing time T required to achieve a target threshold energy. We do this by iteratively decreasing (or increasing) T depending on the success (or failure) of the inner routine. The algorithm repeats these steps until we identify a sufficiently narrow range of the values of T for which the algorithm has found a successful schedule (section 3.2). We now describe the inner and outer routines in further detail.

3.1. Optimizing the schedule

The inner routine of our algorithm involves optimizing the annealing schedule A(t) such that we minimize $E(T) - E_{0}$ (algorithm 1), for a fixed time T. In particular, we want to ensure our output state $|{\psi(T)}\rangle$ has enough overlap with the true ground state $|{\Phi_0^+}\rangle$ so that we could in principle implement this schedule on a quantum processor and have a finite probability of measuring the ground states.

Algorithm 1. FindSchedule.
Input: $J_R$, $J_L$, $J$, $E_0$, $N$, $T$, $\delta t$, $\delta E$, $c$, $k_0$, $\mathrm{trials}$
Output: A(t)
$k \gets k_0$ // starting number of points;
$E_{\mathrm{best}} \gets \infty$ // in practice initialized to $10^5$;
for $i \gets 1$ to $\mathrm{trials}$ do
$a \gets \text{InitialPoints}(k)$ // random points $\left\{a_j \right\}$;
$a(T) \gets \mathrm{Optimize}(a, T)$;
$E(T) \gets \mathrm{ComputeEnergy}(a, T)$;
if $E(T) \lt E_{\mathrm{best}}$ then
    $E_{\mathrm{best}} \gets E(T)$;
    $a_{\mathrm{best}}(T) \gets a(T)$;
end if
end for
$k \gets 2k + 1$;
$a(T) \gets a_{\mathrm{best}}(T)$;
$E_{\mathrm{previous}} \gets E_{\mathrm{best}}$;
repeat
$a \gets \text{EmbedSchedule}(a(T), \, k)$ // refined points $\left\{a_j \right\}$;
$a(T) \gets \mathrm{Optimize}(a, T)$;
$E(T) \gets \mathrm{ComputeEnergy}(a, T)$;
$\Delta_{\mathrm{step}} \gets |E(T) - E_{\mathrm{previous}}|$ // change from the previous refinement;
$E_{\mathrm{previous}} \gets \min\left(E(T), E_{\mathrm{previous}}\right)$;
$k \gets 2k + 1$;
until $E(T) - E_{0} \lt 2c(J_L - J_R)$ or $\Delta_{\mathrm{step}} \lt \delta E$;
$A(t) \gets \text{Schedule}(a, T)$ // build A(t) from $\left\{a_j \right\}$;
return A(t);

We parameterize the schedule A(t) in terms of a finite number of continuous parameters corresponding to weights over a predefined set of functions. For instance, these functions could be a basis of trigonometric functions [31], a basis of polynomials [32, 33], 'bang-bang' or on-off pulses (standard in the quantum approximate optimization algorithm (QAOA) [34, 35]), or others [36, 37]. In our case, we opt for a linear piecewise decomposition [38], which we iteratively refine and now describe.

To parameterize A(t), we begin by choosing $k_0$ random values $a_1, a_2, \ldots, a_{k_0}$ in the interval $\left[ 0, 1 \right)$ and split our annealing interval $\left[0, T \right]$ into $k_0+1$ equal intervals separated at $t_j = jT/(k_0+1)$ for $j \in \left[1, k_0 \right]$. We then set $A(t_j) = a_j$. This defines the initial annealing schedule A(t), with $k_0+1$ linear functions connecting the $k_0$ points (as well as the start and end points, $A(0) = 0$ and $A(T) = 1$).

After initializing a random schedule, we optimize the parameters $a_j$ such that we minimize the energy of the output state $|{\psi(T)} \rangle$. We perform the optimization using a gradient-free method from SciPy, which we detail further in appendix B. The result is a locally optimal schedule A(t) using a piece-wise linear function with $k_0+1$ segments. In our experiments, we use ten different random instances of the initial tuple $(a_1, a_2, \ldots, a_{k_0})$ and select the best set of parameters (in terms of the energy achieved) when proceeding to the following step. We also experiment with $k_0 \in \left\{3, 5, 7, 9 \right\}$ so that the averaged results we present in section 4 do not depend on a specific number of initial parameters $k_0$.
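As a rough illustration, this inner optimization could be driven by SciPy's COBYLA with the `maxiter` and `tol` settings reported in appendix B. In the sketch below, `energy_of_schedule` is a hypothetical stand-in for a simulator that anneals with the piecewise-linear schedule through the points $(t_j, a_j)$ and returns E(T); it is not part of the paper's code.

```python
import numpy as np
from scipy.optimize import minimize

def optimize_schedule(energy_of_schedule, k0=3, trials=10, seed=0):
    """Best-of-`trials` COBYLA optimization of the interior schedule values a_j."""
    rng = np.random.default_rng(seed)
    best_E, best_a = np.inf, None
    for _ in range(trials):                      # random restarts
        a0 = rng.uniform(0.0, 1.0, size=k0)      # random a_j in [0, 1)
        res = minimize(energy_of_schedule, a0, method='COBYLA',
                       tol=1e-3, options={'maxiter': 800})  # appendix B settings
        if res.fun < best_E:
            best_E, best_a = res.fun, res.x
    return best_E, best_a
```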

Next, we refine the schedule by adding new points (i.e. additional free parameters) bisecting the line segments connecting our existing points. For example, suppose we start with $k_0 = 3$. Then, we will initially have the points $(0,0)$, $(t_1, a_1)$, $(t_2, a_2)$, $(t_3, a_3)$ and $(T,1)$. In this case, we would go from three free parameters to seven (including the original three). In general, when we split the interval $\left[0, T \right]$ with k points (not including the fixed endpoints), the points are located at $t_j = j T / (k+1)$, with $j \in \left\{1,2,\ldots,k \right\}$. We set the new values of $a_j$ so that the points $\left(t_j, a_j \right)$ bisect the line segments connecting the original points. This ensures that the introduction of new points does not initially alter the schedule. Now, the optimization performed on the refined schedule should either maintain or improve the quality of the result (measured in energy E(T)) as the interpolated schedule is initially identical, and further optimization is expected to improve the energy. We note that running the optimization algorithm on a schedule with more points does not guarantee a lower energy will be found.
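A minimal sketch of this bisection refinement, with illustrative helper names `schedule` and `refine`, is:

```python
import numpy as np

def schedule(a, T):
    """Piecewise-linear A(t) through (0, 0), (t_j, a_j) and (T, 1), t_j = jT/(k+1)."""
    ts = np.linspace(0.0, T, len(a) + 2)                    # includes fixed endpoints
    vals = np.concatenate(([0.0], np.asarray(a, float), [1.0]))
    return lambda t: np.interp(t, ts, vals)

def refine(a):
    """Go from k to 2k + 1 points by bisecting every segment; A(t) is unchanged."""
    vals = np.concatenate(([0.0], np.asarray(a, float), [1.0]))
    mids = (vals[:-1] + vals[1:]) / 2        # new points lie on the old segments
    out = np.empty(2 * len(a) + 1)
    out[1::2] = vals[1:-1]                   # old interior points keep their values
    out[0::2] = mids                         # bisecting points, interleaved
    return out
```

By construction, `refine([a_1, a_2, a_3])` returns seven values whose piecewise-linear interpolant coincides with the original three-point schedule, so the subsequent optimization starts from the same E(T).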

For each new set of values $\left\{a_j \right\}$ defining A(t), we run our optimization routine and compare the lowest energy found to the previous lowest energy value. If the lowest energy found does not decrease by more than a threshold value $\delta E$, we declare the energy 'converged' and output A(t) as our optimal annealing schedule (figure 2). Otherwise, we add more points in between the current ones and optimize again. In figure 3, we show these intermediate steps of schedule refinement, from the initial 3 parameters to the 15 parameters required to converge close enough to the desired ground state energy.


Figure 3. An example annealing schedule A(t) for N = 5 during three stages of the optimization process, for T = 12.5. The green dotted line with circular markers is the initial guess with $k_0 = 3$ points, the blue dashed line with triangular markers is the first optimized schedule with 7 points, and the purple solid line with square markers is the final optimized schedule with 15 points. The ground state energy is $E_{0} = -2.55$, and only the purple schedule reaches it within a tolerance of c = 0.5. The tolerance c is defined in (9).


In our optimization approach, we place no restrictions on the range of A(t), but the schedules we find have a bounded range. This is important since in any physical implementation of the time-dependent Hamiltonian, the coupling strengths made available by the hardware will be limited.

3.2. Optimizing the annealing time

The outer loop of the algorithm aims at determining the minimum annealing time $T_{\mathrm{min}}$ that keeps us within our threshold $E(T) - E_{0} \leqslant \Delta_E$ (see figure 4 and algorithm 2). We choose $\Delta_E$ to be a fraction c of the problem Hamiltonian gap (4):

Equation (9): $\Delta_E = 2c\left(J_L - J_R\right)$

When c = 1, the threshold corresponds to the gap between the ground state and the first excited state of the problem Hamiltonian. In practice, we vary c to determine the performance of our algorithm.


Figure 4. The outer routine of our optimization algorithm for optimizing the annealing time T. It shows the control flow for algorithm 2.

Algorithm 2. MinimizeAnnealingTime.
Input: $J_R$, $J_L$, $J$, $E_0$, $N$, $T$, $\delta T$, $\delta E$, $c$, $k_0$
Output: $T_{\mathrm{min}}$
$T_{\mathrm{low}} \gets 0$;
$T_{\mathrm{high}} \gets T$;
while $\frac{T_{\mathrm{high}} - T_{\mathrm{low}}}{T_{\mathrm{high}} + T_{\mathrm{low}}} \geqslant \delta T$ do
$A(t) \gets \mathrm{FindSchedule}$ // use algorithm 1;
$E(T) \gets \mathrm{ComputeEnergy}(A(t))$;
if $E(T) - E_{0} \leqslant 2c(J_L - J_R)$ then
   $T_{\mathrm{high}} \gets T$;
   $T \gets \frac{T_{\mathrm{high}} + T_{\mathrm{low}}}{2}$;
else
   $T_{\mathrm{low}} \gets T$;
   if $T_{\mathrm{high}} = T_{\mathrm{low}}$ then
     $T_{\mathrm{high}} \gets 2 T_{\mathrm{high}}$ // increase the guess;
     $T \gets T_{\mathrm{high}}$;
   else
     $T \gets \frac{T_{\mathrm{high}} + T_{\mathrm{low}}}{2}$;
   end if
end if
end while
$T_{\mathrm{min}} \gets T_{\mathrm{high}}$;
return $T_{\mathrm{min}}$;

To estimate $T_{\mathrm{min}}$, we build an estimation interval $\left[ T_{\mathrm{low}}, T_{\mathrm{high}} \right]$ and take the estimate to be the upper limit of this interval. We start by setting $T_{\mathrm{low}} = 0$. To set an initial $T_{\mathrm{high}}$, we find a time T such that the corresponding energy E(T) after optimizing the schedule satisfies $E(T) - E_{0} \leqslant \Delta_E$. We declare such an optimization successful and set our current upper bound for the minimal annealing time as $T_{\mathrm{high}} = T$. If we are not successful, we set $T_{\mathrm{low}} = T$ (see algorithm 2 for the edge case where $T_{\mathrm{high}} = T_{\mathrm{low}}$). Then, we update T:

Equation (10): $T = \frac{T_{\mathrm{high}} + T_{\mathrm{low}}}{2}$

As we continue refining our best estimate for $T_{\mathrm{min}}$, our interval of uncertainty $\left[ T_{\mathrm{low}}, T_{\mathrm{high}} \right]$ will shrink. We stop our algorithm when

Equation (11): $\frac{T_{\mathrm{high}} - T_{\mathrm{low}}}{T_{\mathrm{high}} + T_{\mathrm{low}}} \lt \delta T$

which is just the ratio between the uncertainty of the interval (half the range) and its midpoint $\left(T_{\mathrm{high}} + T_{\mathrm{low}} \right) / 2$.

By calculating $T_{\mathrm{min}}$ for multiple system sizes, we can study how the minimal annealing time scales with the system size. This is only an upper bound for the minimal annealing time $T_{\mathrm{min}}(N)$ satisfying $E(T) \leqslant E_0 + \Delta_E$, because our optimization algorithm could always fail to find a schedule with a smaller $T_{\mathrm{min}}$.
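Putting the outer routine together, a sketch mirroring algorithm 2 reads as follows; here `find_schedule_energy` is a hypothetical stand-in for a call to algorithm 1 at fixed T that returns the resulting E(T).

```python
def minimize_annealing_time(find_schedule_energy, T, E0, Delta_E, delta_T=0.05):
    """Bisection on the annealing time, mirroring algorithm 2."""
    T_low, T_high = 0.0, T
    while (T_high - T_low) / (T_high + T_low) >= delta_T:
        if find_schedule_energy(T) - E0 <= Delta_E:  # success: shrink from above
            T_high = T
            T = (T_high + T_low) / 2
        else:                                        # failure: raise the lower bound
            T_low = T
            if T_high == T_low:                      # current upper guess too small
                T_high = 2 * T_high
                T = T_high
            else:
                T = (T_high + T_low) / 2
    return T_high
```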

4. Results

In this section, we first compare the minimum annealing times obtained by means of our optimization (as described in section 3) to the times required for a linear schedule to reach the same energy $E(T) \leqslant E_0 + \Delta_E$. We show that the optimized schedules can avoid the exponential times required by a linear schedule, resulting in annealing times scaling only polynomially with the system size. Second, we investigate what distinguishes the optimized schedules from the linear ones by studying the overlap of the instantaneous state with the ground state and first excited state of the instantaneous Hamiltonian $\hat{H}(t)$. In all the results we present, we set the system parameters to $\left(J_R, J_L, J \right) = \left(0.45, 0.5, 1 \right)$.

4.1. Scaling of annealing time

In figure 5, we report the annealing times corresponding to both the linear schedules and the optimized ones, over different values of the threshold factor c = 0.10, 0.25 and 0.50 appearing in (9) (colors in legend). As expected from the discussion in section 2, we find that the linear schedules result in annealing times that grow exponentially fast with the size N of the system. Already, for N > 9 this would entail times $T \gt 10^6$ that are too computationally demanding to simulate.


Figure 5. Scaling of the minimum annealing times $T_{\mathrm{min}}$ found for the optimized schedules (solid lines) and the linear ones (dashed lines). These are obtained for odd system sizes $N\in[5,39]$ and for different values of the energy threshold c (colors in legend). Each data point in the 'Optimized' part of the main figure represents the minimum annealing time found over 100 runs of our algorithm. Furthermore, in the inset we report the median values of these times over the 100 repetitions, along with a $95\%$ confidence interval in these estimates (shaded region). In both cases, we see evidence of the polynomial scaling of the optimized $T_{\mathrm{min}}$ with respect to N. In particular, the median values are found to scale quadratically with N, with a fit of the form $\alpha N^2$ as a leading term, depicted in the inset (dashed lines).


In stark contrast, figure 5 demonstrates that our optimization protocol is capable of finding schedules significantly shorter than the linear ones (up to 5 orders of magnitude shorter for N = 9). Here, each data point corresponding to the optimized schedules is obtained as the minimum time found over 100 optimizations. Overall, we see that times only scale polynomially with N for all the values of $c \in \left\{0.10, 0.25, 0.50 \right\}$ we study.

While this already provides evidence that optimized schedules can be exponentially shorter than linear ones, it is also of great interest to understand the scaling of annealing times entailed by a typical optimized schedule. For that purpose, we report in the inset of figure 5 the median values of the times found over the 100 repetitions. As shown, these median times can be up to one order of magnitude larger than the minimum times found. Despite this increase, the median times still scale polynomially in N. Fits of the form $\mathrm{median}(T_{\mathrm{min}}) \sim \alpha(c) N^2$ (as a leading term) are displayed as dashed lines in the inset. As c decreases, the slope $\alpha(c)$ increases.

4.2. Population levels

To understand the underlying dynamics associated with our optimized annealing schedules and low annealing times, we first inspect the schedules themselves. In figure 6, we report a selection of representative optimized schedules, identified for the threshold value c = 0.10. These schedules share a common structure across various system sizes, in particular a similar 'up-down-up' trajectory.


Figure 6. A selection of optimized schedules A(t) across many system sizes, found for the case c = 0.10 in (9). We plot each schedule against the rescaled time $t / T_{\mathrm{min}}$, where $T_{\mathrm{min}}$ is the total annealing time of the schedule. Dots depict the optimized points of the schedule, with the initial and final ones fixed to $(0,0)$ and $(1,1)$. The dashed black horizontal line marks the schedule value $A_\ast$ at which the perturbative crossing occurs.


To understand the role of the non-monotonic schedule, we look at the populations in the energy levels of the instantaneous Hamiltonian $\hat{H}'(t) = \hat{P}_N \hat{H}(t) \hat{P}_N$, where $\hat{P}_N = \left( \hat{\mathbb{1}}_N + \hat{X}^{\otimes N} \right) / 2$ and $\hat{\mathbb{1}}_N$ is the identity operator of dimension $2^N$. Because the initial state satisfies $\hat{P}_N |\psi(0)\rangle = |\psi(0)\rangle$ and our evolution preserves the symmetry ($[\hat{H}(t), \hat{X}^{\otimes N}] = 0$), we can restrict ourselves to states in that symmetry subspace. This means that the relevant eigenvalues and eigenvectors are those of the Hamiltonian $\hat{H}'(t)$.

By tracking how the populations in the energy levels of $\hat{H}'(t)$ change over time, we can observe how A(t) affects the quantum state. Our simulations show that the majority of the population is restricted to the ground state and first excited state of $\hat{H}'(t)$. We show two examples in figure 7. The strategy followed by the optimized schedule is to first transfer population from the ground state to the first excited state and then cross the perturbative gap at $A \approx A_\ast$ one last time to diabatically transfer the population back into the ground state [39]. What is of particular interest is that the transfer of population to the first excited state is done by using a non-monotonic schedule. In some cases, the function A(t) crosses the perturbative gap at $A_\ast$ multiple times. For example, the schedule in figure 3 crosses the value $A_\ast \approx 0.9091$ a total of three times (twice increasing above $A_\ast$ and once decreasing below it). In others, the function A(t) approaches the perturbative gap at $A_\ast$ but does not cross it, moves away from it, and only crosses it near the end of the evolution. We show this in figure 6, where the horizontal dashed line indicates the schedule value $A_\ast$ which leads to the perturbative gap.


Figure 7. Comparison of the linear versus optimized schedule for (a) N = 9 and (b) N = 13. Top row: The population levels of the ground and first excited state invert twice for the optimized schedule, versus only once in the linear case. Middle and bottom rows: This is because the energy gap between the two levels only closes at the perturbative crossing of $A_\ast \approx 0.9091$, which inverts the population levels in an undesirable way. On the other hand, the gap closes more than once for the optimized schedule: In the top panel of (b) we see that the schedule approaches $A_\ast$ at approximately $t/T_{\mathrm{min}} = 0.4$ and $t/T_{\mathrm{min}} = 0.95$. We see in the middle row panels that the ground state and first excited state energies approach each other at those times, which we emphasize in the final row by using a vertical logarithmic scale that displays the absolute energy difference. This allows the population levels to accumulate in the first excited state, and then transfer into the ground state when the system passes through $A_\ast$. For the simulations, we discretize the time $T_{\mathrm{min}}$ in steps of $\delta t = 2^{-5}$ (see appendices A and B).


To put our results into context, we compare our optimized schedules to linear schedules for the same optimized time $T_{\mathrm{min}}$ in figure 7. Doing so highlights the advantages of a non-adiabatic evolution. We look at the instantaneous population levels, energy levels, and gap to the ground state along both schedules. We observe that the linear schedule only creates a small gap near the exponential closing $A_\ast$. If we anneal too fast, this exponentially small gap will cause the population levels of the ground and first excited state to invert. The only option is to anneal more slowly, which lowers the probability that the population levels invert. By contrast, some of our optimized schedules first make the gap to the ground state small without crossing $A_\ast$, allowing most of the population to migrate from the ground state to the first excited state. Then, the quantum state anneals through the point $A_\ast$, inverting the populations so the ground population is the majority.

A common feature of our optimized schedules is the 'plateau' in the populations at intermediate times. Naively, this might suggest that the annealing time could be further reduced since the system's energy appears to be 'idle' during the plateau period, but we believe that the relative phase accumulation during this period is crucial to get a high first excited population near the end of the anneal that then transitions to a high ground state population.

In addition to gaining a better understanding of the physics at play, identifying common features of the optimized schedules as shown in figure 6 could serve to develop more informed initialization strategies, or to potentially refine the way the schedules are parameterized. For instance, one would expect that using the patterns identified so far as an initial guess for the schedules to be optimized over (rather than the random initialization adopted in this study) would significantly improve the convergence of our algorithm. We leave this study for future work.

5. Conclusion

Our work shows that, for certain problems that are efficient to solve classically, we can trade the exponential annealing time required by an adiabatic approach for a polynomial annealing time, using an optimized schedule with a non-adiabatic evolution. By examining the population levels of a few schedules, we showed that the non-adiabatic nature of the schedules helps the system anneal more quickly into the ground state. Our Ising Hamiltonian (1) is one dimensional, so from a classical optimization perspective it is easy to find the ground state using heuristic algorithms, since growing the size of domain walls along the spin chain does not cost additional energy. The problem's exponential hardness for adiabatic QA stems from a perturbative crossing that arises from how the transverse field breaks the degeneracy of the low-lying energy eigenstates. Our work suggests that there are relatively simple annealing schedules that allow a polynomial-time annealing algorithm to solve this problem. This highlights the importance of sufficient flexibility in the annealing schedule to achieve such an advantage.

However, it should be noted that the savings in annealing time could be offset by the effort spent optimizing the schedules in the first place, rendering such an approach less attractive. We provide several points of indirect evidence that the cost of optimization remains under control. Most importantly, with a fixed amount of computational resources allocated to each run of the optimization algorithm, we were able to optimize schedules for up to N = 39, which we would not expect if the optimization effort were growing exponentially with the system size. Furthermore, we fixed the amount of computational resources when running each schedule optimization and did not see any particular increase in the number of schedule-refinement steps required as the system size increases. This suggests that for this problem Hamiltonian, the cost of the schedule optimization is unlikely to be exponential in the system size.

We can consider following a similar strategy for solving more complex problems. For spin glasses, we can expect our schedule optimization to help us reach the low energy states of the rugged landscape more quickly, but it is unrealistic to imagine that an efficient schedule optimization would allow us to identify the global optimum efficiently. Nevertheless, it may be sufficient to find high quality low-energy solutions quickly depending on the optimization context.

We do not claim that the schedules we have identified are optimal in the sense that they achieve the highest ground state probability for the lowest annealing time. We have observed that different choices of initial schedules can affect the best schedules found by our heuristic algorithm, so the optimization landscape is non-convex and includes schedules with different characteristics, possibly some of higher quality than the ones we present. A thorough investigation of this possibility using quantum optimal control for QA [35] would however be limited to small system sizes due to the high computational cost.

It is worth noting that the performance of DQA is likely to be related to the performance of another quantum algorithm for tackling optimization problems, the QAOA [40], in the quantum circuit paradigm. Recent works [31, 41] have highlighted that in the heuristic setting where the QAOA parameters are trained, the algorithm can be understood as implementing a diabatic evolution as in DQA. In fact, it has been shown that a hybrid of the two approaches, whereby the unitary evolution is defined by both a continuous interpolating Hamiltonian and Hamiltonians that are turned on and off, is optimal for solving optimization problems in this heuristic way [35, 42]. Our results for DQA may therefore indicate that similar behavior might be observed for QAOA.

As identified and discussed in section 4.2, the existence of a shared structure of the optimized schedules across different system sizes could serve as a basis for improved initialization of the algorithm. Such a feature has been observed recently in the context of QAOA [31, 41, 43, 44]. Given the difficulties that would be encountered when optimizing directly from measurement data, in particular the flattening of the cost function values [45, 46], this may prove crucial for realistic optimization of annealing schedules at scale.

There are several aspects of our optimization protocol that could be modified. These include the choice of driver Hamiltonian or intermediate Hamiltonian [38, 47], the initial quantum state, and the way we parameterize the schedule. Instead of using the average energy E(T) for the optimization, we could for example try to minimize different quantiles of the energy distribution generated by the final state. An optimization procedure that includes the cost of identifying the optimal schedule using optimal stopping techniques [29] would verify the viability of such strategies for real world optimization.

Our hope is that such optimization protocols can be used to create targeted schedules for highly-controllable quantum annealers, offering non-trivial operating benchmarks for such devices as well as removing the bottlenecks of the standard adiabatic paradigm when possible.

Acknowledgments

This work was supported by the U.S. Department of Energy (DOE) through a quantum computing program sponsored by the Los Alamos National Laboratory (LANL) Information Science & Technology Institute. L C and M L were supported by the U.S. DOE, Office of Science, Office of Advanced Scientific Computing Research, under the Accelerated Research in Quantum Computing (ARQC) program. F S acknowledges Laboratory Directed Research and Development (LDRD) program of LANL under Project Number 20220745ER. J C was supported by a B2X scholarship from the Fonds de recherche–Nature et technologies and a scholarship from the Natural Sciences and Engineering Research Council of Canada (Funding Reference Number: 456431992). J C also acknowledges the physics PhD program at the Université de Sherbrooke and funding from the Canada First Research Excellence Fund. This material is based upon work supported by the National Science Foundation under Grant No. 2037755. T A thanks Jarrett Smalley for useful discussions.

Data availability statements

The data that support the findings of this study are openly available at the following URL/DOI: https://doi.org/10.5281/zenodo.7613047.

The repository [48] includes: data for the optimized schedules in figure 6, the scaling of the annealing time, the population levels, and a Qiskit implementation of our algorithm, as well as code for calculating the population levels.

Appendix A: Simulating the schedule

For the optimizations in section 4.1, we study system sizes of up to N = 39 qubits. Simulations with such a large number of qubits are made possible by numerical routines first proposed in [49] and implemented in [50]. The simulation works at the level of the system's dynamical Lie algebra [51, 52], the Lie closure of the set $\{i\hat{H}_d,i\hat{H}_p\}$, which in our case scales efficiently (polynomially) in the system size. We perform Heisenberg evolution of the observable $\hat{H}_p$. Crucially, irrespective of the time T considered and irrespective of the schedule employed, the evolved operator remains contained in the dynamical Lie algebra, and therefore admits an efficient description. That is, by keeping track of the instantaneous support of the observable over a basis of the dynamical Lie algebra, one can assess the energy of the evolved system at scale.

However, the study of the instantaneous population in section 4.2 requires full knowledge of the system over time, and therefore we resort to conventional full-state vector simulations, thus limiting the number of qubits we can simulate. For the example schedule of figure 3 and the population levels in section 4.2, we use Qiskit [53] to simulate the time evolution of the schedule. We write the time-evolved state $|{\psi(T)}\rangle$ at the end of the annealing procedure as

Equation (A.1): $|{\psi(T)}\rangle = \hat{U}(T)\, |{\psi(0)}\rangle$

where $|{\psi(0)}\rangle = |{+}\rangle^{\otimes N}$, and the unitary $\hat{U}(T)$ is the time-ordered propagator

Equation (A.2): $\hat{U}(T) = \mathrm{Texp}\left( -\mathrm{i} \int_0^T \hat{H}(t)\, \mathrm{d}t \right)$

We approximate $\hat{U}(T)$ with small time slices $\delta t$ and implement the evolution as a quantum circuit. This means the unitary in (A.2) decomposes as:

Equation (A.3): $\hat{U}(T) \approx \hat{U}(T, T - \delta t)\, \cdots\, \hat{U}(2\delta t, \delta t)\, \hat{U}(\delta t, 0)$

where $\hat{U}(t + \delta t, t)$ is the unitary between times t and $t + \delta t$. For small enough $\delta t$, we assume the Hamiltonian $\hat{H}(t)$ is constant in a given interval.

Let us focus our attention on one particular time slice at time $t^*$, with the associated unitary operator

Equation (A.4): $\hat{U}(t^* + \delta t, t^*) = \exp\left( -\mathrm{i}\, \delta t\, \hat{H}(t^*) \right)$

Because the terms in $\hat{H}_d$ and $\hat{H}_p$ do not commute (one contains $\hat{Z}$ terms and the other contains $\hat{X}$ terms), we decompose the unitary slice with a first-order Trotter decomposition [54, 55]:

Equation (A.5): $\hat{U}(t^* + \delta t, t^*) \approx \exp\left( -\mathrm{i}\, \delta t\, A(t^*)\, \hat{H}_p \right) \exp\left( -\mathrm{i}\, \delta t\, \left(1 - A(t^*)\right) \hat{H}_d \right)$

Within each exponential in (A.5), the operators commute with each other for all j, which allows us to rewrite the sum in the exponent as a product of exponentials. We then find the final form for our unitary:

Equation (A.6): $\hat{U}(t^* + \delta t, t^*) \approx \prod_{j = 1}^{N} \exp\left( \mathrm{i}\, \delta t\, A(t^*)\, J_j\, \hat{Z}_j \hat{Z}_{j+1} \right) \prod_{j = 1}^{N} \exp\left( \mathrm{i}\, \delta t\, \left(1 - A(t^*)\right) \hat{X}_j \right)$

Equation (A.6) tells us that each time slice of size $\delta t$ requires 2N gates (there are N terms in each product), and the two types of quantum gates are:

Equation (A.7): $\mathrm{RZZ}(\theta_Z) = \exp\left( -\mathrm{i}\, \frac{\theta_Z}{2}\, \hat{Z}_j \hat{Z}_{j+1} \right), \qquad \mathrm{RX}(\theta_X) = \exp\left( -\mathrm{i}\, \frac{\theta_X}{2}\, \hat{X}_j \right)$

where in our case the angles are $\theta_Z = -2 \delta t A(t^*) J_j$ and $\theta_X = -2 \delta t (1-A(t^*))$.

This collection of gates in (A.6) defines a quantum circuit for each time slice. The full evolution repeats the same layout of gates, with $\theta_Z$ and $\theta_X$ varying according to the schedule A(t) and the couplings $J_j$. In figure A1, we show the circuit for this layout.
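A sketch of one such slice in Qiskit (which the companion repository also uses, though this particular snippet is ours, and the bond placement is again our assumption) is:

```python
import numpy as np
from qiskit import QuantumCircuit

def trotter_slice(couplings, A, dt):
    """One first-order Trotter slice of (A.6): RZZ gates for the ring bonds,
    then RX gates for the transverse field, with the angles of (A.7)."""
    N = len(couplings)
    qc = QuantumCircuit(N)
    for j in range(N):                              # bond between qubits j, j+1 (mod N)
        qc.rzz(-2 * dt * A * couplings[j], j, (j + 1) % N)
    for j in range(N):
        qc.rx(-2 * dt * (1 - A), j)
    return qc

# One slice for N = 5 at schedule value A(t*) = 0.9 and step dt = 2**-5.
print(trotter_slice(np.array([1.0, 0.5, 0.5, 1.0, -0.45]), 0.9, 2**-5))
```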


Figure A1. The quantum circuit for the evolution given by (A.6) for N = 5. The circles label the physical sites of the ring, and the $\mathrm{RZZ}$ gate at the top and bottom connects sites 1 and N. This is the circuit for one time slice between $t^*$ and $t^* + \delta t$.


In order to calculate $E(T) \equiv \langle\psi(T) | \hat{H}_p | \psi(T)\rangle$, we choose to discretize the unitary evolution operator in terms of time steps $\delta t$, and the output state is then given by $|{\psi(T, \delta t)}\rangle$. Calculating the energy then requires sending the discretization to zero:

Equation (A.8): $E(T) = \lim_{\delta t \to 0} \langle{\psi(T, \delta t)}| \hat{H}_p |{\psi(T, \delta t)}\rangle$

We do this by building the circuit for multiple values of $\delta t$, and checking how the energy E(T) changes as we decrease the size of $\delta t$. We use the same threshold $\delta E$ as in section 3.1 to stop this procedure, giving us our 'converged' estimate of E(T) (see appendix B).
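A minimal sketch of this convergence loop follows; `energy_at` is a hypothetical stand-in that builds and runs the circuit with a given step size and returns E(T).

```python
def converged_energy(energy_at, dt0=1.0, delta_E=1e-3):
    """Halve the Trotter step until E(T) changes by less than delta_E."""
    dt, E = dt0, energy_at(dt0)
    while True:
        dt /= 2
        E_new = energy_at(dt)
        if abs(E_new - E) < delta_E:
            return E_new                 # 'converged' estimate of E(T)
        E = E_new
```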

To calculate the quantities from section 4.2, we track the instantaneous quantum state $|{\psi(t)}\rangle$ during the annealing schedule. We also calculate the instantaneous Hamiltonian in the relevant symmetry subspace, $\hat{H}'(t) = \hat{P}_N \hat{H}(t) \hat{P}_N$, where $\hat{P}_N = \left( \hat{\mathbb{1}}_N + \hat{X}^{\otimes N} \right) / 2$ and $\hat{\mathbb{1}}_N$ is the identity operator of dimension $2^N$. Diagonalizing $\hat{H}'(t)$ provides us with $E_0(t)$ and $E_1(t)$ in the relevant symmetry subspace, so we can calculate the relevant energy gap for our dynamics.

To calculate the instantaneous populations of a given energy eigenstate, we find the eigenvectors $|{E_k(t)}\rangle$ of $\hat{H}'(t)$, with k indexing the energy levels. We then define the population level as:

Equation (A.9): $P_k(t) = \left| \langle{E_k(t)} | {\psi(t)} \rangle \right|^2$

where $|{E_k(t)}\rangle$ is the eigenstate of the Hamiltonian $\hat{H}'(t)$ with energy $E_k(t)$. The population levels are probabilities, so they must sum to one:

Equation (A.10): $\sum_{k} P_k(t) = 1$

In figure 7, we show $P_0(t)$ and $P_1(t)$, since they contribute the most to the annealing dynamics.
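A sketch of this population computation, assuming a dense $\hat{H}'(t)$ and a statevector obtained from the simulations above:

```python
import numpy as np

def populations(H_sym, psi_t, levels=2):
    """Instantaneous populations P_k(t) = |<E_k(t)|psi(t)>|^2 of (A.9).

    `H_sym` is the symmetry-projected Hamiltonian H'(t) as a dense matrix and
    `psi_t` the statevector at time t. If H'(t) is formed as P_N H(t) P_N, the
    projection introduces spurious zero eigenvalues, but the low-lying
    (negative) levels tracked here are unaffected.
    """
    evals, evecs = np.linalg.eigh(H_sym)   # eigenvalues in ascending order
    return [float(abs(np.vdot(evecs[:, k], psi_t)) ** 2) for k in range(levels)]
```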

Appendix B: Implementation details

For our numerical work, we use the minimization procedure implemented in SciPy with the COBYLA method [56]. We set the maximum number of iterations ('maxiter') to 800 and the convergence tolerance ('tol') to 0.001.

For every numerical experiment, we set $J_R = 0.45$, $J_L = 0.50$, and J = 1. We discretize the total annealing time T with an initial step size of $\delta t = 1$, except for the population levels, where we set the initial $\delta t$ to $2^{-4}$ (small enough that the final $\delta t$ upon convergence is $2^{-5}$).

The thresholds we use to determine when to stop our algorithm are:

  • (i) $\delta T = 0.05$;
  • (ii) $c \in \left\{0.10, 0.25, 0.50 \right\}$;
  • (iii) $\delta E = 0.001$.
