Optimal thermometers with spin networks

Paolo Abiuso; Paolo Andrea Erdman; Michael Ronen; Frank Noé; Géraldine Haack; Martí Perarnau-Llobet

doi:10.1088/2058-9565/ad37d3

1. Introduction

Our ability to measure temperature in quantum systems is currently being pushed to new regimes [1–4]. At the experimental level, ultraprecise temperature measurements of gases at the lowest temperatures in the Universe are possible [5, 6], and new methods for thermometry with probes of atomic size are being developed. Relevant examples include nanodiamonds acting as thermometers of living cells [7, 8], nanoscale electron calorimeters based on the absorption of single quanta of energy [9–11], and single-atom thermometry probes [12–14]. At the theoretical level, progress has been made in the understanding of ultraprecise thermometry via quantum probes in equilibrium [15–23] and out-of-equilibrium states [24–34]. Crucially, the energy structure of optimal thermometers has been revealed [35–38], suggesting that the precision can grow quadratically with the number of constituents [39]. There is, however, still a gap between such theoretical bounds and state-of-the-art experimental implementations, which must be addressed to exploit the full potential of quantum thermometry.

Due to its generality and practical relevance, we consider in this work equilibrium thermometry [4]. In this case, the probe is assumed to be well described by a thermal state at the temperature T that is being estimated. Then, the error ΔT of any measurement on the probe is bounded by [40, 41]:

$\begin{align} \frac{\langle\left(\Delta T\right)^2\rangle}{T^2} \unicode{x2A7E} \frac{1}{\nu\mathcal{C}}\;, \end{align} \tag{ 1 }$

where $\mathcal{C}$ is the heat capacity of the probe, and ν the number of repetitions of the experiment—see section 2 for a precise definition of the quantities involved. Intuitively speaking, a high heat capacity ensures that the energy of the probe highly varies with T, thus enabling the detection of small temperature variations.

An optimal probe for equilibrium thermometry is hence the one with the highest heat capacity. The ultimate limits to this problem were set by Correa et al in [35] by finding the maximum $\mathcal{C}$ given an arbitrary Hamiltonian of dimension D. The spectrum of such an optimal probe consists in an effective two-level system, with a single ground state and an exponential degeneracy of the excited level. The resulting optimal heat capacity reads $\mathcal{C}^\textrm{opt}\simeq (\ln D)^2/4$ . If we consider that the probe consists of N bodies of dimension d (hence $D = d^N$ ), then $\mathcal{C}^\textrm{opt}$ becomes [35, 39]:

$\begin{align} \mathcal{C}^\textrm{opt}\simeq\frac{N^2 \left(\ln d\right)^2}{4}. \end{align} \tag{ 2 }$

This expression shows a quadratic scaling with the number of constituents N, to be contrasted with the typical extensive behaviour of the heat capacity (i.e. linear in N). This quadratic scaling is reminiscent of the well-known Heisenberg limit in quantum metrology [42], although it should be realised that the advantage here arises from the interacting nature of the probe's Hamiltonian, and not from the presence of entanglement in the probe. Mok et al [37] provide a specific N-spin interacting Hamiltonian that can saturate (2), which however requires N-body interactions. A natural question therefore arises:

Q:
Can we reach the ultimate limit (2) via realistic Hamiltonians, i.e. featuring two-body and local interactions?

A natural approach to address Q is to consider probes at the verge of a thermal phase transition, where the heat capacity can scale superextensively with N [43–48]. Previous studies with spin systems close to criticality exemplify the potential of phase transitions for thermometry [45, 46, 48] but do not come close to the ultimate limit (2). For small values of N, proposals for optimal probes have also been considered with spin chains [27, 37] or interacting fermionic systems [49]. Yet, despite promising progress, none of the above approaches leads to a general answer to Q and hence to the possibility of approaching a quadratic precision in quantum thermometry.

To address Q, we consider as a platform a generic system of spins with two-body interactions, such as those currently programmable in quantum annealers. Their open system dynamics is starting to be studied [50–54], and they represent flexible physical devices with a high degree of control. More specifically, we consider a Hamiltonian of the form:

$\begin{align} H = \sum_i^N h_i \sigma^{z}_i + \sum_{i < j}^N J_{ij} \sigma^{z}_i \sigma^{z}_j \;, \end{align} \tag{ 3 }$

where $\sigma_i^z = \pm 1$ is the i-th classical spin of the system⁹. We then maximise $\mathcal{C}$ over all control parameters h_i and J_ij (with different constraints on their locality and strength). To tackle the exponential complexity of this task, we use advanced numerical techniques, commonly employed in the Machine-Learning community, to discover ansatzes for the form of optimal probes. We then combine these numerical ansatzes with physical insights to analytically prove that $\mathcal{C}$ can display the quadratic scaling of equation (2), with a slightly worse prefactor that depends on the locality of the Hamiltonian (3), thus answering Q affirmatively. These results add to recent applications of Machine-Learning-based techniques in the field of quantum thermodynamics [55–62], as well as in other domains, including protein folding [63], many-body problems [64–66], geosciences [67], and algorithm discovery [68].

In figure 1, we illustrate the type of results obtained. The heat capacity of any spin system is upper-bounded by the fundamental bound $\mathcal{C}^\textrm{opt}$ (red line, and equation (2) with d = 2). However, the maximum heat capacity obtainable with N non-interacting spins simply corresponds to N times the maximum heat capacity of a single spin (green line). The use of interactions can enhance $\mathcal{C}$ . Nevertheless, standard interacting spin-networks such as the 1D Ising model in the figure (purple dots) show an extensive scaling of $\mathcal{C}_\textrm{max}$ in the limit of large N, hence losing their advantage. In contrast, we find optimal spin-network architectures (3) that approximate $\mathcal{C}^\textrm{opt}$ for all N, see e.g. the Star model as an example of the architectures discussed in the next sections (blue dots in figure 1).

**Figure 1.** Maximum heat capacity $\mathcal{C}_\textrm{max}$ , as a function of $N$ in a log-log plot, of spin-based thermometers. The red line corresponds to the mathematical bound $\mathcal{C}^\textrm{opt}$ (2) on any system with dimension $D = 2^N$ (for a formal definition, see equation (8)), which shows a quadratic $\propto N^2$ scaling in terms of the number N of total spins employed. Our optimal spin-network architecture, the 'Star model', provides the highest heat capacity for Hamiltonians of the form (3) when $N\unicode{x2A7E} 6$ , and can reach the mathematical bound $\mathcal{C}^\textrm{opt}$ (2) in the large N limit. This is to be compared with the extensive $\propto N$ scaling of standard models, such as the 1D Ising chain. The green line delimits the region accessible with non-interacting spins, and simply corresponds to ${\sim}0.44 N$ , 0.44 being the maximum heat capacity of a single spin.

The rest of the paper is structured as follows. In section 2 we review equilibrium thermometry, and we analyse the fundamental properties of the optimal energy spectra for the maximization of $\mathcal{C}$ . In section 3 we move to the case of physically realistic probes (3), we present the derivation and analysis of our optimal thermometer models, and then discuss their implementation and properties. In section 4 we describe other relevant models which we use for performance comparison. Finally in section 5 we conclude and discuss future directions and applications of this work. The appendix contains details of the numerical methods employed, technical analytical derivations, and complementary analysis of our results. The code written to perform the machine-learning based optimization is available upon request (see the "Data availability statement").

2. Equilibrium thermometry and properties of optimal spectra

Let us consider a sample at some unknown temperature T, corresponding to the inverse temperature $\beta = T^{-1}$ (hereafter we set $k_B = 1$ for simplicity). To assess β, we let the sample weakly interact with a probe described by its Hamiltonian H. After a sufficiently long time, it is assumed that the probe will reach a Gibbs state, fully determined by H and β:

$\begin{align} \rho_\beta\left(H\right): = \frac{e^{-\beta H}}{{\textrm{Tr}}\left[e^{-\beta H}\right]}\;. \end{align} \tag{ 4 }$

By measuring the energy of $\rho_\beta(H)$ , it is possible to infer β (hence the temperature T). Let us note that projective energy measurements were shown to be optimal for temperature estimation [35, 41]. In particular, the Cramér–Rao bound [69] specific to the case of temperature estimation at equilibrium [3, 4] can be exploited to estimate the minimal error ΔT. More precisely, for a number ν of independent and identically distributed (i.i.d.) repetitions of the experiment, ΔT has a mean square value that is bounded by equation (1), that is $\frac{\langle(\Delta T)^2\rangle}{T^2} \unicode{x2A7E} (\nu\mathcal{C})^{-1}$ . It is therefore clear that the maximum precision one can get in estimating the temperature T by measuring the energy of the probe at equilibrium with the sample is determined by the heat capacity $\mathcal{C}$ . This is formally defined as the variation in mean energy of the probe per unit change in temperature, i.e.

$\begin{align} \mathcal{C}\left(H,\beta\right): = \frac{{\textrm{d}}}{{\textrm{d}} T}{\textrm{Tr}}\left[H\rho_\beta\left(H\right)\right] = -\beta^2\frac{{\textrm{d}}}{{\textrm{d}}\beta}{\textrm{Tr}}\left[H\rho_\beta\left(H\right)\right]\;. \end{align} \tag{ 5 }$

In terms of the eigensystem $\{E_i, \vert E_i\rangle \}$ of the probe's Hamiltonian H, the state populations of $\rho_\beta(H)$ read $p_i\equiv Z_\beta^{-1} e^{-\beta E_i}$ , with $Z_\beta = \sum_i e^{-\beta E_i}$ and $\rho_\beta = \sum_i p_i \vert E_i\rangle \langle E_i\vert$ . In the energy eigenbasis of the probe, it is easy to verify from equation (5) that the heat capacity is proportional to the energy variance of the Gibbs state:

$\begin{align} \mathcal{C} \left(H,\beta\right) = & \beta^2\Delta_\beta^2 H\;, \end{align} \tag{ 6 }$

$\begin{align} \Delta_\beta^2 H = & \sum_{i = 1}^D p_i E_i^2 - \left(\sum_{i = 1}^D p_i E_i\right)^2, \end{align} \tag{ 7 }$

where D is the dimension of the Hilbert space. This expression clarifies that the heat capacity only depends on the spectrum of the Hamiltonian H and the inverse temperature β. It is important for the following to note the scale invariance $\mathcal{C}(\lambda H,\lambda^{-1}\beta) = \mathcal{C}(H,\beta)$ , with $\lambda \in \mathbb{R}$ , $\lambda \neq 0$ . This allows us to express all energies in units of $\beta^{-1}$ , and to simply refer to the heat capacity as a function of a dimensionless Hamiltonian $\tilde{H}: = \beta H$ , as $\mathcal{C}(\tilde{H},1) = \mathcal{C}(H,\beta)$ . In the following, we will omit the tilde and simply use dimensionless units, writing $\mathcal{C}(H): = \mathcal{C}(H,1)$ . We also emphasize that a global energy shift affects neither the Gibbs state nor the heat capacity, as $\mathcal{C}(H) = \mathcal{C}(H+c\unicode{x1D7D9}), c \in \mathbb{R}$ .
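As a minimal sketch of equations (6) and (7), the snippet below evaluates the heat capacity from a list of energy levels and checks the scale and shift invariance numerically. The function names and the example spectrum are illustrative assumptions, not part of the original work.

```python
import math

def heat_capacity(energies, beta=1.0):
    """Heat capacity C = beta^2 * Var(E) of the Gibbs state, equations (6)-(7)."""
    e_min = min(energies)
    # Shift by the lowest level for numerical stability (C is shift-invariant).
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    mean = sum(w * (e - e_min) for w, e in zip(weights, energies)) / z
    mean_sq = sum(w * (e - e_min) ** 2 for w, e in zip(weights, energies)) / z
    return beta ** 2 * (mean_sq - mean ** 2)

def relative_error_bound(capacity, nu):
    """Lower bound on sqrt(<(Delta T)^2>) / T implied by equation (1)."""
    return 1.0 / math.sqrt(nu * capacity)

# Arbitrary illustrative spectrum (an assumption, not taken from the paper).
spectrum = [0.0, 1.0, 1.0, 2.5]
c_ref = heat_capacity(spectrum, beta=2.0)
# Scale invariance: C(lambda * H, beta / lambda) = C(H, beta), here lambda = 3.
c_scaled = heat_capacity([3.0 * e for e in spectrum], beta=2.0 / 3.0)
# Shift invariance: C(H + c) = C(H), here c = 7.
c_shifted = heat_capacity([e + 7.0 for e in spectrum], beta=2.0)
print(c_ref, c_scaled, c_shifted)
```

The three printed values should agree up to floating-point rounding, which is why the later sections can work with a single dimensionless Hamiltonian.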

2.1. Optimal spectrum for equilibrium thermometry

From equation (1), we see that an optimal probe for thermometry is the one with maximum heat capacity $\mathcal{C}$ . The maximization of $\mathcal{C}$ of a generic D-dimensional system at thermal equilibrium has been carried out in [35] assuming full control over the Hamiltonian and its spectrum,

$\begin{align} \mathcal{C}^\textrm{opt}\left(D\right): = \max_{H | \textrm{dim }H = D} \mathcal{C} \left(H\right). \end{align} \tag{ 8 }$

The resulting optimal spectrum consists of a single ground state and a $(D-1)$ -degenerate excited state, that is

$\begin{align} H_\textrm{deg} = 0\vert 0\rangle \langle 0\vert +\sum_{i = 1}^{D-1} E\vert i\rangle \langle i\vert \;, \end{align} \tag{ 9 }$

with an optimal gap E = x in temperature units that satisfies the transcendental equation $e^x = (D-1)(x+2)/(x-2)$ . The corresponding heat capacity is $\mathcal{C}^\textrm{opt}(D) = x^2e^x (D-1)/(D-1+e^x)^2$ [35]. This expression gives in the asymptotic regime of large probes ( $D\rightarrow\infty$ ) $x \simeq \ln D$ , hence $\mathcal{C}^\textrm{opt}(D) \simeq (\ln D)^2/4$ . For a probe made up of N constituents, each with local dimension d, so that $D = d^N$ , we recover the Heisenberg-like scaling in equation (2).
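The transcendental equation for the optimal gap has no closed-form solution, but it is easy to solve numerically. The following is a sketch that uses plain bisection on the logarithm of the equation; the function names are hypothetical, and the ground-state population is used to evaluate $\mathcal{C}^\textrm{opt}$ so that very large D does not overflow.

```python
import math

def optimal_gap(dim):
    """Root x > 2 of e^x = (D-1)(x+2)/(x-2), found by bisection."""
    log_deg = math.log(dim - 1)

    def f(x):
        # Logarithm of the equation; increasing in x, so the root is unique.
        return x - log_deg - math.log((x + 2.0) / (x - 2.0))

    lo, hi = 2.0 + 1e-9, log_deg + 50.0  # f(lo) < 0 < f(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def optimal_heat_capacity(dim):
    """C_opt(D) = x^2 p0 (1 - p0), equivalent to x^2 e^x (D-1) / (D-1+e^x)^2."""
    x = optimal_gap(dim)
    p0 = 1.0 / (1.0 + math.exp(math.log(dim - 1) - x))
    return x * x * p0 * (1.0 - p0)

def asymptotic_heat_capacity(dim):
    """Large-D approximation (ln D)^2 / 4."""
    return math.log(dim) ** 2 / 4.0

for n_spins in (1, 10, 50, 200):
    dim = 2 ** n_spins
    print(n_spins, optimal_heat_capacity(dim), asymptotic_heat_capacity(dim))
```

For a single spin (D = 2) this gives a gap close to 2.4 and a heat capacity close to 0.44, the single-spin value quoted in the caption of figure 1; for large D the ratio to $(\ln D)^2/4$ approaches one.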

2.2. Properties of optimal spectra

In order to understand the origin of the desired scaling $\mathcal{C} \propto N^2$ , we now discuss the relevant features of the spectrum equation (9) in its optimal configuration, as well as possible perturbations of it. This will be relevant for cases in which a physical realization (e.g. using equation (3)) can approximate equation (9), but not exactly. Specifically, we prove that a large class of spectra can exhibit the Heisenberg-like scaling $\propto N^2$ of the heat capacity, when the following 3 properties are satisfied:

P1: exponential degeneracy. The spectrum has a two-level structure with single ground state, and a first excited level that is exponentially degenerate (in N), with a gap that can be tuned.

P2: bandwidth tolerance. The engineering of the effective two-level spectrum, and in particular of the bandwidth of the first excited level, can tolerate a relative precision of $\mathcal{O}(1/N)$ .

P3: tolerance to additional energy levels. The presence of other energy levels does not necessarily deteriorate the maximal value of $\mathcal{C}$ and its scaling. In particular: i) high energy levels (i.e. above the first excited level) do not decrease the maximal heat capacity, while ii) energy levels below the first excited have an exponentially small contribution to the heat capacity, provided that their total degeneracy is (at most) polynomial in N, and their gap to the ground level increases (at least) linearly in N.

A schematic representation of the class of spectra satisfying the above three properties is given in figure 2. We now provide an intuitive understanding of these properties.

**Figure 2.** (Left) The idealized model $H_\textrm{deg}$ (9). (Right) We prove that any Hamiltonian featuring a spectrum that respects properties **P1-P2-P3** (see details in text) can exhibit a $\propto N^2$ scaling of the maximal heat capacity.

The importance of the exponential degeneracy of the first excited state (P1) can be appreciated from the degenerate model equation (9) and its corresponding ground state probability for the Gibbs state (in units of β), $p_0 = (1+(D-1)e^{- E})^{-1}$ , which can be expressed as

$\begin{align} p_0 = \left(1+\textrm{Exp}\left[\ln{\left(D-1\right)}-E\right]\right)^{-1} \end{align} \tag{ 10 }$

For small energy gaps E and large D, the value of p₀ tends to 0 as ${\sim}\textrm{Exp}[E]/(D-1)$ , meaning that in the thermal state almost all the population is spread evenly in the degenerate excited subspace. When the gap reaches $E\sim \ln (D-1)$ , $p_0 = \frac{1}{2}$ , while for larger values it increases to ${\sim}1$ , and the excited levels become empty. The width of this transition is of order ${\sim}\mathcal{O}(1)$ , and it is the point where the system experiences the peak in heat capacity; in fact, for smaller (larger) values of E, the energy variance is suppressed exponentially, given that the whole population collapses to the excited subspace (ground state). At the peak of the heat capacity, approximately half of the population is in the ground state, and half is spread in the degenerate level. If the degeneracy $(D-1) = d^N-1$ is exponential in N, the optimal gap is linear in N, and the resulting energy variance equation (7) scales quadratically. According to this observation, the exponential (in N) degeneracy of the first excited level is the first main ingredient for a system to exhibit such quadratic scaling of the heat capacity. Furthermore, we notice that at a formal level, the same scaling is obtained whenever $D\propto (d^{\prime})^{N}$ for some $d^{\prime}\gt1$ , which leads to P1.

Notice however that any physical implementation of such a highly fine-tuned two-level probe will be susceptible to noise. The resulting deviation will cause a broadening of the ideally degenerate excited level into a band. In appendix C, we prove that the optimal scaling of $\mathcal{C}$ is preserved as long as the error in the energy gap between the ground state and the first excited state (including the broadening of the level into a band) is of order $\mathcal{O}(1)$ . This is to be contrasted with an optimal gap $E\propto N$ that scales linearly, thus requiring a relative precision of $1/N$ in the engineering of the energy levels (P2).
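The peak value discussed for P1 can be made explicit with a short worked step, which uses only equations (6), (7), (9) and (10) in units where β = 1. Since the only non-zero energy of $H_\textrm{deg}$ is E, occupied with total probability $1-p_0$ ,

$\begin{align} \langle H\rangle = E\left(1-p_0\right),\quad \langle H^2\rangle = E^2\left(1-p_0\right) \quad\Rightarrow\quad \mathcal{C}\left(H_\textrm{deg}\right) = E^2\, p_0\left(1-p_0\right)\;. \end{align}$

At $E = \ln(D-1)$ one has $p_0 = \frac{1}{2}$ and hence $\mathcal{C} = (\ln(D-1))^2/4$ , which already reproduces the leading-order behaviour $(\ln D)^2/4$ . The exact optimum sits at a slightly larger gap: rewriting $e^x = (D-1)(x+2)/(x-2)$ as $x = \ln(D-1)+\ln\frac{x+2}{x-2}$ gives $x\simeq \ln(D-1) + 4/x$ for large x, a correction that vanishes as D grows.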

Finally, it is possible to show that the quadratic scaling of the heat capacity is preserved even in the presence of additional 'undesired' energy levels, provided that property P3 is satisfied.

More precisely, consider two Hamiltonians, H₁ with dimension $1+k_1$ , and H₂ with dimension $1+k_1+k_2$ . H₁ has 1 ground state and a k₁-degenerate excited state, $H_1 = 0\vert 0\rangle \langle 0\vert +\sum_{i = 1}^{k_1} E \vert i\rangle \langle i\vert$ , while H₂ has the same spectrum and additional k₂ excited states above, $H_2 = H_1+ \sum_{\alpha = k_1+1}^{k_1+k_2} E_\alpha \vert \alpha\rangle \langle \alpha\vert$ , with $0\unicode{x2A7D} E\unicode{x2A7D} E_\alpha\; \forall\alpha$ . Assuming control over the first excited gap E, we prove in appendix B that the maximal achievable heat capacity with H₂ is at least as large as the maximal achievable heat capacity with H₁,

$\begin{align} \max_{E}\mathcal{C}\left(H_1\right)\unicode{x2A7D} \max_{E\unicode{x2A7D} E_\alpha} \mathcal{C}\left(H_2\right)\;. \end{align} \tag{ 11 }$

This property guarantees that additional excess levels above the k₁-degeneracy of H₁ can only increase the maximal heat capacity. As a consequence, as the system size grows, any model featuring an exponential degeneracy of the first excited level and a tunable gap will show the desired Heisenberg-like scaling of the heat capacity. The control over E, while keeping $E_\alpha\unicode{x2A7E} E\; \forall\alpha$ , can easily be obtained, for example by rescaling all the parameters of H₁ or H₂ globally. As for additional levels below the first excited one, it is enough to notice that if their total number is of order $\mathcal{O}(N^k)$ for finite k, and their gap from the ground state energy is bounded between NK and $N\ln d^{\prime}$ ( $0\lt K\lt\ln d^{\prime}$ ), their total contribution to the variance (7) scales as $\mathcal{O}(N^{k+2} \exp[-\beta NK])$ , and is therefore suppressed for large N.
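Inequality (11) can be illustrated numerically for the global-rescaling strategy mentioned above. The sketch below is only a spot check on one hand-picked example, not a substitute for the proof in appendix B: the degeneracies, the relative positions of the extra levels, and the grid used for the maximization are all assumptions made for illustration.

```python
import math

def heat_capacity(levels, degeneracies):
    """Energy variance of the Gibbs state at beta = 1 for degenerate levels."""
    e_min = min(levels)
    w = [g * math.exp(-(e - e_min)) for e, g in zip(levels, degeneracies)]
    z = sum(w)
    mean = sum(wi * e for wi, e in zip(w, levels)) / z
    mean_sq = sum(wi * e * e for wi, e in zip(w, levels)) / z
    return mean_sq - mean ** 2

def max_over_scale(relative_levels, degeneracies, gaps):
    """Grid search over a global scale E multiplying all relative levels."""
    return max(
        heat_capacity([gap * r for r in relative_levels], degeneracies)
        for gap in gaps
    )

gaps = [0.01 * i for i in range(1, 2001)]  # E from 0.01 to 20

# H1: one ground state and a k1-fold degenerate level at E.
k1 = 64
c_h1 = max_over_scale([0.0, 1.0], [1, k1], gaps)

# H2: same, plus 30 extra states at 1.3 E and 2 E (all above the first level).
c_h2 = max_over_scale([0.0, 1.0, 1.3, 2.0], [1, k1, 15, 15], gaps)

print(c_h1, c_h2)
```

In this example the extra levels raise the maximal heat capacity, in line with (11).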

3. Optimal spin-network thermometers

We recall that, without any restriction on the possible interactions among the N spins, it is possible to generate the Hamiltonian equation (9) and to saturate the theoretical maximum value $\mathcal{C}^\textrm{opt}$ of the heat capacity (see e.g. [37], where the authors make use of arbitrary N-body interactions). The question Q we address in this work is whether it is possible to achieve the optimal scaling $\mathcal{C} \propto N^2$ if we restrict ourselves to physically motivated 2-body Hamiltonians given by equation (3). In such spin-systems, we have $D = 2^N$ , where N is the total number of spins, thus the ultimate limit equation (2) reads

$\begin{align} \mathcal{C}^\textrm{opt}\left(2^N\right) \simeq \frac{N^2 \left(\ln 2\right)^2}{4}\;, \quad \beta E\simeq {N}{\ln 2}\;, \end{align} \tag{ 12 }$

for large N. Below, we demonstrate that the answer to our main question is positive. We show that it is possible to design a thermal probe ('Star model', section 3.1) consisting of N interacting spins with two-body interactions that approximates the maximum value $\mathcal{C}^\textrm{opt}$ of the thermal sensitivity equation (12). We further prove that a thermal probe ('Star-chain model', section 3.2) with two-body and local interactions can be designed with a heat capacity exhibiting the same scaling as equation (12), with a prefactor that can be made arbitrarily close to that of the Star model. Moreover, in section 3.3 we show that the Star-chain model can be realized on currently available quantum annealers. Finally, in section 3.4 we analyze the scaling of the Hamiltonian parameters in these configurations, and the effect of constraints on the absolute value of the parameters.

3.1. Star model

We now search for thermal probes, consisting of spin networks with two-body interactions, that maximize the heat capacity. We maximized $\mathcal{C}(H)$ over the parameters h_i and J_ij employing equation (7) and constraining H to be of the form (3). Notice that such a problem is numerically hard due to (i) its nonconvexity, (ii) the number of optimization parameters, which scales quadratically in N, and (iii) the number of spin configurations, which scales exponentially. As such, first attempts based on simpler techniques, such as gradient descent with momentum [70] with gradients estimated by finite differences, and the gradient-free covariance matrix adaptation evolution strategy [71], would get stuck in sub-optimal local maxima with a substantially lower $\mathcal{C}$ , not exhibiting the Heisenberg-like scaling. We thus decided to use tools commonly employed in Machine Learning, i.e. we implemented the optimization in PyTorch, which allows us to compute the exact gradients of the negative heat capacity using backpropagation [72], and we used the Adam optimizer [73] (see appendix A for details).
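To make the objective of this optimization concrete, the sketch below evaluates $\mathcal{C}(H)$ for a Hamiltonian of the form (3) by brute-force enumeration of all $2^N$ spin configurations. It is a plain-Python illustration of the quantity being maximized, not the PyTorch implementation of appendix A; the function names and the example parameters are assumptions.

```python
import itertools
import math

def ising_energies(h, J):
    """All 2^N energies of H = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j, s_i = +/-1."""
    n = len(h)
    energies = []
    for spins in itertools.product((1, -1), repeat=n):
        e = sum(h[i] * spins[i] for i in range(n))
        e += sum(J[i][j] * spins[i] * spins[j]
                 for i in range(n) for j in range(i + 1, n))
        energies.append(e)
    return energies

def heat_capacity(energies):
    """Energy variance of the Gibbs state at beta = 1, equations (6)-(7)."""
    e_min = min(energies)
    w = [math.exp(-(e - e_min)) for e in energies]
    z = sum(w)
    mean = sum(wi * e for wi, e in zip(w, energies)) / z
    mean_sq = sum(wi * e * e for wi, e in zip(w, energies)) / z
    return mean_sq - mean ** 2

# Example: three non-interacting spins, each with level splitting 2h = 2.4,
# close to the optimal single-spin gap. Only the upper triangle of J is used.
n = 3
h = [1.2] * n
J = [[0.0] * n for _ in range(n)]
energies = ising_energies(h, J)
c_free = heat_capacity(energies)
print(c_free)  # about 3 times the single-spin maximum of ~0.44
```

The exponential cost of the enumeration is point (iii) above; a gradient-based optimizer would differentiate this same function with respect to h and J.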

After repeating the optimization for different total numbers of spins N, a recurrent pattern emerges (cf appendix A and figure 3), corresponding to a 'Star model' Hamiltonian of the form

$\begin{align} H_{\text{Star}\left[N\right]}\left(a,b\right) &: = a \,\sigma_1^z + b \sum_{i = 2}^N \sigma_i^z \left(\unicode{x1D7D9} + \sigma_1^z\right) \;, \end{align} \tag{ 13 }$

with $a, b \in \mathbb{R}$ , corresponding to a single spin ( $\sigma_1^z$ ) that is coupled uniformly to all the other ones. A representation of this Star model is shown in figure 3.

**Figure 3.** (a), (b): Machine learned Hamiltonian parameters (3), for T = 1 and N = 7. (a) shows the local field *h_i* as a function of the spin index, while (b) shows the J_ij parameters (color) as a function of the site indices i and j. The resulting model that emerges, sketched in (c), is $H_\textrm{Star}$ (13). It consists of a single central spin (corresponding to spin nr. 2 in (a), (b), and orange circle in (c)) with a different local magnetic field and that interacts (gray lines) with all the other N − 1 spins homogeneously (black circles) resulting in a Star-shaped connectivity.

The resulting spectrum has 2 main classes of eigenstates. The first class consists of $\binom{N-1}{k}$ -degenerate evenly spaced states with energy

$\begin{align} E_k = a + 2b\left(k-\left(N-1-k\right)\right),\quad \text{for }k = 0,\dots,N-1\;, \end{align} \tag{ 14 }$

corresponding to the first spin being up, i.e. $\sigma_1^{z} = 1$ , and k spins up among the remaining N − 1 ones. The second class consists of the first spin being down, $\sigma^z_1 = -1$ . In this case the second term in (13) vanishes, independently of the value of all the other spins $i = 2,\dots,N$ , and we get a $2^{N-1}$ -degenerate excited state with energy

$\begin{align} E_\text{deg} = -a\;. \end{align} \tag{ 15 }$

That is, thanks to the simple topology and choice of the couplings in equation (13), the first spin $\sigma^z_1$ acts as an 'on-off' switch for the effective magnetic field on the remaining spins, generating an exponential degeneracy of the $E_\textrm{deg}$ level. The partition function of the Star model can be solved analytically, being the sum of the two partition functions corresponding to $\sigma_1^z = \pm 1$ , i.e.

$\begin{align} Z_\textrm{Star} = e^{-\beta a}\left(e^{2\beta b}+e^{-2\beta b}\right)^{N-1}+2^{N-1}e^{\beta a}\;. \end{align} \tag{ 16 }$

This expression can be used to efficiently compute all the relevant thermodynamic quantities of the model (cf appendix D.1). Moreover, it is easy to see that by choosing b > 0 and $b(N-3)\unicode{x2A7D} a \lt b(N-1)$ , one ensures

$\begin{align} E_0 < E_\textrm{deg}\unicode{x2A7D} E_k \quad \text{for } k = 1,\dots, N-1\;, \end{align} \tag{ 17 }$

corresponding to a single ground state, and $2^{N-1}$ -degenerate first excited level. By saturating $b(N-3) = a$ , one gets $E_\textrm{deg} = E_1$ , corresponding to a $2^{N-1}+N-1$ degeneracy for the first excited state¹⁰ (for a visual representation, see figure 4). Notice that property P3 (11) ensures that such a model can achieve at least the heat capacity $\mathcal{C}^\textrm{opt}(2^{N-1})$ , that is

$\begin{align} \mathcal{C}^\textrm{opt}\left(2^{N-1}\right)\unicode{x2A7D} \mathcal{C}_\textrm{max}^{\textrm{Star}\left[N\right]} \unicode{x2A7D} \mathcal{C}^\textrm{opt}\left(2^N\right)\;. \end{align} \tag{ 18 }$

In the asymptotic limit, we get

$\begin{align} \mathcal{C}_\textrm{max}^{\textrm{Star}\left[N\right]}\gtrsim \frac{\left(N-1\right)^2\left(\ln 2\right)^2}{4}\,, \end{align} \tag{ 19 }$

which becomes indistinguishable from the theoretical bound $\mathcal{C}^\textrm{opt}(2^N)$ for large N; see equation (12) and figures 1 and 9 below. In appendix A.3, we provide a table with the optimal values of the Hamiltonian parameters $a,b$ (see equation (13)) and the corresponding value of $\mathcal{C}^\textrm{Star[N]}_\textrm{max}$ given by numerical optimization.
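The statements of this section can be checked numerically from the spectrum (14) and (15) alone. The sketch below builds the levels and degeneracies of the Star model, compares a direct Boltzmann sum with the two-sector form of the partition function, $e^{-\beta a}(2\cosh 2\beta b)^{N-1}+2^{N-1}e^{\beta a}$ , and verifies the bounds (18) for one value of N. The function names, the choice N = 8, the restriction to $a = b(N-3)$ and the search grid are assumptions made for illustration.

```python
import math

def star_levels(n_spins, a, b):
    """Distinct energies and degeneracies of the Star model, equations (14)-(15)."""
    m = n_spins - 1
    levels = [a + 2 * b * (2 * k - m) for k in range(m + 1)]  # central spin up
    degs = [math.comb(m, k) for k in range(m + 1)]
    levels.append(-a)                                          # central spin down
    degs.append(2 ** m)
    return levels, degs

def heat_capacity(levels, degs):
    """Energy variance of the Gibbs state at beta = 1."""
    e_min = min(levels)
    w = [g * math.exp(-(e - e_min)) for e, g in zip(levels, degs)]
    z = sum(w)
    mean = sum(wi * e for wi, e in zip(w, levels)) / z
    mean_sq = sum(wi * e * e for wi, e in zip(w, levels)) / z
    return mean_sq - mean ** 2

def star_partition_function(n_spins, a, b, beta=1.0):
    """Sum of the two sectors sigma_1 = +1 and sigma_1 = -1."""
    m = n_spins - 1
    return (math.exp(-beta * a) * (2 * math.cosh(2 * beta * b)) ** m
            + 2 ** m * math.exp(beta * a))

def optimal_heat_capacity(dim):
    """C_opt(D) from the transcendental equation of section 2.1 (bisection)."""
    log_deg = math.log(dim - 1)
    lo, hi = 2.0 + 1e-9, log_deg + 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid - log_deg - math.log((mid + 2) / (mid - 2)) < 0:
            lo = mid
        else:
            hi = mid
    p0 = 1.0 / (1.0 + math.exp(log_deg - lo))
    return lo * lo * p0 * (1.0 - p0)

n_spins = 8
# Partition function: direct sum over levels versus the two-sector expression.
levels, degs = star_levels(n_spins, 5.0, 1.0)
z_direct = sum(g * math.exp(-e) for e, g in zip(levels, degs))
z_closed = star_partition_function(n_spins, 5.0, 1.0)

# Maximise C along the line a = b (N - 3), where E_deg = E_1.
b_grid = [0.001 * i for i in range(1, 5001)]
c_star = max(heat_capacity(*star_levels(n_spins, b * (n_spins - 3), b))
             for b in b_grid)
lower = optimal_heat_capacity(2 ** (n_spins - 1))
upper = optimal_heat_capacity(2 ** n_spins)
print(lower, c_star, upper)
```

Because the search is restricted to a one-parameter line, the printed value is only a lower estimate of $\mathcal{C}_\textrm{max}^{\textrm{Star}[N]}$ ; it already falls between the two bounds of equation (18).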

**Figure 4.** Spectrum of the Star model (13) (for N = 9). $2^{N-1}$ eigenvalues form a binomial spectrum (14), while the other $2^{N-1}$ eigenvalues are completely degenerate (15). The gap between the ground energy and the exponentially degenerate level approximately coincides with the optimal gap of the ideal spectrum $H_\textrm{deg}$ (9) for $D = 2^{N-1}$ [35] (red line), as expected from the discussion in section 3.1.

3.2. Star-chain model

As the Star model arises from an unconstrained numerical optimization of $\mathcal{C}$ for Hamiltonians of the form (3) (cf above section 3.1 and appendix A), we conjecture it to be the global optimum for such a class. However, the star-shaped connectivity of equation (13) (figure 3) cannot be scaled to an arbitrarily large number of constituents, as it has long-range interactions. This motivates us to restrict the star-shaped connectivity to short-range interactions only, giving rise to what we hereafter call the 'Star-chain model'. Specifically, inspired by the Star model, we consider $N = n(m+1)$ spins as sketched in figure 5, described by the Hamiltonian

$\begin{align} H_{\text{Star-chain}\left[N = n\left(m+1\right)\right]}\left(a,J,b\right): = a \sum_\alpha \sigma^z_\alpha + J \sum_\alpha \sigma^z_\alpha \sigma^z_{\alpha+1} + b \sum_{\alpha,i} \left(\sigma^z_\alpha+\unicode{x1D7D9}\right) \sigma^z_{\alpha,i}\;. \end{align} \tag{ 20 }$

Here α is the index identifying the central spin of each Star-like sub-unit (orange circles in the figure), while $(\alpha,i)$ selects the i-th spin in each sub-unit (black circles), i.e.

$\begin{align} \alpha = 1,\dots,n\;, \quad i = 1,\dots,m \;. \end{align} \tag{ 21 }$

Short-range interactions are guaranteed by considering a fixed value of m. The partition function of the Star-chain model can be computed analytically (cf appendix D.3),

$\begin{align} Z_\text{Star-chain} & = \lambda^{n}_{-}+\lambda^{n}_{+}\;, \nonumber\\ \lambda_{\pm} & = \frac{2^{m-1}}{AC} \! \left( C^2\left(A^2 \!+\! B^m\right) \!\pm\! \sqrt{4A^2 B^m \!+\! C^4\left(A^2 \!-\! B^m\right)^2} \right)\! , \end{align} \tag{ 22 }$

where $A = e^{\beta a}$ , $B = \cosh{(2\beta b)}$ , and $C = e^{-\beta J}$ . From the spectral point of view, this model guarantees a $2^{n_\downarrow m}$ degeneracy for each energy level with $n_{\downarrow}$ down α-spins. In particular, when all the α-spins are down, i.e. $\sigma^z_\alpha = -1\ \forall \alpha$ , the degeneracy is $2^{mn}$ . Moreover, if the couplings J are negative and strong enough to force all the n α-spins to be the same (for a detailed analysis of the needed coupling strengths, see section 3.4 and appendix C), one is left with two configurations only, $\sigma_\alpha = \pm 1 \ \forall \alpha$ . The case $\sigma_\alpha = 1$ corresponds to an evenly spaced, binomially distributed spectrum $E = na + 2b \sum_{\alpha,i}\sigma_{\alpha,i}$ , while the case $\sigma_\alpha = -1$ corresponds to $E = -na$ with degeneracy $2^{mn}$ . It should be noticed that these configurations effectively lead to the same spectrum as that of the Star model. More precisely, while the Star spectrum consists of $2^{N-1}$ states with binomial spectrum (14) and another $2^{N-1}$ states that are completely degenerate, see equation (15), the Star-chain has, in the limit of large −J, $2^{mn}$ states with a binomial spectrum, $2^{mn}$ degenerate states, and $(2^{n}-2)2^{mn}$ remaining arbitrarily high energy levels that can be neglected, as justified in section 2.2. Property P3 (11) then ensures that the Star-chain model can exhibit a heat capacity at least as large as that of a system having 1 ground state and a $(2^{mn}+mn)$ -fold degenerate first excited level. This spectrum is achieved by choosing

$\begin{align} -na = na-2bmn+4b \rightarrow b\left(mn-2\right) = na \;, \end{align} \tag{ 23 }$

which leads to $\mathcal{C}^{\text{Star-chain}[N]}_\textrm{max}\sim (\ln (2^{mn}+mn))^2/4$. This shows that $\mathcal{C}$ is essentially quadratic in $N = n(m+1)$, i.e.

$\begin{align} \mathcal{C}^{\text{Star-chain}\left[N\right]}_\textrm{max}\gtrsim \left(\frac{m\ln 2 }{2\left(m+1\right)}\right)^2 N^2\;. \end{align} \tag{ 24 }$

Equation (24) makes clear how large values of m increase the achievable heat capacity (see appendix A.3 for the optimal values of $\mathcal{C}$ and the corresponding Hamiltonian parameters). For $m = N-1$, the Star-chain model coincides with the Star model (n = 1, cf figures 3 and 5). Note that short-range interactions impose a maximum m, but this only affects the prefactor and therefore does not change the quadratic scaling of $\mathcal{C}$ demonstrated with the Star-chain model.
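As a numerical illustration of the scaling in equation (24), the following minimal sketch evaluates the heat capacity of an idealised spectrum with a single ground state and a $D$-fold degenerate excited level, with $D = 2^{mn}+mn$, maximised over the dimensionless gap $x = \beta E$. It then compares the result with $(\ln D)^2/4$ and with the right-hand side of equation (24). The function names, the grid search and the choice of units ($k_B = 1$, so that $\mathcal{C} = \beta^2 \Delta^2_\beta H$) are assumptions made for illustration and are not the code used to produce the results of this work.

```python
import math


def two_level_heat_capacity(log_deg, x):
    """Heat capacity (k_B = 1) of one ground state plus an excited level of
    degeneracy exp(log_deg), separated by the dimensionless gap x = beta * E."""
    # Excited-level population p = D e^{-x} / (1 + D e^{-x}), written as a
    # logistic function of (log D - x) so that large D does not overflow.
    p = 1.0 / (1.0 + math.exp(x - log_deg))
    # C = beta^2 * Var(H) = x^2 * p * (1 - p) for a two-level spectrum.
    return x * x * p * (1.0 - p)


def max_two_level_heat_capacity(log_deg, grid=20000):
    """Maximise over the gap x with a plain grid search (illustrative only)."""
    x_max = 2.0 * log_deg + 10.0
    return max(
        two_level_heat_capacity(log_deg, x_max * k / grid)
        for k in range(1, grid + 1)
    )


def star_chain_log_degeneracy(n, m):
    """log of the first-excited degeneracy 2^{mn} + mn quoted in the text."""
    return math.log(2 ** (m * n) + m * n)


def lower_bound_eq24(n, m):
    """Right-hand side of equation (24) with N = n (m + 1)."""
    N = n * (m + 1)
    return (m * math.log(2) / (2 * (m + 1))) ** 2 * N ** 2


if __name__ == "__main__":
    m = 3
    for n in (2, 4, 6, 8):
        log_deg = star_chain_log_degeneracy(n, m)
        c_num = max_two_level_heat_capacity(log_deg)
        print(n * (m + 1), c_num, log_deg ** 2 / 4, lower_bound_eq24(n, m))
```

At $x = \ln D$ the two levels are equally populated and the heat capacity equals $(\ln D)^2/4$ exactly, so the numerical maximum can only lie above this value; the difference becomes relatively small as $mn$ grows.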

**Figure 5.** Representation of the Star-chain model (20). The total number of spins is $N = n(m+1)$. The orange circles represent the α-spins, coupled to each other through the blue lines, while the black circles represent the $(\alpha,i)$-spins coupled to their respective α-spin through the gray lines. The values of the local fields $h_i$ and coupling terms $J_{ij}$ are reported in the figure.

3.3. Implementation in the Chimera graph

Quantum annealers are devices governed by programmable quantum spin Hamiltonians, and therefore represent a natural platform to test our findings. Interestingly, the interaction topology of the Star-chain model with m = 3 (cf figure 5) can be embedded into the Chimera graph (cf figure 6) of the D-Wave annealing quantum processor¹¹. This means that, according to (24), a programmable spin network in the Chimera graph can reach at least a fraction $\sim\frac{m^2}{(m+1)^2} = 9/16$ of the ultimate bound $\mathcal{C}^\textrm{opt}(2^N)$ (12). Remarkably, numerical optimization of $\mathcal{C}$ for the Chimera model indeed results in the Star-chain model with m = 3 represented in figure 6 (see appendix A.5). Notice also that there exist new architectures of the D-Wave annealers, such as the Pegasus graph [74–76], which reach higher connectivities and therefore allow higher values of m for which the Star-chain Hamiltonian can be embedded. Such optimal thermometer probes could be used, for example, to precisely measure the surrounding effective temperature of the annealer (to be compared with the cryostat temperature), and overall to gain a better understanding of the D-Wave annealer as an open quantum system [50, 51, 77, 78].

**Figure 6.** Embedding of the Star-chain model for m = 3 (see figure 5) into the Chimera graph, which is used by D-Wave Systems¹¹. As in figure 5, orange circles represent the α-spins, coupled to each other through the blue lines, while black circles represent the $(\alpha,i)$-spins coupled to the respective α-spin through the gray lines. Dotted lines represent unused couplings of the Chimera architecture (i.e. where $J_{ij} = 0$).

Let us emphasize a feature of both the Star and Star-chain models that may become relevant for practical applications. For both models, it is enough to measure a single spin to perform temperature estimation. In the regime of large N, the only relevant energy levels contributing to the Gibbs state are the ground level and the first excited level (higher excited levels are exponentially suppressed in the statistics, cf appendix D.2 and previous discussions). We can distinguish between these two levels by simply measuring the value of $\sigma^z_1$ for the Star model (13), or any of the $\sigma^z_\alpha$ in the Star-chain model (20).

3.4. Scaling and constraints on the strength of the interactions

While the results presented above are very promising, one challenging requirement of the optimal configurations is the strength of the interactions between the constituents, which becomes increasingly demanding for large N. For instance, engineering the optimal spectrum for the Star model with a first-excited state degeneracy $2^{N-1}+N-1$ requires the scaling $b\propto N$ and $a\propto N^2$ as N grows (see 'unconstrained' dots in figure 7(a)), accompanied by a relative precision $\propto N^{-2}$ for both parameters (cf section 3.1 and appendix C).

**Figure 7.** Parameter scaling of the Star model (a) and Star-chain model for m = 3 (b) in their configurations that maximise $\mathcal{C}$ (we obtain nearly identical values of $\mathcal{C}$ for both cases). In the 'unconstrained' case (empty circles), b increases linearly with N, and a quadratically. This corresponds to the optimal choice $a = b(N-3)$ of section 3.1. In the 'constrained' case (full circles), it is possible to find solutions in which b is limited by a constant, while a increases linearly. These solutions preserve the same numerical value of $\mathcal{C}$, and are found by optimizing the heat capacity over a and b, choosing b = 2.2 and $a = 2N-3$ as the initial point of the optimization.

However, as we show in appendix D.1, there exist solutions that are mathematically sub-optimal but numerically indistinguishable in terms of $\mathcal{C}$, with a much more favorable scaling of the Hamiltonian parameters. In fact, even when b is bounded by a constant, it is possible to achieve the desired quadratic scaling of $\mathcal{C}$, arbitrarily close to the optimal value of equation (19). These solutions feature a finite b, whose precise value becomes irrelevant, and a linear scaling $a\propto N$, which admits a relative precision $\propto N^{-1}$ (see appendices C, D.1 and 'constrained' dots in figure 7(a)). Similarly, the Star-chain model features solutions in which its a and b parameters are bounded, while J scales linearly (see figure 7(b)). As seen in figure 7, for reasonable sizes of the thermal probe, up to 50 spins, these scalings induce Hamiltonian parameters of moderate strength, both for the Star and the Star-chain model.

Finally, it is possible to use the machine learning optimization method to maximize $\mathcal{C}$ over all possible Hamiltonians of the form equation (3) with the additional constraint for the parameters $\{h_i,J_{ij}\}$ to be bounded (see appendix A.2 for details). Our numerical optimization leads to configurations that are too complex and case-dependent to be discussed in generality. However, the resulting maximal heat capacities seem to indicate that a quadratic scaling is still possible under such constraints, see figure 8.

**Figure 8.** Comparison of the heat capacity, as a function of N, on a linear scale (a), and on a log-log scale (b). The 'bound' curves represent numerical maximizations of the general spin Hamiltonian (3), where all parameters are constrained to be in a certain interval, i.e. $h_i, J_{ij}\in [-c,c]$, with c shown in the legend. The red and black lines are reference quadratic and linear scalings.

4. Comparison to alternative models

In figure 9, we compare the maximum values of the heat capacity $\mathcal{C}$ for different models of spin Hamiltonians, as the number of spins N grows. The Star and Star-chain models show a quadratic scaling in N that eventually surpasses all standard models—such as the Ising model in 1D, as well as a model of uniform 'all-to-all' interactions. The latter show instead the standard thermodynamic extensive scaling, i.e. linear in N, of the heat capacity. Below, we briefly describe each of the relevant alternative models to which our results obtained for the Star and Star-chain models have to be compared.

**Figure 9.** Performance of the optimal spin-based thermometers found in this work as a function of $N$. Our optimal architectures, 'Star model' and 'Star-chain model', demonstrate a quadratic $\propto N^2$ scaling of the maximal heat capacity $\mathcal{C}$ in terms of the number N of total spins employed. This is to be compared with the extensive $\propto N$ scaling of standard models, such as the 1D Ising chain, or the All-to-all model described in the text. For $N\unicode{x2A7E} 6$ (see the inset for the heat capacity zoomed around small values of $N$), the Star model provides the highest heat capacity for Hamiltonians of the form (3) and can reach the mathematical bound $\mathcal{C}^\textrm{opt}$ (8) (red line in the plot) in the large N limit. The Star-chain model has a similar behaviour while only using short-range interactions, and it can be programmed on current quantum annealers (cf section 3.3). A detailed description and discussion of all the models we consider is given in the text.

4.1. Ising lattices

The 1D Ising model is arguably the simplest candidate for an interacting-spin thermometry probe. For N spins, it is defined by the Hamiltonian

$\begin{align} H_{1D} \left(\vec{h},\vec{J}\right) : = -\sum_{i = 1}^N h_i \sigma_i^z - \sum_{i = 1}^N J_i \sigma_i^z \sigma_{i+1}^z, \end{align} \tag{ 25 }$

where we choose periodic boundary conditions $\sigma_{N+1}\equiv\sigma_1$. The heat capacity for this model can be efficiently computed with standard techniques [79]. Numerical maximization consistently leads to homogeneous interactions $J_{i} = J$ and local fields $h_i = h$. As expected, an Ising chain probe will at most achieve a linear scaling in N of the heat capacity, as seen in figure 9. Note that a 2-dimensional Ising model can achieve a slightly higher scaling at criticality, i.e. $\mathcal{C}_\textrm{max}\propto N\ln N$ [80, 81], while the 3-dimensional Ising model has $\mathcal{C}_\textrm{max}\propto N^{1.058}$ using critical scaling [82].

4.2. All-to-all symmetric model

Another relevant model for this work is a model with all-to-all interactions, completely symmetric under permutations. Its Hamiltonian takes then the form

$\begin{align} H_\text{All}\left(h, J\right) : = -h \sum_{i = 1}^N \sigma_i^z -J \sum_{i < j} \sigma_i^z \sigma_j^z\,. \end{align} \tag{ 26 }$

It describes a complete graph with homogeneous interactions J > 0 and local fields h > 0. Taking the system's symmetries into account, we get the following $\binom{N}{k}$-degenerate eigenenergies for $k\unicode{x2A7D} N$ up-spins:

$\begin{align} E_k = h\left(N-2k\right) +\frac{J}{2}\left[ 4k\left(N-k\right) - N\left(N-1\right) \right]. \end{align} \tag{ 27 }$

As shown in appendix E, the all-to-all model shows a 'large degeneracy' of the first excited level for small N. It is remarkable that for $N\unicode{x2A7D} 5$ , the all-to-all model appears to have the highest heat capacity among the models we investigated (see inset in figure 9) and consistently emerged from numerical optimisation in the same regime (cf appendix A). However, the degeneracy of the first excited state increases linearly in N, to be contrasted with the exponential increase of the Star and Star-chain models. This ultimately leads to a linear scaling of the heat capacity of the symmetric all-to-all model for large N.
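Equation (27) and its binomial degeneracies can be checked by brute force for small N. The sketch below enumerates all $2^N$ spin configurations of the Hamiltonian (26), evaluates their energy directly, and compares it with the closed form for k up-spins. The function names and the parameter values used in the example are illustrative assumptions.

```python
import itertools
import math
from collections import Counter


def all_to_all_energy(spins, h, J):
    """Energy of one configuration under equation (26)."""
    field = -h * sum(spins)
    pairs = -J * sum(si * sj for si, sj in itertools.combinations(spins, 2))
    return field + pairs


def formula_energy(N, k, h, J):
    """Closed form of equation (27) for k up-spins."""
    return h * (N - 2 * k) + 0.5 * J * (4 * k * (N - k) - N * (N - 1))


def check_all_to_all(N, h, J):
    """Return True if every configuration matches equation (27) and the
    number of configurations with k up-spins equals binom(N, k)."""
    counts = Counter()
    for spins in itertools.product((1, -1), repeat=N):
        k = spins.count(1)
        energy = all_to_all_energy(spins, h, J)
        if not math.isclose(energy, formula_energy(N, k, h, J), abs_tol=1e-9):
            return False
        counts[k] += 1
    return all(counts[k] == math.comb(N, k) for k in range(N + 1))


if __name__ == "__main__":
    # h = J = 0.377 is the value reported in appendix A.1 for N = 4.
    print(check_all_to_all(4, 0.377, 0.377))
```

The pair sum follows from $\sum_{i<j}\sigma_i^z\sigma_j^z = [(2k-N)^2-N]/2$, which is the identity behind the second term of equation (27).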

4.3. k-SAT model and the exponential degeneracy

Finally, we notice that in [83, 84], a Hamiltonian replicating a global AND-operation between M logical bits (represented by M spins) was introduced, with the aid of M ancillary spins. Such a Hamiltonian was proposed as the basic element to build general models to solve k-SAT problems [85]. We will thus refer to it as the k-SAT model. A logical AND identifies a single string (without loss of generality, the string given by $111\dots 1$, M times) with an energy $E_\textrm{AND}$ different from the energy $E_{\overline{\mathrm{AND}}}$ associated to all the other $2^M-1$ logical strings. Formally, this spectrum coincides with the ideal two-level degenerate model (9), and therefore the Hamiltonian proposed in [83, 84] exhibits the desired quadratic scaling of the maximum $\mathcal{C}$, more precisely $\mathcal{C}_\textrm{max}^{k-\textrm{SAT}[N]} = \mathcal{C}^\textrm{opt}(2^\frac{N}{2})$ (cf figure 9). The construction uses a total of $N = 2M$ spins (the neglected levels correspond to energies that can be made arbitrarily high, see [83]), that is, an overhead of $N/2$. The Star and Star-chain models achieve similar degeneracies while using a much smaller overhead, i.e. a 1-spin overhead for the Star model and an $N/(m+1)$-spin overhead for the Star-chain model. In table 1, we compare these models in terms of the excited-level degeneracy and scaling of $\mathcal{C}$, as well as the locality of the interactions.

Table 1. Models recreating an effective spectrum with a single ground state and an exponentially degenerate first excited level. With the same total number N of spins, the k-SAT model 'sacrifices' half of them to obtain a ${\sim}2^{\frac{N}{2}}$ degeneracy, while the Star model has a single-spin overhead (${\sim}2^{N-1}$ degeneracy), and the Star-chain an $N/(m+1)$ overhead (${\sim}2^{\frac{mN}{m+1}}$ degeneracy). Moreover, the Star-chain model can be realised with short-range interactions.

Model	1st excited deg.	Asymptotic $\mathcal{C}_\textrm{max}$	Short-range?
k-Sat [83]	$2^{\frac{N}{2}}-1$	${\sim}\dfrac{(\ln 2)^2}{4} \dfrac{N^2}{4}$	${\unicode{x2718}}$
Star	$2^{N-1}+N-1$	${\sim}\dfrac{(\ln 2)^2}{4} (N-1)^2$	${\unicode{x2718}}$
Star-chain	$2^{\frac{mN}{m+1}}+\dfrac{mN}{m+1}$	${\sim}\dfrac{(\ln 2)^2}{4} \dfrac{m^2 N^2}{(m+1)^2}$	✓
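The asymptotic expressions in table 1 can be turned into numbers with a few lines of code. The sketch below simply evaluates the three entries of the 'Asymptotic $\mathcal{C}_\textrm{max}$' column; the function names are hypothetical, and the values are leading-order estimates rather than the exact optima reported elsewhere in this work.

```python
import math

# Common prefactor (ln 2)^2 / 4 appearing in every row of table 1.
PREFACTOR = math.log(2) ** 2 / 4


def c_max_ksat(N):
    """k-SAT model: half of the N spins are ancillas."""
    return PREFACTOR * N ** 2 / 4


def c_max_star(N):
    """Star model: a single-spin overhead."""
    return PREFACTOR * (N - 1) ** 2


def c_max_star_chain(N, m):
    """Star-chain model: N / (m + 1) alpha-spins act as overhead."""
    return PREFACTOR * (m * N / (m + 1)) ** 2


if __name__ == "__main__":
    N, m = 16, 3
    print(c_max_ksat(N), c_max_star_chain(N, m), c_max_star(N))
```

For m = 3 the Star-chain entry is 9/16 of $(\ln 2)^2 N^2/4$, which is the fraction quoted in section 3.3 for the Chimera graph.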

5. Conclusions and outlook

In this work, we addressed the problem of maximizing the heat capacity $\mathcal{C}$ of physically realisable quantum systems, which amounts to engineering the best probe for temperature estimation in the context of equilibrium thermometry [4]. Using a combination of analytical derivations, machine learning methods, and physical insights, we explored the design space of spin Hamiltonians with two-body interactions and local magnetic fields, discovering Hamiltonians with star-shaped topology that can approach the theoretical maximum of $\mathcal{C}$ in the limit of large systems. Additionally, we showed that an arbitrarily good approximation can be achieved when requiring these interactions to be short-ranged. The models emerging from such optimisation achieve a Heisenberg-like scaling of the sensitivity without the use of entanglement, contrary to the well-known case of phase estimation in quantum optics [42]. Remarkably, these models show a simple architecture of the interactions that makes them ideal probes also for adaptive temperature estimation schemes [39, 86]. We further showed that the models we found can be embedded in currently available quantum annealers¹¹, making them highly attractive from both theoretical and experimental points of view. These results pave the way to the physical realization of ultra-sensitive spin-based thermometers, valid also for alternative experimental platforms such as cold atoms [87], NV centers [88], and Rydberg atoms [89]. Of particular interest is the use of these engineered optimal spin-network thermal probes for ultracold gases [13, 21–23, 32].

In terms of Hamiltonian spectrum engineering, we showed that the essential requirement for an optimal thermal probe made of N constituents is the presence of a single ground state and an exponential degeneracy of the first excited level. This effective two-level spectrum also appears in other problems where we speculate that our work might find application, such as protein folding modelling [90, 91], adiabatic Grover's search [92–94], energy-based Boolean computation [83], and quantum heat engines [82, 95–98].

An interesting challenge for the future is to characterise the relaxation timescale $\tau_\textrm{rel}$ of the optimal probes derived here, see also [94, 95]. Due to critical slowdown, we expect a trade-off between large heat capacity and slowness of the relaxation process. It hence remains a relevant open question whether a similar Hamiltonian engineering can be performed taking $\mathcal{C}/\tau_\textrm{rel}$ as a figure of merit, which would also have important consequences for the optimization of thermal engines [82, 95–98]. At the same time, it is worth emphasising that when time is a resource for thermometry¹², optimal non-equilibrium protocols require the same effective two-level structure of the models presented in this work, as recently shown in [38]. Another challenge is to move beyond the weak coupling assumption behind (4), and consider the optimisation of thermometer probes for the more general mean force Gibbs state [99–101].

Acknowledgments

We thank Rosario Fazio for fruitful discussions. P A is supported by the QuantERA II programme, which has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No 101017733, by the Austrian Science Fund (FWF), project I-6004, by 'la Caixa' Foundation (ID 100010434, Grant No. LCF/BQ/DI19/11730023), and by the Government of Spain (FIS2020-TRANQI and Severo Ochoa CEX2019-000910-S), Fundacio Cellex, Fundacio Mir-Puig, Generalitat de Catalunya (CERCA, AGAUR SGR 1381). F N gratefully acknowledges funding by the BMBF (Berlin Institute for the Foundations of Learning and Data, BIFOLD), the European Research Council (ERC CoG 772230) and the Berlin Mathematics Center MATH+ (AA1-6, AA2-8). P A E gratefully acknowledges funding by the Berlin Mathematics Center MATH+ (AA1-6, AA2-18). G H and M P L acknowledge funding from the Swiss National Science Foundation through a starting Grant PRIMA PR00P2_179748 and an Ambizione Grant No. PZ00P2-186067, and through the NCCR SwissMAP.

Data availability statement

The code used to generate these results can be provided upon request to the authors. All data that support the findings of this study are included within the article (and any supplementary files).

Appendix A: ADAM optimization and the emergence of the Star (and Star-chain) models

In this appendix we explain how we carried out the numerical optimization of the heat capacity using methods that are commonly employed in machine learning.

Let us consider an arbitrary Hamiltonian $H(\theta)$ that depends on a set of parameters θ. The θ parameters could be, for example, the h_i and J_ij parameters in equation (3). Our aim is to determine the value of the parameters θ that maximize the heat capacity of the system. As discussed in the main text, this is equivalent to maximizing the Hamiltonian variance of the thermal state $\Delta^2_\beta H$ given in equation (7).

In machine learning, it is common to minimize a 'loss function' $\mathcal{L}(\theta)$ that depends on a set of parameters. One way to determine the value of θ that minimizes $\mathcal{L}(\theta)$ is to use gradient descent. This consists of starting from a random value of the parameters θ, computing the gradient $\partial_\theta \mathcal{L}(\theta)$ , and updating the parameters according to

$\begin{align} \theta_\text{new} = \theta_\text{old} -\alpha \partial_\theta \mathcal{L}\left(\theta\right), \end{align} \tag{ A1 }$

where α > 0 is the so-called 'learning rate' that determines how large of a step we take in parameter space in the opposite direction of the gradient. If α is small enough and $\mathcal{L}(\theta)$ is differentiable, then it is guaranteed that $\mathcal{L}(\theta_\text{new})\unicode{x2A7D} \mathcal{L}(\theta_\text{old})$ ; reiterating this gradient descent step many times, we will reach a local minimum.

However, this method is prone to getting stuck in local minima, may take many iterations to converge, and choosing appropriate values of α is not always straightforward. An alternative to the update rule in equation (A1) is given by ADAM (Adaptive Moment Estimation) [73]; this method was empirically found to converge better in a variety of problems. Like equation (A1), it only requires the calculation of the gradient at each iteration, but it improves on it in various ways; we refer to [73] for details.

In order to find the parameters θ that maximize the heat capacity of the system described by $H(\theta)$ , we use as loss function

$\begin{align} \mathcal{L}\left(\theta\right) \equiv - \Delta^2_\beta H, \end{align} \tag{ A2 }$

such that minimizing the loss function corresponds to maximizing the Hamiltonian variance. We then start from a random choice of θ and use the ADAM optimization method to minimize the loss function. We compute the gradient of the variance using backpropagation [72], which is a common machine learning algorithm that automatically computes the gradient of a function at a given point. In particular, we use the PyTorch framework to compute the Hamiltonian variance of the thermal state and its gradient, and to perform the ADAM optimization using the default hyperparameters.
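The structure of this optimization can be sketched with the standard library alone. The example below is a simplified stand-in and not the procedure used for the results of this work: it replaces ADAM by the plain update rule of equation (A1), replaces backpropagation by central finite differences, sets β = 1, and assumes the sign convention $H = \sum_i h_i\sigma^z_i + \sum_{i<j}J_{ij}\sigma^z_i\sigma^z_j$ for the Hamiltonian (3). All function names are hypothetical.

```python
import itertools
import math
import random


def spectrum(h, J):
    """Energies of H = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j
    (assumed sign convention); J is ordered like combinations(range(n), 2)."""
    n = len(h)
    pairs = list(itertools.combinations(range(n), 2))
    energies = []
    for s in itertools.product((1, -1), repeat=n):
        e = sum(h[i] * s[i] for i in range(n))
        e += sum(J[p] * s[i] * s[j] for p, (i, j) in enumerate(pairs))
        energies.append(e)
    return energies


def thermal_variance(energies, beta=1.0):
    """Variance of H in the Gibbs state at inverse temperature beta."""
    e_min = min(energies)  # shift the exponents for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    mean = sum(w * e for w, e in zip(weights, energies)) / z
    return sum(w * (e - mean) ** 2 for w, e in zip(weights, energies)) / z


def loss(theta, n):
    """Loss of equation (A2); theta holds the n fields, then the couplings."""
    return -thermal_variance(spectrum(theta[:n], theta[n:]))


def numerical_gradient(theta, n, eps=1e-6):
    """Central finite differences, used here instead of backpropagation."""
    grad = []
    for k in range(len(theta)):
        up = theta[:k] + [theta[k] + eps] + theta[k + 1:]
        down = theta[:k] + [theta[k] - eps] + theta[k + 1:]
        grad.append((loss(up, n) - loss(down, n)) / (2 * eps))
    return grad


def gradient_descent(theta, n, lr=0.05, steps=300):
    """Repeated application of the update rule of equation (A1)."""
    theta = list(theta)
    for _ in range(steps):
        grad = numerical_gradient(theta, n)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    n = 3
    rng = random.Random(0)
    theta0 = [rng.uniform(-1.0, 0.0) for _ in range(n + n * (n - 1) // 2)]
    theta = gradient_descent(theta0, n)
    print(-loss(theta0, n), -loss(theta, n))
```

For a single spin the loss reduces to $-h^2/\cosh^2 h$, whose minima lie at $|h|\tanh|h| = 1$, which gives a simple sanity check of the routine. The exponential cost in N of the explicit enumeration also makes clear why only small systems can be optimized in this way.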

We now display some of the results we found with this method for different classes of systems.

A.1. N Spin Hamiltonian

In this subsection we show how the Star model emerged from the numerical optimization considering the N spin Hamiltonian given in equation (3), and we provide some details on the optimization method. The optimization was carried out as described above considering $\{h_i\}$ and $\{J_{ij}\}$ as the θ parameters. Both are initialized randomly between −1 and 0. We performed separate optimizations for $N\in \{2,3,\dots, 15\}$, finding the All-to-All model for $N\in\{2,\dots,6 \}$, and the Star model for $N \unicode{x2A7E} 7$. For $N\in \{2,\dots,11\}$, the results were found by running a single optimization with fixed learning rate α = 0.001 for 60000 steps (although most optimizations converge much sooner). However, for $N\unicode{x2A7E} 12$, this choice would sometimes get stuck in local minima. For $N \in \{12,13\}$, we ran the optimization multiple times as detailed above, and chose the model with the largest heat capacity. For $N \in \{14,15\}$, to avoid getting stuck in local minima, we used a common technique in machine learning, which consists of scheduling the learning rate, i.e. of varying it at each optimization step. In particular, we used the 'CyclicLR' scheduler of PyTorch that varies the learning rate in a triangular fashion between a minimum $\alpha_\text{min}$ and a maximum $\alpha_\text{max}$ value. For N = 14, we chose $\alpha_\text{min} = 0.0003$ and $\alpha_\text{max} = 0.2$, and halved the amplitude of the triangle at every repetition (such that, asymptotically, the learning rate converges to $\alpha_\text{min}$). The number of steps during the 'up phase' of the triangle was chosen to be 6000. For N = 15, we chose $\alpha_\text{min} = 0.001$ and $\alpha_\text{max} = 0.1$ without halving the amplitude of the triangle at every repetition. Also in this case the 'up phase' consists of 6000 steps.
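The triangular schedule described above can be written down explicitly. The following sketch reproduces the shape of such a cyclic learning rate in plain Python; it assumes a symmetric triangle (the 'down phase' lasts as long as the 'up phase'), and the function name and the flag `halve_each_cycle` are hypothetical. In PyTorch, this behaviour is provided by `torch.optim.lr_scheduler.CyclicLR`, whose `triangular` and `triangular2` modes correspond to the cases without and with halving; the exact arguments used for the results of this work are not stated here.

```python
import math


def triangular_lr(step, lr_min, lr_max, steps_up, halve_each_cycle=False):
    """Learning rate at a given optimization step for a triangular cycle.

    The rate rises linearly from lr_min to lr_max in `steps_up` steps and
    falls back in the same number of steps (symmetric triangle, assumed).
    """
    cycle = math.floor(1 + step / (2 * steps_up))   # 1, 2, 3, ...
    x = abs(step / steps_up - 2 * cycle + 1)        # 1 at the ends, 0 at the peak
    scale = 0.5 ** (cycle - 1) if halve_each_cycle else 1.0
    return lr_min + (lr_max - lr_min) * max(0.0, 1.0 - x) * scale


if __name__ == "__main__":
    # Settings quoted in the text for N = 14.
    for step in (0, 3000, 6000, 12000, 18000):
        print(step, triangular_lr(step, 0.0003, 0.2, 6000, halve_each_cycle=True))
```

With halving enabled, the peak of the k-th triangle is $\alpha_\text{min} + (\alpha_\text{max}-\alpha_\text{min})/2^{k-1}$, so the schedule approaches $\alpha_\text{min}$ asymptotically.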

To show the emergence of the Star model, in figure 10 we show the values of h_i and of J_ij found with our numerical method for N = 4 (left panels), N = 7 (middle panels), and N = 9 (right panels). The upper panels show h_i as a function of the site index $i = 1,\dots,N$ , while the lower panels show the value of J_ij (the color) as a function of the site indices i and j. Since J_ij is only defined for i < j, a white square is shown when such condition is not satisfied.

**Figure 10.** Values of $h_i$ and of $J_{ij}$ found with our numerical method for N = 4 (left panels), N = 7 (middle panels), and N = 9 (right panels). The upper panels show $h_i$ as a function of the site index $i = 1,\dots,N$, while the lower panels show the value of $J_{ij}$ (the color) as a function of the site indices i and j. Since $J_{ij}$ is only defined for i < j, a white square is shown when such condition is not satisfied.

As we can see, for N = 4 all parameters take the same value ($h_i = J_{ij}\approx -0.377$ for every i and j); this corresponds to the All-to-All model described in equation (26) with $h = J\approx 0.377$. For N = 7, we see that all spins are interchangeable except for a single privileged spin corresponding to i = 2. Indeed, $h_2\approx 4.41$, while $h_{i\neq 2} \approx -1.15$, and $J_{ij}$ is nonzero, and equal to −1.15, only when i or j is equal to 2. This is precisely the Star model as described in equation (13) with a ≈ 4.41 and $b \approx -1.15$. For N = 9, we find a model where all spins are interchangeable, except for two privileged spins corresponding to $i = 2,3$. Indeed, $h_{i} = 0$ except for $h_2 = h_3\approx -1.51$. Also, $J_{ij} = 0$ when neither i nor j is 2 or 3, while $J_{23}\approx 8.96$, and $J_{ij} \approx -1.51$ when exactly one of i and j is 2 or 3. The Hamiltonian of this model can be written as

$\begin{align} H_{\overline{\text{Star}}\left(N\right)}\left(a,b\right) = a \,\sigma_1^z \sigma_2^z +b \sum_{i = 1,2} \sigma_i^z \left(\unicode{x1D7D9} + \sum_{j = 3}^N \sigma_j^z\right), \end{align} \tag{ A3 }$

and schematically represented as in figure 11 with a ≈ 8.96 and $b\approx -1.51$. Interestingly, it can be seen that $H_{\overline{\text{Star}}(N)}(a,b)$ has exactly the same spectrum as $H_{{\text{Star}}(N)}(a,b)$, and therefore the same heat capacity. Hence, while they are physically two different models, they have identical characteristics as thermometers.
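The equality of the two spectra can be verified by exhaustive enumeration for small N. The sketch below implements equation (A3) directly; for the Star model it assumes the form $a\,\sigma^z_1 + b\,(1+\sigma^z_1)\sum_{j\geq 2}\sigma^z_j$, which is a reconstruction from the parameter pattern described above (central field a, remaining fields and all couplings to the central spin equal to b) rather than a transcription of equation (13). The function names are hypothetical.

```python
import itertools
import math


def star_spectrum(N, a, b):
    """Sorted energies of the assumed Star form:
    a * s_1 + b * (1 + s_1) * sum_{j >= 2} s_j."""
    energies = []
    for s in itertools.product((1, -1), repeat=N):
        energies.append(a * s[0] + b * (1 + s[0]) * sum(s[1:]))
    return sorted(energies)


def star_bar_spectrum(N, a, b):
    """Sorted energies of equation (A3):
    a * s_1 * s_2 + b * (s_1 + s_2) * (1 + sum_{j >= 3} s_j)."""
    energies = []
    for s in itertools.product((1, -1), repeat=N):
        energies.append(a * s[0] * s[1] + b * (s[0] + s[1]) * (1 + sum(s[2:])))
    return sorted(energies)


def same_spectrum(N, a, b, tol=1e-9):
    """Compare the two sorted spectra level by level."""
    first, second = star_spectrum(N, a, b), star_bar_spectrum(N, a, b)
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(first, second))


if __name__ == "__main__":
    # Parameter values quoted in the text for N = 9.
    print(same_spectrum(9, 8.96, -1.51))
```

In both cases half of the configurations sit at energy $-a$, while the other half form the levels $a + 2b\,(\pm 1 + S)$, with S the sum of the $N-2$ remaining spins, which is why the two sorted lists coincide.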

**Figure 11.** Schematic representation of the Hamiltonian $H_{\overline{\text{Star}}(N)}(a,b)$ given in equation (A3).

Finally, in figure 12 we analyze the spectrum of the Star model corresponding to N = 9. The individual eigenenergies are plotted as dots on the y-axis. For comparison, we plot as a red line the value of E (measured from the ground state energy) that maximizes the heat capacity of the degenerate Hamiltonian $H_\textrm{deg}$ (see equation (9)) for N = 9. As expected, there is a single ground state, a highly degenerate first excited state (with a degeneracy approximately given by half of the states), and further excited states with a binomial degeneracy. Interestingly, and as suggested by the Lemma of appendix B, the value of E is quite similar to the energy of the highly degenerate first excited state.

**Figure 12.** Spectrum of the Star model, found with the numerical optimization for N = 9, represented as dots with the corresponding energy on the y-axis. The red line represents the energy E (measured from the ground state energy) that maximizes the heat capacity of the degenerate Hamiltonian $H_\textrm{deg}$ (see equation (9)) for N = 9.

A.2. N Spin Hamiltonian with bounded parameters

In this subsection we discuss how we performed the optimization of the heat capacity of N spins with an additional bound on the magnitude of the parameters, which leads to the 'bound' curves in figure 8. In particular, we wish to maximize the heat capacity of the Hamiltonian in equation (3) with respect to $h_i$ and $J_{ij}$, with the additional constraint that

$\begin{align} |h_i| &\unicode{x2A7D} c, & |J_{ij}|\unicode{x2A7D} c, \end{align} \tag{ A4 }$

where c > 0 is a real constant.

Since our method works well for unconstrained optimizations, we introduce the following parameterization

$\begin{align} h_i & = c\tanh x_i, & J_{ij}& = c\tanh y_{ij}, \end{align} \tag{ A5 }$

where $x_i$ and $y_{ij}$ are real parameters. Since the hyperbolic tangent produces values in $[-1,1]$, the parameterization of equation (A5) guarantees that the constraint in equation (A4) is satisfied for any value of $x_i$ and $y_{ij}$.

We therefore apply the same optimization described above, but choosing $\{x_i\}$ and $\{y_{ij}\}$ as our unconstrained optimization parameters θ, instead of $\{h_i\}$ and $\{J_{ij}\}$ .
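The reparameterization of equation (A5) is straightforward to express in code. The sketch below maps unconstrained real parameters to bounded fields and couplings; the function name and the flat-list layout of the parameters are illustrative assumptions.

```python
import math


def bounded_parameters(x, y, c):
    """Map unconstrained parameters (x_i, y_ij) to (h_i, J_ij) via
    equation (A5), so that |h_i| <= c and |J_ij| <= c by construction."""
    h = [c * math.tanh(xi) for xi in x]
    J = [c * math.tanh(yij) for yij in y]
    return h, J


if __name__ == "__main__":
    h, J = bounded_parameters([-1.5, 0.0, 1.5], [-40.0, 0.3, 40.0], c=2.0)
    print(h, J)
```

Because the optimizer only ever sees $x_i$ and $y_{ij}$, no projection or clipping step is needed; saturated values $|h_i|\approx c$ simply correspond to large $|x_i|$.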

In particular, the orange and green dots in figure 8 were produced in the following way. We initialize the $\{x_i\}$ and $\{y_{ij}\}$ parameters randomly in the interval $[-1.5, 1.5]$. Then, for each value of $N \in \{3,4, \dots,20\}$, we repeat the optimization 12 times, choosing the one with the highest heat capacity. Specifically, 6 repetitions are performed with learning rate α = 0.01, and 6 with α = 0.03. For $N\in\{21,22\}$, we do the same but choose α = 0.01 and α = 0.003, respectively, as learning rates.

A.3. Optimal values for the Star model and the Star-chain

In this subsection we provide some details regarding the heat capacity maximization in the Star and Star-chain models. In particular, in tables 2 and 3 we provide the explicit values plotted in figure 7.

Table 2. Values of the parameters of the Star and Star-chain models that optimize the heat capacity, plotted in figure 7 (see caption of figure 7 for details). The second half of the parameters is given in table 3.

	Star model (unconstr.)		Star model (constr.)		Star-chain model
N	a	b	a	b	a	b	J
2	−0.711	0.711	−0.711	0.711	—	—	—
3	0.000	0.797	−0.136	0.752	—	—	—
4	0.894	0.894	0.578	0.810	0.578	0.810	−1.600
5	2.015	1.007	1.518	0.897	—	—	—
6	3.398	1.133	2.797	1.020	—	—	—
7	5.070	1.267	4.482	1.173	—	—	—
8	7.052	1.410	6.553	1.340	1.964	1.101	−1.191
9	9.358	1.560	9.041	1.520	—	—	—
10	11.998	1.714	13.722	1.905	—	—	—
11	14.977	1.872	17.489	2.123	—	—	—
12	18.297	2.033	20.311	2.216	3.504	1.559	−1.612
13	21.960	2.196	22.760	2.263	—	—	—
14	25.967	2.361	25.032	2.289	—	—	—
15	30.318	2.527	27.208	2.304	—	—	—
16	35.013	2.693	29.322	2.314	4.953	2.021	−2.038
17	40.053	2.861	31.397	2.320	—	—	—
18	45.438	3.029	33.444	2.324	—	—	—
19	51.168	3.198	35.473	2.326	—	—	—
20	57.243	3.367	37.490	2.328	5.720	2.267	−2.468
21	63.664	3.537	39.495	2.329	—	—	—
22	70.431	3.707	41.493	2.329	—	—	—
23	77.543	3.877	43.488	2.329	—	—	—
24	85.001	4.048	45.477	2.329	6.164	2.411	−2.903

Table 3. Continuation of the parameters displayed in table 2.

	Star model (unconstr.)		Star model (constr.)		Star-chain model
N	a	b	a	b	a	b	J
25	92.805	4.218	47.465	2.329	—	—	—
26	100.956	4.389	49.451	2.329	—	—	—
27	109.452	4.560	51.436	2.329	—	—	—
28	118.294	4.732	53.420	2.329	6.452	2.504	−3.336
29	127.483	4.903	55.404	2.329	—	—	—
30	137.018	5.075	57.388	2.330	—	—	—
31	146.899	5.246	59.371	2.329	—	—	—
32	157.127	5.418	61.354	2.328	6.649	2.568	−3.767
33	167.701	5.590	63.339	2.329	—	—	—
34	178.621	5.762	65.324	2.329	—	—	—
35	189.887	5.934	67.310	2.329	—	—	—
36	201.500	6.106	69.295	2.329	6.790	2.614	−4.195
37	213.460	6.278	71.280	2.329	—	—	—
38	225.766	6.450	73.267	2.329	—	—	—
39	238.418	6.623	75.254	2.329	—	—	—
40	251.417	6.795	77.241	2.329	6.896	2.648	−4.622
41	264.762	6.967	79.230	2.329	—	—	—
42	278.453	7.140	81.218	2.329	—	—	—
43	292.492	7.312	83.206	2.329	—	—	—
44	306.876	7.485	85.196	2.329	6.979	2.676	−5.047
45	321.607	7.657	87.187	2.330	—	—	—
46	336.685	7.830	89.176	2.330	—	—	—
47	352.109	8.002	91.167	2.330	—	—	—
48	367.879	8.175	93.158	2.330	7.046	2.697	−5.472
49	383.996	8.348	95.150	2.330	—	—	—
50	400.460	8.520	97.141	2.330	—	—	—

All three optimizations are carried out using the Adam optimizer for 6000 steps and backpropagation to compute the gradients as described in appendix A, but we only optimize over the a and b parameters of the Star model, and over the a, b and J parameters of the Star-chain model.

In particular, the 'unconstrained' case of the Star model was optimized by imposing the condition $a = b(N-3)$ and optimizing only over b. The initial value is set to b = 6, and the learning rate to α = 0.01. In the 'constrained' case of the Star model, we optimize over a and b, choosing as initial values $a = 2N-3$ and b = 2.2, with learning rate α = 0.001. In the Star-chain model with m = 3, we optimize over a, b and J. For N = 4, we choose as initial values a = 3.5, b = 1.55 and $J = -1.6$ . For higher values of N, we choose as initial values the optimal parameters found in the previous optimization. We set the learning rate to α = 0.003.
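For small N, the one-parameter optimization with $a = b(N-3)$ can be reproduced approximately without any gradient-based machinery. The sketch below is an assumption-laden illustration: it builds the Star spectrum from its level structure (energy −a for $\sigma_1^z = -1$, and a binomial ladder for $\sigma_1^z = +1$), works at β = 1, and replaces Adam by a dense grid search over b. The function name `star_heat_capacity` and the grid are hypothetical choices.

```python
import math

def star_heat_capacity(a, b, n, beta=1.0):
    """beta^2 * Var(E) of the Star model, from its (energy, degeneracy) pairs."""
    # sigma_1 = -1: energy -a, degeneracy 2^(n-1).
    levels = [(-a, 2 ** (n - 1))]
    # sigma_1 = +1: k of the n-1 outer spins up, energy a + 2b(2k - (n-1)).
    for k in range(n):
        levels.append((a + 2 * b * (2 * k - (n - 1)), math.comb(n - 1, k)))
    e_min = min(e for e, _ in levels)
    weights = [g * math.exp(-beta * (e - e_min)) for e, g in levels]
    z = sum(weights)
    mean = sum(w * e for w, (e, _) in zip(weights, levels)) / z
    var = sum(w * (e - mean) ** 2 for w, (e, _) in zip(weights, levels)) / z
    return beta ** 2 * var

n = 4
grid = [0.001 * i for i in range(1, 6001)]      # b in (0, 6]
b_opt = max(grid, key=lambda b: star_heat_capacity(b * (n - 3), b, n))
c_opt = star_heat_capacity(b_opt * (n - 3), b_opt, n)
print(f"N={n}: b ~ {b_opt:.3f}, a ~ {b_opt * (n - 3):.3f}, C ~ {c_opt:.3f}")
```

For N = 4 the grid optimum should land close to the value of b listed in the first columns of table 2; a grid search is of course only practical because a single parameter is being scanned.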

A.4. Quantum N Spin Hamiltonian

In this subsection, we employ our numerical optimization method to maximize the heat capacity of the most generic two-body spin Hamiltonian, namely

$\begin{align} \bar{H}_\text{quantum} = \sum_{i \atop \mu\in\left\{x,y,z\right\}} \bar{h}^{\left(\mu\right)}_i \sigma^{\mu}_i + \sum_{i < j \atop \mu,\nu\in\left\{x,y,z\right\}} \bar{J}^{\left(\mu,\nu\right)}_{ij} \sigma^{\mu}_i \sigma^{\nu}_j, \end{align} \tag{ A6 }$

where $\bar{h}^{(\mu)}_i$ and $\bar{J}^{(\mu,\nu)}_{ij}$ are arbitrary parameters. Since the heat capacity only depends on the spectrum of the Hamiltonian, we can apply arbitrary unitary transformations to $\bar{H}_\text{quantum}$ without changing its spectrum, and thus its heat capacity. Choosing local unitary transformations of the form

$\begin{align} U_i = \exp\left[ i \sum_{\mu\in \left\{x,y,z\right\} } \theta_\mu \sigma^\mu_i \right], \end{align} \tag{ A7 }$

where θ_µ are three suitable angles, we can always rotate $\sum_{\mu\in \{x,y,z\}} \bar{h}^{(\mu)}_i \sigma^{\mu}_i$ into an operator proportional only to $\sigma^z_i$ . Therefore, applying the appropriate unitary transformation on each spin site, we obtain the Hamiltonian

$\begin{align} {H}_\text{quantum} = \sum_{i} {h}_i \sigma^{z}_i + \sum_{i < j \atop \mu,\nu\in\left\{x,y,z\right\}} {J}^{\left(\mu,\nu\right)}_{ij} \sigma^{\mu}_i \sigma^{\nu}_j, \end{align} \tag{ A8 }$

where ${h}_i$ and ${J}^{(\mu,\nu)}_{ij}$ are arbitrary parameters.

In this subsection, without loss of generality, we optimize equation (A8) considering ${h}_i$ and ${J}^{(\mu,\nu)}_{ij}$ as optimization parameters θ. We performed a separate optimization for $N\in \{2,3,\dots,9\}$ with a fixed learning rate α = 0.001, performing 20000 optimization steps, and starting from random initial values of the parameters uniformly distributed between 0 and 1. In all cases, we found values of the heat capacity that are identical (up to numerical errors) to the values found considering the N spin Hamiltonian with only σ^z, i.e. the model, given by equation (3), considered in the previous subsection. Furthermore, these solutions also have the same spectrum found in the previous subsection. However, they are not the same model: indeed, the optimal values of ${J}^{(\mu,\nu)}_{ij}$ that we find are non-zero when $\mu\neq z$ and $\nu\neq z$ . This can be understood in the following way: since the spectrum and the heat capacity are invariant under unitary transformations, we can apply any unitary transformation to the Star model to generate different models that display the same spectrum and heat capacity. Therefore, there is an infinitely large class of systems with the same optimal heat capacity, and our optimization method converges randomly to one of these solutions.

As an example, in figure 13 we plot the spectrum, h_i , and ${J}^{(\mu,\nu)}_{ij}$ (for $\mu,\nu\in \{x,y,z\}$ ), that we found for N = 9, in the same style as in figures 10 and 12. As we can see, there is some structure in $J^{(\mu,\nu)}_{ij}$ for $\mu,\nu\in\{x,y\}$ that privileges a specific spin index (number 8 in this case). However, it is clear that this model is different from the N spin Hamiltonian with only σ^z. Nonetheless, we see that the spectrum, and thus the heat capacity, is essentially that of the Star model (compare the first panel of figure 13 with figure 12). The very small discrepancies arise because the numerical optimization reached parameters close to, but not exactly at, the local minimum. As previously anticipated, the 'noisiness' that is visible in many panels can be explained by the infinite number of models that yield the same spectrum, such that the numerical method converges to a random one depending on the initial stochastic choice of the parameters.

**Figure 13.** Optimization results for the quantum spin Hamiltonian in equation (A8) for N = 9. The first row shows the spectrum and the *h_i* as in figures 10 and 12. Each of the lower 9 panels displays $J^{(\mu,\nu)}_{ij}$ , for all combinations of $\mu,\nu\in\{x,y,z\}$ , as a function of the site index i and j (similar to figure 10). Since $J^{(\mu,\nu)}_{ij}$ is only defined for i < j, a white box is plotted when such condition is not fulfilled.

A.5. D-Wave annealer Hamiltonian

In this subsection we consider a spin Hamiltonian as in equation (3) with only $\sigma^z_i$ terms, but we constrain the optimization to reflect the topology of the interactions of D-Wave annealers. In particular, we consider the Chimera graph as in figure 6, and we focus on 3 units, i.e. 24 spins. This corresponds to excluding the lower right unit of figure 6. Mathematically, we enforce the elements of J_ij to be zero whenever a connection between spins i and j is not present in the topology, and then we minimize the loss function considering the non-zero $\{J_{ij}\}$ and $\{h_i\}$ parameters as θ. We use the Adam optimizer for 20 000 steps at a fixed learning rate, randomly initializing the parameters between −1.5 and 1.5. We ran the optimization 3 times: once with α = 0.01, and twice with α = 0.03. This yielded values of the heat capacity between 39.99 and 41.57.

In figure 14 we show the numerical results that we found in the optimization run that yielded the largest heat capacity (corresponding to $\alpha = 0.03$ ). The first panel shows the spectrum in the same style as figure 12, while the second and third panels show the values of h_i and J_ij as in figure 13. To better understand the results, we applied a local unitary flip of $\sigma_i^z$ in all sites where $h_i\lt0$ (which does not change the spectrum, and thus the heat capacity). This amounts to changing the sign of h_i whenever h_i is negative, and correspondingly changing the sign of J_ij and J_ji for all j. The indexing of the spins is such that the first unit is described by $i\in \{1,\dots,8\}$ , the second by $i\in \{9,\dots,16\}$ , and the third by $i\in \{17,\dots,24\}$ . Furthermore, spins $\{5,\dots,8\}$ of the first unit are coupled to spins $\{9,\dots,12\}$ of the second unit, and spins $\{13,\dots,16\}$ of the second unit are coupled to spins $\{17,\dots,20\}$ of the third unit (see non-white boxes in the last panel of figure 14).
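The invariance of the spectrum under the local flips described above can be checked by brute force on a small system. The sketch below is illustrative only: the system size N = 5, the random parameters and the function names are assumptions. Note that a coupling between two flipped spins changes sign twice and is therefore left unchanged, which the factor `f[i] * f[j]` accounts for.

```python
import itertools
import random

def spectrum(h, J):
    """Sorted energies of H = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j."""
    n = len(h)
    energies = []
    for s in itertools.product((-1, 1), repeat=n):
        e = sum(h[i] * s[i] for i in range(n))
        e += sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
        energies.append(e)
    return sorted(energies)

def flip_negative_fields(h, J):
    """Flip sigma^z_i wherever h_i < 0, so that all on-site fields become non-negative."""
    f = [-1 if hi < 0 else 1 for hi in h]
    h_new = [fi * hi for fi, hi in zip(f, h)]
    J_new = {(i, j): f[i] * f[j] * Jij for (i, j), Jij in J.items()}
    return h_new, J_new

random.seed(1)
n = 5
h = [random.uniform(-1.5, 1.5) for _ in range(n)]
J = {p: random.uniform(-1.5, 1.5) for p in itertools.combinations(range(n), 2)}

h_new, J_new = flip_negative_fields(h, J)
max_diff = max(abs(x - y) for x, y in zip(spectrum(h, J), spectrum(h_new, J_new)))
print("largest change in any energy level:", max_diff)
```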

**Figure 14.** Optimization results for the Chimera graph topology of the interactions of D-Wave annealers with 3 units (24 spins). The first panel shows the spectrum of the model as in figure 12, while the second and third panels show the values of *h_i* and of J_ij as in figure 10. Since J_ij is only defined for i < j and when spins i and j are coupled according to the Chimera graph shown in figure 6, a white box is plotted whenever J_ij is not defined.

As we can see, this model is very similar to the Star-chain model with m = 3 embedded into the Chimera graph as in figure 6. Indeed, there are two privileged spins per unit (corresponding to spins $2,6$ in the first unit, $10,15$ in the second, and $19,23$ in the third). These are represented in orange in figure 6. These spins have a larger on-site potential as compared to all other ones (see middle panel of figure 14), and they are each coupled to 3 spins within the same unit (see the 'dark blue crosses' in the last panel of figure 14). Furthermore, the three units are linked to each other through these privileged spins (see the two brown isolated dots in the last panel of figure 14).

Appendix B: A small Lemma (Property 2)

In this appendix we prove a theoretical Lemma that leads to Property 2 in the main text, section 2.2. The Lemma considers two Hamiltonians, H₁ and H₂, such that H₁ has a single ground state and a k₁-degenerate excited state ( $k_1+1$ levels in total), while H₂ has the same spectrum and additional k₂ excited states above (totaling $1+k_1+k_2$ levels) that is,

$\begin{align} H_1& = 0\vert 0\rangle \langle 0\vert +\sum_{i = 1}^{k_1} E \vert i\rangle \langle i\vert \;, \end{align} \tag{ B1 }$

$\begin{align} H_2& = 0\vert 0\rangle \langle 0\vert +\sum_{i = 1}^{k_1} E \vert i\rangle \langle i\vert + \sum_{\alpha = k_1+1}^{k_1+k_2} E_\alpha \vert \alpha\rangle \langle \alpha\vert \;, \end{align} \tag{ B2 }$

with $0\unicode{x2A7D} E\unicode{x2A7D} E_\alpha\; \forall\alpha$ . Consider now the realistic situation in which these Hamiltonians are controlled via internal coupling parameters $\vec{\lambda}$ , as is the case in our work. The Lemma has two assumptions: i) it is possible to control the first excited gap $E(\vec{\lambda})$ , while ii) keeping the additional α-levels above it, $E_\alpha(\vec{\lambda})\unicode{x2A7E} E(\vec{\lambda})$ . A simple scenario in which these assumptions are satisfied is the Hamiltonian $H_2(\lambda) = \lambda\left(\sum_{i = 1}^{k_1} E \vert i\rangle \langle i\vert + \sum_{\alpha = k_1+1}^{k_1+k_2} E_\alpha \vert \alpha\rangle \langle \alpha\vert \right)$ . Under these assumptions, the Lemma states that the maximal heat capacity achievable with H₂ is always at least as large as the maximal heat capacity obtainable with H₁:

$\begin{align} \max_{E}\mathcal{C}\left(H_1\right)\unicode{x2A7D} \max_{E\unicode{x2A7D} E_\alpha} \mathcal{C}\left(H_2\right)\;. \end{align} \tag{ B3 }$

B.1. Proof of the lemma

When computing the variance of the energy in a thermal state, global shifts in the energy do not matter. For this reason we rewrite the same Hamiltonians putting the k₁ $\vert i\rangle$ levels to zero, i.e.

$\begin{align} H^{^{\prime}}_1& = - E\vert 0\rangle \langle 0\vert \;, \end{align} \tag{ B4 }$

$\begin{align} H^{^{\prime}}_2& = - E\vert 0\rangle \langle 0\vert + \sum_{\alpha} E^{^{\prime}}_\alpha \vert \alpha\rangle \langle \alpha\vert \;. \end{align} \tag{ B5 }$

with $E\unicode{x2A7E} 0$ and $E_\alpha-E\equiv E^{^{\prime}}_\alpha\unicode{x2A7E} 0$ .

We now use temperature units β = 1, to simplify the discussion. The thermal states are therefore

$\begin{align} \rho^{\left(1\right)} = \frac{e^{-H^{^{\prime}}_1}}{{\textrm{Tr}}\left[e^{-H^{^{\prime}}_1}\right]}\;, \quad \rho^{\left(2\right)} = \frac{e^{-H^{^{\prime}}_2}}{{\textrm{Tr}}\left[e^{-H^{^{\prime}}_2}\right]}\;. \end{align} \tag{ B6 }$

Let us call $p_0^{(1)}$ and $p_0^{(2)}$ the corresponding ground state populations,

$\begin{align} p_0^{\left(1\right)} = \dfrac{1}{1+k_1 e^{-E}}\;, \quad p_0^{\left(2\right)} = \dfrac{1}{1+k_1 e^{-E}+\sum_\alpha e^{-\left(E^{^{\prime}}_\alpha+E\right)}}\;. \end{align} \tag{ B7 }$

Notice that they both depend on the value of E, which we omit in the following for simplicity. It is easy to compute the heat capacity (equivalently, the energy variance) as

$\begin{align} \Delta^2 H_1 = E^2\, p_0^{\left(1\right)}\left(1-p_0^{\left(1\right)}\right)\;. \end{align} \tag{ B8 }$

For what concerns H₂ instead (calling p_α the population of the level E_α)

$\begin{align} \Delta^2 H_2 = E^2 p_0^{\left(2\right)} + \sum_\alpha \left(E^{^{\prime}}_\alpha\right)^2 p_\alpha - \left( - E\, p_0^{\left(2\right)} + \sum_\alpha E^{^{\prime}}_\alpha p_\alpha \right)^2 = E^2 p_0^{\left(2\right)}\left(1-p_0^{\left(2\right)}\right) + A + B\;, \end{align} \tag{ B9 }$

with

$\begin{align} A := \sum_\alpha \left(E^{^{\prime}}_\alpha\right)^2 p_\alpha - \left(\sum_\alpha E^{^{\prime}}_\alpha p_\alpha \right)^2 \unicode{x2A7E} 0\;, \quad B := 2 E p_0^{\left(2\right)} \left(\sum_\alpha E^{^{\prime}}_\alpha p_\alpha\right) \unicode{x2A7E} 0 \;, \end{align} \tag{ B10 }$

where the inequalities follow from B being trivially non-negative, while A corresponds to the variance of a Hamiltonian having levels $E^{^{\prime}}_\alpha$ with populations p_α and all the rest of the population, $1-\sum_\alpha p_\alpha$ , at zero energy. It follows that

$\begin{align} \Delta^2 H_2 \unicode{x2A7E} E^2 p_0^{\left(2\right)}\left(1-p_0^{\left(2\right)}\right)\;. \end{align} \tag{ B11 }$

Finally, let us compare the maximal value of the heat capacity in the two cases. Let's call $\bar{E}^{(1)}$ the optimal value for the Hamiltonian H₁, which induces a ground state population equal to $\bar{p}^{(1)}_0 = p^{(1)}_0(\bar{E}^{(1)})$ , i.e.

$\begin{align} \max \Delta^2 H_1 = \max_ E E^2 p_0^{\left(1\right)}\left( E\right)\left(1-p_0^{\left(1\right)}\left( E\right)\right) = \left(\bar{E}^{\left(1\right)}\right)^2\bar{p}^{\left(1\right)}_0 \left(1-\bar{p}^{\left(1\right)}_0\right)\;. \end{align} \tag{ B12 }$

It suffices to conclude now by noticing that $p_0^{(2)}(E)$ is an increasing function of E and is always smaller than $p_0^{(1)}(E)$ , cf equation (B7). This means that

$\begin{align} p_0^{\left(2\right)}\left(x\right) = p_0^{\left(1\right)}\left(y\right)\quad \text{implies}\quad x>y\;. \end{align} \tag{ B13 }$

Therefore one can choose E such that $p_0^{(2)}( E) = \bar{p}^{(1)}_0$ , which implies $E\gt\bar{E}^{(1)}$ ; since the factor $p_0(1-p_0)$ then takes the same value while E² is larger, it follows from (B11) that

$\begin{align} \max \Delta^2 H_2 \unicode{x2A7E}\left(\bar{E}^{\left(1\right)}\right)^2\bar{p}^{\left(1\right)}_0 \left(1-\bar{p}^{\left(1\right)}_0\right)\;, \end{align} \tag{ B14 }$

concluding the proof.
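The statement of the Lemma can be illustrated numerically. The following sketch is only a sanity check under assumed values, not part of the proof: it takes k₁ = 3, two additional levels at hypothetical fixed offsets $E^{\prime}_\alpha$ above E, works at β = 1, and maximizes the energy variance over E on a grid.

```python
import math

def variance(levels, beta=1.0):
    """Thermal energy variance for a list of (energy, degeneracy) pairs."""
    e_min = min(e for e, _ in levels)
    w = [g * math.exp(-beta * (e - e_min)) for e, g in levels]
    z = sum(w)
    mean = sum(wi * e for wi, (e, _) in zip(w, levels)) / z
    return sum(wi * (e - mean) ** 2 for wi, (e, _) in zip(w, levels)) / z

k1 = 3
offsets = [0.5, 2.0]                           # hypothetical E'_alpha = E_alpha - E >= 0
grid = [0.001 * i for i in range(1, 10001)]    # E in (0, 10]

# H_1: ground state plus a k1-degenerate level at E.
max_h1 = max(variance([(0.0, 1), (E, k1)]) for E in grid)
# H_2: same levels plus additional ones at E + E'_alpha.
max_h2 = max(variance([(0.0, 1), (E, k1)] + [(E + d, 1) for d in offsets])
             for E in grid)
print(f"max variance H1: {max_h1:.4f}, H2: {max_h2:.4f}")
```

In this example the additional levels increase the maximal variance, in agreement with equation (B3).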

Appendix C: Parametric scaling and noise-tolerance

In this appendix we estimate the strength and the precision that are needed in the engineering of the Hamiltonian parameters (3) in order to achieve the optimal Heisenberg-like scalings (19), (12) and (24) of the models described in the main text. For simplicity, we will work in dimensionless units in which β = 1.

C.1. Bandwidth tolerance in the degenerate model

As we argued in section 2.1, the main property that optimal models satisfy in order to reach the optimal $\propto N^2$ scaling of the maximal heat capacity, is that of generating a single ground state and an exponentially-degenerate first excited level, i.e. approximating the degenerate Hamiltonian (9) in the best possible way, as well as possibly having additional higher energy levels (cf lemma in appendix B). However, in any physical realization, the resulting spectrum will have imperfections, when compared to (9). In particular, the exponentially-degenerate level might split into a bandwidth, or the overall gap might be shifted. Here we estimate the noise tolerance to such imprecisions in the first excited level.

C.1.1. Uniform shift

Consider first the case in which there is no splitting in the D − 1 first excited levels, but a uniform error, that is

$\begin{align} H = 0\vert 0\rangle \langle 0\vert + E \sum_{i = 1}^{D-1} \vert i\rangle \langle i\vert \;, \end{align} \tag{ C1 }$

and the value of E is not exactly the optimal energy gap. We show here that, as long as the imprecision does not scale with the dimension, the heat capacity behaves smoothly. We know, in fact, that the optimal value of E for large dimensions is $E\sim \ln (D-1)$ (cf main text and [35]). Suppose now that

$\begin{align} E = \left(1+\epsilon\right)\ln\left(D-1\right)\equiv \left(1+\epsilon\right)\ln d \end{align} \tag{ C2 }$

Then, the resulting dimensionless variance of the flat Hamiltonian with $d\equiv D-1$ degenerate excited states and gap E can be computed from the populations

$\begin{align} p_0 = \frac{1}{1+d e^{-E}}\;, \quad p_i = \frac{1}{d}\left(1-p_0\right) \; \text{for } i = 1,\dots, d\;. \end{align} \tag{ C3 }$

It follows that

$\begin{align} \langle H^2\rangle - \langle H\rangle^2 &= \frac{de^{-E}E^2}{1+de^{-E}}-\frac{d^2e^{-2E}E^2}{\left(1+de^{-E}\right)^2} = \left(\ln d\right)^2\left(1+\epsilon\right)^2\left(\frac{d^{-\epsilon}}{1+d^{-\epsilon}}-\frac{d^{-2\epsilon}}{\left(1+d^{-\epsilon}\right)^2}\right) \nonumber\\ &= \left(\ln d\right)^2\frac{\left(1+\epsilon\right)^2}{4\cosh^2\left(\frac{\epsilon}{2}\ln d\right)}\;. \end{align} \tag{ C4 }$

The leading term of the asymptotic energy variance, for large dimension $D-1 = d$ is given, for $\epsilon\rightarrow 0$ , by $(\ln d)^2/4$ , i.e. recovering the results of [35] (see also main text, equation (2) and section 2.1). Moreover, from expression (C4) we see immediately that, in order to keep such scaling, the denominator $4\cosh^2\left(\frac{\epsilon}{2}\ln d\right)$ must not grow with d, i.e. $\epsilon\ln d$ must be bounded. This implies that the relative noise tolerance of the energy gap is given by

$\begin{align} \epsilon \propto \frac{1}{\ln d} \sim \frac{1}{N}\;, \end{align} \tag{ C5 }$

where we used that, in the case of N constituents the dimension of the system is exponential in N. That is, the relative error ε should scale as $1/N$ . Given $\ln d\propto N$ , this is equivalent to the absolute error being bounded by a constant

$\begin{align} E-\ln d \sim \mathcal{O}\left(1\right)\;. \end{align} \tag{ C6 }$
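The closed form (C4) and the resulting tolerance can be checked with a few lines of Python. This is a minimal sketch at β = 1; the function names and the sample values of N and ε are illustrative assumptions. It compares a fixed relative error with one that shrinks as $1/\ln d$.

```python
import math

def variance_direct(d, eps):
    """E^2 p0 (1 - p0) for a ground state plus d levels at E = (1 + eps) ln d."""
    E = (1.0 + eps) * math.log(d)
    x = math.exp(math.log(d) - E)          # d * e^{-E} = d^{-eps}
    p0 = 1.0 / (1.0 + x)
    return E ** 2 * p0 * (1.0 - p0)

def variance_closed(d, eps):
    """Closed form of equation (C4)."""
    L = math.log(d)
    return L ** 2 * (1.0 + eps) ** 2 / (4.0 * math.cosh(0.5 * eps * L) ** 2)

for n in (10, 20, 40):
    d = 2 ** n - 1
    L = math.log(d)
    ideal = L ** 2 / 4.0
    fixed = variance_closed(d, 0.5) / ideal        # eps fixed: ratio decays with N
    scaled = variance_closed(d, 1.0 / L) / ideal   # eps ~ 1/ln d: ratio stays O(1)
    print(n, f"{fixed:.2e}", f"{scaled:.3f}")
```

With a fixed ε the variance falls far below $(\ln d)^2/4$ as N grows, whereas for $\epsilon \propto 1/\ln d$ the ratio remains of order one, as stated in (C5) and (C6).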

C.1.2. Single eigenstate shift and bandwidth tolerance

We now analyse the case in which the highly degenerate first excited level splits into separate energies. That is, consider a D-dimensional Hamiltonian with a single ground state and $d\equiv D-1$ levels contained in a bandwidth δ, i.e. (w.l.o.g. we can shift the ground-state energy to be negative, and the degenerate band to be centered around 0)

$\begin{align} H = -E \vert 0\rangle \langle 0\vert +\sum_{i = 1}^d E_i \vert i\rangle \langle i\vert \quad \text{with} \quad -\delta \unicode{x2A7D} E_i \unicode{x2A7D} \delta \; \forall i \text{ and } E \simeq \ln d\;. \end{align} \tag{ C7 }$

Computing the energy variance, we obtain

$\begin{align} \langle H^2\rangle - \langle H \rangle^2 & = E^2 p_0+\sum_i E_i^2 p_i -\left( -E p_0+\sum_i E_i p_i\right)^2 \nonumber\\ & = E^2 p_0\left(1-p_0\right) + \sum_i E_i^2 p_i - \left(\sum_i E_i p_i\right)^2 + 2 E p_0 \sum_i E_i p_i = E^2 p_0\left(1-p_0\right)+ A+B\;, \end{align} \tag{ C8 }$

with

$\begin{align} A = & \sum_i E_i^2 p_i - \left(\sum_i E_i p_i\right)^2\;, \end{align} \tag{ C9 }$

$\begin{align} B = & \, 2 E p_0 \sum_i E_i p_i\;. \end{align} \tag{ C10 }$

Notice now that the term A is always non-negative, while the first term, $E^2 p_0(1-p_0)$ , is the leading term in the degenerate model, in which (cf main text and [35])

$\begin{align} E\sim\ln d\propto N\;,\quad p_0\sim \frac{1}{2}\;,\quad \Delta^2 H\sim \frac{\left(\ln d\right)^2}{4}\propto N^2\;. \end{align} \tag{ C11 }$

It is then easy to show that, similarly to (C4), as long as the E_i levels are small and do not scale with the dimension, the optimal N² scaling is preserved. Indeed, $A\unicode{x2A7E} 0$ , while B, for $-\delta \unicode{x2A7D} E_i\unicode{x2A7D} \delta$ , is bounded as

$\begin{align} |B|\unicode{x2A7D} 2 E\delta \sim \mathcal{O}\left(N\right)\;. \end{align} \tag{ C12 }$

It follows that the leading term, $E^2p_0(1-p_0)\sim \mathcal{O}(N^2)$ , remains dominant and the heat capacity achieves the N² scaling, as long as $p_0(1-p_0)$ remains bounded away from zero. This is guaranteed by the fact that

$\begin{align} p_0 = \frac{1}{1+\sum_i e^{-\left(E+E_i\right)}} = \dfrac{1}{1+d^{-1}\sum_i e^{-E_i}}\;, \end{align} \tag{ C13 }$

and therefore

$\begin{align} \frac{1}{1+e^{\delta}}\unicode{x2A7D} p_0\unicode{x2A7D} \frac{1}{1+e^{-\delta}}\;. \end{align} \tag{ C14 }$
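The bounds of this subsection can be verified on a randomly split band. The sketch below assumes d = 1024 levels drawn uniformly in $[-\delta,\delta]$ with δ = 0.3, a ground state at $-E$ with $E = \ln d$, and β = 1; all numerical values are illustrative. Rather than fixing a sign convention for the cross term, it is obtained as the difference between the exact variance and the other two contributions.

```python
import math
import random

random.seed(2)
d = 1024
delta = 0.3
E = math.log(d)                      # ground state sits at -E, as in equation (C7)
levels = [random.uniform(-delta, delta) for _ in range(d)]

# Boltzmann weights at beta = 1, rescaled so that the ground state has weight 1.
w = [math.exp(-(E + Ei)) for Ei in levels]
z = 1.0 + sum(w)
p0 = 1.0 / z
p = [wi / z for wi in w]

mean_band = sum(Ei * pi for Ei, pi in zip(levels, p))
A = sum(Ei ** 2 * pi for Ei, pi in zip(levels, p)) - mean_band ** 2
leading = E ** 2 * p0 * (1.0 - p0)

# Exact variance of the full spectrum, for comparison.
energies = [-E] + levels
probs = [p0] + p
mean = sum(e * q for e, q in zip(energies, probs))
var = sum(q * (e - mean) ** 2 for e, q in zip(energies, probs))
cross = var - leading - A            # the term called B in the text

print(f"p0 = {p0:.4f}, bounds = [{1 / (1 + math.exp(delta)):.4f}, "
      f"{1 / (1 + math.exp(-delta)):.4f}]")
print(f"leading = {leading:.3f}, A = {A:.4f}, |B| = {abs(cross):.4f}, "
      f"2*E*delta = {2 * E * delta:.3f}")
```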

C.2. Consequences for the Star and Star-chain

In the above subsection we estimated the noise tolerance of the energy spectrum of the degenerate Hamiltonian (9) in order for the heat capacity to be close to its optimal value and maintain the Heisenberg-like scaling $\propto N^2$ . The results indicate that the error in the spectral engineering should be constant while N (and therefore the dimension D) grows. In terms of relative precision, as the optimal spectrum has a first excited gap $E\propto N$ , this means a $1/N$ relative precision in the engineering of the spectrum around the optimal values.

However, the spectrum of the Hamiltonian is a function of its parameters $\{h_i,J_{ij}\}$ (3). In this subsection, we analyse what precision is needed in our main models, $H_\textrm{Star}$ (13) and $H_\text{Star-chain}$ (20) and how the estimation of the noise tolerance is reflected in the Hamiltonian parameters.

C.2.1. Reduction to $H_\textrm{Star}$

First, we note that it suffices to estimate the noise tolerance in the Star model only, because there is an exact mapping between the Star-chain and the Star model in the limit of strong couplings −J. We allow a small relaxation of the Star-chain model, i.e. we assume different magnetic fields a_α on the $\sigma^z_\alpha$ spins, which is useful in the following analytical derivation but does not significantly change the spectral properties of the model. That is, given

$\begin{align} H_{\text{Star-chain}\left[N = n\left(m+1\right)\right]}\left(a_\alpha,J,b\right): = \sum_\alpha a_\alpha \sigma^z_\alpha + J \sum_\alpha \sigma^z_\alpha \sigma^z_{\alpha+1} + b \sum_{\alpha,i} \left(\sigma^z_\alpha+\unicode{x1D7D9}\right) \sigma^z_{\alpha,i}\;, \end{align} \tag{ C15 }$

in the limit of large −J, as explained in the main text, only configurations in which $\sigma^z_\alpha = \sigma^z_{\alpha^{^{\prime}}}$ for all $\alpha,\alpha^{^{\prime}}$ are allowed, and therefore the effective Hamiltonian spectrum, up to an irrelevant global shift, becomes

$\begin{align} H_{\text{Star-chain}\left[N = n\left(m+1\right)\right]}\left(a_{\alpha},J\rightarrow -\infty,b\right) = h \tilde{\sigma}^z + b \sum_{\alpha,i} \left(\tilde{\sigma}^z+\unicode{x1D7D9}\right) \sigma^z_{\alpha,i}\;, \end{align} \tag{ C16 }$

where $\tilde{\sigma}^z$ is a formal spin that has value +1 when $\sigma^z_\alpha = +1 \;\forall\alpha$ (similarly for −1), and $h = \sum_\alpha a_\alpha$ . The above (C16) formally coincides with the Star model (13)

$\begin{align} H_{\text{Star}\left[N = nm+1\right]}\left(a,b\right) &: = a \,\sigma_1^z + b \sum_{i = 2}^{nm+1} \left( \sigma_1^z + \unicode{x1D7D9}\right) \sigma_i^z \;, \end{align} \tag{ C17 }$

if one identifies $h\rightarrow a$ . The mapping thus preserves the parameter b, while $h := \sum_\alpha a_\alpha$ of the Star-chain gets mapped to a in the above equation. One could therefore take $a_\alpha = 0$ for all α except one, $h = a_{\bar{\alpha}}$ , such that the mapping between the two models is complete and the parameterization is formally the same by mapping $a_{\bar{\alpha}}\rightarrow a$ .

C.2.2. Parametric scaling and noise-tolerance of $H_\textrm{Star}$ in the optimal-degeneracy configuration

We thus consider here the $H_\textrm{Star}$ Hamiltonian (13)

$\begin{align} H_{\text{Star}\left[N\right]}\left(a,b\right) &: = a \,\sigma_1^z + b \sum_{i = 2}^{N} \left( \sigma_1^z + \unicode{x1D7D9}\right) \sigma_i^z \;, \end{align} \tag{ C18 }$

As mentioned in the main text, choosing $b\unicode{x2A7E} 0$ and $b(N-3)\unicode{x2A7D} a \unicode{x2A7D} b(N-1)$ ensures the presence of a single ground state at energy E₀, a $2^{N-1}$ -degenerate level at energy $E_\textrm{deg}$ and a second excited, (N − 1)-degenerate level at energy E₁, with

$\begin{align} E_0 \unicode{x2A7D} & E_\textrm{deg} \unicode{x2A7D} E_1\;, \end{align} \tag{ C19 }$

$\begin{align} E_0 = & a-2b\left(N-1\right)\;, \end{align} \tag{ C20 }$

$\begin{align} E_\textrm{deg} = & -a\;, \end{align} \tag{ C21 }$

$\begin{align} E_1 = & a-2b\left(N-3\right)\;. \end{align} \tag{ C22 }$
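The level structure in equations (C19)-(C22) can be confirmed by enumerating all spin configurations of a small Star model. The values N = 5, a = 3 and b = 1 below are illustrative integers chosen so that $b(N-3) < a < b(N-1)$ and all energies are exact; the function name is an assumption.

```python
import itertools
from collections import Counter

def star_levels(a, b, n):
    """Map energy -> degeneracy for H = a s_1 + b * sum_{i>=2} (s_1 + 1) s_i."""
    counts = Counter()
    for s in itertools.product((-1, 1), repeat=n):
        energy = a * s[0] + b * sum((s[0] + 1) * si for si in s[1:])
        counts[energy] += 1
    return counts

n, a, b = 5, 3, 1
levels = star_levels(a, b, n)
for energy in sorted(levels):
    print(energy, levels[energy])
```

For these values the three lowest levels are −5, −3 and −1, with degeneracies 1, 16 and 4, matching $E_0$, $E_\textrm{deg}$ and $E_1$ respectively.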

The optimal degeneracy of $2^{N-1}+N-1$ is reached when $a = b(N-3)$ , but this is not necessary for the model to achieve the N² scaling of the heat capacity $\mathcal{C}$ . For simplicity, consider the choice $a = b(N-3)$ . The first excited gap is, in this case,

$\begin{align} E_1-E_0 = E_\textrm{deg}-E_0 = 4b\;. \end{align} \tag{ C23 }$

This means that, for such choice of parameters, in the asymptotic limit of large N, one has $4b\sim N \ln 2$ , and consequently $a = b(N-3)\sim N(N-3)\frac{\ln 2}{4}$ . That is, b has a linear scaling in N and a has a quadratic scaling in N. This happens even if we relax the assumption of $a = b(N-3)$ . In that case, it remains valid that

$\begin{align} 4b = E_1-E_0 \unicode{x2A7E} E_\textrm{deg}-E_0 \sim N \ln 2\;, \end{align} \tag{ C24 }$

therefore b scales at least linearly in N and a at least quadratically, as it satisfies $b(N-3)\unicode{x2A7D} a \unicode{x2A7D} b(N-1)$ .

For what concerns the parametric error-tolerance for a and b in $H_\textrm{Star}$ , notice that we estimated above the gap-error tolerance, which turns out to be constant (cf appendix C.1); that is, one should have

$\begin{align} 2b\left(N-1\right)-2a = E_\textrm{deg}-E_0\sim N\ln 2 + \delta \end{align} \tag{ C25 }$

with $|\delta| = \mathcal{O}(1)$ bounded by a constant. From the above expression it is easy to see that, if treated as independent, b can have an error $\delta_b\sim \mathcal{O}(\frac{1}{N})$ , while for a the admitted error is $\delta_a\sim \mathcal{O}(1)$ . Since $b\sim\mathcal{O}(N)$ and $a\sim\mathcal{O}(N^2)$ , it follows that the relative errors for both a and b are inversely quadratic in N,

$\begin{align} \frac{\delta_a}{a}\sim \mathcal{O}\left(N^{-2}\right)\;, \quad \frac{\delta_b}{b}\sim \mathcal{O}\left(N^{-2}\right)\;. \end{align} \tag{ C26 }$

C.2.3. Noise-tolerance in the degeneracy-suboptimal configurations

In the subsection above, we considered the case in which the Star model (C18) is forced into its optimal configurations, featuring an exponentially-degenerate first excited level, i.e. $E_0\unicode{x2A7D} E_\textrm{deg}\unicode{x2A7D} E_1$ . We saw that this imposes a linear scaling on b and a quadratic scaling on a, and a relative error of order $\mathcal{O}(N^{-2})$ on both parameters. However, it is possible to show, as we do in appendix D.1, that (slightly) suboptimal solutions exist, featuring a bounded value of b $\forall N$ , while still achieving quadratic scaling of $\mathcal{C}$ . In fact, these solutions can have $\mathcal{C}$ arbitrarily close to its optimal value $\mathcal{C}^\textrm{Star}_\textrm{max}$ (19). While referring the reader to appendix D.1 for the details, it is enough for our purposes to notice that, in such solutions, b can take any finite value larger than a certain threshold $b_\textrm{tresh}\sim \ln 2$ , while the gap between $E_\textrm{deg}$ and E₀ still approximates the optimal value

$\begin{align} E_\textrm{deg}-E_0 = 2b\left(N-1\right)-2a\sim N\ln 2\;. \end{align} \tag{ C27 }$

The exact finite value of b is not important in this case, and can be taken as given. It follows that in such configurations $b\sim\mathcal{O}(1)$ , while $a\sim (N-1)(b-\ln 2/2)$ scales linearly in N. Consequently we can obtain the error that is admitted on a in these configurations. It follows, given equation (C27) and the fact that the optimal gap has a fixed bandwidth tolerance $\mathcal{O}(1)$ (cf appendix C.1), that a similar noise scaling applies to a, i.e.

$\begin{align} a\sim \mathcal{O}\left(N\right)\;, \quad \delta a\sim \mathcal{O}\left(1\right)\;, \quad \frac{\delta a}{a}\sim \mathcal{O}\left(N^{-1}\right)\;. \end{align} \tag{ C28 }$

C.2.4. Subtler sources of parameter-noise

Finally, notice that in the implementation of $H_\textrm{Star}$ a more general error could arise. That is, the actual tuning of the parameters of the generic spin Hamiltonian (3)

$\begin{align} H = \sum_i^N h_i \sigma^{z}_i + \sum_{i < j}^N J_{ij} \sigma^{z}_i \sigma^{z}_j \end{align} \tag{ C29 }$

to be converted into $H_\textrm{Star}$ assumes all $J_{ij} = 0$ for i > 1 and j > 1. Moreover, assuming (realistically) these contributions to be null, the resulting Hamiltonian is

$\begin{align} H_\textrm{Star-noisy}\left(a,\vec{b}^{\left(1\right)},\vec{b}^{\left(2\right)}\right) = a\sigma^z_1+\sum_{i = 2}^N b_i^{\left(1\right)}\sigma_i^z + \sum_{i = 2}^N b_i^{\left(2\right)}\sigma_i^z\sigma_1^z\;. \end{align} \tag{ C30 }$

The Star model assumes $b^{(1)}_i = b^{(2)}_i: = b$ . Noise in the couplings might however affect this constraint. The main consequence would be a splitting of the level $E_\textrm{deg}$ due to the fact that the configurations with $\sigma^z_1 = -1$ would have a binomial spectrum

$\begin{align} E_\textrm{deg-noisy} = -a+\sum_{i = 2}^N \sigma_i^z \left(b_i^{\left(1\right)}-b_i^{\left(2\right)}\right)\;. \end{align} \tag{ C31 }$

The bandwidth splitting of $E_\textrm{deg}$ is therefore characterized by

$\begin{align} \big| \sum_{i = 2}^N b_i^{\left(1\right)}-b_i^{\left(2\right)} \big| \lesssim \mathcal{O}\left(1\right) \end{align} \tag{ C32 }$

where the allowed constant bandwidth was derived above in appendix C.1. It follows that, in general, the error on each spin $i\unicode{x2A7E} 2$ should be of order $\mathcal{O}(N^{-1})$ , i.e.

$\begin{align} b^{\left(-\right)}_i: = |b_i^{\left(1\right)}-b_i^{\left(2\right)}| \lesssim \mathcal{O}\left(N^{-1}\right)\;. \end{align} \tag{ C33 }$
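The splitting described by equation (C31) can be made concrete with a small enumeration. In the sketch below, N = 6, a = 4, b = 1 and the noise amplitude are hypothetical values; the code lists the energies of all configurations with $\sigma_1^z = -1$ for the noisy Hamiltonian (C30) and compares their spread around −a with $\sum_i |b_i^{(1)}-b_i^{(2)}|$.

```python
import itertools
import random

random.seed(3)
n = 6
a, b = 4.0, 1.0                  # illustrative values
noise = 0.05                     # hypothetical amplitude of the coupling noise
b1 = [b + random.uniform(-noise, noise) for _ in range(n - 1)]
b2 = [b + random.uniform(-noise, noise) for _ in range(n - 1)]

def noisy_star_energy(s):
    """Energy of equation (C30) for a configuration s = (s_1, ..., s_N)."""
    s1, rest = s[0], s[1:]
    return (a * s1
            + sum(x * si for x, si in zip(b1, rest))
            + sum(y * si * s1 for y, si in zip(b2, rest)))

# All configurations with s_1 = -1: the formerly degenerate level at -a.
band = [noisy_star_energy((-1,) + rest)
        for rest in itertools.product((-1, 1), repeat=n - 1)]
half_width = sum(abs(x - y) for x, y in zip(b1, b2))

print(f"band spans [{min(band) + a:+.4f}, {max(band) + a:+.4f}] around -a")
print(f"sum_i |b1_i - b2_i| = {half_width:.4f}")
```

Since the band edges are reached when every outer spin aligns with the sign of its own mismatch, keeping the total width of order one with N − 1 contributing spins requires each individual mismatch to be of order $1/N$.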

For what concerns

$\begin{align} b^{\left(+\right)}_i: = |b_i^{\left(1\right)}+b_i^{\left(2\right)}| = \mathcal{O}\left(N\right)\; \end{align} \tag{ C34 }$

its scaling and relative error tolerance are the same as for b, that is, (C26) for the optimal-degeneracy case of appendix C.2.2, or 'irrelevant' for the suboptimal configurations discussed above in appendix C.2.3.

Appendix D: Analytics for the Star model and Star-chain model

In this appendix we provide additional analytics regarding the two main models presented in the main text, i.e. the Star model (13) and the Star-chain (20).

D.1. Partition function for the Star model

Given the energies and degeneracies indicated in section 3.1, we can exactly compute the partition function $Z = {\textrm{Tr}}[e^{-\beta H}]$ for the Star model (13),

$\begin{align} H_{\text{Star}\left[N\right]}\left(a,b\right) &: = a \,\sigma_1^z + b \sum_{i = 2}^N \sigma_i^z \left(\unicode{x1D7D9} + \sigma_1^z\right) \;, \end{align} \tag{ D1 }$

The partition function $Z = \sum_i e^{-\beta E_i}$ is

$\begin{align} Z_\text{Star} = \sum_{\vec{\sigma}^z}e^{-\beta H_\textrm{Star}\left[\vec{\sigma^z}\right]} = e^{-\beta a}\left(e^{2\beta b}+ e^{-2\beta b}\right)^{N-1} + 2^{N-1}e^{\beta a}, \end{align} \tag{ D2 }$

where the first term corresponds to the binomial part of the spectrum (i.e. for $\sigma_1^z = 1$ ), while the second term corresponds to the $2^{N-1}$ -fold degeneracy obtained for $\sigma_1^z = -1$ . The above expression can be rewritten as

$\begin{align} Z_\text{Star} = 2^{N-1}\left(e^{-\beta a} \cosh\left(2\beta b\right)^{N-1}+e^{\beta a}\right)\;, \end{align} \tag{ D3 }$

and can be used to efficiently compute relevant quantities such as the average energy and the free energy, following standard statistical mechanics. In particular, the average energy is given by

$\begin{align} \langle H\rangle_\beta = \frac{\sum_{\vec{\sigma}^z} H\left[\vec{\sigma^z}\right] e^{-\beta H\left[\vec{\sigma^z}\right]}}{Z} = -\frac{\partial}{\partial\beta}\ln Z\;. \end{align} \tag{ D4 }$

Similarly the heat capacity, or energy variance, is given by

$\begin{align} \Delta_\beta^2 H = \frac{\partial^2}{\partial\beta^2}\ln Z\;, \end{align} \tag{ D5 }$

which for the Star model can be expressed analytically by substituting (D2)

$\begin{align} \frac{\begin{array}{l}4b^2 \left(N-1\right)\cosh\left(2 b\right)^N+ 2e^{2 a}\cosh\left(2 b\right) \left(a^2-b^2\left(N-1\right)\left(N-3\right)+\left(a^2+b^2\left(N-1\right)^2\right)\cosh\left(4 b\right)\right.\\ \quad \left.-\ 2ab\left(N-1\right)\sinh\left(4 b\right)\right)\end{array}}{\cosh\left(2 b\right)^{2-N}\left(e^{2 a}\cosh\left(2 b\right)+\cosh\left(2 b\right)^N\right)^2} \end{align} \tag{ D6 }$

in temperature units where β = 1.
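The closed forms (D3) and (D6) can be cross-checked against an explicit enumeration of the $2^N$ configurations for small N. The following is a minimal sketch at β = 1; the function names and the parameter values are illustrative assumptions.

```python
import itertools
import math

def star_energies(n_spins, a, b):
    """All 2^N energies of H_Star = a*s1 + b*(1 + s1)*sum_{i>=2} s_i, cf. (D1)."""
    return [
        a * s[0] + b * (1 + s[0]) * sum(s[1:])
        for s in itertools.product((1, -1), repeat=n_spins)
    ]

def z_closed(n_spins, a, b):
    """Closed-form partition function (D3) at beta = 1."""
    return 2 ** (n_spins - 1) * (
        math.exp(-a) * math.cosh(2 * b) ** (n_spins - 1) + math.exp(a)
    )

def variance_closed(n_spins, a, b):
    """Closed-form energy variance (D6) at beta = 1."""
    n1 = n_spins - 1
    c2, c4, s4 = math.cosh(2 * b), math.cosh(4 * b), math.sinh(4 * b)
    numerator = 4 * b**2 * n1 * c2**n_spins + 2 * math.exp(2 * a) * c2 * (
        a**2 - b**2 * n1 * (n_spins - 3)
        + (a**2 + b**2 * n1**2) * c4
        - 2 * a * b * n1 * s4
    )
    denominator = c2 ** (2 - n_spins) * (math.exp(2 * a) * c2 + c2**n_spins) ** 2
    return numerator / denominator

def brute_force(n_spins, a, b):
    """Partition function and energy variance by explicit enumeration."""
    energies = star_energies(n_spins, a, b)
    weights = [math.exp(-e) for e in energies]
    z = sum(weights)
    mean = sum(e * w for e, w in zip(energies, weights)) / z
    mean_sq = sum(e * e * w for e, w in zip(energies, weights)) / z
    return z, mean_sq - mean**2

# Arbitrary test point (an assumption made only for this check).
z_num, var_num = brute_force(6, a=0.7, b=0.4)
```

Comparing `z_num` with `z_closed(6, 0.7, 0.4)` and `var_num` with `variance_closed(6, 0.7, 0.4)` gives agreement to numerical precision, while the closed forms remain usable at values of N where enumeration is infeasible.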

D.2. Statistics of the energy levels, exponential suppression above the degeneracy, and slightly suboptimal configurations with better parameter scaling

In this section we analyze the statistics of the energy levels of the Star model. As we argued in appendix C.2.1, the Star-chain model becomes equivalent to the former in the limit of large $|J|$ .

The probability of a given energy outcome from a Gibbs state is, in temperature units β = 1, given by

$\begin{align} P\left(E\right) = \frac{\sum_i e^{-E_i}\delta\left(E-E_i\right)}{\sum_j e^{-E_j}} = \frac{\sum_i e^{-E_i}\delta\left(E-E_i\right)}{Z}\;, \end{align} \tag{ D7 }$

Z being the partition function, which for the Star model is, from (D3), in temperature units β = 1,

$\begin{align} Z_\textrm{Star} = 2^{N-1}\left(e^{-a} \cosh\left(2 b\right)^{N-1}+e^{a}\right)\;. \end{align} \tag{ D8 }$

Without loss of generality, we can shift all energies so that $E_\textrm{deg} = 0$ . Given that $E_\textrm{deg} = -a$ , this is equivalent to multiplying the partition function by a factor $e^{-a}$ . In this case the spectrum reduces to

$\begin{align} E_0 = & 2a-2b\left(N-1\right)\;, & \text{degeneracy }&1\;, \end{align} \tag{ D9 }$

$\begin{align} E_\textrm{deg} = & 0\;, & \text{degeneracy }&2^{N-1}\;, \end{align} \tag{ D10 }$

$\begin{align} E_k = &2a-2b\left(N-1-2k\right)\;, & \text{degeneracy }&\binom{N-1}{k}\;. \end{align} \tag{ D11 }$

and the partition function becomes

$\begin{align} Z^{^{\prime}} = 2^{N-1}\left(e^{-2a} \cosh\left(2 b\right)^{N-1}+1\right) = e^{2b\left(N-1\right)-2a} \left(1+e^{-4b}\right)^{N-1}+2^{N-1}\;. \end{align} \tag{ D12 }$

Notice that it is possible to identify three contributions to Z', i.e.

$\begin{align} Z^{^{\prime}} = Z^{^{\prime}}P\left(E_0\right)+Z^{^{\prime}}P\left(E_\textrm{deg}\right)+Z^{^{\prime}}\sum_{k\unicode{x2A7E} 1}P\left(E_k\right) \end{align} \tag{ D13 }$

corresponding respectively to the weight of the ground state, of the degenerate level, and of all the binomial levels above $E_0$ , i.e.

$\begin{align} Z^{^{\prime}}P\left(E_0\right) = & e^{2b\left(N-1\right)-2a} \end{align} \tag{ D14 }$

$\begin{align} Z^{^{\prime}}P\left(E_\textrm{deg}\right) = & 2^{N-1} \end{align} \tag{ D15 }$

$\begin{align} Z^{^{\prime}}\sum_{k\unicode{x2A7E} 1}P\left(E_k\right) = & e^{2b\left(N-1\right)-2a} \left(\left(1+e^{-4b}\right)^{N-1}-1\right)\;. \end{align} \tag{ D16 }$

We now show that, in the optimal configurations, all the statistics of the Star model resides in $P(E_0)$ and $P(E_\textrm{deg})$ , while the remaining $\sum_{k\unicode{x2A7E} 1}P(E_k)$ is exponentially suppressed, as long as b grows (at least) logarithmically. Moreover, in the optimal configurations, we know from the main text and from appendix D.1 that $2b(N-1)-2a\sim (N-1)\ln 2$ , and therefore in such a case one has $P(E_0)\sim P(E_\textrm{deg})\sim \frac{1}{2}$ . Finally, as long as b grows faster than $\ln N$ , the statistical contribution from the other levels $E_{k\unicode{x2A7E} 1}$ is suppressed. This can be seen from equation (D16) and

$\begin{align} \left(1+e^{-4b}\right)^{N-1} = \left(1+\frac{e^{-\left({4b}-{\ln \left(N-1\right)}\right)}}{N-1}\right)^{N-1}\;, \end{align} \tag{ D17 }$

which, for large N, tends to

$\begin{align} \left(1+\frac{e^{-\left({4b}-{\ln \left(N-1\right)}\right)}}{N-1}\right)^{N-1} \rightarrow e^{e^{-\left({4b}-{\ln \left(N-1\right)}\right)}}\;. \end{align} \tag{ D18 }$

For example, in the optimal configuration of the Star model, $4b\sim (N-1)\ln 2$ grows linearly in N and the whole contribution to the statistics from all levels $E_{k\unicode{x2A7E} 1}$ is suppressed exponentially as

$\begin{align} e^{e^{-\left({4b}-{\ln \left(N-1\right)}\right)}} -1 \rightarrow e^{\left(N-1\right)2^{-\left(N-1\right)}} - 1 \sim \frac{N-1}{2^{N-1}}. \end{align} \tag{ D19 }$
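The three weights in (D14)-(D16) are straightforward to evaluate numerically. The sketch below does so at β = 1 for a parameter choice that saturates both scalings exactly, $4b = (N-1)\ln 2$ and $2b(N-1)-2a = (N-1)\ln 2$; taking these as equalities (rather than asymptotic relations) is an assumption made for illustration, as is the function name `level_weights`.

```python
import math

def level_weights(n_spins, a, b):
    """Return (P(E_0), P(E_deg), sum_{k>=1} P(E_k)) from (D14)-(D16) at beta = 1."""
    n1 = n_spins - 1
    ground = math.exp(2 * b * n1 - 2 * a)
    degenerate = 2.0 ** n1
    # expm1/log1p keep the small tail accurate when exp(-4b) is tiny.
    tail = ground * math.expm1(n1 * math.log1p(math.exp(-4 * b)))
    z = ground + degenerate + tail
    return ground / z, degenerate / z, tail / z

# Illustrative parameter choice (an assumption): saturate both scalings exactly.
N = 11
b = (N - 1) * math.log(2) / 4
a = b * (N - 1) - (N - 1) * math.log(2) / 2
p_ground, p_deg, p_tail = level_weights(N, a, b)
```

For this choice `p_ground` and `p_deg` coincide and are each slightly below 1/2, while `p_tail` stays below $(N-1)/2^{N-1}$.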

D.3. Star-chain: partition function and spectrum

D.3.1. Partition function of the Star-chain

Remarkably, the Star-chain model at equilibrium can be exactly solved, in the sense that it is possible to compute analytically its partition function, using the transfer matrix method, which is used in standard solutions of the 1D Ising model [102]. Consider the Star-chain Hamiltonian

$\begin{align} H_\text{Star-chain} = a \sum_\alpha \sigma^z_\alpha + J \sum_\alpha \sigma^z_\alpha \sigma^z_{\alpha+1} + b \sum_{\alpha,i} \left(\sigma^z_\alpha+\unicode{x1D7D9}\right) \sigma^z_{\alpha,i}\;, \quad \alpha = 1,\dots,n\;, \ \ i = 1,\dots,m\;. \end{align} \tag{ D20 }$

The partition function is given by definition as

$\begin{align} Z_\text{Star-chain} = \sum_{\vec{\sigma}^z}e^{-\beta H_\text{Star-chain}\left[\vec{\sigma}^z\right]}, \end{align} \tag{ D21 }$

where $\vec{\sigma}^z$ is the $(n+nm)$ -long vector given by $\vec{\sigma}^z = \{\sigma^z_\alpha,\sigma^z_{\alpha,i}\}$ and is summed over all possible values ±1 of all the spins. It is therefore possible to separate the two classes of spins by defining

$\begin{align} \vec{\sigma}^z_{\left(1\right)} = \left\{\sigma^z_\alpha\right\}\;, \quad \vec{\sigma}^z_{\left(2\right)} = \left\{\sigma^z_{\alpha,i}\right\}\;. \end{align} \tag{ D22 }$

To compute the partition function we can consider

$\begin{align} Z_\text{Star-chain} = \sum_{\vec{\sigma}^z_{\left(1\right)}}\sum_{\vec{\sigma}^z_{\left(2\right)}}e^{-\beta H_\text{Star-chain}\left[\vec{\sigma}^z\right]}\;. \end{align} \tag{ D23 }$

Moreover for fixed $\vec{\sigma}^z_{(1)} = \{\sigma^z_\alpha\}$ , we can re-express the second sum as

$\begin{align} \sum_{\vec{\sigma}^z_{\left(2\right)}}e^{-\beta H_\text{Star-chain}\left[\vec{\sigma}^z\right]} = \prod_{\alpha}\sum_{\left\{\sigma^z_{\alpha,i}\right\}} e^{-\beta a\sigma^z_{\alpha}}e^{-\beta J \sigma^z_{\alpha}\sigma^z_{\alpha+1}} e^{-\beta b\sum_i \left(\sigma^z_\alpha+\unicode{x1D7D9}\right) \sigma^z_{\alpha,i}} : = \prod_{\alpha} W\left(\sigma^z_\alpha,\sigma^z_{\alpha+1}\right)\;, \end{align} \tag{ D24 }$

$\begin{align} \text{with }\quad W\left(\sigma^z_\alpha,\sigma^z_{\alpha+1}\right)\equiv \sum_{\left\{\sigma^z_{\alpha,i}\right\}} e^{-\beta a\sigma^z_{\alpha}}e^{-\beta J \sigma^z_{\alpha}\sigma^z_{\alpha+1}} e^{-\beta b\sum_i \left(\sigma^z_\alpha+\unicode{x1D7D9}\right) \sigma^z_{\alpha,i}}\;. \end{align} \tag{ D25 }$

We now notice that, when fixing $\vec{\sigma}^z_{(1)} = \{\sigma^z_\alpha\}$ it is possible to solve the sum over $\vec{\sigma}^z_{(2)} = \{\sigma^z_{\alpha,i}\}$ by using the fact that

$\begin{align} \sigma^z_\alpha = 1 &\Rightarrow \sum_{\left\{\sigma^z_{\alpha,i}\right\}}e^{-\beta b\left(\sigma^z_\alpha+1\right)\sum_i\sigma^z_{\alpha,i}} = \prod_i \sum_{\sigma^z_{\alpha,i}}e^{-\beta b\left(\sigma^z_\alpha+1\right)\sigma^z_{\alpha,i}} = \left(e^{2\beta b}+e^{-2\beta b}\right)^m\;, \end{align} \tag{ D26 }$

$\begin{align} \sigma^z_\alpha = -1 &\Rightarrow \sum_{\left\{\sigma^z_{\alpha,i}\right\}}e^{-\beta b\left(\sigma^z_\alpha+1\right)\sum_i\sigma^z_{\alpha,i}} = \prod_i \sum_{\sigma^z_{\alpha,i}}e^{-\beta b\left(\sigma^z_\alpha+1\right)\sigma^z_{\alpha,i}} = 2^m\;. \end{align} \tag{ D27 }$

It follows that $W(\sigma^z_{\alpha},\sigma^z_{\alpha+1})$ can be seen as a $2\times 2$ matrix (corresponding to the four elements $(\sigma^z_{\alpha},\sigma^z_{\alpha+1}) = (\pm 1, \pm 1)$ ),

$\begin{align} W\left(\sigma^z_{\alpha},\sigma^z_{\alpha+1}\right) = \begin{pmatrix} e^{-\beta a} e^{-\beta J} \left(e^{2\beta b}+e^{-2\beta b}\right)^m & e^{-\beta a} e^{\beta J} \left(e^{2\beta b}+e^{-2\beta b}\right)^m \\[3pt] e^{\beta a} e^{\beta J} 2^m & e^{\beta a} e^{-\beta J} 2^m \end{pmatrix}\;. \end{align} \tag{ D28 }$

Finally, notice that the partition function is given by (cf (D23) and (D24))

$\begin{align} Z_\text{Star-chain} = \sum_{\left\{\sigma^z_\alpha\right\}}\prod W\left(\sigma^z_\alpha,\sigma^z_{\alpha+1}\right) = {\textrm{Tr}}\left[W^n\right] = \lambda_+^n+\lambda_-^n\;, \end{align} \tag{ D29 }$

where $\lambda_+$ and $\lambda_-$ are the two eigenvalues of W, which are, in temperature units β = 1,

$\begin{align} \lambda_+ = & 2^{m-1}e^{-a-J} \left( e^{2a}+\cosh{\left(2b\right)}^m +\sqrt{4e^{2a+4J}\cosh{\left(2b\right)}^m+ \left(e^{2a}-\cosh{\left(2b\right)}^m\right)^2} \right)\;, \end{align} \tag{ D30 }$

$\begin{align} \lambda_- = & 2^{m-1}e^{-a-J} \left( e^{2a}+\cosh{\left(2b\right)}^m -\sqrt{4e^{2a+4J}\cosh{\left(2b\right)}^m+ \left(e^{2a}-\cosh{\left(2b\right)}^m\right)^2} \right)\;. \end{align} \tag{ D31 }$

Substituting these values into (D29) yields the analytical expression of the partition function for the Star-chain model.
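The transfer-matrix identity $Z_\text{Star-chain} = \textrm{Tr}[W^n]$ can be checked on a small instance by comparing it with a direct sum over all configurations. The sketch below builds W from (D28), assumes cyclic boundary conditions ($\sigma^z_{n+1}\equiv\sigma^z_1$), and uses illustrative function names and parameter values.

```python
import itertools
import math

def z_chain_brute(n, m, a, b, J, beta=1.0):
    """Partition function of (D20) by enumerating all 2^(n + n*m) configurations."""
    total = 0.0
    for centre in itertools.product((1, -1), repeat=n):
        for leaves in itertools.product((1, -1), repeat=n * m):
            energy = 0.0
            for k in range(n):
                energy += a * centre[k] + J * centre[k] * centre[(k + 1) % n]
                energy += b * (centre[k] + 1) * sum(leaves[k * m:(k + 1) * m])
            total += math.exp(-beta * energy)
    return total

def transfer_matrix(m, a, b, J, beta=1.0):
    """2x2 matrix W of (D28); rows/columns are ordered as sigma = +1, -1."""
    up = math.exp(-beta * a) * (2 * math.cosh(2 * beta * b)) ** m
    down = math.exp(beta * a) * 2 ** m
    return [
        [up * math.exp(-beta * J), up * math.exp(beta * J)],
        [down * math.exp(beta * J), down * math.exp(-beta * J)],
    ]

def z_chain_transfer(n, m, a, b, J, beta=1.0):
    """Tr[W^n], computed by repeated 2x2 matrix multiplication."""
    w = transfer_matrix(m, a, b, J, beta)
    power = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        power = [
            [sum(power[i][k] * w[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)
        ]
    return power[0][0] + power[1][1]

# Small instance: n = 3 central spins with m = 2 outer spins each (512 states).
z_brute = z_chain_brute(3, 2, a=0.3, b=0.4, J=0.2)
z_tm = z_chain_transfer(3, 2, a=0.3, b=0.4, J=0.2)
```

The two numbers agree to numerical precision, while the cost of the transfer-matrix evaluation grows only linearly in n instead of exponentially in $n(m+1)$.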

D.3.2. Spectrum of the Star-chain

In this subsection we derive analytical expressions for the energy spectrum of the Star-chain model. Consider again the Hamiltonian

$\begin{align} H_\text{Star-chain} = a \sum_\alpha \sigma^z_\alpha + J \sum_\alpha \sigma^z_\alpha \sigma^z_{\alpha+1} + b_1 \sum_{\alpha,i} \sigma^z_\alpha \sigma^z_{\alpha,i} + b_2 \sum_{\alpha,i} \sigma^z_{\alpha,i}\;. \end{align} \tag{ D32 }$

Here $\alpha = 1,{\ldots},n$ indexes the privileged spins, and $i = 1,{\ldots},m$ the 'subordinate spins' of each α-spin, for a total of $N = n(m+1)$ spins. We can also take the 'Star-choice' $b_1 = b_2 =: b$ , which guarantees degeneracy. We are left with

$\begin{align} J\mathcal{I}\left(\sigma^z_\alpha\right)+ a \sum_\alpha \sigma^z_\alpha + b \sum_{\alpha,i} \left(\sigma^z_\alpha+\unicode{x1D7D9}\right) \sigma^z_{\alpha,i} \qquad \text{with } \mathcal{I}\left(\sigma_\alpha^z\right): = \sum_\alpha \sigma_\alpha^z \sigma_{\alpha+1}^z\;. \end{align} \tag{ D33 }$

We separated the term $\mathcal{I}(\sigma_\alpha^z)$ as it is the one that breaks the permutation symmetry (for cyclic boundary conditions it has only cyclic symmetry). This term therefore requires all $2^n$ configurations of the α-spins to be resolved. The remaining part is permutationally symmetric and therefore has a spectrum that can be computed efficiently. Define

$\begin{align} n_\uparrow + n_\downarrow = n \qquad n_\uparrow: = \left\{\# \ \alpha-\text{spins up}\right\} \end{align} \tag{ D34 }$

It follows that

$\begin{align} a \sum_\alpha \sigma^z_\alpha = a \left(2n_\uparrow -n\right)\;. \end{align} \tag{ D35 }$

Then, for the remaining spins, notice that $n_\downarrow m = (n-n_\uparrow)m$ of them do not contribute to the energy, because their α-spin is down and the interaction has the form $\sum_{\alpha,i} (\sigma^z_\alpha+\unicode{x1D7D9}) \sigma^z_{\alpha,i}$ . For the same reason, all the remaining ones see an effective magnetic field equal to 2b. We therefore define

$\begin{align} \mu_\uparrow+\mu_\downarrow = m n_\uparrow \qquad \mu_{\uparrow}: = \left\{\# \text{subordinate spins up with their } \alpha-\text{spin up}\right\} \end{align} \tag{ D36 }$

It follows that

$\begin{align} b \sum_{\alpha,i} \left(\sigma^z_\alpha+\unicode{x1D7D9}\right) \sigma^z_{\alpha,i} = 2b \left(2\mu_\uparrow - mn_\uparrow\right)\;. \end{align} \tag{ D37 }$

Putting all the pieces together, we cannot easily coarse-grain the $2^n$ configurations of the $\sigma_\alpha^z$ , but we can simplify the remaining degeneracy by writing the energy levels as

$\begin{align} E = J\mathcal{I}\left(\sigma_\alpha^z\right) + a \left(2n_\uparrow -n\right) + 2b \left(2\mu_\uparrow - mn_\uparrow\right) \end{align} \tag{ D38 }$

with $n_\uparrow = 0,\dots,n$ fixed by the configuration $\sigma_\alpha^z$ , $\mu_\uparrow = 0,{\ldots},mn_\uparrow$ , and degeneracy (for each $\sigma_\alpha^z$ -configuration) equal to

$\begin{align} 2^{\left(n-n_\uparrow\right)m}\binom{m n_\uparrow}{\mu_\uparrow}\;. \end{align} \tag{ D39 }$
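The counting in (D38) and (D39) can be turned into a short routine that lists the Star-chain levels with their degeneracies while enumerating only the $2^n$ configurations of the α-spins. This is a sketch under stated assumptions: cyclic boundary conditions, both counters $n_\uparrow$ and $\mu_\uparrow$ running from 0 so that all-down configurations are included, and integer parameters in the example so that energies can be used as exact dictionary keys. The name `star_chain_levels` is illustrative.

```python
import itertools
import math
from collections import Counter

def star_chain_levels(n, m, a, b, J):
    """Energy -> degeneracy map built from (D38) and (D39), cyclic boundary conditions."""
    levels = Counter()
    for centre in itertools.product((1, -1), repeat=n):
        # Ising term I(sigma_alpha): the only part that needs the full configuration.
        ising = sum(centre[k] * centre[(k + 1) % n] for k in range(n))
        n_up = centre.count(1)
        for mu_up in range(m * n_up + 1):
            energy = J * ising + a * (2 * n_up - n) + 2 * b * (2 * mu_up - m * n_up)
            levels[energy] += 2 ** ((n - n_up) * m) * math.comb(m * n_up, mu_up)
    return levels

# Integer couplings (an illustrative assumption) keep the energy keys exact.
levels = star_chain_levels(n=3, m=2, a=1, b=1, J=1)
total_states = sum(levels.values())
```

As a consistency check, `total_states` equals $2^{n(m+1)}$, i.e. every configuration of the $N = n(m+1)$ spins is counted exactly once.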

Appendix E: All-to-All model

Consider the following N-spin Hamiltonian:

$\begin{align} H_\text{All}\left(h, J\right) : = -h \sum_{i = 1}^N \sigma_i^z -J \sum_{i < j}^N \sigma_i^z \sigma_j^z, \end{align} \tag{ E1 }$

where h and J are two coefficients. This model consistently emerged from numerical optimisation of $\mathcal{C}$ for small numbers of spins, up to N = 5. Moreover, the Hamiltonian (E1) is completely symmetric under permutations of the spins. This helps in expressing its spectrum as a function of the total number k of spins up, i.e. having $\sigma^z_i = +1$ (it follows that N − k spins are in the opposite configuration, $\sigma_i^z = -1$ ):

$\begin{align} E_k = h\left(N-2k\right) +\frac{J}{2}\left[ 4k\left(N-k\right) - N\left(N-1\right) \right], \end{align} \tag{ E2 }$

each level with degeneracy

$\begin{align} \text{deg}\left[E_k\right] = \binom{N}{k}\;. \end{align} \tag{ E3 }$

It follows that the partition function is given by

$\begin{align} Z_\text{all} = \sum_{k = 0}^N \binom{N}{k} e^{-\beta E_k}. \end{align} \tag{ E4 }$

From numerical optimization (cf appendix A.1), it appears that the optimal values of h and J that maximise $\mathcal{C}$ in this model satisfy the relation

$\begin{align} h = J. \end{align} \tag{ E5 }$

Under such hypothesis, the above expression (E2) for the energy levels can be written as

$\begin{align} \frac{E_k}{J} = -\frac{N\left(N+1\right)}{2}+2\left(k+1\right)\left(N-k\right) = \frac{E_{k = N}}{J}+2\left(k+1\right)\left(N-k\right)\;, \end{align} \tag{ E6 }$

which makes explicit that the ground state is $E_{k = N}$ and that all the levels above form a spectrum that is parabolic in k, with a first excited level corresponding to $k = N-1$ and k = 0, with total degeneracy $\text{deg}[E_{k = 0}]+\text{deg}[E_{k = N-1}] = N+1$ .
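The level structure (E2)-(E6) can be verified by brute force for small N. The sketch below enumerates all configurations of $H_\text{All}$ for N = 5 with h = J = 1 (an illustrative choice satisfying (E5)) and compares the resulting spectrum with the binomially degenerate levels; the function names are assumptions.

```python
import itertools
import math
from collections import Counter

def energy_all(spins, h, J):
    """Energy of H_All (E1) for a tuple of +/-1 spins."""
    pair_sum = sum(x * y for x, y in itertools.combinations(spins, 2))
    return -h * sum(spins) - J * pair_sum

def level_all(n_spins, k, h, J):
    """Energy (E2) of any configuration with k spins up."""
    # N(N-1)/2 is always an integer, so integer division is exact here.
    return h * (n_spins - 2 * k) + J * (
        2 * k * (n_spins - k) - n_spins * (n_spins - 1) // 2
    )

N, h, J = 5, 1, 1
# Spectrum by explicit enumeration of the 2^N configurations.
spectrum = Counter(energy_all(s, h, J) for s in itertools.product((1, -1), repeat=N))
# Spectrum from (E2) with the binomial degeneracies (E3).
formula = Counter()
for k in range(N + 1):
    formula[level_all(N, k, h, J)] += math.comb(N, k)

ground = min(spectrum)
first_excited = sorted(spectrum)[1]
```

For these values the ground state is non-degenerate at $E = -N(N+1)/2$ in units of J, and the first excited level collects the k = 0 and k = N − 1 configurations, with degeneracy N + 1.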
