Optimal thermometers with spin networks

The heat capacity C of a given probe is a fundamental quantity that determines, among other properties, the maximum precision in temperature estimation. In turn, C is limited by a quadratic scaling with the number of constituents of the probe, which provides a fundamental limit in quantum thermometry. Achieving this fundamental bound with realistic probes, i

Due to its generality and practical relevance, we consider in this work equilibrium thermometry [4].In this case, the probe is assumed to be well described by a thermal state at the temperature T that is being estimated.Then, the error ∆T of any measurement on the probe is bounded by [40,41]: where C is the heat capacity of the probe, and ν the number of repetitions of the experiment -see Sec.II for a precise definition of the quantities involved.Intuitively speaking, a high heat capacity ensures that the energy of the probe highly varies with T , thus enabling the detection of small temperature variations.An optimal probe for equilibrium thermometry is hence the one with the highest heat capacity.The ultimate limits to this problem were set by Correa et al. in Ref. [35] by finding the maximum C given an arbitrary Hamiltonian of dimension D. The spectrum of such an optimal probe consists in an effective two-level system, with a single ground state and an exponential degeneracy of the excited level.The resulting optimal heat capacity reads C opt ≈ (ln D) 2 /4.If we consider that the probe consists of N bodies of dimension d (hence D = d N ), then C opt becomes [35,39]: This expression shows a quadratic scaling with the number of constituents N , to be confronted with the typical extensive behaviour of the heat capacity (i.e.linear in N ).This quadratic scaling is reminiscent of the well-known Heisenberg limit in quantum metrology [42], although it should be realised that the advantage here arises due to the interacting nature of the probe's Hamiltonian, and not from the presence of entanglement in the probe.Reference [37] provides a specific N -spin interacting Hamiltonian that can saturate (2), which however requires N -body interactions.A natural question therefore arises: Q: Can we reach the ultimate limit (2) via realistic Hamiltonians, i.e. featuring two-body and local interactions?
A natural approach to address Q is to consider probes at the verge of a thermal phase transition, where the heat capacity can scale superextensively with N [43][44][45][46][47][48].Previous studies with spin systems close to criticality exemplify the potential of phase transitions for thermometry [45,46,48] but do not come close to the ultimate limit (2).For small values of N , proposals for optimal probes have also been considered with spin chains [37,49] or interacting fermionic systems [50].Yet, despite promising progress, none of the above approaches leads to a general answer to Q and hence to the possibility of approaching a quadratic precision in quantum thermometry.
To address Q, we consider as a platform a generic system of spins with two-body interactions, such as those currently programmable in quantum annealers.Their open system dynamics is starting to be studied [51][52][53][54][55], and they represent flexible physical devices with a high degree of control.More specifically, we consider a Hamiltonian of the form: where σ z i = ±1 is the i-th classical spin of the system 1 .We then maximise C over all control parameters h i and J ij (with different constraints on their locality and strength).To tackle the exponential complexity of this task, we use advanced numerical techniques, commonly employed in the Machine-Learning community, to discover ansatz for the form of optimal probes.We then combine these numerical ansatz with physical insights to analytically prove that C can display the quadratic scaling of Eq. ( 2), with a slightly worse prefactor that depends on the locality of the Hamiltonian (3), thus answering affirmatively Q.These results add on recent applications of Machine-Learning based techniques in the field of quantum thermodynamics [56][57][58][59][60][61][62][63], as well as in other domains, including protein folding [64], many-body problems [65][66][67], geosciences [68], algorithm discovery [69].
In Fig. 1, we illustrate the type of results obtained.The heat capacity of any spins system is upper-bounded by the fundamental bound C opt (red line, and Eq. ( 2) with d = 2), however the maximum heat capacity obtainable with N non interacting spins simply corresponds FIG. 1. Maximum heat capacity Cmax of spin-based thermometers.The red line corresponds to the mathematical bound C opt (2) on any system with dimension D = 2 N (for a formal definition, see Eq. ( 8)), which shows a quadratic ∝ N 2 scaling in terms of the number N of total spins employed.Our optimal spin-network architecture, the "Star model", provides the highest heat capacity for Hamiltonians of the form (3) when N ≥ 6, and can reach the mathematical bound C opt (2) in the large N limit.This is to be compared with the extensive ∝ N scaling of standard models, such as the 1D Ising chain.The green line delimits the region accessible with noninteracting spins, and simply corresponds to ∼ 0.44N , 0.44 being the maximum heat capacity of a single spin.
to N times the maximum heat capacity of a single spin (green line).The use of interactions can enhance C. Nevertheless, standard interacting spin-networks such as the 1D Ising model in the Figure (purple dots) show an extensive scaling of C max in the limit of large N , hence losing their advantage.In contrast, we find optimal spinnetwork architectures (3) that approximate C opt for all N , see e.g. the Star model as an example of the architectures discussed in the next sections (blue dots in Fig. 1).
The rest of the paper is structured as follows.In Sec.II we review equilibrium thermometry, and we analyse the fundamental properties of the optimal energy spectra for the maximization of C. In Sec.III we move to the case of physically realistic probes (3), we present the derivation and analysis of our optimal thermometer models, and then discuss their implementation and properties.In Sec.IV we describe other relevant models which we use for performance comparison.Finally in Sec.V we conclude and discuss future directions and applications of this work.The Appendix contains details of the numerical methods employed, technical analytical derivations, and complementary analysis of our results.The code written to perform the machine-learning based optimization will be available online when published (see the "Code availability" section).

II. EQUILIBRIUM THERMOMETRY AND PROPERTIES OF OPTIMAL SPECTRA
Let us consider a sample at some unknown temperature T , corresponding to the inverse temperature β = T −1 (herafter we set k B = 1 for simplicity).To assess β, we let the sample weakly interact with a probe described by its Hamiltonian H.After a sufficiently long time, it is assumed that the probe will reach a Gibbs state, fully determined by H and β: By measuring the energy of ρ β (H), it is possible to infer β (hence the temperature T ).Let us note that projective energy measurements were shown to be optimal for temperature estimation [35,41].In particular, the Cramer-Rao bound [70] specific to the case of temperature estimation [3,4] can be exploited to estimate minimal error ∆T .More precisely, for a number ν of identically and independently distributed (i.i.d.) repetitions of the experiment, ∆T has a mean square value that is bounded by Eq. ( 1), that is It is therefore clear that the maximum precision one can get in estimating the temperature T by measuring the energy of the probe at equilibrium with the sample is determined by the heat capacity C.This one is formally defined as the variation in mean energy of the probe per temperature change unit, i.e.

C(H,
In terms of the eigensystem {E i , |E i } of the probe's Hamiltonian H, the state populations of ρ β (H) read In the energy eigenbasis of the probe, it is easy to verify from Eq. ( 5) that the heat capacity is proportional to the energy variance of the Gibbs state: where D is the dimension of the Hilbert space.Such expression clarifies that the heat capacity only depends on the spectrum of the Hamiltonian H and inverse temperature β.It is important for the following to note the scale invariance C(λH, λ −1 β) = C(H, β), with λ ∈ R.This allows us to express all energies in units of β, and to simply refer to the heat capacity as a function of a adimensional Hamiltonian H := βH, as C( H, 1) = C(H, β).
In the following, we will omit the tilde and simply use adimensional units, writing C(H) := C(H, 1).We also emphasize that a global energy shift does not affect neither the Gibbs state nor the heat capacity as A. Optimal spectrum for equilibrium thermometry From Eq. ( 1), we see that an optimal probe for thermometry is the one with maximum heat capacity C. The maximization of C of a generic D-dimensional system at thermal equilibrium has been carried out in [35] assuming full-control on the Hamiltonian and its spectrum, The resulting optimal spectrum consists of a single ground state and a (D −1)-degenerate excited state, that is with an optimal gap E = x in temperature units that satisfies the transcendental equation e x = (D − 1)(x + 2)/(x−2).The corresponding heat capacity is C opt (D) = x 2 e x (D − 1)/(D − 1 + e x ) 2 [35].This expression gives in the asymptotic regime of large probes (D → ∞) x ln D, hence C opt (D) (ln D) 2 /4.For a probe made up of N constituents, each with local dimension d, so that D = d N , we recover the Heinsenberg-like scaling in Eq. ( 2).

B. Properties of optimal spectra
In order to understand the origin of the desired scaling C ∝ N 2 , we now discuss the relevant features of the spectrum Eq. ( 9) in its optimal configuration, as well as possible perturbations of it.This will be relevant for cases in which a physical realization (e.g. using Eq. ( 3)) can approximate Eq. ( 9), but not exactly.Specifically, we prove that a large class of spectra can exhibit the Heisenberg-like scaling ∝ N 2 of the heat capacity, when the following 3 properties are satisfied: P1: exponential degeneracy.The spectrum has a two-level structure with single ground state, and a first excited level that is exponentially degenerate (in N ), with a gap that can be tuned.
P2: bandwidth tolerance.The engineering of the effective two-level spectrum, and in particular of the bandwidth of the first excited level, can tolerate a relative precision of O(1/N ).
P3: tolerance to additional energy levels.The presence of other energy levels does not necessarily deteriorate the maximal value of C and its scaling.In particular: i) high energy levels (i.e.above the first excited level) do not decrease the maximal heat capacity, while ii) energy levels below the first excited have an exponentially small contribution to the heat capacity, provided that their total degeneracy is (at most) polynomial in N , and their gap to the ground level increases (at least) linearly in N .
A schematic representation of the class of spectra satisfying the above three properties is given in Fig. 2. We

FIG. 2. (Left)
The idealized model H deg (9).(Right) we prove that any Hamiltonian featuring a spectrum of the form respecting properties P1-P2-P3 (see details in text) can exhibit a ∝ N 2 scaling of the maximal heat capacity.
now provide an intuitive understanding of these properties.
The importance of the exponential degeneracy of the first excited state (P1) can be appreciated from the degenerate model Eq. ( 9) and its corresponding ground state probability for the Gibbs state (in units of β), p 0 = (1 + (D − 1)e −E ) −1 , which can be expressed as (10) For small energy gaps E and large D, the value of p 0 is ∼ 0, meaning that in the thermal state, almost all the population is spread evenly in the degenerate excited subspace.When the gap reaches E ∼ (ln(D − 1)), p 0 = 1 2 , while for larger values it increases to ∼ 1, and the excited levels become empty.The width of this transition is of order ∼ O(1), and it is the point where the system experiences the peak in heat capacity; in fact, for smaller (larger) values of E, the energy variance is suppressed exponentially, given that the whole population collapses to the excited subspace (ground state).At the peak of the heat capacity, approximately half of the population is in the ground state, and half is spread in the degenerate level.If the degeneracy (D − 1) = d N − 1 is exponential in N , the optimal gap is linear in N , and the resulting energy variance Eq. ( 7) scales quadratically.According to this observation, the exponential (in N ) degeneracy of the first excited level is the first main ingredient for a system to exhibiting such quadratic scaling of the heat capacity.Furthermore, we notice that at a formal level, the same scaling is obtained whenever D ∝ d N for some d > 1, which leads to P1.Notice however that any physical implementation of such a conceivably highly finetuned two-level probe will be susceptible to noise.The resulting deviation will cause a broadening of the ideally degenerate excited level into a band.In App.C, we prove that the optimal scaling of C is preserved as long as the error in the energy gap between the ground state and the first excited state (including the broadening of level band) is of order O(1).This is to be contrasted with an optimal gap E ∝ N that scales linearly, thus requiring a relative precision of 1/N in the engineering of the energy levels (P2).
Finally, it is possible to show that (the quadratic scaling of the heat capacity?) is preserved even in the presence of additional "undesired" energy levels, provided that property P3 is satisfied.
More precisely, consider two Hamiltonians, H This property guarantees that additional excess levels above the k 1 -degeneracy of H 1 can only increase the maximal heat capacity.As a consequence, as the system size grows, any model featuring an exponential degeneracy of the first excited level and a tunable gap will show the desired Heisenberg-like scaling of the heat capacity.The control over E, while keeping E α ≥ E ∀α, can easily be obtained, for example by rescaling all the parameters of H 1 or H 2 globally.For what concerns additional levels below the first excited, it is enough to notice that if their total number is of order O(N k ) for finite k, and their gap from the ground state energy is bounded between N K and N ln d (0 < K < ln d ), their total contribution to the variance (7) scales as O(N k+2 exp[−βN K]), and is therefore suppressed for large N .

III. OPTIMAL SPIN-NETWORK THERMOMETERS
We recall that, without any restriction on the possible interactions among the N spins, it is possible to generate the Hamiltonian Eq. ( 9) and to saturate the theoretical maximum value C opt of the heat capacity (see e.g.[37], where the authors make use of arbitrary N -body interactions).The question Q we address in this work is whether it is possible to achieve the optimal scaling C ∝ N 2 if we restrict ourselves to physically motivated 2-body Hamiltonians given by Eq. ( 3).In such spin-systems, we have D = 2 N , where N is the total number of spins, thus the ultimate limit Eq. (2) reads for large N .Below, we demonstrate that the answer to our main question is positive.We show that it is possible to design a thermal probe ("Star model", Sec.III A) consisting of N interacting spins with two-body interactions that approximates the maximum value C opt of the thermal sensitivity Eq. ( 12).We further prove that a thermal probe ("Star-chain model", Sec.III B) with two-body and local interactions can be designed with a heat capacity exhibiting the same scaling as Eq. ( 12) with a prefactor that can be made arbitrarily close to the Star model.Moreover, in Sec.III C we show that the Star-chain model can be realized on currently available quantum annealers.Finally, in Sec.III D we analyze the scaling of the Hamiltonian parameters in these configurations, and the effect of constraints on the absolute value of the parameters.

A. Star model
We now search for thermal probes, consisting of spin networks with two body interactions, that maximize the heat capacity.We maximized C(H) over the parameters h i and J ij employing Eq. ( 7) and constraining H to be of the form (3). Notice that such problem is numerically hard due to (i) its nonconvexity, (ii) the number of optimization parameters that scales quadratically in N , and (iii) the number of spin configurations that scales exponentially.As such, first attempts based on simpler techniques such or gradient descent with momentum [71] estimating the gradients with finite-differences, and the gradient-free covariance matrix adaptation evolution strategy [72], would get stuck in sub-optimal local maxima with a substantially lower C, not exhibiting the Heisenberg-like scaling.We thus decided to use tools commonly employed in Machine Learning, i.e. we implemented the optimization in PyTorch that allows us to compute the exact gradients of the negative heat capacity using backpropagation [73], and we used the Adam optimizer [74] (see App.A for details).
After repeating the optimization for different total numbers of spins N , a recurrent pattern emerges (cf.App.A and Fig. 3), corresponding to a "Star model" Hamiltonian of the form with a, b ∈ R, corresponding to a single spin (σ z 1 ) that is coupled uniformly to all the other ones.A representation of this Star model is shown in Fig. 3.The resulting spectrum has 2 main classes of eigenstates.The first class consists of N −1 k -degenerate evenly spaced states with energy  this the second term in (13) becomes null, independently of the value of all the other spins i = 2, . . ., N , and we get a 2 N −1 -degenerate excited state with energy That is, thanks to the simple topology and choice of the couplings in Eq. ( 13), the first spin σ z 1 acts as an "on-off" switch for the effective magnetic field on the remaining spins, generating an exponential degeneracy of the E deg level.The partition function of the Star model can be solved analytically, being the sum of the two partition functions corresponding to σ z 1 = ±1, i.e.
This expression can be used to efficiently compute all the relevant thermodynamic quantities of the model (cf.App.D 1).Moreover, it is easy to see that by choosing b > 0 and b(N − 3) ≤ a < b(N − 1), one ensures corresponding to a single ground state, and 2 N −1degenerate first excited level.By saturating b(N −3) = a, one gets  13) (for N = 9). 2 N −1 eigenvalues form a binomial spectrum (14), while the other 2 N −1 eigenvalues are completely degenerate (15).The gap between the ground energy and the exponentially degenerate level approximately coincides with the optimal gap of the ideal spectrum H deg (9) for D = 2 N −1 [35] (red line), as expected from the discussion in Sec.III A.
degeneracy for the first excited state2 (for a visual representation, see Fig. 4).Notice that property P3 (11) ensures that such a model can achieve at least the heat capacity C opt (2 N −1 ), that is In the asymptotic limit, we get which becomes indistinguishable from the theoretical bound C opt (2 N ), see Eq. ( 12) and shown in Fig. 1 and Fig. 9 below.In App.A 3, we provide a table with the optimal values of the Hamiltonian parameters a, b (see Eq. ( 13)) and the corresponding value of C

Star[N] max
given by numerical optimization.

B. Star-chain Model
As the Star model arises from an unconstrained numerical optimization of C for Hamiltonians of the form (3) (cf.above Sec.III A and App.A), we conjecture it to be the global optimum for such a class.However, the star-shaped connectivity of Eq. ( 13) (Fig. 3) cannot be scaled to arbitrarily large number of constituents as it has long-range interactions.This motivates us to restrict the star-shaped connectivity to short-range interactions only, given rise to the hereafter named "Star-chain model".Specifically, inspired by the Star model, we consider N = n(m + 1) spins as sketched in Fig. 5, described by the Hamiltonian Here α is the index identifying the central spin of each Star-like sub-unit (orange circles in the where A = e βa , B = cosh (2βb), and C = e βJ .From the spectral point of view, this model guarantees a 2 n ↓ m degeneracy for each energy level with n ↓ down α-spins.
In particular, when all the α-spins are down, i.e. σ z α = −1 ∀α, corresponding to a 2 mn degeneracy.Moreover, if the couplings J are negative and strong enough to force all the n α-spins to be the same (for a detailed analysis of the needed coupling strengths, see Sec.III D and App.C), one is left with two configurations only, σ α = ±1 ∀α.The case σ α = 1 corresponds to an evenly spaced, binomially distributed spectrum E = na + 2b α,i σ α,i , while the case σ α = −1 corresponds to E = −na with degeneracy 2 nm .It should be noticed that these configurations effectively lead to the same spectrum as the one of the Star model.More precisely, while the Star spectrum consists in 2 N −1 states with binomial spectrum ( 14) and other 2 N −1 states that are completely degenerate, see Eq. ( 15), the Star-chain has, in the limit of large −J, 2 mn states with a binomial spectrum, 2 mn degenerate states, and (2 n − 2)2 mn remaining arbitrarily high energy levels that can be neglected as justified earlier in this work.Property 2 (11) then ensures that the Star-chain model can exhibit a heat capacity at least as large of that of a system having 1 ground state and a 2 mn + mn-fold degenerate first excited level.This spectrum is achieved by choosing which leads to C Equation ( 24) makes clear how large values of m increase the achievable heat capacity, (see App.A 3 for the optimal values of C and the corresponding Hamiltonian parameters).For m = N −1, the Star-chain model coincides with the Star model (n = 1, cf.Figs. 3 and 5).Let us note that short-range interactions impose a maximum m, but this one only impacts the prefactor of the quadratic scaling.Hence, it does not change the quadratic scaling of C demonstrated with the Star-chain model.

C. Implementation in the Chimera graph
Quantum annealers are devices governed by programmable quantum spin Hamiltonians, therefore representing a natural platform to test our findings.Interestingly, the topology of the interactions of the Star-chain model with m = 3 (cf.Fig. 5) can be embedded into the Chimera graph (cf.Fig. 6) of the D-Wave annealing quantum processor [75].This means that, as from (24), a programmable spin network in the Chimera graph can reach at least ∼ m 2 (m+1) 2 = 9/16 of the ultimate bound C opt (2 N ) (12).Remarkably, numerical optimization of C for the Chimera model results indeed in the Star-chain model with m = 3 represented in Fig. 6 (see App.A 5).Notice also that there exists new architectures of the D-Wave annealers, such as the Pegasus graph [76][77][78], which can reach higher connectivities, and therefore higher values of m for which the Star-chain Hamiltonian can be embedded.Such optimal thermometer probes could be used, for example, to precisely measure the surrounding effective temperature of the annealer (to be compared with the cryostat temperature), and overall to gain a better understanding of the D-Wave annealer as an open quantum system [51,52,79,80].
Let us emphasize a specificity of both the Star and Star-chain models, that may become relevant for practical applications.For both models, it is enough to measure a single spin to perform temperature estimation.FIG. 6. Embedding of the Star-chain model for m = 3 (see Fig. 5) into the Chimera graph, which is used by D-Wave Systems [75].As in Fig. 5, orange circles represent the α-spins, coupled to each other through the blue lines, while black circles represent the (α, i)-spins coupled to the respective α-spin through the gray lines.Dotted lines represent unused couplings of the Chimera architecture (i.e.where Jij = 0).
In the regime of large N , the only relevant energy levels contributing to the Gibbs state are the ground level, and the first excited level (higher excited levels are exponentially suppressed in the statistics, cf.App.D 2 and previous discussions).We can distinguish between these two cases by simply measuring the value of σ z 1 for the Star model (13), or any of the σ z α in the Star-chain model (20).

D. Scaling and constraints on the strength of the interactions
While the results presented above are very promising, one challenging requirement of the optimal configurations is the strength of the interactions between the constituents.This one becomes increasingly demanding for large N .For instance, engineering the optimal spectrum for the Star model with a first-excited state degeneracy 2 N −1 +N −1 requires the scaling b ∝ N and a ∝ N 2 as N grows (see "unconstrained" dots in Fig. 7(a)), accompanied by a relative precision ∝ N −2 for both parameters (cf.Sec.III A and App.C).
However, as we show in App.D 1, there exist solutions that are mathematically sub-optimal but numerically indistinguishable in terms of C, with much more favorable scaling of the Hamiltonian parameters.In fact, even when limiting b to be bounded by a constant, it is possible to achieve the desired quadratic scaling of C, arbitrarily close to the optimal value Eq. (19).These solutions feature a finite b, whose precise value becomes irrelevant, and a linear scaling of a ∝ N , which admits a relative precision ∝ N −1 (see Apps.C, D 1 and "constrained" dots in Fig. 7(a)).Similarly, the Star-chain model features solutions in which the scaling of its a, b parameters is bounded, while J scales linearly (see Fig. 7(b)).As seen in Fig. 7, for reasonable sizes of the thermal probe up to 50 spins, these scaling induce Hamiltonian parameters of moderate strength, both for the Star and Star-chain model.Finally, it is possible to use the machine learning optimization method to maximize C over all possible Hamiltonians of the form Eq. ( 3) with the additional constraint for the parameters {h i , J ij } to be bounded (See App.A 2 for details).Our numerical optimization leads to configurations that are too complex and case-dependent to be discussed in generality.However, the resulting maximal heat capacities seem to indicate that a quadratic scaling is still possible under such constraints, see Fig. 8.

IV. COMPARISON TO ALTERNATIVE MODELS
In Fig. 9, we compare the maximum values of the heat capacity C for different models of spin Hamiltonians, as the number of spins N grows.The Star and Star-chain models show a quadratic scaling in N that eventually surpasses all standard models -such as the Ising model in 1D, as well as a model of uniform "all-to-all" interactions.The latter show instead the standard thermodynamic extensive scaling, i.e. linear in N , of the heat capacity.Below, we briefly describe each of the relevant alternative models to which our results obtained for the Star and Star-chain models have to be compared.
a. Ising Lattices.The 1D Ising model is arguably the simplest candidate for an interacting-spin thermom- etry probe.For N spins, it is defined by the Hamiltonian where we choose periodic boundary conditions σ N +1 ≡ σ 1 .The heat capacity for this model can be efficiently computed with standard techniques [81].Numerically maximization leads consistently to homogeneous interactions J i = J and local fields h i = h.As expected, an Ising chain probe will at most achieve a linear scaling in N of the heat capacity, as seen in Fig. 9.Note that a 2-dimensional Ising model can achieve a slightly higher scaling at criticality, i.e.C max ∝ N ln N [82,83], while the 3-dimensional Ising model has C max ∝ N 1.058 using critical scaling [84].b.All-to-all symmetric model Another relevant model for this work is a model with all-to-all interactions, completely symmetric under permutations.Its Hamiltonian takes then the form It describes a complete graph with homogeneous interactions J > 0 and local fields h > 0. Taking the systems' symmetries into account, we get the following N kdegenerate eigenenergies for k ≤ N up-spins: As shown in App.E, the all-to-all model shows a "large degeneracy" of the first excited level for small N .It is remarkable that for N ≤ 5, the all-to-all model appears to have the highest heat capacity among the models we investigated (see inset in Fig. 9) and consistently FIG. 9. Detailed performance of the optimal spin-based thermometers found in this work.Our optimal architectures, "Star model" and "Star-chain model" demonstrate a quadratic ∝ N 2 scaling of the maximal heat capacity C in terms of the number N of total spins employed.This is to be compared with the extensive ∝ N scaling of standard models, such as the 1D Ising chain, or the All-to-all model described in the text.For N ≥ 6, the Star model provides the highest heat capacity for Hamiltonians of the form (3) and can reach the mathematical bound C opt (8) (red line in the plot) in the large N limit.The Star-chain model has a similar behaviour while only using short-range interaction, and it can be programmed on current quantum annealers (cf.Sec.III C).A detailed description and discussion of all the models we consider is given in the text.
emerged from numerical optimisation in the same regime (cf.Sec.A).However, the degeneracy of the first excited state increases linearly in N , to be contrasted with the exponential increase of the Star and Star-chain models.This ultimately leads to a linear scaling of the heat capacity of the symmetric all-to-all model for large N .
c. k-SAT model and the exponential degeneracy Finally, we notice that in Refs.[85,86], a Hamiltonian replicating a global AND-operation between M logical bits (represented by M spins) was introduced, with the aid of M ancillary spins.Such Hamiltonian was proposed as the basic element to build general models to solve k-SAT problems [87].We will thus refer to it as the k-SAT model.A logical AND identifies a single string (without loss of generality, the string given by 111 . . . 1, M times) with an energy E AND different from the energy E AND as-sociated to all the other 2 M − 1 logical strings.Formally, this spectrum coincides with the ideal two-level degenerate model (9), and therefore the Hamiltonian proposed in [85,86] exhibits the desired quadratic scaling of the maximum C, more precisely 2 ) (cf.Fig. 9).The construction uses a total N = 2M of spins (the neglected levels correspond to energies that can be made arbitrarily high, see [85]), that is, an overhead of N/2.The Star and Star-chain models achieve similar degeneracies while using a much smaller overhead, i.e. a 1-spin overhead for the Star model and a N/(m + 1)spins overhead for the Star-chain model.In Table I, we compare these models in terms of the excited-level degeneracy and scaling of C, as well as the locality of the interactions.
Model 1st excited deg.Asymptotic Cmax Short-range?degeneracy).Moreover, the Star-chain model can be realised with short-range interactions.

V. CONCLUSIONS AND OUTLOOK
In this work, we addressed the problem of maximizing the heat capacity C of physically realisable quantum systems, which amounts to engineer the best probe for temperature estimation in the context of equilibrium thermometry [4].Using a combination of analytical derivations, Machine-Learning methods, and physical insights, we explore the design space of spin Hamiltonians with two-body interactions and local magnetic fields, discovering Hamiltonians with star-shaped topology that can approach the theoretical maximum of C in the limit of large systems.Additionally, we showed that an arbitrarily good approximation can be achieved when requiring these interactions to be short-ranged.The models emerging from such optimisation achieve a Heisenberg-like scaling of the sensitivity, without the use of entanglement, contrary to the well-known case of phase estimation in quantum optics [42].Remarkably, these models show a simple architecture of the interactions that make them ideal probes also for adaptive temperature estimation schemes [39,88].We further showed that the models we found can be embedded in currently available quantum annealers [75], making them highly attractive both from a theoretical and experimental points of view.These results pave the way to the physical realization of ultrasensitive spin-based thermometers, valid also for alternative experimental platforms such as cold atoms [89], NV centers [90], and Rydberg atoms [91].Of particular interest is the use of these engineered optimal spin-network thermal probes for ultracold gases [13,[21][22][23]32].
In terms of Hamiltonian spectrum engineering, we showed that the essential requirement for an optimal thermal probe made of N constituents is the presence of a single ground state and an exponential degeneracy of the first excited level.This effective two-level spectrum also appears in other problems, in which we speculate that our work might have application, such as protein folding modelling [92,93], adiabatic Grover's search [94][95][96], energy based boolean computation [85], and quantum heat engines [84,[97][98][99][100]].
An interesting challenge for the future is to characterise the relaxation timescale τ rel of the optimal probes derived here, see also Refs.[96,97].Due to critical slowdown, we expect a trade-off between large heat capacity and slowness of the relaxation process.It hence remains a relevant open question if a similar Hamiltonian engineering can be performed taking as a figure of merit C/τ rel , which would also have important consequences in the optimization of thermal engines [84,[97][98][99][100].At the same time, it is worth emphasising that when time is a resource for thermometry3 , optimal non-equilibrium protocols require the same effective two-level structure of the models presented in this work, as recently shown in [38].Another challenge is to move beyond the weak coupling assumption behind (4), and consider the optimisation of thermometer probes for the more general mean force Gibbs state [101][102][103].
In this subsection we show how the Star model emerged from the numerical optimization considering the N spin Hamiltonian given in Eq. ( 3), and we provide some details on the optimization method.The optimization was carried out as described above considering {h i } and {J ij } as the θ parameters.Both are initialized randomly between −1 and 0. We performed separate optimizations for N ∈ {2, 3, . . ., 15}, finding the All-to-All model for N ∈ {2, . . ., 6}, and the Star model for N ≥ 7.For N ∈ {2, . . ., 11}, the results were found running a single optimization with fixed learning rate α = 0.001 for 60000 steps (although most optimizations converge much sooner).However, for N ≥ 12, this choice would sometimes get stuck in local minima.For N = {12, 13}, we ran the optimization multiple times as detailed above, and chose the model with the largest heat capacity.For N = {14, 15}, to avoid getting stuck in local minima, we used a common technique in Machine Learning, which consists of scheduling the learning rate, i.e. of varying it at each optimization step.In particular, we used the "CyclicLR" scheduler of PyTorch that varies the learning rate in a triangular fashion between a minimum α min and a maximum α max value.For N = 14, we chose α min = 0.0003 and α max = 0.2, and halved the amplitude of the triangle at every repetition (such that, asymptotically, the learning rate converges to α min ).The number of steps during the "up phase" of the triangle was chosen to be 6000.For N = 15, we chose α min = 0.001 and α max = 0.1 without halving the amplitude of the triangle at every repetition.Also in this case the "up phase" consists of 6000 steps.
To show the emergence of the Star model, in Fig. 10 we show the values of h i and of J ij found with our numerical method for N = 4 (left panels), N = 7 (middle panels), and N = 9 (right panels).The upper panels show h i as a function of the site index i = 1, . . ., N , while the lower panels show the value of J ij (the color) as a function of the site indices i and j.Since J ij is only defined for i < j, a white square is shown when such condition is not satisfied.
As we can see, for N = 4 all parameters take the same value (h i = J ij ≈ −0.377 for every i and j); this corresponds to the All-to-All model described in Eq. ( 26) with h = J ≈ 0.377.For N = 7, we see that all spins are interchangeable except for a single privileged spin corresponding to i = 2. Indeed, h 2 ≈ 4.41, while h i =2 ≈ −1.15, and J ij is non null, (right panels).The upper panels show hi as a function of the site index i = 1, . . ., N , while the lower panels show the value of Jij (the color) as a function of the site indices i and j.Since Jij is only defined for i < j, a white square is shown when such condition is not satisfied.and equal to −1.15,only when i or j are equal to 2. This is precisely the Star model as described in Eq. ( 13) with a ≈ 4.41 and b ≈ −1.15.For N = 9, we find a model where all spins are interchangeable, except for two privileged spins corresponding to i = 2, 3. Indeed, h i = 0 except for h 2 = h 3 ≈ −1.51.Also J ij = 0 when both i and j are not 2 or 3, while J 23 ≈ 8.96, and J ij = −1.51when i or j are 2 or 3, but not both.The Hamiltonian of this model can be written as H Star(N ) (a, b) has the same exact spectrum as H Star(N ) (a, b), and therefore the same heat capacity.Therefore, while they are physically two different models, they have identical characteristics as thermometers.At last, in Fig. 12 we analyze the spectrum of the Star model corresponding to N = 9.The individual eigenenergies are plotted as dots on the y-axis.For comparison, we plot as a red line the value of E (measured from the ground state energy) that maximizes the heat capacity of the degenerate Hamiltonian H deg [see Eq. ( 9)] for N = 9.As expected, there is a single ground state, a highly degenerate first excited state (with a degeneracy approximately given by half of the states), and further excited states with a binomial degeneracy.Interestingly, and as suggested by the Lemma of App.B, the value of E is quite similar to the energy of the highly degenerate first excited state.7 (see caption of Fig. 7) for details.The second half of the parameters are given in Table III.

Quantum N Spin Hamiltonian
In this subsection, we employ our numerical optimization method to maximize the heat capacity of the most generic two-body spin Hamiltonian, namely where h(µ) i and J(µ,ν) ij are arbitrary parameters.Since the heat capacity only depends on the spectrum of the Hamiltonian, we can perform arbitrary unitary operations to Hquantum without changing its spectrum, thus its heat capacity.Choosing local unitary transformations of the form where θ µ are three suitable angles, we can always rotate µ∈{x,y,z} h(µ) i σ µ i into an operator proportional only to σ z i .Therefore, applying the appropriate unitary transformation on each spin site, we obtain the Hamiltonian where h i and J (µ,ν) ij are arbitrary parameters.
In this subsection, without loss of generality, we optimize Eq. (A8) considering h i and J (µ,ν) ij as optimization parameters θ.We performed a separate optimization for N ∈ {2, 3, . . ., 9} with a fixed learning rate α = 0.001, performing 20000 optimization steps, and starting from random initial values of the parameters uniformly distributed between 0 and 1.In all cases, we found values of the heat capacity that are identical (up to numerical errors) to the values found considering the N spin Hamiltonian with only σ z , i.e. the model, given by Eq. ( 3), considered in the previous subsection.Furthermore, these solutions also have the same spectrum found in the previous subsection.However, they are not the same model: indeed, the optimal values of J (µ,ν) ij that we find are non-zero when µ = z and ν = z.This can be understood in the following way: since the spectrum and the heat capacity are invariant under unitary transformations, we can apply any unitary transformation to the Star model to generate different models that display the same spectrum and heat capacity.Therefore, there is an infinitely large class of systems with the same optimal heat capacity, and our optimization method converges randomly to one of these solutions.
As an example, in Fig. 13 we plot the spectrum, h i , and J (µ,ν) ij (for µ, ν ∈ {x, y, z}), that we found for N = 9, in the same style as in Figs. 10 and 12.As we can see, there is some structure in J (µ,ν) ij for µ, ν ∈ {x, y} that privileges a specific spin index (number 8 in this case).However, it is clear that this model is different from the N spin Hamiltonian with only σ z .Nonetheless, we see that the spectrum, and thus the heat capacity, is essentially the Star spectrum (compare the first panel of Fig. 13 with Fig. 12).The very small discrepancies are due to the numerical optimization method that reached parameters near the local minima, but not exactly.As previously anticipated, the , for all combinations of µ, ν ∈ {x, y, z}, as a function of the site index i and j (similar to Fig. 10).Since J (µ,ν) ij is only defined for i < j, a white box is plotted when such condition is not fulfilled.
"noisyness" that is visible in many panels can be explained by the infinite number of models that yield the same spectrum, such that the numerical method converges to a random one based on the initial stochastic choice of the parameters.

D-Wave annealer Hamiltonian
In this subsection we consider a spin Hamiltonian as in Eq. (3) with only σ z i terms, but we constrain the optimization to reflect the topology of the interactions of D-Wave annealers.In particular, we consider the Chimera graph as in Fig. 6, and we focus on 3 units, i.e. 24 spins.This corresponds to excluding the lower right unit of Fig. 6.Mathematically, we enforce elements of J ij to be null whenever a connection between spin i and j is not present in the topology, and then we minimize the loss function considering the non-null {J ij } and {h i } parameters as θ.We use the Star optimization method for 20000 steps at a fixed learning rate, and randomly initializing the parameters the maximum heat capacity obtainable with Proof of the Lemma.When computing the variance of the energy in a thermal state, global shifts in the energy do not matter.For this reason we rewrite the same Hamiltonians putting the k 1 |i levels to zero, i.e.
with E ≥ 0 and E α − E ≡ E α ≥ 0. We now use temperature units β = 1, to simplify the discussion.The thermal states are therefore Let us call p (B7) Notice that they both depend on the value of E, which we omit in the following for simplicity.It is easy to compute the heat capacity (equivalently, the energy variance) as For what concerns H 2 instead (calling p α the population of the level E α ) with where the inequalities follow from B being trivially positive, while A corresponds to the variance of an Hamiltonian having levels E α with population p α and all the rest of the population 1 − α p α being at an energy = 0.It follows that Finally, let's compare the maximal value of the heat capacity in the two cases.Let's call Ē(1) the optimal value for the Hamiltonian H 1 , which induces a ground state population equal to p(1) It suffices to conclude now by noticing that p (2) 0 (E) is an increasing function of E and is always smaller than p  (B13) Therefore one can choose p (2) 0 (E) = p(1) 0 which will lead to E > Ē and therefore from (B11) concluding the proof.
As we argued in Sec.II A, the main property that optimal models satisfy in order to reach the optimal ∝ N 2 scaling of the maximal heat capacity, is that of generating a single ground state and an exponentially-degenerate first excited level, i.e. approximating the degenerate Hamiltonian (9) in the best possible way, as well as possibly having additional higher energy levels (cf.Lemma in App.B).However, in any physical realization, the resulting spectrum will have imperfections, when compared to (9).In particular, the exponentially-degenerate level might split into a bandwidth, or the overall gap might be shifted.Here we estimate the noise tolerance to such imprecisions in the first excited level.
a. Uniform shift.Consider first the case in which there is no splitting in the D − 1 first excited levels, but a uniform error, that is and the value of E is not exactly the optimal energy gap.We show here that as far as the imprecision does not scale with the dimension, the heat capacity behaves smoothly.We know, infact, that the optimal value of E, for large dimensions is E ∼ ln(D − 1).(cf. main text and [35]).Suppose now that Then, the resulting adimensional variance of the flat Hamiltonian with d ≡ D − 1 degenerate excited states and gap E can be computed given the ground state probability It follows that The leading term of the asymptotic energy variance, for large dimension D − 1 = d is given, for → 0, by (ln d) 2 /4, i.e. recovering the results of [35] (see also main text, Eq. ( 2) and Sec.II A).Moreover, from expression (C4) we see immediately that in order to keep such scaling, the denominator needs to be suppressed such that ln d is bounded.This implies that the relative noise tolerance of the energy gap is given by where we used that, in the case of N constituents the dimension of the system is exponential in N .That is, the relative error should scale as 1/N .Given ln d ∝ N , this is equivalent to the absolute error being bounded by a constant b. Single eigenstate shift and bandwith tolerance.We now analyse the case in which the highly degenerate first excited level splits into separate energies.That is, consider a D-dimensional Hamiltonian with a single ground state and d ≡ D − 1 levels contained in a bandwith δ, i.e. (w.l.o.g.we can shift the ground-state energy to be negative, and the degenerate bandwith to be centered around 0) Computing the energy variance, we obtain Notice now that the term A is always positive, while the first term, E 2 p 0 (1 − p 0 ) is the leading term in the degenerate model in which (cf. main text and [35]) It is then easy to show that, similarly to (C4), as far as the E i levels are small and do not scale with the dimension, the optimal N 2 scaling is preserved.This is easily seen as A ≥ 0, while B in case −δ ≤ E i ≤ δ is bounded as It follows that the leading term, E 2 p 0 (1 − p 0 ) ∼ O(N 2 ) remains dominant and the heat capacity achieves the N 2 scaling, as far as p 0 (1 − p 0 ) is finite.This is guaranteed by the fact that and therefore (C14)

Consequences for the Star and Star-chain
In the above subsection we estimated the noise tolerance of the energy spectrum of the degenerate Hamiltonian (9) in order for the heat capacity to be close to its optimal value and maintain the Heisenberg-like scaling ∝ N 2 .The results indicate that the error in the spectral engineering should be constant while N (and therefore the dimension D) grows.In terms of relative precision, as the optimal spectrum has a first excited gap E ∝ N , this means a 1/N relative precision in the engineering of the spectrum around the optimal values.However, the spectrum of the Hamiltonian is a function of its parameters {h i , J ij } (3).In this subsection, we analyse what precision is needed in our main models, H Star (13) and H Star-chain (20) and how the estimation of the noise tolerance is reflected in the Hamiltonian parameters.
a. Reduction to H Star .First, for generic considerations, we notice that we limit ourselves to estimate the noise tolerance in the Star model only, because there is an exact mapping between the Star-chain and the Star model, in the limit of strong couplings −J, i.e. given (we allow a small relaxation in the Star-chain model, i.e. we assume different magnetic fields a α on the σ z α spins, which is useful in the following analytical derivation, but does not significantly change the spectral properties of the model) in the limit of high J, as explained in the main text, only configurations in which σ z α = σ z α are allowed, and therefore the effective Hamiltonian spectrum, up to an irrelevant global shift, becomes where σz is a formal spin that has value +1 when σ z α = +1 ∀α (similarly for −1), and h = α a α .The above (C16) formally coincides with the Star model ( 13) if one identifies h → a .Moreover the mapping preserves the parametrization in b, while h := a α in the gets mapped to a in the above equation.One could therefore assume a α = 0 to be zero for all αs except one, h = a ᾱ, such that the mapping between the two models is complete and the parameterization is formally the same by mapping a ᾱ → a .
b. Parametric scaling an noise-tolerance of H Star in the optimal-degeneracy configuration.We thus consider here the H Star Hamiltonian (13) As mentioned in the main text, by choosing b ≥ 0 and b(N − 3) ≤ a ≤ b(N − 1), it is ensured the presence of a single ground state at energy E 0 , a 2 The optimal degeneracy of 2 N −1 + N − 1 is reached when a = b(N − 3), but it is not necessary for the model to achieve its N 2 scaling of for the heat capacity C. For simplicity, consider the choice a = b(N − 3).The first excited gap is, in this case This means that, for such choice of parameters, in the asymptotic limit of large N , one has 4b ∼ N ln 2 , and consequently a = b(N − 3) ∼ N (N − 3) ln 2 4 .That is, b has a linear scaling in N and a has a quadratic scaling in N .This happens even if we relax the assumption of a = b(N − 3).In that case, it remains valid that 4b For what concerns the parametric error-tolerance for a and b in H Star , notice that we estimated above the gap-error tolerance, which results to be constant (cf.C 1), that is, one should have with |δ| = O(1) bounded by a constant.From the above expression is easy to see that, if treated as independent, b can have an error δ b ∼ O( 1 N ), while for a the admitted error is δ a ∼ O(1).It follows that both the relative error for a and b is inversely quadratic c. Noise-tolerance in the degeneracy-suboptimal configurations.In the subsection above, we considered the case in which the Star model (C18) is forced in its optimal configurations satisfying an exponentially-degenerate first excited level, i.e.E 0 ≤ E deg ≤ E 1 .We saw that this imposes a linear scaling on b and quadratic scaling on a, and a relative error of order O(N −2 ) on both parameters.However, it is possible to show, as we do in Appendix D 1, that (slightly) suboptimal solutions exist, featuring a bounded value of b ∀N , while still achieving quadratic scaling of C. In fact, these solutions can have C to be arbitrarily close to its optimal value C Star max (19)).While referring the reader to Appendix D 1 for the details, it is enough for our purposes to notice that, in such solutions, b can take any finite value larger than a certain treshold b tresh ∼ ln 2, while, the gap between E deg and E 0 still approximates the optimal value The exact finite value of b is not important in this case, and can be taken as given.It follows that in such configurations, while b ∝ O(1), while a ∼ (N − 1)(b − ln 2/2) scales linearly.Consequently we can obtain the error that is admitted on a in these configurations.It follows, given Eq.(C27) and the fact that the optimal gap has a fixed bandwidth tolerance O(1) (cf.C 1), that a similar noise scaling applies to a, i.e.
d. Subtler sources of parameter-noise.Finally, notice that in the implementation of H Star a more general error could arise.That is, the actual tuning of the parameters of the generic spin Hamiltonian (3) to be converted into H Star assumes all J ij = 0 for i > 1 and j > 1.Moreover, assuming (realistically) these contributions to be null, the resulting Hamiltonian is i σ z i σ z 1 . (C30) The Star model assumes b its scaling and relative error tolerance are the same as b, that is (C26) for the optimal degeneracy case C 2 b, or "irrelevant" for the suboptimal configurations discussed above in C 2 c.where the first term correspond to the binomial part of the spectrum (i.e. for σ z 1 = 1), while the second term correspond to the 2 N −1 degeneracy that is obtained for σ z 1 = −1.The above expression can be manipulated into and can be used to compute efficiently relevant quantities such as the average energy, the free energy etc., as from standard statistical mechanics.In particular the average energy is given by Similarly the heat capacity, or energy variance, is given by in temperature units where β = 1.
2. Statistics of the energy levels, exponential suppression above the degeneracy, and slightly suboptimal configurations with better parameter scaling In this Section we analyze the statistics of the energy levels of the Star model.As we argued in App.C 2 a, the Star-chain model becomes equivalent to the former in the limit of large |J|.
The probability of a given energy outcome in from a Gibbs state is, in temperature units β = 1, given by We separated the term I(σ z α ) as it is the one that breaks the permutation symmetry (for cyclic boundary conditions it has only cyclic symmetry).Such term needs therefore all its 2 n levels to be resolved.The remaining part is permutationally symmetric and therefore has a spectrum that can be computed efficiently.Define β) := d dT Tr[Hρ β (H)] = −β 2 d dβ Tr[Hρ β (H)] .
(14) corresponding to the first spin being up, i.e. σ z 1 = 1 , and k spins up among the remaining N − 1 ones.The second class consists of the first spin being down σ z 1 = −1.In ...

FIG. 3 .
FIG. 3. (a,b): Machine learned Hamiltonian parameters (3), for T = 1 and N = 7.(a) shows the local field hi as a function of the spin index, while (b) shows the Jij parameters (color) as a function of the site indices i and j.The resulting model that emerges, sketched in (c), is HStar (13).It consists of a single central spin (corresponding to spin nr. 2 in (a,b), and orange circle in (c)) with a different local magnetic field and that interacts (gray lines) with all the other N − 1 spins homogeneously (black circles) resulting in a Star-shaped connectivity.

FIG. 4 .
FIG.4.Spectrum of the Star model(13) (for N = 9). 2 N −1 eigenvalues form a binomial spectrum(14), while the other 2 N −1 eigenvalues are completely degenerate(15).The gap between the ground energy and the exponentially degenerate level approximately coincides with the optimal gap of the ideal spectrum H deg(9) for D = 2 N −1[35] (red line), as expected from the discussion in Sec.III A.

FIG. 5 .
FIG.5.Representation of the Star-chain model(20).The total number of spins is N = n(m + 1).The orange circles represent the α-spins, coupled to each other through the blue lines, while the black circles represent the (α, i)-spins coupled to their respective α-spin through the gray lines.The values of the local fields hi and coupling terms Jij are reported in the Figure.

FIG. 7 .
FIG. 7. Parameter scaling of the Star model (a) and Starchain model for m = 3 (b) in their configurations that maximise C (we obtain nearly identical values of C for both cases).In the "unconstrained" case (empty circles), b increases linearly with N , and a quadratically.This corresponds to the optimal choice a = b(N −3) of Sec.III A. In the "constrained" case (full circles), it is possible to find solutions in which b is limited by a constant, while a increases linearly.These solutions preserve the same numerical value of C, and are found optimizing the heat capacity over a and b, and choosing as initial point for the optimization b = 2.2 and a = 2N − 3.

FIG. 8 .
FIG. 8. Comparison of the heat capacity, as a function of N , on a linear scale (a), and on a log-log scale (b).The "bound" curves represent numerical maximizations of the general spin Hamiltonian (3), where all parameters are constrained to be in a certain interval, i.e. hi, Jij ∈ [−c, c], with c shown in the legend.The red and black lines are reference quadratic and linear scalings.

FIG. 10 .
FIG.10.Values of hi and of Jij found with our numerical method for N = 4 (left panels), N = 7 (middle panels), and N = 9 (right panels).The upper panels show hi as a function of the site index i = 1, . . ., N , while the lower panels show the value of Jij (the color) as a function of the site indices i and j.Since Jij is only defined for i < j, a white square is shown when such condition is not satisfied.

FIG. 13 .
FIG. 13.Optimization results for the quantum spin Hamiltonian in Eq. (A8) for N = 9.The first row shows the spectrum and the hi as in Figs. and 10.Each of the lower 9 panels displays J (µ,ν) ij

0
the corresponding ground state populations, p implies x > y .
) therefore b scales at least linearly in N and a at least quadratically, as it satisfies b(N − 3) ≤ a ≤ b(N − 1).

:
= b.Noise in the couplings might however affect this constraint.The main consequence would be a splitting of the level E deg due to the fact that the configurations with σ z 1 = −1 would have a binomial spectrum E deg−noisy = −a + of E deg is therefore characterized by constant bandwidth was derived above C 1.It follows that, in general the error of each spin i ≥ 2 should be of order O(N −1 ), i.e.

TABLE II .
Values of the parameters of the Star and Star-chain models, that optimize the heat capacity, plotted in Fig.

TABLE III .
Continuation of the parameters displayed in TableII.