Probe thermometry with continuous measurements

Temperature estimation plays a vital role across natural sciences. A standard approach is provided by probe thermometry, where a probe is brought into contact with the sample and examined after a certain amount of time has passed. In situations where, for example, preparation of the probe is non-trivial or total measurement time of the experiment is the main resource that must be optimized, continuously monitoring the probe may be preferred. Here, we consider a minimal model, where the probe is provided by a two-level system coupled to a thermal reservoir. Monitoring thermally activated transitions enables real-time estimation of temperature with increasing accuracy over time. Within this framework we comprehensively investigate thermometry in both bosonic and fermionic environments employing a Bayesian approach. Furthermore, we explore adaptive strategies and find a significant improvement on the precision. Additionally, we examine the impact of noise and find that adaptive strategies may suffer more than non-adaptive ones for short observation times. While our main focus is on thermometry, our results are easily extended to the estimation of other environmental parameters, such as chemical potentials and transition rates.


I. INTRODUCTION
Temperature plays a prominent role in the study of physical systems, as it is arguably the most relevant state parameter that determines their behaviour. Therefore, thermometry is often a preliminary step in quantum experiments. However, it also introduces measurement disturbance [1] in exchange for precision. The theory of quantum metrology [2][3][4] identifies optimal precision-disturbance trade-offs in parameter estimation with quantum resources, and quantum thermometry is the branch specialised to temperature estimation [5,6].
A common framework to study the precision limits in thermometry is to consider a probe coupled to the sample of interest at temperature T. Over time, the probe gains information about the sample's temperature, which is later accessed through direct measurements on the probe. Relevant experimental realisations of probe thermometry include single-atom probes for ultracold gases [7][8][9], NV centres acting as thermometers of living cells [10,11], and nanoscale electron calorimeters [12][13][14]. Theoretically, much progress has been achieved on characterising the fundamental precision limits of probe thermometry in frequentist and Bayesian approaches [15][16][17][18][19][20][21], the precision scaling at ultralow temperatures [22][23][24], the impact of strong coupling and correlations [25][26][27][28][29], measurement back action [30,31], as well as enhanced sensing via nonequilibrium probes [32][33][34][35][36][37][38][39][40][41]. While providing remarkable progress in our understanding of thermometry, previous works are based on the assumption that the probe is measured and subsequently reset or discarded. In this work, we depart from this paradigm and consider thermometry through continuous measurements of the probe, where information on the sample's temperature is continuously extracted while the probe is interacting with it. Total measurement time is thus the major resource here, and there is no hidden time cost for probe preparation, a cost that is often ignored in previous works. This scenario has to date received little attention; notable exceptions are Refs. [42,43]. Another reason to pursue this scenario is the remarkable experimental progress on continuous measurements, including charge measurements [44][45][46], homo- and heterodyne detection [47,48], and magnetometry [49][50][51][52].
With this aim, we consider a minimal model where a two-level probe weakly interacts with a thermal bath while being continuously monitored (see Fig. 1). Depending on the nature of the bath, either bosonic or fermionic, this can correspond to a superconducting qubit coupled to the electromagnetic environment or a quantum dot coupled to an electronic reservoir. We construct temperature estimators and characterize their estimation errors, employing a Bayesian approach [18][19][20][21]. In the long-time limit (or, the large-data limit), the Fisher information, which can be given analytically, determines bounds on these errors. We discuss the saturability of these bounds via non-adaptive and adaptive measurement strategies, where the energy gap of the probe is tuned during the protocol via a suitable feedback. Finally, we characterize the robustness of our results to noisy measurements and a finite bandwidth of the detector, bringing our considerations closer to experimental platforms.
The paper is structured as follows: In Section II, we introduce a Markov jump process to describe the system trajectories subject to a bosonic or fermionic bath. In Section III, we discuss thermometry in the large-data limit and present an analytical expression for the Fisher information of the trajectories, Eq. (14). This can be used not only for thermometry, but also for estimating other parameters such as the thermalisation rate. Section IV is devoted to the Bayesian approach to thermometry. After defining a relative error quantifier and the optimal estimator that minimises it on average, we present the tight Bayesian Cramér-Rao bound in Eq. (30), which we then use to implement adaptive feedback on the probe to improve precision. In Section VI, we explore the impact of measurement noise on the quality of our continuously monitored thermometer. Finally, in Section VII, we conclude and discuss some future directions.

FIG. 1. The energy gap ω of the system can be changed to improve the performance of the thermometer. The state of the two-level system evolves on a telegraph-like trajectory (white line) and is monitored continuously by a weak measurement, which results in a noisy signal with finite bandwidth (blue).

II. SYSTEM AND TRAJECTORIES
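We consider a two-level system, such as a qubit or a quantum dot, coupled to a thermal bath (see Fig. 1). The two-level system is the probe used to determine the temperature of the sample, which is the thermal bath. Denoting the two states of the probe by 0 and 1, the system dynamics is described by the rate equation

$$\frac{d}{dt}\begin{pmatrix} p_0(t)\\ p_1(t)\end{pmatrix} = W\begin{pmatrix} p_0(t)\\ p_1(t)\end{pmatrix},\qquad W=\begin{pmatrix}-\Gamma_{\rm in} & \Gamma_{\rm out}\\ \Gamma_{\rm in} & -\Gamma_{\rm out}\end{pmatrix}. \tag{1}$$

Here, $p_j(t)$ denotes the probability that the system is in state $j$ and $\Gamma_{\rm in(out)}$ denotes the rate of the transition $0\to1$ ($1\to0$). It is straightforward to solve Eq. (1), making use of $p_0(t)=1-p_1(t)$, resulting in

$$p_1(t)=\frac{\Gamma_{\rm in}}{\Gamma_{\rm in}+\Gamma_{\rm out}}+\left[p_1-\frac{\Gamma_{\rm in}}{\Gamma_{\rm in}+\Gamma_{\rm out}}\right]e^{-(\Gamma_{\rm in}+\Gamma_{\rm out})t}, \tag{2}$$

where $p_1\equiv p_1(0)$.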
The rate equation (1) describes stochastic jumps between the states 0 and 1. Observing these jumps in real time provides information on temperature, because the jump rates are temperature dependent. We note that, while we consider temperature here, any parameter that the rates depend on may be estimated analogously, e.g., properties of the bath spectral density [53].
Observing the stochastic jumps in the time interval $[0,\tau]$ results in a trajectory $\nu_\tau=\{n(t)\,|\,t\in[0,\tau]\}$, where $n(t)\in\{0,1\}$ denotes the occupation of the state at time $t$. The probability density to observe a trajectory is given by

$$\rho(\nu_\tau|T)=p_{n_0}\,\Gamma_{\rm in}^{k}\,\Gamma_{\rm out}^{l}\,e^{-\Gamma_{\rm in}(\tau-\tau_1)}\,e^{-\Gamma_{\rm out}\tau_1}, \tag{3}$$

where $p_{n_0}$ denotes the probability of the system being in state $n_0\equiv n(0)$ at $t=0$, $\tau_1$ denotes the total time the system spends in state 1 along the trajectory, and $k$ ($l$) denotes the number of jumps from state $0\to1$ ($1\to0$) along the trajectory. In Appendix A we derive Eq. (3) and show how $\tau_1$, $k$, and $l$ may be obtained from $n(t)$. We note that because $t$ is a continuous variable, $\rho(\nu_\tau|T)$ is a probability density with a unit that depends on the number of jumps that occur along $\nu_\tau$.
The unitless probability to observe a trajectory where the jumps occur within time windows of width $dt$ is given by $\rho(\nu_\tau|T)(dt)^{l+k}$. While the above holds true for any pair of rates $\Gamma_{\rm in}$ and $\Gamma_{\rm out}$, we focus here on rates that describe the exchange of energy (and possibly particles) with a thermal bath. Such rates obey a detailed balance relation

$$\frac{\Gamma_{\rm in}}{\Gamma_{\rm out}}=e^{-\beta\omega}, \tag{4}$$

where $\beta=1/(k_B T)$ is the inverse temperature, with $k_B$ being the Boltzmann constant, which we set to 1 in this work, and $\omega>0$ is the energy gap between states 0 and 1 (with 0 denoting the ground state). For a bath exchanging particles, we may set the chemical potential to zero without loss of generality and use the same relation.
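As an illustration of how a trajectory reduces to the three numbers $\tau_1$, $k$, and $l$, the following minimal sketch simulates the two-state jump process with exponentially distributed waiting times and evaluates the logarithm of the trajectory density. It assumes the standard form of Eq. (3), $\rho = p_{n_0}\Gamma_{\rm in}^k\Gamma_{\rm out}^l e^{-\Gamma_{\rm in}(\tau-\tau_1)}e^{-\Gamma_{\rm out}\tau_1}$; the function names `simulate_trajectory` and `log_likelihood` are hypothetical and not taken from the paper.

```python
import math
import random


def simulate_trajectory(gamma_in, gamma_out, tau, n0=0, rng=None):
    """Simulate the two-state jump process on [0, tau].

    Returns (tau1, k, l): the time spent in state 1, the number of
    0->1 jumps, and the number of 1->0 jumps.
    """
    rng = rng or random.Random()
    t, n = 0.0, n0
    tau1, k, l = 0.0, 0, 0
    while True:
        # Waiting time in the current state is exponential with the exit rate.
        rate = gamma_in if n == 0 else gamma_out
        dwell = rng.expovariate(rate)
        if t + dwell >= tau:
            # The trajectory ends before the next jump.
            if n == 1:
                tau1 += tau - t
            return tau1, k, l
        t += dwell
        if n == 1:
            tau1 += dwell
            l += 1
            n = 0
        else:
            k += 1
            n = 1


def log_likelihood(tau1, k, l, tau, gamma_in, gamma_out, p_n0=1.0):
    """Logarithm of the trajectory density, assuming the form of Eq. (3)."""
    return (math.log(p_n0)
            + k * math.log(gamma_in) + l * math.log(gamma_out)
            - gamma_in * (tau - tau1) - gamma_out * tau1)
```

Working with the logarithm avoids underflow for long trajectories, where the density itself becomes extremely small.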
For concreteness, we consider two widely-used expressions for the rates. The first describes a bosonic thermal bath,

$$\Gamma_{\rm in}=\kappa(\omega)\,n_B(\omega),\qquad \Gamma_{\rm out}=\kappa(\omega)\,[n_B(\omega)+1], \tag{5}$$

where we introduced the Bose-Einstein distribution

$$n_B(\omega)=\frac{1}{e^{\beta\omega}-1}. \tag{6}$$

With these rates, the considered scenario corresponds, e.g., to a superconducting qubit coupled to an electromagnetic environment. We will mainly focus on an Ohmic spectral density with $\omega$ well below the cut-off frequency, such that $\kappa(\omega)=\kappa\,\omega$.
The second expression we use for the rates describes a fermionic bath, e.g., a quantum dot weakly coupled to an electronic reservoir,

$$\Gamma_{\rm in}=\Gamma\,n_F(\omega),\qquad \Gamma_{\rm out}=\Gamma\,[1-n_F(\omega)], \tag{7}$$

where we introduced the Fermi-Dirac distribution

$$n_F(\omega)=\frac{1}{e^{\beta\omega}+1}. \tag{8}$$

For the fermionic rates, we consider a flat spectral density, such that $\Gamma$ is independent of frequency.
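The two rate models can be written down in a few lines. The sketch below assumes the standard forms $\Gamma_{\rm in}=\kappa\omega\,n_B$, $\Gamma_{\rm out}=\kappa\omega\,(n_B+1)$ for the Ohmic bosonic bath and $\Gamma_{\rm in}=\Gamma n_F$, $\Gamma_{\rm out}=\Gamma(1-n_F)$ for the flat fermionic bath, with $k_B=1$; the function names are hypothetical.

```python
import math


def bose(omega, T):
    """Bose-Einstein occupation (k_B = 1)."""
    return 1.0 / math.expm1(omega / T)


def fermi(omega, T):
    """Fermi-Dirac occupation at zero chemical potential (k_B = 1)."""
    return 1.0 / (math.exp(omega / T) + 1.0)


def bosonic_rates(omega, T, kappa=1.0):
    """(Gamma_in, Gamma_out) for an Ohmic bosonic bath, kappa(omega) = kappa * omega."""
    n = bose(omega, T)
    return kappa * omega * n, kappa * omega * (n + 1.0)


def fermionic_rates(omega, T, gamma=1.0):
    """(Gamma_in, Gamma_out) for a fermionic bath with flat spectral density."""
    f = fermi(omega, T)
    return gamma * f, gamma * (1.0 - f)
```

Both pairs satisfy detailed balance, $\Gamma_{\rm in}/\Gamma_{\rm out}=e^{-\omega/T}$, which is a convenient sanity check for any other rate model one might plug in.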

III. FISHER INFORMATION
Consider $T_*$ to be the true temperature of the sample, which we want to infer by analysing the measured data. In thermometry, we want to build an estimator $\tilde T(\nu_\tau)$ that maps an observed trajectory $\nu_\tau$ into the best estimate for the temperature. To this end, we need to quantify the accuracy of the estimate in terms of a suitable cost (or error) function. An appropriate choice that does not depend on the absolute scale of the true underlying temperature [54] is the relative square distance

$$D(\tilde T(\nu_\tau),T_*)=\left[\frac{\tilde T(\nu_\tau)-T_*}{T_*}\right]^2. \tag{9}$$

Accordingly, we can quantify the overall performance of the temperature estimation protocol by averaging the relative square distance over all possible trajectories at a particular temperature,

$$D(\tilde T|T_*)=\int\mathcal{D}\nu_\tau\,\rho(\nu_\tau|T_*)\,D(\tilde T(\nu_\tau),T_*). \tag{10}$$

Note that this true relative mean-square distance is not available in an actual experiment, where $T_*$ is unknown.
In the frequentist approach to thermometry, one could instead quantify the uncertainty in the estimate of the temperature by, e.g., a confidence region [55].
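The true relative distance is lower-bounded by the Cramér-Rao inequality [56,57]. In particular, for any unbiased estimator $\tilde T_{\rm u.b.}$ that satisfies

$$\int\mathcal{D}\nu_\tau\,\rho(\nu_\tau|T_*)\,\tilde T_{\rm u.b.}(\nu_\tau)=T_*, \tag{11}$$

the true relative distance is lower bounded by

$$D(\tilde T_{\rm u.b.}|T_*)\ge\frac{1}{T_*^2\,F[\rho(\nu_\tau|T_*)]}. \tag{12}$$

Here, $F[\rho(\nu_\tau|T)]$ is the Fisher information of the probability distribution $\rho(\nu_\tau|T)$ with respect to temperature and reads

$$F[\rho(\nu_\tau|T)]=\int\mathcal{D}\nu_\tau\,\rho(\nu_\tau|T)\,\left[\partial_T\ln\rho(\nu_\tau|T)\right]^2. \tag{13}$$

For the trajectories described in Eq. (3), we find the Fisher information (see Ref. [42] as well as App. B for a derivation)

$$F[\rho(\nu_\tau|T)]=F[p_{n_0}]+\frac{(\Gamma_{\rm in}')^2}{\Gamma_{\rm in}}\left(\tau-\langle\tau_1\rangle\right)+\frac{(\Gamma_{\rm out}')^2}{\Gamma_{\rm out}}\langle\tau_1\rangle,\qquad \langle\tau_1\rangle=\int_0^\tau dt\,p_1(t), \tag{14}$$

where the prime denotes a derivative with respect to temperature, $p_1(t)$ is given by Eq. (2), and $F[p_{n_0}]=\sum_j (p_j')^2/p_j$ is the Fisher information of the initial distribution. We stress that while we focus on temperature here, Eq. (14) may be applied to any parameter encoded in the rates of a two-level system. In the long-time limit, we may drop all terms that do not grow with the total time of the trajectory $\tau$, which results in

$$F[\rho(\nu_\tau|T)]\simeq\frac{\tau}{\Gamma_{\rm in}+\Gamma_{\rm out}}\left[\frac{\Gamma_{\rm out}}{\Gamma_{\rm in}}(\Gamma_{\rm in}')^2+\frac{\Gamma_{\rm in}}{\Gamma_{\rm out}}(\Gamma_{\rm out}')^2\right]. \tag{15}$$

For a bosonic bath, c.f. Eq. (5), this expression reduces to

$$F[\rho(\nu_\tau|T)]\simeq\tau\,\kappa(\omega)\,\frac{\omega^2}{T^4}\,n_B(\omega)\,[n_B(\omega)+1]\,\coth(\beta\omega).$$

For a fermionic bath, c.f. Eq. (7), we find

$$F[\rho(\nu_\tau|T)]\simeq\tau\,\Gamma\,\frac{\omega^2}{T^4}\,n_F(\omega)\,[1-n_F(\omega)]\,\big\{1-2\,n_F(\omega)\,[1-n_F(\omega)]\big\}.$$

If the initial state is the steady state, $p_1=\Gamma_{\rm in}/(\Gamma_{\rm in}+\Gamma_{\rm out})$, its contribution to the Fisher information reads

$$F[p_{n_0}]=\frac{(\Gamma_{\rm in}'\Gamma_{\rm out}-\Gamma_{\rm in}\Gamma_{\rm out}')^2}{\Gamma_{\rm in}\Gamma_{\rm out}(\Gamma_{\rm in}+\Gamma_{\rm out})^2}.$$

For rates obeying detailed balance, the steady state corresponds to a thermal state with the Fisher information

$$F[p_{n_0}]=\frac{\omega^2}{4T^4\cosh^2(\beta\omega/2)}.$$

A universally applicable and commonly employed frequentist estimator is the maximum likelihood estimator (ML) [58]. It is defined as the temperature that maximises the probability density (3) of the observed trajectory,

$$\tilde T_{\rm ML}(\nu_\tau)=\arg\max_T\,\rho(\nu_\tau|T).$$

In the large-data limit, the ML becomes unbiased and saturates the Cramér-Rao bound (12). While the Fisher information sets an ultimate bound on temperature estimation in the asymptotic limit, it can also serve as a means to improve the precision of a thermometer in the course of the measurement. In Sec. V below, we will design improved Bayesian estimation strategies based on the Fisher information.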
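The long-time Fisher information per unit time can be checked numerically. The sketch below assumes the long-time expression $F/\tau=[\Gamma_{\rm out}(\Gamma_{\rm in}')^2/\Gamma_{\rm in}+\Gamma_{\rm in}(\Gamma_{\rm out}')^2/\Gamma_{\rm out}]/(\Gamma_{\rm in}+\Gamma_{\rm out})$, evaluates the temperature derivatives by central finite differences, and compares against closed forms obtained by inserting the standard bosonic (Ohmic) and fermionic (flat) rates. All function names are hypothetical, and the closed forms are our own algebra rather than expressions quoted from the paper.

```python
import math


def bosonic_rates(omega, T, kappa=1.0):
    """Ohmic bosonic bath, kappa(omega) = kappa * omega (assumed form)."""
    n = 1.0 / math.expm1(omega / T)
    return kappa * omega * n, kappa * omega * (n + 1.0)


def fermionic_rates(omega, T, gamma=1.0):
    """Fermionic bath with flat spectral density (assumed form)."""
    f = 1.0 / (math.exp(omega / T) + 1.0)
    return gamma * f, gamma * (1.0 - f)


def fisher_rate(rates, omega, T, rel_step=1e-5):
    """Long-time Fisher information per unit time, F / tau.

    The derivatives of the rates with respect to T are approximated
    by central finite differences with step rel_step * T.
    """
    h = rel_step * T
    g_in, g_out = rates(omega, T)
    in_p, out_p = rates(omega, T + h)
    in_m, out_m = rates(omega, T - h)
    d_in = (in_p - in_m) / (2.0 * h)
    d_out = (out_p - out_m) / (2.0 * h)
    return (g_out * d_in**2 / g_in + g_in * d_out**2 / g_out) / (g_in + g_out)


def bosonic_fisher_rate(omega, T, kappa=1.0):
    """Closed form for the Ohmic bosonic bath: kappa*omega^3 n(n+1) coth(omega/T) / T^4."""
    n = 1.0 / math.expm1(omega / T)
    return kappa * omega**3 * n * (n + 1.0) / math.tanh(omega / T) / T**4


def fermionic_fisher_rate(omega, T, gamma=1.0):
    """Closed form for the flat fermionic bath: gamma*omega^2 f(1-f)[1-2f(1-f)] / T^4."""
    f = 1.0 / (math.exp(omega / T) + 1.0)
    v = f * (1.0 - f)
    return gamma * omega**2 * v * (1.0 - 2.0 * v) / T**4
```

For a trajectory of duration `tau`, the asymptotic Cramér-Rao bound on the relative error is then `1.0 / (T**2 * tau * fisher_rate(rates, omega, T))`.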

IV. BAYESIAN THERMOMETRY
We now focus on the Bayesian approach to thermometry and specify our limited a priori knowledge about the temperature in the form of a prior probability density $\rho(T)\ge0$, $\int dT\,\rho(T)=1$. Bayes' rule prescribes how to update our knowledge according to the observed trajectory [59],

$$\rho(T|\nu_\tau)=\frac{\rho(\nu_\tau|T)\,\rho(T)}{\rho(\nu_\tau)}, \tag{22}$$

given the likelihood $\rho(\nu_\tau|T)$ from Eq. (3) and the normalisation factor $\rho(\nu_\tau):=\int dT\,\rho(\nu_\tau|T)\rho(T)$. The posterior distribution $\rho(T|\nu_\tau)$ determines our remaining uncertainty about the actual temperature value after observing $\nu_\tau$. Specifically, we can quantify the uncertainty (error) of a temperature estimate $\tilde T(\nu_\tau)$ in a similar manner as before by taking the average of the relative square distance over the posterior,

$$D_R(\tilde T(\nu_\tau))=\int dT\,\rho(T|\nu_\tau)\left[\frac{\tilde T(\nu_\tau)-T}{T}\right]^2. \tag{23}$$

This represents the presumed relative error on temperature that an experimentalist would report after recording the trajectory $\nu_\tau$. We average the presumed relative error over all possible trajectories to get a trajectory-independent figure of merit for the quality of a given estimator,

$$D_R[\tilde T]=\int\mathcal{D}\nu_\tau\,\rho(\nu_\tau)\,D_R(\tilde T(\nu_\tau)). \tag{24}$$

Note that, even though the true temperature $T_*$ does not explicitly appear in this expression, we do implicitly assume it is restricted by our specified prior $\rho(T)$.
In fact, one could arrive at the same expression (24) from a different perspective: Suppose an experimenter observes trajectories $\nu_\tau$ at the fixed, but unknown true temperature $T_*$. Each trajectory occurs with probability $\rho(\nu_\tau|T_*)$ and yields the estimate $\tilde T(\nu_\tau)$, the relative deviation of which from the true value is given by (9). Averaged over many repetitions, the relative deviation is (10), which is of course unknown to the experimenter. Then, averaging this also over temperatures drawn from the prior distribution, one finds

$$D_R[\tilde T]=\int dT\,\rho(T)\int\mathcal{D}\nu_\tau\,\rho(\nu_\tau|T)\left[\frac{\tilde T(\nu_\tau)-T}{T}\right]^2. \tag{25}$$

This quantity is a functional of the chosen estimator function $\tilde T$, and by minimizing $D_R[\tilde T]$, we obtain the associated optimal estimator as [20]

$$\tilde T_R(\nu_\tau)=\frac{\int dT\,\rho(T|\nu_\tau)\,T^{-1}}{\int dT\,\rho(T|\nu_\tau)\,T^{-2}}. \tag{26}$$

Had we chosen a different cost function than (23) to quantify the uncertainty of the temperature estimate, we would have obtained a different optimal estimator [18,54]. For example, we obtain a Bayesian estimator that is closely related to the ML (and often easy to calculate, and more precise than the ML in the small-data limit) if we simply maximize the posterior,

$$\tilde T_{\rm MP}(\nu_\tau)=\arg\max_T\,\rho(T|\nu_\tau).$$

For a prior that is flat over the full range of temperatures, this is equivalent to the ML. In contrast, when the temperature is bounded, $\tilde T_{\rm MP}(\nu_\tau)$ adjusts the ML such that it never estimates a temperature outside the prior domain. In our simulations, we work with a flat prior on $[T_{\min},T_{\max}]$, for which the maximum-posterior estimator becomes

$$\tilde T_{\rm MP}(\nu_\tau)=\arg\max_{T\in[T_{\min},T_{\max}]}\rho(\nu_\tau|T).$$

Our analysis and techniques will work equally for other choices of prior and cost function; see Appendix D for a comparison to relevant examples. The Cramér-Rao bound on the true relative square deviation (12) also bounds Bayesian figures of merit.
In particular, if we insert an unbiased estimator into Eq. (25), we obtain the inequality

$$D_R[\tilde T_{\rm u.b.}]\ge\int dT\,\frac{\rho(T)}{T^2\,F[\rho(\nu_\tau|T)]}; \tag{30}$$

see also [60,61]. Although this bound (30), sometimes referred to as the tight Bayesian Cramér-Rao bound, strictly holds only for unbiased estimators, we expect that Bayesian estimators respect it and are able to saturate it, as their biases generally vanish in the limit of large data. For our system, a vanishing bias is ensured by the Bernstein-von Mises theorem for Markov processes [62,63] (see also Appendix C for a simple derivation), which states that in the long-time limit, the posterior $\rho(T|\nu_\tau)$ takes the shape of a Gaussian around the actual temperature, with a variance that is equal to the inverse of the Fisher information.
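The Bayesian update and the optimal estimator for the relative error can be sketched on a temperature grid. The example below assumes a flat prior on a uniform grid, fermionic rates with flat spectral density, a temperature-independent initial state (so the factor $p_{n_0}$ drops out of the posterior), and the estimator $\tilde T_R=\int dT\,\rho(T|\nu_\tau)T^{-1}/\int dT\,\rho(T|\nu_\tau)T^{-2}$. The function names and the trajectory statistics in the usage note are hypothetical.

```python
import math


def fermionic_rates(omega, T, gamma=1.0):
    """Fermionic bath with flat spectral density (assumed form)."""
    f = 1.0 / (math.exp(omega / T) + 1.0)
    return gamma * f, gamma * (1.0 - f)


def log_likelihood(T, tau1, k, l, tau, omega):
    """Log of the trajectory density, up to the T-independent initial-state term."""
    g_in, g_out = fermionic_rates(omega, T)
    return (k * math.log(g_in) + l * math.log(g_out)
            - g_in * (tau - tau1) - g_out * tau1)


def posterior_on_grid(grid, tau1, k, l, tau, omega):
    """Posterior weights on a uniform temperature grid for a flat prior."""
    logs = [log_likelihood(T, tau1, k, l, tau, omega) for T in grid]
    top = max(logs)  # subtract the maximum to avoid underflow
    weights = [math.exp(x - top) for x in logs]
    norm = sum(weights)
    return [w / norm for w in weights]


def relative_error_estimator(grid, post):
    """Estimator minimising the posterior-averaged relative square error."""
    num = sum(p / T for T, p in zip(grid, post))
    den = sum(p / T**2 for T, p in zip(grid, post))
    return num / den
```

As a usage example, `grid = [0.1 + 0.05 * i for i in range(199)]` covers the prior range from 0.1 to 10, and `relative_error_estimator(grid, posterior_on_grid(grid, 300.0, 210, 210, 1000.0, 1.0))` returns the estimate for a trajectory that spent 30% of its time in the excited state. Because the estimator is a weighted average of grid temperatures with weights proportional to $\rho(T|\nu_\tau)/T^2$, it always lies inside the prior range.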

V. IMPROVING PRECISION: ADAPTIVE VS NON-ADAPTIVE STRATEGIES

A. A non-adaptive strategy
So far we have characterised the trajectories in Eq. (3) given that the thermometer was prepared in a state $p_{n_0}$. For long trajectories, this initial state plays a vanishing role in the estimation error, whereas the value $\omega$ of the thermometer's energy gap is crucial. In a non-adaptive strategy, we may still be free to tune the gap at the beginning, and it is indeed worth tuning it wisely. While a global optimisation that aims at minimising the error for a given total time $\tau$ is complex and may require heavy numerical simulations, we can still design a strategy to tune the gap such that it is optimal at least for long enough times. Our proposed strategy is therefore to tune the gap to one that minimises the error in the long-time limit. That is, we aim at minimising the right-hand side of Eq. (30),

$$\omega^*_{\rm n.ad}=\arg\min_\omega\int dT\,\frac{\rho(T)}{T^2\,F[\rho(\nu_\tau|T)]}.$$

This optimal value only depends on the prior and other fixed parameters. In Fig. 2 we depict a Monte Carlo simulation of the relative error for the optimal estimator [c.f. Eq. (26)] in this non-adaptive scenario. As one can see, asymptotically the error approaches the tight Cramér-Rao bound (30), i.e., asymptotically this is an optimal strategy.
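The non-adaptive gap can be found with a simple grid search. The sketch below assumes a flat prior on $[T_{\min},T_{\max}]$, the long-time fermionic Fisher information per unit time $\Gamma\omega^2 f(1-f)[1-2f(1-f)]/T^4$ (our own closed form for the flat fermionic bath), and a trapezoid rule for the prior average. The function names and grid resolutions are hypothetical choices, not the settings used for the figures.

```python
import math


def fermionic_fisher_rate(omega, T, gamma=1.0):
    """Long-time Fisher information per unit time for a flat fermionic bath."""
    f = 1.0 / (math.exp(omega / T) + 1.0)
    v = f * (1.0 - f)
    return gamma * omega**2 * v * (1.0 - 2.0 * v) / T**4


def asymptotic_bound(omega, t_min, t_max, n=2000):
    """Prior average of 1 / (T^2 F / tau) for a flat prior (trapezoid rule).

    Multiplying the result by 1 / tau gives the asymptotic bound on the
    average relative error at total time tau.
    """
    h = (t_max - t_min) / n
    total = 0.0
    for i in range(n + 1):
        T = t_min + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight / (T**2 * fermionic_fisher_rate(omega, T))
    return total * h / (t_max - t_min)


def non_adaptive_gap(t_min, t_max, omegas):
    """Gap on the candidate grid that minimises the asymptotic bound."""
    return min(omegas, key=lambda w: asymptotic_bound(w, t_min, t_max))
```

For example, `non_adaptive_gap(0.1, 10.0, [0.05 * i for i in range(1, 101)])` scans gaps between 0.05 and 5. The bound grows steeply at large gaps, because transitions become exponentially rare at the cold end of the prior, so the integrand is sharply peaked near $T_{\min}$ there and a finer temperature grid may be needed for quantitative values.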

B. A greedy adaptive strategy
Here, we take a step further and assume that we have the freedom to change the probe's energy gap $\omega$ during the measurement; that is, we can tune it adaptively. Once again, a global optimisation strategy may be costly, so we seek a simple alternative. We again design a strategy that is optimal in the long-time limit, which works as follows: Suppose that at time $t$ we have observed a trajectory $\nu_t$ and estimate the temperature to be $\tilde T(\nu_t)$. We tune the gap to the one that maximises $\tilde T^2(\nu_t)\,F[\rho(\nu_\tau|\tilde T(\nu_t))]$. Note that $\nu_\tau$ is a variable that is integrated over when determining the Fisher information, see Eq. (13), not to be mistaken with the observed trajectory $\nu_t$, which is used to build the estimator; see Eq. (15). That is, at any time $t$ and after observing the trajectory $\nu_t$, we tune the gap to

$$\omega^*(\nu_t)=\arg\max_\omega\,\tilde T^2(\nu_t)\,F[\rho(\nu_\tau|\tilde T(\nu_t))].$$

Using the expressions for the bosonic and fermionic Fisher information, the optimal gap is proportional to the current estimate, $\omega^*(\nu_t)=x^*\,\tilde T(\nu_t)$, with a numerical constant $x^*$ that depends on the type of bath and its spectral density. This strategy guarantees that the gap asymptotically converges to the fixed optimal value, since we expect the estimator to approach the true temperature at long times. The convergence to a fixed gap is important; even if our feedback is delayed, it eventually tunes the gap to the optimal one for the true temperature. In Fig. 2 we demonstrate how the adaptive strategy can outperform the non-adaptive one and saturate the asymptotic CRB at the optimal gap.
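The greedy update rule amounts to a one-dimensional maximisation at each feedback step. The sketch below assumes an Ohmic bosonic bath, for which inserting the standard rates into the long-time Fisher information gives $F/\tau=\kappa\,\omega^3 n_B(n_B+1)\coth(\omega/T)/T^4$ (our own closed form), and uses a golden-section search, which is adequate because the objective has a single maximum in $\omega/T$. The function names and the search bracket are hypothetical.

```python
import math


def bosonic_fisher_rate(omega, T, kappa=1.0):
    """Long-time Fisher information per unit time for an Ohmic bosonic bath."""
    n = 1.0 / math.expm1(omega / T)
    return kappa * omega**3 * n * (n + 1.0) / math.tanh(omega / T) / T**4


def golden_max(func, lo, hi, tol=1e-9):
    """Golden-section search for the maximiser of a unimodal function on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = func(c)
        else:
            # The maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = func(d)
    return 0.5 * (a + b)


def adaptive_gap(t_est, fisher_rate=bosonic_fisher_rate):
    """Gap maximising T^2 F at the current temperature estimate t_est."""
    objective = lambda omega: t_est**2 * fisher_rate(omega, t_est)
    return golden_max(objective, 0.01 * t_est, 20.0 * t_est)
```

In a feedback loop one would call `adaptive_gap` with the latest estimate after each update of the posterior. For this bath model the returned gap scales linearly with the estimate, so the ratio `adaptive_gap(t) / t` is the same for every `t`.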
Let us remark that the aforementioned simple strategies are motivated by asymptotic figures of merit.In the transient regime, however, alternative strategies might be more appropriate.

VI. NOISY MEASUREMENTS
Many experimental platforms, e.g., semiconductor quantum dots [64], yield a noisy measured trajectory rather than the telegraph-like trajectory defined above Eq. (3). Additionally, it is common that the detector is limited by a finite bandwidth, introducing a delay in the readout. In this section, we develop a model for Bayesian parameter estimation including these effects, focusing on Gaussian measurement noise. The model is developed for temperature estimation using the two-level probe system defined in Eq. (1), but can be adapted to any other parameter and to more complicated architectures [65]. We use the model to simulate non-adaptive as well as adaptive temperature estimation.
The noisy trajectory is defined as $\nu_\tau=\{D_t\,|\,t\in[0,\tau]\}$, where $D_t$ is the outcome of the detector at time $t$. When the system resides in the ground (excited) state, the detector signal randomly fluctuates around 0 (1). The evolution of the system under such a measurement is given by Eq. (35), where $P_\tau(\nu_\tau)=(p_0(\nu_\tau),p_1(\nu_\tau))^T$ is a column vector with $p_j(\nu_\tau)$ the joint probability of occupying state $j\in\{0,1\}$ at time $\tau$ and observing $\nu_\tau$. Here $W$ is the rate matrix in Eq. (1), determining the time evolution of the probe system. The matrix $M(D|D')$ describes the measurement of outcome $D$, given that $D'$ was observed in the previous timestep. Following Ref. [65], it is determined by the strength of the measurement $\lambda$ and the bandwidth of the detector $\gamma$. A strong measurement ($\lambda\gg\gamma$) reduces the noise of the detector, while a weak measurement ($\lambda\ll\gamma$) increases the noise. The bandwidth introduces a delay $1/\gamma$ in the detector and substantially dampens all frequency components larger than $\gamma$. For $\lambda,\gamma\to\infty$, the noise and lag vanish, and we recover the telegraph signal with the trajectory given by Eq. (3).

TABLE I. Strategies for a bosonic bath. Columns: strategy; the optimal gap; optimal value for the example $T_{\min}=0.1$, $T_{\max}=10$. (1) A non-adaptive strategy, see Eq. (15). Once the gap is chosen at the beginning of the process, it will not be adaptively changed. One can see that for the parameter range considered here, this strategy performs reasonably well compared to the adaptive one. (2) A practical adaptive strategy that only controls the gap and leaves the state untouched. The gap is chosen such that in the asymptotic limit it is optimal for a thermal state (i.e., the gap that maximises $F[\rho(\nu_\tau|T)]$ in Eq. (15)).

TABLE II. As TABLE I, but for a fermionic bath with flat spectral density (i.e., s = 0). Evidently, the choice of the strategy is more impactful than for the bosonic bath. Specifically, one can see that adaptive strategies can improve the asymptotic precision by more than an order of magnitude compared to the non-adaptive one. One should note that this mainly stems from the choice of a flat spectral density, which is more appropriate for fermionic baths. Again, we express all the parameters in terms of a common energy unit, such that temperature $T$, frequency $\omega$, and the coupling $\Gamma$ have the dimension of energy, while time $\tau$ has the dimension of inverse energy. For this table we have set $T_{\min}=0.1$, $T_{\max}=10$, and $\Gamma=1$ in these units.
The likelihood of observing the trajectory $\nu_\tau$, given the temperature $T$, is calculated via the inner product

$$\rho(\nu_\tau|T)=(1,1)\,P_\tau(\nu_\tau).$$

The vector on the right-hand side can be calculated iteratively according to Eq. (35) using an initial distribution $P_0=(p_0,p_1)^T$. It is then used to update the current state of knowledge using Bayes' rule [see Eq. (22)]. The evolution of the state of knowledge with each increment in the measurement register can be equivalently formulated in the language of continuous signal filtering, resulting in the Kushner-Stratonovich equation. This is derived in Appendix E.

In Fig. 3, we show that noisy measurements do not saturate the Cramér-Rao bound Eq. (30), contrary to what we saw for ideal measurements. Nonetheless, our adaptive strategy, which is similar to the one we use for the ideal measurement, can reach the Cramér-Rao bound for the non-adaptive ideal measurement at long enough times. Unfortunately, we cannot simulate for arbitrarily long times, but the possibility for the adaptive noisy measurement to outperform the non-adaptive ideal measurement is not ruled out. Lastly, at short times the performance of the adaptive strategy is actually worse than it is for non-adaptive strategies. Again, note that our adaptive strategy is not necessarily optimal for the noisy scenario; it is rather designed to be optimal for the ideal measurement and at long times.

In Appendix F we further discuss that a significant estimation bias that persists at low and high temperatures is the reason behind the noisy measurements failing to saturate the Cramér-Rao bound. At low temperatures, this bias arises because there is not sufficient time for transitions to be observed, which results in the estimated temperature being below the actual temperature. This low-temperature bias is not affected by the measurement strength. In contrast, at higher temperatures the temperature is overestimated. This overestimation is reduced when the measurement strength is increased. Appendix F also shows that one of the ways in which the adaptive strategy improves the estimation is by reducing the bias. In particular, at low temperatures, being able to adjust the gap of the system allows more transitions to occur in the trajectory, leading to a better estimate.

FIG. 2. The relative error $D_R[\tilde T_R]$ in thermometry of a bosonic bath as a function of time. Here we evaluate the relative error by a Monte Carlo simulation that approximates Eq. (24) by randomly sampling a temperature from the prior, according to which we randomly sample a trajectory. We repeat this process 1000 times and take the average. The error is plotted for both the adaptive (solid blue) and the non-adaptive (solid black) scenarios. The CRB lines correspond to the r.h.s. of Eq. (30) for the adaptive (dashed blue) and non-adaptive (dashed black) scenarios. At long times, both strategies approach their asymptotic bound. Our simulations show that the adaptive strategy outperforms the non-adaptive one even at non-asymptotic times. In the simulations we choose the parameters according to TABLE I, that is, $T_{\min}=0.1$, $T_{\max}=10$, and $\kappa=1$, with the energy unit set to 1. In the adaptive strategy, the frequency changes in each time step.
FIG. 3. Performance of the temperature estimation protocol when it is limited by a noisy trajectory with a finite bandwidth γ = 10.0. Here, the relative error is averaged over 1000 trajectories generated at randomly selected temperatures in the range [0.1, 10.0] with an initial gap $\omega^*_{\rm n.ad}$. It is plotted as a function of time and is compared to the adaptive and non-adaptive Cramér-Rao bounds. Increasing the measurement strength λ leads to a better accuracy; however, the noise prevents the CRB from being reached. For a finite measurement strength, the adaptive strategy performs worse than the non-adaptive strategy at short times. The rest of the parameters are chosen as in Fig. 2, that is, $T_{\min}=0.1$, $T_{\max}=10$, and $\kappa=1$, with the energy unit set to 1.

VII. CONCLUSIONS & OUTLOOK
Our thermometry protocol based on a continuously monitored two-level system offers theoretical tools for temperature estimation (and estimation of other environmental parameters) in quantum systems, applicable to various experimental settings with both bosonic and fermionic environments. In particular, our results can be readily exploited in experimental scenarios, e.g., in temperature and chemical potential estimation using continuously monitored quantum dots.
Our results can be related to and contrasted against several other studies in recent years. Indeed, continuous monitoring as a non-destructive method for interrogating quantum systems has found use in parameter estimation tasks, particularly in magnetometry [49][50][51][52]. Theoretical works have also put forward proposals to surpass the standard quantum limit, and furthermore address the shortcomings in the presence of noise [66][67][68][69]. On a more fundamental level, the ultimate Bayesian and frequentist bounds have been addressed [70][71][72][73][74]. These bounds cannot be straightforwardly adapted to our problem since they are either problem-specific or fundamental and thus too generic. Furthermore, in our case, the parameter to be estimated is a property of the environment, which distinguishes our setting from magnetometry and other Hamiltonian estimation tasks. We were able to analytically characterise the trajectories, see Eq. (3), which is crucial in finding the frequentist and Bayesian limits in estimation and designing improved non-adaptive and adaptive protocols.
Our investigation leaves several directions open for future work. We considered purely classical dynamics, without any quantum coherence. In the presence of coherence, our findings should be revisited to find the optimal measurement and to determine whether quantum correlations are beneficial. Furthermore, we consider a single-probe scenario. In the presence of multiple interacting (or many-body) probes, quantum correlations may be harnessed for thermometry, similar to the unitarily encoded magnetometry considered in Ref. [75].
Finally, our optimal protocols are based on optimising the Cramér-Rao bound. They are optimal in the limit of large data (long monitoring time). However, in the finite-data regime, one may be able to design better strategies. This could be done, at a massive computational cost, by minimising the relative distance error numerically. A smarter algorithm that requires fewer computational resources, at the cost of being sub-optimal, is desirable.
In the third line, we introduce the matrix $J$ with elements $J_{nm}=K_{nm}(\partial_T\ln K_{nm})^2$. In the fourth line, we make use of the fact that $K$ is a stochastic matrix with $(1,1)K=(1,1)$ and $(1,1)\partial_T K=0$. Finally, we iterate the recursion down to the initial probe state (which may also depend on temperature).

Next, we use that $K$ has the two right-eigenvalues $\lambda_1=1$ and $\lambda_2=e^{-(\Gamma_{\rm in}+\Gamma_{\rm out})\delta t}$ with corresponding (unnormalized) right-eigenvectors $v_1=(\Gamma_{\rm out},\Gamma_{\rm in})^T$ and $v_2=(1,-1)^T$, respectively. This implies which holds also for $j=0$, as one easily verifies. Inserting this into (B1) and carrying out the geometric sum, we get where the row vector $(1,1)J$ contains the Fisher information of the transition probabilities, In the continuum limit $\delta t\to0$, the transition probabilities are given by (A5), which yields with the short-hand notation $\Gamma'_{\rm in,out}=\partial_T\Gamma_{\rm in,out}$. Inserting (B5) into the Fisher information expression (B3) and expanding to linear order in $\delta t$ at fixed $\tau=N\delta t$, we arrive at Since the definition of the Fisher information in Eq. (13) implies $F[\rho(\nu_\tau|T)]=F[p(\nu_\tau|T)]$, we recover Eq. (14) in the main text.

We remark that, in the opposite limit of $(\Gamma_{\rm in}+\Gamma_{\rm out})\delta t\gg1$, the probe fully thermalises from one interrogation step to the next, so that the protocol amounts to measuring the initial probe state and $N$ independent probes in thermal equilibrium. Indeed, we find in this regime that $(1,1)J=(1,1)F^{\rm (eq)}$, with $F^{\rm (eq)}$ the thermal Fisher information at equilibrium. Hence, (B3) reduces to the expected

use the fact that, for large enough $N$, one should have $N_{j\to k}/N_j=p(k|j,T_*)$, with $T_*$ being the true temperature that has generated the observed trajectory [76]. As a result we have $\ln\rho(T|\nu_N)=\ln\rho(T)+$ (C2)

The rest of the proof is similar to the standard Bernstein-von Mises theorem. Following [63], let the posterior be peaked around some value $\tilde T_{\rm MP}$. By Taylor expanding $\ln\rho(T|\nu)$ around this value and then taking its exponent, one gets where the first-order term is zero because $\frac{d\ln\rho(T|\nu_N)}{dT}\big|_{\tilde T_{\rm MP}}=0$, by definition of the maximum. In the second line we used that the zeroth-order term $\ln\rho(\tilde T_{\rm MP}|\nu_N)$ is just a normalisation factor. We also defined In what follows we would like to (i) connect this term to the Fisher information, and (ii) connect $\tilde T_{\rm MP}$ to the true temperature. To this aim, note that by taking the first derivative of Eq. (C2) and evaluating at the true temperature we have which vanishes for a prior that has its maximum at the true temperature. This is always the case for the flat prior, which is our focus. As a consequence, the posterior's maximum is actually at the true temperature, i.e., $\tilde T_{\rm MP}=T_*$. Now, we also note that by using Eq. (C2) one gets which by setting $\tilde T_{\rm MP}=T_*$ reduces to Now, as a final step, we have to replace $N_0$ ($N_1$) by their asymptotic values. These are given by the probability of thermal occupation of the ground (excited) state multiplied by the total number of measurements. We have By using Eq. (B5) we get

Introducing a finite bandwidth $\gamma$, the outcome $D(t)$ read out on the detector at time $t$ is related to the infinite-bandwidth signal by, The observed outcome from the experiment is the solution to the stochastic differential equation By expanding the likelihood $P(dD|T)$ (Eq. 35) to first order in $dt$ and normalising the probability distribution, we obtain the evolution of the ground state population [65], We now have enough information to derive the evolution of the state of knowledge about $T$. The change in this state of knowledge after an infinitesimal time step of the experiment is dP (T, t|ν τ ) = P (T, t + dt|ν τ , dD) − P (T, t|ν τ ). (E5) The evolution can then be broken up into an instantaneous part due to measurement and the time evolution after measurement dP (T, t|ν τ ) = P (T, t + dt|ν τ , dD) − P (T, t|ν τ , dD) (E6) + P (T, t|ν τ , dr) − P (T, t|ν τ ).
(i) First, the increment to the measurement register is simulated at each time step. This is done using the transition probabilities for the Poisson process and adding filtered white noise according to Eq. (E3). This has to be done regardless of the choice of method for simulation.
(ii) The measurement register is then used to solve the evolution of p_0(t, T) at each temperature in the support. This is in contrast to the direct method outlined in the main body of the paper, where Eq. (35) is simulated.
(iii) The prior is updated using the simulated measurement register and expectation values that were averaged over the previous increment of the prior. Once again, this is different from the main-text method, where Eq. (37) is used to update P(T, t|ν_τ).
Simulating the results in this way with the second-order Milstein method improves the calculation time by reducing the number of computational steps required in step (ii). This comes at the cost of numerical instability when the time step is too large for a particular measurement strength. The Kushner-Stratonovich method is likely more sensitive to the size of the dt increment because two expansions in dt are involved in simulating the evolution of the prior.
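Step (i) above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the simulation used for the figures: Eq. (E3) is not reproduced here, so the finite-bandwidth readout is modelled as a simple first-order low-pass filter of bandwidth γ acting on the state signal plus white noise of strength 1/√λ, and the transition rates are taken to be the standard bosonic ones, Γin = κ n_B and Γout = κ (n_B + 1). The function names `bose` and `simulate_register` and all default parameter values are hypothetical.

```python
import numpy as np

def bose(omega, T):
    """Bose-Einstein occupation at gap omega and temperature T (k_B = hbar = 1)."""
    return 1.0 / np.expm1(omega / T)

def simulate_register(T, omega=1.0, kappa=1.0, gamma=10.0, lam=25.0,
                      t_final=100.0, dt=1e-3, seed=0):
    """Simulate a telegraph trajectory and a noisy, band-limited readout.

    ASSUMPTIONS (not taken from the paper's equations):
      * rates G_in = kappa * n_B, G_out = kappa * (n_B + 1);
      * readout D obeys dD = gamma * (n dt + dW / sqrt(lam) - D dt),
        integrated with a plain Euler-Maruyama step.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_final / dt))
    g_in = kappa * bose(omega, T)            # ground -> excited rate
    g_out = kappa * (bose(omega, T) + 1.0)   # excited -> ground rate

    state = 0                                # start in the ground state
    D = 0.0                                  # filtered detector output
    states = np.empty(n_steps, dtype=int)
    signal = np.empty(n_steps)
    for k in range(n_steps):
        # Poisson jump: flip with probability rate * dt during this step.
        rate = g_in if state == 0 else g_out
        if rng.random() < rate * dt:
            state = 1 - state
        # White-noise increment with variance dt, scaled by the measurement strength.
        dW = rng.normal(0.0, np.sqrt(dt))
        # First-order low-pass filter with bandwidth gamma (assumed form).
        D += gamma * (state * dt + dW / np.sqrt(lam) - D * dt)
        states[k] = state
        signal[k] = D
    return states, signal
```

Calling `simulate_register(T=1.0)` returns the hidden telegraph trajectory together with the filtered noisy signal. For long times the fraction of time spent in the excited state should approach Γin/(Γin + Γout), which gives a quick sanity check on the chosen time step.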

Appendix F: More simulations of the noisy measurements
The noisy measurement scenario gives rise to a bias at low and high temperatures, which prevents the scaling of the error from reaching the CRB. In this appendix the temperature dependence of the thermometry scheme is investigated in more detail. First, the ratio of the estimated and true temperature is shown in Fig. 5 (a). The estimated value is found for 35 different true temperatures in the range [0.1, 10.0], and for each temperature the average estimate of 200 trajectories is calculated at the final time of each trajectory, κτ = 100. At high temperatures this ratio is larger than one, meaning the temperature is overestimated, and at low temperatures the opposite behaviour is observed. An increase of the measurement strength causes the ratio to approach one for high temperatures, but this is not observed at low temperatures. The adaptive strategy also improves the bias slightly for both high and low temperatures.
Similarly, the true relative error of the adaptive and non-adaptive strategies is plotted relative to the non-adaptive CRB in Fig. 5 (b). Additionally, the ratio of the adaptive to non-adaptive CRB is plotted for reference. Agreement with this line would be the best an adaptive strategy could do. Here we see again that high temperatures are not able to reach the CRB, but for a small window of temperatures the ratio approaches one. For the adaptive strategy, there is a large improvement when there is a large bias, and for some temperatures the accuracy can go below the non-adaptive CRB.
The right-most panel of Fig. 5 shows that the bias leads to the estimated error D_R[T] underestimating the true error D_{R,T*}[T]. This ratio will approach one for more trajectories in the average according to Eq. (25). This is important from an experimental point of view, as it shows that in a temperature range where the bias is the main contribution to the error, the actual accuracy of the temperature estimate is lower than would be expected from the variance of the posterior when not enough iterations of the experiment are averaged over. Thus, at short times or for weak measurements, care must be taken to choose an appropriate temperature range for which the thermometer is reasonably accurate.
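The role of the bias can be illustrated with a short sketch that summarises a set of per-trajectory estimates at one true temperature. The definitions used here are assumptions made for illustration and are not the paper's Eq. (25): the bias is the mean estimate minus the true temperature, the spread is the standard deviation across trajectories, and the true relative error is the root-mean-square deviation from the true temperature divided by it. With these definitions the squared error splits exactly into squared bias plus squared spread, so an error estimate based on the spread alone is too optimistic whenever the bias dominates. The function name `summarise_estimates` is hypothetical.

```python
import numpy as np

def summarise_estimates(estimates, T_true):
    """Bias ratio, relative bias, relative spread and relative RMS error
    of a set of temperature estimates (assumed definitions, see lead-in)."""
    estimates = np.asarray(estimates, dtype=float)
    mean = estimates.mean()
    spread = estimates.std()                              # scatter across trajectories
    rmse = np.sqrt(np.mean((estimates - T_true) ** 2))    # deviation from the truth
    return {
        "ratio": mean / T_true,                # > 1 means overestimation
        "rel_bias": (mean - T_true) / T_true,
        "rel_spread": spread / T_true,
        "rel_error": rmse / T_true,            # rel_error^2 = rel_bias^2 + rel_spread^2
    }
```

As a usage example with synthetic numbers (not simulation data), estimates drawn around 5.5 for a true temperature of 5.0 give a ratio above one and a relative error that is noticeably larger than the relative spread.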

FIG. 1. Sketch of continuous thermometry. A two-level system interacts with a bath of unknown temperature T with coupling rates Γin and Γout into and out of the excited state. The energy gap ω of the system can be changed to improve the performance of the thermometer. The state of the two-level system evolves on a telegraph-like trajectory (white line) and is monitored continuously by a weak measurement, which results in a noisy signal with finite bandwidth (blue).

FIG. 4. Illustration of the temperature that different estimators assign to the same observed trajectory. Suppose that we have a flat prior for temperature between Tmin = 1 and Tmax = 10. The graphs depict the temperature estimate obtained with different estimators. The gray graphs show the average time that it takes to make m12 jumps up and m21 jumps down, for different temperatures in the prior. The y-axis of the gray lines is not meaningful; the line is repeated twice to help visualise the borders of the prior. The error bars are the standard deviation of the total time. The gray lines shed light on when the ML and MP estimators are good enough. For example, the bottom right graph shows that at γ0κ(ω)t the relative estimator TR deviates a lot from the ML. However, in the lab it is unlikely to take this long to observe 100 jumps, because it falls outside the error bars of the total time. The parameters are set to: top left k = l = 1, top right k = l = 4, bottom left k = l = 10, and bottom right k = l = 100.

FIG. 5. Comparison of the bias (a), relative error (b) and accuracy of the relative error (c) for noisy measurements with bandwidth γ = 10.0 and different measurement strengths λ. The thermometry protocol was simulated for 35 true temperatures T in the range [0.1, 10.0] and with initial gap ω*_{n.ad}. At each of the temperatures the results are averaged over a set of 200 trajectories; the points are joined with lines to guide the eye. The results are plotted at time κτ = 100.0. The shaded regions depict the standard deviation of the estimated temperature, true relative error and estimated error relative to the true error for each of the sets of trajectories. The black points and shaded regions correspond to the non-adaptive strategy with measurement strength λ = 25. The blue points and shaded region represent the results for the adaptive strategy when the measurement strength is λ = 25; here new trajectories need to be generated because the gap of the two-level system changes in each step. The strength of the measurement has the greatest effect in reducing the bias, and the bias is also the main contribution to the relative error.

TABLE I. Different strategies proposed in this work and their asymptotic precision for a bosonic bath; see TABLE II for a fermionic bath. (1) The non-adaptive strategy, in which the gap is chosen once and then left unchanged. That is, we choose the gap that minimises ρ(T) dT T^2 F[ρ(ν_τ|T)]^{-1} with F[ρ(ν_τ|T)] from Eq.

TABLE II. Same as TABLE