Model-Independent Sum Rule Analysis Based on Limited-Range Spectral Data

Partial sum rules are widely used in physics to separate low- and high-energy degrees of freedom of complex dynamical systems. In practice, however, their application is hampered by the inevitably finite spectrometer bandwidth and is often based on risky model-dependent extrapolations. We show that, given spectra of the real and imaginary parts of any causal frequency-dependent response function (for example, optical conductivity, magnetic susceptibility, acoustic impedance, etc.) in a limited range, the sum-rule integral from zero to a certain cutoff frequency inside this range can be safely derived using only the Kramers-Kronig dispersion relations, without any extra model assumptions. This implies that experimental techniques providing both active and reactive response components independently, such as spectroscopic ellipsometry in optics, allow an extrapolation-independent determination of the spectral weight 'hidden' below the lowest accessible frequency.


INTRODUCTION
Global sum rules applied to response functions play a major role in physics, as they quantitatively express fundamental conservation laws. Also of interest are partial, or restricted, sum rules, where a properly chosen cutoff frequency separates low- and high-energy degrees of freedom of a physical system. For example, the low-frequency optical spectral weight [1]

    W(ω_c) = ∫_0^{ω_c} σ_1(ω) dω    (1)

is a partial counterpart of the well-known f-sum rule ∫_0^∞ σ_1(ω) dω = πne²/(2m_e) for the optical conductivity σ(ω) = σ_1(ω) + iσ_2(ω), where n is the density of charges and e and m_e are the charge and the bare mass of the electron.
In charge-conducting systems, the integral up to a cutoff frequency ω_c somewhat larger than the free-carrier scattering rate but below the energies of transitions from occupied to empty bands is proportional to the number of carriers and the inverse band mass averaged over the Fermi surface. For example, in the simple Drude model, integrating out to 10 (20) times the scattering rate recovers 94% (97%) of the sum rule. The effective mass can be strongly affected by electron correlations, especially in the case of a narrow bandwidth. The dependence of W(ω_c) on temperature and other parameters, especially across a phase transition, is thus a valuable piece of information about the changes in the electronic system. The changes of W(ω_c) can be very large, as at the ferromagnetic-paramagnetic transition in colossal magnetoresistance manganites [2,3], or rather small, as at the superconducting transition in the high-T_c cuprates [4,5,6,7]. Even in the latter case, the subtle variations of the low-frequency spectral weight may potentially distinguish between physically different scenarios of superconductivity [8]. If a significant spectral overlap between the free-charge and interband peaks is present, the interpretation can be less obvious and requires a direct theoretical calculation of the value of W(ω_c). It is therefore worth using all means, experimental and computational, to improve the accuracy of W(ω_c) determined from the available spectra. Notably, partial sum rules can be meaningfully applied not only to optical spectra, but also to acoustical data, neutron scattering and other spectroscopic techniques.
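For the Drude model, where σ_1(ω) ∝ 1/(1 + ω²/γ²) with scattering rate γ, the quoted fractions follow from the closed form W(ω_c)/W(∞) = (2/π) arctan(ω_c/γ). A quick check (illustrative sketch, with the scattering rate set to unity):

```python
import math

def drude_fraction(omega_c, gamma):
    """Fraction of the f-sum rule recovered by integrating the Drude
    sigma_1 from 0 to omega_c: (2/pi) * arctan(omega_c / gamma)."""
    return (2.0 / math.pi) * math.atan(omega_c / gamma)

print(round(drude_fraction(10.0, 1.0), 3))   # 0.937, i.e. ~94%
print(round(drude_fraction(20.0, 1.0), 3))   # 0.968, i.e. ~97%
```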
Because of the limited bandwidth of any spectrometer, σ_1(ω) is not experimentally available down to zero frequency. The direct application of Eq. (1) assuming some low-frequency extrapolation of σ_1(ω) may therefore lead to significant and uncontrollable error bars. This is most obvious for the optical conductivity of a superconductor, where the spectral weight of the condensate of Cooper pairs is represented in σ_1(ω) by a δ-peak at zero frequency. Here the inductive component σ_2(ω) has to be used to estimate the condensate spectral weight [9,10]. However, the common procedure used in this case still requires an extrapolation down to zero frequency.
Importantly, certain experimental techniques, such as spectroscopic ellipsometry, or the simultaneous measurement of acoustical attenuation and sound speed, allow the direct and independent measurement of both components of the response function. The purpose of this article is to show that the spectral weight W(ω_c), including a possible zero-frequency δ-peak, can be obtained model-independently, i.e. without any a priori assumptions about the low- and high-frequency spectral behavior, if both σ_1(ω) and σ_2(ω) are measured in a limited frequency range [ω_min, ω_max]. We also present an efficient numerical algorithm optimized to reduce the output error bars in the case of noisy data.
The key observation is that σ_1(ω) and σ_2(ω) are not independent: being the real and imaginary parts of a causal response function, they obey the Kramers-Kronig (KK) dispersion relations

    σ_1(ω) = (2/π) ℘∫_0^∞ [ω′σ_2(ω′)/(ω′² − ω²)] dω′,    (2)

    σ_2(ω) = −(2ω/π) ℘∫_0^∞ [σ_1(ω′)/(ω′² − ω²)] dω′,    (3)

where ℘ denotes the principal-value integral. Note that relations (2) and (3) are entirely model-independent, since they rely only on the fact that the response function is analytic in the upper complex half-plane. One can say that the information about σ_1(ω) is encoded in σ_2(ω) and vice versa.
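As a numerical illustration of relation (3), σ_2 can be recovered from σ_1 alone. The sketch below does this for a Drude peak, where both components are known analytically (the grid sizes and parameter values are arbitrary choices for the illustration):

```python
import numpy as np

# Drude model sigma(w) = sigma0 / (1 - i*w*tau): both components known.
sigma0, tau = 1.0, 1.0
sigma1 = lambda w: sigma0 / (1.0 + (w * tau) ** 2)
sigma2_exact = lambda w: sigma0 * w * tau / (1.0 + (w * tau) ** 2)

def kk_sigma2(w0, wmax=400.0, n=200_000):
    """sigma_2(w0) from the KK relation. The principal value is handled
    by subtracting sigma_1(w0), which is allowed because
    PV int_0^inf dw' / (w'^2 - w0^2) = 0."""
    dw = wmax / n
    w = dw / 2 + dw * np.arange(n)       # midpoint grid avoids w' == w0
    integrand = (sigma1(w) - sigma1(w0)) / (w ** 2 - w0 ** 2)
    return -(2.0 * w0 / np.pi) * np.sum(integrand) * dw

print(kk_sigma2(2.0), sigma2_exact(2.0))  # both ~0.4
```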
The determination of the partial sum-rule integral (1) is intimately related to the more general problem of restoring the function σ(ω) itself outside the experimental range. It is well known [11] that a complex function σ(ω) analytic (holomorphic) in a certain domain D can be analytically continued from a subset Γ of the boundary of this domain into the whole domain, including the rest of the boundary. The specific form of such a continuation has been a subject of numerous studies since the late 1920's [13], which are summarized in Ref. 12. In particular, the Carleman-Goluzin-Krylov formulas restore the function exactly [14] inside the analyticity domain from its values on Γ. They have the following general structure:

    σ(ω) = lim_{n→∞} ∫_Γ Q_n(ω′, ω) σ(ω′) dω′.    (4)

There are several possibilities to choose the kernel Q_n(ω′, ω). We point out just one option for the case Γ = [ω_min, ω_max] [12] that illustrates the general property of these kernels: they oscillate as functions of both variables, with the frequency of oscillations increasing without bound as n increases. Therefore the 'lim' operation cannot be applied directly to Q_n(ω′, ω), as the limit exists only for the whole integral (4). Using Eqs. (4) and (5) one can, in principle, obtain the spectral weight

    W(ω_c) = lim_{n→∞} ∫_Γ u_n(ω′, ω_c) σ(ω′) dω′,    (6)

where u_n(ω′, ω_c) = ∫_0^{ω_c} Q_n(ω′, ω) dω. In spite of its formal existence, the analytic continuation of a function from a finite interval is a severely ill-posed problem. For example, one can construct an analytic function which is almost zero in the range [ω_min, ω_max] but shows an intense spectral structure below ω_min (Fig. 1). Let σ(ω) be a sum of two narrow Lorentzians: the first with spectral weight A centered at ω_0 below ω_min, and the second with the opposite spectral weight −A slightly displaced to ω_0 + ∆, still below ω_min.
Normally, σ_1(ω) cannot be negative; in this example, however, one can think of σ_1(ω) as an addition to some positive background response function, which makes the total always positive. A similar function would appear, in particular, when taking the difference between the optical conductivities of the same sample at two different temperatures, in a case where the conductivity contains a single optical phonon peak that shifts with temperature (assuming that the phonon spectral weight remains unchanged). The width of the peaks is assumed to be much smaller than ∆. The corresponding σ_2(ω) far from ω_0 is approximately (−4Aω_0ω∆/π)(ω² − ω_0²)⁻². By decreasing ∆, both σ_1(ω) and σ_2(ω) can be made vanishingly small in the range [ω_min, ω_max]. Obviously, the slightest noise on top of σ(ω) that makes it indistinguishable from zero prevents the extraction of such strong structures beyond the accessible range.
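This cancellation is easy to reproduce numerically. In the sketch below (illustrative parameters, with peak width γ ≪ ∆ and window [ω_min, ω_max] = [1, 4]), a KK-consistent pair of opposite-weight Lorentzians becomes invisible in the window as ∆ shrinks, while the structure near ω_0 remains enormous:

```python
import numpy as np

def lorentz(w, w0, gamma, A):
    """Causal complex Lorentzian (A/pi) * i/(w - w0 + i*gamma): its real
    part is a peak of weight A at w0, its imaginary part the KK partner."""
    return (A / np.pi) * (gamma + 1j * (w - w0)) / ((w - w0) ** 2 + gamma ** 2)

w_in = np.linspace(1.0, 4.0, 301)   # accessible window
w0, gamma, A = 0.5, 2e-4, 1.0       # narrow peaks hidden below w_min = 1

for delta in (0.1, 0.01, 0.001):
    pair = lorentz(w_in, w0, gamma, A) - lorentz(w_in, w0 + delta, gamma, A)
    at_peak = lorentz(np.array([w0]), w0, gamma, A) \
        - lorentz(np.array([w0]), w0 + delta, gamma, A)
    print(delta, np.max(np.abs(pair)), abs(at_peak[0]))
```

The in-window response scales linearly with ∆, while the peak amplitude stays of order A/(πγ); for small ∆ the structure is buried under any realistic noise level.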
In the considered example it was essential that the spectral weights of the two peaks exactly compensate each other in order to obtain vanishing values of σ_1(ω) and σ_2(ω) in [ω_min, ω_max]. Otherwise one would get a detectable term ∼1/ω in σ_2(ω), proportional to the total low-frequency spectral weight [15,16]. This indicates that the 'hidden' spectral weight is much better determined by the limited-range data than the function σ_1(ω) itself. This is supported by the study of Aspnes [17], who examined the possibility of extrapolating an ellipsometrically measured dielectric function beyond the experimental range using the Kramers-Kronig relations and found that the total spectral weights of a few broad spectral regions can be restored reasonably well, while the high-resolution details of the spectra cannot be unambiguously determined. Some analytical treatments of the problem of the finite frequency range can be found in Refs. [18,19,20].
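The detectable 1/ω tail gives a simple estimate of uncompensated hidden weight: from relation (3), for spectral weight W_low concentrated well below ω, σ_2(ω) ≈ 2W_low/(πω). A quick numerical check with a single Lorentz oscillator hidden below the window (illustrative parameters, not from the paper's data):

```python
import numpy as np

# Lorentz-oscillator conductivity with total spectral weight pi/2,
# centered at x0 = 0.3, i.e. well below the accessible window [1, 4].
x0, g = 0.3, 0.2
sigma = lambda w: -1j * w / (x0 ** 2 - w ** 2 - 1j * g * w)

w = 4.0                                  # a frequency inside the window
W_low = np.pi / 2                        # true hidden spectral weight
W_est = np.pi * w * sigma(w).imag / 2.0  # estimate from the sigma_2 tail
print(W_est, W_low)                      # agree to ~0.3%
```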

A PRACTICAL ALGORITHM FOR THE NOISY DATA
We are interested in an efficient and accurate numerical scheme to determine the sum-rule integral (1). It turns out that the straightforward application of formulas (4) and (6) with strongly oscillating kernels to real data is not practical, as it uncontrollably amplifies the experimental noise. One therefore has to look for different numerical algorithms.
The experimental spectra are collections of data points σ_{1,j} ± δσ_{1,j} and σ_{2,j} ± δσ_{2,j} on a mesh of frequencies ω_j (j = 1, ..., N). We assume that the spectral resolution is roughly the same as the distance between neighboring points. According to Eq. (6), W(ω_c) is a linear functional of the real and imaginary parts of σ. We note that this follows directly from the linearity of the KK relations and is independent of the particular scheme of analytical continuation. Hence it is logical to take the following formula for the calculations:

    W(ω_c) ≈ Σ_{j=1}^N [u_{1,j} σ_{1,j} + u_{2,j} σ_{2,j}],    (7)

which is the most general linear relation between W(ω_c) and the measured values σ_{1,j} and σ_{2,j}. The coefficients u_{1,j} and u_{2,j}, which we call hereafter the u-coefficients, have to be chosen in such a way that formula (7) is approximately correct for an arbitrary response function. Since any causal response function can be represented as a linear superposition of narrow oscillator response functions S_x(ω) = S_{1x}(ω) + iS_{2x}(ω) centered at all frequencies x, one should optimize the u-coefficients in such a way that formula (7) gives a reasonably accurate answer when applied to σ(ω) = S_x(ω) for any x. For S_{1x}(ω) one can take a narrow peaked function, for example a Gaussian, while S_{2x}(ω) should be the KK transform of S_{1x}(ω). We introduce a function D(x), the inaccuracy of formula (7) when applied to S_x(ω):

    D(x) = Σ_{j=1}^N [u_{1,j} S_{1x}(ω_j) + u_{2,j} S_{2x}(ω_j)] − ∫_0^{ω_c} S_{1x}(ω) dω.    (8)

The integrated inaccuracy can be defined as follows:

    D²_int = ∫_0^∞ D²(x) dx.    (9)

In order to make formula (7) accurate for all x at the same time, one would need to minimize D²_int by varying the u-coefficients. If experimental noise is present, one should also take care that the noise-induced error bar F² of the resulting spectral weight does not become too large. In general, D²_int and F² have to be minimized simultaneously. A more detailed description of this algorithm is given in Appendix 1.
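A compact numerical sketch of this scheme may clarify the construction (this is an illustration, not the actual Devin code: the Lorentzian trial basis, the grids and the noiseless least-squares shortcut are all simplifying assumptions made here). One demands that formula (7) reproduce the partial weight of a trial oscillator placed at every center x, and solves the resulting linear problem for the u-coefficients:

```python
import numpy as np

def osc(w, x, g=0.3):
    """Lorentz-oscillator conductivity centered at x: a KK-consistent
    (sigma_1, sigma_2) pair with total spectral weight pi/2."""
    return -1j * w / (x ** 2 - w ** 2 - 1j * g * w)

def partial_weight(x, wc, n=4000):
    """Integral of the oscillator's sigma_1 from 0 to the cutoff wc."""
    wf = (np.arange(n) + 0.5) * (wc / n)
    return np.sum(osc(wf, x).real) * (wc / n)

w_data = np.linspace(1.0, 4.0, 31)     # accessible range [w_min, w_max]
w_c = 1.0                              # cutoff at the lower edge
xs = np.linspace(0.05, 6.0, 120)       # trial centers: below, inside
                                       # and above the range

# Rows: (sigma_1, sigma_2) of each trial oscillator at the data mesh;
# targets: its partial spectral weight below the cutoff.
M = np.array([np.concatenate([osc(w_data, x).real, osc(w_data, x).imag])
              for x in xs])
t = np.array([partial_weight(x, w_c) for x in xs])

# Least squares minimizes the integrated discrepancy; with noisy data
# one would instead solve (M^T M + w*diag(dsigma^2)) u = M^T t.
u, *_ = np.linalg.lstsq(M, t, rcond=None)

# Synthetic noiseless 'measurement': one oscillator hidden below w_min
# plus one inside the range.
true_sigma = osc(w_data, 0.5) + osc(w_data, 2.5)
data = np.concatenate([true_sigma.real, true_sigma.imag])
W_true = partial_weight(0.5, w_c) + partial_weight(2.5, w_c)
W_est = float(u @ data)
print(W_est, W_true)
```

In this noiseless sketch the estimate tracks the true partial weight, including the contribution hidden below ω_min; the regularization term described in Appendix 1 is what makes the scheme robust to noise.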
We implemented this idea in a numerical code Devin [21], which takes a set of data points of σ_1(ω) and σ_2(ω) with error bars in a limited range [ω_min, ω_max] and returns the estimated value and error bar of W(ω_c) for a specified cutoff frequency ω_c. It is assumed that the real and imaginary parts of the response function are KK-consistent; otherwise the output error bars are unpredictable. Note that the KK consistency of limited-range data can be tested using the exact bounds proposed in Ref. [19].

RANDOM TESTS OF THE METHOD
A series of Monte-Carlo tests was performed in which the program output, based on limited-range spectral information, is compared with the exact value of W(ω_c), allowing one to scrutinize the model independence of the method. We generate a random KK-consistent function σ(ω) by adding up a random number of peaks with random spectral weights A_k, widths γ_k and center frequencies ω_{0,k} distributed below, inside and above the range [ω_min, ω_max]. In particular, a sum of Lorentz peaks plus a high-frequency background was taken [22]:

    σ(ω) = −(iω/4π) [ε_∞ − 1 + Σ_k A_k/(ω_{0,k}² − ω² − iγ_kω)],

where the number of peaks is varied between 1 and 50. The parameters were varied within the following limits: A_k between 0 and 125, γ_k between 0 and 3, ω_{0,k} between 0 and 5, ε_∞ between 0 and 5. The first oscillator was always a delta function at zero frequency (ω_{0,1} = 0, γ_1 = 0), imitating the condensate peak in superconductors. The 'experimental' data points sent to the program were generated with a step of 0.1 inside the range [ω_min, ω_max] by convoluting the true functions with a Gaussian of width 0.1, to mimic the finite resolution, and adding random noise with standard deviation δσ. Each test series consisted of 30 independent generations of σ(ω) and a comparison of the true and calculated W(ω_c).
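A minimal generator of such random KK-consistent test functions can be sketched as follows (the normalization of the Drude-Lorentz form, the omission of the ε_∞ background and the replacement of the zero-frequency δ-peak by a broadened Drude term are simplifications made for this illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_sigma(w):
    """Random Drude-Lorentz conductivity, KK-consistent by construction.
    The eps_inf background term is omitted in this sketch."""
    n_peaks = rng.integers(1, 51)
    eps = np.zeros_like(w, dtype=complex)
    for k in range(n_peaks):
        A = rng.uniform(0, 125)
        g = rng.uniform(0, 3)
        # first oscillator sits at zero frequency (here a broadened
        # Drude term rather than the true condensate delta peak)
        w0 = 0.0 if k == 0 else rng.uniform(0, 5)
        eps += A / (w0 ** 2 - w ** 2 - 1j * g * w)
    return -1j * w * eps / (4 * np.pi)   # sigma = -i w (eps - 1) / (4 pi)

w = np.arange(1.0, 4.0, 0.1)             # 'experimental' mesh, step 0.1
sig = random_sigma(w)
noisy = sig + rng.normal(0, 0.5, w.size) + 1j * rng.normal(0, 0.5, w.size)
print(np.all(sig.real >= 0))             # Lorentz sigma_1 is non-negative
```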
The test results are summarized in Fig. 2. Series (a)-(c) demonstrate the sensitivity of the extracted spectral weight to the data noise. The experimental range [1,4] and the cutoff ω_c = 1 are the same, while δσ varies from 0.5 to 10. One can see that for sufficiently small noise the program provides quite an accurate value of W(ω_c) for all 30 random inputs, which demonstrates that the method is indeed model-independent. On the other hand, the relatively large error bars in series (b) and (c) show that the requirements on the signal-to-noise ratio for this particular set of data points are quite strict (∼1%), although not unrealistic.
Series (a), (d) and (e) give a feeling of how the accuracy of this procedure depends on the width of the experimental range at a constant noise amplitude. The range was consecutively narrowed from [1,4] to [1,2]. The range width appears to be a critical factor for the precision of the method. The output error bars grow rapidly as the experimental range is restricted, much faster, in fact, than one would expect from the simple decrease of the number of data points.
By comparing series (a), (f) and (g), one can see that the error bars increase dramatically if the cutoff ω_c is taken beyond the experimental range, either below ω_min or above ω_max. This is another indication that one can reliably determine the total 'hidden' spectral weight, but not a part of it, since the latter depends on the inaccessible spectral details of σ_1(ω).

[Fig. 3 caption, in part: (a) Above 6000 cm⁻¹, σ_1(ω) and σ_2(ω) were measured directly by spectroscopic ellipsometry; below 6000 cm⁻¹ they were derived, with error bars, from a simultaneous KK-constrained fit [24] of the reflectivity, measured down to 100 cm⁻¹, and of the ellipsometric spectra. (b) The value of W(ω_c) calculated for ω_c = 8000 cm⁻¹ as a function of ω_min, for ω_max = 20000 cm⁻¹. (c) The value of W(ω_c) for ω_c = 8000 cm⁻¹ as a function of ω_max, for ω_min = 100 cm⁻¹. The error bars are indicative, as estimated by the program.]

REAL-DATA EXAMPLE
In Fig. 3 the application of this program to real data is demonstrated. As an example, we calculate the sum-rule integral for the optical conductivity of the high-T_c superconducting compound Bi₂Sr₂Ca₂Cu₃O₁₀, based on data published in Ref. 23. We use a set of data which spans the interval of wavenumbers ω/(2πc) between 100 cm⁻¹ and 20000 cm⁻¹. In Fig. 3b, each point corresponds to a value of W(ω_c) calculated on the basis of input data from ω_min to ω_max = 20000 cm⁻¹, as a function of ω_min. The error bars are relatively small as long as ω_min is less than ω_c, but grow explosively as ω_min exceeds ω_c. In Fig. 3c, each point corresponds to a value of W(ω_c) calculated on the basis of input data from ω_min = 100 cm⁻¹ to ω_max, as a function of ω_max. When ω_c is above the highest frequency point, the error bars again tend to diverge.

DISCUSSION AND CONCLUSIONS
One should note that the somewhat involved algorithm that we used to extract the partial sum-rule integral W(ω_c) is not the only way to tackle this problem. Perhaps a more conventional and intuitive method is to fit the available spectra with a KK-consistent multi-parameter function, such as the Drude-Lorentz model or the variational function of Ref. [24]. A similar method based on Kramers-Kronig integrals was used by Aspnes [17].
Interestingly, all fitting schemes where the model function is a linear superposition of trial basis functions eventually reduce to the same type of linear formula (7), as demonstrated in Appendix 2. However, there is an important difference between the two approaches. In the method that we used in this paper, the u-coefficients are optimized in order to minimize the output error bars δW(ω_c), while in the fitting approach they are predetermined by the specific set of trial basis functions, which may not always be optimal. In this sense, our approach is more general and model-independent.
On the other hand, the data-fitting approach can also be straightforwardly applied to experimental quantities that depend on σ_1(ω) and σ_2(ω) in a non-linear way, for example the optical reflectivity. Another advantage of the data-fitting technique is that it can detect whether σ_1(ω) and σ_2(ω) are KK-inconsistent, for example due to systematic experimental uncertainties. It is therefore preferable to use both techniques in combination.
In summary, we have shown that a partial sum-rule analysis can be accurately performed on the basis of the real and imaginary parts of a response function, if the latter is experimentally available in a limited spectral range. In a sense, nature integrates σ_1(ω) for us beyond the accessible range and encodes the information about this integral in σ_2(ω) inside the range where experimental data are available. We have shown that it can be decoded using a simple linear formula (7) with optimally chosen u-coefficients. The determination is accurate only if the cutoff frequency lies inside the accessible interval. Even though this interval can, formally speaking, be arbitrarily small, extracting the sum-rule integral from a narrow interval requires much better data accuracy than from a broad one to obtain an equivalent precision of W(ω_c). Importantly, the resulting error bars are not subject to the notorious uncontrollable extrapolation uncertainty: they can, at least in principle, be made arbitrarily small by decreasing the experimental noise. This result is valid for all response functions satisfying the KK relations and thus applies to various domains of spectroscopy.

APPENDIX 1
Here we describe the method that we use to optimize the u-coefficients in the approximate formula

    W(ω_c) ≈ Σ_{j=1}^N [u_{1,j} σ_1(ω_j) + u_{2,j} σ_2(ω_j)].    (10)

The starting point is the spectral representation of the response function, σ(ω) = ∫_0^∞ ρ(x) S(ω, x) dx, where S(ω, x) is the response of an oscillator centered at frequency x (and at −x, to preserve the parity of the response function) and ρ(x) is a spectral density. Put differently, any causal response function can be represented as a linear superposition of narrow oscillator functions. In reality, one has to convolute S(ω, x) with an apparatus function A(ω, ω′), for example a Gaussian, which takes the experimental resolution into account: S̃(ω, x) = ∫ A(ω, ω′) S(ω′, x) dω′.
The key idea of the method is based on the linearity of Eq. (10): in order to make this formula applicable to any σ(ω), one should optimize the u-coefficients in such a way that it gives a reasonably accurate answer when applied to σ(ω) = S̃(ω, x) for any x.
To be quantitative, one can calculate a discrepancy function D(x) for a given combination of the u-coefficients:

    D(x) = Σ_{j=1}^N [u_{1,j} S̃_1(ω_j, x) + u_{2,j} S̃_2(ω_j, x)] − ∫_0^{ω_c} S̃_1(ω, x) dω.

One can also define the discrepancy integrated over all x:

    D²_int = ∫_0^∞ D²(x) dx.

In order to make formula (10) accurate for all x simultaneously (assuming that it is possible!), one should minimize the integrated discrepancy by varying the u-coefficients. This is easy, since D²_int is a quadratic form in the u-coefficients:

    D²_int = Σ_{νj,ν′j′} A_{νj,ν′j′} u_{ν,j} u_{ν′,j′} − 2 Σ_{νj} b_{νj} u_{ν,j} + c,

where the coefficients A_{νj,ν′j′}, b_{νj} and c are given by integrals over x of products of the basis spectra S̃_ν(ω_j, x) and the partial weights ∫_0^{ω_c} S̃_1(ω, x) dω. The minimization of D²_int reduces to the linear system of equations

    Σ_{ν′j′} A_{νj,ν′j′} u_{ν′,j′} = b_{νj}.

In the case where the (frequency-dependent) spectral resolution δ(ω) is given by a Gaussian, the above integrals can be taken analytically, which simplifies the calculations dramatically; the resulting closed-form expressions involve the error function erf(x), the imaginary error function erfi(x) = erf(ix)/i and the generalized hypergeometric function 2F2(...; x).
So far we have ignored the experimental uncertainty of σ_1(ω) and σ_2(ω), which is another source of output error bars. If we assume that it is just random noise with standard deviations at each frequency given by δσ_{1,j} and δσ_{2,j}, then the standard deviation of the output of Eq. (10) is

    F² = Σ_{j=1}^N [u²_{1,j} (δσ_{1,j})² + u²_{2,j} (δσ_{2,j})²].

Our experience shows that the optimization of D²_int alone may provide very inaccurate results for a noisy input. Instead, the minimization of the compound functional D²_int + wF², where w is a weighting coefficient discussed below, works much better. From a numerical point of view, adding wF² enhances the diagonal elements of the matrix A,

    A_{νj,νj} → A_{νj,νj} + w (δσ_{ν,j})²,

and makes it better conditioned. This addition prevents the u-coefficients from growing too large, so one can consider it a regularization term.
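The conditioning effect of this diagonal enhancement is easy to demonstrate on a toy matrix (a Hilbert matrix below, standing in for the actual Gram matrix A of the method):

```python
import numpy as np

n = 20
# Hilbert matrix: symmetric positive definite but catastrophically
# ill-conditioned, like the Gram matrix of strongly overlapping rows.
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])

w = 1e-3                       # regularization weight
H_reg = H + w * np.eye(n)      # diagonal enhancement, as in the text

print(np.linalg.cond(H))       # astronomically large
print(np.linalg.cond(H_reg))   # drops to order 10^3
```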
The coefficient w describes the relative significance of the data noise compared to the inaccuracy of the linear formula (10). The subtlety is that the optimal value of w is determined by W(ω_c) and is thus not known a priori. A way out is to use a second optimization loop for w. As a criterion, it is logical to minimize the estimated total error bar (δW)², i.e. the error caused by both the formula inaccuracy and the input noise.
A simple estimate that we use (which can perhaps be improved) is the following. For a given set of u-coefficients, one can find W_a, F² and D²_int. Since the two error sources are independent, one can approximately determine the range of possible values of the true spectral weight W by the inequality (15), where κ is a number of order 1-10. The purpose of this parameter is to adjust the accuracy of the rough estimate (15). The variation of κ by one order of magnitude does not significantly affect the value of W_0, although it does modify δW. We found that κ = 5 gives the best results in the numerical tests described in the main text. It is likely that more accurate estimates can remove the ambiguity here. The inequality (15) can be resolved with respect to W:

    W_0 − δW < W < W_0 + δW.    (16)
Explicit expressions for W_0 and δW follow, with b = κD²_int/ω_c. After the numerical minimization of (δW)² as a function of w, the u-coefficients become dependent on σ_{1,j} and σ_{2,j}. For example, in the case of a very large value of W, the relative importance of the input error bars is small and the optimal u-coefficients tend to oscillate more strongly than in the case of a small W. Fig. 4 shows some examples of the u-coefficients optimized as described above and the corresponding discrepancy function D(x) for different sets of experimental frequencies and noise levels. Clearly, the discrepancy function can be made quite small for all x by a proper optimization. The comparison between panels (a) and (b) shows that the discrepancy function is smaller when the range of experimental frequencies is broader. On the other hand, from panels (b) and (c) one concludes that taking the error bars into account makes the u-coefficients much smaller, which results in a somewhat larger D(x) but a better overall accuracy (δW)² (not shown).

APPENDIX 2
Consider fitting the measured σ_1(ω) and σ_2(ω) with a linear superposition of KK-consistent basis functions σ̃_k(ω). Substituting Eqs. (19) and (20) into Eq. (18), we obtain, after some transformations, an expression identical in form to Eq. (10), with u-coefficients independent of σ_{ν,j}. This implies that the strategy of obtaining W(ω_c) by fitting σ_1(ω) and σ_2(ω) is nothing else but the application of the linear formula (10) with specific u-coefficients determined by the basis of functions σ̃_k(ω).