A first look at quasi-Monte Carlo for lattice field theory problems

In this project we initiate an investigation of the applicability of Quasi-Monte Carlo methods to lattice field theories in order to improve the asymptotic error behavior of observables for such theories. In most cases the error of an observable calculated by averaging over random observations generated from an ordinary Monte Carlo simulation behaves like 1/sqrt(N), where N is the number of observations. By means of Quasi-Monte Carlo methods it is possible to improve this behavior for certain problems to up to 1/N. We adapted and applied this approach to simple systems like the quantum harmonic and anharmonic oscillator and verified an improved error scaling.


Introduction
Four-dimensional quantum field theories play a crucial role in the mathematical description of the fundamental forces of nature. The path integral formalism developed by R.P. Feynman has been established since decades to quantize classical field theories and thus, to formulate quantum field theories. A quantum field theory (QFT) describes the behavior of certain particles and their interaction. In the absence of an interaction the path integral of a QFT is usually trivial. It is mostly the interaction term that makes the path integral challenging to evaluate. In cases where the coefficient of the interaction term, the coupling constant, is small enough a perturbative treatment of the path integral is possible and often sufficient. However many interesting quantities, like e.g. the hadron spectrum, decay constants, certain matrix elements and form factors, have to be calculated in a regime of a strong coupling (constant), where a perturbative approach must fail. In such situations it is necessary to treat the path integral non-perturbatively. Because of the lack of closed form solutions the path integral can only be evaluated numerically. In order to do so, it is necessary to give the path integral a mathematically well defined meaning. A straightforward method is it to discretize space and time by introducing a, moreover Euclidean space-time lattice with a fixed spacing between two neighboring lattice points, the lattice spacing. Restricting the system to a finite lattice extension (and applying certain boundary conditions) the infinite-dimensional path integral is converted to a finite-dimensional integral. (In principle, one still has to perform the transition to zero lattice spacing and therefore to infinitely many space-time points at the end.) Such lattice path integrals can easily have dimensions of O(10 9 ) (e.g. simulations of latticediscretized quantum chromodynamics (QCD), the theory of strong interactions of elementary particles). The high dimensionality of the problem restricts the spectrum of applicable algorithms to Monte Carlo-based methods. Especially Markov chain-Monte Carlo (Mc-MC) methods have been successfully applied since the beginning of the study of lattice field theories. With the algorithms employed observables calculated from a Monte Carlo chain of N steps will obey a statistical error proportional to 1/ √ N . Recent developments in the field of Quasi-Monte Carlo (QMC) methods, discussed in more detail in sections 3 to 5, show that under certain conditions it is possible to construct sets of samples of integration points leading to much faster rates of convergence of an observable and a much better asymptotic error behavior of up to 1/N . Such an improved error behavior would decrease the number of samples necessary to achieve a certain error bound, resulting in a drastic reduction of runtime. Note that for present computations in field theory state-of-the-art supercomputers are used. It is unclear, whether QMC methods can be used for lattice field theory simulations. As a first step, to nevertheless investigate this possibility, we will focus in this work on the study of much simpler models, namely the quantum mechanical harmonic and anharmonic oscillator.

Quantum Mechanical Harmonic and Anharmonic Oscillator
In this section we will discuss the basic steps for the quantization of the theory in the path integral approach and the discretization on a time lattice. The first step is the construction of the Lagrangian (resp. the action) of the corresponding classical mechanical system for a given path x(t) of a particle with mass M 0 . For a numerically stable evaluation of the path integral it is essential to pass on to Euclidean time. In this case the Lagrangian L and the action S is given by: Depending on the scenario (harmonic or anharmonic oscillator) the potential V (x) consists of two parts such that the parameter λ controls the anharmonic part of the theory. It should also be mentioned that in the anharmonic case the parameter µ 2 can take on negative values, leading then to a double well potential. The next step is to discretize time into equidistant time slices with a spacing of a. The path is then only defined on the time slices: On the lattice the derivative with respect to the time appearing in (1) (first term) will be replaced by the forward finite difference ∇x i = 1 a (x i+1 − x i ). The choice of the lattice derivative is not unique and requires special care, particularly if one considers more complicated models like lattice QCD. But in [1] it was shown that the lattice derivative chosen here permits a well defined continuum limit. Putting all the ingredients together, we can write down the lattice action for the (an)harmonic oscillator For the path a cyclic boundary condition x d+1 = x 1 can be assumed. In the following the superscript "latt" will be dropped, as we will only refer to the lattice action from now on. The expectation value of an observable O of the quantized theory expressed in terms of the path integral reads as follows: This expression is suitable for a numerical evaluation of certain quantities of the underlying theory. Up to now only Monte Carlo methods are known to give reliable results for dimensions d 10.
One type of such methods, often used in physics, is the Markov chain-Monte Carlo approach mostly applying the weight ∝ e −S(x) for sampling paths {x i } (so-called "importance sampling"). Especially the Metropolis algorithm [2] is suitable and a straightforward solution of (7) (also described in [1]) and serves as a reference method for the QMC approach, which is much less intuitive. The theory of QMC methods is a purely mathematical topic. During the discussion of the key aspects of QMC, following in the next sections, we will stick to a rather mathematical language, being more adequate for the description of a mathematical issue.

Direct Monte Carlo and quasi-Monte Carlo methods
In many practical applications one is interested in calculating quotients of the form (7) where the action S(.) and the observables O(.) are usually smooth functions in high dimensions. In some special situations where one would like to deal with integrands of moderately high dimensions, one may consider an estimator for the integral I 1 in the numerator and I 2 in the denominator of (7) separately, and then take I 1 /I 2 as an estimation of O(x) . Another possibility is to take a joint estimator for the total quantity O(x) using a single direct sampling method. A well known approach based on direct sampling is the so called weighted uniform sampling (WUS) estimator, analyzed in [3]. We will show some characteristics of the WUS estimator in section 6, and we will refer from now on to these methods as plain or direct sampling methods for estimating (7). In many interesting examples, we encounter the case were the action S(.) and the observable O(.) lead to integrals I 1 ,I 2 of Gaussian type. Then the integrals I 1 ,I 2 can be written in the form where C denotes the covariance matrix of the Gaussian density function. A transformation to the unit cube in R d can be applied such that the corresponding integrals take the form Here AA = C is some symmetric factorization of the covariance matrix, and Φ −1 (z) : represents the inverse of the normal cumulative distribution function Φ(·). In the classical plain or direct Monte-Carlo (MC) approach one tries to estimate (8) by generating samples pseudo-randomly. One starts with a finite sequence of independent identically distributed (i.i.d.) samples P N = {z 1 , . . . , z N }, where the points z j , 1 ≤ j ≤ N , have been generated from the uniform distribution in [0, 1] d . Then, the quadrature rule is fixed by taking the average of the function evaluations for f as an approximation of the desired The resulting estimatorQ N is unbiased. The integration error can be approximated via the central limit theorem, given that f belongs to L 2 ([0, 1] d ). The variance of the estimatorQ N is given by As measured by its standard deviation from zero the integration error associated with the MC approach is then of order O(N − 1 2 ). The quality of the MC samples relies on the selected pseudorandom number generators of uniform samples, here we use the Mersenne Twister generator from Matsumoto and Nishimura (see [4]). MC is in general a very reliable tool in high-dimensional integration, but the order of convergence is in fact rather poor.
In contrast, quasi-Monte Carlo (QMC) methods generates deterministically point sets that are more regularly distributed than the pseudo-random points from MC (see [5], [6], [7], [8]). Typical examples of QMC are shifted lattice rules and low-discrepancy sequences. To explain what we mean by "regularly distributed", we define now the classical notion of discrepancy of a finite sequence of points P N in [0, 1) d . Given P N = {z 1 , . . . , z N } a set of points in [0, 1) d , and a nonempty family I of Lebesgue-measurable sets in [0, 1) d , we define the classical discrepancy function by where c B is the characteristic function of B. This allows us to define the so called star discrepancy.
The star discrepancy can be considered as a measure of the worst difference between the uniform distribution and the sampled distribution in [0, 1) d attributed to the point set P N . The usual way to analyze QMC as a deterministic method is by choosing a class of integrand functions F , and a measure of discrepancy D(P N ) for the point sets P N . Then, the deterministic integration error is usually given in the form where V (f ) measures a particular variation of the function f ∈ F . A classical particular error bound in this form is the famous Koksma-Hlawka inequality, where D(P N ) is taken to be the star discrepancy of the point set P N , and V (f ) is the variation in the sense of Hardy and Krause of f . In the context of QMC, a sequence of points in [0, 1) d is called a low-discrepancy sequence if D (P N ) = O(N −1 (log(N )) d ) for all truncations of the sequence to its first N terms.

Quasi-Monte Carlo errors and complexity
There are certain reproducing kernel Hilbert spaces F d of functions f : [0, 1] d → R which are particularly useful for estimating the quadrature error of QMC methods (see [9]). Consider a kernel K : We denote now with ·, · and · the inner product and norm in F d . If the integral is a continuous functional on the space F d , then the worst case quadrature error e N (F d ) for point sets P N = {z 1 , . . . , z N } and quasi-Monte Carlo algorithms for the space F d can be given by due to Riesz' representation theorem for linear bounded functionals. In this case, the representer h N ∈ F d of the quadrature error is given by In QMC error analysis, one usually considers the weighted (anchored) tensor product Sobolev space introduced in [10] with the weighted norm f 2 γ = f, f γ and inner product f, g γ = u⊆{1,...,d} j∈u where for u ⊆ {1, . . . , d} we denote by |u| its cardinality, and (z u , 1) denotes the vector containing the coordinates of z with indices in u, and the other coordinates set equal to 1. The corresponding reproducing kernel is given by There are several other examples considered for error analysis. For example, the weighted Walsh space consisting of Walsh series (see [7, Example 2.8] and [11]). The weighted tensor product Sobolev space allow for explicit QMC constructions deriving error estimates of the form where the constant C(δ) is independent on the dimension d, given that the sequence of weights (γ j ) satisfies (see [12]) Traditional unweighted function spaces considered for integration suffer the from the curse of dimensionality. Their weighted variants describe a setting where the variables or group of variables may vary in importance. Thus, they give a partial explanation of why some very high-dimensional spaces become tractable for QMC.
Explicit QMC constructions satisfying (9) are shifted lattice rules for weighted spaces. The rate (9) can be also obtained for Niederreiter and Sobol' sequences (see [13]).
The idea of "weighting" the norm of the spaces to obtain tractable results can be applied in fact to more general function spaces than smooth function spaces of tensor product form, and many integration examples can be found in [6]. In our numerical experiments, we used so far QMC algorithms based on a particular type of low-discrepancy sequences. Numerical experiments with shifted lattice rules will be carried out in the near future, following new techniques for fixing adequate weights introduced in [14].

Low-discrepancy (t, d)-sequences
The most well known type of low-discrepancy sequences are the so called (t, d)-sequences.
An interval of this form is called an elementary interval in base b. The parameter t is called the quality parameter of the (t, d)-sequences. In [15], theorem 4.17, it is shown that (t, d)-sequences are in fact low-discrepancy sequences. We reproduce this result in the following Theorem 4.3 The star-discrepancy D of the first N terms P N of a (t, d)-sequence in base b, where the implied constants depend only on b and d. If either d = 2 or b = 2, d = 3, 4, we have Explicit constructions of (t, d)-sequences are available. Some of them are the generalized Faure, Sobol', Niederreiter and Niederreiter-Xing sequences. All these examples fall into the category of constructions called digital sequences. We refer to [7] for further reading on this topic.

Randomized QMC
There are some advantages in retaining the probabilistic properties of the sampling. There are practical hybrid methods permitting us to combine the good features of MC and QMC. Randomization is an important tool for QMC if we are interested for a practical error estimate of our sample quadrature Q N to the desired integral. One goal is to randomize the deterministic point set P N generated by QMC in a way that the estimatorQ N preserves unbiasedness. Another important goal is to preserve the better equidistribution properties of the deterministic construction.
The simplest form of randomization applied to digital sequences seems to be the technique called digital b-ary shifting. In this case, we add a random shift ∆ ∈ [0, 1) d to each point of the deterministic set P N = {z 1 , ..., z N } using operations over the selected ring F b . The application of this randomization preserves in particular the t value of any projection of the point set (see [5] and references therein). The resulting estimator is unbiased. The second randomization method we present is the one introduced by Art B. Owen ([16]) in 1995. He considered (t, m, d)-nets and (t, d)-sequences in base b and applied a randomization procedure based on permutations of the digits of the values of the coordinates of points in these nets and sequences. This can be interpreted as a random scrambling of the points of the given sequence in such a way that the net structure remains unaffected. We do not discuss here in detail Owen's randomization procedure, or from now on called Owen's scrambling. The main results of this randomization procedure can be stated in the following  The last two propositions state that after Owen's scrambling of digital sequences we retain unaffected the low discrepancy properties of the constructions, and that after this randomization procedure we obtain random samples uniformly distributed in [0, 1) s .
The basic results about the variance of the randomized QMC estimatorQ N after applying Owen's scrambling to (t, m, d)-nets in base b (or of (t, d)-sequences in base b ) can be found in [17]. We summarize these results in the following Then for the variance V (Q N ) of the randomized QMC estimator it holds For t = 0 we have The above theorem says that the variance of scrambled (0, m, d)-nets is never more than 3 times the variance of the corresponding MC estimator. The bound of the theorem above can be improved (see theorem 13.9 in [7]) to show that the variance of scrambled (0, m, d)-nets are in fact always smaller than the variance of the MC estimator. If the integrand at hand is smooth enough, using Owen's scrambling it can be shown that one can obtain an improved asymptotic error estimate of order O(N − 3 2 − 1 d +δ ), for any δ > 0, see [18]. Improved scrambling techniques have been developed in [19], [20].

Weighted uniform sampling
Weighted uniform sampling is a way of estimating a quotient of integrals of the form by taking the estimatorR where the points z j , 1 ≤ j ≤ N , have been generated from the uniform distribution in [0, 1] d . This estimator was analyzed in [3] and applications have been investigated for example in [21] and [22]. The bias and the root mean square error (RMSE) of this estimator satisfy The bias of the estimator is asymptotically negligible compared with the RMSE. One clear disadvantage of WUS against Mc-MC or Importance Sampling for problems with large regions of relative low values of the integrands is that with WUS we sample over the entire unit cube [0, 1] d uniformly, while Mc-MC and Importance Sampling based techniques try to concentrate in more characteristic or important regions of the integrands. These limitations where observed in our numerical experiments.

Numerical experiments
We consider for our numerical tests the quantum mechanical harmonic and anharmonic oscillator in the path integral approach as described in section 2. For definiteness we repeat here the expression for the action of the system: We investigate the two observable functions x 4 i , using the notation X 2 , X 4 for O 1 (x) , O 2 (x) in our tests.

Harmonic Oscillator
For the harmonic oscillator we can apply immediately the direct sampling approach described in sections 3 and 6 for calculating estimates of observables O by setting in (10). The matrix A is a square root of C, the covariance matrix of the variables x i , appearing in the action if it is expressed as a bilinear form: S = 1 2 x T C −1 x. Different factorizations, namely Cholesky and PCA (principle component analysis) have been tried out. The PCA based factorization seemed to perform better in our tests, which is the reason why we will only show results for this method. The PCA can be explicitely obtained for circulant Toeplitz matrices and the matrix-vector products can be efficiently computed by means of the fast Fourier transform. In the ordinary Mc-MC approximation, we used the Mersenne Twister [4] pseudo random number generator. For the QMC tests, we use randomly scrambled Sobol' sequences using the technique proposed by J. Matousěk [19]. The error of X 2 was obtained by scrambling 10 times the QMC sequence and making 10 runs of an Mc-MC simulation (with different seeds). This procedure is repeated 30 times in both cases to obtain the error of the error. From the results, shown in figure 1, we can see a scaling that agrees perfectly with the expected behavior, namely N −0.5 for Mc-MC and N −1 for QMC, for large N . Although this example is trivial, it was our first successful application of the QMC approach in a physical model and motivated us to pass on to more complicated models.

Anharmonic Oscillator
The WUS approach was also used for this problem to estimate X 4 and X 2 .
With the anharmonic term in the action the distribution function of the variables x i becomes very complicated. This makes it very hard to generate the samples directly from the PDF of the anharmonic oscillator. Instead of this, the anharmonic term and a part of the harmonic term is treated as part of the weight functions f 1 and f 2 in (10), leaving the sampling procedure of the x i as it was for the harmonic oscillator, accept for a different factor µ 2 sim in front of the harmonic term This procedure is neccessary, because of C = A T A being positive definite only if µ 2 sim > 0, which is neccessary for the existence of A −1 during the sampling procedure. Further, it is important to note that the PCA factorization during the generation of the gaussian samples is essential for an efficient reduction of the effective dimension (see [23]) of the problem. For the parameters listed below, we estimated the effective dimensions of the functions 12 to be close to 20 (for a 99% variance concentration). On the other hand we found out that the effective dimension depends also very strong on the parameter T = da, the physical time extent of the system. For small Tvalues, say T < 0.2, the effective dimension is reduced sufficiently good like in the harmonic case, such that the QMC approach leads to a 1/N error scaling. The situation changes for T = 1.5, where the error behaves only like 1/N 0.75 , due to the increase of the effective dimension. The parameters were set to M 0 = 0.5, λ = 1.0, µ 2 = −16. In the two tests the parameters a and µ 2 sim had been adjusted such, that T was kept fixed. We set a = 0.015 and µ 2 sim = 0.015 for d = 100, whereas for d = 1000 a = 0.0015 and µ 2 sim = 0.0015 was chosen. The error analysis of X 2 and X 4 has been adopted from the harmonic oscillator test case described in the last subsection PCA results for RQMC seem satisfactory. The resulting estimation of the ground state energy matches in at least two significant digits with the theoretical value, E 0 = 3.863..., calculated in [24], namelyÊ 0 = 3.856 ± 0.004 for d = 100 andÊ 0 = 3.864 ± 0.003 for d = 1000.

Concluding Remarks
For the harmonic oscillator we found a large-N error behavior as expected for QMC (∼ 1/N ) and Mc-MC (∼ 1/ √ N ). Also for the anharmonic oscillator the estimation procedure leads to a significant improvement when employing the QMC approach. In this case, the error scaling is only of O(N −0.75 ) instead of the theoretically best case of O(N −1 ). Further, we found that the applicability of the WUS approach seems to be limited by the physical time extent T = da. Stable results could only by found for values T ≤ 1.5. On the other hand, the choice of a does not seem to have any effect and the accessible range of T values gives already estimates of the ground state energy, compatible (within errors) with the theoretical prediction (valid in the limit T → ∞ and a → 0). For the case that the improved error scaling and the mild dependence on the lattice spacing a found here will also be present in more elaborate models, the QMC has the potential to become very valuable in the future.