Simulated annealing for tensor network states

S Iblisdir

doi:10.1088/1367-2630/16/10/103022

Introduction

Local quantum Hamiltonians are central to our understanding of condensed matter systems. Our description of exotic phenomena, such as superconductivity or quantum magnetism, relies on finely tuned models for short-range, particle–particle interactions. Although simple in their expression and interpretation, such Hamiltonians are generally difficult to study; most of them cannot be solved exactly or perturbatively, while entanglement makes mean field methods conceptually inadequate for them.

In this respect, tensor network states have risen as very powerful instruments during the last two decades. The notion of matrix product states [1], which underlies the celebrated density matrix renormalisation group method (DMRG) [2], is now regarded as fundamental when dealing with gapped quantum spin chains. Extensions of this notion to two-dimensional (2D) or to critical systems are reshaping our views on issues such as topological order [3, 4] or conformal invariance [5]. There seems to be a deep reason for the adequacy of tensor network states at describing exotic phases: the latter exhibit finite but short-range entanglement. Area laws express this fact in precise statements [6].

The study of strongly correlated systems with tensor network states raises a fundamental issue, which we will refer to as the optimal state problem (OSP). It can be informally formulated as follows: given a particular class of tensor network states and a quantum Hamiltonian, how does one find the best approximation to the ground state of this Hamiltonian within that class? The physical relevance of this problem is obvious; important frustrated or superconducting systems are modeled by hard-to-solve Hamiltonians. But the problem has also aroused interest from a complexity theory perspective. In the case of matrix product states, some non-trivial instances of that problem are now known to belong to the P complexity class [7], while others have been shown to be NP-hard [8]. The Euclidean evolution (EE) or local optimisation method [9] are practical examples of methods for tackling the OSP, and they have proven hugely successful in a wide variety of situations. But they are all deterministic or greedy in some sense. As a result, they sometimes get stuck when applied to Hamiltonians whose energy landscapes are plagued with local minima, as very simple examples show [10]. The algorithms proposed in [7, 10] are conceptual breakthroughs in this respect; their validity is, however, limited to one-dimensional (1D) uniformly gapped Hamiltonians, and their implementations would be very tedious in practice.

The present paper is meant as a contribution to a systematic investigation of the OSP. We here wish to depart from previously proposed search methods and introduce a family of approximation schemes, based on the Metropolis algorithm¹ , for quantum Hamiltonians. These schemes are flexible, efficient, and relatively easy to implement. We will first consider 1D Hamiltonians and matrix product states. A simulated annealing algorithm for the OSP will be introduced, and its validity will be demonstrated in a concrete case study: a finite Ising chain. We will next move on to describe a natural adaptation of the algorithm to two dimensions and an implementation for Ising models on a 7 × 7 square lattice, where the scheme will be shown to compare well with standard search methods. An important advantage of our Markov chains, when dealing with (2D) systems, is the possibility of including the error made when evaluating mean values of observables [9]. This feature results in inherently stable algorithms yielding rigorous upper bounds on ground state energies.

In the present work, we will mainly discuss approximations to ground states of nearest neighbour Hamiltonians. But there are additional reasons to be interested in Monte Carlo methods for tensor networks. They are flexible enough to allow the study of more general problems for which DMRG and EE are not well suited. For example, it would be interesting to use Monte Carlo methods to construct tensor network approximations of high energy states. We will discuss such possibilities. We will conclude with a discussion of open questions, possible improvements of the method, and future work; the appendices contain a brief discussion of convergence and some technical aspects related to the 2D computations.

The Haar–Markov chain

Let us consider a system of n identical 'spin' particles confined on a line segment, interacting via some nearest-neighbour Hamiltonian H. From now on, we will assume open boundary conditions (OBC), and we will denote d the dimension of the Hilbert space $\mathcal{H}$ describing each particle. We aim at approximating the ground state of H with a matrix product state (MPS), i.e., a state of the form:

$\begin{eqnarray}&&|\psi \rangle =\mathop{\sum }\limits_{{{s}_{1}}\ldots {{s}_{n}}}{\rm tr}[{{A}_{1}}({{s}_{1}})\ldots {{A}_{n}}({{s}_{n}})]|{{s}_{1}}\ldots {{s}_{n}}\rangle ,\end{eqnarray} \tag{ 1 }$

where the matrices ${{A}_{1}}({{s}_{1}})$ have size $1\times \chi$ , the matrices ${{A}_{k}}({{s}_{k}}),k=2\ldots n-1$ have size $\chi \times \chi$ , and the matrices ${{A}_{n}}({{s}_{n}})$ have size $\chi \times 1$ . The integer χ, called the bond dimension, is a parameter that governs the amount of entanglement that can be supported by the state. ${{s}_{k}}\in \{1,\ldots ,d\}$ is a local degree of freedom associated with particle k. A possible way to prepare states of the form (1) is through unitary sequential interaction between an ancilla and an array of n particles at fixed locations, as indicated in figure 1. The fact that any MPS can be prepared in this way [12] allows us to define Markov chains on the configuration space $\Omega \equiv U{{(d\chi )}^{\otimes n}}$ without loss of generality; each tensor A_k can be viewed as a piece of some $d\chi \times \;d\chi$ unitary matrix u_k. The (unique) quantum state associated with a configuration $\omega =({{u}_{1}},{{u}_{2}},\ldots ,{{u}_{n}})$ will be denoted $|\omega \rangle$ .

**Figure 1.** Possible preparation of an MPS ( ${\rm OBC}$ ). The n-particle chain and an ancilla are all prepared in some product reference state. Sequential unitary interaction occurs next, and the ancilla is eventually projected and disregarded.
Download figure:
Standard image High-resolution image

**Figure 1.** Possible preparation of an MPS ( ${\rm OBC}$ ). The n-particle chain and an ancilla are all prepared in some product reference state. Sequential unitary interaction occurs next, and the ancilla is eventually projected and disregarded.
Download figure:
Standard image High-resolution image

Let $h(\omega )=\langle \omega |H|\omega \rangle /\langle \omega |\omega \rangle$ denote the energy associated with a configuration ω. We will be interested in Boltzmann probability distributions of the form:

$\begin{eqnarray}&&{{\pi }^{(\beta )}}(\omega )\equiv \frac{{{{\rm e}}^{-\beta h(\omega )}}}{\int {\rm d}\omega ^{\prime} {{{\rm e}}^{-\beta h({{\omega }^{\prime }})}}},\omega \in \Omega ,\end{eqnarray} \tag{ 2 }$

where ${\rm d}\omega$ denotes the Haar measure over $U{{(d\chi )}^{\otimes n}}$ . The ability to efficiently draw samples according to ${{\pi }^{(\beta )}}$ for large values of β would be an invaluable resource and would immediately lead to huge progress on the OSP; as β increases, the support of ${{\pi }^{(\beta )}}$ exponentially concentrates on low energy configurations. However, except for a few trivial cases, direct sampling of ${{\pi }^{(\beta )}}$ seems to be impossible. It is then natural to construct a Markov chain that produces samples (approximately) distributed according to ${{\pi }^{(\beta )}}$ . A simple such Markov chain is defined by the following transition rule (Haar–Markov):

Draw n unitary matrices $\{{{v}_{k}},k=1\ldots n\}\in U{{(d\chi )}^{\otimes n}}$ independently and identically, according to the Haar measure, using the recipe described in [13].
For each v_k, compute the matrix $v_{k}^{\alpha }$ of each v_k, where the power α is tuned at each iteration (see below)² .
From the current configuration $\omega =({{u}_{1}},\ldots ,{{u}_{n}})$ and these unitary matrices $\{v_{k}^{\alpha }\}$ , construct a candidate $\omega ^{\prime}$ as a site-by-site right matrix multiplication (see illustration in figure 2):
$\begin{eqnarray*}&&\omega ^{\prime} =({{u}_{1}}v_{1}^{\alpha },\ldots ,{{u}_{n}}v_{n}^{\alpha }).\end{eqnarray*}$
Accept the proposed move $\omega \to \omega ^{\prime}$ with Metropolis probability:
$\begin{eqnarray}&&P_{{\rm acc}}^{(\beta )}(\omega \to \omega ^{\prime} )={\rm min} \{1,{{{\rm e}}^{-\beta (h(\omega ^{\prime} )-h(\omega ))}}\}.\end{eqnarray} \tag{ 3 }$

**Figure 2.** Schematic illustration of a Haar–Markov move for an n-particle chain. In this example, where d = 2 and $\chi =1$ , the configuration space is $U{{(2)}^{\otimes n}}$ and can be represented as a set of n points on two-spheres. The set of plain (respectively dotted) arrows represents the current (respectively candidate) configuration.
Download figure:
Standard image High-resolution image

**Figure 2.** Schematic illustration of a Haar–Markov move for an n-particle chain. In this example, where d = 2 and $\chi =1$ , the configuration space is $U{{(2)}^{\otimes n}}$ and can be represented as a set of n points on two-spheres. The set of plain (respectively dotted) arrows represents the current (respectively candidate) configuration.
Download figure:
Standard image High-resolution image

This Markov chain is strongly ergodic and satisfies the celebrated detailed balance condition. It is furthermore defined on a compact configuration space; these conditions suffice to establish convergence (see appendix A for a brief discussion). We also note that the proposed moves are global; they involve each particle of the system.

When β is varied with time t in such a way that ${{{\rm lim} }_{t\to \infty }}\beta (t)=\infty$ , the concatenated Markov chains result in a simulated annealing scheme. Choosing the value of the parameter α is a delicate matter [14]; an appropriate tradeoff has to be found between ambitious and modest moves. We have observed that the Haar–Markov chain is indeed sensitive to this parameter; good performances were obtained when α was tuned so that the average acceptance rate of the proposed moves is about $1/4$ . The Haar–Markov chain is, by construction, sign problem free for any Hamiltonian, since $h(\omega )\in \mathbb{R}$ . We believe this property is important. Of course, we expect the complexity of some difficult Hamiltonians will manifest itself in the form of long convergence times, but there are no ill-defined probabilities here.

A case study: the Ising model

We have tested the Haar–Markov chain on the following Ising Hamiltonian:

$\begin{eqnarray}&&H=-\mathop{\sum }\limits_{k=1}^{n-1}\sigma _{k}^{x}\sigma _{k+1}^{x}-\mathop{\sum }\limits_{k=1}^{n}\sigma _{k}^{z}.\end{eqnarray} \tag{ 4 }$

It is critical in the thermodynamic limit. We have considered a system made of n = 40 particles and have aimed at estimating its ground state energy using an MPS with bond dimension equal to $2,4,8,16$ . A crucial issue is to decide the rate at which β should be increased from one step to the next. Motivated by early results on fast simulated annealing [15], we have been interested in testing the power of linear schedules and have proceeded in two phases. Phase A: the initial inverse temperature was set to ${{\beta }_{0}}=1$ and increased at each iteration by an equal amount $d\beta =0.05$ . When the relative error for the obtained energy is about 10⁻³, this computation is stopped. Phase B: the current optimal solution is further treated at high inverse temperatures (between 10⁵ and 10⁷). In this phase of the algorithm, it is more efficient to propose candidates where changes affect only one (randomly picked) site at a time rather than to make global moves. We make no claim of optimality regarding this particular scheme; possibly more ambitious choices such as, e.g., an exponential schedule, could lead to faster algorithms. Our results are summarized in table 1. The estimates for the ground state energy are comparable to those output by EE.

Table 1. Ground state energies for the Ising model using Haar–Markov chains. The first column records the bond dimension used in the computations. The second column reports the estimate for the ground state energy output by our simulated annealing scheme. The state obtained after simulated annealing is then used as a seed for Markov chains at very small temperatures. The resulting ground state energies are shown in the third column. The fourth column shows the results obtained through EE for the sake of comparison.

χ	${\sf Phase} \;{\sf A}$	${\sf Phase} \;{\sf B}$	${\sf EE}$
2	$-50.354476$	$-50.503292$	$-50.503303$
4	$-50.489549$	$-50.564197$	$-50.567694$
8	$-50.338708$	$-50.559158$	$-50.569413$
16	$-50.354480$	$-50.533931$	$-50.569425$

Two remarks are in order before moving to two dimensions. (i) One of the motivations for the present work is to eventually be able to address Hamiltonians that cannot be analysed with a greedy method. As discussed below, some further work will be needed before we can fully study this possibility. However, we can mention a small step in the right direction. In [10], the authors consider a very simple Hamiltonian $H=-2{{\sum }_{i}}\sigma _{i}^{x}\sigma _{i+1}^{x}-1/2{{\sum }_{i}}\sigma _{i}^{x}$ and show that a greedy method based on local moves, initialized in the state $|\downarrow \downarrow \ldots \downarrow \rangle$ , will remain in that state. (The same is true for EE.) We have checked that, in turn, even when initialized in that state, our method does move away and eventually provide the global optimum $|\uparrow \uparrow \ldots \uparrow \rangle$ . Of course, this issue with greedy methods can be fixed through correct initialisation. But that is precisely the point; some disordered Hamiltonians are such that we do not know a priori how to correctly make this initialisation, which brings us to our second comment. (ii) Just like other simulated annealing schemes used in other contexts, ours should not be viewed so much as a tool to obtain the global optimum, but to provide a seed close enough to it that, when completed with a more greedy method, it yields the correct solution.

Extension to two dimensions

The Haar–Markov chains defined above carry through to 2D systems. Let us consider an ${{n}_{x}}\times \;{{n}_{y}}$ square lattice (OBC assumed). First, a unitary configuration space Ω can again be associated with the set of all PEPS with a fixed bond dimension χ. One simply observes that each row of a PEPS ψ can be viewed as an MPS with a 'physical dimension' $d^{\prime} =d{{\chi }^{2}}$ , after reshuffling the indices of the tensors defining ψ. As a result, the choice of configuration space introduced in one dimension applies to each row of a PEPS and shows there is an onto map from $U{{(d{{\chi }^{3}})}^{\otimes {{n}_{x}}{{n}_{y}}}}$ to the set of OBC PEPS. The next issue is to define a cost function. Unlike the 1D case, the energy of a configuration ω generally cannot be evaluated exactly now. Therefore we should not aim at approximating the energy itself but find some suitable approximation. There is some freedom when making a choice for this replacement. The details and motivations for our own choice, which we will denote $\check{h}(\omega )$ , are given in appendix B. Importantly, $\check{h}$ is by construction a rigorous upper bound on the ground state energy of H for any Hamiltonian. As a result, the instability issue studied in [16] simply never shows up. We again consider distributions of the form given by equation (2), with h replaced with $\check{h}$ ; strong ergodicity and reversibility are still satisfied. Thus, for a sufficiently slow cooling schedule, the algorithm must again converge to a global minimum of $\check{h}$ over Ω. We have tested our scheme in the case of the Ising model:

$\begin{eqnarray}&&H=-{{J}_{x}}\mathop{\sum }\limits_{\langle k,l\rangle }\sigma _{k}^{x}\sigma _{l}^{x}-{{h}_{z}}\mathop{\sum }\limits_{k}\sigma _{k}^{z},\end{eqnarray} \tag{ 5 }$

for which our computations could be compared to a EE [17]. Again, we have proceeded in two steps: (A) finite temperature Markov chain and (B) small temperature (zero here) refinements. Our results, displayed in table 2, are very encouraging. An advantage of the present scheme, when compared to other search methods, is its relative simplicity: it completely skips sophisticated computations of single-site environments [9].

Table 2. PEPS computations for the 2D Ising model ground state energy. A square 7 × 7 lattice was considered. The third column relates to our computations, the fourth to EE. Bond dimension is equal to 2; rank χ_c of matrix product boundary operators is equal to 4.

J_x	h_z	HM	EE
1	$1/2$	$-85.825$	$-85.869(3)$
1	1	$-91.492$	$-91.520(8)$
$1/2$	1	$-57.192$	$-57.377(3)$

Beyond ground states

We now describe some extensions that will be studied elsewhere.

Periodic boundary conditions

Just as we saw in the case of PEPS, a simple reshuffling of indices allows us to regard any periodic boundary condition (PBC) MPS [1] as an OBC MPS with effective physical indices of higher rank $d^{\prime} =d\chi$ for two 'edge' sites. The same algorithm could therefore be used to study systems with boundaries identified.

Low energy eigenstates

Let $\{|{{\phi }_{0}}\rangle ,\ldots ,|{{\phi }_{l-1}}\rangle$ denote MPS approximations for the first l eigenstates of some Hamiltonian H. An approximation scheme for the $(l+1)$ -th eigenstate can be simply constructed from the modified figure of merit:

$\begin{eqnarray*}&&{{h}_{{\rm LS}}}(\omega )=h(\omega )+\lambda \mathop{\sum }\limits_{k\lt l}|\langle {{\phi }_{k}}|\omega \rangle |,\end{eqnarray*}$

where $\lambda \gt 0$ is a Lagrange multiplier whose value can be chosen at our convenience.

High energy eigenstates

Assume we wish to find an approximation to some eigenstate with energy in an interval $[{{e}_{1}},{{e}_{2}}]$ . A possibility could be to minimize the following objective function:

$\begin{eqnarray*}&&{{h}_{{\rm HS}}}(\omega )=\theta (h(\omega )-{{e}_{1}})\theta ({{e}_{2}}-h(\omega ))+\lambda [\langle \omega |{{H}^{2}}|\omega \rangle -{{\langle \omega |H|\omega \rangle }^{2}}],\end{eqnarray*}$

where θ denotes the Heaviside step function and $\lambda \gt 0$ denotes again some Lagrange multiplier.

Spectral decomposition and time evolution

Consider some Hamiltonian H and some MPS $|\phi \rangle$ , and let:

$\begin{eqnarray*}&&|\phi \rangle =\mathop{\sum }\limits_{i}{{c}_{i}}|{{\psi }_{i}}\rangle \end{eqnarray*}$

denote the spectral decomposition of $|\phi \rangle$ in terms of the eigenstates of H. To construct MPS approximations for the states appearing in this decomposition, one could try to optimize the figure of merit:

$\begin{eqnarray*}&&{{h}_{{\rm SD}}}(\omega )=\left| \langle \omega |\phi \rangle \right|+\lambda [\langle \omega |{{H}^{2}}|\omega \rangle -{{\langle \omega |H|\omega \rangle }^{2}}],\ \ \ \lambda \lt 0.\end{eqnarray*}$

When a solution $\omega _{1}^{\star }$ is found, one repeats the optimisation with the state:

$\begin{eqnarray*}&&|\phi \rangle -\langle \omega _{1}^{\star }|\phi \rangle |\omega _{1}^{\star }\rangle ,\end{eqnarray*}$

and so on. Proceeding in this way, à la Gram–Schmidt, one sequentially constructs an approximate spectral decomposition:

$\begin{eqnarray*}&&|\phi ^{\prime} \rangle =\mathop{\sum }\limits_{i}\langle \omega _{i}^{\star }|\phi \rangle |\omega _{i}^{\star }\rangle \end{eqnarray*}$

for $|\phi \rangle$ . Time evolution for ϕ is next approximated as:

$\begin{eqnarray*}&&{{{\rm e}}^{-{\rm i}Ht}}|\phi \rangle =\mathop{\sum }\limits_{i}\langle \omega _{i}^{\star }|\phi \rangle \ {{{\rm e}}^{-{\rm i}\langle \omega _{i}^{\star }|H|\omega _{i}^{\star }\rangle t}}\ |\omega _{i}^{\star }\rangle .\end{eqnarray*}$

The same idea could be tested in the case of PEPS.

Two-dimensional Hamiltonians at finite temperature

Let $\{{{e}_{i}},|{{\psi }_{i}}\rangle \}$ denote the spectral decomposition of some 2D quantum Hamiltonian, and assume we are interested in the mean value of some observable A at finite temperature:

$\begin{eqnarray*}&&\langle A\rangle =\frac{1}{Z}\mathop{\sum }\limits_{i}{{{\rm e}}^{-{{e}_{i}}/T}}\langle {{\psi }_{i}}|A|{{\psi }_{i}}\rangle ,\end{eqnarray*}$

where $Z={{\sum }_{i}}{{{\rm e}}^{-{{e}_{i}}/T}}$ denotes the partition function of the system. Let ${{\alpha }_{1}},\ldots ,{{\alpha }_{M}}$ denote samples representative of the distribution ${{p}_{T}}(i)={{{\rm e}}^{-{{e}_{i}}/T}}/Z$ . $\langle A\rangle$ can be estimated as:

$\begin{eqnarray*}&&\langle A\rangle \simeq \frac{1}{M}\mathop{\sum }\limits_{j=1}^{M}\langle {{\psi }_{{{\alpha }_{j}}}}\mid A\mid {{\psi }_{{{\alpha }_{j}}}}\rangle .\end{eqnarray*}$

The law of large numbers ensures the average error made with this estimate should decrease as $1/\sqrt{M}$ . Yet another Markov chain can be designed to obtain approximations to such representative states. Let κ denote an upper bound on $||H|{{|}_{\infty }}$ . Such an upper bound can generally be constructed easily. Let ϕ denote the current ( ${\rm PEPS}$ ) state of the Markov chain, and let $\check{h}(\phi )$ denote an approximation for $\langle \phi |H|\phi \rangle /\langle \phi |\phi \rangle$ .

Draw a small positive random number $\delta e$ .
Use the method described above to try and construct an approximate eigenstate $\phi ^{\prime}$ of H with approximate energy $\check{h}(\phi ^{\prime} )$ in $[\check{h}(\phi )-\delta e,\check{h}(\phi )+\delta e]\cap [-\kappa ,+\kappa ]$ .
If such a state $\phi ^{\prime}$ can be constructed, the state of the Markov chain is updated to $\phi ^{\prime}$ with Metropolis probability ${\rm min} \{1,{{{\rm e}}^{-(\check{h}(\phi ^{\prime} )-\check{h}(\phi ))/T}}\}$ .

Entanglement renormalisation

We have here limited ourselves to MPS and PEPS, but the analysis evidently carries through to self-similar tensor network states [5]. Configuration spaces of the form we have used are actually very natural in that context, because it is demanded that the tensors defining the state be unitary.

Fermions

The notions of MPS and PEPS have been extended to the case of fermionic degrees of freedom [19]. In particular, the mean value of any operator for a system in a pure fermionic PEPS can be (efficiently) translated into the mean value of a PEPS defined on an associated bosonic system. Our Markov chains therefore carry through to the fermionic case. Since the mean of a (fermionic) Hamiltonian will be by construction a real number, there is still no sign problem.

Biased priors

Conceivably, the convergence of the Markov chain could be significantly boosted if the candidate were not drawn 'isotropically' but according to some well chosen biased prior $\pi _{{\rm pick}}^{(\beta )}(\omega ^{\prime} |\omega )$ . Reversibility could still be satisfied if the Metropolis–Hastings rule were used to decide acceptance [20]:

$\begin{eqnarray}&&P_{{\rm acc}}^{(\beta )}={\rm min} \{1,\frac{{{\pi }^{(\beta )}}(\omega ^{\prime} )}{{{\pi }^{(\beta )}}(\omega )}\times \frac{\pi _{{\rm pick}}^{(\beta )}(\omega |\omega ^{\prime} )}{\pi _{{\rm pick}}^{(\beta )}(\omega ^{\prime} |\omega )}\}.\end{eqnarray} \tag{ 6 }$

Discussion

In summary, we have introduced a stochastic algorithm to search optimal tensor network states and demonstrated its validity in one and in two dimensions. This algorithm is rooted in strongly ergodic reversible Markov chains defined on a connected compact configuration space. Therefore, one can prove that convergence to a global optimum is always possible for some sufficiently slow schedule. A deeper problem would be to bound the mixing time [18] of the Haar–Markov chains at a fixed temperature. Solving this problem would be interesting even in one dimension and for simple Hamiltonians.

The first results we have obtained are encouraging, both for 1D and 2D systems; we believe the class of Markov chains introduced here deserves further study. Our method has turned out to be slower than, say, EE, but not prohibitively so. The simulated annealing scheme presented here involves choices that have to be made in any concrete computation: namely, the rate with which a candidate is accepted and the way the pseudo temperature β is varied from one iteration to the next (the so-called schedule). We have observed the algorithm is very sensitive to these choices (especially the schedule). Besides a detailed analysis of schedules, the algorithm could be improved significantly, thus leading to finer results. For instance, departing from the Haar measure to pick a candidate might boost the global convergence properties; in particular, biased priors in the spirit of the Metropolis–Hastings algorithm are worth considering. Other choices of configuration space might also improve the global performances. In two dimensions, an interesting improvement worth mentioning is the possibility of combining the present algorithm with the tensor renormalization group method [25] instead of the sequential row contraction scheme used here (see appendix A). Another issue to consider is the possibility of abandoning simulated annealing altogether and searching for optimal tensor networks through parallel tempering [22]. Detailed studies in the context of classical statistical mechanics exhibit frustrated systems where the latter method performs better than the former [23]. After these improvements are analysed in a future work, it will be possible to tackle the original motivation of our work: the study of Hamiltonians difficult to treat with ${\rm DMRG}$ or EE, such as [8, 24]. An important clarification should, however, be made regarding the purpose we should pursue. We do not seek to outperform the greedy techniques; these are as fast as can be and provide excellent results in their range of validity. Rather, we believe our heuristics should ideally be used as a preconditioner, supplying a good 'seed' for them when necessary.

An advantage of our scheme is its great flexibility with respect to the objective function. We have seen several contexts where it could find some use, e.g., the investigation of high energy states. Another aspect of the simulated annealing scheme is its ease of implementation; its basic ingredient is a black box that computes the energy of a configuration (or an approximation thereof). This is particularly true in two dimensions, where the avoidance of single site environment computations is a significant simplification.

Acknowledgements

I am grateful to José Ignacio Latorre, David Pérez-García, Mauricio Mariño, Mari-Carmen Bañuls, Roman Orús, Angelo Lucía, Germán Sierra, Luca Tagliacozzo, and Andrew Ferris for discussions and useful comments at various stages of the present work. I also thank Michael Lubasch for having provided me with energy estimates that made the comparisons of table 2 possible. Financial support from the Ramón y Cajal programme (RYC-2009-04318), Spanish Grant FIS2010-16185, and Grup de Recerca Consolidat (Generalitat de Catalunya) 2009 SGR21, is acknowledged.

Appendix A.: Convergence

The Haar–Markov chain is strongly ergodic; regarding $m=O(1/\alpha )$ , consecutive steps of one such chain as a single super-step, one easily sees that the probability for a transition between any pair of configurations $(\omega ,\omega ^{\prime} )$ is strictly positive. Also, the acceptance rule satisfies the celebrated detailed balance condition (reversibility):

$\begin{eqnarray}&&{{\pi }^{(\beta )}}(\omega )P_{{\rm acc}}^{(\beta )}(\omega \to \omega ^{\prime} )={{\pi }^{(\beta )}}(\omega ^{\prime} )P_{{\rm acc}}^{(\beta )}(\omega ^{\prime} \to \omega ).\end{eqnarray} \tag{ A1 }$

The configuration space is, moreover, compact. (It is also connected.) The general theory of the Metropolis algorithm [21] can therefore be invoked to claim that, for any fixed β, any initial probability distribution π₀ is mapped after τ steps to a distribution ${{\pi }_{\tau }}$ , which is exponentially close (in total variation distance) to the target distribution:

$\begin{eqnarray*}&&\parallel {{\pi }_{\tau }}-{{\pi }^{(\beta )}}{{\parallel }_{{\rm TV}}}\leqslant 2\eta {{\left( \beta \right)}^{\tau }}.\end{eqnarray*}$

The constant $\eta (\beta )$ is guaranteed to be strictly lower than 1. Thus $||{{\pi }_{\tau }}-{{\pi }^{(\beta )}}|{{|}_{{\rm TV}}}$ can be made lower than any constant $\delta \gt 0$ in $O({\rm ln} (1/\delta ))$ steps. Finding a useful upper bound for $\eta (\beta )$ is, however, a difficult problem in general [18]. As stated in the introduction, it has recently been proven that for 1D gapped Hamiltonians and MPS, the OSP has polynomial complexity. In the present context, bounds on $\eta (\beta )$ combined with simulated annealing could lead to an alternative proof of that result, together with a very simple approximation scheme.

Appendix B.: Extension to two dimensions: addendum

We start with some general considerations regarding PEPS for an ${{n}_{x}}\times {{n}_{y}}$ square lattice system with open boundary conditions. The mean value of any observable Y for a PEPS ψ can be expressed as:

$\begin{eqnarray}&&\langle \psi |Y|\psi \rangle =\langle {{\partial }_{{\rm top}}}|\mathop{\prod }\limits_{k=1}^{{{n}_{y}}-2}{{\mathcal{T}}_{k}}|{{\partial }_{{\rm bot}}}\rangle ,\end{eqnarray} \tag{ B1 }$

where the boundary 'states' $|{{\partial }_{{\rm top}}}\rangle ,|{{\partial }_{{\rm bot}}}\rangle$ are MPS, and the transfer operators ${{\mathcal{T}}_{k}}$ are matrix product operators [9], each with a fixed bond dimension. The rhs can generally not be evaluated exactly in a time that grows polynomially with ${{n}_{x}},{{n}_{y}}$ and the bond dimension. It is thus common to evaluate the rhs of equation (B1) approximately as follows:

1.
A 'bottom' edge MPS $|\partial {{^{\prime} }_{{\rm bot}}}(0)\rangle =|{{\partial }_{{\rm bot}}}\rangle$ is initialized.
2.
$\forall k=1\ldots {{n}_{y}}-2$ , an MPS $|\partial {{^{\prime} }_{{\rm bot}}}(k)\rangle$ with fixed bond dimension χ_c, approximating ${{\mathcal{T}}_{{{n}_{y}}-k-1}}|\partial _{{\rm bot}}^{\prime }(k-1)\rangle$ , is constructed.
3.
The amplitude $\langle {{\partial }_{{\rm top}}}|\partial _{{\rm bot}}^{\prime }({{n}_{y}}-2)\rangle$ is eventually proposed as an estimate for $\langle \psi |Y|\psi \rangle$ .

We have not proceeded differently. The following lemma, which is an easy corollary of the triangular and the Cauchy–Schwarz inequalities, provides a bound on the error resulting from this approximate contraction scheme.

Lemma B.1. Let $\mathbb{V}$ denote a finite-dimensional vector space. For any finite sequence of elements $\{{{v}_{1}},\ldots {{v}_{m}}\}\subset \mathbb{V}$ ,

$\begin{eqnarray}&&\parallel {{v}_{1}}-{{v}_{m}}\parallel \leqslant \mathop{\sum }\limits_{j=1}^{m-1}\parallel {{v}_{j}}-{{v}_{j+1}}\parallel .\end{eqnarray} \tag{ B2 }$

Moreover, $\forall u,v,u^{\prime} ,v^{\prime} \in \mathbb{V}$ ,

$\begin{eqnarray}&&|(u^{\prime} ,v^{\prime} )-(u,v)|\leqslant ||u^{\prime} ||\cdot ||v-v^{\prime} ||+||u-u^{\prime} ||\cdot (||v^{\prime} ||+||v-v^{\prime} ||).\end{eqnarray} \tag{ B3 }$

The inequality (B2) allows us to sequentially compute an estimate for how much $|\partial {{^{\prime} }_{{\rm bot}}}({{n}_{y}}-2)\rangle$ and $\prod _{k=1}^{{{n}_{y}}-2}{{\mathcal{T}}_{k}}|{{\partial }_{{\rm bot}}}\rangle$ differ, while the inequality (B3) allows us to use this estimate to bound the error on $\langle {{\partial }_{{\rm top}}}|\prod _{k=1}^{{{n}_{y}}-2}{{\mathcal{T}}_{k}}|{{\partial }_{{\rm bot}}}\rangle$ . These considerations are relevant to our Monte Carlo scheme when applied to PEPS. When dealing with a concrete Hamiltonian H and a configuration $\omega \in \Omega$ , they provide an estimate $h^{\prime} (\omega )$ for $\langle \omega |H|\omega \rangle ,$ as well as an upper bound ${{\delta }_{h}}(\omega )$ on the error. Similarly, an estimate $n^{\prime} (\omega )$ for the square norm $\langle \omega |\omega \rangle$ , together with an upper bound ${{\delta }_{n}}(\omega )$ on the norm error, can be computed. The presence of these errors ${{\delta }_{h}}(\omega )$ and ${{\delta }_{n}}(\omega )$ is a major distinction between MPS and PEPS; the following definition is introduced to guarantee they are properly taken into account.

Definition B.2. A PEPS configuration ω is pointless at χ_c if ${{\delta }_{h}}(\omega )\geqslant |h^{\prime} (\omega )|$ , ${{\delta }_{n}}(\omega )\geqslant n^{\prime} (\omega )$ , or both.

We are now in a position to write down the objective function we have used in our concrete computations:

$\begin{eqnarray}&&\check{h}(\omega )=\left\{ \begin{array}{ccccccccccccccc} +\infty \;{\rm if}\;\omega \;{\rm is}\;{\rm pointless}, \\ \quad \mathop{{\rm max} }\limits_{a,b\in \left\{ 0,1 \right\}}\left\{ \frac{{{h}^{\prime }}(\omega )+{{\left( - \right)}^{a}}{{\delta }_{h}}(\omega )}{{{n}^{\prime }}(\omega )+{{\left( - \right)}^{b}}{{\delta }_{n}}(\omega )} \right\}\;{\rm else}. \\ \end{array} \right.\end{eqnarray} \tag{ B4 }$

The rationale behind this choice is most easily explained with a comparison between two extreme cases. Given a Hamiltonian H, and having to choose between an exact description of its ground state in terms of some PEPS ψ₀, which cannot be contracted reliably, and an approximate description $|{{\phi }_{0}}\rangle$ we can work with, the latter option should be preferred.

Plugging $\check{h}(\omega )$ into the Boltzmann weights (2) guarantees that the random approximation scheme will provide rigorous upper bounds on the ground state energy of H. The use of $\check{h}(\omega )$ effectively steers Markov–chains away from pointless PEPS. As a result, none of the instability issues addressed in [16] has to be faced.

Simulated annealing for tensor network states

Article metrics

Author e-mails

Author affiliations

Dates

Abstract

Introduction

The Haar–Markov chain

A case study: the Ising model

Extension to two dimensions

Beyond ground states

Periodic boundary conditions

Low energy eigenstates

High energy eigenstates

Spectral decomposition and time evolution

Two-dimensional Hamiltonians at finite temperature

Entanglement renormalisation

Fermions

Biased priors

Discussion

Acknowledgements

Appendix A.: Convergence

Appendix B.: Extension to two dimensions: addendum

Footnotes

Simulated annealing for tensor network states

Article metrics

Share this article

Author e-mails

Author affiliations

Dates

Abstract

Introduction

The Haar–Markov chain

A case study: the Ising model

Extension to two dimensions

Beyond ground states

Periodic boundary conditions

Low energy eigenstates

High energy eigenstates

Spectral decomposition and time evolution

Two-dimensional Hamiltonians at finite temperature

Entanglement renormalisation

Fermions

Biased priors

Discussion

Acknowledgements

Appendix A.: Convergence

Appendix B.: Extension to two dimensions: addendum

Footnotes