Symmetric tensor networks for generative modeling and constrained combinatorial optimization

Constrained combinatorial optimization problems abound in industry, from portfolio optimization to logistics. One of the major roadblocks in solving these problems is the presence of non-trivial hard constraints which limit the valid search space. In some heuristic solvers, these are typically addressed by introducing certain Lagrange multipliers in the cost function, by relaxing them in some way, or worse yet, by generating many samples and only keeping valid ones, which leads to very expensive and inefficient searches. In this work, we encode arbitrary integer-valued equality constraints of the form Ax⃗=b⃗ , directly into U(1) symmetric tensor networks (TNs) and leverage their applicability as quantum-inspired generative models to assist in the search of solutions to combinatorial optimization problems within the generator-enhanced optimization framework of Alcazar et al (2021 arXiv:2101.06250). This allows us to exploit the generalization capabilities of TN generative models while constraining them so that they only output valid samples. Our constrained TN generative model efficiently captures the constraints by reducing number of parameters and computational costs. We find that at tasks with constraints given by arbitrary equalities, symmetric matrix product states outperform their standard unconstrained counterparts at finding novel and better solutions to combinatorial optimization problems.


I. INTRODUCTION
Tensor networks (TNs) have emerged in the past few years as a very powerful tool to address challenging problems in high-dimensional spaces, such as quantum many-body physics [2], electronic structure of chemical compounds [3], supervised learning [4], generative modeling [5], anomaly detection [6], combinatorial optimization [7][8][9], and privacy leaking [10], to name a few.The reason for their wide applicability lies in part for their well-understood mathematical structure and well-controlled expressibility.
Just as artificial neural networks (ANNs) are universal approximators of generic distributions [11], TNs can in principle model any (discrete) distribution with enough computational resources [12].While ANNs are inspired by the functioning of the brain (in particular their expressibility determined by the number of activation functions, mimicking the firing of neurons in the brain), themselves nonlinear functions of their input, TNs are inspired by quantum mechanics, arguably a theory with richer mathematical structure than that of ANNs.For instance, the Hilbert space structure of quantum mechanics allows to exploit global internal symmetries in quantum states through the use of representation theory.This can have important consequences when we aim to efficiently express probability distributions with redundancies that appear as a result of such symmetries.
It may appear at first sight that exploiting internal symmetries such as U (1) or SU (2) would bear little importance in problems outside physics.In this work, we show that by exploiting U (1) symmetry in TN states it is possible to encode arbitrary integer-valued equalities of the form A x = b in a probabilistic model, extending the family of so-called tensor network Born machine (TNBM) models [5].Therefore, we refer to our new family of symmetric quantum-inspired TN generative models as s-TNBMs.
This type of equalities appear in many combinatorial optimization problems with hard constraints, such as in portfolio optimization, warehouse location, scheduling, transportation, to name a few [13,14].Current state-of-the-art methods at dealing with combinatorial optimization problems with hardconstraints, such as MIXED-INTEGER LINEAR PROGRAM-MING (MILP) solvers [15,16], simulated annealing solvers [17], or more recently, Graph Neural Networks (GNNs) [18][19][20] and fully quantum models such as the Quantum Approximate Optimization Algorithm (QAOA) [21], suffer from at least one of the following two drawbacks: they must explicitly break the integrality constraint to then project back to the nearest integer solution (as done in standard MILP solvers relying on the branch-and-bound method [14]), and/or they only work for cost functions expressed as polynomials in the binary variables, with linear or quadratic being the most common ones.Here, we focus on a recent proposal, the generator-enhanced optimization (GEO) framework [1], which bypasses both of these limitations.
GEO solvers leverage classical or quantum generative models to assist the solution of combinatorial optimization problems, by learning the correlations from observations (i.e., bitstrings and their corresponding costs) and proposing new bitstring solution candidates which resemble those in the lowest cost sector.This is done by reweighting the seen information, which is set as the dataset to train the generative model.This reweighting helps to give more importance to those samples in the training dataset with the lowest cost (see Sec. II for more details).It was shown in Ref. [1] the power of quantuminspired generative models based on TNBMs within GEO, matching the performance of state-of-the-art metaheuristics tailored in the last 30 years for a specific variant of the port-folio optimization problem.These remarkable results were obtained without imposing any bias in the TNBM model to enforce the cardinality constrains (i.e., fixed desired number k of assets in the portfolio out of N ) of the combinatorial problems.
In this work, we exploit the rather simple structure of TNs, in particular their linearity and local connectivity, to encode arbitrary equality constraints, cardinality being a specific realization of these, for the purpose of both generative modeling and combinatorial optimization.We do so by mapping the problem of finding solutions satisfying the constraints A x = b to that of finding the set of quantum numbers (QNs) appearing in symmetric TNs.In this context, QNs effectively correspond to labels of different irreducible representations, in this case of U (1), defined on each local Hilbert space where each tensor is supported.
For equality constraints with high-degree of symmetry, where the coefficients in matrix A have low variance, we find that symmetric TNs correspond to an efficient encoding of the valid space.For a subset of these, such as cardinality type constraints appearing in problems such as portfolio optimization [1], one can write down explicitly the TN ansatz for the entire valid space, which can be used as an initial ansatz for finding optimal solutions.For arbitrary integer-valued equalities finding the solution space is likely #P-complete (since its associated decision problem, (0,1)-INTEGER PROGRAM-MING, is already NP-complete [22]), but we provide a novel message-passing algorithm with QNs acting as messages, that is able to encode a large fraction of these solutions within a few QNs, thereby allowing to generate many unseen solutions satisfying A x = b when given access to some (valid) training data, even when the entries of A and b are random integers.At the same time, by exploiting the dimensionality of each QN, the resulting TN ansatz is able to capture not only the constraints efficiently, but also biases in the dataset, therefore favoring certain solutions over others.
Moreover, by constraining the TN so as to only sample within the valid subspace, we guarantee computational savings, in both memory usage by exploiting block-sparsity of the tensors as well as computational time by guaranteeing that only the needed tensor components get updated.Beyond the ability to impose arbitary equality constraints, TNs enjoy sophisticated optimization algorithms, such as the Density Matrix Renormalization Group (DMRG) algorithm [23,24] for Matrix Product States (MPS), as well as perfect sampling [25] (which avoid long autocorrelation times typical of Monte Carlo based sampling algorithms).
While we envision other uses of constrained TNs as efficient representations of constrained datasets in various contexts, we find that some of its most crucial advantages are captured within the GEO framework of Ref. [1].This is because GEO exploits the presence of existing training data to generate new and better solutions to combinatorial optimization problems by capturing biases in the training set (corresponding to bitstrings of different costs), which our constrained TN approach is capable of incorporating when given data subject to equality constraints.
The structure of the paper is as follows.In Sec.II we state the problem we are interested in, namely combinatorial optimization problems subject to equality constraints, and describe our approach to the problem using a generative model based on tensor networks.In Sec.III we give a brief introduction to symmetric TNs placing emphasis on symmetric MPS, a special TN structure which will be used in our numerical experiments.In Sec.IV we explain the main encoding and generative algorithms.In Sec.V we present various results showcasing the types of canonical problems symmetric MPS can be applied to and compare its performance against vanilla versions (i.e.without imposing any constraint on them).In Sec.VI we give strong arguments in favor of using TNs for describing symmetric states over ANN architectures, implying in particular the efficiency of symmetric TNs at capturing arbitrary equality constraints when restricted to U (1) symmetric states.Sec.VII presents our conclusion, and Sec.VIII gives an outlook for future work.

II. PROBLEM STATEMENT
In combinatorial optimization problems, one is interested in finding the minimum of a cost function over a specific set of instances, called valid (or feasible) space.Often the set of valid instances grows exponentially with the problem size (such as in the presence of cardinality constraints), which prevents listing exhaustively all possible candidates.Although not all combinatorial problems have explicit constraints, the class of real-world problems that we focus on here are those which can be written as: min C( x), where C : {0, 1} N → R, x ∈ {0, 1} N , and A ∈ Z m×N , b ∈ Z m .(Some of these conditions may be relaxed in principle and our construction still applies; for instance, we may convert each integer coefficient to a rational and viceversa via a rescaling of all coefficients in each equality.Similarly, we may work with general integer-valued variables instead of just binary ones by accordingly increasing the dimension of each site vector space in the TN as we will describe in the next section).We will denote by S the valid or solution space, i.e. the set of all bitstrings satisfying the equality constraints.While not all hard constraints are of the form appearing in Eq. 1 [26] our focus is on equality type constraints because these are the ones that best exploit the TN ansatz used in this work.When the cost function C is linear, the above statement reduces to the standard form of (0,1)-INTEGER LINEAR PRO-GRAMMING (see Appendix A Sec.A of the Appendix for an introduction into this class of problems), and when C is at most quadratic, the corresponding problem can be recasted to a QUADRATIC UNCONSTRAINED BINARY OPTI-MIZATION (QUBO) with penalty terms enforcing the equality constraints [27].In the GEO framework used here, the form of C( x) can be arbitrary, i.e., GEO is a black-box solver.While many powerful methods exist for dealing with the minimization part of the algorithm such as branch-and-bound based methods for linear cost functions, and simulated annealing for problems with arbitrary cost functions, a major roadblock in many of these solvers is dealing precisely with equality constraints (hard constraints) that restrict the valid space, and the standard approaches taken are to relax both the integrality and equality constraints as in branch-and-bound based solvers [22], to impose such constraints via introducing Lagrange multipliers directly into the cost function as in most QUBO solvers [27] (which for many equalities results in an overparameterized cost function), or worse yet, to generate many samples only to keep those that meet the constraints.
Inspired by current generative modeling approaches to the task of combinatorial optimization such as the ones from Refs.[1,28,29], we leverage the generalization capabilities of Tensor Network Born Machines (TNBMs) to address the optimization task in Eq. ( 1) while constraining them so that they only output valid samples.Crucially, we do not make any assumption on the nature of the cost function (hence our approach is not limited to linear/quadratic cost functions).Our approach builds on top of the generator-enhanced optimization (GEO) approach of Ref. [1] when subject to arbitrary integer-valued equality constraints.Specifically, denoting constrained-GEO the constrained extension of GEO, the problem statement we are interested in can be phrased as follows

CONSTRAINED-GEO
Here, L is the training loss function in the form of the Cross-Entropy or Negative Log Likelihood (NLL), P( x) is the model distribution represented as a TNBM, i.e.P( x) = |Ψ( x)| 2 /Z with Ψ( x) expressed as a TN and Z = x∈{0,1} N |Ψ( x)| 2 , and p T ( x) is the training distribution given by the softmax where M = x∈T e −C( x)/T is a normalization factor, and the cost function C( x) is in principle arbitrary (not restricted to polynomial cost functions) and T is a Lagrange multiplier favoring lower cost samples as T → 0. The motivation for the use of symmetric MPS (and in general, symmetric TNs) is that, as we shall show next, they permit to incorporate equality constraints directly into a generative model given in terms of a symmetric MPS, where tensors acquire a block sparse form.In Fig. 1 we schematically describe our proposed constrained-GEO framework for solving arbitrary combinatorial optimization problems subject to equality constraints which we will describe in more depth in Sec.IV.

III. SYMMETRIC TENSOR NETWORKS
In this section we give a brief introduction to U (1) symmetric MPS (for more details we refer to the Appendix B).Our motivation here is to set up the notation and terminology that will be used in the following sections as well as to showcase how symmetric MPS allows to encode arbitrary integervalued equality constraints.

A. Vanilla and U (1) Symmetric Matrix Product States
Tensor networks (TNs) are a special ansatz to represent high dimensional data, such as quantum states, based on the contraction of tensors defined on networks.The tensors, whose entries are tunable parameters, are located at the sites of the network, while the links between tensors capture the correlations among different sites and represent the amount of entanglement shared among them.It has been shown that many states of physical relevance most prominently those obeying the so-called area law of entanglement [30,31], can be efficiently represented by such ansatz.Among the different classes of TNs, MPS corresponding to one-dimensional TNs, has found the widest application, in great part due to its low connectivity as well as the presence of a canonical form, which permits very optimized updating and contracting algorithms [24,32].A perspective that is useful when introducing symmetries in a tensor network is to view each tensor as a linear map from some input vector space V in to output vector space V out .For instance, a 3-rank tensor in an MPS may be of the form |α = a,β T a αβ |a, β where |α ∈ V in and |a, β ∈ V out .Here we make a distinction between upper indices appearing in each tensor component, referring to them as site (or physical) indices (indices that are not contracted in the TN), and lower indices corresponding to link (or virtual) indices (indices that are contracted and that control the expressibility of the TN ansatz).The dimension of each link's vector space is determined by the bond dimension χ, controlling the amount of entanglement shared across nearest neighbors.Its size determines in turn the expressibility of the resulting MPS ansatz.One of the benefits of TNs is that one can reason with them solely based on their diagrammatic representation.For the tensor above, this is represented as < l a t e x i t s h a 1 _ b a s e 6 4 = " D I u M U h W D u 1 X i 2 6 e F v g 9 z m 3 + 6 P y , where the direction of the arrows reflect whether they refer to incoming/outgoing states.
TNs not only permit to construct states based on their entanglement structure, but also allow to exploit global internal symmetries such as U (1) and SU (2), widely present in many problems of physical relevance.This essentially constrains the form of each tensor so as to fulfill this global requirement.The simplest kind of symmetric TN is the U (1) symmetric MPS.An arbitrary vector |v ∈ V can be expanded as a linear combination of |n, t n ∈ V n vectors of fixed QN or charge n, and degeneracy label t n = 1, • • • , d n , where d n is the degeneracy of charge n.By going to this basis of welldefined charge, a symmetric MPS can be written in terms of constrained tensors of the form [33,34] T a α,β = (T na nα,n β ) where where I (O) denotes the set of incoming (outgoing) indices, and n denotes the flux of the tensor, a label carrying the charge of the tensor.When using the canonical form (as will be the case throughout in this work), a symmetric MPS is usually chosen so that the flux of the entire MPS is only carried by one of the tensors (instead of spread across various tensors), the canonical center (also known as orthogonality center), while the remaining ones have flux zero (n = 0 in Eq. 3).Thus, by exploiting global U (1) symmetry we see that each tensor in the MPS factorizes into the product of two tensors.The first one, (T na nα,n β ) ta tα,t β , gives the components of tensor T w.r.t. the degeneracy vectors {|t n }, for that reason this tensor goes by the name of degeneracy tensor [33].The second tensor, δ Nin,Nout depends solely on the conservation of U (1) QNs, and it is referred to as structural tensor (for an arbitrary symmetry group G the structural tensor can take a rather nontrivial form) [33].For fixed physical QN (fixed n a appearing in Eq. ( 3)), the resulting matrix T a α,β will be block diagonal, with Charge conservation imposes a block-sparse structure in the tensors appearing in the MPS.Example shown for tensor T [3] .
blocks of order d nα × d n β .These blocks are sometimes referred to as quantum number (QN) blocks.In other words, charge conservation constrains the form of the tensors to become block sparse, leading to a more efficient representation of the MPS.

B. Constructing U (1) symmetric MPS
Having discussed the formalism of U (1) symmetric MPS, we now explain how to explicitly construct a symmetric MPS that only generates bitstrings fulfilling different equality constraints of the form A x = b, with x ∈ {0, 1} N , A ∈ Z m×N , and b ∈ Z m .Our construction relies on identifying the site (physical) charges as the entries of the A matrix together with the empty charge ∅ (i.e. the charge 0 in the case of integer valued scalar charges), and the total charge of the MPS (i.e. the flux) as the entries of b.The remaining task consists in determining all link charges consistent with global charge conservation.This will automatically give us the components of each structural tensor (the degeneracy tensor can always be trivially constructed by augmenting the dimension of each QN block).We remark, however, that finding the set of all possible configurations of link charges that add up to flux b is likely #P-hard, since its associated decision problem, (MULTIPLE) SUBSET SUM, is at least NP-hard (the variant with A and b being one dimensional and having non-negative integer entries is for instance NP-complete [35]).Therefore, in general we will not be interested in finding all such charges for a given A, b, but rather in finding a way to incorporate them into the MPS once these are found (finding all charges for small-sized problems may be done via exhaustive search, random search or dynamic programming).

I. Cardinality constraint -an exactly solvable case.
The original motivation behind the construction of symmetric MPS arose from capturing the conservation law of total spin Ŝz along the z direction in spin chains (or alter-natively, the conservation of particle number in systems of spinless bosons and fermions).These two types of conservation laws can be viewed as cardinality constraints of the form i x i = κ, where x i ∈ {0, 1}, and κ ∈ N 0 s.t.0 ≤ κ ≤ N .We shall focus on the case κ ≤ N/2 (assuming N even) since once we find a way to generate bitstrings fulfilling i x i = κ ≤ N/2 we can simply flip all bits so as to generate Given the simplicity of the constraint, we can construct the set of all charges explicitly.For concreteness, we select the canonical center of the MPS to be located at the first site, where we insert the flux of charge κ; see Fig. 2. The construction of the MPS is accomplished once we determine the charges at the link indices.To distinguish between site charges and link charges we will denote the former via n i with i = 1, • • • , N , and the link charges as for the link charge immediately to the right of charge n i .As mentioned at the beginning of this subsection, the site charges will be given by the coefficients of the A matrix, together with the ∅ charge.For cardinality, n i ∈ {∅, 1}.
The construction of the link charges requires solving the set of equations n i+1,R = n i,R − n i+1 with the boundary terms (5) Crucially, while the solution space is given by the binomial coefficient |S| = N κ (which in particular for κ ≈ N/2 and N 1 grows as |S| ∼ O(e N / √ N )), the number of parameters in the symmetric MPS ansatz grows as O(N (κ + 1)).Thus, the symmetric MPS ansatz is an efficient encoding of the solution space.

II. Arbitrary single equality.
Next we consider an equality of the type i a i x i = b with a i , b ∈ Z, which appears for instance in SUBSET SUM-type problems.In this setting, we can still impose this constraint into an MPS, which is accomplished by first setting the local site charges as n i ∈ {∅, a i }.As opposed to the cardinality constraint above, we cannot a priori determine the set of link charges for a general equality.A standard approach for this problem would be to use backtracking: construct all possible link charges solving n i,R = n i+1 + n i+1,R in a forward sweep, and discard those that are not consistent with charge conservation in a backward sweep.In practice, one can do better than this and apply a meet-in-the-middle technique [36] whereby we locate the canonical center near the middle of the MPS and enumerate exhaustively all configurations of charges to the left and to the right, as well as their corresponding sums.Using a binary search tree on top to match configurations of charges to the left and to the right adding up to the total charge of interest brings down the time complexity from O(2 N ) to O(N 2 N/2 ).Alternatively, when the coefficients a i , b are non-negative one may use a standard dynamic programming approach with pseudopolynomial runtime O(N b) [37].

III. Arbitrary set of equalities. The most general case of arbitrary number of equalities of the form
is just a small extension of the previous case.Now, each site charge is instead given by a vector of length m as , while the flux is given by the vector b.In Appendix C we consider AS-SIGNMENT-type equalities which appear also in many combinatorial optimization problems and that can be encoded in an MPS of bond dimension scaling at worst exponentially with the number of equalities (i.e.these do not scale with system size N ).

IV. CONSTRAINED-GEO
Our framework is divided into three stages (see Fig. 1): 1) Initialization, 2) Training, and 3) Sampling, followed by Evaluation of the generalization performance.First, we construct the structural tensors based on the training data, or, if an efficient MPS representation of the valid space is available (such as in cardinality type constraints), we may also generate training data uniformly from it.Second, we minimize the training loss function via the DMRG-inspired algorithm put forward in Ref. [5].After a few sweeps we sample from the resulting MPS and compute the costs of the samples, and retrain the model with those samples together with the initial training dataset, as we detail next.

A. Initialization: Constrained Embedding
The first step consists in selecting all unique bitstrings from the dataset.Next we construct the set of link charges by first fixing the position of the canonical center as well as that of the flux.For the purposes of illustration we set both to be at the last site as shown in Fig. 3 a).The determination of all link charges for each bitstring is performed in O(N m) steps by solving recursively n i+1,R = x i+1 A i+1 + n i,R starting from either end of the MPS.We store in memory each local configuration of charges i → (n i,R , n i+1 , n i+1,R )), and move on to the next bitstring.We repeat the same procedure for the next bitstring in the set, and obtain a new configuration of charges.Once we loop over all distinct bitstrings from the training set we are ready to set up the structural tensors by reading off the configuration of charges.At this step, there are two alternative ways of proceeding forward, as illustrated in Fig. 3 b).The first one is to directly map each configuration of charges (n i,R , n i+1 , n i+1,R ) to a nonzero component of the structural tensor.Together with the finding of charges, this preprocessing step takes O(N m|T |) steps when using a hashtable for storing the configuration of charges.The second approach is to only store the configuration of all link charges consistent with the training data plus constraints, i → (n i,R , n i+1,R ), and subsequently determine all configurations of charges (n i,R , n i+1 , n i+1,R ) consistent  with the flux of each tensor.In contrast with the previous method, this preprocessing step scales as O(dN m|T |) when using a hashtable, where d is the dimension of each site, which for our purposes will be 2 given that we are dealing with bitstrings.Thus we see that this latter method takes in general twice as much time as the first one.The benefit nevertheless from using this latter method is that this permits us to find more configuration of charges, which in turn will lead to better generalization, i.e. more (unique) bitstrings can be generated from the second method than with the first one.
To illustrate these two methods we consider the cardinality dataset formed by bistrings fulfilling i x i = κ.Focusing on bitstrings of length N = 6 and cardinality κ = 3 we choose the training data T = {111000, 101010, 010101, 000111}.The resulting MPS using the first method will generate 6 new bitstrings (10 in total), while the second method will generate the entire solution space of 6  3 = 20 bitstrings.The extraction of link charges from data is illustrated in Fig. 4.There we show all possible quantum numbers for each link index, n i,R , consistent with the cardinality constraint.In total, there are 6 3 = 20 bitstrings.Out of them, only 4 are needed to extract all link charges.For arbitrary cardinality constraint κ and bitstring length N , it can be shown that only min(κ, N − κ) + 1 bitstrings, are needed to exactly construct the fixed-cardinality MPS using the second method.In our work we shall use the second method when initializing the MPS.
We remark that the structural tensor crucially depends on the location where we insert the flux, which corresponds to the location of the canonical center.This is important to bear in mind because the training procedure will move the canonical center when optimizing each local tensor as we shall describe next (however the most costly step is setting up the structural tensors at the beginning, as updates during training in TNs are local, which will only change the link charges around each site, and these charge updates can be determined efficiently).
Lastly, we note that encoding arbitrary number of equality constraints (without training data) into a TN may lead to an exponential scaling of the bond dimension (see Sec. C of the Appendix for the particular case of ASSIGNMENT-type equalities).The present approach, guarantees that, in the worst case scenario, the bond dimension be lower bounded by the number of bitstrings in the training set, which allows for better scalability of the algorithm.

B. Training
Once we have set up the structural tensors deterministically, either via the initialization step described in Sec.IV A or by exact construction of the MPS, we can proceed with the training step.The process is an extension of the Density Matrix Renormalization Group (DMRG) inspired algorithm described in Ref. [5] in the presence of equality constraints.The training process aims to minimize the loss function where , and p T is a softmax surrogate to the cost function, and is given as in Eq. ( 2).Since the Born machine |Ψ( x)| 2 is constructed out of a symmetric MPS Ψ, we shall henceforth refer to this as symmetric-TNBM (s-TNBM).The various steps involved in the training are summarized in Fig. 5 and in more detail consist of the following: 1 For fixed location of the canonical center, construct the corresponding s-TNBM for a given bitstring x ∈ T by fixing the site charges to x i A i .Merge the two neighboring tensors at the canonical center.In TN notation: , where the link charges are constrained by charge conservation as Because of charge conservation, the merged tensor will preserve block-sparsity.
2 Compute gradient of NLL as where . The term Ψ ( x) forces the site charges n i , n i+1 to take the values , where α is the learning rate. .

The resulting spectrum
Λ is truncated, keeping the χ greatest singular values.This results in a compressed MPS.We reshape the tensors back to their original shapes by applying the inverse of the reshape , where we have identified ΛV † .Note the canonical center has shifted to the right at this point.
We repeat these four steps at every site of the MPS when going left-to-right, and an analogous series of steps when going right-to-left except that now the canonical center moves one site to the left after every update.
The main advantage of using symmetric MPS for tasks constrained by equalities is that most of the linear algebra operations such as merging tensors and factorizing just described can be carried out block-wise, with each block corresponding to a QN block.This leads to a more efficient optimization procedure than when using dense (unconstrained) tensors.
To showcase the benefits of using symmetric MPS in practice we consider a cardinality dataset with N = 50, κ = 25 and |T | = 1000 training samples.Our benchmark experiments were carried out on a Macbook Pro with a 1.4 GHz Quad-Core Intel Core i5, using 4 threads and on Julia and the ITensors.jlpackage for smart index tensor network contractions [38].The results showcased in Fig. 6 indicate that a considerable advantage is obtained in using symmetric MPS over vanilla MPS in terms of computational resources.We report memory allocations as well as the wall-time employed during the entire training algorithm (including computing the gradients, replacing the tensors by the gradients, factorizing the resulting tensors, as well as computing the NLL after each sweep) averaged over 3 sweeps.In both cases we see the substantial savings on computing time and space memory: while for vanilla MPS these resources scale polynomially as we increase the bond dimension χ, for symmetric MPS the increase in resources is barely present, as a result of a total of 21 QN blocks at the middle link.

C. Sampling
Drawing samples from ANNs can be challenging, especially when using Markov Chain Monte Carlo (MCMC) sampling based algorithms due to very long autocorrelation times that limit their efficiency.
In contrast, TNs enjoy the property of perfect sampling [25].The bit-wise sampling of MPS is done without rejections and sequentially as one moves from one end of the MPS to the other, consisting on computing a series of single-site conditional density matrices and conditional single-site probabilities.The resulting sampling algorithm avoids long autocorrelation times typical of MCMC sampling-based algorithms that are needed to extract reliable samples.Since there is no correlation between samples, it is straight-forward to parallelize the sampling process by, for instance, making use of multithreading.
Once samples are drawn, we evaluate the generalization performance of the algorithm by computing the costs of the bitstrings.The performance of the generative model will be measured according to 1) the quantity of new, valid samples (i.e.bitstrings not contained in the dataset, but that still fulfill the constraints), and 2) the quality of new samples as measured by the cost (higher-quality samples corresponding to lower-cost samples).These samples may be merged with the

V. RESULTS AND DISCUSSION
To test the performance of s-TNBMs we consider the following tasks.In the first part, Sec.V A, we will analyze how well symmetric MPS are able to generate new valid solutions to A x = b from a given set of known solutions.This will test the ability of the structural tensors at capturing QNs that are shared among different bitstrings, which in turn will lead to generalization.In the second part, we will endow QN blocks with a degeneracy factor, taking full advantage of both QNs and their dimensions, so as to model arbitrary probability distributions (not just uniform on the space of solutions) subject to arbitrary equality constraints.We will test this on two tasks: first, in Sec.V B, we shall consider the training dataset to be sampled from the ground state of a well-known Hamiltonian in condensed matter physics: the one dimensional Transverse Field Ising (TFI) model [39], subject to the constraint that only bitstrings of fixed cardinality are valid.Next, in Sec.V C, we will test the ability of s-TNBMs at finding new, high-quality solutions to combinatorial optimization problems subject to equality constraints by mapping this problem to a generative modeling task, as briefly discussed in Sec.II, and detailed in Sec.IV.

A. Validity-based generalization: Entanglement assisted generation of solutions to arbitrary equality constraints
Evaluating the generalization performance of unsupervised generative models is a challenging task.For the following experiments, a candidate generalization metric should favor valid samples that are not only new, but unique as well.Our focus will be thus to measure variants of the set of new, and unique solutions, denoted as |g sol | (see Ref. [40] where this and other various metrics were introduced).
In the next experiments we shall be interested in generating new solutions to a system of linear constraints of the form A x = b, where x ∈ {0, 1} N , A ∈ Z m×N , b ∈ Z m from a set of known solutions.In our first experiment we consider a single equation of the form i α i x i = b, where α i and b are random integers uniformly distributed in the range [−r max , r max ].We start with 100 unique known solutions to this equation, which we shall also call seeds (found e.g. by random search, or SUBSET SUM solvers when possible) and the goal is to generate as many solutions as possible.Since there is no bias in the training dataset (i.e.all solution bitstrings carry the same probability), we do not need to carry any explicit training, and instead we only make use of the initialization step explained in Sec.IV A, followed by the sampling step on the resulting s-TNBM.Since the solution space may grow with system size (this growth could be in principle exponential), we bound the number of solutions found by the number of total queries to the s-TNBM generator, which we set to be Q = 10 4 .The results of this experiment are shown in Fig. 7. Remarkably we find that, despite the absence of obvious global U (1) symmetry given the nature of the equality constraint, we do find very good generalization performance.It is interesting to see that the performance increases as we scale up system size N .The number of generated solutions grows at a pace which would be consistent with exponential (we do not rule out power law) as a function of system size N for fixed r max , even for large values of r max .Importantly, the set of QNs at the middle link, |{n N/2,R }| (which corresponds to the largest structural tensor link dimension in the s-TNBM), is of size at worst linear as a function of r max for fixed N , at least for the system sizes and choices of r max simulated here.Note this set size is a measure of the complexity of datasets constrained by equalities, and it provides a lower bound on the minimum bond dimension needed for an MPS to generate a given number of new, distinct solutions.Clearly, real-world equality constraints will have a greater degree of symmetry so the results shown here for a single equality with random coefficients A are already encouraging.This would in turn translate into a smaller bond dimension for a given fixed system size, and therefore better scalability as we go to larger system sizes.
To have a better intuition on these results and on the role played by the bond dimension, it is helpful to consider the most symmetric equality constraint, that of cardinality, where α i = 1 ∀i.As argued in Sec.III B, an exact MPS representation of the entire valid space requires a bond dimension of at least χ = O(κ) (in fact χ = min(κ, N − κ) + 1), while the solution space grows exponentially in N (for κ ≈ N/2).How does this relate to the generalization capabilities of s-TNBMs?In Fig. 4 we gave an intuition of this mechanism, focusing on N = 6, κ = 3.For concreteness we choose the canonical center to be located at the first site.We label all link indices by their QNs.Next, we apply the initialization step of the algorithm explained in Sec.IV, with the set of four colored bitstrings in Fig. 4.These four samples already covers all possible QNs for each link with bond dimension 2, 3, 4, 3, 2 respectively.The corresponding MPS is able to generate all valid 6  3 = 20 bitstrings.The choice of the four samples is not unique as long as they patch the entire set of link QNs from start, at n 0,R = κ (viewing the extra dashed index on the left of the MPS in Fig. 4 as having charge κ) to finish, at n 6,R = 0 (viewing the extra dashed leg on the right of the MPS in Fig. 4 as having charge 0).Of course this experiment carries over to any cardinality and system size.The upshot is that the minimum number of bitstrings needed to generate all solutions to a given equation is lower bounded by the number of QNs at the middle link.In other words, how well our s-TNBM is able to generalize depends crucially on the number of distinct paths that can traverse this middle link.
We scale up the previous experiment from one to five equations, where now A, b entries are chosen uniformly random in [−r max , r max ].The results of the experiment are shown in Fig. 8.We are interested here in checking how efficiently does our s-TNBM generalize as a function of training set size, i.e. how many new, unique solutions, as a function of the training set size |T | are found by the s-TNBM.The training set is provided by random search and keeping valid samples.We find that, even for random instances as the ones considered in here, where r max ∈ {1, 2}, our s-TNBM is able to generate new solutions with a power-law exponent close to 2, as shown by the best-fit power-law exponent 1.8 and 2.4 in Fig. 8 So far our focus has been solely on the generalization capabilities of s-TNBMs at generating new solutions to A x = b, without any comparison to the performance of vanilla TNBMs at the same task.In Fig. 9 we consider that comparison.A vanilla TNBM does not include any bias from the constraints, so one must train them using the DMRG-like algorithm from Ref. [5] (and extended to the presence of equality constraints in Sec.IV B).The experiment now considers a set of two equations with integer coefficients randomly selected in the range [−2, 2].We are interested in the number of new, unique solutions, |g sol |, found after a given number of queries Q to the generator.A useful metric in this sense is the coverage, as introduced in Ref. [40], and defined as where |S| is the number of solutions to the system of equations.This metric measures effectively the number of new, unique solutions found by the generator within the remaining available solution space (not contained in the training set).Note that this quantity is implicitly a function of the number of queries to the generator.Given the system size chosen, N = 20, we can compute explicitly all solutions to random system of equations, which for our experiment turns out to be |S| = 9452.We feed in both symmetric and vanilla MPS the same amount of training data, given as T = |S|, with < 1, which consist of unique, valid samples.When the training set contains 10% of the solutions ( = 0.1), both vanilla and symmetric MPS are able to generate new solutions with comparative performance.Two crucial differences between the vanilla and symmetric MPS results is that a priori we do not know which bond dimension gives the best performance using vanilla MPS and so one is left with trying many of them only to pick up the one that gives the best results, in this case χ = 22.This is not the case when using symmetric MPS, where the bond dimension is fixed entirely by the training set and the constraints, i.e. it is given by the QN set size at the middle link, χ = |{n N/2,R }|.Perhaps more importantly, there is no training required when using symmetric MPS because the ansatz already contains the inductive bias coming from the constraints, and we only need the initialization step from Sec. IV A. The benefits of using symmetric MPS at learning datasets with equality constraints become more evident in the presence of scarcity of training samples: when we lower this amount down to 1% ( = 0.01), vanilla MPS produces about 3% of the remaining unique solutions after Q queries.This is true even for various choices of learning rates used in here, with the results of α = 0.02 displayed in Fig. 9, whereas the symmetric version does so at around 50%, down from around 55% when = 0.1.

B. Quality-based generalization: Ground state reconstruction of local symmetric Hamiltonians
In this part we consider a generative model capturing equality constraints as well as a distribution bias.Our goal here is to learn the distribution corresponding to the ground state of the Transverse-Field Ising (TFI) model subject to an equality constraint which for simplicity we take it to be of cardinality type.The TFI Hamiltonian reads where s ∈ [0, 1], and {X i , Z i } are Pauli matrices.Here we shall take s = 1/2, corresponding to the critical point of the TFI model, and the system size to be N = 20.This Hamiltonian by itself is not U (1) symmetric (however it does have an inherent global Z 2 symmetry, since the Hamiltonian is invariant under the parity operator P = N i=1 X i ), but its ground state is nevertheless efficiently captured by an MPS of low bond dimension, χ ≈ 8 (note however that for general N , the MPS ansatz is suboptimal at the critical point since entanglement grows logarithmically S ∼ log N , and instead the Multi-scale Entanglement Renormalization Ansatz (MERA) is a better choice, see Ref. [41]).The target distribution corresponds to that constructed from taking the Born machine associated with this ground state followed by a fixed cardinality filter, that is, we project the resulting MPS onto the manifold of states constrained to only sample bitstrings fulfilling i x i = κ, where for concreteness we choose the cardinality κ = N/2 = 10.Thus the entire solution space (composed of all valid bitstrings fulfilling the cardinality constraint) is |S| = 20  10 ≈ 185 × 10 3 .In Fig. 10  terms of their respective Kullback-Leibler (KL) divergences using vanilla vs. symmetric MPS for a training dataset size of |T | = |S| (note that the set may include repeated samples, in which case their probabilities are weighted according to their frequency).We see that while for = 0.05 both vanilla and symmetric MPS are able to learn the underlying distribution just as well, when data is scarce, 0.01, the symmetric counterpart performs better.These results are obtained without major refinements, in particular the learning rate is fixed throughout and of value α = 0.02 (in principle it would be advisable to adjust this as a function of the size of the tensors [5]), no DMRG noise is used, and no minibatching or multiple inner steps (i.e.multiple consecutive gradient updates at the same site) has been used.These tricks can bring down the value of the loss function by escaping local minima and may be crucial steps for larger scale models.
It is a well-known fact in ML models, that the testing loss may develop a U-shaped curve as a function of model capacity [42].This is a consequence of the bias-variance tradeoff.In Fig. 10 (right panel) we show that both vanilla and symmetric MPS display such a U-curve, as a function of bond dimension (see also [43]).This type of overfitting behavior is static in that it only depends on the complexity of the MPS as parameterized by its bond dimension χ.On the left and central panel in the same figure we also report a different kind of overfitting behavior which we dub dynamic in that it develops as a function of sweeps, and is more pronounced in the case of symmetric MPS, where at large enough χ, the best testing loss is localized at around a minimum after a few sweepssuch dynamic U-curve is also present in vanilla MPS, but its convexity is less striking.
We remark that we have also tried more suitable Hamiltonians with an inherent cardinality constraint, such as the Heisenberg model.This considerably improves the testing loss (since the Hamiltonian already satisfies the constraint) but we found the minimum in the U-curve to be located at the minimum bond dimension needed to learn the constraint, χ = 11 for N = 20, which of course is not the general case for arbitrary cost functions (not just the ones stemming from local Hamiltonians).
C. Quality-based generalization: Finding optimal solutions to constrained combinatorial optimization problems The previous example was motivated by physics, in particular by exploiting the locality of the cost function in the form of the TFI Hamitonian.Here we illustrate how s-TNBMs can also be used to find novel and optimal solutions to combinatorial optimization problems subject to equality constraints, outperforming vanilla TNBMs.As an example, let's consider the cost function to be the negative separation cost of bitstrings, which is the negative separation of the farthest two '1' separated only by '0' bits.For instance, for a bitstring 01011001 the negative separation cost would be −3.The goal of the task is to generate bitstrings with the largest negative separation cost (in absolute value) as possible, subject to an equality constraint which we take to be the cardinality constraint.This is achieved using the constrained-GEO framework described in Sec.IV, where the cost function appearing in the softmax corresponds to the negative separation cost subject to a cardinality constraint.Because of this constraint, we assign a cost zero to invalid samples (bitstrings not fulfilling this constraint).The starting point is a set of valid solutions that have a negative separation cost at or above C 0 .We remark that as opposed to the previous task, with a physics motivation, the current exercise serves as a proxy for typical cardinality constrained optimization problems appearing in industry problems, such as portfolio optimization [13] and traveling salesman [44], both NP-hard problems [44,45].
The results of this experiment are shown in Fig. 11, where the learning rate is fixed to α = 0.02 throughout, and we choose two different system sizes.For N = 20 and κ = 10, the training set contains only bitstrings of cost greater than or equal to C 0 = −6, while for N = 30 and κ = 15, the training set contains only bitstrings of cost greater than or equal to C 0 = −8.To test the performance of both models at capturing low cost samples outside the training set we make use of the utility metric, as introduced in Ref. [40].It amounts to computing the average cost of the 5% lowest cost samples obtained from querying the trained models.The motivation for this metric is that, often times, one is not interested in a FIG.11.Constrained combinatorial optimization with cardinality constraint and negative separation cost function.Final bitstring distributions using the optimal choices of bond dimensions for vanilla MPS vs. symmetric MPS, for a given softmax temperature choice T .The cardinality constraint is cardinality i xi = N/2, with N = 20 (top panels) and N = 30 (bottom panels), being the size of the bitstrings.We are interested in the final utility metric after training convergence.Blue histograms correspond to the model distributions, extracted after sampling Q = 10000 times from the TNBM; orange histograms correspond to the distribution of |T | = 1848 training bitstrings, which is formed of uniformly distributed, unique bitstrings satisfying the cardinality constraint but restricted to the subspace of bitstrings of cost greater or equal to −6 for N = 20, and −8 for N = 30; and green histograms correspond to the uniform distribution, extracted from the uniform distribution of all bitstrings satisfying the cardinality constraint.Left panels: results for vanilla MPS at T = 1.Middle panels: results for symmetric MPS at the same temperature T = 1.Right panels: results for symmetric MPS at T = 1/2.unique lowest cost sample (which could have been found by mere luck).By taking instead a small percentage of the lowest cost samples, we make a more robust assessment that the model is indeed obtaining consistently high quality (low cost) samples.Our results indicate that, while the vanilla TNBM is not able to extract any better samples than a constrained uniform sampler, corresponding to the model that samples all bitstrings of the given cardinality uniformly, the symmetric version not only does sample already within the valid target space of fixed cardinality (thus avoiding samples with cost zero), but their samples are of higher quality as evidenced by the utility metric U .The value of this metric is even further optimized when decreasing the temperature in the softmax function, giving further evidence that the s-TNBM is able to learn the bias and that biasing the training set helps the s-TNBM to propose even better candidates.Crucially, we see that the performance gap between symmetric and vanilla TNBMs increase as we increase the system size from N = 20 to N = 30: in the latter case, the vanilla TNBM is not able to capture the cardinality constraint to the extent that less than 20% of the samples are valid, and more importantly, the quality of the valid ones, as measured by the utility, is still far from that obtained using FIG.12. Flowchart of all the steps entering in the tensor network generative algorithm for combinatorial optimization problems with hard-constraints.See also Algorithm 1 for a detailed description of the various steps.s-TNBM.Build * uniform TN from constraints (all QN blocks set to same value) Extract training set via perfect sampling from s-TNBM Compute utility from mean of lowest cost 5% samples while Constrained TN embedding from constraints and samples else p Rebuild * uniform TN from constraints (all QN blocks set to same value) Train MPS with distribution p T for a few sweeps Extract training set via perfect sampling from s-TNBM Compute utility from mean of lowest cost 5% samples end while Return x = argmin(C( x): x ∈ T )) * : When an efficient TN representation of the solution space does not exist, an alternative construction is to search randomly samples satisfying the constraints and applying the Constrained TN embedding step on these.This will only construct an approximate TN to the entire solution space (note that in general the TN ansatz will contain more solutions than are fed in from a set of seeds, as argued in Sec.IV A).
In the next experiment we take full advantage of the knowledge that an exact and efficient representation of the valid space exists in terms of a symmetric MPS of fixed cardinality.We consider here a much larger system with N = 50 and cardinality κ = 25, which results in a solution space of size |S| = 50  25 ≈ 1.26 × 10 14 .We use the following protocol, which is also summarized in Fig. 12 for general equality constraints and detailed in Algorithm 1.The first step is to sample uniformly from the fixed cardinality MPS.To guarantee that this sampling is uniform over the space of fixed cardinality bitstrings, we set all allowed QN blocks to be of size 1 × 1 and filled in by the the same value (the value will be set by the normalization condition of the TN).This forms our initial training set.We train as usual for one sweep with this initial choice of MPS and sample at least as many times as samples there are in the training set.For our experiments the size of the training set is always fixed to be |T | = 100, while the number of samples or queries on the generator is Q = 10 4 (we emphasize again that sampling on an MPS can be done very efficiently).
With the samples at hand we keep the 100 lowest cost samples as the new training data set.At the next step we apply the initialization step described in Sec.IV A so as to extract the most relevant QNs from where we construct our next initial MPS.Next we train for one sweep with an effective temperature which we choose to be T = 1/2std(C({ x ∈ T })), with std the standard deviation.We next sample Q times and keep the lowest cost 100 samples for our next step training dataset.We repeat this procedure for as long as the results improve.The results of this experiment are shown in Fig. 13.After only 6 steps (and only a couple minutes of computational time on a regular laptop), the symmetric MPS is able to reach a minimum cost of C = −20, with the absolute minimum corresponding to C = −26.To put this result into context, we remark that the space of bitstrings with cost C ≤ 20, satisfies |S| C≤20 /|S| ≈ 10 −7 (see App. D for an exact analytical expression), so that a constrained uniform sampler over the solution space would require at least of the order of O( 103 ) times more samples than the ones carried out here, indicating that our MPS indeed is able to find its way to a subspace near the global minimum, even when this effectively corresponds to a tiny fraction of the entire valid space, and in turn the cost function is highly nonlocal.
This optimization scheme bears some resemblance with the QAOA and other approaches following an exploration/exploitation paradigm: the TN REBUILD step serves as a mixer near the optimum solution (which learns over the best data from the previous iteration), while the TN Embedding step, moves directly along the direction of minimum by learning only the most relevant QNs (latent features) from the lowest cost samples from the previous iteration.Our approach, being data driven, is agnostic to the form of the cost function, as illustrated in our experiments.

VI. COMPARISON TO NEURAL NETWORK ARCHITECTURES
TNs are an efficient representation of states with limited amount of entanglement.In recent years there has been a lot of excitement about the possibility of using other representations that may be more efficient than TNs at capturing long-range entanglement.RBMs, the building block of many deep neural network architectures, have been suggested to be a more efficient representation than TNs for certain classes of states displaying long-range entanglement [46,47].Here we argue that U (1)-symmetric states may be more challenging to construct using RBMs, and likely the same true for any G-symmetric state, with G any global internal symmetry, when compared to TNs.The intuition behind this argument relies on the fact that TNs are multi-linear models.In particular, the contraction of an MPS with one-site operators can be performed locally.Under a global group transformation represented by a unitary matrix U (g) ⊗N , with g ∈ G, the new MPS can be constructed straight-forwardly, see Fig. 14(a-b).In contrast, for a RBM, after applying such global group transformation, it is very challenging to get the weights of the new RBM.Unlike for a TN, we have to train the new RBM.So it becomes natural to impose such symmetries in TNs, in contrast to RBMs or ANNs in general.
For a TN, group operations on sites will introduce group representations on each link consistent with charge conservation.So a global symmetry transformation will generate group representations on each link of the MPS.While for RBM or ANN, it is very challenging to have this property.
Another argument in favor of TNs for describing symmetric states is precisely due to the linearity of the TN.As shown in Fig. 15(a,b), it is very easy to construct the linear hybridization of two MPS, the new tensors are a direct sum of these MPS tensors.The bond dimension would be the sum of bond dimensions of Ψ 1 (x) and Ψ 2 (x), and the construction is exact.In contrast, for a RBM, we are not able to construct the new RBM directly from the original RBM parameters.The only option left is to train a new RBM, which mimics the summation, which is only an approximation.

VII. CONCLUSIONS
In this work we have shown how one can use symmetric TNs for the purpose of generative modeling and constrained combinatorial optimization.We have tested their performance on various datasets serving as proxies for the kinds of problems our generative algorithm, dubbed here constrained-GEO, is applicable to.Most prominently, we include combinatorial optimization problems subject to arbitrary integer-valued equality constraints, which appear in many problems of industrial interest.We have found that by mapping the problem of equalities to a problem involving U (1) global charges, a symmetric MPS generative model is able to generalize (even in the presence of random instances of A and b).This class of models, referred here as s-TNBMs, has a series of benefits which we recount: • By exploiting U (1) global charge conservation, we are able to reduce the space complexity by working with block-sparse tensors, as opposed to dense tensors, as shown on the right panel of Fig. 6.
• In turn, given the block-sparse structure of tensors allows for faster learning of the data, often requiring an order of magnitude less number of sweeps to converge when training for the problem sizes studied in here.
-This point translates into less computational time per operation (contraction, merging) as shown on the left panel of Fig. 6; less number of sweeps as shown on Fig. 10(a-b); and no time is spent on sampling invalid samples.
• It permits to have a better intuition on the complexity of the data, as viewed when counting the number of QN blocks appearing in the MPS (i.e. the minimum bond dimension needed to exactly capture all charges from the data).
• It allows to generalize in the presence of scarcity of data.
• It allows to find better solutions in the context of combinatorial optimization problems subject to constraints.
Of all these points, the last two ones are perhaps the most relevant ones.Classical ML models require many samples for them to generalize.By exploiting internal global symmetries like U (1), which in turn are hard to impose in ANNs architectures as argued in Sec.VI (see also Refs.[48,49]), we are able to reduce the amount of training data even further.The last point is also worth stressing: symmetric TNs were originally conceived to efficiently encode quantum states subject to global symmetries, but in the context of finding groundstates of Hamiltonians, as well as in the context of Hamiltonian dynamics.What we have found in this work is that constraining TNs may be a key step at not only decreasing the complexity of the learning algorithm, but crucially, at finding novel and better solutions to constrained combinatorial optimization problems, which would be otherwise out-of-reach by other classical ML and state-of-the-art optimization heuristics.

VIII. OUTLOOK
A clear future extension of the current work is to analyze the role of other symmetries.Trivial extensions include other abelian symmetries, like Z 2 , which appear in physical models such as the TFI model discussed in this work, but also in some toy datasets like the parity dataset, for which TNs have already shown superior performance over ANNs such as RBMs at both learning and generative modeling performance [50][51][52][53][54]. Perhaps more intriguing would be to study the role of nonabelian symmetries in the context of generative modeling.Of special relevance is SU (2), perhaps the other most relevant quantum-inspired symmetry apart from U (1), and for which a way to construct TNs that are symmetric under SU (2) already exists in the literature [55][56][57].A more challenging and practically relevant question is to address the presence of inequality constraints (i.e. of the type A x ≤ b), which also abound in many combinatorial optimization problems.This is an ongoing work and will be presented in a subsequent publication.
We now put our work into the broader context of current progress in the domains of constrained combinatorial optimization and ML models exploring the role of symmetries and constraints.In the domain of combinatorial optimization, GNNs have become a very promising framework (see e.g., [18,58]).In this case, one can encode the coefficients of matrix A and vector b as features in a bipartite graph.This approach leverages generative modeling for node selection and variable selection, steps that appear in branch-and-bound for solving MILP problems.In the context of quantum models addressing QUBO problems with hard-constraints, such as XY-QAOA [59,60], current methods suffer from the fact that only a very limited set of equality constraints can be addressed, namely those for which one can implement local moves around the valid space (and which can be implemented by at most two-site, local gates), such as cardinality-type constraints.Likewise, a recent tensor network approach was proposed to tackle the open-pit problem subject to equality constraints [9].While bearing some resemblance to our work, this approach however is only valid for equality constraints with a locality structure, in that only few local variables can appear at any time in any equality, and where in turn the cost function must be mapped to a local Hamiltonian.Our approach is not limited by specific equality constraints nor by specific choices of cost functions.
On the side of implementing symmetries in the context of ML, there has been very recent interest in equivariant ANNs, both classical (see e.g.Refs.[61,62]), and quantum (see e.g.Refs.[63][64][65][66]).Equivariant ANNs exploit symmetry in data so that the output samples also preserve this symmetry, such as is the case of E(3) equivariant ANNs (which transform data under translations and rotations, with applications including image processing).In contrast, our work exploits symmetry at the level of the generator, in this case a s-TNBM, so that it only outputs samples satisfying arbitrary equality constraints, where different equality constraints correspond to different irreps of U (1) (i.e.different fluxes in TN terminology).Lastly, there has been recent interest in using autoregressive neural networks for modeling physical systems with local gauge symmetries [67,68], which are fundamentally different from the global internal symmetries we are interested in here.The latter have been studied in works [69,70] for the simplest U (1) symmetry corresponding to an equality of cardinality type, using Recurrent Neural Networks (RNNs).However at present it is not clear how to extend their construction to arbitrary equality constraints, and if so, whether the resulting RNN corresponds to a more efficient representation than that of symmetric MPS.
An exciting milestone ahead is to demonstrate the power of our s-TNBM models in the context of large-scale industrial applications.Achieving such quantum-inspired advantage would be a promising step towards demonstrating practical quantum advantage with near-term devices.This could be possible by exploiting recent efficient proposals [71,72] to map TNs to quantum circuit generative models such as quantum-circuit Born machines [73].
For our purposes, we shall be considering the MPS with open boundary conditions (OBC) [75] so that the left-and rightmost tensors are actually vectors so that χ 0 = χ N = 1 (so that the trace is not needed).A perspective that is useful when dealing with symmetry group transformations on tensors, is to view tensors as linear maps from an input vector space V in to output vector space V out .A significant simplification in many tensor network algorithms involving MPS is to use the canonical form [24], which is used throughout in this work.When put in canonical form, tensors to the left of the canonical center correspond to left isometries, satisfying |β = a,α T a α,β |a, α with a (T a ) † T a = 1 and where |a, α ∈ V out and |β ∈ V in , while tensors to the right of the canonical center correspond to right isometries, satisfying |α = a,β T a α,β |a, β with a T a (T a ) † = 1 and where |α ∈ V in and |a, β ∈ V out .Each link or virtual state can be represented as The size of each link vector space is sometimes known as bond dimension, and its magnitude controls the amount of entanglement shared by connected tensors.The states associated to the upper index i v will be referred to as physical states, since they are the ones that appear in the original wavefunction expansion (B1), at variance with the virtual states just discussed.

U (1) Symmetric Matrix Product States
Since tensors are linear transformations, the tensor network ansatz makes it very convenient to exploit the toolbox of representation theory.In essence, by requiring that our state |Ψ be invariant under a global group transformation G represented by a unitary matrix U g , with g ∈ G s.t.[33] (U g ) ⊗N where V g , W g are unitary representations of the same group G.The upshot of this is that each tensor will be constrained to have nonzero entries only at those entries fulfilling Eq. (B5).
Our focus here will be on tensors that are symmetric under G = U (1), where representations are one dimensional and parameterized as U g = e −inφ , with n the so-called charge operator and φ ∈ [0, 2π).This means that our state must be invariant under this group transformation up to a phase: where the state |Ψ is an eigenstate of the charge operator n with eigenvalue n, n|Ψ = n|Ψ . (B7) Note that, the case n = 0 corresponds to the state being invariant (as in (B4)), while for n = 0 one obtains a covariant state.Such phase factor only arises for states that are covariant under Abelian group transformations like U (1) (in the case of non-Abelian group transformations, covariant states transform instead under a unitary matrix of dimension 2 or higher).
The different charges n label different irreps of U (1) so that for each vector space V we may decompose this as where where I (O) denotes the set of incoming (outgoing) indices.
From (B9) we see that each tensor in the MPS factorizes into the product of two tensors.The first one, (T n b na,nc ) t b ta,tc , gives the components of tensor T w.r.t. the degeneracy vectors {|t n }, reason why this tensor goes by the name of degeneracy tensor.The second tensor, δ Nin,Nout depends solely on the conservation of U (1) charge, and it is referred to as structural tensor (for an arbitrary symmetry group G the structural tensor can take a rather nontrivial form).For fixed physical charge (fixed n b appearing in (B9)), the resulting matrix T b a,c will be block diagonal, with blocks of size d na × d nc .These blocks are sometimes referred to as quantum number (QN) blocks.A symmetric MPS will be in general not only composed of invariant tensors fulfilling (B9) but also covariant tensors of well defined charge n = 0 and fulfilling Alternatively, we may transform a covariant tensor to an invariant tensor upon introducing an extra index carrying charge n and of dimension 1.The charge n in a covariant tensor is also sometimes known as the flux.For a given MPS, there is at least one covariant tensor, and this is usually taken to be located at the canonical center of the MPS.Alternatively, we may decide to spread the total charge across different sites.It is however preferred to concentrate all flux in one single site as it is easier for bookkeeping purposes (we do not have to keep track of which sites have flux and which do not) and the resulting complexity (as measured by the size of the tensors) is not affected by this choice.Lastly, we remark that for the purposes of our work where we have multiple arbitrary equality constraints, the above formalism still carries through.The major difference is that now we require the state to be invariant under multiple global group transformations of the form U g = e −i j αj nj φ .Since each of the operators nj acts locally on each tensor, it is straightforward to generalize the above formulas to the case of arbitrary number of charges (i.e.translation invariance is not a requirement).

Appendix C: Exact MPS construction for assignment type problems
In Sec.III B we gave an exact construction of the MPS in the presence of a cardinality constraint.Here we give the derivation for another widely present equality constraint in combinatorial optimization problems, namely, an assignment type constraint.This is of the form with A some set of site indices.The valid space for this constraint can be encoded in an MPS of χ = 2, with link charges n i,R ∈ {∅, 1}.See Fig. 16 for an illustration for a single equality.For m number of assignment type equalities, the link charges take values in n i,R ∈ {∅, 1} m , see Fig. 16 for an example of two equalities.Thus at worst the bond dimension grows exponentially with the number of assignments.
Appendix D: Degeneracy counting for negative separation cost function The negative separation cost function together with the cardinality constraint as introduced in Sec.V C has κ − 1 bitstrings with minimum cost function, which is C min = −N + κ − 1.This corresponds to having a single connected block of zeros of size N −κ, with ones at each edge.For the remainder we shall refer to an arbitrary-sized block of zeros with ones at each side as a domain wall (to connect with the statistical physics literature).Here we wish to characterize the degeneracy (that is, number of bitstrings) for arbitrary value of the cost function and for any system size.This will be useful as a performance benchmark of the trained s-TNBM against our baseline which is a uniform sampler on the space of bitstrings with the right cardinality κ.Since for large system sizes N 30 it becomes numerically challenging to extract the degeneracy factors, we aim at extracting these analytically.
The problem of counting such configurations is a straightforward problem of combinatorics.The approach we take here is recursive.We present results for the case when there is a single dominant domain wall, meaning one domain wall is bigger than the rest in a given bitstring (this is analogous to a low-energy approximation above the absolute minimum when viewing the cost function as an energy function).Our results are as follows.
Let a ∈ Z with 2a < N − κ, where κ is the Hamming weight (cardinality).The degeneracy in the number of bitstrings with cost C = −N + κ + a − 1 is given as

FIG. 1 .
FIG. 1.The proposed constrained-GEO framework.Our framework consists of four steps.Step I involves encoding bitstrings satisfying A x = b type constraints into a block sparse tensor network.An efficient embedding of the constraints can be performed in the presence of some training data satisfying the constraints.In Step II, the initialized tensor network is trained by minimizing the cross-entropy or negative log likelihood where the training distribution corresponds to a softmax which serves as a surrogate to the optimization cost function of interest (see Eq. (2)).The training procedure guarantees block-sparsity of tensors.Step III involves collecting new samples from the model distribution by perfect sampling.In Step IV, new samples can be merged with the training dataset to repeat the process or stop when the user is satisfied with the amount of novel samples or a target desired cost is reached.

FIG. 2 .
FIG.2.Cardinality constrained MPS.a) The canonical center is chosen at the first site, corresponding to the site where we insert the flux κ, which for concreteness we choose to be of value κ = 3. Tensors connected with each other share the same tensor indices.b) Charge conservation imposes a block-sparse structure in the tensors appearing in the MPS.Example shown for tensor T[3] .

2 FIG. 3 .
FIG.3.Initialization in the generative algorithm.a) Given a valid bitstring x satisfying A x = b, we assign to each site charge the vector xi Ai, with Ai corresponding to the i'th column of matrix A. With fixed site charges and the flux, corresponding to the vector b, this uniquely determines the set of link charges ni,R in (N − 1)m steps.Carrying this procedure for all bitstrings allows us to construct the structural tensors δN in ,N out appearing in our symmetric MPS ansatz.b) Method 1: we map each configuration of charges (ni,R, ni+1, ni+1,R) to a nonzero entry in the structural tensor.Different dots correspond to different charges.Black lines indicate compatible configurations.Method 2: same as in Method 1 but we now determine all possible site charges ni consistent with the set of link charges {(ni,R, ni+1,R)}.Method 2 contains more compatible configurations of charges, hence more bitstrings are generated using Method 2.

FIG. 4 .
FIG. 4.MPS in the cardinality dataset.The training dataset is given by the four bitstrings T = {111000, 101010, 010101, 000111} fulfilling i xi = κ, with N = 6, κ = 3. Numbers on top of each link in the MPS correspond to new link charges ni,R extracted from each bitstring going from the left one (in yellow) to the right one (in blue).The colored paths in the bottom diagram correspond to the link quantum numbers extracted from each bitstring as we move from left to right in each bitstring (every jump corresponds to a bit taking value 1).

FIG. 6 .
FIG.6.Profiling computational resources for computing the average training step using vanilla vs. symmetric MPS in the cardinality dataset.The system size chosen is N = 50 and the cardinality κ = 25.We train both vanilla and symmetric MPS (initialized in the uniform superposition of fixed cardinality) using |T | = 1000 bitstrings.The results are averaged over 3 sweeps.Left (right) panel: Wall-time elapsed (memory allocations) as a function of maximum bond dimension in the MPS.

FIG. 7 .
FIG. 7. Generalization performance using the s-TNBM: one equation i αixi = b, where b, αi are random integer coefficients uniformly distributed in the range [−rmax, rmax].The input is 100 bitstrings found by random search, and the goal is to generate as many new, unique solutions after Q = 10 4 queries on the s-TNBM.Left panel: Missing ratio of new, unique solutions |g sol | found after Q queries, as a function of system size N for various choices of rmax.Data averaged over 10 samples and error bars corresponding to standard errors.Right: growth in the number of distinct charges at the middle link for various choices of N .

FIG. 8 .
FIG. 8. Generalization using the s-TNBM: five equations of the form A x = b.Left: Number of new and unique solutions generated when sampling Q = 10 4 times on our s-TNBM as a function of training set size |T | of unique training bitstrings for a system size N = 50 and rmax ∈ {1, 2} (solid and dashed lines, respectively).Dotted lines correspond to best power-law fit for each case.Right: Corresponding QN set size at the middle link as a function of |T |.
. The corresponding QN set size at the middle link grows linearly, and in fact scales roughly as |T |, |{n N/2,R }| ≈ |T |.

FIG. 9 .
FIG. 9. Comparison between vanilla and symmetric TNBM at generating new solutions to A x = b as measured by the coverage.We consider two equations with N = 20 binary variables, where the entries in A and b are randomly selected integers in the set {−2, −1, 0, 1, 2}.The goal is to generate new solutions from a fixed |T | = |S| number of samples, where |S| is the total number of solutions fulfilling the two equations, which is |S| = 9452.Results shown after Q = 10 4 queries.Different bond dimensions correspond to best results out of 10 trials using vanilla MPS (color lines).The symmetric MPS results do not require training, only the initialization step explained in Sec.IV A. Black dashed lines correspond to the coverage after sampling from the resulting symmetric MPS, whereas red dashed lines correspond to the maximum coverage using symmetric MPS, obtained after exact tensor network contraction.Top: coverage for a training dataset with = 0.1, and bond dimension χ = 37 for the symmetric MPS.Bottom: same but with = 0.01, and bond dimension χ = 26 for the symmetric MPS.
FIG. 10.Loss curves for symmetric vs. vanilla MPS in the TFI Hamiltonian subject to cardinality constraint in the presence of scarcity of data.a) KL divergence using vanilla MPS for different bond dimensions.Solid lines correspond to training KL, dashed lines correspond to testing KL. χ = 16 corresponds to the minimum testing KL. b) same but for symmetric MPS.c) U-curves corresponding to final testing losses for ∈ {0.005, 0.01, 0.05}, as a function of bond dimension χ.Error bars correspond to standard error after 10 independent training iterations.

FIG. 13 .
FIG. 13.Finding optimal solutions when efficient MPS representation is available, using Algorithm 1. Data shown for N = 50, κ = 25, and negative separation cost as cost function.The utility extracted from the model distribution after the first step is U model |t=1 = −11.05,while that at the final step is U model |t=6 = −20.0.The uniform sampler has utility U uniform = −9.64.Bond dimension and learning rate are fixed throughout all steps and given by χ = 30, α = 0.02, respectively.

FIG. 14 .
FIG. 14. Illustration of how a TN and RBM are transformed under global symmetry.(a) Ψ(x) is an MPS with G symmetry.The MPS remains constant under a global rotation for all sites.(b) The local tensors can be associated with the original ones by contracting local tensor T with gate U .(c) Global symmetry transformation on a RBM, which results in a new RBM with updated weights.Due to the sparsity structure of the RBM network, it becomes very challenging to get the updated RBM parameters after the symmetry transformation.
|Ψ = |Ψ , (B4) we may constrain the form of each tensor in the MPS as dictated by the transformation a ,b ,c(U g ) c,c (V g ) b,b T b a ,c (W g ) † a ,a = T b a,c , B8)    where V n are irreps of fixed charge n, each of dimension (or degeneracy) d n .Thus, any arbitrary vector |v ∈ V can be expanded as a linear combination of |n, t n ∈ V n vectors of fixed charge and degeneracy label t n , where t n = 1, • • • , d n , with d n the degeneracy of charge n.By going to this basis of well-defined charge, the invariance condition on the tensors, (B5), translates intoT b a,c = (T n b na,nc ) tn btn a ,tn c δ Nin,Nout , a ,b ,c(U g ) c,c (V g ) b,b T b a ,c (W g ) † a ,a = e −inφ T b a,c .(B11)Covariant tensors can be most compactly written in the basis of well-defined charge asT b a,c = (T n b na,nc ) tn b tn a ,tn c δ Nin+n,Nout .(B12)

FIG. 16 .
FIG. 16.Charges in an MPS in the presence of assignment type equalities.a) One equation.b) Two equations.
Training steps in the generative algorithm.1.Two neighboring tensors are merged in the s-TNBM.2. Computing the gradient of the s-TNBM w.r.t. the merged tensor.3. The two neighboring tensors get updated by the gradient term.4. The new tensor is factorized back into two neighboring tensors.The canonical center (orange tensor) has been shifted one site in the process.We repeat this four steps across all sites in the MPS from left to right and back corresponding to one full sweep.
15. Linearity of TN vs nonlinearity of RBM.(a) Linear hybridization of two different MPS Ψ1(x) and Ψ2(x) can be accomplished by constructing a new MPS with each tensor taking the direct sum as shown in (b).However, for nonlinear models such as RBM or ANNs, it's very challenging to construct the sum of P1(x) and P2(x) directly without any training.