Machine learning renormalization group for statistical physics

We develop a machine-learning renormalization group (MLRG) algorithm to explore and analyze many-body lattice models in statistical physics. Using the representation learning capability of generative modeling, MLRG automatically learns the optimal renormalization group (RG) transformations from self-generated spin configurations and formulates RG equations without human supervision. The algorithm does not focus on simulating any particular lattice model but broadly explores all possible models compatible with the internal and lattice symmetries given the on-site symmetry representation. It can uncover the RG monotone that governs the RG flow, assuming a strong form of the c-theorem. This enables several downstream tasks, including unsupervised classification of phases, automatic location of phase transitions or critical points, controlled estimation of critical exponents, and operator scaling dimensions. We demonstrate the MLRG method in two-dimensional lattice models with Ising symmetry and show that the algorithm correctly identifies and characterizes the Ising criticality.


I. INTRODUCTION
Renormalization group (RG) is an elegant conceptual framework and powerful computational method in statistical physics and quantum field theory.RG extracts the relevant features at every given scale by progressively coarsening the degrees of freedom in a physics system in hierarchies.Conventionally, real space RG relies on human physicists to design coarse-graining transformations based on their intuition of the physical system.In this research, we aim to explore the potential for artificial intelligence (AI) to learn optimal RG transformations automatically from energy models.Unsupervised machine learning is particularly well-suited for this task due to its ability to learn low-dimensional representations or relevant features from data and to remove noise and irrelevant features without supervision.This approach aligns with the goal of learning RG transformations, which aim to extract relevant features of a physical system by transforming fine-grained configurations to coarse-grained ones while preserving essential information and correlations.
Prior research has demonstrated that neural networks can learn to perform hierarchical feature extraction at the configuration level [1][2][3][4][5][6][7][8][9][10][11].However, a more fascinating aspect of RG is its ability to quantitatively analyze the flow of physics theory in the parameter space at the model level [12,13].Therefore, this research aims to develop a novel machine-learning renormalization group (MLRG) method that can automatically formulate RG flow equations, discover RG monotones, propose effective theories, identify critical points, and estimate critical exponents, all starting from the symmetry and dimension of the physical system.
In this study, we will focus on statistical mechanics models defined on regular lattices and develop machine learning algorithms to analyze them within the real space RG framework.Our proposed MLRG algorithm is depicted in Fig. 1.We introduce two restricted Boltzmann machines (RBMs) [14][15][16][17] to model the lo- Three AI models are involved.The teacher and student models are both generative models based on RBM with different predefined neural network connects, representing the finegrained and coarse-grained local energy models.The moderator model is a discriminative model to predict the RG monotone.The moderator first guides a Hamiltonian Monte Carlo (HMC) sampler to propose the teacher model parameters that is most worth training, and then predicts the student model parameters following the RG flow.After both RBMs get their model parameters, the teacher generates the data to train the student.The gradient back-propagates from the student to the moderator to train the RG monotone network.
After training, moderator gains good knowledge about the RG flow throughout the parameter space, which can then be used for various downstream tasks.
cal energy model at different scales on the fine-grained and coarse-grained lattices respectively.The fine-grained model serves as the teacher by generating samples of visible configurations, which are then used to train the coarse-grained model.The coarse-grained model acts as the student and learns its energy model to describe the training data provided by the teacher.We also introduce a third model, the RG monotone network, as a moderator that observes the teacher-student learning process and learns to predict how the model parameters of the coarse-grained model are related to those of the fine-grained model.This allows the moderator model to learn the RG flow and use its knowledge to guide a Hamiltonian Monte Carlo (HMC) [18][19][20] sampler to propose new model parameters that are most worth training.After training, we can use the machine-learned RG flow to identify RG fixed points in the parameter space and automate the analysis of physical properties at these fixed points.The paper will be organized as follows.We will first introduce the MLRG algorithm in Sec.II, which includes (i) the teacher-student learning system Sec.II B to model the RG flow and (ii) the moderator Sec.II D and HMC sampling Sec.II E system to extract RG monotone and use it to guide the training.The teacher and student are modeled by equivariant RBMs, as formulated in Sec.II A, whose point group representation choices are elaborated in Sec.II C. We then demonstrate the application of MLRG on 2D Ising models in Sec.III.Sec.III A describes the problem setup.Sec.III B shows the machine-learned RG monotone and RG flow diagram.A few quantitative results are then presented, including the critical point Sec.III C, the ground state degeneracy Sec.III D, and the scaling dimension Sec.III F. Sec.III E explains how to use Newton's method to localize the RG fixed point.We summarize the advantage and limitations of MLRG and comment on its connections to related works in Sec.IV.

A. Statistical Mechanics Models
We begin with a general definition of a statistical mechanics system on a lattice.Let the lattice be described by a graph G = (V, E), where V denotes the set of vertices (sites) and E denotes the set of edges (bonds).On each site i ∈ V, we introduce a vector s i ∈ R n to characterize the on-site degrees of freedom, which are generally referred to as a spin.The entire spin configuration s on the lattice can be viewed as a map s : V → R n .A central theme of equilibrium statistical physics is to model the probability distribution p(s) ∝ e −E(s) using a local energy function E(s), such as where ϵ J is a scalar function describing the energy associated with each pair of spins (s i , s j ) across the edge ⟨ij⟩.The subscript J represents the collection of parameters that parametrize the energy model.Symmetry plays a significant role in defining the spin degrees of freedom and constraining the energy function.
Let G be the internal symmetry group and R be the point group of the lattice.Assuming G and R commute and form a direct product group G × R, the on-site spin s i should form an n-dimensional linear representation of the G × R symmetry.Under the symmetry action g ∈ G and r ∈ R, the spin s i transforms as where T g and T r are matrix representations of the internal g and the point group r symmetry transformations.r(i) denotes the resulting site obtained from the original site i by applying the point group transformation r.
The energy function E J (s) must be invariant under the internal and the lattice symmetry (including the point group and translation symmetries).This requires ϵ J to satisfy the following symmetry constraint: for all (g, r) ∈ G × R.This means it is sufficient to define the energy function ϵ J on a representative bond and use the lattice symmetry to carry it to every other bond on the lattice (assuming all bonds on the lattice are related by lattice symmetry transformations).More explicitly, suppose the spin components s aα i are labeled by two indices a and α, separately indexing the representation space basis of the point group and internal symmetries.One possible design of an equivariant bond energy function ϵ J on the representative bond is to use a tensor network: where repeated indices are automatically summed over, as illustrated in Fig. 2 as depicted in Fig. 2(b).The tensor J contains all parameters that determine the energy function.
They can be viewed as coupling constants among different symmetry representations.Although Eq. (4) does not yet represent the most general equivariant energy model, it is already sufficiently expressive if the representation space dimension n is large enough.So we will not dive into more complicated equivariant neural network models [21][22][23][24][25][26] but adopt this tensor network design to construct the equivariant restricted Boltzmann machines later.In summary, given the internal symmetry group G and the lattice graph G (with the lattice symmetry given by the automorphism group of G), one can specify the onsite spin s i by its representation under the internal and point group symmetries.Statistical mechanical models are generally defined by symmetric energy functions E J (s), parametrized by J, as in Eq. ( 1).The RG analysis aims to determine how the model parameters J effectively change across different scales.

B. Real Space Renormalization Group
For concreteness, we will focus on two-dimensional lattice models.In particular, we will consider the Lieb lattice [27] (bond-intercalated square lattice) as shown in Fig. 3(a,b).The Ising model defined on the Lieb lattice is physically equivalent to the simple square lattice but the Lieb lattice is more convenient for describing real-space RG schemes.Extending our approach to other lattices and higher dimensions is possible in general.Starting with an energy model defined on the Lieb lattice, our RG scheme can be described as follows: (i) Divide the lattice into overlapping 2 × 2 blocks as in Fig. 3 Repeating the procedure recursively, the energy model parameters will be renormalized to a larger and larger lattice scale.
The key objective of this RG scheme is to learn the new local energy model.For this purpose, we view the squareand cross-graph local energy models as two restricted Boltzmann machines (RBMs) [14][15][16][17].We will call the RBM on the fine-grained square graph the teacher machine, in which the corner spins x i are the visible variables and the decorated spins z i are the hidden variables, see Fig. 3(c).Its energy model reads: The RBM on the coarse-grained cross graph will be called the student machine, in which the corner spins x i are still the visible variables, but there is only one hidden spin z 0 at the center, see Fig. 3(d).Its energy model is : The teacher and student RBMs are separately parametrized by J tch and J std .In both Eq. ( 5) and Eq. ( 6), the bond energy function ϵ J (x i , x j ) is defined on a representative bond along the i → j direction using the tensor network model Eq. ( 4).In Fig. 3(c,d), the representative bonds are colored in blue, and arrows indicate the bond directions.
Both the square and cross graphs respect the C 4v point group symmetry, which is generated by a four-fold rotation C 4 and a mirror reflection σ, see Fig. 3(c,d).The mirror reflection σ is always assigned to preserve the representative bond (in blue).It only imposes symmetry constraints on the bond energy model ϵ J as required by Eq. (3).The four-fold rotation C 4 further takes the representative bond to other bonds (of other colors), under which the bond energy model will change equivariantly following Eq.(3).This way, the RBMs will respect the internal G and point group R = C 4v symmetries by design.
The objective is to train the student RBM to learn the coarse-grained local energy model from the visible spin configurations generated by the teacher RBM.Given the teacher model parameters J tch , we optimize student model parameters J std by minimizing the Kullback-Leibler (KL) divergence between the visible variable distributions of the two models, where the marginal distributions p tch (x) and p std (x) are defined by tracing out the hidden spins: Although directly evaluating the KL divergence in Eq. ( 7) is not tractable, the optimization can be performed by stochastic gradient descent following the standard contrastive divergence (CD) [15] training technique for the RBM.
In this way, the student machine automatically learns the coarse-grained model parameters J std given the finegrained model parameters J tch of the teacher machine.This contrasts with the conventional real space RG approach, where human physicists must design the coarsegraining rule that maps the visible spin configuration x to a hidden spin z 0 (x) in each block.Our approach differs in two aspects: (i) The coarse-grained variable z 0 is no longer a deterministic map of x but is defined in a probabilistic manner via a conditional distribution p std (z 0 |x) given by the student machine.(ii) The conditional distribution p std (z 0 |x) is not specified by humans but is learned from the data.This allows the student machine to develop the optimal RG transformation within its available representation space dimensions, extracting the most relevant features z 0 from the visible configurations x without supervision.
After training, we can replace the teacher model parameters J tch with the student model parameters J std and move on to train the next-generation student.Therefore, generation by generation, the teacher-student learning iteration will trace the RG flow of J in the parameter space.

C. Choosing Point Group Representations
To demonstrate how well the student RBM can approximate the teacher RBM, we consider a concrete example based on the two-dimensional Ising model on the square lattice.In this case, the internal symmetry is G = Z 2 , and the point group symmetry is R = C 4v .Then every spin variable (either visible or hidden) in both RBMs will be specified as a representation of Z 2 × C 4v .The Z 2 group only has one non-trivial representation, i.e., the odd (signed) representation that transforms as s → −s.So we will assume every component of the spin to transform as this odd representation under the Z 2 Ising symmetry.The C 4v point group has richer irreducible representations, summarized in Tab.I.The expressive power of the RBM depends on the choice of the C 4v point group representations for each spin, which will be elaborated in the following.A single Ising spin in the conventional Ising model corresponds to a Z 2 -odd variable carrying A 1 representation, which does not transform under the point group symmetry at all.However, under RG, the coarse-grained spin (such as the z 0 spin in the student RBM) can carry an enlarged point group representation.From Fig. 3(c,d), it is clear that the RG procedure is effectively merging the four hidden spins in the teacher model (square graph) to one hidden spin in the student model (cross graph).Therefore the hidden spin z 0 in the student model must contain more internal structures and carry orbital angular momentum (i.e., non-trivial representation of C 4v ) to resolve the complication of inhomogeneous hidden spin configurations in the teacher model.
For example, by saying that a Z 2 spin variable z 0 carries the A 1 ⊕ E representation, we imply that z 0 ∈ R 3 is a three-component vector, consisting of three Ising variables: one Ising variable forms the A 1 representation and remains invariant under lattice rotations, and two Ising variables form the two-dimensional E representation that can rotate as a vector.Then the C 4 and σ transformations are represented as according to the transformation matrices listed in Tab.I. Suppose we start from a fine-grained teacher model with both the visible and hidden spins as ordinary Ising spins in the A 1 representation, i.e., x i , z i ∈ {±1}.The parameter tensor J tch will contain only one component given the symmetry constraints.According to the square graph in Fig. 3(c), the teacher model is described by the energy function below The hidden spins z i can be immediately traced out (marginalized) in the partition function, leaving us an effective model for visible spins x i , described by where J eff = 1 2 log cosh(2J tch ) is an effective coupling that depends on J tch .Correspondingly, the visibile spin discributions is To build a coarse-grained student model to approximate p tch (x), the first step is to specify the choice of the point group representation for the coarse-grained variable z 0 in the student RBM.For example, if z 0 is chosen to carry the A 1 ⊕E representation with totally three components as z 0 = (z 1 0 , z 2 0 , z 3 0 ), where each component is an Z 2 Ising variable that z a 0 ∈ {±1}, the student model can be written as (see Fig. 3(d) for the connectivity of these variables) The fluctuation of the A 1 representation z 1 0 mediates an equal amount of positive correlation ⟨x i x j ⟩ between every pair of visible spins, which is indeed the dominant correlation pattern in a ferromagnetic Ising model.However, to better match the teacher model p tch (x), the ⟨x 0 x 2 ⟩ and ⟨x 1 x 3 ⟩ correlations should be further weakened relative to others because there is no direct coupling between these diagonal spins in the original teacher model Eq.(11).This is where the E representation plays a role, as the z 2 0 and z 3 0 fluctuations will mediate some negative correlation in ⟨x 0 x 2 ⟩ and ⟨x 1 x 3 ⟩ due to the minus signs in Eq. (12), which helps to bring the student model's visible distribution p std (x) closer to that of the teacher model p tch (x).This argument explains why including more point group representations to the hidden spin in the student model is generally helpful to improve its expressive power.0.0 0.5 1.0 1.5 2.0 2.5 3.0 0.00 0.02 0.04 0.06 0.08 0.10 Within this setup, we tested the performance of a series of student RBMs, built with different choices of the point group representation for the hidden spin z 0 .The performance is evaluated by the minimal KL divergence D KL (p tch (x)∥p std (x)) achieved after optimizing the student machine.In Fig. 4, we show the minimum KL divergence over a large range of the parameter J tch of the teacher model under different representation choices of z 0 .The general representations are constructed by combining irreducible representations A 1 , B 1 , and E. The other two irreducible representations A 2 and B 2 are not used in this specific calculation because they will not couple to the x i spins in the A 1 representation due to the constraint from the mirror reflection symmetry σ.But in more general cases, when the representation of x i is larger, all irreducible representations can appear on z 0 in principle.This calculation quantitatively demonstrates that the student RBM can learn to approximate the visible spin distribution from the teacher RBM.The approximation can be progressively improved by introducing larger representations on the hidden spin z 0 , even though an exact match might only be achievable in the large representation dimension limit n → ∞.
In practice, the RBM can only handle a finite representation space dimension n.So we have to truncate the representation space at a maximum dimension n max .This will introduce inaccuracy in each RG step, but the hope is that the error introduced by the truncation will become irrelevant under RG flow, such that the truncation only affect the RG fixed point behavior in a controllable manner.This is also the underlying assumption of many real space RG schemes, where each RG step does not need to preserve the partition function exactly.
As we carry out the MLRG procedure by recursively training the student RBM by the teacher RBM and replacing the teacher with the trained student, the RG flow is supposed to bring us (close) to some fixed point RBM model, at which we can further investigate the universal properties (such as critical exponents and operator scaling dimensions) of the RG fixed point.We will provide numerical evidence to show that the MLRG algorithm can locate RG fixed points and estimate their universal properties more accurately when the representation dimension n gets larger, such that it could offer a useful and controllable numerical method to automatically map out phase diagrams and study critical phenomena in statistical physics models without human supervision.

D. Learning the RG Monotone
While the above teacher-student learning approach for real space RG is appealing, it faces a serious challenge in actual training.This challenge comes from the stochastic nature of training the RBM, such that the student RBM parameter J std will always fluctuate around its optimum in every RG step.This inevitably injects random noise into the entire RG flow, causing the model to perform random walks in the parameter space.Since the RG flow near the critical point (i.e., unstable RG fixed point) is particularly sensitive to small perturbations, stochastic RG flow will almost always miss the critical point.Therefore, without addressing the problem of parameter space random walk, the above naive MLRG algorithm is useless for studying any critical phenomena.
To address this challenge, we introduce a third AI system, called the moderator, which operates outside the teacher-student system.The moderator's objective is to monitor the stochastic RG flow over time in many different scenarios and learn the underlying deterministic RG flow throughout the entire parameter space.The key idea here is to assume the existence of an RG monotone C(J) ∈ R, which is a real scalar function of the RBM model parameters J, such that the RG flow can be formulated as a gradient flow of the RG monotone dJ/dℓ = −∇ J C(J), where ℓ parametrize the RG step.This is the strongest form of the c-theorem [28][29][30][31].Instead of trying to construct such RG monotone by humans, we introduce a feed-forward neural network C θ (J), called the RG monotone network, to model the function C(J) and train it jointly with the RBMs in the RG flow.Here θ denotes the collection of all parameters in the neural network.
The training starts from a random choice of the teacher model parameters J tch in the parameter space.The moderator takes the initial condition J(ℓ = 0) = J tch and evolves J(ℓ) from ℓ = 0 to ℓ = 1 by solving the RG equation following the gradient signal provided by the RG monotone network C θ .The neural ordinary differential equation (neuralODE) technique [32,33] is employed here to enable gradient backpropagation through the ODE solver.The moderator then passes the solution to the student RBM as its model parameters J std = J(ℓ = 1).
The teacher RBM then starts sampling visible spin configurations and sends them to the student RBM.The student RBM receives the training data and evaluates the KL divergence in Eq. ( 7) as the total loss function.All models are trained jointly by minimizing the KL divergence using the CD training technique.The loss function gradient will eventually back-propagate to θ to update the RG monotone network.The training is performed at many random choices of initial parameters J tch , such that a large amount of training data can be aggregated to optimize the RG monotone throughout the parameter space.
Although the nature of the training is still stochastic, the random noise will be averaged out in fitting the RG monotone, such that after training, the optimal fit C θ * (J) can be used to generate a deterministic RG flow following Eq.( 13).In this way, the random walk behavior in the parameter space can be avoided, which enables the MLRG algorithm to locate RG fixed points accurately.

E. Sampling the Parameter Space
Next, we will discuss how to sample J tch more efficiently to speed up the training.Since the RBM model parameters J live in high-dimensional parameter space, uniform sampling might not be an efficient strategy.We propose an importance sampling strategy that focuses on RG fixed points.
The sampler makes use of the knowledge about the RG monotone to sample the parameters J from the following probability distribution with some inverse-temperature β as a hyperparameter.
We set β = 0 initially, such that the sampler will propose model parameters J uniformly in the parameter space.The RG monotone network gradually accumulates knowledge about the RG flow as the training progresses.RG fixed points emerge as local saddle points where the gradient ∇ J C θ (J) = 0 vanishes.We then gradually increase β, so the sampler will be biased to sample more around RG fixed points (including both stable and unstable).This design encourages the RG monotone network to be trained more intensively near RG fixed points, such that the fixed point location can be estimated more accurately, a desirable feature for the automatic discovery of critical points (phase transitions) in statistical mechanics systems.
Since J varies continuously, the Hamiltonian Monte Carlo (HMC) approach [18][19][20] becomes a natural choice of the sampling method.HMC is a Markov chain Monte Carlo algorithm that uses Hamiltonian dynamics to efficiently propose moves in the parameter space of the target distribution p(J).It is a powerful method for sampling complex distributions of continuous variables in high-dimensional space.We implement the method using multiple HMC samplers with replica exchange to mitigate the possibility of getting trapped at a single fixed point.
In summary, the RG monotone network C θ (J) plays a fundamental role in the MLRG algorithm, as illustrated in Fig. 1.It first guides the HMC sampler to propose new teacher model parameters J tch following the probability distribution in Eq. ( 14) and then predicts the student model parameters J std by solving the RG flow differential equation in Eq. ( 13).In this sense, it indeed serves as a moderator to moderate the teaching-learning process.In return, the RG monotone network gets trained when the student RBM learns to generate similar spin configurations as those generated by the teacher RBM.It is worth mentioning that the HMC sampled teacher model parameters J tch are detached from the computational graph, such that the RG monotone network parameters θ will only receive gradient signals from the student model parameters J std .In this way, the training of the RG monotone will not be affected by the HMC sampling.

A. Symmetry Assignment and Coupling Parameters
To apply the MLRG method, we focus on twodimensional lattice models with Ising symmetry.We do not need to specify any particular Ising model Hamiltonian, as the MLRG can automatically explore all possible models that are consistent with the internal and lattice symmetry, given the on-site symmetry representation.
In the following, we will always take the G = Z 2 internal symmetry and the R = C 4v point group symmetry.Regarding the symmetry representation of on-site spins, we consider that every spin (whether visible or hidden) is odd under Z 2 and carries A 1 or A 1 ⊕ E or 2A 1 ⊕ E representation under C 4v .The choice of these point group representations is based on our experience that they are the most efficient representation within their respective representation dimensions in minimizing the KL divergence between teacher and student RBMs, as shown in Fig. 4. The above symmetry assignment is summarized in Tab.II, which is all we need to set up the equivariant RBMs and prepare the MLRG model for training.
Settings of symmetries and spin representations for the two-dimensional Ising MLRG.
For Z 2 spins, the (representative) bond energy function in Eq. ( 4) reduces to the following form where a, b label the basis of the C 4v symmetry representation.The internal symmetry representation labels α, β in Eq. ( 4) are omitted because the Z 2 group only has one non-trivial irreducible representation.The coupling J is a matrix that can be parametrized as follows under different choices of the point group representations: • A 1 ⊕ E representation (3-dimensional, 5 coupling parameters) • 2A 1 ⊕ E representation (4-dimensional, 10 coupling parameters) In the above matrices, we use lines to separate different irreducible representations of the C 4v symmetry.Some matrix elements are zero because of the restriction imposed by the mirror reflection symmetry σ, as required by Eq. (3) in general.In all cases, the parameter J 0 always denotes the Ising coupling between the first A 1 representations, directly connected to the bare Ising coupling in the lattice model.

B. RG Monotone and RG Flow
Choosing the smallest point group representation A 1 , the coupling matrix J is parametrized by a single variable J 0 .In this case, the RG flow is simply defined in the one-dimensional parameter space.The RG monotone C(J 0 ), determined through the MLRG method, is depicted in Fig. 5(a).A local maximum of the RG monotone is observed at J 0 * = 0.82 ± 0.02.Since the RG flow is designed to follow the gradient descent trajectory of the RG monotone as in Eq. ( 13), it implies that the parameter J 0 will depart from the unstable fixed point J 0 * and flow towards one of the two stable fixed points, either J 0 = 0 or J 0 → ∞.The RG flow can also be understood from the plot of −∂ J0 C in Fig. 5 The maximal position J 0 * := argmax J0 C(J 0 ) of the RG monotone C(J 0 ) provides an estimation of the Ising critical point that separates the paramagnetic phase (J 0 = 0) and the ferromagnetic phase (J 0 → ∞).The exact Ising critical point of the Lieb lattice model is located at The estimation J 0 * = 0.82 ± 0.02 is close to but still deviates from the exact value J 0c .This is because the dimension of the A 1 representation is small, which limits the ability of the student RBM to approximate the teacher RBM in the MLRG algorithm.We should expect the estimation to be improved as the on-site spin involves larger representations of the point group.Going beyond the A 1 representation, the RG flow will be defined in high-dimensional parameter space.To visualize the RG monotone C(J), we separate the parameter J 0 from the remaining parameters J 1,2,••• in the coupling matrix J.We always initialize the RG flow by setting J 0 to the bare Ising coupling of the lattice model and allowing J 1,2,••• to be generated under the RG flow.On the sub-manifold spanned by J 0 and the most relevant flow direction of J 1,2,••• (which denotes a particular linear combintation of J 1 , J 2 , • • • parameters along which the gradient of the RG monotone is maximal), we can plot the RG monotone C(J) obtained by the MLRG method, as well as its gradient directions (RG flow directions).An example is shown in Fig. 6, which is obtained by training with the A 1 ⊕ E on-site representation.
The MLRG algorithm learns the RG monotone C(J) throughout the parameter space, based on which the RG C (J ): -0.1 0.0 0.1 0.2 0.3 0. flow diagram can be obtained.As we tune the bare coupling J 0 to the critical point J 0 * along the horizontal axis, the RG flow takes us to the true RG fixed point J * away from the horizontal axis.The parameter J * defines a statistical mechanics model approximating the Ising conformal field theory (CFT).The approximation is expected to be better with larger on-site point group representations.

C. Determining the Critical Point
To estimate the Ising critical point J 0 * , we start by initializing the coupling matrix J with a given J 0 ∈ R and J 1,2,••• = 0, denoted as Then we flow the coupling matrix J by solving the RG equation dJ/dℓ = −∇ J C(J) from ℓ = 0 to some large ℓ.
We denote the solution as J(J 0 , ℓ) as it depends on the initial condition J 0 and the RG scale ℓ.Under our RG scheme, the linear system size will be effectively enlarged by 2 ℓ/2 at the RG scale ℓ.Fig. 7 shows the RG monotone C(J(J 0 , ℓ)) as a function of different initial value of J 0 at different RG scales ℓ.
One can see that the RG monotone peaks at the critical point, and the peak becomes sharper with longer RG flow ℓ.With a sufficiently large ℓ, we can estimate the critical point J 0 * by finding the local maximum of the RG monotone The result J 0 * will be insensitive to the RG scale ℓ as long as it is large enough that the RG flow has converged.We trained several different MLRG models for each on-site point group representation choice and estimated the Ising critical point J 0 * using the abovementioned method.Our result is shown in Fig. 8.We can see a clear trend that the estimated critical point converges to its exact value J 0c with larger point group representations (hence stronger RBM models).

D. Boltzmann Weight Tensor
To further analyze the physical properties at different RG fixed points, we define the Boltzmann weight tensor T (J) from the RBM energy model given the coupling parameter J.The tensor elements are specified as where x = (x 0 , x 1 , x 2 , x 3 ) denotes the visible spins jointly and z denotes the hidden spins jointly.T (J) is a rank-4 tensor with each tensor element encoding the Boltzmann weight of a particular configuration of visible spins x, as illustrated in Fig. 9.We can use either the teacher RBM as Eq. ( 5) or the student RBM as Eq. ( 6) for the energy model E J (x, z).They should not have much difference as long as the model parameter J has converged to an RG fixed point where the teacher and the student should behave the same in the ideal limit.Nevertheless, in the following analysis, we will always adopt the teacher model as we found it a little more accurate than the student model.The Boltzmann weight tensor T (J) enables us to define various physical properties of the statistical mechanical model conveniently.For example, the ground state degeneracy Z(J) (regularized partition function) [34,35] can be defined as the ratio of the following tensor contractions (repeated indices are summed automatically) For Ising models, the ground state degeneracy characterizes the order of the broken symmetry group: Z = |Z 1 | = 1 at the paramagnetic (disordered) fixed point where the internal Z 2 symmetry is preserved, and Z = |Z 2 | = 2 at the ferromagnetic (ordered) fixed point where the internal Z 2 symmetry is broken.Following the approach described in Sec.III C, we start with the initialization of the coupling matrix J by a single parameter J 0 as in Eq. ( 20) and follow the RG flow to obtain J(J 0 , ℓ).At different RG scale ℓ, we plot the ground state degeneracy Z(J(J 0 , ℓ)) as a function of the initial parameter J 0 , the result is shown in Fig. 10.As J 0 goes across the critial value J 0 * , the ground state degeneracy transitions from Z = 1 (paramagnet) to Z = 2 (ferromagnet).The transition becomes sharper with increasing ℓ along the RG flow.The intersection of the ground state degeneracy curve at different RG scales can be used to infer the Ising critical point J 0 * .As we enlarge the point group representation, J 0 * progressively approaches the exact Ising critical point at J 0c as shown in Fig. 8.
The MLRG can automatically label different symmetry-breaking phases by their ground state degeneracy using this analysis.This can be viewed as an approach for unsupervised phase classification.The ground state degeneracy curves Z(J(J 0 , ℓ)) also provide a method to estimate the critical point J 0 * by finding their intersections.

E. Locating the Monotone Saddle Point
Our goal is to estimate the universal properties at the critical point.This requires us to accurately locate the RG fixed point parameter J * within the high-dimensional parameter space, as exemplified in Fig. 6.
However, if we simply follow the RG flow to approach J * , it is almost inevitable that we will miss it.This is due to the tendency of the RG flow to deviate from the unstable fixed points.Fortunately, the MLRG algorithm has been trained to learn the RG monotone function C(J).This feature enables us to find the RG fixed point by locating the saddle point of the RG monotone via Newton's method.By doing so, we can effectively circumvent the complication of fine-tuning the parameters in the parameter space.
Given the RG monotone C(J) and starting from an initial guess of J near an RG fixed point, Newton's method recursively improves the approximation by the following update where ∇ J C denotes the gradient vector and ∇ 2 J C denotes the Hessian matrix of C(J) at J. Fig. 11 demonstrates the flow of J along the vector field of −(∇ 2 J C) −1 ∇ J C. Differing from the RG flow in Fig. 6, Newton's method converges to the RG fixed point in its neighborhood from all directions, regardless of whether the RG fixed point is stable or unstable.This enables us to pinpoint the RG fixed point from an initial guess near it.C (J ): -0.1 0.0 0.1 0.2 0.3 0. To locate the Ising critical fixed point J * in the parameter space, the initial guess can be constructed by setting the J 0 component of the matrix J to its critical value J 0 * , which can be estimated by methods described in Sec.II D or Sec.III D. The convergence can be monitored by evaluating C(J).We stop the iteration when C(J) does not change significantly between successive steps.The iteration is expected to converge to a saddle point J * of the RG monotone where ∇ J C = 0.

F. Calculating Scaling Dimensions
Once we locate the RG fixed point J * by the iteration in Eq. (24), we can substitute it into Eq.( 22) to construct the fixed-point Boltzmann weight tensor T (J * ).The tensor can then be used to compute the transfer matrix M , with the following matrix elements The eigenvalues of the transfer matrix M are related to the scaling dimensions ∆ k (k = 0, 1, • • • ) of operators in the CFT [13,[34][35][36][37], with c being the central charge.The proportionality factor can be fixed by requiring the identity operator (i.e., the vacuum state) to have zero scaling dimension ∆ 0 = 0.This allows us to extract the lowest few scaling dimensions associated with the most relevant operators.Fig. 12 shows the estimation of the lowest scaling dimension ∆ 1 by the Ising MLRG with different point group representation choices.This should be the scaling dimension of the Z 2 -odd Ising order parameter, whose exact value is known to be ∆ 1 = 1/8.We can see that the MLRG can approach the exact value with larger point group representations.

IV. SUMMARY AND DISCUSSION
In this study, we proposed the MLRG algorithm and used it to analyze many-body lattice models in statistical physics.Our method incorporates several modern machine learning techniques, including using equivariant neural networks [21][22][23][24][25][26] for RBMs and neural ordinary differential equations [32,33] for modeling the RG flow.The source code and raw data are available in the MLRG GitHub repository [38].
The MLRG algorithm demonstrates the power of machine learning to automate and enhance the study of statistical physics systems.We used the representation learning capability provided by generative modeling to learn the optimal RG transformation from the data directly, without requiring direct human intervention or prior knowledge about the system (apart from symmetry and dimensionality).
The design of the MLRG algorithm exemplifies the paradigm of introspective learning [39], an effective approach in machine learning for scientific discovery.It refers to the ability of the algorithm to self-analyze and self-adapt during the learning process, drawing on multiple levels of learning and analysis to generate a more comprehensive and insightful understanding of the problem at hand.MLRG's design incorporates this introspective nature through its multi-level, multi-model architecture.It uses two "lower-level" machines, the teacher and student models, to mimic the renormalization group process of extracting relevant features by representation learning.It also incorporates a "higher-level" machine, the moderator model, which learns the RG flow and uses its knowledge to guide the sampling of new model parameters that are most worth training.This multi-level, multi-model design allows for a clear separation between task performance, carried out by the teacher and student models, and knowledge extraction, carried out by the moderator model.This separation endows the high-level model with error-correcting capabilities, enabling it to resist the influence of noise in RBM training on learning the RG flow.This ability to compress knowledge and correct errors is crucial for AI to make scientific discoveries [40].
The MLRG approach is related to (and distinguished from) existing works in the following ways: • Monte Carlo Renormalization Group (MCRG): MCRG [41][42][43][44][45][46][47][48][49][50][51][52][53][54] uses Monte Carlo sampling to facilitate non-perturbative RG techniques.This process begins with sampling configurations from a fine-grained model given the model parameters.It then employs local RG transformations to generate coarse-grained configurations and estimates the new model parameters accordingly.This approach aligns with our method of training the student RBM from the teacher RBM.In particular, recent developments [4,7,8] have introduced generative models to learn the optimal RG transformation, a goal that aligns with ours.
However, our method leverages specific lattice structures, such as the Lieb lattice, to restrict the RG operation within a small spin cluster.In the lack of this design, MCRG has to perform the sampling on a much larger lattice and model many longrange multi-spin couplings, which increases computational complexity.Furthermore, MCRG does not learn the RG monotone and can not automatically locate RG fixed points, such that it requires fine-tuning model parameters when studying critical point properties, adding to the computational burden.
• Deep Learning and RG: Several studies [55][56][57][58][59] have drawn parallels between deep neural networks and RG.Notably, they recognize that generation is the inverse process of renormalization [6,11].Therefore, generative models can be used to implement data-driven RG.These discussions focus on optimizing RG transformations using deep learning, which often relies on Monte Carlo simulated spin configurations for hierarchical feature extraction.Despite their effectiveness in extracting features of stable phases, they lack controlled accuracy in predicting universal properties of phase transitions, as they did not learn the RG equation or the RG monotone.
• RG Flow-Based Generative Modeling: Techniques such as Neural-RG [2,6] and RG-Flow [10,11] embed RG transformations in multi-level flow-based generative models [60][61][62], applying deep learning methods to learn optimal RG transformations from model Hamiltonians by minimizing free energy.These methods are based on the invertible RG framework, which designs the local RG transformation as a bijective (invertible) deterministic map from spin configurations to relevant and irrelevant features.
However, the requirement for bijectivity limits the possibilities for RG transformation.Moreover, flow-based models struggle to model discrete variable probability distributions, limiting their application in various statistical mechanics problems.They also lack an asymptotic exact limit, meaning they can typically only serve as configuration update proposers to accelerate Monte Carlo calculations rather than replacing Monte Carlo as an unbiased simulation method.
In contrast, MLRG applies to both discrete and continuous variables, offers more flexible RG transformations with stochastic coarse-graining maps, allows the extension of on-site degrees of freedom, and is exact when the on-site degrees of freedom tend to infinity.These characteristics make MLRG a valuable method for studying statistical physics.
• Information Theoretical Approach to RG: Some research has explored the information theoretical criterion for optimal RG, indicating that RG transformation should maximize the mutual information of relevant features with the environment [1,5,[63][64][65][66] or minimize the mutual information among irrelevant features [6].While MLRG does not conflict with these principles, it does not explicitly use them as optimization criteria.Instead, it uses the match of marginal distributions of teacher and student models on their common spins to define optimal RG transformations, a method that is more direct and easier to optimize.
As a numerical algorithm to solve statistical mechanical problems, the MLRG showcases several advantages: • Efficiency in Small Cluster Sampling: Compared to traditional Monte Carlo (MC) simulations, the MLRG algorithm operates on a smaller, lighterweight scale.It only requires sampling spin configurations within a small cluster of spins (within a unit cell).Moreover, the Gibbs sampling in different spin clusters can be effectively parallelized on modern computing devices.This compact operation allows the algorithm to perform more efficiently than larger-scale simulations.
• Exploration of the Full Parameter Space: The MLRG algorithm is designed to efficiently traverse the entire parameter space in a single pass of training.This differs significantly from MC simulations and Tensor Network Renormalization Group (TNRG) [34,35,[67][68][69][70], which must solve the Hamiltonian multiple times at different parameters.
The MLRG algorithm's ability to explore the full parameter space reduces computational costs and time spent scanning the phase diagram, which can be beneficial when the parameter space dimension is large.
• Discovery and Analysis of Critical Points: The MLRG makes use of the knowledge about the RG monotone to identify critical points (i.e., unstable RG fixed points).It uses machine-learned RG flow to calculate fixed-point properties, eliminating the need for finite-size scaling.This automated discovery and analysis process enhances the algorithm's capability to analyze critical properties in complex physical systems, especially for multi-critical points with multiple unstable directions.
• Controlled Convergence of the Algorithm: Similar to tensor-network methods, the MLRG algorithm's estimation of physical properties converges as the dimension of the latent space increases.This behavior ensures that the algorithm's predictions become increasingly accurate and reliable as more computational resources are investigated.
• Interpretability: Compared to other machinelearning methods for RG, the MLRG approach provides better interpretability by labeling the hidden spins in the model with symmetry representation, leading to physically meaningful coupling parameters and RG rules.This feature is particularly valuable in physical sciences, where the goal extends beyond accurate prediction to gaining physical insights.
The current training of the RBMs that model local energy in MLRG is a stochastic process.This brings about fluctuations and noise, which can result in instability in the RG monotone network.The stability of the RG monotone network directly impacts the reliability of critical property predictions.A potential approach to improve this stability is to replace the RBMs with tensor networks, like in Fig. 9.This could allow for a deterministic training approach based on tensor network optimization, thus reducing the noise and increasing the stability of the RG monotone network.With the stability and reliability improvements, the MLRG can be extended to handle more complicated spin models.These models involve larger internal symmetry groups, such as S n groups, which are significant to the study of categorical symmetry [71,72] and entanglement transitions [73][74][75][76][77][78].

FIG. 1 .
FIG.1.High-level architecture of the MLRG algorithm.Three AI models are involved.The teacher and student models are both generative models based on RBM with different predefined neural network connects, representing the finegrained and coarse-grained local energy models.The moderator model is a discriminative model to predict the RG monotone.The moderator first guides a Hamiltonian Monte Carlo (HMC) sampler to propose the teacher model parameters that is most worth training, and then predicts the student model parameters following the RG flow.After both RBMs get their model parameters, the teacher generates the data to train the student.The gradient back-propagates from the student to the moderator to train the RG monotone network.After training, moderator gains good knowledge about the RG flow throughout the parameter space, which can then be used for various downstream tasks.

FIG. 2 .
FIG. 2. (a) Tensor network representation of the bond energy function ϵJ (si, sj) on the representative bond.(b) The kernel K is invariant under the internal symmetry g ∈ G.

FIG. 3 .
FIG. 3. (a) Fine-grained and (b) coarse-grained Lieb lattices.Shaded squares mark out the local blocks.In each block, the teacher and student RBMs are respectively defined on the (c) square and (d) cross graphs.The arrows mark the bond directions.All bonds are related by C4v point group transformations, which include the four-fold rotation C4 and the mirror reflection σ.
(a). (ii) Within each block, replace the local energy model on a square graph Fig. 3(c) with a new model on a cross graph Fig. 3(d), such that their marginal distributions on the corner spins match as closely as possible.(iii) Embed the new local energy model back into the original lattice.The new lattice Fig. 3(b) becomes a coarse-grained Lieb lattice, with lattice constant enlarged by √ 2.
representations of the C4v group.The columns shows their dimensions, Cartesian products (x-axis along the representative bond), and transformation matrices Tr for r = C4, σ (the two generators of C4v).
FIG.4.Minimal KL divergence DKL(p tch (x)∥p std (x)) (in the unit of bit) vs. the parameter J tch of the teacher RBM, for different choices of point group representations of z0 in the student RBM.

FIG. 5 .
FIG. 5. (a) RG monotone C(J0) and (b) its negative gradient −∂J 0 C(J0) (beta function), as functions of the unique coupling parameter J0 in the A1 representation.The shaded area indicates the two-standard deviation uncertainty range, estimated based on ten independently trained MLRG models.The vertical solid line marks the estimated critical point J0 * , while the exact critical point J0c is displayed as the vertical dashed line.Arrows indicate the RG flow directions.

FIG. 6 .
FIG. 6. RG flow diagram for models with A1 ⊕ E representation.J0 parametrize the Ising coupling between A1 spins.J1,2,••• parametrize the remaining coupling in Eq. (17) in their most relevant direction.The background color (and contours) indicate the RG monotone C(J).The arrows trace out the RG flow directions.The stable (unstable) fixed points are marked by black (red) dots.

FIG. 7 .
FIG. 7. Flow of the RG monotone C(J(J0, ℓ)) starting from given J0 parameters, under the representation choice of (a) A1, (b) A1 ⊕ E, and (c) 2A1 ⊕ E. ℓ labels the steps of RG flow.The vertical solid line marks the estimated critical point J0 * , while the exact critical point J0c is displayed as the vertical dashed line.

FIG. 9 .
FIG. 9. Tensor representation of the RBM energy model, such that each tensor element encodes the Boltzmann weight of a configuration of visible spins.

FIG. 10 .
FIG. 10.Flow of the ground state degeneracy Z starting from given J0 parameters, under the representation choice of (a) A1, (b) A1 ⊕ E, and (c) 2A1 ⊕ E. ℓ labels the steps of RG flow.The vertical solid line marks the estimated critical point J0 * , while the exact critical point J0c is displayed as the vertical dashed line.

FIG. 11 .
FIG. 11.Flow diagram of Newton's method iteration Eq. (24) to local RG fixed points for models with A1⊕E representation.Both stable and unstable RG fixed points are attractive under Newton's flow.The dashed curves mark the borders between different attractive basins.