
Categorical representation learning and RG flow operators for algorithmic classifiers


Published 2 February 2023 © 2023 The Author(s). Published by IOP Publishing Ltd
Citation: Artan Sheshmani et al 2023 Mach. Learn.: Sci. Technol. 4 015012, DOI: 10.1088/2632-2153/acb488


Abstract

Following the earlier formalism of categorical representation learning, we discuss the construction of the 'RG-flow-based categorifier'. Borrowing ideas from the theory of renormalization group (RG) flows in quantum field theory, holographic duality, and hyperbolic geometry, and combining them with neural ordinary differential equation techniques, we construct a new algorithmic natural language processing architecture, called the RG-flow categorifier (or RG categorifier for short), which is capable of data classification and generation in all layers. We apply our algorithmic platform to biomedical data sets and show its performance in the field of sequence-to-function mapping. In particular, we apply the RG categorifier to genomic sequences of flu viruses and show how our technology extracts the information encoded in the given sequences, finds their hidden symmetries and dominant features, classifies them, and uses the trained model to make stochastic predictions of new plausible sequences associated with new viruses that could evade the human immune system.


1. Introduction

The renormalization group (RG) [2] is a powerful and useful set of methods developed in statistical physics and quantum field theory to deal with many-body problems. It helps physicists to establish the connection between the microscopic laws of physics and the macroscopic collective behaviors of the system. It starts with a many-body system at the microscopic scale and then performs the coarse-graining iteratively to group the fundamental building blocks together into larger and larger clusters. Meanwhile, it constructs effective descriptions of the clusters at each different scale and extracts the effective interaction among them. In the end, the many-body system can be reduced to a few-body system at the highest scales, which enables the understanding of complex systems and their collective behaviors at a large scale.

This idea can be particularly useful for representation learning and classification tasks in machine learning. There are many examples of many-body systems in machine learning tasks. For instance, an image can be viewed as a system of many pixels, and a sequence can be viewed as a system of many tokens. It is desired to see whether the idea of RGs can also be applied to extract the overall representations of images and sequences from their microscopic representations.

In terms of mathematics, the existence of profound connections between quantum field theory and geometry/topology has been a source of many exciting research activities. One of them, as an example, is the interesting connection between the theory of RG flows for quantum field theories in physics and the geometric theory of Ricci flows in mathematics. The theory of Ricci flows was developed by Richard Hamilton in the 1980s [3–7]. Given a smooth manifold, M, a Riemannian metric, g, on M defines a bilinear positive-definite product on the tangent space, $T_{p}M$, for each point $p\in M$. This bilinear form is locally a 2-tensor in an open neighborhood $U\subset M$ of p, and hence has a matrix representation. One can then investigate whether infinitesimal deformations of the metric on M provide interesting information about its geometry or topology. For instance, given a 1-parameter family, $g_{t}, t\in (a,b)$, of metrics on M, one can study the variation of g with respect to the parameter t. The derivative $\displaystyle{\frac{\partial g_{t}}{\partial t}}$ then provides, for every fixed choice of t and fixed point p, a bilinear inner product form (i.e. a 2-tensor) on $T_{p}M$. It turns out that the variation of the metric in a 1-parameter family provides one with a differential equation

$\displaystyle{\frac{\partial g_{t}}{\partial t}} = -2\,\mathrm{Ric}^{g_{t}},$

where the term on the right-hand side is the Ricci curvature tensor, $\text{Ric}^{g_{t}}$, named after Gregorio Ricci-Curbastro, which measures how, for each fixed choice of gt , the geometry of the space is curved as one moves along the geodesics on the manifold M.

The connection between RG flows for nonlinear sigma models in physics and the Ricci flow for Riemannian manifolds in mathematics has been well known for a while, since the earlier work of Daniel Friedan [8], Zamolodchikov [9], and Tseytlin [10], as well as the groundbreaking work of Grigori Perelman [11] in the proof of the Poincaré conjecture, and more recently Carfora [12]. At the end of this article, in the appendix section, we briefly provide an expository account of RG flows in the context of Ricci geometry following the work of Carfora [12]. It must be noted that our focus in the current article is to implement RG flows for developing algorithmic architectures in mathematical artificial intelligence; therefore, later on, we depart from the connection to Ricci geometry and focus solely on RG networks. We encourage interested readers to study the resources provided above to gain a deeper understanding of the connections between the two frameworks in physics and mathematics.

When it comes to implementing RG flow theory in machine learning, the key challenge lies in the difficulty of constructing the coarse-graining transformation at each RG step. In physics, the RG rules are usually specified by humans, such as the majority vote in real-space RG or the momentum-shell integration in field-theoretic RG. These intuitions may not be immediately applicable to the realistic dataset of images and sequences, as the underlying coarse-graining rules may be much more complicated compared to physics systems. This calls for machine learning methods to enable the algorithm to design and optimize the RG transformation in adaptation to the given dataset. One important idea is borrowed from the holographic duality in physics, which states that the RG transformation can be viewed as a holographic mapping of a field configuration from a flat (boundary) space to a hyperbolic (bulk) space with one-higher dimension, such that the long-range correlation in the original field configuration can be equivalently represented as short-range correlation in the bulk space. So the optimal RG transformation can be defined as a bijective holographic mapping that disentangles the features at different hierarchies as much as possible. This allows us to embed the bijective holographic map in the flow-based generative model and use the unsupervised machine learning technique to train the optimal RG transformation. This idea is first proposed in [13] and further developed in later works [14, 15]. The current article further develops the machine-learning RG method by combining the RG-flow model with neural ordinary differential equation (ODE) techniques [16], and explores its application to representation learning of sequential data.

2. From conventional RG to machine-learning RG

The similarity between the RG and deep learning has been noticed in the literature [17–27]. Recently, it was realized that the conventional RG flow can be viewed as an optimal transport of the correlated data distribution towards an uncorrelated noise distribution [28]. See the appendix for the mathematical formulation of the conventional RG. However, there are several aspects that should be upgraded before the idea of RG can find useful applications in machine learning. The main differences between the conventional RG and the machine-learning RG are summarized in table 1 and discussed as follows.

Table 1. Differences between conventional RG and machine-learning RG proposed in [13, 14].

                      Conventional RG            Machine-learning RG
Base space            Smooth manifold            Discrete lattice
RG flow               Continuous                 Discrete
RG equation           Differential equation      Recurrence equation
RG fixed point        Conformal                  General
RG scheme             Human-specified (fixed)    Machine-designed (learnable)
Data driven           No                         Yes
Algebraic structure   Semigroup                  Group
Invertibility         Non-invertible             Invertible
Holographic bulk      Not available              Available
Hyperbolic geometry   Not defined                Emergent

2.1. Continuous vs. discrete

The conventional RG in the quantum field theory typically assumes that the field is defined on a smooth base manifold. However, this assumption is typically not the case for machine learning applications. For example, images are defined on discrete pixels, and texts are defined on discrete words. The discrete nature of most datasets in machine learning requires us to generalize the base manifold from continuous space to discrete lattice.

The discretization of the base manifold also forces the RG flow to be discrete because it is no longer possible to perform infinitesimal dilation on a discrete lattice. Therefore, instead of writing down a differential equation to describe the continuous RG flow, the discrete RG flow should be described by a recurrence equation. However, in the continuum limit (when the lattice spacing approaches zero), the recurrence equation should converge to the differential equation, which will be shown in section 3.11.

2.2. Semigroup vs. group

The conventional RG keeps decimating information in each step of coarse-graining. As a result, the conventional RG is not invertible and only forms a semigroup instead of a group, despite its (somewhat inaccurate) name of renormalization 'group'. Recent developments in physics [29] reveal that the RG flow can actually be viewed as a holographic mapping, which is invertible. This not only makes a profound connection from RG to quantum gravity but also promotes RG to a group.

The conventional RG studies how perturbations of the action (or deformations of the field configuration) get renormalized at larger and larger scales. The invertible RG has a completely different mindset: it aims to answer how the correlated field configurations on the holographic boundary can be disentangled into uncorrelated noise in the holographic bulk, or how the strongly-coupled quantum field theory on the holographic boundary can be reformulated as the weakly-coupled dual gravitational theory in the holographic bulk. By establishing the holographic mapping, any deformation of the field configuration on the holographic boundary can be translated into an excitation in the holographic bulk and analyzed more conveniently. Therefore, the invertible RG is a more powerful paradigm. Nevertheless, it can always fall back to the conventional RG by a forgetful map that forgets about the holographic bulk degrees of freedom.

2.3. Human vs. machine

The conventional RG scheme is designed by humans. Due to the limitation of human intelligence, the conventional RG typically assumes that the action must take a fixed form with specific types of terms, and the RG flow only changes the coefficients of these terms, such that the action can only flow within a predefined moduli space. Although the moduli space allows us to parameterize the action conveniently, it also restricts our imagination. A more general RG flow can go beyond the moduli space, as new terms can be generated under RG, and even the field content can change under RG (microscopic and macroscopic descriptions of a system can be fundamentally different, as advocated by the emergence principle). However, such a general RG scheme is not analytically tractable by humans. It is not even clear how to design the RG scheme if the form of the action and the field content are all unknown. Thus it becomes desirable to introduce artificial intelligence to learn the optimal RG scheme automatically from the big data of field configurations generated by a field theory. By learning to generate similar field configurations from independent random noise in the holographic bulk, the machine will create the optimal holographic mapping, which also specifies the optimal (invertible) RG scheme.

2.4. Conformal vs. general fixed point

Conventional RG typically assumes a conformal fixed point to start with. Given the conformal symmetry at the fixed point, the RG transformation is always taken to be the dilation operator in the conformal group, which corresponds to the rescaling of spacetime and fields together. Given the RG transformation, one can study how a perturbation (or deformation) of the field evolves under dilation. If the perturbation grows stronger/weaker at larger scales, then the perturbation is said to be relevant/irrelevant (with respect to the conformal fixed point). More quantitatively, the conformal dimensions can be defined as the eigenvalues of the dilation generator, such that relevant/irrelevant fields are simply distinguished by their positive/negative conformal dimensions. Intuitively, relevant fields are low-energy/slow-varying modes to be kept under coarse-graining, and irrelevant fields are high-energy/fast-varying modes to be decimated (or integrated out).

However, the more general machine-learning RG does not assume a conformal fixed point, because the real-world data (like images or texts) may not be scale-invariant and hence does not respect the conformal symmetry. Therefore, the dilation operator is not well-defined, and one cannot prescribe an explicit RG scheme from the beginning. The RG scheme has to be learned from data using a data-driven approach. In fact, the real-world data is more likely to be closer to Gaussian fixed points. So even if one learns the RG scheme, it is not immediately clear whether the RG transformation can be used to infer the conformal dimension, as the data could be far from any conformal fixed point.

2.5. Relevant vs. irrelevant

Therefore, the traditional idea of calculating scaling dimensions as eigenvalues of the dilation generator no longer works in more general RG approaches. We need a different way to define what is relevant and what is irrelevant. [14] proposes an elegant and universal definition of irrelevant degrees of freedom using holographic duality and information theory. The key idea is that irrelevant fields are those degrees of freedom that should be decimated under coarse-graining, so they should appear to us as random noise (i.e. independent/uncorrelated random variables). Since the irrelevant fields are actually the holographic bulk fields under the holographic duality, the above idea can also be rephrased as the statement that the holographic bulk fields are almost uncorrelated. The goal of machine-learning RG is to learn the RG transformation that automatically identifies and separates such irrelevant degrees of freedom in a field theory. We will explain this approach in more detail in section 3.5, after introducing the concrete construction of the machine-learning RG algorithm.

For now, we would like to comment that the information-theoretic definition of the irrelevant field is consistent with the conformal dimension definition in the conformal limit. A negative conformal dimension in the conformal field theory (CFT) indicates that the field correlation decays exponentially in the dual anti-de Sitter holographic bulk; equivalently, the holographic bulk field is short-range correlated, looking like independent random noise beyond a finite correlation length, and is therefore irrelevant in the information-theoretic sense.

3. Machine-learning RG via flow-based generative models

3.1. Sequential data and quantum field on one-dimensional lattice

The idea of RG can be used to construct novel generative models for unsupervised learning. The discussion will mainly focus on sequential data, although generalizations to images and graphs are possible. A sequence is an ordered set of objects $a = (a_1,a_2,\ldots)$, where each object $a_i\in A$ is taken from an object set A (also known as the vocabulary). In machine learning, each object ai is usually embedded as a vector φi in a finite-dimensional vector space $\mathbb{R}^n$ (assuming the dimension to be n). Denote the embedding map as $E:A\to\mathbb{R}^n$, the sequence can be represented as an ordered set of vectors $\phi = (\phi_1,\phi_2,\ldots)$, where $\phi_i = E(a_i)\in\mathbb{R}^n$.

One can also view φi as a quantum field on a one-dimensional discrete lattice, as described by the mapping $\phi:I\to\mathbb{R}^n$, where $I\subset\mathbb{N}$ denotes the index set (equipped with an ordering). Each index $i\in I$ labels an object (or its vector embedding) in the sequence and the set I describes the one-dimensional lattice. The size (cardinality) $\big|I\big|$ of the index set corresponds to the length of the sequence. Let $\mathsf{Map}(I, \mathbb{R}^n): = $ $\{\phi: I\to \mathbb{R}^n\}$ be the associated space of all maps from the index set I to the vector space $\mathbb{R}^n$. The objective of unsupervised machine learning is to model the probability measure $p(\phi)\mathcal{D}\phi$ given the dataset of sequences.
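As a toy illustration of this setup (our own sketch; the vocabulary, dimension and embedding values below are hypothetical), a sequence can be turned into a vector field $\phi\in\mathsf{Map}(I,\mathbb{R}^n)$ as follows.

```python
import numpy as np

# A minimal sketch: represent a sequence a = (a_1, a_2, ...) over a vocabulary A as a
# "vector field" phi in Map(I, R^n) via an embedding map E: A -> R^n.
vocab = ["A", "C", "D", "E"]                       # hypothetical object set A
n = 8                                              # embedding dimension (hypothetical)
rng = np.random.default_rng(0)
E = {a: rng.normal(size=n) for a in vocab}         # stand-in embedding map E: A -> R^n

def embed(sequence):
    """Map a sequence of objects to the field phi = (phi_1, phi_2, ...), shape (|I|, n)."""
    return np.stack([E[a] for a in sequence])

phi = embed(["A", "C", "C", "D"])                  # phi in Map(I, R^n) with |I| = 4
print(phi.shape)                                   # (4, 8)
```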

3.2. Conventional renormalization forms a semigroup

The conventional notion of RG transformation $\mathcal{R}:\mathsf{Map}(I,\mathbb{R}^n)\to\mathsf{Map}(I^{^{\prime}},\mathbb{R}^n)$ corresponds to a coarse-graining map that extracts the relevant (coarse-grained) field $\phi^{^{\prime}} = \mathcal{R}(\phi)$ from the original (fine-grained) field φ and discards the remaining (irrelevant) field degrees of freedom. The renormalization transformation always reduces the degrees of freedom. Therefore the index set will become smaller $\big|I^{^{\prime}}\big|\leqslant \big|I\big|$ under the renormalization transformation. Because of the information loss, it is no longer possible to recover the original field configuration φ from the coarse-grained configuration $\phi^{^{\prime}}$. Therefore the renormalization transformation $\mathcal{R}$ is not invertible and only forms a semigroup.

3.3. Invertible renormalization forms a group

The key idea to make the renormalization transformation invertible is to keep the irrelevant field $\zeta^{^{\prime}}$ together with the relevant field $\phi^{^{\prime}}$ as the joint output of the renormalization transformation. Intuitively, the relevant/irrelevant fields are the low-/high-energy modes in the field configuration. What the renormalization transformation does is to separate the irrelevant field $\zeta^{^{\prime}}$ from the relevant field $\phi^{^{\prime}}$, given the original field φ as input. The criterion to separate irrelevant fields will be elaborated in section 3.5.

Invertible renormalization was first proposed under the name of exact holographic mapping [29], which further led to applications in flow-based generative models for unsupervised machine learning [13–15]. An invertible renormalization transformation is a bijective map $\hat{\mathcal{R}}:\mathsf{Map}(I,\mathbb{R}^n)\to\mathsf{Map}(I^{^{\prime}}, \mathbb{R}^n)\otimes\mathsf{Map}(J^{^{\prime}}, \mathbb{R}^n)$, under which the original field φ splits into the relevant field $\phi^{^{\prime}}$ and the irrelevant field $\zeta^{^{\prime}}$

Equation (1): $\hat{\mathcal{R}}(\phi) = (\phi^{\prime},\zeta^{\prime}),$

where $\phi^{^{\prime}} = (\ldots,\phi^{^{\prime}}_{i^{^{\prime}}},\ldots)_{i^{^{\prime}}\in I^{^{\prime}}}\in \mathsf{Map}(I^{^{\prime}}, \mathbb{R}^n)$ and $\zeta^{^{\prime}} = (\ldots,\zeta^{^{\prime}}_{j^{^{\prime}}},\ldots)_{j^{^{\prime}}\in J^{^{\prime}}}\in \mathsf{Map}(J^{^{\prime}}, \mathbb{R}^n)$. The bijectivity requires $\big|I^{^{\prime}}\big|+\big|J^{^{\prime}}\big| = \big|I\big|$, i.e. the numbers of relevant and irrelevant features must add up to the total number of features in the original field. The inverse renormalization transformation $\hat{\mathcal{R}}^{-1}$ will also be called the generation transformation $\hat{\mathcal{G}}$, denoted as

Equation (2): $\phi = \hat{\mathcal{G}}(\phi^{\prime},\zeta^{\prime}) := \hat{\mathcal{R}}^{-1}(\phi^{\prime},\zeta^{\prime}).$

As the transformation is invertible, the RG is promoted from a semigroup to a group.

3.4. RG flow

The invertible renormalization transformation enables us to define invertible RG flow on both the field configuration level and the probability measure (or the action) level.

3.4.1. RG flow on the field level

Repeating the invertible renormalization transformation, an RG flow can be defined (on the field configuration level) via the following iteration

Equation (3): $(\phi^{(k)},\zeta^{(k)}) = \hat{\mathcal{R}}^{(k)}(\phi^{(k-1)}), \qquad k = 1,2,\ldots,$

where $\phi^{(k)}\in\mathsf{Map}(I^{(k)},\mathbb{R}^n)$, $\zeta^{(k)}\in\mathsf{Map}(J^{(k)},\mathbb{R}^n)$ are the relevant and irrelevant fields, and $\hat{\mathcal{R}}^{(k)}:\mathsf{Map}(I^{(k-1)},\mathbb{R}^n)\to\mathsf{Map}(I^{(k)}, \mathbb{R}^n)\otimes\mathsf{Map}(J^{(k)}, \mathbb{R}^n)$ is the (bijective) renormalization transformation at the kth step. The condition $\big|I^{(k)}\big|+\big|J^{(k)}\big| = \big|I^{(k-1)}\big|$ is always satisfied as a necessary condition for the bijectivity. The iteration defines a flow of quantum fields, called the renormalization flow ($\mathcal{R}$-flow):

Equation (4): $\phi^{(0)}\xrightarrow{\hat{\mathcal{R}}^{(1)}}\phi^{(1)}\xrightarrow{\hat{\mathcal{R}}^{(2)}}\phi^{(2)}\xrightarrow{\hat{\mathcal{R}}^{(3)}}\cdots$, with the irrelevant field $\zeta^{(k)}$ split off at every step.

Along the $\mathcal{R}$-flow, the field configuration will be coarse-grained progressively $\phi^{(0)}\to\phi^{(1)}\to\phi^{(2)}\to\ldots$, and the relevant degrees of freedom will be reduced (as $\big|I^{(0)}\big|\geqslant\big|I^{(1)}\big|\geqslant\big|I^{(2)}\big|\geqslant\ldots$). Through this process, a sequence of irrelevant fields $\zeta^{(1)},\zeta^{(2)},\ldots$ is also produced, which was discarded in the conventional renormalization approach but kept in the invertible renormalization approach. Suppose all the relevant degrees of freedom are eliminated after K steps of the renormalization transformation (i.e. $\big|I^{(K)}\big| = 0$), the entire $\mathcal{R}$-flow corresponds to a map that encodes the original field $\phi\equiv \phi^{(0)}$ to the collection of irrelevant fields $\zeta\equiv \{\zeta^{(k)}\}_{k = 1:K}$, denoted as $\zeta = \hat{\mathcal{R}}(\phi)$.

Retaining these irrelevant fields allows the RG flow to be inverted. The inverse flow is also called the generation flow ($\mathcal{G}$-flow) that reconstructs the original field, as defined by the following inverse iteration

Equation (5): $\phi^{(k-1)} = \hat{\mathcal{G}}^{(k)}(\phi^{(k)},\zeta^{(k)}), \qquad k = K, K-1,\ldots, 1,$

or given by the dual diagram of equation (4)

Equation (6): $\cdots\xrightarrow{\hat{\mathcal{G}}^{(3)}}\phi^{(2)}\xrightarrow{\hat{\mathcal{G}}^{(2)}}\phi^{(1)}\xrightarrow{\hat{\mathcal{G}}^{(1)}}\phi^{(0)}$, with the irrelevant field $\zeta^{(k)}$ injected at every step.

The entire $\mathcal{G}$-flow corresponds to a map that decodes the irrelevant fields ζ to the original field φ, denoted as $\phi = \hat{\mathcal{G}}(\zeta)$.
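The two recursions can be summarized in a few lines of Python. The sketch below is our own illustration: a toy orthogonal mixer stands in for each trained bijector $\hat{\mathcal{R}}^{(k)}$, and the class and function names are hypothetical; the point is only that keeping every $\zeta^{(k)}$ makes the flow exactly invertible.

```python
import numpy as np

class ToyRGStep:
    """A toy invertible RG step: an orthogonal mixing followed by a half/half split.
    It only stands in for a trained bijector R^(k); the names here are hypothetical."""
    def __init__(self, size, seed=0):
        q, _ = np.linalg.qr(np.random.default_rng(seed).normal(size=(size, size)))
        self.q = q                                     # orthogonal, hence trivially invertible

    def forward(self, phi):                            # (phi', zeta') = R(phi)
        mixed = self.q @ phi
        half = len(mixed) // 2
        return mixed[:half], mixed[half:]              # relevant part, irrelevant part

    def inverse(self, phi_rel, zeta):                  # phi = G(phi', zeta') = R^{-1}(phi', zeta')
        return self.q.T @ np.concatenate([phi_rel, zeta])

def r_flow(phi0, steps):
    """Equation (3): iterate (phi_k, zeta_k) = R^(k)(phi_{k-1}), keeping every bulk field."""
    phi, zetas = phi0, []
    for step in steps:
        phi, zeta = step.forward(phi)
        zetas.append(zeta)
    return phi, zetas

def g_flow(phi_top, zetas, steps):
    """Equation (5): invert the flow, phi_{k-1} = G^(k)(phi_k, zeta_k)."""
    phi = phi_top
    for step, zeta in zip(reversed(steps), reversed(zetas)):
        phi = step.inverse(phi, zeta)
    return phi

phi0 = np.random.default_rng(1).normal(size=16)
steps = [ToyRGStep(16 // 2**k, seed=k) for k in range(4)]     # field size halves at every step
phi_top, zetas = r_flow(phi0, steps)
assert np.allclose(g_flow(phi_top, zetas, steps), phi0)       # keeping zeta makes the flow invertible
```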

3.4.2. RG flow on the probability measure (action) level

The RG flow of field $\phi\to\zeta$ induces a flow of the associated probability distribution over $\mathsf{Map}(I,\mathbb{R}^n)$. Under the bijective map between the original field φ and the irrelevant field ζ, the probability measure must remain invariant

Equation (7): $p_{\Phi}(\phi)\,\mathcal{D}\phi = p_{\mathrm{Z}}(\zeta)\,\mathcal{D}\zeta.$

Given $\zeta = \hat{\mathcal{R}}(\phi)$ and $\phi = \hat{\mathcal{G}}(\zeta)$, equation (7) implies that the probability distributions are related by

Equation (8): $p_{\Phi}(\phi) = p_{\mathrm{Z}}\big(\hat{\mathcal{R}}(\phi)\big)\,\big|\det\partial_\phi\hat{\mathcal{R}}(\phi)\big|, \qquad p_{\mathrm{Z}}(\zeta) = p_{\Phi}\big(\hat{\mathcal{G}}(\zeta)\big)\,\big|\det\partial_\zeta\hat{\mathcal{G}}(\zeta)\big|,$

where $\big|\det\partial_\phi\hat{\mathcal{R}}(\phi)\big|$ denotes the absolute value of the Jacobian determinant of the transformation $\hat{\mathcal{R}}$, and similarly for $\big|\det\partial_\zeta\hat{\mathcal{G}}(\zeta)\big|$. More specifically, in each step of the transformation, the probability measure is deformed by (along the $\mathcal{G}$-flow)

Equation (9): $p_{\Phi}^{(k-1)}\big(\phi^{(k-1)}\big) = p_{\Phi}^{(k)}\big(\phi^{(k)}\big)\; p_{\mathrm{Z}}^{(k)}\big(\zeta^{(k)}\big)\;\big|\det\partial\hat{\mathcal{G}}^{(k)}(\phi^{(k)},\zeta^{(k)})\big|^{-1}.$

In quantum field theory, the field action is defined as the negative log-likelihood of the field configuration, i.e. $S_{\Phi}^{(k)} = -\log p_{\Phi}^{(k)}$ and $S_{\mathrm{Z}}^{(k)} = -\log p_{\mathrm{Z}}^{(k)}$. In terms of the field action, the transformation relates

Equation (10): $S_{\Phi}^{(k-1)}\big(\phi^{(k-1)}\big) = S_{\Phi}^{(k)}\big(\phi^{(k)}\big) + S_{\mathrm{Z}}^{(k)}\big(\zeta^{(k)}\big) + S^{(k)}_{\Phi\mathrm{Z}}\big(\phi^{(k)},\zeta^{(k)}\big),$

where the coupling action $S^{(k)}_{\Phi\mathrm{Z}}(\phi^{(k)},\zeta^{(k)})$ is defined to be the log Jacobian determinant of the $\hat{\mathcal{G}}^{(k)}$ transformation,

Equation (11): $S^{(k)}_{\Phi\mathrm{Z}}\big(\phi^{(k)},\zeta^{(k)}\big) = \log\big|\det\partial\hat{\mathcal{G}}^{(k)}(\phi^{(k)},\zeta^{(k)})\big|.$

Therefore the renormalization transformation $\hat{\mathcal{R}}$ of the relevant field $\phi^{(k)}$ induces the deformation $\bar{\mathcal{G}}$ of the relevant field action $S_{\Phi}^{(k)}(\phi^{(k)})$ along the generative direction

Equation (12): $S_{\Phi}^{(k)}\big(\phi^{(k)}\big)\;\xrightarrow{\;\bar{\mathcal{G}}^{(k)}\;}\;S_{\Phi}^{(k-1)}\big(\phi^{(k-1)}\big).$

In this way, the renormalization flow of the action $\bar{\mathcal{R}}: = \bar{\mathcal{G}}^{-1}$ is defined as the pullback of the renormalization flow $\hat{\mathcal{R}}$ of the field. The RG transformation is invertible on both the field and the action level, making the RG literally a group.

3.5. Criterion to separate irrelevant fields

What has not been explained so far is the criterion to separate relevant fields from irrelevant fields. [14] argues that the irrelevant field should look like independent random variables (or random maps), because the irrelevant fields are supposed to be discarded under the conventional RG flow, meaning that (in the ideal limit) they do not contain information and should appear like random noise. Guided by this intuition, [14] further proposes the minimal bulk mutual information (minBMI) principle as the designing principle of renormalization flow, that the optimal renormalization transformations $\{\hat{\mathcal{R}}^{(k)}\}_{k = 1:K}$ should be defined as the maps that minimize the mutual information among all irrelevant fields

Equation (13): $\{\hat{\mathcal{R}}^{(k)}\}_{k = 1:K} = \operatorname*{arg\,min}\; I\big[\{\zeta^{(k)}_{j}\}\big],$ where $I[\cdot]$ denotes the total mutual information among the irrelevant field variables.

The minimum is achieved when the irrelevant fields are statistically independent, i.e.

Equation (14): $p_{\mathrm{Z}}(\zeta) = \prod_{k = 1}^{K}\prod_{j\in J^{(k)}} p_{\mathrm{Z}}\big(\zeta^{(k)}_{j}\big),$

such that all mutual information vanishes.

The optimal solution of $\hat{\mathcal{R}}$ that converges to this limit can be found using machine learning approaches by constructing a trainable bijective map $\hat{\mathcal{G}}: = \hat{\mathcal{R}}^{-1}$ (as the composition of smaller bijective maps $\hat{\mathcal{G}}^{(k)}$ at each RG step) to reproduce the data distribution $p_{\Phi}(\phi)$ starting from the independent prior distribution $p_{\mathrm{Z}}(\zeta)$ in equation (14). The related methods were developed in [13–15] under the name of neural-RG. A conventional choice is to take each $p_{\mathrm{Z}}(\zeta^{(k)}_{j}) = (2\pi)^{-n/2}\exp(-\frac{1}{2}\Vert\zeta^{(k)}_{j}\Vert^2)$ to be the standard normal distribution (Gaussian with zero mean and unit variance), such that

Equation (15): $S_{\mathrm{Z}}^{(k)}\big(\zeta^{(k)}\big) = \frac{1}{2}\big\Vert\zeta^{(k)}\big\Vert^{2} + \mathrm{const}.$

This action describes massive fluctuations of the irrelevant field in the holographic bulk, which is compatible with the idea of holographic duality.

3.6. Hierarchical structure and hyperbolic space

As the renormalization transformation reduces the relevant degrees of freedom, the size of the relevant index set gradually shrinks, $\big|I^{(k)}\big|\leqslant \big|I^{(k-1)}\big|$. To be more concrete, we can restrict our discussion to the case where the degrees of freedom are reduced by half under each renormalization transformation, i.e. $\big|I^{(k)}\big| = \big|I^{(k-1)}\big|/2$, such that

Equation (16): $\big|I^{(k)}\big| = 2^{-k}\big|I^{(0)}\big|.$

Then the condition $\big|I^{(k)}\big|+\big|J^{(k)}\big| = \big|I^{(k-1)}\big|$ implies $\big|J^{(k)}\big| = 2^{-k}\big|I^{(0)}\big|$. The RG flow will stop when $\big|I^{(K)}\big| < 1$, which sets the total number K of RG steps to be

Equation (17): $K = \log_{2}\big|I^{(0)}\big|.$

As illustrated in figure 1, the hierarchical structure of the RG flow generates an ordered collection of index sets $\{J^{(k)}\}_{k = 1:K}$, whose union forms a hyperbolic lattice (a discrete hyperbolic space), described by

Equation (18): $J = \bigcup_{k = 1}^{K} J^{(k)}.$


Figure 1. Hierarchical structure of the RG flow. The renormalization/generation flows can be viewed as the encoding/decoding maps between the original field in the flat space (holographic boundary) and the irrelevant field in the hyperbolic space (holographic bulk).


Instead of thinking of the irrelevant fields as separate mappings $\zeta^{(k)}\in\mathsf{Map}(J^{(k)},\mathbb{R}^n)$, we can treat them jointly as a field $\zeta\in\mathsf{Map}(J,\mathbb{R}^n)$ defined on the hyperbolic lattice J. Therefore, the $\mathcal{R}$-flow $\zeta = \hat{\mathcal{R}}(\phi)$ and the $\mathcal{G}$-flow $\phi = \hat{\mathcal{G}}(\zeta)$ respectively define the encoding and decoding maps that connect the field φ in one-dimensional flat space to the field ζ in two-dimensional hyperbolic space, which explicitly realize the holographic duality in quantum gravity.
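For concreteness, the bookkeeping of the halving scheme in equations (16)–(18) can be spelled out in a few lines of Python (a minimal sketch of ours, assuming the sequence length is a power of two).

```python
import math

# Minimal bookkeeping for the halving scheme (assumption: |I^(0)| is a power of two):
# |I^(k)| = |I^(k-1)| / 2 fixes |J^(k)| = 2^{-k} |I^(0)| and K = log2 |I^(0)|.
I0 = 16                                          # |I^(0)|: length of the boundary sequence
K = int(math.log2(I0))                           # total number of RG steps, equation (17)
I_sizes = [I0 // 2**k for k in range(K + 1)]     # relevant index-set sizes |I^(k)|
J_sizes = [I0 // 2**k for k in range(1, K + 1)]  # bulk layer sizes |J^(k)|, equation (16)
print(K, I_sizes, J_sizes)                       # 4 [16, 8, 4, 2, 1] [8, 4, 2, 1]
print(sum(J_sizes))                              # 15: sites of the hyperbolic lattice J (union of the J^(k))
```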

3.7. Realization of bijective transformation

To optimize the renormalization transformation $\hat{\mathcal{R}}$, one relies on the construction of a trainable bijective map to model $\hat{\mathcal{R}}$. The machine learning community has provided several realizations of trainable bijective maps, including real-valued non-volume preserving (real NVP) transformations [30] and neural ordinary differential equations (ODEs) [16]. In the following, we will focus on the neural ODE realization, as it captures multi-modal features better than real NVP, which makes it more suitable for processing sequences of discrete objects.

3.7.1. Neural ODE

Each single-step renormalization transformation $(\phi^{^{\prime}},\zeta^{^{\prime}}) = \hat{\mathcal{R}}(\phi)$ can be realized by an ODE. Starting from $\phi(0) = \phi$, first evolve $\phi(t)$ from t = 0 to t = 1 following

Equation (19): $\frac{\mathrm{d}\phi(t)}{\mathrm{d}t} = f_{\theta}\big(\phi(t), t\big),$

where fθ is a trainable function (realized as a neural network) parameterized by the neural network parameters θ. Then split the result as $\phi(1) = (\phi^{^{\prime}},\zeta^{^{\prime}})$ to obtain $\phi^{^{\prime}}$ and $\zeta^{^{\prime}}$. Here, t is considered as an auxiliary time. The inverse transformation is given by the time-reversed evolution; therefore, the mapping is indeed bijective, as desired.

Apart from the transformation, the log Jacobian determinant of $\hat{\mathcal{R}}$ can also be evaluated. Based on the ODE in equation (19), one has

Equation (20): $\frac{\mathrm{d}}{\mathrm{d}t}\log\big|\det\partial_{\phi(0)}\phi(t)\big| = \operatorname{tr}\Big(\partial_{\phi(t)}f_{\theta}\big(\phi(t),t\big)\Big),$

which can be integrated to

Equation (21): $\log\big|\det\partial_{\phi}\hat{\mathcal{R}}(\phi)\big| = \int_{0}^{1}\operatorname{tr}\Big(\partial_{\phi(t)}f_{\theta}\big(\phi(t),t\big)\Big)\,\mathrm{d}t.$

Given that $\hat{\mathcal{G}}: = \hat{\mathcal{R}}^{-1}$, its log Jacobian determinant is simply given by a negation,

Equation (22): $\log\big|\det\partial_{(\phi^{\prime},\zeta^{\prime})}\hat{\mathcal{G}}(\phi^{\prime},\zeta^{\prime})\big| = -\log\big|\det\partial_{\phi}\hat{\mathcal{R}}(\phi)\big| = -\int_{0}^{1}\operatorname{tr}\Big(\partial_{\phi(t)}f_{\theta}\big(\phi(t),t\big)\Big)\,\mathrm{d}t,$

which will be useful for the evaluation of the coupling action in equation (11).
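A minimal PyTorch sketch of one such bijective step is given below (our own illustration, not the authors' implementation): it integrates equation (19) with a fixed-step forward-Euler scheme, accumulates the Jacobian trace of equations (20) and (21) along the way, and finally splits the result into relevant and irrelevant parts; the network size, step count and integrator are assumptions.

```python
import torch

# A minimal sketch of one invertible neural-ODE step (single sample, fixed-step Euler
# integration, exact Jacobian trace); sizes and step counts are arbitrary choices.
torch.manual_seed(0)
n = 4                                                  # field dimension at this RG step
f_theta = torch.nn.Sequential(                         # trainable ODE function f_theta(phi, t)
    torch.nn.Linear(n + 1, 32), torch.nn.Tanh(), torch.nn.Linear(32, n))

def velocity(phi, t):
    return f_theta(torch.cat([phi, t.view(1)]))

def rg_step(phi, n_steps=20):
    """Integrate d phi/dt = f_theta(phi, t) over t in [0, 1]; return (phi', zeta') and log|det dR/dphi|."""
    dt, logdet = 1.0 / n_steps, torch.zeros(())
    for i in range(n_steps):
        t = torch.tensor(i * dt)
        jac = torch.autograd.functional.jacobian(lambda x: velocity(x, t), phi)
        logdet = logdet + torch.trace(jac) * dt        # d/dt log|det| = tr(d f/d phi), equation (20)
        phi = phi + velocity(phi, t) * dt              # forward-Euler update of equation (19)
    half = n // 2
    return phi[:half], phi[half:], logdet              # split phi(1) into (phi', zeta')

phi_rel, zeta, logdet = rg_step(torch.randn(n))
print(phi_rel.shape, zeta.shape, float(logdet))
```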

3.7.2. Locality and translational symmetry

It is possible to design the ODE function fθ to respect locality and translational symmetry. The idea is to realize fθ using layers of convolutional neural networks with finite kernel followed by element-wise activations.
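A sketch of this construction (an assumption on the exact architecture) is an ODE function built from 1D convolutions with a finite kernel, which makes the induced flow local and translation equivariant along the sequence.

```python
import torch

# A sketch (not the authors' architecture) of an ODE function f_theta made of 1D convolutions
# with a finite kernel, so the induced flow is local and translation equivariant along the chain.
class ConvODEFunc(torch.nn.Module):
    def __init__(self, channels=20, hidden=64, kernel=3):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv1d(channels + 1, hidden, kernel, padding=kernel // 2),
            torch.nn.Tanh(),
            torch.nn.Conv1d(hidden, channels, kernel, padding=kernel // 2))

    def forward(self, t, phi):                            # phi: (batch, channels, length)
        t_channel = t.reshape(1, 1, 1).expand(phi.shape[0], 1, phi.shape[-1])
        return self.net(torch.cat([phi, t_channel], dim=1))  # append the auxiliary time as a channel

func = ConvODEFunc()
phi = torch.randn(2, 20, 32)                              # two sequences, n = 20 channels, length 32
print(func(torch.tensor(0.5), phi).shape)                 # torch.Size([2, 20, 32])
```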

3.8. Objective function

The objective is to train the generative model, such that the model distribution $p_{\Phi}(\phi)$ matches the data distribution $p_\text{dat}(\phi)$ as much as possible. The objective can be achieved by minimizing the Kullback–Leibler (KL) divergence

Equation (23): $\mathcal{L} = D_{\mathrm{KL}}\big(p_\text{dat}(\phi)\,\big\Vert\, p_{\Phi}(\phi)\big) = \mathop{\mathbb{E}}_{\phi\sim p_\text{dat}}S_{\Phi}(\phi) - H(p_\text{dat}),$

where $S_{\Phi}(\phi) = -\log p_{\Phi}(\phi)$ is the model action (as the negative log-likelihood), and $H(p_\text{dat}) = -\int\mathcal{D}\phi\; p_\text{dat}(\phi)\log p_\text{dat}(\phi)$ is the data entropy. As the data entropy $H(p_\text{dat})$ is independent of the model parameters, it can be dropped from the loss function $\mathcal{L}$. Therefore, the loss function is essentially the ensemble average of the model action on the dataset. By minimizing the average action, the ODE function fθ in each RG transformation will get trained. Upon convergence, the algorithm will find the optimal invertible RG flow that maps the (presumably) strongly coupled original field φ on the holographic boundary to the weakly coupled irrelevant field ζ in the holographic bulk.

3.9. Summary of the algorithm

Given a set of sequences from the data, the learning algorithm goes as follows (a toy code sketch is given after the list).

  • (a)  
    For each given sequence $a = (a_1,a_2,\ldots)$, represent each object ai in the sequence as a vector $\phi_i = E(a_i)\in\mathbb{R}^n$. Denote the sequence of vectors as a vector field $\phi = (\phi_1,\phi_2,\ldots)\in\mathsf{Map}(I,\mathbb{R}^n)$.
  • (b)  
    Starting with $\phi^{(0)} = \phi$, apply the renormalization transformation $(\phi^{(k)},\zeta^{(k)}) = \hat{\mathcal{R}}^{(k)}(\phi^{(k-1)})$ iteratively,
    for $K = \log_2\big|I\big|$ steps (until all relevant fields are eliminated).
    • 1.  
      Each step of the transformation is implemented by solving an ODE, $\frac{\mathrm{d}\phi^{(k-1)}(t)}{\mathrm{d}t} = f_{\theta}\big(\phi^{(k-1)}(t), t\big)$ as in equation (19),
      starting from the initial condition $\phi^{(k-1)}(0) = \phi^{(k-1)}$, integrating from t = 0 to t = 1, and then splitting the final result into $\phi^{(k-1)}(1) = (\phi^{(k)},\zeta^{(k)})$.
    • 2.  
      While solving the ODE, simultaneously integrate along the time evolution to obtain the coupling action $S^{(k)}_{\Phi\mathrm{Z}} = -\int_{0}^{1}\operatorname{tr}\big(\partial_{\phi^{(k-1)}(t)}f_{\theta}(\phi^{(k-1)}(t),t)\big)\,\mathrm{d}t$ (following equations (11) and (22)).
  • (c)  
    Starting from the initial condition $S_{\Phi}^{(K)} = 0$, collect the action in the reverse order (along the generation flow), $S_{\Phi}^{(k-1)}(\phi^{(k-1)}) = S_{\Phi}^{(k)}(\phi^{(k)}) + S_{\mathrm{Z}}^{(k)}(\zeta^{(k)}) + S^{(k)}_{\Phi\mathrm{Z}}(\phi^{(k)},\zeta^{(k)})$ as in equation (10),
    where $S_{\mathrm{Z}}^{(k)}(\zeta^{(k)})$ is given by the Gaussian action $S_{\mathrm{Z}}^{(k)}(\zeta^{(k)}) = \frac{1}{2}\big\Vert\zeta^{(k)}\big\Vert^{2} + \mathrm{const}$ of equation (15).
    The resulting total action will be denoted as $S_{\Phi}(\phi): = S_{\Phi}^{(0)}(\phi^{(0)})$.
  • (d)  
    Train the model to minimize the loss function $\mathcal{L} = \mathop{\mathbb{E}}_{\phi\sim p_\text{dat}}S_{\Phi}(\phi)$, i.e. the average action over the dataset (cf equation (23)).
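The toy sketch below strings steps (a)–(d) together in PyTorch under simplifying assumptions of ours (flattened fields, forward-Euler integration, exact Jacobian traces computed per sample, and a Gaussian prior placed on the small leftover relevant field); it illustrates the bookkeeping of the action, not an efficient or faithful reimplementation.

```python
import math
import torch

# A self-contained toy of steps (a)-(d): flattened fields, forward-Euler integration, exact
# Jacobian traces, one sample at a time. It illustrates the bookkeeping of the action only.
torch.manual_seed(0)
LENGTH, DIM = 8, 2                                    # |I| = 8 tokens, n = 2 embedding dims
K = int(math.log2(LENGTH))                            # number of RG steps

class ODEStep(torch.nn.Module):
    """One invertible RG step acting on a flattened field of the given size."""
    def __init__(self, size, hidden=32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(size + 1, hidden), torch.nn.Tanh(), torch.nn.Linear(hidden, size))

    def forward(self, phi, n_steps=4):                # returns (phi', zeta', log|det dR/dphi|)
        dt, logdet = 1.0 / n_steps, phi.new_zeros(())
        for i in range(n_steps):
            t = phi.new_tensor(i * dt)
            vel = lambda x: self.net(torch.cat([x, t.view(1)]))
            jac = torch.autograd.functional.jacobian(vel, phi, create_graph=True)
            logdet = logdet + torch.trace(jac) * dt   # accumulate tr(d f/d phi), equation (21)
            phi = phi + vel(phi) * dt                 # forward-Euler update of equation (19)
        half = phi.numel() // 2
        return phi[:half], phi[half:], logdet         # split into relevant and irrelevant parts

steps = torch.nn.ModuleList([ODEStep(LENGTH * DIM // 2**k) for k in range(K)])

def total_action(phi):
    """S_Phi(phi): sum the Gaussian bulk actions and coupling actions over all RG steps."""
    action = phi.new_zeros(())
    for step in steps:
        phi, zeta, logdet = step(phi)
        action = action + 0.5 * zeta.pow(2).sum() - logdet   # S_Z^(k) + S_{Phi Z}^(k)
    return action + 0.5 * phi.pow(2).sum()            # simplification: Gaussian prior on the leftover field

optimizer = torch.optim.Adam(steps.parameters(), lr=1e-3)
data = torch.randn(4, LENGTH * DIM)                   # stand-in for embedded sequences
for epoch in range(2):                                # minimize the average action, L = E[S_Phi]
    loss = torch.stack([total_action(phi) for phi in data]).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(float(loss))
```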

3.10. Potential applications and advantages

After training, the model could potentially be used for the following tasks:

  • Inference of hierarchical latent representation. Using $\zeta = \hat{\mathcal{R}}(\phi)$, one can infer the hierarchical latent representation ζ of any sequence encoding φ. The high-level representations ($\zeta^{(k)}$ with a large k) can be viewed as the encoding of the entire sequence, which can be used in downstream tasks like classification and translation.
  • Likelihood estimation. Using $S_{\Phi}(\phi)$, one can estimate the probability density $p_{\Phi}(\phi)\propto \exp(-S_{\Phi}(\phi))$ for any field configuration φ. This will be useful for anomaly detection.
  • Sample generation. As a generative model, new samples can be generated by first sampling ζ in the hyperbolic space and then transforming to $\phi = \hat{\mathcal{G}}(\zeta)$ using the generation flow, which may find applications in completing missing objects in a sequence.

The proposed algorithm is advantageous in the following aspects.

  • Disentangled features in scales. The optimal RG flow distills features at different scales, allowing the model to capture the long-range and multi-scale correlation in the sequential data. The features are automatically arranged in a hyperbolic space, making them easy to access and control.
  • Efficient inference/generation. The hierarchical and iterative approach enables the model to infer latent fields or generate original fields in $\Theta(N)$ complexity (given the sequence length N), which is superior compared to the $\Theta(N^2)$ complexity of transformer-based approaches, especially when the sequence is long.
  • Ability to process hierarchical structure. The renormalization transformation can progressively extract coarse-grained features from fine-grained features, making it capable of capturing global features (such as the parity of bit strings). In comparison, as shown in [31], self-attention-based models can not efficiently model hierarchical structures unless the number of layers/heads increases with sequence length.

3.11. Recovering conventional RG by integrating out irrelevant fields

Finally, we would like to comment that the invertible renormalization can fall back to the conventional renormalization by integrating out irrelevant fields. Recall equation (3) that in each step of the invertible renormalization transformation, the original (fine-grained) field $\phi^{(k-1)}$ is separated into the relevant $\phi^{(k)}$ and irrelevant $\zeta^{(k)}$ fields by $(\phi^{(k)},\zeta^{(k)}) = \hat{\mathcal{R}}^{(k)}(\phi^{(k-1)})$. The invertible renormalization $\hat{\mathcal{R}}^{(k)}$ can be downgraded to a non-invertible renormalization $\mathcal{R}^{(k)}$ by a forgetful map which forgets about the irrelevant field $\zeta^{(k)}$, such that $\phi^{(k)} = \mathcal{R}^{(k)}(\phi^{(k-1)})$ only transforms the fine-grained field $\phi^{(k-1)}$ to the coarse-grained field $\phi^{(k)}$.

According to equation (10), the actions are related by

Equation (24): $S_{\Phi}^{(k-1)}\big(\phi^{(k-1)}\big) = S_{\Phi}^{(k)}\big(\phi^{(k)}\big) + \frac{1}{2}\big\Vert\zeta^{(k)}\big\Vert^{2} + S^{(k)}_{\Phi\mathrm{Z}}\big(\phi^{(k)},\zeta^{(k)}\big),$

where the irrelevant field $\zeta^{(k)}$ is massive, and is described by the Gaussian action $S_{\mathrm{Z}}^{(k)}(\zeta^{(k)}) = \frac{1}{2}\Vert\zeta^{(k)}\Vert^2$ as in equation (15). Because $\zeta^{(k)}$ represents the high-energy modes that should be integrated out under renormalization, one can argue that the fluctuation of $\zeta^{(k)}$ can be treated perturbatively due to its large mass, which justifies the expansion of the action around $\zeta^{(k)}\to 0$,

Equation (25): $S_{\Phi}^{(k-1)}\big(\phi^{(k-1)}\big) \approx S_{\Phi}^{(k)}\big(\phi^{(k)}\big) + \frac{1}{2}\big\Vert\zeta^{(k)}\big\Vert^{2} + S^{(k)}_{\Phi\mathrm{Z}}\big(\phi^{(k)},0\big) + \zeta^{(k)}\cdot\partial_{\zeta}S^{(k)}_{\Phi\mathrm{Z}}\big(\phi^{(k)},0\big) + \frac{1}{2}\,\zeta^{(k)\top}\,\partial^{2}_{\zeta}S^{(k)}_{\Phi\mathrm{Z}}\big(\phi^{(k)},0\big)\,\zeta^{(k)}.$

As the approximate action is quadratic in $\zeta^{(k)}$, one can perform a Gaussian integration for $\zeta^{(k)}$, under which the action becomes

Equation (26)

Therefore one can define the renormalization transformation $\bar{\mathcal{R}}^{(k)}$ on the action via $S_{\Phi}^{(k)} = \bar{\mathcal{R}}^{(k)}(S_{\Phi}^{(k-1)})$, in correspondence to the field renormalization $\phi^{(k)} = \mathcal{R}^{(k)}(\phi^{(k-1)})$. Based on equation (26), the explicit form of the renormalization operator $\bar{\mathcal{R}}$ can be given

Equation (27)

such that

Equation (28)

which reproduces the pullback construction of the action renormalization. If one further defines the infinitesimal generator of $\bar{\mathcal{R}}$ as $\bar{\mathfrak{r}} = \log\bar{\mathcal{R}}$, the renormalization flow can be expressed as a differential equation [2, 32] $\partial_{\ell}S_{\Phi} = \bar{\mathfrak{r}}S_{\Phi} = -S_{\Phi\mathrm{Z}}-\frac{1}{2}(\partial_{\zeta}^2 S_{\Phi\mathrm{Z}}-(\partial_{\zeta}S_{\Phi\mathrm{Z}})^2).$

4. Experiments on genomic sequences

4.1. Problem overview

Extracting the hidden information of genomic sequences has been a critical subject in biological research, with relevance to epidemiology, immunology, protein design and many other subfields. Given its great similarity to natural language processing problems, there are numerous studies on applying machine learning techniques to extract information from genomic sequences. The machine learning architectures used include word2vec [33, 34], bidirectional long short-term memory [35], transformers [36], etc. However, while the existing algorithms can provide a single-gene-level embedding, they do not provide a canonical sequence embedding, and the hierarchical information is not transparent in these natural language models. Thus we apply the RG idea from the previous sections to the genomic sequence representation problem, where the hierarchical structure provides the biological information at different energy levels, i.e. the deeper layers capture the longer-range correlations in the sequence and thus provide a canonical embedding of the sequence at the deepest layer. We take the Influenza HA amino acid sequences as an example, where the sequences are regarded as one-dimensional lattices as described in section 3.1. Samples of the sequence data are shown in figure 2; there are clear global features embedded, as one can see from the similarities between different sequences.

4.2. Single amino acid distribution learning

Before feeding the sequences into the RG scheme, we need to verify that local features are efficiently learned. Thus we first look at learning the single amino acid distribution. As shown in figure 2, at a fixed location i among the sequences there is a discrete distribution over amino acids; we pick the amino acid at that location from each sequence. Each sample is then labeled by $a = (a_i)$, where $a_i\in A$ represents the single amino acid. We apply the pre-trained single-amino-acid level embedding $E: A\rightarrow \mathbb{R}^n$ from [35], where n = 20. After embedding, the boundary field is $\phi = (\phi_i)$, where $\phi_i = E(a_i)\in \mathbb{R}^n$. To remove the difficulty in transforming the discrete boundary distribution into the uncorrelated continuous Gaussian distribution in the bulk, we add small randomness to the boundary field, i.e. $\overline{\phi_i} = \phi_i + \epsilon$, where ε is drawn from a zero-mean normal distribution with small variance. For simplicity, in the following, we still use φi to denote the fields with this small added randomness.
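A small sketch of this dequantization step is shown below (our own illustration): the pretrained embedding of [35] is replaced by a random stand-in, and the noise scale sigma is a hypothetical choice.

```python
import numpy as np

# A small sketch of the dequantization trick: add low-variance Gaussian noise to the discrete
# amino-acid embeddings so the boundary distribution becomes continuous. The pretrained
# embedding of [35] is replaced here by a random stand-in; sigma is a hypothetical choice.
amino_acids = list("ACDEFGHIKLMNPQRSTVWY")                 # the 20 standard amino acids
n, sigma = 20, 0.05
rng = np.random.default_rng(0)
E = {a: rng.normal(size=n) for a in amino_acids}           # stand-in for the pretrained E: A -> R^20

def embed_with_noise(a_i):
    """Return phi_i = E(a_i) + eps with eps ~ N(0, sigma^2 I)."""
    return E[a_i] + sigma * rng.normal(size=n)

phi_4 = embed_with_noise("K")                              # e.g. the 4th amino acid of a sequence
print(phi_4.shape)                                         # (20,)
```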


Figure 2. Sample Influenza HA amino acid sequences.


With this setup, we train a neural ODE model to realize the bijective transformation between the data distribution and a Gaussian distribution, as described in section 3.7.1. To further speed up the training process, we have added the Jacobian and kinetic regularizations of [37] to find an optimal bijective map. The input data are the 4th amino acid from 1000 Influenza HA sequences.

The ODE transformation $f_\theta(x,t)$ is constructed by a feed-forward network with four sequential hidden layers as shown in figure 3. The hidden layers are the concatsquash (CS) layers defined in [38]:

Equation (29): $f_{\mathrm{CS}}(x,t) = \big(W_{0}x + b_{0}\big)\odot\sigma\big(W_{1}t + b_{1}\big) + W_{2}t,$ with σ the sigmoid function,

where W0, W1 and W2 are parameter matrices with shapes $(d_{o},d_{i})$, $(d_{o},1)$ and $(d_{o},1)$ respectively, and di and do are the input and output dimensions. We can consider x as a di-dimensional vector and concatenate the time variable, so that $f_{\mathrm {CS}}: \mathbb{R}^{d_i+1}\rightarrow \mathbb{R}^{d_o}$. Hyperbolic tangent activation functions are applied after the first three CS layers. The model is then trained such that $f_{\theta}(x,t)$ describes the flow from the data to a Gaussian variable.
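For illustration, a concatsquash-style layer can be implemented as below; this follows a common variant of the CS layer, and the exact gating of equation (29) in [38] may differ (an assumption on our part).

```python
import torch

# A sketch of a concatsquash-style layer in the spirit of [38]; the exact gating of
# equation (29) may differ from this common variant (an assumption on our part).
class ConcatSquash(torch.nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin_x = torch.nn.Linear(d_in, d_out)             # W_0 x + b_0
        self.gate_t = torch.nn.Linear(1, d_out)               # sigmoid(W_1 t + b_1), a time-dependent gate
        self.bias_t = torch.nn.Linear(1, d_out, bias=False)   # W_2 t, a time-dependent bias

    def forward(self, x, t):                                  # x: (batch, d_in), t: scalar tensor
        t = t.view(1, 1)
        return self.lin_x(x) * torch.sigmoid(self.gate_t(t)) + self.bias_t(t)

layer = ConcatSquash(20, 64)
out = layer(torch.randn(4, 20), torch.tensor(0.3))
print(out.shape)                                              # torch.Size([4, 64])
```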


Figure 3. Neural ODE structure. The input is the single amino acid vector representation $\phi\in \mathbb{R}^n$ with n = 20. The hidden layers are concatsquash (CS) layers. The latent variable has dimension 64, and the output has dimension n = 20.


As shown in figure 4, a two-dimensional feature space can be obtained by applying the t-distributed stochastic neighbor embedding (t-SNE) algorithm [39] to the original data and the flow-generated data embedding vectors. The flow-generated data are obtained by taking the inverse transformation $\hat{\mathcal{R}}^{-1}$ of vectors drawn from the Gaussian distribution. The original data distribution with its multi-modal features can be perfectly captured after training.
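The comparison underlying figure 4 can be reproduced schematically with scikit-learn's t-SNE, as sketched below with random stand-ins for the original and flow-generated embedding vectors.

```python
import numpy as np
from sklearn.manifold import TSNE

# A sketch of the visual check in figure 4 (with random stand-ins for the real and generated
# embedding vectors): run t-SNE jointly on both sets and compare the two point clouds.
rng = np.random.default_rng(0)
original = rng.normal(size=(500, 20)) + rng.choice([-3.0, 3.0], size=(500, 1))  # multi-modal stand-in
generated = rng.normal(size=(500, 20))                                          # stand-in for G(zeta) samples

joint = np.vstack([original, generated])
embedded = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(joint)
orig_2d, gen_2d = embedded[:500], embedded[500:]     # overlay these two clouds to compare distributions
print(orig_2d.shape, gen_2d.shape)
```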


Figure 4. Training result on a single amino acid with t-SNE representation. (A): The original data distribution (red) has a multi-modal structure, while the initial flow model's distribution (blue) is standard normal. (B): After training with neural ODE, the two distributions coincide.


4.3. Amino acid sequence distribution learning

To train on sequences, we adopt the hierarchical RG scheme described in section 3.6. As discussed in the previous section, the input sequence is represented as labels $a = (a_1,a_2,\ldots, a_I)$, with the cardinality I denoting the length of the sequence. With the pretrained embedding $\phi_i = E(a_i)\in \mathbb{R}^n$, the boundary fields are represented as $\phi = (\phi_1,\phi_2,\ldots, \phi_I)$. Thus we have the initial boundary fields $\phi^{(0)} = \phi$, and we can run the renormalization flow using equation (4). On the other hand, the generation flow of equation (6) reconstructs the original field. Following the notation of multi-scale entanglement renormalization ansatz (MERA) networks, each renormalization transformation layer consists of a disentangler layer and a decimator layer:

Equation (30)

where a disentangler layer disentangles the local correlations, and a decimator layer separates the decimated fields out as the bulk fields. In figure 5, we illustrate the model structure with green blocks as disentanglers and yellow blocks as decimators. Each block is a bijective transformation with the neural ODE structure as in figure 3. We can further explicitly write down the transformation equations: the covering length of a disentangler or a decimator is defined as the kernel length l. Then there are $\frac{I}{2^k l}$ blocks in the kth layer. For the mth block in the kth layer, where $m\in \lbrace 0,\ldots,\frac{I}{2^k l} -1\rbrace$, the transformation is given by

Equation (31)


Figure 5. An illustration of the MERA structure with kernel l = 2. Green blocks are disentangler blocks, and yellow blocks are decimator blocks. After each decimator layer, half of the fields are redefined as bulk fields ζ.


Half of the resulting fields are redefined as the bulk fields: $\zeta^{(k+1)}_{2^{k+1}(ml+a)}: = \phi^{(k+1)}_{2^{k+1}(ml+a)}$. Here we have chosen a scheme in which, after each layer, every other remaining field is redefined as a new bulk field. Since there are position-dependent features among the sequences, we take independent block transformations, labeled by both the layer index k and the block index m, in order to respect the local features of the sequence. With this setup, we train on the objective $\mathcal{L} = \mathop{\mathbb{E}}_{\phi\sim p_\text{dat}}S_{\Phi}(\phi)$ as described in section 3.9.
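The sketch below illustrates one such layer with kernel l = 2 (our own toy, with orthogonal two-site blocks standing in for the trained neural-ODE blocks; the exact offsets and index conventions of equation (31) are assumptions): a disentangler row acts on shifted pairs, a decimator row acts on aligned pairs, and every other field is then redefined as a bulk field.

```python
import numpy as np

# A schematic toy of one RG layer with kernel l = 2: a disentangler row on shifted pairs,
# then a decimator row on aligned pairs, after which every other field is sent to the bulk.
# Toy orthogonal 2-site blocks stand in for the trained neural-ODE blocks; offsets/index
# conventions of equation (31) are assumptions on our part.
def toy_block(seed, n):
    q, _ = np.linalg.qr(np.random.default_rng(seed).normal(size=(2 * n, 2 * n)))
    return q                                        # an invertible (orthogonal) pairwise mixer

def rg_layer(phi, n, seed=0):
    """phi: array of shape (L, n). Returns (relevant fields (L/2, n), bulk fields (L/2, n))."""
    L = len(phi)
    phi = phi.copy()
    for m in range(L // 2):                         # disentangler row (offset by one site, periodic)
        i, j = (2 * m + 1) % L, (2 * m + 2) % L
        mixed = toy_block(seed + m, n) @ np.concatenate([phi[i], phi[j]])
        phi[i], phi[j] = mixed[:n], mixed[n:]
    for m in range(L // 2):                         # decimator row (aligned pairs)
        i, j = 2 * m, 2 * m + 1
        mixed = toy_block(seed + 100 + m, n) @ np.concatenate([phi[i], phi[j]])
        phi[i], phi[j] = mixed[:n], mixed[n:]
    return phi[0::2], phi[1::2]                     # keep even sites, send odd sites to the bulk

phi = np.random.default_rng(1).normal(size=(8, 2))
relevant, bulk = rg_layer(phi, n=2)
print(relevant.shape, bulk.shape)                   # (4, 2) (4, 2)
```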

In figures 6 and 7, we show the results for I = 4, l = 2 and for I = 16, l = 4, with the same dataset as in the previous section. To compare the joint distribution, we concatenate the vector embeddings of the 4 and 16 amino acids for each sequence and train the t-SNE algorithm with these concatenated vectors. We also computed the normalized logarithmic probability, defined as $\log_n p = \log p /(nI)$ with n the embedding dimension and I the sequence length, as shown in table 2. The numbers in the parentheses are the normalized logarithmic probabilities before training. Both results show that the original joint data distribution can be captured using the RG scheme with local neural ODE blocks.


Figure 6. Training result on a length-4 amino acid sequence dataset with t-SNE representation. (A): The original data distribution (red) has a multi-modal structure, while the initial flow model's distribution is standard normal (blue). (B): After training the RG scheme with neural ODE blocks, the two distributions coincide.


Figure 7. Training result on a length-16 amino acid sequence dataset with t-SNE representation. (A): The original data distribution (red) has a multi-modal structure, while the initial flow model's distribution is standard normal (blue). (B): After training the RG scheme with neural ODE blocks, the two distributions coincide.


Table 2. Average normalized logarithmic probability from original data and generated data with sequence length 4 and 16.

                   Normalized log prob
                   Seq length = 4    Seq length = 16
Original data      1.52 (0.45)       1.52 (0.25)
Generated data     1.50 (0.19)       1.50 (0.19)

With training on the full sequence, one obtains hierarchical information from each layer. This may give a natural way to search for escape viruses. The shallow layers mainly capture the local information, while the deeper layers hold the global information. An escape virus should have good local fitness (i.e. the sequence must be grammatically correct locally; otherwise, it does not represent a functioning virus) and have significant mutations relative to the known viruses in the dataset in order to escape the immune system. One can then use this separation of information levels to design rules for escape viruses or to train on a downstream classification task.

4.4. Learning viral escape mutation

We conclude our investigation of learning protein sequence distributions by studying the predictive performance of our model on viral escapes. Viral escapes are those mutations in viral protein sequences that make them unrecognizable to the human immune system. In other words, although they are still effective on the human body and cause infection, the immune system does not flag the mutated sequence as a threat to the body. Such mutations can be single or multiple; that is, only one or a few amino acids may be mutated at once; hence, identifying underlying patterns in viral escape mutations will be essential for viral vaccine development. As described in [35], in terms of language models, a viral sequence can be regarded as textual data, and a viral escape mutation is seen as a word change in a sentence that changes the semantics of the sentence while the sentence remains grammatically meaningful. With this analogy, a viral escape is a mutation capable of making the immune system falsely flag the mutant as a harmless sequence (a change in semantics), while the mutant preserves the virus's evolutionary structure (remaining grammatically correct). Therefore, among all possible mutations in a viral sequence, we search for viral escapes, which result in a high semantic change and high grammaticality in our model. Figure 8 depicts an example of all possible mutations in the test sequence.


Figure 8. Sample of possible mutations in the test sequence. The first column (word) represents the mutant amino acid, and the second column (position) indicates the mutation position in the sequence.


Following this idea (constrained semantic change search (CSCS) [35]), we first train our model on a corpus of viral sequences in an unsupervised fashion, then take a given viral protein sequence with its known viral escape mutations and rank the mutations based on their grammaticality and semantic change. In our construction, the semantic change caused by a mutation is regarded as the change in the internal representation of the deepest layer before and after the mutation happens. In other words, given the test sequence $a = (a_1, a_2,\ldots ,a_i,\ldots, a_I)$ and its mutant counterpart $\bar{a} = (a_1, a_2,\ldots ,\bar{a}_i,\ldots, a_I)$, the semantic change is defined as $\Delta \zeta = \big|\zeta^{(K)}(a) -\zeta^{(K)}(\bar{a})\big|$, where K indicates the deepest layer in the hierarchical structure of our model.
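A small sketch of the semantic-change score is given below; the encoder returning $\zeta^{(K)}$ is replaced by a random stand-in, and the choice of norm for $\Delta\zeta$ is an assumption of ours.

```python
import numpy as np

# A small sketch of the semantic-change score, assuming a trained encoder `deepest_bulk` that
# returns zeta^(K) for a sequence (here a deterministic random stand-in); the norm is an assumption.
def deepest_bulk(sequence):                        # stand-in for zeta^(K)(a) from the trained R-flow
    seed = sum(ord(c) * (i + 1) for i, c in enumerate(sequence))
    return np.random.default_rng(seed).normal(size=20)

def semantic_change(seq, position, mutant):
    """Delta zeta = || zeta^(K)(a) - zeta^(K)(a-bar) || for a single-site mutation."""
    mutated = list(seq)
    mutated[position] = mutant
    return np.linalg.norm(deepest_bulk(list(seq)) - deepest_bulk(mutated))

wild_type = list("MKTIIALSYIFCLVFADYKDDDDK")       # a hypothetical test sequence
print(semantic_change(wild_type, 3, "A"))
```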

According to the CSCS objective, grammaticality can be defined as how probable a mutation is in a, i.e. the probability value that the model evaluates for a mutation. With this definition, one natural definition of grammaticality is the conditional probability $p(\bar{a_{i}}\big|a)$ on the mutation $\bar{a_{i}}$ in the test protein [35]. In our model, however, the joint probability $p_{\Phi}(a)$ is the optimization objective that is used to evaluate the input grammaticality. Therefore the final score for each mutation is defined as:

Equation (32)

Note that throughout the evaluation of our model on viral escape mutations, we only consider single mutations in the test data. We also keep the size of our samples to 32 amino acids, and with only 25 different amino acids as the building blocks of the sequences, there are 768 (24 × 32) possible mutations. Among those, a small subset of viral escape mutations is already given to us. For this experiment, we used the escape mutation dataset of [40], which indicates that 65 out of those 768 mutations are viral escapes. After calculating both the semantic change and the grammaticality of the mutations, we ranked each mutation based on its ranking score in equation (32). Mutations with the highest ranking values are considered as predicted viral escapes, and consequently, lower rankings indicate a lower probability for a mutation to be a viral escape. Figures 9(A) and (B) illustrate the grammaticality and semantic change of all mutations, including the viral escapes (red points), which clearly shows that viral escapes tend to show high grammaticality.
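The AUC evaluation can be sketched as follows with random stand-ins for the scores; combining grammaticality and semantic change by summed ranks mirrors the CSCS-style acquisition of [35] and is an assumption about the exact form of equation (32).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# A sketch of the AUC evaluation: rank all candidate mutations by a combined score of
# grammaticality and semantic change, then compute the AUC against the known escape labels.
# Combining the two quantities by summed ranks is an assumption mirroring CSCS-style acquisition [35].
rng = np.random.default_rng(0)
n_mutations = 768
grammaticality = rng.normal(size=n_mutations)       # stand-in for log p_Phi(a-bar)
semantic_change = rng.normal(size=n_mutations)      # stand-in for Delta zeta
is_escape = np.zeros(n_mutations, dtype=int)
is_escape[rng.choice(n_mutations, size=65, replace=False)] = 1   # 65 known escapes

def ranks(x):
    return np.argsort(np.argsort(x))                # 0 = lowest score, n-1 = highest score

score = ranks(grammaticality) + ranks(semantic_change)
print("AUC:", roc_auc_score(is_escape, score))
```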


Figure 9. (A) Table of possible mutations with their grammaticality and semantic change. In the list of columns, 'pos' means the position in the sequence to be mutated, 'sub' refers to the substitution word, 'mut' is the mutant amino acid, and 'is-escape' shows which substitution is a viral escape. (B) Grammaticality vs. semantic change of all mutations. Red points indicate the viral escape mutations. Note that the graph is not scaled.


We also calculated the area-under-curve (AUC) of the ranking scores of mutations in figure 10. Our results clearly indicate that both grammaticality and semantic change quantities have a similar impact on the AUC value.


Figure 10. AUC graph of mutations in the test viral protein sequence with 32 amino acids. Grammaticality (p) and semantic change ($\Delta \zeta $) have an equivalent contribution to the final AUC value.


5. Conclusion

We proposed a hierarchical flow-based generative model motivated by the RG in physics. The key idea of RG is to separate relevant and irrelevant features at each scale recursively. The conventional RG throws away irrelevant features and is information lossy. For this reason, the conventional RG is irreversible and only forms a semigroup. We promote the RG to a group by keeping the irrelevant features and formulating RG as a bijective holographic map. The renormalization map encodes the original data into (almost) uncorrelated irrelevant features. The renormalization map can be learned from the data as the optimal transport that transforms the data distribution toward the uncorrelated normal distribution, which can be trained within the framework of a flow-based generative model. The generative map (as the reverse of the renormalization map) can then decode randomly sampled irrelevant features into data to achieve the generation task. We apply the construction to bio-sequence modeling. After training, the RG-flow generative model can not only generate new sequences but also estimate the log-likelihood of any given sequence. Using the log-likelihood estimation, we can evaluate whether a sequence is grammatically valid. This enables us to predict viral escape by looking for grammatically valid sequences with large enough semantic changes. Our approach can achieve similar performance to the current state of the art.

As for the limitations, the current machine-learning RG algorithm still relies on an explicit hierarchical structure in the neural network, which leaves less flexibility in designing the RG scheme (for example, the sequence must be reduced by half at every RG step). It is desirable to go beyond this constraint and formulate a more flexible RG scheme. One possible future direction is to incorporate the idea of RG into diffusion-based generative models, which would formulate the RG flow as a diffusion process. The mathematical foundation has been formulated in the appendix, which shows that the conventional RG flow can be understood as a Ricci flow in response to the continuous deformation of the base manifold. This can be naturally combined with diffusion-based generative models. We will explore this possibility in future works.

Acknowledgment

The first author would like to acknowledge support of National Science Foundation SBRI Grant No. 2109928, as well as support by the National Science Foundation under Cooperative Agreement PHY-2019786 (the NSF AI Institute for Artificial Intelligence and Fundamental Interactions, http://iaifi.org/). The second author was supported by a startup fund provided by UCSD and the UC Hellman fellowship. We acknowledge the discussion with Hong-Ye Hu.

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.

Appendix: Analytic construction of RG flow operator on moduli space of smooth maps

Appendix. A special example of our construction, using Laplacians and curvature form

A large part of the current section is based on the work of Carfora [12] relating RG flows for a specific set of QFTs to the Ricci flow construction for Riemannian manifolds. Moreover, the content in this section owes its existence to another highly recommended source, especially for a working mathematician: the outstanding work of Kevin Costello [41] on the mathematical formulation of perturbative quantum field theory.

For the time being, we use the introduction to a geometric construction of the RG flow outlined below, as it is suitably intuitive and pleasantly elegant, and mainly because it will later provide us with the shortest pathways to generalize our constructions in several ways: for instance, by deviating from the classical setup via altering (and generalizing) our action integrals built from Laplacians and curvature forms to more general actions, or by altering the base geometrical spaces from smooth manifolds to non-smooth algebraic varieties or discrete lattices.

Let $C, X$ denote respectively a compact oriented Riemann surface and a compact oriented smooth manifold of dimension at least 2, both equipped with a Riemannian metric, and defined over a base number field $\mathbb{K}$. Let $\text{Map}(C, X): = \{f: C\to X\}$ be the associated space of all continuous maps from domain C to X. The construction of RG flow is based on considering a family of Lagrangians $\mathcal{L}(f, \phi_{i}, i = 1,\ldots, n)$ associated to this space, defined as a morphism

$\mathcal{L}: \mathrm{Map}(C, X)\times H^{*}(X, \mathbb{K})^{\otimes n}\longrightarrow C^{\infty}(X),$

taking a tuple of fields $(f, \phi_{1},\ldots, \phi_{n})$ on X to the space of smooth integrable functions on X. Note that here the notation $H^{*}(X, \mathbb{K})^{\otimes n}$ means that the fields $\phi_{i}, i = 1, \ldots, n$ are realized as sections of a sheaf of differentially graded algebras over X, sitting in appropriate cohomological degrees on X. Moreover, we require the Lagrangians to be invariant under the action of the diffeomorphism groups $\mathcal{D}\textit{iff}(C), \mathcal{D}\textit{iff}(X)$ on C and X respectively.

Integrating the Lagrangian over the associated domain Riemann surface induces the Lagrangian action integral

$\mathcal{S}(f, \phi_{1},\ldots, \phi_{n}): = \int_{C}\mathcal{L}(f, \phi_{1},\ldots, \phi_{n})\,\nu_{C}.$

Let the metric tensors on C and X be respectively denoted by $\mu_{mn}, m,n = 1,2$ and $g_{ij}, i,j = 1,\ldots n$. Suppose that the local coordinates on C are given by x (that is $x: = (x_{1}, x_{2})$). Then a typical form of such a Lagrangian action integral as defined above is given as

Equation (33)

where λ is a coupling parameter, $\nu_{C}$ is a measure on C, $\rho\in C^{\infty}(X)$ is a smooth function $\rho: X\to \mathbb{K}$, and $\mathcal{K}$ is the Gaussian curvature on C with respect to the metric $\mu$. Here the fields associated to the Lagrangian action integral are given as

$\phi: = \lambda^{-1}\left(g, \lambda\rho\right).$

Remark 1. By this notation we mean that the coupling constant λ has dependence on the parameters $g, \rho$.

Remark 2. By writing the action in terms of the Laplacian + curvature form in equation (33), we have assumed that we are studying the RG flow of this particular CFT. However, the RG flow can be defined more generally for any field theory with any action to start with, not necessarily near a conformal fixed point. See section 2.4 for further discussion of RG flow around general fixed points.
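For orientation, a representative action of this Laplacian-plus-curvature type is the dilaton-coupled nonlinear sigma model; the following expression is only a standard sketch of such an action, and the precise normalization adopted in equation (33) may differ:

$\mathcal{S}(f, \phi)\; =\; \frac{1}{2\lambda}\int_{C}\mathrm{d}\nu_{C}\left[\,\mu^{mn}(x)\,\partial_{m}f^{i}(x)\,\partial_{n}f^{j}(x)\,g_{ij}(f(x))\; +\; \lambda\,\rho(f(x))\,\mathcal{K}(x)\,\right].$

Here the first term is the Dirichlet (Laplacian) energy of the map f, and the second term couples the pullback of ρ along f to the Gaussian curvature of the domain.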

Appendix. Deformation family of Lagrangian action integrals

Let us denote

where $\phi_{0}: = \lambda^{-1}(g_{0},0)$ is a field associated to the fixed choice of g0. One interesting problem is to identify the moduli space (the geometric space representing the family) of smooth maps $f: C\to X$ which minimize the action integral $\mathcal{S}(f, \phi_{0})$ for a fixed choice of metric g0 over X. These maps are often identified with vacuum states of the physical theory governing our system of particles. A more interesting question is whether the vacuum states of the underlying theory are stable with respect to infinitesimal deformations of the geometry of C and X, respectively; this is especially relevant in quantum physics, where the fields and the geometry of space undergo algebraic or analytic fluctuations. This question can be studied rigorously by inducing deformations of the fields involved in our physical theory, that is

where the function $h\in C^{\infty}(X, T^{\vee}X^{\otimes 2})$ is a symmetric bilinear smooth differential form on X and $\rho\in C^{\infty}(X, \mathbb{K})$ is a smooth function on X. Introducing these deformation parameters, one can study the set of extremizing maps $f: C\to X$ of the action integral $\mathcal{S}(f, \phi)$, that is, the smooth harmonic maps minimizing $\mathcal{S}(f, \phi)$, where $\mathcal{S}(f, \phi)$ is obtained as a local deformation around $\mathcal{S}(f, \phi_{0})$ induced by deforming the geometry of C and X. Let us consider a generalized deformed Lagrangian action

Equation (34)

where, as before, $h\in C^{\infty}(X, T^{\vee}X^{\otimes 2})$, $U\in C^{\infty}(X, \mathbb{K})$, and $\omega\in C^{\infty}(X, \wedge^{2}T^{\vee}X)$, an antisymmetric bilinear form, are all regarded as infinitesimal induced deformation parameters. Note that here the deformation parameters $\phi_{1}: = \lambda^{-1}h$, $\phi_{2}: = \lambda^{-1} (\lambda \rho)$, $\phi_{3}: = \lambda^{-1}U$ and $\phi_{4}: = \lambda^{-1}\omega$ may, roughly speaking, be regarded as local coordinates in the space of deformations of $\mathcal{S}(f, \phi_{0})$. Hence we can rewrite one such deformation in terms of the other as an extension

Equation (35)
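Schematically, and to first order in the deformation fields (the normalizations are fixed by the precise form of equation (35)), this extension expresses the deformed action as $\mathcal{S}(f, \phi) = \mathcal{S}(f, \phi_{0}) + \delta_{h}\mathcal{S}+\delta_{\rho}\mathcal{S}+\delta_{U}\mathcal{S}+\delta_{\omega}\mathcal{S}$, with each correction term linear in the corresponding deformation parameter $\phi_{1},\ldots, \phi_{4}$.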

Moreover, depending on the underlying physical theory, one may consider situations where $\mathcal{S}(f, \phi_{0})$ is required to be invariant under conformal transformations $(C, \mu_{mn})\to (C, e^{-\psi}\mu_{mn})$. In that case, if one wishes to preserve the conformal invariance of the deformed Lagrangian action $\mathcal{S}(f, \phi)$, the deformation fields ρ and U must vanish, since they break the conformal symmetry; the deformations $h, \omega$, however, can be non-vanishing, as their associated integrals are preserved under the conformal group action on C.

Appendix. Moduli functors associated to deforming fields and maps simultaneously

We mimic the approach of algebraic geometers for constructing our moduli spaces. Consider the following situation. Let $\mathcal{T}\to \text{Spec}{\mathbb{K}}$ be a finite type parametrizing scheme. This notation means that $\mathcal{T}$ is a space (known as a parametrizing scheme in algebraic-geometric terms) constructed over the number field $\mathbb{K}$ which is topologically compact. Let $\mathfrak{M}\text{ap}(C, X): \mathcal{S}\text{ch}/\mathbb{K}\to \mathcal{A}\text{b}/ \mathbb{K}$ be defined as a two-category (i.e. a category which contains objects, their morphisms, and morphisms of morphisms, also known as 2-morphisms), fibered over the base category of finite type (parametrizing) schemes over $\mathbb{K}$. The objective of such a functor is to produce families of maps from C to X parametrized by schemes such as $\mathcal{T}$. To state the latter functionality of $\mathfrak{M}\text{ap}(C, X)$ in more formal mathematical terms, we say that the groupoid sections of $\mathfrak{M}\text{ap}(C, X)$ over any $\mathcal{T}$ are given by the sheaf of Abelian groups of $\mathcal{T}$-families of smooth maps from C to X; that is, the groupoid sections of our moduli functor are given by families of maps

Equation (36)

such that for any $t\in \mathcal{T}$ the t-fibers of the family $\,\tilde{f}\mid_{t}\cong \{f_{t}: C\to X\}$ are given by smooth maps from the domain Riemann surface C to X. Roughly speaking, the functor $\mathfrak{M}\text{ap}(C, X)$ provides us with a platform to parametrize the smooth maps from C to X in a systematic way over any chosen parametrizing scheme. For instance, for $\mathcal{T}: = \text{Spec}(\mathbb{K})$, a reduced geometric point, the groupoid sections of $\mathfrak{M}\text{ap}(C, X)(\mathcal{T})$ are given by single maps $f:C\to X$. Similarly, the fibers of $\mathfrak{M}\text{ap}(C, X)$ over a line L (which, as a geometric scheme, belongs to our category $\mathcal{S}\text{ch}/\mathbb{K}$ of schemes over $\mathbb{K}$) provide a one-dimensional family of maps $f_{L}: C_{L}\to X$, the fibers of $\mathfrak{M}\text{ap}(C, X)$ over a surface provide a two-dimensional family of maps, and so on.
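Concretely, in the simplest situation one may picture such a $\mathcal{T}$-family as a single map $\tilde{f}: C\times \mathcal{T}\to X$, with $C_{\mathcal{T}}: = C\times \mathcal{T}$, whose restriction to the slice $C\times \{t\}$ recovers the member $f_{t}: C\to X$ of the family; more general families are only locally of this product form, as in the family $f_{L}: C_{L}\to X$ above.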

Now, as the geometric structures of C and X, and hence f, undergo deformations in our theory, we compute the vacuum states of the theory, in analogy with the Feynman path-integral formalism, by taking a stochastic average over all admissible weighted morphisms $f: C\to X$ which satisfy the smoothness property. In doing so, we further allow certain correlation fields, defined in our theory and induced by evaluating the map f at a finite number of distinct smooth marked points $p_{1}, \ldots, p_{l} \in C$. Moreover, we use the Lagrangian action integral constructed in the previous section as a weight function associated to each single map $f:C\to X$. Doing so, we obtain an integral over the space parametrizing tuples $(f: C\to X, p_{1}, \ldots, p_{l})$, where $p_{i}, i = 1, \ldots ,l$, are distinct smooth marked points on C

Equation (37)

Here $D_{\phi}(f\,)$ is a measure over $\mathfrak{M}\text{ap}(C, p_{1}, \ldots, p_{l},X)$. Note that, by construction, $\mathcal{S}(f, \phi)$ is regarded as a deformation of $\mathcal{S}(f, \phi_{0})$; hence, following the construction in (35), one may rewrite the correlation function (37) in terms of $\mathcal{S}(f, \phi_{0})$ as follows

Equation (38)

Appendix. Renormalization semigroup flow

The construction of the renormalization semigroup flow is based on the fact that, in order to make the above integrals well defined, one may consider only certain controllable deformation regimes for the fields φi. That is, one would like to consider a family of fields $\phi_{i}(\mathcal{T})$, where $\mathcal{T}$ is the parametrizing scheme used in (36), governing the geometric deformations of maps $f_{\mathcal{T}}: C_{\mathcal{T}} \to X$ induced by perturbations of the geometric structures of C and X. The idea is to consider an infinitesimal deformation flow, called the renormalization semigroup flow (as it turns out that our construction in this example only provides a semigroup rather than a group), over the moduli space of maps and field deformations, that is, to consider a morphism

Equation (39)

which has a lift to a morphism on moduli space of action integrals

Equation (40)

taking $\mathcal{S}(f, \phi_{0})$ to $\mathcal{S}(f_{\mathcal{T}}, \phi_{\mathcal{T}})$, which satisfies the semigroup property.

Remark 3. We remark again that, generally speaking, we are considering our fields $\phi_{i}, i = 1,\ldots, n$, as living in our field algebra, that is, the vector space $H^{*}(X, \mathbb{K})^{\otimes n}$ generated by differentially graded forms on X. Moreover, the action integrals are regarded as morphisms from $\mathfrak{M}\text{ap}(C, X)(\mathcal{T})\times H^{*}(X, \mathbb{K})$ to the underlying ground field $\mathbb{K}$, and hence realized as elements of the dual space $\left(\mathfrak{M}\text{ap}(C, X)(\mathcal{T})\times H^{*}(X, \mathbb{K})^{\otimes n}\right)^{\vee}$.
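Concretely, the semigroup property mentioned above means that composing two successive coarse-graining deformations $\mathcal{T}\hookrightarrow \mathcal{T}^{^{\prime}}\hookrightarrow \mathcal{T}^{^{\prime\prime}}$ acts on action integrals in the same way as the composite deformation $\mathcal{T}\hookrightarrow \mathcal{T}^{^{\prime\prime}}$, schematically (in a notation indexed by the deformation) $\mathcal{RG}_{\mathcal{T}^{^{\prime}}\mathcal{T}^{^{\prime\prime}}}\circ \mathcal{RG}_{\mathcal{T}\mathcal{T}^{^{\prime}}} = \mathcal{RG}_{\mathcal{T}\mathcal{T}^{^{\prime\prime}}}$, while no inverse (fine-graining) morphism is required to exist, since coarse-graining discards information.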

We now elaborate further on the renormalization flow. In order to define it, we need to formulate a deformation process applied to the geometry of C and X, and then compute the induced deformations of the associated fields φi and of f, with support on the deformed X, as shown in equation (39). Note that the functorial construction of the moduli space of maps allows us to perform this task in a rigorous algebraic manner. Take a scheme $\mathcal{T}$ (naively speaking, schemes have geometric spaces as their skeleton; however, they come further equipped with extra topological or algebraic properties). As we noted above, the fibers of the moduli functor $\mathfrak{M}\text{ap}(C, X)$ over $\mathcal{T}$ (i.e. $\mathfrak{M}\text{ap}(C, X)(\mathcal{T})$) provide us with a $\mathcal{T}$-family of maps from C to X as in (36). Now choose an algebraic deformation (a perturbation) of $\mathcal{T}$ and denote it by $\mathcal{T}^{^{\prime}}$. Then the fibers $\mathfrak{M}\text{ap}(C, X)(\mathcal{T}^{^{\prime}})$ provide a $\mathcal{T}^{^{\prime}}$-family

realized as a deformation of the former $\mathcal{T}$-family of maps from C to X.

One way of constructing such algebraic deformation is to construct $\mathcal{T}^{^{\prime}}$ as a nilpotent thickening of $\mathcal{T}$. We elaborate on this notion, using the language of ideals over the ring of polynomial functions.

Take the polynomial ring $\mathbb{C}[x_{1}, \ldots, x_{n}]$. In classical algebraic geometry, the set of prime ideals generated by different expressions involving the variables $x_{1}, \ldots, x_{n}$ makes a space whose geometric points recover the 'affine' space $\mathbb{C}^{n}$. Now, in order to obtain more interesting spaces, one may consider an ideal, say for example $\mathcal{I} = (x_{1}x_{2}-x_{3}^2)$, and consider the quotient ring $\mathbb{C}[x_{1}, \ldots, x_{n}]/ \mathcal{I}$. This expression means that all polynomials generated by the expression $x_{1}x_{2}-x_{3}^2$ vanish in this quotient ring. Now the set of prime ideals $p\subset \mathbb{C}[x_{1}, \ldots, x_{n}]/ \mathcal{I} $ provides us with the set of geometric points of the algebraic space (algebraic variety) given as the solution set to the polynomial equation $x_{1}x_{2}-x_{3}^2 = 0$. Let us denote this algebraic variety by $\mathcal{T}$. In order to obtain a nilpotent thickening of $\mathcal{T}$ one can simply construct the quotient ring $\mathbb{C}[x_{1}, \ldots, x_{n}]/ \mathcal{I}^{l}$ for some l. The set of prime ideals in the latter provides one with the set of geometric points of the variety obtained as the solution set to $(x_{1}x_{2}-x_{3}^2)^{l} = 0$; call the latter space $\mathcal{T}^{^{\prime}}$. Due to the natural inclusion of ideals $\mathcal{I}^{l}\subset \mathcal{I}$, one immediately obtains a natural inclusion $\mathcal{T}\hookrightarrow \mathcal{T}^{^{\prime}}$. This deformation is called a nilpotent extension of $\mathcal{T}$ of order l. Given such a nilpotent extension, $\iota_{TT^{^{\prime}}}:T\hookrightarrow T^{^{\prime}}$, as we elaborated earlier, the renormalization flow must satisfy the property that
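As a quick computational illustration (a sketch relying on sympy's multivariate polynomial division via reduced; the choice l = 3 and the variable names are arbitrary), one can check that the generator of $\mathcal{I}$ indeed becomes nilpotent of order l in the thickened coordinate ring:

from sympy import symbols, reduced

# Sketch: in the quotient ring C[x1,x2,x3]/I^l with I = (x1*x2 - x3**2),
# the class of the generator is nilpotent of order l.  Since I^l is principal,
# an element lies in I^l iff its division remainder modulo the generator of
# I^l vanishes.
x1, x2, x3 = symbols('x1 x2 x3')
gen = x1*x2 - x3**2        # generator of the ideal I
l = 3                      # order of the nilpotent thickening

_, rem_high = reduced(gen**l, [gen**l], x1, x2, x3)        # lies in I^l
_, rem_low  = reduced(gen**(l - 1), [gen**l], x1, x2, x3)  # does not

print(rem_high)  # 0: gen^l vanishes in the quotient, so gen is nilpotent there
print(rem_low)   # nonzero: gen^(l-1) survives, so the nilpotency order is exactly l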

Since the action of the RG flow is realized as a pullback in our construction, one is able to define its induced action on the correlation function defined in (38) as follows

Equation (41)

Let us work out a concrete example.

Example 4. For simplicity, let us assume that $\mathbb{K}$ is a field of characteristic zero, such as $\mathbb{C}$, the field of complex numbers. Consider the case where $\mathcal{T}: = \text{Spec}(\mathbb{K}[x_{1}, x_{2}, \ldots, x_{n}]/ (x_{2}, \ldots, x_{n}))\cong \mathbb{A}^{1}$ is obtained by taking the Zariski spectrum of the affine line in the direction x1, cut out by the ideal $\mathcal{I} = (x_{2}, \ldots, x_{n})$ over $\mathbb{K}$. Locally, after choosing a coordinate chart $(x_{1}, \ldots, x_{n})$, the set of geometric points of $\mathcal{T}$ is the set of points on the x1 axis in $\mathbb{C}^{n}$. Now we introduce an infinitesimal deformation $\mathcal{T}\hookrightarrow \mathcal{T}^{^{\prime}}$, induced by a nilpotent extension of order 2, by taking $\mathcal{T}^{^{\prime}}: = \text{Spec}(\mathbb{K}[x_{1}, \ldots, x_{n}]/\mathcal{I}^{2})$. There exists a canonical short exact sequence

$0\longrightarrow \mathcal{I}/\mathcal{I}^{2}\longrightarrow \mathcal{O}_{\mathcal{T}^{^{\prime}}}\longrightarrow \mathcal{O}_{\mathcal{T}}\longrightarrow 0,$
whose kernel is governed by the conormal sheaf (which here is identified with the sheaf of differential one-forms on $\mathcal{T}$, that is, $\Omega_{\mathcal{T}}$). Roughly speaking, this realizes the second-order nilpotent thickening of $\mathcal{T}$ via the cotangent bundle, $\Omega_{\mathcal{T}}$, of $\mathcal{T}$. We would like to deform the correlation function (38) in the direction of the fibers of $\Omega_{\mathcal{T}}$. This amounts to setting $\mathcal{RG}_{\mathcal{T}}$ to be the differential operator which deforms the fields in the direction of the fibers of the cotangent bundle of $\mathcal{T}$; that is, the RG flow acts on the fields as a map $\phi\to \phi+d\phi$, and hence its induced action on the action integral is given by

Equation (42)

Therefore, viewing the RG flow as a differential operator acting on the action integral Z, and rewriting the variation of Z induced by the nilpotent deformation of $\mathcal{T}$ in terms of Z itself, we obtain a differential equation governing the change in Z, that is

Equation (43)
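In more familiar physics terms, equation (43) is of the same schematic type as a Callan–Symanzik or Wilson–Polchinski flow equation: writing t for the parameter along the nilpotent (infinitesimal) deformation direction, the change of Z takes the schematic form $\partial_{t}Z = \mathcal{RG}_{\mathcal{T}}\,Z$, with $\mathcal{RG}_{\mathcal{T}}$ acting as a first-order differential operator in the coupling and field directions; the precise form of this operator in the present setting is the content of equations (42) and (43).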
