Deep learning symmetries and their Lie groups, algebras, and subalgebras from first principles

Roy T Forestano; Konstantin T Matchev; Katia Matcheva; Alexander Roman; Eyup B Unlu; Sarunas Verner

doi:10.1088/2632-2153/acd989

1. Introduction

Symmetries play a fundamental role in modern physics [1]. Physical systems with continuous symmetries exhibit conservation laws that are universally applicable and indispensable in understanding the system's behavior and evolution. In particle physics, symmetries provide an organizing principle behind the observed particle zoo and its interactions, and guide model-builders in the search for viable extensions of the standard model (SM) [2]. At the same time, the mathematical study of symmetries is interesting in its own right and has a rich history.

Over the last decade, there has been increased interest in applications of machine learning (ML) to high-dimensional physics data as a sensitive tool for event simulation, data analysis, and statistical inference [3–5]. More recently, ML is also being used to facilitate tasks that traditionally have fallen within the domain of theorists, e.g. performing symbolic computations [6, 7] or deriving analytical formulas by training a symbolic regression on synthetic data [8–20].

Applications of ML to the study of symmetries have been pursued by a number of groups in different contexts. One line of work investigates how a given symmetry is reflected in a learned representation of the data [21, 22] or in the ML architecture itself, e.g. in the embedding layer of a neural network (NN) [23]. Several proposals attempt to design special ML architectures (equivariant NNs) which have a desired symmetry property built in from the outset [24–30] and test their performance [31]. Incorporating the symmetry directly into the ML model makes it more economical (in terms of learned representations), interpretable and trainable. The approach can be extended to cover discrete (permutation) symmetries as well [32, 33]. Such efforts pave the way for data-driven blind searches for new physics which stress-test the data for violations of a well-established symmetry of the SM [34–36].

More recently, ML is also being applied to address more formal theoretical questions. For example, a good understanding of the symmetries present in the problem can reveal conserved quantities [37, 38] or hint at a more fundamental unified picture [39]. ML has been used to discover the symmetry of a potential [23, 40, 41], to decide whether a given pair of inputs is related by symmetry or not [42], to distinguish between scale-invariant and conformal symmetries [43], and to explore the string landscape [44–46]. Recent work made use of generative adversarial networks to learn transformations that preserve probability distributions [47]. ML applications have also found their way into group theory, which provides the abstract mathematical language of symmetries. For example, recent work used ML to compute tensor products and branching rules of irreducible representations of Lie groups [48] and to obtain Lie group generators of a symmetry present in the data [49, 50].

The main goal of this paper is to design a deep-learning method that mimics the traditional theorist's thinking and is capable of discovering and categorizing the full set of (continuous) symmetries in a given dataset from first principles, i.e. without any prior assumptions or prejudice. The only input to our procedure is a labeled dataset $\{\mathbf{x};y\}$ like the one in equation (1) below. An oracle $\varphi(\mathbf{x}) = y$ can then be learned from the dataset, or alternatively, can be provided externally. With those ingredients, we go through the following objectives:

Discovery of symmetries. In the first step, described in section 3.1, we learn to generate a symmetry transformation, $\mathbf{x}\stackrel{\mathbf{f}}{\to} \mathbf{x}^{^{\prime}}$ , which preserves the oracle values. The transformation f is encoded in a NN trained on the dataset. In general, there will be many possible symmetry transformations f, and their study and categorization from a group theory point of view is the main goal of this paper.
Discovery of symmetry generators. Having learned how to generate arbitrary (finite) symmetry transformations f, we then focus on infinitesimal transformations $\delta\mathbf{f}$ , which give us in turn the symmetry generators J.
Discovery of Lie subalgebras. By adding suitable terms to the loss function, our procedure requires that the learned set of generators $\{\mathbf{J}_{\alpha}\}$ forms a closed algebra. This allows us to discover subalgebras of the symmetry group, and identify them by their structure constants. Since the number of generators N_g is a free input, in cases when the training is unsuccessful (as quantified by the loss), we can rule out the existence of N_g-dimensional subalgebras.
Discovery of the full Lie algebra. The maximum value of N_g which gives a vanishing loss, indicates the dimension of the full Lie algebra. The corresponding learned set of generators describes the full symmetry group of the dataset.
Identification of the symmetry group and its subgroups. The learned sets of generators found in the previous two steps are then used to obtain the structure constants of the respective full algebra and its subalgebras, and thus to identify the corresponding symmetry group and its subgroups.

Our study complements and extends previous related work in [23, 40, 41, 49, 50]. We note that our procedure is completely general, and does not anticipate what symmetries might be present in the dataset—instead, the symmetries are learned from scratch. In addition, our method is basis-independent since we do not choose a specific convenient basis for the learned transformations and generators. Consequently, our results will not always match the nice canonical forms of the generators found in the group theory textbooks. Nevertheless, as the examples below explicitly demonstrate, all our learned transformations and generators will satisfy the defining properties of the respective symmetry groups and subgroups.

The paper is organized as follows. In section 2 we introduce the setup for our analysis and the corresponding notation and conventions. The main steps of our deep-learning procedure are outlined in section 3, which also explains and motivates the necessary ingredients for the loss function. For readers who are not yet fully comfortable with a deep-learning approach, section 4 outlines an analogous linear algebra approach that often (but not always—see section 8 for counterexamples) can accomplish similar objectives. Each of the remaining sections contains a separate completely worked-out example illustrating our method. The examples are distinguished by the choice of oracle and the dimensionality of the feature space. In section 5 we choose an oracle that returns the Euclidean distance in feature space, whose dimensionality, in turn, is chosen to be n = 2 in section 5.1, n = 3 in section 5.2, and n = 4 in section 5.3. In section 6 we focus on the Lorentz group, i.e. the oracle computes the pseudo-Euclidean distance in four-dimensional Minkowski space-time. In section 7 we consider the squeeze transformations in n = 2 dimensions whereby the oracle returns the product of the two features. To demonstrate the universal applicability of our technique, in section 8 we show two examples of discontinuous oracles. We summarize and conclude in section 9.

2. Setup and notations

Our starting point is a labeled dataset containing m samples of n features and a target label y:

$\begin{align} m\ \mathrm{samples} \left\{ \begin{array}{ccccc} x_1^{(1)}, & x_1^{(2)}, & \ldots , & x_1^{(n)}; & y_1\\ x_2^{(1)}, & x_2^{(2)}, & \ldots , & x_2^{(n)}; & y_2\\ \vdots & \vdots & \vdots & \vdots & \vdots \\ x_m^{(1)}, & x_m^{(2)}, & \ldots , & x_m^{(n)}; & y_m\\ \end{array} \right. \, . \end{align} \tag{ 1 }$

In ML parlance, the dataset (1) is an m × n dataframe with n features and m samples. In what follows, we use the sample index a lot more often than the feature index, thus we use an explicit subscript for the sample index and hide the feature index in the boldface vector notation x:

$\begin{align} {\mathbf{x}}\equiv \left\{x^{(1)}, x^{(2)},\ldots, x^{(n)} \right\}. \end{align} \tag{ 2 }$

This allows us to write the input features in a compact form as

$\begin{align} \left\{\mathbf{x}_i\right\} \; \equiv \; \left\{\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m\right\} \, . \end{align} \tag{ 3 }$

In order to study the symmetries of the data (1), we need to know the function $y(\mathbf{x})$ , which can be given analytically or numerically in terms of an oracle $\varphi: \mathbb{R}^n \rightarrow{\mathbb{R}}$ capable of computing the corresponding output target labels $y_1,y_2,\ldots,y_m$ :

$\begin{align} y_i \; = \; \varphi(\mathbf{x}_i) \, , \qquad i \, = \,1,2,\ldots,m \, . \end{align} \tag{ 4 }$

This leads to two basic scenarios:

The function $y(\mathbf{x})$ is known analytically, and that same function has been used as in (4) to compute the labels in (1) exactly. This case is of interest to theorists, and this is the approach adopted in this paper as well—for each of our examples below, we specify the relevant analytic oracle $\varphi(\mathbf{x})$ and proceed to study the resulting symmetries. Note, however, that we never take advantage of the knowledge of the analytical form of the oracle, e.g. we do not differentiate or symbolically manipulate in any other way the function $\varphi(\mathbf{x})$ . For our purposes, we only use the oracle numerically—our method treats it simply as a black box, which, given the values of x, can produce the numerical value of the label y.
The functional dependence $y(\mathbf{x})$ is a priori unknown and the dataset (1) is all that is available to us. This is the typical case encountered by data scientists. Now, one needs to go through a preliminary step of first creating the oracle ϕ (usually in the form of a NN trained on the dataset (1)), which is capable of computing and reporting the values of $y = \varphi(\mathbf{x})$ to us. This is a standard regression task which is of no interest here since it can be accomplished using one of the many established ML regression techniques. We can therefore safely assume that this preliminary step has already been completed and we have such a numerical oracle $y = \varphi(\mathbf{x})$ already available.

Since we are only using the oracle numerically, from our point of view there is no real difference between the above two cases. In what follows the oracle $\varphi(\mathbf{x})$ will be used in the exact same way, regardless of its origin.

Given this general setup, our main task is to derive the symmetry transformation $\mathbf{f}: \mathbb{R}^n \rightarrow \mathbb{R}^n$

$\begin{align} \mathbf{x}^{^{\prime}} \; = \; \mathbf{f}(\mathbf{x}) \, , \end{align} \tag{ 5 }$

which preserves the ϕ-induced labels of our dataset (1). In other words, we want to find the function $\mathbf{f}(\mathbf{x})$ for which

$\begin{align} \varphi(\mathbf{x}^{^{\prime}}_i) \; \equiv \; \varphi(\mathbf{f}(\mathbf{x}_i)) = \varphi(\mathbf{x}_i), \quad \forall i \,= \,1,2,\ldots,m \, . \end{align} \tag{ 6 }$

The particular instantiation of the symmetry $\mathbf{f}(\mathbf{x})$ will depend on the initialization of our parameters, so by repeating the procedure with different initializations, we will in principle obtain a whole family of symmetry transformations.

Next, we focus on infinitesimal symmetry transformations and proceed to study the corresponding set of generators $\{\mathbf{J}_{\alpha}\}$ , with $\alpha = 1,2,\ldots,N_g$ , where we use lowercase Greek letters to label the generators of symmetry transformations. A given set of generators $\{\mathbf{J}_{\alpha}\}$ forms a Lie algebra if the closure condition is satisfied, i.e. if all Lie brackets $\bigl[ \, .\, ,\, .\, \bigr]$ can be represented as linear combinations of the generators in the set:

$\begin{align} \bigl[ \mathbf{J}_{\alpha}, \mathbf{J}_{\beta}\bigr] = \sum_{\gamma=1}^{N_g} a_{[\alpha\beta]\gamma} \mathbf{J}_\gamma . \end{align} \tag{ 7 }$

The coefficients $a_{[\alpha\beta]\gamma}$ are the structure constants (anti-symmetric in their first two indices, as implied by the square brackets) whose values will reveal the symmetry group present in our dataset.

In principle, the number of generators N_g is a hyperparameter that must be specified ahead of time (similarly to the choice of the number of clusters in certain clustering algorithms like K-means). Therefore, when we find a closed algebra at a given N_g value, we are only guaranteed that it is a subalgebra, and we must proceed to try out higher values for N_g as well. The full algebra will then correspond to the maximum value of N_g for which a closed algebra of distinct generators is found to exist.

3. Deep learning approach

In our approach, we model the output function f with a NN ${\mathcal F}_{\mathcal W}$ with n neurons in the output layer, corresponding to the n transformed features of the data point $\mathbf{x}^{^{\prime}}$ . The trainable network parameters (weights and biases) will be generically denoted with $\mathcal W$ . During training, they will evolve and converge to the corresponding trained values $\widehat{\mathcal W}$ of the parameters of the trained network ${\mathcal F}_{\widehat{\mathcal W}}$ , i.e. the hat symbol will denote the result of the training. Once the parameters $\widehat{\mathcal W}$ are found, we can find the structure constants using SMs. We choose a suitable loss function that ensures the desired properties of the trained network. The following subsections discuss the individual contributions to the loss function, which in our implementation are combined and minimized simultaneously.

The NN ${\mathcal F}_{\mathcal W}$ is implemented as a sequential feed-forward neural network in PyTorch [51]. The examples in sections 5 and 6 are simple enough to be done with no hidden layers, no bias, and no activation function, i.e. with linear transformations (see section 4). For the examples in the later sections, we do add hidden layers. Optimizations are performed with the Adam optimizer with a learning rate between 0.03 and 0.1. The loss functions were designed to achieve a fast and efficient training process without the need for extensive hyperparameter tuning. The training data (3) was 300 hundred points and was sampled from a standard normal distribution.

An alternative approach to predicting the generators from the model parameters, which utilizes an identical loss function, can be carried out where a NN, ${\mathcal F}: \{\mathcal{G}_{\alpha}, \mathbf{a} \} \to \{\widehat{\mathcal G}_{\alpha}, \hat{\mathbf{a}} \}$ , takes a set of randomly initialized n × n generators $\{\mathcal{G}_{\alpha} \}$ and an $N_b \times N_g$ structure constant array $\{\mathbf{a} \}$ , flattens each individual generator and the structure constant array, and proceeds to converge the elements to the desired set of generators $\{\widehat{\mathcal{G}_{\alpha}} \}$ and structure constant array $\{\hat{\mathbf{a}}\}$ . Here $N_b = {N_g \choose 2} = \frac{N_g(N_g-1)}{2}$ denotes the number of unique one-directional Lie brackets for a given number of generators N_g.

This alternative approach can be implemented as a module list of sequential NN layers consisting of two hidden layers for the generators and a single sequential layer consisting of two hidden layers for the structure constants. Each layer consists of a bias and the hidden layers use the rectified linear unit activation function. The optimizer, average learning rate, model hyperparameters, and training data generation were identical to the alternative implementation described above. Whereas the original method feeds the data directly into the network, the data in this approach is fed into the loss function to be transformed by the model's predicted generators.

3.1. Invariance

The invariance under the transformation (5) is enforced by requiring that the labels before and after the transformation remain the same. For this purpose, we include the following mean squared error term in the loss function L:

$\begin{align} L_{\mathrm{inv}}(\mathcal W, \{\mathbf{x}_i\}) = \frac{1}{m}\sum_{i=1}^m \left[ \varphi\left({\mathcal F}_{\mathcal W}(\mathbf{x}_i)\right)-\varphi(\mathbf{x}_i) \right]^2 \, . \end{align} \tag{ 8 }$

A NN trained with this loss function will find an arbitrarily general (finite) symmetry transformation ${\mathcal F}_{\widehat{\mathcal W}}$ parametrized by the values of the trained network parameters $\widehat{\mathcal W}$ .

In order to find multiple symmetries, the process can be repeated. Alternatively, several networks can be trained concurrently, by modifying the loss function to ensure that the resulting transformations are sufficiently distinct (see section 3.3 below).

3.2. Infinitesimality

In order to focus on the generators of the possible set of symmetry transformations, we restrict ourselves to infinitesimal transformations $\delta{\mathcal F}$ in the vicinity of the identity transformation I:

$\begin{align} \delta{\mathcal F} \; \equiv \; \mathbf{I} + \varepsilon \, {\mathcal G}_{\mathcal W} \, , \end{align} \tag{ 9 }$

where ε is an infinitesimal parameter and the parameters ${\mathcal W}$ of the new NN ${\mathcal G}$ will be forced to be finite, which ensures that (9) is an infinitesimal transformation. The loss function (8) can then be rewritten as

$\begin{align} L_{\mathrm{inf}}(\mathcal W, \{\mathbf{x}_i\}) = \frac{1}{m\varepsilon^2}\sum_{i=1}^m \left[ \varphi(\mathbf{x}_i + \varepsilon {\mathcal G}_{\mathcal W}(\mathbf{x}_i))-\varphi(\mathbf{x}_i) \right]^2 \, , \end{align} \tag{ 10 }$

where we have introduced an explicit factor of ε² in the denominator to account for the fact that generic transformations scale as ε [41]. Once we minimize the loss function, the trained NN ${\mathcal G}_{\widehat {\mathcal W}}$ will represent a corresponding generator

$\begin{align} \mathbf{J} \; = \; {\mathcal G}_{\widehat {\mathcal W}} \, , \end{align} \tag{ 11 }$

where

$\begin{align} \widehat{\mathcal W}\equiv\mathop{\mathrm{arg}\,\mathrm{min}}\limits_{\mathcal W} \biggl(L({\mathcal W}, \{\mathbf{x}_i\}) \biggr) \end{align} \tag{ 12 }$

are the learned values of the NN parameters which minimize the loss function. The result for $\widehat{\mathcal W}$ , and therefore, for J, will in principle depend on the starting values ${\mathcal W}_0$ of the network parameters at initialization. If we now repeat the training N_g times under different initial conditions ${\mathcal W}_0$ (or with different values of the hyperparameters), we will obtain a set of N_g (in general distinct) generators $\{\mathbf{J}_{\alpha}\}$ , $\alpha = 1,2,\ldots,N_g$ .

3.3. Orthogonality

In order to make the set of generators $\{\mathbf{J}_{\alpha}\}$ , $\alpha = 1,2,\ldots,N_g$ , found in the previous step maximally different, we introduce the following additional orthogonality term to the loss function

$\begin{align} L_{\mathrm{ortho}}({\mathcal W}, \{\mathbf{x}_i\}) = \sum_{\alpha < \beta}^{N_g} \left( \sum_{p} {\mathcal W}_{\alpha}^{(p)} {\mathcal W}_{\beta}^{(p)}\right)^2, \end{align} \tag{ 13 }$

where the index p runs over the individual NN parameters ${\mathcal W}_{\alpha}^{(p)}$ .

3.4. Closure

In order to test whether a certain set of distinct generators $\{\mathbf{J} _{\alpha}\}$ , $\alpha = 1,2,\ldots,N_g$ , found in the previous steps, generates a group, we need to check the closure of the algebra (7). This can be done in several ways. The most principled would be to minimize

$\begin{align} L_{\mathrm{closure}} (a_{[\alpha\beta]\gamma}) = \sum_{\alpha<\beta}\mathrm{Tr} \left(\mathbf{C}_{[\alpha\beta]} ^T\mathbf{C}_{[\alpha\beta]}\right), \end{align} \tag{ 14 }$

with respect to the candidate structure constant parameters $a_{[\alpha\beta]\gamma}$ , where the closure mismatch is defined by

$\begin{align} \mathbf{C}_{[\alpha\beta]} (a_{[\alpha\beta]\gamma}) \equiv \bigl[ \mathbf{J}_{\alpha}, \mathbf{J}_{\beta}\bigr] - \sum_{\gamma=1}^{N_g} a_{[\alpha\beta]\gamma} \mathbf{J}_\gamma . \end{align} \tag{ 15 }$

Since $L_{\mathrm{closure}}$ is positive semi-definite, $L_{\mathrm{closure}} = 0$ would indicate that the algebra is closed and we are thus dealing with a genuine (sub)group. In practice¹ , we perform the minimization of (14) simultaneously with the previously discussed contributions to the loss function, by replacing ${\mathbf{J}}_{\alpha} \to {\mathcal G}_{\mathcal W_{\alpha}}$ in (15). The advantage of this simultaneous construction is that every set of generators that we obtain at any given value of N_g is already forming an algebra that is 'as closed as possible'. Then, the size of the achieved total training loss is an indicator whether for that value of N_g a closed algebra exists or not.

Once a closed set of valid generators has been found, we can retrain the NN in a conveniently chosen canonical basis and obtain the canonical form of the set of generators, whose structure constants in turn reveal the nature of the group behind the found symmetry transformations ${\mathcal G}_{\widehat{\mathcal W}_{\alpha}}$ , $\alpha = 1,2,\ldots,N_g$ .

Sections 5.1–8 illustrate the steps above with several examples of increasing complexity. A flowchart of our symmetry finding procedure is shown in figure 1. The chosen values for n (the dimensionality of the feature space), N_g (the requested number of independent symmetry generators) and ε (the parameter of the infinitesimal symmetry transformation (9), taken to be ε = 0.001 in the numerical experiments below) are passed to the generator module parametrizing the transformation in terms of either a NN ${\mathcal F}_{\mathcal W}$ (section 3) or a matrix $\mathbb{G}$ (section 4). The parameters of the transformation are directly accessible to the module evaluating the loss function. Next we create the sample of feature vectors (3), ideally aiming for a complete coverage of the domain of interest (for our numerical experiments below, we sample 300 points from a standard normal distribution). These feature vectors are then processed using the current state of the generator into the transformed feature set $\{\mathbf{x}_i^{^{\prime}}\}$ . Both $\{\mathbf{x}_i\}$ and $\{\mathbf{x}_i^{^{\prime}}\}$ are subsequently passed to the oracle ϕ for the evaluation of the respective sets of target labels $\{y_i\}$ and $\{y_i^{^{\prime}}\}$ . All of this information is fed into the loss module which in turns informs the generator through the backpropagation feedback loop about the required adjustments in the values of the generator parameters. When the process converges (by either reaching a target value for the loss, or after a sufficiently large number of epochs), the N_g symmetry generators can be read out directly from the current state of the generator module.

**Figure 1.** A flowchart of the symmetry finding procedure outlined in sections 3 and 4.
Download figure:
Standard image High-resolution image

4. Linear algebra approach

The universal approximation theorems [52] guarantee that the deep-learning approach of the previous section can handle almost any symmetry transformation, including a highly non-linear one. At the same time, a large class of interesting symmetries arising in physics are linear transformations for which the usual formalism of linear algebra would suffice. Furthermore, the analysis of the symmetry generators involves infinitesimal transformations, which are represented with linear operators. For those reasons and to captivate the readers who are not yet at ease with the technical intricacies of ML, in this section, we reformulate our analysis from the previous section in the language of linear algebra. We follow the same notation and conventions, but replace the calligraphic font symbols representing NNs with corresponding blackboard-bold symbols representing n × n matrices acting on the n-dimensional feature space.

In this section we are interested in the linear subclass of the transformations (5), which are encoded in a generic matrix $\mathbb{F}$

$\begin{align} \mathbf{x}^{^{\prime}} = \mathbb{F}\, \mathbf{x} \, , \end{align} \tag{ 16 }$

whose n² components $\mathbb{F}_{ij}$ are determined by minimizing the loss function

$\begin{align} L_{\mathrm{inv}}(\mathbb F, \{\mathbf{x}_i\}) = \frac{1}{m}\sum_{i=1}^m \left[ \varphi\left({\mathbb F}\,\mathbf{x}_i\right)-\varphi(\mathbf{x}_i) \right]^2 \, . \end{align} \tag{ 17 }$

If the minimization is successful, then such a linear symmetry exists and is represented with the learned matrix $\widehat{\mathbb F}$ .

In analogy to (9), we can write the corresponding infinitesimal linear transformation as

$\begin{align} \delta \mathbb{F}(\varepsilon) = \mathbb{I} + \varepsilon\, \mathbb{G} \, , \end{align} \tag{ 18 }$

where $\mathbb{I}$ is the unit n × n matrix and $\mathbb{G}$ is an n × n matrix whose components $\mathbb{G}_{ij}$ are yet to be determined through the optimization. In order to obtain a single generator matrix $\mathbb{J}$ , we can use a loss function analogous to (10)

$\begin{align} L_{\mathbb{J}} (\mathbb{G},\{\mathbf{x}_i\}) = \frac{h_{\mathrm{inv}}}{m\varepsilon^2} \sum_i^m\left[\varphi\left( \mathbf{x}_i+\varepsilon\,\mathbb{G}\, \mathbf{x}_i \right) - \varphi(\mathbf{x}_i) \right]^2 + h_{\mathrm{norm}} \left[\mathrm{Tr} (\mathbb G^T \mathbb{G}) - 2\right]^2 , \end{align} \tag{ 19 }$

where the constants $h_{\mathrm{inv}}$ and $h_{\mathrm{norm}}$ in (19) are hyperparameter weights determining the relative contribution of the two terms in the loss function (19) enforcing the symmetry invariance and finite normalization² conditions, respectively. The actual generator ${\mathbb{J}}$ is then obtained by minimizing (19):

$\begin{align} \mathbb{J} = \mathop{\mathrm{arg}\,\mathrm{min}}\limits_{\mathbb{G}} \left( L_{\mathbb{J}} (\mathbb{G},\{\mathbf{x}_i\})\right) \, . \end{align} \tag{ 20 }$

By repeating this procedure several times with different initial conditions, we obtain a different generator $\mathbb{J}$ each time. Alternatively, we can produce all N_g generators in one go by adding together and minimizing simultaneously N_g copies of the loss function (19):

$\begin{align} \sum_{\alpha=1}^{N_g} L_{\mathbb{J}} (\mathbb{G}_{\alpha},\{\mathbf{x}_i\})\, , \end{align} \tag{ 21 }$

which will lead to a set of N_g generators $\mathbb{J}_{\alpha}$ , $\alpha = 1,2,\ldots, N_g$ , and their respective infinitesimal transformations $\delta\mathbb{F}_{\alpha} \equiv \mathbb{I} + \varepsilon\, \mathbb{J}_{\alpha}$ .

At this point the generators $\mathbb{J}_{\alpha}$ are completely decoupled and independent of each other. We can force them to be different by adding a loss term analogous to (13):

$\begin{align} L_{\mathrm{ortho}}({\mathbb G}, \{\mathbf{x}_i\}) = \sum_{\alpha < \beta}^{N_g} \left[ \text{Tr}({\mathbb G}_{\alpha}^T {\mathbb G}_{\beta})\right]^2. \end{align} \tag{ 22 }$

Finally, we can enforce the closure property by including a loss term analogous to (14):

$\begin{align} L_{\mathrm{closure}} (a_{[\alpha\beta]\gamma}) = \sum_{\alpha<\beta}\mathrm{Tr} \left(\mathbb{C}_{[\alpha\beta]} ^T\mathbb{C}_{[\alpha\beta]}\right), \end{align} \tag{ 23 }$

where

$\begin{align} \mathbb{C}_{[\alpha\beta]} (a_{[\alpha\beta]\gamma}) \equiv \bigl[ \mathbb{G}_{\alpha}, \mathbb{G}_{\beta}\bigr] - \sum_{\gamma=1}^{N_g} a_{[\alpha\beta]\gamma} \mathbb{G}_\gamma \, , \end{align} \tag{ 24 }$

and minimizing the total loss function with respect to the parameters $a_{[\alpha\beta]\gamma}$ as well.

5. Length-preserving transformations

In this section, we focus on transformations which preserve the Euclidean length of the feature vector, i.e. our oracle ϕ will return

$\begin{align} \varphi(\mathbf{x}) \equiv |\mathbf{x}| = \left\{\sum_{i=1}^n [x^{(i)}]^2 \right\}^{1/2}. \end{align} \tag{ 25 }$

In this case we expect our method to discover the symmetry of the orthogonal group O(n), whose generators can be written in terms of Kronecker deltas as

$\begin{align} \left({\mathbb O}_{mn}\right)^{ij} \; = \; \, \delta_m^i \delta_n^j - \delta_m^j \delta_n^i \, , \end{align} \tag{ 26 }$

in which case the algebra is given by

$\begin{align} \left[{\mathbb O}_{mn}, {\mathbb O}_{pq} \right] = \delta_{np} {\mathbb O}_{mq} + \delta_{mq} {\mathbb O}_{np} - \delta_{mp} {\mathbb O}_{nq} - \delta_{nq} {\mathbb O}_{mp}. \end{align} \tag{ 27 }$

Note that the generators ${\mathbb O}_{mn}$ are labeled by two indices, which indicate the plane of rotation.

5.1. Rotations in two dimensions

In this subsection we start with the simplest case of two dimensions, n = 2, which should correspond to the single-generator group O(2). First we try to generate a single generic (i.e. not necessarily infinitesimal) symmetry transformation. For this purpose, we train our network ${\mathcal F}_{\mathcal W}$ with the invariance loss (8). This exercise was successful and, depending on the initialization, we found various symmetry transformations. They all involved a rotation around the origin in the $(x^{(1)}, x^{(2)})$ plane. One representative symmetry transformation is depicted in figure 2.

**Figure 2.** A representative symmetry transformation preserving the oracle function (25), found by our procedure in the two-dimensional exercise in section 5.1. In this and all subsequent such figures, the arrows represent the displacements ${\mathbf{x}}^{^{\prime}}-{\mathbf{x}}$ resulting from the symmetry transformation. The color map illustrates the oracle function $\varphi(\mathbf{x})$ .
Download figure:
Standard image High-resolution image

**Figure 2.** A representative symmetry transformation preserving the oracle function (25), found by our procedure in the two-dimensional exercise in section 5.1. In this and all subsequent such figures, the arrows represent the displacements ${\mathbf{x}}^{^{\prime}}-{\mathbf{x}}$ resulting from the symmetry transformation. The color map illustrates the oracle function $\varphi(\mathbf{x})$ .
Download figure:
Standard image High-resolution image

Next, we turn to the algebra of symmetry generators. We begin with a single generator, $N_g = 1$ , in which case we do not need to include the orthogonality and closure terms in the loss function. The training is successful and the loss function is driven to zero, as seen by the blue solid line in figure 3. The resulting generator in matrix form is pictorially visualized in the top row of figure 4, and we immediately recognize the familiar matrix ${\mathbb O}_{12}$ from (26) generating rotations in the 12-plane

$\begin{align} {\mathbb O}_{12} = \left( \begin{array}{rr} 0 & 1 \\ -1& 0 \end{array} \right) \, . \end{align} \tag{ 28 }$

Note that the generator has the expected antisymmetric property.

**Figure 3.** The evolution of the training loss with the number of epochs for the two-dimensional exercise in section 5.1. Our algorithm finds one symmetry generator (the training loss shown in blue steadily decreases) but not two different symmetry generators (the training loss shown in orange stays large).
Download figure:
Standard image High-resolution image

**Figure 4.** Top row: a successfully learned generator for n = 2 and $N_g = 1$ . Bottom row: unsuccessfully learned 'generators' for n = 2 and $N_g = 2$ . In this and all subsequent such figures, each panel represents a learned generator ${\mathbb J}_{\alpha}$ in matrix form, where the values of the individual elements of the matrix are indicated by the color bar.
Download figure:
Standard image High-resolution image

**Figure 4.** Top row: a successfully learned generator for n = 2 and $N_g = 1$ . Bottom row: unsuccessfully learned 'generators' for n = 2 and $N_g = 2$ . In this and all subsequent such figures, each panel represents a learned generator ${\mathbb J}_{\alpha}$ in matrix form, where the values of the individual elements of the matrix are indicated by the color bar.
Download figure:
Standard image High-resolution image

Having found one symmetry generator, we next check if there is a second distinct generator. For this purpose, we add the orthogonality and closure terms in the loss function and repeat the training. This time the training is unsuccessful and the loss flattens after about 50 epochs, as seen by the orange solid line in figure 3. In the second row of figure 4 we show the result for the two candidate generators found in this case. We note that while they do have the expected form for a generator of rotations in two dimensions, they are essentially the same transformation, and differ only by an overall sign. This implies that they fail the orthogonality condition—indeed, we find that the dominant contribution to the large total loss in that case is from the orthogonality loss (13). Since the total loss is large and does not improve with further training (see the orange line for $N_g = 2$ in figure 3), these two are not valid generators and should be discarded. We thus conclude that there is no Lie algebra with $N_g = 2$ distinct generators in this scenario.

We note in passing that upon repeated training runs in the $N_g = 2$ case, we sometimes find that the algorithm chooses to create generators which are orthogonal to each other, but are not genuine symmetry transformations, i.e. the orthogonality loss is driven to zero, but only at the expense of a large invariance loss (10). In principle, it is difficult to predict what the machine will choose to do when presented with two mutually exclusive requirements. For example, in some runs the two candidate generators ended up being identical instead of differing by a minus sign. Also note that the normalization of the two candidate generators in the second row of figure 4 is off—this is because the program chose to violate the normalization condition as well, in order to help the orthogonality loss, which is the dominant penalty in that case.

Next, we check the cases with $N_g\gt2$ . Similarly, we find that the training ends up in a large loss and that there is no consistent solution for a closed algebra. We therefore conclude that in this n = 2 example there is only a one-parameter symmetry group with a generator given by (28).

Although the example of this subsection was rather trivial, it did outline and validate the main steps of our method. More complicated examples follow below. In each of those examples, we shall primarily be interested in the cases that lead to valid (sub)algebras, which are of interest to mathematicians. Hence we shall only show results from the successful training runs leading to small values for the loss. As already discussed above, the unsuccessful training runs leading to a large loss rule out the existence of (sub)algebras with that many generators. In those cases the exact value of the loss is determined by secondary factors like the choice of hyperparameters, the initial seed, etc

5.2. Rotations in three dimensions

In this subsection, we proceed to study symmetry transformations which preserve the oracle (25) in n = 3 dimensions. Once again, we find that training with the invariance loss (8) alone always leads to a valid symmetry transformation. We also observe that the matrix form $\mathbb{F}$ is antisymmetric, in agreement with the expectation for the orthogonal group O(3). Now that the data is three-dimensional, however, it is difficult to visualize the symmetry transformation directly as in figure 2. Instead, we choose to plot the symmetry axis (in this case the axis of rotation) defined by the real eigenvector. Figure 5 illustrates the transformations found at different stages of the training for the case of $N_g = 3$ generators. At the top of each panel we list the relative angle in degrees for each pair of axes. Note how the symmetry axes start out oriented at random, but the orthogonality loss term (13) gradually drives them to a mutually orthogonal configuration.

**Figure 5.** The axes of the eigenvectors of the three generators found at intermediate stages of the training: after 1 epoch (top left), after 10 epochs (top right), after 100 epochs (bottom left), and after 300 epochs (bottom right). For convenience, at the top of each panel we list the angle (in degrees) between each pair of axes.
Download figure:
Standard image High-resolution image

In order to analyze the group structure of the found symmetry transformations, we proceed to study the generators of infinitesimal transformations. First we try to determine the dimensionality of the full algebra, i.e. the maximal number of linearly independent generators resulting in a closed algebra. Figure 6 shows loss curves for several different values of N_g: 1, 2, 3 and 4 (we do not show results for $N_g\geqslant 5$ since they had large losses). We observe that the training was successful only for the cases of $N_g = 1$ and $N_g = 3$ . This implies that the full algebra has 3 generators, in agreement with the expectation of $n(n-1)/2$ generators for an orthogonal O(n) group. At the same time, the successful training at $N_g = 1$ reveals a single generator subgroup of O(3) which is nothing but the O(2) discussed in the previous section.

**Figure 6.** The same as figure 3, but for the n = 3 exercise considered in section 5.2. The training results in valid closed algebras with one or three generators, but not two or four.
Download figure:
Standard image High-resolution image

The results from a typical training run at $N_g = 3$ are shown in figure 7, where the top row depicts the three successfully learned generators ${\mathbb J}_{\alpha}$ , $\alpha = 1,2,3$ , while the bottom panel is a pictorial representation of the structure constants in matrix form as follows. Each row (labeled $\alpha\beta = 12,31,23$ ) represents one of the three possible commutators, whereas the columns (labeled $\gamma = 1,2,3$ ) represent the found generators $\mathbb J_\gamma$ shown in the top panels of the figure. Then, the entry in each cell represents the structure constant $a_{[\alpha\beta]\gamma}$ from the defining equation (7).

**Figure 7.** Top row: the successfully learned generators for the n = 3, $N_g = 3$ exercise considered in section 5.2. Bottom panel: the corresponding structure constants in matrix form (see text).
Download figure:
Standard image High-resolution image

A careful inspection of the top row in figure 7 reveals that the three generators found in our example are approximately

$\begin{align} \mathbb J_1 &\approx \left( \begin{array}{rrr} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{array} \right) = -\, \mathbb O_{31},\phantom{;\frac{1}{\sqrt{2}}\, (\mathbb O_{12}+\mathbb O_{23}),} \end{align} \tag{ 29a }$

$\begin{align} \mathbb J_2 &\approx \frac{1}{\sqrt{2}} \left( \begin{array}{rrr} 0 & -1 & 0 \\ 1 & 0 & -1 \\ 0 & 1 & 0 \end{array} \right) = -\,\frac{1}{\sqrt{2}}\, (\mathbb O_{12}+\mathbb O_{23}), \end{align} \tag{ 29b }$

$\begin{align} \mathbb J_3 &\approx \frac{1}{\sqrt{2}} \left( \begin{array}{rrr} 0 & -1 & 0 \\ 1 & 0 & 1 \\ 0 & -1 & 0 \end{array} \right) = -\,\frac{1}{\sqrt{2}}\, (\mathbb O_{12}-\mathbb O_{23}).~~\, \end{align} \tag{ 29c }$

The bottom panel shows that the algebra of the found three generators $\mathbb J_{\alpha}$ is given by

$\begin{align} \left[ \mathbb J_{\alpha}, \mathbb J_{\beta} \right] = -\,\epsilon_{\alpha\beta\gamma} \mathbb J_\gamma , \end{align} \tag{ 30 }$

in which we recognize the usual SO(3) algebra involving the Levi-Civita permutation symbol $\epsilon_{\alpha\beta\gamma}$ . From now on we will be using the Einstein summation convention for repeated generator-type indices $\alpha, \beta, \gamma, \dots$ .

5.3. Rotations in four dimensions

In this subsection, we generalize the discussion from the previous two subsections to the case of n = 4 dimensions. The new twist here will be the existence of multiple nontrivial subalgebras of different dimensionality.

The discovery of a single finite symmetry transformation with the invariance loss (8) is straightforward and always succeeds in finding some finite rotation in four dimensions, represented with an orthogonal matrix. Therefore in this subsection, we focus only on the identification of the (sub)algebras. As before, we repeat the training for various number of multiple distinct generators ( $N_g = 2,3,4,5,6,7$ ) and with the orthogonality and closure losses turned on. Representative loss curves are shown in figure 8. We observe that a closed algebra is found in four of those cases, namely $N_g = 2,3,4,6$ , which we now discuss in turn (the trivial case of $N_g = 1$ is of course always possible and will not be specifically discussed from now on).

**Figure 8.** The same as figure 6, but for the n = 4 exercise considered in section 5.3. The training results in valid closed algebras with one (not shown), two, three, four and six generators, but not five or seven.
Download figure:
Standard image High-resolution image

5.3.1. Two generator subalgebras

One of our found examples of a closed subalgebra with $N_g = 2$ generators is shown in figure 9. The top panels indicate that the two found generators can be roughly approximated as

**Figure 9.** Top row: the successfully learned generators for the n = 4, $N_g = 2$ exercise considered in section 5.3.1. Bottom panel: the corresponding structure constants. The vanishing of the structure constants indicates that this is an abelian subgroup.
Download figure:
Standard image High-resolution image

$\begin{align} \mathbb J_1 &\approx \left( \begin{array}{rrrr} 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right) = {\mathbb O}_{12}, \end{align} \tag{ 31a }$

$\begin{align} \mathbb J_2 &\approx \left( \begin{array}{rrrr} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{array} \right) = {\mathbb O}_{34}, \end{align} \tag{ 31b }$

in which we recognize the rotation generators ${\mathbb O}_{12}$ and ${\mathbb O}_{34}$ from (26). As seen from the matrix forms in (31), these two generators commute, since the two rotations are done in completely different planes. Therefore, the algebra formed by $\mathbb J_1$ and $\mathbb J_2$ is abelian, which is independently verified by the bottom panel in figure 9.

One should keep in mind that we do not control the overall orientation of the generators found by our procedure. The example in figure 9 was judiciously chosen to be easily recognizable in terms of the canonical generators (26). A generic training run typically returns the generator set in some random orientation, which, however, still preserves the commutation properties. One such generic example is shown in figure 10.

**Figure 10.** The same as figure 9, but for a different training run, still with $N_g = 2$ .
Download figure:
Standard image High-resolution image

**Figure 10.** The same as figure 9, but for a different training run, still with $N_g = 2$ .
Download figure:
Standard image High-resolution image

This time, the two found generators $\mathbb J_1$ and $\mathbb J_2$ are more general linear combinations of the six canonical generators ${\mathbb O}_{12}$ , ${\mathbb O}_{13}$ , ${\mathbb O}_{14}$ , ${\mathbb O}_{23}$ , ${\mathbb O}_{24}$ , and ${\mathbb O}_{34}$ of the O(4) group. Nevertheless, the found generators $\mathbb J_1$ and $\mathbb J_2$ still commute and form an abelian two-dimensional subalgebra of the full symmetry group.

5.3.2. Three generator subalgebras

Next we discuss the discovered subalgebras with three generators ( $N_g = 3$ ). Since SO(4) contains SO(3) as a subgroup³ , we know that such subalgebras should exist, and indeed, we find such solutions, as seen in figure 8. As mentioned above, a generic training run produces the three generators in a random orientation. For ease of interpretation, in figure 11 we show a judiciously chosen example, in which the generators depicted in the top panels can be recognized to be approximately

$\begin{align} {\mathbb J}_{1} \approx {\mathbb O}_{24}, \qquad {\mathbb J}_{2} \approx {\mathbb O}_{23},\qquad {\mathbb J}_{3} \approx {\mathbb O}_{34}. \end{align} \tag{ 32 }$

This algebra involves rotations primarily in the last three feature dimensions, while the first feature is largely unaffected by the symmetry. The bottom panel of figure 11 confirms that this is the SO(3) algebra from equation (30). Of course, a more generic training run results in a set of three generators that involve all four feature dimensions, but still have the same commutation relations.

**Figure 11.** Top row: the successfully learned generators for the n = 4, $N_g = 3$ exercise considered in section 5.3.2. Bottom panel: the corresponding structure constants, which can be identified as those of the SO(3) algebra (see text).
Download figure:
Standard image High-resolution image

**Figure 11.** Top row: the successfully learned generators for the n = 4, $N_g = 3$ exercise considered in section 5.3.2. Bottom panel: the corresponding structure constants, which can be identified as those of the SO(3) algebra (see text).
Download figure:
Standard image High-resolution image

5.3.3. Four generator subalgebras

Figure 8 shows that our method finds an algebra with four generators as well. To see its origin theoretically, define a new generator basis in terms of sums and differences of pairs of commuting generators from the original basis (26) [53]

$\begin{align} {\mathbb S}_1 \equiv \frac{1}{2}\left( {\mathbb O}_{34}+{\mathbb O}_{12}\right),\quad {\mathbb D}_1 \equiv \frac{1}{2}\left( {\mathbb O}_{34}-{\mathbb O}_{12}\right), \end{align} \tag{ 33a }$

$\begin{align} {\mathbb S}_2 \equiv \frac{1}{2}\left( {\mathbb O}_{42}+{\mathbb O}_{13}\right),\quad {\mathbb D}_2 \equiv \frac{1}{2}\left( {\mathbb O}_{42}-{\mathbb O}_{13}\right), \end{align} \tag{ 33b }$

$\begin{align} {\mathbb S}_3 \equiv \frac{1}{2}\left( {\mathbb O}_{23}+{\mathbb O}_{14}\right),\quad {\mathbb D}_3 \equiv \frac{1}{2}\left( {\mathbb O}_{23}-{\mathbb O}_{14}\right). \end{align} \tag{ 33c }$

This change of basis decouples the algebra (27) of the original generators as follows

$\begin{align} \left[ {\mathbb S}_i, {\mathbb S}_j\right] &= -\,\epsilon_{ijk}\,{\mathbb S}_k, \end{align} \tag{ 34a }$

$\begin{align} \left[ {\mathbb D}_i, {\mathbb D}_j\right] &= -\,\epsilon_{ijk}\,{\mathbb D}_k, \end{align} \tag{ 34b }$

$\begin{align} \left[ {\mathbb S}_i, {\mathbb D}_j\right] &= 0.\qquad\quad \end{align} \tag{ 34c }$

Therefore, a closed algebra of four generators can be formed either from the set of three $\mathbb S$ 's plus any one of the $\mathbb D$ 's, or from the set of three $\mathbb D$ 's plus any one of the $\mathbb S$ 's. In either case, three of the generators will satisfy SO(3)-type commutation relations, while the fourth one will commute with everyone else. This expectation is confirmed in figure 12, which shows our usual representation of the found generators and their algebra for one representative result from a training run with $N_g = 4$ . We observe that the found generators are approximately

**Figure 12.** Top row: the successfully learned generators for the n = 4, $N_g = 4$ exercise considered in section 5.3.3. Bottom panel: the corresponding structure constants.
Download figure:
Standard image High-resolution image

**Figure 12.** Top row: the successfully learned generators for the n = 4, $N_g = 4$ exercise considered in section 5.3.3. Bottom panel: the corresponding structure constants.
Download figure:
Standard image High-resolution image

$\begin{align} {\mathbb J}_1 \equiv \frac{1}{2}\left( {\mathbb O}_{21}+{\mathbb O}_{43}\right) = -\, {\mathbb S}_1, \end{align} \tag{ 35a }$

$\begin{align} {\mathbb J}_2 \equiv \frac{1}{2}\left( {\mathbb O}_{32}+{\mathbb O}_{14}\right) = -\, {\mathbb D}_3, \end{align} \tag{ 35b }$

$\begin{align} {\mathbb J}_3 \equiv \frac{1}{2}\left( {\mathbb O}_{13}+{\mathbb O}_{42}\right)= \, {\mathbb S}_2,\quad \end{align} \tag{ 35c }$

$\begin{align} {\mathbb J}_4 \equiv \frac{1}{2}\left( {\mathbb O}_{14}+{\mathbb O}_{23}\right)= \, {\mathbb S}_3.\quad \end{align} \tag{ 35d }$

Therefore, ${\mathbb J}_1$ , ${\mathbb J}_3$ and ${\mathbb J}_4$ form an SO(3) algebra as implied by equation (34a ), and furthermore, all three of them commute with ${\mathbb J}_2$ , as implied by equation (34c ). This pattern is precisely what we observe in the lower panel of figure 12.

5.3.4. Six generator algebras

Finally, we get to the case of N_g which will reveal the full algebra of SO(4). Figure 13 shows the result from a representative training run seeking a closed algebra with $N_g = 6$ generators. Among the set of learned generators we can approximately recognize ${\mathbb J}_1 \approx {\mathbb O}_{43}$ , ${\mathbb J}_2 \approx -{\mathbb D}_{3}$ , ${\mathbb J}_3 \approx {\mathbb S}_{3}$ , ${\mathbb J}_4 \approx {\mathbb O}_{12}$ , ${\mathbb J}_5 \approx {\mathbb O}_{42}$ and ${\mathbb J}_6 \approx {\mathbb O}_{13}$ .

**Figure 13.** Top row: the successfully learned generators for the n = 4, $N_g = 6$ exercise considered in section 5.3.4. Bottom panel: the corresponding structure constants.
Download figure:
Standard image High-resolution image

The analysis of the last three subsections can be readily extended to even higher dimensions ( $n\geqslant 5$ ). We have checked a few more values of n and the method works each time—we obtain valid finite symmetry transformations which preserve the oracle (25), we find the closed algebra of the full set of $n(n-1)/2$ generators of the orthogonal O(n) group, as well as valid subalgebras. For fun, in figure 28 we depict our derived 45 generators of the SO(10) group.

In conclusion of our discussion of orthogonal groups, in figure 14 we summarize our results for the found closed algebras and subalgebras for $n\leqslant 5$ and different number of generators N_g. Each cell in the table represents a separate exercise at a fixed number of dimensions n and for a fixed number of generators N_g. The cells are color coded by the base-10 logarithm of the lowest value of the loss attained during training. The training is terminated once the loss reaches 10⁻¹², therefore blue cells correspond to successful training runs resulting in closed algebras. The right-most blue cell in each row corresponds to the full algebra, in this case SO(n), as indicated. The preceding blue cells correspond to various subalgebras. In fact we did not anticipate the existence of the $N_g = 4$ subalgebras in the case of n = 4 and n = 5, but our model surprised us.

**Figure 14.** Results from our search for Lie algebras of transformations preserving the Euclidean oracle (25) in $n = 2,3,4,5$ dimensions and for different number of generators N_g. The cells are color coded by the base-10 logarithm of the lowest value of the loss attained during training.
Download figure:
Standard image High-resolution image

6. Lorentz transformations in four dimensional Minkowski space

In this section we consider the four-dimensional Minkowski spacetime $(t,x,y,z)$ (in natural units with c = 1). The usual Lorentz transformations preserve the quadratic form

$\begin{align} \varphi(t,x,y,z) = t^{\,2}-x^2-y^2-z^2, \quad (t,x,y,z)\in {\mathbb R}^4, \end{align} \tag{ 36 }$

hence this will be our oracle in this section. The four input features are

$\begin{align} x^{(0)}=t, \quad x^{(1)}=x, \quad x^{(2)}=y, \quad x^{(3)}=z, \end{align} \tag{ 37 }$

where in keeping with the standard physics notation we start labelling the features from 0.

Our analysis proceeds as usual. First, we find symmetry transformations which are in general combinations of boosts and rotations. Upon inspection, we verify that they have the desired symmetry properties

$\begin{align*} {\mathbb F}_{0i}={\mathbb F}_{i0},~~ {\mathbb F}_{ij}=-{\mathbb F}_{ji},~~ {\mathbb F}_{00}={\mathbb F}_{ii}=0,~~ \forall i=1,2,3. \end{align*}$

Next we analyze the algebras of generators. Before presenting the numerical results, for the reader's convenience we summarize some relevant information about the mathematical structure of the Lorentz group $O(1,3)$ . It has six generators: the three generators of boosts ${\mathbb K}_i$ ,

$\begin{align} {\mathbb K}_1 &= \left( \begin{array}{rrrr} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right), \end{align} \tag{ 38a }$

$\begin{align} {\mathbb K}_2 &= \left( \begin{array}{rrrr} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right), \end{align} \tag{ 38b }$

$\begin{align} {\mathbb K}_3 &= \left( \begin{array}{rrrr} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{array} \right), \end{align} \tag{ 38c }$

and the three generators of rotations, ${\mathbb L}_i$ , given by

$\begin{align} {\mathbb L}_1 &= \left( \begin{array}{rrrr} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{array} \right), \end{align} \tag{ 39a }$

$\begin{align} {\mathbb L}_2 &= \left( \begin{array}{rrrr} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{array} \right), \end{align} \tag{ 39b }$

$\begin{align} {\mathbb L}_3 &= \left( \begin{array}{rrrr} 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right) , \end{align} \tag{ 39c }$

The full Lorentz algebra can be summarized as

$\begin{align} [{\mathbb L}_i, {\mathbb L}_j] = \epsilon_{ijk}\,{\mathbb L}_k,\!\phantom{-} \end{align} \tag{ 40 }$

$\begin{align} [{\mathbb L}_i, {\mathbb K}_j] = \epsilon_{ijk}\, {\mathbb K}_k,\phantom{..} \end{align} \tag{ 41 }$

$\begin{align} [{\mathbb K}_i, {\mathbb K}_j] = - \epsilon_{ijk}\, {\mathbb L}_k.\,\, \end{align} \tag{ 42 }$

The subgroup structure is most easily seen in a different basis,

$\begin{align} {\mathbb X}_1 = {\mathbb K}_1 - {\mathbb L}_2, \end{align} \tag{ 43a }$

$\begin{align} {\mathbb X}_2 = {\mathbb K}_2 + {\mathbb L}_1, \end{align} \tag{ 43b }$

$\begin{align} {\mathbb X}_3 = {\mathbb K}_3,\phantom{\ + {\mathbb L}_1} \end{align} \tag{ 43c }$

$\begin{align} {\mathbb X}_4 = {\mathbb L}_3,\phantom{\ + {\mathbb L}_1} \end{align} \tag{ 43d }$

$\begin{align} {\mathbb X}_5 = {\mathbb L}_2,\phantom{\ + {\mathbb L}_1} \end{align} \tag{ 43e }$

$\begin{align} {\mathbb X}_6 = {\mathbb L}_1.\phantom{\ + {\mathbb L}_1} \end{align} \tag{ 43f }$

The resulting algebra in the ${\mathbb X}_i$ basis is summarized in table 1. By inspection, we see that the Lorentz algebra has several non-trivial subalgebras.

Table 1. The Lorentz algebra in terms of the generators (43). The cells show the result from commuting one of the generators in the leftmost column with one of the generators in the top row.

	${\mathbb K}_1-{\mathbb L}_2$	${\mathbb K}_2+{\mathbb L}_1$	${\mathbb K}_3$	${\mathbb L}_3$	${\mathbb L}_2$	${\mathbb L}_1$
	${\mathbb X}_1$	${\mathbb X}_2$	${\mathbb X}_3$	${\mathbb X}_4$	${\mathbb X}_5$	${\mathbb X}_6$
${\mathbb X}_1$	0	0	$-{\mathbb X}_1$	$-{\mathbb X}_2$	$+{\mathbb X}_3$	$+{\mathbb X}_4$
${\mathbb X}_2$	0	0	$-{\mathbb X}_2$	$+{\mathbb X}_1$	$+{\mathbb X}_4$	$-{\mathbb X}_3$
${\mathbb X}_3$	$+{\mathbb X}_1$	$+{\mathbb X}_2$	0	0	$-{\mathbb X}_1-{\mathbb X}_5$	${\mathbb X}_2-{\mathbb X}_6$
${\mathbb X}_4$	$+{\mathbb X}_2$	$-{\mathbb X}_1$	0	0	$-{\mathbb X}_6$	$+{\mathbb X}_5$
${\mathbb X}_5$	$-{\mathbb X}_3$	$-{\mathbb X}_4$	$+{\mathbb X}_1+{\mathbb X}_5$	$+{\mathbb X}_6$	0	$-{\mathbb X}_4$
${\mathbb X}_6$	$-{\mathbb X}_4$	$+{\mathbb X}_3$	$-{\mathbb X}_2+{\mathbb X}_6$	$-{\mathbb X}_5$	$+{\mathbb X}_4$	0

6.1. Two generator subalgebras

There are several two-dimensional subalgebras:

Abelian subalgebras. There are two abelian subalgebras: the first one is generated by the generator set $\{{\mathbb X}_1,{\mathbb X}_2\}$ , while the second is generated by $\{{\mathbb X}_3,{\mathbb X}_4\}$ . A representative example of the former case is shown in figure 15. We recognize the two found generators to be approximately
$\begin{align} {\mathbb J}_1 &\approx -\left( {\mathbb K}_3 - {\mathbb L}_1 \right), \end{align} \tag{ 44a }$

$\begin{align} {\mathbb J}_2 &\approx -\left( {\mathbb K}_1 + {\mathbb L}_3 \right). \end{align} \tag{ 44b }$
This result is in agreement with equations (43a ) and (43b )—the transformations are combinations of boosts along and rotations about two of the axes, in this case x and z.
Nonabelian subalgebras. As seen in table 1, the Lie algebra of the Lorentz group has two nonabelian subalgebras, generated by $\{{\mathbb X}_1, {\mathbb X}_3\}$ and $\{{\mathbb X}_2,{\mathbb X}_3\}$ , respectively. A representative training example for this nonabelian case is shown in figure 16. The top panels show that the two found generators are approximately
$\begin{align} {\mathbb J}_1 &\approx -\, {\mathbb K}_3 = -\, {\mathbb X}_3,\kern3.9pt \end{align} \tag{ 45a }$

$\begin{align} {\mathbb J}_2 &\approx {\mathbb K}_1 - {\mathbb L}_2 = {\mathbb X}_1, \end{align} \tag{ 45b }$

while the bottom panel confirms that their Lie bracket is approximately given by $[{\mathbb J}_1, {\mathbb J}_2]\approx [-{\mathbb X}_3,{\mathbb X}_1] =$ $-{\mathbb X}_1 = -\, {\mathbb J}_2$ , as expected from table 1.

**Figure 15.** Top row: successfully learned generators for the n = 4, $N_g = 2$ exercise considered in section 6.1, using the pseudo-Euclidean oracle (36). Bottom panel: the corresponding structure constants. This example shows an abelian subalgebra.
Download figure:
Standard image High-resolution image

**Figure 15.** Top row: successfully learned generators for the n = 4, $N_g = 2$ exercise considered in section 6.1, using the pseudo-Euclidean oracle (36). Bottom panel: the corresponding structure constants. This example shows an abelian subalgebra.
Download figure:
Standard image High-resolution image

**Figure 16.** The same as figure 15, but showing a non-abelian subalgebra found by our method.
Download figure:
Standard image High-resolution image

6.2. Three generator subalgebras

Table 1 reveals several three-dimensional subalgebras:

SO(3). The set $\{{\mathbb X}_4,{\mathbb X}_5,{\mathbb X}_6\}$ generates a subalgebra isomorphic to the Lie algebra of the SO(3) group of rotations in three dimensions, see section 5.2. A representative training example is shown in figure 17. The generators can be recognized as
$\begin{align} {\mathbb J}_1 &\approx {\mathbb L}_3, \end{align} \tag{ 46a }$

$\begin{align} {\mathbb J}_2 &\approx {\mathbb L}_1, \end{align} \tag{ 46b }$

$\begin{align} {\mathbb J}_3 &\approx {\mathbb L}_2. \end{align} \tag{ 46c }$
In addition, we also found examples with subalgebras isomorphic to SO(3) consisting of one rotation and two boosts, or one boost and two rotations.
Hom(2). The set $\{{\mathbb X}_1,{\mathbb X}_2,{\mathbb X}_3\}$ generates a subalgebra isomorphic to the Lie algebra of Hom(2), the group of Euclidean homotheities:
$\begin{align} [{\mathbb X}_1, {\mathbb X}_2] = 0, ~ [{\mathbb X}_3, {\mathbb X}_1] = {\mathbb X}_1, ~ [{\mathbb X}_2, {\mathbb X}_3] = -{\mathbb X}_2. \end{align} \tag{ 47 }$
A representative training example is shown in figure 18. From the panels in the top row we recognize the found three generators as
$\begin{align} {\mathbb J}_1 &\approx +\, {\mathbb K}_2 -\, {\mathbb L}_1, \end{align} \tag{ 48a }$

$\begin{align} {\mathbb J}_2 &\approx -\, {\mathbb K}_1 -\, {\mathbb L}_2, \end{align} \tag{ 48b }$

$\begin{align} {\mathbb J}_3 &\approx +\, {\mathbb K}_3.\phantom{{\mathbb L}_2,\quad} \end{align} \tag{ 48c }$
The bottom panel in figure 18 confirms that their algebra is approximately that of equation (47).
E(2). The set $\{{\mathbb X}_1,{\mathbb X}_2,{\mathbb X}_4\}$ generates a subalgebra isomorphic to the Lie algebra of E(2), the Euclidean group:
$\begin{align} [{\mathbb X}_1, {\mathbb X}_2] = 0, ~ [{\mathbb X}_4, {\mathbb X}_1] = {\mathbb X}_2, ~ [{\mathbb X}_2, {\mathbb X}_4] = {\mathbb X}_1. \end{align} \tag{ 49 }$
A representative example is shown in figure 19, from which we recognize the found three generators as
$\begin{align} {\mathbb J}_1 &\approx -\, {\mathbb K}_3 -\, {\mathbb L}_2, \end{align} \tag{ 50a }$

$\begin{align} {\mathbb J}_2 &\approx -\, {\mathbb L}_1,\quad\kern14.5pt \end{align} \tag{ 50b }$

$\begin{align} {\mathbb J}_3 &\approx -\, {\mathbb K}_2 +\, {\mathbb L}_3. \end{align} \tag{ 50c }$
As expected, two of the generators involve combinations of boosts along and rotations about two of the axes, in this case y and z, while the third generator is a rotation about the remaining axis, namely x. The bottom panel in figure 19 shows that the resulting algebra is approximately that of equation (49).
Bianchi VII ${}_a$ . The set $\{{\mathbb X}_1,{\mathbb X}_2,{\mathbb X}_3+a{\mathbb X}_4\}$ with $a\ne 0$ generates a Bianchi VII ${}_a$ subalgebra. A representative training example is shown in figure 20, from which we recognize the found three generators as
$\begin{align} {\mathbb J}_1 &\approx -\, {\mathbb K}_3 -\, {\mathbb L}_3 = -({\mathbb X}_3+{\mathbb X}_4), \end{align} \tag{ 51a }$

$\begin{align} {\mathbb J}_2 &\approx +\, {\mathbb K}_1 -\, {\mathbb L}_2 = {\mathbb X}_1,\phantom{=\quad{\mathbb X}_5.} \end{align} \tag{ 51b }$

$\begin{align} {\mathbb J}_3 &\approx -\, {\mathbb K}_2 -\, {\mathbb L}_1 = -\, {\mathbb X}_2.\phantom{+{\mathbb X}_5.} \end{align} \tag{ 51c }$
This set corresponds to a Bianchi VII ${}_a$ subalgebra with a = 1, whose structure constants are indeed consistent with the lower panel in figure 20.
SL(2,R). The set $\{{\mathbb X}_1,{\mathbb X}_3,{\mathbb X}_5\}$ generates a subalgebra isomorphic to the Lie algebra of $SL(2,R)$ , the group of isometries of the hyperbolic plane:
$\begin{align} [{\mathbb X}_1, {\mathbb X}_3] &= -\, {\mathbb X}_1,\phantom{{\mathbb X}_5.\quad\!} \end{align} \tag{ 52a }$

$\begin{align} [{\mathbb X}_5, {\mathbb X}_1] &= -\, {\mathbb X}_3,\phantom{{\mathbb X}_5\quad.\!} \end{align} \tag{ 52b }$

$\begin{align} [{\mathbb X}_3, {\mathbb X}_5] &= -\, {\mathbb X}_1 -\, {\mathbb X}_5. \end{align} \tag{ 52c }$
A representative training example is shown in figure 21, from which we recognize the found three generators as
$\begin{align} {\mathbb J}_1 &\approx {\mathbb K}_1 - {\mathbb L}_3, \end{align} \tag{ 53a }$

$\begin{align} {\mathbb J}_2 &\approx -\, {\mathbb L}_3, \end{align} \tag{ 53b }$

$\begin{align} {\mathbb J}_3 &\approx {\mathbb K}_2.\kern21.5pt \end{align} \tag{ 53c }$
It is easy to verify that their structure constants given in the lower panel of figure 21 are consistent with equation (52).

**Figure 17.** Top row: the successfully learned generators for the n = 4, $N_g = 3$ exercise considered in section 6.2, using the pseudo-Euclidean oracle (36). Bottom panel: the corresponding structure constants, which can be identified as those of an *SO(3)* type subalgebra.
Download figure:
Standard image High-resolution image

**Figure 17.** Top row: the successfully learned generators for the n = 4, $N_g = 3$ exercise considered in section 6.2, using the pseudo-Euclidean oracle (36). Bottom panel: the corresponding structure constants, which can be identified as those of an *SO(3)* type subalgebra.
Download figure:
Standard image High-resolution image

**Figure 18.** The same as figure 17, but showing an example of a *Hom(2)* subalgebra (47).
Download figure:
Standard image High-resolution image

**Figure 19.** The same as figure 17, but showing an example of a E(2) subalgebra (49).
Download figure:
Standard image High-resolution image

**Figure 20.** The same as figure 17, but showing an example of a Bianchi VII ${}_a$ subalgebra.
Download figure:
Standard image High-resolution image

**Figure 20.** The same as figure 17, but showing an example of a Bianchi VII ${}_a$ subalgebra.
Download figure:
Standard image High-resolution image

**Figure 21.** The same as figure 17, but showing an example of an $SL(2,R)$ subalgebra (52).
Download figure:
Standard image High-resolution image

**Figure 21.** The same as figure 17, but showing an example of an $SL(2,R)$ subalgebra (52).
Download figure:
Standard image High-resolution image

6.3. Four generator subalgebras

The only four-dimensional subalgebra is generated by $\{{\mathbb X}_1,{\mathbb X}_2,{\mathbb X}_3,{\mathbb X}_4\}$ . A representative example is shown in figure 22. The four found generators are given by

**Figure 22.** Top row: the successfully learned generators for the n = 4, $N_g = 4$ exercise considered in section 6.3, using the pseudo-Euclidean oracle (36). Bottom panel: the corresponding structure constants.
Download figure:
Standard image High-resolution image

$\begin{align} {\mathbb J}_1 &\approx {\mathbb L}_3 = {\mathbb X}_4, \end{align} \tag{ 54a }$

$\begin{align} {\mathbb J}_2 &\approx {\mathbb K}_2 + {\mathbb L}_1 = {\mathbb X}_2, \end{align} \tag{ 54b }$

$\begin{align} {\mathbb J}_3 &\approx -\, {\mathbb K}_1 + {\mathbb L}_2 = -\, {\mathbb X}_1, \end{align} \tag{ 54c }$

$\begin{align} {\mathbb J}_4 &\approx -\, {\mathbb K}_3 = -\, {\mathbb X}_3. \end{align} \tag{ 54d }$

The lower panel of figure 22 illustrates the corresponding structure constants. The results are consistent with table 1. For example, ${\mathbb J}_1$ and ${\mathbb J}_4$ commute because $[{\mathbb X}_3,{\mathbb X}_4] = [{\mathbb K}_3,{\mathbb L}_3] = 0$ . Similarly, ${\mathbb J}_2$ and ${\mathbb J}_3$ commute due to $[{\mathbb X}_1,{\mathbb X}_2] = 0$ . The results shown in the remaining rows in the lower panel of figure 22 can be analogously verified with the help of table 1.

6.4. Six generator algebras

Our method is capable of finding the full six-dimensional algebra as well. The result is shown in figure 23, and can be roughly identified as

**Figure 23.** Top row: the successfully learned generators for the n = 4, $N_g = 6$ exercise considered in section 6.4, using the pseudo-Euclidean oracle (36). Bottom panel: the corresponding structure constants.
Download figure:
Standard image High-resolution image

$\begin{align} {\mathbb J}_1 &\approx -\, {\mathbb L}_1, \quad {\mathbb J}_2 \approx -\, {\mathbb K}_2, \quad {\mathbb J}_3 \approx -\, {\mathbb L}_3, \end{align} \tag{ 55a }$

$\begin{align} {\mathbb J}_4 &\approx +\, {\mathbb K}_1, \quad {\mathbb J}_5 \approx -\, {\mathbb K}_3, \quad {\mathbb J}_6 \approx -\, {\mathbb L}_2. \end{align} \tag{ 55b }$

The corresponding structure constants are shown in the lower panel of figure 23.

Figure 24 summarizes the results for the value of the overall loss function obtained in typical training runs, as a function of the requested number of generators N_g. The green circles represent the cases when a closed subalgebra was successfully found, and the loss is essentially machine zero. The remaining cases in figure 24 correspond to unsuccessful trainings, i.e. no closed algebra with that many generators was found. In those cases, the total loss remained relatively large, due to tension with either the orthogonality condition (red crosses) or the closure condition (magenta diamonds).

**Figure 24.** The final value of the overall loss function as a function of the requested number of generators N_g for the $SO(1,3)$ example of section 6. The colored symbols identify the dominant contribution to the overall loss: magenta diamonds for the closure term (14) and red crosses for the orthogonality term (13). For the green circles the total loss is zero to within machine precision. The learning rate was 0.005 and ε = 0.001.
Download figure:
Standard image High-resolution image

7. Squeeze mapping in two dimensions

Consider again two dimensions, but now let the oracle return the product of the two input features:

$\begin{align} \varphi(\mathbf{x}) = x^{(1)}\, x^{(2)}. \end{align} \tag{ 56 }$

This oracle function is illustrated in figure 25 as a color heatmap representing the oracle values in the $(x^{(1)}, x^{(2)})$ Cartesian plane. We see that the set of points with the same oracle values form hyperbolas.

Proceeding as before, our method finds the symmetry transformation illustrated with the vector field in figure 25. The corresponding generator is

$\begin{align} \mathbb{J} = \left( \begin{array}{rr} -1 & 0 \\ 0 & 1 \end{array} \right). \end{align} \tag{ 57 }$

We verified that in this example the method is unable to find more than one generator.

These results are precisely what one would expect. The symmetry which preserves the oracle (56) is the squeeze mapping

$\begin{align} \mathbb F = \left( \begin{array}{cc} \frac{1}{\ell} & 0 \\ 0 & \ell \end{array} \right), \end{align} \tag{ 58 }$

where $\ell$ is a scale factor. Considering infinitesimal transformations (18) with $\ell = 1 + \varepsilon$ immediately leads to the generator (57).

8. Discontinuous oracles

8.1. Piecewise linear oracle

Our setup is not limited to only continuous oracle functions like the ones discussed so far in the preceeding sections. Our method can also work with piecewise-defined functions like

$\begin{align} \varphi(\mathbf{x}) = \left\{ \begin{array}{ll} -x^{(2)}, \qquad \mathrm{for}\,x^{(1)} < 0, \\ +x^{(1)}, \qquad \mathrm{for}\,x^{(1)}\geqslant 0. \end{array} \right. \end{align} \tag{ 59 }$

The resulting oracle function is illustrated with the color map in figure 26. Since the oracle is now a function which is not continuously differentiable, our parametrization of the symmetry transformation needs to be properly generalized.

The advantage of using a NN as a universal function approximator is highlighted in figure 26. In the top panel we use no hidden layers, while in the bottom panel we use a deep learning architecture with three hidden layers. The found symmetry transformation in each case is then shown with the vector field and superimposed on the oracle color map. The results are noticeably different, particularly near the locations of discontinuity in the oracle function. The deep-learning approach in the bottom panel correctly identifies a transformation which preserves the oracle values everywhere within the considered domain. In contrast, the shallow approach in the top panel is unable to adjust the transformation near the discontinuity, which leads to locations near the boundary $x^{(1)} = 0$ where the arrows run across the equipotential contours, violating the conservation law.

8.2. Manhattan distance oracle

In this subsection we consider one more example of an oracle in two dimensions (n = 2), which, while continuous, is not continuously differentiable:

$\begin{align} \varphi(\mathbf{x}) = |x^{(1)}|+|x^{(2)}|. \end{align} \tag{ 60 }$

The results from our procedure are shown in figure 27 in complete analogy to figure 26. In the top panel we use no hidden layers, while in the bottom panel we use a deep learning architecture with three hidden layers and bias. We observe that the deep-learning approach can again correctly handle the discontinuities, always generating transformations along, but never across, the contours of equal oracle function values.

9. Conclusions

In this paper, we studied a fundamental problem in data science which is commonly encountered in many fields: what is the symmetry of a labeled dataset, and how to identify its group structure? For this purpose, we designed a deep-learning method which models the generic symmetry transformation and its generators with a fully connected NN. We then constructed suitable loss functions which ensure that the applied transformations are symmetries and that the corresponding set of generators forms a closed (sub)algebra. An important advantage of our approach is that we do not require any advance knowledge of what symmetries can be expected in the data, i.e. instead of testing for symmetries from a predefined list of possibilities, we learn the symmetry directly.

Our procedure is very general and is universally applicable in a wide variety of situations (see, e.g. [54]). The centerpiece of our analysis is an oracle $\varphi(\mathbf{x})$ which defines the conserved quantity. The oracle can be completely general, as illustrated with several examples in the paper. For example, it can be a continuous bilinear function, as in the case of the Euclidean or Minkowski distances and the squeeze mapping; the corresponding symmetries were discussed in sections 5–7. It can also be a discontinuous function (section 8.1) or a function which is not smooth and continuously differentiable (section 8.2).

In the process of deriving the full set of symmetries, the method also allows us to analyze the complete subgroup structure of the symmetry group. In the paper we worked out explicitly the subgroup structure of the rotation groups SO(2) (section 5.1), SO(3) (section 5.2) and SO(4) (section 5.3), as well as the Lorentz group $SO(1,3)$ ((section 6)). As one last tour de force example, we successfully applied our method to the case of Euclidean length-preserving rotations in n = 10 dimensions. The resulting 45 generators of the corresponding group SO(10) commonly used in grand unification [55] are shown in figure 28.

**Figure 28.** The set of $N_g = 45$ generators found by our method in the case of the SO(10) group (length-preserving rotations in n = 10 dimensions). Due to the complexity of the problem, the learning rate for this example was reduced to 0.01.
Download figure:
Standard image High-resolution image

**Figure 28.** The set of $N_g = 45$ generators found by our method in the case of the SO(10) group (length-preserving rotations in n = 10 dimensions). Due to the complexity of the problem, the learning rate for this example was reduced to 0.01.
Download figure:
Standard image High-resolution image

The symmetries discussed in this paper have important implications for simulation-based inference, and in particular parameter retrieval from observations. In a typical inverse problem scenario, the (possibly multi-dimensional) labels y play the role of observed variables, and the features x play the role of input parameters. The parameter retrieval task is to infer x given y. In the presence of a symmetry this inversion is not unique, but results in a whole family of valid input parameters, all related by a symmetry. Therefore, knowing that there is a symmetry in the data to begin with can be very useful in parameter retrieval algorithms, especially when the data is very high dimensional and the symmetry is difficult to see by the human eye.

Note that the same approach can be extended or modified to solve other common problems and tasks in group theory and/or Beyond-Standard-Model (BSM) model building not discussed in this paper.

Derivation of the Cartan subalgebra. In our approach, this can be trivially accomplished by setting $a_{[\alpha\beta]\gamma} = 0$ in (15), which will force all the learned generators to commute.
Direct product decompositions. We can also look for symmetries whose algebras can be expressed as direct sums of two separate subalgebras—we just have to include closure terms in the loss function for each of the two individual subalgebras.
Internal symmetries. Note that our approach can be readily generalized to look for internal symmetries [41], which can prove useful in model building in high energy theory.
Canonical basis for the generators. Since our method is basis-independent, the generators are obtained in a generic basis, where the structure constants may be difficult to recognize. We could, however, aid the program in finding a canonical basis of generators by including a term in the loss function which encourages sparsity among the structure constants [56].

In this paper we focused on idealized examples with no noise or statistical fluctuations in the determination of the labels. It will be instructive to test our method on noisy datasets, including real experimental data. We postpone this study to a future publication.

In conclusion, our study opens the door for using a ML approach in the study of Lie groups and their properties and further bridges the fields of formal mathematics and theoretical computer science.

Acknowledgment

We thank A Davis, S Gleyzer, K Kong, S Mrenna, H Prosper and P Shyamsundar for useful discussions. We thank P Ramond for group theory insights and inspiration. This work is supported in part by the U.S. Department of Energy Award Number DE-SC0022148.

Data availability statement

The data that support the findings of this study are openly available at the following URL/DOI: https://github.com/royforestano/Deep_Learning_Symmetries [57].

Deep learning symmetries and their Lie groups, algebras, and subalgebras from first principles

Article metrics

Submit

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Setup and notations

3. Deep learning approach

3.1. Invariance

3.2. Infinitesimality

3.3. Orthogonality

3.4. Closure

4. Linear algebra approach

5. Length-preserving transformations

5.1. Rotations in two dimensions

5.2. Rotations in three dimensions

5.3. Rotations in four dimensions

5.3.1. Two generator subalgebras

5.3.2. Three generator subalgebras

5.3.3. Four generator subalgebras

5.3.4. Six generator algebras

6. Lorentz transformations in four dimensional Minkowski space

6.1. Two generator subalgebras

6.2. Three generator subalgebras

6.3. Four generator subalgebras

6.4. Six generator algebras

7. Squeeze mapping in two dimensions

8. Discontinuous oracles

8.1. Piecewise linear oracle

8.2. Manhattan distance oracle

9. Conclusions

Acknowledgment

Data availability statement

Footnotes

Deep learning symmetries and their Lie groups, algebras, and subalgebras from first principles

Article metrics

Submit

Share this article

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Setup and notations

3. Deep learning approach

3.1. Invariance

3.2. Infinitesimality

3.3. Orthogonality

3.4. Closure

4. Linear algebra approach

5. Length-preserving transformations

5.1. Rotations in two dimensions

5.2. Rotations in three dimensions

5.3. Rotations in four dimensions

5.3.1. Two generator subalgebras

5.3.2. Three generator subalgebras

5.3.3. Four generator subalgebras

5.3.4. Six generator algebras

6. Lorentz transformations in four dimensional Minkowski space

6.1. Two generator subalgebras

6.2. Three generator subalgebras

6.3. Four generator subalgebras

6.4. Six generator algebras

7. Squeeze mapping in two dimensions

8. Discontinuous oracles

8.1. Piecewise linear oracle

8.2. Manhattan distance oracle

9. Conclusions

Acknowledgment

Data availability statement

Footnotes