Detecting symmetries with neural networks

Identifying symmetries in data sets is generally difficult, but knowledge about them is crucial for efficient data handling. Here we present a method by which neural networks can be used to identify symmetries. We make extensive use of the structure in the embedding layer of the neural network, which allows us to identify whether a symmetry is present and to identify orbits of the symmetry in the input. To determine which continuous or discrete symmetry group is present, we analyse the invariant orbits in the input. We present examples based on the rotation groups SO(n) and the unitary group SU(2). Furthermore, we find that this method is useful for the classification of complete intersection Calabi-Yau manifolds, where it is crucial to identify discrete symmetries on the input space. For this example we present a novel data representation in terms of graphs.


Introduction
One ubiquitous feature in nature is the presence of symmetries, ranging from the ultra-small, captured by the symmetries underlying the Standard Model of Particle Physics, to the isotropy and homogeneity of our Universe on cosmological scales, and even to everyday life, when one wants to identify objects in a picture with a neural network. The question we pursue in this paper is: can we use neural networks to detect symmetries in an underlying data product?
We present a method that is suitable for problems in which we have samples of a function f(x_input) of the input variables, as is the case in supervised learning. The presence of a symmetry is simply the statement that inputs transformed under some symmetry transformation, x_input → S(x_input), lead to the same output: f(S(x_input)) = f(x_input).
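Once a candidate transformation S is given, the invariance condition f(S(x_input)) = f(x_input) can be tested numerically. The following minimal Python/NumPy sketch (helper names are ours, not from the paper) draws a random SO(3) rotation and checks the condition on a batch of inputs for one invariant and one non-invariant function. It only illustrates the definition; the method described in this paper does not assume that S is known.

```python
import numpy as np


def random_rotation(n, rng):
    """Draw a random matrix in SO(n) from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    q = q * np.sign(np.diag(r))        # fix column signs so the draw is uniform (Haar)
    if np.linalg.det(q) < 0:           # flip one column: O(n) -> SO(n)
        q[:, 0] = -q[:, 0]
    return q


def is_invariant(f, x, transform, atol=1e-9):
    """Numerically check f(S(x)) == f(x) for a batch x with one sample per row."""
    return bool(np.allclose(f(transform(x)), f(x), atol=atol))


rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
S = random_rotation(3, rng)
rotate = lambda v: v @ S.T                     # apply S to every row of v
radial = lambda v: np.sum(v**2, axis=1)        # depends only on |x|
first = lambda v: v[:, 0]                      # singles out one direction

print(is_invariant(radial, x, rotate))   # True
print(is_invariant(first, x, rotate))    # False
```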
The key idea we utilise to find symmetries is that inputs related by a symmetry transformation are clustered together in the embedding space (i.e. the second-to-last layer of our neural networks). As a first step, this reveals the presence of symmetries. Effectively, this is rather similar to the word embeddings found in word2vec [1], which have also been utilised to identify similarities between chemical elements [2]. By analysing the relation of the points in the input space, we are then able to identify the nature of the symmetry, i.e. we determine the generators of the symmetries.
We test this method on artificial datasets with the underlying rotation groups SO(2) and SO(3), and show how we can identify a unitary group (here: SU(2)) and distinguish it from larger symmetry groups (here: SO(4)). To show the applicability of the identification of generators to higher-dimensional datasets (e.g. images), we discuss how we can identify SO(2) in the context of rotated MNIST data.
We use this method in the context of the classification of consistent vacua in string theory.
Finding distinct ways to obtain string vacua is a crucial step in improving our understanding of string theory as a theory of quantum gravity. One aspect is the classification of consistent string backgrounds, in particular Calabi-Yau manifolds (CYs). To obtain a classification, one needs to remove redundancies arising from multiple representations of the same manifold. We apply our method to the case of complete intersection Calabi-Yau manifolds (CICYs). Utilising a novel representation in terms of graphs, we perform the supervised classification task for two topological invariants, the Hodge numbers h^{1,1} and h^{1,2}. When analysing the embedding layer, we are able to re-identify the known identities in the dataset.
The rest of the paper is organised as follows. In Section 2 we describe how symmetries can be found in the embedding layer. We then examine the orbits in the input space to identify the underlying symmetry in Section 3, before presenting our conclusions.

Finding Symmetries
In this section we present a method to identify previously 'unknown' symmetries in a dataset by examining the clustering behaviour in the embedding layer. We study this method on two types of examples: continuous and discrete symmetries.
In the first part, we discuss two examples based on real- and complex-valued functions. For this we take the Mexican hat potential in two dimensions, which features an SO(2) symmetry, and an SU(2)-invariant superpotential (a holomorphic function). The procedure to find symmetries is as follows. Within these potentials, we define classes, each given by a range of values of the potential, which turns the problem into a classification task. We train our network on this classification task and examine the representation in the embedding layer. This reveals that the representation distinguishes between points connected via the symmetry and points that are not connected but still belong to the same class. Coarsely speaking, the network clusters points related by the symmetry, and there is a gap in the embedding layer between such a cluster and the other points in the class.
In the second part, we study discrete symmetries in the context of the classification of CICYs in three dimensions. We take multiple representatives of each manifold and train the network to classify two topological invariants, the Hodge numbers h^{1,1} and h^{1,2}. Again, by analysing the structure of the embedding layer, we are able to identify classes that are finer grained than the trained classes. These finer-grained classes consist of the different representatives of the respective CICY manifold. The neural network must therefore be using information on which it was not explicitly trained.
Depending on the dimension of the embedding space, we use dimensional reduction with TSNE [3] to plot the data points and visualise their structure. The identification of a symmetry in the dataset is then used in a second step to construct the generators associated with this symmetry, which allows us to identify the underlying symmetry. This step is discussed in Section 3.

Mexican-Hat-Potential
We start with a two-dimensional function with an underlying SO(2) symmetry, the Mexican hat potential (1), where we use a = 2.3 for our numerical experiments. Here, two types of points appear: points with the same value of the potential (1) which are related by a symmetry transformation, and points with the same value which are not related by a symmetry. Examples of such points can be found in the plot of the potential shown in the right panel of Figure 1.
We formulate our classification problem as follows: we define 11 classes, each corresponding to a range of values of the function. We then sample points by randomly picking values for x and y and checking whether they belong to one of the classes. For training, we use balanced training sets with ∼1000 representatives per class. We train a simple network consisting of 7 dense layers with 80 hidden units and ReLU activation, followed by a final layer with an 11-dimensional softmax output. We use categorical cross-entropy with the Adam optimiser. We train our network on this classification task to a reasonable training accuracy (above 95 percent). We then visualise the representation in the embedding layer by applying TSNE to this 80-dimensional data set; the result is shown in Figure 1.
Looking at a specific class, one can directly see that the separating property is the norm of the point. To be precise, points with a norm larger than that of the minimum of the potential at r = a/2 are separated from points with a smaller norm. In Figure 1 we can see that several classes clearly split into two regions, whereas classes with elements at only 'one' radius are not split.
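The data generation described above can be sketched in Python/NumPy. Since the explicit form of (1) and the class values are not reproduced here, the sketch assumes, purely for illustration, the radial potential V = r^4 − (a^2/2) r^2, whose minimum lies at r = a/2 as stated above, together with 11 hypothetical class values and a narrow window around each. It rejection-samples a balanced set and reports, per class, the fraction of points inside the minimum; network training and TSNE are omitted.

```python
import numpy as np

A = 2.3  # value of a used in the text


def potential(x, y, a=A):
    """Hypothetical Mexican hat V = r^4 - (a^2/2) r^2 with minimum at r = a/2 (form of (1) assumed)."""
    r2 = x**2 + y**2
    return r2**2 - 0.5 * a**2 * r2


def sample_classes(centres, width, per_class, box=2.5, seed=0):
    """Rejection-sample a balanced set of points whose potential lies within `width` of a class value."""
    rng = np.random.default_rng(seed)
    points, labels = [], []
    counts = np.zeros(len(centres), dtype=int)
    while counts.min() < per_class:
        pts = rng.uniform(-box, box, size=(10_000, 2))
        dist = np.abs(potential(pts[:, 0], pts[:, 1])[:, None] - centres[None, :])
        cls = dist.argmin(axis=1)                          # nearest class value
        hit = dist[np.arange(len(pts)), cls] < width       # inside that class window?
        for p, c in zip(pts[hit], cls[hit]):
            if counts[c] < per_class:                      # keep the classes balanced
                points.append(p)
                labels.append(c)
                counts[c] += 1
    return np.array(points), np.array(labels)


centres = np.linspace(-1.5, 3.0, 11)        # hypothetical class values
X, y = sample_classes(centres, width=0.05, per_class=1000)
radius = np.linalg.norm(X, axis=1)
for c, v in enumerate(centres):
    inner = np.mean(radius[y == c] < A / 2)
    print(f"class {c:2d} (V ~ {v:+.2f}): fraction inside the minimum = {inner:.2f}")
```

Under these assumptions, classes with negative potential values contain points on two rings (one inside and one outside the minimum), while classes with positive values contain a single ring; this is the kind of split that shows up as disconnected TSNE components.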

Superpotential
We now demonstrate the method on an example with an SU(2) symmetry. To do this, we examine a complex-valued function of x and y, where x = (x_1, x_2), y = (y_1, y_2) ∈ C^2 transform in the fundamental and anti-fundamental representation of SU(2), respectively. Such holomorphic functions appear, for instance, in supersymmetric field theories and are referred to as superpotentials. Here we are interested in finding the symmetries of this superpotential. In addition to the SU(2) symmetry, this superpotential has two independent scaling symmetries, with scaling parameters a, b ≠ 0. However, we checked that orbits of these scaling symmetries are not present in our datasets.
Proceeding as before, we first sample points for the superpotential and categorise them according to their outputs. We have one classification with 11 class labels for the real part and one classification for the imaginary part. We choose numerical ranges which are symmetric around zero. With this classification we cover the entire output range in the open subset Re(z), Im(z) ∈ (−5, 5). Again, we sample the points by randomly picking values for x and y and checking whether their real and imaginary parts both belong to one of these classes. As in the previous case, we trained a simple network consisting of 7 dense layers with 60 neurons and ReLU activation, followed by two 11-dimensional dense layers with softmax activation. As before, we use categorical cross-entropy for each of these output layers with an Adam optimiser. For training we used a balanced set with ∼1000 representatives per class, and we terminated training at an accuracy of slightly above 95 percent. Again, we visualise the structure of the 60-dimensional embedding layer by applying TSNE and show the resulting two-dimensional space in Figure 2.
In this projection it is tedious to find the different regions, as a consequence of having 121 (= 11 × 11) different classes. In Figure 2 we highlight examples of the separation in the point clouds for classes with one and with two distinct SU(2) representatives, respectively. This can be seen by computing the SU(2)-invariant quantity ε_ij x_i y_j (where ε_ij = −ε_ji and ε_12 = 1), which takes two different values for most of our classes. Once again, the latent representation reveals the symmetry structure of the problem. As a consistency check, we find that no such structure is observed in the input data.

Figure 2: Left: TSNE projection of the 60-dimensional embedding space (perplexity 40). The coloured dots mark the same classes as highlighted on the right-hand side; gray dots denote the other points in the embedding. Right: the SU(2)-invariant quantity x_1·y_2 − x_2·y_1. Most classes have two distinct representatives, but some only have one; for instance, the yellow and light orange classes have a single SU(2) invariant. In the embedding layer there are no distinct clusters for these points, unlike for the other points.
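The invariance of ε_ij x_i y_j can be checked directly. The Python/NumPy sketch below (function names are ours) draws a random SU(2) matrix and verifies the identity ε_ij (Ux)_i (Uy)_j = det(U) ε_ij x_i y_j on random complex doublets. For illustration, both doublets are transformed with the same matrix U; the contraction is then invariant because det U = 1, whereas an extra U(2) phase e^{iφ} rescales it by e^{2iφ}.

```python
import numpy as np


def random_su2(rng):
    """Random SU(2) matrix [[a, -conj(b)], [b, conj(a)]] with |a|^2 + |b|^2 = 1."""
    v = rng.normal(size=4)
    v /= np.linalg.norm(v)
    a, b = v[0] + 1j * v[1], v[2] + 1j * v[3]
    return np.array([[a, -np.conj(b)], [b, np.conj(a)]])


def eps_contract(x, y):
    """epsilon_ij x_i y_j with epsilon_12 = -epsilon_21 = 1, i.e. x_1 y_2 - x_2 y_1."""
    return x[..., 0] * y[..., 1] - x[..., 1] * y[..., 0]


rng = np.random.default_rng(1)
x = rng.normal(size=(100, 2)) + 1j * rng.normal(size=(100, 2))
y = rng.normal(size=(100, 2)) + 1j * rng.normal(size=(100, 2))
U = random_su2(rng)
inv_before = eps_contract(x, y)
inv_after = eps_contract(x @ U.T, y @ U.T)        # each row transformed as x -> U x
print(np.allclose(inv_before, inv_after))          # True: unchanged under SU(2)

phase = np.exp(0.3j)                               # U(2) element with det = phase**2
inv_phase = eps_contract(x @ (phase * U).T, y @ (phase * U).T)
print(np.allclose(inv_phase, phase**2 * inv_before))   # True: rescaled by det
```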

Discrete Case: Identifying distinct string theory vacua
After these warm-up exercises, we now discuss an example where finding the symmetries in a dataset is crucial to answering a question in mathematical physics: how many distinct vacua of string theory can be constructed in a particular class of string models?
Knowing in which distinct ways one can obtain string vacua is a crucial question for our understanding of string theory as a theory of quantum gravity. One sub-question is the classification of consistent background geometries for string theory, in particular CY manifolds [5].
CICYs provide an interesting class of such backgrounds: their classification has been achieved in three and partially in four dimensions [6,7], and models on such spaces are among the most realistic string vacuum constructions to date [8,9]. The initial enumeration features many representations which are related by a priori unknown symmetries. Although these symmetries have been identified in a heroic effort for three and four dimensions, they are unknown in higher dimensions. Knowledge of these symmetries is necessary to tackle the combinatorial complexity of the initial enumeration, which currently renders a classification in higher dimensions unfeasible.
CICYs are realized as complete intersections in products of complex projective spaces whose classical description we now review (cf.[10] for more details).

Construction -classical description
A CICY can be described by its configuration matrix; as a running example we take a configuration matrix whose ambient space is the product space P^1 × P^2 × P^3. The notation is to be understood as follows: the first column of the matrix lists the dimensions of the projective factors (here 1, 2 and 3). The other columns encode the information on the polynomials which define the hypersurface in the ambient product space: the entries in a given column are the multi-degrees of the corresponding polynomial in each projective factor. The CICY is defined as the common zero locus of these polynomials. To write the polynomials explicitly for this example, we define the coordinates of each space: x_a with a = 0, 1 for P^1, y_i with i = 0, 1, 2 for P^2, and z_m with m = 0, 1, 2, 3 for P^3. The polynomials can then be written (before imposing any scaling of the projective spaces) in terms of complex coefficients c_{ai} and d_{aijmnpq}. Therefore, the configuration matrix describes a family of CICYs parametrised by the space of coefficients. Many basic properties do not depend on the explicit form of the polynomials but only on the configuration matrix (for example, the Euler characteristic). This feature is the strength of this notation and one of the motivations to introduce it. For the complete intersection to be a CY manifold, each row r has to satisfy the following relation between the dimension n_r of the projective factor and its degrees q_{rj} in all polynomials:
$$\sum_{j=1}^{k} q_{rj} = n_r + 1 .$$
Restricting to manifolds of fixed complex dimension d leads to a constraint on the dimensions of the projective components,
$$\sum_{r} n_r - k = d ,$$
where k denotes the number of equations. In combination with the observation that a P^1 factor with a quadratic constraint is redundant, it can then be shown that there is only a finite number of such configuration matrices [11]. In [6], 7890 such matrices were singled out for the case of threefolds, utilising some additional identities which are discussed below. This dataset can be found online [12]. In [13] it was pointed out that 435 of these matrices are redundant and describe the same CICY as another matrix. For fourfolds, 921,497 configuration matrices were obtained in [7], and in higher dimensions the corresponding sets of configuration matrices are unknown. In the following we focus on the case of threefolds.
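The two conditions on a configuration matrix [n | q], namely Σ_j q_rj = n_r + 1 for every row r and Σ_r n_r − k = d, can be checked mechanically. A minimal Python/NumPy sketch (the function name is ours); the examples are the quintic [4 | 5], the bicubic in P^2 × P^2 and a quartic in P^4, which fails the Calabi-Yau condition.

```python
import numpy as np


def is_cy_configuration(n, q, dim):
    """Check the Calabi-Yau and dimension conditions for a configuration matrix [n | q].

    n   : dimensions n_r of the projective factors P^{n_r}, one per row
    q   : multi-degrees q_rj, shape (rows, k); column j describes the j-th polynomial
    dim : required complex dimension d of the complete intersection
    """
    n = np.asarray(n)
    q = np.atleast_2d(np.asarray(q))
    calabi_yau = np.all(q.sum(axis=1) == n + 1)   # sum_j q_rj = n_r + 1 for each row r
    dimension = n.sum() - q.shape[1] == dim        # sum_r n_r - k = d
    return bool(calabi_yau and dimension)


print(is_cy_configuration([4], [[5]], 3))              # quintic [4 | 5]: True
print(is_cy_configuration([2, 2], [[3], [3]], 3))      # bicubic in P^2 x P^2: True
print(is_cy_configuration([4], [[4]], 3))              # quartic in P^4: False
```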

Identities -discrete symmetries
The simplest identities which leave the underlying CICYs unchanged are permutations of rows and columns in the configuration matrices.
Beyond this, there are several further identities relating configuration matrices to each other, which can be checked explicitly for small configuration matrices and then applied in general [6]. To obtain the classification, one chooses one of the equivalent representations. The identities can be summarised as in (8), where n denotes a vector containing the dimensions of 'arbitrary' projective spaces, a and b denote vectors with zeros everywhere except for one entry which equals one, c denotes a vector whose entries sum to two, and q are appropriate matrices which render the configuration matrix consistent.
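The simplest identities, permutations of rows and columns, can be removed by mapping every configuration matrix to a canonical representative. The sketch below is our own brute-force construction in Python (feasible only for small matrices) and does not handle the further identities summarised in (8): it tries every row order and, for each, sorts the degree columns lexicographically, which is optimal for that row order.

```python
import itertools

import numpy as np


def canonical_form(conf):
    """Smallest (row-major, lexicographic) version of a configuration matrix under permutations.

    conf has shape (rows, 1 + k): column 0 holds the dimensions n_r, the other columns the
    multi-degrees of the k polynomials. Rows are permuted as a whole; only the degree columns
    are permuted among each other. Brute force over row orders, so only for small matrices.
    """
    conf = np.asarray(conf)
    best = None
    for order in itertools.permutations(range(conf.shape[0])):
        rows = conf[list(order)]
        # For a fixed row order, sorting the degree columns lexicographically is optimal.
        cols = sorted(range(1, conf.shape[1]), key=lambda j: tuple(rows[:, j]))
        candidate = tuple(tuple(int(v) for v in row) for row in rows[:, [0] + cols])
        if best is None or candidate < best:
            best = candidate
    return best


a = [[1, 2, 0],
     [4, 1, 4]]
b = [[4, 4, 1],            # rows swapped and the two polynomial columns exchanged
     [1, 0, 2]]
print(canonical_form(a) == canonical_form(b))   # True
print(canonical_form(a))                        # ((1, 0, 2), (4, 4, 1))
```

The cost grows factorially with the number of rows, which is one motivation for a representation that is permutation invariant by construction.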

CICYs as graphs -new data representation
The representation in terms of configuration matrices is not permutation invariant, although we are interested in properties which are insensitive to the choice of permutation. Permutation invariance can be achieved by considering a graph representation of the configuration matrix. Such mappings to graphs have shown improved performance, for instance in classifying properties of molecules [14].
For this novel representation of CICYs, we map the right part of the configuration matrix to a graph; this part is sufficient to reconstruct the whole matrix, since by the Calabi-Yau condition the dimension of each projective factor equals the sum of its degrees minus one. An example of such a graph is shown in Figure 3. We assign different weights to connections within rows and within columns. This representation has the advantage that our description of CICYs is invariant under permutations of rows and columns.
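A possible construction of such a weighted graph, and the reason why its adjacency spectrum is unaffected by reordering rows and columns, can be sketched as follows. The construction is a hypothetical reading of the description above, not necessarily the exact graph used here: nodes are the non-zero degrees, two nodes are joined with weight w_row if they share a row and w_col if they share a column, and the degree itself sits on the diagonal. Permuting rows or columns only relabels the nodes, so the eigenvalues coincide.

```python
import numpy as np


def cicy_graph(q, w_row=1.0, w_col=2.0):
    """Weighted adjacency matrix of a (hypothetical) graph built from the degree part q.

    Nodes are the non-zero entries of q. Two nodes are joined with weight w_row if they share
    a row, w_col if they share a column; the entry itself is stored on the diagonal.
    """
    q = np.asarray(q)
    nodes = np.argwhere(q > 0)                         # (row, column) of each non-zero entry
    same_row = nodes[:, None, 0] == nodes[None, :, 0]
    same_col = nodes[:, None, 1] == nodes[None, :, 1]
    adj = w_row * same_row + w_col * same_col
    np.fill_diagonal(adj, q[q > 0])                    # same row-major order as argwhere
    return adj


q = np.array([[2, 0],
              [1, 2],
              [0, 3]])
q_perm = q[[2, 0, 1]][:, [1, 0]]                       # permute rows and columns
spec = np.linalg.eigvalsh(cicy_graph(q))
spec_perm = np.linalg.eigvalsh(cicy_graph(q_perm))
print(np.allclose(spec, spec_perm))                    # True: spectrum is permutation invariant
```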

As the next step, we have to prepare the data such that we can feed the graphs into our network, i.e. we have to translate the properties of the graphs into a numerical description. We use the next neighbours of each node, which are shown for our example on the right of Figure 3. We calculated these features for all CICYs and hence obtained a dictionary of all types of nodes in this dataset, finding 285 types. This naturally gives a 285-dimensional feature vector with integer entries. As these feature vectors do not uniquely identify a CICY, we also use the eigenvalues of the adjacency matrix of the graph as input. In summary, the input to our network consists of this fixed-length integer feature vector together with the eigenvalues of the adjacency matrix, padded with zeros, which leads to a 315-dimensional input vector. Note that the identities correspond to local operations on our graphs.
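The assembly of the input vector can be sketched in Python/NumPy under stated assumptions: a node's "next-neighbour type" is taken here to be its own weight together with the sorted weights of its edges (a hypothetical encoding), a vocabulary indexes all types occurring in the dataset, and the spectrum is zero-padded to a fixed number of slots. With 285 types and a total length of 315, the text's setting corresponds to 30 eigenvalue slots. Relabelling the nodes of a graph leaves the resulting vector unchanged.

```python
import numpy as np


def node_type(adj, i):
    """Hashable next-neighbour description of node i: own weight plus sorted edge weights."""
    edges = np.delete(adj[i], i)
    return float(adj[i, i]), tuple(sorted(float(w) for w in edges if w != 0))


def build_vocabulary(graphs):
    """Give every node type occurring anywhere in the dataset an index."""
    vocab = {}
    for adj in graphs:
        for i in range(len(adj)):
            vocab.setdefault(node_type(adj, i), len(vocab))
    return vocab


def input_vector(adj, vocab, n_eig):
    """Counts of node types followed by the zero-padded adjacency spectrum."""
    counts = np.zeros(len(vocab))
    for i in range(len(adj)):
        counts[vocab[node_type(adj, i)]] += 1
    eig = np.sort(np.linalg.eigvalsh(adj))[::-1]      # largest eigenvalue first
    if len(eig) > n_eig:
        raise ValueError("graph has more nodes than eigenvalue slots")
    padded = np.zeros(n_eig)
    padded[:len(eig)] = eig
    return np.concatenate([counts, padded])


g1 = np.array([[1.0, 1.0, 2.0],
               [1.0, 2.0, 0.0],
               [2.0, 0.0, 3.0]])
perm = [2, 0, 1]
g2 = g1[perm][:, perm]                                  # same graph, nodes relabelled
vocab = build_vocabulary([g1, g2])
v1, v2 = input_vector(g1, vocab, n_eig=5), input_vector(g2, vocab, n_eig=5)
print(len(vocab), v1.shape, np.allclose(v1, v2))        # 3 (8,) True
```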

Training of the network
Our target output data are the topological invariants h 1,1 and h 1,2 which were obtained in [15].
For this supervised learning task, we now proceed as in the continuous case, in particular as in the SU(2) case with two output classification layers, one for h^{1,1} and one for h^{1,2}.
We started from the classified input-output pairs and constructed 500 random representatives of each class using identities (where applicable) and permutations. As the next step, we constructed the 315-dimensional input vector as described above. We note that in this representation each class has a different number of representatives, depending on the number of identities which can be applied. For example, the quintic hypersurface [4 | 5] has just one representative because no identities can be applied to it, whereas for other CICYs we obtain between 100 and 300 representatives. We end up with around 600,000 different input vectors. The clear advantage of this input is that two different data points always describe two distinct matrices which are not related via permutations. The network we use is a simple multilayer perceptron with ReLU activation functions and two softmax classifications as the final layer; details can be found in Table 1. Again, we stop training when the network achieves above 95 percent accuracy in both classifications. For the analysis of the results, we only use the correctly classified data points.

Analysis of the results
As we face a situation with too many classes to inspect visually, we use a different method and analyse the nearest neighbours in the embedding layer. For a given input configuration, we look at the distances to its nearest neighbours in the embedding layer, identify a suitable threshold and compare the class labels of the points closer than the threshold. (There is no obstruction to applying this procedure in the previous situations as well.) As a first step, we pick one data point in the embedding space and find its 250 nearest neighbours with respect to the Euclidean distance; the resulting distance curves are shown in blue in Figure 4. Two generic features are several plateaus in the distance curve and several big jumps between consecutive points, which are shown in yellow in Figure 4. We are interested in the biggest jump, which we use as our threshold to distinguish manifolds; its location is marked by the red line in Figure 4. The prediction is that all points closer than the threshold belong to the same class as the chosen point, where we always consider at least one neighbour. This prediction is quite successful, given that the network is only trained on the Hodge numbers and has not been trained on the CICY labels. Figure 5 summarises the performance of our method with respect to the CICY labels: for the vast majority of data points the neighbours are correctly classified (for 86.6% of CICY labels we find an accuracy above 95%). Outliers arise for CICYs with only one or two representatives, which is expected from this method, since such points have few or no neighbours of the same class while at least one neighbour is always considered.
Focusing on the Hodge pair with h^{1,1} = 10 and h^{1,2} = 20, there are 292 distinct CICYs. Again (cf. Figure 5, right panel), we find that the majority of these CICYs are correctly classified with our method, with only a small drop to 80.6% compared to the performance on the entire dataset.
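The threshold rule described above can be sketched in Python/NumPy (function names are ours; Euclidean distance and k = 250 as in the text). The synthetic embedding in the usage lines, three tight clusters at different distances, is purely illustrative.

```python
import numpy as np


def neighbours_below_gap(embedding, index, k=250):
    """Neighbours of point `index` that lie before the largest jump in the sorted distances."""
    d = np.linalg.norm(embedding - embedding[index], axis=1)
    d[index] = np.inf                                  # exclude the point itself
    order = np.argsort(d)[:min(k, len(d) - 1)]         # k nearest neighbours
    dist = d[order]
    if len(dist) < 2:
        return order
    cut = int(np.argmax(np.diff(dist))) + 1            # position of the biggest jump
    return order[:cut]                                 # cut >= 1: always at least one neighbour


def neighbour_accuracy(embedding, labels, index, k=250):
    """Fraction of the selected neighbours that share the label of point `index`."""
    nb = neighbours_below_gap(embedding, index, k)
    return float(np.mean(labels[nb] == labels[index]))


rng = np.random.default_rng(0)
centres = np.array([[0.0] * 5, [10.0] + [0.0] * 4, [15.0] + [0.0] * 4])
emb = np.vstack([c + 0.01 * rng.normal(size=(20, 5)) for c in centres])
labels = np.repeat([0, 1, 2], 20)
print(len(neighbours_below_gap(emb, 0)), neighbour_accuracy(emb, labels, 0))  # 19 1.0
```

Since the rule always keeps at least one neighbour, a class with a single representative is necessarily counted as misclassified, which matches the outliers discussed above.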
The surprising part is that, as far as we know, there is no straightforward way to see whether two manifolds are inequivalent, due to the basis dependence of the intersection numbers. Therefore, more analysis is needed to understand why networks are able to distinguish distinct matrices and to find a sufficient basis to distinguish between CICYs. We plan to return to the question of whether the neural network has learned Wall's theorem [10].

Finding Generators
Having identified the presence of symmetries, we now discuss the next step: identifying the symmetry generators. Our starting point is a pointcloud in the input space whose points have been identified in the previous step as related via a symmetry, owing to their closeness in the embedding layer. To establish a numerical method for this analysis, we start with a noisy pointcloud. First, we describe our algorithm, and then we apply it to examples of several symmetry groups in various dimensions. Finally, we illustrate how this algorithm can be applied to images.

Algorithm
The idea behind the algorithm is to extract information about the symmetry group from a pointcloud P whose points have been found to be related by some symmetry group.
Infinitesimally, points are connected as follows:
$$p' = p + \epsilon^a T_a\, p , \qquad (10)$$
where the ε^a are small numbers selecting by how much the point is transformed with the respective generator T_a. The symmetry group is characterised by the generators T_a, which we want to obtain from the pointcloud. In particular, the structure of the nearest neighbours carries the information about the generators. To extract them efficiently, one needs an appropriate regression setup in which all components of the generators T_a are constrained. For instance, a single point and its neighbour in n dimensions give, via equation (10), n conditions on the components. However, by appropriately utilising multiple points, the generators can be completely identified. We find the generators as follows:
1. If our dataset features several redundant dimensions, or the inputs are not centred around the origin, we pre-process the dataset by performing an appropriate dimensional reduction and centring it around the origin (e.g. via PCA).
2. We generate an orthonormal basis (b_1, ..., b_n) as follows. We pick a point p_1 ∈ P at random. The first basis vector is its normalised vector b_1 = p_1/||p_1||.
We then pick a further point p_2 at random from the pointcloud P; the second basis vector is the normalised version of p_2 − (p_2 · b_1) b_1. The remaining orthonormal basis elements are then completed automatically.
3. The next step is to filter out the points which are close enough to the hyperplane H spanned by b_1 and b_2. This is the hyperplane in which the generator acts. As a condition, we require the points to lie within a distance of H set by a threshold δ. The more data points we have, the smaller we can choose δ. Points in this 'thick' hyperplane have neighbours both in the direction of interest and in the orthogonal directions.
The contribution of the latter to our regression problem is removed later with condition (15). Note that too large a δ will include all points, in particular also the poles of the sphere, which leads to a drop in performance.
4. Within these points, we now identify all pairs of points p, p' ∈ H whose distance lies below a threshold. This choice allows us to keep multiple point pairs rather than just the nearest neighbour.
5. Each of these neighbouring point pairs (p, p') provides constraints that determine one combination of the generators in equation (10). At linear order this yields the constraint (14), where T denotes the generator we determine. The normalisation factor 1/||p|| ensures the correct numerical prefactors. σ_H(p, p') denotes a sign which contains the appropriate directional information of the pair (p, p') for this hyperplane. The necessity of σ_H can be understood by considering the example of identifying the generator of SO(2) from point pairs in different quadrants: without the sign, pairs traversed in opposite directions would impose contradictory constraints on T. Each of these point pairs constrains up to n of the n × n components of T. Additional components are constrained by the condition (15).
6. Using the constraints (14) and (15), we can now determine all components of the generator by linear regression. In practice, we weight the constraints arising from (15) more strongly than those from (14), ensuring that (15) is satisfied. This also removes the spurious directional information from point pairs that arises due to the thickness of our hyperplane.
7. By applying steps 2-6 multiple times, we obtain generators for 'all' directional combinations. On the resulting generator candidates we perform a principal component analysis.
By analysing the standard deviation along these components, we identify the relevant number of generators for the underlying pointcloud. The principal components associated with these generators reveal the algebra structure of the generators, and hence we determine the underlying symmetry group.
8. To distinguish unitary from orthogonal groups, as in the example below where we distinguish between SU(2) and SO(4), additional care is needed in setting up the regression problem. The necessity arises as follows: consider the orbit of a point on the unit sphere S^3. The entire orbit generated by either symmetry is S^3, and hence one cannot distinguish the two groups with just one pointcloud. However, realistic situations such as the example of the SU(2) superpotential (cf. Section 2) feature multiple orbits, one for each field. We can utilise this, as we are then equipped with two point pairs which are connected by the same transformation (neglecting for the moment that they can be in different representations). Here one can distinguish the transformations of SU(2) and SO(4), as the action on the first point pair fixes the SU(2) generator completely, whereas for SO(4) not all generators are fixed by the first transformation. Utilising both point pairs in our regression doubles the constraints arising from (14) and allows us to distinguish, for instance, SU(2) from SO(4).
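Steps 2-7 can be condensed into a short sketch. The following Python/NumPy code is our own simplified reading of the procedure, not the authors' implementation: points within distance delta of the plane form the thick hyperplane, pairs closer than eps are kept, each pair is linearised around its normalised midpoint, and the conditions (14) and (15) are replaced by our own choice that T neither acts on nor maps into the directions orthogonal to the plane (imposed with a large weight). The parameters delta, eps, weight and n_candidates are illustrative; step 1 (PCA preprocessing) and step 8 (two orbits for SU(2) versus SO(4)) are not included.

```python
import numpy as np


def plane_basis(points, rng):
    """Step 2: orthonormal basis whose first two vectors come from two random points."""
    n = points.shape[1]
    while True:
        i, j = rng.choice(len(points), size=2, replace=False)
        b1 = points[i] / np.linalg.norm(points[i])
        v = points[j] - (points[j] @ b1) * b1
        if np.linalg.norm(v) > 1e-3:           # avoid (anti)parallel picks
            break
    b2 = v / np.linalg.norm(v)
    # Complete the basis; QR keeps span(b1, b2) in the first two columns.
    q, _ = np.linalg.qr(np.column_stack([b1, b2, rng.normal(size=(n, n - 2))]))
    return b1, b2, q[:, 2:]


def generator_candidate(points, rng, delta, eps, weight=100.0):
    """Steps 3-6 for one random plane: least-squares fit of a single generator T."""
    n = points.shape[1]
    b1, b2, rest = plane_basis(points, rng)
    coords = points @ np.column_stack([b1, b2])            # in-plane coordinates
    off_plane = np.linalg.norm(points - coords @ np.vstack([b1, b2]), axis=1)
    keep = off_plane < delta                                # step 3: 'thick' plane
    near, c = points[keep], coords[keep]
    dist = np.linalg.norm(near[:, None] - near[None, :], axis=-1)
    ii, jj = np.nonzero((dist > 0) & (dist < eps))          # step 4: close pairs
    cross = c[ii, 0] * c[jj, 1] - c[ii, 1] * c[jj, 0]       # orientation within the plane
    ok = cross != 0
    ii, jj, cross = ii[ok], jj[ok], cross[ok]
    if len(ii) == 0:
        return None
    # Step 5: the sign makes every normalised chord point counter-clockwise in the plane.
    chord = near[jj] - near[ii]
    t = np.sign(cross)[:, None] * chord / np.linalg.norm(chord, axis=1, keepdims=True)
    mid = near[ii] + near[jj]
    mid /= np.linalg.norm(mid, axis=1, keepdims=True)
    # T @ mid_k = t_k as linear equations in the n*n entries of T (row-major order).
    A = np.einsum("ij,kl->kijl", np.eye(n), mid).reshape(-1, n * n)
    y = t.ravel()
    # Stand-in for the extra constraints: T b = 0 and b^T T = 0 outside the plane.
    for b in rest.T:
        A = np.vstack([A, weight * np.kron(np.eye(n), b), weight * np.kron(b, np.eye(n))])
        y = np.concatenate([y, np.zeros(2 * n)])
    vec, *_ = np.linalg.lstsq(A, y, rcond=None)             # step 6
    return vec.reshape(n, n)


def generator_spectrum(points, n_candidates=60, delta=0.05, eps=0.2, seed=0):
    """Step 7: principal directions of the normalised candidates (uncentred SVD)."""
    rng = np.random.default_rng(seed)
    cands = [generator_candidate(points, rng, delta, eps) for _ in range(n_candidates)]
    C = np.array([T.ravel() / np.linalg.norm(T) for T in cands if T is not None])
    _, s, vt = np.linalg.svd(C, full_matrices=False)
    n = points.shape[1]
    return s / np.sqrt(len(C)), vt.reshape(-1, n, n)


rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 300)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
sphere = rng.normal(size=(4000, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)
print(np.round(generator_spectrum(circle)[0], 3))   # one dominant component expected
print(np.round(generator_spectrum(sphere)[0], 3))   # three dominant components expected
```

For points on a circle a single component should dominate, and for points on S^2 three comparable components should stand out, in line with the single SO(2) generator and the three generators of SO(3).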
Below, we discuss some numerical examples of these generators.

Examples
We design our examples with increasing complexity and cover various embeddings of symmetries to check the performance of our algorithm. The first warm-up example is a pointcloud generated by SO(2), i.e. points on a circle.
To test the stability of our algorithm, we perform experiments with varying numbers of points and add Gaussian noise to the radius. Results for several examples are shown in Figure 6. Even for pointclouds with few points and large noise, we find very good results for the generators. The large difference in the standard deviation between the first and the remaining components shows that this pointcloud is connected by only one generator. For the analysis shown here we use δ = 0.5.
For the SO(3) and SO(4) experiments, the standard deviations of the PCA components of the generator candidates are shown in Figure 7. Next we turn to SU(2) and SO(2) × SO(2) acting on four real dimensions. Our method should reveal three and two underlying generators, respectively, rather than all six generators of SO(4). Again we test our method on pointclouds with varying numbers of points and different noise levels; an overview of our findings is given in Figure 8. For the SU(2) case, the dominant generators found by our algorithm are those of the run with 5000 points shown in red in Figure 8. Note that to distinguish SU(2) from SO(4), it was necessary to utilise two pointclouds, as described in step 8 of our algorithm. For SO(2) × SO(2), we consider, for instance, the run associated with the parameters of the black curve in Figure 8.
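Why two orbits are needed in step 8 can be made concrete with explicit generators. The Python/NumPy sketch below (names are ours) realises the generators iσ_k/2 of su(2) as real 4 × 4 matrices acting on C^2 ≅ R^4. They are antisymmetric, hence elements of so(4), close under commutation, and commute with the complex structure J (multiplication by i), which a generic element of so(4) does not. For comparison, it builds one choice of SO(2) × SO(2) generators, two commuting rotations of orthogonal coordinate planes.

```python
import numpy as np


def realify(m):
    """4x4 real matrix acting on (Re z1, Re z2, Im z1, Im z2) for a complex 2x2 matrix m."""
    return np.block([[m.real, -m.imag], [m.imag, m.real]])


comm = lambda a, b: a @ b - b @ a

pauli = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]
su2 = [realify(0.5j * s) for s in pauli]        # generators i*sigma_k/2 of su(2)
J = realify(1j * np.eye(2))                       # multiplication by i on C^2 = R^4

# One choice of SO(2) x SO(2): rotations of the coordinate planes (1, 2) and (3, 4).
rot12, rot34 = np.zeros((4, 4)), np.zeros((4, 4))
rot12[0, 1], rot12[1, 0] = -1.0, 1.0
rot34[2, 3], rot34[3, 2] = -1.0, 1.0

print(all(np.allclose(T, -T.T) for T in su2))              # True: su(2) sits inside so(4)
print(np.allclose(comm(su2[0], su2[1]), -su2[2]))          # True: [T1, T2] = -T3
print(all(np.allclose(comm(T, J), 0) for T in su2))        # True: they preserve J
print(np.allclose(comm(rot12, J), 0))                      # False: this rotation does not
print(np.allclose(comm(rot12, rot34), 0))                  # True: SO(2) x SO(2) is abelian
```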

Rotated MNIST
The final example we discuss is the application of our algorithm to images. To do this, we want to re-identify SO(2) from the rotated MNIST dataset D_all. In contrast to our previous examples, we now want to identify the generators on a 28 × 28 = 784-dimensional space. However, as previously described, we can dimensionally reduce this space, for instance via PCA.
Our analysis proceeds as follows: we consider a subset of the rotated MNIST dataset and project it onto its first three PCA components (cf. the left panel of Figure 9, where the orbits of several digits eight are highlighted). The respective standard deviations of the generator candidates obtained from the pointcloud of the digit eight are shown in the right panel of Figure 9. We clearly identify the generator of SO(2) as the dominant generator.

Discrete symmetries -CICYs
To conclude this section we briefly return to the example of CICYs discussed in Section 2.2.
By construction, the symmetries acting here are discrete rather than continuous. To identify the underlying symmetries, earlier referred to as identities (cf. (8)), one needs to match identical transformations in different orbits acting on the input space. As our input dataset is precisely generated by these identities, and such different representations are mapped to the same cluster in the embedding layer, our network does identify these identities. It will be interesting to analyse whether the network finds additional symmetries and identities which are as yet unknown.
However, this would require a different training approach with differently prepared datasets which we leave for future work.

Conclusions
Detecting symmetries in an automated fashion removes the need for domain knowledge associated with a particular data product. Such domain knowledge often does not exist, or has been the outcome of scientific efforts such as the development of the quark model [16].
In this article we introduced a method for detecting symmetries with only very limited domain knowledge. The required domain knowledge is the ability to perform a 'simple' classification task, which we think is often a realistic starting point.
We have discussed examples of basic symmetries appearing in physics, such as rotation groups and SU(2). The structure in the embedding layer reveals these symmetries and hence provides orbits in the input space which are generated by these symmetries. In a second step, we were able to pinpoint the nature of these continuous symmetries with our regression algorithm. Beyond rotation groups and SU(2), we find that the embedding layer can be used to identify classes of CICY manifolds. It remains to be seen whether these methods can establish new identities in the classification of n-folds, which is unknown to date.
For this analysis, we introduced a novel graph representation for CICYs which removes several redundancies of the matrix representation used up to now. In passing, we note that this provides the first application of graph neural networks in string theory. We have not yet explored the full potential of this representation for other ML work on this dataset (cf. [17-22] for other ML applications on the CICY dataset).
Another observation from this analysis is that the neural network may have found a way to calculate topological invariants as required by Wall's theorem, which formalises how such manifolds are completely characterised by topological invariants. We have not yet investigated this avenue, but want to highlight that it will be exciting to compare these two complementary approaches to classification. In which situations does a neural network use such mathematically rigorous ways of classification?
We have seen that an important ingredient in our analysis is dimensional reduction tools, in particular TSNE [3]. It remains to be seen which additional structures TSNE and other techniques can reveal in datasets in mathematical physics, similar to the structures seen in autoencoders [23].
Putting this method into perspective, our results can be improved by augmenting the pointclouds. Additional points can be obtained if an equation generating these orbits is known; in this context it might be useful to utilise the techniques recently described in [24]. Furthermore, our technique for identifying symmetries is useful to determine which symmetry-equivariant architecture (cf. [25]) promises to be efficient for more sophisticated classification tasks. Beyond classification, another recently proposed application of symmetries in machine learning is in the context of reinforcement learning [26].
In either case, it promises to be extremely interesting to see which other symmetries can be found in everyday and scientific datasets, going beyond a standard rotational invariance such as the one we discussed in the context of MNIST. This is a proof-of-concept paper presenting several ways of identifying underlying symmetries in data; further scrutiny of these methods for other symmetries is in order. It is even more tantalising to find out which underlying symmetry structures neural networks dynamically use to achieve their remarkable performance.

Figure 1: Left: TSNE representation (perplexity 50) of the embedding layer. Each colour represents one class. For several classes, we can directly see two distinct point clouds. Right: plot of the Mexican hat potential, where we highlight the classes using the same colour coding as in the left panel. Here, we can directly match classes whose points lie at two different radii with disconnected TSNE components.

Figure 3: Different representations of one CICY. Left: the classic configuration matrix. Middle: a graph visualisation with two distinct weights. Right: nearest neighbours of the graph.

Figure 5: Performance of our method on the CICY dataset. Left: the distribution of performance for all 686,464 data points. Right: the distribution of performance on the subset of CICYs with Hodge numbers h^{1,1} = 10 and h^{1,2} = 20. The analysis of finding nearest neighbours is still performed with all data points.

Figure 6: Three examples of pointclouds for SO(2) with varying number of points and different noise where the respective parameters are shown in the plot title.The respective generator corresponds to the first PCA component which is singled out by our algorithm.

Figure 7: Left: The standard deviation of the PCA components for the example of SO(3).Right: The results for the standard deviation of the PCA components of the SO(4) example.

Figure 8: Left: The standard deviation of the PCA components for the example of SU (2).Right: The results for the standard deviation of the PCA components of the SO(2) × SO(2) example.

Figure 9: Left: Pointcloud of first three PCA components of our rotated MNIST dataset.Highlighted in orange are the orbits of multiple digits eight.Gray points correspond to the other digits present in this dataset.Right: The standard deviation on the generators identified from this pointcloud for the digit eight.

Table 1: Neural network architecture for Hodge number classification. The embedding layer is the layer before the output layers. We use categorical cross-entropy as the loss on both output layers. To balance the discrepancy in the number of representatives, we keep several copies of CICYs with a low number of representatives in our training dataset. For evaluation of the classification we only use unique input vectors.