Quaternion-based machine learning on topological quantum systems

Topological phase classification has been intensively studied via machine-learning techniques, where different forms of the training data have been proposed in order to maximize the information extracted from the systems of interest. Owing to the complexity of quantum physics, advanced mathematical architectures should be considered when designing machines. In this work, we incorporate quaternion algebras into the data analysis, in the frames of both supervised and unsupervised learning, to classify two-dimensional Chern insulators. On the unsupervised-learning side, we apply principal component analysis (PCA) to the quaternion-transformed eigenstates to distinguish topological phases. On the supervised-learning side, we construct our machine by adding one quaternion convolutional layer on top of a conventional convolutional neural network. The machine takes quaternion-transformed configurations as inputs and successfully classifies all distinct topological phases, even for states whose distributions differ from those seen by the machine during the training process. Our work demonstrates the power of quaternion algebras in extracting crucial features from the targeted data and the advantages of quaternion-based neural networks over conventional ones in the task of topological phase classification.

For topological systems with the Chern numbers or the winding numbers as the topological invariants, various types of inputs have been used to perform phase classifications. For instance, the quantum loop topography (QLT) was introduced to construct multi-dimensional images from raw Hamiltonians or wave functions as inputs [14,17]. The Bloch Hamiltonians have been arranged into arrays to feed the neural networks [16,24]. In addition, the real-space particle densities and local density of states [15] and the local projections of the density matrix [6] have also been used as inputs. From cold-atom experiments, momentum-space density images were generated as inputs for classification [20]. The time-of-flight images [10,19], spatial correlation functions [10], density-density correlation functions [10] and the density profiles formed in quantum walks have also been proposed as appropriate inputs [23]. Furthermore, the spin configurations [18] and the Bloch Hamiltonians over the Brillouin zone (BZ) have been treated as inputs for the neural networks [18,21]. For the forms of inputs mentioned above, various ML techniques with distinct real-valued neural networks have been applied to discriminate different topological phases. As the development of artificial neural networks matures, a rise in the representation capability of machines is anticipated by generalizing real-valued neural networks to complex-valued ones [37,38]. Specifically, a quaternion number, containing one real part and three imaginary parts, and the corresponding quaternion-based neural networks [39-42] are expected to enhance the performance in processing data with more degrees of freedom than the conventional real-number and complex-number systems.
There have been various proposals for quaternion-based neural networks in ML techniques and applications in computer science, such as the quaternion convolutional neural network (qCNN) [38,43,44], quaternion recurrent neural networks [45], quaternion generative adversarial networks [46], the quaternion-valued variational autoencoder [47], quaternion graph neural networks [48], quaternion capsule networks [49] and quaternion neural networks for speech recognition [50]. However, ML-related applications of quaternion-based neural networks to problems in physics are still limited, especially in topological phase detection, even though quaternion-related concepts have been applied in some fields of physics [51-53].
In this work, we perform Chern-insulator classifications from both the supervised- and unsupervised-learning aspects based on inputs transformed via the quaternion algebra. For the unsupervised learning, we encode the quaternion-transformed eigenstates of Chern insulators via a convolution function as inputs and study them using principal component analysis (PCA). We find that using only the first two principal components is not enough to fully classify the Chern insulators, consistent with Ming's work [23]. Further studies show that the performance can be improved by including more principal components. For the supervised learning, we construct a quaternion-based neural network in which the first layer is a quaternion convolutional layer. We then show that this quaternion-based machine outperforms a conventional CNN machine. Our machine works well not only on the testing dataset but also in identifying data points whose distributions differ from those seen by the machine during the training process. The good performance can be attributed to the similarity between the formula for the Berry curvature and our quaternion-based setup. Therefore, our work demonstrates the power of the quaternion algebra in extracting relevant information from data, paving the way to applications of quaternion-based ML techniques in topological phase classifications. The outline of the remaining part of this work is as follows. In Sec. II, we introduce the model Hamiltonian, the generation of the data for our classification tasks, and the quaternion convolutional layer used in this work. The PCA analysis of the quaternion-transformed eigenstates is discussed in Sec. III. The data preparation, the network structures and the performance of the quaternion-based supervised-learning task are given in Sec. IV. Discussions and conclusions are presented in Sec. V and Sec. VI, respectively. We have three appendices. Appendix A shows the details of the data preparation.
Appendix B provides a brief introduction to the quaternion algebra. Some properties of functions in Sec. III are included in Appendix C.

A. Model
A generic two-band Bloch Hamiltonian, with the aid of the identity matrix σ0 and the Pauli matrices σ = (σ1, σ2, σ3), is written as

H(k) = h0(k) σ0 + h(k) · σ,  (1)

where k = (kx, ky) is the crystal momentum in the 2D BZ (kx, ky ∈ (−π, π]). h0(k) can change the energy of the system but has nothing to do with the topology, so it is ignored in the remaining part of this paper. The vector h = (h1, h2, h3) acts as a k-dependent external magnetic field on the spin σ, so that the eigenstate of the upper (lower) band at each k is the spin pointing antiparallel (parallel) to h(k). It is thus reasonable that the unit vector n = h/|h| ∈ S² embeds the topology of this system. Indeed, the topological invariant is the Chern number

C = (1/4π) ∫_BZ n · (∂kx n × ∂ky n) dkx dky,  (2)

where the integrand is the Berry curvature (up to a factor of 2π) and the integration is over the first BZ. For brevity, we sometimes omit the argument k in functions. The Chern number is analogous to the skyrmion number in real space [54]: the integral is the total solid angle subtended by n(k) over the BZ, so the Chern number counts how many times n(k) wraps the sphere.
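As a concrete illustration, the Chern number just introduced can be evaluated numerically on a discretized BZ with gauge-invariant plaquette phases (the Fukui-Hatsugai-Suzuki lattice prescription). This is our own sketch: the h-field below assumes the c = 1 QWZ-like form h = (sin kx, sin ky, m + cos kx + cos ky), which is only our reading of the model, and the function names are illustrative.

```python
import numpy as np

def lower_band_state(kx, ky, m):
    # Valence (lower-band) eigenstate of H(k) = h(k) . sigma for an assumed
    # QWZ-like h-field; np.linalg.eigh sorts eigenvalues in ascending order
    h1, h2, h3 = np.sin(kx), np.sin(ky), m + np.cos(kx) + np.cos(ky)
    H = np.array([[h3, h1 - 1j * h2],
                  [h1 + 1j * h2, -h3]])
    _, v = np.linalg.eigh(H)
    return v[:, 0]                       # column 0: lower band

def chern_number(m, N=24):
    # Sum of gauge-invariant plaquette phases of U(1) link variables
    # over an N x N discretization of the BZ, divided by 2*pi
    ks = -np.pi + 2 * np.pi * np.arange(N) / N
    u = np.array([[lower_band_state(kx, ky, m) for ky in ks] for kx in ks])
    C = 0.0
    for i in range(N):
        for j in range(N):
            u1, u2 = u[i, j], u[(i + 1) % N, j]
            u3, u4 = u[(i + 1) % N, (j + 1) % N], u[i, (j + 1) % N]
            plaq = (np.vdot(u1, u2) * np.vdot(u2, u3)
                    * np.vdot(u3, u4) * np.vdot(u4, u1))
            C += np.angle(plaq)
    return C / (2 * np.pi)
```

The random eigenvector phases returned by `eigh` drop out because each plaquette product is gauge invariant; for a gapped spectrum the result rounds to an exact integer, nonzero only when |m| < 2.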
We construct the normalized spin configurations n(k) based on the following models. For topological systems,

h(k) = ( Re[(sin kx + i sin ky)^c], Im[(sin kx + i sin ky)^c], m + cos kx + cos ky ),  (3)

with a positive integer c and a real parameter m to control the Chern number. Here c is the vorticity, the number of times the in-plane component (nx and ny) swirls around the origin; the sign of c indicates a counter-clockwise or clockwise swirl. For a nontrivial topology, nz has to change sign somewhere in the BZ for n(k) to wrap a complete sphere; therefore, |m| < 2 is required. Some examples of spin textures n(k) based on Eq. (3) are shown in Fig. 1. For c = 1, the model is the Qi-Wu-Zhang (QWZ) model [55]. For a given c, the Chern number C can be either 0, c, or −c depending on the value of m:

C = sgn(m) c for 0 < |m| < 2,  C = 0 for |m| > 2.  (4)

The topological phase diagram is shown in Fig. 2. C = 0 denotes a topologically trivial phase and C ≠ 0 a nontrivial phase.

B. Quaternion convolutional layer
A quaternion number has four components: the first stands for the real part and the other three for the imaginary parts. Given two quaternions q1 = (r1, a1, b1, c1) and q2 = (r2, a2, b2, c2), their product Q = q1 q2 = (R, A, B, C) is given by

R = r1 r2 − a1 a2 − b1 b2 − c1 c2,
A = r1 a2 + a1 r2 + b1 c2 − c1 b2,
B = r1 b2 + b1 r2 + c1 a2 − a1 c2,
C = r1 c2 + c1 r2 + a1 b2 − b1 a2,  (5)

which can be written in the matrix-product form

Q = M(q1) q2.  (6)

To implement a quaternion convolutional (q-Conv) layer in numerical programming, we regard the two quaternions as a 4 × 4 matrix and a 4 × 1 column matrix, respectively:

(R)   (r1 −a1 −b1 −c1) (r2)
(A) = (a1  r1 −c1  b1) (a2)
(B)   (b1  c1  r1 −a1) (b2)
(C)   (c1 −b1  a1  r1) (c2).  (7)

More details of the quaternion algebra are described in Appendix B. A conventional CNN contains a real-valued convolutional layer to execute the convolution of the input and the kernel. Let the input F have the shape Hi × Wi × Ci (Height × Width × Channel) and the kernel K the shape Hk × Wk × Ci × Cf. The convolution produces an output O = F ∗ K whose elements are

O(i, j, t′) = Σ_{i′, j′, t} F(i + i′, j + j′, t) K(i′, j′, t, t′).  (8)

Here the stride is assumed to be 1 in both the width and the height directions. The indices i and j are spatial indicators, t is the channel index of the input feature map and t′ is the kernel index. The shape of the output is (Hi − Hk + 1) × (Wi − Wk + 1) × Cf. Assume now that the input has four components. To uncover the entanglement among the components through a CNN, we utilize the quaternion product. We introduce another dimension, the depth, which is four, as a quaternion number has four components. Both the input F and the kernel K have depth four, as two quaternion numbers, and their product also has depth four, as a quaternion in Eq. (5). Referring to Eq. (7), where we show a matrix representation implementing the quaternion algebra, and thinking of F as q1 and K as q2 in Eq. (7), we transform the depth-four input F into a 4 × 4 matrix, F(s,l), and keep the kernel K of depth 4, K(l), where l, s = 1, . . . , 4. The product of F and K, say O, then has depth four, as shown in Eq. (9).
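The component-wise product rule and its equivalence to the 4 × 4 matrix form can be checked numerically; this is a minimal sketch of our own, with illustrative names.

```python
import numpy as np

def hamilton(q1, q2):
    # Hamilton product Q = q1 q2, written out component by component
    r1, a1, b1, c1 = q1
    r2, a2, b2, c2 = q2
    return np.array([
        r1*r2 - a1*a2 - b1*b2 - c1*c2,
        r1*a2 + a1*r2 + b1*c2 - c1*b2,
        r1*b2 + b1*r2 + c1*a2 - a1*c2,
        r1*c2 + c1*r2 + a1*b2 - b1*a2,
    ])

def left_matrix(q):
    # 4 x 4 real matrix L(q1) such that q1 q2 = L(q1) @ q2
    r, a, b, c = q
    return np.array([
        [r, -a, -b, -c],
        [a,  r, -c,  b],
        [b,  c,  r, -a],
        [c, -b,  a,  r],
    ])

q1 = np.array([1.0, 2.0, 3.0, 4.0])
q2 = np.array([0.5, -1.0, 2.0, 0.0])
assert np.allclose(hamilton(q1, q2), left_matrix(q1) @ q2)   # same product
assert not np.allclose(hamilton(q1, q2), hamilton(q2, q1))   # noncommutative
```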
Further considering the shapes of F and K, the convolution is given by

O(i, j, s) = Σ_l Σ_{i′, j′, t} F(s,l)(i + i′, j + j′, t) K(l)(i′, j′, t),  (10)

where the summations over i′, j′, t are equivalent to those in Eq. (8) and the summation over l implements the quaternion product. More specifically, we consider an input datum as q1 (the four colored squares on the left of Fig. 3) and four kernels encoded in q2. The output feature map O = (R, A, B, C)^T is then calculated based on Eq. (5). As the first step, we permute the order of q1 to obtain the four columns of the matrix in Eq. (7),

F(·,1) = (r1, a1, b1, c1)^T,  F(·,2) = (−a1, r1, c1, −b1)^T,
F(·,3) = (−b1, −c1, r1, a1)^T,  F(·,4) = (−c1, b1, −a1, r1)^T,  (11)

(see the four sets of squares in the middle of Fig. 3). We then convolve these four quaternions, F(·,l) with l = 1, 2, 3 and 4, with the four kernels, K(l) with l = 1, 2, 3 and 4, as shown in the middle of Fig. 3. Finally, we sum the four resulting quaternions to obtain the output feature map O, as shown on the right of Fig. 3.
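The permute-convolve-sum recipe above can be sketched in a few lines of NumPy. This is an illustrative reimplementation of ours, not the authors' code; the permuted copies are the columns of the Hamilton matrix, and with a 1 × 1 kernel the layer reduces pixel-wise to the plain Hamilton product, which serves as a correctness check.

```python
import numpy as np

def conv2d(x, k):
    # plain 'valid' 2D cross-correlation with stride 1
    H, W = x.shape
    h, w = k.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + h, j:j + w] * k)
    return out

def qconv(F, K):
    # F: (4, H, W) quaternion feature map; K: (4, h, w) quaternion kernel.
    # The permuted copies F^(.,l) are the columns of the 4 x 4 Hamilton
    # matrix; summing over l realizes the quaternion product per pixel.
    r, a, b, c = F
    perms = [np.stack([ r,  a,  b,  c]),
             np.stack([-a,  r,  c, -b]),
             np.stack([-b, -c,  r,  a]),
             np.stack([-c,  b, -a,  r])]
    out = np.zeros((4, F.shape[1] - K.shape[1] + 1,
                       F.shape[2] - K.shape[2] + 1))
    for l in range(4):          # quaternion index
        for s in range(4):      # output depth
            out[s] += conv2d(perms[l][s], K[l])
    return out
```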

III. PRINCIPAL COMPONENT ANALYSIS
Principal component analysis (PCA) is a linear manifold-learning method that finds the most relevant basis set of the data [56,57].
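In practice, PCA reduces to a singular-value decomposition of the mean-centered data matrix. The toy sketch below (our own construction, not the paper's dataset) shows how projections onto the leading principal components separate two linearly separable clusters.

```python
import numpy as np

def pca(X, n_components):
    # principal components via SVD of the mean-centered data matrix
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[:n_components].T   # projections onto the leading PCs
    var = S**2 / (len(X) - 1)         # variance captured by each PC
    return proj, var

# toy data: two well-separated clusters in 10 dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-5.0, 1.0, (30, 10)),
               rng.normal(+5.0, 1.0, (30, 10))])
proj, var = pca(X, 2)
assert var[0] > var[1]                                   # PCs sorted by variance
assert (np.sign(proj[:30, 0]) != np.sign(proj[30:, 0])).all()  # PC1 separates them
```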
We prepare the eigenstates |u±⟩ of Eq. (1), where + (−) stands for the upper (lower) band. For a topologically nontrivial state, the phase cannot be continuous over the whole BZ. Therefore, we can divide the whole BZ into two parts, in each of which the topological wave function has a continuously well-defined phase. We then pick a gauge by choosing the two regions according to the sign of h3 in Eq. (3), which fixes |u−⟩ and |u+⟩ in Eqs. (12) and (13).

FIG. 3. Illustration of a quaternion convolutional layer. On the left, we start with the input q1 having four quaternion components ((yellow, red, green, blue) stands for (r1, a1, b1, c1)). In the middle, q1 is permuted to construct {F(·,l)}, l = 1, …, 4, on which the convolution with four kernels {K(l)}, l = 1, …, 4, is performed. A summation is taken for each depth to obtain the output feature map O on the right.
By translating |u±⟩ = (α±, β±)^T, with α±, β± ∈ C, into a quaternion number of four components, we have

q± = (Re α±, Im α±, Re β±, Im β±).  (14)

To see the correlation of the states over k, we define the quantity F to be the quaternion-based convolution

F(p) = Σ_{k∈BZ} q*(k) q(p − k),  (15)

where q* is the conjugate of q. It can be proved that F is real-valued. Therefore, F(p) for all p in the BZ based on a given Hamiltonian can be analyzed by using PCA. We collected various F of all k ∈ BZ within seven topological phases as the dataset for PCA. For each topological phase, 30 F's were prepared, so the total amount of data was 210. The data for the six nontrivial phases were generated based on Eq. (3) with m = ±1 (the sign of m determines the sign of C). For the trivial phase, we prepared five data points from each of the six combinations of c = 1, 2, 3 and the sign of m, with |m| > 2. In Fig. 4, we present various noiseless F generated from Eq. (3) with different c and m. It is notable that the F for C = 0 are featureless, the F for C = ±1 have a dipole moment, the F for C = ±2 have a quadrupole moment, and the F for C = ±3 seemingly have a primary dipole and a secondary quadrupole moment. These remarkable features imply that the convolution function F is a good choice for topological classifications.
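The real-valuedness of the convolution F just defined (proved in Appendix C) can be checked numerically: on a periodic k grid, the terms at k and at p − k are quaternion conjugates of each other, so the vector parts cancel pairwise. The spinor field below is random and purely illustrative; only the four-component encoding matters for the cancellation.

```python
import numpy as np

def hamilton(q1, q2):
    # Hamilton product of q = (r, a, b, c)
    r1, a1, b1, c1 = q1
    r2, a2, b2, c2 = q2
    return np.array([r1*r2 - a1*a2 - b1*b2 - c1*c2,
                     r1*a2 + a1*r2 + b1*c2 - c1*b2,
                     r1*b2 + b1*r2 + c1*a2 - a1*c2,
                     r1*c2 + c1*r2 + a1*b2 - b1*a2])

def conj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

# illustrative random spinor field on an L x L periodic k grid,
# encoded as q = (Re alpha, Im alpha, Re beta, Im beta)
L = 8
rng = np.random.default_rng(1)
spinor = rng.normal(size=(L, L, 2)) + 1j * rng.normal(size=(L, L, 2))
spinor /= np.linalg.norm(spinor, axis=-1, keepdims=True)
q = np.stack([spinor[..., 0].real, spinor[..., 0].imag,
              spinor[..., 1].real, spinor[..., 1].imag], axis=-1)

def F(p, q):
    # F(p) = sum_k q*(k) q(p - k), indices mod L on the periodic grid
    total = np.zeros(4)
    for kx in range(L):
        for ky in range(L):
            total += hamilton(conj(q[kx, ky]),
                              q[(p[0] - kx) % L, (p[1] - ky) % L])
    return total

val = F((3, 5), q)
assert np.allclose(val[1:], 0.0)   # imaginary parts cancel: F is real-valued
```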
We examine data with standard deviation (SD) equal to 0, 0.1, 0.2 and 0.3, respectively, and show the first two PCs of the 210 pieces of data for each SD in Fig. 5. In Fig. 5, the data are clustered into four groups and their variances increase with SD. PCA thus successfully separates different topological phases into different clusters. However, some clusters contain two topological phases with the Chern numbers {+1, −3}, {−1, +3}, and {+2, −2}. This C-modulo-4 resemblance has also been observed in a previous study [23].
We find that including more PCs helps separate the different classes in each cluster. Figure 6 shows the first six PCs of the data in the topologically nontrivial phases, where PCx denotes the x-th principal component. One can find that PC1 and PC2 in each pair of {+1, −3}, {−1, +3}, and {+2, −2} are nearly identical, as also shown in Fig. 5. By incorporating more PCs, up to PC6, all topological classes are completely classified. Via the proposed convolution, topological states can thus be successfully classified by using PCA, a linear classification machine.
Compared to the eigenstates, the spin configurations n(k) are gauge-invariant. Therefore, it is desirable to classify the topology of the spin configurations via PCA. Unfortunately, the performance was not good, as will be discussed later. In order to directly classify the spin configurations, in the following we train a qCNN machine via a supervised-learning algorithm to discriminate spin configurations in different topological phases.

A. Datasets
The input data are normalized spin configurations n, lying on a 40 × 40 square lattice with periodic boundary conditions, and their corresponding topological phases are labeled with one-hot encoding. We prepared four datasets: the training, validation, testing and prediction datasets (more details are described in Appendix A).
The first three datasets are well known in conventional deep-learning procedures [58]. The data in the training, validation and testing datasets are constructed from the same models, so that they have the same data distributions even though they are all different data points. Therefore, we denote these three datasets as in-distribution datasets. The data in the prediction dataset, however, are constructed from models similar to, but different from, those for the in-distribution datasets. Therefore, the data in the prediction dataset are not only unseen by the machine during the training process but also of different distributions. We denote the prediction dataset as an out-of-distribution dataset, which is used to understand whether our machine can also classify spin configurations constructed from other similar but different topological models.
The data pool containing the training and validation datasets is constructed as follows. Based on Eq. (3), we first prepared 5760 data points of n in nine topological phases with Chern numbers ranging from −4 to 4, each phase containing 640 data points. Besides the 5760 spin configurations, the dataset contains 360 two-dimensional spin vortices. A spin vortex has an in-plane spin texture that winds around a center, which is generated by setting one of the three components in Eq. (3) to zero. By including spin vortices, the machine can tell the difference between 3D-winding (nontrivial) and 2D-winding (trivial) spin configurations. After the training process, the trained machine is scored by a testing dataset with the same composition of nine phases as that in the training (and validation) dataset. Importantly, without changing the topologies, the Gaussian-distributed random translations and random rotations imposed on these three datasets increase the diversity of the datasets and enhance the generalization ability of the trained machine.
The prediction dataset contains six categories of spin configurations. The first category is generated with m uniformly distributed from +3 to −3. In the second and third categories, we change the sign of nz (the second category) and swap ny and nz of n (the third category). Finally, we consider three categories of trivial states, which are the ferromagnetic (FM), conical, and helical states. The FM state can be viewed as an incomplete 1D winding configuration, while the conical and helical states can be viewed as incomplete 2D ones. More details about the data preparation are described in Appendix A.
For the conventional CNN, we use n as the input data. For the qCNN, in order to feed the input data into the qCNN classifier, we transform the 3D spin vector into a unit pure quaternion, (nx, ny, nz) ∈ R³ → (0, nx, ny, nz) ∈ H, where the scalar part (the first component) is zero and the vector part is n. Therefore, the inputs of the qCNN are effectively equivalent to those of the CNN.

B. Network structure and performance
The schematic architectures of the two classifiers are shown in Fig. 7, where the last black arrows point to nine neurons for the nine topological phases. In the qCNN classifier, we implement a quaternion convolutional (q-Conv) layer as the first layer [red dotted cuboid in Fig. 7(b)], whose operations are based on the quaternion algebra to hybridize the spin components. The next three layers are typical 3D convolutions (Conv3Ds); our Conv3Ds do not mix depths, by choosing proper kernel sizes. Following the Conv3D layers is a 2D convolution (Conv2D) layer that mixes the data in depth: nine kernels of size 4 × 1 transform the data from 4 × 9 to 1 × 9. In contrast, the CNN classifier has only Conv2D layers. Although the qCNN is more complex than the CNN, the total number of network parameters of the qCNN is nevertheless smaller than that of the CNN. This is one advantage of the qCNN over the conventional CNN.
In order for classifiers to satisfy some physically reasonable conditions, two special designs are implemented.
Firstly, we extend the k points beyond the BZ by padding the input data according to the periodic boundary conditions [59]. Secondly, the first layer takes "overlapping" strides with an arctan activation function, and the later layers take "non-overlapping" strides with the tanh activation function, for both the qCNN and CNN machines. Figure 8 illustrates how the "overlapping" and "non-overlapping" feature mappings can be manipulated by varying the stride.
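The periodic padding of the first step can be done with NumPy's wrap mode; a minimal sketch (the map size and pad width are illustrative, not the paper's 40 × 40 input):

```python
import numpy as np

# hypothetical 4 x 4 k-space feature map; pad width 1 suits a small kernel
n = np.arange(16.0).reshape(4, 4)
padded = np.pad(n, pad_width=1, mode='wrap')   # periodic (BZ) boundary
assert padded.shape == (6, 6)
assert padded[0, 1] == n[-1, 0]   # top edge wraps around from the bottom
assert padded[1, 0] == n[0, -1]   # left edge wraps around from the right
```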
Then both the qCNN and CNN machines are trained. The learning curves of both machines are shown in Fig. 9. The CNN machine (orange and light orange lines) jumps over a big learning barrier at around the 700th epoch. After that, the training and validation accuracies (orange and light orange lines, respectively) separate and do not converge up to the end of the training process. Even though the same training (and validation) dataset is used, the learning curves of the qCNN machine (blue and light blue lines) are qualitatively different: the training and validation accuracies separate around the 90th epoch, but the difference between the two decreases with increasing epochs.

FIG. 8. Schematic of the "overlap" convolution (red solid) and "non-overlap" convolution (blue solid) of a 3 × 3 kernel (black dotted) over the data. The blue solid square is a single movement of the kernel; when the stride equals the kernel length, each movement of the kernel is "non-overlapping."

After the training procedure finished, the qCNN (CNN) machine reaches 99.67% (94.12%) testing accuracy. This difference in accuracy results from the spin-vortex dataset, on which the qCNN works well but the CNN does not. The trained machines are then ready for prediction, and the results are shown in Fig. 10. Since the first category contains n with uniformly distributed m, where a few data points are very close to the phase boundaries m ≈ {0, ±2}, the accuracy of the qCNN is slightly lower there, at 96%. For the second and third categories, we choose m = ±1, away from the phase-transition points, and the performance is nearly perfect. For the incomplete winding configurations, the qCNN, unlike the conventional CNN, accurately classifies the FM, helical and conical states after learning the spin-vortex states.
This is the main advantage of the qCNN over the conventional CNN, which is expected to result from the quaternion algebra.
The processing times of the two classifiers are summarized in Table I. Since the q-Conv layer involves massive matrix multiplications, the time per epoch of the qCNN is longer than that of the conventional CNN in our task, especially when run on a CPU.

V. DISCUSSIONS

In this work, we apply the quaternion multiplication laws to both PCA (unsupervised learning) and the qCNN (supervised learning). The two methods take different inputs: the former takes the scalar function F(p), which is related to a convolution of the wave function, and the latter takes the pure quaternion function (0, n(k)), whose real part is zero and whose imaginary part is the spin vector. We explain the physical intuitions and comment on the mechanisms in this section.
For the PCA, we did not simply take n as the input, because the representation of the vector n depends on the coordinates but the topology does not. We believed that the topology, as a global geometry, should be embedded in the correlations. The correlation of dot products of n turned out to fail, since the relative angles between two spins are not informative enough to capture the swirling of n on S². If one tries the quaternion q = (0, nx, ny, nz) in the convolution of Eq. (15), the result is still inappropriate, for the convolution is then independent of the sign of m and cannot discriminate the topological states (see Appendix C). Eventually, we found that the F(p) defined in Eq. (15) is a proper quantity to characterize the topology via PCA. F has the property that it is featured (featureless) when the wave function is unable (able) to be globally continuous, which happens in the nontrivial (trivial) phases. Unfortunately, F(p) is not gauge invariant. The results were based on the choice of gauge in Eqs. (12) and (13), which makes the wave function continuous locally and discontinuous at the k where nz(k) = 0. We examined other choices of gauge and found that the present gauge exhibits the PCA features most clearly (results not shown). We remark that our PCA results look good because the inputs were ingeniously designed, and the PCA method might not be more practical than the qCNN method.
For the qCNN, it is interesting to understand the mechanism behind its performance. There are several possible factors promoting the performance of our supervised-learning machine. The first is that the kernel size of the first convolutional layer is 2 × 2 with stride = 1, which means the machine can collect spin information among four nearest neighbors [see Fig. 11(b)]. We know that the Chern number is the integral of the Berry curvature over the BZ, and the Berry curvature is twice the solid angle. The solid angle Ω subtended by three unit vectors a, b, and c is obtained by

tan(Ω/2) = a · (b × c) / (1 + a · b + b · c + c · a).  (17)

Our choice of the kernel size in the first hidden layer is the minimal 2 × 2, which mixes only the nearest-neighboring spins. In this way, it is very possible to enforce the machine to notice the solid angle subtended in this plaquette. The second factor is the quaternion product. Recall that a conventional CNN may correlate the spins n at neighboring k's through the feature map of the kernel; however, the map does not mix the components of the spins. In comparison, the qCNN is more efficient, for it directly entangles the spins via the quaternion product. It is this entanglement of spin components by the quaternion product that makes the scalar and vector products entering the solid angle [see Eq. (17)] possible for the machine to realize. As a solid angle involves at least three spins and the feature map of the kernel is linear, a nonlinear transformation is crucial to create the high-order (three-spin) terms in the expansion. This is possible, and it is proved in Ref. [60] that multiplication of variables can be accurately realized by simple neural nets with a smooth nonlinear activation function. The third factor is therefore the nonlinear activation function, arctan in this work. We expect that using arctan as the activation function further helps the machine learn the correct representations, because the calculation of a solid angle involves the arctan operation in Eq. (17).
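The arctan-based solid-angle formula of Eq. (17) is readily evaluated with arctan2, which keeps the sign (orientation) of the triangle; as a check, one octant of the unit sphere subtends 4π/8 = π/2. This is our own sketch with illustrative names.

```python
import numpy as np

def solid_angle(a, b, c):
    # tan(Omega/2) = a.(b x c) / (1 + a.b + b.c + c.a), cf. Eq. (17);
    # arctan2 keeps the sign, so reversing the orientation flips Omega
    num = np.dot(a, np.cross(b, c))
    den = 1.0 + np.dot(a, b) + np.dot(b, c) + np.dot(c, a)
    return 2.0 * np.arctan2(num, den)

x, y, z = np.eye(3)
assert np.isclose(solid_angle(x, y, z), np.pi / 2)    # one octant = 4*pi/8
assert np.isclose(solid_angle(x, z, y), -np.pi / 2)   # reversed orientation
```

Summing such signed solid angles over all plaquettes of the BZ grid and dividing by 4π is exactly how n(k)'s wrapping number can be accumulated, which is the quantity the 2 × 2 kernel is argued to expose to the machine.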
This belief is indeed supported by the results shown in Fig. 12, where the arctan activation function outperforms the ReLU and tanh activation functions over nine different datasets. In summary, several factors combine to enhance the performance of our machine. The quaternion-based operations in the q-Conv layer mix not only the spins with their neighbors but also the components of the spins. When these linear combinations are fed into the nonlinear activation functions in our qCNN, the output can be viewed as an expansion of a nonlinear function, which may contain a term having both the scalar and vector products of neighboring spins, similar to that in Eq. (17). Therefore, after the optimization process, the machine may keep increasing the weight of a solid-angle-related term and eventually learn to classify the topological phases.

In addition, adding noise to the training dataset helped our supervised-learning machine learn the generic features of our data. We found that when the training data were generated directly from Eq. (3) without adding any noise, the machine worked well on the training and testing datasets but performed poorly on the entire prediction dataset. This can be understood by noting that the topological invariant is determined by the sign of m, which appears in the z component in Eq. (3); with a noiseless, narrowly distributed dataset, the machine might naively regard the z component as the sole judge of the topology. We note that the topology is invariant when the spin texture is uniformly translated or rotated, so we trained our machine with randomly translated and rotated data to avoid such incorrect learning (see the data preparation in Appendix A). From our observations, the performance on the prediction dataset was remarkably enhanced when the noise was included, which supports our idea.

VI. CONCLUSIONS
In summary, we classify topological phases with distinct Chern numbers via two types of machine-learning techniques. For the unsupervised part, we propose a quaternion-based convolution to transform the topological states into the input data. With this convolution, distinct topological states are successfully classified by PCA, a linear machine for classification.
We then turn to the supervised-learning part where, in contrast to the conventional CNN, we successfully use the qCNN to classify different topological phases. This work demonstrates the power of quaternion-based algorithms, especially for topological systems with the Chern number as the topological invariant.

Appendix A: Data preparation

Training and validation datasets—The primitive spin configurations were generated from Eq. (3), where m are random numbers in the corresponding ranges. The latter two sets are topologically trivial, and each has 80 (identical) configurations. So, for each c there were 1280 nontrivial spin configurations and 160 trivial ones. The primitive data then passed through several manipulations, as data augmentation, without changing the topologies. Each spin configuration n(k) was translated (T), rotated (R), and then polluted with noise (G),

n(k) → R n(k + p0) + ∆n(k),

where p0 is a random displacement in k, R stands for a random 3D rotation of the spins, and ∆n(k) is Gaussian noise (G) with standard deviation 0.1π in each component. (The spins are renormalized at the end.) T and R are homogeneous transformations in k, but G is inhomogeneous, picking only 30 out of the 1600 k sites.
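The T, R, G pipeline can be sketched as follows. This is our own minimal reimplementation under stated assumptions: the random rotation is drawn via QR decomposition of a Gaussian matrix and corrected to a proper rotation, which is one standard choice rather than necessarily the authors'.

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(n, noise_sites=30, sigma=0.1 * np.pi):
    # n: (L, L, 3) normalized spin configuration on the k grid
    L = n.shape[0]
    # T: uniform translation in k (topology-preserving on the periodic grid)
    sx, sy = rng.integers(0, L, size=2)
    n = np.roll(n, (int(sx), int(sy)), axis=(0, 1))
    # R: one global random 3D rotation applied to every spin
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1                     # make it a proper rotation
    n = n @ Q.T
    # G: Gaussian noise on a few randomly chosen k sites, then renormalize
    idx = rng.integers(0, L, size=(noise_sites, 2))
    n[idx[:, 0], idx[:, 1]] += rng.normal(0.0, sigma, size=(noise_sites, 3))
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

# demo on an illustrative random (not topological) configuration
n0 = rng.normal(size=(40, 40, 3))
n0 /= np.linalg.norm(n0, axis=-1, keepdims=True)
out = augment(n0.copy())
```

T and R act homogeneously on the whole grid, while G perturbs only `noise_sites` sites, mirroring the inhomogeneity noted above; the final renormalization keeps every spin on S².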
In addition to the 5760 sets of data in the nine topological phases (C = −4 to C = +4), we also include 360 spin-vortex states, which are C = 0 states generated by setting one of the three components in Eq. (3) to zero and normalizing the configurations. For each c, 30 spin configurations were generated with random m ranging from −3 to 3. These data also went through the translation T and rotation R but no noise G. In total, we generated 6120 spin configurations as the training dataset. Among the training dataset, 25% of the data are assigned as the validation dataset (light-colored lines in Fig. 9).
Testing dataset—In addition to the training and validation datasets, we prepare 1224 extra spin configurations as the testing dataset, with the same composition as the training and validation datasets. This dataset is prepared for scoring the trained classifiers. Prediction dataset—The prediction dataset is an extra dataset, different from the three datasets mentioned above. It consists of six categories, none of which was seen by the machine during the training process. This dataset was processed by T and R but not G. The six categories were constructed as follows. The first category, the "chern" category, is a set generated from Eq. (3) with m uniformly distributed between −3 and +3.
The second category was constructed by changing the sign of the z component, and the third by swapping the y and the z components. The next two categories, the helical and conical spin configurations, were generated from a conical-spiral ansatz with a cone parameter: a vanishing parameter gives the helical state, and a parameter of magnitude between 0 and 1 gives a conical state. The last category contains the ferromagnetic (FM) spin configurations, whose z component is a constant and whose x and y components are zero. Some spin configurations in the prediction dataset are illustrated in Fig. 13.

Appendix B: Quaternion
The quaternion number system was introduced by the Irish mathematician William Rowan Hamilton in 1843 as an extension of the complex numbers. A quaternion number q is composed of four real numbers r, a, b and c,

q = r 1 + a î + b ĵ + c k̂,

where {1, î, ĵ, k̂} is the basis. Sometimes it is written as q = (r, v) or q = (r, a, b, c) in short. Here r is called the scalar (or real) part of the quaternion and v = (a, b, c) the vector (or imaginary) part. A quaternion without a scalar part, q = (0, a, b, c), is called a pure quaternion. Similar to the imaginary unit,

î² = ĵ² = k̂² = −1.

Importantly, the algebra of quaternions is noncommutative, based on 1î = î1 = î, 1ĵ = ĵ1 = ĵ, 1k̂ = k̂1 = k̂, îĵ = −ĵî = k̂, ĵk̂ = −k̂ĵ = î, and k̂î = −îk̂ = ĵ.
The conjugate of a quaternion is defined to be q* = (r, −a, −b, −c), and the norm is given by |q| = √(r² + a² + b² + c²). Therefore the inverse of q is defined as q⁻¹ = q*/|q|². If q is a unit quaternion, its inverse is exactly its conjugate. The multiplication (the so-called quaternion or Hamilton product) of two quaternions q1 = (r1, v1) and q2 = (r2, v2) is given by

q1 q2 = (r1 r2 − v1 · v2, r1 v2 + r2 v1 + v1 × v2),

and, reversely,

q2 q1 = (r1 r2 − v1 · v2, r1 v2 + r2 v1 − v1 × v2).

It is thus evident, also in terms of the matrices of Eq. (7), that the commutativity of the multiplication of quaternions does not hold. Furthermore, in the matrix representation q* corresponds to qᵀ, i.e., conjugation of a quaternion equals transposition. More specifically, a unit quaternion has the property q⁻¹ = q* = qᵀ in the M(4, R) representation.
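The algebraic rules above can be verified mechanically in the M(4, R) representation; here M denotes the left-multiplication matrix (an illustrative name of ours).

```python
import numpy as np

def M(q):
    # left-multiplication matrix of q = (r, a, b, c) in M(4, R)
    r, a, b, c = q
    return np.array([[r, -a, -b, -c],
                     [a,  r, -c,  b],
                     [b,  c,  r, -a],
                     [c, -b,  a,  r]], dtype=float)

one = np.eye(4)
i, j, k = M([0, 1, 0, 0]), M([0, 0, 1, 0]), M([0, 0, 0, 1])
assert np.allclose(i @ i, -one) and np.allclose(j @ j, -one) \
    and np.allclose(k @ k, -one)                     # i^2 = j^2 = k^2 = -1
assert np.allclose(i @ j, k) and np.allclose(j @ i, -k)   # ij = -ji = k
q = np.array([1.0, 2.0, 3.0, 4.0])
q /= np.linalg.norm(q)
assert np.allclose(M(q) @ M(q).T, one)   # unit q: inverse = conjugate = transpose
```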

Appendix C: Details of definition in Section III
In this appendix, we provide some properties of the F(p) function defined in the PCA section and of the convolution of the normalized spin vector n. Recall that the convolution defined in the PCA section is

F(p) = Σ_{k∈BZ} q*(k) q(p − k).  (C1)

For lack of notation, we denote quantities with an upper bar (e.g., q̄, |ū⟩, ⟨ū|, h̄) as belonging to the conduction band, and those without a bar (e.g., q, |u⟩, ⟨u|, h) to the valence band. A vertical line with a variable stands for evaluation at the corresponding position in the BZ.
Property C.1. F ( p) is a purely real-valued function.
Proof. Since k ↦ p − k is a one-to-one correspondence on the BZ, summing over k ∈ BZ and summing over p − k ∈ BZ are equivalent.
Taking the conjugate of F, we obtain

F*(p) = Σ_{k∈BZ} [q*(k) q(p − k)]* = Σ_{k∈BZ} q*(p − k) q(k) = Σ_{k∈BZ} q*(k) q(p − k) = F(p).

The first equality comes from the conjugation property of the quaternion product, the second from the equivalence of summing over the whole BZ, and the third from the relabeling k ↔ p − k. We see that the conjugate of F is F itself. Therefore, F(p) is a purely real-valued function.
Recall that in our model Eq. (3), h1 and h2 are both even in k and (h3 − m) is odd in k, in the sense that, given k = (kx, ky) ∈ BZ, there is k′ = (π − kx, π − ky) ∈ BZ such that

h1(k) = h1(k′),  h2(k) = h2(k′),  h3(k) − m = −[h3(k′) − m].  (C2)

In addition, the two points k and k′ are in one-to-one correspondence in the BZ. Notice that once we normalize h(m) = (h1, h2, h3(m)) by |h(m)|, each component of n(m) is a function of m.
Property C.2. Encoding n(m) into the quaternion q = (0, nx(m), ny(m), nz(m)), the convolution Σ_k q*(k) q(p − k) is independent of the sign of m.
Proof. We consider two convolutions with q = (0, n(m)): one evaluated at k with parameter m, and one at k′ = (π − kx, π − ky) with the opposite sign −m. In Property C.1 we have shown that the convolution over the entire BZ is a purely real-valued function, and for two pure quaternions q1 = (0, v1) and q2 = (0, v2) the real part of q1* q2 is the dot product v1 · v2; hence only the dot products of the vector parts need to be considered. Now, Eq. (C2) together with the substitution m → −m gives h1,2(k, m) = h1,2(k′, −m) and h3(k, m) = −h3(k′, −m). Since |h(k, m)| = |h(k′, −m)|, we conclude that n(k′, −m) = (nx, ny, −nz)(k, m). In every dot product the sign flips of the two z components cancel, so, upon summing over the BZ, the convolutions for m and −m take exactly the same value. That is, the convolution q* q is independent of the sign of m once we encode the quaternion as (0, n(m)).

Property C.3. With the four-component encoding of the spinor,

Re[q*(k) q(p − k)] = Re⟨u(k)|u(p − k)⟩, ∀k ∈ BZ.

Proof. According to Property C.1, it suffices to consider the real part. Given k ∈ BZ, let α = a + bi and β = c + di with a, b, c, d ∈ R, so that q = (a, b, c, d). The quaternion product q*(k) q(p − k) then has the real part a a′ + b b′ + c c′ + d d′ (primes denoting the values at p − k), the remaining terms forming the vector part V of the quaternion product. On the other hand, the inner product ⟨u(k)|u(p − k)⟩ = α*α′ + β*β′ also has the real part a a′ + b b′ + c c′ + d d′, with some imaginary part V′ ∈ R. Hence the two expressions have exactly the same real part.
Property C.4. If h3(k) has the same sign over the entire BZ, then F(p) = 0.

Proof. One can observe the eigenvalues to conclude the explicit form of the eigenstates in terms of h. From now on, we consider h3(k) > 0 for all k ∈ BZ, for which the spinor takes a form that is regular over the whole BZ. After transforming the two eigenstates into quaternions, we calculate the value of q̄* q̄ − q* q at each k in Eq. (C9), where V denotes the vector part of the quaternion product at the fixed k point. Recalling that Property C.1 shows the vector part gives no contribution to the real-valued function F(p), it suffices to calculate the real part of the quaternion product at each fixed k point.
From Eq. (C2), there are two points k = (kx, ky) and k′ = (π − kx, π − ky), with hi(k) = hi(k′) for i = 1, 2 and k, k′ ∈ BZ (C10). Therefore, the terms at k and at k′ in Eq. (C9) cancel each other. Note that the values at Γ and at (π, π) are zero in Eq. (C9), since h1 = h2 = 0 at these points in our model Eq. (3). Similarly, if we assume h3(k) < 0 for all k ∈ BZ, the calculation gives the same value of F(p) as in Eq. (C9). Therefore, we can conclude that if h3 has the same sign over the entire BZ, then F(p) = 0.