Learning phase transitions from regression uncertainty: a new regression-based machine learning approach for automated detection of phases of matter

For performing regression tasks involved in various physics problems, enhancing the precision or equivalently reducing the uncertainty of regression results is undoubtedly one of the central goals. Here, somewhat surprisingly, the unfavorable regression uncertainty in performing the regression tasks of inverse statistical problems is found to contain hidden information concerning the phase transitions of the system under consideration. By utilizing this hidden information, a new unsupervised machine learning approach was developed in this work for automated detection of phases of matter, dubbed learning from regression uncertainty. This is achieved by revealing an intrinsic connection between regression uncertainty and response properties of the system, thus making the outputs of this machine learning approach directly interpretable via conventional notions of physics. It is demonstrated by identifying the critical points of the ferromagnetic Ising model and the three-state clock model, and revealing the existence of the intermediate phase in the six-state and seven-state clock models. Comparing to the widely-used classification-based approaches developed so far, although successful, their recognized classes of patterns are essentially abstract, which hinders their straightforward relation to conventional notions of physics. These challenges persist even when one employs the state-of-the-art deep neural networks (NNs) that excel at classification tasks. In contrast, with the core working horse being an NN performing regression tasks, our new approach is not only practically more efficient, but also paves the way towards intriguing possibilities for unveiling new physics via machine learning in a physically interpretable manner.


Introduction
In recent years, the artificial neural network (NN) based machine learning techniques have stimulated the rapid development of automated data-driven approaches to investigate a wide range of physics problems [1][2][3]. As one of the central classes of problems in condensed matter physics and statistical physics, classifying phases of matter and identifying phases transitions is a major focus of applying both generative machine learning [4][5][6][7][8] and discriminative machine learning [9][10][11][12][13][14][15][16][17][18][19][20]. Particularly, approaches utilizing the power of NNs in performing classification discriminative tasks were developed and succeeded in providing data-driven evidence on the existence of various phase transitions [9,10], including the phase transitions associated with nonequilibrium self-propelled particles [11,12], topological defects [13][14][15][16], many-body localization [17,18], strongly correlated fermions [19,20], etc. And besides the widely-involved classification tasks, there is yet another fundamental class of discriminative tasks that machine learning can efficiently dealt with, namely, regression tasks [21]. In fact, applying the machine learning techniques designed for regression tasks to physics problems is now giving rise to a burgeoning field towards automated theory building, where, for instance, the symbolic regression [22][23][24] was successfully applied to extract the equations of motion [25], symmetries [26], and conservation laws [27] from various types of data of physical systems.
But despite the exciting development in this context, a quite natural application scenario of machine learning techniques has received little attention so far, namely, classifying phases of matter and identifying phases transitions by utilizing the power of NNs in performing regression tasks [28][29][30][31]. Currently, to detect phases of matter with machine learning, the widely-used approaches usually train the NN to perform a certain classification task [9,10]. In spite of their successes, they could generally face the lack of interpretability as the classes of patterns recognized by them are essentially abstract, and hence cannot assume straightforward relation to conventional notions of physics [32][33][34][35][36][37][38]. For instance, the supervised learning-with-blanking approach [9,10] investigates phase transitions by examining the intersection of NN's binary classification confidences. But to relate this intersection, which separates two recognized classes, to the actual phase transition point between two distinct phases of matter, it requires additional system-specific knowledge, such as the number of existing phases, the absence or presence of phase separation and crossover, etc. Similarly, the unsupervised learning-by-confusion approach [10] and the unsupervised scanning-probe approach [39] investigate phase transitions by contrasting NN's recognition performance in the binary classification tasks with different assumed targets. Yet, relating the optimal target to a specific phase transition still necessitates external physical information beyond the scope of these approaches. These challenges persist even when one employs the state-of-the-art deep NNs that excel at classification tasks. While for the NNs performing regression tasks, in contrast, the readily interpretable meaning of their outputs can usually be traced back to the regression tasks themselves straightforwardly. Taking the NNs designed for solving the inverse statistical problem (ISP) [40] for example, they are trained to perform regression tasks of reconstructing certain system parameters from given samples of observed system configurations, with their outputs naturally holding the same meaning as the system parameters they try to reconstruct. In these regards, the power of NNs in performing regression tasks might shed new light on automated detection of phases of matter. Introducing a novel regression-based approach offers opportunities to complement the widely-used classification-based approaches by enhancing the interpretability of results from a physical standpoint. This thus raises the fundamental and intriguing question of whether and how phase transitions can be unsupervisedly revealed by NN-based regression.
Here, this question is addressed for the prototypical regression tasks in ISP performed by NNs. To date, significant efforts have been made to apply machine learning techniques to ISP with a primary focus on accurately and precisely inferring system parameters in various models, such as inferring the coupling strengths in spins systems [40][41][42][43][44][45][46], in monomer-dimer systems [47], and in restricted Boltzmann machine [48]. Other applications involved inferring the single-particle spectral density function from Green's functions [49], and inferring the disorder potential from quantum transport properties [50], etc. In these previous investigations, enhancing the precision or equivalently reducing the uncertainty of the regression results was regarded as one of the central goals. But somewhat surprisingly, this unfavorable regression uncertainty is found to contain hidden information that can be utilized to reveal possible phase transitions. In this work, these findings shall be demonstrated in the ferromagnetic Ising model and q-state clock models. The former exhibits a second-order phase transition, whose critical temperature is T C = 2/ ln(1 + √ 2) (J/k B ) [51]. The q-state clock models also exhibit this transition with small q (e.g. q = 3), while with slightly larger q (e.g. q = 6, 7), it exhibits two Berezinskii-Kosterlitz-Thouless (BKT) phase transitions with an intermediate phase [52][53][54]. These many-body models of interacting spins are not only important in condensed matter physics and statistical physics, but in recent years also typically employed to examine the ability of new machine learning approaches [1][2][3].
Taking the ISP associated with the Ising model (also known as the inverse Ising problem [40][41][42][43][44][45][46]) for instance, it asks the question of what the possible temperature is for a given spin configuration. For any spin configuration generated probabilistically, it does not correspond to an unique temperature, which thus gives rise to intrinsic regression uncertainty for the reconstructed temperatures (see equation (2) and the error bars in the inset of figure 1(b)). As demonstrated in this work in the cases of the ferromagnetic Ising model (see figure 1) and the three-state clock model (see figure 2(b)), the position of the non-trivial minimum of regression uncertainty directly corresponds to the critical point of the continuous phase transition of the system. It is further shown that the regression uncertainty for reconstructing temperatures of the six-state and seven-state clock models (see figures 2(c) and (d)) assumes two non-trivial minima, respectively, which can provide a new type of data-driven evidence on the existence of the intermediate phase in these systems. These findings clearly suggest that the regression uncertainty in ISP generally contains non-trivial hidden information. By utilizing this hidden information, an unsupervised machine learning approach was developed in this work for automated detection of phases of matter, dubbed learning from regression uncertainty (LFRU), which is distinguished from the various classification-based machine learning approaches developed so far [9-20, 39, 55-58]. This is achieved by revealing an intrinsic connection between regression uncertainty and response properties of the system, thus making the outputs of this machine learning approach directly interpretable via conventional notions of physics (noticing however that the employed NNs themselves still remain not interpretable). Moreover, since the implementation of this approach is robust against different choices of the NN architecture, it is expected that state-of-the-art developments in the field of artificial intelligence and data science can be readily exploited to automatically detect phases of matter in a physically interpretable manner via LFRU.
The rest of this paper is organized as follows: In section 2, the hidden information in regression uncertainty is presented, the intrinsic connection between regression uncertainty and response properties is discussed, the generic application of regression uncertainty in learning continuous phase transitions is shown. In section 3, how phase transitions in systems with possible intermediate phases can be investigated with the hidden information in regression uncertainty is further discussed, and its limitation is clarified. In section 4, conclusions and outlooks are given.

Identify phase transitions from regression uncertainty
To see concretely how the regression uncertainty in ISP can be utilized to reveal possible phase transitions, let us start with the ISP in the prototypical ferromagnetic Ising model H = −J ∑ ⟨i,j⟩ s i s j , where the Ising spins s i = ±1 are located on a square lattice with linear size L and periodic boundary condition imposed. The coupling strength J = 1 is set as the energy unit in the following.
As the inverse of generating spin configurations that satisfy the probability distribution exp(−H/T)/Z (Z ≡ ∑ {s i } exp(−H/T) and the Boltzmann constant k B is set to be 1) at a given temperature T, the ISP in this case asks the question of what the possible temperature is for a given spin configuration (see figure 1(a) for instance), hence naturally corresponds to a prototypical regression task. This regression task is performed by training the NN as an L × L → 1 map from the set of spin configurations to the set of reconstructed temperatures denoted by T R . During the training process of the NN, a large number of its parameters are optimized by minimizing a loss function. The mean square error (MSE) [21] is employed as the loss function L here for instance, i.e.
where ⟨·⟩ denotes the average over all the samples in the training dataset (see appendices A and B for more technical details concerning the data generation, NN training, the architecture of NN, and the loss function).

Hidden information in regression uncertainty
Due to the intrinsic statistical characteristic of ISP, for any set of spin configurations, i.e. samples, generated at a given temperature T according to the probability distribution exp(−H/T)/Z, some of its spin configurations can also appear in other sets of spin configurations generated at temperatures that are different from T. As a direct result, the reconstructed temperatures T R for each spin configuration in the same set generally does not assume the same value. This thus gives rise to an intrinsic uncertainty of the regression results. Straightforwardly, one can use the standard deviation U(T) of well-trained NN's outputs to characterize this regression uncertainty at a given temperature T, where ⟨·⟩ T denotes the average over all the samples generated at T in the test dataset. For instance, as one can see from the error bars and the range of the colored error band in the inset of figure 1(b), although the regression uncertainty U(T) can be generally very small for the well-trained NN that assumes good performance in reconstructing the temperature, it always exists. Naturally, the regression uncertainty is unfavorable for reconstructing the precise temperature T from given spin configurations. But somewhat surprisingly, this unfavorable regression uncertainty is found to contain hidden information that can be utilized to identify the critical point T C of the ferromagnetic phase transition of the system. As one can see from figure 1(b), the temperature dependence of regression uncertainty U(T) assumes a non-trivial M-shape for reconstructing temperatures T ∈ [2.0, 2.5] with the temperature spacing ∆T = 2 × 10 −2 . In particular, one can find that its valley appears at T = 2.26 ± 0.01 (the resolution is determined by ∆T, see appendix C for a finite-size analysis with a higher resolution), which matches the critical temperature T C = 2/ ln(1 + √ 2) ≈ 2.269 [51] of the two-dimensional ferromagnetic Ising model very well. This strongly suggests that the temperature dependence of regression uncertainty U(T) contains non-trivial information, i.e. the position of the valley, which can be utilized to reveal possible phase transitions. The reconstructed temperature TR is very close to the target T (the diagonal line represents the ideal regression results TR = T for the ISP), but there always exists an intrinsic regression uncertainty U(T) as shown by the error bars and the range of the colored error band. (c) Schematic illustration of the origin of the hidden information in regression uncertainty. In a simple scenario of reconstructing two temperatures T1 and T2 (T1 < T2 < TC), which involves two sets of an equal number of samples (illustrated by squares) generated probabilistically at T1 and T2, respectively, there can be n1 samples in the set of T1 with relatively high energy that are essentially indistinguishable from n2 samples in the set of T2 with relatively low energy. To deal with these n1 + n2 indistinguishable samples (illustrated by red squares), a simple but still decent way for the NN to accomplish its regression task is to associate them with their weighted average temperature. As a result, their TR tends to be closer to T2 rather than T1 due to n2 > n1 since the heat capacity that characterizes the strength of thermal fluctuations satisfies C(T2) > C(T1) in this case, which leads to U(T2) < U(T1). This indicates that the temperature at which the heat capacity C(T) reaches its maximum should match exactly the temperature at which the regression uncertainty U(T) reaches its minimum as shown in (b).

Connection between regression uncertainty and response properties
To gain more intuitions for understanding the behavior of regression uncertainty shown in figure 1(b) and see the possible origin of its non-trivial information concerning phase transitions, let us consider the training process of the NN in a simple scenario where it is fed with only two sets of an equal number of samples generated at temperatures T 1 and T 2 near the critical temperature T C with T 1 < T 2 < T C . Since the samples (illustrated by squares in figure 1(c)) in these two sets are generated probabilistically, one expects that certain amount of samples, say n 1 samples, that correspond to relatively high energy in the set of T 1 are essentially indistinguishable from certain amount of samples, say n 2 samples, that correspond to relatively low energy in the set of T 2 (illustrated by red squares in figure 1(c)). These essentially indistinguishable samples make it impossible for any algorithm to perform the ISP regression perfectly (corresponding to L = 0).
To give a reconstructed temperature T R to these n 1 + n 2 indistinguishable samples, one can image that in practice, the NN could adopt some oversimplified and crude ways if it is totally confused, e.g. associating all with T 1 (or T 2 ) without distinction or just randomly guessing. Beyond these bad candidates, a simple but still decent way for the NN to accomplish its regression task is to associate the n 1 + n 2 indistinguishable samples with their weighted average temperature T R = (n 1 T 1 + n 2 T 2 )/(n 1 + n 2 ). Indeed, the NN can learn better ways by itself, but if its actual output T R largely departs from this weighted average, one can hardly expect an overall good performance of ISP. Further noticing that in the data generation according to the probability distribution exp(−H/T)/Z, stronger thermal fluctuations make those samples with their energy deviating from the average energy of the set more easily to be accessed, one thus expects n 2 > n 1 since thermal fluctuations at T 2 are stronger than T 1 for T 1 < T 2 < T C . Then the weighted average is closer to T 2 rather than T 1 , so is the reconstructed temperature T R . This thus gives rise to a smaller regression uncertainty at T 2 compared to the one at T 1 , i.e. U(T 2 ) < U(T 1 ). Similarly, one expects the opposite for the case . Moreover, the strength of thermal fluctuations at different temperature T can be characterized by the response property of the system with respect to the temperature, i.e. by the heat capacity . This indicates that a larger heat capacity corresponds to a smaller regression uncertainty, and in particular, the temperature at which the heat capacity C(T) reaches its maximum should match exactly the temperature at which the regression uncertainty U(T) reaches its minimum. As shown by the heat capacity curve in figure 1(b), the temperature of the peak of C(T) indeed assumes the same value as the one of the valley of U(T). Actually, one can also see in the following that the valley position of regression uncertainty U(T) generally show a consistency with the peak position of heat capacity C(T), not only in the case of the ferromagnetic Ising model, but in the cases of three-, six-, and seven-state clock models as well, supporting the above understanding on the origin of the hidden information in regression uncertainty. It is worth mentioning that although the connection between U(T) and C(T) discussed above generally hold true, in practical calculations, since the temperature region where the regression is performed is a finite interval, the behavior of U(T) can also be influenced by the boundary effect of the region. This boundary effect in fact gives rises to the maxima of U(T) shown in figure 1(b). Consider for instance the regime below T C , one observes that as the temperature increases, the regression uncertainty U(T) increases, and reaches a local maximum. The origin of this behavior can be understood by comparing the regression uncertainty at the left boundary of the temperature region (T = 2.0) with the one at the temperature that is a bit away from the left boundary (e.g. T = 2.02). U(T = 2.02) is larger than U(T = 2.0) in this case, since when the NN tries to reconstruct the temperature at T = 2.02, it is confused by both the similar configurations that appear at T = 2.0 and the ones that appear at temperatures higher than T = 2.02, while for reconstructing the temperature at the left boundary T = 2.0, it is only confused by the similar configurations that appear at temperatures higher than T = 2.0. As shown in more detailed investigations on this boundary effect presented in appendix B (see in particular figure 4(b)), the maxima of U(T) depend on the choice of the temperature region where the regression is performed, while in sharp contrast, the temperature that corresponds to the non-trivial minimum of U(T) remains invariant at the critical temperature T C .

The generic application of regression uncertainty in learning continuous phase transitions
The above investigations on ISP in the ferromagnetic Ising model reveal an intrinsic connection between regression uncertainty and response properties of the system. By utilizing this connection, a new type of unsupervised machine learning approach LFRU was established for revealing possible phase transitions. To further demonstrate LFRU as a generic tool for identifying continuous phase transitions, let us consider the q = 3 case of the q-state clock model H q = −J ∑ ⟨i,j⟩ cos(θ i − θ j ), where θ i = 2π n i /q denotes the q-valued spin located at the lattice site i with n i = 0, 1, . . . , q − 1. The generic ability of deep NNs in image recognition enables LFRU to be directly applied to this model with no additional design. Actually, the training and testing processes are exactly the same in practice as the ones involved in studying the ferromagnetic Ising model. Here, the NN is trained to perform the ISP regression for reconstructing temperatures T ∈ [1, 2] (∆T = 5 × 10 −2 ) of the three-state clock model, then test its performance and calculate the corresponding regression uncertainty. As one can see from figure 2(b), the temperature dependence of regression uncertainty U(T) in this case also assumes an M-shape. In particular, the valley position T = 1.500 ± 0.025 matches the critical temperature T C = 3/(2 ln(1 + √ 3)) ≈ 1.492 [59][60][61] of the continuous phase transition point of this system.
Remarkably, although the ISP regression algorithm itself is supervised since there are labels attached to each sample, LFRU can still be regarded as an unsupervised approach since those labels (e.g. temperature T, rather than phase A/B) do not provide the NN with any prior physical knowledge about the actual target of LFRU, i.e. possible phase transitions. As already shown concretely in the ferromagnetic Ising model and the three-state clock model, one directly trains the NN to perform regression tasks of the corresponding ISP and then calculates the regression uncertainty of the well-trained NN afterward (see equation (2) for instance). By the non-trivial minimum of regression uncertainty, the key information concerning possible phase transitions is unsupervisedly revealed (see appendix D for a blank comparison with no phase transition involved). For continuous phase transitions, the corresponding response functions diverge (reach the maximum for finite-size systems) at the critical point, so the position of the non-trivial minimum of regression uncertainty directly corresponds to the critical point of phase transitions (see figures 1(b) and 2(b) for instance). In particular, when applying LFRU, one does not need to calculate the response functions such as the heat capacity, hence LFRU can assume practical advantages over conventional approaches in revealing possible phase transitions of the systems where directly calculating the heat capacity or other relevant observables is highly non-trivial or even impossible, e.g. fermion or spin systems with sign problems [19,20,62]. Moreover, noticing that LFRU does not assume the number of existing phases in the system under consideration, one could directly use it to investigate complex physical systems with possible intermediate phases [63][64][65][66][67][68]. As one shall see in the concrete example presented in the following, LFRU can indeed reveal the key information concerning intermediate phases.

Reveal intermediate phases from regression uncertainty
Let us now discuss how phase transitions in systems with possible intermediate phases can be investigated by LFRU. To this end, the q = 6, 7 cases of the q-state clock model are focused here for instance. These models exhibit an intermediate vortex-antivortex condensed BKT phase between paramagnetic and ferromagnetic phases [52][53][54], and hence in recent years are typically employed to examine the ability of new machine learning approaches [15,[69][70][71][72][73].
To investigate the temperature-driven phase transitions in the six-state and seven-state clock models, firstly, the NN is trained to perform the ISP regression for reconstructing the temperature T in these two models, and then their respective regression uncertainty are calculated. As one can see from the temperature dependence of regression uncertainty U(T) shown in figures 2(c) and (d), both the U(T) curves assume two successive M-shapes. Such a structure with two non-trivial minima, instead of one, is in sharp contrast to the ones presented in figures 1 and 2(b), indicating that the six-state and seven-state clock models assume three different phases, i.e. an intermediate phase that is absent in their two-state (Ising) and three-state counterparts. This thus provides a new type of data-driven evidence on the existence of the intermediate phase in these two models, corroborating the results in previous investigations employing conventional approaches such as the ones utilizing the helicity modulus [74,75], the Fisher zeros [76,77], the entanglement entropy of the fixed-point matrix-product-state [78], etc.
At this stage one may incline to associate the two minima of regression uncertainty U(T) shown in figure 2(c) or 2(d) with the two transition points of the six-state or seven-state clock model. But thanks to the physical interpretability of LFRU's outputs, it is clear that the two minima of regression uncertainty U(T) should correspond to the two maxima of the system's response function with respect to T, i.e. the heat capacity C(T), which is indeed the case as shown by the vertical lines in figures 2(c) and (d). Noticing that the phase transitions associated with the intermediate phase in these two models are of the BKT type, i.e. driven by topological defects [78,79], the positions of the two maxima of heat capacity C(T) in fact do not exactly match the critical temperatures [78,79], indicating that the positions of the two minima of regression uncertainty U(T) in figures 2(c) and (d) should not be interpreted as the BKT transition points. This on the one hand clarifies that quantitatively identifying critical points via LFRU is restricted to continuous phase transitions, and on the other hand provides data-driven evidence that the transitions in these systems are not of the Landau type, corroborating recent investigations [69,[77][78][79].
Finally, another remarkable aspect of LFRU's implementation to highlight is the robustness against different choices of the NN architecture. All the machine learning results presented in figures 1 and 2 are obtained by employing a widely-used NN that is known as ResNet [80]. In Appendix A, the same analysis is performed employing another type of widely-used NN that is known as GoogLeNet [81]. Their corresponding results agree very well with each other, suggesting that state-of-the-art developments in the field of artificial intelligence and data science can be readily exploited to automatically detect phases of matter in a physically interpretable manner via LFRU.

Conclusions
The ISP usually has a primary focus on accurately and precisely inferring system parameters, and thus the uncertainty of regression results was naturally considered as unfavorable for performing such regression tasks. However, this unfavorable regression uncertainty actually assumes an intrinsic connection to the system's response properties, making it accommodate crucial information concerning possible phase transitions of the system under consideration. This enables the development of a fundamentally new type of generic unsupervised machine learning approach LFRU for automated detection of phases of matter, which is based on regression instead of classification and produces physically interpretable results, as manifested concretely in the cases of Ising model and the three-, six-, and seven-state clock models. The working mechanism of this approach is quite general, so its application is not restricted to temperature-driven transitions investigated here. Moreover, noticing that NNs can easily deal with not only the data generated from numerical simulations but also experimental data [82], including both the real optical imaging [83] and preprocessed physical configurations such as snapshots of the many-body density matrix [84], hence LFRU is expected to be available for analyzing actual experiments as well. In addition, various possible phase transitions in a wide range of nonequilibrium complex many-body systems [3,[85][86][87][88], such as the active matter systems described by the Vicsek model [89] consisting of self-propelled particles, can likewise be investigated via this approach [90]. In these regards, we believe that our findings will stimulate further efforts in both developing and applying physically interpretable machine learning approaches to unveil new physics in both equilibrium and nonequilibrium complex many-body systems.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https:// github.com/wcgscnu/LFRU.

Appendix A. Data generation and NN training
In this work, the data generation for the ferromagnetic Ising model was done by Monte Carlo simulations using the Swendsen-Wang algorithm [91,92] on the Ising Hamiltonian H with periodic boundary condition imposed. For the ferromagnetic Ising model with each system size L = 120, 140, 160, 180, 200, there were 7 × 10 3 samples of the steady state spin configuration generated at each temperature T in the temperature region [2.0, 2.5] with the temperature spacing kept as ∆T = 2 × 10 −2 , and in the temperature region [2.25, 2.30] with ∆T = 2 × 10 −3 , respectively. These samples were converted to images as shown in figure 1(a), where the blue or red at each pixel represents the spin s i on the ith lattice site, and then divided into three categories in the ratio of 5:1:1 forming the training, validation, and test datasets [21].
The data generation for the q-state clock model was done by Monte Carlo simulations on the Hamiltonian H q with periodic boundary condition imposed. For q = 3 with L = 100, there were 7 × 10 3 samples of the steady state spin configuration generated at each temperature T in the temperature region [1,2] with ∆T = 5 × 10 −2 . For q = 6, 7 with L = 100, there were 7 × 10 3 samples of the steady state spin configuration generated at each temperature T in the temperature region [0.2, 1.4] with ∆T = 2 × 10 −2 , respectively. These samples were also converted to images as shown in figure 2(a), where the cyclic color at each pixel represents the planar angle θ i on the ith lattice site, and then divided into three categories in the same ratio as above forming the datasets. Using these datasets, two widely-used NNs are employed, namely, ResNet [80] and GoogLeNet [81], to perform regression tasks of ISP. The results obtained by ResNet are shown in figures 1 and 2 in the main text, and the results obtained by GoogLeNet are shown in Figure 3. These NNs are similarly trained by using the Adam optimizer [93] traversing the training dataset for 10 epochs at a learning rate α = 10 −3 , together with a validation at each epoch traversing the validation dataset. After that, the samples in the test datasets are used to calculate the regression uncertainty U(T). All the machine learning results in this work are averaged over 20 independent training and testing processes.

Appendix B. Loss function, and the maxima of regression uncertainty
In the main text, the MSE loss function L = ⟨(T − T R ) 2 ⟩ was employed to train the NN for instance. Here, for the sake of completeness, the same analysis is performed employing the mean absolute error (MAE) loss function L = ⟨|T − T R |⟩, which is another widely-used loss function that is suitable for regression tasks. Their results are presented in figure 4(a). For reconstructing temperatures T ∈ [2.0, 2.5] (∆T = 2 × 10 −2 ) of Ising model with L = 120, these two different choices of loss function give rise to similar behavior of regression uncertainty. In particular, the corresponding valley positions of regression uncertainty U(T) are numerically the same. This thus indicates that the signals generated by LFRU concerning possible phase transitions are robust against different choices of the loss function, as long as the NN trained with the loss function can achieve a good performance in reconstructing the system parameter.
Moreover, to better understand the origin of the maxima of regression uncertainty, the boundary effect of the parameter region where the regression is performed is investigated. It can be seen clearly from

Appendix D. Blank comparison and the performance on ISP
For a blank comparison, LFRU is applied here on the ferromagnetic Ising model concerning solely the low-temperature region [2.0, 2.2] (∆T = 2 × 10 −2 ) employing both ResNet [80] and GoogLeNet [81], respectively. As one can see in figure 6(a), the temperature dependence of regression uncertainty U(T) assumes a single maximum and no non-trivial minimum, namely, only the boundary effect of the parameter region exhibits signs. This is consistent with the fact that only one phase, i.e. the ferromagnetic phase, exists with no phase transition in the temperature region [2.0, 2.2]. Similar results are obtained concerning solely the high-temperature region [2.3, 2.5] (∆T = 2 × 10 −2 ) that corresponds to only the paramagnetic phase, as shown in figure 6(b). These results verify that the non-trivial minimum of regression uncertainty in performing regression tasks of ISP is indeed an unsupervised signal for revealing possible phase transitions of the system under consideration.
It is also worth mentioning that the signals generated by LFRU concerning possible phase transitions are based on the NN's good performance on ISP. Such a performance is not guaranteed by an arbitrary NN. Here, a poor performance is deliberately produced (see figure 6(d)) concerning the temperature region [2.0, 2.5] (∆T = 2 × 10 −2 ) on the ferromagnetic Ising model with L = 120. This is obtained by using a very  ) of Ising model with L = 120 obtained by using a very simple fully-connected NN that assumes a deliberately produced poor performance on ISP. This curve of U(T) looks similar to the ones presented in (a) and (b) with no phase transition involved, suggesting that the NN's good performance on ISP is a prerequisite of LFRU for automated detection of phases of matter. (d) Regression results corresponding to (c). The reconstructed temperature TR largely departs from the target T (the diagonal line represents the ideal regression results TR = T for the ISP). The error bars and the range of the colored error band represent the regression uncertainty U(T), but here the regression uncertainty no longer contain hidden information that can be utilized to reveal possible phase transitions, since this NN fails to accomplish its regression task.
simple fully-connected NN, which consists of an input layer with 3L 2 neurons (3 for RGB), a hidden layer with L neurons (rectified linear unit as the activation function, batch normalization applied) [21], and an output layer with only one neuron. It is worth mentioning that the fully-connected NN might also perform well after fine-tuning, but a poor performance is needed here for technical discussions. As one can see from figure 6(c), when the NN cannot well accomplish its regression task of ISP, the resulting behavior of regression uncertainty U(T) looks similar to the ones presented in figures 6(a) and (b) with no phase transition involved. However, the temperature region under consideration in figure 6(c) is indeed across T C , suggesting that one cannot distinguish whether a single peak of U(T) is associated with a phase transition or not merely by monitoring the regression uncertainty U(T) itself. A reasonable judgment is that any investigation according to the information extracted from a poor performance will suffer from a lack of legitimacy. Therefore, the NN's good performance on ISP is a prerequisite of LFRU for automated detection of phases of matter.

Appendix E. Comparison with the learning-by-confusion approach
Here in this appendix, LFRU is directly compared with the learning-by-confusion approach [10], which is also an unsupervised learning approach for automated detection of phases of matter. As a concrete example, let us consider identifying T C of the ferromagnetic Ising model with samples generated at temperatures T ∈ [2.2, 2.4] with the temperature spacing ∆T = 2 × 10 −2 . As illustrated in the right panel of figure 7, when applying the learning-by-confusion approach, one shall first propose a possible critical point T s . After that, the samples satisfying T < T s are accordingly labeled with (ỹ A = 1,ỹ B = 0) as phase A, and the samples satisfying T > T s are labeled with (ỹ A = 0,ỹ B = 1) as phase B. Then the NN is trained to perform an NN-based binary classification between these artificial phases, outputting for each sample its confidences (y A , y B ) of identifying the sample as phase A or B. The loss function should be suitable for the binary classification task, e.g. the cross-entropy function L ′ = ⟨−ỹ A ln y A −ỹ B ln y B ⟩ [21], instead of the MSE or MAE loss function for regression. After training, one obtains the classification accuracy P(T s ) that the NN successfully matches the samples in the test dataset with their attached labels corresponding to the proposed T s . According to reference [10], one shall examine a series of different T s , and the position of the maximum of P(T s ) is considered the critical point T C predicted via the learning-by-confusion approach. Similar to LFRU, the resolution of T C predicted via the learning-by-confusion approach is also determined by ∆T. The NN used by the learning-by-confusion approach and the NN used by LFRU only have slight differences in their output layers, i.e. there are two neurons in the former for binary classification, but only one single neuron in the latter for regression, and hence the time it takes to train these NNs traversing the same datasets for an epoch is approximately the same. Figures 7(a) and (b) show that the convergence speed is also similar, so they require similar numbers of epochs to give birth to a well-trained NN. However, a well-trained NN for the learning-by-confusion approach can just give a single P(T s ), and if there are m different possible values of T s to be examined as T C (at least m ⩾ 3, and usually much larger), it requires m well-trained NNs. In sharp contrast, when applying LFRU, a complete curve of U(T) is directly obtained from one well-trained NN, and the valley position of U(T) can be considered as the critical point T C predicted via LFRU. Therefore, the approach developed in this work could be about m times faster than the widely-used learning-by-confusion approach. The direct comparison of their results is presented in figures 7(c) and (d).

Appendix F. Description of algorithms
To more clearly see the efficiency difference discussed in appendix E, here the flowcharts (see figure 8) and pseudo-codes between LFRU and the learning-by-confusion approach (Algorithms) are presented. In particular, the computational cost can be evaluated from the number of times one reinitializes an NN. Throughout the algorithm of LFRU, one initializes an NN, train it, validate it, test it, and obtain the prediction of T C straightforwardly. While as one can see in Algorithm 2 and the right panel of figure 8, for the learning-by-confusion approach [10], the initialization of the NN (and its corresponding machine learning processes of training, validation, and testing) appears in a non-trivial for-loop traversing a series of possible T s . To correctly locate a peak of P(T s ), one needs to examine at least m = 3 different possible values of T s . And without any prior physical knowledge about the phases of matter in the system, m could be much larger. This thus leads to LFRU's efficiency advantage in the practical aspects. Algorithm 1. Pseudo-code of the LFRU approach. This algorithm is also illustrated in the left panels of figures 7 and 8. It takes physical configurations that correspond to different values of a certain system parameter as inputs, trains the NN to reconstruct the system parameter (for instance, temperature T), and the well-trained NN's regression uncertainty is found to contain hidden information that can be utilized to reveal possible phase transitions of the system under consideration.
Prepare datasets Label each sample from T Initialize NN for each Epoch in total training steps do Input training samples, obtain NN's outputs T R Calculate training loss L train = ⟨(T − T R ) 2 ⟩ Optimize the NN towards minimizing L train Input validation samples, obtain NN's outputs T R Calculate validation loss L val = ⟨(T − T R ) 2 ⟩ end for Load the well-trained NN which gives minimal L val for each T in the temperature region do Input the testing samples at T, obtain NN's outputs T R Calculate the uncertainty U(T) = √ ⟨(T R − ⟨T R ⟩ T ) 2 ⟩ T end for Obtain T dependence of U(T) and find its valley