The percolating cluster is invisible to image recognition with deep learning

We study the two-dimensional site-percolation model on a square lattice. In this paradigmatic model, sites are randomly occupied with probability p; a second-order phase transition from a non-percolating to a fully percolating phase appears at the occupation density pc, called the percolation threshold. Through supervised deep-learning approaches such as classification and regression, we show that standard convolutional neural networks (CNNs), known to work well in similar image-recognition tasks, can identify pc and indeed classify the states of a percolation lattice according to their p content or predict their p value via regression. When we use the spatial cluster correlation length ξ instead of p as label, the recognition begins to falter. Finally, we show that the same network struggles to detect the presence of a spanning cluster. Rather, predictive power seems lost and the absence or presence of a global spanning cluster is not noticed by a CNN with a local convolutional kernel. Since the existence of such a spanning cluster is at the heart of the percolation problem, our results suggest that CNNs require careful application when used in physics, particularly when encountering less-explored situations.


I. INTRODUCTION
Convolutional neural networks (CNNs) are a class of deep, i.e., multi-layered, neural networks (DNNs) in which the spatial locality of data values is retained during training in a machine-learning (ML) setting. When coupled with a form of residual learning [1], the resulting residual networks (ResNets) have been shown to achieve astonishing precision when classifying images, e.g., of animals [2] and handwritten characters [3], or when predicting numerical values, e.g., of market prices [4]. In recent years, we have also witnessed the emergence of DNN techniques in several fields of physics as a new tool for data analysis [5][6][7][8]. In condensed matter physics in particular, DNNs and CNNs have proved to perform well in identifying and classifying phases of material states [9][10][11][12].
Despite all these studies, the ML process itself remains somewhat of a black box, and it is not yet known what allows a DNN to correctly identify a phase. In order to gain further insight into this issue, we choose a well-known and well-studied system exhibiting perhaps the simplest of all second-order phase transitions, the site-percolation model in two spatial dimensions [13,14]. In this model, a cluster spanning throughout the system emerges at an occupation probability pc, leading to a non-spanning phase for p < pc, while p ≥ pc corresponds to the phase with at least one such spanning cluster [14]. Several ML studies of the percolation model have already been published, mostly using supervised learning to identify the two phases via ML classification [15,16]. An estimate of the critical exponent ν of the percolation transition has also been given [15]. The task of determining pc was further used to evaluate different ML regression techniques in Ref. [17]. For unsupervised and generative learning, less work has been done [16,18]. While some successes have been reported [18], other works show the complexities involved when trying to predict percolation states [16].
In this work, we start by replicating some of the supervised DL analyses. We find that CNNs usually employed in image-recognition ML tasks also work very well for classifying percolation states according to p as well as for regression when determining p from such states. The results are less convincing when, instead of p, we use the spatial correlation length ξ as an alternative means to characterize the phase boundary. We find that, even when correcting for probable difficulties due to unbalanced data availability for ξ, classification and regression tasks fail to give acceptably diagonal confusion matrices. Crucially, when analyzing in detail whether spanning clusters below pc or non-spanning clusters above pc are correctly identified, we find that the CNNs which performed so well in the initial image-recognition tasks now consistently fail to reflect the ground truth. Rather, it appears that the CNNs use p as a proxy measure to inform their classification predictions, a strategy that is obviously false for the percolation problem. We confirm this conclusion by testing our networks with bespoke test sets that include artificial spanning clusters below pc or firebreak structures above pc.

A. The percolation model
The percolation problem is well known, with a rich history across the natural sciences [13,14,[19][20][21][22]. It provides the usual statistical characteristics across a second-order transition such as, e.g., critical exponents, finite-size scaling, renormalization and universality [14]. Briefly, on a percolation lattice of size L × L, individual lattice sites x⃗ = (x, y), x, y ∈ [1, L], are randomly occupied with occupation probability p such that the state ψ of site x⃗ is ψ(x⃗) = 1 for occupied and ψ(x⃗) = 0 for unoccupied sites. We say that a connection between neighboring sites exists when these are side-to-side nearest neighbors on the square lattice, while diagonal sites can never be connected. A group of such connected occupied sites is called a cluster. Such a cluster then percolates when it spans the whole lattice either vertically from the top of the square to the bottom or, equivalently, horizontally from the left to the right. Obviously, for p = 0, all sites are unoccupied and no spanning cluster can exist, while for p = 1 the spanning cluster trivially extends throughout the lattice. In Fig. 1, we show examples of percolation clusters generated for various p values. The percolation threshold is at p = pc(L), such that for p < pc(L) most clusters do not span while for p > pc(L) they do. This can be expressed via the percolation probability P(p) = ⟨sL(p)⟩/L², where sL(p) gives the size of the (largest) percolating cluster for size L and ⟨·⟩ denotes an average over many randomly generated realizations. Similarly, we can define a probability Q(p) of being non-percolating.
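The spanning criterion above can be checked directly on a 0/1 state array. The following is a minimal sketch that tests for vertical spanning via a breadth-first search over side-to-side nearest neighbors; it is not the Hoshen-Kopelman algorithm used in the paper, and the function name and array conventions are our own.

```python
import numpy as np
from collections import deque

def spans_vertically(state):
    """Return True if an occupied cluster connects the top row (x = 0)
    to the bottom row (x = L - 1) via nearest-neighbor bonds only."""
    L = state.shape[0]
    seen = np.zeros(state.shape, dtype=bool)
    queue = deque()
    for y in range(L):                      # seed with all occupied top-row sites
        if state[0, y]:
            seen[0, y] = True
            queue.append((0, y))
    while queue:
        x, y = queue.popleft()
        if x == L - 1:                      # reached the bottom row: spanning
            return True
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < L and 0 <= ny < L and state[nx, ny] and not seen[nx, ny]:
                seen[nx, ny] = True
                queue.append((nx, ny))
    return False
```

Note that diagonal displacements are deliberately absent from the neighbor list, matching the connectivity rule stated above.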
For an infinite system (L → ∞), one finds the emergence of an infinite spanning cluster at pc = 0.59274605079210(2). This estimate has been determined numerically ever more precisely over the preceding decades [23], while no analytical value is yet known [22]. Another quantity often used to characterize the percolation transition is the two-site correlation function g(r) = ⟨ψ(x⃗)ψ(x⃗ + r⃗)⟩, where ⟨·⟩ denotes the average over all sites x⃗ and all directions of r⃗ with |r⃗| = r. This g(r) measures the probability that two occupied sites separated by a distance r belong to the same cluster [14]. The associated correlation length ξ is determined through ξ² = Σr r² g(r) / Σr g(r). In the infinite system, ξ diverges at pc as |p − pc|^(−ν), where ν = 4/3 is the critical exponent determining the universality class of the percolation problem [14].
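For a state whose clusters have already been labeled, ξ² = Σr r²g(r)/Σr g(r) can be evaluated by brute force over all pairs of occupied sites. The sketch below does exactly that; it is quadratic in the number of occupied sites and meant only to make the definition concrete on small lattices, not as the production route (the paper obtains g(r) and ξ from Hoshen-Kopelman cluster labels).

```python
import numpy as np

def correlation_length(labels):
    """Estimate xi from a cluster-labeled lattice (0 = empty, 1, 2, ... = cluster id)
    via xi^2 = sum over occupied pairs of r^2 * [same cluster] / sum of [same cluster],
    i.e., g(r) is the probability that two occupied sites at distance r are connected."""
    coords = np.argwhere(labels > 0)                   # positions of occupied sites
    ids = labels[coords[:, 0], coords[:, 1]]
    diff = coords[:, None, :] - coords[None, :, :]     # all pairwise displacements
    r2 = (diff ** 2).sum(axis=-1).astype(float)        # squared pair distances
    same = (ids[:, None] == ids[None, :])              # same-cluster indicator
    mask = r2 > 0                                      # drop the trivial r = 0 terms
    w = same[mask].astype(float)
    return np.sqrt((r2[mask] * w).sum() / w.sum())
```

Summing over all pairs is equivalent to summing over all displacement vectors r⃗; finite-size subtleties (e.g., excluding the spanning cluster) are ignored in this sketch.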

B. Generating percolation states for training and validation
In order to facilitate the recognition of percolation with image-recognition tools of ML, we have generated finite-sized L × L, with L = 100, percolation states, denoted as ψi(p), for the 31 p-values 0.1, 0.2, ..., 0.5, 0.55, 0.555, 0.556, ..., 0.655, 0.66, 0.7, ..., 0.9. For each such p, N = 10000 different random ψi(p) have been generated. Each state ψi(p), i = 1, ..., N, is of course just an array of numbers, with 0 denoting unoccupied and 1 occupied sites. Nevertheless, we occasionally use for convenience the term "image" to denote ψi(p). The well-known Hoshen-Kopelman algorithm [24] is used to identify and label clusters, from which we (i) compute s(p), g(r), and ξ(p) as well as (ii) determine the presence or absence of a spanning cluster. In Fig. 1, we show examples of percolation states generated for various p values as well as the extracted P100, Q100, pc(100) and ξ(p) estimates. The different gray scales used in Fig. 1(a) mark the different connected clusters. However, for the ML approach below, we shall only use the numerical values 0 and 1 corresponding to the state ψi(p) [25]. This is visualized as the simple black-and-white version shown, e.g., for p = 0.5 in Fig. 1(a). From Figs. 1(b+c), we note that P(p) and ξ(p) behave qualitatively as expected [14], with P(p) ≈ 0 for p < pc, P(p) ≈ 1 for p > pc, and ξ(p) maximal near pc. Clearly, pc(L = 100) ∼ 0.585(5) < pc. This latter behavior is as expected since sL(p) ≤ s∞(p), i.e., a cluster that seemingly spans an L × L finite square might still not span in an infinite system.
We emphasize that in the construction, we took care to only construct states such that for each p, the number of occupied sites is exactly Nocc = p × L² and hence p can be used as an exact label for the supervised learning approach. We note that p = Nocc/L² can therefore also be called the percolation density. For the ML results discussed below, it will also be important to note that the spacing between p values is reduced when p reaches 0.5, with the next p value given by 0.55 and then 0.555. Similarly, the p spacing increases as 0.655, 0.66, 0.7. We will later see that this results in some deviations from perfect classification/regression. Lastly, we have also generated a similar data set with L = 200. As the ML training cycles naturally take much longer, we have not averaged these over ten independent trainings. We find that our results do not change significantly when using this much larger data set and hence we will refrain from showing any results for these larger states in the following.
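The exact-density construction can be implemented by shuffling a fixed number of occupied sites rather than drawing each site independently. A minimal sketch, with the function name and RNG choice our own:

```python
import numpy as np

def make_state(L, p, rng):
    """Generate an L x L percolation state with exactly N_occ = p * L^2
    occupied sites, so the density label p is exact rather than the
    expectation value of independent site occupations."""
    n_occ = round(p * L * L)
    flat = np.zeros(L * L, dtype=np.uint8)
    flat[:n_occ] = 1                 # place exactly n_occ occupied sites...
    rng.shuffle(flat)                # ...and randomize their positions
    return flat.reshape(L, L)
```

For p = 0.5 and L = 100 this yields exactly 5000 occupied sites in every realization, whereas independent Bernoulli draws would only give 5000 on average.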

C. Supervised ML strategy for phase prediction
As discussed above, DL neural networks using the power of CNNs are among the preferred approaches when trying to identify phases in condensed matter systems [26][27][28]. Here, we shall use a ResNet18 [1] network with 17 convolutional and 1 fully connected layer, pretrained on the ImageNet dataset [29]. As the basis of our ML implementation we use the PyTorch suite of ML codes [30]. We train the ResNet18 on the 310000 percolation states, using a 90%/10% split into training and validation data, T and V, respectively; this corresponds to Ntrain = 279000 and Nval = 31000 samples, respectively. We concentrate on two supervised ML tasks. First, we classify percolation images according to (i) p, (ii) ξ as well as (iii) spanning versus non-spanning. In the second task, we aim to predict p and ξ values via ML regression. In both tasks, the overall network architecture remains identical; we just adapt the last layer as usual [31]. For classification, the output layer has a number of neurons corresponding to the number of classes trained, i.e., for the classification by density the C = 31 p-values given above, while for regression the output layer has only one neuron giving the numerical prediction. However, the loss functions are different. Let w denote the set of parameters (weights) of the ResNet and let (ψi, χi) represent a given image sample with χi its classification/regression target, i.e., class, p or ξ, and let χ̃i denote the predicted value. For classification of categorical data, the class names are denoted by a class index c = 1, ..., C, and the targets are one-hot encoded, i.e., χik = 1 if sample i belongs to class k and χik = 0 otherwise. Then, for the (multi-class) classification problem, we choose the usual cross-entropy loss function, lc = −(1/n) Σi Σk χik ln χ̃ik, where n is the number of samples, i.e., either Ntrain or Nval, and χ̃ik is the predicted probability of class k for sample i [26]. We use the AdaDelta optimizer [32] and find that a learning rate of r = 0.001 produces good convergence [33]. Another good metric for the classification task is the accuracy a, which is the ratio of correct predictions over n. The loss function for the regression problem is given by the mean-squared error, lr = (1/n) Σi (χi − χ̃i)², while r and the optimizer remain the same. When giving results for lc and lr below, we always present those after averaging over at least 10 independent training and validation cycles, i.e., with a different initial split of the data into mini-batches. We use the notation ⟨lc⟩ and ⟨lr⟩ to indicate this averaging. In the case of classification, we also represent the quality of a prediction by confusion matrices [26]. These graphically represent the predicted class labels as a function of the true ones in matrix form, with an error-free prediction corresponding to a purely diagonal matrix. For comparison of the classification and regression calculations, we use in both cases a maximum of 20 epochs.
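The two loss functions can be written down directly; a minimal numpy sketch of the cross-entropy and mean-squared-error definitions above, operating on predicted class probabilities rather than raw network logits:

```python
import numpy as np

def cross_entropy_loss(probs, one_hot):
    """l_c = -(1/n) sum_i sum_k chi_ik * ln(chi~_ik) for one-hot targets
    chi and predicted class probabilities chi~ (each row sums to 1)."""
    n = probs.shape[0]
    return -np.sum(one_hot * np.log(probs)) / n

def mse_loss(preds, targets):
    """l_r = (1/n) sum_i (chi_i - chi~_i)^2 for scalar regression targets."""
    return float(np.mean((np.asarray(preds) - np.asarray(targets)) ** 2))
```

In the actual training loop, PyTorch's torch.nn.CrossEntropyLoss (which additionally applies the softmax internally) and torch.nn.MSELoss play these roles.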
Our ML calculations train with a batch size of 256 for both classification and regression. All runs are performed on NVIDIA Quadro RTX6000 cards.

D. Generating test data sets
We generate a test data set, τ , of 1000 states for each of the 31 p-values, such that in total we have N τ = 31000.This test set is used to make all the confusion matrices given below.By doing this, we ensure that the performance of the trained DL networks is always measured on unseen data [27].
In addition, we generate three special test data sets. These data sets have been constructed to allow testing for the existence of the spanning cluster. The first special data set, τsl, is made for the 27 p-values 0.5, 0.55, 0.555, ..., 0.66, 0.7 close to pc and again consists of 1000 states ψi(p) for each p. After generating each ψi(p), we add a straight line of occupied sites from top to bottom, while keeping p constant by removing other sites at random positions. Obviously, every ψi(p) in τsl therefore contains at least one spanning cluster by construction. As a consistency check of the performance of the ML networks, we also add two more ψi without any connecting path, for p = 0.1 and 0.2.
In the next set, τrw, we start with the same 27 p-values for a new set of 27000 ψi(p), but instead of the straight line, we add a directed random walk from top to bottom. As before, we conserve the overall density p of occupied sites. Hence, every sample in τrw is spanning. We again add two ψi for p = 0.1 and 0.2 without the connected random path.
Finally, the third special data set, τfb, again contains 27000 lattices for the same 27 p-values, but in each of the states we apply random firebreak paths, horizontally and vertically, of unoccupied sites. This set is clearly non-spanning. Following the same logic as for τsl and τrw, we add two spanning test samples above pc without the firebreak, namely, for p = 0.8 and 0.9. In all three cases, despite the modification of the lattices, we ensure that Nocc = p × L² and hence the occupation density remains p. Examples from the three sets can be seen in Fig. 2.
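The density-conserving bookkeeping behind τsl can be sketched as follows: occupy one full column, then delete an equal number of occupied sites elsewhere. The function name and the choice of a random column are our own; the random-walk and firebreak sets follow the same accounting.

```python
import numpy as np

def add_spanning_line(state, rng):
    """Force a vertical spanning cluster: occupy a full column of sites,
    then delete the same number of other occupied sites at random so that
    N_occ (and hence the density p) is unchanged."""
    state = state.copy()
    L = state.shape[0]
    col = rng.integers(L)
    newly = int((state[:, col] == 0).sum())   # sites we are about to fill
    state[:, col] = 1
    # remove `newly` occupied sites outside the line to conserve density
    occ = np.argwhere(state == 1)
    occ = occ[occ[:, 1] != col]
    drop = occ[rng.choice(len(occ), size=newly, replace=False)]
    state[drop[:, 0], drop[:, 1]] = 0
    return state
```

Because only sites outside the chosen column are removed, the straight line survives intact and the modified state spans by construction.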

A. Classification of states labeled with density p
We use the density p values as labels for the ML task of image recognition with the DL implementation outlined in section II C. After ten trainings with all 310000 images for 20 epochs, we find on average a validation loss of ⟨lc,val⟩ = 0.052 ± 0.009 (corresponding to an accuracy of ⟨ac,val⟩ = 99.323% ± 0.003). This is comparable to the very good image-classification results shown on Kaggle [34]. Fig. 3(a) gives the resulting averaged confusion matrix. The dependence of the training and validation losses, ⟨lc,train⟩ and ⟨lc,val⟩, respectively, on the number of epochs is shown in Fig. 3(b). From the behavior of the loss functions, we can see that ⟨lc,val⟩ ≥ ⟨lc,train⟩ until epoch 15, after which both losses remain similar. This suggests that 20 epochs are indeed sufficient for our DL approach and avoid over-fitting. Similarly, the confusion matrix is mostly diagonal with the exception of very few samples around the change of resolution in density, at p ∼ 0.555 and 0.655, as commented on before in section II B.

B. Prediction of densities p via regression
For the regression problem, we train the ResNet18 only for the nine evenly spaced densities p = 0.1, 0.2, ..., 0.9. After training and validation with T and V, respectively, we examine the states in τ and predict their p values. In Fig. 4, we present the results, with (a) indicating the fidelity of the predictions for each p-value and (b) showing good convergence of the losses ⟨lr,train⟩ and ⟨lr,val⟩. Clearly, the regression works very well for the nine trained p-values p = 0.1, ..., 0.9 as well as for the untrained values 0.55, 0.555, ..., 0.655, 0.66 close to pc(100). After 20 epochs, we find that min[⟨lr,train⟩] = 0.0003 ± 0.0002 and min[⟨lr,val⟩] = (6.2 ± 1.2) × 10^−5. We therefore conclude that our CNN performs well for classification and regression tasks, while T, V, and τ present appropriately structured data sets for these ML tasks in terms of data size.

C. Classification with correlation length ξ
We now turn our attention to studying image recognition when using the correlation lengths ξ, instead of p, as labels for the ψi(p) states. One way to do this is to use the average ⟨ξ(p)⟩ as label. While for the classification by p the label value was identical to the actual density p of a given state, now each state is labeled by ⟨ξ(p)⟩. This means that the actual ξ of the state might be different from the label assigned. Since ⟨ξ(p)⟩ is uniquely identified by p, this strategy should in fact be equivalent to the previous situation and the CNN should give us similar classification results. The results of such a classification are shown in Fig. 5 where, similarly to Fig. 3, we present in (a) the average confusion matrix for the 31 ⟨ξ(p)⟩ values (cf. also Fig. 1) and in (b) the evolution of the losses during training. We find a validation loss of min[⟨lc,val⟩] = 0.38 ± 0.07 (corresponding to a maximal accuracy of max[⟨ac,val⟩] = 87.12% ± 0.05) and a highly diagonal confusion matrix, with only a small deviation that can be linked to the change in resolution in our data set above p = 0.5.
One might wish to interpret the above classification with ⟨ξ(p)⟩ as a success of the ML approach. However, let us reemphasize that it is fundamentally equivalent to simply changing labels while keeping the direct connection of the labels with p unaltered. We now wish to obtain a classification of states via their ξ's which is independent of the p's. In Fig. 6 we show the distribution Ξ of the ξ's in T ∪ V. Clearly, the number of small ξ values is larger than the number of ξ values close to the maximal value of max[ξ] = 15.771 (cp. Fig. 1(c)). Hence simply using each ξ as label for the corresponding ψi would result in a biased dataset. We therefore reorganize the T ∪ V data set. This can be done in two ways. For the first reorganization, we create bins containing a constant number of 10000 samples each. We call this dataset Ξn. This results in a varying bin width. The second way to reorganize the data set is to keep the bin width constant while restricting the number of samples in each bin. We shall denote this reorganization as Ξw. Since ξ(p) is non-monotonic in p, we split the reorganization into the cases (i) p < pc and (ii) p > pc. We emphasize that the reorganized data sets consist of the same states as in T ∪ V but now have different labels according to the bin labels for Ξw and Ξn. Furthermore, there is now no longer any direct connection of the new labels to the original p densities.
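The two reorganizations can be sketched with numpy: equal-count bins via quantile edges for the Ξn-style relabeling, and equal-width bins with subsequent subsampling for the Ξw-style one. The function names and the subsampling details are our own; the paper's actual bin counts are those quoted in Fig. 6.

```python
import numpy as np

def equal_count_bins(xi, n_bins):
    """Xi_n-style relabeling: bin edges chosen as quantiles so every bin
    holds (roughly) the same number of samples; the bin widths then vary."""
    edges = np.quantile(xi, np.linspace(0.0, 1.0, n_bins + 1))
    return np.clip(np.digitize(xi, edges[1:-1]), 0, n_bins - 1)

def equal_width_bins(xi, n_bins, n_per_bin, rng):
    """Xi_w-style relabeling: constant bin width, then subsample each bin
    down to at most n_per_bin states to balance the classes."""
    edges = np.linspace(xi.min(), xi.max(), n_bins + 1)
    labels = np.clip(np.digitize(xi, edges[1:-1]), 0, n_bins - 1)
    keep = np.concatenate([
        rng.permutation(np.flatnonzero(labels == b))[:n_per_bin]
        for b in range(n_bins)
    ])
    return labels, np.sort(keep)
```

Either way, the new class labels are bin indices with no direct correspondence to the original densities p.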
In Fig. 7, we plot the resulting confusion matrices and losses. We see that the classification for Ξw and Ξn only results in large diagonal entries in the confusion matrices for small correlation-length labels ξ. Overall, the classification for Ξw is somewhat better than for Ξn when away from pc(L). We attribute this to the larger number of states in Ξw for p < pc(L). Still, with overall 62.6% and 55.1% of states misclassified for Ξw and Ξn, respectively, it seems clear that the classification for correlation lengths must be considered unsatisfactory.

D. Regression with correlation length ξ
For the regression task with ξ, we proceed analogously to section III B. Again, we train the CNN on the individual correlation length ξi corresponding to each ψi ∈ T for the nine densities p = 0.1, ..., 0.9. We then compute the predictions of ξi for all 31 densities in τ. The results are shown in Fig. 8. We find that the network architecture which previously predicted the density quite accurately is now struggling to correctly predict ξ. A structure seems to exist in the predictions. Looking closely, we notice that the network makes use of the density for its predictions. Furthermore, by plotting the predicted correlation lengths as a function of density, we retrieve the ξ(p) curve of Fig. 1(c). Fig. 8(b) shows the dependence of the losses ⟨lr,train⟩ and ⟨lr,val⟩ on the number of epochs for the regression according to ξ, following the same conventions as in Fig. 4.

E. Classification with the spanning or non-spanning properties
As discussed earlier, the hallmark of the percolation transition is the existence of a spanning cluster which determines whether the system is percolating or not [14]. In the previous sections, our DL approach has classified according to p or ξ values without testing whether spanning clusters actually exist. We now want to check this and label all states according to whether they are spanning or non-spanning. From Fig. 1, it is immediately clear that for the finite-sized systems considered here, there is a non-negligible number of states which appear already spanning even when p < pc and, vice versa, which are still non-spanning when p > pc. Furthermore, we note that for such L, the difference between pc and pc(L) is large enough to be important and we hence use pc(L) as the appropriate value to distinguish the two phases.
Figure 9 shows the average results after 20 epochs with a validation loss of min[⟨lc,val⟩] = 0.165 ± 0.001 (corresponding to a maximal validation accuracy of max[⟨ac,val⟩] = 92.702% ± 0.001). At first glance, the figure seems to indicate a great success: of the 31000 states present in τ, 11510.6 have been correctly classified as non-spanning (i.e., N → N) and 17205.9 as spanning (S → S), while only 1223.1 are wrongly labeled as non-spanning (S → N) and 1059.4 as spanning (N → S) [35]. Overall, we would conclude that 92.6% of all test states are correctly classified while 7.4% are wrong. However, from the full percolation analysis for T ∪ V shown in Fig. 1, we know that below pc(L), 92.7% of states are without a spanning cluster while 7.3% of states, equivalent to 876 samples, already contain a spanning cluster. Similarly, for p > pc(L), 94.8% of states are spanning and 5.2%, equivalent to 936 samples, are not. At pc(L) = 0.585, we furthermore have 518 spanning and 482 non-spanning states. Hence in total, we expect 2812 wrongly classified states. Since this last number is reasonably close to the actual number of 2282.5 misclassified states, it suggests that it is precisely the spanning states below pc(L) and the non-spanning ones above pc(L) which the DL network is unable to recognize. Let us rephrase for clarity: it seems that the DL CNN, when trained on whether a cluster is spanning or non-spanning, completely disregards this information in its classification outputs.
F. Density-resolved study of spanning/non-spanning close to pc(L)

In order to understand the behavior observed in the last section, we now reexamine the result of Fig. 9 by analyzing the ML-predicted probabilities P_ML(p). In Fig. 10, we show both P_ML(p) as well as P(p), the latter having been obtained by the Hoshen-Kopelman algorithm, cf. Fig. 1(a). While the P(p) and P_ML(p) curves, and of course also the corresponding Q(p) and Q_ML(p), appear qualitatively similar, they are nevertheless not identical and the slopes of P_ML(p), Q_ML(p) are different. We emphasize that the slopes are important for determining the universality class of a second-order phase transition via finite-size scaling [36]. Since we know for each image whether it percolates or not, we can also check how well the ML predictions worked by considering the covariance. Let ζ(ψi(p)) = 0 when there is no percolating cluster in the state ψi(p) while ζ(ψi(p)) = 1 if there is. Similarly, we define ζ_ML(ψi(p)) for the prediction by the DL network. Then cov(ζ, ζ_ML)(p) measures the covariance of states being found to span by the percolation analysis and by ML for given p. In Fig. 10(b), we show the normalized result, i.e., the Pearson coefficient r = cov(ζ, ζ_ML)/(σ_ζ σ_ζML), where σ_ζ and σ_ζML are the standard deviations of the percolation results and the ML predictions, respectively. We see that in the transition region, r ≈ 0.12, which is very far from the maximally possible value 1. This suggests that while the ML predictions are not simply random, they are also not very much better than random.
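The density-resolved Pearson coefficient between the ground-truth and predicted spanning indicators is a one-liner in numpy. A sketch, using population standard deviations and our own function name:

```python
import numpy as np

def pearson_r(zeta, zeta_ml):
    """r = cov(zeta, zeta_ML) / (sigma_zeta * sigma_zeta_ML) for the binary
    spanning indicators of all samples at a fixed density p."""
    z = np.asarray(zeta, dtype=float)
    m = np.asarray(zeta_ml, dtype=float)
    cov = np.mean((z - z.mean()) * (m - m.mean()))
    return cov / (z.std() * m.std())
```

r = 1 corresponds to the network reproducing the ground-truth spanning indicator exactly, r = 0 to predictions uncorrelated with it.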
Let us now study the classification into spanning/non-spanning states in detail for each p. Figure 11 and Table I show a comparison of the classification for the ten p values 0.56 to 0.605. We see, e.g., that for the values p = 0.56, 0.565, 0.57, 0.575, 0.58 below pc(L) ≈ 0.585, 485 of the 487 samples which are already spanning have been misclassified as non-spanning. Similarly, for p = 0.59, 0.595, 0.6, 0.605 > pc(L), 754.9 of the in total 864 still non-spanning samples are classified as spanning. These results are similar whether one considers a typical sample or the averaged result. Hence, contrary to the supposed success of Fig. 9, we now find that the seemingly few misclassified states of Fig. 9 are indeed precisely those which represent the correct physics. Said differently, the ML process seems to have led to a DL network which largely disregards the characteristic of spanning clusters and just uses the overall density of occupied vs. unoccupied sites to ascertain the phases. Of course, this is the wrong physics when considering percolation.

Our approach is already somewhat more challenging than these previous works since, instead of just asking the DL network to identify the two phases p < pc and p > pc, we also successfully identify all 31 distinct densities p. Using ⟨ξ(p)⟩ instead of p to identify the 31 densities also works quite well, but this is again expected since ⟨ξ(p)⟩ merely acts as a new set of supervised labels.
Problems emerge when we use the computed correlation length ξ of each state and try classification and regression with these correlation lengths. Having thus explicitly removed any connection with the p values, we nevertheless find that the resulting confusion matrices, fidelity curves and losses are all of much lower quality than before. Instead, it seems that the density information p is still the overriding measure used by the DL network to arrive at its outputs (cf. Figs. 7 and 8) [38]. Furthermore, as we show in sections III E-G, the DL network completely ignores whether a cluster is spanning or non-spanning, essentially missing the underlying physics of the percolation problem; it seems to still use p as its main ordering measure.
We believe that the root cause of the failure to identify the spanning clusters, or their absence, lies in the fundamentally local nature of the CNN: the filters/kernels employed in the ResNets span a few local sites only [39]. Hence it is not entirely surprising that such a CNN cannot correctly capture the essentially global nature of spanning clusters. But it is of course exactly this global percolation that leads to the phase transition.
The reader might wonder why previous CNN studies of phases in other models, such as, e.g., the Ising-type models [9] or the three-dimensional Anderson model and its topological variants [10], did not encounter similar issues. We think that this is because in the Ising case, the majority rule for spin alignment is not concerned with any globally spanning domain [40], while in the Anderson-type models, it is the (typical) local density of states which can serve as order parameter [41]. In short, in these models a local property is indeed sufficient to distinguish their phases.
Of course, ML aficionados might now want to suggest that extensions of local kernels are possible in CNNs. Indeed, one might, e.g., want to use CNNs in which large dilation parameters are employed to effectively make filters/kernels of manageable size while still spanning across a sizeable portion of the L × L extent of each percolation state [42]. But while this might solve the issue for fixed L in the percolation case (we have not tested it), it does imply knowing from the start that a global property is important. This would rather diminish the relevance of DL as a tool for unbiased discovery in physics.

FIG. 1. (a) Examples of percolation clusters of size L² = 100², obtained for p = 0.2 < pc and 0.6 > pc in the top row and p = 0.5, i.e., close to pc, in the bottom row. While individual clusters have been highlighted with different gray scales for the first three images, the bottom-right image with p = 0.5 shows all occupied sites in black only, irrespective of cluster identity. This latter representation is used below for the ML approach. (b) Percolation probabilities P(p) and Q(p) of having a spanning (blue open squares) / non-spanning (red open circles) cluster close to the percolation threshold for dataset T ∪ V, and of having a spanning (cyan crosses) / non-spanning (orange pluses) cluster for dataset τ. (c) Correlation length ξ(p). In (b+c), the vertical lines indicate the estimates pc(100) ∼ 0.585(5) (dotted) and pc (dashed).

FIG. 2. Examples of percolation images from the three special test sets, with (a) a percolating straight line from top to bottom, (b) a percolating random path from top to bottom and (c) a "firebreak"-like cross of empty sites preventing percolation. For the sake of visibility, in (a+b) the connected path is highlighted in red. In all three cases, p = 0.5.

FIG. 3 .FIG. 4 .
FIG. 3. (a) Average confusion matrix for classification according to p. The dataset used is the test data τ and the models used for predictions are those corresponding to a minimal ⟨lc,val⟩. True labels for p are indicated on the horizontal axis while the predicted labels are given on the vertical axis. The color scale represents the number of samples in each matrix entry. (b) Dependence of the losses ⟨lc,train⟩ and ⟨lc,val⟩, averaged over ten independent training seeds, on the number of epochs for classification according to p. The squares (blue open) denote ⟨lc,train⟩ while the circles (red solid) show ⟨lc,val⟩. The green crosses indicate the minimal ⟨lc,val⟩ for each of the ten trainings.

FIG. 5. (a) Average confusion matrix for classification according to ⟨ξ⟩. The dataset used is the test data τ and the models used for predictions are those corresponding to a minimal ⟨lc,val⟩. (b) Dependence of the losses ⟨lc,train⟩ and ⟨lc,val⟩ on the number of epochs for classification according to ⟨ξ⟩. We follow the same conventions as in Fig. 3 and Fig. 7.

FIG. 6. Probability distributions of the correlation lengths ξ for (a) p < pc (with 12 p-values) and (b) p > pc (18 p-values), with the unbalanced Ξ and the balanced counterparts Ξn and Ξw denoted by yellow, magenta and green, respectively. In each case, the distributions are normalized relative to the total number of ξ's in each set, i.e., for (a) 120000 in Ξ and Ξn and 6 × 3560 = 21360 in Ξw, while for (b) there are 180000 in Ξ and Ξn and 5 × 3077 = 15385 in Ξw.

FIG. 7 .FIG. 8 .
FIG. 7. Confusion matrices and losses ⟨lc,train⟩ and ⟨lc,val⟩ for the classification results when using the correlation-length-relabeled Ξw and Ξn data sets. The left column (a) shows the case p < pc while the right column (b) gives the outcome for p > pc. The upper row corresponds to Ξn with 10000 states for each class, with 12 classes for p < pc and 18 for p > pc. In the lower row, for Ξw, there are 3560 states for the 6 classes when p < pc and 3077 states for the 5 classes when p > pc.

FIG. 9. (a) Average confusion matrix for classification according to spanning/non-spanning. The dataset used is the test data τ and the models used for predictions are those corresponding to a minimal ⟨lc,val⟩. The true labels, N and S, are indicated on the horizontal axis while the predicted labels are given on the vertical axis. (b) Dependence of the losses ⟨lc,train⟩ and ⟨lc,val⟩ on the number of epochs for classification according to spanning/non-spanning. Again, we follow the same conventions as for Figs. 3, 5, and 7.

FIG. 10. (a) The blue (red) curve shows the probability to have a spanning (non-spanning) sample in the training dataset. The cyan (orange) curve gives the predicted probability to have a spanning (non-spanning) sample according to the trained network. (b) Dependence of the Pearson correlation coefficient r on the density p for classification according to spanning/non-spanning. The confidence interval is indicated in gray. In both (a) and (b), the lines connecting the symbols are only a guide to the eye.

FIG. 12. Sample states for the three special test sets (a) τsl with added straight spanning lines, (b) τrw with spanning random walks and (c) τfb with the non-spanning firebreaks. In each case, the right plots give the confusion matrices obtained from the DL model previously trained on the spanning vs. non-spanning classification. In all cases, the density is strictly p = 0.5.