Certification of Gaussian Boson Sampling via graph feature vectors and kernels

Gaussian Boson Sampling (GBS) is a non-universal model of quantum computing inspired by the original formulation of the Boson Sampling (BS) problem. Nowadays, it represents a paradigmatic quantum platform for reaching the quantum advantage regime in a specific computational task. Indeed, thanks to implementations in photonic processors, the latest GBS experiments have reached a level of complexity where the quantum apparatus solves the sampling task faster than state-of-the-art classical strategies. In addition, recent studies have identified possible applications beyond the inherent sampling task. In particular, a direct connection has been established between the photon counting of a genuine GBS device and the number of perfect matchings in a graph. In this work, we propose to exploit this connection to benchmark GBS experiments. We interpret the properties of the feature vectors of the graph encoded in the device as a signature of correct sampling from the true input state. Within this framework, two approaches are presented. The first method exploits the distributions of graph feature vectors and classification via neural networks. The second approach investigates the distributions of graph kernels. Our results address the current need for tailored algorithms to benchmark large-scale Gaussian Boson Samplers.


Introduction
Quantum processors and quantum algorithms promise substantial advantages in computational tasks [1]. Recent experiments have shown significant progress toward the realization of large-scale quantum devices operating in the regime in which classical computers cannot reproduce the output of the calculation [1][2][3][4][5][6].
An intriguing computational problem concerning quantum photonic processors is Boson Sampling (BS) and, more recently, its variant Gaussian Boson Sampling (GBS). The BS paradigm corresponds to sampling from the output distribution of a Fock state with n indistinguishable photons after the evolution through a linear optical m-port interferometer [7,8]. This problem turns out to be intractable for a classical computer, while a dedicated quantum device can tackle the task, paving the way to an unequivocal demonstration of quantum computational advantage. The GBS variant replaces the quantum resource of BS, i.e. the Fock state, with single-mode squeezed vacuum (SMSV) states. This change to the original problem enhances the sample generation rate with respect to BS performed with probabilistic sources, while preserving the hardness of sampling from the quantum state [9][10][11][12].
The GBS problem has drawn attention for the practical chance to achieve the quantum advantage regime. After small-scale experiments [13][14][15], the latest GBS instances have reached the condition where the quantum device solves the task faster than current state-of-the-art classical strategies [4][5][6]. The interest in GBS also concerns applications of sampling from Gaussian states beyond the original sampling task.

Gaussian Boson Sampling and its connection with graphs
Background. Here we briefly review the general theory of the probability of obtaining n-photon configurations from a set of indistinguishable Gaussian input states ρ_i, such that ρ_in = ⊗_{i=1}^{m} ρ_i, distributed over m optical modes, after the evolution through a multi-port interferometer. Given the 2m × 2m covariance matrix σ that identifies the Gaussian state, and the output configuration n = (n_1, n_2, . . . , n_m), where n_i is the number of photons detected in mode i, the output probability reads

Pr(n) = Haf(A_n) / (n_1! · · · n_m! √(det σ_Q)).   (1)

Figure 1. GBS validation. In the GBS paradigm, n-photon configurations are sampled at the outputs of an optical circuit with m ports. The aim of a validation algorithm is to exclude that the obtained samples could have been generated by classically-simulable models. Previous experiments mainly focused on techniques capable of ruling out the uniform sampler and the thermal, coherent, and distinguishable SMSV state hypotheses.
The quantity σ_Q is σ + (1/2) I_{2m}, where I_{2m} is the 2m × 2m identity operator; A_n is a sub-matrix of the overall matrix A, which contains the information about the optical circuit, represented by the transformation U, and the covariance of the input state, while Haf stands for the hafnian of the matrix. More precisely, A_n is the 2n × 2n sub-matrix obtained by repeating n_i times the i-th and the (i+m)-th rows and columns of A [11,51]. The hafnian of A_n corresponds to the summation over all perfect matching permutations, i.e., the ways to partition the index set {1, . . . , 2n} into n pairs such that each index appears in exactly one pair (see also [52]). Computing the hafnian is a #P-hard problem, and the hafnian generalizes the permanent of a matrix M according to the following expression:

Per(M) = Haf( [[0, M], [M^T, 0]] ).   (2)

The above description has been used to define a classically-hard sampling algorithm, using indistinguishable SMSV states with photon-counting measurements [10,11,51]. More specifically, in figure 2(a) we report the structure of the sampling matrix A for an input state ρ with zero displacement. In the language of quantum optics, the displacement is the operation that generates a coherent state from the vacuum. The blocks B and C highlighted in figure 2(a) correspond to the contributions of squeezed and thermal light, respectively, in the input state. Pure, indistinguishable SMSV states give C = 0 and B = U diag(tanh s_1, . . . , tanh s_m) U^T, where the s_i are the squeezing parameters of each ρ_i [51]. According to this representation, the expression in equation (1) becomes

Pr(n) = |Haf(B_n)|^2 / (n_1! · · · n_m! ∏_{i=1}^{m} cosh s_i),   (3)

where B_n is the sub-matrix of B obtained from the string n as described at the beginning of the section.
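As an illustration of these definitions, the hafnian can be computed naively by summing over all perfect matchings of the index set. The brute-force sketch below (exponential time, viable only for small matrices, unlike the optimized routines of The Walrus) also lets one check the permanent-hafnian relation between the permanent of M and the hafnian of the bipartite block matrix:

```python
import numpy as np

def hafnian(A):
    """Brute-force hafnian: sum over perfect matchings of {0, ..., 2n-1}.
    Exponential time; for illustration only."""
    size = len(A)
    if size % 2:
        return 0

    def match(idx):
        # Pair the first free index with every other free index and recurse.
        if not idx:
            return 1
        first, rest = idx[0], idx[1:]
        return sum(A[first][rest[k]] * match(rest[:k] + rest[k + 1:])
                   for k in range(len(rest)))

    return match(tuple(range(size)))

def permanent_via_hafnian(M):
    """Per(M) = Haf([[0, M], [M^T, 0]]), the bipartite block construction."""
    M = np.asarray(M)
    n = M.shape[0]
    zeros = np.zeros((n, n), dtype=M.dtype)
    return hafnian(np.block([[zeros, M], [M.T, zeros]]))
```

For the complete graph K4 (all-ones off-diagonal adjacency matrix) the hafnian counts its three perfect matchings.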
Connection to graph theory. Recently, several works have identified a connection between the GBS apparatus and graph theory [16,18,53]. These studies take advantage of such a relationship to formulate GBS-based algorithms in the context of graph similarity and graph kernels. The algorithms exploit the fact that the vectors extracted from GBS samples can be considered a feature space for a graph encoded inside the apparatus. In particular, they are strictly correlated with a class of classical graph kernels that count the number of n-matchings, i.e., the perfect matchings of sub-graphs with n edges of the original graph encoded inside the GBS. Given the adjacency matrix A of the graph, the number of perfect matchings is proportional to the hafnian of the matrix, thus corresponding to the output probabilities in equations (1) and (3).
Indeed, any symmetric matrix, such as a graph adjacency matrix, can be decomposed according to the Takagi-Autonne factorization as A = U diag(cλ_1, . . . , cλ_m) U^T, where the λ_i are real parameters in the range [0, 1], c is a scaling factor and U is a unitary matrix. This decomposition matches the expression of the sampling matrix B of SMSV states when λ_i = tanh s_i. Squeezed states with very small displacement also have an n-photon probability distribution that can be expressed through loop hafnians.
For example, displaced squeezed states have been investigated in the context of graph similarity, where a small amount of displacement has been employed as a hyper-parameter to enhance the graphs' classification accuracy [17].
Regarding the sub-matrix selected by the sampling process, the configuration n identifies the elements of the sub-matrix A n that represent an induced sub-graph (see figure 2(b)). The nodes of the original graph A corresponding to detectors with zero counting are deleted, together with any edges connecting these nodes to the others. If some elements n i of n are larger than one, i.e. these detectors count more than one photon, A n describes what we call an extended induced sub-graph in which the corresponding nodes and all their connections are duplicated n i times.
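Constructing A_n from an outcome n amounts to repeating each row and column of the adjacency matrix n_i times; a minimal sketch:

```python
import numpy as np

def induced_submatrix(A, n):
    """Sub-matrix A_n of an adjacency matrix A: node i is kept, together with
    all its edges, and duplicated n_i times (extended induced sub-graph)."""
    idx = [i for i, n_i in enumerate(n) for _ in range(n_i)]
    return np.asarray(A)[np.ix_(idx, idx)]
```

For the 4-cycle, the outcome (1, 0, 1, 0) selects the two non-adjacent nodes 0 and 2, so the induced sub-graph has no edges, while (2, 1, 0, 0) duplicates node 0.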
It is worth noting that the permanent also has a precise meaning in the context of graphs. Indeed, the matrix on the right-hand side of equation (2) corresponds to the adjacency matrix of a bipartite graph. In other words, the permanent provides the number of perfect matchings for this class of graphs (see figure 2(c)). One may ask whether other sampling processes governed by permanents, such as BS and thermal samplers (see appendix A), also have a relationship with bipartite graphs. The BS output distribution is defined by permanents of sub-matrices of the unitary transformation U representing the circuit, and clearly not all graphs can be represented by a unitary adjacency matrix. Furthermore, in the BS paradigm the sub-matrix selected by the sampling process depends also on the input state. This implies that the resulting sub-graph may not have the same symmetries and properties as the original graph encoded in the U matrix. The latter issue can be overcome by using thermal light, for which only the output configuration n determines the sub-matrix. However, even for thermal light, the sampling matrix C does not in general represent an adjacency matrix, thus preventing the encoding of arbitrary bipartite graphs. In conclusion, GBS devices with squeezed states are the only ones that have a direct connection with graphs (see figure 2(d)).

Feature vector-based validation algorithm
In the following, we illustrate two validation algorithms tailored for GBS. The idea behind our protocols is to exploit the connection between the samples of a genuine GBS and the graph properties encoded in the device.
According to equation (3), the most likely outcomes from a GBS are those with the highest hafnians, i.e. the output configurations that identify the sub-graphs A_n with the largest number of perfect matchings. We recall, however, that the calculation of a single hafnian is a #P-hard problem, as is the counting of perfect matchings in a graph. Furthermore, the estimation of individual output probabilities from the quantum device becomes unfeasible for large system sizes, and thus no protocol should rely on this ingredient. A successful validation algorithm must therefore exploit quantities that do not depend on the evaluation of the probability of a single n, which would require exponential time to estimate. Feature vectors. It is possible to summarize the properties of a graph in a so-called feature vector f. In GBS-based algorithms, the features of the graph are extracted from a coarse-graining of the output configurations. For instance, the probability of detecting configurations {n} with n_i ∈ {0, 1} is linked to the number of perfect matchings of the sub-graphs {A_n} of A without repeated nodes and edges. Accordingly, the probability of the set of {n} with two photons in the same output is connected to the perfect matchings of sub-graphs with one repeated pair of nodes and edges. The collections of output configurations that identify a family of sub-graphs with a given number of node and edge repetitions are called orbits [17]. Given the total number n of post-selected photons in the output, the orbit O_n is defined as the set of all possible index permutations of n. The graph feature vector components f = (f_{n_1}, f_{n_2}, . . . , f_{n_k}) are then identified by the probability of each orbit, defined as

f_n = Pr(O_n) = Σ_{n' ∈ O_n} Pr(n').

In this work we consider the orbits [1, . . . , 1], [2, 1, . . . , 1] and [2, 2, 1, . . . , 1].
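Estimating feature-vector components from photon-counting data then reduces to sorting each outcome into its orbit and computing relative frequencies; a minimal sketch:

```python
from collections import Counter

def orbit(sample):
    """Orbit label of an outcome: its nonzero counts sorted in decreasing
    order, e.g. (1, 0, 2, 1) -> (2, 1, 1)."""
    return tuple(sorted((c for c in sample if c > 0), reverse=True))

def feature_vector(samples, orbits):
    """Estimated orbit probabilities (feature-vector components) computed as
    relative frequencies over a finite set of photon-counting samples."""
    counts = Counter(orbit(s) for s in samples)
    total = len(samples)
    return [counts[o] / total for o in orbits]
```

For example, four samples of which three sort into the orbit (1, 1) and one into (2) yield the estimates 0.75 and 0.25.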
These orbits display the highest probability in the regime in which the number of detected photons n is much lower than the size m of the interferometer. In this condition, the probability of measuring many photons in the same output port decreases for large values of m [54]. The overall probability of the chosen orbits can be further optimised by tuning the squeezers' parameters so that the photon-number distribution is peaked around the desired value n of the orbits (see appendix B). The idea is always to choose the set of orbits with the highest probability, since they require a smaller set of collected samples for their estimation at a given accuracy. The orbit probabilities can be estimated with sufficient accuracy directly from a finite sample of photon-counting measurements (see reference [17] and appendix C). This method can be applied in GBS experiments. In numerical simulations, direct sampling of photon counting is a viable approach for deriving orbit probabilities of Gaussian states that can be sampled classically, such as distinguishable SMSV, thermal and coherent states (see appendix A). These states reproduce scenarios that can occur in experimental realizations of GBS devices. For example, photon losses turn squeezed light into thermal radiation, while mode mismatch, such as spectral and temporal distinguishability, breaks the symmetry of boson statistics. Exact estimation of the orbits for indistinguishable SMSV states can be performed by directly calculating all the hafnians, thus requiring the evaluation of a large number of complex quantities. A different approach can be employed, based on approximating the orbit probability by a Monte Carlo simulation [55,56]. The outputs n within an orbit are selected uniformly at random and their exact probabilities are calculated. Then, the probability of the whole orbit after N extractions can be approximated by

Pr(O_n) ≈ (|O_n| / N) Σ_{k=1}^{N} Pr(n_k),

where |O_n| is the number of elements in the orbit.
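The Monte Carlo estimator can be sketched as follows. Here `prob_fn` stands for an exact-probability oracle (e.g. a hafnian-based routine from The Walrus) and is an assumption of this sketch; the orbit size |O_n| is counted combinatorially:

```python
import math
import random
from collections import Counter

def orbit_size(pattern, m):
    """Number of m-mode output configurations whose nonzero counts sort to
    `pattern`, e.g. pattern (2, 1) over m = 4 modes: 12 configurations."""
    k = len(pattern)
    size = math.factorial(m) // math.factorial(m - k)
    for mult in Counter(pattern).values():
        size //= math.factorial(mult)
    return size

def mc_orbit_probability(prob_fn, pattern, m, N, rng=None):
    """Pr(orbit) ~= |orbit| / N * sum of exact probabilities of N
    configurations drawn uniformly at random from the orbit."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(N):
        modes = rng.sample(range(m), len(pattern))  # random support
        values = list(pattern)
        rng.shuffle(values)                          # random assignment
        n = [0] * m
        for i, v in zip(modes, values):
            n[i] = v
        total += prob_fn(tuple(n))
    return orbit_size(pattern, m) * total / N
```

As a sanity check, a flat toy distribution over the orbit members makes the estimator exact.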
The adopted strategies reproduce the experimental conditions in which the orbit probabilities are estimated from a finite number N of samples. The code for generating GBS data included routines from the Strawberry Fields [55] and The Walrus [57] Python libraries.
Validation by classification. As a first validation method, we propose the classification of the different samplers in the space spanned by the three feature-vector components identified by the orbits [1, . . . , 1], [2, 1, . . . , 1] and [2, 2, 1, . . . , 1]. In figure 3 we illustrate the intuition behind this approach by reporting an example of the distribution of feature vectors for different graphs and sampler types. The colors distinguish samples from different models, namely genuine GBS, distinguishable SMSV states, coherent light, indistinguishable thermal light emitters and distinguishable thermal states. In this simulation we consider 100 random optical circuits with m = 400 modes and m sources set to produce a photon-number distribution centered at n ≪ m. In this condition we are in the dilute regime, where the orbits with a low number of photons in the same output have the highest probability of occurring. It is worth noting that in this estimation it is necessary to take into account the occurrence of the orbits in the whole output space of the GBS, i.e. the Hilbert space associated with all possible n-photon states that can be generated by the squeezers.
Experimentally, such a method requires knowledge of the photon-number distribution of the sources. This requirement is not demanding, since the characterization of the Gaussian sources is a standard preliminary procedure in GBS experiments [4,5,13,14,19]. Alternatively, the orbit probabilities can be estimated by post-selecting samples with different total photon numbers n and dividing the number of photon-counting events belonging to the orbit with a given n by the total number of samples. The data of the classical models were retrieved with such an approach, while the GBS orbits were calculated via the Monte Carlo approximation. These simulations show that three orbits are informative enough to discriminate among different Gaussian samplers, as long as the photon-number distribution is centered in the dilute regime. In particular, they allow us to discriminate the effects of noise in real experiments, such as photon losses, which drive the system toward the thermal samplers, and photon distinguishability, which undermines the presence of genuine quantum interference. In this regard, it is worth noting that the thermal light curve lies in the same plane as the GBS data but with a somewhat smaller radius. The reason is that thermal radiation displays a non-zero probability of generating an odd number of photons, which is also the effect of losses on squeezed vacuum radiation. Distinguishability moves the two curves toward another plane that exhibits higher values of the probability of the orbit [1, 1, . . . , 1]. The physical intuition behind this behavior is that distinguishable particles do not interfere and, consequently, have a lower probability of bunching. Further details are provided in appendix D.

Figure 3. Distribution of feature vectors for the different sampler models. Each cloud corresponds to the post-selection of a different number of photons n in the outputs, which is equivalent to looking at the features of n-node sub-graphs. In yellow we report the thermal sampler, in pink the distinguishable SMSV sampler, in red the distinguishable thermal sampler and in green the coherent light one. GBS data were generated numerically via the Monte Carlo approximation of the orbit probabilities; the maximum size achieved for this simulation corresponds to n = 22, for computation-time reasons. The data of the other models were extracted from direct sampling of the photon counting. Thermal, coherent and distinguishable thermal samplers also display a non-zero probability of generating an odd number of photons.
To prove the effectiveness of feature vectors in validating a genuine GBS device of any size, we train a classifier, namely a feed-forward neural network, with the data reported in figure 4. Experimental details are provided in appendix E. Here the samples correspond to different experimental layouts with number of modes m = n² and number of post-selected photons varying in n ∈ [4, 6, . . . , 22]. The size of the collected samples was ∼10⁵ for the classical Gaussian states, which generate a fraction of ∼10³-10⁴ output configurations in the orbits under investigation. Such a sample size allows for a good estimation of the feature vectors at any dimension of the GBS device (see also appendix C). For the GBS data, we performed ∼10⁴ Monte Carlo extractions for the orbit probability estimation. The classifier reaches a high level of accuracy, greater than 99%. We performed a further study, reported in figure 4(b), to check the ability of the network to generalize to GBS sizes not included in the training stage. Such an ability can be exploited in the validation task when trusted GBS samples are not available for a given dimension. To this aim, we trained the network with the data of figure 4(a) up to n = 12, 14, 16, and subsequently computed the classification accuracy for the data with n = 18, 20, 22. The latter has been estimated on 100 sets of GBS data for each n and on 10 independent trainings. We note that the network generalisation ability works very well as long as the difference between the size of the GBS samples in the training set and the size of the new data is not too large.
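The classification step can be sketched with a minimal one-hidden-layer network trained by gradient descent. The data below are synthetic stand-ins (Gaussian clouds in the three-orbit feature space, with assumed centers); in the actual protocol the training data would be the estimated feature vectors of the different sampler models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in feature vectors for two sampler classes (assumed centers).
def cloud(center, n_pts):
    return np.asarray(center) + 0.01 * rng.normal(size=(n_pts, 3))

X = np.vstack([cloud([0.50, 0.30, 0.10], 200),   # "GBS-like" class
               cloud([0.70, 0.20, 0.05], 200)])  # "classical-like" class
y = np.array([0] * 200 + [1] * 200)
T = np.eye(2)[y]                                  # one-hot targets

# One hidden layer (tanh) + softmax output, trained by full-batch descent.
W1 = rng.normal(scale=0.5, size=(3, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 2)); b2 = np.zeros(2)
lr = 0.5
for _ in range(1000):
    H = np.tanh(X @ W1 + b1)
    Z = H @ W2 + b2
    P = np.exp(Z - Z.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    G = (P - T) / len(X)                          # cross-entropy gradient
    gW2, gb2 = H.T @ G, G.sum(axis=0)
    gH = G @ W2.T * (1 - H ** 2)
    gW1, gb1 = X.T @ gH, gH.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

accuracy = ((np.tanh(X @ W1 + b1) @ W2 + b2).argmax(axis=1) == y).mean()
```

Because the stand-in clouds are well separated, the toy network easily reaches near-perfect accuracy; the experiment in the main text performs the same kind of supervised classification on real feature-vector data.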
Validation via graph kernels. Other interesting quantities linked to feature vectors are graph kernels, which can be employed to define a second validation method. Here we study the linear kernels defined as the scalar product between pairs of feature vectors, k = ⟨f, g⟩. This method is less demanding in terms of the number of measurements, since it works even when only samples with a given number n of post-selected photons are available at the output. In figure 5(a) we report the distributions of kernels for feature vectors normalized in the three-dimensional orbit space for a given number n of post-selected photons, i.e. f̃_n = f_n/∥f_n∥. We note that kernels from distinguishable SMSV and distinguishable thermal states (figure 5(a)) display the same Gaussian distribution as the indistinguishable case, but they are centered at different kernel values for each n. Indeed, each histogram in the figure corresponds to the data of figure 3 for the 100 sub-graphs identified by n = 14 and n = 16. The coherent light data display the same average but a larger variance. The differences highlighted in figure 5(a) can be exploited to discriminate the coherent and distinguishable particle hypotheses. To do so, we only require the optical circuit to be reconfigurable, and enough experiments to retrieve the kernel distributions. Note that the number of kernels scales quadratically with the number of experiments, i.e. the number of different sampled graphs: after N experiments, the number of kernels is N(N − 1)/2. Thus, the kernel average and variance can be estimated from a moderate number of experiments.
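A minimal sketch of the kernel computation from a batch of feature vectors, assuming L2 normalization of each vector:

```python
import numpy as np
from itertools import combinations

def linear_kernels(feature_vectors):
    """All pairwise linear kernels k = <f, g> between normalized feature
    vectors; N experiments yield N * (N - 1) / 2 kernel values."""
    F = np.asarray(feature_vectors, dtype=float)
    F = F / np.linalg.norm(F, axis=1, keepdims=True)
    return [float(F[i] @ F[j]) for i, j in combinations(range(len(F)), 2)]
```

Parallel feature vectors give kernel 1, so the spread of the kernel histogram directly reflects how much the sampled sub-graph features fluctuate across experiments.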

Discussion
In this work, we have presented a new approach to GBS validation that exploits the intrinsic connection between photon counting from specific classes of Gaussian states of light and the counting of perfect matchings in undirected graphs. Although GBS-based algorithms in graph theory still need further study to clarify their actual effectiveness and advantage with respect to classical counterparts, the tools introduced in this context turn out to be informative in the framework of GBS experiment verification. We have seen how the feature vectors, together with the graph kernels extracted from photon counting, indicate the quantum nature of the sampling process. In fact, these quantities are very sensitive to imperfections that can occur in actual experiments, such as photon losses and distinguishability [17]. These two effects drive the device to act more like thermal and distinguishable-particle samplers, which can be simulated efficiently by classical means. The methods based on graph feature vectors and kernel distributions require a reasonable number of samples thanks to the coarse-graining of the output space of the GBS. The method based on graph kernels requires fewer experiments with different graphs, but requires in turn the capability to tune the optical circuit U and the squeezing parameters s_i. Recent experimental results on integrated reconfigurable circuits [19,58,59] and on time-bin encoding [6] enable large tunability and dimension of the matrix U. In addition, the squeezing parameters can be tuned by changing the power of the pump laser that generates squeezed light from nonlinear crystals, and by tuning the relative phases of the squeezing parameters, as recently demonstrated in [5].
Further improvements to the approach adopted in this work can be foreseen. For instance, these include exploiting a more extensive orbit set or a larger coarse-graining. These modifications could help in the validation of larger-scale instances of GBS. For example, it is possible to observe from figures 3 and 4 that the orbit probabilities tend to zero with increasing size, due to the growing dimension of the GBS Hilbert space. A future perspective of this investigation is the extension to the regime of threshold detectors. This configuration has been adopted to prove quantum advantage, but its connection with graph feature vectors has not been investigated yet.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.

Appendix A. Sampling from Gaussian states
Thermal and coherent light. Not all Gaussian states display the same sampling complexity as indistinguishable squeezed states. For example, in the case of m indistinguishable thermal light emitters, the sampling matrix A in figure 2(a) has B = 0 and C ≠ 0. Then, according to the relationship in (2), the probability of detecting n is

Pr(n) = Per(C_n) / (n_1! · · · n_m! ∏_{i=1}^{m} (1 + n̄_i)),   (A1)

where n̄_i is the mean photon number associated with each thermal source ρ_i in the input and C_n is the sub-matrix of C. The matrix C can be decomposed as C = U diag(τ_1, . . . , τ_m) U†, with τ_i = n̄_i/(1 + n̄_i), and is positive semi-definite. In this special case, the permanent can be approximated with classical resources [60]. Furthermore, sampling from a thermal state is a problem in the class BPP^NP [10], which contains problems requiring fewer time resources than those in the #P-complete class. In fact, a way to sample from a thermal state is through the P-representation of the electromagnetic field [61,62], i.e. the expression of the state as a superposition or mixture of coherent states |α⟩. For classical states of light, such as thermal states, the P-function is a probability distribution; more precisely, it is

P_Th(α_i) = exp(−|α_i|²/n̄_i) / (π n̄_i).

Then, sampling a string n can be simulated by extracting a set of α_i from P_Th(α_i) and considering the evolution of the coherent states in the linear-optics interferometer. The latter transformation maps the state α_i into another coherent state β_j, according to β_j = Σ_{i=1}^{m} U_{ij} α_i. The probability of detecting the configuration n at the output, given a set of coherent states in the input, is given by

Pr(n) = ∏_{j=1}^{m} e^{−|β_j|²} |β_j|^{2 n_j} / n_j!.   (A2)

This expression is the product of m Poissonian distributions and can be sampled in polynomial time [10]. Furthermore, equation (A2) can be directly employed to perform classical sampling with coherent-state inputs. Distinguishable emitters of thermal and squeezed light. The last possible scenario corresponds to sampling from a set of distinguishable Gaussian states.
The evolution can be simulated by independently sampling photons from the photon-number distribution of each Gaussian source, and by considering the single-photon distribution after the interferometer. The distinguishability among photons generated by different sources makes it possible to sample each input mode independently, without the complexity introduced by quantum interference. This is the case for distinguishable SMSV states, and it is analogous to the scenario of distinguishable thermal light emitters. Hence, efficient simulation of distinguishable Gaussian states can be performed classically, without requiring a quantum processor.
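The thermal-light procedure of this appendix lends itself to a compact sketch: draw coherent amplitudes from the Gaussian P-function of each source, propagate them through the interferometer, and draw Poissonian photon counts. The beam-splitter matrix and mean photon numbers below are illustrative values:

```python
import numpy as np

def sample_thermal(U, nbar, shots, seed=0):
    """Classical sampler for m thermal inputs with mean photon numbers nbar,
    evolved through the unitary U, via the P-representation."""
    rng = np.random.default_rng(seed)
    nbar = np.asarray(nbar, dtype=float)
    sigma = np.sqrt(nbar / 2.0)               # Re/Im std dev of P_Th(alpha)
    samples = np.empty((shots, len(nbar)), dtype=int)
    for t in range(shots):
        alpha = sigma * (rng.normal(size=len(nbar))
                         + 1j * rng.normal(size=len(nbar)))
        beta = U @ alpha                      # coherent states stay coherent
        samples[t] = rng.poisson(np.abs(beta) ** 2)
    return samples

# Example: two thermal sources on a balanced beam splitter.
bs = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
counts = sample_thermal(bs, nbar=[1.0, 0.5], shots=20000)
```

Since U is unitary, the mean total photon number is conserved, which provides a quick sanity check on the sampler.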

Appendix B. Graphs employed in the numerical simulations
One of the main novelties of the GBS paradigm is the possibility of encoding any symmetric matrix in the sampling process. This feature makes it possible to identify relations between photon counting and the graph represented by its symmetric adjacency matrix. In the simulations presented in this work we consider the graphs resulting from A = U diag(c tanh s_1, . . . , c tanh s_m) U^T, where the unitary matrix U of the optical circuit and the squeezing parameters {s_i} were set as follows. The circuits U were randomly generated from the Haar distribution of unitary matrices. This choice guarantees, on the one hand, that the most general conditions are reproduced and, on the other, that no symmetry exists inside the sampling matrix that could undermine the complexity of the problem. As for the squeezing parameters, we set their values to obtain a photon-number distribution centered around a given n. The expression of such a distribution is analyzed in detail in [51]. Naively, the squeezer values can be re-scaled by looking at the expression for their mean number of emitted photons, n̄_i = sinh² s_i. Such a tuning of the λ_i = tanh s_i by a global factor c does not modify the adjacency matrix of the sampling process. In fact, the Takagi-Autonne factorization determines the set of λ_i only up to a multiplicative constant.
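Re-scaling the Takagi parameters to hit a target mean photon number can be sketched with a simple bisection, using n̄(c) = Σ_i (cλ_i)²/(1 − (cλ_i)²), which follows from sinh²(arctanh x) = x²/(1 − x²):

```python
import numpy as np

def rescale_squeezing(lam, n_target, tol=1e-12):
    """Find the scale c such that sum_i sinh^2(arctanh(c * lam_i)) = n_target,
    and return (c, squeezing parameters s_i = arctanh(c * lam_i))."""
    lam = np.asarray(lam, dtype=float)

    def mean_photons(c):
        x = c * lam
        return float(np.sum(x ** 2 / (1.0 - x ** 2)))

    # Mean photon number grows monotonically in c and diverges as c*max(lam) -> 1.
    lo, hi = 0.0, (1.0 - 1e-9) / lam.max()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_photons(mid) < n_target:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return c, np.arctanh(c * lam)
```

This is the rescaling freedom used in the main text: the encoded graph is unchanged, while the photon-number distribution is centered on the desired n.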
The parameter settings of the classically-simulable Gaussian states were chosen to reproduce distributions as close as possible to the genuine GBS ones. To this end, we consider the evolution through the same circuit U. In addition, the Gaussian sources were set to emit the same average number of photons as the squeezed sources. This implies setting n̄_i = sinh² s_i.
Note that the methods proposed in this work are effective for any choice of the adjacency matrix. In figure 6 we report the distribution of two-component feature vectors extracted from the orbits [1, . . . , 1] and [2, 1, . . . , 1] for MUTAG graphs [63] encoded in a GBS with 28 optical modes. Each graph in this data set represents a chemical compound in which the vertices are atoms and the edges covalent bonds. The squeezer parameters have been re-scaled to maximise the probability of photon emission around n = 4, so that the condition m ≥ n² is satisfied. Such a rescaling is always allowed by the Takagi-Autonne decomposition. We employ the same set of squeezers in GBSs with Haar-random matrices. It is evident that the method discerns genuine GBS samples from the other light sources in both cases. The main differences are the shapes and properties of the feature-vector distributions for MUTAG and Haar graphs. As expected, the feature vectors capture the properties and inner structure of the graphs under investigation.

Appendix C. Samples size for features vector estimation
A crucial issue for the effectiveness of the validation methods is the number of samples needed for a good estimate of the graph features, and its scaling with the size of the GBS. An answer is given by the theorem exploited for graph similarity via GBS feature vectors in reference [17]. This theorem states that the number of samples S necessary to approximate a distribution over D possible outcomes, such that the probability of an estimation error equal to or greater than ε, i.e. |P_D − P^e_D|_1 ≥ ε, is at most δ, satisfies

S ≥ (2/ε²) ln((2^D − 2)/δ),

where P_D and P^e_D are the theoretical and estimated distributions, respectively, and |·|_1 is the L1-norm. Let us analyse this formula for the feature-vector calculation at a given number n of detected photons. The parameter δ is effectively a confidence threshold that we require for the estimation. For example, a choice of δ = 0.05 implies that the resulting number of samples is such that the error will be less than ε with confidence 1 − δ. The parameter ε can be derived by looking at the dispersion of the feature-vector distribution for a given graph family. A very reasonable assumption is to require that the approximation error of the orbit probability be within the typical dispersion of the Haar graphs. In figure 4(a) of the main text, there is evidence that such dispersion tends to a constant for large values of n. Figure 7(a) highlights this property more clearly: it shows the trend of the average relative standard deviation of the three orbits for the Haar graphs with increasing number of detected photons n and optical modes m = n². A reasonable value for the relative error of the GBS feature-vector distribution is then ε = 0.03, i.e. the asymptotic value reported in the figure. The last issue regards the scaling of D, the number of features, i.e. the number of orbits for a given n, which is the only term that grows exponentially with the number of detected photons.
Nevertheless, even this parameter can be kept constant if a few considerations are made. The orbits [1, . . . , 1], [2, 1, . . . , 1], [2, 2, 1, . . . , 1] contain most of the possible output configurations of the Hilbert space associated with the GBS as long as m ≫ n. For example, they asymptotically cover a fraction of the space of ∼92.5% for m = n² and ∼98% for m = 2n² (see figure 7(b)). Furthermore, the photon-number distribution can be centred around n by tuning the squeezing parameters, as illustrated in the main text. This means that the number of relevant features is always smaller than the total number D. In our case, it is reasonable to consider D ∼ 3 for any size of the GBS if the condition m ≫ n is satisfied. Following the above discussion, the corresponding sample size with δ = 0.05, ε = 0.03 and D = 3 is S ∼ 9250, independent of n. The feature vectors of the classical light sources in the main text were extracted from samples with ∼10⁴ events in the orbits of the given n, in accordance with the estimate of S. This choice allowed us to discriminate the features of the GBS and classical models at any dimension. Additionally, a constant number of S ∼ 10⁴ samples with n photons can be generated by an actual GBS device, given that the necessary value of S does not scale with the system size.

Appendix D. Effects of noise in real experiments
The main sources of noise that undermine the complexity of sampling from quantum states of light are photon losses and partial photon distinguishability.
Photon losses modify the photon-number statistics, while the interference among photons is preserved as long as the photon emitters are perfectly indistinguishable. More precisely, losses turn squeezed light toward thermal radiation (see references [46, 47, 49, 50, 64]). For input thermal states, despite the presence of interference, the different photon-number statistics allow the construction of efficient classical samplers. The effects of losses on the feature vectors can be observed in figure 3 of the main text: since losses increase the probability of detecting an odd number of photons, the radius of the blue curve shrinks toward the curve of the thermal light.
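The effect of losses on the photon-number statistics can be illustrated with a minimal sketch. A single-mode squeezed vacuum populates only even photon numbers; a pure-loss channel with transmissivity T acts on the photon-number distribution as binomial thinning (each photon survives independently with probability T), which repopulates the odd outcomes. The function names and truncation are ours, for illustration only.

```python
import math

def smsv_photon_dist(r, nmax=40):
    """Photon-number distribution of single-mode squeezed vacuum with
    squeezing r: P(2k) = tanh(r)^(2k) * (2k)! / ((2^k k!)^2 cosh(r)),
    and P(odd) = 0."""
    p = [0.0] * (nmax + 1)
    t = math.tanh(r)
    for k in range(nmax // 2 + 1):
        p[2 * k] = (t ** (2 * k)) * math.factorial(2 * k) / (
            (2 ** k * math.factorial(k)) ** 2 * math.cosh(r))
    return p

def apply_loss(p, T):
    """Binomial thinning of a photon-number distribution:
    each photon survives with probability T."""
    q = [0.0] * len(p)
    for n, pn in enumerate(p):
        for m in range(n + 1):
            q[m] += pn * math.comb(n, m) * T**m * (1 - T)**(n - m)
    return q
```

For T < 1 the output assigns non-zero probability to odd photon numbers, which is the mechanism behind the shrinking of the blue curve in figure 3.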
Photon distinguishability among the squeezers does not affect the photon-number distribution, but it undermines the multi-particle interference in the sampling process. This effect is also responsible for a change in the relative probability of photon bunching, namely the probability of detecting more than one photon in the same output port. In figure 3 we observe that the curves of the distinguishable samplers lie on a different plane with respect to those of indistinguishable squeezers, coherent and thermal light emitters. Hence, the lack of interference results in a change of plane in the three-orbit space.
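How the overlap controls bunching can be seen in the textbook two-photon case (a Hong–Ou–Mandel-type example, not the multimode simulation used in this work): for one photon in each input of a balanced beamsplitter with pairwise overlap η, the coincidence probability is (1 − η)/2 and the rest of the probability goes to bunched outcomes.

```python
def hom_probs(eta):
    """Two photons, one per input of a balanced beamsplitter.
    eta = |<psi_i|psi_j>|^2 is the pairwise overlap (indistinguishability).
    Returns (P_coincidence, P_bunching)."""
    p_coinc = (1.0 - eta) / 2.0
    return p_coinc, 1.0 - p_coinc
```

Fully indistinguishable photons (η = 1) always bunch, while fully distinguishable ones (η = 0) behave classically with a 1/2 coincidence probability; intermediate η interpolates linearly between the two.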
To directly assess our validation method in the presence of partial photon distinguishability, in the following we provide a numerical simulation of the transition from a genuine GBS to the distinguishable SMSV sampler, obtained by varying the level of distinguishability among the photons. We employ the approach first introduced in references [27, 38] for BS, and subsequently extended to arbitrary inputs in reference [37]. In particular, we define the indistinguishability among photons through the overlap η_ij = |⟨ψ_i|ψ_j⟩|² between pairs of SMSV modes at the input. In these simulations we consider the scenario where all pairs of sources present the same pairwise overlap η. The simulation of partial distinguishability has been carried out for a small GBS device with 8 modes, 8 squeezers and 4 detected photons. This scenario is quite far from the conditions of the simulated data in the main text, for which m ≫ n. We therefore expect that the feature vectors made by the three orbits [1, 1, 1, 1], [2, 1, 1], [2, 2], and their kernels, display slight differences in their distributions with respect to the results of the main text. Despite these different behaviours, such quantities remain informative for detecting the presence of partial distinguishability, as shown in figure 8. The simulations carried out in the main text for high-dimensional GBS devices give evidence that the separation between feature vectors and kernels of genuine GBS and distinguishable samplers is larger than the separation in the data of figure 8. Hence, we expect these validation methods to be effective at capturing noise sources such as residual distinguishability at any dimension of the device.
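The orbit notation used above can be made concrete with a small helper that maps a detection pattern to its orbit (the sorted non-zero mode occupations) and estimates a feature vector as the fraction of samples falling in each orbit. This is an illustrative sketch with hypothetical function names, not the code used for the simulations.

```python
from collections import Counter

def orbit(pattern):
    """Orbit of a detection pattern: sorted non-zero mode occupations.
    E.g. (2, 0, 1, 1, 0) -> (2, 1, 1)."""
    return tuple(sorted((c for c in pattern if c > 0), reverse=True))

def feature_vector(samples, orbits):
    """Fraction of samples falling in each of the given orbits."""
    counts = Counter(orbit(s) for s in samples)
    total = len(samples)
    return [counts[o] / total for o in orbits]
```

For 4 detected photons the relevant orbits are (1, 1, 1, 1), (2, 1, 1) and (2, 2), so each batch of samples is summarized by a point in the three-orbit space discussed above.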

Appendix E. Neural network for data classification
In this section we describe the experimental settings for the validation task presented in the main text. The dataset consists of 100 feature vectors corresponding to samples with a number of photons n ranging from 4 to 22, scattered by optical circuits with m = n² optical modes (see figure 4). The feature vectors are labelled by two classes, distinguishing the three-dimensional feature vectors extracted from genuine GBS samples from those of the other samplers (see appendix A). To solve the validation task, we employ a multi-layer perceptron (MLP) binary classifier. The architecture is composed of two stacked linear layers with ReLU activations and a batch-normalization layer [65] to improve the gradient flow in the backward pass. A final linear layer with a sigmoid activation outputs the class prediction. The model is trained with a binary cross-entropy loss. The MLP converged after 20 epochs, reaching an accuracy of 99% on the test set. The model is implemented in PyTorch [66], and the code was run on a computer with an NVIDIA GeForce GTX 1060 GPU with 3 GB of memory.
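The forward pass of the described architecture can be sketched in a framework-agnostic way as follows. The actual model was implemented in PyTorch; here the hidden-layer widths, the weight initialization and the placement of the batch-normalization layer between the two hidden layers are our illustrative assumptions, since they are not specified in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def batchnorm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize over the batch dimension, then scale and shift.
    return gamma * (x - x.mean(0)) / np.sqrt(x.var(0) + eps) + beta

def mlp_forward(x, params):
    """Two linear layers with ReLU, batchnorm in between (assumed
    placement), and a final linear layer with sigmoid output."""
    W1, b1, W2, b2, W3, b3 = params
    h = relu(x @ W1 + b1)
    h = batchnorm(h)
    h = relu(h @ W2 + b2)
    return sigmoid(h @ W3 + b3)

# Input dimension 3 (the three-orbit feature vectors); hidden widths
# are hypothetical.
d_in, h1, h2 = 3, 32, 16
params = (rng.normal(0.0, 0.1, (d_in, h1)), np.zeros(h1),
          rng.normal(0.0, 0.1, (h1, h2)), np.zeros(h2),
          rng.normal(0.0, 0.1, (h2, 1)), np.zeros(1))
probs = mlp_forward(rng.normal(size=(8, d_in)), params)  # class probabilities
```

Each output is a probability in (0, 1) that the corresponding feature vector comes from a genuine GBS; training such a model against a binary cross-entropy loss is standard in any framework.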