Experimental generalized quantum suppression law in Sylvester interferometers

Photonic interference is a key quantum resource for optical quantum computation, and in particular for so-called boson sampling machines. In interferometers with certain symmetries, genuine multiphoton quantum interference effectively suppresses certain sets of events, as in the original Hong-Ou-Mandel effect. Recently, it was shown that some classical and semi-classical models could be ruled out by identifying such suppressions in Fourier interferometers. Here we propose a suppression law suitable for random-input experiments in multimode Sylvester interferometers, and verify it experimentally using 4- and 8-mode integrated interferometers. The observed suppression is stronger than what is observed in Fourier interferometers of the same size, and could be relevant to certification of boson sampling machines and other experiments relying on bosonic interference.

Photonic interference is a key quantum resource for optical quantum computation, and in particular for so-called boson sampling machines. In interferometers with certain symmetries, genuine multiphoton quantum interference effectively suppresses certain sets of events, as in the original Hong-Ou-Mandel effect. Recently, it was shown that some classical and semi-classical models could be ruled out by identifying such suppressions in Fourier interferometers. Here we propose a suppression law suitable for random-input experiments in multimode Sylvester interferometers, and verify it experimentally using 4-and 8-mode integrated interferometers. The observed suppression is stronger than what is observed in Fourier interferometers of the same size, and could be relevant to certification of boson sampling machines and other experiments relying on bosonic interference.
Introduction. -Scalable, general-purpose quantum computers, once developed, will be able to solve problems thought to be intractable for ordinary computers. Given the significant technological challenges involved, other nearer-term goals for the field were suggested, such as the creation of quantum machines able to beat classical computers in particular computational tasks. One such proposal which has drawn much interest is boson sampling [1], which relies on multiphoton interference in a random linear interferometer, and whose output statistics are thought to be hard to sample from classically. Due to bosonic interference, the output of those experiments is distributed according to the permanents of complex matrices specifying the interferometer's design, and the permanent is a function that is notoriously hard to calculate [1][2][3]. This possible path towards a demonstration of quantum computational supremacy has resulted in first experimental implementations of such boson sampling computers [4][5][6][7][8][9][10][11][12][13][14].
To investigate the factors behind the computational complexity of these devices, different kinds of input states have been considered. It is known that inputs consisting of distinguishable photons, coherent states, and antisymmetric (fermionic-like) states all result in classically simulable behavior. On the other hand, photon-added coherent states [15] and quantum superpositions of coherent states (cat states) [16] have been claimed to yield hardto-simulate outputs. Partially distinguishable photons seem to yield an intermediate regime deserving further investigation [17][18][19]. The effect of losses on simulation complexity was also considered [20,21]. Moreover, specific semi-classical states able to reproduce some collective interference effects while being efficiently computable have been identified [22].
Other implementations of boson sampling devices serve to improve performance, or were used to discuss how to certify the correct functioning of the device. So-called scattershot boson sampling devices [12,23,24] use simultaneous pumping of several parametric down-conversion sources to result in a random-input version of the original problem. The main advantage is a greatly enhanced generation rate, compared to a single-input implementations using the same sources. While complete certification of large boson sampling devices may turn out to be impossible, a number of proposals for partial certification were made, capable of comparing the experimental outcomes against some physically motivated error models [9,10,[25][26][27][28][29][30][31].
As genuine many-particle interference is required for boson sampling, efforts have been directed at providing stronger evidence of this phenomenon. The simplest such demonstration is the Hong-Ou-Mandel (HOM) effect [32]. Generalisations of this effect have been proposed, to investigate signatures of interference for specific inputoutput combinations of Fourier [22,33,34] and Sylvester [35] interferometers. These highly-symmetric transformations provide a rich landscape for multi-particle interference to happen, which we investigate here.
In this Letter we discuss and experimentally demonstrate a novel zero-transmission law for Sylvester interferometers. As we will see, indistinguishable photons interfere in these devices so that a certain fraction of all input-output combinations are suppressed. This suggests they may be helpful in identifying multi-photon interference in random-input, scattershot boson sampling experiments.
The associated unitary transformation is a rescaled When p = 1 we have the simplest Sylvester unitary, describing a symmetric 50:50 beam splitter. The celebrated Hong-Ou-Mandel effect [32] is a first example of suppression law, as a transition between photons in input modes (1, 2) and output modes (1,2) is strictly suppressed. In [35], more general input states were considered, consisting of n = 2 q photons, described by a list of mode occupation numbers (1 + nc, . . . , n + nc) with 0 ≤ c ≤ 2 k − 1, injected in a 2 k+q -dimensional Sylvester interferometer. It was proven that all outputs whose bitwise sum of binary-represented mode occupation numbers is not zero are strictly suppressed. This suppression law only identifies a small subset of all suppressed input-output combinations.
We now introduce a generalized law that identifies a larger number of suppressed input-output combinations in Sylvester interferometers. A n-photon state |r may be represented by a mode assignment listr = (r 1 ,r 2 , . . . ,r n ), specifying which modes are occupied. It will be useful to represent this state also as a n×p binary matrix R, whose i-th row is the binary representation of r i − 1. Let us call N A (R) the matrix obtained by negating the columns of R specified by a list of indices A. The input-output combinations |r → |s and |s → |r are suppressed if some A exists for which the following two conditions are met [36]: where ⊕ denotes the bitwise sum, and S is the binary matrix representation of state |s . In [37] we give a proof of this law (in Sec. I B), as well as a calculation of the asymptotic behavior for the fraction of suppressed events these criteria identify (in Sec. I C). Note that similar  criteria were independently reported in Ref. [38] while this letter was under completion. For the experimental implementations we report on below, our criteria identify all suppressed events, which are more numerous than the equivalent result for Fourier matrices [33] with the same size (See Sec. I D of [37] for more details).
Experimental generalized suppression law. -The generalized suppression law has been tested experimentally in 4-mode and 8-mode Sylvester interferometers, implemented by exploiting a 3D architecture enabled by femtosecond laser writing [34] (see Fig. 1). We performed a two-photon scattershot boson sampling experiment [12] to verify the suppression law, feeding each input port of the 4-mode chip with one heralded photon from a different PDC pair. The output events corresponding to two-photon injection were then post-selected via fourfold coincidence measurements (two heralding detectors and two detectors at the output of the device) for all the 4 2 = 6 input-output combinations. The experimental setup adopted for the scattershot approach is shown in Fig. 1. Further details on the generation and detection are reported in Sections II and III of [37].
The observed two-photon output statistics are shown in Fig. 2, where one can recognize the pattern of peaks and dips predicted by the Sylvester matrix (U 4 S ). An effective figure of merit to estimate if the observed data are compatible with those expected from a fully interfering multiphoton source is the degree of violation ν = N forbidden /N events [33,34], i.e. the ratio between the number of observed events in suppressed input-output combinations and the total number of events. Evaluating the observed violation permits to rule out alternative hypotheses on the nature of the injected state. The simplest alternative hypothesis is that the n photons are distinguishable. A more elaborate alternative model is the Mean Field (MF) state [33,39,40], defined as a single-particle state whose wavefunction is macroscopically spread over a set of inputs A: |ψ A MF = n −1/2 r∈A e iθr |j r , where |j r identifies a single-photon state in mode j r , and phases θ r are randomly chosen from a uniform distribution. Summing the output statistics of n states of this form reproduces some macroscopic interference effects [10] of n indistinguishable photons injected in modes A. Note that suppression laws are only partially fulfilled by MF states statistics, which confirms the diagnostic power of these tools. In particular, it is easy to check that the expected degree of violation for distinguishable particles and MF states in U 4 S are respectively 0.66 and 0.4 (considering only collision-free two-photon inputs and outputs). The overall experimental violation ν, obtained by summing all the measured events over all possible input-output combinations, is found to be ν (4) = 0.238 ± 0.003. The measured value, below the two thresholds by respectively 141 (distinguishable) and 54 (MF) standard deviations, thus unambiguously excludes both these hypotheses (Fig. 2).
The suppression law has been experimentally verified also using a Sylvester 8-mode chip, where the full set of  8 2 = 28 two-photon input states were independently investigated. In Fig. 3 we report the full set of 28 × 28 experimental probabilities, retrieved from the HOM visibilities measured for all no-collision input-output combinations. In this case, one can show that the theoretical degrees of violation for distinguishable particles and MF states are respectively 0.57 and 0.31. The mean value of the violations isν (8) = 0.115 ± 0.002, well below the theoretical bounds predicted for MF and distinguishable particles, thus ruling out these alternative hypotheses.
Conclusions and discussion. -We introduced and verified experimentally a novel suppression law pertinent to Sylvester interferometers with inputs of indistinguishable photons. The choice of this unitary transformation ensures a significant fraction of suppressed inputoutput combinations, even for a larger number of photons and modes. For example, for n = 6 and m = 16 our law predicts the suppression of ∼ 10% of the output events, compared to a total fraction of forbidden events of ∼ 37% (computed numerically), against just ∼ 2.3% for the Fourier matrix. This suppression can be used to certify genuine multiphoton interference in scattershot boson sampling experiments, where the full set of possible states is employed. Indeed, random-input generation has been recognized as a promising approach to greatly enhance the event rate in PDC-based scattershot boson sampling experiments. We have shown that the suppression law is able to rule out alternative models which show coarse-grained interference effects, which will be helpful in testing future boson sampling experiments.

I. SUPPRESSION LAWS FOR SYLVESTER INTERFEROMETERS
In this Section we discuss in detail the suppression law for Sylvester interferometers adopted in the main text.

A. Preliminary definitions
Let us begin by defining some notation used throughout this Section. We are interested in the transition amplitudes between states of n photons in m modes, and restrict ourselves to the case n ≤ m. We denote such states as |r := |r 1 , . . . , r m , where r k is the number of photons in the kth mode and m k=1 r k = n. The set of all such states is denoted by G n,m . We refer to the particular states where all r i s take only values 0 or 1 as collision-free, and the set of all collision-free states is denoted by Q n,m . It is easy to see that |G n,m | = m+n−1 n and |Q n,m | = m n . Let us now define two useful alternative representations for such states. Definition 1. (MAL and BM representations). Let |r = |r 1 , . . . , r m ∈ G n,m . We can define the following two representations of this state.
• Binary matrix (BM): States with m = 2 p , for some positive integer p, can also be represented as an n × p binary matrix R as follows. For a given MALr := (r 1 , . . . ,r n ) describing the state, the ith row of R corresponds to the binary representation ofr i − 1 (padded with zeros on the left so it has length p).
Since photons are indistinguishable, the ordering of elements in the MAL is meaningless, and we conventionally choose the list to be in increasing order. Correspondingly, the ordering of the rows of two binary matrices is irrelevant, i.e. if R and R are related only by a permutation of the rows, they represent the same physical state, and this we denote as R ∼ R . (S1) The following definitions are also convenient shorthands used throughout this Section.
Definition 2. Let R be the n×p BM representation of some state |r . Let A be some subset of the columns of R. We denote by N A (R) the matrix obtained by flipping each bit in the columns of R specified in A. Ifr is the MAL associated with R, we denote by N A (r) the MAL associated with N A (R).
Example 2. For m = 8 and n = 4, consider the state |r = |1, 1, 1, 1, 0, 0, 0, 0 , corresponding MALr = (1, 2, 3, 4) and BM representation Then we have the following possibilities when A consists of a single element since they are related to each other by permutations of their rows, whereas N {1} (R) represents a different state. The set A can also contain more than a single element. For example: and so on.
with H(m) defined recursively as for each positive integer p, and with H(1) := 1. We refer to U S (m) as normalized Sylvester matrix and to H(m) as Sylvester matrix. The m dependence of U S and H will be omitted when clear from the context.
An analytic expression for the (i, j) element of a Sylvester matrix can be given in the form: where i B and j B are the binary representations of i − 1 and j − 1 respectively, and is the bitwise dot product.

B. Suppression law for Sylvester matrices
In this Section we present a test which generalizes that of [1], predicting a higher fraction of suppressed pairs. This test can be assessed with a computational cost increasing only polynomially in m and n. A similar test was proposed in [2], using a different formalism.
Let us begin with the following two straightforward Lemmas, the proof of which we leave as an exercise for the reader.
Lemma 1. Let S n be the set of permutations of {1, . . . , n}, and let τ ∈ S n be a permutation different from the identity such that τ 2 = 1. Then we can uniquely associate to each σ ∈ S n another (different) permutation σ τ ≡ τ • σ, where • denotes the composition of permutations.
Lemma 2. Letr, R and N A be as in Definitions 1 and 2. Then N A (r) =r if and only if N A (R) ∼ R. This, in turn, happens if and only if there is a permutation τ ∈ S n such that N A (R) = R τ , where R τ is obtained from R by applying the permutation τ to the rows of R. Now, if τ is such a permutation, we have . all columns of R in A have an equal number of 1s and 0s, 5. all columns of R not in A have an even number of 1s and an even number of 0s.
We are now ready to state the main result of the Section: Theorem 1. Let |r and |s be two states of n particles in m = 2 p modes, with corresponding MAL representations r ands, and BM representations R and S. If there is a subset A of the columns of R such that then the transition from |r to |s (and consequently also that from |s to |r ), when transversing the Sylvester interferometer, is suppressed.
Proof. The transition amplitude from |r to |s can be written as where Per(U ) denotes the permanent of U . In the equation above, U r,s is the matrix defined element-wise as where, recall, H(2 p ) is the Sylvester matrix (cf. Definition 3). Using Eqs. (S6), (S8), and (S9), and denoting M i the ith row of matrix M , we obtain where B C denotes the bitwise dot product between vectors B and C, defined as B C := p α=1 B α C α , D is a constant factor, and we defined (S11) The actual value of E(σ) is unimportant here, only its parity matters. Since we are interested in whether A = 0, we will ignore the constant factor D in Eq. (S10). Clearly, for A to vanish, we need exactly half of the permutations to be such that (−1) E(σ) = 1. A necessary and sufficient condition for this to hold is if, for each permutation σ, we can uniquely assign another permutation σ such that E(σ ) = 1 ⊕ E(σ). From Lemmas 1 and 2, we know that if condition (S7a) holds we can uniquely associate to each σ another permutation σ τ ≡ τ • σ, where τ is the permutation such that N A (R) = R τ . Using σ τ in Eq. (S11) we have Using now the explicit expression for N A (R) and Lemma 2, we have Inserting these into Eq. (S12), we obtain where in the last step we used Eq. (S7b). Using this last result into Eq. (S10) we conclude that which proves that the input-output pair (|r , |s ) is suppressed. While Eq. (S7b) gives a sufficient condition for an input-output pair to be suppressed, it is not necessary. For most input states, not all suppressed outputs satisfy Theorem 1. In the next Section we give estimates of the which fraction of all states our test identifies as suppressed.

C. Estimates on fraction of suppressed states
In this Section, we give estimates on the fraction of suppressed input-output pairs identified by the conditions of Theorem 1, particularly focusing on upper bounds and asymptotic limits (in the number of modes m and photons n). Throughout this Section, we are still restricted to m = 2 p for some integer p, which means all binary matrices are n × p, and to n ≤ m. The fractions we obtain consider mainly the set of all possible states, G n,m , which has |G n,m | = m+n−1 n elements. At the end of this Section we discuss the applicability of our results to the restriction of no-collision states.
We begin by restating the two conditions of Theorem 1 informally, as they would be used in a test. For simplicity, we refer to the states |s and |t of the Theorem as input and output states, respectively, although any conclusions can be extended to the case where the roles of input and output are reversed. With this in mind, we note that Condition S7a concerns only inputs, and we restate it as: Condition I: For a given input with BM representation R, check whether R has any subsets of columns A such that negating those columns of R results in R up to a permutation of the rows (i.e. N A (R) ∼ R). If Condition I finds no such A, the test fails to identify any suppressed transitions. Also, as a consequence of Lemma 2, the test only works if n is even. Assuming that Condition I yielded some non-empty A, we can test for Condition S7b, which is a test only on the outputs and which we restate as:

Let us call
Condition II: Consider any output with BM representation S and any A ∈ A. If the columns A of S contain an odd number of 1s, the transition from R to S is suppressed.
For simplicity, we begin by estimating how many output states are suppressed given that Condition I identified a nonempty set A for some input. Before that, we need one final definition. We say that the elements of A are i ndependent if none can be replaced by a sequence of the others. That is, if there is no A ∈ A and A 1 , A 2 , . . . , A k ∈ A\{A} such that N A1 (N A2 . . . (N A k (X))) = N A (X) for all binary matrices X.
, to the same effect). Since the binary matrices have p columns, there can be at most p independent elements in A. We are now ready to state the following. Corollary 1. Let A be the set identified by Condition I for some input state, and suppose it contains q independent elements. Then the fraction of outputs in G n,m (i.e. including collision states) that Condition II identifies as suppressed Proof. Suppose initially that there is a single element A ∈ A, say A = {1}. The corresponding suppressed outputs are those whose BM representation contains an odd number of 1s in the first column. These consist of approximately half of all possible states, which can be seen as follows. Let us write the binary matrix S of some such output as where S 1 is its first column and S a matrix of the remaining p − 1 columns. Since all matrices that are equivalent up to a permutation of the rows correspond to the same state, we can assume without loss of generality that the 1s in S 1 occupy the first slots. This means there are only n + 1 possibilities for S 1 , and it is easy to see that n/2 of them satisfy Condition II. Since this holds irrespective of the choice of S , we conclude that (n/2)/(n + 1) = 1/2 + O(1/n) out of all states are suppressed. The argument follows through almost unchanged for any A, even if it spans several columns. Suppose now A has q independent elements. By the previous paragraph, the first element of A, let us call it A 1 , leads to a suppression of approximately (i.e. up to O(1/n)) half of all outputs. The second element, A 2 , also leads to a suppression of approximately half of all outputs-but now there is an overlap with those identified by A 1 . Since the two are independent, approximately half of the elements identified by A 2 have already been identified by A 1 (e.g., approximately half of the matrices with an odd number of 1s in the first column also have an odd number of 1s in the second column). Thus the new suppressions identified by A 2 correspond only to 1/4 + O(1/n) of all states. Each subsequent independent element of A further divides the remaining set of unsuppressed states by half, and we conclude that Condition II in fact identifies 1 − 1/2 q + O(q/n) of all states as suppressed. Since q ≤ p = log(m), this gives the claimed asymptotic behavior.
Corollary 1 shows that any input which satisfies Condition I has at least 1/2 of all outputs suppressed, and this fraction can, in principle, be as high as 1 − 1/m if Condition I identifies p independent elements in A. It is thus essential to identify how many inputs effectively satisfy Condition I in order to determine the overall fraction of suppressed pairs. Indeed, Theorem 1 treats input and outputs asymmetrically, and as a consequence Condition I is much more stringent than Condition II.

Corollary 2.
Only an exponentially-vanishing subset of the inputs is detected by the test of Theorem 1.
Proof. Let us begin by counting how many inputs have a particular element A in their set A. Consider initially that A = {1}. For Condition I to hold in this case, we need half of the elements of the first column of R to be 1 (cf. Lemma 2). Since we are free to rearrange the rows as desired, we can assume that R is ordered as follows where 1 n/2 and 0 n/2 are vectors of n/2 1s and 0s respectively, and R 1 and R 2 are (n/2) × (p − 1) binary matrices. For R to satisfy N {1} (R) ∼ R, we must have R 1 ∼ R 2 , and in fact we can further reorder the rows of R have such that We can now count how many binary matrices satisfy these constraints. Clearly we have no choice over the first column and over R 2 , so we only need count all possibilities for R 1 . Given the form chosen for R above, it is clear that any two choices for R 1 that are equal upon permutation of the rows represent the same state, and should not be counted twice. This leads to a cumbersome combinatorial problem, since we need to tally all possibilities for R 1 up to permutations, but the number of permutations changes depending on whether R 1 has repeated rows. A shortcut to this calculation is to realize that we can formally map R 1 back to a MAL representation, as in Section I A and, subsequently to a quantum state of n/2 photons in 2 p−1 = m/2 modes. Thus, the number of possibilities for R 1 is (n+m)/2−1 n/2 . Consider next another possibility, that A = {1, 2}. The counting in this case is similar to before, but a little trickier, because we need to keep track of how many choices we have for the first two columns such that N {1,2} (R) ∼ R. Let us once more order the matrix as follows: where 1 n/2 and 0 n/2 are defined as before, a 1 and a 2 are two binary vectors of length n/2 and R 1 and R 2 are (n/2) × (p − 2) binary matrices. We want R to be ordered in such a way that N {1,2} (R) ∼ R implies R 1 = R 2 , to reuse part of the previous argument. This immediately implies that a 2 is obtained from a 1 by negating its elements. We know that a 1 and a 2 , together, must contain n/2 1s, due to Lemma 2, but that is already guaranteed by the fact that they are negations of each other 1 . From the freedom of reordering rows, we can assume that the 1s in a 1 occupy the first positions. This leaves (n/2 + 1) possibilities for a 1 . Combining that with the (2n+m)/4−1 n/2 possibilities for R 1 as in the previous paragraph, we get a total of (n/2 + 1) (2n+m)/4−1 n/2 possibilities. How does this compare with the case A = {1}? We can use the following asymptotic expression for the binomial coefficient n which holds when n is both large and much larger than k. By using this expression it is easy to see that, in the limit of both m and n large with n ≤ m, (n/2 + 1) (2n+m)/4−1 n/2 grows exponentially slower than (n+m)/2−1 n/2 , and so we can use the latter as an upper bound. In fact, as we consider sets A comprising of more columns, the constraints tend to become more restrictive and the number of matrices that satisfy them decreases. So we will use (n+m)/2−1 n/2 as an upper bound on the number of states that satisfy Condition I for any A.
We are now ready to give an upper bound on the number of states that satisfy Condition I for some nonempty A. Since there are p columns in the binary matrices, There are 2 p − 1 = m − 1 possible As that can appear in A. By the inclusion-exclusion principle, we would need to sum the number of states that satisfy Condition I for each possible A, then subtract those that have been counted multiple times because they satisfy it for more than one A. It is simpler, however, to use the union bound, which in this case says that the number of states is upper-bounded by (m − 1) (n+m)/2−1 n/2 . Recall now that the total number of states is n+m−1 n . Using an asymptotic formula for the binomial coefficient, it is clear that the fraction of states detected by the test is exponentially small in the limit of large n and m, as claimed.
In Ref. [2], the authors seem to reach a similar conclusion, using a different formalism, for the fraction of suppressed outputs given a specific input (i.e., Corollary 1). However, they do not provide an estimate for the fraction of inputs that satisfy Condition I (i.e., Corollary 2).
So far, we have considered only the full set of states (i.e, including collision states) in the estimates of suppressed fractions, but the restriction to no-collision states is often more useful. For example, no-collision outputs are the only detected outcomes when the experiment is performed using bucket detectors (i.e. that do not distinguish one photon from many). More importantly, experimental implementations typically consider inputs with no more than a single photon per mode. This is also relevant for boson sampling applications, being input state with at most one photon per mode the appropriate choice for its computational hardness. Thus, it would be interesting to obtain versions of Corollaries 1 and 2 where both the suppressed pairs and the set of all states included this restriction. Unfortunately, some pathological instances arise when we try to specialize the previous results in that way. To see that, consider the case where n = m. There, we have a single no-collision state, and it satisfies Condition I. Thus, we conclude that 100% of inputs in that case have suppressed outputs! As we now argue, it is still possible to show a weaker version of Corollary 2 for no-collision inputs.
Consider the regime where m = O n 2 . Experiments are often done in this limit, especially since it seems to be a requirement for the computational hardness of the boson sampling model [3]. It is easy to show that the set of no-collision states is not a negligibly subset of all states in this regime, due to the so-called birthday paradox. To illustrate this suppose m = n 2 holds exactly, in which case the fraction of no-collision states among all states is n 2 n / n 2 +n−1 n . Using Stirling's approximation, one obtains that this tends to 1/e in the limit of large n. Since the set of no-collision states is only polynomially small in the set of all states, a no-collision version of Corollary 2 must still hold-even if all inputs that satisfy Condition I were concentrated in the no-collision subset, they would still be an exponentially small fraction of it. This argument shows that the conclusion of Corollary 2 can be extended to the no-collision case in the limit m = O n 2 , and we leave it as an open question whether it holds in general.
Corollary 2 also has consequences for the application of Theorem 1 as a test for validating boson sampling experiments. As argued in [4], suppressed events in Hadamard matrices (such as the Sylvester or Fourier matrices) could be useful as a way to witness partial photonic indistinguishability. Informally, the idea is that we only have suppressions of certain transitions if the particles are perfectly identical, and so observations of quantumly-suppressed events could be used to estimate the degree of partial distinguishability of the photons. Corollary 2 shows that in Scattershot BosonSampling experiments [5][6][7], where inputs are chosen uniformly at random from all no-collision states, the number of suppressed events detected by Theorem 1 vanishes exponentially. On the other side, when specific input states that satisfy Condition I for many different As are employed, Theorem 1 might provide a favorable scaling. Indeed, we showed that as many as 1 − O(1/m) out of all outcomes can be suppressed, but we leave a formal description of such a test for future work, as well as the question of whether the Sylvester matrix is optimal for this task.

D. Comparison between Sylvester and Fourier matrices
In this Section, we briefly compare the fraction of suppressed input-output pairs for Sylvester and Fourier matrices for a few small sizes (restricting ourselves to no-collision states, as these are experimentally more relevant as discussed previously). Input-output suppressions have been studied for Fourier matrices, for instance, in [8]. We are not aware of any estimates of the sort we made in Section I C for Fourier matrices, but from [8] it seems that the restriction over inputs is more stringent than the one in our Condition I. This suggests that a result similar to Corollary 2 should holds, also limiting the use of Fourier matrices to witness photon distinguishability for arbitrary input states, as discussed in the previous Section. In Supplementary Figure 1 we see how 8-mode Sylvester and Fourier matrices compare in terms of suppressed transitions for 2 and 4 photons, where it is clear that the Sylvester matrix outperforms the Fourier one.
For small values of n and m it is also possible to exactly compute the fraction of suppressed input-output pairs (including those not detected by Theorem 1), which we report in the Supplementary Table 1. This Table shows that the Sylvester matrix indeed seems to perform better than the Fourier matrix for most cases, although there are exceptions. For n = 3 and 7, for example, it is possible to show that there can be no suppressed transitions for Sylvester matrices (this is a consequence of the fact that the permanent of a (2 p − 1) × (2 p − 1) matrix where all elements are ±1, for integer p, is never zero [9]), whereas the Fourier matrix does contain a few suppressions. Note also that, for all cases reported in Supplementary Table 1 where the test of Theorem 1 can be applied, the total number of suppressed pairs for Fourier matrices is not only smaller than the corresponding quantity for Sylvester matrices, but smaller even than the subset of input-output pairs that our test detects.

II. PHOTON GENERATION, MANIPULATION AND DETECTION
Single photons were generated at 785 nm with a type-II parametric down-conversion process in four PDC sources for scattershot configuration, pumping two crystals (2-mm long BBO) with a 392.5 nm wavelength Ti:Sa pulsed laser. Photons are spectrally filtered by means of 3 nm interferential filters and coupled into single-mode fibers. The indistinguishability of the photons is reached by means of a polarization compensation stage and by propagation through delay lines for each path before injection into the interferometer via a single-mode fiber array. After the evolution through the integrated devices, photons are collected via a multimode fiber array. The detection system for the scattershot experiment consists of four single-photon avalanche photodiodes for the 4-mode chip and other four for the heralding photons in the scattershot regime. Single-shot measurements have been performed with a 2-photon state produced by a single BBO crystal and injected in the 8-mode Sylvester interferometer. At the detection stage, eight avalanche photodiodes have been used to collect all output combinations. An 8-channel electronic data acquisition system (ID-800 by IDQuantique) allowed us to detect 2-photon coincidences between all output pairs and 4-photon coincidences (two injected plus two triggered) for all possible input states. LabView and C programs have been used to retrieve the coincidence events associated to all possible output combinations.

III. MODEL OF THE EXPERIMENT
Here we discuss a theoretical model to describe the results of the experiment with the 4-mode device. In addition to the non-perfect unitary transformation, two sources of deviation from the ideal behavior contribute to the output measured pattern: i) partial indistinguishability of the heralded photons, and ii) multi-photon emission from the sources.
As a preliminary step we characterized the parameters of the experimental setup. Typical singles count rates for the different sources are in the range 100−250 kHz, while two-fold coincidences are in the range 10−35 kHz. Additionally, the overall transmission from the delay lines to the output fiber-array are ∼ 0.08 − 0.16 depending on the input-output combination. From these values we estimated the nonlinear gain g of the sources (g ∼ 0.12 for C1, g ∼ 0.115 for C2), the heralding probabilities η T i (in the range ∼ 0.1 − 0.22 for the different sources), and the overall transmission of the injected photon from the generation to the detection stage (∼ 0.01 − 0.02, including detection efficiency).

A. Multi-photon emission
Multi-photon emission arises due to the probabilistic nature of parametric down-conversion (PDC). Indeed, there is a non-zero probability that two pairs are emitted from the same source within the same pulse. Ignoring terms with the emission of three or more pairs, the output state of each source can be approximated as: where g is the nonlinear gain of the source. Note that in our experiment, each PDC crystal corresponds to two different photon-pair sources as shown in Fig. 2 of the main text. Let us consider the situation where an event is recorded by the heralding detectors corresponding to inputs (i, j), in coincidence with an event registered by the detectors placed at output modes (m, n). This event is assigned to the transition from the input combination (i, j) to the output one (m, n). The correct evolution is obtained when the sources on modes (i, j) generate a photon pair, the corresponding heralding detectors click, and two photons are detected on output modes (m, n). Multi-photon emission and non-photon number resolving detectors result in additional patterns that can excite the same set of detectors. These patterns act as noise contributions that cannot be discriminated from the correct evolution. Two different contributions have to be considered. a) Three different sources connected to input ports (i, j, k) emit a photon pair, while only the heralding detectors on modes (i, j) click. Hence, three-photons are effectively produced. If only the detectors on output modes (m, n) click (due to losses or the presence of more than one photon in the one output mode), this process cannot be discriminated from the correct evolution (i, j) → (m, n). b) Only the sources connected to input ports (i, j) generate photons, but one of the two sources produces a double-pair event. This event cannot be discriminated in the heralding process with non-photon number resolving detectors. In this case, two-photons may be injected in the same input mode. Similarly to case a), when only detectors on mode (m, n) click, this process cannot be discriminated from the correct evolution (i, j) → (m, n).

B. Partial photon indistinguishability
Partial distinguishability between the generated heralded photon arises due to spectral correlations between photons belonging to the same pair. Indeed, the two-photon term of a parametric down-conversion source takes the following form: where f (ω 1 , ω 2 ) is the two-photon spectral amplitude. In general, spectral correlations are encoded in the function f (ω 1 , ω 2 ). When PDC sources are adopted as heralded single-photon sources, one of the two photons is detected to certify the presence of the twin photon. Due to the correlations encoded in the two-photon wave packet, the heralded photon will be in general in a mixed spectral state. This will result in a degree of partial distinguishability between the photons emitted by two identical sources. The effective joint density matrix describing the state of two heralded photons can be then approximated as: ρ (2) = p|1, 1 1, 1| + (1 − p)|1 , 1 1 , 1 |, where |1, 1 stands for two indistinguishable photon, and |1 , 1 stands for two distinguishable particles. Here, p is an effective parameter describing the indistinguishability of the two photons. The parameter p can be characterized from the visibility of an Hong-Ou-Mandel interference experiment performed with a 50:50 symmetric beam-splitter. In our case, the measured visibility between photons emitted from the two sources was V (2) = 0.724 ± 0.008. The parameter p can be retrieved from the value of V (2) by taking into account multi-photon emission, leading to p = 0.758 ± 0.008. In Supplementary Figure 2 we report the comparison between the experimental data and the theoretical predictions obtained by the model (discussed below) which takes into account all effects i)-iii). A good agreement is found and confirmed by the variation distances d = 1/2 α |P exp α − P mod α | between the experimental data P exp α and the predictions P mod α . For the reported experiment, the value d averaged over the input states is d = 0.053 ± 0.021. Analogously, in Supplementary Figure 3 we report the results of the comparison between the experimental data with distinguishable photons and the predictions obtained by the model. Again the good agreement is confirmed by the distance d = 0.045 ± 0.016, averaged over the input states.