Classical simulation of photonic linear optics with lost particles

We explore the possibility of efficient classical simulation of linear optics experiments under the effect of particle losses. Specifically, we investigate the canonical boson sampling scenario in which an $n$-particle Fock input state propagates through a linear-optical network and is subsequently measured by particle-number detectors in the $m$ output modes. We examine two models of losses. In the first model a fixed number of particles is lost. We prove that in this scenario the output statistics can be well approximated by an efficient classical simulation, provided that the number of photons that is left grows slower than $\sqrt{n}$. In the second loss model, every time a photon passes through a beamsplitter in the network, it has some probability of being lost. For this model the relevant parameter is $s$, the smallest number of beamsplitters that any photon traverses as it propagates through the network. We prove that it is possible to approximately simulate the output statistics already if $s$ grows logarithmically with $m$, regardless of the geometry of the network. The latter result is obtained by proving that it is always possible to commute $s$ layers of uniform losses to the input of the network regardless of its geometry, which could be a result of independent interest. We believe that our findings put strong limitations on future experimental realizations of quantum computational supremacy proposals based on boson sampling.


I. INTRODUCTION
Quantum computers are expected to offer an advantage in a wide variety of computational problems relative to their classical counterparts.However, experimental limitations mean quantum computers are notoriously hard to build, and to date it remains unclear which technological architecture is the most promising for this purpose.Thus, a full-purpose, large-scale universal quantum computer still seems like a long-term goal for the field.
With this in mind, a recent intermediate milestone was proposed in the form of the quantum computational supremacy paradigm [1,2].This consists of a series of proposed restricted models of quantum computing which are not expected to be universal, but for which there are strong arguments that they outperform classical computers in some computational task.One example of such proposals is boson sampling [3], where the quantum device is restricted to single-photon states, linear optics and photon-number detectors, which we focus on in this work.Other examples include circuits of commuting gates [4], a variant of the one-clean-qubit model [5,6], random quantum circuits [7,8], among others.These results follow a similar reasoning: one identifies some computational problem that is expected to be hard (such as computing the permanent of Gaussian matrices, in the case of boson sampling), posits a hardness conjecture regarding that problem, and shows formally how that conjecture implies that an efficient classical algorithm for simulating the physical system in question would have unexpected or surprising complexity-theoretic consequences.And so, the computational problem that is "solved" by these restricted models is just to simulate their own behavior (according to the predictions of Quantum Mechanics), which a classical computer should not be able to do efficiently.
Shortly after the seminal boson sampling [3] paper was published, several works started investigating the effects of realistic experimental imperfections on the idealized theoretical model.The robustness of boson sampling was analyzed under the effect of partial photon distinguishability [9,10], fabrication imperfections on the linear-optical transformation [11][12][13], losses [14,15], probabilistic sources [16] and so on.On the experimental side, several smallscale implementations of boson sampling have been reported so far [17][18][19][20][21][22][23][24][25][26], with state-of-the-art implementations with up to four photons using near-deterministic quantum dot sources [25,26].Through the development of the field there has been much interaction between the experimental and theoretical communities in order to assess the most important obstacles towards a large-scale implementation of boson sampling.One successful example of such an interaction was the conception of scattershot boson sampling [24,27,28], an alternative approach that allows circumvention of the exponential slowdown incurred due to the use of probabilistic sources.From the complementary perspective, recently developed classical algorithms [29,30] allow simulation of up to 40-photon boson sampling experiments in current supercomputers.
One of the main obstacles that still hinders the scaling of boson sampling, both on the theoretical and experimental sides, is photon loss.In [14], the authors investigated the complexity of lossy boson sampling, and concluded that it remains as hard as standard boson sampling when a constant number of photons is lost.Unfortunately, this falls short of the realistic regime where we expect some fraction of the photons to be lost.Even worse, most current implementations [17][18][19][20][21][22][23][24][25][26] are expected to suffer from exponential losses.The intuition behind this claim is the following.Every time a photon traverses a beamsplitter, it has some probability of being lost, say (1 − η).And so, if it must traverse s beamsplitters in a network, the probability that it arrives at the output should decay as η s .The best known lower bound on the depth of a linear-optical circuit for it to satisfy the complexity requirements laid out in [3] is that it grows as n log n.Thus, the probability that each photon survives the experiment decays exponentially and quickly becomes negligible.
It is then a timely question to ask how much loss a boson sampling device can support before it admits an efficient classical simulation.To our knowledge, the best known bound of this type states that an efficient classical simulation becomes possible when all but log n photons are lost (this was stated without proof in [14], but is a straightforward consequence of the more recent algorithm of [30]).In [15], the authors also investigate a similar question, and show that linear optics becomes classically simulable (via a positive quasiprobability representation) if a fraction of the photons is lost, as long as this is accompanied by a constant dark-count probability per detector.
In this work we propose a procedure to classically simulate certain linear-optical processes in a regime where losses are large.We do this by considering different physically-motivated models of loss.First we assume that exactly n − l out of n photons have been lost, and show that efficient classical simulation is possible whenever l scales slower than √ n.We then consider a more usual loss model, when each photon has some (mode-independent) probability of being lost in the circuit, and so the output state does not have a well-defined number of particles.We show that, also in that case, if the average number of remaining photons is less than √ n an efficient classical simulation is possible.Finally, we consider the most general case where the photons can traverse some linear-optical network of beamsplitters and where the loss probability can depend on the path followed by each photon.In this case we prove that, if the smallest number of lossy elements any input photon has to traverse in its path through the network is s, we can effectively model that network as s rounds of uniform losses followed by a linear-optical channel with an efficient classical description.This leads to an efficient classical algorithm for lossy linear-optical networks of depth greater than C log n, for some suitable constant C.
Our results also have implications outside boson sampling.First, in order to obtain our main result in Theorem 1 we need to find the best approximation, in trace distance, to lossy bosonic states via particle-separable states (i.e.convex combinations of symmetric product pure states).To this end we use techniques that can be useful in the study of entanglement properties of symmetric states [31][32][33].Our results might also be relevant for the design and characterization of complex linear-optical networks [23,34,35].In Theorem 2 we prove that a network where the smallest number of lossy elements in any path between inputs and outputs is s (but that has an otherwise arbitrary geometry) is equivalent to another network with s rounds of mode-independent losses at the input.This proves the intuition expressed previously regarding the exponential losses in current implementations of linear-optical networks, but also formally justifies the approximation of mode-independent losses which is often assumed in the literature [14,15,29,36].
Our paper is organized as follows.In Section II we give some theoretical background that underlies our work.In particular we review how linear-optical processes can be described both in first and second quantization formalisms, how different models of loss can be described mathematically, and some foundations on the boson sampling problem.In Section III we describe the general idea of our simulation algorithm.In Section IV we describe our main results and discuss their consequences, although we defer the technical proofs to Section VI.In Section V we present the relation of our results to other works, with particular emphasis on how they fit in the complexity-theoretic formalism of boson sampling.In Section VI we detail the proofs of the results of Section IV.Finally, in Section VII we conclude with some general remarks and open questions.
Notation: Throughout the paper we use the following notation.Given two positive-valued functions f and g, we write f = o(g) if lim x→∞ f (x)/g(x) = 0 and f = O(g) if lim x→∞ f (x)/g(x) < ∞.Likewise, we say f = ω(g) if f = o(g).Finally, f ≈ g will refer to the situation when lim x→∞ f (x)/g(x) = 1.

A. Description of bosonic states
In this work we consider systems of bosons that can occupy m modes.One natural description of these systems is in the language of second quantization, commonly used in the context of quantum optics [37,38] to describe the boson sampling problem.Here we focus on the complementary description based on first quantization, more natural for the description of the so-called particle entanglement, a concept that will prove useful in what follows.
In the language of first quantization, a system of n particles is described by the Hilbert space S n,m := Sym n (C m ), i.e. the symmetric subspace of H n,m := (C m ) ⊗n , which is the Hilbert space of n distinguishable particles that can occupy m modes.This reflects the fact that bosonic wave-functions are symmetric upon exchange of particles.A basis {|i } m i=1 of the single-particle space C m defines the so-called Dicke basis of S n,m .Elements of this basis are labeled by m-element tuples n = (n 1 , . . ., n m ) of non-negative integers n i satisfying m i=1 n i = n.The Dicke state |n can be defined by |n := N (n)P n,m sym |i , where P n,m sym is a projector onto S n,m (acting on H n,m ) and |i := |i 1 . . .|i n .The relation between n and i is as follows: n k equals the number of times |k appears in |i , and so the basis |n is also called occupation-number basis.Finally, N (n) is a normalization factor [see Eq. (A10)].We also define the shorthand for the standard state which is often assumed to be at the input of a boson sampling device (see Section II D).Additionally, by Ψ n we denote the projector onto |Ψ n .
In our analysis, a predominant role will be played by (bosonic) particle-separable or symmetric separable states.For a fixed number of particles l, they are defined [39,40] as states σ that can be written as convex combinations of pure product bosonic states, i.e.
where {p α } is some probability distribution and |φ α φ α | are states on a single-particle space C m .In what follows we denote the set of symmetric separable n-particle states by Sep (S n,m ).Bosonic n-particle states that cannot be decomposed in this form are called particle-entangled.When we restrict the standard notion of entanglement of distinguishable particles from H n,m to S n,m , it coincides with the above definition of particle entanglement.It is also a resource for quantum sensing [36,41] and necessary for violations of Bell inequalities in the presence of superselection rules [42].
If we remove the constraint on the total number of particles, the corresponding Hilbert space (known as the bosonic Fock space) is the direct sum of the corresponding Hilbert spaces S m := ∞ n=0 S n,m , where S 0,m is the one-dimensional space spanned by the Fock vacuum |0 .The occupation number states |n (also called Fock states in this context) constitute a basis of S m .The notion of particle separability extends naturally to S m : a state σ supported on S m is particle-separable where {p n } is a probability distribution and σ (n) ∈ Sep (S n,m ).From our perspective, particle-separable states will be important as they yield classically-simulable instances of boson sampling (see Section II D).Second quantization constitutes a complementary language for the description of bosonic systems.In this formalism, a central role is played by creation (a † i ) and annihilation (a i ) operators that act on S m and satisfy canonical commutation relations [a i , a † j ] = δ ij .These operators define the number operator on mode |i , ni := a † i a i , which satisfies the relation ni |n = n i |n .Fock states can then be expressed as Every operator acting on S m can be expressed as a polynomial in creation and annihilation operators.This, together FIG. 1.A linear-optical circuit with "boson sampling resources", i.e.Fock input states and number-resolving detectors.
with the commutation relations, leads to the mode-dependent tensor product decomposition where 1 is the Fock space associated to the ith mode.This decomposition leads to the notion of mode separability (and, by negation, mode entanglement), which differs from that of particle separability.In particular, Fock states admit the decomposition |n ≈ ⊗ m i=1 |n i , while being particle-entangled whenever more than two modes are occupied.Conversely, particle-separable states |φ ⊗l are generically mode-entangled.

B. Linear optics in first and second quantization
A lossless linear-optical transformation is encoded by an m × m unitary matrix U .In the first quantization description, this matrix induces the independent evolution of particles as ρ → U ⊗n ρ U ⊗n † , for ρ -a state on S n,m . ( In second quantization, the action of U can be conveniently described as a transformation on creation operators where a † i,in and a † i,out are the operators for the ith mode at the input and output, respectively, of the linear-optical transformation.Transformations described by the above equations give rise to the quantum channel Λ U acting on states defined on S m . Consider now the paradigmatic boson sampling instance, where an n-photon input state Ψ n [see Eq. ( 1)] evolves according to an m-mode linear-optical transformation U and is measured at the end in the photon-number basis, as represented in Fig. 1.One way to obtain a classical simulation of this system is to map it to a regular quantum circuit, and then leverage known results for classical simulation of quantum circuits.The representations defined in Eq. ( 5) and Eq. ( 6) lead to two methods to perform this mapping.
In the first-quantized description arising from Eq. ( 5), each of the n photons gets mapped to a single m-level system, which now labels in which mode that photon is.We represent the corresponding quantum circuit in Fig. 2(a).In order to simulate a bosonic state, the system must be initialized in a fully symmetric state.The linear-optical transformation U is then implemented as the local unitary transformation U ⊗n .This description treats all linearoptical transformations in the same footing, and does not bring much insight e.g.into geometrical properties of a circuit.On the other hand, several properties of the input state are transparent.For example, if one photon is perfectly distinguishable from the others (maybe by some internal degree of freedom such as polarization), the simulation of Fig. 2(a) just leaves that particle out of the symmetrization step.Thus, particle indistinguishability in the physical system translates into entanglement in the first-quantized simulation.Another effect with a natural manifestation in the first-quantized simulation is photon loss, as we discuss in more detail in the next Section.
In the second-quantized description that arises from Eq. ( 6), we have a decomposition according to Eq. ( 4) where each of the m modes is mapped to an n-level system labeling its occupation number.In this case, each 2-mode transformation in the original linear-optical circuit is mapped to a gate acting only on the two corresponding subsystems in FIG. 3. The fixed-loss model.Since the number of lost particles, n − l, is fixed, the effect of losses is easy to describe in the first-quantization mapping by tracing out n − l subsystems.Here the initial state is the bosonic symmetric state, which can be viewed either as the physical state itself, or as the state in the simulation of Fig. 2(a) after the symmetrization procedure.
the simulation, as in Fig. 2(b).This representation has the benefit of preserving geometrical properties of the circuit, and we might try to adapt results that limit the computational power of quantum circuits based e.g. on their depth [43][44][45].

C. Losses in linear optics
Let us now discuss how losses affect linear-optical systems.We survey three models of losses: the fixed-loss model and the beamsplitter loss model, the latter which can be divided in the uniform case (i.e.where all modes are subject to the same loss) and the non-uniform case describing more general (passive) linear-optical networks.
Fixed-loss model-In the fixed-loss model, we have initially n particles and assume that exactly n − l of them are lost.If we further assume that losses are mode-independent, i.e. any photon is equally likely to have been lost, this model has a very simple interpretation in the first-quantization picture.The loss corresponds to tracing out n − l of the n particles, as shown in Fig. 3, Since ρ is symmetric it does not matter which n − l particles we trace over.By standard mathematical properties of trace we can show that these losses commute with any linear-optical unitary transformation, This is important as it implies that it does not matter whether losses occurred before or after the linear-optical transformation.
The fixed-loss model was used in [14] 1 and in the context of quantum metrology in [46].Physically, this model is suitable for cases where number of lost particles can be effectively controlled and accounted for, e.g. in atomic interferometry [47][48][49][50].It is also more convenient for defining lossy boson sampling as a computational problem, since we can restrict our attention to the number of losses observed in a single run of the device with fewer assumptions about the physical process that caused them [14].In the context of optics, however, a more faithful physical model of losses is given by the beamsplitter model [51], which we describe now.
4. The beamsplitter model of losses.On the left we have the photonic mode passing through a lossy element with survival probability η.On the right we have an equivalent circuit where the lossy element is replaced by a beamsplitter with transmission probability √ η, in which case each photon in the input is routed into the unmeasured mode with probability 1 − η.
Uniform beamsplitter loss model-In the beamsplitter loss model, whenever a photon reaches a lossy element, it is lost with some probability (1 − η).For example, a state |1 with a single photon, when passing through a lossy element, leaves in state (1 − η) |0 0| + η |1 1|.We call this the beamsplitter model as it is equivalent to replacing the loss by a beamsplitter with transmissivity √ η, as in Fig. 4.This is the most common way of modeling losses in the quantum optics literature [36,51].If losses are mode-independent they can be conveniently described in the first quantization picture2 : n-particle states are transformed by the channel Λ η , where η l (1−η) n−l n l is the probability that exactly l particles remain.Comparing Eq. ( 7) and Eq. ( 9) we see that the fixed-loss and uniform beamsplitter loss models are closely connected, which we use often in this work.In particular, in Section III A we show that in certain regimes of parameters l and η they are also computationally equivalent from the perspective of complexity theory.
General lossy optical networks-To analyze the case when losses affect each mode differently we turn our attention to general passive linear-optical networks.This provides a convenient mathematical framework to formulate this question, but also corresponds to what is commonly done in experiments.Since arbitrary multimode transformations are not readily available, the simplest way to implement a complex m-mode linear-optical operation is to decompose it as a network (or circuit) of elements acting on a few modes at a time, such as in [34,35].If each of the smaller elements has some loss associated with it, this might lead to a distribution of losses that affects a photon differently depending on which path it takes within the network, and which is determined by the overall geometry of the circuit.
Throughout most of this work we assume that a linear-optical network is composed entirely of 2-mode transformations, i.e. beamsplitters and phase shifters.We further assume, for simplicity, that losses are caused only by beamsplitters, disregarding those associated with phase shifters, detectors, sources and transmission paths between beamsplitters, and furthermore that each beamsplitter induces the same loss probability (1 − η) in each of its input arms.Thus, the building block of a network is the element depicted in Fig. 5, where red elements represent pure losses (as in Fig. 4) and blue collectively represents beamsplitters and any neighboring phase shifters (although we refer to them only as beamsplitters from hereon).From Eq. ( 8) and Eq. ( 9) it follows that a round of uniform losses on m modes commutes with an m-mode passive linear-optical transformation, and in particular this holds for m = 2 as shown in Fig. 5.For a network of arbitrary geometry, however, the analysis is more complicated since it is not The building block of a linear-optical network used throughout the paper.Red elements are loss elements defined as in Fig. 4, while blue are lossless two-mode linear-optical transformations, including beamsplitters and phase shifters.It is not hard to show that a round of uniform losses on m commutes with any linear optics on the same m modes, and here we depict this for m = 2.
immediately obvious to what extent losses can be commuted within the network.We return to this question in Section IV.We will describe the action of a general passive linear-optical network N by a channel Λ N , which can be obtained by composition of the corresponding channels for loss elements and beamsplitters.The explicit description of Λ N in terms of Eq. ( 9) and Eq. ( 5) is possible in principle, but can be very cumbersome.For a more practical description, we direct the interested reader to a formalism that describes it in terms of the action of linear-optical transformations on coherent states [52].

D. Boson sampling
Here we consider the simulation of general lossy linear optics, but the main setup we consider is the boson sampling model of quantum computing [3].The ideal boson sampling instance is defined by three ingredients: (i) preparation of an n-photon, m-mode input state, (ii) application of an m-mode unitary linear-optical transformation U , and (iii) measurement of occupation number in every output mode.This is exactly embodied in the physical setup described in Fig. 1.From the action of U on bosonic operators [cf.Eq. ( 6)], it is easy to show that the transition probability from some input state |n to some output state |p is given by where U n,p is an n × n submatrix of U constructed by repeating n i times the ith column of U , and p j times its jth row [3,53], and Per(A) denotes the permanent of matrix A [54].Throughout this paper we assume that the input is the state |Ψ n from Eq. ( 1), i.e. that the photons initially occupy the first n modes, without loss of generality.
In [3] the authors showed that, modulo certain complexity-theoretic conjectures, it would be impossible for a classical computer to efficiently simulate the output of this idealized process, even in an approximate sense.We defer a more technical discussion of the results of [3] to Section V, when we discuss the complexity-theoretic consequences of our own results.For now it suffices to say that the existence of an efficient classical algorithm capable of simulating a distribution sufficiently close to the ideal boson sampling one would imply a collapse of the polynomial hierarchy to the third level, which is considered a very unlikely outcome in the complexity theory literature.
The results of [3] concern the possibility of classical simulation of boson sampling, but there are several notions of simulation one can use.A strong simulation of a quantum process is a classical algorithm that outputs the probability of any outcome, which may also include marginal probabilities.A weak simulation is a classical algorithm that produces a sample from the same distribution as the quantum device.It is generally regarded that, from the point of view of assessing whether some quantum device outperforms classical computers, the notion of weak simulation is more adequate, as it requires the classical algorithm to perform the same task as the quantum device.
Within the notion of weak simulation we can consider an exact simulator (which produces samples from the exact ideal distribution) or an approximate simulator.Given the ideal distribution p x , an approximate simulator samples from some distribution q x such that for some > 0. In the original boson sampling paper [3], the authors require a classical algorithm to simulate the quantum device only in the approximate weak sense described above.Furthermore, their result also requires that the runtime of the classical algorithm is poly(n, 1/ ).In other words, the classical algorithm should be able to give better approximations at the cost of only polynomial more time.
In our main results of Section IV, by contrast, the classical algorithm is only able to perform weak simulation up to some threshold precision (n).We cannot improve the simulation simply by spending more computational time, but under some conditions the precision of our algorithm improves with the size of the problem.This means it is a different notion of approximation from the one used e.g. in [3], although other recent papers use similar definitions to ours [55].We discuss these distinctions and complexity-theoretic implications of our results in Section V.For a more thorough discussion on this subject, see [56,57].
One last important thing to point out is that there are cases where boson sampling is known to be classically simulable, such as when inputs and measurements are in the Gaussian basis [58], or when there is a combined high rate of losses in the system and dark counts in the detectors [15].One such situation which will be important later on is when photons are perfectly distinguishable (e.g.due to some undetected degree of freedom such as polarization).
In this case the transition probabilities are given by where |U n,p | 2 is obtained from U n,p by taking the absolute value squared of each of its elements.This equation is surprisingly similar to Eq. ( 10), except that now the matrix has only positive elements.This, however, is crucial in enabling an efficient classical simulation.One way to see this is to note that, although the permanent is a hard function in general, for matrices with only positive elements it can be approximated efficiently by a classical algorithm due to Jerrum, Sinclair and Vigoda [59].Alternatively, if one is satisfied with a weak simulation, one can do it simply by sampling the outcomes of the photons one at a time, as they do not interfere.The first-quantization counterpart of this statement is that distinguishable particles correspond, in the simulation of Fig. 2(a), to particles that do not undergo the symmetrization procedure.So, if all photons are distinguishable, the system simply evolves as a number of parallel m-level systems.
Although we do not actually consider partial photon distinguishability, note that the bosons in a particle-separable state such as in Eq. ( 2) can be considered as effectively distinguishable.To see this, first let T be linear-optical transformation that takes a single photon from state |1 into state |φ .Then write That is, any symmetric product state is equivalent to initializing all photons in the same mode and applying some linear-optical transformation.But now note that if all photons start in the same mode, the submatrix constructed in Eq. ( 10) has only n repeated columns, and in this case the permanent collapses simplify to a combinatorial factor multiplied by the product of all elements in that column.This, in turn, becomes identical to same probability defined for distinguishable particles in Eq. ( 12).Thus, we conclude that any bosonic particle-separable state effectively behaves as classical mixture of states of distinguishable particles, which is crucial for our simulation scheme in what follows.In the first-quantization picture, this statement corresponds to the fact that the only n-particle state that is symmetric and where all particles are in the same state is a product state.

III. GENERAL OUTLINE OF APPROXIMATE CLASSICAL SIMULATION
Our classical simulation results all follow from the same general reasoning, which we present in this section.Let us first introduce two definition of distance to measure the quality of our simulations.The first is the total variation distance, which measures the distance between two probability distributions P = {p x } and Q = {q x } and can be written as The total variation distance is the typical figure of merit in the context of boson sampling, as in Eq. (11).The second measure is the trace distance between two quantum states ρ and σ The trace distance can be also defined [60] as the maximum, over all POVMs {M x }, of the total variation distance between the probability distributions D ρ and D σ that arise when one measures M x on states ρ and σ respectively.Therefore, all such probability distributions D ρ and D σ satisfy We are now ready to outline our classical simulation scheme.In all cases, we are interested in simulating a linearoptical protocol (as in Fig. 1) that uses as input some state ρ, corresponding to the action of some lossy channel on the initial state |Ψ n .The simulation is achieved by constructing an alternative state, σ, with the following properties (see Fig. 6) (a (a) FIG. 6.A schematic view of our simulation strategy.The set Sep is the subset of separable states within some state space of interest, and σ is the closest state in Sep to the lossy state ρ.
(i) σ is as a convex combination of particle-separable states (possibly with different numbers of particles): with σ l ∈ Sep (S l,m ), and so σ ∈ Sep (S m ).Every σ l can be further decomposed as σ l = α q l α |φ l,α φ l,α | ⊗l .We also require that the joint probability distribution p l q l α can be efficiently sampled on a classical computer.(ii) The trace distance d tr (σ, ρ) between σ and ρ is small, in a sense to be made precise later.Let D(ρ; Λ) and D(σ; Λ) be the probability distributions obtained by applying some linear-optical channel Λ to ρ and σ, respectively, and measuring the resulting state in the particle-number basis.By Eq. ( 16) we have In other words, simulating any linear-optical channel Λ acting on σ will give a simulation of the corresponding channel acting on ρ, up to some error determined by d tr (σ, ρ).In the following sections Λ might be Λ U , Λ η or Λ N , as appropriate from context (cf.definitions in Section II A).
Our claim now is that sampling from the distribution D(σ; Λ) can be done efficiently on a classical computer.We restrict our attention to weak simulation, i.e. the task of classically producing a sample from the correct distribution.This can be done in three steps.First, we draw a pair (α, l) according to the probability distribution {p l q l α }, which can be done efficiently by assumption.Second, it is easy to simulate the effect of the linear-optical channel Λ on |φ l,α φ l,α | ⊗l .Indeed, in the first-quantization formalism [cf.Eq. ( 5)] any linear-optical channel that we care about will act as U ⊗l , i.e. as a tensor product of single-particle unitaries 3 .Finally, since the particle-number measurements can also be implemented by a product of single-particle measurements, the last step correspond simply to simulating O(n) independent parallel m-level quantum systems, which can be trivially be done on a classical computer.
Combining these observations, it is clear this procedure efficiently samples from a distribution closer than d tr (σ, ρ) in total variation distance to the desired one D(ρ; Λ).In Section IV we obtain explicit expressions for the separable states and their trace distances to the desired state under different physically-motivated loss models, and in Section V we return to the complexity-theoretic consequences of our simulations and further interpretations.

A. Computational equivalence of loss models
Our main results include claims about the loss models described in Section II C.There we described how the models in which losses are mode-independent are mathematically related, namely via the binomial distribution of Eq. ( 9).From that expression it is clear that, in the beamsplitter model with uniform transmission probability η per mode, the average number of lost photons is (1 − η)n.Thus, if we set η = 1 − k/n, we obtain any instance of the fixed-loss model with exactly k losses as the expected outcome in the beamsplitter model.However, that does not suffice to prove that the models are computationally equivalent.To achieve that, we need to show under which conditions one model can simulate the other efficiently, i.e. with only polynomial (in n) overhead.This will allow us to re-purpose results obtained from one model into the other.
Consider two black boxes that simulate lossy n-photon boson sampling experiments.The first, BS, outputs samples according to the beamsplitter model for some uniform loss probability (1 − η) per photon, per mode.The second, FL, outputs samples according to the fixed-loss model with k losses, for some range of values of k.The formal question is whether we can use poly(n) samples produced by one black box to output a single sample from the distribution generated by the other (with high probability), and what are the corresponding asymptotic behaviors of k and η.
To prove that BS can simulate FL note that, for large n, we can use Stirling's approximation to write That is, since η ∈ [0, 1], the probability of observing exactly the expected number of losses is O(1/ √ n).Thus, if we set η = 1 − k/n, we only need O( √ n) samples from BS in order to obtain one sample from FL with high probability.This was used in [14] to extend their hardness claim from the fixed-loss model [i.e. that it is computationally as hard as ideal boson sampling if k = O(1)] carried over to the beamsplitter model in the regime 1 Consider now the reverse question, i.e. when can FL simulate BS?We already saw that, by setting k = (1 − η)n, FL samples from a distribution with the same number of photons as the average of the BS distribution, but that is not enough.We also need to show that the distribution over photon numbers produced by BS, i.e. the distribution of Eq. ( 9), is sufficiently concentrated around its mean.If that holds, then a FL black box that samples from events with O(ηn) transmitted photons can approximately simulate a BS black box by first sampling k according to Eq. ( 9) truncated to a small range of values around the mean, and then outputting a distribution for that value of k.
The number of transmitted photons can be described by a random variable X = n i=1 X i , where each X i is an independent random variable which takes value 0 is the ith photon was lost and 1 otherwise.Clearly, X has the expectation E(X) = ηn.Then, by a standard multiplicative Chernoff bound [61] we can write: We can repeat this argument for Y = n i=1 (1 − X i ), which is the number of lost photons and has expectation value E(Y ) = n(1 − η).The Chernoff bounds for X and Y show that, in the extremal regimes where the photons are either very likely to survive or to be lost (specifically if either n(1 − η) or nη go to 0 sufficiently fast with n), then the number of photons at the end of the circuit concentrates around its average as n increases.This happens, for instance, if the average number of photons lost is constant (as in [14]), or if there are only an average of o( √ n) photons left after the losses (as is the regime of our main results in following sections).

IV. SIMULATION OF LOSSY LINEAR OPTICS
In this Section we describe our main results, namely under which loss regimes linear optics can be simulated classically by approximating the lossy states by with particle-separable states.We defer most technical proofs to Section VI, rather focusing here on the interpretation and consequences of the main theorems.
We begin in Section IV A by considering lossy linear optics in the fixed-loss model described in Section II C.Although this model is less realistic from a physical standpoint, it is mathematically elegant and leads to more straightforward results.
In subsequent sections we focus on lossy linear-optical networks, defined in Section II C. Two common examples of these networks are depicted in Fig. 7 [34,35].These networks model good approximations for experiments based on integrated photonics such as [17][18][19][20]23].
In Section IV B we initially address losses for "uniform" (or "balanced") networks, namely those where every mode is affected by losses in the exact same amount.This is a simpler case where loss elements commute with unitary transformation [cf.Section II C], and so all losses can be assumed to happen at the input.The results in this section rely heavily on the Chernoff bounds from Section III A and the results from the fixed-loss model.
Finally, in Section IV C we justify the use of the uniform loss model for networks with arbitrary geometries, such as Fig. 7(b).To this end we prove that it is possible to "extract" a certain amount of uniform losses from an arbitrary network.The relevant parameter for how much loss can be pulled out of the network is the minimal number of beamsplitters that a photon must traverse between any input and any output.FIG. 7. Two universal decompositions of linear-optical transformations.The building block in these diagrams and all that follow is as defined in Fig. 5.The circuit in (a) does not have perfectly balanced losses because of the first and last modes, but that is a minor effect as the dimension of the unitary increases.The circuit in (b) is very unbalanced, since one mode always traverses a single lossy element, while there is another which traverses roughly m.The green paths indicate one example of a shortest path that a photon can take in the circuit, as will be used in Theorem 2. We caution the reader not to confuse the red elements with phase shifters, as this figure is similar to representations of lossless linear-optics used in the literature.

A. Simulation in the fixed-loss model
Suppose the the input of the bosonic state is |Ψ n (i.e. the first n out of m modes are occupied by a single particle each).Suppose then that we trace n − l out of the n particles, corresponding to losing exactly n − l photons at the input [cf.Section II C].We can write the resulting state as where the sum is over all occupation-number vectors of length |n| = l satisfying n i ≤ 1.We are now interested in finding the separable symmetric state σ * ∈ Sep(S l,n ) that is closest in trace distance to ρ l,n .In other words we seek for σ * satisfying d tr (σ * , ρ l,n ) = ∆ l,n , where ∆ l,n := min As discussed in Section III, if we find σ * such that ∆ l,n is suitably small, we can use σ * to simulate any boson sampling task performed on the state ρ l,n up to some precision that depends on ∆ l,n .The following result establishes precisely the exact dependence of ∆ l,n on n and l.We defer the proof of to Section VI A. Let us just state it for now, to investigate its basic consequences.
Theorem 1 (Approximation of the lossy input state by a particle-separable state).Let ρ l,n be the lossy bosonic state defined in Eq. (20).The trace distance ∆ l,n of ρ l,n to the set of symmetric separable l-particle states Sep(S l,n ) is given by Moreover, a separable state σ * ∈ Sep(S l,n ) that attains ∆ l,n is where Let us look at the asymptotic behavior of ∆ l,n in the limit of large and small losses.Using Stirling's approximation we write log n 2 , which leads to the following immediate result.
Corollary 1.After n − l losses we have Therefore, if all but l = o( √ n) photons are lost, ∆ l,n asymptotically tends to 0. Conversely, if l = ω( √ n), then ∆ l,n tends to 1.
All that remains is to show that it is indeed possible to use σ * efficiently in a classical simulation.But this is quite straightforward.The state σ * is obtained by starting with the state and averaging over all phases ϕ i uniformly in the [0, 2π] interval.A weak simulation then consists of choosing a set of phases ϕ i uniformly at random, then simulating the action of U ⊗l on the |φ ⊗l , together with the final single-qudit computational basis measurements.This is simply the simulation of l parallel m-level systems4 , and can be done efficiently on a classical computer.Combining this with the discussion in Section III we obtain the following result.
Corollary 2. Let D(ρ l,n , Λ U ) be the probability distribution obtained by applying the linear-optical transformation U to state ρ l,n of Eq. ( 20) and measuring in the particle-number basis.There exists a probability distribution D(σ * , U ) that can be classically sampled from efficiently and such that D(ρ l,n ; Λ U ) − D(σ * ; Λ U ) ≤ ∆ l,n , where ∆ l,n is given in Eq. ( 22).Moreover, in the limit l = o(n 1/2 ) we have ∆ l,n ≈ l 2 2n .To our knowledge, the previous best known bound of this type [14] gave an efficient classical simulation when all but l = O(log n) of the photons were lost, since in that regime, the transition probabilities are associated with permanents of log-sized matrices, which can be computed classically in polynomial time.Although this is stated in [14] without proof, it follows straightforward from a recent simulation boson sampling algorithm [30].
As discussed previously, the model where a specific number of particles is lost is of limited practical relevance, since in an experiment this-number is typically a random variable.However, as we show next, insights gained from the analysis of this simplified model are useful to derive classical simulations for the more realistic ones.

B. Uniform losses
Consider now a lossy linear-optical network under the assumption of uniform losses, i.e. every particle in every input can independently be lost with probability 1 − η.As discussed in Section II C, the action of such a loss channel onto the initial n-particle Fock state |Ψ n leads to the state To obtain an approximate classical simulation of the resulting linear-optical process, we choose to approximate ρ η,n by states from Sep (S n ) i.e. convex combinations of symmetric separable states with different numbers of particles, where {p l } is a probability distribution and σ (l) is a symmetric separable state supported on l particles and n modes.
As an ansatz for a good approximation to ρ η,n we take the mixture of optimal l-particle separable states with appropriate binomial weights, where σ (l) * are given in Eq. ( 23).The resulting trace distance between σ η and ρ η,n is ≃ FIG. 8.A lossy linear-optical network that is not balanced, in the sense that different paths within the network suffer different amounts of losses.In these cases, it is not always obvious how many layers of losses can be "extracted" from the circuit.
The proof of the above lemma is given in Appendix A. Combining this result with the results of Section IV A and the arguments of Section III we conclude that it is possible to approximately simulate any boson sampling protocol that uses the lossy state ρ η,n as input, provided that l = ηn = o(n 1/2 ), where l is the average number of photons that are left in the network.On the other hand, if l = ω( √ n), then ∆ (η, n) goes asymptotically to 1.
Corollary 3. Let U be a linear-optical unitary matrix.Let D(ρ η,n ; Λ U ) be the probability distribution obtained by applying Λ U to ρ η,n [cf.Eq. ( 25)] and measuring the resulting state in the particle-number basis.There exists a probability distribution D(σ η ; Λ U ) that can be sampled efficiently on a classical computer such that D(ρ η,n ; Λ U ) − D(σ η ; Λ U ) = ∆ (η, n), where ∆ (η, n) is given in Eq. ( 28) and satisfies ∆ (η, n) Our arguments give an approximate classical weak simulation to D(ρ η,n ; Λ U ) only when η = o(n −1/2 ), which at first sight does not seem physically relevant.Usually, lossy devices (such as sources, beamsplitters or detectors) have loss probabilities that are constant, rather than decreasing with n.However, as we discuss in the next section, standard practical realizations of linear-optical networks often lead to dependence of the effective overall transmissivity of the network, η eff , on the size of the system.Since the usual boson sampling paradigm requires networks that increase with the number of photons, this can easily ensure that η eff decreases at least as η eff = o(n −1/2 ).

C. Non-uniform losses
The effect of losses on linear-optical protocols has been extensively studied in the literature [14,15,29,46,51].A common assumption in these works is that losses are uniform, as we considered in the previous section.This model can be justified, for example, by assuming that every path that a photon can take through a network is equally lossy.For some geometries of the network this can indeed be the case, such as that shown in Fig. 7(a) [35].Moreover, this is a natural assumption for losses that model e.g.m identical imperfect particle detectors [15,51].
However, if the network has an unbalanced geometry, different paths through it accumulate different amounts losses, as in the case of Fig. 7(b) [34].The analysis of such networks is more complex, as losses affecting only certain modes in general do not commute with linear-optical elements connecting them to different modes (see Fig. 8).Indeed, the main point of [35] is to propose a balanced decomposition of a universal linear-optical network with the goal of making it more loss-resistant.The authors argue that, in a balanced network such as that of Fig. 7(a), losses "only" act as overall losses, i.e. uniformly, whereas in the more commonly used decomposition of Fig. 7(b) losses also behave as non-unitary noise that is harder to mitigate and characterize.
To the best of our knowledge, until now there was no rigorous justification for the phenomenological assumption that losses are uniform, unless the network has a very special geometry such as in Fig. 7(a).The following result remedies this by showing that it is always possible to rewrite a lossy network of arbitrary geometry as a round of uniform losses followed by another (possibly still lossy) linear-optical network.
Theorem 2. Let N be an m-mode linear-optical network constructed out of the two-mode elements of Fig. 5, each inducing the same loss probability (1 − η) in both of its input modes.Let s be the smallest number of beamsplitters traversed by any path connecting an input mode to an output mode in the network.Let Λ N be the quantum channel associated with the entire network N .Then we have the following decomposition of Λ N : where Λ η eff is a channel describing uniform losses with effective transmissivity η eff = η s , and Ñ is a linear-optical network obtained by removing sm loss elements from N .The description of Ñ from N can be obtained efficiently.
A schematic representation of this Theorem is shown in Fig. 9.We present the complete proof of Theorem 2 (including a prescription to obtain Ñ from N ) in Section VI and focus now on its consequences.
A version of Theorem 2 is well-known if one can assume that the network N can be decomposed onto layers of commuting linear-optical elements, each layer covering all modes.Since uniform m-mode losses commute with arbitrary m-mode linear-optical networks [cf.Section II C], whenever a network has a layer of uniform losses it can be commuted out of the network as in Fig. 10.Thus, if a network has s layers of this type they can be extracted layer by layer, leading to a decomposition as in Theorem 2 in a straightforward manner.Theorem 2 strengthens this statement by showing that it holds in more general geometries, as long as every path between an input and an output passes through s loss elements.
Theorem 2 already formalizes the claim, stated in Section I, that losses in current implementations of boson sampling increase exponentially with the depth of the circuit (if one identifies the depth of the circuit with the definition s in Theorem 2), which is a serious scalability issue.Let us now consider the consequences of this result for classical simulability.
First note that, from the efficient description of Ñ , we can efficiently sample from the probability distribution D |φ φ| ⊗l , Λ Ñ (see Section II D and Section III).Furthermore, from Eq. ( 31), we have that is possible to classically sample from a distribution approximating the desired one D (Ψ n ; Λ N ).On the other hand, from Lemma 1 we know that there exist a state , where ∆ (η eff , n) is defined in Eq. (28).Putting this all together leads to the following result.

Corollary 4.
Let N be the m-mode linear-optical network of Theorem 2 and Λ N the corresponding linear-optical channel.Let s be the smallest number of beamsplitters traversed by any path between an input and an output mode.Let D(Ψ n ; Λ N ) be the probability distribution obtained by applying Λ N to the standard boson sampling input state Ψ n and measuring the resulting state in the particle-number basis.There exists a probability distribution D(σ η eff ; Λ Ñ ) that can be sampled efficiently on a classical computer and such that D(Ψ n ; Λ N ) − D(σ η eff ; Λ Ñ ) = ∆ (η eff , n), where ∆ (η, n) is given in Eq. ( 28) and satisfies ∆ (η, n)
It is clear from Corollary 4 that our simulation gives a good approximation whenever there are less than O( √ n) photons left on average.In terms of s, the regime of α √ n photons left corresponds to s = − log n/(2 log η)+log α/ log η.Thus, our results show that boson sampling becomes classically simulable for networks of depth greater C log n, where C is some suitably large constant that depends on η.We should note that a similar scaling could also be drawn from the results described at the end of Section IV A, since the regime where only α log n photons are left on average corresponds to depth s = − log n/ log η + log log n/ log η + log α/ log η.
It is curious to note that our Corollary 4 generates a tension with the conclusions drawn by the authors of [35].There, they argue that balanced networks are better, since unbalanced losses act as non-unitary noise.But here we conclude the opposite: if a network is too balanced, more losses can be uniformly extracted from the network, improving the quality of our simulation of Corollary 4. ≃ FIG. 10.If a linear-optical network has a layer of equal loss elements on all modes, those losses can be "pulled out" of the network, since a round of equal losses on every mode commutes with any linear optics.A linear-optical network where every layer has this property can be modeled exactly by the uniform model of loss.
We conclude this section by pointing out a few ways that the above results can be strengthened.For the first note that, if photons are only inserted in particular input modes, we can replace the network by another effective network where unused inputs are ignored.For example, consider the standard decomposition of Fig. 7(b).One could think that Theorem 2 would not be relevant there since, in principle, s = 1.However, if it is known that photons will only enter the first n modes we can restrict definition of s in Theorem 2 to only take into account paths starting from those inputs, effectively increasing s.This can also be relevant if one does not have control over the input, such as in scattershot boson sampling [24,28].
Another way Theorem 2 can be strengthened is that it can be applied even if losses are not all equal over all beam splitters.As long as every beamsplitter has some nonzero loss probability in each input mode (which may even differ between the two inputs of the same beamsplitter), we can repeat the procedure of Theorem 2 by pulling out s layers of uniform loss, each with loss probability 1 − η * , where η * is the highest among all value of η for all beamsplitters.Finally, we conjecture that Corollary 4 holds even if a network is composed not only of two-mode elements, but linear-optical elements of various sizes, and we leave this as a question for future work.

V. RELATION TO OTHER WORKS AND RELEVANCE FOR BOSON SAMPLING A. Complexity-theoretic considerations
All main simulation theorems contained in sections IV A to IV C have a common feature, in that they allow us to approximate boson sampling with the desired lossy state ρ, by replacing it by another state σ, up to some total variation distance .The precision of the simulation, , is not a controllable parameter, but rather a function of n and possibly other physical parameters such as the transmission probability η.In suitable regimes (for instance, if l = o(n 1/2 ) in Corollary 2), decreases as the problem size n increases.Let us investigate the complexity-theoretic consequences of these results.
In the original paper on boson sampling, [3], the authors define the following computational problem, which they call Gaussian Permanent Estimation: ).Given as input a matrix X of i.i.d.elements taken from the standard (complex) normal distribution, together with bounds , δ > 0, estimate |PerX| 2 to within additive error ± • n! with probability at least 1 − δ over choices of X in poly(n, 1/ , 1/δ) time.
Their main result (Theorem 1.3 of [3]) then can be stated roughly as follows.Let U be an m × m Haar-random matrix, where m = O(n 2 ), and denote by D the probability distribution over outcomes generated by a boson sampling computer, i.e. with probabilities given by Eq. (10).Suppose there is a classical algorithm C that takes as input the unitary matrix U and an error bound > 0 and outputs a sample from some distribution D such that D − D < , in time poly(n, 1/ ).Then it would be possible to solve |GPE| 2 ± in the complexity class BPP NP (for definitions and discussions on these complexity classes we direct the reader to [3] or to the Complexity Zoo [62]).
The authors argue that such a consequence would be very surprising, and so an efficient classical algorithm C is unlikely, but they also take a step further.By positing two additional complexity conjectures, the Permanent-of-Gaussian conjecture and the Permanent Anti-Concentration conjecture, which together state that estimating |Per(X)| 2 for Gaussian X to high precision and with high probability is a #P-hard problem, they show that the existence of an efficient classical algorithm C would imply a collapse of the polynomial hierarchy.The polynomial hierarchy is a hierarchy of complexity classes which, in the complexity theory community, is widely believed to be infinite, and so this results further strengthens the initial claim that boson sampling cannot be efficiently simulated on a classical computer.
By inspecting our results, we see that they do not follow the same definition of efficient classical simulation as used in [3].There, the classical algorithm should be able to produce samples from a distribution that is -close to the ideal one, for arbitrarily small , just by spending poly(1/ ) more computational time.This definition of classical simulation has also recently been discussed in detail in [57], where it is called an -simulation.
One might ask whether the authors of [3] could have used a less stringent definition of simulation, where is some fixed threshold value rather than a controllable parameter that can be made arbitrarily small.This is used, for example, in [55], where the authors apply a similar reasoning used for boson sampling to the model of commuting quantum circuits, concluding that it would be unlikely for a classical computer to be able to efficiently sample from a distribution that is closer than 1/192 in total variation distance to the ideal one.By adapting the argument of [3] one can see that, if there existed a classical algorithm that sampled from a distribution D such that D U − D < for fixed , it would possible be solve, in BPP NP , an easier version of |GPE| 2 ± where δ and are not free parameters, but rather satisfy δ = O( ).That is, there would be a trade-off between the probability that the BPP NP algorithm outputs an estimate for |PerX| 2 for Gaussian X and the quality of that estimate.That would be still an unlikely complexity-theoretic outcome, albeit a less surprising than an efficient classical algorithm C described above.
This discussion does explicitly influence our conclusions, however.Ultimately, our main results are not addressing any complexity conjecture, given that to our knowledge no one has formally conjectured that boson sampling is hard in the limit of large losses or that the probabilities generated by a lossy device are associated with any natural hard problem.At the same time, for any given problem size n our simulations work only up to fixed precision (n) as described in the corresponding theorems.Thus our results do not rule out, technically, that lossy boson sampling has interesting computational capabilities, if one defines the corresponding computational task in terms of the "arbitrarily small " prescription advocated in [3,57].On the other hand our results do place stringent requirements on proposed experimental demonstrations of quantum computational supremacy based on lossy boson sampling.The fact that the precision of our simulations improves with the size of the problems means that, in order for a realistic boson sampling machine to outperform (asymptotically) our simulation, the experimental imperfections (most crucially the loss probability per-mode per-component) must decrease sufficiently fast as experiments move to larger problem size.

B. Relation to other work
Connection to the mean field state-Consider the closest separable state to ρ l,n from Eq. ( 20), namely the state σ * in Theorem 1.This state has appeared previously in the boson sampling literature, where it was called a mean-field state [63].A device that performs a boson sampling experiment using this state as input has been proposed in [63] as a semi-classical approximation to a boson sampling device that would be classically simulable, but that would be able to fool methods capable of distinguishing boson sampling distributions from alternative ones [22,64,65].In particular, the mean-field state has the ability to effectively display more "bunching" effects than distinguishable classical particles, which is usually considered a genuinely bosonic signature.It is interesting then that this state has also appeared in our main results as the closest symmetric separable state to the lossy bosonic state, as answer to a question that is seemingly unrelated to the matter of verification of boson sampling devices.
Quantum de Finetti theorem-It is natural to investigate the connection of our results to quantum de Finetti theorem [66][67][68].In particular, the quantum de Finetti theorem for bosonic systems is concerned with the best approximation (in trace distance) of the state tr n−l (Ψ) by separable product states σ ∈ Sep (S l,m ), where Ψ is an n-particle m-mode pure bosonic state.This is exactly the setting that we consider in the case of the fixed-loss model considered in Theorem 1. Applying directly the bosonic de Finetti [67] In the context of boson sampling we always have m ≥ n and thus the above bound becomes useless.The much stronger result given in Theorem 1 was possible only because it concerns separable approximation to the particular lossy state tr n−l (Ψ n ), where Ψ n is defined in Eq. (1).
Entanglement of symmetric states-Our results can be also linked to studies of entanglement of symmetric states, which recently received a lot attention [31-33, 69, 70].In, particular, due to the difficulty of the general separability problem, many works consider the problem of deciding whether states diagonal in the Dicke basis are entangled [32,33,70].However, little attention has been given to computing entanglement measures or entanglement indicators for such states.The trace distance to the set of symmetric separable states Eq. ( 21) can be used as an indicator of entanglement and as a lower bound the geometric measure of entanglement [71].Thus, our results can be also understood as quantitative description for a class of lossy bosonic states.We believe that tools introduced in this work will be useful in quantitative studies of entanglement of symmetric states diagonal in the Dicke basis.

VI. PROOFS OF THE MAIN RESULTS
In this section we prove the main results of the paper: Theorems 1 and 2. The proofs given below are essentially self-contained, however along the way we use some auxiliary technical statements whose justification we defer to the Appendix.
A. Best separable approximation with respect to trace distance In order to find the best separable approximation to ρ l,n (in trace distance) we prove a few intermediate results.First, in Lemma 2 we show that the symmetries of the state ρ l,n allow us to limit our attention to separable approximations having a particularly simple "twirled" structure, which we describe explicitly in Corollary 5 and Lemma 3.Then, in Lemma 4 we show that the closest symmetric separable state can be chosen as the twirling of a pure product state.Finally, in Lemma 5 we carry out the optimization over pure product seed states which directly yields Theorem 1. Proofs of some of the intermediate results are technical and can be left out during the first reading.
In what follows we denote by LO (n) the group of n × n unitary matrices encoding linear-optical transformations on n modes.
Lemma 2 (Stabilizing subgroups and the structure of the best separable approximation).Let σ * ∈ Sep (S l,n ) be a symmetric l-particle separable state supported on n modes satisfying Let K be some subgroup of LO (n) consisting of unitaries stabilizing the state ρ l,n5 , i.e.
Then the state satisfies d tr (ρ l,n , σ * ) = d tr (ρ l,n , σ * ) = ∆ l,n .The integration in Eq. ( 35) is done with respect to the Haar measure on K.
Proof.First note that Λ K satisfies Λ K (ρ l,n ) = ρ l,n .Moreover, Λ K is a CPTP map and so, by the data-processing inequality [60] we obtain Now note that transformations of the form V ⊗l preserve separable symmetric states.This means that Λ K (Sep (S l,n )) ⊂ Sep (S l,n ), and hence σ * ∈ Sep (S l,n ).From the definition of σ * we thus get d tr (ρ l,n , σ * ) ≥ d tr (ρ l,n , σ * ) which together with Eq. ( 36) concludes the proof.
Remark 1. Inspection of the above proof shows that an analogous result holds if ρ l,n is replaced by arbitrary l-particle symmetric n-mode state ρ ∈ D (S l,n ) and K is replaced by any subgroup of the stabilizer of ρ in LO (n).
The following subgroups of the stabilizer of ρ l,n will prove especially useful in the computation of ∆ l,n and finding σ * .
where V ϕ1,...,ϕn = exp (ii n i=1 ϕ i |i i|) and P(n) is the permutation group of the set [n]. Intuitively speaking, K c represents linear-optical transformations that are diagonal in the Fock basis, while K d represents permutations of the n modes.Using group-theoretic techniques similar to the ones used in [72,73] we can show that, by composing elements from K c and K d , one can obtain the entire stabilizer of ρ l,n in LO (n) (however, in what follows we do not need this result).The twirling operations for K c and K d read Corollary 5.By application of Lemma 2 first to K c and then to K d , we obtain that the closest separable symmetric state to ρ l,n (in trace distance) can be chosen as We now describe the action of the operations of Eq.( 38) on product symmetric states |φ φ| ⊗l .To this end we need to introduce some auxiliary notation.We define the type of the n-mode occupation number string n = (n 1 , . . ., n n ) as the string τ (n) = (τ 1 , . . ., τ n ) consisting of non-increasingly ordered components of n.Two Fock states |n and |n can be mapped to each other by a linear-optical transformations from V π ∈ K d if and only if τ (n) = τ (n ).Just like for occupation-number strings, we define the length of the type τ as |τ | := n i=1 τ i .We use the following notation for multinomial coefficients: For a given type τ we can define the corresponding mixed state where N(τ ) is a normalization constant equal to the cardinality of the set { n | τ (n) = τ }.In particular, for τ 0 = ( l 1, 1, . . ., 1, 0, . . ., 0) we have N(τ 0 ) = n l and ρ τ0 = ρ l,n .For s vector α = (α 1 , . . ., α n ) and an occupation number string n, define |α| 2n := n i=1 |α i | 2ni .Finally, to state our results we also need the following polynomial in α depending on the type τ : Lemma 3 (Structure of the twirled product states).Let |φ = n i=1 α i |i be a pure state.Let K c , K d be as in Eq. (37).Let Λ K be a twirling map defined by Eq. (35).Then we have the following result with the notation defined below Corollary 5.
The proof of the above Lemma is cumbersome, so we present it in Appendix A. The above characterization of the twirled product states allows us to obtain the following important result.Lemma 4 (A seed state can be chosen to be a pure product state).Let σ = k p k |φ k φ k | ⊗l be an arbitrary symmetric separable state on S l,n .Let K c , K d be as in Eq.( 37) and let Λ K be the map defined by Eq. (35).Then the trace distance between Λ K d • Λ Kc (σ) and ρ l,n is given by where for |φ = n i=1 α i |i we have defined q(|φ φ|) := l!m τ0 (α).Therefore, the closest separable symmetric state to ρ l,n (in trace distance) can be chosen as σ * = Λ K d • Λ Kc |φ φ| ⊗l , for a suitable product state |φ φ| ⊗l .
Proof.Let |φ = n i=1 α i |i .We note that Eq.( 41) implies the following convex decomposition where ρ ⊥ l,n is supported on the subspace orthogonal to the support of ρ l,n and Now, for an arbitrary separable symmetric state σ where q = k p k q k , q k = q(|φ k φ k |) and again ρ l,n is supported on the subspace orthogonal to the support of ρ l,n .Now, using the explicit formula for the trace distance given in Eq. ( 15), we obtain directly Eq. (42).
We now state the result that settles the optimization problem for seed symmetric product states.This together with Lemma 4 and Corollary 5 concludes the proof Theorem 1.
Remark 2. For the optimal pure state |ψ 0 from (47) we have which gives exactly the form of the optimal separable state given in Theorem 1.

B. Extraction of uniform losses from a lossy network
Let us now prove Theorem 2. Our proof is constructive, in the sense that it follows directly from an algorithm to obtain Ñ from N .A central ingredient of our proof is the equivalence shown in Fig. 5, namely that two equal losses at the output modes of a beamsplitter can be commuted to left to act on its input modes.Throughout this section we will say that losses can be "pulled out" from a network if, by repeated application of that identity to individual beamsplitters, they can moved all the way to the input of the circuit.Whenever we pull out one loss element for each input in the network, we say we pulled out one uniform layer of losses.The network Ñ will be obtained by pulling out s layers of losses from N , one at a time.
Recall that, by assumption, N is such that every path from the input to the output has to traverse at least s beamsplitters.Let us temporarily call the length of a path within the network the number of loss elements that a photon would traverse following that path, irrespective of the actual number of beamsplitters.This distinction is initially irrelevant since every beamsplitter has exactly one loss element in each of its input paths, but it becomes important in intermediate stages after we start to commute loss elements around.Let us call a path that connects an input to an output an IO-path.By assumption, the shortest IO-path in N has length s.
We now describe the procedure step by step.
step 0: Label every beamsplitter in N by the shortest path between it and any input.There might be exponentially within the network, but this procedure can be done efficiently if it is performed in a breadth-first manner.First, label all beamsplitters immediately connected to the inputs as 1.Then, move to all beamsplitters that are connected to those and label them.Repeat this procedure until all beamsplitters have been labeled.For any beamsplitter, its label is at most 1 greater than the smaller of the labels of the two that immediately precede it.This procedure can be done in time linear in the number of beamsplitters, and is exemplified in Fig. 11.

FIG. 11.
Example of an unbalanced network on which we can apply the procedure outlined in Section VI B. Each beamsplitter is labeled by the smallest number of loss elements between it and any input.The loss elements marked in blue are those described in step 1.This network only has nearest-neighbor beamsplitters as it is easier to visualize, but at no point in the proof this is required.
12. The four possible configurations of a beamsplitter from set Gj−1, as described in step j in the main text.
Step 1: Consider the set of loss elements that lie between a beamsplitter labeled 1 and an input (e.g. the highlighted ones in Fig. 11).Pull these losses out to the input as an uniform layer, and remove them from the network.This reduces the shortest IO-path by 1. Recompute the labels of all beamsplitters accordingly.
We now apply mathematical induction.Suppose that, on step j − 1, it was possible to pull out losses from every beamsplitter labeled 1 at that stage, and that these losses were pulled out as a uniform layer (which we then removed from the network, relabeling all beamsplitters in the process).Furthermore, suppose that, by this procedure, the length of the shortest IO-path was reduced by 1.Let G j−1 be the set of beamsplitters that were relabeled from 1 to 0 in that step.Then apply the following.
step j: If the shortest IO-path has length 0, terminate the procedure, as s layers of losses have already been pulled out.Otherwise consider the beamsplitters in G j−1 .Each of them must be in one of the configurations of Fig. 12(a-c), since Fig. 12(d) would imply that the shortest IO-path has length 0. For each, if there are two loss elements at their output, commute them through to the two inputs.We now claim that this must be possible for every beamsplitter in G j−1 in every configuration of Fig. 12(a-c).To see this, note that we only ever commute losses through beamsplitters labeled 0. But at the beginning of step j − 1, every beamsplitter in G j−1 was labeled 1, and they were only relabeled from 1 to 0 because losses that preceded them were pulled out.This means that the procedure has not yet touched the losses that follow them along both output modes.
After losses have been commuted through all beamsplitters in G j−1 , note that this would reverse their label from 0 back to 1.But this places them in the configuration they were at the beginning of step j − 1, relative to the part of the network that precedes them.This means that we can pull out a uniform layer of losses, by the inductive hypothesis.After this uniform layer is pulled out and removed from the circuit, relabel all beamsplitter accordingly.Note that a new set of beamsplitters have been relabeled from 1 to 0, all which lie in the immediate future of the beamsplitters in G j−1 .This set is G j , to be used in the next step.By analyzing the configurations of Fig. 12(a-c), it is also easy to see that this step has reduced the length of the shortest IO-path by 1.
Thus we have shown that the inductive hypothesis also holds for j, and so it holds all the way up to step s.This procedure is clearly efficient and pulls out s uniform layers of losses.The network that remains afterwards is Ñ , as in Theorem 2, and is a valid linear-optical network with an efficient description, which completes the proof.
For the particular example of Fig. 11, the network Ñ is shown in Fig. 13.In Appendix B we provide a detailed walkthrough of the steps that lead to Ñ .In Fig. 13 we notice that some segments in the network seem to accumulate FIG. 13.The resulting network after the procedure of Theorem 2 is applied to the network of Fig. 11.Empty boxes correspond to the loss elements that have been removed from the network.Note that the procedure of pulling out the losses may cause the losses that remain to pile up in some locations of the network.
losses in the process of pulling out the uniform layers.This is a curious feature, but does not affect the proof that the procedure works in any way.

VII. DISCUSSION AND OPEN PROBLEMS
In this paper we proposed a method for efficient classical simulation of linear-optical experiments, most notably boson sampling, in the high-loss regime.We showed that, in an experiment containing initially n photons, if all but l = o ( √ n) of them are lost we can approximate the resulting state by a symmetric separable state of l particles.This state has trace distance o l 2 /n to the state of interest, and so a simulation based on it becomes better and better as the size of the system increases.The fact that this state is separable (in the first quantization representation of linear optics) and has a simple description implies a straightforward efficient classical simulation of any linear-optical protocol that uses it as input.We also showed that in the more physically-realistic model of losses, where each photon has some probability (1 − η) of being lost (rather than there being a definite number of photons lost) our classical simulation is efficient whenever η = o(n −1/2 ).Finally, we showed how to perform this simulation in a non-uniform model of loss, i.e. when we have different paths in the linear-optical circuit incurring different losses.We showed that, by assuming that every beamsplitter in a circuit induces some loss (1−η) and that the shortest path that a photon can take through the circuit passes through s beamsplitters, our simulation is efficient whenever η eff = √ mη s = o(n −1/2 ), or alternatively for networks of minimal depth s = O(log n) for some suitably large constant (depending on η).
Our results imply that, in order to demonstrate some computational advantage within the boson sampling paradigm, experiments will likely have to shift to different architectures.One possibility is to improve the original results of [3] to show that boson sampling on linear-optical circuits of logarithmic depth is already hard, for example, because such a circuit can already exhibit sufficient Haar-randomness for the purposes of the proofs of [3].Our results place limitations on the capabilities of log-depth circuits, but do not rule out the possibility that sufficiently shallow circuits would lead to hard boson sampling instances-although it is possible to show, using tensor-network methods, that these circuits would require beamsplitters that act between arbitrarily distant modes [44,75].This at least rules out boson sampling with log-depth networks if one is restricted to the two-dimensional integrated architectures of [17][18][19][20]23].
We leave several open questions for future work, including: (i) In [14] the authors proved that boson sampling remains hard when O(1) photons are lost, and here we proved that it becomes classically simulable when only O( √ n) photons remain.The main problem we leave open is at which point between O( √ n) and n−O(1) losses the transition between these two complexity regimes occurs.
(ii) Can we get better approximations if we allow for particle-separable states that are not symmetric?Alternatively, can we prove that the closest state to our desired state ρ l,n must be symmetric?
(iii) Our results bound the trace distance between the mean-field state and the lossy input state, as in Theorem 1.This also acts as a bound on the total variation distance between the corresponding distributions when both states are input into arbitrary linear-optical transformations.What happens with the total variation distance after a Haar-random unitary U ? Would the distance between the output states after a random U be considerably smaller than the worst-case bound we obtain?Would this change the scaling of our main result and our conclusions?
(iv) We showed that, for a network where the shortest path connecting any input to any output is s, one can remove s layers of uniform losses.But it is possible to decompose an arbitrary linear-optical unitary such that s = 1, e.g. the standard decomposition of Reck et al in Fig. 7(b) [34].In [35], the authors propose a balanced decomposition, as in Fig. 7(a), since there losses behave uniformly and induce less non-unitary noise.What if one instead tried to engineer the unbalanced noise in a helpful way?That is, can we argue that boson sampling must be hard even under the non-unitary noise coming from the unbalanced losses in Fig. 7(b)?If so, that might provide a way to engineer long networks with small s tailored to circumvent our argument.
(v) As discussed in Section V, our simulations get better as the size of the problem increases, but seemingly cannot be made arbitrarily precise at the cost of more computational time.This means we do not have an -simulation as per the definition of [3,57], although our notion of simulation agrees with the one used e.g. in [55].Can our simulation be improved into the more stringent form of an -simulation?Alternatively, do these different notions of simulation actually lead to fundamentally different complexity regimes?
(vii) Is it possible to derive a version of the de Finetti theorem that, for symmetric states diagonal in Dicke basis, yields a bound stronger than the one in Eq. ( 32)?
Note: Upon completion of this work, we became aware of a simultaneous work [75] that seems to arrive at similar results for classical simulation of boson sampling in the case of uniform losses.

FIG. 2 .
FIG.2.Two different quantum circuits that can simulate the system of Fig.1, based on the two ways to represent bosonic particles.(a)(b)

FIG. 9 .
FIG. 9.If s is the smallest number of beam splitters that any path in the network traverses (in this example, s = 2), Theorem 2 allows us to extract s layers of losses from the circuit and gives us an efficient description of the remaining network Ñ .