Simulating all multipartite non-signalling channels via quasiprobabilistic mixtures of local channels in generalised probabilistic theories

Non-signalling quantum channels -- relevant in, e.g., the study of Bell and Einstein-Podolsky-Rosen scenarios -- may be simulated via affine combinations of local operations in bipartite scenarios. Moreover, when these channels correspond to stochastic maps between classical variables, such simulation is possible even in multipartite scenarios. These two results have proven useful when studying the properties of these channels, such as their communication and information processing power, and even when defining measures of the non-classicality of physical phenomena (such as Bell non-classicality and steering). In this paper we show that such useful quasi-stochastic characterizations of channels may be unified and applied to the broader class of multipartite non-signalling channels. Moreover, we show that this holds for non-signalling channels in quantum theory, as well as in a larger family of generalised probabilistic theories. More precisely, we prove that non-signalling channels can always be simulated by affine combinations of corresponding local operations, provided that the underlying physical theory is locally tomographic -- a property that quantum theory satisfies. Our results can then be viewed as a generalisation of Refs. [Phys. Rev. Lett. 111, 170403] and [Phys. Rev. A 88, 022318 (2013)] to the multipartite scenario for arbitrary tomographically local generalised probabilistic theories (including quantum theory). Our proof technique leverages Hardy's duotensor formalism, highlighting its utility in this line of research.


Introduction
Quantum operations are at the core of communication and information processing tasks, and how well we can perform at the latter may depend on the properties of the quantum operations that we have at hand. One particular set of operations of interest is that of non-signalling quantum channels [1], i.e., those that cannot be used by two distant parties to exchange information in a way that violates the laws of relativity theory. Bipartite non-signalling quantum operations have been extensively studied, especially since they play a central role in Bell [2] and Einstein-Podolsky-Rosen 'steering' [3,4] scenarios, which in turn underpin cryptographic protocols [5,6]. In addition, the simulation of bipartite non-signalling quantum channels via affine combinations of local operations has provided valuable insight into the advantage they offer for communication and information processing tasks [7,8].
In this work we investigate no-signalling channels in GPTs. In particular, we prove a useful technical result, namely that multipartite non-signalling channels in locally-tomographic GPTs [9] can be simulated by affine combinations of product (local) channels (Theorem 5.1). Our results can be viewed as a generalisation of those of Ref. [7] and of Ref. [8, Lem. 1] to arbitrary tomographically local GPTs: the former applies only to multipartite non-signalling stochastic maps on classical variables, while the latter applies only to bipartite non-signalling quantum channels.
Our proofs leverage the convenient duotensor formalism of Ref. [91] with a slight twist based on Ref. [92] which allows us to directly lift the result of Ref. [7] (using a generalisation of Lem. 2 in Ref. [8]) to this more general setting. We believe that this way of lifting structural properties of stochastic maps to properties of channels in arbitrary tomographically local GPTs via the duotensor formalism [91] may be a useful tool in future research.

Generalised Probabilistic Theories: the basics
The framework of Generalised Probabilistic Theories (GPTs) can be used to define arbitrary physical theories. The simplicity of the framework enables various alternative theories to be formulated and explored, while at the same time allowing a deep study of the probabilistic and compositional aspects of such theories. It is based on the tenet that a minimal requirement of any physical theory is that it must make probabilistic predictions about the outcomes of experiments. Whilst this is conceptually extremely minimal, its mathematical consequences lead to a rich formal structure known as a GPT.
Because physical theories describe predictions about measurement outcomes in experiments, a few elements are necessarily present in all of them. Namely, these theories need to talk about types of systems, possible states for each of them, possible measurement outcomes, transformations, and the operation of discarding a system (see Table 1). In quantum theory, these elements are, respectively, the Hilbert spaces, the density operators on them, positive operators upper-bounded by the identity, completely positive trace-non-increasing (CPTNI) linear maps, and the (partial) trace operation.
Having those elements present, although necessary, is not sufficient to express the full form of a physical theory. Some structure relating them is implied by the way that experiments are performed. Abstractly speaking, a notion of connectivity between those elements must also be present because, in experiments, we perform actions on systems, that is, we subject them to processes, and these processes can happen in parallel (independently) or in sequence. This motivates a notion of compositionality of processes. From this notion of how experimental processes connect, or compose, a convenient diagrammatic notation can be defined so as to capture the entire structure of a GPT. We represent any process by a box, and encode the type of system on which it acts as a labelled input wire at the bottom of the box. (Hence, we have also implied that systems are represented by wires.) Additionally, since the type of a system can change after a process, we denote the output type of a process by a labelled wire on top of its box. In this notation, then, a system type S, and a transformation T from a system type A to a system type B, respectively, appear as (1)

Elements of a GPT
Note that a state of a system can be conceptualised as a preparation procedure, which, abstractly speaking, is also a process. Hence we can represent it as a box that has no input wire but has as output the wire corresponding to the type of that system. Similarly, an effect, or measurement outcome, is a box with an input wire corresponding to the system on which it can be observed, and no output wire. We follow the convention that states and effects are represented by triangular boxes, so a state σ and an effect e of a system S appear as triangular boxes with only an output wire or only an input wire, respectively. Because of this, the discarding operation, since it has an input but no output, appears as a special effect in the theory. This effect is sometimes called the deterministic effect and is unique for each system type.

These diagrammatic pieces can be connected when the input/output wire types match; this represents the sequential composition of processes. When processes are instead drawn side by side, we are representing their parallel composition. By connecting boxes, therefore, we can construct more complex diagrams, i.e., complex processes, where we omit the wire labels for simplicity, but it should be clear that only matching types can be connected. When a diagram has no loose wires, it is interpreted as a number, which in the case of GPTs is a probability generated by the theory. For instance, (e ∘ g)(σ) = Prob(e | g, σ) (5) denotes the probability that the outcome associated to the effect e is observed when the system is prepared in state σ and a transformation g is applied to it.
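As a concrete quantum-theoretic instance of this probability rule, the following sketch (our own illustration, not from the source; the amplitude-damping rate p = 0.25 and the chosen state and effect are arbitrary) computes Prob(e | g, σ) as Tr[e g(σ)]:

```python
import numpy as np

# State sigma = |1><1|, a CPTNI map g (amplitude damping), and effect e = |1><1|
sigma = np.diag([0.0, 1.0]).astype(complex)

def g(rho, p=0.25):
    # Amplitude-damping channel with decay probability p (Kraus representation)
    K0 = np.array([[1, 0], [0, np.sqrt(1 - p)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(p)], [0, 0]], dtype=complex)
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

e = np.diag([0.0, 1.0]).astype(complex)

prob = np.trace(e @ g(sigma)).real  # Prob(e | g, sigma)
assert np.isclose(prob, 0.75)       # excited state survives with probability 1 - p
assert 0.0 <= prob <= 1.0
```

Since g is trace-preserving and e together with its complement forms a measurement, such probabilities always lie in [0, 1].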
Of course, we might need to describe systems that are composed of simpler parts (multipartite systems), so we can emphasise that a system is composite by drawing the wires of its parts side by side. When we represent a bipartite composite system by the two wires together, its deterministic effect is given by the parallel composition of the deterministic effects of its parts.

With what we have, we can represent simple experimental processes, composite processes, and probabilities of outcomes in those experiments. To reason about them, we now need a notion of equality of processes, or, in other words, a notion of tomography: two processes are equal if they give the same probabilities in all situations. In fact, in this paper we will work with a special class of GPTs which satisfy the principle of tomographic locality [9,91]: in a tomographically local theory we can do process tomography without a side channel.

An important class of processes is that of the discard-preserving ones, i.e., those that yield the discarding effect when composed with it. In the case of quantum theory, these correspond to the trace-preserving maps. Physically-realisable discard-preserving processes in a GPT are known as channels. Discard-preservation also defines a notion of causality for processes [93,94]: a process is said to be causal if it is discard-preserving. This is so because this condition ensures compatibility with relativistic causal structure [95].

A final ingredient in the GPT formalism is the possibility to represent convex mixtures of processes. This stems from the requirement that in an experiment we can always decide to perform f with probability p or g with probability (1 − p), at least provided that f and g have the same input and output systems. This is introduced through the definition of a sum of processes that distributes over diagrams. Note that this definition implies that we can only sum processes with the same input/output types.
From this, since a probability p is a number (a diagram without loose wires) of the theory, we can write p f + (1 − p) g to describe convex mixtures of processes. At this point a notion of order can be defined for processes, which allows us to define discard-non-increasing processes: a process f is said to be discard-non-increasing if and only if discarding its output yields an effect below the discarding effect in this order. Note that in any GPT all (physically-realisable) processes must be discard-non-increasing; this corresponds to the constraint in quantum theory that processes are trace-non-increasing.
In particular, this means that for any effect in the theory, there must be another effect such that they sum to the deterministic effect. This is important for the definition of measurements. In quantum theory, for example, the deterministic effect is the trace operation, and it is required that the POVM elements forming a measurement sum to the identity, so each of them is less than or equal to the deterministic effect.
Since in this work we focus on the class of GPTs that are tomographically local, we can moreover use the particular duotensor notation of Ref. [91]. Next we will present the basics of this notation.

Duotensor basics
Here we present an adaptation of the duotensor formalism in which, in addition to the GPT systems of the previous section, we also have classical systems representing measurement outcomes and control systems. In order to distinguish these two kinds of systems, the classical ones will be drawn horizontally. We will also label them by finite sets, Λ. The physical processes transforming between these classical systems are (sub)stochastic maps between these finite sets, which we draw as white boxes.

A particularly useful example, which we will make use of in this work, is the copy map, which we draw as a white dot. Note that the copy map satisfies Eq. (18): copying the components of a system is the same as copying the composite system.

In contrast to the physical (sub)stochastic maps, we will draw mathematically well-defined but unphysical processes as black boxes; such a process would be, for example, a linear map from Λ to Λ which is not (sub)stochastic, e.g., one with negative coefficients. Note that in contrast to the approach of Ref. [91], rather than labelling horizontal systems by black and white dots, we instead label the processes as being either black or white. This is equivalent but more convenient for us: on the one hand, we can interpret the colour as representing whether or not a process is physical; on the other hand, it takes us to a more standard category-theoretic notation. Indeed, categorically there is no distinction between the horizontal and vertical wires; the colour is simply a convenient way to label the different objects, at which point it is clear that all of the processes that we draw below live inside the category of real linear maps.
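The defining property of the copy map (copying components versus copying the composite, Eq. (18)) can be checked concretely in linear algebra. The following is a minimal sketch; the alphabet sizes d1 = 2 and d2 = 3 are arbitrary choices:

```python
import numpy as np

def copy_map(d):
    # Copy map on a classical system of size d: sends basis vector |x> to |x, x>
    C = np.zeros((d * d, d))
    for x in range(d):
        C[x * d + x, x] = 1.0
    return C

d1, d2 = 2, 3
# Copying the composite variable of sizes d1 and d2 directly...
C12 = copy_map(d1 * d2)

# ...versus copying each component and reordering the output wires
# from (a, a', b, b') to (a, b, a', b'):
comp = np.kron(copy_map(d1), copy_map(d2))
comp = comp.reshape(d1, d1, d2, d2, d1 * d2)
comp = comp.transpose(0, 2, 1, 3, 4)
comp = comp.reshape(d1 * d2 * d1 * d2, d1 * d2)

assert np.array_equal(C12, comp)  # copy of composite == composite of copies
```

The wire reordering step is exactly the diagrammatic crossing needed to compare the two sides of Eq. (18) as matrices.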
For each system S in the GPT we define a particular minimal informationally-complete state preparation and measurement. We call these the fiducial preparation and fiducial measurement. A state preparation is a box with a classical input and a GPT output, where the classical input controls which state is prepared, whilst a measurement is a box with a GPT input and a classical output, where the classical output encodes the result of the measurement. Without loss of generality, we take Λ_S to index both the fiducial set of states and the fiducial set of effects. Moreover, all of the fiducial states are normalised and the fiducial effects sum to the unit effect. Note that here we follow the convention of Ref. [92] rather than Ref. [91], as the former demands that the fiducial effects form a measurement whilst the latter does not. This does not constitute a loss of generality, as a minimal informationally-complete measurement can be shown to exist for any GPT.

Now, for each system S, define the fiducial transition matrix as the composition of the fiducial preparation followed by the fiducial measurement (22), and note that Eq. (21) implies that these fiducial transition matrices are stochastic maps. The fact that the fiducial preparation and measurement are informationally complete means that they are invertible linear maps. Importantly, however, these inverses are not typically physical transformations, and we therefore draw them as black boxes. Composing the inverse fiducial measurement with the inverse fiducial preparation can easily be seen, using Eq. (25), to give the inverse of the fiducial transition matrix. The fiducial transition matrix and its inverse (the white and black squares respectively) are known as hopping metrics in the terminology of Hardy.
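To make the hopping metrics concrete, here is a sketch for quantum theory on a qubit, with the fiducial preparation and measurement built from a tetrahedral (SIC-POVM) frame; the particular frame is our own arbitrary choice, not one fixed by the source:

```python
import numpy as np

# Tetrahedral Bloch vectors defining four fiducial qubit states
vs = np.array([[0, 0, 1],
               [2 * np.sqrt(2) / 3, 0, -1 / 3],
               [-np.sqrt(2) / 3, np.sqrt(2 / 3), -1 / 3],
               [-np.sqrt(2) / 3, -np.sqrt(2 / 3), -1 / 3]])
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

rhos = [(I2 + v[0] * X + v[1] * Y + v[2] * Z) / 2 for v in vs]  # fiducial states
effs = [r / 2 for r in rhos]                                    # SIC-POVM effects
assert np.allclose(sum(effs), I2)      # fiducial effects form a measurement

# Fiducial transition matrix ("white" hopping metric): h[j, i] = Tr[E_j rho_i]
h = np.array([[np.trace(E @ r).real for r in rhos] for E in effs])
assert np.allclose(h.sum(axis=0), 1.0)  # columns sum to 1: h is stochastic

# Its inverse (the "black" hopping metric) exists but is not stochastic
hinv = np.linalg.inv(h)
assert np.allclose(hinv @ h, np.eye(4))
assert (hinv < 0).any()                 # negative entries: unphysical, as expected
```

The negative entries of the inverse hopping metric illustrate why it is drawn as a black (unphysical) process.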
It is also easy to see from these conditions that the hopping metrics relate the fiducial preparations and measurements to one another. In particular, this means that the identity on a GPT system decomposes as the fiducial measurement, followed by the inverse hopping metric, followed by the fiducial preparation (Eq. (29)). The key use of all of this for us is that it allows us to map any GPT channel to a stochastic map and back again: a GPT channel C with inputs S1, ..., Sn and outputs T1, ..., Tn is mapped to a stochastic map by composing each input with a fiducial preparation and each output with a fiducial measurement (Eq. (31)), and the stochastic map associated to the GPT channel is mapped back to the GPT channel by composing with the corresponding inverse hopping metrics, fiducial preparations, and fiducial measurements. It is clear that the RHS of Eq. (31) is indeed stochastic, as it is positive (since it is composed out of physically realisable GPT transformations) and normalised: discarding its classical outputs reduces, in two steps, to the discarding of its classical inputs,
where the second equality holds because C is a GPT channel rather than a generic GPT process. Similar arguments imply that if C satisfies certain no-signalling conditions then so too will the associated stochastic map. For example, a bipartite channel B is said to be non-signalling if discarding either party's output yields a process in which that party's input is simply discarded, alongside a local channel on the other party; from this it is easy to show that the associated stochastic map will also be non-signalling. This straightforwardly generalises to multipartite GPT channels, and also to the case where only some of the no-signalling conditions hold. That is, the non-signalling structure of the channel and that of the associated stochastic map are the same.
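The channel-to-stochastic-map correspondence and its inverse can likewise be checked numerically. The sketch below (our own illustration; the SIC frame and the depolarising channel with p = 0.5 are arbitrary choices) maps a qubit channel C to its fiducial stochastic map M and then reconstructs the action of C from M via the inverse hopping metric:

```python
import numpy as np

# Qubit fiducial states from a tetrahedral SIC frame, effects E_i = rho_i / 2
vs = np.array([[0, 0, 1],
               [2 * np.sqrt(2) / 3, 0, -1 / 3],
               [-np.sqrt(2) / 3, np.sqrt(2 / 3), -1 / 3],
               [-np.sqrt(2) / 3, -np.sqrt(2 / 3), -1 / 3]])
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
rhos = [(I2 + v[0] * X + v[1] * Y + v[2] * Z) / 2 for v in vs]
effs = [r / 2 for r in rhos]
h = np.array([[np.trace(E @ r).real for r in rhos] for E in effs])
hinv = np.linalg.inv(h)

def C(rho, p=0.5):
    # Depolarising channel: keep rho with prob p, replace by maximally mixed
    return p * rho + (1 - p) * np.trace(rho) * I2 / 2

# Stochastic map associated to C: prepare fiducially, apply C, measure fiducially
M = np.array([[np.trace(E @ C(r)).real for r in rhos] for E in effs])
assert np.allclose(M.sum(axis=0), 1.0)   # stochastic since C is trace-preserving

# Round trip: reconstruct the action of C from M via the inverse hopping metric
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
p_in = np.array([np.trace(E @ rho).real for E in effs])  # fiducial stats of rho
coeffs = hinv @ M @ hinv @ p_in
recon = sum(c * r for c, r in zip(coeffs, rhos))
assert np.allclose(recon, C(rho))        # channel recovered from its stochastic map
```

The round trip works precisely because tomographic locality guarantees the fiducial statistics determine the channel uniquely.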

Geometry of transformations
In this section we present a geometric perspective on some of the processes discussed above, as well as on particular types of channels.

States
First let us start by discussing the geometry of the state space for some system S. Schematically this looks like:

[Figure: the state cone, indicating the normalised vectors, the normalised states, and the subnormalised states.]

Formally, we have some real vector space V_S which contains a convex cone of states, T_S, which is closed, pointed, and full-dimensional, with an intersecting hyperplane which defines the normalised vectors. The intersection of this hyperplane and the state cone defines the normalised state space, Ω_S. A subnormalised state s is a vector in the cone such that αs is normalised for some α ≥ 1. In particular, the convex set of subnormalised states spans the vector space and, moreover, there exists at least one normalised state which is interior to the cone.

Effects
Next let us consider the geometry of the effect space for some system S. Schematically this looks like:

[Figure: the effect cone, indicating the discarding effect.] (37)

Formally, the effect space of S lives inside the dual of the vector space of states, V*_S, and consists of a convex cone of effects, which is closed, pointed, and full-dimensional. The unique "normalised" effect (the discarding effect) is the unique linear functional that evaluates to 1 on the intersecting hyperplane defining the normalised states. This must be in the interior of the effect cone, such that it is an order unit for the cone. That is, every effect e in the cone can be rescaled to an effect αe, for some α > 0, such that there exists some effect e′ in the cone for which αe + e′ equals the discarding effect. The set of subnormalised effects, E_S, can be defined as those that satisfy this condition for some α ≥ 1. In particular, this ensures that the convex set of subnormalised effects spans the dual space and that the discarding effect is in the interior of the effect cone.

Physical transformations
Finally, we turn to our main focus which is the geometry of transformations within a tomographically local GPT. Schematically this looks like:

[Figure: the transformation cone, indicating the discard-preserving linear maps, the discard-non-increasing transformations, and the discard-preserving transformations.] (38)

As we are assuming tomographic locality, the transformations from S to T live inside the vector space of linear maps from V_S to V_T, which we denote as L(V_S, V_T). The geometric picture that we present here is not as standard in the literature as it is for the state and effect cases, and so we now explain how this structure arises.
In this picture we have a convex set of normalised transformations, defined by the intersection of an affine set (namely, the discard-preserving linear maps) and a convex cone (namely, the cone of transformations, T^T_S). We can then view this as a positive cone such that the discard-non-increasing transformations are those that lie "underneath" the discard-preserving transformations in the associated partial order.
As there exists a set of states which spans V_T and a set of effects which spans V*_S, and since L(V_S, V_T) is isomorphic to V_T ⊗ V*_S, any linear map in L(V_S, V_T) can be written as a linear combination of maps of the form s ∘ e, which apply an effect e and prepare a state s. With this in mind, we define the convex subcone K of the cone of transformations T^T_S generated by the maps s ∘ e with s a normalised state and e an effect in the effect cone, which is useful in our analysis. Equipped with this definition, we can express L(V_S, V_T) neatly as K − K, which means that K spans L(V_S, V_T).

Measure-and-prepare transformations and discard-preserving channels
Of particular interest is the set of physical transformations referred to as measure-and-prepare, which we denote as MP: those of the form Σ_i s_i ∘ e_i, where the s_i are normalised states and the effects e_i form a measurement, i.e., sum to the discarding effect. Note that since these are physically possible in any GPT, they are a subset of the valid transformations, that is, they live inside the convex cone T^T_S and, in fact, MP ⊆ K. A measure-and-prepare transformation φ ∈ MP has the additional property of being discard-preserving. We denote the set of discard-preserving linear maps as DP, defined formally as those linear maps for which discarding the output is equal to discarding the input. Note that the set of discard-preserving maps forms an affine space (see Appendix A for definitions relating to "affine" concepts), which is easily proven given the definition above. Note also that DP may contain non-physical transformations. However, from the above discussion, it does contain the measure-and-prepare transformations. Neatly, we have that MP ⊆ DP ∩ K. Perhaps surprisingly, this containment is not strict, as shown in Lemma 1 (see the Appendix).
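In quantum theory, a measure-and-prepare map φ(ρ) = Σ_i Tr[E_i ρ] σ_i, with {E_i} a POVM, is indeed trace-preserving (discard-preserving). A minimal sketch, with an arbitrarily chosen two-outcome qubit POVM and preparation states of our own choosing:

```python
import numpy as np

# A two-outcome POVM: E[0] + E[1] == identity
E = [np.diag([0.7, 0.2]).astype(complex), np.diag([0.3, 0.8]).astype(complex)]
# States to prepare conditioned on the outcome
sigmas = [np.eye(2, dtype=complex) / 2,
          np.array([[1, 0], [0, 0]], dtype=complex)]

def phi(rho):
    # Measure-and-prepare map: measure {E_i}, then prepare sigma_i
    return sum(np.trace(Ei @ rho) * si for Ei, si in zip(E, sigmas))

# Discard-preservation = trace-preservation: Tr[phi(rho)] = Tr[rho]
rho = np.array([[0.6, 0.1j], [-0.1j, 0.4]])
assert np.allclose(sum(E), np.eye(2))            # {E_i} really is a POVM
assert np.isclose(np.trace(phi(rho)).real, 1.0)  # trace preserved
```

The trace calculation collapses because the POVM elements sum to the identity and each prepared state has unit trace, mirroring the abstract argument that effects summing to the discarding effect composed with normalised states give a discard-preserving map.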
The sets DP, MP, and K allow us to obtain a useful characterization of the discard-preserving linear maps, which we now discuss.

A useful characterization of discard-preserving linear maps
The following theorem characterizes the set of discard-preserving maps in terms of those that are also measure-and-prepare.
Theorem 4.1. DP = Aff(MP), where Aff denotes the affine hull operation.
We provide a proof of Theorem 4.1 preceded by a background on convex geometry in Appendix A.

A characterisation of no-signalling GPT channels
Critical to our result is that of Ref. [7]. In the duotensor formalism presented in the previous section, the result of Ref. [7] reads: any non-signalling stochastic map S can be written as an affine combination of product stochastic maps,

S = Σ_{α∈A} q_α s^α_1 ⊗ · · · ⊗ s^α_n,

where n is the number of input/output systems, q_α ∈ R, Σ_{α∈A} q_α = 1, and the s^α_i are stochastic maps. That is, the q_α define a quasiprobability distribution q over the set A. We can therefore equivalently draw this with a single control variable copied to each party, where the white dot is the copy operation, the quasiprobability distribution q is a black triangle because it is not physically realisable (it can have negative coefficients), and the S_i are stochastic maps controlled by the variable A.
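For the simplest Bell scenario, this decomposition can be computed explicitly. The sketch below (our own illustration) writes the non-signalling PR box as an affine combination of the 16 product deterministic strategies, with weights q_α summing to 1 but necessarily containing negative entries, since the PR box lies outside the local polytope:

```python
import numpy as np
from itertools import product

# Deterministic local strategies for one party: a = f(x), the 4 maps {0,1}->{0,1}
strategies = list(product([0, 1], repeat=2))  # f encoded as (f(0), f(1))

def behaviour(fA, fB):
    # Product deterministic behaviour P(ab|xy), flattened to a 16-vector
    P = np.zeros((2, 2, 2, 2))  # indices a, b, x, y
    for x, y in product([0, 1], repeat=2):
        P[fA[x], fB[y], x, y] = 1.0
    return P.flatten()

A = np.array([behaviour(fA, fB) for fA in strategies for fB in strategies]).T

# PR box: P(ab|xy) = 1/2 if a XOR b == x*y, else 0 (non-signalling, nonlocal)
pr = np.zeros((2, 2, 2, 2))
for a, b, x, y in product([0, 1], repeat=4):
    if (a ^ b) == x * y:
        pr[a, b, x, y] = 0.5
pr = pr.flatten()

# Solve A q = pr together with the affine constraint sum(q) = 1
M = np.vstack([A, np.ones(16)])
rhs = np.append(pr, 1.0)
q, *_ = np.linalg.lstsq(M, rhs, rcond=None)
assert np.allclose(A @ q, pr)        # exact affine decomposition
assert np.isclose(q.sum(), 1.0)      # weights form a quasiprobability
assert (q < -1e-9).any()             # genuinely quasi: some weights negative
```

The negative weights are unavoidable here: a nonnegative solution would place the PR box inside the local polytope, contradicting its Bell-inequality violation.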
The duotensor formalism of Ref. [91], together with the above understanding of the geometry of GPT transformations, allows us to easily lift this result to arbitrary no-signalling channels in arbitrary tomographically local GPTs.
Theorem 5.1. Any non-signalling GPT channel C in a tomographically local GPT G can be written as an affine combination of product channels.
Proof. Consider some n-partite non-signalling channel C in a tomographically local GPT. By decomposing the input and output identities using Eq. (29), we obtain: We can then note that: is a non-signalling stochastic map, and hence we can apply the result of Ref. [7] to obtain: where the s^α_i are stochastic maps, q_α ∈ R, and Σ_{α∈A} q_α = 1. By substituting this back into Eq. (50), we obtain: where the x^α_i are defined as shown. It is then easy to check that the x^α_i are discard-preserving: We can then use Thm. 4.1 to write each x^α_i as an affine combination of GPT channels: We can write this instead as: (59) where C^α_i is a classically controlled channel and R^α_i is a quasidistribution. Now, let us define: where C_i is a classically controlled channel and R_i is a quasistochastic map.
Putting this together with Eq. (52) we find that: where Q is the quasidistribution defined by the q_α, i.e.: Eq. (67) gives us a quasiprobability distribution over a set of variables, one for each C_i. In the remainder of this proof we show that this can be rewritten as a quasidistribution over a single variable, which is then copied to each of the C_i's. Diagrammatically, this means that the copy operation should be the last operation prior to the C_i's. It is then this quasidistribution over a single variable which defines our affine combination of product channels. Now, define the "all but system i" marginalisation maps D_i as: where the case i = 1 follows similarly. We can then write:
where in the last step we have simply merged together parallel wires into a single composite wire, whilst using Eq. (18) to write the composite of copies as a copy of the composite. By decomposing the quasidistribution Q we can equivalently write this as: where c β i are GPT channels and q β is a quasidistribution. That is, any no-signalling GPT channel can be written as an affine combination of product GPT channels.
Note that, if we have a GPT, such as quantum theory, in which one can always reversibly encode classical data into a GPT system, then we can rewrite this as:

where E is the encoding map, D the decoding map, s_Q is some vector which is not necessarily a physical GPT state, and where the C_i are GPT channels.

Outlook
In this work we have provided a characterisation of multipartite non-signalling channels in arbitrary locally-tomographic theories: these channels can always be represented as affine combinations of local channels. In the case where the input and output system types are classical, i.e., where the channel is a multipartite non-signalling stochastic map, we recover the result of Ref. [7]. In the case of bipartite non-signalling channels whose inputs and outputs are quantum systems, we in turn recover the results of Ref. [8].
Our proof technique highlights the usefulness of the duotensor formalism, and we hope this will motivate its use throughout the quantum community. In particular, we show how it can be used to lift properties of multipartite stochastic maps to arbitrary tomographically local GPTs. This raises the question of which other properties of stochastic maps can be similarly lifted.
Lemma 1. MP = DP ∩ K.

Proof. Since MP ⊆ DP ∩ K, all that remains is to show the opposite containment. Let φ ∈ DP ∩ K be a fixed, arbitrary vector. Since φ ∈ K, we can write it as φ = Σ_{i=1}^k s_i ∘ e_i, where k is finite, s_1, ..., s_k ∈ Ω_T are normalised states, and e_1, ..., e_k ∈ T_S are in the effect cone.
It remains to show that e_1, ..., e_k sum to the discarding effect. Since φ ∈ DP, discarding the output of φ equals discarding the input; evaluating this on the decomposition above, and using that the states s_i are normalised, yields that Σ_{i=1}^k e_i equals the discarding effect, which is the desired equality. Finally, note that by the partial order given in Section 4.2, each e_i in the sum above satisfies e_i ∈ E_S, so φ satisfies all requirements for membership in MP.

Lemma 2. Suppose S is a set and K, the cone it generates, is a full-dimensional cone, i.e., V = K − K. Suppose for x ∈ K we have that for all s ∈ S there exists t_s > 0 such that x − ts ∈ K for all t ∈ [0, t_s]. Then x ∈ core(K).
Note that the only difference between the definition of core(K) and the condition above is that the vectors in the statement above are not arbitrary but rather belong to a set which generates a full-dimensional cone.
Proof of Lemma 2. Since V = K − K, any arbitrary v ∈ V can be written as v = y − z where y, z ∈ K. Then for x ∈ K and t ≥ 0 we can write x + tv = x + ty − tz. So, for x to be in core(K), it suffices to find a t_v > 0 such that x + tv ∈ K for all t ∈ [0, t_v]. To do that, we have two cases to analyse: z = 0 and z ≠ 0. Note that if z = 0, then x + tv = x + ty ∈ K for all t ≥ 0 since x, y ∈ K and K is a cone, so this case is trivial. Suppose instead that z ∈ K is nonzero; then we can write it as z = Σ_i α_i s_i, where α_i > 0, s_i ∈ S, and the sum is finite. For brevity, define a = Σ_i α_i > 0. By hypothesis, for each i let t_i > 0 be such that x − a t s_i ∈ K for all t ∈ [0, t_i]. This exists since a is positive: by assumption there is some t′_i such that x − t s_i ∈ K for all t ∈ [0, t′_i], so we may take t_i = t′_i / a. We now have

x + tv = ty + Σ_i (α_i / a)(x − a t s_i),

which is in K for all t ∈ [0, t_v], where t_v := min_i {t_i} (which is positive since there are finitely many indices i). This concludes the proof.
We use this special-case characterisation of a core element to prove the following lemma, which is helpful in our proof of Theorem 4.1.
Proof. Clearly µ^T_S ∈ DP, as µ is normalised, so all that remains is to show that it is in core(K). Define S as in Equation (82). Since K (as defined in Equation (41)) is the convex hull of S and L(V_S, V_T) = K − K, by Lemma 2 it suffices to show that, for fixed s ∈ T_T and e ∈ T_S, there exists t̄ > 0 such that