Limits on non-local correlations from the structure of the local state space

The outcomes of measurements on entangled quantum systems can be nonlocally correlated. However, while it is easy to write down toy theories allowing arbitrary nonlocal correlations, those allowed in quantum mechanics are limited. Quantum correlations cannot, for example, violate a principle known as macroscopic locality, which implies that they cannot violate Tsirelson's bound. This work shows that there is a connection between the strength of nonlocal correlations in a physical theory and the structure of the state spaces of individual systems. This is illustrated by a family of models in which local state spaces are regular polygons, and in which a natural analogue of a maximally entangled state of two systems exists. We characterize the nonlocal correlations obtainable from such states. The family allows us to study the transition between classical, quantum, and super-quantum correlations by varying only the local state space. We show that the strength of nonlocal correlations (in particular, whether the maximally entangled state violates Tsirelson's bound or not) depends crucially on a simple geometric property of the local state space, known as strong self-duality. This result is a special case of a general theorem, which states that a broad class of entangled states in probabilistic theories, including, by extension, all bipartite classical and quantum states, cannot violate macroscopic locality. Finally, our results show that there exist models which are locally almost indistinguishable from quantum mechanics, but can nevertheless generate maximally nonlocal correlations.


Introduction
Nonlocality is a key feature of quantum mechanics. By performing measurements on separated systems in an entangled state, one can obtain correlations that are stronger than those of any local model, as witnessed by the violation of Bell inequalities [1]. On the other hand, sets of nonlocal correlations are known that are stronger than those of quantum mechanics, but which do not allow for instantaneous signalling. This led Popescu and Rohrlich [2] to raise the question of why nonlocality seems to be limited in nature.
In recent years, new insights into this question have been gained by studying the information-theoretic properties of super-quantum correlations. For instance, these correlations would make communication complexity trivial: every communication complexity problem could be solved with only a constant amount of communication [3,4,5]. The principle of information causality [6] is satisfied by quantum correlations, but can be violated if certain super-quantum correlations are available; the same holds for the principle of macroscopic locality [7]. Various multi-player games have been described for which super-quantum correlations would provide an advantage over quantum correlations [8,9].
The above studies focused on the information-theoretic power of correlations without any reference to the physical theories from which they emerge. Recent work has revealed interesting connections between the structure of quantum mechanics and the nonlocal correlations that can be generated by quantum systems. Barnum et al. [10], for example, considered a theory that is locally equivalent to quantum mechanics but whose nonlocality is limited only by the no-signalling principle. Although this theory is less restrictive than quantum mechanics, the set of bipartite correlations that can be obtained is identical to that obtainable from quantum states. This implies that, even though quantum correlations are clearly a global property of joint systems, their limitation does not result from a restriction on the available joint states, but rather from the structure of the local state spaces. Meanwhile, Acín et al. [11] have shown that this result does not extend to three or more parties.
In this paper, we show that the connection between local state spaces and the limitation of bipartite nonlocal correlations is a more general phenomenon. In particular, if local state spaces have a property known as strong self-duality, then the correlations obtainable from maximally entangled states must be compatible with the principle of macroscopic locality. It follows that they must also respect Tsirelson's bound. A precise definition of strong self-duality is given later, but in the quantum case it corresponds roughly to the fact that the same rank-one projector represents both a pure state and the outcome of a measurement which identifies that state.
By way of illustration, we introduce a family of models in which each model is defined by the local state space of a single system, taken to be a regular polygon with n vertices (see figure 1). For two such systems, there is a natural analogue of a maximally entangled state. The family includes the classical case of two trits (n = 3), systems generating the super-quantum correlations introduced by Popescu and Rohrlich (n = 4), and systems producing quantum correlations (n → ∞). Thus the family allows us to study the transition between these theories, and the bipartite correlations that can be produced by a maximally entangled state, by modifying only the local state space. For large n the local state spaces are almost indistinguishable from a quantum system. Nevertheless, it turns out that these models show dramatically different correlations, and thereby have fundamentally different information-theoretic capabilities, depending on the parity of n. This is explained by the fact that the polygons with odd n are strongly self-dual, while those with even n are only weakly self-dual.
One way of viewing the polygon models is that moving from n → ∞ to n = 3, there is a progressive weakening of the superposition principle. A weakened superposition principle means that states can only be superposed in certain combinations. In a similar spirit, a different range of models was introduced in Ref. [12], with each model defined by a relaxation of the uncertainty relations of quantum mechanics. Here too, a transition from quantum correlations to Popescu-Rohrlich correlations was observed.
This paper is organized as follows. Section 2 gives a brief, not too technical, introduction to a mathematical formalism in which a very broad range of probabilistic theories can be expressed, including quantum theory and classical probability theory. Section 3 introduces the polygon models, and by investigating the properties of bipartite correlations, sheds some light on the relation between these and the local state space structure. Section 4 returns to the general case and contains the proof of the main theorem, which establishes a rigorous limit on the nonlocal correlations obtainable from a broad class of bipartite states in general probabilistic theories. In particular, states obtainable by norm-preserving local transformations from what we call inner product states cannot violate the principle of macroscopic locality. Section 5 provides a formal definition of strong and weak self-duality, and discusses consequences of the main theorem for the correlations in bipartite polygon systems. Section 6 presents a strongly self-dual system in which a non-maximally entangled state gives rise to correlations that cannot be obtained from any inner product state. Finally, section 7 discusses some open questions.

Systems and measurements
This section describes briefly the framework of generalized probabilistic theories [13], using the notation and conventions of Ref. [14]. The aim is to be able to describe theoretical models other than the classical and quantum theories, and for these two to be included as special cases.
We start by taking an operational point of view. A state of a system is a mathematical object that defines the outcome probabilities for all the measurements that can possibly be performed on this system. The state space Ω of a system is the set of states that it can be prepared in.
By defining the operations of summation and multiplication by a real number on states, we can identify $p\,\omega_1 + (1-p)\,\omega_2$ as the probabilistic mixture obtained by preparing $\omega_1$ with probability p and $\omega_2$ with probability 1 − p. The state space Ω is then a convex set, embedded in a real vector space V. For simplicity, assume that Ω is compact and finite-dimensional. States that can be written as convex combinations of other states are mixed states. The extremal points of Ω cannot be written in this form, and are the pure states. For a quantum system, for example, Ω is the set of density operators on a Hilbert space, and the pure states are the rank-one projectors. For a qubit, Ω is particularly easy to visualize, since it corresponds to the Bloch ball, with pure states on the surface of the ball. For a (finite-dimensional) classical system, Ω is the set of probability distributions over some finite sample space.
A measurement outcome is represented by an effect, that is, a map $e : \Omega \to [0,1]$, where $e(\omega)$ is the probability of obtaining the outcome e when the measurement is performed on a system in the state ω. Probabilities of measurement outcomes should respect probabilistic mixtures of states, meaning that $e[p\,\omega_1 + (1-p)\,\omega_2] = p\,e(\omega_1) + (1-p)\,e(\omega_2)$, i.e., effects are affine maps. A special effect is the unit effect u, which is uniquely defined by $u(\omega) = 1$ for all $\omega \in \Omega$. The unit effect represents a measurement with a single outcome that is certain to occur regardless of the state. An arbitrary measurement is a set of effects $\{e_i\}$ summing to the unit effect, $\sum_i e_i = u$. This ensures that the outcome probabilities of any measurement sum to one.
The set of proper effects E(Ω) = {e : 0 ≤ e(ω) ≤ 1 ∀ω ∈ Ω} is the convex hull of the unit effect, the zero effect and a set of extremal effects. For a quantum system, if states are density operators on a Hilbert space, then effects can be identified with positive semidefinite operators on the Hilbert space, in such a way that outcome probabilities are given by the usual trace rule. Measurements correspond to positive operator-valued measures. For a classical system, effects can be identified with fuzzy indicator functions on the sample space, i.e., maps from the sample space into [0, 1].
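As a concrete check of these definitions, the following minimal sketch represents qubit effects as 2 × 2 real matrices (spin projectors in the xz-plane of the Bloch ball), computes outcome probabilities with the trace rule, and confirms that a dichotomic measurement {e, u − e} gives probabilities summing to one and that e acts affinely on a mixture. It uses plain Python only; the helper names (`projector`, `prob`) are illustrative, not part of any library.

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def projector(theta):
    """Rank-one projector onto spin up along the xz-plane direction at polar angle theta."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c * c, c * s], [s * c, s * s]]

def prob(effect, rho):
    """Outcome probability e(rho) = Tr[e rho] (the trace rule)."""
    return trace(matmul(effect, rho))

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]  # the unit effect u

# Dichotomic measurement {e, u - e}: its effects sum to the unit effect by construction.
e = projector(math.pi / 3)
e_bar = [[IDENTITY[i][j] - e[i][j] for j in range(2)] for i in range(2)]

# Two pure states (spin up and spin down along z) and their equal-weight mixture.
rho1, rho2 = projector(0.0), projector(math.pi)
mix = [[0.5 * rho1[i][j] + 0.5 * rho2[i][j] for j in range(2)] for i in range(2)]

p_up = prob(e, rho1)                        # cos^2(pi/6) = 0.75
total = prob(e, rho1) + prob(e_bar, rho1)   # outcome probabilities sum to one
affine_ok = abs(prob(e, mix) - (0.5 * prob(e, rho1) + 0.5 * prob(e, rho2))) < 1e-12
```

The same pattern (states and effects as arrays, probabilities as a pairing) is reused for the polygon models, where the pairing becomes a Euclidean dot product.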

Unnormalized states
It is frequently useful to work with unnormalized states. Given a state space Ω and effect space E(Ω), let V be the linear span of Ω. The linear span of E(Ω) is then the dual space $V^*$. Both V and $V^*$ are real vector spaces. In the case of a quantum system, for example, V is the linear span of the density operators, which is the set of all Hermitian operators on the corresponding Hilbert space. Similarly, $V^*$ is the linear span of the positive semidefinite operators, which is also the set of all Hermitian operators.
An unnormalized state is an element of V of the form $r\,\omega$, with r > 0 and $\omega \in \Omega$. The set of all unnormalized states is a cone, denoted $V_+$. Similarly, an unnormalized effect is an element of $V^*$ of the form $r\,e$ for r > 0 and $e \in E(\Omega)$. The set of unnormalized effects is the dual cone to $V_+$, denoted $V_+^*$. The cone $V_+$ and the dual cone $V_+^*$ are related via
$$V_+^* = \{\, e \in V^* : e(\omega) \ge 0 \ \ \forall\, \omega \in V_+ \,\}. \tag{1}$$
In the case of a quantum system, both $V_+$ and $V_+^*$ can be identified with the set of positive semidefinite operators on the Hilbert space. In general, a cone $V_+$ can have a very different structure from its dual cone $V_+^*$; for example, they may have different numbers of extremal rays.
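For a polyhedral cone, such as those of the classical and polygon models, membership in the dual cone can be tested on the generators of $V_+$ alone, because e is linear and every element of $V_+$ is a non-negative combination of extremal rays. A minimal sketch, assuming states and effects are stored as coordinate tuples with e(ω) given by the dot product; the function name `in_dual_cone` is hypothetical:

```python
def in_dual_cone(e, generators, tol=1e-12):
    """Return True if e(w) >= 0 for every generator w of V_+.

    By linearity this is equivalent to e(w) >= 0 for all w in the cone V_+.
    """
    return all(sum(ei * wi for ei, wi in zip(e, w)) >= -tol for w in generators)

# Classical trit: V_+ is the non-negative orthant of R^3, generated by the three
# pure states (deterministic distributions), so V_+^* is again the non-negative orthant.
trit_rays = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

fuzzy_indicator = (0.2, 0.0, 1.0)   # non-negative on every state: an unnormalized effect
not_an_effect = (1.0, -0.1, 0.5)    # negative on the second pure state

print(in_dual_cone(fuzzy_indicator, trit_rays))  # True
print(in_dual_cone(not_an_effect, trit_rays))    # False
```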

Bipartite states
Given two systems A and B, an operational model needs to specify the set Ω AB of available joint states, in addition to the individual state spaces Ω A and Ω B . In general, one can imagine many weird and wonderful ways in which two systems might combine to form a joint system. By imposing two quite natural conditions, however, one can narrow down these possibilities significantly.
The first condition is the no-signalling principle, which says that it should not be possible to send messages instantaneously by performing measurements on the separate parts of a joint system. The second is that of local tomography. Given a single system, call a measurement informationally complete if its outcome probabilities are sufficient to determine uniquely the state of the system. The principle of local tomography states that if an informationally complete measurement is performed separately on each of the subsystems of a composite system, then the joint outcome probabilities are sufficient to determine uniquely the state of the joint system.
These two conditions together are sufficient to ensure that the linear space $V_{AB}$, in which the joint state space $\Omega_{AB}$ and the cone of associated unnormalized states are embedded, can be taken to be $V_A \otimes V_B$ (see for example Ref. [14] and the references therein). If simultaneous measurements are performed on systems A and B, then the joint probability for outcomes e and f is given by $(e \otimes f)(\omega_{AB})$.
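It is convenient to define the unit effect of the joint state space as $u_{AB} = u_A \otimes u_B$, such that a joint state is normalized if
$$(u_A \otimes u_B)(\omega_{AB}) = 1, \tag{2}$$
where $u_A$ and $u_B$ are the unit effects for systems A and B respectively. Naturally, probabilities are non-negative, so a joint state must satisfy
$$(e_A \otimes e_B)(\omega_{AB}) \ge 0 \tag{3}$$
for all $e_A \in E(\Omega_A)$, $e_B \in E(\Omega_B)$.

Definition 1. The maximal tensor product of $\Omega_A$ and $\Omega_B$, denoted $\Omega_A \otimes_{\max} \Omega_B$, is the set of all $\omega_{AB} \in V_A \otimes V_B$ such that (2) and (3) are satisfied.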
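It is easy to check that the no-signalling principle is indeed satisfied for every state in $\Omega_A \otimes_{\max} \Omega_B$. Consider two measurements on A, corresponding to sets of effects $x = \{e_1, \ldots, e_m\}$ and $x' = \{e'_1, \ldots, e'_{m'}\}$. Since both sets sum to the unit effect $u_A$, linearity gives, for any outcome f on B,
$$\sum_{i=1}^{m} (e_i \otimes f)(\omega_{AB}) = (u_A \otimes f)(\omega_{AB}) = \sum_{j=1}^{m'} (e'_j \otimes f)(\omega_{AB}),$$
i.e., the marginal probability of f is independent of whether x or x′ is performed on A.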
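The no-signalling argument only uses linearity and the fact that each measurement sums to the unit effect. The sketch below checks it numerically for a joint state of two classical trits stored as a 3 × 3 matrix M, with $(e \otimes f)(\omega_{AB}) = e^T M f$. In this illustrative representation each effect is a vector of outcome probabilities for the three trit values, so the unit effect is (1, 1, 1); the particular state and measurements are arbitrary choices.

```python
def joint_prob(e, m, f):
    """(e ⊗ f)(omega_AB) for a joint state stored as a matrix m: e^T m f."""
    return sum(e[i] * m[i][j] * f[j] for i in range(len(e)) for j in range(len(f)))

def marginal_b(measurement_a, m, f):
    """Probability of outcome f on B, summed over all outcomes of A's measurement."""
    return sum(joint_prob(e, m, f) for e in measurement_a)

# An arbitrary normalized, non-negative joint distribution of two trits.
M = [[0.2, 0.1, 0.0],
     [0.0, 0.3, 0.1],
     [0.1, 0.0, 0.2]]

x = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]         # read off A's trit value
x_prime = [(1, 1, 0), (0, 0, 1)]              # coarse-grained two-outcome measurement
x_fuzzy = [(0.5, 0.5, 0.5), (0.5, 0.5, 0.5)]  # a fair coin flip that ignores the system

f = (0, 1, 0)  # Bob obtains trit value 2
marginals = [marginal_b(meas, M, f) for meas in (x, x_prime, x_fuzzy)]
```

All three entries of `marginals` agree (up to rounding), whichever measurement Alice performs.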
Intuitively, the maximal tensor product is the set of all non-signalling joint states that can be written down for two systems, given the individual state spaces $\Omega_A$ and $\Omega_B$. A particular theory or model need not assume that every element of the maximal tensor product is an allowed state of the joint system. In general, a model will specify a joint state space $\Omega_{AB}$ which is a subset of $\Omega_A \otimes_{\max} \Omega_B$.
Straightforwardly generalizing the notions well known from quantum theory, one calls a state a product state if it can be written in the form $\omega_A \otimes \omega_B$ for some states $\omega_A \in \Omega_A$ and $\omega_B \in \Omega_B$. States that can be written as probabilistic mixtures of product states are separable, while states that are not separable are entangled.
This work mostly considers correlations obtained from product measurements on bipartite states. The general formalism, however, does not assume that all measurements on composite systems are product measurements. As in the case of single systems, outcomes of measurements on a composite system correspond to effects, where these are maps $\Omega_{AB} \to [0,1]$. The set of all such effects is written $E(\Omega_{AB})$, and may include entangled as well as product effects. However, $E(\Omega_A \otimes_{\max} \Omega_B)$ only contains separable effects.
Quantum theory provides a useful example of many of the concepts above. In this case, $\Omega_{AB}$ is the set of density operators on the Hilbert space $H_{AB} = H_A \otimes H_B$. Recall that $V_A$ and $V_B$ are the real vector spaces of Hermitian operators on $H_A$ and $H_B$ respectively. The set of Hermitian operators on $H_{AB}$ can be identified with $V_A \otimes V_B$, so the joint quantum states are indeed elements of $V_A \otimes V_B$. The density operators on $H_{AB}$ form a proper subset of $\Omega_A \otimes_{\max} \Omega_B$. Elements of $\Omega_A \otimes_{\max} \Omega_B$ which are not density operators are (normalized) entanglement witnesses. An entanglement witness w is locally positive, meaning that $(e_A \otimes e_B)(w) \ge 0$ for all product measurement outcomes. But w is not a density operator, since there are entangled measurement outcomes e with $e(w) < 0$.

Polygon systems
This section defines a family of models in which the state spaces Ω of single systems are regular polygons with n vertices. It is convenient to represent both states and effects by vectors in $\mathbb{R}^3$, such that $e(\omega)$ is the usual Euclidean inner product. For fixed n, let Ω be the convex hull of n pure states $\{\omega_i\}$, i = 1, ..., n, with
$$\omega_i = \begin{pmatrix} r_n \cos(2\pi i/n) \\ r_n \sin(2\pi i/n) \\ 1 \end{pmatrix},$$
where $r_n = \sqrt{\sec(\pi/n)}$. The unit effect is
$$u = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$
In the case of even n, the set E(Ω) of all possible measurement outcomes is the convex hull of the zero effect, the unit effect, and $e_1, \ldots, e_n$, with
$$e_i = \frac{1}{2}\begin{pmatrix} r_n \cos((2i-1)\pi/n) \\ r_n \sin((2i-1)\pi/n) \\ 1 \end{pmatrix}.$$
Let $\bar{e}_i = u - e_i$; hence a possible dichotomic measurement is $\{e_i, \bar{e}_i\}$. When this measurement is performed on a system in the state $\omega_j$, the probabilities for the two outcomes are given by $e_i \cdot \omega_j$ and $\bar{e}_i \cdot \omega_j$, and satisfy $e_i \cdot \omega_j + \bar{e}_i \cdot \omega_j = 1$. Observe that for even n, $\bar{e}_i = e_{(i+n/2) \bmod n}$. The case of odd n is slightly different. In this case, define
$$e_i = \frac{1}{1 + r_n^2}\begin{pmatrix} r_n \cos(2\pi i/n) \\ r_n \sin(2\pi i/n) \\ 1 \end{pmatrix},$$
and again let $\bar{e}_i = u - e_i$, so that a possible dichotomic measurement is $\{e_i, \bar{e}_i\}$. This time, however, $\bar{e}_i$ does not equal $e_j$ for any j. The set E(Ω) of all possible measurement outcomes is the convex hull of the zero effect, the unit effect, $e_1, \ldots, e_n$, and $\bar{e}_1, \ldots, \bar{e}_n$. As can be seen in figure 2, in such theories there are effects that are extremal in E(Ω) (namely the $\bar{e}_i$) but not ray extremal, i.e., they do not lie on an extremal ray of the cone $V_+^*$. This also happens in quantum mechanics, but only if the dimension of the Hilbert space is larger than two. For example, the effect $\mathbb{1} - |\psi\rangle\langle\psi|$ for any rank-one projector $|\psi\rangle\langle\psi|$ is then extremal in the set of proper effects, but not ray extremal. A two-dimensional illustration of the state and effect spaces is given in figure 1, and a three-dimensional illustration in figure 2.

Figure 2. State spaces Ω (blue polygons) and sets of proper effects E(Ω) (red polytopes) of the polygon toy theories with n vertices. The case n = 3 corresponds to a classical system; the n = 4 system is capable of generating all no-signalling correlations. In the limit n → ∞ the state space becomes a disc, which can be thought of as the equatorial plane of the Bloch ball.
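The definitions above can be checked numerically. The following sketch builds the pure states and the extremal effects $e_i$ for a given n, then verifies that every $e_i(\omega_j)$ lies in [0, 1], that $e_i(\omega_i) = 1$, and, for even n, that $\bar{e}_i = e_{(i+n/2) \bmod n}$. It assumes $r_n = \sqrt{\sec(\pi/n)}$, which is the value for which these properties hold; the function names `polygon` and `check_polygon` are illustrative.

```python
import math

UNIT = (0.0, 0.0, 1.0)  # the unit effect u

def polygon(n):
    """Pure states omega_1..omega_n and extremal effects e_1..e_n of the n-gon model.

    Vectors live in R^3 and e(omega) is the Euclidean dot product.
    Assumes r_n = sqrt(sec(pi/n)).
    """
    r = math.sqrt(1.0 / math.cos(math.pi / n))
    states = [(r * math.cos(2 * math.pi * i / n), r * math.sin(2 * math.pi * i / n), 1.0)
              for i in range(1, n + 1)]
    if n % 2 == 0:
        effects = [(0.5 * r * math.cos((2 * i - 1) * math.pi / n),
                    0.5 * r * math.sin((2 * i - 1) * math.pi / n), 0.5)
                   for i in range(1, n + 1)]
    else:
        k = 1.0 / (1.0 + r * r)
        effects = [(k * r * math.cos(2 * math.pi * i / n),
                    k * r * math.sin(2 * math.pi * i / n), k)
                   for i in range(1, n + 1)]
    return states, effects

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def check_polygon(n, tol=1e-12):
    states, effects = polygon(n)
    probs = [[dot(e, w) for w in states] for e in effects]
    # Every e_i gives a valid probability on every pure state ...
    assert all(-tol <= p <= 1 + tol for row in probs for p in row)
    # ... and e_i identifies omega_i with certainty.
    assert all(abs(probs[i][i] - 1) < tol for i in range(n))
    if n % 2 == 0:
        # For even n the complement u - e_i is again extremal: it equals e_{i + n/2}.
        for i in range(n):
            e_bar = tuple(u - x for u, x in zip(UNIT, effects[i]))
            partner = effects[(i + n // 2) % n]
            assert all(abs(a - b) < tol for a, b in zip(e_bar, partner))
    return True
```

Since effects are affine, checking the pure states suffices for the [0, 1] condition on the whole polygon.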
The n = 3 case corresponds to a classical system with three pure states; think of it as a trit. The three pure states are $\omega_1$, $\omega_2$ and $\omega_3$, and correspond to the three possible values of the trit. The state space Ω is a triangle. A generic point in Ω is a mixture of the three pure states and corresponds to a probability distribution over the three trit values. Notice that in this case $e_1 + e_2 + e_3 = u$; hence a possible measurement is a three-outcome measurement with outcomes $e_1$, $e_2$ and $e_3$. This is the obvious measurement that simply reads off the value of the trit. Below we shall consider bipartite states of polygon systems. Given two trits, the only possible joint states are separable, and it is not possible to produce nonlocal correlations. The case n = 4 corresponds to a single system in a toy theory known as 'box world', which has been discussed elsewhere in the literature (see for instance Ref. [13]). The state space is a square. As shown below, a notable feature of box world is that, given two of these systems, it is possible to construct joint states that are more nonlocal than quantum states. In fact, an entangled state of two n = 4 systems can produce the maximally nonlocal correlations known as PR box correlations [2], which have been much explored in the literature [3,4,6,8].
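For the trit, the polygon construction gives $r_3^2 = \sec(\pi/3) = 2$ and $e_i = \omega_i/3$, so one can confirm directly that $e_1 + e_2 + e_3 = u$ and $e_i(\omega_j) = \delta_{ij}$, i.e., the three-outcome measurement reads off the trit deterministically. A small sketch under the same vector conventions (assuming $r_n = \sqrt{\sec(\pi/n)}$; variable names are illustrative):

```python
import math

r = math.sqrt(1.0 / math.cos(math.pi / 3))  # r_3 = sqrt(sec(pi/3)) = sqrt(2)
states = [(r * math.cos(2 * math.pi * i / 3), r * math.sin(2 * math.pi * i / 3), 1.0)
          for i in (1, 2, 3)]
# Odd-n effects e_i = omega_i / (1 + r^2), which here is omega_i / 3.
effects = [tuple(x / (1 + r * r) for x in w) for w in states]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

effect_sum = tuple(sum(e[k] for e in effects) for k in range(3))  # should equal u = (0, 0, 1)
prob_table = [[dot(e, w) for w in states] for e in effects]       # should be the identity
```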
As n → ∞, the state space tends to a disc of radius one. This makes it similar to a quantum mechanical qubit, whose state space is the Bloch ball. The disc can be thought of as the equatorial plane of the Bloch ball. We will refer to this case, somewhat loosely, as the quantum case.

Bipartite states of polygon systems
We shall not attempt a complete characterization of the set of all possible non-signalling states $\Omega_A \otimes_{\max} \Omega_B$ for each value of n. Instead, this section describes a particular joint state of two polygon systems, which is the natural analogue of a maximally entangled state of two qubits. The next section examines the nonlocal correlations that can be obtained from performing measurements on these maximally entangled polygon systems.
Recall that a joint state is an element of $V_A \otimes V_B$; hence, in the case of two polygon systems, a joint state is an element of $\mathbb{R}^3 \otimes \mathbb{R}^3 = \mathbb{R}^9$. It is convenient to represent the joint state as a 3 × 3 matrix, such that $(e_i \otimes e_j)(\omega_{AB})$ can be calculated by simply left and right multiplying this matrix by the representations of the effects $e_i$ and $e_j$ in $\mathbb{R}^3$, i.e., $(e_i \otimes e_j)(\omega_{AB}) = e_i^{T}\, \omega_{AB}\, e_j$. Define
$$\phi_{AB} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \ \text{for odd } n, \qquad \phi_{AB} = \begin{pmatrix} \cos(\pi/n) & \sin(\pi/n) & 0 \\ -\sin(\pi/n) & \cos(\pi/n) & 0 \\ 0 & 0 & 1 \end{pmatrix} \ \text{for even } n. \tag{9}$$
The state $\phi_{AB}$ is the natural analogue of a quantum mechanical maximally entangled state for the following reasons. First, it can be verified (see, e.g., Ref. [15]) that except for n = 3, $\phi_{AB}$ is an entangled pure state, where pure means that it is extremal in the maximal tensor product, and hence cannot be written as a mixture of other non-signalling states. The n = 3 case corresponds to two classical trits, with $\phi_{AB}$ the maximally correlated state, i.e., if the trit values are 1, 2, 3, then $\phi_{AB}$ corresponds to P(11) = P(22) = P(33) = 1/3. Second, $\phi_{AB}$ is constructed so that if a measurement is performed on the A system and outcome $e_i$ is obtained, then the updated (or collapsed) state of the B system is $\omega_i$. The marginal probability for Alice to obtain outcome $e_i$ is the same for all i. Compare this with the case of two spin-1/2 particles in the state $(|00\rangle + |11\rangle)/\sqrt{2}$, where $|0\rangle$ and $|1\rangle$ are the eigenstates of spin-z. If a spin measurement in direction m in the xz-plane is performed on system A, then the probability of obtaining the up outcome is 1/2, and if the up outcome is obtained, then the collapsed state of the B system is spin up in direction m. These quantum predictions are recovered by $\phi_{AB}$ in the limit n → ∞.
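The two defining properties of $\phi_{AB}$ can be checked directly from the matrix representation. The sketch below assumes the matrix forms of $\phi_{AB}$ (the identity for odd n, and a rotation by π/n in the first two coordinates for even n) and the polygon vectors with $r_n = \sqrt{\sec(\pi/n)}$. It verifies normalization, that Alice's marginal for $e_i$ does not depend on i, and that the collapsed state of B, proportional to $\phi_{AB}^T e_i$, is $\omega_i$. Helper names are illustrative.

```python
import math

def polygon(n):
    """Pure states and extremal effects of the n-gon (assumes r_n = sqrt(sec(pi/n)))."""
    r = math.sqrt(1.0 / math.cos(math.pi / n))
    states = [(r * math.cos(2 * math.pi * i / n), r * math.sin(2 * math.pi * i / n), 1.0)
              for i in range(1, n + 1)]
    if n % 2 == 0:
        effects = [(0.5 * r * math.cos((2 * i - 1) * math.pi / n),
                    0.5 * r * math.sin((2 * i - 1) * math.pi / n), 0.5) for i in range(1, n + 1)]
    else:
        k = 1.0 / (1.0 + r * r)
        effects = [tuple(k * x for x in w) for w in states]
    return states, effects

def phi(n):
    """Matrix of the maximally entangled polygon state (assumed form, see text)."""
    if n % 2 == 1:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    c, s = math.cos(math.pi / n), math.sin(math.pi / n)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def bilinear(e, m, f):
    """(e ⊗ f)(m) = e^T m f."""
    return sum(e[i] * m[i][j] * f[j] for i in range(3) for j in range(3))

def check_phi(n, tol=1e-12):
    states, effects = polygon(n)
    m, u = phi(n), (0.0, 0.0, 1.0)
    assert abs(bilinear(u, m, u) - 1) < tol                 # normalization
    marginals = [bilinear(e, m, u) for e in effects]
    assert max(marginals) - min(marginals) < tol            # same marginal for every i
    for e, w in zip(effects, states):
        v = [sum(e[i] * m[i][j] for i in range(3)) for j in range(3)]  # phi^T e
        collapsed = [x / v[2] for x in v]                   # normalize so that u(state) = 1
        assert all(abs(a - b) < tol for a, b in zip(collapsed, w))
    return True
```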
The following sections investigate the nonlocal correlations that can be produced by performing measurements on two systems in the state $\phi_{AB}$. For this, it is useful to have an expression for the joint probability of obtaining outcome $e^A_i$ on system A and $e^B_j$ on system B. This is easy to calculate from (9). For even n,
$$P(e^A_i, e^B_j) = \frac{1}{4}\left[1 + r_n^2 \cos(\alpha_i - \beta_j)\right], \tag{10}$$
where $\alpha_i = 2\pi i/n$ and $\beta_j = (2j-1)\pi/n$, and as before $r_n = \sqrt{\sec(\pi/n)}$. For odd n,
$$P(e^A_i, e^B_j) = \frac{1 + r_n^2 \cos(\alpha_i - \beta_j)}{(1 + r_n^2)^2},$$
where $\alpha_i = 2\pi i/n$ and $\beta_j = 2\pi j/n$. Notice the cosine dependence, which is reminiscent of quantum mechanical correlations.
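A quick consistency check of the closed-form joint probabilities: the sketch evaluates $e_i^T \phi_{AB}\, e_j$ from the vectors and matrices (restated in `model`, assuming $r_n = \sqrt{\sec(\pi/n)}$) and compares the result with the cosine expressions (restated in `joint_closed_form`) for every pair i, j and several n. All names are illustrative.

```python
import math

def model(n):
    """Effects e_1..e_n and the matrix of phi_AB for the n-gon (assumed forms from the text)."""
    r = math.sqrt(1.0 / math.cos(math.pi / n))
    if n % 2 == 0:
        effects = [(0.5 * r * math.cos((2 * i - 1) * math.pi / n),
                    0.5 * r * math.sin((2 * i - 1) * math.pi / n), 0.5) for i in range(1, n + 1)]
        c, s = math.cos(math.pi / n), math.sin(math.pi / n)
        m = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]
    else:
        k = 1.0 / (1.0 + r * r)
        effects = [(k * r * math.cos(2 * math.pi * i / n),
                    k * r * math.sin(2 * math.pi * i / n), k) for i in range(1, n + 1)]
        m = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return effects, m

def joint_from_matrix(e, m, f):
    return sum(e[a] * m[a][b] * f[b] for a in range(3) for b in range(3))

def joint_closed_form(n, i, j):
    r2 = 1.0 / math.cos(math.pi / n)   # r_n^2 = sec(pi/n)
    alpha = 2 * math.pi * i / n
    if n % 2 == 0:
        beta = (2 * j - 1) * math.pi / n
        return (1 + r2 * math.cos(alpha - beta)) / 4
    beta = 2 * math.pi * j / n
    return (1 + r2 * math.cos(alpha - beta)) / (1 + r2) ** 2

def max_discrepancy(n):
    effects, m = model(n)
    return max(abs(joint_from_matrix(effects[i - 1], m, effects[j - 1]) - joint_closed_form(n, i, j))
               for i in range(1, n + 1) for j in range(1, n + 1))
```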

The Clauser-Horne-Shimony-Holt inequality
One commonly used measure of the degree of nonlocality that a bipartite system exhibits is the maximal violation of the Clauser-Horne-Shimony-Holt (CHSH) inequality [16]. The CHSH inequality involves two parties, conventionally called Alice and Bob. Each chooses between two dichotomic measurements. Let Alice's choice of measurement be x, and Bob's y, with x, y ∈ {0, 1}. Denote the measurement outcomes a, b ∈ {0, 1}. A set of correlations is characterized by the joint probability distribution P(a, b|x, y). The strength of the correlations is quantified by the CHSH parameter
$$S = E_{0,0} + E_{0,1} + E_{1,0} - E_{1,1},$$
where $E_{x,y} = P(0,0|x,y) + P(1,1|x,y) - P(0,1|x,y) - P(1,0|x,y)$. As CHSH showed, local correlations must satisfy S ≤ 2. In quantum mechanics, correlations can violate this inequality, but must respect Tsirelson's bound $S \le 2\sqrt{2}$ [17]. By inspection, the algebraic maximum of S is 4, and it is easy to see that it is attained by the following correlations:
$$P(a,b|x,y) = \begin{cases} 1/2 & \text{if } a \oplus b = xy, \\ 0 & \text{otherwise}. \end{cases}$$
Here, ⊕ denotes addition modulo 2. These correlations were described by Popescu and Rohrlich, who pointed out that they are maximally nonlocal, yet still respect the no-signalling principle [2]. Since they cannot occur in quantum mechanics, they are imagined to be produced by a fictitious device, often referred to as a PR box. As discussed in the introduction, PR boxes have been explored extensively in the literature and are known to be particularly powerful for certain kinds of information-theoretic problems, especially communication complexity problems [3,4,5,6,7,8,9].

It is interesting to see how the maximal CHSH value obtainable from polygon systems in the state $\phi_{AB}$ varies as the number of vertices n of the polygon increases. The n = 4 case is particularly simple. One optimal choice of measurements to violate the CHSH inequality is
$$x = 0: \{e^A_4, \bar{e}^A_4\}, \qquad x = 1: \{e^A_1, \bar{e}^A_1\}, \qquad y = 0: \{e^B_1, \bar{e}^B_1\}, \qquad y = 1: \{e^B_4, \bar{e}^B_4\},$$
and it can be verified from (10) that these measurements reproduce exactly the PR box correlations above, giving S = 4. For general n, suppose that Alice's measurement x is $\{e^A_{i_x}, \bar{e}^A_{i_x}\}$ and Bob's measurement y is $\{e^B_{j_y}, \bar{e}^B_{j_y}\}$. For even n,
$$E_{x,y} = r_n^2 \cos(\alpha_x - \beta_y),$$
where, as before, $\alpha_x = 2\pi i_x/n$ and $\beta_y = (2j_y - 1)\pi/n$. For odd n,
$$E_{x,y} = \frac{4 r_n^2 \cos(\alpha_x - \beta_y) + (1 - r_n^2)^2}{(1 + r_n^2)^2},$$
where $\alpha_x = 2\pi i_x/n$ and $\beta_y = 2\pi j_y/n$.
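The n = 4 claim can be verified mechanically. This sketch uses the even-n correlator $E_{x,y} = r_n^2\cos(\alpha_x - \beta_y)$, with $\alpha_x = 2\pi i_x/n$ and $\beta_y = (2j_y - 1)\pi/n$, together with the fact that for even n every outcome $e_i$ has marginal probability 1/2, so that $P(a,b|x,y) = [1 + (-1)^{a \oplus b} E_{x,y}]/4$. The index choice (Alice uses $e_4$ then $e_1$, Bob uses $e_1$ then $e_4$) is one valid option, not necessarily the only one; function names are illustrative.

```python
import math
from itertools import product

def correlator_even(n, i, j):
    """E for Alice measuring {e_i, ē_i} and Bob {e_j, ē_j} on phi_AB, even n."""
    r2 = 1.0 / math.cos(math.pi / n)          # r_n^2 = sec(pi/n)
    return r2 * math.cos(2 * math.pi * i / n - (2 * j - 1) * math.pi / n)

n = 4
alice = {0: 4, 1: 1}   # setting x -> effect index i_x
bob = {0: 1, 1: 4}     # setting y -> effect index j_y

E = {(x, y): correlator_even(n, alice[x], bob[y]) for x, y in product((0, 1), repeat=2)}
S = E[0, 0] + E[0, 1] + E[1, 0] - E[1, 1]

def p_polygon(a, b, x, y):
    # Uniform marginals (1/2) make P a function of E alone.
    return (1 + (-1) ** (a ^ b) * E[x, y]) / 4

def p_pr(a, b, x, y):
    return 0.5 if (a ^ b) == x * y else 0.0

max_deviation = max(abs(p_polygon(a, b, x, y) - p_pr(a, b, x, y))
                    for a, b, x, y in product((0, 1), repeat=4))
```

Up to floating-point rounding, S equals 4 and the polygon correlations coincide with the PR box entry by entry.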
Maximizing these expressions over all possible choices of the angles $\alpha_x$ and $\beta_y$ gives the maximal violation achievable by local measurements on the maximally entangled state $\phi_{AB}$. A detailed analysis of these expressions can be found in Appendix A. Figure 3 shows the maximal CHSH value for the maximally entangled state of polygon systems as a function of n.

Figure 3. Maximal CHSH value for the maximally entangled state of polygon systems as a function of n. Tsirelson's bound ($2\sqrt{2}$) appears as a natural separation between the case of even n and odd n.
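The maximization can be explored by brute force. The sketch below restricts both parties to the dichotomic measurements $\{e_i, \bar{e}_i\}$ built from the extremal effects $e_1, \ldots, e_n$ (an assumption: it does not search over noisier measurements) and scans all index choices using the even-n and odd-n correlator formulas, with $r_n^2 = \sec(\pi/n)$. For small n it reproduces the pattern described in the text: S = 2 for n = 3, S = 4 for n = 4, a value below $2\sqrt{2}$ for n = 5, and values at or above $2\sqrt{2}$ for n = 6 and n = 8. Function names are illustrative.

```python
import math
from itertools import product

def correlator(n, i, j):
    """E_{x,y} for Alice using {e_i, ē_i} and Bob using {e_j, ē_j} on phi_AB."""
    r2 = 1.0 / math.cos(math.pi / n)  # r_n^2 = sec(pi/n)
    alpha = 2 * math.pi * i / n
    if n % 2 == 0:
        return r2 * math.cos(alpha - (2 * j - 1) * math.pi / n)
    beta = 2 * math.pi * j / n
    return (4 * r2 * math.cos(alpha - beta) + (1 - r2) ** 2) / (1 + r2) ** 2

def max_chsh(n):
    """Largest S over all index choices i_0, i_1 (Alice) and j_0, j_1 (Bob)."""
    E = [[correlator(n, i, j) for j in range(n)] for i in range(n)]
    return max(E[i0][j0] + E[i0][j1] + E[i1][j0] - E[i1][j1]
               for i0, i1, j0, j1 in product(range(n), repeat=4))

TSIRELSON = 2 * math.sqrt(2)
for n in range(3, 11):
    s = max_chsh(n)
    print(n, round(s, 4), "reaches Tsirelson" if s > TSIRELSON - 1e-9 else "below Tsirelson")
```

Indices run over 0, ..., n − 1, which covers all effects because the formulas are periodic in i and j with period n.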
The most important feature of figure 3 is that the correlations of even n systems can always reach or exceed Tsirelson's bound, while the correlations of odd n systems always stay below it. Thus Tsirelson's bound appears as a natural separation between the correlations of these two kinds of polygon state spaces. Sections 4 and 5 explain why. Section 4 shows that for odd n, the maximally entangled state $\phi_{AB}$ belongs to a broad class of states we call inner product states, and that all correlations obtainable from measurements on inner product states satisfy Tsirelson's bound. Section 5 goes further, and relates this to a fundamental geometric difference between polygons with even n and odd n. In figure 1, the difference is seen in the fact that for odd n, the effect cone $V_+^*$ coincides with the state cone $V_+$, whereas for even n, the effect cone is isomorphic to the state cone but rotated through an angle π/n (the ray of $e_i$ lies at angle $(2i-1)\pi/n$, that of $\omega_i$ at $2\pi i/n$).
We have only considered correlations obtainable from the maximally entangled state φ AB . In principle there could be joint states other than the maximally entangled state which show stronger violations for some Bell inequalities. While this seems unlikely for the CHSH inequality, other Bell inequalities are known to be maximized by non-maximally entangled states in quantum mechanics [18].

The Braunstein-Caves inequalities
The Braunstein-Caves (or chained) Bell inequalities [19] are similar to the CHSH inequality, but involve N measurement settings on each system, rather than two. Let Alice's choice of measurement be x, and Bob's y, with x, y ∈ {1, ..., N}. Let the outcomes be a, b ∈ {0, 1}. Local correlations satisfy
$$S_N = \sum_{j=1}^{N} E_{j,j} + \sum_{j=1}^{N-1} E_{j,j+1} - E_{N,1} \le 2N - 2,$$
where as before $E_{x,y} = P(0,0|x,y) + P(1,1|x,y) - P(0,1|x,y) - P(1,0|x,y)$. In the case N = 2, this is equivalent to the CHSH inequality, up to relabelling of measurement settings. The algebraic maximum of $S_N$ is 2N. This maximum can be attained by performing measurements on the maximally entangled state of even n polygon systems with n = 2N; this state is thus tailor-made for violating the Braunstein-Caves inequalities. To see this, let Alice's and Bob's measurement choices be given by
$$x \mapsto \{e^A_x, \bar{e}^A_x\}, \qquad y \mapsto \{e^B_y, \bar{e}^B_y\}, \qquad x, y \in \{1, \ldots, N\},$$
and note that (i) $E_{j,j} = 1$ for j = 1, ..., N, (ii) $E_{j,j+1} = 1$ for j = 1, ..., N − 1, and (iii) $E_{N,1} = -1$. In the case n → ∞, maximal violation of the Braunstein-Caves inequality is achieved in the limit of infinitely many settings. This is also true for a quantum mechanical maximally entangled state, as shown in Ref. [20].

In general, a set of correlations P(a, b|x, y) can be written as a mixture
$$P(a,b|x,y) = q\,P_{NL}(a,b|x,y) + (1-q)\,P_{L}(a,b|x,y),$$
where 0 ≤ q ≤ 1, $P_{NL}(a,b|x,y)$ is a set of nonlocal correlations and $P_L(a,b|x,y)$ a set of local correlations. Suppose, however, that the correlations P(a, b|x, y) attain the algebraic maximum $S_N = 2N$ of an appropriate Braunstein-Caves inequality. Since $P_{NL}$ contributes at most 2N and $P_L$ at most 2N − 2, we need $2Nq + (2N-2)(1-q) \ge 2N$, hence q = 1. Therefore, the fact that the maximally entangled state of even n polygon systems attains the maximum value of the Braunstein-Caves inequality with N = n/2 measurement settings shows that these correlations contain no local part. This was pointed out for quantum systems in Refs. [20,21].
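The claim that the even-n state saturates the Braunstein-Caves expression with N = n/2 settings can be checked numerically. This sketch assumes the measurement assignment in which Alice's setting x uses $\{e_x, \bar{e}_x\}$ and Bob's setting y uses $\{e_y, \bar{e}_y\}$, with the even-n correlator $E_{x,y} = r_n^2 \cos(2\pi x/n - (2y-1)\pi/n)$ and $r_n^2 = \sec(\pi/n)$; the function name is illustrative.

```python
import math

def braunstein_caves_value(n):
    """S_N for the even-n polygon state with N = n/2 settings per party."""
    assert n % 2 == 0 and n >= 4
    N = n // 2
    r2 = 1.0 / math.cos(math.pi / n)  # r_n^2 = sec(pi/n)

    def E(x, y):  # settings x, y in 1..N
        return r2 * math.cos(2 * math.pi * x / n - (2 * y - 1) * math.pi / n)

    return (sum(E(j, j) for j in range(1, N + 1))
            + sum(E(j, j + 1) for j in range(1, N))
            - E(N, 1))

for n in (4, 6, 8, 10, 12):
    N = n // 2
    print(f"N = {N}: S_N = {braunstein_caves_value(n):.6f}, "
          f"local bound {2 * N - 2}, algebraic max {2 * N}")
```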
As a further curiosity, if we did have access to these systems, they could be used for secure key distribution, using the protocol of Ref. [22].

Distillation
So far, we have only considered correlations that can be produced by measuring a single copy of a bipartite polygon system. There remains the possibility that stronger correlations could be produced by performing local measurements on multiple bipartite pairs and locally processing the data (there is a further possibility, involving entangled measurements across multiple copies on each side, which we do not discuss). Consider the bipartite state $\phi_{AB}$ of two even n polygon systems, and suppose that Alice and Bob choose from the measurements
$$x \mapsto \{e^A_x, \bar{e}^A_x\}, \qquad y \mapsto \{e^B_y, \bar{e}^B_y\}, \qquad x, y \in \{0, 1\},$$
with outcomes a, b ∈ {0, 1} as usual. Recall that $E_{j,j} = 1$ for j = 0, 1 and $E_{0,1} = 1$. The remaining correlator is $E_{1,0} = r_n^2 \cos(3\pi/n) = 2\cos(2\pi/n) - 1$, so the correlations can be written as
$$P(a,b|x,y) = \epsilon\, P_{PR}(a,b|x,y) + (1-\epsilon)\, P_{L}(a,b|x,y). \tag{22}$$
Here, $0 \le \epsilon = 1 - \cos(2\pi/n) \le 1$, $P_{PR}$ is given by
$$P_{PR}(a,b|x,y) = \begin{cases} 1/2 & \text{if } a \oplus b = x(y \oplus 1), \\ 0 & \text{otherwise}, \end{cases}$$
which is a PR box up to a relabelling of Bob's input, and $P_L$ is a set of local correlations given by
$$P_{L}(a,b|x,y) = \begin{cases} 1/2 & \text{if } a = b, \\ 0 & \text{otherwise}. \end{cases}$$
In Ref. [5], it is shown that all correlations of the form (22) with 0 < ε < 1 can be distilled into stronger correlations using a protocol that involves two copies of a bipartite system. Importantly, this protocol consists only of local processing and does not involve any communication. In the asymptotic limit of infinitely many copies, the correlations (22) can be distilled to PR box correlations by iterating the protocol. Thus, for any finite even n > 4, the polygon systems produce correlations that can be distilled arbitrarily close to PR box correlations (since ε = 1 − cos(2π/n) > 0); for n = 4 they are already PR box correlations. It is only in the limit n → ∞ (the quantum case) that we get ε = 0 and thus lose the ability to distil PR box correlations.
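The decomposition (22) can be verified pointwise. Because the even-n marginals are uniform, $P(a,b|x,y) = [1 + (-1)^{a\oplus b}E_{x,y}]/4$; the sketch compares this with $\epsilon P_{PR} + (1-\epsilon) P_L$ for the settings $\alpha_x = 2\pi x/n$, $\beta_y = (2y-1)\pi/n$ with x, y ∈ {0, 1}. The PR-box variant $a \oplus b = x(y \oplus 1)$ and the correlated box $a = b$ are taken as given here; function names are illustrative.

```python
import math
from itertools import product

def polygon_box(n, a, b, x, y):
    """Correlations of phi_AB for even n with the settings described above."""
    r2 = 1.0 / math.cos(math.pi / n)  # r_n^2 = sec(pi/n)
    E = r2 * math.cos(2 * math.pi * x / n - (2 * y - 1) * math.pi / n)
    return (1 + (-1) ** (a ^ b) * E) / 4

def pr_box(a, b, x, y):
    return 0.5 if (a ^ b) == x * (1 ^ y) else 0.0

def correlated_box(a, b, x, y):
    return 0.5 if a == b else 0.0

def decomposition_holds(n, tol=1e-12):
    eps = 1 - math.cos(2 * math.pi / n)
    return all(abs(polygon_box(n, a, b, x, y)
                   - (eps * pr_box(a, b, x, y) + (1 - eps) * correlated_box(a, b, x, y))) < tol
               for a, b, x, y in product((0, 1), repeat=4))
```

The check relies on the identity $\sec(\pi/n)\cos(3\pi/n) = 2\cos(2\pi/n) - 1$, which follows from $\cos 3\theta = 4\cos^3\theta - 3\cos\theta$.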
The consequence of the above is that polygon systems with even and finite n inherit the powerful communication properties of PR boxes as long as there are multiple copies of the maximally entangled state available. For instance, they collapse communication complexity [3], allow for better than classical non-local computation [8], violate information causality [6] and macroscopic locality [7]. Moreover, since the PR box can be considered as a unit of bipartite nonlocality [23,24], it follows that any bipartite no-signalling probability distribution can be generated from multiple copies of polygon systems with even n. This is particularly surprising as in practice, an individual polygon system with even and very large n would be very difficult to distinguish from one with odd n, and also from the quantum case, i.e. the disc that one gets in the limit n → ∞. These toy theories thus show that practically indistinguishable theories can have fundamentally different limits to the non-local correlations they allow.
For polygon systems with odd and finite n, the situation is dramatically different, as seen in the next section.

Bounds on correlations
For even n polygon systems, the maximally entangled state can produce arbitrarily strong nonlocal correlations, whereas for odd n polygon systems, the nonlocality is highly constrained. The maximally entangled state of odd n polygon systems cannot, for example, violate Tsirelson's inequality. This section shows that this is a consequence of a much more general result.
We first introduce a class of bipartite states in general theories, which we call inner product states. The main theorem establishes a strong constraint on the nonlocal correlations that can be produced from measurements on inner product states. One consequence is that inner product states cannot violate Tsirelson's inequality. The maximally entangled states of odd n polygon systems are inner product states, hence the theorem explains what was only established by direct calculation above -that these states do not violate Tsirelson's inequality. On the other hand, the maximally entangled states of even n polygon systems are not inner product states, which is consistent with them producing arbitrary non-signalling correlations. We also show that all classical and quantum states are, in terms of non-local correlations, no stronger than an inner product state.

Inner product states
Recall that a state cone $V_+$ is the set of unnormalized states of a system, and that these span a vector space V. An effect cone $V_+^*$ is the set of unnormalized measurement outcomes, and these span the vector space $V^*$. Given two systems A and B, if the state cones $V_+^A$ and $V_+^B$ span vector spaces $V_A$ and $V_B$ respectively, then a joint state is an element of $V_A \otimes V_B$. Call two distinct systems similar if their state spaces are isomorphic. Examples of similar systems are two quantum mechanical qubits, two classical trits, or two n-vertex polygon systems. For the rest of this section, assume a bipartite system composed of two similar subsystems A and B. In this case, the respective state spaces and effect spaces can be identified, so that $V_A = V_B = V$, $(V_A)^* = (V_B)^* = V^*$, $u_A = u_B = u$, and so on.

Definition 2. A joint state $\omega_{AB}$ is symmetric if $(e \otimes f)(\omega_{AB}) = (f \otimes e)(\omega_{AB})$ for all measurement outcomes $e, f \in V^*_+$.

Definition 3. A joint state $\omega_{AB}$ is an inner product state if $\omega_{AB}$ is symmetric and positive semidefinite, i.e., $(e \otimes e)(\omega_{AB}) \geq 0$ for all $e \in V^*$.

Note that by definition of a joint state, $(e \otimes e)(\omega_{AB}) \geq 0$ always holds when $e \in V^*_+$, i.e., when e is a valid effect. This simply expresses the fact that measurement outcome probabilities must be non-negative. The definition requires something stronger, namely that $(e \otimes e)(\omega_{AB}) \geq 0$ for every e in the whole vector space $V^*$.

Example 1. Any symmetric product state $\omega_{AB} = \omega \otimes \omega$ is an inner product state, since $(e \otimes e)(\omega \otimes \omega) = e(\omega)^2 \geq 0$ for all $e \in V^*$.
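Definitions 2 and 3 become a simple matrix test once a basis is fixed. The following is a minimal NumPy sketch under an assumed convention that is not taken from the text: effects are coordinate vectors in a fixed basis of $V^*$, and a joint state is stored as a real matrix W with $(e \otimes f)(\omega_{AB}) = e^T W f$. The helper name `is_inner_product_state` is hypothetical.

```python
import numpy as np

def is_inner_product_state(W, tol=1e-9):
    """Test Definitions 2 and 3 for a joint state stored as a real matrix W.

    Assumed convention: (e ⊗ f)(omega_AB) = e @ W @ f for effect vectors e, f.
    """
    W = np.asarray(W, dtype=float)
    symmetric = np.allclose(W, W.T, atol=tol)
    # e @ W @ e >= 0 for every e in V* iff the symmetric part of W is PSD
    psd = np.linalg.eigvalsh((W + W.T) / 2).min() >= -tol
    return bool(symmetric and psd)

# Example 1: the symmetric product state omega ⊗ omega has W = outer(w, w)
w = np.array([0.2, 0.3, 1.0])                 # illustrative coordinates for omega
print(is_inner_product_state(np.outer(w, w)))            # True
print(is_inner_product_state([[0.0, 1.0], [1.0, 0.0]]))  # False: symmetric, not PSD
```

Symmetry on valid effects extends to all of $V^*$ by linearity, because the effects span $V^*$; this is why a plain matrix-symmetry check suffices in this representation.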

Example 2.
Consider two classical systems, each of which is a nit, taking values {1, …, n}. A joint state is simply a joint probability distribution over nit values. Write the joint state as a matrix P, where $P_{ij}$ is the joint probability that A = i and B = j. This is an inner product state iff the matrix P is symmetric and positive semidefinite. In particular, this includes any perfectly correlated state, of the form $P_{ij} = p_i \delta_{ij}$ with $p_i \geq 0$ and $\sum_i p_i = 1$.
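Example 2 can be checked directly on joint distributions, taking $(e \otimes f)(P) = e^T P f$ for effect vectors e, f (an assumed but natural convention). The sketch below, with a hypothetical helper name, contrasts a perfectly correlated trit with a perfectly anti-correlated bit, which is symmetric but not positive semidefinite:

```python
import numpy as np

def classical_is_inner_product(P, tol=1e-12):
    """Example 2: P[i, j] = Prob(A = i, B = j) is an inner product state
    iff P is symmetric and positive semidefinite."""
    P = np.asarray(P, dtype=float)
    return bool(np.allclose(P, P.T) and np.linalg.eigvalsh(P).min() >= -tol)

correlated = np.diag([0.5, 0.3, 0.2])                  # P_ij = p_i delta_ij
anticorrelated = np.array([[0.0, 0.5], [0.5, 0.0]])    # eigenvalues +1/2 and -1/2

print(classical_is_inner_product(correlated))      # True
print(classical_is_inner_product(anticorrelated))  # False

# The anti-correlated bit is (1 ⊗ tau)(sigma) for sigma = diag(1/2, 1/2) and tau = bit flip
flip = np.array([[0.0, 1.0], [1.0, 0.0]])
print(np.allclose(np.diag([0.5, 0.5]) @ flip.T, anticorrelated))  # True
```

The last line shows that the anti-correlated bit, although not itself an inner product state, arises from one by a local relabelling on B, which is the situation covered by Theorem 4.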

Example 3.
Consider two polygon systems, each corresponding to a state space with n vertices. Section 3.2 defined an analogue $\phi_{AB}$ of a maximally entangled state. In the matrix representation of (9), $\phi_{AB}$ is an inner product state if and only if the matrix is symmetric and positive semidefinite. Hence $\phi_{AB}$ is an inner product state for odd n, whereas for even n it is not.
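The matrix (9) is not reproduced in this section, so the following sketch relies on an assumption drawn from the discussion of strong self-duality later in this section: $\phi_{AB}$ is taken to be the 3 × 3 identity for odd n, and for even n a rotation by π/n about the z axis. Under that assumption, the symmetry and positivity test of Example 3 separates odd from even n (function names are hypothetical):

```python
import numpy as np

def polygon_phi(n):
    """Hypothetical 3x3 matrix for phi_AB of two n-gon systems.

    Assumption: identity for odd n; for even n the x-y block is a rotation by pi/n.
    """
    theta = np.pi / n if n % 2 == 0 else 0.0
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def is_inner_product_state(W, tol=1e-9):
    """Symmetric and positive semidefinite (Definition 3 in matrix form)."""
    return bool(np.allclose(W, W.T, atol=tol) and np.linalg.eigvalsh(W).min() >= -tol)

for n in range(3, 9):
    print(n, is_inner_product_state(polygon_phi(n)))
# odd n print True, even n print False
```

For even n the failure comes from symmetry alone: a rotation by π/n with 0 < π/n < π is never a symmetric matrix.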

Example 4.
The quantum case is slightly subtle. Given two qubits, the maximally entangled state $\Phi^+$ is symmetric but is not an inner product state, since if $\sigma_y$ is a Pauli spin matrix, then $(\sigma_y \otimes \sigma_y)(\Phi^+) = -1$. Consider the operator defined by $\tilde{\Phi} = (\mathbb{1} \otimes T)(\Phi^+)$, where T is the linear map that takes an operator in $V^B$ to its transpose with respect to the computational basis. The new operator $\tilde{\Phi}$ is not a valid quantum state: it is locally positive but not globally positive, hence it is not a density operator. But it is in the maximal tensor product of two qubits, and it is an inner product state. In fact, $\tilde{\Phi}$ predicts perfect correlation whenever Alice and Bob perform measurements in the same direction. Despite these differences, $\Phi^+$ and $\tilde{\Phi}$ are equivalent in terms of the nonlocal correlations they can produce (as was first shown in Ref. [10]).
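The qubit statements in Example 4 are easy to verify numerically. The sketch below (NumPy, with $\Phi^+ = (|00\rangle + |11\rangle)/\sqrt{2}$ and a hand-written partial transpose; all names are illustrative) checks that $(\sigma_y \otimes \sigma_y)(\Phi^+) = -1$, that $\tilde{\Phi}$ has a negative eigenvalue, that $(E \otimes E)(\tilde{\Phi}) \geq 0$ for a random Hermitian E, and that $\tilde{\Phi}$ gives perfect correlation for equal measurement directions:

```python
import numpy as np

I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# |Phi+> = (|00> + |11>)/sqrt(2) as a density matrix
phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
Phi_plus = np.outer(phi, phi.conj())

def partial_transpose_B(rho):
    """(1 ⊗ T)(rho) for two qubits, T = transpose in the computational basis."""
    r = rho.reshape(2, 2, 2, 2)                   # indices (a, b, a', b')
    return r.transpose(0, 3, 2, 1).reshape(4, 4)  # swap b and b'

Phi_tilde = partial_transpose_B(Phi_plus)

def ev(E, F, X):
    """(E ⊗ F)(X) = tr[(E ⊗ F) X]."""
    return np.trace(np.kron(E, F) @ X).real

print(ev(sy, sy, Phi_plus))                 # approximately -1.0
print(np.linalg.eigvalsh(Phi_tilde).min())  # approximately -0.5: not a density operator

# Inner product property for a random Hermitian E
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
E = X + X.conj().T
print(ev(E, E, Phi_tilde) >= 0)             # True

# Perfect correlation along a common direction n
n = np.array([0.3, -0.5, 0.8])
n /= np.linalg.norm(n)
P = (I2 + n[0] * sx + n[1] * sy + n[2] * sz) / 2
print(ev(P, P, Phi_tilde) + ev(I2 - P, I2 - P, Phi_tilde))   # approximately 1.0
```

Here $\tilde{\Phi}$ equals half the swap operator, so $(E \otimes F)(\tilde{\Phi}) = \mathrm{tr}(EF)/2$, which makes both the inner product property and the perfect correlation transparent.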
Theorem 8 below establishes a constraint on the nonlocal correlations that can be obtained from measurements on an inner product state. The definition of an inner product state may seem quite restrictive: an inner product state must be symmetric, for example, and the maximally entangled state $\Phi^+$ of two qubits is not included. This would diminish the interest of the theorem. However, suppose that a bipartite state $\omega_{AB}$ can be obtained from an inner product state via a transformation of one of its subsystems. Then any correlations obtained from $\omega_{AB}$ could also be obtained from an inner product state, and hence any restriction on the correlations from inner product states also applies to $\omega_{AB}$. Formally:

Theorem 4. Consider a joint state $\omega_{AB}$ which can be written in the form $\omega_{AB} = (\mathbb{1} \otimes \tau)(\sigma_{AB})$, for some $\tau : V_+ \to V_+$ that takes normalized states to normalized states. Any correlations obtained from measurements on $\omega_{AB}$ can also be obtained from measurements on $\sigma_{AB}$.
Proof. Let $\tau^\dagger$ denote the dual map acting on effects, $\tau^\dagger(f)(\omega) = f(\tau(\omega))$. Since τ takes normalized states to normalized states, $\tau^\dagger(u) = u$. Given a measurement y on system B with outcomes $\{f_1, \ldots, f_r\}$, let y′ be the measurement with outcomes $\{\tau^\dagger(f_1), \ldots, \tau^\dagger(f_r)\}$. From $f_1 + \cdots + f_r = u$ and $\tau^\dagger(u) = u$ it follows that $\tau^\dagger(f_1) + \cdots + \tau^\dagger(f_r) = u$, as must be the case for y′ to be a valid measurement. Then, since $(e \otimes f_k)(\omega_{AB}) = (e \otimes \tau^\dagger(f_k))(\sigma_{AB})$, measurements x and y on $\omega_{AB}$ have the same joint outcome probabilities as measurements x and y′ on $\sigma_{AB}$. Hence, if a particular set of correlations can be obtained by performing measurements on $\omega_{AB}$, those same correlations can be obtained by performing different measurements on $\sigma_{AB}$.
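A classical instance of the proof of Theorem 4 makes the role of the dual map concrete. In this sketch (an illustration, not the general construction) $\sigma_{AB}$ is a joint distribution S of two trits and τ is a randomly chosen column-stochastic matrix T, so that $\omega_{AB} = (\mathbb{1} \otimes \tau)(\sigma_{AB})$ corresponds to $S T^T$ and $\tau^\dagger$ acts on effect vectors as $f \mapsto T^T f$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# sigma_AB: joint distribution of two trits, S[i, j] = Prob(A = i, B = j)
S = np.diag([0.5, 0.3, 0.2])

# tau: column-stochastic matrix, T[j, k] = Prob(new value j | old value k)
T = rng.random((n, n))
T /= T.sum(axis=0, keepdims=True)

# omega_AB = (1 ⊗ tau)(sigma_AB) acts on B's index: W = S T^T
W = S @ T.T

# Alice's and Bob's measurements: rows are effects summing to u = (1, 1, 1)
E = np.array([[1, 1, 0], [0, 0, 1]], dtype=float)
F = np.array([[1, 0, 0], [0, 1, 1]], dtype=float)

# Measurement y': outcomes tau^dagger(f_k) = T^T f_k, stored as rows f_k^T T
F_prime = F @ T

P_omega = E @ W @ F.T          # P(a, b | x, y) on omega_AB
P_sigma = E @ S @ F_prime.T    # P(a, b | x, y') on sigma_AB

print(np.allclose(P_omega, P_sigma))          # True
print(np.allclose(F_prime.sum(axis=0), 1.0))  # True: tau^dagger(u) = u
```

The second check is the classical form of $\tau^\dagger(u) = u$: the columns of a stochastic matrix sum to one.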
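Theorem 5. Any pure quantum state $\rho_{AB}$ of two d-dimensional systems can be written as $\rho_{AB} = (\mathbb{1} \otimes \tau)(\tilde{\rho}_{AB})$, where $\tilde{\rho}_{AB}$ is an inner product state and τ takes normalized states to normalized states.

Proof. Using the Schmidt decomposition, every pure quantum state $|\psi\rangle$ can be written in the form
$$|\psi\rangle = \sum_{i=1}^{r} \lambda_i\, |a_i\rangle \otimes |b_i\rangle,$$
where r is the Schmidt rank, $\{|a_i\rangle\}$ and $\{|b_i\rangle\}$ are orthonormal bases, and the $\lambda_i$ are real and positive with $\sum_i \lambda_i^2 = 1$. A unitary transformation U on system B which maps $\{|b_i\rangle\}$ to $\{|a_i\rangle\}$ gives
$$|\tilde{\psi}\rangle = (\mathbb{1} \otimes U)|\psi\rangle = \sum_{i=1}^{r} \lambda_i\, |a_i\rangle \otimes |a_i\rangle, \qquad \tilde{\rho}_{AB} := (\mathbb{1} \otimes T)\big(|\tilde{\psi}\rangle\langle\tilde{\psi}|\big) = \sum_{i,j=1}^{r} \lambda_i \lambda_j\, |a_i\rangle\langle a_j| \otimes |a_j\rangle\langle a_i|,$$
where T is the transpose map acting on the B system, defined with respect to the basis $\{|a_i\rangle\}$. Writing $\Lambda = \sum_i \lambda_i |a_i\rangle\langle a_i|$, note that $\tilde{\rho}_{AB}$ is symmetric since, for Hermitian operators E and F,
$$(E \otimes F)(\tilde{\rho}_{AB}) = \sum_{i,j} \lambda_i \lambda_j \langle a_j|E|a_i\rangle \langle a_i|F|a_j\rangle = \mathrm{tr}(\Lambda E \Lambda F) = \mathrm{tr}(\Lambda F \Lambda E) = (F \otimes E)(\tilde{\rho}_{AB}).$$
Note also that $\tilde{\rho}_{AB}$ is positive semidefinite since, for any Hermitian operator E,
$$(E \otimes E)(\tilde{\rho}_{AB}) = \mathrm{tr}(\Lambda E \Lambda E) = \mathrm{tr}\big[(\Lambda^{1/2} E \Lambda^{1/2})^2\big] \geq 0.$$
Therefore $\tilde{\rho}_{AB}$ is an inner product state. The quantum state $\rho_{AB} = |\psi\rangle\langle\psi|$ can be written $\rho_{AB} = (\mathbb{1} \otimes \tau)(\tilde{\rho}_{AB})$, where τ is the transpose map followed by $U^{-1}$, which proves the theorem.

Now any correlations that can be obtained from measurements on a bipartite classical or quantum system, pure or mixed, can also be obtained from measurements on a pure quantum state of two d-dimensional systems for some d. This follows from the fact that mixed quantum states always have a purification on a larger Hilbert space, and that classical states can be viewed as diagonal quantum states. Combining this observation with Theorems 4 and 5 gives

Theorem 6. Any correlations obtained from measurements on a bipartite, pure or mixed, classical or quantum system could also be obtained from measurements on an inner product state.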
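The properties established in the proof of Theorem 5 can be spot-checked numerically. The sketch below draws a random pure state of two qutrits, obtains the Schmidt decomposition from NumPy's SVD, builds $\tilde{\rho}_{AB}$, and checks the identity $(E \otimes F)(\tilde{\rho}_{AB}) = \mathrm{tr}(\Lambda E \Lambda F)$, symmetry, positivity and the recovery $\rho_{AB} = (\mathbb{1} \otimes \tau)(\tilde{\rho}_{AB})$ for random Hermitian E and F. The dimension, seed and helper names are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)
d = 3

# A random pure state |psi> = sum_{jk} M[j, k] |j>|k> of two qutrits
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
M /= np.linalg.norm(M)
psi = M.reshape(-1)
rho = np.outer(psi, psi.conj())

# Schmidt decomposition from the SVD M = A diag(lam) Vh:
# |a_i> = A[:, i], |b_i> = Vh[i, :], lam_i >= 0 with sum lam_i^2 = 1
A, lam, Vh = np.linalg.svd(M)
U = A @ Vh.conj()                  # U = sum_i |a_i><b_i|, a unitary on B

def transpose_in_basis(Y, basis):
    """Transpose of the operator Y with respect to the orthonormal columns of `basis`."""
    return basis @ (basis.conj().T @ Y @ basis).T @ basis.conj().T

def apply_on_B(X, channel):
    """(1 ⊗ channel)(X): apply a linear map to the B factor of each block <j|X|k>."""
    blocks = X.reshape(d, d, d, d)
    out = np.zeros((d * d, d * d), dtype=complex)
    for j in range(d):
        for k in range(d):
            Ejk = np.zeros((d, d))
            Ejk[j, k] = 1.0
            out += np.kron(Ejk, channel(blocks[j, :, k, :]))
    return out

psi_t = np.kron(np.eye(d), U) @ psi                # sum_i lam_i |a_i>|a_i>
rho_t = apply_on_B(np.outer(psi_t, psi_t.conj()), lambda Y: transpose_in_basis(Y, A))

def random_hermitian():
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return X + X.conj().T

def ev(E, F, X):
    """(E ⊗ F)(X) = tr[(E ⊗ F) X]."""
    return np.trace(np.kron(E, F) @ X).real

E, F = random_hermitian(), random_hermitian()
Lam = A @ np.diag(lam) @ A.conj().T                # Lambda = sum_i lam_i |a_i><a_i|

assert np.isclose(ev(E, F, rho_t), np.trace(Lam @ E @ Lam @ F).real)
assert np.isclose(ev(E, F, rho_t), ev(F, E, rho_t))     # symmetric
assert ev(E, E, rho_t) >= -1e-12                         # positive semidefinite

# tau = transpose in {|a_i>} followed by U^{-1}; it recovers the original state
tau = lambda Y: U.conj().T @ transpose_in_basis(Y, A) @ U
assert np.allclose(apply_on_B(rho_t, tau), rho)
```

A numerical check of this kind only samples a few operators; it complements, and does not replace, the trace identities in the proof.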
Hence as far as correlations go, the fact that we consider only inner product states is not nearly so restrictive as it looks. By extension, the results apply to all classical and quantum bipartite systems.

The set Q 1
The problem of characterizing which correlations can in principle be produced by performing measurements on quantum systems, and which cannot, is an interesting one. Tsirelson's inequality, which limits the possible violation of the CHSH inequality in quantum theory, was the first result in this direction. A great deal of progress was made in Refs. [25,26], where the problem is reduced to the following form. A hierarchy of sets $Q_1, Q_2, \ldots$ is defined, such that each $Q_k$ is a proper subset of the set of all possible bipartite non-signalling correlations, and each $Q_k$ is contained in its predecessor. For given correlations P(a, b|x, y) and for each k, determining whether P(a, b|x, y) is contained in $Q_k$ is a semidefinite programming problem. Correlations obtainable from measurements on quantum systems are contained in $Q_k$ for every k, and conversely, correlations contained in every $Q_k$ are quantum (in the commuting-operator sense). Hence the sets $Q_k$ become smaller as k increases, and in the limit k → ∞ they converge towards the set Q of quantum correlations.
The set $Q_1$, which is the largest in the hierarchy, is of further significance. In Ref. [7] it is shown that correlations in $Q_1$ satisfy a readily comprehensible physical principle called macroscopic locality. For a precise description of what this means, see Ref. [7]; in a nutshell, the principle states that the coarse-grained statistics of correlation experiments involving a large number of particles should admit a description by a local hidden variable model. In other words, the microscopic correlations that satisfy the principle of macroscopic locality are those which are compatible with classical physics in a limit in which the number of particle pairs being tested is large, and only coarse-grained statistics, rather than settings and outcomes for every pair, are collected. It is also known that $Q_1$ is closed under wirings [7,27]: it is not possible to distill correlations in $Q_1$ into correlations outside $Q_1$ by performing measurements on a number of distinct pairs of systems and locally manipulating the data. Finally, in the specific case of binary measurement choices and outcomes, all correlations in $Q_1$ respect Tsirelson's bound of $2\sqrt{2}$ for the CHSH scenario. The main theorem below states that correlations from measurements on inner product states are contained in the set $Q_1$.
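For binary inputs and outputs, local, quantum and PR-box correlations can be compared on the CHSH expression $S = E_{00} + E_{01} + E_{10} - E_{11}$, where $E_{xy}$ is the correlator of ±1-valued outcomes. The sketch below enumerates deterministic local strategies, evaluates the standard optimal measurements on $\Phi^+$ (for which $E_{xy} = \cos(\alpha_x - \beta_y)$ when both parties measure in the x-z plane), and evaluates a PR box; the specific angles are the textbook choice, not taken from this text:

```python
import itertools
import numpy as np

def chsh(E):
    """CHSH expression S = E00 + E01 + E10 - E11 for correlators E[x][y]."""
    return E[0][0] + E[0][1] + E[1][0] - E[1][1]

# Local bound: maximize over deterministic strategies a_x, b_y in {+1, -1}
local_max = max(
    chsh([[a[x] * b[y] for y in range(2)] for x in range(2)])
    for a in itertools.product([1, -1], repeat=2)
    for b in itertools.product([1, -1], repeat=2)
)

# Phi+ with x-z plane measurements at angles alpha_x, beta_y: E_xy = cos(alpha_x - beta_y)
alpha, beta = [0.0, np.pi / 2], [np.pi / 4, -np.pi / 4]
quantum = chsh([[np.cos(alpha[x] - beta[y]) for y in range(2)] for x in range(2)])

# PR box: E00 = E01 = E10 = 1, E11 = -1
pr = chsh([[1, 1], [1, -1]])

print(local_max, round(quantum, 6), pr)   # 2 2.828427 4
```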
First, we give a formal definition of $Q_1$. Suppose that Alice and Bob share two systems in a bipartite state, and let Alice choose a measurement x and Bob choose a measurement y. Up to now, when we discussed correlations, Alice's and Bob's outcomes were labelled a and b, and correlations were written P(a, b|x, y). For the specific purpose of defining $Q_1$, however, it is more useful to label the measurement outcomes in such a way that outcomes of distinct measurements have different labels. Hence let the index i range over all possible outcomes of all of Alice's measurement choices. For example, if Alice is choosing from N possible measurements, each of which has k possible outcomes, then i takes values in {1, …, kN}, with i = 1, …, k the outcomes of the x = 1 measurement, i = k + 1, …, 2k the outcomes of the x = 2 measurement, and so on. Let the same conventions apply to Bob's outcome, which is denoted j. With a slight abuse of notation, let x(i) denote the unique measurement choice of Alice for which i is a possible outcome, and similarly let y(j) denote the unique measurement choice of Bob for which j is a possible outcome. Write P(i, j) for the probability of obtaining outcomes i and j when the measurements x(i) and y(j) are performed. Let $P_A(i)$ denote the marginal probability for Alice to obtain outcome i when she performs measurement x(i), and $P_B(j)$ the marginal probability for Bob to obtain outcome j when he performs measurement y(j).

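Definition 7 ([25, 26, 7]). A set of correlations P(i, j) is contained in $Q_1$ if and only if there exists a symmetric, positive semidefinite matrix
$$\gamma = \begin{pmatrix} 1 & \vec{P}_A^{\,T} & \vec{P}_B^{\,T} \\ \vec{P}_A & \tilde{Q} & \tilde{P} \\ \vec{P}_B & \tilde{P}^T & \tilde{R} \end{pmatrix} \tag{28}$$
such that (i) $\vec{P}_A$ and $\vec{P}_B$ are the vectors of probabilities $P_A(i)$ and $P_B(j)$, (ii) $\tilde{P}$ is a matrix with elements $\tilde{P}_{ij} = P(i, j)$, (iii) $\tilde{Q}$ and $\tilde{R}$ are sub-matrices with diagonal elements $\tilde{Q}_{ii} = P_A(i)$ and $\tilde{R}_{jj} = P_B(j)$, (iv) $\tilde{Q}_{ii'} = 0$ whenever $i \neq i'$ and $x(i) = x(i')$, and (v) $\tilde{R}_{jj'} = 0$ whenever $j \neq j'$ and $y(j) = y(j')$. In words, the last two conditions state that elements of $\tilde{Q}$ and $\tilde{R}$ corresponding to different outcomes of the same measurement must be zero. The remaining off-diagonal elements of $\tilde{Q}$ and $\tilde{R}$ can be chosen freely.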
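Checking a candidate matrix against Definition 7 is straightforward, even though finding one in general requires semidefinite programming. As a minimal sketch (0-based labels, two binary measurements per party, hypothetical helper names), the code below checks symmetry, positivity and conditions (iii) to (v), reading $\vec{P}_A$, $\vec{P}_B$ and $\tilde{P}$ off the matrix itself. It then uses the witness $\gamma = v v^T$, with v listing 1 followed by Alice's and Bob's outcome indicators, to show that a deterministic local box lies in $Q_1$:

```python
import numpy as np

k, N = 2, 2                           # k outcomes per measurement, N measurements per party
n = k * N                             # number of outcome labels i (and j)
x_of = np.arange(n) // k              # x(i) for 0-based labels i = 0, ..., n-1
y_of = np.arange(n) // k              # y(j), same convention for Bob

def indicator(outcomes):
    """Label vector with a 1 at the label of the outcome given for each measurement."""
    v = np.zeros(n)
    for meas, out in enumerate(outcomes):
        v[meas * k + out] = 1.0
    return v

def gamma_for_deterministic(alice_out, bob_out):
    """Candidate witness gamma = v v^T with v = (1, Alice labels, Bob labels)."""
    v = np.concatenate(([1.0], indicator(alice_out), indicator(bob_out)))
    return np.outer(v, v)

def satisfies_definition_7(gamma, tol=1e-9):
    """Symmetry, PSD and conditions (iii)-(v); P_A, P_B and P~ are read off gamma."""
    PA, PB = gamma[0, 1:n + 1], gamma[0, n + 1:]
    Q, R = gamma[1:n + 1, 1:n + 1], gamma[n + 1:, n + 1:]
    off_diag = ~np.eye(n, dtype=bool)
    same_x = (x_of[:, None] == x_of[None, :]) & off_diag
    same_y = (y_of[:, None] == y_of[None, :]) & off_diag
    checks = [
        np.isclose(gamma[0, 0], 1.0),
        np.allclose(gamma, gamma.T),
        np.allclose(np.diag(Q), PA) and np.allclose(np.diag(R), PB),   # (iii)
        np.allclose(Q[same_x], 0.0) and np.allclose(R[same_y], 0.0),   # (iv), (v)
        np.linalg.eigvalsh(gamma).min() >= -tol,
    ]
    return bool(all(checks))

g = gamma_for_deterministic(alice_out=(0, 1), bob_out=(1, 1))
print(satisfies_definition_7(g))    # True
mix = 0.5 * g + 0.5 * gamma_for_deterministic((1, 0), (0, 1))
print(satisfies_definition_7(mix))  # True
print(g[1:n + 1, n + 1:])           # P~ block: the deterministic correlations P(i, j)
```

Because the conditions are linear apart from positivity, convex mixtures of such witnesses are again witnesses, which illustrates why all local correlations lie in $Q_1$.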

The main theorem
Theorem 8. Consider two similar systems, whose joint state is an inner product state. All correlations that can be obtained from local measurements lie in Q 1 .
Proof. It is sufficient to show that, for any set of correlations generated by measurements on an inner product state, there exists a matrix γ of the form (28) which is symmetric, positive semidefinite, and has the feature that entries in the blocks $\tilde{Q}$ and $\tilde{R}$ corresponding to different outcomes of the same measurement are zero.
Consider correlations generated by measurements on an inner product state $\omega_{AB}$. Using the notation introduced in section 4.2, let $e_i$ be the effect corresponding to Alice's measurement outcome i, and $f_j$ the effect corresponding to Bob's measurement outcome j. Suppose that i ranges over $1, \ldots, n_A$ and j over $1, \ldots, n_B$. Define a vector of effects $g = (u, e_1, \ldots, e_{n_A}, f_1, \ldots, f_{n_B})$, and denote its entries $g_1 = u, g_2 = e_1, \ldots, g_{1+n_A+n_B} = f_{n_B}$. Define the $(1 + n_A + n_B) \times (1 + n_A + n_B)$ matrix $\tilde{\gamma}$ by $\tilde{\gamma}_{kl} = (g_k \otimes g_l)(\omega_{AB})$. From the fact that $\omega_{AB}$ is an inner product state, it follows directly that $\tilde{\gamma}$ is a symmetric and positive semidefinite matrix [29]: for any real vector c, $c^T \tilde{\gamma}\, c = (h \otimes h)(\omega_{AB}) \geq 0$ with $h = \sum_k c_k g_k \in V^*$. Now define a matrix γ of the form (28), with $\gamma_{kl} = \tilde{\gamma}_{kl}$ for all k, l except for the following elements of the sub-matrices $\tilde{Q}$ and $\tilde{R}$:
$$\tilde{Q}_{ii'} = \delta_{ii'}\, P_A(i) \ \text{ whenever } x(i) = x(i'), \qquad \tilde{R}_{jj'} = \delta_{jj'}\, P_B(j) \ \text{ whenever } y(j) = y(j').$$
By construction, γ satisfies conditions (i)-(v) of Definition 7, and symmetry of γ follows from symmetry of $\tilde{\gamma}$. It remains to show that γ is positive semidefinite.
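To this end, let $\delta = \gamma - \tilde{\gamma}$ and note that δ is of the form
$$\delta = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \delta_Q & 0 \\ 0 & 0 & \delta_R \end{pmatrix},$$
where $\delta_Q$ is an $n_A \times n_A$ sub-matrix, $\delta_R$ is an $n_B \times n_B$ sub-matrix, and all other entries, including the $n_A \times n_B$ off-diagonal block, are 0. Since both γ and $\tilde{\gamma}$ are symmetric, δ is also symmetric. We will show that $\delta_Q$ and $\delta_R$ are positive semidefinite. It follows that δ is positive semidefinite, and since $\gamma = \delta + \tilde{\gamma}$, that γ is also positive semidefinite. Note that $(\delta_Q)_{ii'} = 0$ for $x(i) \neq x(i')$. It follows that $\delta_Q$ is block diagonal, with each block corresponding to a particular measurement choice of Alice. Consider a particular block, corresponding to a measurement with, say, r outcomes $e_1, \ldots, e_r$. Writing $c_{mn} = (e_m \otimes e_n)(\omega_{AB})$, which satisfies $c_{mn} = c_{nm} \geq 0$ because $\omega_{AB}$ is symmetric and the $e_m$ are valid effects, the block is the r × r matrix M with entries
$$M_{mn} = \delta_{mn}\, (e_m \otimes u)(\omega_{AB}) - c_{mn}.$$
Using $e_1 + \cdots + e_r = u$, the diagonal entries are $M_{mm} = \sum_{n \neq m} c_{mn}$, so this matrix can be decomposed into a sum $M = \sum_{m<n} M^{mn}$ of $(r^2 - r)/2$ matrices, where all entries of the matrices $M^{mn}$ are 0, except for
$$M^{mn}_{mm} = M^{mn}_{nn} = c_{mn}, \qquad M^{mn}_{mn} = M^{mn}_{nm} = -c_{mn}.$$
Each $M^{mn}$ is manifestly positive semidefinite, since its only nonzero eigenvalue is $2c_{mn} \geq 0$; hence M is positive semidefinite. Since each block of $\delta_Q$ is positive semidefinite, $\delta_Q$ is also positive semidefinite. A similar argument shows that $\delta_R$ is also positive semidefinite. Therefore δ and γ are positive semidefinite. This concludes the proof.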
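The block decomposition used for $\delta_Q$ can be checked numerically. In the sketch below, C is a random symmetric matrix with non-negative entries standing in for $c_{mn} = (e_m \otimes e_n)(\omega_{AB})$; this is an assumption for illustration, since C is not computed from an actual state but only given the two properties the proof uses. M is then a weighted graph Laplacian, and its decomposition into the matrices $M^{mn}$ is verified:

```python
import numpy as np

rng = np.random.default_rng(3)
r = 4

# Stand-in for c_mn = (e_m ⊗ e_n)(omega_AB): symmetric, non-negative entries
C = rng.random((r, r))
C = (C + C.T) / 2

# Block of delta_Q: sum_{n != m} c_mn on the diagonal, -c_mn off it
M = np.diag(C.sum(axis=1)) - C

parts = []
for m in range(r):
    for n in range(m + 1, r):
        Mmn = np.zeros((r, r))
        Mmn[m, m] = Mmn[n, n] = C[m, n]
        Mmn[m, n] = Mmn[n, m] = -C[m, n]
        parts.append(Mmn)

print(len(parts) == (r * r - r) // 2)                              # True
print(np.allclose(sum(parts), M))                                  # True
print(all(np.linalg.eigvalsh(p).min() >= -1e-12 for p in parts))   # True
print(np.linalg.eigvalsh(M).min() >= -1e-12)                       # True
```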

Corollary 9.
Consider two systems whose joint state is of the form $\omega_{AB} = (\mathbb{1} \otimes \tau)(\sigma_{AB})$, where $\tau : V_+ \to V_+$ takes normalized states to normalized states and $\sigma_{AB}$ is an inner product state. All correlations obtainable from measurements on $\omega_{AB}$ lie in $Q_1$.
Proof. This is immediate from theorem 8 and theorem 4.
Combined with Corollary 9, Theorem 6 then implies that all correlations from bipartite classical and quantum states lie in $Q_1$. This was of course already known from Refs. [25,26]; the theorem and corollary can be viewed as an independent proof of this fact.
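The construction in the proof of Theorem 8 can be carried out explicitly for the qubit inner product state $\tilde{\Phi}$ of Example 4. The sketch below uses the standard CHSH measurement angles in the x-z plane (an illustrative choice), builds $\tilde{\gamma}$ and γ as in the proof, confirms that both are positive semidefinite, and reads off the CHSH value from the block $\tilde{P}$, which reaches Tsirelson's bound:

```python
import numpy as np

I2 = np.eye(2)
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
SWAP = np.eye(4)[[0, 2, 1, 3]]
omega = SWAP / 2          # Phi~ = (1 ⊗ T)(Phi+) from Example 4, an inner product state

def ev(E, F):
    """(E ⊗ F)(omega)."""
    return np.trace(np.kron(E, F) @ omega)

def effect(theta, sign):
    """Effect for outcome sign = +1/-1 of a spin measurement at angle theta in the x-z plane."""
    return (I2 + sign * (np.cos(theta) * sz + np.sin(theta) * sx)) / 2

alice_angles, bob_angles = [0.0, np.pi / 2], [np.pi / 4, -np.pi / 4]
signs = (+1, -1)
e = [effect(t, s) for t in alice_angles for s in signs]   # label i, x(i) = i // 2
f = [effect(t, s) for t in bob_angles for s in signs]     # label j, y(j) = j // 2
g = [I2] + e + f                                           # (u, e_1, ..., f_nB)
nA, nB = len(e), len(f)

gamma_t = np.array([[ev(a, b) for b in g] for a in g])     # gamma~_kl = (g_k ⊗ g_l)(omega)

# gamma: same-measurement entries of the Q~ and R~ blocks become delta * P_A(i) or P_B(j)
gamma = gamma_t.copy()
for start, size in ((1, nA), (1 + nA, nB)):
    for i in range(size):
        for ip in range(size):
            if i // 2 == ip // 2:
                gamma[start + i, start + ip] = gamma_t[0, start + i] if i == ip else 0.0

print(np.linalg.eigvalsh(gamma_t).min() >= -1e-12)   # True
print(np.linalg.eigvalsh(gamma).min() >= -1e-12)     # True, as Theorem 8 guarantees

# CHSH value from the P~ block: E_xy = sum_{a,b} a * b * P(a, b | x, y)
P = gamma[1:1 + nA, 1 + nA:]
corr = [[sum(sa * sb * P[2 * x + ia, 2 * y + ib]
             for ia, sa in enumerate(signs) for ib, sb in enumerate(signs))
         for y in range(2)] for x in range(2)]
S = corr[0][0] + corr[0][1] + corr[1][0] - corr[1][1]
print(S)   # approximately 2.8284 = 2 * sqrt(2), Tsirelson's bound
```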

Polygons revisited
It has already been observed that, given two n-vertex polygon systems, the maximally entangled state $\phi_{AB}$ defined in section 3.2 is an inner product state if and only if n is odd. Theorem 8 states that correlations obtained from measurements on an inner product state lie in the set $Q_1$, which means in particular that they respect Tsirelson's bound for the CHSH inequality. This explains why Tsirelson's bound is satisfied by the odd n polygon systems, and is consistent with its violation by the even n polygon systems.
This section relates these observations to simple geometrical properties of the state spaces of polygon systems. A quick glance at figures 1 and 2 reveals an obvious difference between the odd n and even n cases. For odd n, the effect cone V * + coincides with the state cone V + . For even n on the other hand, the effect cone is isomorphic to the state cone, but is rotated by some non-zero angle. This simple observation lies at the heart of why it is only the maximally entangled states of odd n polygon systems that are inner product states, and hence why it is only these that must satisfy Tsirelson's bound.
The fundamental difference between the odd n and even n state spaces can be stated more formally as follows. First:

Definition 10 (weakly self-dual). A system is weakly self-dual iff the state and effect cones are isomorphic.
All of the polygon state spaces are weakly self-dual. The isomorphisms are simply the rotations and improper rotations around the z axis by (1 + 2k)π/n, k ∈ {0, . . . , n − 1} if n is even and by 2kπ/n, k ∈ {0, . . . , n − 1} if n is odd.
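The angles listed above can be recovered by brute force. The sketch below assumes, consistently with the description of figures 1 and 2 in the text but without deriving it here, that the state vertices lie at angles 2πi/n around the z axis, and that the extremal effect directions lie at the same angles for odd n and are shifted by π/n for even n. It then searches multiples of π/n for rotations and reflections mapping one set onto the other (function names are hypothetical):

```python
import numpy as np

def maps_onto(src, dst, tol=1e-9):
    """True if the unit vectors at angles `src` coincide, as a set, with those at `dst`."""
    a, b = np.exp(1j * np.asarray(src)), np.exp(1j * np.asarray(dst))
    return len(a) == len(b) and all(np.abs(b - z).min() < tol for z in a)

def isomorphism_angles(n):
    """Rotations (theta -> theta + t) and reflections (theta -> t - theta) about the
    z axis mapping state vertices onto extremal effect directions.

    Assumed geometry: states at 2*pi*i/n; effects at the same angles for odd n,
    shifted by pi/n for even n.
    """
    states = 2 * np.pi * np.arange(n) / n
    effects = states + (np.pi / n if n % 2 == 0 else 0.0)
    candidates = np.pi * np.arange(2 * n) / n        # multiples of pi/n in [0, 2*pi)
    rotations = [t for t in candidates if maps_onto(states + t, effects)]
    reflections = [t for t in candidates if maps_onto(t - states, effects)]
    return rotations, reflections

rot4, _ = isomorphism_angles(4)
rot5, _ = isomorphism_angles(5)
print(np.round(np.array(rot4) / (np.pi / 4)))      # [1. 3. 5. 7.]: odd multiples of pi/4
print(np.round(np.array(rot5) / (2 * np.pi / 5)))  # [0. 1. 2. 3. 4.]: multiples of 2*pi/5
```

The outputs match the stated families: (1 + 2k)π/n for even n and 2kπ/n for odd n.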
The odd n polygon state spaces, on the other hand, satisfy a stronger condition, which imposes additional restrictions on the isomorphism connecting $V^*_+$ and $V_+$.

Definition 11 (strongly self-dual). A system is strongly self-dual iff there exists an isomorphism $T : V^*_+ \to V_+$ which is symmetric, $e(T(f)) = f(T(e))$ for all $e, f \in V^*$, and positive semidefinite, $e(T(e)) \geq 0$ for all $e \in V^*$.

Given the representation of sections 3.1 and 3.2, the identity map is an example of such an isomorphism for odd n. The odd n polygon state spaces are strongly self-dual, but the even n ones are not.
The concepts of strong and weak self-duality have appeared earlier in the literature, for example in Ref. [28]. Weak self-duality is intimately related to the operational tasks of probabilistic remote state preparation (steering) and teleportation [15,28]. Now we can relate these properties of individual systems to the bipartite maximally entangled state $\phi_{AB}$. Notice that, given two similar systems, any isomorphism $T : V^*_+ \to V_+$ induces a bipartite state $\omega^T_{AB}$, defined by
$$(e \otimes f)(\omega^T_{AB}) = \frac{f[T(e)]}{u[T(u)]} \quad \text{for all } e, f \in V^*.$$
The state defined in this way is normalized by construction, and is locally positive since $0 \leq f[T(e)]/u[T(u)] \leq 1$ for all $e, f \in E(\Omega)$. Intuitively, $\omega^T_{AB}$ is defined so that if Alice performs a measurement and obtains outcome e, then Bob's unnormalized collapsed state, conditioned on that outcome, is T(e).
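For a classical system the construction of $\omega^T_{AB}$ can be evaluated directly, since the state and effect cones are both the positive orthant and $u = (1, \ldots, 1)$. In the sketch below (a classical trit, with a hypothetical helper `induced_state`), T is written as a matrix, so that $f[T(e)] = e^T T^T f$ and the induced state has matrix $T^T / (u^T T u)$. The identity map gives the perfectly correlated uniform state, which is an inner product state; a cyclic permutation is also a cone isomorphism but is not symmetric:

```python
import itertools
import numpy as np

n = 3
u = np.ones(n)

def induced_state(T):
    """Matrix W of omega^T_AB, so that (e ⊗ f)(omega^T_AB) = f[T(e)] / u[T(u)] = e @ W @ f."""
    return T.T / (u @ T @ u)

# Extremal classical effects: the 0/1 vectors (every e with 0 <= e <= u is a mixture of them)
effects = [np.array(v, dtype=float) for v in itertools.product([0, 1], repeat=n)]

for name, T in [("identity", np.eye(n)), ("cyclic shift", np.roll(np.eye(n), 1, axis=0))]:
    W = induced_state(T)
    normalized = bool(np.isclose(u @ W @ u, 1.0))
    locally_positive = all(0 <= e @ W @ f <= 1 for e in effects for f in effects)
    inner_product = bool(np.allclose(W, W.T) and np.linalg.eigvalsh(W).min() >= -1e-12)
    print(name, normalized, locally_positive, inner_product)
# identity True True True
# cyclic shift True True False
```

Local positivity is only checked on the extremal effects; bilinearity extends it to all effects.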
In the special case that the individual systems are strongly self-dual and the isomorphism T has the additional properties required by Definition 11, the induced state $\omega^T_{AB}$ is symmetric and positive semidefinite, hence it is an inner product state. This is the case for the maximally entangled state $\phi_{AB}$ of odd n polygon systems, defined in (9), where $\phi_{AB}$ corresponds to a map T which is simply the identity map. It follows that for odd n, correlations from $\phi_{AB}$ lie in $Q_1$.
In the case that individual systems are weakly but not strongly self-dual, the maximally entangled state corresponds to an isomorphism T, but there is no such T with the additional properties of symmetry and positive semidefiniteness, hence the maximally entangled state is not an inner product state. This is the case for the maximally entangled state $\phi_{AB}$ of the even n polygon systems, defined in (9), where $\phi_{AB}$ corresponds to a map T which is a rotation in $\mathbb{R}^3$ by π/n. This is why, for even n, correlations from $\phi_{AB}$ need not lie in $Q_1$.

Correlations outside of Q 1
Correlations obtained from the maximally entangled state of two odd n polygon systems must be contained in Q 1 , and this has been seen to be related to the fact that the individual systems are strongly self-dual. It is natural to ask whether the correlations obtained from any joint state of strongly self-dual subsystems must also lie in Q 1 . An explicit counterexample shows that this is not the case.
Consider a strongly self-dual system with normalized extremal states and normalized ray extremal effects The state space for this system looks something like a house and is depicted in figure 4.
We have explicitly calculated all extremal states in the maximal tensor product of two such systems. One of these joint states, written in the 3 × 3 matrix representation introduced in section 3.2, is extremal in the maximal tensor product but is not an inner product state. With a suitable choice of measurements, this state produces correlations which violate Uffink's quadratic inequality [30],
$$\big(\langle A_0 B_0\rangle + \langle A_1 B_1\rangle\big)^2 + \big(\langle A_0 B_1\rangle - \langle A_1 B_0\rangle\big)^2 \leq 4,$$
where $\langle A_x B_y\rangle$ is the correlator of Alice's and Bob's ±1-valued outcomes for measurement choices x and y. Since satisfaction of Uffink's inequality is known to be a necessary condition for membership of $Q_1$ [31], these correlations cannot lie in $Q_1$. Although these correlations violate Uffink's inequality and lie outside of $Q_1$, they do not violate Tsirelson's bound for the CHSH inequality. In fact, we have not been able to find a joint state of two strongly self-dual subsystems that violates the CHSH inequality beyond Tsirelson's bound. This leads us to conjecture that Tsirelson's bound holds for every theory with strongly self-dual subsystems.
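Uffink's quadratic inequality is easy to evaluate from correlators, taking the form $(\langle A_0 B_0\rangle + \langle A_1 B_1\rangle)^2 + (\langle A_0 B_1\rangle - \langle A_1 B_0\rangle)^2 \leq 4$. The sketch below compares a PR-type box, which violates it, with random coplanar measurements on $\Phi^+$ (correlators $\cos(\alpha_x - \beta_y)$), which never exceed the bound. These test cases are illustrative and are not the house-model correlations discussed above:

```python
import numpy as np

def uffink_lhs(E):
    """Left-hand side (E00 + E11)^2 + (E01 - E10)^2 for correlators E[x][y] in [-1, 1]."""
    return (E[0][0] + E[1][1]) ** 2 + (E[0][1] - E[1][0]) ** 2

# A PR-type box with E00 = E01 = E11 = 1, E10 = -1: maximally nonlocal
pr = [[1, 1], [-1, 1]]
print(uffink_lhs(pr))  # 8, violating the bound of 4

# Phi+ with random measurements in the x-z plane: E_xy = cos(alpha_x - beta_y)
rng = np.random.default_rng(0)
worst = max(
    uffink_lhs([[np.cos(a[x] - b[y]) for y in range(2)] for x in range(2)])
    for a, b in (rng.uniform(0, 2 * np.pi, (2, 2)) for _ in range(10_000))
)
print(worst <= 4 + 1e-9)  # True
```

Deterministic local boxes give exactly 4, so the bound is tight for local, and hence for quantum, correlations.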

Discussion
One way of viewing the difference between classical and quantum systems is that the structure, or shape, of the space of possible states of a system is different. For example in the case of a classical trit, the state space is the space of probability distributions over trit values, which is geometrically a triangle. In the case of a qubit, the state space is the Bloch ball. This work considers a very general setting in which a whole range of probabilistic models can be defined, with the classical and quantum theories as special cases. There is little constraint on the state space, except that it is assumed to be convex, and joint systems are assumed to satisfy a no-signalling principle and a principle of local tomography. The aim is to investigate the nonlocal correlations that can be produced by measurements on entangled systems in these models, and to compare and contrast with the classical and quantum cases.
The main theorem, with its corollary, states that correlations from a broad class of bipartite states in probabilistic theories cannot be arbitrarily nonlocal: they are constrained to obey the principle of macroscopic locality, or equivalently to lie within the set $Q_1$, which means in particular that they satisfy Tsirelson's bound for violation of the CHSH inequality. This theorem extends to all bipartite quantum states, which explains why quantum mechanics cannot violate macroscopic locality or Tsirelson's bound.
The work has also revealed an intimate and intricate relationship between the shape of the state space for an individual system, and the strength of the nonlocal correlations that can be obtained from two systems in an entangled state. This is illustrated by a family of models, in each of which the state space for a single system is a regular polygon with n vertices. Given two such systems, there is an analogue of a maximally entangled state. It turns out that the strength of nonlocal correlations generated by this state depends dramatically on the parity of the number of vertices n of the local polygon. If n is even, maximally nonlocal correlations can be generated, including those that violate macroscopic locality. If n is odd, however, the maximally entangled state respects macroscopic locality. This is in turn explained by the fact that odd n polygons have a geometric property known as strong self-duality, while even n polygons do not.
It would be natural to think that all bipartite states of strongly self-dual subsystems would respect macroscopic locality, but the house-shaped counterexample shows that this is not the case. An interesting open question, therefore, is the following: What additional property of local state spaces would ensure that all bipartite states give correlations which respect macroscopic locality? One suggestion is the constraint that for any ray extremal effect, there is a unique state on which this effect will occur with certainty. This property is very attractive from a physical point of view. It allows a natural definition of the post-measurement states of these effects, such that repeating a measurement reproduces the same outcome. This extra constraint is indeed not satisfied by the house model, since the effect e 1 occurs with certainty for both states ω 1 and ω 5 , but it is satisfied by odd n polygon models. Another possibility that seems to be plausible is that strong self-duality together with the property that all extremal states of the local systems can be transformed into one another reversibly might limit the set of possible correlations to the ones compatible with macroscopic locality.
Finally, it is worth emphasizing that two theories with almost identical local state spaces can lead to dramatically different nonlocal correlations. In particular, given any finite level of accuracy, it is always possible to find a polygon model with an even and sufficiently large number of vertices n which is locally indistinguishable from the quantum-like case, where the state space is a disc. Nevertheless, while quantum correlations are restricted, any non-signalling correlations can be distilled in the even n polygon model by using multiple copies of the maximally entangled state.

Table A2. Analytical expression for the maximal CHSH violation of polygon boxes (columns: x, $\Delta\alpha_1$, $\Delta\beta_0$, $\Delta\beta_1$, S).