Hyperdense coding and superadditivity of classical capacities in hypersphere theories

In quantum superdense coding, two parties previously sharing entanglement can communicate a two bit message by sending a single qubit. We study this feature in the broader framework of general probabilistic theories. We consider a particular class of theories in which the local state space of the communicating parties corresponds to Euclidean hyperballs of dimension n (the case n = 3 corresponds to the Bloch ball of quantum theory). We show that a single n-ball can encode at most one bit of information, independently of n. We introduce a bipartite extension of such theories for which there exist dense coding protocols such that log_2 (n+1) bits are communicated if entanglement is previously shared by the communicating parties. For n>3, these protocols are more powerful than the quantum one, because more than two bits are communicated by transmission of a system that locally encodes at most one bit. We call this phenomenon hyperdense coding. Our hyperdense coding protocols imply superadditive classical capacities: two entangled systems can encode log_2 (n+1)>2 bits, even though each system individually encodes at most one bit. In our examples, hyperdense coding and superadditivity of classical capacities come at the expense of violating tomographic locality or dynamical continuous reversibility.

I. INTRODUCTION
Classical information can be encoded in quantum systems and reliably recovered. One of the founding results in quantum information science [1] is the Holevo theorem [2]. It implies that, fundamentally, the classical capacity of N qubits is N bits: N qubits can perfectly encode N classical bits, but no more. It follows that classical capacities are additive in quantum theory. This seems quite natural: it would be strange if, by combining two systems that each locally store one bit, more than two bits could be encoded. Nevertheless, due to quantum entanglement, there exist quantum channels whose capacities to communicate classical [3] or quantum [4] information can be superadditive.
Entanglement is also responsible for other counterintuitive aspects of the communication properties of quantum systems. In particular, it is at the basis of superdense coding [5], one of the fundamental protocols of quantum information theory. Though one qubit can locally encode at most one classical bit, in a superdense coding protocol a two bit message can, surprisingly, be communicated by the transmission of a single qubit with the aid of previously shared entanglement.
Though the use of entanglement therefore provides an advantage over purely local protocols, it has a limited communication power because no more than two bits can be communicated by transmission of a qubit independently of the amount of entanglement that the communicating parties share. The amount of quantum information that a transmitted quantum system can communicate is fundamentally limited by its Hilbert space dimension [6]. This guarantees in particular that no more than 2N bits can be communicated in quantum superdense coding using a pair of entangled systems whose individual classical capacity is N bits. A hypothetical protocol violating this quantum limit, which we denote a hyperdense coding protocol in the following, would imply a violation of the additivity property mentioned above.
Our motivation in this paper is to understand better the physical principles underlying the limited power of quantum superdense coding and the additivity of classical capacities. Could one conceive theories with hyperdense coding and superadditive classical capacities? How would such theories differ from quantum theory? What physical principles would they violate that are obeyed by quantum theory? Can the additivity of classical capacities and the nonexistence of hyperdense coding be considered as conditions for physically sensible theories?
We consider a particular class of general probabilistic theories (GPTs) in which the state space of the communicating systems corresponds to a Euclidean ball of arbitrary dimension n ∈ N. These hypersphere theories, as we call them here, have been previously investigated [13,16,19,32-37]. They have important physical motivations. The cases n = 1 and n = 3 correspond locally to the classical and the quantum bit, respectively. The local state space of a generalized bit can be deduced to be an n-ball from physically motivated axioms [19], which include a weak version of information causality [38].
We show that for arbitrary n, a single n-ball can encode at most one bit of information. We then show that, if the space of entangled states is constructed in an appropriate way, there exists a superdense coding protocol whose communication capacity is log_2 (n + 1) bits. Thus, for n > 3, this protocol is more powerful than the quantum one: it is hyperdense coding. Our hyperdense coding protocols imply that by using entangled states, two hypersphere systems can be used to encode log_2 (n + 1) bits, thus achieving superadditive classical capacities when n > 3.
We then turn to the interpretation of these results. We find a connection between superadditivity of classical capacities and hyperdense coding with two of the physical conditions imposed on GPTs in various derivations of finite dimensional quantum theory: continuous reversibility and tomographic locality [12,13,16,19].
This paper is organised as follows. We briefly introduce in Section II the formalism of generalised probabilistic theories and in Section III the specific class of hypersphere theories that we consider here. In Section IV, we investigate different communication scenarios in the context of these theories. We introduce a protocol for hyperdense coding, which implies the superadditivity of classical capacities, as well as for teleportation and entanglement swapping. We conclude with a discussion of the physical implications of our results. In particular, we present in the Appendices several additional results relating the (in)existence of hyperdense coding and superadditivity of classical capacities with physical properties such as tomographic locality and dynamical reversibility.

A. States and effects
In general probabilistic theories (GPTs), the space of unnormalised states is a proper cone $C \subset \mathbb{R}^{n+1}$. The space of unnormalised effects is the dual cone of $C$: $C^* \equiv \{e \in \mathbb{R}^{n+1} \mid e \cdot \omega \geq 0, \ \forall \omega \in C\}$. The unit effect $u$ is an interior point of $C^*$. A measurement is a set of effects that sum to the unit: $M = \{e_i \in C^* \mid \sum_i e_i = u\}$. An unnormalised state $\omega \in C$ has non-negative scalar product with the unit effect, $u \cdot \omega \geq 0$, while a normalised state $\omega \in C$ has unit scalar product with the unit effect, $u \cdot \omega = 1$. Given a normalised state $\omega$ and a measurement $M$, the probability of outcome $i$ is $p(i|M, \omega) = e_i \cdot \omega$, where '$\cdot$' denotes the Euclidean inner product.
For any GPT, without loss of generality we can take the unit to have the form $u = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, where $0$ is the null vector in $\mathbb{R}^n$. The space of normalised states can then be written as $\Omega \equiv \left\{ \omega_r \equiv \begin{pmatrix} 1 \\ r \end{pmatrix} \,\middle|\, r \in R \subset \mathbb{R}^n \right\}$ with $R$ a convex set. It is the convex hull of its extremal points, the pure states, which cannot be expressed as convex combinations of other states. The states in $\Omega$ that are not pure are called mixed.
The unnormalised states can be expressed as $\omega' = \lambda \omega$ with $\lambda \geq 0$ and $\omega \in \Omega$ being a normalised state. The zero state is the null vector $0 \in \mathbb{R}^{n+1}$. The space of physical states is often defined as the set of unnormalised states $\omega'$ such that $u \cdot \omega' \leq 1$, which is the convex hull of the zero state and the set of normalised states $\Omega$. An unnormalised state corresponds to a preparation in which a particular outcome is obtained with a probability smaller than 1. By considering all the outcomes obtained in a state preparation, we can consider that a state is always prepared with unit probability; hence, we can restrict the space of physical states to the set of normalised states $\Omega$. Here, we only consider normalised states.
The space of physically realisable effects is denoted $\mathcal{E}$. Any measurement corresponds to a set of effects belonging to $\mathcal{E}$. We assume here that the set of all normalised effects is observable, that is, $\mathcal{E} = \{e \in C^* \mid e \cdot \omega \leq 1, \ \forall \omega \in \Omega\}$, which is convex. It can be argued that this does not need to hold in general, and that $\mathcal{E}$ could as well be restricted to a proper subset of the normalised effects [26].

B. Bipartite systems
Let $\Omega_A \subset \mathbb{R}^{n_A+1}$ and $\Omega_B \subset \mathbb{R}^{n_B+1}$ be the state spaces of the systems A and B, and $\Omega_{AB}$ their joint state space. In general, $\Omega_{AB}$ can be defined arbitrarily. Unless otherwise stated, in the rest of this work we assume that a state $\phi \in \Omega_{AB}$ defines the outcome probabilities for each pair of effects $e_A \in \mathcal{E}_A$ and $e_B \in \mathcal{E}_B$, and that it satisfies two natural physical conditions: the no-signalling principle and tomographic locality [12,20,21]. The no-signalling principle states that the outcome probabilities for any measurement performed on the system A are independent of what measurement is performed on the system B and vice versa. Tomographic locality states that all states $\phi \in \Omega_{AB}$ are characterized by the outcome probabilities of local measurements performed on A and B.
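It follows [20,21] from these conditions that $\Omega_{AB} \subset \mathbb{R}^{n_A+1} \otimes \mathbb{R}^{n_B+1}$ and that
$$\Omega_A \otimes_{\min} \Omega_B \subseteq \Omega_{AB} \subseteq \Omega_A \otimes_{\max} \Omega_B ,$$
where $\Omega_A \otimes_{\min} \Omega_B \equiv \text{convex hull}\{\omega_A \otimes \omega_B \mid \omega_A \in \Omega_A, \ \omega_B \in \Omega_B\}$ is the minimal tensor product and $\Omega_A \otimes_{\max} \Omega_B \equiv \{\phi \in \mathbb{R}^{n_A+1} \otimes \mathbb{R}^{n_B+1} \mid (e_A \otimes e_B) \cdot \phi \geq 0 \ \forall e_A \in \mathcal{E}_A, e_B \in \mathcal{E}_B, \ (u_A \otimes u_B) \cdot \phi = 1\}$ is the maximal tensor product. Similar definitions can be given for the space of effects: $\mathcal{E}_A \otimes_{\min} \mathcal{E}_B \subseteq \mathcal{E}_{AB} \subseteq \mathcal{E}_A \otimes_{\max} \mathcal{E}_B$. The minimal tensor product $\Omega_A \otimes_{\min} \Omega_B$ includes all the separable states but does not contain any entangled states. The maximal tensor product $\Omega_A \otimes_{\max} \Omega_B$ contains all states that are consistent with the no-signalling principle and that give valid probabilities to all local measurements. The states in $\Omega_{AB}$ that are not separable are called entangled.
We define the unit effect of $\Omega_{AB}$ as $u_{AB} \equiv u_A \otimes u_B$. It follows that all states $\phi \in \Omega_{AB}$ can be expressed as matrices
$$\phi = \begin{pmatrix} 1 & b^t \\ a & C \end{pmatrix}, \tag{1}$$
where $a \in \mathbb{R}^{n_A}$, $b \in \mathbb{R}^{n_B}$ and $C \in \mathbb{R}^{n_A} \otimes \mathbb{R}^{n_B}$. Similarly, all effects $E \in \mathcal{E}_{AB}$ can be expressed as matrices
$$E = \begin{pmatrix} \gamma & \beta^t \\ \alpha & \Gamma \end{pmatrix}, \tag{2}$$
where $\gamma \in \mathbb{R}$, $\alpha \in \mathbb{R}^{n_A}$, $\beta \in \mathbb{R}^{n_B}$ and $\Gamma \in \mathbb{R}^{n_A} \otimes \mathbb{R}^{n_B}$. The probability of obtaining the outcome given by effect $E$ if the state is $\phi$ is $E \cdot \phi = \mathrm{Tr}(E^t \phi)$, where '$t$' denotes transposition. The unit effect $u_{AB}$ is a matrix of the form (2) with $\gamma = 1$ and all other entries zero. The reduced states $\omega_A = \begin{pmatrix} 1 \\ a \end{pmatrix}$ and $\omega_B = \begin{pmatrix} 1 \\ b \end{pmatrix}$ must satisfy $\omega_A \in \Omega_A$ and $\omega_B \in \Omega_B$. Thus, it is required that $a \in R_A$ and $b \in R_B$.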
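The matrix notation for bipartite states and effects can be illustrated with a small numerical sketch. It assumes the convention that rows index system A and columns index system B, so that a product state is the outer product of the two local state vectors; the specific vectors below are hypothetical values chosen only for illustration.

```python
def outer(a, b):
    """Product state (or product effect) as the matrix a b^t; rows index A, columns index B."""
    return [[x * y for y in b] for x in a]

def inner(E, phi):
    """E . phi = Tr(E^t phi), i.e. the sum of entry-wise products."""
    return sum(e * p for row_e, row_p in zip(E, phi) for e, p in zip(row_e, row_p))

def dot(e, w):
    """Euclidean inner product of two vectors."""
    return sum(x * y for x, y in zip(e, w))

# Hypothetical local states (1, a), (1, b) and local effects, with n_A = 2 and n_B = 1.
omega_A = [1.0, 0.5, -0.25]
omega_B = [1.0, 0.5]
e_A = [0.5, 0.5, 0.0]
e_B = [0.5, -0.5]
u_B = [1.0, 0.0]          # unit effect on B

phi = outer(omega_A, omega_B)                 # product state omega_A (x) omega_B
joint = inner(outer(e_A, e_B), phi)           # joint outcome probability
marginal_A = inner(outer(e_A, u_B), phi)      # Alice's marginal probability
reduced_A = [row[0] for row in phi]           # first column of phi: the reduced state (1, a)
```

For a product state the joint probability factorises into the two local probabilities, and the first column of the matrix reproduces Alice's reduced state.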

C. Dynamics
A GPT specifies the state spaces $\Omega_A$, $\Omega_B$ and $\Omega_{AB}$ for all physical systems A and B. It also specifies the set of allowed transformations on the state spaces. We adopt the following consistency condition as the minimal physical condition that the allowed transformations must satisfy [20]. The set of allowed transformations $\mathcal{T}_A$ on a system A must satisfy $T_A : \Omega_A \to \Omega_A$ and $(T_A \otimes I_B) : \Omega_{AB} \to \Omega_{AB}$ for all $T_A \in \mathcal{T}_A$, where $\Omega_{AB}$ is the joint state space of A and any system B, and $I_B$ is the identity map on B.
In general, the allowed transformations can be represented as linear maps [20]. Thus, we express any transformation $T_A \in \mathcal{T}_A$ as a matrix in $\mathbb{R}^{n_A+1} \otimes \mathbb{R}^{n_A+1}$.
We notice that according to the conditions given above, the allowed transformations must be normalisation-preserving, that is, normalised states are transformed into normalised states. It is often considered that transformations $T$ can output an unnormalised state $\omega' = T\omega$, with $\omega \in \Omega$, such that $0 \leq u \cdot \omega' < 1$. These are normalisation-decreasing transformations and correspond to obtaining an outcome with probability less than 1. However, by considering all the possible outcomes, the transformations reduce to normalisation-preserving ones. Here, we only consider normalisation-preserving transformations, as given by the consistency condition.

III. HYPERSPHERE THEORIES
We introduce in this section a family of GPTs that we call hypersphere theories. These theories have been studied before [13,16,19,32-37]. The state space of single systems in hypersphere theories (HSTs) is defined as the unit ball of dimension $n$, $\Omega \equiv \left\{ \omega_r \equiv \begin{pmatrix} 1 \\ r \end{pmatrix} \,\middle|\, r \in R \right\}$ with $R \equiv \{ r \in \mathbb{R}^n \mid \|r\| \leq 1 \}$, for which the set of pure states $\|r\| = 1$ defines a hypersphere. The extremal effects are $e_m = \frac{1}{2} \begin{pmatrix} 1 \\ m \end{pmatrix}$ with $m \in \mathbb{R}^n$ and $\|m\| = 1$, defining also a hypersphere. Each unit vector $m$, $\|m\| = 1$, defines a canonical measurement: $M_m = \{e_m, e_{-m}\}$.
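As a minimal sketch of these definitions, the following computes the outcome probabilities of a canonical measurement on a single hypersphere system, which reduce to $(1 \pm m \cdot r)/2$. The helper names and the example vectors are illustrative assumptions, not part of the original text.

```python
import math

def state(r):
    """Normalised state omega_r = (1, r) for a vector r in the unit n-ball."""
    assert math.sqrt(sum(x * x for x in r)) <= 1 + 1e-12, "r must lie in the unit ball"
    return [1.0] + list(r)

def effect(m, sign=+1):
    """Extremal effect e_{+m} or e_{-m} = (1/2)(1, +-m) for a unit vector m."""
    assert abs(math.sqrt(sum(x * x for x in m)) - 1) < 1e-12, "m must be a unit vector"
    return [0.5] + [0.5 * sign * x for x in m]

def prob(e, w):
    """Outcome probability e . omega (Euclidean inner product)."""
    return sum(a * b for a, b in zip(e, w))

# Example in n = 5: canonical measurement along the first axis on a mixed state.
m = [1.0, 0.0, 0.0, 0.0, 0.0]
r = [0.6, 0.0, 0.3, 0.0, 0.0]
p_plus = prob(effect(m, +1), state(r))    # equals (1 + m.r)/2
p_minus = prob(effect(m, -1), state(r))   # equals (1 - m.r)/2
```

The two probabilities sum to one for every state in the ball, as required for a measurement.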

A. Bipartite systems
We now introduce a particular extension of HSTs to two systems that, as we show in the next section, has hyperdense coding, superadditive classical capacities, teleportation and entanglement swapping.
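Furthermore, it follows easily from (3) that $D_N$ is closed under the element-wise product $\circ$:
$$d_\mu \circ d_{\mu'} \in D_N \quad \text{for all } \mu, \mu' \in \{0,1\}^N. \tag{4}$$
Thus $D_N$ is a group under the element-wise product, with $d_0$ being the identity element. Moreover, the elements of $D_N$ satisfy
$$\sum_{\mu \in \{0,1\}^N} (d_\mu)_\nu = 2^N \delta_{\nu,0}, \tag{5}$$
$$d_\mu \cdot d_{\mu'} = 2^N \delta_{\mu,\mu'}. \tag{6}$$
Let us now define the set $\Omega_{AB}$ of bipartite states for systems A and B that are locally described by HST systems of dimension $n = 2^N - 1$. We take $\Omega_{AB}$ to be the convex hull of the product states and the discrete set of entangled states $S_N \equiv \{\phi_\mu\}_{\mu \in \{0,1\}^N}$, defined as the set of diagonal matrices whose entries are given by
$$(\phi_\mu)_{\nu,\nu'} \equiv \delta_{\nu,\nu'} (d_\mu)_\nu, \tag{7}$$
where $\nu, \nu' \in \{0,1\}^N$. The corresponding reduced states for systems A and B are the completely "mixed" states $\omega_A = \omega_B = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$.
We introduce a discrete set of entangled effects
$$E_\mu \equiv 2^{-N} \phi_\mu, \tag{8}$$
$$F_N \equiv \{E_\mu\}_{\mu \in \{0,1\}^N}. \tag{9}$$
This set defines a measurement, $\sum_\mu E_\mu = u_{AB}$, as follows from (5), (7) and (8). The probability to get outcome $\mu$ when measuring the entangled state $\phi_{\mu'}$ is
$$E_\mu \cdot \phi_{\mu'} = \delta_{\mu,\mu'}, \tag{11}$$
as can be deduced from (6)-(8).
Finally, we define the allowed local transformations as the convex hull of the discrete set $T_N \equiv \{T_\mu\}_{\mu \in \{0,1\}^N}$, where
$$(T_\mu)_{\nu,\nu'} \equiv \delta_{\nu,\nu'} (d_\mu)_\nu. \tag{12}$$
Note that $T_0 = I$ is the identity in $\mathbb{R}^{2^N}$. Furthermore, from (4), (7) and (12), we have
$$T_\mu \phi_{\mu'} \in S_N, \qquad T_\mu \phi_0 = \phi_\mu, \tag{13}$$
for all $\mu, \mu' \in \{0,1\}^N$. Thus, we have that $T_N$ is a group under matrix multiplication. Since the vectors $d_\mu$ have only $\pm 1$ entries with an even number of $-1$ entries for $N \geq 2$, it follows from (7) and (12) that $\det T_\mu = 1$. It follows that the restriction $\tilde{T}_\mu$ of $T_\mu$ to the components $r$ of the local states is a rotation, $\tilde{T}_\mu \in SO(2^N - 1)$.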
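The properties used in this construction can be checked numerically. The sketch below assumes that the vectors $d_\mu$ take the Walsh-Hadamard form $(d_\mu)_\nu = (-1)^{\mu \cdot \nu}$; this explicit form is an assumption made here for illustration (it is one family of $\pm 1$ vectors with the group and orthogonality properties described above), and the effects are taken to be $E_\mu = 2^{-N} \phi_\mu$. Since all matrices involved are diagonal, only their diagonals are stored.

```python
from itertools import product

N = 3
labels = list(product((0, 1), repeat=N))   # mu, nu range over {0,1}^N
dim = len(labels)                          # 2^N = n + 1

def d(mu):
    """Assumed form of d_mu: entries (-1)^(mu . nu), nu in {0,1}^N."""
    return [(-1) ** sum(a * b for a, b in zip(mu, nu)) for nu in labels]

def xor(mu, mup):
    """Bitwise addition modulo 2 of two labels."""
    return tuple(a ^ b for a, b in zip(mu, mup))

# Group closure under the element-wise product (labels add modulo 2 in this form).
closure_ok = all(
    [x * y for x, y in zip(d(mu), d(mup))] == d(xor(mu, mup))
    for mu in labels for mup in labels
)

# E_mu . phi_mu' = Tr(E_mu^t phi_mu') = 2^-N sum_nu (d_mu)_nu (d_mu')_nu for diagonal matrices.
def overlap(mu, mup):
    return sum(x * y for x, y in zip(d(mu), d(mup))) / dim

discrimination_ok = all(
    overlap(mu, mup) == (1 if mu == mup else 0)
    for mu in labels for mup in labels
)

# Diagonal of sum_mu E_mu; the unit effect u_AB has diagonal (1, 0, ..., 0).
effect_sum = [sum(d(mu)[k] for mu in labels) / dim for k in range(dim)]

# Each d_mu has an even number of -1 entries, so each T_mu has unit determinant.
even_ok = all(d(mu).count(-1) % 2 == 0 for mu in labels)
```

Under this assumed form, the effects sum to the unit effect and discriminate the entangled states perfectly, which is what the dense coding protocol relies on.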

B. Consistency of the above definitions
We show that the above definitions make sense in the GPT framework considered here, i.e., that the set Ω AB of bipartite states is contained in the maximal tensor product Ω A ⊗ max Ω B , that the set of effects F N is contained in the space of normalized effects E AB of Ω AB , and that the local transformations T N are consistent in the sense that T µ : Ω A → Ω A and (T µ ⊗ I B ) : Ω AB → Ω AB for all T µ ∈ T N . Additionally, we show that tomographic locality is satisfied.
In addition, tomographic locality is satisfied. It is straightforward to see that the entries (φ) ν,ν ′ of the states φ ∈ Ω AB are determined by the local measurements on systems A and B, respectively, where v ν ∈ R 2 N −1 is a vector whose νth entry equals unity and whose other entries are zero.
Finally, we verify that F N ⊂ E AB . We need to show that, for any E µ ∈ F N , we have: We show i). From (8), (12) and (14), we have We show ii). This is implied by (11).

A. Classical capacities
Consider a situation where Alice sends Bob a classical message $x$ with probability $p(x)$ by encoding it in a GPT state $\omega_x$ and where Bob decodes the message using a decoding measurement $M = \{e_y \in \mathcal{E} \mid \sum_y e_y = u\}$. Given that the message $x$ was sent, Bob obtains the outcome $y$ with probability $p(y|x) = e_y \cdot \omega_x$. The mutual information
$$I(X:Y) \equiv H(Y) - H(Y|X) = \sum_{x,y} p(x)\, p(y|x) \log_2 \frac{p(y|x)}{p(y)}, \qquad p(y) \equiv \sum_x p(x)\, p(y|x),$$
quantifies the amount of classical information that is transmitted through such a protocol.
Classical capacity of a GPT. The classical capacity χ C (Ω) of a GPT with state space Ω is the maximum of I(X : Y ) over all probability distributions p(x), encoding states ω x ∈ Ω, and decoding measurements M .
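For a fixed encoding and decoding, the quantity being maximised is the Shannon mutual information of the classical channel $p(y|x)$. A minimal sketch of its evaluation, with an illustrative function name and example channels that are not taken from the original text:

```python
import math

def mutual_information(p_x, p_y_given_x):
    """I(X:Y) in bits for a prior p_x[x] and a channel p_y_given_x[x][y]."""
    n_y = len(p_y_given_x[0])
    # Marginal distribution of Bob's outcome.
    p_y = [sum(p_x[x] * p_y_given_x[x][y] for x in range(len(p_x))) for y in range(n_y)]
    total = 0.0
    for x, px in enumerate(p_x):
        for y in range(n_y):
            pxy = px * p_y_given_x[x][y]
            if pxy > 0:  # terms with zero probability contribute nothing
                total += pxy * math.log2(p_y_given_x[x][y] / p_y[y])
    return total

# A noiseless binary channel carries one bit; a fully noisy one carries none.
noiseless = mutual_information([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
noisy = mutual_information([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]])
```

The classical capacity is the maximum of this quantity over priors, encoding states and decoding measurements; the sketch evaluates it only for one fixed choice.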
The classical capacity of d-dimensional classical theory (a classical dit) is $\log_2 d$. Similarly, the classical capacity of a qudit is $\log_2 d$, as follows from the Holevo bound [2]. The following proposition allows one to put upper bounds on the classical capacities of GPTs. It will prove useful below.
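Proposition 1. Consider a GPT with state space $\Omega \equiv \left\{ \omega_r \equiv \begin{pmatrix} 1 \\ r \end{pmatrix} \,\middle|\, r \in R \subset \mathbb{R}^n \right\}$ and unit effect $u \equiv \begin{pmatrix} 1 \\ 0 \end{pmatrix}$. Assume that any effect $e \in \mathcal{E}$ can be expressed as $e = \gamma \begin{pmatrix} 1 \\ m \end{pmatrix}$, with $\gamma \in [0, 1]$ and $m \in M \subset \mathbb{R}^n$. Then the classical capacity of the GPT $\Omega$ is bounded by
$$\chi_C(\Omega) \leq \max_{r \in R,\, m \in M} \log_2 (1 + m \cdot r).$$
Proof. Let $X$ be the random variable corresponding to messages $x$, chosen with probability $p(x) = p_x$, encoded in states $\omega_x \equiv \begin{pmatrix} 1 \\ r_x \end{pmatrix}$, where $r_x \in R$. Let $Y$ be the random variable of the measurement outcomes $y$ corresponding to effects $e_y \equiv \gamma_y \begin{pmatrix} 1 \\ m_y \end{pmatrix}$, where $m_y \in M$. Since the effects sum to the unit effect, we have
$$\sum_y \gamma_y = 1, \qquad \sum_y \gamma_y m_y = 0. \tag{17}$$
The outcome probabilities are
$$p(y|x) = e_y \cdot \omega_x = \gamma_y (1 + m_y \cdot r_x). \tag{18}$$
The condition $p(y|x) \geq 0$ implies that
$$1 + m_y \cdot r_x \geq 0 \tag{19}$$
for $\gamma_y > 0$. Notice that the case $\gamma_y = 0$ corresponds to probabilities $p(y|x) = 0$, which have null contributions to the Shannon entropies $H(Y|X)$ and $H(Y)$, and hence to the mutual information $I(X:Y)$. Thus, without loss of generality we consider that $\gamma_y > 0$, for which (19) holds.
From the definition of the mutual information $I(X:Y)$ and (18), it is straightforward to obtain the expression
$$I(X:Y) = \sum_{x,y} p_x \gamma_y (1 + m_y \cdot r_x) \log_2 (1 + m_y \cdot r_x) - \sum_y \gamma_y (1 + m_y \cdot \bar{r}) \log_2 (1 + m_y \cdot \bar{r}),$$
where $\bar{r} \equiv \sum_x p_x r_x$. Let $D \equiv \max_{r_x \in R,\, m_y \in M} \log_2 (1 + m_y \cdot r_x)$. We have from this expression, the definition of $D$, and (17), that
$$I(X:Y) \leq D - \sum_y \gamma_y (1 + m_y \cdot \bar{r}) \log_2 (1 + m_y \cdot \bar{r}).$$
Denote $z_y = 1 + m_y \cdot \bar{r}$. We have $z_y \geq 0$ and $\sum_y \gamma_y z_y = 1$, with $\gamma_y \geq 0$ a probability distribution since $\sum_y \gamma_y = 1$. Hence we have
$$\sum_y \gamma_y z_y \log_2 z_y \geq \Big( \sum_y \gamma_y z_y \Big) \log_2 \Big( \sum_y \gamma_y z_y \Big) = 0,$$
where we have used convexity of $x \log_2 x$ for $x \geq 0$. Therefore $I(X:Y) \leq D$.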
Proposition 2. The classical capacity of hypersphere theories is equal to 1 bit.
Proof. It is immediate to check that the channel in which Alice prepares pure states $\omega_r$ and $\omega_{-r}$, with $\|r\| = 1$, each with probability $\frac{1}{2}$, and Bob carries out the canonical measurement $M_r = \{e_r, e_{-r}\}$, has capacity 1 bit. Hence the capacity of hypersphere theories is at least 1 bit.
The converse follows from Proposition 1, as in the case of hypersphere theories we have $\max_{m \in M} \|m\| = \max_{r \in R} \|r\| = 1$, so that $m \cdot r \leq 1$ and $D = \log_2 2 = 1$.
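The two halves of this argument can be illustrated numerically. The sketch below evaluates the mutual information for the antipodal encoding used in the proof, and for a three-state "trine" encoding in the 2-ball with a three-outcome measurement; the trine example is a hypothetical choice added here to show an encoding that stays below the one-bit bound.

```python
import math

def mutual_information(p_x, channel):
    """I(X:Y) in bits for a prior p_x[x] and a channel given as rows channel[x][y]."""
    p_y = [sum(px * row[y] for px, row in zip(p_x, channel)) for y in range(len(channel[0]))]
    return sum(px * row[y] * math.log2(row[y] / p_y[y])
               for px, row in zip(p_x, channel)
               for y in range(len(row)) if row[y] > 0)

def channel_from(states, effects):
    """p(y|x) = gamma_y (1 + m_y . r_x) for states (1, r_x) and effects gamma_y (1, m_y)."""
    return [[g * (1 + sum(a * b for a, b in zip(m, r))) for g, m in effects] for r in states]

# Antipodal pure states with the canonical measurement, here in n = 4.
n = 4
r = [1.0] + [0.0] * (n - 1)
minus_r = [-x for x in r]
antipodal = channel_from([r, minus_r], [(0.5, r), (0.5, minus_r)])
I_antipodal = mutual_information([0.5, 0.5], antipodal)

# Three trine states in the 2-ball, measured with effects (1/3)(1, m_y).
trine = [[math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)] for k in range(3)]
trine_channel = channel_from(trine, [(1 / 3, m) for m in trine])
I_trine = mutual_information([1 / 3] * 3, trine_channel)
```

The antipodal encoding attains exactly one bit, while the trine encoding yields one third of a bit, consistent with the upper bound of one bit.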
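Superadditivity of classical capacities. The classical capacity of a GPT with state spaces $\Omega_A$, $\Omega_B$ and $\Omega_{AB}$ is superadditive if $\chi_C(\Omega_{AB}) > \chi_C(\Omega_A) + \chi_C(\Omega_B)$. The classical capacities of classical and quantum theory are additive: $\chi_C(\Omega_{AB}) = \chi_C(\Omega_A) + \chi_C(\Omega_B)$. We show below that the hypersphere theories defined in Section III have superadditive classical capacities.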

B. Dense coding
The classical capacity of a GPT is the maximum classical information that can be transmitted without the assistance of previously shared resources. The classical capacity can sometimes be enhanced by the use of previously shared entanglement, as in the following general dense coding protocol.
In a dense coding protocol, Alice and Bob initially share a bipartite state $\phi \in \Omega_{AB}$. Alice chooses a message $x$ with probability $p_x$ from a finite set and applies the local transformation $T_x \in \mathcal{T}_A$ on her system A. After Alice's operation, the state is transformed into $\phi_x \equiv T_x \phi$. Alice sends Bob her system. After receiving system A, Bob applies a measurement on the composite system AB, defined by the set of effects $\{E_y \in \mathcal{E}_{AB} \mid \sum_y E_y = u_{AB}\}$, and obtains the outcome $y$ with probability $p(y|x) = E_y \cdot \phi_x$. Note that following our notation given by (1) and (2), $E_y$ and $\phi_x$ are real matrices, $T_x \phi$ denotes matrix multiplication, and $E_y \cdot \phi_x = \mathrm{Tr}(E_y^t \phi_x)$, where '$t$' denotes transposition.
Dense coding capacity of a state. The dense coding capacity $\chi_{DC}(\phi)$ of a state $\phi \in \Omega_{AB}$ is $\chi_{DC}(\phi) \equiv \max I(X:Y)$, where the max is taken over all dense coding protocols in which the initially shared state is $\phi$ and where $I(X:Y)$ is the mutual information between $x$ and $y$.
Dense coding capacity of a GPT. The dense coding capacity χ DC (Ω AB ) of a GPT with state space Ω AB is the maximum of χ DC (φ) over all states φ ∈ Ω AB .
Note that we trivially have the following inequality, since a dense coding protocol can be viewed as a communication protocol in which the states $T_x \phi$ are prepared:
Proposition 3. The classical and dense coding capacities of a GPT with bipartite state space $\Omega_{AB}$ satisfy $\chi_{DC}(\Omega_{AB}) \leq \chi_C(\Omega_{AB})$.
The terms 'dense coding' and 'superdense coding' are usually treated as synonyms in quantum theory. We use here the terminology 'superdense coding' as follows.
Superdense coding. A superdense coding (SDC) protocol is a dense coding protocol implemented with a state $\phi \in \Omega_{AB}$ whose capacity satisfies $\chi_{DC}(\phi) > \chi_C(\Omega_A)$.
We prove now that superdense coding requires some type of entangled states; hence, it is impossible in classical probabilistic theories.
Proposition 4. Superdense coding is impossible if $\Omega_{AB} = \Omega_A \otimes_{\min} \Omega_B$.
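Proof. Suppose that $\Omega_{AB} = \Omega_A \otimes_{\min} \Omega_B$ and consider the dense coding protocols described above. The initial state $\phi \in \Omega_{AB}$ shared by Alice and Bob is separable, $\phi = \sum_z q_z\, \omega_z \otimes \omega'_z$, so that after Alice's transformation the state is $\phi_x = \sum_z q_z\, \omega_{z,x} \otimes \omega'_z$ with $\omega_{z,x} \equiv T_x \omega_z$. Convexity of the mutual information implies that $I(X:Y) \leq \sum_z q_z I_z(X:Y)$, where $I_z(X:Y)$ is the mutual information between $x$ and $y$ when the initial state is $\omega_{z,x} \otimes \omega'_z$. Hence we can take the initial state to be a product state $\phi = \omega \otimes \omega'$, and drop the label $z$. We can write $p(y|x) = E_y \cdot (\omega_x \otimes \omega') = e'_y \cdot \omega_x$, where $\omega_x \equiv T_x \omega$ and $e'_y \equiv E_y \omega'$. The set $\{e'_y\}_y$ constitutes a measurement on $\Omega_A$ (i.e. $e'_y \cdot \omega \geq 0$ $\forall \omega \in \Omega_A$ and $\sum_y e'_y = u_A$, which are implied by the fact that $0 \leq E_y \cdot \phi$ for all $\phi \in \Omega_{AB}$ and $\sum_y E_y = u_{AB}$). Therefore $p(y|x) = e'_y \cdot \omega_x$ is a probability distribution that can be obtained by a measurement on $\Omega_A$. Therefore $\chi_{DC}(\Omega_{AB}) \leq \chi_C(\Omega_A)$; hence, there cannot be superdense coding.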
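Intuitively, one also expects that superdense coding requires entangled effects, that is, that superdense coding is impossible if $\mathcal{E}_{AB} = \mathcal{E}_A \otimes_{\min} \mathcal{E}_B$. It is easy to show that superdense coding is impossible if Bob's measurement is a convex combination of product measurements:
Proposition 5. Superdense coding is impossible if Bob's measurement is a convex combination of product measurements.
Proof. Let Bob's measurement correspond to effects $E_{y_1,y_2} = \sum_z q_z\, e^{(z)}_{y_1} \otimes f^{(z)}_{y_2}$, with $\{q_z\}$ a probability distribution and $\{e^{(z)}_{y_1}\}_{y_1}$ and $\{f^{(z)}_{y_2}\}_{y_2}$ being measurements on systems A and B, respectively. We show that in this case there cannot be superdense coding, that is, we show that $I(X:Y_1,Y_2) \leq \chi_C(\Omega_A)$, where $I(X:Y_1,Y_2)$ is the mutual information between $x$ and the pair of outcomes $(y_1, y_2)$.
Convexity of the mutual information implies that $I(X:Y_1,Y_2) \leq \sum_z q_z I_z(X:Y_1,Y_2)$, where $I_z(X:Y_1,Y_2)$ is the mutual information between $x$ and $(y_1, y_2)$ for the measurement $\{e^{(z)}_{y_1} \otimes f^{(z)}_{y_2}\}_{y_1,y_2}$. Hence, we can drop the index $z$, and consider a product measurement $E_{y_1,y_2} = e_{y_1} \otimes f_{y_2}$. The probability of outcome $y_1, y_2$ on state $\phi_x = T_x \phi$ is $p(y_1, y_2|x) = (e_{y_1} \otimes f_{y_2}) \cdot \phi_x$. The no-signalling principle implies that $p(y_2|x) = p(y_2)$; hence, $p(y_1, y_2|x) = p(y_2)\, p(y_1|y_2, x)$. Therefore, $I(X:Y_1,Y_2) = \sum_{y_2} p(y_2)\, I(X:Y_1|Y_2 = y_2)$, where $I(X:Y_1|Y_2 = y_2)$ is the mutual information between $x$ and $y_1$ conditioned on the outcome $y_2$. We have that $p(y_1|y_2, x) = p(y_2|x)^{-1} p(y_1, y_2|x) = e_{y_1} \cdot \omega_{x,y_2}$, where $\omega_{x,y_2} \equiv \phi_x f_{y_2} / [(u_A \otimes f_{y_2}) \cdot \phi_x]$. This is because $p(y_2|x) = \sum_{y_1} p(y_1, y_2|x) = (u_A \otimes f_{y_2}) \cdot \phi_x$ and $p(y_1, y_2|x) = (e_{y_1} \otimes f_{y_2}) \cdot \phi_x = e_{y_1} \cdot (\phi_x f_{y_2})$. We also have that $\omega_{x,y_2} \in \Omega_A$ because i) for any effect $e \in \mathcal{E}_A$ it holds that $e \cdot \omega_{x,y_2} \geq 0$, due to the fact that $(e \otimes f_{y_2}) \cdot \phi_x \geq 0$, and ii) $u_A \cdot \omega_{x,y_2} = 1$, as follows from the definition of $\omega_{x,y_2}$. Thus, we have $I(X:Y_1|Y_2 = y_2) \leq \chi_C(\Omega_A)$ for every $y_2$, and therefore $I(X:Y_1,Y_2) \leq \chi_C(\Omega_A)$.
Note that if $\mathcal{E}_{AB} = \mathcal{E}_A \otimes_{\min} \mathcal{E}_B$, the most general measurements do not consist only of convex combinations of product measurements as above, but also include measurements in which each effect is a convex combination of product effects: $E_{y_1,y_2} = \sum_z q_z\, e^{(z)}_{y_1} \otimes f^{(z)}_{y_2}$, where $\{e^{(z)}_{y_1}\}$ and $\{f^{(z)}_{y_2}\}$ do not necessarily need to define independent measurements, i.e., we do not need to have $\sum_{y_1} e^{(z)}_{y_1} = u_A$ (or $\sum_{y_2} f^{(z)}_{y_2} = u_B$). An example of a measurement of this sort that cannot be viewed as a convex sum of product measurements is given in quantum theory by the phenomenon of quantum nonlocality without entanglement [39]. We leave it as an open question whether superdense coding is possible with this class of measurements. A step towards proving this generalization of Proposition 5 is the fact that superdense coding is impossible if $\Omega_{AB}$ consists of a PR box [20,40].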

C. Hyperdense coding
Quantum theory allows superdense coding, since e.g. for the maximally entangled state $\phi$ of two qubits, $\chi_{DC}(\phi) = 2\chi_C(\Omega_A) = 2$ bits, where $\Omega_A$ is the state space of single qubits [5]. However, superdense coding is limited in quantum theory, as $\chi_{DC}(\phi) \leq 2\chi_C(\Omega_A)$ for any $\Omega_A$ and $\phi \in \Omega_{AB}$. We call hyperdense a dense coding protocol overcoming this quantum limitation, that is, one for which $\chi_{DC}(\phi) > 2\chi_C(\Omega_A)$.
Note that if $\chi_C(\Omega_A) = \chi_C(\Omega_B)$, for example if $\Omega_A = \Omega_B$, then hyperdense coding implies superadditive classical capacities: $\chi_C(\Omega_{AB}) \geq \chi_{DC}(\Omega_{AB}) > 2\chi_C(\Omega_A) = \chi_C(\Omega_A) + \chi_C(\Omega_B)$, as follows from Proposition 3 and the given definitions.
Proposition 6. Hyperdense coding is impossible in quantum theory.
Proof. We give two simple arguments within quantum theory. First, the convexity of the mutual information [41] and the convex decomposition of mixed states into pure states imply that the dense coding capacity $\chi_{DC}(\Omega_{AB})$ is achieved by pure states. Furthermore, in the case that the state $\phi \in \Omega_{AB}$ initially shared by Alice and Bob is pure and Alice's quantum system A has Hilbert space dimension $d$, the Schmidt decomposition of pure states [1] implies that the dimension of the Hilbert space accessible from $\phi$ by Alice's transformations on A is no greater than $d^2$. Thus, we have that $\chi_{DC}(\Omega_{AB}) \leq \log_2 d^2 = 2 \log_2 d = 2\chi_C(\Omega_A)$.
For the second, alternative argument, consider classical systems C and D, initially uncorrelated to Alice's and Bob's joint quantum system AB, that record Alice's preparation $x$ and Bob's measurement outcome $y$, respectively. The principle of quantum information causality [6] states that the quantum mutual information between system C and the joint system ABD, after Alice's transmission of a qudit A, satisfies the bound $I_Q(C:ABD) \leq 2 \log_2 d$, independently of how big the Hilbert space dimension of Bob's system B might be. It is easy to see from the data-processing inequality [1] that $I(X:Y) \leq I_Q(C:ABD)$, from which it follows that $\chi_{DC}(\Omega_{AB}) \leq 2 \log_2 d = 2\chi_C(\Omega_A)$.

D. Hyperdense coding in hypersphere theories
We now introduce a dense coding protocol in hypersphere theories of dimension $n = 2^N - 1$ which is hyperdense for $N > 2$. Alice and Bob initially share the entangled state $\phi_0 = I \in S_N \subset \Omega_{AB}$, which is the identity matrix in $\mathbb{R}^{2^N} \otimes \mathbb{R}^{2^N}$, as defined by (7). We consider messages $x \in \{0,1\}^N$. To encode $x$, Alice applies the local transformation $T_x \in T_N$ given by (12). From (12) and (13), the joint state transforms into $T_x \phi_0 = \phi_x \in S_N$. Alice sends the system A to Bob. Bob applies the measurement $M = \{E_y\}_{y \in \{0,1\}^N}$ given by (9) on the joint system AB. The probability that Bob obtains outcome $y$ when Alice encodes the message $x$ is $p(y|x) = E_y \cdot \phi_x = \delta_{y,x}$. Thus, Bob decodes Alice's $N$ bit message perfectly. This shows that
$$\chi_{DC}(\Omega_{AB}) \geq N.$$
Since the classical capacity of individual systems in HSTs is 1 bit, this provides an example of hyperdense coding for $N > 2$. By taking $N \to \infty$, Bob can learn an arbitrarily large amount of information by receiving a system that locally can only encode up to one bit.
The above dense coding protocol can be turned into a classical communication protocol in which Alice encodes message $x$ in the state $\phi_x$ of two HST systems, and Bob decodes the message perfectly using measurement $M = \{E_y\}_{y \in \{0,1\}^N}$. Hence (see Proposition 3)
$$2N \geq \chi_C(\Omega_{AB}) \geq N. \tag{27}$$
Even though each system has capacity of 1 bit, together they have capacity of at least $N$ bits, with arbitrary $N \in \mathbb{N}$. Thus, in the theories defined above, the classical capacities are superadditive. The upper bound of $2N$ (the left hand side of (27)) follows from the fact that the classical capacity of any GPT is bounded by the $\log_2$ of the dimension of the state space [28]; for a pair of HST systems with entangled states as defined above, this dimension is $2^{2N}$, which gives the bound $2N$. Thus, the gap we exhibit between single system classical capacity and two system classical capacity is close to optimal.
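The protocol can be simulated end to end for small $N$. The sketch below is an illustration under stated assumptions rather than the authors' construction: it takes the vectors $d_\mu$ to have the Walsh-Hadamard form $(d_\mu)_\nu = (-1)^{\mu \cdot \nu}$, the transformations $T_x$ and states $\phi_x$ to be the diagonal matrices with diagonal $d_x$, and the effects to be $E_y = 2^{-N} \phi_y$. The helper names are hypothetical.

```python
import math
from itertools import product

N = 3                                      # message length in bits; n = 2**N - 1 = 7
labels = list(product((0, 1), repeat=N))   # messages x and outcomes y
dim = 2 ** N

def d(mu):
    """Assumed Walsh-Hadamard form of d_mu."""
    return [(-1) ** sum(a * b for a, b in zip(mu, nu)) for nu in labels]

def diag(v, scale=1.0):
    """Diagonal matrix with entries scale * v."""
    return [[scale * v[i] if i == j else 0.0 for j in range(dim)] for i in range(dim)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(dim)) for j in range(dim)]
            for i in range(dim)]

def inner(E, phi):
    """E . phi = Tr(E^t phi) = sum over entries of E_ij phi_ij."""
    return sum(E[i][j] * phi[i][j] for i in range(dim) for j in range(dim))

phi0 = diag(d(labels[0]))                        # shared entangled state: identity matrix
T = {x: diag(d(x)) for x in labels}              # Alice's local encodings T_x
E = {y: diag(d(y), 1.0 / dim) for y in labels}   # Bob's joint measurement effects E_y

# Channel p(y|x) = E_y . (T_x phi0); rows are messages, columns are outcomes.
channel = [[inner(E[y], matmul(T[x], phi0)) for y in labels] for x in labels]

# Mutual information for a uniform prior over the 2**N messages.
p_x = 1.0 / dim
p_y = [sum(p_x * row[j] for row in channel) for j in range(dim)]
info = sum(p_x * row[j] * math.log2(row[j] / p_y[j])
           for row in channel for j in range(dim) if row[j] > 0)
```

Under these assumptions the channel is the identity on $2^N$ messages, so `info` equals $N$ bits even though a single system carries at most one bit.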

E. Teleportation and entanglement swapping in hypersphere theories
Quantum teleportation [42] and entanglement swapping [43] are fundamental protocols of quantum information theory. We show how they can also be realized in the context of hypersphere theories considered here. We notice that teleportation and entanglement swapping in GPTs have been studied before [20,22,40]. In particular, it has been shown that a PR box does not admit teleportation or entanglement swapping [20,40]. General conditions on GPTs to support teleportation are given in [22].
We consider the following protocol. Alice has a system A′ in a pure state $\omega_a \in \Omega_{A'}$ that she wants to teleport to Bob's location. To do so, Alice and Bob initially share an entangled state $\phi_0 \in S_N \subset \Omega_{AB}$ given by (7) in the systems A and B, at Alice's and Bob's locations, respectively. Alice applies the measurement defined by the entangled effects $E_x \in F_N \subset \mathcal{E}_{AB}$ given by (9) on her joint system A′A and obtains the outcome $x$ with probability $p_x = 2^{-N}$, for $x \in \{0,1\}^N$. Alice sends Bob her outcome. Bob applies the correction operation $T_x \in T_N$ given by (12) on his system B and obtains, as we show below, the teleported state $\omega_a$ on B with unit probability. From the linearity of the theory, this protocol works too if Alice's input state $\omega_a$ is mixed. In particular, the system A′ can be in an entangled state $\varphi \in \Omega_{A'C}$ with another system C, leading to entanglement swapping.
Let us now show that these protocols for teleportation and entanglement swapping work as claimed.
To do so, we first define a consistent state space for a tripartite system: where Ω AB ≡ convex hull {Ω A ⊗ min Ω B , S N } and S N ≡ {φ µ } µ∈{0,1} N , as before. We define the four party state space in a similar way, We notice that the three and four party state spaces are symmetric under any permutation of the systems, all product states are included, and the set of bipartite entangled states S N is included in any bipartition. It follows that the set of global effects E A ′ ABC includes all the product effects E ⊗E ′ , where E ′ ∈ E BC and E ∈ E A ′ A , in particular for the bipartite entangled effects E, E ′ ∈ F N , for any permutation of the systems A ′ , A, B and C.
The state $\omega_a$ is successfully teleported to system B if, for any measurement on B, the outcome probabilities are those predicted by $\omega_a$. In our notation, this means that
$$p_{\text{tel}}(y|x) = e_y \cdot \omega_a \tag{28}$$
for any effect $e_y \in \mathcal{E}_B$, where the label 'tel' denotes that these are Bob's outcome probabilities after performing the teleportation protocol given above. We show that the probability that Alice and Bob obtain respective outcomes corresponding to the effects $E_x \in F_N$ and $e_y \in \mathcal{E}_B$ is
$$p_{\text{tel}}(x, y) = 2^{-N}\, e_y \cdot \omega_a. \tag{29}$$
Since the probability that Alice obtains outcome $x$ is $p_x \equiv \sum_y p_{\text{tel}}(x, y)$ and since $\sum_y e_y = u$, we have from (29) that $p_x = 2^{-N}$. Thus, we have $p_{\text{tel}}(y|x) \equiv \frac{1}{p_x} p_{\text{tel}}(x, y) = e_y \cdot \omega_a$, as given by (28).
We show (29). Since the states $\phi_x$ and the local transformations $T_x$ are diagonal matrices, as given by (7) and (12), we have from (12) and (13) that $\phi_0 T_x^t = \phi_x$. Using the indices $i$, $j$ and $k$ for the systems A′, A and B, respectively, we obtain that
$$\begin{aligned} p_{\text{tel}}(x, y) &= \sum_{i,j,k} (E_x)_{i,j}\, (\omega_a)_i\, (\phi_0 T_x^t)_{j,k}\, (e_y)_k \\ &= 2^{-N} \sum_{i,j,k} (\phi_x)_{i,j}\, (\omega_a)_i\, (\phi_x)_{j,k}\, (e_y)_k \\ &= 2^{-N} \sum_i \left[ (\phi_x)_{i,i} \right]^2 (\omega_a)_i\, (e_y)_i \\ &= 2^{-N}\, e_y \cdot \omega_a, \end{aligned} \tag{30}$$
where in the second line we used the definition for the effects $E_x$ given by (8), $E_x = 2^{-N} \phi_x$, in the third line we used that the states $\phi_x$ are diagonal matrices, as given by (7), and in the last line we used that the diagonal entries of $\phi_x$ are 1 or −1, as given by (3) and (7).
The introduced teleportation protocol can easily be extended to entanglement swapping. Let Alice's system A′ be initially in an entangled state $\varphi \in \Omega_{A'C}$ with a system C, held by Charlie. As in the teleportation protocol, Alice and Bob share the entangled state $\phi_0 \in \Omega_{AB}$ in systems A and B. Then, Alice and Bob perform the teleportation protocol described above on the tripartite system A′AB.
The entangled state $\varphi$ is successfully swapped from systems A′C to systems BC if the outcome probabilities of any measurement on the joint system BC are those predicted by the state $\varphi$. The probability that Alice obtains the outcome $E_x \in \mathcal{E}_{A'A}$ on the system A′A and Bob obtains (by collaborating with Charlie) the outcome $E'_y \in \mathcal{E}_{BC}$ on the system BC, after he applies the correction $T_x$ on B, is $p_{\text{swap}}(x, y) = p_x\, E'_y \cdot \varphi$, where $p_x \equiv \sum_y p_{\text{swap}}(x, y) > 0$. This follows straightforwardly as in (30), with $p_x = 2^{-N}$.
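The teleportation identity $p_{\text{tel}}(x, y) = 2^{-N} e_y \cdot \omega_a$ can be checked by carrying out the index contraction explicitly. The sketch below is illustrative only: it assumes the Walsh-Hadamard form $(d_\mu)_\nu = (-1)^{\mu \cdot \nu}$, effects $E_x = 2^{-N} \phi_x$, and hypothetical example values for the input state $\omega_a$ and for Bob's effect $e_y$.

```python
from itertools import product

N = 3
labels = list(product((0, 1), repeat=N))
dim = 2 ** N

def d(mu):
    """Assumed Walsh-Hadamard form of d_mu."""
    return [(-1) ** sum(a * b for a, b in zip(mu, nu)) for nu in labels]

def diag(v, scale=1.0):
    """Diagonal matrix with entries scale * v."""
    return [[scale * v[i] if i == j else 0.0 for j in range(dim)] for i in range(dim)]

phi0 = diag(d(labels[0]))   # shared state on A and B (identity matrix)

def p_tel(x, e_y, omega_a):
    """Joint probability of Alice's outcome x on A'A and Bob's effect e_y on B."""
    E_x = diag(d(x), 1.0 / dim)   # Alice's entangled effect on A'A
    T_x = diag(d(x))              # Bob's correction on B
    # The correction acts on the column (B) index of the shared state: phi0 T_x^t.
    corrected = [[sum(phi0[j][l] * T_x[k][l] for l in range(dim)) for k in range(dim)]
                 for j in range(dim)]
    # Indices i, j, k label systems A', A and B, respectively.
    return sum(E_x[i][j] * omega_a[i] * corrected[j][k] * e_y[k]
               for i in range(dim) for j in range(dim) for k in range(dim))

# Hypothetical input state (1, r) with |r| < 1 and an extremal effect (1/2)(1, m).
omega_a = [1.0, 0.3, -0.2, 0.1, 0.0, 0.4, 0.0, -0.5]
e_y = [0.5, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0]
unit = [1.0] + [0.0] * (dim - 1)

target = sum(e * w for e, w in zip(e_y, omega_a))            # e_y . omega_a
joint = {x: p_tel(x, e_y, omega_a) for x in labels}           # p_tel(x, y)
p_outcome = {x: p_tel(x, unit, omega_a) for x in labels}      # p_x, via the unit effect on B
```

Every outcome $x$ occurs with probability $2^{-N}$, and conditioning on it recovers Bob's probability $e_y \cdot \omega_a$, independently of $x$.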

V. DISCUSSION
Here we have introduced hyperdense coding: superdense coding in which more than two bits of information can be communicated by transmission of a system that locally encodes at most one bit. We have presented dense coding protocols in the context of hypersphere theories, in which single systems are described by an Euclidean ball of dimension n. Our protocols are hyperdense when n > 3.
It is well known that if one imposes a sufficient set of axioms on GPTs then one recovers classical or quantum theory [12,13,15,16,19]. Therefore, the theories we have introduced must violate at least one of these axioms. The theories we introduced violate continuous reversibility. Continuous reversibility is an important physical condition that has been imposed on the framework of GPTs in several derivations of finite dimensional quantum theory [12,13,16,19]. It states that for every pair of pure states there exists a continuous reversible transformation that transforms one into the other [12].
The theories that we introduced violate not only continuous reversibility, but even reversibility alone. Indeed, the set of transformations acting on a single system is discrete, see Eq. (12), while the set of local pure states is continuous, given by a hypersphere. Thus, not every pair of pure states describing a single system is connected by a transformation.
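The counting argument above can be illustrated with a minimal sketch. It assumes, as the proof of (29) indicates, that the local transformations act on the Bloch-like vector as diagonal matrices with entries ±1; the function name `sign_orbit` and the sample vectors are hypothetical. Even if all 2^n sign patterns were allowed, the orbit of a pure state would be finite, so a generic second pure state on the sphere is not reached.

```python
from itertools import product

def sign_orbit(r):
    """Orbit of the vector r under all diagonal matrices with entries +1 or -1.

    Assumption: local transformations are diagonal sign matrices. The set used
    in the protocol may be smaller, which only makes the orbit smaller.
    """
    return {
        tuple(s * x for s, x in zip(signs, r))
        for signs in product((1, -1), repeat=len(r))
    }

# A pure state on the unit circle (n = 2) and a second pure state.
start = (0.6, 0.8)
target = (0.8, 0.6)

orbit = sign_orbit(start)
print(len(orbit))        # finite: at most 2**n points
print(target in orbit)   # the second pure state is not reachable
```

The orbit has at most 2^n points, whereas the set of pure states is a continuum, which is the reason reversibility fails in these theories.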
Note further that if we modify the group of transformations $\tilde{\mathcal{T}}$ for the theories given above to be $\tilde{\mathcal{T}} = SO(n)$, so that continuous reversibility is satisfied (at least for local systems), then the space of entangled states must be modified and the hyperdense coding protocols we introduced no longer work. In fact, we obtain in this case that less than one bit is communicated if $n \neq 3$, while 2 bits are communicated if $n = 3$, which corresponds to the Bloch ball (see details in Appendix A). More generally, we have shown, under a variety of technical conditions, that for arbitrary hypersphere theories, hyperdense coding is impossible if one imposes continuous reversibility on local systems (see Appendix B).
In general, continuous reversibility applies not only to single systems with state spaces $\Omega_A$ or $\Omega_B$, but also to composite systems in $\Omega_{AB}$, i.e., for every pair of pure states in $\Omega_{AB}$ there should exist a continuous reversible transformation that transforms one into the other. In line with the above remarks, note that imposing continuous reversibility for composite systems implies that superdense coding is possible in HSTs only if $n = 3$, which corresponds to the Bloch ball. This follows from the result of [19] showing that for a pair of HST systems satisfying continuous reversibility, entangled states only exist for the case $n = 3$. While our theories must thus violate continuous reversibility, note that it is an open question whether they satisfy the weaker requirement that there merely exists a transformation mapping one of the product states to one of the entangled states used in our protocol.
In addition to continuous reversibility, tomographic locality is another physical condition that has been used to derive finite dimensional quantum theory in the framework of GPTs [12,13,16,19]. Tomographic locality is a physical property stating that the description of entangled states is completely fixed by the outcome probabilities of local measurements [12,20]. Violation of tomographic locality means that there are global degrees of freedom that are inaccessible to local observers. The theories that we have discussed satisfy tomographic locality. It is not difficult to build theories that exhibit hyperdense coding and that satisfy local continuous reversibility, but that violate tomographic locality (see Appendix C).
On a more general level, our hyperdense coding protocols have highly unphysical consequences. Indeed, they imply superadditive capacities: the classical capacity of the bipartite system AB is greater than the sum of the local capacities of A and B. A breakdown of additivity suggests that in such theories one cannot define a unit of information, and hence that the whole framework of information theory breaks down. (We note that in quantum theory there is a weak breakdown of additivity, namely the capacity of a specific channel can be superadditive [3]. This has much less dramatic consequences than the superadditivity discussed here).
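The superadditivity statement reduces to simple arithmetic, sketched below under the values stated in this paper: a single n-ball carries at most one bit, and the dense coding protocol communicates log_2(n+1) bits with n = 2^N − 1. The function names are hypothetical and serve only this illustration.

```python
from math import log2

LOCAL_CAPACITY_BITS = 1.0  # capacity of a single n-ball, independent of n

def hyperdense_capacity_bits(N):
    """Bits communicated by the dense coding protocol for n = 2**N - 1."""
    n = 2**N - 1
    return log2(n + 1)

def is_superadditive(N):
    """True if the bipartite capacity exceeds the sum of the two local capacities."""
    return hyperdense_capacity_bits(N) > 2 * LOCAL_CAPACITY_BITS

for N in range(1, 6):
    n = 2**N - 1
    print(n, hyperdense_capacity_bits(N), is_superadditive(N))
```

For N = 2 (the Bloch ball, n = 3) the bipartite value equals the sum of the local capacities, so additivity holds; the strict excess appears only for N > 2.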
Another possible consequence is related to thermodynamics. Indeed, the entropy of a state $\omega$ can be defined, as briefly suggested in [23], as the maximum classical capacity of any encoding/decoding protocol in which Alice sends Bob the state $\omega = \sum_x p_x \omega_x$ on average. This definition coincides with the Shannon and von Neumann entropies in the classical and quantum cases. It differs however from other definitions of entropy in GPTs [23,24]. Our hyperdense coding protocols show that using the definition based on classical capacity, entropy can be superadditive in GPTs. This suggests that statistical mechanics could not be applied to these theories, and that they would not have a macroscopic limit.
For these reasons, it would be interesting to investigate whether additivity of the classical capacity should be taken as a basic physical condition for any reasonable theory. We have suggested that it is related to other properties, including tomographic locality and continuous reversibility. What the detailed relation is, and whether additivity of classical capacities can replace other physical conditions previously explored to distinguish quantum theory, is an open question.

We define precisely the notion of continuous reversibility for local systems.
Local Continuous Reversibility (LCR). For a bipartite system AB, any pair of pure states for the local system A, or B, is connected by a continuous reversible transformation.
We investigate the implications of imposing local continuous reversibility on the dense coding protocol given in the main text. In order to ensure the consistency condition, we modify the space of states and effects in a minimal way, in terms of two parameters $\lambda$ and $\tau$. Recall that $n = 2^N - 1$ is the dimension of the local HST systems, with $N \in \mathbb{N}$. We show that if we impose local continuous reversibility then our dense coding protocols communicate two bits only for $N = 2$ and no more than one bit for $N \neq 2$; hence, they are superdense coding protocols only for $N = 2$, which corresponds to the Bloch sphere.
We consider entangled states and effects similar to the ones in the main text, but modified in terms of two parameters $\lambda$ and $\tau$: where $\tilde{T}_\mu$ is defined as in Eq. (14) of the main text, for $\mu \in \{0, 1\}^N$ and some $\lambda, \tau \in [-1, 1]$ that we specify below. Notice that the case $\lambda = \tau = 1$ corresponds to the states and effects given by Eqs. (7) and (8) of the main text. Equations (5), (7), (12) and (14) of the main text, and Eq. (A1) imply that $\sum_{\mu \in \{0,1\}^N} E^{(\tau)}_\mu = \mathrm{diag}(1, \mathbf{0})$, which is the unit effect $u_{AB}$, for any value of $\tau$.
We impose that local continuous reversibility must be satisfied. We must have that $\mathcal{T}_A = \mathcal{T}_B = \mathcal{T}$, where $\mathcal{T}$ is defined as the set of transformations of the form

$$T = \begin{pmatrix} 1 & \mathbf{0}^t \\ \mathbf{0} & \tilde{T} \end{pmatrix},$$

with $\tilde{T} \in \tilde{\mathcal{T}}$ and $\tilde{\mathcal{T}}$ being a group of continuous reversible transformations that is transitive on the sphere in $\mathbb{R}^{2^N - 1}$ [19]. It follows that $\tilde{\mathcal{T}}$ is a subgroup of $SO(2^N - 1)$ [36]. Since $SO(1) = \{1\}$, there are no continuous reversible transformations for the case $N = 1$. In fact, the case $N = 1$ corresponds to the classical bit, whose transformations are discrete and correspond to $\tilde{\mathcal{T}} = O(1) = \{1, -1\}$.
Then, there is no superdense coding for $N = 1$, since superdense coding does not exist in classical probabilistic theory. Thus, in the following, we only consider $N \geq 2$. For $N \neq 3$, the only possible group of continuous reversible transformations that is transitive on the sphere is $\tilde{\mathcal{T}} = SO(2^N - 1)$, while for $N = 3$, $\tilde{\mathcal{T}}$ can be either $SO(7)$ or $G_2$ [36]. We consider here that $\tilde{\mathcal{T}} = SO(2^N - 1)$ for all $N \in \mathbb{N} \setminus \{1\}$. For a given $\lambda$, in order for the allowed local transformations to satisfy the consistency condition, it must be that $T \varphi^{(\lambda)}_\mu T'^t \in \Omega_{AB}$, for all $T, T' \in \mathcal{T}$ and $\mu \in \{0, 1\}^N$. We thus define $\Omega_{AB} \equiv \text{convex hull}\{\Omega_A \otimes_{\min} \Omega_B, \Lambda^{(\lambda)}\}$, where $\Lambda^{(\lambda)} \equiv \{T \varphi^{(\lambda)}_0 T'^t \,|\, T, T' \in \mathcal{T}\}$ is a continuous set of entangled states, unlike the discrete set $S_N$ defined by the states (7) of the main text. We notice that $\varphi^{(\lambda)}_\mu \in \Lambda^{(\lambda)}$ for all $\mu \in \{0, 1\}^N$. Since $|\lambda| \leq 1$, following Eq. (15) of the main text, it is straightforward to show that $\Omega_{AB} \subseteq \Omega_A \otimes_{\max} \Omega_B$.
Appendix B: Communication capacities of locally continuous hypersphere theories

We prove the following results stating the impossibility of superadditive capacities, hyperdense and superdense coding for particular classes of bipartite HST systems.

Proposition 7.
Let $\Omega_{AB}$ be a bipartite state space satisfying the no-signalling principle, tomographic locality and local continuous reversibility, with $\Omega_A \simeq \Omega_B \simeq \Omega$, where $\Omega$ is a hypersphere system of dimension $n \geq 2$. Consider states $\varphi_x \in \Omega_{AB}$ that encode a message $x$ with probability $p_x$, whose local states are $\omega_{a_x} = \begin{pmatrix} 1 \\ a_x \end{pmatrix} \in \Omega_A$ and $\omega_{b_x} = \begin{pmatrix} 1 \\ b_x \end{pmatrix} \in \Omega_B$, with $a_x, b_x \in \mathbb{R}^n$ and $\|a_x\| \leq 1$, $\|b_x\| \leq 1$. Let $y$ be the outcome of a measurement on the state $\varphi_x$ corresponding to the effect $E_y \in \mathcal{E}_{AB}$ and let $I(X : Y)$ be the mutual information between $x$ and $y$. Let $\mathcal{T}$ be the set of allowed local transformations on $\Omega_A$ and $\Omega_B$. In general, their elements are of the form

$$T = \begin{pmatrix} 1 & \mathbf{0}^t \\ \mathbf{0} & \tilde{T} \end{pmatrix}, \qquad (B1)$$

where the matrices $\tilde{T}$ form the group $\tilde{\mathcal{T}}$. The following properties hold:

i) for states with $a_x = \mathbf{0}$ or $b_x = \mathbf{0}$, we have $I(X : Y) \leq 2$ for odd $n \neq 7$, with equality achieved only for $n = 3$;

ii) if $\tilde{\mathcal{T}} = SO(n)$ and $n$ is odd, then $I(X : Y) \leq 2$ for $a_x = \mathbf{0}$ or $b_x = \mathbf{0}$, with equality achieved only for $n = 3$;

iii) if $\tilde{\mathcal{T}} = SO(n)$ and $n$ is even, then $I(X : Y) \leq 2$ for arbitrary $\varphi_x \in \Omega_{AB}$, and $I(X : Y) \leq 1$ for states with $a_x = \mathbf{0}$ or $b_x = \mathbf{0}$;

iv) if $\tilde{\mathcal{T}} = O(n)$, which includes continuous and discontinuous transformations, we have for any $n$ that $I(X : Y) \leq 2$ for arbitrary $\varphi_x \in \Omega_{AB}$ and $I(X : Y) \leq 1$ for states with $a_x = \mathbf{0}$ or $b_x = \mathbf{0}$;

v) for states that are obtained by local transformations, $\varphi_x = T_x \varphi$, we have $I(X : Y) < 2$ if $\tilde{\mathcal{T}} = SO(n)$ with even $n$, or if $\tilde{\mathcal{T}} = O(n)$ with arbitrary $n$.
Recalling that the classical capacity of a single system in hypersphere theories is 1 bit, no protocol using the above families of states can achieve superadditivity of classical capacities, hyperdense coding or superdense coding, for the respective cases.
The following lemmas are used in the proof of Proposition 7, given above. They provide important constraints on the bipartite states and effects in HSTs for arbitrary state spaces Ω AB limited only by NS, TL and the restriction that Ω A ≃ Ω B ≃ Ω is a HST system of dimension n.

Lemma 1.
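Let $\Omega_{AB}$ be a bipartite state space that satisfies no-signalling and tomographic locality, with $\Omega_A \simeq \Omega_B \simeq \Omega$ being the state space of a HST system of dimension $n$. Any state $\varphi \in \Omega_{AB}$ can be expressed as a matrix

$$\varphi = \begin{pmatrix} 1 & b^t \\ a & C \end{pmatrix}, \qquad (B2)$$

where $a, b \in \mathbb{R}^n$, $C \in \mathbb{R}^n \otimes \mathbb{R}^n$, $\|a\| \leq 1$, $\|b\| \leq 1$ and $\|c_k\| \leq 1$, with $c_k$ being the $k$th column vector of $C$, for $k = 1, 2, \ldots, n$.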
Proof. The principles of no-signalling and tomographic locality imply that $\varphi \in \Omega_{AB}$ can be expressed as the matrix (B2), with $C \in \mathbb{R}^n \otimes \mathbb{R}^n$ and $a, b \in \mathbb{R}^n$, as given by Eq. (1) of the main text, and that $\Omega_A \otimes_{\min} \Omega_B \subseteq \Omega_{AB} \subseteq \Omega_A \otimes_{\max} \Omega_B$ [21]. We show below that for HSTs, $\|a\| \leq 1$, $\|b\| \leq 1$ and $\|c_k\| \leq 1$, for $k = 1, 2, \ldots, n$.
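Since the respective local states on $\Omega_A$ and $\Omega_B$ are $\omega_a = \begin{pmatrix} 1 \\ a \end{pmatrix}$ and $\omega_b = \begin{pmatrix} 1 \\ b \end{pmatrix}$, we have $\|a\| \leq 1$ and $\|b\| \leq 1$.

We show that $\|c_k\| \leq 1$, for $k = 1, 2, \ldots, n$. Let $v_k \in \mathbb{R}^n$ be a column vector whose $k$th entry is equal to unity and all other entries are zero. It follows that $C v_k = c_k$. Let $\|c_k\| > 0$. Consider the following extremal product effects: Using expression (B2) and the property $C v_k = c_k$, we obtain where $b_k$ is the $k$th entry of $b$ and $\lambda \in \{0, 1\}$. Since

Lemma 2.
Let $\mathcal{E}_{AB}$ be the space of normalised effects corresponding to a bipartite state space $\Omega_{AB}$ that satisfies no-signalling and tomographic locality, with $\Omega_A \simeq \Omega_B \simeq \Omega$ being a HST system of dimension $n$. Any effect $E \in \mathcal{E}_{AB}$ can be expressed as a matrix where $0 \leq \gamma \leq 1$, $\alpha, \beta \in \mathbb{R}^n$, $\Gamma \in \mathbb{R}^n \otimes \mathbb{R}^n$, $\|\alpha\| \leq 1$, $\|\beta\| \leq 1$, $\|\gamma_k\| \leq 1$, with $\gamma_k$ being the $k$th column vector of $\Gamma$, for $k = 1, 2, \ldots, n$.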
We complete the proof by showing (B6)-(B9). To do so, we use the expression of $E$ given by (B5). We apply $E$ on particular product states $\varphi$ and use the conditions $0 \leq E \cdot \varphi \leq 1$. These conditions hold because $\Omega_A \otimes_{\min} \Omega_B \subseteq \Omega_{AB}$.
Proof of Proposition 7. Since $\Omega_{AB}$ satisfies the no-signalling principle and tomographic locality, Lemmas 1 and 2 hold. Thus, the states $\varphi_x \in \Omega_{AB}$ and the effects $E_y \in \mathcal{E}_{AB}$ can be expressed as in (B2) and (B4): It follows from (B12) and Proposition 1 that where $\chi \equiv \max\{\chi_{x,y}\}$, $\chi_{x,y} \equiv \alpha_y \cdot a_x + \beta_y \cdot b_x + \Gamma_y \cdot C_x$ and the maximum is taken over all states $\varphi_x \in \Omega_{AB}$ and measurements with effects $E_y \in \mathcal{E}_{AB}$. Let this maximum be achieved by a state $\varphi$ and an effect $E$, for which we drop the $x$ and $y$ labels. From (B12), we have where

The set $\mathcal{T}$ of allowed local transformations on HST systems $\Omega_A \simeq \Omega_B \simeq \Omega$ of dimension $n \geq 2$ that satisfy local continuous reversibility has elements of the form given by Eq. (B1), where the group $\tilde{\mathcal{T}}$ must be transitive on the unit sphere in $\mathbb{R}^n$ [19]. There are various groups with this property. In general, $\tilde{\mathcal{T}}$ is a subgroup of $SO(n)$. For odd $n \neq 7$, the only possibility is $\tilde{\mathcal{T}} = SO(n)$ [36].
From the consistency condition, we have $T \varphi_x T'^t \in \Omega_{AB}$ for all $\varphi_x \in \Omega_{AB}$ and all $T, T' \in \mathcal{T}$; hence, $E_y \cdot (T \varphi_x T'^t) \geq 0$ for all $E_y \in \mathcal{E}_{AB}$. Thus, $E \cdot (T \varphi T'^t) \geq 0$ for all $T, T' \in \mathcal{T}$. It follows from (B14) that where $a' \equiv \tilde{T} a$, $b' \equiv \tilde{T}' b$ and $C' \equiv \tilde{T} C \tilde{T}'^t$, for all $\tilde{T}, \tilde{T}' \in \tilde{\mathcal{T}}$.

We show i). As said above, for odd $n \neq 7$, local continuous reversibility implies that $\tilde{\mathcal{T}} = SO(n)$. Consider Eq. (B15) for the case $a = \mathbf{0}$. Let $\tilde{T} = I$ and let $\tilde{T}' = Q_k$ be a diagonal matrix with all entries equal to $-1$ except for the $k$th entry, which equals unity. Since $n$ is odd, $Q_k$ has an even number of entries equal to $-1$, so that $\tilde{T}, \tilde{T}' \in SO(n)$. It follows that where $b_k$ and $\beta_k$ are the $k$th entries of $b$ and $\beta$, and $c_k$ and $\gamma_k$ are the $k$th column vectors of $C$ and $\Gamma$, respectively. Thus, in the case $a = \mathbf{0}$, we have from (B16) that for $k = 1, 2, \ldots, n$. Since $\beta \cdot b = \sum_{k=1}^n \beta_k b_k$ and $\Gamma \cdot C = \sum_{k=1}^n \gamma_k \cdot c_k$, it follows from (B17) that where the second inequality is achieved only for $n = 3$. Thus, from (B13) and (B18), we have $I(X : Y) \leq 2$, with equality achieved only for $n = 3$. The case $b = \mathbf{0}$ is proved similarly by considering $\tilde{T}' = I$ and $\tilde{T} = Q_k$.

We show ii). Since we assume that $\tilde{\mathcal{T}} = SO(n)$ for arbitrary odd $n$, the proof follows straightforwardly as for i).
We show v). We assume $\tilde{\mathcal{T}} = SO(n)$ and even $n$, or $\tilde{\mathcal{T}} = O(n)$ and arbitrary $n$. In either case we have $-I \in \tilde{\mathcal{T}}$. Considering equation (B15) with $\tilde{T} = I$ and $\tilde{T}' = -I$, we obtain, as in the proof of iii), the expressions (B20) and (B21), from which it follows that $I(X : Y) \leq 2$. However, the equality $I(X : Y) = 2$ cannot be achieved by states that are obtained by local transformations, $\varphi_x = T_x \varphi_0$, as we show. On the one hand, from (B13) and (B20), a necessary condition for the equality $I(X : Y) = 2$ is that $\|a\| = 1$, that is, that the local state $\omega_a \in \Omega_A$ is pure. On the other hand, if $\omega_a$ is pure, it is easy to see that $\varphi$ must be a product state: $\varphi = \omega_a \otimes \omega_b$, for some $\omega_b \in \Omega_B$ [27]. Thus, if $\|a\| = 1$, the bipartite states are product states: $\varphi_x = \omega_{a_x} \otimes \omega_b$. It follows from Proposition 4 that in this case $I(X : Y) \leq 1 < 2$.
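The group membership claims used in the proofs of i) and v) can be checked with a short sketch. It relies only on the fact that a diagonal matrix with entries ±1 is orthogonal and has determinant equal to the product of its diagonal entries; the helper names are hypothetical.

```python
def q_diag(n, k):
    """Diagonal of Q_k: all entries are -1 except the k-th (1-indexed), which is +1."""
    return [1 if i == k else -1 for i in range(1, n + 1)]

def minus_identity_diag(n):
    """Diagonal of -I in dimension n."""
    return [-1] * n

def det_diagonal(diag):
    """Determinant of a diagonal matrix: the product of its diagonal entries."""
    result = 1
    for entry in diag:
        result *= entry
    return result

def is_special_orthogonal(diag):
    """A diagonal +-1 matrix is always orthogonal; it lies in SO(n) iff det = +1."""
    return all(abs(entry) == 1 for entry in diag) and det_diagonal(diag) == 1

for n in (3, 4, 5, 6):
    print(n, is_special_orthogonal(q_diag(n, 1)),
          is_special_orthogonal(minus_identity_diag(n)))
```

The matrix Q_k has n − 1 entries equal to −1, so it belongs to SO(n) exactly when n is odd, while −I belongs to SO(n) exactly when n is even. This is why the argument with Q_k covers odd n and the argument with −I covers even n (or O(n) for any n).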
Appendix C: Locally continuous hyperdense coding violating tomographic locality

We show that relaxing tomographic locality allows hyperdense coding while still satisfying local continuous reversibility.
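Consider local state spaces $\Omega_A \simeq \Omega_B \simeq \Omega^{(n)}_m$, where

$$\Omega^{(n)}_m \equiv \left\{ \omega_r \equiv \begin{pmatrix} 1 \\ \mathbf{0} \\ r \end{pmatrix} \,\middle|\, \mathbf{0} \in \mathbb{R}^n,\ r \in \mathbb{R}^m,\ \|r\| \leq 1 \right\},$$

for $m \in \mathbb{N}$ and $n \in \mathbb{Z}^+$. The pure states satisfy $\|r\| = 1$. The unit effect is $u = \begin{pmatrix} 1 \\ \mathbf{0} \end{pmatrix}$, with $\mathbf{0}$ being the null vector in $\mathbb{R}^{n+m}$. The state space $\Omega^{(n)}_m$ corresponds to an $m$-hypersphere system embedded in a bigger vector space. The space of local effects $\mathcal{E}$ is the convex hull of the zero effect, the unit effect $u$ and the extremal effects $e_r \equiv \frac{1}{2} \begin{pmatrix} 1 \\ \mathbf{0} \\ r \end{pmatrix}$, where $\mathbf{0}$ is the null vector in $\mathbb{R}^n$ and $r \in \mathbb{R}^m$ with $\|r\| = 1$. In what follows we consider $n = 2^N - 1$, with $N \in \mathbb{N} \setminus \{1\}$. The case $N = 1$ corresponds to a classical bit.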
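As a minimal sketch of the embedded state space, the following builds the vectors ω_r = (1, 0, r) and e_r = ½(1, 0, r) described above and evaluates outcome probabilities as the Euclidean inner product e · ω, which gives ½(1 + r · r'). The function names and the sample values n = 3, m = 2 are assumptions made for illustration only.

```python
def state(r, n):
    """State vector (1, 0_n, r) with n padding zeros; r is the m-dimensional part."""
    return [1.0] + [0.0] * n + list(r)

def effect(r, n):
    """Extremal effect (1/2)(1, 0_n, r) for a unit vector r."""
    return [0.5] + [0.0] * n + [0.5 * x for x in r]

def prob(e, w):
    """Outcome probability e . w, taken here as the Euclidean inner product."""
    return sum(a * b for a, b in zip(e, w))

n = 3                      # padding dimension, n = 2**N - 1 with N = 2
r = (1.0, 0.0)             # a pure state of the m = 2 hypersphere system
w = state(r, n)

print(prob(effect(r, n), w))             # aligned effect
print(prob(effect((-1.0, 0.0), n), w))   # antipodal effect
print(prob(effect((0.0, 1.0), n), w))    # orthogonal direction
```

The n padding entries never contribute to local probabilities, which is consistent with the remark that the system is an m-hypersphere embedded in a larger vector space.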
We define the set of allowed local transformations as $\mathcal{T}_A \simeq \mathcal{T}_B \simeq \mathcal{T}$. The elements of $\mathcal{T}$ are defined by where $R \in SO(m)$ and $T_\mu \in \mathcal{T}_N$, for $\mu \in \{0, 1\}^N$, as defined by Eq. (12) of the main text. It is straightforward to see that the local state space $\Omega^{(n)}_m$ remains invariant under the set of allowed local transformations $\mathcal{T}$.
We show that Ω AB ⊆ Ω A ⊗ max Ω B . From the definition of Ω AB , we only need to show that Λ ⊂ Ω A ⊗ max Ω B .
On the other hand, tomographic locality is violated because there exist states in Ω AB , the entangled states in Λ, that cannot be determined from the outcome probabilities of local measurements performed on Ω A and Ω B . This is easily seen from (C3) because the outcome probabilities of local measurements on states Φ µ ∈ Λ are independent of the state Φ µ . Thus, the states in Λ cannot be determined from local measurements.
We introduce hyperdense coding protocols in the state space $\Omega_{AB}$ defined above. Alice and Bob initially share the state $\Phi_0$ given by (C2). With probability $p_x = 2^{-N}$, Alice implements the local transformation $T^{(R)}_x$ defined by (C1) for some $R \in SO(m)$, and the state transforms into $T^{(R)}_x \Phi_0 = \Phi_x$, for $x \in \{0, 1\}^N$. Alice sends Bob her system. Bob applies the joint measurement defined by the effects for $y \in \{0, 1\}^N$, where in the second equality we used the definition (C2), and expression (8) of the main text.

We show that $\{F_y\}_{y \in \{0,1\}^N}$ defines a measurement. First, we show that $\{F_y\}_{y \in \{0,1\}^N} \subset \mathcal{E}_{AB}$. Since $\Omega_{AB} \equiv \text{convex hull}\{\Omega_A \otimes_{\min} \Omega_B, \Lambda\}$ with $\Lambda = \{\Phi_x\}_{x \in \{0,1\}^N}$, we only need to show that $0 \leq F_y \cdot (\omega_a \otimes \omega_b) \leq 1$ and $0 \leq F_y \cdot \Phi_x \leq 1$, for all $x, y \in \{0, 1\}^N$, $\omega_a \in \Omega_A$ and $\omega_b \in \Omega_B$. It is easy to see that $F_y \cdot (\omega_a \otimes \omega_b) = 2^{-N}$ for all $y \in \{0, 1\}^N$, $\omega_a \in \Omega_A$ and $\omega_b \in \Omega_B$. Moreover, from (C2), (C4) and Eq. (11) of the main text, we have

$$F_y \cdot \Phi_x = \delta_{x,y}, \qquad (C5)$$

for all $x, y \in \{0, 1\}^N$. Thus, $\{F_y\}_{y \in \{0,1\}^N} \subset \mathcal{E}_{AB}$.

Second, from (C4) and Eq. (10) of the main text, we have $\sum_{y \in \{0,1\}^N} F_y = \mathrm{diag}(1, \mathbf{0})$, with $\mathbf{0}$ being the null vector in $\mathbb{R}^{n+m}$ and $n = 2^N - 1$, which is the unit effect $u_{AB}$ on $\Omega_{AB}$.
This protocol achieves a mutual information between Alice's and Bob's random variables $X$ and $Y$ of $I(X : Y) = N = \log_2(n + 1)$ bits, as follows from (C5). Since the classical capacity of the systems in $\Omega_A$ or $\Omega_B$ is one bit for any value of $m$, this is a hyperdense coding protocol for $N > 2$.
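The value of the mutual information can be checked numerically with a short sketch. It assumes that Bob's measurement identifies Alice's message without error, so that the joint distribution is p(x, y) = 2^{-N} when x = y and zero otherwise; the function names are hypothetical.

```python
from math import log2

def mutual_information(joint):
    """Mutual information I(X:Y) in bits for a joint distribution given as a matrix."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                total += p * log2(p / (px[i] * py[j]))
    return total

def perfect_decoding_joint(N):
    """Joint distribution for 2**N equiprobable messages decoded without error (assumed)."""
    M = 2**N
    return [[(1.0 / M if x == y else 0.0) for y in range(M)] for x in range(M)]

for N in (2, 3, 4):
    n = 2**N - 1
    print(n, mutual_information(perfect_decoding_joint(N)))
```

Under this assumption the mutual information equals N bits, which exceeds 2 bits as soon as N > 2, that is, for n > 3.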