
Channel discord and distortion


Published 12 August 2021 © 2021 The Author(s). Published by IOP Publishing Ltd on behalf of the Institute of Physics and Deutsche Physikalische Gesellschaft
Citation: Wei-Wei Zhang et al 2021 New J. Phys. 23 083025. DOI: 10.1088/1367-2630/ac180a


Abstract

Discord, originally notable as a signature of bipartite quantum correlation, in fact can be nonzero classically, i.e. arising from noisy measurements by one of the two parties. Here we redefine classical discord to quantify channel distortion, in contrast to the previous restriction of classical discord to a state, and we then show a monotonic relationship between classical (channel) discord and channel distortion. We show that classical discord is equivalent to (doubly stochastic) channel distortion by numerically discovering a monotonic relation between discord and total-variation distance for a bipartite protocol with one party having a noiseless channel and the other party having a noisy channel. Our numerical method includes randomly generating doubly stochastic matrices for noisy channels and averaging over a uniform measure of input messages. Connecting discord with distortion establishes discord as a signature of classical, not quantum, channel distortion.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Discord is often touted as a quantifier of quantum correlations in a state, with nonzero discord said to imply that observed correlations transcend non-quantum (i.e. 'classical') limits [1, 2], akin to, but different from, a Bell inequality [3]. Treated as a quantum resource operationalized by state merging [4], discord in quantum computing protocols [5, 6] is believed by many to deliver a quantum advantage to some protocols [7]. However, this quantum nature of discord has been challenged by stochastic-information theory, which shows that discord is due to noisy measurement by one of the two parties [8]. Essentially, discord can be understood in terms of a protocol amenable to a stochastic-information interpretation, and this interpretation fails if and only if (iff) the two parties share bipartite entanglement [8]. Discord thus serves as a fascinating starting point for studying stochastic information.

Previous work analyzed state discord in the context of classical states [8], i.e. analyzing discord as a signature of classical rather than quantum correlations. Here we introduce the concept of classical channel discord, which is based on averaging over all allowed channel input states for a given channel, with the previous definition used in assessing how much discord is added to a state by the given channel. Our expression for channel discord is amenable to numerical evaluation, which shows channel discord is monotonic with respect to channel distortion. We augment this numerical analysis by solving analytically the small but nontrivial two-bit case (each of two parties holds one bit) to confirm our numerics for this case and establish a path for proving discord–distortion monotonicity, which is a challenging calculation as we show. Our monotonicity result establishes meaningfully that channel discord and distortion are essentially equivalent.

Mathematically, discord in a two-party state refers to an apparent discrepancy between two expressions for mutual information obtained by two parties, named here Alice (A) and Bob (B): one quantity depends on joint probability and the other on conditional probability. The usual view of the reason for quantum discord in a state is that conditional information must be adapted to the quantum case by introducing measurement, and this incompatibility gives rise to nonzero discord [2]. Quantum discord in a state has been shown to be equivalent to classical discord iff entanglement between the two parties is zero, and this equivalence has been explained by showing that, in the absence of entanglement, discord represents one party, namely Bob, suffering from noisy measurement whereas Alice's measurements are ideal [8]. Our goal is to show that classical channel discord, obtained by averaging channel-added discord over all allowed channel input states, is equivalent to channel distortion by establishing a monotonic relation between discord and total-variation, or Kolmogorov, distance, which quantifies channel distortion. We analyze the general case numerically and solve analytically for the two-bit case.

Our article is structured as follows. In section 2 we summarise essential background on stochastic information, discord, total-variation distance and doubly stochastic channels. Our approach is described in section 3, which elaborates on our model of a noisy protocol for creating channel discord; section 3 also presents our notation and mathematical expressions, and our methods for solving these expressions numerically. Subsequently, in section 4, we present our numerical results and explain the plots, and we discuss the results thoroughly in section 5. In section 6, we summarise our claims and provide an outlook. In appendices A and B we introduce convenient notation for probability vectors and what we call Hadamard calculus, respectively, which are fundamental to our approach.

2. Background

In this section we discuss the background and context for our work. In section 2.1 we discuss informational states, including what we call stochastic information, which is a probabilistic mixture of definite informational states; specifics regarding probabilistic information states are explained in appendix B, which builds on the Hadamard notation explained in appendix A. In this discussion, we review the notions of entropy, conditional entropy and mutual information for information shared between parties, all in the elegant Hadamard notation elaborated in appendix A, which we apply to this setting for the first time. Then, in section 2.2, we explain mappings of information states in terms of channels, with special emphasis on doubly stochastic channels; in this subsection we also discuss ways to quantify how stochastic a channel is. Finally, in section 2.3, we review the notion of classical discord for states and the concept of total-variation distance for stochastic-information states; quantum discord is explained in appendix C.

2.1. Stochastic information

In this subsection we review the concept of and mathematical framework for stochastic information, based on the probability-vector representation elaborated in appendix B, which uses the Hadamard calculus introduced in appendix A. Then we discuss known concepts concerning the entropy of a stochastic-information state, again using Hadamard calculus. Finally, we review bipartite stochastic information, including conditional entropy and mutual information.

The joint probability of messages shared between Alice, whose message size is MA, and Bob, whose message size is MB, is the bipartite matrix

${\boldsymbol{p}}^{\text{AB}}=\sum _{m,{m}^{\prime }}{p}_{m{m}^{\prime }}^{\text{AB}}\,{\boldsymbol{\delta }}_{m{m}^{\prime }}^{\text{AB}}\in {\text{mat}}_{{M}^{\text{A}}{\times}{M}^{\text{B}}}\left({\mathbb{R}}^{{\geqslant}0}\right),\qquad {\Vert}{\boldsymbol{p}}^{\text{AB}}{\Vert}=1$ (1)

using the notation that ${\text{mat}}_{{M}^{\text{A}}{\times}{M}^{\text{B}}}\left(R\right)$ refers to matrices with MA rows and MB columns whose entries are from any ring R. The norm is defined by equation (A4). Here we have let ${\boldsymbol{\delta }}_{m{m}^{\prime }}^{\text{AB}}$ denote a versor for message m' as discussed in appendix B.

In concordance with quantum-information nomenclature, we refer to ${\boldsymbol{\delta }}_{\check {m}}$ as a 'pure state' [9]. Impure, or 'mixed', states are probabilistic mixtures of pure states. We discuss mixed and pure bipartite stochastic-information states in appendix B.

Now we discuss the entropy of the probability vector representing the mixed message. Mixedness of a state $\boldsymbol{p}\in {\mathbb{R}}^{M}$ is quantified by entropy [10]

$H\left(\boldsymbol{p}\right){:=}-\boldsymbol{p}\odot \mathrm{log}\,\boldsymbol{p}$ (2)

using Hadamard notation explicated in appendix A. A state is pure iff its entropy is zero, which follows from p ◦ log  p = 0 for a versor. The lower bound for the entropy (2) is H( p ) = 0 for a pure state. The upper bound for entropy is H( p ) = log M for a uniformly mixed state, with M as the size of p . A high-entropy state is a state whose entropy is close to this bound. The joint state (1) has joint entropy HAB (2) and total message size M. In our analysis, we always assume, without loss of generality, that

Equation (3)

Thus, the joint entropy of the bipartite versor (B5) is zero as required for a pure state. The matrix representation of p AB is an MA × MB matrix, with nonnegative real entries such that the sum of all entries is one.
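To make the entropy (2) concrete, the following minimal Python sketch evaluates H(p) with the convention 0 log 0 = 0; the function name, the choice of natural logarithm and the test vectors are our illustrative assumptions rather than prescriptions from the text.

import numpy as np

def entropy(p):
    """Shannon entropy (2) of a probability vector or joint matrix."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                            # convention: 0·log 0 = 0
    return max(0.0, float(-np.sum(p * np.log(p))))   # clip a spurious -0.0

print(entropy([1, 0, 0, 0]))                # pure state (versor): 0.0
print(entropy(np.full(4, 0.25)))            # uniformly mixed state: log 4

The bounds quoted above are visible immediately: the versor attains the lower bound H = 0 and the uniform mixture attains the upper bound log M.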

The marginal distribution is obtained by ignoring the other party's share of the mixed state. Hence, Alice's marginal distribution is the probability vector

${\boldsymbol{p}}^{\text{A}}{:=}{\left(\sum _{{m}^{\prime }=1}^{{M}^{\text{B}}}{p}_{m{m}^{\prime }}^{\text{AB}}\right)}_{m=1}^{{M}^{\text{A}}},\qquad {\Vert}{\boldsymbol{p}}^{\text{A}}{\Vert}_{1}=1$ (4)

using the unit one-norm. Similarly, we construct the marginal distribution p B by summing over Alice's degree of freedom.

Alice's state conditioned on Bob's state is

${\boldsymbol{p}}^{\text{A}\vert \text{B}}{:=}{\boldsymbol{p}}^{\text{AB}}\oslash {\boldsymbol{p}}^{\text{B}}=\left({p}_{m{m}^{\prime }}^{\text{AB}}/{p}_{{m}^{\prime }}^{\text{B}}\right)$ (5)

with ⊘ explained in appendix A. The last term of equation (5) displays row-vector elements ${\boldsymbol{p}}_{m}^{\text{A}}$ obtained by element-wise division of each matrix element ${p}_{m{m}^{\prime }}^{\text{AB}}$ by respective column-vector elements ${p}_{{m}^{\prime }}^{\text{B}}$. Similarly, the conditional probability distribution for Bob is ${\boldsymbol{p}}^{\text{B}\vert \text{A}}={\boldsymbol{p}}^{\text{AB}}\oslash {\boldsymbol{p}}^{\text{A}}$, analogous to (5).
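The marginalization (4) and Hadamard division (5) translate directly into array operations; in the short sketch below, numpy broadcasting plays the role of ⊘, and the example joint state is an arbitrary illustrative choice.

import numpy as np

pAB = np.array([[0.4, 0.1],
                [0.2, 0.3]])            # joint state: rows = Alice, columns = Bob

pA = pAB.sum(axis=1)                    # Alice's marginal (4)
pB = pAB.sum(axis=0)                    # Bob's marginal

pA_given_B = pAB / pB                   # (5): divide each column m' by p^B_{m'}
pB_given_A = pAB / pA[:, None]          # analogous conditional for Bob

print(pA_given_B.sum(axis=0))           # each column of p^{A|B} sums to one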

The entropy of the conditional probability distribution p A|B is

$H^{\text{A}\vert \text{B}}\left({\boldsymbol{p}}^{\text{AB}}\right){:=}-{\boldsymbol{p}}^{\text{AB}}\odot \mathrm{log}\,{\boldsymbol{p}}^{\text{A}\vert \text{B}}$ (6)

Fact 1. A bipartite stochastic state p AB, which decomposes to ${\boldsymbol{p}}^{\text{AB}}\oslash {\boldsymbol{p}}^{\text{B}}$ and to ${\boldsymbol{p}}^{\text{AB}}\oslash {\boldsymbol{p}}^{\text{A}}$, is conditionally pure (cp) iff

$H^{\text{A}\vert \text{B}}=0=H^{\text{B}\vert \text{A}}$ (7)

Consequently, p AB is cp iff it is permutationally equivalent to a diagonal matrix; i.e. in diag([0, 1]), which refers to the set of diagonal matrices whose entries are each in the real-number interval [0, 1]. Thus,

${\boldsymbol{p}}_{\text{cp}}^{\text{AB}}\in {\mathrm{diag}}_{\mathrm{min}\left\{{M}^{\text{A}},{M}^{\text{B}}\right\}}\left(\left[0,1\right]\right)$ (8)

i.e. is diagonal of size min{MA, MB} × min{MA, MB}. Furthermore, conditional probability distributions p A|B and p B|A, which are obtained from a bipartite stochastic pure information state p AB, are necessarily pure.

Operationally speaking, a bipartite state is cp only if Alice's pure state can be known by Bob after he measures his share of the joint stochastic-information state and vice versa. Consequently, the bipartite versor (B5) used in equation (1) is cp.

Mutual information

$I^{\text{A;B}}\left({\boldsymbol{p}}^{\text{AB}}\right){:=}H^{\text{A}}+H^{\text{B}}-H^{\text{AB}}={\boldsymbol{p}}^{\text{AB}}\odot \mathrm{log}\left({\boldsymbol{p}}^{\text{AB}}\oslash \left({\boldsymbol{p}}^{\text{A}}\otimes {\boldsymbol{p}}^{\text{B}}\right)\right)$ (9)

quantifies correlation between two parties, Alice and Bob, with the last part of this expression written in a novel way using Hadamard arithmetic. An equivalent, alternative mutual information definition is

$J^{\text{A;B}}\left({\boldsymbol{p}}^{\text{AB}}\right){:=}H^{\text{A}}-H^{\text{A}\vert \text{B}}$ (10)

Consequently,

${{\Delta}}^{\text{A;B}}{:=}I^{\text{A;B}}\left({\boldsymbol{p}}^{\text{AB}}\right)-J^{\text{A;B}}\left({\boldsymbol{p}}^{\text{AB}}\right)=0$ (11)

so ${I}^{\text{A;B}}\left({\boldsymbol{p}}^{\text{AB}}\right)$ (9) and ${J}^{\text{A;B}}\left({\boldsymbol{p}}^{\text{AB}}\right)$ (10) are equal.
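This equality is easy to confirm numerically. The sketch below assumes the standard Shannon forms of I (9) and J (10); the helper functions and the example state are ours.

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                        # convention: 0·log 0 = 0
    return float(-np.sum(p * np.log(p)))

def mutual_I(pAB):                      # I = H^A + H^B − H^AB  (9)
    return (entropy(pAB.sum(axis=1)) + entropy(pAB.sum(axis=0))
            - entropy(pAB))

def mutual_J(pAB):                      # J = H^A − H^{A|B}  (10)
    pB = pAB.sum(axis=0)
    H_cond = sum(pB[j] * entropy(pAB[:, j] / pB[j])
                 for j in range(pAB.shape[1]) if pB[j] > 0)
    return entropy(pAB.sum(axis=1)) - H_cond

pAB = np.array([[0.4, 0.1], [0.2, 0.3]])
print(np.isclose(mutual_I(pAB), mutual_J(pAB)))   # True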

2.2. Stochastic map and stochastic matrix

A noisy channel is any mapping that changes the entropy (or noise, which is monotonically related) of a state in a non-decreasing way and adds noise to at least one state [11]. We are specifically interested in noisy channels that can be represented as stochastic matrices that map probability vectors representing states [8].

Under the action of a channel represented by matrix $\mathcal{E}$, the state, represented by p , maps to $\mathcal{E}\boldsymbol{p}$. We require that $\mathcal{E}$ is a square matrix with nonnegative entries such that either rows or columns sum to one. Hence, the norm of the state p is unchanged under the stochastic map represented by $\mathcal{E}$. The entropy after the channel satisfies $H(\mathcal{E}\boldsymbol{p}){\geqslant}H(\boldsymbol{p})$. A doubly stochastic matrix is a stochastic matrix whose rows and columns both sum to one.

In the bipartite setting, an identity mapping by Alice concomitant with a stochastic map $\mathcal{E}$ by Bob, yields the resultant bipartite state

${\boldsymbol{p}}^{\text{AB}}{\mapsto}\mathbb{I}\,{\boldsymbol{p}}^{\text{AB}}\mathcal{E}={\boldsymbol{p}}^{\text{AB}}\mathcal{E}$ (12)

with the trivial identity map $\mathbb{I}$ on Alice's side and the noise matrix $\mathcal{E}$ only acting on Bob's share. By the Perron–Frobenius theorem, a stochastic or doubly stochastic $\mathcal{E}$ has at least one stationary vector whose entries are all positive real numbers; this vector corresponds to the largest eigenvalue of the matrix representing the mapping [12].

Although we discuss discord and total-variation distance in terms of measurement described by a noisy measurement channel represented by a stochastic matrix, we focus on doubly stochastic matrices due to the abundance of mathematical properties that we can exploit for generating and understanding our results. For stochastic information theory, doubly stochastic matrices are the non-quantum analogue of quantum completely positive trace-preserving maps [13].

Here we present key background on doubly stochastic matrices needed for our study; specifically, we define doubly stochastic matrices and connect these matrices with the Birkhoff polytope, also known as a permutahedron. Birkhoff's theorem says that any doubly stochastic matrix can be written as a convex combination of permutation matrices; the set of all doubly stochastic matrices is known as the Birkhoff polytope [14]. A doubly stochastic matrix thus represents a random permutation of the messages. Furthermore, for each strictly positive matrix A, exactly one doubly stochastic matrix TA exists such that TA = DAD' with the diagonal matrices D and D' having positive diagonal elements and themselves unique up to a scalar factor [15, 16]. A permutation σ is represented by a permutation matrix Πσ , whose entries are all zeroes and ones such that exactly one 1 appears in each row and each column. For a length-M! probability vector $\boldsymbol{\wp }$, a permutahedron $\mathcal{E}$ is the convex sum

$\mathcal{E}{:=}\sum _{\sigma }{\wp }_{\sigma }{{\Pi}}_{\sigma },\qquad \sum _{\sigma }{\wp }_{\sigma }=1,\quad {\wp }_{\sigma }{\geqslant}0$ (13)

Note that $\mathcal{E}{{\Pi}}_{\sigma }={{\Pi}}_{\sigma }\mathcal{E}$.

2.3. Classical discord and distortion for a state

In appendix C, we summarise quantum discord; in this subsection, we summarise classical discord for stochastic-information states [8]. Whereas quantum discord is the discrepancy ΔA;B between mutual information I (9) and J (10) for quantum states, the classical discrepancy (11) is zero in the ideal case. However, classical discord becomes nonzero once measurements are noisy; analogous to the quantum case, which optimizes over all possible measurements, classical discord involves noisy measurements.

Alice's measurements are treated as ideal whereas Bob's measurements are treated as being noisy, described by a stochastic or a doubly stochastic mapping $\mathcal{E}$ acting on the state (12) [8]. The resultant state, after Bob's noisy measurement, is ${\boldsymbol{p}}^{\text{AB}}\mathcal{E}$ (12). Whereas the mutual information I (9) is known, the alternative mutual information (10) is modified to include the effect of Bob's noisy measurements and is consequently described by

$J_{\mathcal{E}}^{\text{A;B}}\left({\boldsymbol{p}}^{\text{AB}}\right){:=}J^{\text{A;B}}\left({\boldsymbol{p}}^{\text{AB}}\mathcal{E}\right)$ (14)

with the subscript $\mathcal{E}$ referring to Bob's noisy channel as we always treat Alice's as ideal: $\mathbb{I}$. In other words, the conditional information inherent in inferring alternative mutual information (14) involves Bob announcing his results to Alice, and Bob's measurement apparatus is noisy: described by stochastic or doubly stochastic channels, as described in section 2.2, prior to ideal measurement and announcement by Bob. Following this definition of alternative mutual information involving noisy measurement (14), discord for stochastic information is [8]

${{\Delta}}_{\mathcal{E}}^{\text{A;B}}\left({\boldsymbol{p}}^{\text{AB}}\right){:=}I^{\text{A;B}}\left({\boldsymbol{p}}^{\text{AB}}\right)-J_{\mathcal{E}}^{\text{A;B}}\left({\boldsymbol{p}}^{\text{AB}}\right)$ (15)

for specified noisy channel $\mathcal{E}$, which only affects J and not I. State discord (15) is considered to be classical because states are distributions and channels are stochastic maps; i.e. all the mathematical objects are distributions and their mappings, and hence no Hilbert-space description is required. This $\mathcal{E}$-dependent discord is necessarily nonnegative due to the data-processing inequality [11].

By analogy with quantum discord, which minimizes over all measurement, classical discord corresponds to minimizing state discord (15) over all allowed channels $\left\{\mathcal{E}\right\}$ [8]. For shared stochastic-information states, classical discord quantifies how much stochasticity is added by a noisy measurement process. If this noise is described by a doubly stochastic channel, this noise corresponds to random permutations, following Birkhoff's theorem, corresponding to instances of measuring some messages incorrectly as other messages, with the identity permutation corresponding to measuring all integers correctly. Non-zero discord can be interpreted as quantifying stochasticity added by measurement only if entanglement is zero; otherwise a quantum model is required to describe correlations [8]. Analogous to quantum state merging operationalizing quantum discord [4], stochastic-information state merging operationalizes classical discord [8].

Channel distortion is used in rate-distortion theory [11], which determines the minimum number of bits per symbol that must be sent over a channel so that the input signal can be approximately reconstructed at the output without exceeding a given expected distortion. Mathematically, in rate-distortion theory, distortion functions quantify the cost of representing a symbol by an approximate symbol. Typical distortion functions include Hamming distortion, squared-error distortion and total variation.

Total-variation, or Kolmogorov, distance between probability distributions p and p ' (B2), namely [17],

$\mathcal{D}\left(\boldsymbol{p},{\boldsymbol{p}}^{\prime }\right){:=}\frac{1}{2}{\Vert}\boldsymbol{p}-{\boldsymbol{p}}^{\prime }{{\Vert}}_{1}=\frac{1}{2}\sum _{m}\left\vert {p}_{m}-{p}_{m}^{\prime }\right\vert$ (16)

has been widely used for extremum problems, such as controlling uncertain stochastic systems [18], approximating a family of probability distributions by a given probability distribution, maximizing or minimizing entropy subject to total-variation distance constraints, quantifying uncertainty of probability distributions by total-variation distance metric, stochastic minimax control, and in many problems of information, decision theory, and minimax theory [19], testing for scale families [20] and distortion of channels [11]. Thus, total-variation distance is well studied and valuable across a broad spectrum of applications, including for us in comparing total-variation distance to discord.
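For reference, the total-variation distance (16) is a one-line computation; the sketch below assumes the normalization with the factor one half, and the example distributions are arbitrary.

import numpy as np

def total_variation(p, q):
    """Total-variation (Kolmogorov) distance (16)."""
    return 0.5 * float(np.abs(np.asarray(p, float) - np.asarray(q, float)).sum())

print(total_variation([0.5, 0.5], [1.0, 0.0]))    # 0.5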

3. Approach

In this section, we begin by explaining our model, which involves three agents: Alice and Bob who share messages and Charlie who provides random messages from a distribution. After describing the model, we develop the mathematics required to analyse the effect of noisy measurement in terms of average discord and average distortion in section 3.2. Finally we elaborate on our methods for solving the expressions and what we plot in section 3.3.

3.1. Model

We describe our model for discord as a three-agent protocol involving Charlie, Alice and Bob. By describing the tasks performed by each of the three agents, we have fully described the protocol and the pertinent quantifiers of discord and distortion. Although this model is implied in a previous study of classical discord, we need to make explicit the agents of this protocol and their actions to be clear in our study of channel discord.

Charlie generates joint distributions

Equation (17)

with prior $Q\left({\boldsymbol{p}}^{\text{AB}}\right)$ and then computes ${I}^{\text{A;B}}\left({\boldsymbol{p}}^{\text{AB}}\right)$ (9) for each p AB. For given p AB (1), Charlie generates a length-ς sequence of pairs of integers

${\left\{\left({m}_{i}^{\text{A}},{m}_{i}^{\text{B}}\right)\right\}}_{i=1}^{\varsigma }$ (18)

by sampling over p AB. In each instance, the first integer message mA is sent to Alice and the second integer message is sent to Bob. As A and B can experience noise in their readout, the resultant messages, mA' and mB', can differ from the original messages, mA and mB.

Alice and Bob send this noisy pair, mA' and mB', to Charlie. At the end of this part of the protocol, Charlie has stored the length-ς sequence {mA', mB'}. Charlie then infers the distribution p A'B' from these data, with this inferred state represented by ${\tilde {\boldsymbol{p}}}^{{\text{A}}^{\prime }{\text{B}}^{\prime }}$.

As Alice's instrument is assumed to be noiseless, mA' ≡ mA, Charlie's procedure is greatly simplified: he does not send Alice the message but just stores it. The message pair is thus {(mA, mB')}. Charlie's inferred state is $\tilde {{\boldsymbol{p}}^{\text{AB}}\mathcal{E}}\in {\text{mat}}_{{M}^{\text{A}}{\times}{M}^{\text{B}}}\left({\mathbb{R}}^{{\geqslant}0}\right)$, which approximates the actual state ${\boldsymbol{p}}^{\text{AB}}\mathcal{E}$, i.e. the state (1) after Bob's noisy measurement (12). Charlie computes all permutations of the state $\tilde {{\boldsymbol{p}}^{\text{AB}}\mathcal{E}}$, with each permuted state denoted $\tilde {{\boldsymbol{p}}^{\text{AB}}\mathcal{E}}{{\Pi}}_{\sigma }$.

He thence estimates the alternative mutual information ${J}_{\sigma }^{\text{A;B}}$ (10) with the subscript σ indicating which of the permuted states $\tilde {{\boldsymbol{p}}^{\text{AB}}\mathcal{E}}{{\Pi}}_{\sigma }$ is being considered. With these results at hand, Charlie estimates the discord for each $\tilde {{\boldsymbol{p}}^{\text{AB}}\mathcal{E}}{{\Pi}}_{\sigma }$ and computes the minimum discord over all σ. Then he averages over results for many generated states p AB to obtain an estimate for average discord for a specific channel corresponding to Bob's noisy measurement.

For each estimate $\tilde {{\boldsymbol{p}}^{\text{AB}}\mathcal{E}}{{\Pi}}_{\sigma }$, he computes the distortion, which he quantifies by the total-variation distance ${\mathcal{D}}_{\sigma }^{\text{A;B}}\left(\tilde {{\boldsymbol{p}}^{\text{AB}}\mathcal{E}}{{\Pi}}_{\sigma }\right)$, between p AB and the estimate $\tilde {{\boldsymbol{p}}^{\text{AB}}\mathcal{E}}{{\Pi}}_{\sigma }$. Charlie repeats this task for all permutations σ to obtain the minimum total-variation distance and then averages over all states to obtain average distortion. The mathematical description of this procedure is in section 3.2.4.

In each instance Alice receives the noiseless message mA, which is the versor ${\boldsymbol{\delta }}_{{m}^{\text{A}}}$; she reads it and sends the same message back to Charlie. Thus, for our mathematical analysis, Alice's role is superfluous and hence neglected in our protocol.

In each instance Bob receives message mB. His measurement is noisy, which we describe by a doubly stochastic channel $\mathcal{E}$ as introduced in section 2.2. This noise corresponds to permutations of the message basis, so some messages are incorrectly read as other messages, except in the case of the identity permutation $\mathbb{1}$, which corresponds to reading the message correctly.

To elucidate our model, we consider the specific case of a two-bit channel. Thus, we assume that Charlie generates a single bit each for Alice and Bob; i.e. MA = MB = 2 and mA, mB ∈ {0, 1}. Let the noisy channel map 0 to 0, i.e. 0 ↦ 0, with probability $\frac{2}{3}$. Then, by the doubly stochastic property, 0 ↦ 1 with probability $\frac{1}{3}$, 1 ↦ 1 with probability $\frac{2}{3}$ and 1 ↦ 0 with probability $\frac{1}{3}$, so the matrix describing this mapping is doubly stochastic. Bob sends the message mB' obtained from his measurement back to Charlie. Note that, for the execution of this protocol, the same stochastic matrix, which describes Bob's measurement noise, is applied once per instance, i.e. each time that Bob reports each measurement outcome.
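To make this example concrete, the following minimal sketch simulates Bob's noisy readout for the channel above; the matrix E, the seed and the sampling routine are our illustrative assumptions, with the reported bit drawn from the row of E indexed by the true bit.

import numpy as np

rng = np.random.default_rng(0)
E = np.array([[2/3, 1/3],
              [1/3, 2/3]])              # rows and columns each sum to one

def bob_readout(m_B):
    """One noisy measurement: the same stochastic matrix applied per instance."""
    return int(rng.choice(2, p=E[m_B]))

reports = [bob_readout(1) for _ in range(9)]
print(reports)                          # roughly two thirds of the entries read 1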

3.2. Mathematics

In this subsection, we describe in section 3.2.1 how Charlie generates states as randomly chosen joint distributions to be sent to Alice and Bob, and then we describe how random channels are generated in section 3.2.2 for Bob. In section 3.2.3 we describe mathematically how the channel is applied to the joint state. In section 3.2.4 we describe how permutations are applied to states, how both discord and distortion are minimized over all permutations, and how average discord and average distortion are estimated. Finally, section 3.2.5 illustrates this mathematics with a two-bit example.

3.2.1. Generating joint distributions

In this subsection, we explain mathematically how Charlie generates p AB. Charlie constructs a prior $Q\left({\boldsymbol{p}}^{\text{AB}}\right)$, which is heavily weighted over high-entropy states, meaning that state entropy (2) is close to the upper bound. By sampling this prior, Charlie obtains states with high, low and medium entropy by a linear interpolation between states drawn randomly from $Q\left({\boldsymbol{p}}^{\text{AB}}\right)$ and the state represented by the identity matrix denoted $\mathbb{1}$. This linear interpolation generates a continuum of interpolated states for each of the Nrand states. Thus, Charlie generates random states from which he draws messages to send to Alice and Bob.

To sample from $Q\left({\boldsymbol{p}}^{\text{AB}}\right)$, we generate a random $\boldsymbol{p}\in {\text{mat}}_{{M}^{\text{A}}{\times}{M}^{\text{B}}}(\mathbb{R})$ with each of the M = MA MB entries chosen uniformly from [0, 1]. The matrix p is then normalized by dividing all entries by their sum. This method of selecting random states leads to states that are mostly high-entropy with the maximum entropy being $\mathrm{log}\left({M}^{\text{A}}{M}^{\text{B}}\right)$. For MA = MB, which corresponds to a square matrix, the maximum entropy is log M for M the square (3) of MA.

Alternatively, Charlie can sample from a joint-state prior $Q\left({\boldsymbol{p}}_{\text{cp}}^{\text{AB}}\right)$ for cp states, with cp states satisfying fact 1. Conditionally pure states are permutationally equivalent to diagonal matrices ${\mathrm{diag}}_{\mathrm{min}\left\{{M}^{\text{A}},{M}^{\text{B}}\right\}}([0,1])$, as explained in equation (8). Thus, for MA = MB, a cp state is constructed by generating a random ${\boldsymbol{p}}_{\text{cp}}\in {\mathrm{diag}}_{{M}^{\text{B}}}([0,1])$ with each of the diagonal entries chosen uniformly from [0, 1] and then normalized such that the sum of diagonal elements is one. As for general states, such cp states also tend to have high entropies.
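Both priors are straightforward to sample; a minimal sketch, with function names of our own choosing, follows.

import numpy as np

rng = np.random.default_rng()

def random_joint_state(MA, MB):
    """Sample Q(p^AB): uniform entries on [0, 1], then normalize."""
    p = rng.uniform(size=(MA, MB))
    return p / p.sum()                  # mostly high-entropy states

def random_cp_state(M):
    """Sample Q(p_cp^AB): a normalized random diagonal, hence conditionally pure."""
    d = rng.uniform(size=M)
    return np.diag(d / d.sum())

print(random_joint_state(2, 2).sum(), random_cp_state(2).trace())   # 1.0 1.0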

Sampling either $Q\left({\boldsymbol{p}}^{\text{AB}}\right)$ or $Q\left({\boldsymbol{p}}_{\text{cp}}^{\text{AB}}\right)$ yields a candidate state, which is then mapped to a family according to

Equation (19)

where B ≫ 1 to ensure sufficiently many medium- and low-entropy states. The channel representing Bob's noisy measurement then acts on Bob's message share, and we explain how to generate these channels in the next subsection.

3.2.2. Generating channels

In this subsection, we explain how to generate a random doubly stochastic channel for Bob. As the doubly stochastic channel is a permutahedron, discussed in section 2.2, generating a random channel is equivalent to constructing the length-MB! weight, or probability, vector $\boldsymbol{\wp }$, which corresponds to the probability coefficients for reverse lexicographic ordering of permutations {σ}, similarly to the reverse lexicographical ordering of the vector of permutation matrices (B4). A given weight vector has associated entropy

$H\left(\boldsymbol{\wp }\right)=-\boldsymbol{\wp }\odot \mathrm{log}\,\boldsymbol{\wp }$ (20)

which is the entropy of the corresponding channel $\mathcal{E}$. Now we explain how to generate $\boldsymbol{\wp }$ from a distribution $\mathcal{P}$ that is an equal weighting of a uniform prior ${\mathcal{P}}_{{\uparrow}}$, which results in high-entropy weight vectors whose entropy H($\boldsymbol{\wp }$) (2) is the maximum entropy log MB!, and another prior ${\mathcal{P}}_{{\downarrow}}$ that generates weight vectors with low entropy H($\boldsymbol{\wp }$) (2).

To generate ${\mathcal{P}}_{{\uparrow}}$, we first set each entry of ${\boldsymbol{\wp }}_{{\uparrow}}$ to be 1 and then normalize this length-MB! weight vector by dividing each element by ${\Vert}{\boldsymbol{\wp }}_{{\uparrow}}{{\Vert}}_{1}$. In contrast, we generate ${\mathcal{P}}_{{\downarrow}}$ by uniformly randomly generating the first element of ${\boldsymbol{\wp }}_{{\downarrow}}$, namely ${\wp }_{{\downarrow}}^{1}$, from the interval [0, 1] and replace

Equation (21)

The next element of the weight vector, namely ${\wp }_{{\downarrow}}^{2}$, is drawn uniformly from an interval determined by ${\wp }_{{\downarrow}}^{1}$, and we continue according to the rule that each subsequent element ${\wp }_{{\downarrow}}^{k}$ is drawn uniformly from an interval determined by the preceding elements. After randomly generating all these MB! elements, we normalise this weight vector to obtain ${\boldsymbol{\wp }}_{{\downarrow}}$.

Now that we have generated an instance of a low-entropy weight vector ${\boldsymbol{\wp }}_{{\downarrow}}$ and a high-entropy weight vector ${\boldsymbol{\wp }}_{{\uparrow}}$, we generate numerous vectors in between the two by linear interpolation, thereby sampling the continuous set

$\boldsymbol{\wp }\left({\boldsymbol{\wp }}_{{\downarrow}},a\right){:=}a\,{\boldsymbol{\wp }}_{{\uparrow}}+\left(1-a\right){\boldsymbol{\wp }}_{{\downarrow}},\qquad a\in \left[0,1\right]$ (22)

This interpolation (22) yields medium-entropy channels to round out the sampling.

As we have generated many random instances of $\boldsymbol{\wp }\left({\boldsymbol{\wp }}_{{\downarrow}},a\right)$ (22), we can construct corresponding descriptions of doubly stochastic channels. The elements of $\boldsymbol{\wp }$ are coefficients of reverse lexicographically ordered permutation matrices, and this weighted sum is then the permutahedron that describes the random doubly stochastic channel $\mathcal{E}$, with entropy given by the entropy of its representative weight vector. Mathematically, the matrix description of the channel is

$\mathcal{E}=\boldsymbol{\wp }\cdot \boldsymbol{{\Pi}}=\sum _{\sigma \in {S}_{{M}^{\text{B}}}}{\wp }_{\sigma }{{\Pi}}_{\sigma }$ (23)

for $\boldsymbol{\wp }$ and $\boldsymbol{{\Pi}}$ length-MB! vectors of real numbers and of permutation matrices (B4), respectively, and $\sigma \in {S}_{{M}^{\text{B}}}$ drawn in reverse lexicographical order.
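A sketch of this construction follows: the channel (23) is assembled as the weighted sum of all MB! permutation matrices. itertools.permutations emits lexicographic order, which we reverse to mimic the reverse lexicographical convention; the uniform test weights are an illustrative choice.

import numpy as np
from itertools import permutations

def permutahedron_channel(weights, M):
    perms = list(permutations(range(M)))[::-1]    # reverse lexicographic order
    assert len(weights) == len(perms)
    E = np.zeros((M, M))
    for w, sigma in zip(weights, perms):
        P = np.zeros((M, M))
        P[range(M), sigma] = 1.0                  # permutation matrix Π_σ
        E += w * P
    return E                                      # rows and columns sum to one

E = permutahedron_channel(np.full(6, 1/6), 3)     # uniform weights, M^B = 3
print(E)                                          # the flat matrix with entries 1/3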

3.2.3. Applying the channel to the joint distributions

To describe Bob's noisy measurement mathematically, we apply the generated random channel, discussed in section 3.2.2, to Bob's share of the entire message p AB sent by Charlie. First suppose that Alice and Bob each have noisy measurements described by channels ${\mathcal{E}}^{\text{A}}$ and ${\mathcal{E}}^{\text{B}}$, respectively. Then the state sent back to Charlie from Alice and Bob is

${\boldsymbol{p}}^{\text{AB}}{\mapsto}{\mathcal{E}}^{\text{A}}\,{\boldsymbol{p}}^{\text{AB}}\,{\mathcal{E}}^{\text{B}}$ (24)

corresponding to one application of a noisy channel for each of Alice's and Bob's noisy measurements but noiseless transmission back to Charlie. As Alice's measurement is noiseless, we assign

${\mathcal{E}}^{\text{A}}{:=}\mathbb{I},\qquad \mathcal{E}{:=}{\mathcal{E}}^{\text{B}}$ (25)

Remark 1. In section 3.1, Charlie estimates ${\boldsymbol{p}}^{\text{AB}}\mathcal{E}$ to be $\tilde {{\boldsymbol{p}}^{\text{AB}}\mathcal{E}}$ by repeated sampling but here, in the mathematical description, we work with exact descriptions of the state (24).

3.2.4. Estimating discord and distortion

After Charlie receives ${\boldsymbol{p}}^{\text{AB}}\mathcal{E}$ (24), he computes all permutations of this matrix. Charlie generates each of the MB! instances of MB × MB permutation matrices Πσ (B3), for each σ drawn from permutation group ${S}_{{M}^{\text{B}}}$ in reverse lexicographical order. For each instance σ, Charlie obtains the permuted state by multiplying ${\boldsymbol{p}}^{\text{AB}}\mathcal{E}$ by each permutation matrix Πσ .

Here we redefine average discord to quantify channel distortion, in contrast to the earlier definition of classical discord in terms of fluctuations for a state [8], and we define average distortion. This redefinition of average discord, in terms of channels, then allows us to show numerically the monotonic relationship between classical (channel) discord and channel distortion. Average discord is obtained first by minimizing discord (15) over all state permutations and then by averaging over all states according to the prior $Q\left({\boldsymbol{p}}^{\text{AB}}\right)$ in section 3.2.1. Similarly, average distortion is obtained by averaging state-dependent distortion (16) over the prior of states $Q\left({\boldsymbol{p}}^{\text{AB}}\right)$.

For average discord, we first extend state-dependent discord (15) to be the minimized state-dependent discord over all permutations of the state

${{\Delta}}_{\mathcal{E}}^{\text{A};\text{B}}\left({\boldsymbol{p}}^{\text{AB}}\right){:=}\underset{\sigma \in {S}_{{M}^{\text{B}}}}{\mathrm{min}}\enspace {{\Delta}}_{\mathcal{E}}^{\text{A};\text{B}}\left({\boldsymbol{p}}^{\text{AB}}{{\Pi}}_{\sigma }\right)$ (26)

Averaging over the prior $Q\left({\boldsymbol{p}}^{\text{AB}}\right)$ yields channel discord

${{\Delta}}^{\text{A};\text{B}}\left(\mathcal{E}\right){:=}\int \mathrm{d}{\boldsymbol{p}}^{\text{AB}}\,Q\left({\boldsymbol{p}}^{\text{AB}}\right)\,{{\Delta}}_{\mathcal{E}}^{\text{A};\text{B}}\left({\boldsymbol{p}}^{\text{AB}}\right)$ (27)

which quantifies the discord due to noisy measurement in a state-independent but of course prior-dependent way. Average discord is obtained by sampling the integral (27) to obtain the estimate ${\tilde {{\Delta}}}^{\text{A};\text{B}}(\mathcal{E})$.

Similarly, we extend the definition of distortion $\mathcal{D}$ (16) first by minimizing over permutations and then by averaging over the prior $Q\left({\boldsymbol{p}}^{\text{AB}}\right)$. The state- and channel-dependent distortion is

${\mathcal{D}}_{\mathcal{E}}^{\text{A};\text{B}}\left({\boldsymbol{p}}^{\text{AB}}{{\Pi}}_{\sigma }\right){:=}\mathcal{D}\left({\boldsymbol{p}}^{\text{AB}},{\boldsymbol{p}}^{\text{AB}}\mathcal{E}{{\Pi}}_{\sigma }\right)$ (28)

and its minimization over all permutations is

${\mathcal{D}}_{\mathcal{E}}^{\text{A};\text{B}}\left({\boldsymbol{p}}^{\text{AB}}\right){:=}\underset{\sigma \in {S}_{{M}^{\text{B}}}}{\mathrm{min}}\enspace {\mathcal{D}}_{\mathcal{E}}^{\text{A};\text{B}}\left({\boldsymbol{p}}^{\text{AB}}{{\Pi}}_{\sigma }\right)$ (29)

Then, analogous to average discord (27), we integrate to obtain channel distortion

${\mathcal{D}}^{\text{A};\text{B}}\left(\mathcal{E}\right){:=}\int \mathrm{d}{\boldsymbol{p}}^{\text{AB}}\,Q\left({\boldsymbol{p}}^{\text{AB}}\right)\,{\mathcal{D}}_{\mathcal{E}}^{\text{A};\text{B}}\left({\boldsymbol{p}}^{\text{AB}}\right)$ (30)

for a given channel. Average distortion is obtained by sampling the integral (30) to obtain the estimate ${\tilde {\mathcal{D}}}^{\text{A};\text{B}}(\mathcal{E})$.

3.2.5. Example: two-bit channel

We now elucidate our model by considering the special case where Charlie generates a single bit for Alice and a single bit for Bob: MA = MB = 2. Thus, equation (1) implies that the joint probability of messages shared between Alice and Bob is specified by the matrix

${\boldsymbol{p}}^{\text{AB}}=\begin{pmatrix}{p}_{00} & {p}_{01}\\ {p}_{10} & {p}_{11}\end{pmatrix},\qquad {p}_{m{m}^{\prime }}{\geqslant}0,\quad \sum _{m,{m}^{\prime }}{p}_{m{m}^{\prime }}=1$ (31)

The two-bit state (31) is subjected to a distortion channel whose form is dictated by equation (23), which implies

$\mathcal{E}=\mu \,\mathbb{1}+\left(1-\mu \right)\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}=\begin{pmatrix}\mu & 1-\mu \\ 1-\mu & \mu \end{pmatrix},\qquad 0{\leqslant}\mu {\leqslant}1$ (32)

Consequently, the entropy is

$H=-\mu \,\mathrm{log}\,\mu -\left(1-\mu \right)\mathrm{log}\left(1-\mu \right)$ (33)

according to equation (20).

Applying the channel to the two-bit state yields

${\boldsymbol{p}}^{\text{AB}}\mathcal{E}=\begin{pmatrix}\mu {p}_{00}+\left(1-\mu \right){p}_{01} & \left(1-\mu \right){p}_{00}+\mu {p}_{01}\\ \mu {p}_{10}+\left(1-\mu \right){p}_{11} & \left(1-\mu \right){p}_{10}+\mu {p}_{11}\end{pmatrix}$ (34)

We then compute the state discord per equation (26), and we could perform a similar calculation for state distortion per equation (16). The explicit expression for discord (26) is

Equation (35)

as a function of state p AB and channel parameter μ.

If we could integrate discord and distortion over all possible joint states, i.e. over the formal variables p00, p01, p10, p11 with respect to an appropriate prior distribution, then we could obtain channel discord and distortion via equations (27) and (30), respectively. Note that the resulting expressions depend only on μ. Moreover, if discord is monotonic with respect to μ for all states { p AB}, i.e. for all points on the probability tetrahedron, then the average of discord over states is also monotonic with respect to μ.
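Although the explicit discord (35) is lengthy, all quantities here are easy to evaluate numerically. The sketch below uses the reading Δℰ(p) = I(p) − I(pℰ) of (15), which holds here because a doubly stochastic ℰ preserves Alice's marginal, so Jℰ equals the mutual information of the distorted state; the example state is an arbitrary illustrative choice.

import numpy as np

def H(p):
    p = p[p > 0]                                  # convention: 0·log 0 = 0
    return -(p * np.log(p)).sum()

def I(pAB):                                       # mutual information (9)
    return H(pAB.sum(1)) + H(pAB.sum(0)) - H(pAB.ravel())

def channel(mu):                                  # the two-bit channel (32)
    return np.array([[mu, 1 - mu], [1 - mu, mu]])

p = np.array([[0.4, 0.1], [0.2, 0.3]])            # a generic two-bit state (31)
for mu in (1.0, 0.9, 0.75, 0.6, 0.5):
    q = p @ channel(mu)                           # distorted state (34)
    discord = I(p) - I(q)                         # state discord (15)
    distortion = 0.5 * np.abs(p - q).sum()        # total variation (16)
    print(f"mu={mu:.2f}  discord={discord:.4f}  distortion={distortion:.4f}")

For this state the printed discord grows monotonically as μ decreases from 1 towards 1/2, consistent with the monotonicity argument developed below.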

3.3. Methods

In this subsection we discuss how we average discord and average distortion for many randomly chosen channels, with each of these channels corresponding to a different noisy measurement process implemented by Bob. First, we explain in section 3.3.1 how many instances of states we generate and how to create those instances. Then, in section 3.3.2, we explain how we generate all permutation matrices and thence all permuted states. Then we explain how we generate random channels $\left\{\mathcal{E}\right\}$ numerically in section 3.3.3 and how many such channels we generate. Finally, in section 3.3.4 we explain how we calculate the states sent back to Charlie, how to use these states to compute average discord and average distortion, and then study their relations.

3.3.1. Numerically generating states

We begin by generating joint states from either prior $Q\left({\boldsymbol{p}}^{\text{AB}}\right)$ or $Q\left({\boldsymbol{p}}_{\text{cp}}^{\text{AB}}\right)$ as described in section 3.2.1. In practice, we generate random states as follows. First we choose B = 99 in equation (19), as we have discovered empirically that this value of B yields a good spread of low-, medium- and high-entropy states. Then we step through values of the linear-interpolation parameter a in steps that grow quadratically: the coefficient a of $\mathbb{1}$ takes the values ${\left(\left[\varpi -1\right]0.0101\right)}^{2}$ for ϖ ∈ [100].

For each choice of a and fixing b = (1 − a)B, we choose a new random instance of p AB according to the random-matrix construction method described in section 3.2.1. We insert a, b and p AB into equation (19) to obtain one instance of a state for calculating average discord and average distortion.

3.3.2. Numerically generating all permutation matrices

Our analysis considers different message sizes M such that ${M}^{\text{A}}={M}^{\text{B}}=\sqrt{M}$, where we have chosen to study only cases for which Alice's and Bob's messages are the same size. For each message size, all permutation matrices are generated according to the mathematical description in section 2. We need to construct MB! permutation matrices, each of size MB × MB, representing permutations $\sigma \in {S}_{{M}^{\text{B}}}$. Each row of the matrix corresponds to a permuted version of the length-MB initial row vector, which is the vector comprising entries that are the column numbers themselves. These columns correspond to reverse lexicographical ordering of the permutations, with reverse lexicographical ordering discussed in section 2.

We apply perms from MATLAB® to the initial vector, and the output is the MB! × MB matrix Θ, whose elements are column indices for nonzero entries of permutation matrices Πσ . For each of the MB! rows, labelled by σ in reverse lexicographical order, we construct the corresponding MB × MB permutation matrix Πσ , whose entries are all zeroes and ones such that only one instance of one appears in each row or column as described in and around equation (B3).

Specifically, to generate Πσ , we pick row σ from the matrix Θ, denoted as a vector Θσ . The value of the entry in the first column of Θσ indicates which element of the first row of Πσ is one, with the rest of the elements in that row being zero. Then we proceed to the second entry of the vector Θσ , and its value indicates which element of the second row of Πσ is one. We continue for all MB rows of Πσ and then repeat for all $\sigma \in {S}_{{M}^{\text{B}}}$. In this way we have constructed the full set of permutation matrices for message mB.
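The same construction can be sketched in Python rather than MATLAB®, with the reversed output of itertools.permutations standing in for perms and each row of Θ converted to a permutation matrix; variable names are ours.

import numpy as np
from itertools import permutations

def all_permutation_matrices(M):
    Theta = list(permutations(range(M)))[::-1]    # reverse lexicographic rows of Θ
    mats = []
    for row in Theta:                             # row σ of Θ
        P = np.zeros((M, M), dtype=int)
        for i, j in enumerate(row):
            P[i, j] = 1                           # exactly one 1 per row and column
        mats.append(P)
    return mats

Pis = all_permutation_matrices(3)
print(len(Pis))                                   # 3! = 6
print(Pis[-1])                                    # identity permutation comes last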

3.3.3. Numerically generating random channels

We generate 6000 doubly stochastic channels, each channel represented by some weight vector $\boldsymbol{\wp }\left({\boldsymbol{\wp }}_{{\downarrow}},a\right)$ (22), regardless of message size. Construction of each of these weight vectors proceeds according to the mathematical description in section 3.2.2. We choose to generate 6000 instances of random channels because we allow for 100 equally spaced values of a in equation (22) and 60 randomly chosen ${\boldsymbol{\wp }}_{{\downarrow}}$ for each a.

A doubly stochastic channel is a permutahedron, and we have generated the set of all permutation matrices $\left\{{{\Pi}}_{\sigma }:\sigma \in {S}_{{M}^{\text{B}}}\right\}$ in section 3.3.2. Specifically, we generate random weight vectors according to equation (22), thereby yielding low-, medium- and high-entropy (20) weight vectors. The resultant set of 6000 randomly generated weight vectors faithfully represents 6000 randomly generated channels, and our interpolation (22) ensures good sampling of a wide range of channel entropies. In addition, we manually add the noiseless channel $\mathbb{1}$ to our simulations to include the instance of zero average discord and zero average distortion.

3.3.4. Relating average discord to average distortion

Now that state p AB is generated numerically according to the procedure described in section 3.3.1, for both $Q\left({\boldsymbol{p}}^{\text{AB}}\right)$ corresponding to random initial states and $Q\left({\boldsymbol{p}}_{\text{cp}}^{\text{AB}}\right)$ for random cp states, we calculate the corresponding permuted returned state ${\boldsymbol{p}}^{\text{AB}}\mathcal{E}{{\Pi}}_{\sigma }$ for each permutation σ. These permuted returned states $\left\{{\boldsymbol{p}}^{\text{AB}}\mathcal{E}{{\Pi}}_{\sigma }\right\}$ are used to calculate both average discord and average distortion. Mathematical expressions for average discord and average distortion are given in section 3.2.4.

We begin with how we calculate average discord ${{\Delta}}^{\text{A};\text{B}}\left(\mathcal{E}\right)$ (27). First we calculate each ${{\Delta}}_{\mathcal{E}}^{\text{A};\text{B}}\left({\boldsymbol{p}}^{\text{AB}}{{\Pi}}_{\sigma }\right)$ for each p AB and then minimize over all $\sigma \in {S}_{{M}^{\text{B}}}$ according to equation (26), thereby obtaining the minimized discord (26). The next step is to average over all p AB. As explained in section 3.3.1, we generate a random state p AB for each choice of linear interpolation parameter a (19), which suffices to sample the integral (27) fairly and thus obtain a good estimate ${\tilde {{\Delta}}}^{\text{A};\text{B}}(\mathcal{E})$ of the actual average discord ${{\Delta}}^{\text{A};\text{B}}(\mathcal{E})$.

The procedure for calculating average distortion ${\mathcal{D}}^{\text{A};\text{B}}(\mathcal{E})$ (30) is similar. First we calculate each ${\mathcal{D}}_{\mathcal{E}}^{\text{A};\text{B}}\left({\boldsymbol{p}}^{\text{AB}}{{\Pi}}_{\sigma }\right)$ for each p AB and then minimize over all $\sigma \in {S}_{{M}^{\text{B}}}$ according to equation (29), thereby obtaining the minimized distortion (29). The next step is to average over all p AB. As explained in section 3.3.1, we generate a random state p AB for each choice of linear interpolation parameter a (19), which suffices to sample the integral (30) fairly and thus obtain a good estimate ${\tilde {\mathcal{D}}}^{\text{A};\text{B}}(\mathcal{E})$ of the actual average distortion ${\mathcal{D}}^{\text{A};\text{B}}(\mathcal{E})$.
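Putting the pieces together, the following compact sketch estimates both averages for a single channel; it uses the plain uniform-entries state prior rather than the full interpolated family (19), the reading Δℰ(p) = I(p) − I(pℰΠσ) for the permuted discord, and helper names of our own choosing.

import numpy as np
from itertools import permutations

def H(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def I(pAB):
    return H(pAB.sum(1)) + H(pAB.sum(0)) - H(pAB.ravel())

def perm_mats(M):
    out = []
    for sigma in list(permutations(range(M)))[::-1]:
        P = np.zeros((M, M))
        P[range(M), sigma] = 1.0
        out.append(P)
    return out

def averages(E, M, n_states=100, seed=1):
    rng = np.random.default_rng(seed)
    Pis = perm_mats(M)
    discords, distortions = [], []
    for _ in range(n_states):
        p = rng.uniform(size=(M, M)); p /= p.sum()          # random joint state
        q = p @ E                                           # Bob's noisy readout
        discords.append(min(I(p) - I(q @ P) for P in Pis))            # cf. (26)
        distortions.append(min(0.5 * np.abs(p - q @ P).sum()          # cf. (29)
                               for P in Pis))
    return np.mean(discords), np.mean(distortions)          # estimates of (27), (30)

E = np.full((3, 3), 1/3)                # maximally noisy doubly stochastic channel
print(averages(E, 3))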

Finally, we relate estimated average discord to estimated average distortion by plotting ${\tilde {{\Delta}}}^{\text{A};\text{B}}(\mathcal{E})$ against ${\tilde {\mathcal{D}}}^{\text{A};\text{B}}(\mathcal{E})$. Specifically, we choose sufficiently large yet tractable message sizes, namely, MB ∈ {6, 7} and make distinct plots for each MB. For each randomly chosen channel, the resultant single point on the graph corresponding to ${\tilde {{\Delta}}}^{\text{A};\text{B}}(\mathcal{E})$ and ${\tilde {\mathcal{D}}}^{\text{A};\text{B}}(\mathcal{E})$ is marked, and we thereby obtain a scatter plot. We create plots for two cases, random initial states and random initial cp states, and compare these two cases.

3.3.5. Example: two-bit channel

We explicitly analyze the relationship between average channel discord (27) and channel distortion (30) in the case of a two-bit channel as described in section 3.2.5. By doing so, we establish the monotonicity of discord as a function of channel entropy. Our approach follows.

  • (a)  
    We first recognize that the set of possible channels $\mathcal{E}$ can be parameterized by a single number 0 ⩽ μ ⩽ 1 per equation (32). Hence, the channel discord can be expressed as a function of μ. Furthermore, the channel discord is defined to be an integral with respect to some measure over an integrand that itself depends on μ. If we prove that the integrand is monotonic in μ over some interval, we will also have proved that the integral is monotonic in μ over the same interval.
  • (b)  
    We can therefore assess the monotonicity of the channel discord (26), which we write here simply as Δ, as a function of H (33) by examining the derivative of Δ with respect to H and applying the chain rule:
    $\frac{\mathrm{d}{\Delta}}{\mathrm{d}H}=\frac{\mathrm{d}{\Delta}/\mathrm{d}\mu }{\mathrm{d}H/\mathrm{d}\mu }=\frac{\mathrm{d}{\Delta}/\mathrm{d}\mu }{\mathrm{log}\left(\frac{1-\mu }{\mu }\right)}$ (36)
    This expression is well-defined except when $\mu =\frac{1}{2}$, but this point corresponds to the maximum value of H and so is not relevant for monotonicity arguments. Thus, monotonicity of Δ with respect to H can be proven by demonstrating monotonicity on the intervals $0{< }\mu {< }\frac{1}{2}$ and $\frac{1}{2}{< }\mu {< }1$, which can be proven by demonstrating monotonicity of the integrand as described in the previous point.

The above two points imply that we can prove the monotonicity of channel discord as a function of channel entropy by showing that the state discord is a monotonically increasing (decreasing) function of μ for $0{< }\mu {< }\frac{1}{2}$ ($\frac{1}{2}{< }\mu {< }1$). We prove that this is so in section 4.4.

4. Results

In this section we present our results, which are numerical in nature. We choose tractable message-size values, namely MB ∈ {6, 7}, to study the relation between average discord ${\tilde {{\Delta}}}^{\text{A;B}}(\mathcal{E})$ and average distortion ${\tilde {\mathcal{D}}}^{\text{A;B}}(\mathcal{E})$. Specifically, we plot ${\tilde {{\Delta}}}^{\text{A;B}}(\mathcal{E})$ vs ${\tilde {\mathcal{D}}}^{\text{A;B}}(\mathcal{E})$ for many generated channels, as described in section 3.3.3, averaged over randomly generated states. We plot two cases: randomly generated joint distributions in section 4.1 and randomly generated cp states in section 4.2. Then we explain our best-fit quadratic relation between ${\tilde {{\Delta}}}^{\text{A;B}}(\mathcal{E})$ and ${\tilde {\mathcal{D}}}^{\text{A;B}}(\mathcal{E})$ in section 4.3. Our methods for generating these plots are described in section 3.3.

4.1. Plots for randomly generated joint states

In figure 1, we have plotted estimated average discord ${\tilde {{\Delta}}}^{\text{A;B}}(\mathcal{E})$ (27) and estimated average distortion ${\tilde {\mathcal{D}}}^{\text{A;B}}(\mathcal{E})$ (30) for the two message sizes MA = MB ∈ {6, 7}, as discussed in section 3.3.4. This scatter plot represents 6001 instances of randomly chosen channels for Bob and randomly chosen initial states by Charlie, and the points are colour-coded by the Shannon entropy of the weight vector representing the channel (20).


Figure 1. Scatter plot of average discord ${\tilde {{\Delta}}}^{\text{A;B}}(\mathcal{E})$ vs average distortion ${\tilde {\mathcal{D}}}^{\text{A;B}}(\mathcal{E})$ for Bob's random noisy measurements represented by 6001 randomly chosen doubly stochastic channels $\left\{\mathcal{E}\right\}$. Each instance of $\mathcal{E}$ is evaluated for 100 random initial bipartite states { p AB}, described in section 3.3.1, with sizes being (a) MA = MB = 6 and (b) MA = MB = 7. The representative weight-vector ($\boldsymbol{\wp }$) entropy for each $\mathcal{E}$ is colour-coded in the heat map ranging from zero to (a) log 6! and (b) log 7! following the method described in section 4.1. The black points correspond to a least-squares fit of the quadratic relation (38) with (a) t1 = −2.952, t2 = 5.395, t3 = −0.0446 with RMSE 0.1005 and (b) t1 = −3.223, t2 = 5.924, t3 = −0.0971 with RMSE 0.09854.


The origin of the plot corresponds to zero average discord and zero average distortion and arises for Bob's measurement being noiseless, i.e. for a zero-entropy weight vector $\boldsymbol{\wp }$. We observe a monotonic trend of increasing average discord with respect to increasing average distortion. This monotonicity inference is reinforced in section 4.3, where we explain the best-fit curve, which is certainly monotonic. Furthermore, based on the colour-coded heat map in figure 1, we see that all three quantities increase together: average discord, average distortion and channel entropy.

The scatter plot shows more features. The highest point of the curve has the maximum allowed entropy log MB! (20) for the channel. For the two chosen message sizes, the maximum entropies are

$\mathrm{log}\,6!\quad \text{and}\quad \mathrm{log}\,7!$ (37)

respectively. Also the scatter plot is narrow for low- and high-entropy cases of channels and wide for medium-low choices of channel entropy.

We have provided figures 1(a) and (b) showing scatter plots for MB = 6 and MB = 7, respectively. The two scatter plots are similar. The differences are that the maximum entropy for the second scatter plot is higher due to the larger message size, with the increase in maximum entropy given by the ratio of the numbers in (37). Both estimated average discord and estimated average distortion are increased slightly for increased message size.

4.2. Plots for randomly generated conditionally pure states

In this subsection we obtain scatter plots of estimated average discord vs estimated average distortion for the case that Charlie generates random cp states (8) instead of random joint states (1) as was done in section 4.1. Other than using cp states here, we follow exactly the same procedure used to obtain figure 1. The purpose of this subsection is to verify or refute that the two cases of initial random joint distributions vs initial random cp states show the same or different features.

The scatter plot for estimated average discord vs estimated average distortion is shown in figure 2 for initial cp states. Similarly to figure 1, the scatter shows monotonically increasing average discord with respect to average distortion, monotonicity of both with respect to channel entropy represented by the heat map, and wider scatter for medium entropy compared to narrow scatter width for low and high entropy. The differences are only with respect to randomness of generated channels in the plots, suggesting that figures 1 and 2 are identical up to random-sampling variability.


Figure 2. Scatter plot of average discord ${\tilde {{\Delta}}}^{\text{A;B}}(\mathcal{E})$ vs average distortion ${\tilde {\mathcal{D}}}^{\text{A;B}}(\mathcal{E})$ for Bob's random noisy measurements represented by 6001 randomly chosen doubly stochastic channels $\left\{\mathcal{E}\right\}$. Each instance of $\mathcal{E}$ is evaluated for 100 random initial cp states $\left\{{\boldsymbol{p}}_{\text{cp}}^{\text{AB}}\right\}$, described in section 3.3.1, with sizes being (a) MA = MB = 6 and (b) MA = MB = 7. The representative weight-vector ($\boldsymbol{\wp }$) entropy for each $\mathcal{E}$ is colour-coded in the heat map ranging from zero to (a) log 6! and (b) log 7! following the method described in section 4.1. The black points correspond to a least-squares fit of the quadratic relation (38) with (a) t1 = −2.658, t2 = 5.119, t3 = −0.052 and RMSE 0.1041 and (b) t1 = −2.85, t2 = 5.548, t3 = −0.0778 and RMSE 0.1019.


4.3. Quadratic best fit to the plots

Now we explain how we fit a curve to the scatter plots of figures 1 and 2. Our numerical results fit well with a quadratic curve

${\tilde {{\Delta}}}^{\text{A;B}}\left(\mathcal{E}\right)={t}_{1}{\left[{\tilde {\mathcal{D}}}^{\text{A;B}}\left(\mathcal{E}\right)\right]}^{2}+{t}_{2}\,{\tilde {\mathcal{D}}}^{\text{A;B}}\left(\mathcal{E}\right)+{t}_{3}$ (38)

with $\left\{{t}_{i}\right\}$ chosen differently for each plot to minimize root-mean-square error (RMSE). In all four cases, RMSE ∼ 0.1, which indicates a good fit as the RMSE is much smaller than the range of ${\tilde {{\Delta}}}^{\text{A;B}}(\mathcal{E})$.

4.4. Monotonicity of two-bit channel discord as function of channel entropy

Now we show that the discord of a two-bit noise channel varies monotonically with the entropy of that channel. As we explain in section 3.3.5, it suffices to show that the state discord Δ for an arbitrary two-bit state (31) varies monotonically in the parameter μ that specifies the channel (32). This monotonicity relation can in turn be proven by showing $\frac{\mathrm{d}{\Delta}}{\mathrm{d}\mu }\lessgtr 0$ for $\mu \gtrless \frac{1}{2}$ by exploiting equation (36). For ease of calculation, we substitute $\mu {\mapsto}\frac{1+\alpha }{2}$. Thus, we accomplish our aim of proving $\frac{\mathrm{d}{\Delta}}{\mathrm{d}\mu }\lessgtr 0$ when $\mu \gtrless \frac{1}{2}$ by instead proving that

$\alpha \,\frac{\mathrm{d}{\Delta}}{\mathrm{d}\alpha }{\leqslant}0$ (39)

for −1 < α < 1.

First we derive an expression for $\frac{\mathrm{d}{\Delta}}{\mathrm{d}\alpha }$. A tedious-but-straightforward calculation starting from equation (35) reveals that

Equation (40)

If we assign

Equation (41)

we can rewrite equation (40) as

Equation (42)

Noting that −1 < γ0,1 < 1 and 0 < w0,1 < 1 when none of p00, p01, p10, p11 are equal to zero, we conclude that $\frac{\mathrm{d}{\Delta}}{\mathrm{d}\alpha }$ is positive, negative or zero according to whether fα is concave, convex or linear on the interval (−1, 1). If one or more of p00, p01, p10, p11 are zero, the argument of the logarithm can become an indeterminate 0/0 form, in which case we apply l'Hôpital's rule to obtain the limit.

We now show that fα is concave/linear/convex depending on whether sgn α = −1, 0, 1. It is easy to see that f0 ≡ 0, which is a linear function, so we focus instead on the case α ≠ 0. In that case, the identity $\mathrm{a}\mathrm{r}\mathrm{c}\mathrm{t}\mathrm{a}\mathrm{n}\mathrm{h}\enspace y=\frac{1}{2}\,\mathrm{log}\left(\frac{1+y}{1-y}\right)$ shows ${f}_{\alpha }(x)=\frac{1}{2\alpha }{\times}g(\alpha x)$, where g(y) = y arctanh y. Hence we need only show that g(y) is convex for −1 < y < 1. This convexity can be seen directly from the Maclaurin series for arctanh,

$g\left(y\right)=y\,\mathrm{a}\mathrm{r}\mathrm{c}\mathrm{t}\mathrm{a}\mathrm{n}\mathrm{h}\enspace y=\sum _{k=0}^{\infty }\frac{{y}^{2k+2}}{2k+1}$ (43)

which is a sum of convex functions and hence is convex. Thus, fα is convex/concave depending on the sign of α. Hence, $\frac{\mathrm{d}{\Delta}}{\mathrm{d}\alpha }$ has the appropriate sign; hence, Δ is a monotonic function of H, as required.

5. Discussion

We have developed a full classical (i.e. non-quantum) theory for channel discord, which establishes the meaning of nonzero discord in the context of stochastic information theory. Although classical discord and stochastic information theory were introduced in 2015 [8], key notions were sketched rather than fully developed. Here we have given a detailed theory of non-quantum discord: from the context of a three-party protocol with noisy measurement, through Hadamard notation for making expressions clear and elegant, to the unprecedented connection between average channel discord, average channel distortion and the entropy of the noisy measurement (noisy channel), including monotonic relations between them. These relations show that channel discord, in a classical setting, is a form of channel distortion arising from one party's noisy measurement.

Scatter plots for average discord vs average distortion in figures 1 and 2 show this monotonic relation between average discord and average distortion and, through the heat maps, also the monotonicity between average distortion and channel entropy. These results are purely numerical but show a simple quadratic relationship for two choices of message sizes. A general mathematical relation connecting average discord to average distortion is beyond the scope of this work, but we derive mathematical relations for the two-bit case both to illustrate how analytical results can be obtained and to lend support to our conjectures based on numerical analysis; our analysis has focused on developing the protocol, making the problem clear, defining appropriate quantities and tackling them numerically. Mathematically proving the general case is challenging, so we instead solve the special two-bit case, i.e. the case that each of Alice and Bob holds one bit and Bob's measurement is noisy, and there we prove that channel discord is a monotonic function of channel entropy. Thus, the numerical results are backed up by a closed-expression analysis, and, furthermore, this analysis points the way to general proofs, likely using the Hadamard calculus elaborated in appendix A.

The plots in section 4 display a high level of scatter for medium-entropy cases and much less scatter for low- and high-entropy cases. Although the spread is large, monotonicity and quadratic scaling are clearly evident in these plots, and the root-mean-square error (RMSE) of each fit, hovering around 0.1, is testament to the quality of the quadratic fit and hence to the inference of monotonicity.

Our analysis has focused only on noise represented by doubly stochastic maps, which correspond to permutahedrons. In other words, we have concentrated on noise that would arise from random permutations of classical message measurements, i.e. incorrectly measuring some messages as other messages. In the spirit of quantum discord, our average discord and average distortion calculations are built on minimizing over all such permutations. Future work should involve generalization from doubly stochastic to stochastic maps; in the quantum context, this generalization would be akin to extending from completely positive trace-preserving mappings to completely positive mappings.

6. Conclusions

Discord has emerged as one of the most significant quantum resources [7] but not without controversy [8, 21]. Separating the quantum and non-quantum aspects of discord is vital for determining when discord is a genuinely quantum resource and when it is not. The connection between state discord and entanglement is known, but discord for channels, with noisy measurement treated as manifesting a noisy channel, has been unexplored. Here we establish and elucidate connections between classical channel discord, channel distortion and entropy. To this end, we have developed a protocol, a mathematical framework and a numerical analysis of average channel discord, with averaging being over random shared message states (for random initial joint distributions and, to check consistency, over cp states) and over random doubly stochastic channels representing noisy measurement by one party. Note that we have defined classical channel discord to quantify channel 'fluctuations' (i.e. 'noise'), in contrast to earlier work on classical discord, which quantifies fluctuations for a state [8]. Our notion of classical channel discord then leads to our numerical demonstration of monotonicity between classical (channel) discord and channel distortion. Thus, our results show numerically that average discord, in the non-quantum setting, is equivalent to average distortion of a channel, with channel distortion based on total-variation distance. Furthermore, we show numerically that this distortion measure is monotonic in channel entropy, which builds confidence that total-variation distance is a reasonable way to quantify distortion.

Akin to quantum discord, we have incorporated minimization of average discord and average distortion over all permutations, with the permutations referring to permutations of messages. The identity permutation corresponds to reading each message correctly, and the other permutations cause some messages to be read as other messages. Given that the noisy measurement is modelled as a doubly stochastic channel, which lies in a permutahedron, the idea of minimizing over all permutations is to identify the permutation that minimizes channel discord and channel distortion averaged over all states. This minimization is key to connecting our notion of classical distortion to the quantum version.

We have created a full framework for studying the connection between average discord and average distortion for a noisy channel and have shown numerically a monotonic relation between the two. This monotonic relation is satisfying, as we can now regard discord, in the classical setting, as an alternative measure of channel distortion, manifested as noisy readout by one party. We augment our numerical analysis of channel discord vs channel distortion by studying the two-bit example analytically, providing results for discord corresponding to certain special but important two-bit states. Analytic methods are evidently challenging, even in the two-bit case, but our analytical study shows a path forward for general analytical work, which benefits from the Hadamard calculus we discuss in appendix A.

Acknowledgments

This work has been supported by the Australian Research Council (ARC) via the Centre of Excellence in Engineered Quantum Systems (EQuS), project number CE110001013. BCS appreciates financial support from NSFC (Grant No. 11675164). The authors appreciate useful discussions with Si-Hui Tan and Nigum Arshed in the early stages of this project. The authors acknowledge the University of Sydney for providing the high-performance computing used to obtain early informative results.

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.

Appendix A: Hadamard calculus

We review the convenient Hadamard notation [22] and introduce what we call Hadamard calculus, which is novel but useful for studying stochastic information. Specifically, we explain the Hadamard product, the sum over elements of a Hadamard product, the Hadamard logarithm and the entropy in this notation. We also introduce vector calculus based on Hadamard-notation principles.

The Hadamard product between two rank-t tensors

$$a := \left(a_{i_1\cdots i_t}\right), \quad b := \left(b_{i_1\cdots i_t}\right) \tag{A1}$$

is

$$a \circ b := \left(a_{i_1\cdots i_t} b_{i_1\cdots i_t}\right), \tag{A2}$$

which is simply the rank-t tensor comprising element-wise products of the elements of the two rank-t tensors in the product. For t = 1, a and b are vectors, and a ∘ b is just the vector obtained by element-wise products of the corresponding vector elements. We define the sum over all elements of the Hadamard product (A2) by

$$a \odot b := \sum_{i_1,\ldots,i_t} a_{i_1\cdots i_t} b_{i_1\cdots i_t}. \tag{A3}$$

Typically, in the literature, ∘ and ⊙ are both employed to refer to our ∘, but here we treat ∘ and ⊙ as the distinct operations defined above. The norm of a vector, matrix or tensor is

$$\Vert a \Vert := \sqrt{a \odot a}, \tag{A4}$$

which is nonnegative.

For a (A1) restricted by

$$0 < a_{i_1\cdots i_t} \leq 2 \quad \forall\, i_1, \ldots, i_t, \tag{A5}$$

and introducing $\mathbb{J} := (1)$ as the tensor, of equal size to a, with every entry being 1 (in contrast to $\mathbb{I}$, the matrix whose diagonal entries are 1 and whose off-diagonal entries are all 0), the element-wise logarithm is

$$\log_{\circ} a := \sum_{\ell=1}^{\infty} \frac{(-1)^{\ell+1}}{\ell} \left(a - \mathbb{J}\right)^{\circ\ell} = \left(\log a_{i_1\cdots i_t}\right). \tag{A6}$$

Here we use the notation $\bullet^{\circ\ell}$ to refer to the ℓ-fold element-wise product of the tensor $\bullet$ with itself.

For constructing conditional states, we employ Hadamard division ⊘ [23]. Hadamard division for two same-dimensional tensors (including vectors and matrices) is simply their element-wise division. Another definition of Hadamard division applies for a matrix divided by a vector, where the length of the vector equals the number of rows (or columns) of the matrix; in this case, Hadamard division of the matrix by the vector corresponds to division of the row (or column) vectors of the matrix by the corresponding elements of the vector.
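The operations above translate directly into array arithmetic. The following NumPy sketch, with illustrative values, evaluates ∘, ⊙ and the norm (A4), checks the truncated series (A6) against the element-wise logarithm, and uses Hadamard division of a joint distribution by a marginal to form conditional distributions.

```python
import numpy as np

a = np.array([0.5, 1.0, 1.5])   # entries kept in (0, 2] so the series (A6) converges
b = np.array([0.2, 0.3, 0.5])

had_prod = a * b                 # a ∘ b: element-wise product
had_sum = np.sum(a * b)          # a ⊙ b: sum over the Hadamard product
norm_a = np.sqrt(np.sum(a * a))  # ||a|| = sqrt(a ⊙ a)

# Hadamard logarithm via the truncated series (A6); it agrees with the
# element-wise logarithm on the restricted domain (A5).
J = np.ones_like(a)
log_series = sum((-1) ** (l + 1) / l * (a - J) ** l for l in range(1, 200))
assert np.allclose(log_series, np.log(a), atol=1e-6)

# Hadamard division ⊘ of a matrix by a vector: divide each row of the
# joint distribution P by the marginal p to obtain conditional distributions.
P = np.array([[0.1, 0.2], [0.3, 0.4]])   # illustrative joint distribution
p = P.sum(axis=1)                        # marginal over the second message
conditional = P / p[:, None]             # each row now sums to 1
```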

Appendix B: Probability vector and entropy

We introduce stochastic information, which is essentially already known [11], but we employ the elegant notation of appendix A in a novel way to convey the concepts with simple, easy-to-grasp expressions once Hadamard arithmetic [23] is clear. Below we explain the concept of a stochastic-information state, which sets the stage for the subsequent discussion of bipartite stochastic-information states.

Our stochastic-information construct is a distribution of messages, with each message labelled by an integer

$$m \in \{0, 1, \ldots, M-1\}, \tag{B1}$$

so the distribution of such messages is

$$\boldsymbol{p} := \left(p_m\right), \quad p_m \geq 0, \quad \sum_{m=0}^{M-1} p_m = 1, \tag{B2}$$

represented here as a probability vector [24]. The probability vector can be permuted in the sense of rearranging the probabilities of the various messages.

A given permutation is represented by some σ ∈ S_M, for S_M the permutation (or symmetric) group over all M messages. The cardinality of S_M is M!; the permutation group can be ordered in various ways, and we adopt reverse lexicographical ordering [25]. A permutation of the probability vector is represented by the permutation matrix $\Pi_\sigma \in M_M\left(\left\{0,1\right\}\right)$ (i.e. the M × M matrices whose entries are only 0 or 1) given by [26]

$$\left(\Pi_\sigma\right)_{mm'} := \delta_{m\,\sigma(m')}, \tag{B3}$$

so Πσ contains exactly one entry of 1 in each row and each column, and the rest of the entries are 0. We represent the sequence of all permutation matrices by the vector

$$\boldsymbol{\Pi} := \left(\Pi_{\sigma_1}, \Pi_{\sigma_2}, \ldots, \Pi_{\sigma_{M!}}\right), \tag{B4}$$

with the sequence of σ drawn from the permutation group in reverse lexicographical order.
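As a minimal illustration, the sketch below enumerates S_M for M = 3 using itertools.permutations, which yields lexicographic order, so the list is reversed to mimic the reverse lexicographical ordering adopted here; the matrix convention used is one standard choice and may differ from (B3) by an inverse permutation.

```python
import numpy as np
from itertools import permutations

M = 3
sigmas = list(permutations(range(M)))[::-1]         # S_M in reverse lexicographic order
Pi = [np.eye(M)[list(sigma)] for sigma in sigmas]   # one 0/1 permutation matrix per sigma

p = np.array([0.5, 0.3, 0.2])                    # a probability vector (B2)
rearranged = [Pi_sigma @ p for Pi_sigma in Pi]   # all M! = 6 rearrangements of p
```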

A specific message of interest is a versor, which is a vector whose entries are all zero except for one entry, which is one [27]. Any permutation of a versor corresponds to replacing a given message by a new message, so a permutation of a versor is just another versor. A versor is equivalent to a stochastic-information state ${p}_{m}={\delta }_{m\check {m}}$ for $\check {m}$ the given message. We write this versor, corresponding to specific message $\check {m}$, as ${\boldsymbol{\delta }}_{\check {m}}$. Versors form a basis for stochastic-information states, which we call the message basis. A permutation matrix (B3) is actually a sum of tensor products of versors and coversors, with a coversor defined to be the covector version of a versor.

The Cartesian product of versors forms a basis for bipartite stochastic-information states, from which a joint distribution can be constructed. Suppose the two parties, Alice and Bob, hold the information states ${\boldsymbol{\delta }}_{\check {m}}^{\text{A}}$ and ${\boldsymbol{\delta }}_{{\check {m}}^{\prime }}^{\text{B}}$, respectively, where the superscripts A and B denote who owns which of the vector spaces in the tensor product. Alice's and Bob's joint state is the bipartite versor

$$\boldsymbol{\delta}_{\check{m}\check{m}'}^{\mathrm{AB}} := \boldsymbol{\delta}_{\check{m}}^{\mathrm{A}} \otimes \boldsymbol{\delta}_{\check{m}'}^{\mathrm{B}}, \tag{B5}$$

which is pure over the Cartesian product. For $M_{\mathrm{A}} = M_{\mathrm{B}}$, the state represented by the $M_{\mathrm{A}} \times M_{\mathrm{B}}$ identity matrix $\mathbb{1}/\sqrt{M}$, for $M := M_{\mathrm{A}} M_{\mathrm{B}}$, corresponds to all messages to Alice and Bob being identical, with all message instances equally likely. Thus, the conditional entropy of this mixture of states is zero, whereas the entropy of this set of message pairs is maximal; i.e. the entropy is $\log M_{\mathrm{A}}$.
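These entropy statements are easy to verify numerically. The sketch below, for the illustrative choice $M_{\mathrm{A}} = M_{\mathrm{B}} = 4$, computes the joint entropy of the identity-matrix state and its conditional entropy via the chain rule $H^{\mathrm{A}\vert\mathrm{B}} = H^{\mathrm{AB}} - H^{\mathrm{B}}$.

```python
import numpy as np

M_A = 4
joint = np.eye(M_A) / M_A            # the state 1/sqrt(M), with M = M_A * M_B
nonzero = joint[joint > 0]
H_joint = -np.sum(nonzero * np.log2(nonzero))   # maximal: log2(M_A) = 2 bits

marginal_B = joint.sum(axis=0)                  # uniform over Bob's messages
H_B = -np.sum(marginal_B * np.log2(marginal_B))
H_cond = H_joint - H_B                          # zero: Bob's message fixes Alice's
print(H_joint, H_cond)
```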

Appendix C: Quantum discord

We now explain quantum discord at a high level rather than delving into a full mathematical description, which requires Hilbert-space machinery. We begin with a brief discussion of the history of quantum discord, both theoretical and experimental, and of typical interpretations of quantum discord, such as quantum correlations and resources.

The concept of quantum discord was proposed to separate the total correlations of a bipartite quantum state into purely quantum and classical parts [1, 2]. Quantum discord per se is the difference between two classically identical expressions for mutual information, adapted to a quantum system, and was described as a measure of the quantumness of correlations. Discord appeared to be a more general way of quantifying quantum correlations [21, 28–30] than entanglement, as vanishing entanglement does not ensure vanishing discord, and the absence of entanglement does not imply classicality. Operationally, quantum discord has been interpreted in the context of quantum state merging, if pertinent prior information is discarded [31, 32], and the quantum–classical separation associated with discord has been cast in terms of negative conditional entropy [8]. The value of discord as a quantum resource has been debated vigorously, sometimes as a powerful quantum resource [7] but also critically, for example in that the set of zero-discord states has zero measure [21] and that nonzero discord is classically explainable iff entanglement is not present, as discussed in remark 4 of reference [8]. Discord has been generalised for different types of quantum measurements [33], for Rényi entropy [34, 35] and for higher dimensions [36].

Quantum discord has been studied experimentally in optical systems, in which quantum discord was shown to be a resource for quantum remote-state preparation, specifically showing that separable states with non-zero quantum discord can outperform entangled states [37]. In continuous-variable Gaussian optics, Gaussian quantum discord has been studied experimentally for a two-mode squeezed thermal state [38]. Experimentally encoding information within the discordant correlations of two separable Gaussian states shows that bipartite discord can be consumed to encode information that is accessible only by coherent quantum interactions [39]. A flexible two-photon setup has realized a three-qubit system with programmable degrees of initial correlation, measurement interaction and characterization, thereby demonstrating local observation in an activation protocol that converts discord into distillable entanglement [40]. A trapped-ion experiment has shown that quantum-discord inference of open-system dynamics detects system–environment quantum correlations without accessing the environment [41].

In contrast to stochastic-information states, which are probability distributions (B2), or joint distributions in the bipartite case (1), the quantum state is a positive trace-class operator ρ, with unit trace, on Hilbert space $\mathcal{H}$, or on the tensor product $\mathcal{H}\otimes \mathcal{H}$ for the bipartite case [9]. The quantum state's entropy is $H(\rho )=-\mathrm{tr}\left(\rho \enspace \mathrm{log}\enspace \rho \right)$ for tr the trace operation. In quantum information theory, measurement is described by positive operator-valued measures [9], but, for quantum discord, only projection-valued measures (PVMs) {P} [9], which comprise self-adjoint projections on $\mathcal{H}$, are considered. Each PVM P comprises a set of projection operators $P_i$ with $P_i P_j = P_i \delta_{ij}$. Measurement of a state yields a real-valued outcome, and the state is subsequently described via the projection $P_j$ applied to ρ, for j labelling that outcome. This projection is expressed as a conjugation of ρ in the literature [2], but that way of expressing it is superfluous for our purposes and hence not employed.

The conditional quantum state, first defined by Cerf and Adami [42], is now typically defined as [2]

$$\rho^{\mathrm{A}\vert j} := \frac{\mathrm{tr}_{\mathrm{B}}\left[\left(\mathbb{1}\otimes P_j\right)\rho^{\mathrm{AB}}\right]}{p_j}, \quad p_j := \mathrm{tr}\left[\left(\mathbb{1}\otimes P_j\right)\rho^{\mathrm{AB}}\right], \tag{C1}$$

with $p_j$ the probability of Bob obtaining the jth outcome after he has applied PVM P. The conditional quantum entropy [42] is

$$H^{\mathrm{A}\vert\mathrm{B}} := \sum_j p_j\, H\!\left(\rho^{\mathrm{A}\vert j}\right), \tag{C2}$$

which is a probability-weighted average of conditional entropies.

Mutual information $I^{\mathrm{A;B}}\left(\rho^{\mathrm{AB}}\right)$ is the same as in equation (9) except that H now corresponds to the quantum entropy and the last term is replaced according to ${H}^{\text{A:B}}{\mapsto}H\left({\rho }^{\text{AB}}\right)$. The alternative mutual information is

$$J^{\mathrm{A;B}} := \sup_{\{P\}}\left[H\!\left(\rho^{\mathrm{A}}\right) - H^{\mathrm{A}\vert\mathrm{B}}\right], \tag{C3}$$

which involves the supremum over all Bob's PVMs. Quantum discord, to which classical discord (11) is analogous (classical discord was actually defined later than quantum discord [8]), is

$$D^{\mathrm{A;B}} := I^{\mathrm{A;B}} - J^{\mathrm{A;B}}, \tag{C4}$$

the difference between the two mutual-information quantities.
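To make definitions (C1)–(C4) concrete, the following self-contained sketch estimates the discord of a two-qubit state by a brute-force grid over Bob's rank-1 projective measurements (which suffice for qubit PVMs). The grid resolution and the Werner-state example are illustrative assumptions; this is not a method used elsewhere in this paper.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

def partial_trace(rho, keep):
    """Trace out one qubit of a two-qubit state; keep = 0 for A, 1 for B."""
    r = rho.reshape(2, 2, 2, 2)
    return np.trace(r, axis1=1, axis2=3) if keep == 0 else np.trace(r, axis1=0, axis2=2)

def discord(rho_AB, n=60):
    rho_A, rho_B = partial_trace(rho_AB, 0), partial_trace(rho_AB, 1)
    I = entropy(rho_A) + entropy(rho_B) - entropy(rho_AB)   # mutual information
    J = -np.inf
    for theta in np.linspace(0.0, np.pi, n):
        for phi in np.linspace(0.0, 2 * np.pi, n):
            v = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
            P0 = np.outer(v, v.conj())                      # rank-1 projector
            cond = 0.0
            for P in (P0, np.eye(2) - P0):                  # the PVM {P, 1 - P}
                M = np.kron(np.eye(2), P)                   # Bob measures; Alice untouched
                p = np.real(np.trace(M @ rho_AB))           # outcome probability, cf. (C1)
                if p > 1e-12:
                    rho_Aj = partial_trace(M @ rho_AB @ M, 0) / p  # conditional state (C1)
                    cond += p * entropy(rho_Aj)             # conditional entropy (C2)
            J = max(J, entropy(rho_A) - cond)               # grid approximation to (C3)
    return I - J                                            # discord (C4)

# Werner state: separable for mixing parameter <= 1/3 yet has nonzero discord.
psi = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho = 0.3 * np.outer(psi, psi) + 0.7 * np.eye(4) / 4
print(discord(rho))
```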

Quantum discord is interpreted as the correlations that remain after classical correlations are subtracted from the total correlation, and it is recognised to quantify the non-classical correlations in a quantum system, including entanglement; it is therefore identified as a quantum resource [30]. The primary feature encapsulated by its quantum property is how a state is affected by local measurements, seen as a form of classical correlation aided by quantum coherence (superposition) at the level of individual subsystems [21, 28, 43]. The presence of discord in quantum computing protocols [5, 6] motivates the assertion that discord is a quantum resource, operationalized by state merging [4], which can deliver a quantum advantage.
