Tema con variazioni: quantum channel capacity
Dennis Kretschmann and Reinhard F Werner
Institut für Mathematische Physik, Technische Universität Braunschweig, Mendelssohnstr. 3, 38106 Braunschweig, Germany
Email: d.kretschmann@tu-bs.de
Received 11 November 2003
Published 23 February 2004
Abstract. Channel capacity describes the size of the nearly ideal channels, which can be obtained from many uses of a given channel, using an optimal error correcting code. In this paper we collect and compare minor and major variations in the mathematically precise statements of this idea which have been put forward in the literature. We show that all the variations considered lead to equivalent capacity definitions. In particular, it makes no difference whether one requires mean or maximal errors to go to zero, and it makes no difference whether errors are required to vanish for any sequence of block sizes compatible with the rate, or only for one infinite sequence.
1. Introduction
Quantum channel capacity is one of the key quantitative notions of the young field of quantum information theory. Whenever one asks `how much quantum information' can be stored in a device, or sent on a transmission line, it is implicitly a question about the capacity of a channel. Like Shannon's classical definition, the concept applies also to noisy channels, which do corrupt the signal. In this case one may apply an error correction scheme, and still use the channel like an almost ideal one. Capacity expresses this quantitatively: it is the maximal number of ideal qubit (resp. bit) transmissions per use of the channel, taken in the limit of long messages and using error correction schemes asymptotically eliminating all errors.
Many of the terms in this informal definition can be, and in fact have been, formalized mathematically in different ways. As a result, there are many published definitions of quantum capacity in the literature. Some of these are immediately seen to be equivalent, but with other variants this is less obvious. Moreover, some of the differences seem to have gone unnoticed, creating the danger that some results would be unwittingly transferred between inequivalent concepts, creating a mixture of rigorous argument and folklore hard to unravel.
The purpose of the present paper is to show that, fortunately, all the major definitions are indeed equivalent. In order to make the presentation self-contained, we have also included abridged versions of arguments from the literature. Other points, however, e.g., concerning the question whether the rate has to be achieved on every sequence of increasing blocks, or just on an infinite, possibly sparse set of increasing block sizes, seem to be new. We have also made an effort to lay out the required tools carefully, so that they can be used in other applications.
All this does not help much to come closer to the proof of a coding theorem, i.e., to a rigorous formula for the capacity not requiring the solution of asymptotically large optimization problems. Major progress in this direction has recently been obtained by Shor [1, 2] and Devetak [3]. We hope that our work will contribute to an unambiguous interpretation of these results, as well.
The key chapter (section 2) of this paper begins with presenting the theme: a basic rigorous definition of quantum channel capacity. This is followed by nine logical variations on this theme, which like musical variations are not all of the same weight. In each variation a result is stated to the effect that a modified definition is equivalent to the basic one after all. All proofs, however, are left to the later sections. A coda at the end of the variations comments on the coding theorem and recent developments.
1.1. Notations
In order to state the basic definition of capacity and its variations, we have to introduce some notation. A quantum channel which transforms input systems described by a Hilbert space $\mathcal{H}_1$ into output systems described by a (possibly different) Hilbert space $\mathcal{H}_2$ is represented by a completely positive trace-preserving linear map $T\colon \mathcal{B}_*(\mathcal{H}_1)\to\mathcal{B}_*(\mathcal{H}_2)$, where by $\mathcal{B}_*(\mathcal{H})$ we denote the space of trace class operators on $\mathcal{H}$. This map takes the input state to the output state, i.e., we work in the Schrödinger picture (see Kraus' textbook [4] for a detailed description of the concept of quantum operations).
The definition of channel capacity requires the comparison of the channel after correction with an ideal channel. As a measure of the distance between two channels we take the norm of complete boundedness (or cb-norm, for short) [5], denoted by $\|\cdot\|_{cb}$. For two channels T and S, the distance $\tfrac{1}{2}\|T-S\|_{cb}$ can be defined as the largest difference between the overall probabilities in two statistical quantum experiments differing only by exchanging one use of S by one use of T. These experiments may involve entangling the systems on which the channels act with arbitrary further systems. Equivalently, we may set $\|T\|_{cb}=\sup_n\|T\otimes\mathrm{id}_n\|_\infty$, where the norm $\|\cdot\|_\infty$ is the norm of linear operators between the Banach spaces $\mathcal{B}_*(\mathcal{H}_i\otimes\mathbb{C}^n)$, and $\mathrm{id}_n$ denotes the identity map (ideal channel) on the n×n matrices.
Among the properties which make the cb-norm well-suited for capacity estimates are multiplicativity, ||T1⊗T2||cb = ||T1||cb||T2||cb, and unitality, ||T||cb = 1 for any channel T. The equivalence with other error measures is discussed extensively below.
Note that throughout this work we use base two logarithms, and we write ld x := log2 x.
2. Tema: quantum channel capacity
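Definition 2.1. A positive number R is called an achievable rate for the quantum channel $T\colon\mathcal{B}_*(\mathcal{H}_1)\to\mathcal{B}_*(\mathcal{H}_2)$ with respect to the quantum channel $S\colon\mathcal{B}_*(\mathcal{H}_3)\to\mathcal{B}_*(\mathcal{H}_4)$ iff for any pair of integer sequences $(n_\nu)_{\nu\in\mathbb{N}}$ and $(m_\nu)_{\nu\in\mathbb{N}}$ with $\lim_{\nu\to\infty}n_\nu=\infty$ and $\limsup_{\nu\to\infty}m_\nu/n_\nu<R$ we have $\lim_{\nu\to\infty}\Delta(n_\nu,m_\nu)=0$, where we set
$$\Delta(n_\nu,m_\nu):=\inf_{D,E}\big\|D\,T^{\otimes n_\nu}E-S^{\otimes m_\nu}\big\|_{cb},$$
the infimum taken over all encoding channels E and decoding channels D with suitable domain and range. The channel capacity Q(T, S) of T with respect to S is defined to be the supremum of all achievable rates. The quantum capacity is the special case $Q(T):=Q(T,\mathrm{id}_2)$, with $\mathrm{id}_2$ being the ideal qubit channel.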
This definition is a transcription of Claude E Shannon's definition of the capacity of a discrete memoryless channel in classical information theory, as presented originally in his famous 1948 paper [6] and now found in most standard textbooks on the subject (e.g., [7, 8]). To make the translation one only needs to express Shannon's maximal error probabilities in terms of norm estimates [9] and take an ideal one-bit channel rather than the one-qubit channel as the reference. This choice can also be made for quantum channels T, defining the capacity C(T) of a quantum channel for classical information. Much more is known about C(T) than about Q(T) [10, 11].
2.1. Prima variazione: choice of units
Formally, definition 2.1 assigns a special role to the ideal qubit channel id2. Is this essential? What do we get if we take the ideal channel idn on a Hilbert space of some dimension n > 2 as reference?
We will show in section 3.2 that the choice n = 2 only amounts to a choice of units, fixing the unit bit:
$$Q(T,\mathrm{id}_n)=\frac{1}{\operatorname{ld}n}\,Q(T,\mathrm{id}_2).$$
2.2. Seconda variazione: testing only one sequence
At first sight definition 2.1 of channel capacity, as given above and widely used throughout the community [9], [12]-[15], seems a little impractical, since it involves checking an infinite number of pairs of sequences when testing a given rate R. Work would be substantially reduced if only one such pair had to be tested. For the sake of discussion let us say that a rate R is sporadically achievable if, for some pair of sequences $(n_\nu)_{\nu\in\mathbb{N}}$, $(m_\nu)_{\nu\in\mathbb{N}}$, with $n_\nu\to\infty$ and vanishing errors, the rate R is achieved infinitely often: $\limsup_{\nu\to\infty}m_\nu/n_\nu=R$. For example, there might be a special coding scheme which utilizes some rare number theoretical properties of n and m. Many published definitions [16]-[19] would accept sporadically achievable rates as achievable in the capacity definition. Often the choice $n_\nu=\nu$ is made [20]-[24]. While this sequence of block sizes can hardly be called `sporadic', it is a logically similar variation to the sparse sequences, so we include it for convenience.
In section 7 we show that all sporadically achievable rates are, in fact, achievable. Hence there is no need to introduce a `sporadic capacity'. What we have to show is that coding schemes that work infinitely often can be extended to all block sizes. This is a non-trivial result, since we also show that by merely putting blocks together, and by perhaps not using some of the code bits such an extension is not possible.
2.3. Terza variazione: minimum fidelity
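The cb-norm is by no means the only way to evaluate the distance between two channels. Another distance measure that has appeared particularly widely (e.g., in [17, 20, 21, 25]) is the minimal overlap between input and corresponding output states: the minimum fidelity of a quantum channel $T\colon\mathcal{B}_*(\mathcal{H})\to\mathcal{B}_*(\mathcal{H})$ is defined as
$$F(T):=\min_{\psi\in\mathcal{H},\,\|\psi\|=1}\langle\psi|T(|\psi\rangle\langle\psi|)|\psi\rangle.\qquad(4)$$
When we want to particularly emphasize the Hilbert space $\mathcal{H}$ on which the minimization is performed, we will write $F(\mathcal{H},T)$ instead.
Of course, $0\le F(T)\le1$, and $F(T)=1$ implies that T acts as the ideal channel on $\mathcal{H}$. These features make the minimum fidelity a suitable distance measure. We might then call a positive number R an achievable rate for the channel T if there is a sequence $(\mathcal{H}_n)_{n\in\mathbb{N}}$ of Hilbert spaces such that $\limsup_{n\to\infty}\frac{1}{n}\operatorname{ld}\dim\mathcal{H}_n=R$ and $\lim_{n\to\infty}F(\mathcal{H}_n,D_nT^{\otimes n}E_n)=1$ for suitable encodings $E_n$ and decodings $D_n$.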
In section 4.2 we show that the quantum channel capacity arising from this definition is the same.
2.4. Quarta variazione: average fidelity
Instead of requiring that the maximum error be small we might be less demanding, and just require an average error to vanish. In the previous section we would then have to replace the minimum fidelity F(T) by the average fidelity,
$$\overline{F}(T):=\int\langle\psi|T(|\psi\rangle\langle\psi|)|\psi\rangle\,d\psi,\qquad(5)$$
where the integral is over the normalized unitarily invariant measure `dψ' on the unit vectors in $\mathcal{H}$.
In sections 4.3 and 4.4 we show that this modification has no effect on the quantum channel capacity. An alternative proof is presented in section 6.2.
2.5. Quinta variazione: entanglement fidelity
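Entanglement fidelity was introduced by Ben Schumacher in 1996 [26] and is closely related to minimum fidelity. It characterizes how well the entanglement between the input states and a reference system not undergoing the noise process is preserved: For a quantum channel $T\colon\mathcal{B}_*(\mathcal{H})\to\mathcal{B}_*(\mathcal{H})$ and a quantum state $\varrho\in\mathcal{B}_*(\mathcal{H})$, the entanglement fidelity of $\varrho$ with respect to T, $F_e(\varrho,T)$, is given as
$$F_e(\varrho,T):=\langle\psi|(T\otimes\mathrm{id})(|\psi\rangle\langle\psi|)|\psi\rangle,$$
where ψ is a purification of $\varrho$. This quantity does not depend on the details of the purification process, as is made evident by the alternative expression [26]
$$F_e(\varrho,T)=\sum_i|\operatorname{tr}(\varrho\,t_i)|^2,$$
where $T(\sigma)=\sum_i t_i\sigma t_i^*$ is the Kraus decomposition of T [4]. Obviously, $0\le F_e(\varrho,T)\le1$. Moreover, $F_e(\varrho,T)=1$ implies that T is noiseless on the support of $\varrho$: $T|_{\mathrm{supp}(\varrho)}=\mathrm{id}_{\mathrm{supp}(\varrho)}$.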
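We might then define achievable rates exactly as in section 2.3 above, replacing the condition $\lim_{n\to\infty}F(\mathcal{H}_n,D_nT^{\otimes n}E_n)=1$ by the requirement that $\lim_{n\to\infty}\min_{\varrho\in\mathcal{B}_*(\mathcal{H}_n)}F_e(\varrho,D_nT^{\otimes n}E_n)=1$ for suitable encodings $E_n$ and decodings $D_n$.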
The quantum capacity that stems from this definition of achievable rates is likewise equivalent, as shown in section 4.2.
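In the previous definition, instead of minimizing over all density operators $\varrho\in\mathcal{B}_*(\mathcal{H}_n)$, one can simply choose $\varrho$ to be the maximally mixed state on $\mathcal{H}_n$, $\varrho=\frac{1}{d_n}\mathbb{1}$, with the shorthand $d_n:=\dim\mathcal{H}_n$. The resulting variant of entanglement fidelity we call channel fidelity [27] of the quantum channel T,
$$F_c(T):=\langle\Omega|(T\otimes\mathrm{id})(|\Omega\rangle\langle\Omega|)|\Omega\rangle,\qquad(9)$$
where $\Omega=d^{-1/2}\sum_{i=1}^{d}|i,i\rangle$ is a maximally entangled state on $\mathcal{H}\otimes\mathcal{H}$.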
Channel fidelity is a very handy figure of merit, since it is a linear functional, does not involve a maximization process, and is completely equivalent to the error criteria discussed above. The details are spelled out in section 4.3.
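As a small numerical illustration of the Kraus-operator expression for entanglement fidelity, and of channel fidelity as its value on the maximally mixed state, the following Python sketch evaluates both for a qubit depolarizing channel. The helper names (`entanglement_fidelity`, `channel_fidelity`, `depolarizing_kraus`) and the choice of example channel are illustrative assumptions and do not come from the paper.

```python
import math

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(c, m):
    return [[c * x for x in row] for row in m]

def entanglement_fidelity(rho, kraus):
    """F_e(rho, T) = sum_i |tr(rho t_i)|^2 for T(sigma) = sum_i t_i sigma t_i^*."""
    return sum(abs(trace(matmul(rho, t))) ** 2 for t in kraus)

def channel_fidelity(kraus):
    """Entanglement fidelity evaluated on the maximally mixed state."""
    d = len(kraus[0])
    mixed = [[(1.0 / d if i == j else 0.0) for j in range(d)] for i in range(d)]
    return entanglement_fidelity(mixed, kraus)

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def depolarizing_kraus(p):
    """Kraus operators of rho -> (1 - p) rho + p * identity / 2 (assumed example)."""
    return ([scale(math.sqrt(1 - 3 * p / 4), I2)]
            + [scale(math.sqrt(p / 4), s) for s in (X, Y, Z)])

kraus = depolarizing_kraus(0.2)
fc = channel_fidelity(kraus)                               # close to 1 - 3p/4
fe_pure = entanglement_fidelity([[1, 0], [0, 0]], kraus)   # close to 1 - p/2
print(fc, fe_pure)
```

Because the expression is a sum of squared traces, no optimization or purification has to be constructed explicitly, which is the practical advantage the text attributes to channel fidelity.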
A further variant arises when in the definition of channel fidelity instead of the maximally entangled state Ω an arbitrary input state
is permitted, replacing the channel fidelity Fc(T) by the quantity
. This is the error quantity on which Devetak's entanglement generating capacity [3] is built, and it is also equivalent (section 4.5).
2.6. Sesta variazione: entropy rate
The original definition of quantum channel capacity in terms of entanglement fidelity involves a different concept of computing the rates [16, 18]. According to this definition, the capacity of a quantum channel is the maximal entropy rate of a quantum source whose entanglement with the reference system is preserved by the noisy channel. A quantum source $(\mathcal{H}_n,\varrho_n)_{n\in\mathbb{N}}$ consists of a pair of sequences of Hilbert spaces $\mathcal{H}_n$ and corresponding density operators $\varrho_n\in\mathcal{B}_*(\mathcal{H}_n)$. It is meant to represent a stream of quantum particles produced by some physical process. Its entropy rate is defined as
$$\limsup_{n\to\infty}\frac{1}{n}S(\varrho_n),$$
where $S(\varrho)=-\operatorname{tr}(\varrho\operatorname{ld}\varrho)$ is the von Neumann entropy.
The quantum capacity as defined by Schumacher is then the supremum of all entropy rates for sources such that $\lim_{n\to\infty}F_e(\varrho_n,D_nT^{\otimes n}E_n)=1$ for suitable encodings $E_n$ and decodings $D_n$.
It turns out that in order to make this definition equivalent to the others, some mild constraint on the sources is needed. In fact, we will show in section 4.4 that the supremum over all sources will be infinite for all channels with positive capacity. However, for a wide range of interesting sources equivalence does hold, namely (cf section 4.4),
, which brings us back to the definition based on channel fidelity discussed in the previous section,
2.7. Settima variazione: errors vanishing quickly or not at all
In the various definitions of achievable rates presented so far, instead of simply requiring the error quantity to approach zero in the large block limit, one could impose a certain minimum speed of convergence, e.g., linear, polynomial, exponential or super-exponential convergence, as a function of the number of channel invocations. We will show in section 7 that all these definitions coincide, as long as the speed of convergence is at most exponential.
If we require the errors to vanish even faster or, in the extreme case, that Δ(nν, mν) = 0 for large enough ν, as in the theory of error-correcting codes invented by Knill and Laflamme [31], equivalence no longer holds: if a channel has a small, but non-vanishing probability for depolarization, the same also holds for its tensor powers, and no such channel allows the perfect transmission of even one qubit. Hence the capacity based on exactly vanishing errors will be zero for such channels.
On the other hand, one might sometimes feel inclined to tolerate (small) finite errors in transmission: For some $\varepsilon>0$, let $Q_\varepsilon(T)$ denote the quantity defined exactly like the quantum channel capacity in definition 2.1, but requiring only $\Delta(n_\nu,m_\nu)\le\varepsilon$ for large ν instead of $\lim_{\nu\to\infty}\Delta(n_\nu,m_\nu)=0$. Obviously, $Q_\varepsilon(T)\ge Q(T)$ for any quantum channel T. We even have $\lim_{\varepsilon\to0}Q_\varepsilon(T)=Q(T)$ (see section 7.3).
In the purely classical setting even more is known: if $\varepsilon>0$ is small enough, one cannot achieve bigger rates by allowing small errors, i.e., $C_\varepsilon(T)=C(T)$. This is the so-called strong converse to Shannon's coding theorem. It is still unknown whether an analogous property holds for quantum channels.
2.8. Ottava variazione: isometric encodings and homomorphic decodings
Definition 2.1 of channel capacity involves an optimization over the set of all encoding and decoding maps. This set is very large, and it may thus seem favourable to restrict both encoding and decoding to smaller classes.
In [17] it has been shown that we may restrain our attention to isometric encodings, i.e., encodings of the form $E(\varrho)=V\varrho V^*$ with isometric V, and still be left with the same capacity (see section 5 for details). Physically, this means that encoding can always be thought of as a unitary process augmented by an initial projection onto a subspace small enough to fit into the channel.
In the Knill and Laflamme [31] setting of perfect error correction, not only are encoding maps isometric, but in addition the decodings can be chosen to be of the (Heisenberg picture) form
$$D(X)=V(X\otimes\mathbb{1})V^*+\operatorname{tr}(\varrho_0X)\,(\mathbb{1}-VV^*)$$
with isometric V and an arbitrary reference state $\varrho_0$. We call maps of this type homomorphic, because the first term is an algebraic homomorphism, and the second term only serves to render the whole channel unital.
Since the sufficiency of isometric encoding transfers from the perfect error correction setting to asymptotically perfect error correction, it may seem reasonable to conjecture that a similar result holds for homomorphic decodings. However, up to now no such result is known.
2.9. Nona variazione: coding with a little help from a classical friend
Here we consider a setup in which a quantum channel T is assisted by additional classical forward communication between the sender (Alice) and the receiver (Bob). Clearly, this allows Alice and Bob to collaborate in a more coordinated fashion: Alice may use the additional resource to transfer information about the encoding process, which Bob on his part may try to take advantage of in his choice of the decoding channel.
However, it is a straightforward consequence of the isometric encoding theorem that these new possibilities do not help to increase the channel capacity, even if the classical side channel is noiseless: we have Q(T⊗idc) = Q(T) [17, 25], where by idc we denote an ideal channel of arbitrarily large dimensionality. That this is not a trivial statement is seen from the observation that classical feedback between successive channel uses may increase the capacity [18, 20, 25, 32].
The uselessness of classical forward communication may be extended to cover the so-called separable side channels, i.e., quantum channels with intermediate measurement and re-preparation processes. The details on both classes of side channels are spelled out in section 6.
2.10. Coda: the coding theorem
Computing channel capacities on the basis of the definitions given, even the simplified ones, is a tricky business. It involves optimization in systems of asymptotically many tensor factors. It has therefore been a long-time challenge to find a quantum analogue of Shannon's noisy coding theorem [6], which would allow one to compute the channel capacity as an optimization over a low dimensional space.
According to Shannon's famous theorem, the classical capacity is obtained by finding the supremum of the so-called mutual information, which itself is given in terms of the Shannon entropy. A quantum analogue of mutual information, coherent information, has been identified early. For the quantum channel $T$ and the density operator $\varrho$, it is defined as
$$I_c(\varrho,T):=S\big(T(\varrho)\big)-S\big((T\otimes\mathrm{id})(|\psi\rangle\langle\psi|)\big),$$
where $\psi$ is a purification of the density operator $\varrho$, and S, as before, is the von Neumann entropy.
The regularized coherent information has long been known to be an upper bound on the quantum channel capacity [16]-[18], i.e.,
$$Q(T)\le\lim_{n\to\infty}\frac{1}{n}\max_{\varrho}I_c(\varrho,T^{\otimes n}).\qquad(14)$$
Unlike the classical or quantum mutual information, coherent information is not additive; hence taking the limit n→∞ in equation (14) is indeed required [21].
The first sketch of an argument to close the gap in equation (14) was given by Lloyd [33]. At a recent conference, Shor [1, 2] presented a coding scheme based on random coding to attain coherent information. His results have not been published yet. Shortly thereafter, Devetak [3] released another coding scheme based on a key generation protocol made `coherent' [34, 35]. By the same techniques, Devetak and Winter [36, 37] very recently were able to prove the long-conjectured hashing inequality [25], which states that the regularized coherent information is an achievable entanglement distillation rate, and implies the channel capacity result by teleportation [22].
These achievements certainly mark a major step in the direction of a coding theorem, but do not satisfy all the properties desired of such a theorem. In particular, they still demand the solution of asymptotically large variational problems.
3. Elementary properties of channel capacity
3.1. Basic inequalities
Before we enter the proof sections we need to review some basic properties of channel capacity, which will turn out to be helpful as we proceed, but are also interesting in their own right. All proofs are easy, and may be found in [14], albeit for noiseless reference channels only. The generalization is straightforward.
Running two channels, T1 and T2, in succession, the capacity of the composite channel, $T_1\circ T_2$, cannot be bigger than the capacity of the channel with the smallest bandwidth. This is known as the bottleneck inequality:
$$Q(T_1\circ T_2,S)\le\min\{Q(T_1,S),\,Q(T_2,S)\}.$$
Instead of running T1 and T2 in succession, we may also run them in parallel, which is represented mathematically by the tensor product T1⊗T2. In this case the capacity can be shown to be super-additive,
$$Q(T_1\otimes T_2,S)\ge Q(T_1,S)+Q(T_2,S).\qquad(16)$$
For the standard ideal channels we even have additivity. The same holds true if both S and one of the channels T1, T2 are noiseless, the third channel being arbitrary. However, to decide whether additivity holds generally is one of the big open problems in the field.
Finally, the two-step-coding inequality tells us that by using an intermediate channel in the coding process we cannot increase the transmission rate:
$$Q(T_1,T_3)\ge Q(T_1,T_2)\,Q(T_2,T_3).\qquad(17)$$
3.2. Quantum capacity of noiseless channels
There are special cases in which the quantum channel capacity can be evaluated relatively easily, the most relevant one being the noiseless channel $\mathrm{id}_n$, where by the subscript n we denote the dimension of the underlying Hilbert space. In this case we have
$$Q(\mathrm{id}_n,\mathrm{id}_m)=\frac{\operatorname{ld}n}{\operatorname{ld}m}.\qquad(18)$$
A proof follows below. Combining this with the two-step coding inequality (17), we see that for any quantum channel T
$$Q(T,\mathrm{id}_m)=\frac{\operatorname{ld}n}{\operatorname{ld}m}\,Q(T,\mathrm{id}_n),$$
which shows that quantum channel capacities relative to noiseless quantum channels of different dimensionality only differ by a constant factor. Fixing the dimensionality of the reference channel then only corresponds to a choice of units. Conventionally the ideal qubit channel is chosen as a standard of reference, fixing the unit bit.
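Proof of equation (18). This is an immediate consequence of estimates of the simulation error $\Delta(\mathrm{id}_n,\mathrm{id}_m)=\inf_{D,E}\|D\,\mathrm{id}_n E-\mathrm{id}_m\|_{cb}$ between ideal channels. We have
$$\Delta(\mathrm{id}_n,\mathrm{id}_m)=0\quad\text{for }n\ge m,\qquad(20)$$
$$\Delta(\mathrm{id}_n,\mathrm{id}_m)\ge1-\frac{n}{m}\quad\text{for }n\le m.\qquad(21)$$
The first relation is shown by explicitly constructing $E$ and $D$ such that $D\,\mathrm{id}_n E=\mathrm{id}_m$. To this end we may consider $\mathbb{C}^m\subset\mathbb{C}^n$ as a subspace with projection $P_m$. Then E is defined by extending each m×m matrix by zeros for the additional (n - m) dimensions, and
$$D(\varrho):=P_m\varrho P_m+\operatorname{tr}\big((\mathbb{1}_n-P_m)\varrho\big)\,\frac{1}{m}\mathbb{1}_m,$$
where the second term serves to make D trace-preserving. Then, clearly, $DE=\mathrm{id}_m$ as claimed.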
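To prove the inequality (21), choose a maximal family of one-dimensional orthogonal projections $(P_\nu)_{\nu=1,\dots,m}$ such that $\sum_{\nu=1}^{m}P_\nu=\mathbb{1}_m$. Then for any decoding $D$, the relation
$$\operatorname{tr}\big(D(\varrho)P_\nu\big)=\operatorname{tr}(\varrho F_\nu)\quad\text{for all }\varrho\qquad(23)$$
defines a set $(F_\nu)_{\nu=1,\dots,m}$ of positive operators satisfying $\sum_{\nu=1}^{m}F_\nu=\mathbb{1}_n$. For any encoding $E$ we thus have
$$\|DE-\mathrm{id}_m\|_{cb}\ge\|DE-\mathrm{id}_m\|_\infty\ge\frac{1}{m}\sum_{\nu=1}^{m}\|P_\nu-DE(P_\nu)\|_1\ge\frac{1}{m}\sum_{\nu=1}^{m}\operatorname{tr}\big(P_\nu(P_\nu-DE(P_\nu))\big)=1-\frac{1}{m}\sum_{\nu=1}^{m}\operatorname{tr}\big(E(P_\nu)F_\nu\big)\ge1-\frac{1}{m}\sum_{\nu=1}^{m}\operatorname{tr}F_\nu=1-\frac{n}{m},\qquad(24)$$
where in the fourth step we have used equation (23). Equation (24) then immediately implies equation (21).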
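We now have to convert the estimates (20) and (21) into statements for achievable rates for $T=\mathrm{id}_n$ and $S=\mathrm{id}_m$. Since $\mathrm{id}_n^{\otimes n_\nu}$ is the ideal channel on a space of dimension $n^{n_\nu}$, equations (20) and (21) apply with n replaced by $n^{n_\nu}$ and m replaced by $m^{m_\nu}$. So let $(n_\nu)_{\nu\in\mathbb{N}}$ and $(m_\nu)_{\nu\in\mathbb{N}}$ be two integer sequences such that $\lim_{\nu\to\infty}n_\nu=\infty$ and $\limsup_{\nu\to\infty}m_\nu/n_\nu<(\operatorname{ld}n)/(\operatorname{ld}m)$. Then for all sufficiently large ν we have $m^{m_\nu}\le n^{n_\nu}$, and therefore $\Delta(n_\nu,m_\nu)=0$, which implies that any R < (ld n)/(ld m) is achievable.
On the other hand, let $R=((\operatorname{ld}n)/(\operatorname{ld}m))+\varepsilon$, for some $\varepsilon>0$, and choose diverging sequences such that $\limsup_{\nu\to\infty}m_\nu/n_\nu=((\operatorname{ld}n)/(\operatorname{ld}m))+\varepsilon/2$. Then $m_\nu/n_\nu\ge((\operatorname{ld}n)/(\operatorname{ld}m))+\varepsilon/4$, and hence $n^{n_\nu}/m^{m_\nu}\le2^{-\varepsilon n_\nu(\operatorname{ld}m)/4}$, infinitely often, and thus, by equation (21), $\Delta(n_\nu,m_\nu)$ is close to 1 infinitely often. Hence the errors do not go to zero, and the rate R is not achievable. To summarize, $Q(\mathrm{id}_n,\mathrm{id}_m)=(\operatorname{ld}n)/(\operatorname{ld}m)$ is the supremum of all achievable rates. □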
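The dimension counting behind this proof can be tried out directly. The sketch below assumes, as the argument indicates, that `n_uses` copies of the ideal channel on n levels can simulate `m_uses` copies of the ideal channel on m levels without error as soon as `m**m_uses <= n**n_uses`. The function name and its parameters are hypothetical and serve only as an illustration.

```python
import math

def max_simulated_uses(n, m, n_uses):
    """Largest m_uses with m**m_uses <= n**n_uses, using exact integers."""
    total_dim = n ** n_uses
    m_uses = 0
    power = m                      # invariant: power == m ** (m_uses + 1)
    while power <= total_dim:
        m_uses += 1
        power *= m
    return m_uses

# Qutrit channel (n = 3) measured in qubits (m = 2).
for block in (10, 100, 1000):
    best = max_simulated_uses(3, 2, block)
    print(block, best, best / block)

print("ld 3 / ld 2 =", math.log2(3))
```

The ratio of simulated to used channels approaches (ld n)/(ld m) from below as the block length grows, which is the behaviour the achievability half of the proof relies on.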
By the same techniques, one may also show that the capacity of the ideal channel does not increase if the information to be transmitted is restricted to be classical.
3.3. Partial transposition bound
The upper bound on the capacity of ideal channels can also be obtained from a general upper bound on quantum capacities, which has the virtue of being easily calculated in many situations. It involves, on each system considered, the transposition map, which we denote by Θ, defined as matrix transposition with respect to some fixed orthonormal basis. None of the quantities we consider will depend on this basis. As is well known, transposition is positive but not completely positive. Similarly, we have $\|\Theta\|_\infty=1$, but generally $\|\Theta\|_{cb}>1$. More precisely, $\|\Theta\|_{cb}=d$, when the system is described on a d-dimensional Hilbert space [5]. We claim that, for any channel T and small $\varepsilon>0$,
$$Q_\varepsilon(T)\le\operatorname{ld}\|T\Theta\|_{cb}=:Q_\Theta(T),\qquad(25)$$
where $Q_\varepsilon(T)$ (≥Q(T)) is the finite error capacity introduced in section 2.7. In particular, for the ideal channel this implies $Q(\mathrm{id}_d)\le\operatorname{ld}d$.
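The gap between the ordinary norm and the cb-norm of the transposition can be made concrete with a standard example: applying Θ to one half of a maximally entangled state. The following Python sketch (an illustration under the stated conventions, with hypothetical helper names) builds the partially transposed projector and evaluates its trace norm. The input projector has trace norm 1, so an output of trace norm d shows that the norm of Θ⊗id_d is at least d.

```python
import math

def partially_transposed_omega(d):
    """Matrix of (Theta x id)(|Omega><Omega|) in the basis |i,k>, index i*d + k."""
    dim = d * d
    out = [[0.0] * dim for _ in range(dim)]
    # |Omega><Omega| = (1/d) sum_{i,j} |i><j| x |i><j|; transposing the first
    # factor gives (1/d) sum_{i,j} |j><i| x |i><j|, i.e. the flip operator over d.
    for i in range(d):
        for j in range(d):
            out[j * d + i][i * d + j] = 1.0 / d
    return out

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace_norm_when_square_is_scalar(a):
    """Trace norm of a Hermitian matrix a that satisfies a @ a = c * identity.

    In that case |a| = sqrt(c) * identity, so no eigenvalue solver is needed.
    """
    n = len(a)
    sq = matmul(a, a)
    c = sq[0][0]
    for i in range(n):
        for j in range(n):
            expected = c if i == j else 0.0
            if abs(sq[i][j] - expected) > 1e-12:
                raise ValueError("a @ a is not a multiple of the identity")
    return n * math.sqrt(c)

for d in (2, 3, 4):
    print(d, trace_norm_when_square_is_scalar(partially_transposed_omega(d)))
```

The shortcut through the square of the matrix only works because the partially transposed projector is a multiple of a flip; a general input would need a proper singular value computation.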
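The proof of equation (25) is quite simple [12]: suppose R is an achievable rate, and that $m_\nu/n_\nu\to R\le Q_\varepsilon(T)$, and encoding $E_\nu$ and decoding $D_\nu$ are such that $\|\mathrm{id}_2^{\otimes m_\nu}-D_\nu T^{\otimes n_\nu}E_\nu\|_{cb}\le\varepsilon$. Then we have
$$2^{m_\nu}=\|\mathrm{id}_2^{\otimes m_\nu}\Theta\|_{cb}\le\big\|(\mathrm{id}_2^{\otimes m_\nu}-D_\nu T^{\otimes n_\nu}E_\nu)\Theta\big\|_{cb}+\|D_\nu T^{\otimes n_\nu}E_\nu\Theta\|_{cb}\le2^{m_\nu}\varepsilon+\big\|D_\nu(T\Theta)^{\otimes n_\nu}\,\Theta E_\nu\Theta\big\|_{cb}\le2^{m_\nu}\varepsilon+\|T\Theta\|_{cb}^{n_\nu},$$
where in the last step we have used that $D_\nu$ and $\Theta E_\nu\Theta$ are channels with cb-norm = 1, and that the cb-norm is exactly tensor multiplicative, so $\|X^{\otimes n}\|_{cb}=\|X\|_{cb}^{n}$. Hence, by taking the binary logarithm and dividing by $n_\nu$, we get
$$\frac{m_\nu}{n_\nu}+\frac{\operatorname{ld}(1-\varepsilon)}{n_\nu}\le\operatorname{ld}\|T\Theta\|_{cb}=Q_\Theta(T).$$
Then in the limit ν→∞ we find $R\le Q_\Theta(T)$ for any achievable rate R.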
The upper bound $Q_\Theta(T)$ computed in this way has some remarkable properties, which make it a capacity-like quantity in its own right. For example, it is exactly additive:
$$Q_\Theta(S\otimes T)=Q_\Theta(S)+Q_\Theta(T)$$
for any pair S, T of channels, and satisfies the bottleneck inequality $Q_\Theta(ST)\le\min\{Q_\Theta(S),Q_\Theta(T)\}$. Moreover, it coincides with the quantum capacity on ideal channels: $Q_\Theta(\mathrm{id}_n)=Q(\mathrm{id}_n)=\operatorname{ld}n$, and it vanishes whenever TΘ is completely positive. In particular, if id⊗T maps any entangled state to a state with positive partial transpose, we have $Q_\Theta(T)=0$.
4. Alternative error criteria
In this section we will show that the various distance measures introduced in sections 2.3-2.6 are equivalent, as long as the reference channel is chosen to be noiseless, i.e., S = idd for some d < ∞. Hence the remarkable insensitivity of the quantum capacity to the choice of the error criterion holds only for the capacity Q(T) which is our main concern, but not for the more general Q(T, S) comparing two arbitrary channels. The reason for this difference is the analogous observation for distance measures on the state space: different ways of quantifying the distance between states become inequivalent in the limit of large dimensions, but all measures for the distance between a pure state and a general state essentially agree.
4.1. Preliminaries
The following lemma will serve as a starting point for showing the equivalence of fidelity, ordinary operator norm, and cb-norm criteria. By $\|A\|_1:=\operatorname{tr}\sqrt{A^*A}$ we will denote the trace norm of the operator $A$, by $\|A\|_2:=\sqrt{\operatorname{tr}A^*A}$ its Hilbert-Schmidt norm, and by $\|A\|_\infty$ the ordinary operator norm. These norms are related by the following chain of inequalities:
$$\|A\|_\infty\le\|A\|_2\le\|A\|_1$$
(see chapter VI of [38] for a thorough discussion of these Schatten classes, these and other useful properties, and the relation to $L^p$-spaces). Of course, all norms in a finite-dimensional space are equivalent, so there must also be a bound in the reverse direction. This is
$$\|A\|_1\le\sqrt{d}\,\|A\|_2\le d\,\|A\|_\infty.\qquad(30)$$
The crucial difference between these estimates is that the bound in equation (30) explicitly depends on the Hilbert space dimension d, which makes this inequality useless for applications in capacity theory, where dimensions grow exponentially. Our aim in this section is therefore to relate the various error measures with dimension-independent bounds (see proposition 4.3 below).
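Lemma 4.1. Let $\varrho$ be a density operator and ψ be a unit vector in a Hilbert space $\mathcal{H}$. Then
$$\big\|\varrho-|\psi\rangle\langle\psi|\big\|_1\le2\sqrt{1-\langle\psi|\varrho|\psi\rangle},\qquad(31)$$
with equality in equation (31) iff $\varrho$ is pure or ψ is orthogonal to the support of $\varrho$.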
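Proof. Suppose first that $\varrho=|\varphi\rangle\langle\varphi|$ is pure. Then we can compute the trace norm in the two-dimensional space spanned by ψ and $\varphi$. For the moment we will only use this property, i.e., we assume that ψ = (1, 0) is the first basis vector, and $\varrho$ is an arbitrary (2×2) density matrix. Then we may expand the traceless operator $\varrho-|\psi\rangle\langle\psi|$ in terms of the Pauli matrices $\{\sigma_i\}_{i=1,2,3}$, as follows:
$$\varrho-|\psi\rangle\langle\psi|=\operatorname{Re}(\varrho_{12})\,\sigma_1-\operatorname{Im}(\varrho_{12})\,\sigma_2-(1-\varrho_{11})\,\sigma_3.$$
From this we find the eigenvalues of $\varrho-|\psi\rangle\langle\psi|$ to equal $\pm\sqrt{(1-\varrho_{11})^2+|\varrho_{12}|^2}$, and hence
$$\big\|\varrho-|\psi\rangle\langle\psi|\big\|_1=2\sqrt{(1-\varrho_{11})^2+|\varrho_{12}|^2}.\qquad(33)$$
Positivity of $\varrho$ clearly requires $\det\varrho\ge0$, implying $|\varrho_{12}|^2\le\varrho_{11}\varrho_{22}=\varrho_{11}(1-\varrho_{11})$. Since $\operatorname{tr}(\varrho^2)=1+2(\varrho_{11}(\varrho_{11}-1)+|\varrho_{12}|^2)$, equality holds if $\varrho$ is pure. By inserting $|\varrho_{12}|^2=\varrho_{11}(1-\varrho_{11})$ into equation (33) we see that for pure states we indeed have equality in equation (31).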
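We now drop the assumption that $\varrho$ is pure and consider an arbitrary convex decomposition $\varrho=\sum_i\lambda_i\varrho_i$ into pure states $\varrho_i$. Then because $x\mapsto\sqrt{1-x}$ is concave we obtain
$$\big\|\varrho-|\psi\rangle\langle\psi|\big\|_1\le\sum_i\lambda_i\big\|\varrho_i-|\psi\rangle\langle\psi|\big\|_1=2\sum_i\lambda_i\sqrt{1-\langle\psi|\varrho_i|\psi\rangle}\le2\sqrt{1-\langle\psi|\varrho|\psi\rangle},\qquad(34)$$
where in the second step the result for pure states has been used. This establishes equation (31).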
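Now suppose that equality holds in equation (34). Then because the concavity of $x\mapsto\sqrt{1-x}$ is strict, $\langle\psi|\varrho_i|\psi\rangle=\langle\psi|\varrho|\psi\rangle$ for all i. But since the convex decomposition of $\varrho$ is arbitrary, we may conclude that
$$|\langle\varphi|\psi\rangle|^2=\langle\psi|\varrho|\psi\rangle\,\langle\varphi|\varphi\rangle$$
for any vector $\varphi$ in the support of $\varrho$. By polarization this implies $\langle\varphi_1|\psi\rangle\langle\psi|\varphi_2\rangle=\langle\varphi_1|\varphi_2\rangle\langle\psi|\varrho|\psi\rangle$, from which it follows that
$$S|\psi\rangle\langle\psi|S=\langle\psi|\varrho|\psi\rangle\,S,$$
where S denotes the projection operator on $\mathrm{supp}(\varrho)$. Hence either the factor $\langle\psi|\varrho|\psi\rangle$ vanishes, entailing Sψ = 0, or else S is a rank one operator, and thus $\varrho$ is pure. This concludes the proof. □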
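For a single qubit the bound of lemma 4.1 can be checked numerically. The sketch below is a plausibility check, not part of the proof: it parametrizes density matrices by a Bloch vector (an assumption of this example), fixes ψ = (1, 0), and compares the trace norm of the difference with twice the square root of one minus the overlap. All function names are hypothetical.

```python
import math
import random

def trace_norm_2x2(a, b, c):
    """Trace norm of the Hermitian matrix [[a, b], [conj(b), c]] with real a, c."""
    mean = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + abs(b) ** 2)
    return abs(mean + radius) + abs(mean - radius)

def bound_gap(rx, ry, rz):
    """Right-hand side minus left-hand side of the lemma for psi = (1, 0)."""
    rho11 = (1 + rz) / 2
    rho22 = (1 - rz) / 2
    rho12 = (rx - 1j * ry) / 2
    lhs = trace_norm_2x2(rho11 - 1, rho12, rho22)   # || rho - |psi><psi| ||_1
    rhs = 2 * math.sqrt(max(0.0, 1 - rho11))        # 2 sqrt(1 - <psi|rho|psi>)
    return rhs - lhs

random.seed(0)
gaps = []
while len(gaps) < 1000:
    r = [random.uniform(-1, 1) for _ in range(3)]
    if sum(x * x for x in r) <= 1:                  # keep points inside the Bloch ball
        gaps.append(bound_gap(*r))

min_gap = min(gaps)                 # never negative, up to rounding
pure_gap = bound_gap(0.6, 0.0, 0.8) # pure state: the bound is saturated
print(min_gap, pure_gap)
```

Mixed states in the interior of the Bloch ball give a strictly positive gap, in line with the equality condition of the lemma.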
From lemma 4.1 we may derive a fidelity-based expression for the deviation of a given channel from the ideal channel:
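Lemma 4.2. Let $\mathcal{H}$ be a Hilbert space, $\dim\mathcal{H}<\infty$, and $T\colon\mathcal{B}_*(\mathcal{H})\to\mathcal{B}_*(\mathcal{H})$ be a channel. We then have
$$\|T-\mathrm{id}\|_\infty\le4\sqrt{1-F(T)}.$$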
Proof. Note that the operator norm $\|T-\mathrm{id}\|_\infty$ equals the norm of the adjoint operator on the dual space, i.e.,
$$\|T-\mathrm{id}\|_\infty=\sup_{\|\varrho\|_1\le1}\|T_*\varrho-\varrho\|_1\qquad(38)$$
(cf chapter VI of [38] or section 2.4 of [39] for details). Any matrix $\varrho$, with $\|\varrho\|_1\le1$, has a decomposition $\varrho=\varrho_1+i\varrho_2$ into Hermitian $\varrho_i$ satisfying $\|\varrho_i\|_1\le1$. Inserting this decomposition into equation (38) and using the triangle inequality, we find $\|T-\mathrm{id}\|_\infty\le2\sup\|T_*\varrho-\varrho\|_1$, where the supremum is now over all Hermitian matrices $\varrho$ obeying $\|\varrho\|_1\le1$.
By spectral decomposition, any Hermitian matrix $\varrho$ can be given the form $\varrho=\sum_ir_i\varrho_i$, where the $\varrho_i$ are rank one projectors and the coefficients $r_i$ are real numbers satisfying $\sum_i|r_i|=\|\varrho\|_1$. Inserting this into the supremum, we see that $\|T-\mathrm{id}\|_\infty\le2\sup\big\|T_*(|\psi\rangle\langle\psi|)-|\psi\rangle\langle\psi|\big\|_1$, where optimization is now with respect to all one-dimensional projectors $|\psi\rangle\langle\psi|$. The inequality then directly follows from lemma 4.1. □
4.2. Four equivalent distance measures
We now have in hand all the tools we need to prove that the distance measures presented in sections 2.3-2.6 do indeed coincide:
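Proposition 4.3. Let $\mathcal{H}$ be a Hilbert space, $\dim\mathcal{H}<\infty$, and let $T\colon\mathcal{B}_*(\mathcal{H})\to\mathcal{B}_*(\mathcal{H})$ be a channel. Then
$$1-\min_{\varrho}F_e(\varrho,T)\le4\sqrt{1-F(T)}\le4\sqrt{\|T-\mathrm{id}\|_\infty}\le4\sqrt{\|T-\mathrm{id}\|_{cb}}\le8\Big(1-\min_{\varrho}F_e(\varrho,T)\Big)^{1/4}.$$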
These are the dimension-independent bounds we need: if a sequence of channels becomes close to ideal in the sense of any of the error measures appearing in this proposition, so it will be in terms of all the others. The equivalence of the basic capacity definition 2.1 based on the cb-norm and the definitions based on minimum fidelity and entanglement fidelity, as presented in sections 2.3 and 2.5, then directly follows.
It is crucial for proposition 4.3 that we are considering only the deviation of T from the ideal channel, so we can use lemma 4.1 for the distance between an output state and a pure state. Therefore, for the general capacity Q(T, S) the choice of the error quantity may remain important. General properties such as superadditivity (16), which are easy to see for the cb-norm criterion, might therefore fail for the simpler-looking operator norm ||T - S||∞. This is the principal reason for choosing the cb-norm in the basic definition.
Mean fidelity, as used in section 2.4, and channel fidelity, as introduced in section 2.5, are conspicuously absent from proposition 4.3. Their role will be discussed in section 4.3.
The equivalence of Schumacher's original definition of channel capacity in terms of the entropy rate will then be treated in section 4.4.
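Proof of proposition 4.3. Let $\psi\in\mathcal{H}\otimes\mathcal{H}$ be a purification of $\varrho$. We then have
$$1-F_e(\varrho,T)=\langle\psi|\big((\mathrm{id}-T)\otimes\mathrm{id}\big)(|\psi\rangle\langle\psi|)|\psi\rangle.\qquad(40)$$
By Schmidt decomposition, $\psi$ can be given a representation $|\psi\rangle=\sum_j\lambda_j|j\rangle\otimes|j'\rangle$, where $\{|j\rangle\}_j$ and $\{|j'\rangle\}_j$ are orthonormal systems in $\mathcal{H}$, and the so-called Schmidt coefficients $\{\lambda_j\}_j$ are non-negative real numbers satisfying $\sum_j\lambda_j^2=1$. Inserting this representation into equation (40), we see that
$$1-F_e(\varrho,T)=\sum_{j,k}\lambda_j^2\lambda_k^2\,\langle j|(\mathrm{id}-T)(|j\rangle\langle k|)|k\rangle\le\|T-\mathrm{id}\|_\infty\sum_{j,k}\lambda_j^2\lambda_k^2=\|T-\mathrm{id}\|_\infty,$$
where in the last step the normalization $\sum_j\lambda_j^2=1$ has been applied.
The first inequality then immediately follows from lemma 4.2 and the definition of minimum fidelity, equation (4).
An application of the Schwarz inequality directly gives the second inequality:
$$1-\langle\psi|T(|\psi\rangle\langle\psi|)|\psi\rangle=\langle\psi|(\mathrm{id}-T)(|\psi\rangle\langle\psi|)|\psi\rangle\le\big\|(\mathrm{id}-T)(|\psi\rangle\langle\psi|)\big\|_1\le\|T-\mathrm{id}\|_\infty$$
for all unit vectors $\psi\in\mathcal{H}$.
The third inequality is obvious from the definition of cb-norm, so we only need to prove the last step. Applying lemma 4.2 to the operator $T\otimes\mathrm{id}_n$ and then taking the supremum over n on both sides, we see that
$$\|T-\mathrm{id}\|_{cb}=\sup_n\|(T-\mathrm{id})\otimes\mathrm{id}_n\|_\infty\le4\sup_n\sqrt{1-F(T\otimes\mathrm{id}_n)}=4\sqrt{1-\min_{\varrho}F_e(\varrho,T)},$$
concluding the proof. □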
4.3. Average fidelity and channel fidelity
Average fidelity and channel fidelity have been shown [40, 41] to be directly related error criteria.
Proposition 4.4. Let $\bar{F}(T)$ be the average fidelity and $F_c(T)$ be the channel fidelity of a quantum channel $T$, as introduced in equations (5) and (9), respectively. We then have
$$\bar{F}(T) = \frac{d\,F_c(T) + 1}{d + 1}, \qquad (44)$$
where d is the dimension of the underlying Hilbert space $\mathcal{H}$.
From proposition 4.4 we may conclude that both quantities coincide in the large-dimension limit d→∞. Consequently, average fidelity and channel fidelity are equivalent error criteria for capacity purposes.
However, neither appears in proposition 4.3. After giving a somewhat simplified proof of equation (44), we show by an explicit counterexample that this omission is not accidental. Since a coding for which the worst case fidelity goes to 1 also makes the average fidelity go to 1, the capacity defined with average fidelity might in principle be larger than the standard one. That these capacities nevertheless coincide will then follow from proposition 4.5 in section 4.4. A more direct proof for the equivalence of average fidelity, i.e., a proof not making use of proposition 4.4, is then presented in section 6.2.
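Proof of proposition 4.4. Suppose that $\{t_i\}_i$ is a set of Kraus operators for the quantum channel T, i.e., $T(\sigma) = \sum_i t_i\,\sigma\,t_i^*$.
In the course of the proof we will repeatedly employ the so-called flip operator $\mathbb{F}$, defined by $\mathbb{F}(\varphi\otimes\psi) = \psi\otimes\varphi$. In a basis $\{|n\rangle\}_{n=1,\dots,d}$ of $\mathcal{H}$, $d = \dim\mathcal{H}$, this corresponds to the representation
$$\mathbb{F} = \sum_{n,m=1}^{d} |n\rangle\langle m|\otimes|m\rangle\langle n|.$$
Working in this representation one easily verifies that for all operators $A$, $B$ on $\mathcal{H}$
$$\mathrm{tr}\,\big(\mathbb{F}\,(A\otimes B)\big) = \mathrm{tr}\,(AB). \qquad (46)$$
In terms of the Kraus operators $\{t_i\}_i$ the average fidelity of T then reads
$$\bar{F}(T) = \int dU\,\langle\psi|U^*\,T(U\rho U^*)\,U|\psi\rangle = \sum_i \int dU\,\mathrm{tr}\,\big(U\rho U^*\,t_i\,U\rho U^*\,t_i^*\big) = \sum_i \mathrm{tr}\,\Big((t_i\otimes t_i^*)\,\mathbb{F}\int dU\,(U\otimes U)(\rho\otimes\rho)(U\otimes U)^*\Big), \qquad (47)$$
where $\rho := |\psi\rangle\langle\psi|$ is an arbitrary pure reference state, integration is over all unitaries $U$ on $\mathcal{H}$ and in the last step we have applied equation (46). The second factor under the trace,
$$P(\rho) := \int dU\,(U\otimes U)(\rho\otimes\rho)(U\otimes U)^*,$$
is obviously invariant under local unitary transformations, i.e., $[P(\rho), V\otimes V] = 0$ for all unitary operators $V$. Such a state is usually called a Werner state [42], and it follows from the theory of group representations that these states are spanned by the identity operator and the flip operator,
$$P(\rho) = \alpha\,\mathbb{1} + \beta\,\mathbb{F},$$
with complex coefficients α, β (see [43] and chapter 3.1.2 of [14] for details). The coefficients can be easily obtained by tracing $P(\rho)$ with the identity and flip operator, respectively, and are both found to equal 1/(d(d + 1)). Inserting the expansion
$$P(\rho) = \frac{\mathbb{1} + \mathbb{F}}{d(d+1)}$$
into equation (47) and using equation (46) again, we see that
$$\bar{F}(T) = \frac{1}{d(d+1)}\sum_i\Big(\mathrm{tr}\,(t_i^* t_i) + |\mathrm{tr}\,t_i|^2\Big) = \frac{1}{d(d+1)}\Big(d + \sum_i |\mathrm{tr}\,t_i|^2\Big) = \frac{d\,F_c(T) + 1}{d+1},$$
where in the second step we have used the normalization $\sum_i t_i^* t_i = \mathbb{1}$, and in the third step equation (7) has been applied for the state $\mathbb{1}/d$. □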
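The relation of proposition 4.4 can be checked numerically. The following is a minimal sketch for a qubit (d = 2), assuming the Schrödinger-picture Kraus form $T(\sigma) = \sum_i t_i \sigma t_i^*$ and the channel fidelity $F_c(T) = \sum_i |\mathrm{tr}\,t_i|^2/d^2$. Instead of a Haar integral it averages over the six Pauli eigenstates, which form a 2-design and therefore reproduce the Haar average of this quantity exactly. All function names are illustrative and not taken from the paper.

```python
import numpy as np

def random_kraus(d, r, rng):
    """Return r random Kraus operators t_i on C^d with sum_i t_i^* t_i = 1."""
    g = rng.normal(size=(r * d, d)) + 1j * rng.normal(size=(r * d, d))
    q, _ = np.linalg.qr(g)  # q has orthonormal columns, so q^* q = 1
    return [q[i * d:(i + 1) * d, :] for i in range(r)]

def channel_fidelity(kraus):
    """F_c(T) = F_e(1/d, T) = sum_i |tr t_i|^2 / d^2."""
    d = kraus[0].shape[0]
    return sum(abs(np.trace(t)) ** 2 for t in kraus) / d ** 2

def pure_state_fidelity(kraus, psi):
    """<psi| T(|psi><psi|) |psi> = sum_i |<psi| t_i |psi>|^2."""
    return sum(abs(np.vdot(psi, t @ psi)) ** 2 for t in kraus)

def average_fidelity_qubit(kraus):
    """Average over the six Pauli eigenstates (a 2-design for d = 2)."""
    s = 1 / np.sqrt(2)
    design = [(1, 0), (0, 1), (s, s), (s, -s), (s, 1j * s), (s, -1j * s)]
    return np.mean([pure_state_fidelity(kraus, np.array(v, dtype=complex))
                    for v in design])

rng = np.random.default_rng(0)
kraus = random_kraus(2, 3, rng)
lhs = average_fidelity_qubit(kraus)          # average fidelity
rhs = (2 * channel_fidelity(kraus) + 1) / 3  # (d F_c + 1)/(d + 1) with d = 2
print(lhs, rhs)
```

The two printed numbers should agree up to rounding for any seed and any number of Kraus operators.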
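We proceed with the advertised
Counterexample. For $\mathcal{H} = \mathbb{C}^d$, we set
$$T(\rho) := P_+\,\rho\,P_+ + P_-\,\rho\,P_-,$$
where $P_+ := |\psi_1\rangle\langle\psi_1|$ is some one-dimensional projector and $P_- := \mathrm{id} - P_+$ its ortho-complement. Then by equation (7) we find
$$F_c(T) = \frac{1 + (d-1)^2}{d^2}$$
and therefore $\lim_{d\to\infty}\bar{F}(T) = \lim_{d\to\infty}F_c(T) = 1$, the first equality by equation (44). However, using equation (38) we have
and by choosing $\psi = (\psi_1 + \psi_2)/\sqrt{2}$ such that $\psi_2\perp\psi_1$, $\|T - \mathrm{id}\|_\infty$ can be easily shown to be non-zero, and independent of d. Hence there exists no bound of the form $\|T - \mathrm{id}\|_\infty \le f(F_c(T))$ with a dimension independent function f, such that x→1 implies f(x)→0.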
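The counterexample can be made concrete with a few lines of NumPy. This is a sketch under the assumption that the channel is $T(\rho) = P_+\rho P_+ + P_-\rho P_-$ with $\psi_1$ and $\psi_2$ taken as the first two basis vectors; the trace-norm distance between $T(|\psi\rangle\langle\psi|)$ and $|\psi\rangle\langle\psi|$ is used here as a simple witness that T stays far from the ideal channel. Function names are illustrative.

```python
import numpy as np

def counterexample_kraus(d):
    """Kraus operators P_+ and P_- of T(rho) = P_+ rho P_+ + P_- rho P_-."""
    p_plus = np.zeros((d, d))
    p_plus[0, 0] = 1.0  # P_+ = |psi_1><psi_1| with psi_1 the first basis vector
    return [p_plus, np.eye(d) - p_plus]

def channel_fidelity(kraus):
    """F_c(T) = sum_i |tr t_i|^2 / d^2 (the Kraus operators are real here)."""
    d = kraus[0].shape[0]
    return sum(np.trace(t) ** 2 for t in kraus) / d ** 2

def witness(d):
    """Fidelity and trace-norm error for psi = (psi_1 + psi_2)/sqrt(2)."""
    kraus = counterexample_kraus(d)
    psi = np.zeros(d)
    psi[0] = psi[1] = 1 / np.sqrt(2)
    rho = np.outer(psi, psi)
    out = sum(t @ rho @ t.T for t in kraus)
    fidelity = psi @ out @ psi
    trace_norm = np.abs(np.linalg.eigvalsh(out - rho)).sum()
    return fidelity, trace_norm

for d in (2, 10, 100):
    print(d, channel_fidelity(counterexample_kraus(d)), *witness(d))
```

The channel fidelity approaches 1 as d grows, while the fidelity of the chosen superposition and its trace-norm error do not depend on d.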
4.4. Entanglement fidelity and entropy rate
Let us briefly summarize what we have learned so far about the interrelation of the various distance measures introduced in section 2: from proposition 4.3 and the results of the previous section we may infer the existence of two classes of equivalent error criteria, one of them containing average and channel fidelity, the other one cb-norm distance, operator norm, minimum fidelity and entanglement fidelity.
To show that both classes lead to the same quantum channel capacity, we will have to construct, from a given coding scheme with rate R and channel fidelity approaching one, a sequence of Hilbert spaces all pure states of which may be sent reliably with rate R. This is the essence of the following proposition, which closely follows the argument presented in section V of [17]. Although for this purpose we only need to consider channel fidelities, and thus the chaotic density operator, the statement is kept more general to apply to all density matrices, since this will immediately allow us to cope with Schumacher's definition of channel capacity in terms of entropy rates as well.
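Proposition 4.5. Let $\mathcal{H}$ be a Hilbert space with $d := \dim\mathcal{H} < \infty$. Let $T$ be a channel on $\mathcal{H}$, and $\rho$ a density operator. Then, for a suitable k-dimensional projection $P_k$, and for the `compressed channel' $T_k$ given by
the estimate
$$F(T_k) \ge 1 - \frac{1 - F_e(\rho, T)}{1 - q^*} \qquad (55)$$
holds with both
$$q^* \le k\,\|\rho\|_\infty \qquad (56)$$
and
$$q^* \le \frac{\mathrm{ld}\,d + 1 - S(\rho)}{\mathrm{ld}\,d - \mathrm{ld}\,k}, \qquad (57)$$
where S again denotes the von Neumann entropy.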
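Proof. The idea of the proof is to recursively remove dimensions of low fidelity from the support of $\rho$ until we are left with a Hilbert space of given dimension k and a minimum pure state fidelity bounded from below in terms of $F_e(\rho, T)$. To this end, define
$$f(\varphi) := \langle\varphi|\,T(|\varphi\rangle\langle\varphi|)\,|\varphi\rangle.$$
Setting $d := \dim\,\mathrm{supp}(\rho)$ and $\rho_0 := \rho$, we now recursively define a collection $\{\rho_i\}_{i=0,\dots,d}$ of positive operators, as follows:
$$\rho_i := \rho_{i-1} - q_i\,|\varphi_i\rangle\langle\varphi_i|,$$
where $\varphi_i$ is the state vector in the support of $\rho_{i-1}$ that minimizes f and $q_i$ is the largest positive number that leaves $\rho_i$ positive. Note that since $\dim\mathcal{H}$ is finite, $q_i$ can be chosen to be strictly positive. By construction, $\mathrm{supp}(\rho_i)\subset\mathrm{supp}(\rho_{i-1})$, and $\mathrm{rank}(\rho_i) = \mathrm{rank}(\rho_{i-1}) - 1$; so our procedure removes dimensions from the support of $\rho$ one by one. It follows that
$$\rho = \sum_{i=1}^{d} q_i\,|\varphi_i\rangle\langle\varphi_i|, \qquad (60)$$
implying $\sum_{i=1}^{d} q_i = \mathrm{tr}(\rho) = 1$.
Using the convexity of entanglement fidelity in the density operator input, we see that
$$F_e(\rho, T) \le \sum_{i=1}^{d} q_i\,f(\varphi_i) \le f(\varphi_{d-k+1})\sum_{i=1}^{d-k} q_i + \sum_{i=d-k+1}^{d} q_i,$$
where in the last step we have used that $f \le 1$ and that $k\mapsto f(\varphi_k)$ is non-decreasing by construction. We now take the subspace $P_k\mathcal{H}$ as the span of all vectors $\{\varphi_i\}_{i=d-k+1,\dots,d}$. Then, since $\langle\psi|T_k(|\psi\rangle\langle\psi|)|\psi\rangle \ge \langle\psi|T(|\psi\rangle\langle\psi|)|\psi\rangle$ for $\psi\in P_k\mathcal{H}$, we have $F(T_k) \ge f(\varphi_{d-k+1})$. Introducing $q^* = \sum_{i=d-k+1}^{d} q_i$, and using $\sum_{i=1}^{d-k} q_i = 1 - q^*$, we immediately have the desired estimate.
Our remaining task is to give upper bounds on $q^*$, either in terms of the largest eigenvalue of $\rho$ or its entropy. Note that from the sum representing $\rho$ in equation (60) we have
$$\|\rho\|_\infty \ge \langle\varphi_i|\rho|\varphi_i\rangle \ge q_i.$$
Therefore, each of the k terms in $q^*$ is bounded by $\|\rho\|_\infty$, and we get $q^* \le k\,\|\rho\|_\infty$, which gives the first estimate.
For the entropic estimate, note first that in the inequality
$$S\Big(\sum_i q_i\,\sigma_i\Big) \le \sum_i q_i\,S(\sigma_i) + H(q),$$
which is valid for arbitrary convex combinations of states $\sigma_i$ with weights $q_i$ (cf chapter 11.3.6 of [44]), the case of pure states $\sigma_i$ leaves just the entropy $H(q)$ of the probability distribution q. On the other hand, it is obvious that among all probability distributions with given weight $q^*$ for the last group of k indices, the one with the highest entropy is equidistribution, in each of the ranges 1≤i≤d - k and d - k + 1≤i≤d. Evaluating the entropy of this distribution, and combining this with the previous estimate we find
$$S(\rho) \le H_2(q^*) + q^*\,\mathrm{ld}\,k + (1 - q^*)\,\mathrm{ld}\,d,$$
where the first term denotes the binary Shannon entropy,
$$H_2(q) := -q\,\mathrm{ld}\,q - (1-q)\,\mathrm{ld}\,(1-q),$$
and we have also used ld (d - k)≤ld d. Hence the result follows by using $H_2(q^*)\le 1$ and writing this as an upper bound for $q^*$. □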
Proposition 4.5 allows us to make the transition from average error criteria and entropy rates to maximal error criteria. So let us assume that a coding scheme $(E_n, D_n)_n$ is given, together with a sequence $(\rho_n)_n$ of source states, such that $F_e(\rho_n, D_nT^{\otimes n}E_n)\to 1$. Then the channel $T_k$ will again be a corrected version of $T^{\otimes n}$, but we can now conclude that its worst case fidelity goes to one.
Let us first consider the case in which the source does not appear explicitly, i.e., in which we assume either the mean fidelity or, equivalently, the channel fidelity to go to one for a scheme with rate R. Since the channel fidelity is just the entanglement fidelity with respect to a maximally mixed ρ, we may apply proposition 4.5 with $\rho = \mathbb{1}/d$ and $d = 2^{\lfloor nR\rfloor}$, where we denote the largest integer no larger than x conventionally by $\lfloor x\rfloor$ (read: floor of x). We set $k = d/2$, which is to say that the modified coding scheme corrects just one qubit less than the original one. Then $k\,\|\rho\|_\infty = 1/2$, and we immediately find that the minimum fidelity is at least $1 - 2(1 - F_e)$, and hence also goes to 1.
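The bounds of proposition 4.5 are easy to evaluate numerically. The sketch below assumes the reading $F(T_k) \ge 1 - (1 - F_e)/(1 - q^*)$ of the estimate, which is what the convexity step of the proof yields, together with the two bounds $q^* \le k\|\rho\|_\infty$ and $q^* \le (\mathrm{ld}\,d + 1 - S(\rho))/(\mathrm{ld}\,d - \mathrm{ld}\,k)$. Function names and the sample numbers are illustrative only.

```python
import math

def q_star_spectral(k, rho_norm):
    """First bound: q* <= k * ||rho||_inf (capped at the trivial value 1)."""
    return min(1.0, k * rho_norm)

def q_star_entropic(d, k, entropy):
    """Entropic bound: q* <= (ld d + 1 - S(rho)) / (ld d - ld k), for k < d."""
    bound = (math.log2(d) + 1 - entropy) / (math.log2(d) - math.log2(k))
    return min(1.0, bound)

def min_fidelity_bound(f_e, q_star):
    """Lower bound 1 - (1 - F_e)/(1 - q*) on the fidelity of the compressed channel."""
    if q_star >= 1:
        return 0.0  # the estimate is vacuous in this case
    return max(0.0, 1 - (1 - f_e) / (1 - q_star))

# Maximally mixed source on d = 2^10 dimensions, compressed to k = d/2.
d = 2 ** 10
k = d // 2
q = q_star_spectral(k, 1 / d)
print(q, min_fidelity_bound(0.99, q))
```

With half of the dimensions kept, the worst-case infidelity is at most twice the entanglement infidelity, so it vanishes whenever the latter does.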
The second case of interest is that of a source satisfying the quantum asymptotic equipartition property (QAEP). That is to say, for large n, the Hilbert space $\mathcal{H}_n$ can be decomposed into a subspace on which $\rho_n$ essentially looks like a multiple of the identity, and a subspace of low probability: given any $\varepsilon > 0$, for large enough n essentially all the eigenvalues λ of a QAEP quantum source $\rho_n$ with entropy rate R are concentrated in a so-called $\varepsilon$-typical subspace, i.e.,
$$2^{-n(R+\varepsilon)} \le \lambda \le 2^{-n(R-\varepsilon)}, \qquad (66)$$
in the sense that the sum of the eigenvalues that do not satisfy equation (66) can be made arbitrarily small. We can then conclude that $\|\rho_n\|_\infty \le 2^{-n(R-\varepsilon)}$ for large n. Hence, if we choose $k\approx 2^{n(R-2\varepsilon)}$, we can guarantee that $q^*\to 0$, and once again the worst case fidelity has to go to 1. This case covers product sources and stationary ergodic sources [28]-[30], and many others of interest. The discussion of the equivalence between the minimum fidelity version and Schumacher's entanglement fidelity version of channel capacity in [17] is limited to this case.
Does the equivalence hold even without the equipartition property? We will give a counterexample below, which is, however, rather artificial from the point of view of typical coding situations: the dimension of the spaces $\mathcal{H}_n$ grows superexponentially. This is indeed necessary. For if we have an upper bound $\dim\mathcal{H}_n \le \tau^n$ for some positive constant τ, $S(\rho_n)\approx nR$, and $k\approx 2^{n(R-\varepsilon)}$, we find that $q^*$ in equation (57) goes to the constant $(\mathrm{ld}\,\tau - R)/(\mathrm{ld}\,\tau - R + \varepsilon) < 1$. Therefore, the maximal subspace fidelity in equation (55) in proposition 4.5 goes to one if the entanglement fidelity does.
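The limiting constant can be reproduced by evaluating the entropic bound along a sequence with exponentially bounded dimension. This is a small sketch assuming the entropic bound in the form $q^* \le (\mathrm{ld}\,d + 1 - S)/(\mathrm{ld}\,d - \mathrm{ld}\,k)$ and the idealized values $\mathrm{ld}\,d = n\,\mathrm{ld}\,\tau$, $S = nR$, $\mathrm{ld}\,k = n(R - \varepsilon)$; the parameter values are arbitrary illustrations.

```python
import math

def q_star_entropic_bound(n, rate, eps, tau):
    """Entropic bound on q* for ld d = n ld tau, S = n R, ld k = n (R - eps)."""
    ld_d = n * math.log2(tau)
    entropy = n * rate
    ld_k = n * (rate - eps)
    return (ld_d + 1 - entropy) / (ld_d - ld_k)

def limiting_constant(rate, eps, tau):
    """Large-n limit (ld tau - R) / (ld tau - R + eps) of the bound above."""
    return (math.log2(tau) - rate) / (math.log2(tau) - rate + eps)

rate, eps, tau = 1.0, 0.1, 4.0
for n in (10, 1000, 10 ** 6):
    print(n, q_star_entropic_bound(n, rate, eps, tau))
print("limit", limiting_constant(rate, eps, tau))
```

Since the limit is strictly below 1, the factor $1/(1 - q^*)$ in the fidelity estimate stays bounded.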
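Counterexample. Here we show the claim that the Schumacher capacity with unconstrained sources is infinite for all channels with positive quantum capacity. In fact, suppose that we are given a coding scheme with channel fidelity going to 1. Then we simply enlarge the Hilbert space $\mathcal{H}_n$ by a direct summand $\mathcal{K}_n$ of some large dimension, and let
$$\rho_n := (1 - \varepsilon_n)\,\frac{\mathbb{1}_{\mathcal{H}_n}}{\dim\mathcal{H}_n} \oplus \varepsilon_n\,\frac{\mathbb{1}_{\mathcal{K}_n}}{\dim\mathcal{K}_n}.$$
The coding operations on $\mathcal{K}_n$ can be completely depolarizing, for as long as $\varepsilon_n\to 0$, the entanglement fidelity of this source goes to 1, as required. On the other hand, the entropy of this source is
$$S(\rho_n) = H_2(\varepsilon_n) + (1 - \varepsilon_n)\,\mathrm{ld}\,\dim\mathcal{H}_n + \varepsilon_n\,\mathrm{ld}\,\dim\mathcal{K}_n.$$
Clearly, we can make $S(\rho_n)/n$ diverge if only we let $\dim\mathcal{K}_n$ go to ∞ fast enough.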
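The divergence of the entropy rate can be illustrated numerically. The sketch assumes the source is a direct sum of two maximally mixed states with weights $1 - \varepsilon_n$ and $\varepsilon_n$, so that its entropy is $H_2(\varepsilon_n) + (1-\varepsilon_n)\,\mathrm{ld}\dim\mathcal{H}_n + \varepsilon_n\,\mathrm{ld}\dim\mathcal{K}_n$. The specific choices $\varepsilon_n = 1/n$ and $\mathrm{ld}\dim\mathcal{K}_n = n^3$ are hypothetical and serve only as an example of a fast enough growth.

```python
import math

def binary_entropy(p):
    """Binary Shannon entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def source_entropy(eps, ld_dim_main, ld_dim_extra):
    """Entropy of (1 - eps) * (mixed on main) direct-sum eps * (mixed on extra)."""
    return binary_entropy(eps) + (1 - eps) * ld_dim_main + eps * ld_dim_extra

def entropy_rate(n, rate=0.5):
    """S(rho_n)/n for eps_n = 1/n and ld dim K_n = n^3 (hypothetical choices)."""
    eps = 1.0 / n
    ld_dim_extra = float(n) ** 3
    return source_entropy(eps, n * rate, ld_dim_extra) / n

for n in (10, 100, 1000):
    print(n, entropy_rate(n))
```

The weight of the extra summand vanishes, yet the entropy rate grows without bound.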
4.5. Entanglement generation capacity
We now focus on Devetak's [3] entanglement generation capacity, as introduced in section 2.5, and verify that it is totally equivalent to the definitions discussed above. The proof is based on entanglement-assisted teleportation [45] and therefore involves classical forward communication from the encoding to the decoding apparatus. However, this additional resource is shown in section 6.1 not to affect the quantum channel capacity.
Due to the additional freedom of choosing an arbitrary pure input state instead of the maximally entangled state Ω, the entanglement generation capacity is certainly no smaller than the capacity based on channel fidelity, which was shown to be a valid figure of merit in the previous section. So we only need to prove the converse. This is easily done with the help of teleportation: in the entanglement generation scenario, the sender and receiver end up sharing a state $\varrho$ which has asymptotically perfect overlap with the maximally entangled state,
$$F := \langle\Omega|\varrho|\Omega\rangle \ge 1 - \varepsilon$$
for some (small) $\varepsilon > 0$. The output system can thus be readily interpreted as being in the maximally entangled state with probability F≈1, and hence can be used as a resource in the standard teleportation protocol [45] to transfer arbitrary quantum states from the sender to the receiver with fidelity no smaller than F, at the same rate R.
5. Isometric encoding suffices
In this section we will show that if there exists a coding scheme that achieves high fidelity transmission for a given source, there is another coding scheme with isometric encoding, as in equation (11), that also achieves high fidelity transmission. It then directly follows that in the definition of channel capacity we may restrict our attention to isometric encodings, as claimed in section 2.8. While this result is originally due to Barnum et al [17], here we give a slightly generalized version of Holevo's presentation (cf chapter 9 of [46]). All we need is the following
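Proposition 5.1. Let $\mathcal{H}$, $\mathcal{K}$ be Hilbert spaces with dimensions $\dim\mathcal{H} = \eta$ and $\dim\mathcal{K} = \kappa$. Let $\rho$ be a density operator on $\mathcal{H}$, and $E$ a completely positive map from $\mathcal{H}$ to $\mathcal{K}$ such that $\mathrm{tr}\,(E\rho) = 1$. Let $T$ be a channel from $\mathcal{K}$ to $\mathcal{H}$. We may then find a channel $E'$ such that
$$\big(F_e(\rho, TE)\big)^2 \le F_e(\rho, TE'),$$
where for $\eta\le\kappa$ and $\eta > \kappa$ the channel $E'$ is built from the isometries $V$ and $W$ constructed in the proof, respectively.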
Proof. Let $\{t_i\}_{i=1,\dots,\tau}$ and $\{e_j\}_{j=1,\dots,\varepsilon}$ be sets of Kraus operators for the maps T and E, respectively. By equation (7) we have
$$F_e(\rho, TE) = \sum_{i,j} |X_{i,j}|^2,$$
where $X_{i,j} := \mathrm{tr}\,t_i e_j\rho$. If $\tau\neq\varepsilon$, add zero components so that X becomes an (m×m) square matrix, $m := \max\{\tau, \varepsilon\}$.
By the singular value decomposition we may find unitary matrices A, B such that X = ADB, where D is diagonal with real non-negative entries. Since this decomposition simply corresponds to a change of the Kraus representation of T and E, we may assume without loss that X is diagonal already, and thus
$$F_e(\rho, TE) = \sum_{k=1}^{m} |\mathrm{tr}\,t_k e_k\rho|^2. \qquad (73)$$
Now for k = 1, ...,m, we define $\lambda_k := \mathrm{tr}\,e_k\rho\,e_k^*$. Let $\rho = \sum_j p_j|\psi_j\rangle\langle\psi_j|$, $p_j > 0$, $\sum_j p_j = 1$, be a diagonal representation of the density operator $\rho$. Then $\lambda_k = \sum_j p_j\|e_k\psi_j\|^2$ and $\mathrm{tr}\,t_k e_k\rho = \sum_j p_j\langle\psi_j|t_k e_k|\psi_j\rangle$. Thus, $\lambda_k = 0$ implies that $e_k\psi_j = 0$ ∀j, and therefore $\mathrm{tr}\,t_k e_k\rho = 0$, so that these terms do not contribute to the sum in equation (73). We may therefore assume without loss that $\lambda_k > 0$ ∀k = 1, ...,m. Moreover,
$$\sum_{k=1}^{m}\lambda_k = \mathrm{tr}\,E(\rho) = 1,$$
where in the last step we have used that E is trace-preserving on the state $\rho$. Since
we may find an index k such that
where we have introduced the short-hand notation
and $t := t_k$, respectively.
Applying the Schwarz inequality we see that
Let us treat the case η≤κ first: since $tt^*\le\mathbb{1}$, by working in the spectral representation one easily obtains $|t^*|\le\mathbb{1}$, and $|t^*|^2\le|t^*|$, from which it follows that
where by $V$ we denote the polar isometry of $t^*$, i.e., $t^* = V|t^*|$. Existence of this isometry requires η≤κ. Since
tracing backwards our results leaves us with the following chain of inequalities:
the result we set out to prove.
If η > κ, the proof proceeds very similarly: we denote the polar isometry of t by $W$, i.e., $t = W|t|$. From equation (77) we may then conclude that
where in the second to last step we have used that $|t|\le\mathbb{1}$, and thus $|t|^2\le|t|$. Substituting $W^*$ for V, we now mimic equations (79) and (80) to conclude that $(F_e(\rho, TE))^2\le F_e(\rho, TW^*(\cdot)W)$. The map $W^*(\cdot)W$, while being completely positive, is not necessarily trace-preserving. The desired result then follows by renormalization, as above in equations (22) and (54). □
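The singular value decomposition step of the proof can be verified numerically. The sketch below assumes Schrödinger-picture Kraus operators, equal numbers of Kraus operators for T and E (so that no zero padding is needed) and equal input and output dimensions. It mixes both Kraus sets with the unitaries from the SVD of $X_{i,j} = \mathrm{tr}\,t_i e_j\rho$ and checks that the new overlap matrix is diagonal while the channels are unchanged. All names are illustrative.

```python
import numpy as np

def random_kraus(d, r, rng):
    """Return r random Kraus operators on C^d with sum_i t_i^* t_i = 1."""
    g = rng.normal(size=(r * d, d)) + 1j * rng.normal(size=(r * d, d))
    q, _ = np.linalg.qr(g)
    return [q[i * d:(i + 1) * d, :] for i in range(r)]

def overlap_matrix(t_ops, e_ops, rho):
    """X[i, j] = tr(t_i e_j rho)."""
    return np.array([[np.trace(t @ e @ rho) for e in e_ops] for t in t_ops])

def mix(ops, u):
    """New Kraus set ops'_k = sum_i conj(u[i, k]) ops_i for a unitary u."""
    n = len(ops)
    return [sum(np.conj(u[i, k]) * ops[i] for i in range(n)) for k in range(n)]

def apply_channel(ops, rho):
    """Schroedinger-picture action sum_i t_i rho t_i^*."""
    return sum(t @ rho @ t.conj().T for t in ops)

rng = np.random.default_rng(1)
d, r = 3, 4
t_ops, e_ops = random_kraus(d, r, rng), random_kraus(d, r, rng)
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = a @ a.conj().T
rho = rho / np.trace(rho).real          # random density operator

x = overlap_matrix(t_ops, e_ops, rho)
u, s, vh = np.linalg.svd(x)             # x = u @ diag(s) @ vh
t_new = mix(t_ops, u)                   # t'_k = sum_i conj(u[i, k]) t_i
e_new = mix(e_ops, vh.T)                # e'_l = sum_j conj(vh[l, j]) e_j
x_new = overlap_matrix(t_new, e_new, rho)
print(np.round(np.abs(x_new), 6))
```

The entanglement fidelity $\sum_{i,j}|X_{i,j}|^2$ is then carried entirely by the diagonal entries, which are the singular values of X.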
In proposition 5.1 we have included cases in which the input space of the channel is strictly smaller than the input space of the encoding map, $\kappa < \eta$. Though we do not need to consider this situation in our settings, it may prove helpful in other applications to avoid cumbersome distinction of cases, and thus has been added for convenience.
To arrive at the statement that isometric encoding suffices, all we need to do then is to combine proposition 5.1 with the channel fidelity definition of quantum capacity, section 2.5, taking $\rho$ to be the chaotic density matrix, E to be the encoding channel, and thinking of T as the concatenation of quantum channel and decoding channel.
6. Classical side information
As claimed in section 2.9, it is a straightforward consequence of proposition 5.1 that classical forward communication has no effect on the quantum channel capacity [17, 25]. However, before entering the proof we need to make a few more comments on the assisted channel $T\otimes\mathrm{id}_\Lambda^c$, where $T$ is an arbitrary quantum channel and $\mathrm{id}_\Lambda^c$ denotes the identity on a classical system with $\Lambda$ states. Thus, in the limit of large dimensions, an n-fold tensor product of the channel T will be assisted by a classical system with a total of $\Lambda^n$ states.
The results we are going to present in this section apply slightly more generally: instead of a priori fixing a side channel of given (if arbitrarily large) dimension, the encoder may choose the size of the side channel in the encoding process, which also covers the case of super-exponentially growing side channels. The capacity of a channel T assisted by this type of classical forward communication will be marked with a subscript, Qcf(T). This generalization will play a role in section 6.2.
Of course, Qcf(T)≥Q(T⊗idΛc)≥Q(T). It is the aim of section 6.1 to show that all these capacities are equal.
6.1. Classical forward communication does not increase the channel capacity
Obviously, for classically assisted channels the encoding is a channel with both a classical and a quantum output. Such channels are usually called instruments [47], and can be thought of as a collection of trace-non-increasing operators $\{E_\lambda\}_{\lambda=1,\dots,\Lambda}$ summing up to a channel $E = \sum_{\lambda=1}^{\Lambda}E_\lambda$. The index $\lambda\in\{1,\dots,\Lambda\}$ represents classical information that may be obtained in the encoding process and sent undisturbed to the decoder over a noiseless classical channel. Depending on the value of λ, one channel $D_\lambda$ out of a collection of trace-preserving quantum channels $\{D_\lambda\}_{\lambda=1,\dots,\Lambda}$ is used in the decoding process.
The definition of achievable rates and channel capacity now completely parallels the definition of the unassisted quantities. Here we focus on the channel fidelity version of channel capacity (cf section 2.5), since this is a definition for which proposition 5.1 is well suited. Of course, all other definitions of channel capacity can be extended to classically assisted capacities in the same spirit.
We may thus say that R is an achievable rate for the classically assisted quantum channel T iff there is a sequence of Hilbert spaces
satisfying
and a sequence of encodings
and decodings
for some integer sequence
such that
The quantum capacity Qcf(T) of the channel T with classical forward communication is then defined as the supremum of all achievable rates.
Of course, the capacity of the channel $T\otimes\mathrm{id}_\Lambda^c$ for fixed $\Lambda$ is obtained by setting $\Lambda_n := \Lambda^n$ in the above definition.
Theorem 6.1. Let $T$ be a quantum channel and $\Lambda\in\mathbb{N}$. We then have
$$Q_{cf}(T) = Q(T\otimes\mathrm{id}_\Lambda^c) = Q(T).$$
Before giving the proof let us consider a seeming generalization of this theorem, which allows the side channel R to be any separable channel, i.e., a channel R = R2R1 operating by first collecting classical information, by a channel R1, say and then recoding this into quantum information by another channel R2. Equivalently, (id⊗R) maps any input state to a separable state, so these channels are also called `entanglement breaking' [48, 49]. Then, for every such R and any channel T we have the following
Corollary 6.2. Separable side channels do not increase the quantum channel capacity, i.e., for any quantum channel T and any separable channel $R = R_2\circ R_1$ we have
$$Q(T\otimes R) = Q(T).$$
Proof of corollary 6.2. For a separable channel $R = R_2\circ R_1 = R_2\circ\mathrm{id}_\Lambda^c\circ R_1$, we have
$$Q(T) \le Q(T\otimes R) \le Q(T\otimes\mathrm{id}_\Lambda^c) = Q(T),$$
where the first inequality follows by using codings ignoring the channel R, the second follows by the bottleneck inequality (15), and in the last step we have applied theorem 6.1. From this chain of inequalities we get Q(T) = Q(T⊗R), just as claimed. □
We now follow [17] in the
Proof of theorem 6.1. From our remarks it is clear that we only need to prove the inequality Qcf(T)≤Q(T). Given a sequence of Hilbert spaces $\mathcal{H}_n$ and suitable encodings and decodings such that the classically assisted channel T achieves the rate R with channel fidelity approaching one, we will show that the same rate can be achieved without using the side channel.
From the definition of channel capacity, for any $\varepsilon > 0$ we may find $n_\varepsilon$ such that
For the remainder of the proof we will fix $n\ge n_\varepsilon$ and drop the index to streamline the notation. Setting $e_\lambda := \mathrm{tr}\,E_\lambda(\mathbb{1}/d)$ with $d := \dim\mathcal{H}$ and
we may then rewrite equation (86) as
Since $\sum_{\lambda=1}^{\Lambda}e_\lambda = 1$, there is an index μ such that
However,
and we may therefore apply proposition 5.1 to conclude that there is a channel $E'$ such that
from which it follows that the same rate can be achieved without relying on the classical side channel. □
6.2. Average fidelity by forward communication
We already know that average fidelity may be considered a suitable error criterion for capacity purposes. The line of thought we followed to establish this fact proceeded via the equivalence of average fidelity and channel fidelity, proposition 4.4, and is thus ultimately based on proposition 4.5.
The results of the previous section regarding the uselessness of classical forward communication can be employed to give an alternative proof that average fidelity serves as a valid distance measure, making use only of the sufficiency of isometric encoding.
To this end, we will show that instead of evaluating the average fidelity $\bar{F}(T)$ for a given channel T, we may just as well compute the minimum fidelity of a new channel, which is simply the old channel T augmented by classical forward communication. However, by theorem 6.1 the quantum capacities of the augmented channel and T coincide, and therefore average fidelity and minimum fidelity turn out to be equivalent error quantities.
While our concept of classically assisted channel capacity, as presented in section 6.1, only allows for discrete classical messages and therefore involves the calculation of finite sums, for the evaluation of average fidelity we will rather deal with integrals, corresponding to continuous classical messages. However, this extension poses no difficulties.
So suppose we have at hand a quantum channel $T$ together with encoding channel $E$ and decoding channel $D$. By equation (5) the average fidelity of the concatenation DTE then reads
$$\bar{F}(DTE) = \int dU\,\langle\psi|U^*\,(DTE)(U|\psi\rangle\langle\psi|U^*)\,U|\psi\rangle,$$
where integration is over all unitaries $U$, and $|\psi\rangle$ is an arbitrary reference state. We will now convert DTE into a channel with classical forward communication. Denoting by $\mathcal{C}(X)$ the vector space of continuous functions on the set X, we may define
and
, as follows:
where in the definition of
we have made use of the fact that
is isomorphic to the
-valued functions on X. We then see that for any state 
The average fidelity for the concatenation DTE thus equals the minimum pure state fidelity for the classically assisted channel
, which is what we wanted to show.
Note that for this proof to apply also in the setting of n-fold tensor products, as required by the definition of channel capacity, we may not restrict ourselves to exponentially growing side channels, $(T\otimes\mathrm{id}_\Lambda^c)^{\otimes n}$: this would correspond to an averaging over n-fold tensor products of unitary operators, $U_1\otimes\cdots\otimes U_n$. However, not all unitary operators on an n-fold tensor product are of this form.
7. Testing a single sequence
We will now prove the claim made in section 2.2: if a coding scheme construction works for a certain pair of integer sequences $(n_\nu)_\nu$, $(m_\nu)_\nu$ such that the rate R is achieved infinitely often, i.e., $\limsup_{\nu\to\infty} m_\nu/n_\nu = R$, and the error tends to zero, $\lim_{\nu\to\infty}\Delta(n_\nu, m_\nu) = 0$, then coding works for all such pairs.
As mentioned in section 2.2, this requires extending a given coding scheme to more block sizes. Therefore this section will be organized by extension method: in section 7.1 we use only the method of wasting resources, i.e., either using the coded channels for fewer bits than allowed by the given coding scheme (i.e., decreasing mν) or requiring some additional channel uses (thus increasing nν) and simply not using them. This will allow the extension whenever we can find a subsequence along which, on the one hand, the desired rate is achieved and which, on the other hand, does not grow too fast.
A second method would be to use blocks from the given coding scheme and put them together as tensor products, to get to larger block sizes. We show in section 7.2 by an explicit example that this method, combined with the wasteful one, is not sufficient to extend a very sparse coding sequence to all large block sizes.
Finally, we show in section 7.3, based on the work in [24], how hashing codes can be used to achieve the desired extension in all cases.
Throughout, we will denote by $(N_\mu, M_\mu)_\mu$ the given coding sequences, and assume, without loss of generality, that the sporadic rate is attained, i.e., $\lim_{\mu\to\infty} M_\mu/N_\mu = R$. The sequences to which we seek to extend the scheme are denoted by $(n_\nu)_\nu$ and $(m_\nu)_\nu$, as before.
7.1. Subexponential sequences
Obviously, good coding becomes easier the more parallel channels are available for transmission. Moreover, if a certain coding scheme works for some Hilbert space $\mathcal{H}$, it works at least as well for states supported on a lower dimensional Hilbert space $\mathcal{H}'\subset\mathcal{H}$. Thus, the error quantity Δ(n, m), as introduced in definition 2.1, has the following monotonicity properties:
$$\Delta(n+1, m) \le \Delta(n, m) \le \Delta(n, m+1) \qquad (94)$$
for all positive integers n, m. We call a diverging sequence $(N_\mu)_\mu$ subexponential if
$$\lim_{\mu\to\infty}\frac{N_{\mu+1}}{N_\mu} = 1. \qquad (95)$$
This covers, for example, all arithmetic sequences, and polynomially growing ones. For such sequences the desired result follows directly from the following lemma, which slightly generalizes lemma 3.2 of [24].
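Lemma 7.1. Suppose $\Delta(n, m)$ satisfies the monotonicity properties (94). Let $(N_\mu)_\mu$, $(M_\mu)_\mu$ be a pair of integer sequences such that $(N_\mu)_\mu$ is subexponential, cf equation (95), and $\lim_{\mu\to\infty}\Delta(N_\mu, M_\mu) = 0$. Then for any pair of integer sequences $(n_\nu)_\nu$, $(m_\nu)_\nu$ such that $\lim_{\nu\to\infty}n_\nu = \infty$ and
$$\limsup_{\nu\to\infty}\frac{m_\nu}{n_\nu} < \liminf_{\mu\to\infty}\frac{M_\mu}{N_\mu}$$
we have $\lim_{\nu\to\infty}\Delta(n_\nu, m_\nu) = 0$.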
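Proof. If we have only the monotonicity property of Δ to draw upon, the way to show that $\Delta(n_\nu, m_\nu)\to 0$ is to find a suitable index μ = μ(ν) for all sufficiently large ν so that $\Delta(n_\nu, m_\nu)\le\Delta(N_{\mu(\nu)}, M_{\mu(\nu)})$, for which we need
$$N_{\mu(\nu)} \le n_\nu \quad\text{and}\quad m_\nu \le M_{\mu(\nu)}. \qquad (96)$$
The first inequality we will ensure by defining
$$\mu(\nu) := \max\{\mu \mid N_\mu \le n_\nu\}.$$
Then
$$N_{\mu(\nu)} \le n_\nu < N_{\mu(\nu)+1} \qquad (98)$$
and $\lim_\nu\mu(\nu) = \infty$. Hence it remains to show that the second inequality in equation (96) holds for all sufficiently large ν. We consider
$$\frac{m_\nu}{M_{\mu(\nu)}} = \frac{m_\nu}{n_\nu}\cdot\frac{n_\nu}{N_{\mu(\nu)+1}}\cdot\frac{N_{\mu(\nu)+1}}{N_{\mu(\nu)}}\cdot\frac{N_{\mu(\nu)}}{M_{\mu(\nu)}}. \qquad (99)$$
In this product the second factor is ≤1 by equation (98), and the third converges to 1 because $(N_\mu)_\mu$ is subexponential. Now pick $R_-$, $R_+$ such that strict inequalities
$$\limsup_{\nu\to\infty}\frac{m_\nu}{n_\nu} < R_- < R_+ < \liminf_{\mu\to\infty}\frac{M_\mu}{N_\mu}$$
hold. Then for all sufficiently large ν the first factor in equation (99) is $\le R_-$, and the last factor is $\le 1/R_+$. Hence the product of the first and last factor in equation (99) is $\le R_-/R_+ < 1$. Since the remaining two factors are asymptotically bounded by 1, equation (96) holds for all sufficiently large ν. □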
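The index selection in this proof can be tried out on a concrete pair of sequences. The following sketch assumes the choice $\mu(\nu) = \max\{\mu : N_\mu \le n_\nu\}$ and uses hypothetical sequences $N_\mu = \mu^2$ (subexponential) and $M_\mu = \lfloor 0.8\,N_\mu\rfloor$. It checks whether a block size pair (n, m) is dominated by a given pair via monotonicity alone.

```python
from bisect import bisect_right

# Hypothetical given coding sequences: N_mu = mu^2, M_mu = floor(0.8 * N_mu).
BIG_N = [mu * mu for mu in range(1, 2001)]
BIG_M = [(4 * n) // 5 for n in BIG_N]

def mu_of(n):
    """Largest list index mu with N_mu <= n, or None if n is below all N_mu."""
    idx = bisect_right(BIG_N, n) - 1
    return idx if idx >= 0 else None

def dominated(n, m):
    """True if N_mu <= n and m <= M_mu for mu = mu_of(n)."""
    mu = mu_of(n)
    return mu is not None and m <= BIG_M[mu]

# Target rate 0.7 < 0.8: domination may fail for small n but holds for large n.
print(dominated(8, 5))
print(all(dominated(n, (7 * n) // 10) for n in range(1000, 100000, 37)))
```

For a superexponential sequence the third factor in the product above does not converge to 1, and the same check can fail for arbitrarily large n.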
This result covers most sequences $(N_\mu)_\mu$ naturally arising for families of codes. In contrast to proposition 7.2 the result therefore remains useful even for the simulation of one noisy channel by another, i.e., for the definition of capacities Q(T, S) with non-ideal reference channel S.
7.2. A counterexample
From the way in which subexponentiality of $(N_\mu)_\mu$ enters the proof of lemma 7.1, it is not clear whether this assumption is really necessary. In this section we will give an example showing that it cannot be omitted, implying that to establish full equivalence we do need the more sophisticated techniques presented in section 7.3. The example will also satisfy another natural constraint on the error function Δ(n, m), which reflects another elementary method of getting new coding schemes from old: we can always split the given number of channels into sub-blocks, and apply a known coding scheme to each block. The total error is then estimated as the sum of the errors of each block. Hence the error function Δ(n, m) is subadditive:
$$\Delta(n_1 + n_2, m_1 + m_2) \le \Delta(n_1, m_1) + \Delta(n_2, m_2). \qquad (101)$$
Suppose we are given some codes for a possibly very sparse sequence of block sizes $N_\mu$, with $M_\mu = N_\mu$ coded bits, and $\varepsilon_\mu = \Delta(N_\mu, N_\mu)\to 0$. Then the rate 1 is sporadically achievable. The error bound we get by the best combination of blocking and possibly wasting some resources is then
where the infimum is taken over all admissible sets. This satisfies both monotonicity (94) and subadditivity (101). Our aim in constructing the counterexample is to choose $(N_\mu)_\mu$ growing sufficiently rapidly, and $(\varepsilon_\mu)_\mu$ decreasing sufficiently slowly, so that Δ(n, m) can be bounded away from zero even though m/n gets small.
We assume that $(N_\mu)_\mu$ is superexponential in the sense that $N_{\mu+1}/N_\mu\to\infty$. Of $(\varepsilon_\mu)_\mu$ for the moment we only require that it decreases monotonically to zero. Then the infimum in equation (102) never contains sums arising by breaking up a block of size $N_\mu$ into blocks of smaller sizes: in this way one would not only get more terms in the sum of $\varepsilon$s, but each term would be larger than $\varepsilon_\mu$. Therefore we can lower bound Δ by considering only a decomposition into the largest available blocks. For $N_\mu\le m\le n < N_{\mu+1}$ this means
$$\Delta(n, m) \ge \left\lfloor\frac{m}{N_\mu}\right\rfloor\varepsilon_\mu,$$
where $\lfloor x\rfloor$ denotes the largest integer ≤x. Now we choose $n_\mu = N_{\mu+1} - 1$, and $m_\mu$ close to the geometric mean: $m_\mu\approx\sqrt{N_\mu N_{\mu+1}}$. Then on the one hand $m_\mu/n_\mu\to 0$, because $(N_\mu)_\mu$ is superexponential. Hence this pair of sequences has rate zero. On the other hand, if we only let
μ decrease slowly enough we can prevent Δ(nμ, mμ) from going to zero. For example, with
we get Δ(nμ, mμ)≥1/2 asymptotically.
In summary, we have constructed a monotonic and subadditive function Δ(n, m), for which the rate 1 can be achieved sporadically, but for which the proper achievable rate is 0.
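A numerical illustration of this construction follows. It is a sketch under stated assumptions: the block sizes $N_\mu = 2^{\mu^2}$, the block errors $\varepsilon_\mu = \sqrt{N_\mu/N_{\mu+1}}$ and the lower bound $\Delta(n, m)\ge\lfloor m/N_\mu\rfloor\,\varepsilon_\mu$ are hypothetical choices made here for concreteness, not necessarily those of the original example.

```python
import math

def block_size(mu):
    """Superexponential block sizes N_mu = 2^(mu^2); N_{mu+1}/N_mu = 2^(2 mu + 1)."""
    return 2 ** (mu * mu)

def example(mu):
    """Rate m/n and lower bound floor(m/N_mu) * eps_mu for the chosen pair."""
    n_lo, n_hi = block_size(mu), block_size(mu + 1)
    n = n_hi - 1                       # n_mu = N_{mu+1} - 1
    m = math.isqrt(n_lo * n_hi)        # m_mu close to the geometric mean
    eps = math.sqrt(n_lo / n_hi)       # hypothetical slowly decaying block error
    rate = m / n
    lower_bound = (m // n_lo) * eps
    return rate, lower_bound

for mu in range(1, 9):
    rate, bound = example(mu)
    print(mu, rate, bound)
```

The rate column tends to zero while the error bound stays above 1/2, which is the behaviour the counterexample requires.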
7.3. Hashing helps
We will now explain how hashing helps to establish full equivalence of the one-sequence and all-sequence definitions, showing that it is indeed sufficient to check only one pair of sequences when testing a given rate R. As we know from the previous subsection, this requires that if we have found a fairly good coding for some large block size, we must make better use of it than just repeating the blocks, and maybe not using some of the input bits.
This problem is in essence the same as that arising when we have a fairly good channel to begin with: just repeating it without further encoding will not make errors go to zero. Instead they will accumulate. But, on the other hand, a channel which is nearly ideal should also have nearly the capacity of an ideal channel, or else the whole idea of capacity would make no sense. In fact, in our paper, so far we have only shown one type of channel to have positive capacity, namely the ideal channels. As shown in section 3.2, in that case the problem of accumulating errors simply does not occur, and all coding is with Δ(n, m) = 0.
So it would actually be conceivable that capacities are always zero, unless coding can be done without errors (see also section 2.7). However, it can be shown that small errors can be corrected with only a small loss in capacity. This problem is treated in a self-contained way in [24], and we refer to that paper for details and proofs. Here we only point out the statements needed in the present context, and sketch the main arguments.
The non-trivial family of codes needed for this argument are called hash codes. In [24] they are constructed as random graph codes, based on a scheme [50] which turns graphs into quantum error correcting codes of the Knill-Laflamme type. The verification that a certain number of errors is corrected by such a code amounts to showing that a certain system of linear equations is non-singular. Then the existence of codes with suitable parameters is shown by checking that this condition holds true in a generic random graph of suitable size. The random graphs are generated such that the probabilities for each edge are independent and equidistributed. This is quite different from Shannon's idea of random coding, where the distribution depends on the noise in the channel and the input state. The argument based on graph codes works in any Hilbert space dimension d which is a prime number. It shows that if we want to encode m systems of dimension d into n systems of the same dimension, and
then we can arrange for the code to correct arbitrary errors occurring on up to f of the n subsystems. Here $H_2$ is again the binary entropy function. Moreover, the expression in equation (104) is an upper bound on the exponential rate at which the probability for a random graph code not to correct that many errors decreases. The crucial feature is that if f/n is small, i.e., we do not require many errors to be corrected, then we can get m close to n, i.e., the rate of the coding scheme is nearly that of the ideal channel.
The next step is to convert the correction of rare errors to that of arbitrary small errors. Here a straightforward norm estimate is
when E, D are a code correcting f out of n errors on m input systems, as above, and $\mathrm{id}_m$ denotes the ideal channel on m d-level systems. Then as soon as the expression in parentheses is < 1, in particular, if the channel T is close to ideal, we see that the errors go to zero exponentially in n.
We now apply these ideas to a given coding solution for some channel T, i.e., we assume that for the given channel we have some encoding of a d-level system through N parallel uses of the channel. The nominal rate of this coding scheme, expressed in the units `qubits per channel use' is ld d/N. For large N we may as well assume that d is a prime number, because the relative gaps between consecutive primes go to zero [51]. Then we apply the above ideas to the encoded channel $DT^{\otimes N}E$. The overall code will require nN channel uses, and the encoded systems are $d^m$-dimensional. If we use n as the index of the resulting sequence of channel uses, the resulting sequence of block lengths grows linearly, hence is clearly subexponential, and has rate (m ld d)/(nN). The errors go to zero exponentially, provided we can find an f satisfying equation (104) and such that the quantity in parentheses in equation (106) is strictly less than one. Combining all this gives the following estimate (theorem 8.2 in [24]):
Proposition 7.2. Let T be a channel, not necessarily between systems of the same dimension. Let
with d a prime number, and suppose that there are channels E and D encoding and decoding a d-level system through N parallel uses of T, with error $\Delta = \|D T^{\otimes N} E - \mathrm{id}_d\| < 1/(2e)$, where $e = \exp 1$. Then
Moreover, Q(T) is the least upper bound on all expressions of this form, and for coding rates below the bound the errors decrease exponentially.
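The bookkeeping that precedes the proposition can be sketched numerically. The example below is illustrative only: `is_prime`, `prev_prime` and `concatenated_rate` are hypothetical helper names, the sample values of d, N, m and n are made up, and stepping down to the largest prime not exceeding d is our choice of how to reach a prime dimension. It checks that this replacement barely changes ld d, and evaluates the rate (m ld d)/(nN) of the concatenated scheme.

```python
import math

def is_prime(k: int) -> bool:
    """Trial division; adequate for the small illustrative values used here."""
    if k < 2:
        return False
    return all(k % q for q in range(2, math.isqrt(k) + 1))

def prev_prime(d: int) -> int:
    """Largest prime <= d (requires d >= 2)."""
    if d < 2:
        raise ValueError("d must be at least 2")
    while not is_prime(d):
        d -= 1
    return d

def concatenated_rate(m: int, n: int, d: int, N: int) -> float:
    """Rate (m ld d)/(n N) in qubits per channel use, as stated in the text."""
    return m * math.log2(d) / (n * N)

d, N = 1024, 20                      # made-up sample values
p = prev_prime(d)
ratio = math.log2(p) / math.log2(d)  # relative change in ld d
print(p, ratio)
print(concatenated_rate(m=90, n=100, d=p, N=N))
```

For these sample values the prime replacement costs well under a tenth of a percent of the nominal rate, so the loss from the outer code (the ratio m/n) dominates.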
Note that here a single successful coding scheme (E, D) guarantees at least a lower bound to the capacity. The most important aspect of this bound is once again that the precision Δ required does not depend on the dimension d. Therefore, even if we know such codes only on an arbitrarily thinly spaced sequence of N's, with vanishing errors along this thin subsequence, we can achieve all rates below the sporadic rate
by subexponential sequences as well, and hence for any sequence, as required by definition 2.1. Thus the sporadic capacity is equal to the capacity.
Note that proposition 7.2 also clarifies the questions brought up in section 2.7: indeed a requirement that errors should vanish exponentially fast can be met for any achievable rate strictly below the capacity. Analogous results have been presented very recently by Hamada [52], building on earlier work in [13, 19, 53].
Moreover, it is clear from proposition 7.2 that tolerating finite errors is possible: since we require the capacity $Q_\varepsilon(T)$ to be achieved for arbitrarily large N, the second term in equation (107) also goes to zero, and we get the bound
Hence $\lim_{\varepsilon \to 0} Q_\varepsilon(T) = Q(T)$, as claimed.
Acknowledgments
We thank A Winter for fruitful discussions and A S Holevo for letting us use his version of the isometric encoding theorem, as well as for his perceptive comments on an earlier version of the manuscript. Funding from Deutsche Forschungsgemeinschaft (DFG) is gratefully acknowledged.
References
Dennis Kretschmann and Reinhard F Werner 2004 New J. Phys. 6 26