Belief propagation decoding of quantum channels by passing quantum messages

The belief propagation (BP) algorithm is a powerful tool in a wide range of disciplines from statistical physics to machine learning to computational biology, and is ubiquitous in decoding classical error-correcting codes. The algorithm works by passing messages between nodes of the factor graph associated with the code and enables efficient decoding of the channel, in some cases even up to the Shannon capacity. Here we construct the first BP algorithm which passes quantum messages on the factor graph and is capable of decoding the classical–quantum channel with pure state outputs. This gives explicit decoding circuits whose number of gates is quadratic in the code length. We also show that this decoder can be modified to work with polar codes for the pure state channel and as part of a decoder for transmitting quantum information over the amplitude damping channel. These represent the first explicit capacity-achieving decoders for non-Pauli channels.

Factor graphs and message passing algorithms on them are a crucial part of modern coding theory [1,2]. Factor graphs are used to efficiently represent the joint probability distribution of encoded inputs and outputs of a classical channel. Belief propagation (BP), an algorithm that works by passing messages between nodes on the factor graph, can be used to estimate the values of each input bit from the observed output, sometimes exactly. This leads to efficient decoding algorithms for high-rate codes, several of which are employed in current wireless communication standards. Moreover, it was recently shown that belief propagation decoding of a certain class of low-density parity-check (LDPC) codes can achieve the Shannon capacity [3].
Factor graphs have been adapted to the quantum-mechanical setting from several different perspectives [4–7]. Applied to quantum communication, BP and other message passing methods have been constructed for syndrome decoding of a variety of stabilizer codes subjected to Pauli noise channels [5, 8–14]. Despite their use in decoding quantum codes, these message passing algorithms are classical. Indeed, decoding any stabilizer code used for a Pauli channel or the erasure channel is essentially a classical task due to the Gottesman–Knill theorem [15]. Nevertheless, the quantum decoding problem is significantly more complicated than its classical counterpart, due to the degeneracy of quantum errors [11,16,17].
However, stabilizer decoding is not optimal for non-Pauli channels such as the amplitude damping channel, neither for the entanglement fidelity achievable by fixed-size codes nor for the largest achievable rates of codes with increasing blocklength. It would therefore be of interest to extend BP decoding to more general channels. The same holds in the setting of quantum polar codes, where the classical decoding method (ultimately a variant of BP) can only be employed without loss of rate for Pauli channels or the erasure channel [18–20].
Note that the quantum decoding problem is different from the one solved by the classical algorithm for "quantum belief propagation" in [5]. There, one is interested in computing marginals of quantum states whose structure is given by a factor graph. For classical decoding, computing such marginals is indeed sufficient, as we describe in more detail below. But even for bitwise decoding of a classical–quantum (CQ) channel, having classical input and quantum output, it is not enough to know the relevant marginal state; we need a way to perform the optimal (Helstrom) measurement [22] or some suitable approximation. Put differently, a quantum BP decoder is a quantum algorithm, and we may expect that it will need to pass quantum messages.
In this paper we construct a quantum BP decoding algorithm for the pure state channel, a binary-input CQ channel whose outputs are pure states. The algorithm for estimating a single input bit works by passing single qubits as well as classical information along the factor graph, while sequential estimation of all input bits requires passing many qubits. For codes whose factor graphs are trees, as well as for polar codes, we show how the BP decoder leads to explicit circuits for the optimal measurement whose size is quadratic in the code length. To the best of our knowledge, this is the first instance of a quantum algorithm for belief propagation. The pure state channel arises, for instance, in binary phase-shift keying (BPSK) modulation of a pure-loss bosonic quantum channel, where the channel outputs are the coherent states $|\pm\alpha\rangle$ [23]. Thus, our result gives an explicit construction of a successive cancellation decoder for the capacity-achieving polar code described in [23], and addresses the issue of decoding CQ polar codes discussed in [19]. Moreover, the pure state channel also arises as part of the quantum polar decoder for the amplitude damping channel [18,20], and therefore our result gives an explicit decoder for polar codes over this channel.
The remainder of the paper is structured as follows. In the next section we give a very brief overview of factor graphs and their use in classical decoding, and then rewrite the BP rules in a manner that leads to the quantum algorithm. Section 2 gives the quantum BP decoding algorithm, and applications to polar codes are given in Section 3. We finish with several open questions for future research raised by our result.

Belief propagation decoding on factor graphs
Let us examine BP on factor graphs directly in the coding context; for a more detailed treatment see [2, Ch. 2]. Fix a linear $n$-bit code $C$ and a classical channel $W$, and consider the joint probability distribution $P_{X^nY^n}$ of the input $X^n$ and output $Y^n$, supposing that the codewords are chosen uniformly at random. It is simply

$$P_{X^nY^n}(x_1^n, y_1^n) = \frac{1}{|C|}\,\mathbb{1}[x_1^n \in C]\prod_{j=1}^{n} W(y_j|x_j), \tag{1}$$

where $|C|$ is the number of codewords, $\mathbb{1}$ is the indicator function, and $W(y|x)$ are the transition probabilities of the channel. Here $x_1^n$ denotes $x_1, x_2, \dots, x_n$. The effective channel from any given bit, say $x_1$, to the entire output $y_1^n$ is defined by the marginal distribution $P_{X_1Y^n}$. To find the most likely input $x_1$ given the observed $y_1^n$, we simply need to compute $P_{X_1Y^n}(x_1, y_1^n)$ and determine which value of $x_1$ maximizes it. Exact marginalization is, however, generally intractable, since the size of the joint distribution grows exponentially in the number of variables.
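To make the exponential cost concrete, the following Python sketch evaluates Eq. (1) by brute force and marginalizes it onto $x_1$. The binary symmetric channel, its crossover probability 0.1, and the helper names `bsc`, `joint` and `map_first_bit` are illustrative assumptions rather than anything fixed by the text; the loop over all $2^n$ input strings is exactly the step that factorization avoids.

```python
from itertools import product

def bsc(p):
    """Transition probabilities W(y|x) of a binary symmetric channel (illustrative)."""
    return lambda y, x: 1 - p if y == x else p

def joint(x, y, code, W):
    """P_{X^n Y^n}(x, y) as in Eq. (1): uniform over codewords, memoryless channel."""
    if tuple(x) not in code:
        return 0.0
    prob = 1.0 / len(code)
    for xj, yj in zip(x, y):
        prob *= W(yj, xj)
    return prob

def map_first_bit(y, code, W):
    """Brute-force marginal P_{X_1 Y^n}(x_1, y); the sum runs over all 2^n inputs."""
    marg = {0: 0.0, 1: 0.0}
    for x in product((0, 1), repeat=len(y)):
        marg[x[0]] += joint(x, y, code, W)
    return max(marg, key=marg.get), marg

repetition = {(0, 0, 0), (1, 1, 1)}   # three-bit repetition code
bit, marg = map_first_bit((0, 1, 0), repetition, bsc(0.1))
```

For the received word 010 the marginal favours $x_1 = 0$, with weights 0.0405 versus 0.0045.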
However, for linear codes the joint distribution can be factorized, which greatly simplifies the marginalization task. Since the channel is memoryless, the channel contribution to (1) is already in factorized form. Meanwhile, code membership is enforced by a sequence of parity-check constraints associated with the code, which also leads to factorization. In the three-bit repetition code, for instance, there are two parity constraints, $x_1 \oplus x_2 = 0$ and $x_2 \oplus x_3 = 0$ (or $x_1 \oplus x_3 = 0$), and therefore $\mathbb{1}[x_1^3 \in C] = \mathbb{1}[x_1 \oplus x_2 = 0]\,\mathbb{1}[x_2 \oplus x_3 = 0]$.
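For a code this small the factorization of the membership indicator can be checked exhaustively. The Python sketch below indexes bits from 0 rather than 1, and its names are illustrative; it confirms that either choice of two parity checks reproduces $\mathbb{1}[x_1^3 \in C]$, while a single check does not.

```python
from itertools import product

code = {(0, 0, 0), (1, 1, 1)}   # three-bit repetition code
checks_a = [(0, 1), (1, 2)]     # x1+x2 = 0 and x2+x3 = 0 (0-based indices)
checks_b = [(0, 1), (0, 2)]     # x1+x2 = 0 and x1+x3 = 0

def indicator_from_checks(x, checks):
    """Product of parity-check factors 1[x_j + x_k = 0 mod 2]."""
    result = 1
    for j, k in checks:
        result *= int((x[j] + x[k]) % 2 == 0)
    return result

def factorizes(code, checks, n=3):
    """True if the product of check factors equals 1[x in C] for every x in {0,1}^n."""
    return all(indicator_from_checks(x, checks) == int(x in code)
               for x in product((0, 1), repeat=n))
```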
We can represent the joint distribution (up to normalization) by the factor graph in Figure 1. For an arbitrary factorizable function, the factor graph contains one (round) variable node for each variable and one (square) factor node for each factor, and each factor node is connected to all of its constituent variable nodes. The figure departs from this convention by not including variable nodes for the $y_j$; instead, they are treated as part of the channel factors, since their values are fixed and each is connected to only one factor node anyway.
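One way to hold such a factor graph in a program is as an undirected edge list between named nodes. In this Python sketch the node names (`x1`, `W1`, `c12`, and so on) are hypothetical labels, the two checks are assumed to be $x_1 \oplus x_2$ and $x_2 \oplus x_3$, and each $y_j$ is folded into its channel factor as in the figure; `is_tree` tests the property used in the next paragraph.

```python
from collections import deque

# Hypothetical labels: variable nodes x1..x3, channel factors W1..W3,
# parity-check factors c12 (x1+x2) and c23 (x2+x3).
edges = [("W1", "x1"), ("W2", "x2"), ("W3", "x3"),
         ("c12", "x1"), ("c12", "x2"), ("c23", "x2"), ("c23", "x3")]

def adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def is_tree(edges):
    """A graph is a tree iff it is connected and has one edge fewer than nodes."""
    adj = adjacency(edges)
    if not adj:
        return True
    start = next(iter(adj))
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search for connectivity
        for nb in adj[queue.popleft()]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(adj) and len(edges) == len(adj) - 1
```

Adding a redundant third check between `x1` and `x3` would close a loop, and `is_tree` then returns `False`.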
For factor graphs which are trees, meaning that only one path connects any two nodes as in Figure 1, the belief propagation algorithm can compute the marginal distributions exactly. Treating variable node $x_1$ as the root of the tree, BP proceeds by passing messages between nodes, working inward from the leaves and combining all relevant information as it goes. Simplifying the general BP rules for the decoding problem, the initial messages from the channel factors to the variable nodes can be taken as the log-likelihood ratios $\ell = \log[W(y_j|0)/W(y_j|1)]$ (here we suppress the nominal dependence on the channel output $y_j$). At variable nodes the messages simply add, so that the outgoing $\ell$ is the sum of the incoming $\ell_k$. At check nodes the rule is more complicated:

$$\tanh\frac{\ell}{2} = \prod_k \tanh\frac{\ell_k}{2}.$$

The estimate of $x_1$ is then 0 if the log-likelihood ratio arriving at the root is positive and 1 if it is negative.

By adopting a modified update rule it is in fact possible to compute all the marginals at once with only a modest overhead. Instead of only proceeding inward from the leaves, we send messages in both directions along each edge, starting by sending channel log-likelihoods in from the leaves. Each node sends a message on an edge once it has received messages on all its other edges. For graphs that contain loops, the algorithm is not guaranteed to converge, but one can nevertheless hope that the result is a good approximation and that the decoder outputs the correct value. This is borne out in practice for turbo codes and LDPC codes.
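The two update rules can be checked against brute-force marginalization. This Python sketch assumes a binary symmetric channel with crossover probability 0.1 and uses the [3,2] single parity-check code (all even-weight words), whose factor graph rooted at $x_1$ is a tree with a single check node; the helper names are illustrative.

```python
import math
from itertools import product

def llr_bsc(y, p):
    """Channel message l = log[W(y|0)/W(y|1)] for a BSC with crossover p."""
    return math.log((1 - p) / p) * (1 if y == 0 else -1)

def variable_node(llrs):
    """Variable node rule: incoming log-likelihood ratios add."""
    return sum(llrs)

def check_node(llrs):
    """Check node rule: tanh(l/2) = prod_k tanh(l_k/2)."""
    prod = 1.0
    for l in llrs:
        prod *= math.tanh(l / 2)
    return 2 * math.atanh(prod)

def decide(l):
    """Estimate 0 for a positive log-likelihood ratio, 1 otherwise."""
    return 0 if l > 0 else 1

def brute_force_llr(y, code, p):
    """log[P(x1=0, y) / P(x1=1, y)] by summing over all codewords."""
    marg = [0.0, 0.0]
    for x in code:
        w = 1.0
        for xj, yj in zip(x, y):
            w *= (1 - p) if xj == yj else p
        marg[x[0]] += w
    return math.log(marg[0] / marg[1])

p = 0.1
spc = [x for x in product((0, 1), repeat=3) if sum(x) % 2 == 0]  # x1+x2+x3 = 0
y = (1, 0, 0)
l = [llr_bsc(yj, p) for yj in y]
root = variable_node([l[0], check_node([l[1], l[2]])])  # message arriving at x1
```

For the received word 100 the root message is about $-0.68$, so the decoder outputs $x_1 = 1$: the parity check outvotes the direct observation of $x_1$.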
There is an intuitive way of understanding the BP decoding algorithm which is the basis of our quantum generalization. At every step the message can be interpreted as the log-likelihood ratio of the effective channel from that node to its descendants. This is sensible, as the likelihood ratio is a sufficient statistic for estimating the (binary) input from the channel output. The rules for combining messages can then be interpreted as rules for combining channels. At variable nodes, adding the log-likelihood ratios for two channels $W_1$ and $W_2$ amounts to considering the convolution channel $W_1 \circledast W_2$ with transition probabilities given by

$$(W_1 \circledast W_2)(y_1, y_2|x) = W_1(y_1|x)\,W_2(y_2|x). \tag{2}$$

That is, the effective channel associated with a variable node is simply the convolution $W_1 \circledast \cdots \circledast W_k$ of its descendants. The form of the effective channel at check nodes is not as immediate, but it is not too difficult to verify that the appropriate channel convolution $W_1 \boxplus W_2$ has transition probabilities

$$(W_1 \boxplus W_2)(y_1, y_2|x) = \frac{1}{2}\sum_{u\in\{0,1\}} W_1(y_1|x \oplus u)\,W_2(y_2|u).$$

These two channel convolutions are also the fundamental building blocks of polar codes [24], at least when the input channels are symmetric. The check node convolution is the "worse" channel in the channel splitting or channel synthesis step (cf. [24, Eq. 19]); this holds regardless of the symmetry of the channel. On the other hand, the "better" combination of $W_1$ and $W_2$ is defined by (cf. [24, Eq. 20])

$$W_2^{(2)}(y_1^2, x_1|x_2) = \frac{1}{2}\,W_1(y_1|x_1 \oplus x_2)\,W_2(y_2|x_2).$$

Compared to (2), the input $x_1$ is uniformly random rather than always zero, but it is given at the channel output. When $W_1$ is symmetric in the sense that $W_1(y|x \oplus u) = W_1(\pi_u(y)|x)$ for a suitable permutation $\pi_u$ of the output alphabet depending on $u$, we can reversibly transform $W_2^{(2)}$ into $W_1 \circledast W_2$ and vice versa.
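The channel-level view can be verified numerically: the log-likelihood ratio of $W_1 \circledast W_2$ is the sum of the constituent ratios, that of $W_1 \boxplus W_2$ obeys the tanh rule, and for a symmetric $W_1$ the "better" channel coincides with $W_1 \circledast W_2$ after relabeling $y_1 \mapsto y_1 \oplus x_1$. In this Python sketch channels are dictionaries of transition probabilities, and the two binary symmetric channels (crossover 0.1 and 0.2) are arbitrary test choices.

```python
import math

# A channel is a dict mapping (y, x) -> W(y|x); outputs may be tuples.
def bsc(p):
    return {(y, x): (1 - p if y == x else p) for y in (0, 1) for x in (0, 1)}

def outputs(W):
    return sorted({y for (y, _) in W})

def var_conv(W1, W2):
    """(W1 ⊛ W2)(y1, y2|x) = W1(y1|x) W2(y2|x), as in Eq. (2)."""
    return {((y1, y2), x): W1[(y1, x)] * W2[(y2, x)]
            for y1 in outputs(W1) for y2 in outputs(W2) for x in (0, 1)}

def check_conv(W1, W2):
    """(W1 ⊞ W2)(y1, y2|x) = 1/2 sum_u W1(y1|x+u) W2(y2|u)."""
    return {((y1, y2), x): 0.5 * sum(W1[(y1, x ^ u)] * W2[(y2, u)] for u in (0, 1))
            for y1 in outputs(W1) for y2 in outputs(W2) for x in (0, 1)}

def plus_conv(W1, W2):
    """Better channel: W(y1, y2, x1|x2) = 1/2 W1(y1|x1+x2) W2(y2|x2)."""
    return {((y1, y2, x1), x2): 0.5 * W1[(y1, x1 ^ x2)] * W2[(y2, x2)]
            for y1 in outputs(W1) for y2 in outputs(W2)
            for x1 in (0, 1) for x2 in (0, 1)}

def llr(W, y):
    """Log-likelihood ratio log[W(y|0)/W(y|1)]."""
    return math.log(W[(y, 0)] / W[(y, 1)])

W1, W2 = bsc(0.1), bsc(0.2)
V, C, P = var_conv(W1, W2), check_conv(W1, W2), plus_conv(W1, W2)
```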

Belief propagation decoding of quantum outputs
The form of the check and variable convolutions also applies to channels with quantum output. We need only replace the probability distributions over the output alphabet by quantum states. Abusing notation, let us denote by $W(x)$ the quantum state of the output of $W$ given input $x$. This includes the previous case by considering commuting $W(x)$. The variable and check node convolutions are now just

$$(W_1 \circledast W_2)(x) = W_1(x) \otimes W_2(x), \qquad (W_1 \boxplus W_2)(x) = \frac{1}{2}\sum_{u\in\{0,1\}} W_1(x \oplus u) \otimes W_2(u).$$

To properly generalize the BP decoding algorithm we need a "sufficient statistic" for the quantum channels at the various nodes. For pure state channels we can get away with a combination of classical bits and just one qubit. We may as well regard the two pure outputs of the bare channels themselves as states of a qubit. For concreteness, suppose that $W_1$ outputs $|\pm\theta_1\rangle$ and $W_2$ outputs $|\pm\theta_2\rangle$, where $|\theta\rangle = \cos\frac{\theta}{2}|0\rangle + \sin\frac{\theta}{2}|1\rangle$. Note that the overlap of the two states $|\pm\theta\rangle$ is given by $\cos\theta$, and the Helstrom measurement is just projection onto $|\pm\frac{\pi}{2}\rangle$.
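A quick numerical check of the last two claims, written as a Python sketch under the assumption (a labeling convention, not stated in the text) that input $x$ produces $|(-1)^x\theta\rangle$: the overlap of $|\pm\theta\rangle$ is $\cos\theta$, and projecting onto $|\pm\frac{\pi}{2}\rangle$ attains the Helstrom success probability $\frac{1}{2}(1 + \sqrt{1 - \cos^2\theta})$ for equal priors.

```python
import math

def ket(theta):
    """|theta> = cos(theta/2)|0> + sin(theta/2)|1>, as a real 2-vector."""
    return (math.cos(theta / 2), math.sin(theta / 2))

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

PLUS, MINUS = ket(math.pi / 2), ket(-math.pi / 2)  # the sigma_x eigenbasis

def sigma_x_success(theta):
    """Equal-prior success probability when input x yields |(-1)^x theta>
    and the decoder projects onto |+pi/2> (guess 0) or |-pi/2> (guess 1)."""
    correct0 = inner(PLUS, ket(theta)) ** 2
    correct1 = inner(MINUS, ket(-theta)) ** 2
    return (correct0 + correct1) / 2

def helstrom_success(overlap):
    """Helstrom optimum for two equiprobable pure states with inner product `overlap`."""
    return (1 + math.sqrt(1 - overlap ** 2)) / 2
```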
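The convolution $W_1 \circledast W_2$ outputs either $|\theta_1\rangle \otimes |\theta_2\rangle$ or $|{-\theta_1}\rangle \otimes |{-\theta_2}\rangle$, again two pure states, with an overlap angle $\theta_\circledast$ given by $\cos\theta_\circledast = \cos\theta_1\cos\theta_2$. Thus, the algorithm for combining two channels at a variable node is to unitarily compress the output into one qubit and pass the qubit to the parent node. This is accomplished by a unitary transformation $U_\circledast$ satisfying

$$U_\circledast\big(|{\pm\theta_1}\rangle \otimes |{\pm\theta_2}\rangle\big) = |{\pm\theta_\circledast}\rangle \otimes |0\rangle,$$

which leaves the second qubit in the state $|0\rangle$; such a unitary exists because both pairs of states have the same overlap. To combine more than two channels, we just perform the pairwise convolution sequentially.

The $\boxplus$ convolution is more complicated because the outputs are no longer pure. However, applying the unitary $U_\boxplus = \mathrm{CNOT}_{2\to1}\,\mathrm{CNOT}_{1\to2}$ (where $\mathrm{CNOT}_{a\to b}$ has control $a$ and target $b$, so that $\mathrm{CNOT}_{1\to2}$ acts first) results in a CQ state of the form $\sum_{j\in\{0,1\}} p_j\, |{\pm\theta_\boxplus^{(j)}}\rangle\langle{\pm\theta_\boxplus^{(j)}}| \otimes |j\rangle\langle j|$. We are free to measure the second qubit, and the conditional state of the first qubit is again one of two pure states, though now with

$$p_0 = \frac{1+\cos\theta_1\cos\theta_2}{2}, \qquad \cos\theta_\boxplus^{(0)} = \frac{\cos\theta_1 + \cos\theta_2}{1+\cos\theta_1\cos\theta_2}, \qquad \cos\theta_\boxplus^{(1)} = \frac{\cos\theta_2 - \cos\theta_1}{1-\cos\theta_1\cos\theta_2}.$$

For outcome 0 the angle between the states has decreased, while for outcome 1 the angle has increased. The algorithm for combining two channels at a check node is to apply $U_\boxplus$, measure the second qubit, and forward both the qubit and the measurement result to the parent node. As before, several channels can be combined sequentially; for $K$ channels this will generate a message of one qubit and $K-1$ bits. The classical messages are required to inform the variable node how to choose the angles in the unitary $U_\circledast$.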
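Both node operations can be checked on explicit state vectors. In the Python sketch below, `var_node_unitary` is one possible choice of $U_\circledast$ (our own construction, not necessarily the one intended in the text): it maps an orthonormal basis adapted to the even- and odd-parity parts of $|\pm\theta_1\rangle|\pm\theta_2\rangle$ onto computational basis states. The check-node part applies $\mathrm{CNOT}_{1\to2}$ and then $\mathrm{CNOT}_{2\to1}$ and compares the result with the stated $p_j$ and $\cos\theta_\boxplus^{(j)}$. Qubit 1 is the high bit of the index, input $x$ is assumed to produce $|(-1)^x\theta\rangle$, and the angles are arbitrary test values.

```python
import math

def ket(t):
    """|t> = cos(t/2)|0> + sin(t/2)|1>."""
    return [math.cos(t / 2), math.sin(t / 2)]

def kron(a, b):
    """Tensor product of vectors; qubit 1 (from a) is the high bit of the index."""
    return [x * y for x in a for y in b]

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def var_node_unitary(t1, t2):
    """A hypothetical U_⊛: real orthogonal 4x4 matrix with U|±t1>|±t2> = |±t_v>|0>."""
    a1, b1 = ket(t1)
    a2, b2 = ket(t2)
    n0 = math.sqrt(a1**2 * a2**2 + b1**2 * b2**2)  # = cos(t_v/2)
    n1 = math.sqrt(a1**2 * b2**2 + b1**2 * a2**2)  # = sin(t_v/2)
    src = [[a1 * a2 / n0, 0, 0, b1 * b2 / n0],     # orthonormal rows in basis
           [0, a1 * b2 / n1, b1 * a2 / n1, 0],     # |00>, |01>, |10>, |11>
           [b1 * b2 / n0, 0, 0, -a1 * a2 / n0],
           [0, b1 * a2 / n1, -a1 * b2 / n1, 0]]
    dst = [0, 2, 1, 3]  # row k is sent to basis state |00>, |10>, |01>, |11>
    U = [[0.0] * 4 for _ in range(4)]
    for k in range(4):
        for j in range(4):
            U[dst[k]][j] = src[k][j]
    return U

def apply(U, v):
    return [sum(U[i][j] * v[j] for j in range(4)) for i in range(4)]

def cnot(state, control, target):
    """CNOT on a two-qubit vector; qubits are numbered 1 and 2."""
    out = [0.0] * 4
    for idx, amp in enumerate(state):
        bits = [idx >> 1, idx & 1]
        if bits[control - 1]:
            bits[target - 1] ^= 1
        out[2 * bits[0] + bits[1]] += amp
    return out

def check_node_branches(t1, t2, x):
    """Apply CNOT_{1->2}, then CNOT_{2->1}, to both equally likely branches u of
    (W1 ⊞ W2)(x); return, per value j of qubit 2, the unnormalized qubit-1 vectors."""
    out = {0: [], 1: []}
    for u in (0, 1):
        psi = kron(ket((-1) ** (x ^ u) * t1), ket((-1) ** u * t2))
        psi = cnot(cnot(psi, 1, 2), 2, 1)
        for j in (0, 1):
            out[j].append([psi[2 * q + j] for q in (0, 1)])
    return out

def check_node_prediction(t1, t2):
    """p_j and cos(theta^(j)) as given in the text."""
    c1, c2 = math.cos(t1), math.cos(t2)
    return {0: ((1 + c1 * c2) / 2, (c1 + c2) / (1 + c1 * c2)),
            1: ((1 - c1 * c2) / 2, (c2 - c1) / (1 - c1 * c2))}

t1, t2 = 0.9, 1.7
U = var_node_unitary(t1, t2)
```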
The quantum decoding algorithm now proceeds as in classical BP, taking the quantum outputs of the channels and alternately combining them at variable and check nodes. Ultimately this results in one qubit at the root node such that a $\sigma_x$ measurement corresponds to the optimal Helstrom measurement for the associated bit. This is sufficient to estimate one input bit. However, this is a sort of "destructive" measurement: once we estimate the first bit, we no longer have the original channel output with which to estimate the second bit. Nor can we run the algorithm backwards to reproduce the channel output, as we have made measurements at every check node. To implement the Helstrom measurement "nondestructively", we can leave the CQ output states unmeasured and instead use the classical subsystems to (coherently) control the variable node unitaries $U_\circledast$. In this way the steps in the algorithm can be reversed.
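Because every quantum message is a pair $|\pm\theta\rangle$ tagged by classical bits, the success probability of the tree decoder can be tracked with angles alone. The Python sketch below stores a message as a list of branches (probability, $\cos\theta$), applies the variable and check rules given above, and compares the result at the root with the Helstrom success probability computed by brute force (a small Jacobi eigenvalue routine) for bit $x_1$ of the [3,2] single parity-check code. The code choice, the angles, and all function names are illustrative assumptions.

```python
import math

def var_msg(m1, m2):
    """Variable node on branch lists [(prob, cos theta)]: overlaps multiply."""
    return [(p1 * p2, c1 * c2) for p1, c1 in m1 for p2, c2 in m2]

def chk_msg(m1, m2):
    """Check node: each pair of branches splits according to the measured bit j."""
    out = []
    for p1, c1 in m1:
        for p2, c2 in m2:
            q0, q1 = (1 + c1 * c2) / 2, (1 - c1 * c2) / 2
            if q0 > 0:
                out.append((p1 * p2 * q0, (c1 + c2) / (1 + c1 * c2)))
            if q1 > 0:
                out.append((p1 * p2 * q1, (c2 - c1) / (1 - c1 * c2)))
    return out

def root_success(msg):
    """Average success of the sigma_x measurement at the root, (1 + sin theta)/2 per branch."""
    return sum(p * (1 + math.sqrt(max(0.0, 1 - c * c))) / 2 for p, c in msg)

# --- brute-force Helstrom bound, for comparison only ---
def ket(t):
    return [math.cos(t / 2), math.sin(t / 2)]

def kron(*vecs):
    out = [1.0]
    for v in vecs:
        out = [a * b for a in out for b in v]
    return out

def sym_eigvals(a, sweeps=60):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        for p in range(n):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-14:
                    continue
                th = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = (1.0 if th >= 0 else -1.0) / (abs(th) + math.sqrt(th * th + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # rotate columns p, q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p, q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def spc_helstrom(t1, t2, t3):
    """Helstrom success for bit x1 of the [3,2] single parity-check code (x3 = x1+x2)."""
    diff = [[0.0] * 8 for _ in range(8)]  # rho_0 - rho_1
    for x in (0, 1):
        for u in (0, 1):
            v = kron(ket((-1) ** x * t1), ket((-1) ** u * t2), ket((-1) ** (x ^ u) * t3))
            for i in range(8):
                for j in range(8):
                    diff[i][j] += (0.5 if x == 0 else -0.5) * v[i] * v[j]
    return 0.5 + sum(abs(e) for e in sym_eigvals(diff)) / 4

t1, t2, t3 = 1.1, 0.7, 1.9
message = chk_msg([(1.0, math.cos(t2))], [(1.0, math.cos(t3))])   # check node over x2, x3
bp = root_success(var_msg([(1.0, math.cos(t1))], message))          # variable node at x1
```

On this tree the angle bookkeeping reproduces the Helstrom value, consistent with the claim that the root $\sigma_x$ measurement is optimal.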
Denoting the unitary action of the algorithm for the $j$th bit by $V_j$, the Helstrom measurement can be implemented by the projective measurement with projectors $\Pi_{j,k} = V_j^\dagger\, |\tilde k\rangle\langle\tilde k|_j\, V_j$, where $|\tilde k\rangle\langle\tilde k|_j$ denotes the $k$th $\sigma_x$ basis projector on the $j$th qubit. Each $V_j$ is composed of $O(n)$ gates, yielding an overall circuit size of $O(n^2)$ to decode all bits. Supposing that the code is designed such that the $j$th input bit can be estimated with error no larger than $\varepsilon_j$, the non-commutative union bound [26] implies that the error in sequentially estimating all bits is no worse than $2\sqrt{\sum_j \varepsilon_j}$ (see [25] for an explicit calculation).
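As a rough numerical illustration of this bound (a Python sketch; the block length and error levels are arbitrary), note the square root: to keep the total error below a target $\delta$, the per-bit errors must sum to at most $(\delta/2)^2$, i.e. roughly $\delta^2/(4n)$ each for uniform errors.

```python
import math

def sequential_error_bound(eps):
    """Bound quoted in the text: error of sequential decoding <= 2 * sqrt(sum_j eps_j)."""
    return 2 * math.sqrt(sum(eps))

def per_bit_error_budget(target, n):
    """Largest uniform eps_j such that 2 * sqrt(n * eps_j) <= target."""
    return (target / 2) ** 2 / n

bound = sequential_error_bound([1e-8] * 1024)   # n = 1024 bits, eps_j = 1e-8 each
```

Here the bound evaluates to 0.0064, whereas the plain sum of per-bit errors is only about $10^{-5}$.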

Applications to polar codes

Polar codes for the pure state channel
Polar codes for the pure state channel may also be decoded with this algorithm, though for polar codes the associated graph is related to a fixed reversible encoding circuit, and the choice of code amounts to where the bits of the message are to be input to the circuit (see [24,25] for more details). Importantly, the graph associated to each input of the encoding circuit is a tree. Indeed, each such graph has logarithmic depth from all channel factors to each variable, and every node has degree three. However, unlike the BP decoder, the successive cancellation decoder used by polar codes takes previously decoded bits into account. But these bits can be handled by the BP decoder, since the pure state channel is symmetric in the manner described above. Instead of using a permutation to transform $W(1)$ into $W(0)$, in this case we need only apply $\sigma_z$. Thus, the quantum BP decoding algorithm gives a successive cancellation decoder for polar codes over the pure-loss bosonic channel using the BPSK constellation [23].
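A small Python sketch of the two facts used here, under the convention (an assumption of this sketch) that input $x$ produces $|(-1)^x\theta\rangle$: applying $\sigma_z^{x_1}$ to the first qubit turns the output of the "better" synthesized channel into that of $W \circledast W$, and one polarization step moves the Helstrom success probabilities of the two synthesized channels apart. The function names and the test angle are illustrative.

```python
import math

def ket(t):
    return [math.cos(t / 2), math.sin(t / 2)]

def kron(a, b):
    return [x * y for x in a for y in b]

def z_on_first(state):
    """sigma_z on qubit 1 of a two-qubit vector indexed by 2*q1 + q2."""
    return [-amp if idx >= 2 else amp for idx, amp in enumerate(state)]

def plus_output(theta, x1, x2):
    """Output of the 'better' channel: qubit 1 carries x1+x2, qubit 2 carries x2."""
    return kron(ket((-1) ** (x1 ^ x2) * theta), ket((-1) ** x2 * theta))

def remove_known_bit(state, x1):
    """Undo the dependence on the previously decoded bit x1 by applying sigma_z^x1."""
    return z_on_first(state) if x1 else state

def success(c):
    """Helstrom success for a pure-state channel with overlap c."""
    return (1 + math.sqrt(max(0.0, 1 - c * c))) / 2

def polarized(c):
    """Average success of the worse (check) and better (variable) channels built
    from two copies of a pure-state channel with overlap c."""
    worse = ((1 + c * c) / 2) * success(2 * c / (1 + c * c)) + ((1 - c * c) / 2) * success(0.0)
    better = success(c * c)
    return worse, better

w, b = polarized(0.5)
```

For overlap 0.5 the single channel succeeds with probability about 0.933, while the synthesized channels give 0.875 and about 0.984.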

Quantum polar codes for amplitude damping
The idea behind the quantum polar coding scheme of [18,20] is to decompose the problem of transmitting quantum information over a channel $\mathcal{N}$ from $A$ to $B$ into transmitting classical information about two conjugate observables, "amplitude" and "phase", to consider polar codes for each subproblem, and then to combine the coding schemes using CSS codes at the encoder and coherent sequential decoding of amplitude and phase at the decoder. This decoding strategy is depicted in [18, Fig. 3] for Pauli channels and [27, Fig. 1] for the general case. As detailed in [20], the two classical transmission tasks are to transmit "amplitude" information over the CQ channel given by $z \mapsto \rho_z = \mathcal{N}(|z\rangle\langle z|)$ and "phase" information over the CQ channel given by $x \mapsto \sigma_x = (Z^x \otimes \mathbb{1})\,(\mathrm{id} \otimes \mathcal{N})[\Phi]\,(Z^x \otimes \mathbb{1})$. Here $\{|z\rangle\}$ is an arbitrary basis, and we choose that of $\sigma_z$ for convenience, while $|\Phi\rangle_{A'A} = \sum_z \sqrt{p_z}\,|z\rangle_{A'}|z\rangle_A$ is a bipartite pure state in this same basis with coefficients of our choosing. (See [20] for the precise relation to the conjugate observables $\sigma_x$ and $\sigma_z$.)

Let us now show how to build a decoder for the amplitude damping channel $\mathcal{N}_\gamma$ with damping parameter $\gamma \in [0,1]$. First note that the amplitude outputs all commute due to the form of $\mathcal{N}_\gamma$; the amplitude channel is effectively a classical Z channel in which the input 0 is always transmitted perfectly, but the input 1 may decay to 0 with probability $\gamma$. Therefore we can use the classical polar encoder and decoder for this channel. Since the Z channel is not symmetric, the optimal input distribution in the capacity formula is not the uniform distribution, but one with probabilities $p$ and $1-p$. Now suppose that the bipartite pure state in the phase channel is $|\Phi\rangle = \sqrt{p}\,|00\rangle + \sqrt{1-p}\,|11\rangle$.
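The asymmetry of the amplitude (Z) channel is easy to see numerically. This Python sketch takes $p$ to be the probability of input 0 (an assumed labeling) and maximizes the mutual information $H(Y) - H(Y|X)$ by grid search; the grid resolution and the value $\gamma = 1/2$ are arbitrary choices.

```python
import math

def h2(q):
    """Binary entropy in bits."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def z_channel_mi(p, gamma):
    """I(X;Y) for the Z channel with P(X=0) = p; input 1 decays to 0 w.p. gamma."""
    p_y1 = (1 - p) * (1 - gamma)           # only an undecayed 1 is received as 1
    return h2(p_y1) - (1 - p) * h2(gamma)  # H(Y) - H(Y|X)

def best_input(gamma, steps=10000):
    """Grid search over p in [0, 1] for the capacity-achieving input distribution."""
    grid = [i / steps for i in range(steps + 1)]
    p = max(grid, key=lambda q: z_channel_mi(q, gamma))
    return p, z_channel_mi(p, gamma)

p_opt, capacity = best_input(0.5)
```

For $\gamma = 1/2$ the optimum is at $p = 0.6$ with capacity $\log_2 1.25 \approx 0.322$ bits, while the uniform input gives only about $0.311$ bits.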
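Abusing notation slightly and denoting the channel outputs by $\sigma_\pm$, it is not difficult to verify that for $U = \mathrm{CNOT}_{1\to2}$ (control on $A'$, target on $B$),

$$U\sigma_\pm U^\dagger = \big(1-(1-p)\gamma\big)\,|{\pm\theta}\rangle\langle{\pm\theta}| \otimes |0\rangle\langle 0| + (1-p)\gamma\,|1\rangle\langle 1| \otimes |1\rangle\langle 1|, \qquad \cos\theta = \frac{p-(1-p)(1-\gamma)}{1-(1-p)\gamma}.$$

Each of these states is a CQ state with the second qubit classical. Given the second qubit, the first is either in the pure state $|\pm\theta\rangle$ corresponding to the channel input, or in the state $|1\rangle$ independently of the input. Hence, the output of the phase channel is a mixture of an erasure channel and a pure state channel. Since no information about the input can be gleaned from the erased output, the decoder only needs to work with the received pure state outputs, and thus the BP decoder can be used to decode the phase channel.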
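The decomposition can be checked with a few lines of linear algebra. The Python sketch below uses the standard Kraus operators of the amplitude damping channel, the state $|\Phi\rangle = \sqrt{p}\,|00\rangle + \sqrt{1-p}\,|11\rangle$, and a CNOT from $A'$ to $B$; the values $p = 0.6$ and $\gamma = 0.3$ are arbitrary test parameters, and the function names are illustrative. It builds $U\sigma_\pm U^\dagger$ from the Kraus branches and compares it with the erasure-plus-pure-state form.

```python
import math

def kron_mat(A, B):
    """Kronecker product of two 2x2 matrices (basis |00>, |01>, |10>, |11>)."""
    return [[A[i][k] * B[j][l] for k in range(2) for l in range(2)]
            for i in range(2) for j in range(2)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def amp_damp_kraus(gamma):
    """Kraus operators of amplitude damping: |1> decays to |0> with probability gamma."""
    return [[[1.0, 0.0], [0.0, math.sqrt(1 - gamma)]],
            [[0.0, math.sqrt(gamma)], [0.0, 0.0]]]

def cnot_12(v):
    """CNOT with control A' (qubit 1) and target B (qubit 2): swaps |10> and |11>."""
    return [v[0], v[1], v[3], v[2]]

def density(vecs):
    return [[sum(v[i] * v[j] for v in vecs) for j in range(4)] for i in range(4)]

def phase_output_after_cnot(x, p, gamma):
    """U sigma_x U^dagger assembled from the Kraus branches (Z^x ⊗ K)|Phi>."""
    phi = [math.sqrt(p), 0.0, 0.0, math.sqrt(1 - p)]  # sqrt(p)|00> + sqrt(1-p)|11>
    z = [[1.0, 0.0], [0.0, (-1.0) ** x]]
    return density([cnot_12(matvec(kron_mat(z, K), phi)) for K in amp_damp_kraus(gamma)])

def predicted(x, p, gamma):
    """(1-(1-p)gamma)|±theta><±theta| ⊗ |0><0| + (1-p)gamma |1><1| ⊗ |1><1|."""
    q0 = 1 - (1 - p) * gamma
    c, s = math.sqrt(p / q0), math.sqrt((1 - p) * (1 - gamma) / q0)  # cos, sin of theta/2
    pure = [c, 0.0, (-1) ** x * s, 0.0]                                # |±theta>|0>
    erased = [0.0, 0.0, 0.0, 1.0]                                      # |1>|1>
    return [[q0 * pure[i] * pure[j] + (1 - p) * gamma * erased[i] * erased[j]
             for j in range(4)] for i in range(4)]
```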

Conclusions and open questions
We have presented a belief propagation algorithm for bitwise decoding of CQ channels which operates by passing quantum messages on tree factor graphs, and shown several applications to polar codes. This invites the study of quantum message passing algorithms generally and raises many questions. Most immediately, it would be very interesting to understand what happens when the algorithm runs on a factor graph with loops, or how it can be modified to handle some set of non-pure output states. Perhaps in the latter case one can make use of the work on quantum sufficiency (see e.g. [29,30] and references therein).
Another interesting question is the relation of this algorithm to tensor network methods. The problem of marginalization in the commutative setting is explicitly treated as tensor network contraction in [14], and the particulars of the quantum BP decoder bear a similarity to the data gathering approach using tensor network states in [31]. Can the methods of approximating quantum states by tensor networks be used to create approximate decoders?

Figure 1: Factor graph for the joint probability distribution of the three-bit repetition code.