Tristochastic operations and products of quantum states

The notion of convolution of two probability vectors, corresponding to a coincidence experiment, can be extended to a family of binary operations determined by (tri)stochastic tensors, to describe Markov chains of a higher order. The problems of associativity, commutativity, and the existence of neutral elements and inverses for such operations acting on classical states are analyzed. For a more general setup of multi-stochastic tensors, we present the characterization of their probability eigenvectors. Similar results are obtained for the quantum case: we analyze tristochastic channels, which induce binary operations defined in the space of quantum states. Studying coherifications of tristochastic tensors, we propose a quantum analogue of the convolution of probability vectors defined for two arbitrary density matrices of the same size. Possible applications of this notion to construct schemes of error mitigation or building blocks in quantum convolutional neural networks are discussed.


Introduction
Stochastic maps acting on the space of probability vectors are the main tools to describe discrete dynamics in a set of classical probabilistic states. A special and most widely studied case of such evolution is called a Markov chain and (in the finite scenario) can be characterized by a stochastic matrix $M$ satisfying $M_{ij} \geq 0$ and $\sum_i M_{ij} = 1$ for any value of $j$.
An important subcase is formed by bistochastic matrices, also called doubly stochastic, for which the sum of entries in each row and each column is equal to one. For this class, many interesting results have been obtained, most notably the Birkhoff-von Neumann theorem [1]. Bistochastic matrices, which enjoy interesting spectral properties [2], have been applied to various mathematical problems [3,4].
A further generalization of these concepts leads to the idea of cubic stochastic tensors, corresponding to bilinear (or multilinear) stochastic maps in the space of probability vectors. Such a construction emerges naturally in higher-order Markov chains [5]. Any such bilinear map can also be viewed as a product of two probability vectors, written $\vec p \star_A \vec q$,
$$(\vec p \star_A \vec q)_i = \sum_{j,k} A_{ijk}\, p_j q_k, \tag{1.1}$$
induced by a stochastic cubic tensor $A$ satisfying $A_{ijk} \geq 0$ and $\sum_i A_{ijk} = 1$. This class of classical channels is very general, hence many important problems for stochastic tensors are unsolvable, or solvable only numerically. A noteworthy example is the problem of finding the generalized eigenvectors of $A$, which has resulted in numerous numerical algorithms [5][6][7][8][9].
We may, however, introduce additional requirements of bistochasticity or tristochasticity of a tensor $A$, requiring the sum of its elements over any of the three possible indices to be equal to one. A more specific class consists of permutation tensors, which are tristochastic tensors with entries being either 0 or 1. The simplest example of the bilinear operation (1.1) with $A$ being a permutation tensor is the standard convolution of two probability vectors.
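The bilinear product and its relation to the standard convolution can be illustrated numerically. The following is a minimal sketch, assuming zero-based indices and the cyclic convention $A_{ijk} = 1$ iff $i = (j + k) \bmod N$; the helper names `convolution_tensor` and `star` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def convolution_tensor(n):
    """Permutation tensor with A[i, j, k] = 1 iff i == (j + k) mod n (assumed convention)."""
    a = np.zeros((n, n, n))
    for j in range(n):
        for k in range(n):
            a[(j + k) % n, j, k] = 1.0
    return a

def star(a, p, q):
    """Bilinear product (p *_A q)_i = sum_{j,k} A[i, j, k] p[j] q[k]."""
    return np.einsum("ijk,j,k->i", a, p, q)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.6, 0.3])
r = star(convolution_tensor(3), p, q)

# Reference value: circular convolution computed through the discrete Fourier transform.
r_fft = np.real(np.fft.ifft(np.fft.fft(p) * np.fft.fft(q)))
```

With this choice of tensor the product coincides with the circular convolution of `p` and `q`, and the output is again a probability vector.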
Operations induced by these tensors with a maximal degree of stochasticity will be examined more closely in this work. Naturally arising questions concern the structure of the set of tristochastic tensors and their generalized eigenvectors.
The main goal of the paper is to analyze quantum analogues of tristochastic tensors and to introduce a convolution of quantum states. As it turns out, such an analogy will allow us to translate the results obtained for probability vectors into the framework of density matrices.
As noted by Lomont [10], there is no proper "operation of convolution" which for two arbitrary pure states produces a pure state as an outcome. However, we provide here a construction of quantum convolution of arbitrary density matrices of the same size, which results in a legitimate density matrix. Although some constructions of quantum convolution of density matrices [11] and quantum channels [12] have already been proposed, they usually correspond to a small subset of operations, compared to those that might find useful applications. Hence in this work we propose, as an analogue of the classical map (1.1), an original definition of convolution for density matrices as a quantum channel, $\Omega_N \otimes \Omega_N \to \Omega_N$, where $\Omega_N$ denotes the set of density matrices of size $N$. Such a definition gives a direct analogy between tristochastic tensors and the proposed quantum tristochastic channels, which allows us to translate several properties of the classical convolution into the quantum domain, such as generalized eigenvectors or neutral elements of the convolution.
Following this analogy, the most natural idea for applying quantum convolution would be to interpret it as a quantum analogue of a coincidence experiment, by the same token as the standard convolution of two probability vectors.
To provide further motivation for this work, consider an important application of convolution of classical probability vectors: convolutional neural networks [13]. A quantum version of convolutional neural networks has recently been proposed and proven to be useful in classification tasks, such as determining the phase in spin chain phase transitions, see [14][15][16][17] and references therein. From this perspective, our work can be understood as an attempt to study the building blocks of quantum convolutional neural networks. We would like to note here that the structure of the proposed convolution corresponds to the joint action of convolution and pooling in these networks.
Another idea [14] would be to use an "inverse" of quantum convolution to encode some quantum state from $H$ into a larger Hilbert space to make it more robust against correlated noise, and then to decode the information to obtain the original state.
The structure of this paper is as follows. In Section 2 we analyze the notion of classical tristochastic tensors and classical convolutions. We characterize and describe the generalized eigenvectors [5] of a multi-stochastic tensor, the neutral elements of the convolution and its possible inverses. Analogous results for multi-stochastic quantum channels are established in Section 3, and a broader class of quantum convolutions based on coherifications of tristochastic tensors is discussed therein. In Section 4 an explicit construction of such a coherification for permutation tensors is presented. Finally, in Section 5 we discuss the construction of quantum convolution using quantum circuits and investigate the properties of quantum convolution of two arbitrary states of a single-qubit system.
Appendix A contains a note on quantum channels of the form $H \otimes H \to H$, whereas Appendix B includes some calculations and technical proofs which, for the sake of clarity, are not provided in the main body of the paper. Appendix C serves as a short recipe explaining how to construct a quantum convolution.

Convolutions of classical probability vectors
Consider a set $\Delta_N$ of $N$-dimensional probability vectors,
$$\Delta_N = \Big\{ \vec p \in \mathbb{R}^N : p_i \geq 0,\ \sum_{i=1}^N p_i = 1 \Big\},$$
which characterize any discrete random variable. The set of such vectors forms an $(N-1)$-dimensional simplex $\Delta_N \subset \mathbb{R}^N$, whose vertices are the standard unit vectors in $\mathbb{R}^N$. Linear transformations of probability vectors are determined by a stochastic matrix $M$ with nonnegative entries and a fixed sum of entries in each column,
$$M_{ij} \geq 0, \qquad \sum_{i=1}^N M_{ij} = 1. \tag{2.3}$$
These matrices of order $N$ describe discrete dynamics in the set $\Delta_N$ of classical probabilistic states that depend only on the current state of the system. Such processes are known as Markov chains.
In the space of stochastic matrices satisfying (2.3), one distinguishes the subset of bistochastic matrices, which additionally fulfil the dual condition for the sum in each row,
$$\sum_{j=1}^N M_{ij} = 1. \tag{2.4}$$
Bistochastic matrices have several interesting mathematical properties [1,2], and they are often applied to problems in physics and the theory of communication [3].
A natural next step is to generalize these concepts and introduce a class of bilinear operations on probability vectors, determined by a cubic stochastic tensor $A$, which describes the evolution of a second order Markov chain [5]. Moreover, any such operation describes a certain generalized coincidence experiment. In particular, this class includes the convolution and correlation of probability vectors. The counterpart to bistochasticity for cubic tensors can be defined as follows.
Definition 2.1. A rank-three tensor $A$ is called tristochastic if all of its entries are nonnegative and satisfy the following three conditions:
$$\sum_{i} A_{ijk} = \sum_{j} A_{ijk} = \sum_{k} A_{ijk} = 1. \tag{2.6}$$
A special subclass of tristochastic tensors is formed by permutation tensors.
Definition 2.2. A permutation tensor $A$ is a tristochastic tensor with all entries equal to either 0 or 1.
An example of a tristochastic permutation tensor for N = 3 is provided below.
where P n denotes the cyclic matrix of order N. It is easy to see that such a tensor consists of $N^2$ entries equal to 1 and $N^2(N-1)$ entries equal to 0. Tristochastic tensors lead to a generalization of the convolution of discrete probability vectors. It is worth investigating which properties of bistochastic matrices generalize to higher-rank tensors with a maximal degree of stochasticity. To make our arguments general, we shall use the following notion for tensors of rank $m$.
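The tristochasticity conditions and the count of nonzero entries can be checked directly. The sketch below is illustrative only: it assumes that a permutation tensor is specified by a Latin square `square[j][k] = i` (so that $A_{ijk} = 1$), with zero-based indices, and the helper names are hypothetical.

```python
import numpy as np

def is_tristochastic(a, tol=1e-12):
    """Check nonnegativity and unit sums over each of the three indices."""
    a = np.asarray(a, dtype=float)
    if a.ndim != 3 or np.any(a < -tol):
        return False
    return all(np.allclose(a.sum(axis=ax), 1.0, atol=tol) for ax in range(3))

def permutation_tensor_from_latin_square(square):
    """Build A with A[i, j, k] = 1 iff square[j][k] == i (assumed encoding)."""
    square = np.asarray(square)
    n = square.shape[0]
    a = np.zeros((n, n, n))
    for j in range(n):
        for k in range(n):
            a[square[j, k], j, k] = 1.0
    return a

n = 3
cyclic = [[(j + k) % n for k in range(n)] for j in range(n)]
t = permutation_tensor_from_latin_square(cyclic)

ones = int(t.sum())      # number of entries equal to 1
zeros = t.size - ones    # number of entries equal to 0

# A tensor that is stochastic in the first index only, for comparison.
only_stochastic = np.zeros((n, n, n))
only_stochastic[0, :, :] = 1.0
```

For $N = 3$ the counts agree with $N^2$ ones and $N^2(N-1)$ zeros, while the comparison tensor fails the test because its sums over the second and third index differ from one.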

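Definition 2.3. A tensor $A_{i_1 \cdots i_m}$ of rank $m$ with nonnegative entries is called m-stochastic if the sum of its entries over any single index is equal to one.
To avoid confusing the operation defined by an m-stochastic tensor with multiple actions of a tristochastic channel, we denote the former by $A[\vec p_2, \cdots, \vec p_m]$.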

Associativity and commutativity of stochastic product
For a generic stochastic tensor, the induced product will be neither commutative nor associative. For certain classes of stochastic tensors, however, the product may acquire these important properties. Below we present the necessary and sufficient conditions for these properties.
Theorem 2.4. Let $A$ be an $N$-dimensional cubic tensor satisfying conditions (2.6), defining a stochastic map $(\vec p \star_A \vec q)_i = \sum_{j,k} A_{ijk} p_j q_k$, where $\vec p$, $\vec q$ are $N$-dimensional probability vectors. The operation $\star_A$ is commutative if and only if the tensor $A_{ijk}$ is symmetric with respect to the exchange of the last two indices: $A_{ijk} = A_{ikj}$.
Proof. The condition of commutativity of the product $\star_A$ is equivalent to the following,
$$\sum_{j,k=1}^N A_{ijk}\, p_j q_k = \sum_{j,k=1}^N A_{ijk}\, q_j p_k = \sum_{j,k=1}^N A_{ikj}\, p_j q_k, \tag{2.8}$$
where the last step relabels the summation indices. As the condition needs to be satisfied for any input vectors, the equality holds if and only if $A_{ijk} = A_{ikj}$. By the same token one may prove the associativity.
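The commutativity criterion can be illustrated on two permutation tensors. This is a minimal sketch under assumed conventions (zero-based indices; the "sum" tensor $A_{ijk} = \delta_{i, j+k}$ and the "difference" tensor $A_{ijk} = \delta_{i, j-k}$, both modulo $N$); the names below are hypothetical.

```python
import numpy as np

def star(a, p, q):
    """Bilinear product (p *_A q)_i = sum_{j,k} A[i, j, k] p[j] q[k]."""
    return np.einsum("ijk,j,k->i", a, p, q)

def is_symmetric_in_last_two(a):
    """Criterion of Theorem 2.4: A[i, j, k] == A[i, k, j]."""
    return np.allclose(a, a.transpose(0, 2, 1))

n = 3
sum_tensor = np.zeros((n, n, n))    # i = (j + k) mod n, symmetric in j, k
diff_tensor = np.zeros((n, n, n))   # i = (j - k) mod n, not symmetric
for j in range(n):
    for k in range(n):
        sum_tensor[(j + k) % n, j, k] = 1.0
        diff_tensor[(j - k) % n, j, k] = 1.0

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])

commutes_sum = np.allclose(star(sum_tensor, p, q), star(sum_tensor, q, p))
commutes_diff = np.allclose(star(diff_tensor, p, q), star(diff_tensor, q, p))
```

Both tensors are tristochastic, but only the first is symmetric in its last two indices, and only for it do the two orderings of the product agree on the chosen inputs.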
The simplest example of a tristochastic tensor inducing an associative convolution reads $A_{ijk} = r_{i-j-k}$, where $\vec r$ is a probability vector. A more involved example is given by a "classical analogue" of the convolution proposed in [11], where $\Sigma(N)$ is the set of all permutations of elements from $\{1, \cdots, N\}$ and $P_\sigma$ is a matrix representing a permutation $\sigma$. The associativity is straightforward to prove by direct calculation. To establish the tristochasticity of (2.9) it is sufficient to notice that conditions (2.6) for tristochasticity are equivalent to the following statement: For any probability vector $\vec p$ one has see Lemma 2.5.
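The associativity of the product induced by $A_{ijk} = r_{i-j-k}$ can be verified numerically. The sketch below assumes that the index difference is taken modulo $N$ with zero-based indices, which is an interpretation not spelled out in the text; the helper names are hypothetical.

```python
import numpy as np

def tensor_from_vector(r):
    """A[i, j, k] = r[(i - j - k) mod n], indices assumed to be cyclic."""
    n = len(r)
    a = np.empty((n, n, n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                a[i, j, k] = r[(i - j - k) % n]
    return a

def star(a, p, q):
    """Bilinear product (p *_A q)_i = sum_{j,k} A[i, j, k] p[j] q[k]."""
    return np.einsum("ijk,j,k->i", a, p, q)

rng = np.random.default_rng(0)
n = 3
r = np.array([0.5, 0.2, 0.3])
a = tensor_from_vector(r)

# Three random probability vectors.
p, q, s = (rng.dirichlet(np.ones(n)) for _ in range(3))

left = star(a, star(a, p, q), s)    # (p * q) * s
right = star(a, p, star(a, q, s))   # p * (q * s)
```

Under this convention the product is a cyclic convolution with the additional factor $\vec r$, so both bracketings give the same vector.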

Fixed point of convolution and eigenvectors of multi-stochastic tensors
In this subsection, we resolve the issue of generalized eigenvectors for multi-stochastic tensors. Firstly, we focus on the map $\vec p \to A[\vec p, \vec p, \cdots]$, since all of its fixed points are by definition generalized eigenvectors of $A$. Then, using the reducibility of multi-stochastic tensors, we present an explicit description of all probability eigenvectors of $A$ by its tristochastic subtensors.
Let us start by generalizing the invariance property of bistochastic matrices.
Lemma 2.5. Let $A$ be an arbitrary m-stochastic tensor. Then for any sequence of probability vectors $\{\vec p_2, \cdots, \vec p_m\}$, one of which is $\vec e$, the following equality holds:
$$A[\vec p_2, \cdots, \vec p_m] = \vec e. \tag{2.12}$$
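The invariance property can be checked numerically in the tristochastic case $m = 3$. The following sketch assumes that $\vec e$ denotes the uniform vector $(1/N, \cdots, 1/N)$ and builds a tristochastic tensor as a random convex mixture of permutation tensors; the construction and the names used are illustrative assumptions.

```python
import numpy as np

def latin_tensor(n, shift, sign):
    """Permutation tensor with A[i, j, k] = 1 iff i == (j + sign * k + shift) mod n."""
    a = np.zeros((n, n, n))
    for j in range(n):
        for k in range(n):
            a[(j + sign * k + shift) % n, j, k] = 1.0
    return a

rng = np.random.default_rng(0)
n = 4

# Convex combination of permutation tensors is tristochastic.
components = [latin_tensor(n, s, sign) for s in range(n) for sign in (1, -1)]
weights = rng.dirichlet(np.ones(len(components)))
a = sum(w * c for w, c in zip(weights, components))

e = np.full(n, 1.0 / n)           # uniform (maximally mixed) vector
p = rng.dirichlet(np.ones(n))     # arbitrary probability vector

e_second = np.einsum("ijk,j,k->i", a, p, e)   # A[p, e]
e_first = np.einsum("ijk,j,k->i", a, e, p)    # A[e, p]
fixed = np.einsum("ijk,j,k->i", a, e, e)      # A[e, e]
```

Whichever slot the uniform vector occupies, the output is uniform again; in particular $\vec e$ is a fixed point of the map $\vec q \to A[\vec q, \vec q]$.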
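Proof. Without loss of generality let $\vec e$ be the $k$-th vector in the sequence $\{\vec p_2, \cdots, \vec p_m\}$. Since all entries of $\vec e$ are equal to $1/N$, the left hand side of eq. (2.12) can be written as
$$A[\vec p_2, \cdots, \vec e, \cdots, \vec p_m]_{i_1} = \frac{1}{N} \sum_{\{i_l\}_{l \neq k}} \Big( \sum_{i_k} A_{i_1 i_2 \cdots i_m} \Big) \prod_{l \neq k} (\vec p_l)_{i_l} = \frac{1}{N} \prod_{l \neq k} \sum_{i_l} (\vec p_l)_{i_l} = \frac{1}{N},$$
where the sum over $i_k$ equals one by m-stochasticity and each remaining sum equals one by normalization.
Using this fact, we describe the fixed point of the map $\vec q \to A[\vec q, \cdots, \vec q]$, for any $\vec q$ in the interior of the probability simplex.
Theorem 2.6. Let $\vec q^{(0)}$ be any point in the interior of the probability simplex $\Delta_N$ and $\vec p^{(0)}$ be a point at its boundary $\partial\Delta_N$, such that $\vec q^{(0)} = \alpha \vec e + (1 - \alpha) \vec p^{(0)}$ for some $\alpha \in (0, 1]$. For any m-stochastic map $A$ let us denote the sequences $\{\vec q^{(n)}\}$ and {⃗ p
The lengthy proof of this theorem is relegated to Appendix B.1. For further work we need one more notion.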
(iii) The truncation of the tensor is also an m-stochastic tensor. Moreover, for any m-stochastic tensor
The proof of this lemma is provided in Appendix B.1. We are now ready to present the main theorem of this section.
Theorem 2.9. Let $A$ be an m-stochastic tensor and $m \geq 3$. Then each subset of indexes values corresponds to exactly one eigenvector of $A_{i_1 \cdots i_m}$ inside the probability simplex $\Delta_N$ and vice versa. Moreover, such an eigenvector has the form
$$p_i = \begin{cases} 0 & \text{if } i \in I, \\ \frac{1}{N-k} & \text{if } i \notin I, \end{cases} \tag{2.16}$$
where $k = \#I$. Each such generalized eigenvector corresponds to the eigenvalue 1.
Proof. Firstly we prove that each eigenvector $\vec p$ corresponds to only one set of indexes $I$ described in the Theorem, and then show that each set of indexes $I$ corresponds to a single eigenvector $\vec p$. Along the way, we prove the second statement of the Theorem. Let $\vec p$ be an eigenvector of $A$ corresponding to the eigenvalue $\lambda$. If all $p_i$ are greater than zero, then by Theorem 2.6 one has $\vec p = \vec e$ and $\lambda = 1$, since $\vec p$ belongs to the interior of the probability simplex. Hence the theorem is satisfied with $I = \emptyset$. Otherwise, there exists a nonempty subset $I \subset \{1, \cdots, N\}$ such that $p_i = 0$ for all $i \in I$. Then for each $i \in I$ one has: ∈ I, which proves the claim in one direction.
Consider a subspace defined by indices not belonging to the set $I$. Then by Lemma 2.8, the tensor $A_{i_1 \cdots i_m}$ can be truncated to an m-stochastic tensor, and the truncated $\vec p$ is its eigenvector with all positive entries. Applying Theorem 2.6 to the tensor (2.18), the vector $\vec p$ must be maximally mixed on the discussed subspace. Hence it has the structure (2.16) and forms a generalized eigenvector of $A$ with an eigenvalue equal to unity.
To prove the implication in the opposite direction we use Lemma 2.8.Because a tensor

Identity and inverses
Let us review the convolution of two probability vectors as a binary operation and focus on the identity and inverse of convolution.
Definition 2.10.Let A be a tristochastic tensor and ⃗ I A a probability vector such that for any probability vector ⃗ p one has Then ⃗ I A is called an identity of the product ⋆ A generated by the tensor A.
Theorem 2.11.Let A be a tristochastic tensor and ⃗ I A an identity of ⋆ A then: . it has only one nonzero value on k-th place, (iii) For any i, j ∈ {1, • • • , N }, and k as in the previous point , if all of the above points are true for some vector ⃗ q, then ⃗ q is an identity of A.
Proof.By the definition of identity ⃗ I A ⋆ A ⃗ I A = ⃗ I A , which implies the statement (i).
To prove the point (ii) let $I$ be a set of indices such that $(\vec I_A)_i = 0$. Since $\vec I_A$ is a maximally mixed state on the subspace corresponding to $\{1, \cdots, N\} \setminus I$, then for any probability vector $\vec q$ with nonzero values only for indices from

By the definition of identity ⃗ I
The point (iii) follows from the calculations performed below for any probability vector $\vec p$,
To prove the last statement (iv) one calculates, where the second equality follows from $\vec q$ satisfying point (ii), and the third equality follows from $\vec q$ satisfying point (iii).
Theorem 2.12. Let $A$ be a tristochastic tensor and $\vec I_A$ be an identity of $\star_A$. Then for a probability vector $\vec p$ there exists an inverse probability vector $\vec q$ if and only if $\vec p$ is of the form
Proof. Assume there exists $\vec q$ which is an inverse of $\vec p$. Since the identity is of the form
Because each term in the above sum is nonnegative, it implies that if $p_j > 0$ and $q_l > 0$, then $A_{ijl} = 0$ for any $i \neq k$. On the other hand $\sum_i A_{ijl} = 1$, hence for $j$ and $l$ such that $p_j > 0$ and $q_l > 0$, one has $A_{kjl} = 1$. Let us pick any $l$ such that $q_l > 0$, then:
Hence $\vec p$ can have at most one nonzero element, $p_i = \delta_{i,m}$. Analogously, we can show that $q_l = \delta_{l,n}$. Finally, $\sum_{j,l} A_{kjl}\, p_j q_l = A_{kmn}$ and $1 = (\vec I_A)_k = \sum_{j,l} A_{klj}\, q_l p_j = A_{knm}$.
The implication in the opposite direction is trivial.
From Theorem 2.11 one immediately sees that for a generic tristochastic tensor the identity of convolution does not exist. However, in each dimension, it is quite easy to construct tristochastic tensors for which the convolution possesses an identity, for example , where $P_N$ is a permutation matrix corresponding to a cyclic permutation $\sigma(i) = i + 1 \bmod N$. In dimension 3 the permutation tensor of interest is provided in (2.7). Moreover, Theorem 2.11 implies another property of the identity of convolution.
Corollary 2.13.For each tristochastic tensor A the convolution ⋆ A possesses at most a single identity.
Proof. To prove this statement by contradiction, assume that the product $\star_A$ possesses two identities. Then by Theorem 2.11, item (iii), for each pair of indices $i, j$ and some $k_1, k_2$ we have
Hence evaluating the expression $\sum_k A_{iik}$, for any value of $i$ we get:
which contradicts the tristochasticity of $A$.
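The statements about identities and inverses can be illustrated for the cyclic convolution. This is a sketch under assumed conventions (zero-based indices, $A_{ijk} = 1$ iff $i = (j + k) \bmod N$), for which the identity is the vertex supported on index 0 and the inverse of the vertex $m$ is the vertex $-m \bmod N$; the helper names are hypothetical.

```python
import numpy as np

def convolution_tensor(n):
    """Permutation tensor with A[i, j, k] = 1 iff i == (j + k) mod n."""
    a = np.zeros((n, n, n))
    for j in range(n):
        for k in range(n):
            a[(j + k) % n, j, k] = 1.0
    return a

def star(a, p, q):
    """Bilinear product (p *_A q)_i = sum_{j,k} A[i, j, k] p[j] q[k]."""
    return np.einsum("ijk,j,k->i", a, p, q)

def basis_vector(n, m):
    """Vertex of the simplex with a single nonzero entry at position m."""
    v = np.zeros(n)
    v[m] = 1.0
    return v

n = 5
a = convolution_tensor(n)
identity = basis_vector(n, 0)

p = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
m = 2
vertex = basis_vector(n, m)
inverse = basis_vector(n, (-m) % n)

product = star(a, vertex, inverse)
```

The identity has a single nonzero entry, and the only vectors shown here to be invertible are vertices of the simplex, in line with Theorems 2.11 and 2.12.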

Quantum multi-stochastic operations
Now we change our focus to a quantum system. Our main constituent is the set $\Omega_N$ of density matrices of size $N$:
$$\Omega_N = \{ \rho : \rho = \rho^\dagger,\ \rho \geq 0,\ \mathrm{Tr}\,\rho = 1 \}.$$
The transformations of interest between density matrices are completely positive trace preserving maps, $\Psi : \Omega_N \to \Omega_N$. The Choi-Jamiołkowski isomorphism [18,19] implies that any quantum channel $\Psi$ can be represented by a Jamiołkowski state $\rho_\Psi$ on an extended system as
The rescaled state $D = N \rho_\Psi$ is called a dynamical matrix of $\Psi$ or a Choi matrix and allows one to express the action of a channel $\Psi$ as
$$\Psi(\rho) = \mathrm{Tr}_2 \big[ D\, (\mathbb{I} \otimes \rho^T) \big],$$
where $D \geq 0$ and $\mathrm{Tr}_2$ is a partial trace over the second subsystem. Moreover, to assure the trace preserving condition the dynamical matrix $D$ satisfies
$$\mathrm{Tr}_1 D = \mathbb{I},$$
where a tri-partite dynamical matrix $D_{123}$ is an arbitrary positive semidefinite matrix of order $N^3$, which satisfies a single partial trace condition, $\mathrm{Tr}_1 D_{123} = \mathbb{I}_{N^2}$.
For our work, it is essential to specify a class of convolutions determined by a tristochastic matrix $D_{123}$, which satisfies three conditions analogous to (2.6),
$$\mathrm{Tr}_1 D_{123} = \mathrm{Tr}_2 D_{123} = \mathrm{Tr}_3 D_{123} = \mathbb{I}_{N^2}.$$
Moreover, one can check that these conditions are equivalent to the statement that channels defined via are also well defined trace preserving quantum channels. For brevity the indices 1, 2, 3, emphasizing that $D_{123}$ is a tripartite state, will be suppressed in the remaining part of the work, so we shall write $D_{123} = D$.
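The dynamical matrix formalism can be sketched numerically for an ordinary single-input channel. The example below assumes the convention $D = \sum_{ij} \Psi(|i\rangle\langle j|) \otimes |i\rangle\langle j|$ with $\Psi(\rho) = \mathrm{Tr}_2[D(\mathbb{I} \otimes \rho^T)]$ and uses the amplitude damping channel purely as an illustration; the function names are hypothetical.

```python
import numpy as np

def choi_from_kraus(kraus):
    """Dynamical matrix D = sum_ij Psi(|i><j|) (x) |i><j| (assumed convention)."""
    n_out, n_in = kraus[0].shape
    d = np.zeros((n_out * n_in, n_out * n_in), dtype=complex)
    for i in range(n_in):
        for j in range(n_in):
            e_ij = np.zeros((n_in, n_in))
            e_ij[i, j] = 1.0
            out = sum(k @ e_ij @ k.conj().T for k in kraus)
            d += np.kron(out, e_ij)
    return d

def apply_choi(d, rho, n_out):
    """Psi(rho) = Tr_2[ D (1 (x) rho^T) ], tracing out the second subsystem."""
    n_in = rho.shape[0]
    m = d @ np.kron(np.eye(n_out), rho.T)
    return np.einsum("ajbj->ab", m.reshape(n_out, n_in, n_out, n_in))

gamma = 0.3  # damping strength, chosen arbitrarily for the illustration
kraus = [
    np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - gamma)]]),
    np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]]),
]
d = choi_from_kraus(kraus)

rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])
via_choi = apply_choi(d, rho, 2)
via_kraus = sum(k @ rho @ k.conj().T for k in kraus)

# Trace preservation corresponds to Tr_1 D = identity in this convention.
tr1 = np.einsum("ajal->jl", d.reshape(2, 2, 2, 2))
```

Both ways of applying the channel agree, the matrix $D$ is positive semidefinite, and its partial trace over the first subsystem is the identity.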
In analogy to the classical case, let us also define a general m-stochastic channel, which we use to make our proofs more general: is called an m-stochastic channel, if for any sequence of density matrices also forms a valid quantum channel.
The question of whether a quantum convolution is commutative or associative depends on the choice of the dynamical matrix $D$ and is analogous to the classical problem. It is worth mentioning that the convolution proposed by Aniello [11], which for the Lie group $SU(N)$ has a form where $\rho$, $\sigma$ and $\vartheta$ are density matrices, is a tristochastic associative convolution. The proof of its tristochasticity is analogous to the proof in the classical case (2.11).
The last important definition needed in the quantum framework is reducibility.
With those definitions, we are prepared to present the first main result of this Section, which is a direct analogue of Theorem 2.9.
Theorem 3.4. Let $\Phi_D$ be an m-stochastic channel acting on the $N$-dimensional states and $m \geq 3$. Then each subspace $V \subset H$ such that corresponds to exactly one density matrix being a generalized eigenvector of $\Phi_D$ and vice versa. Moreover, each such eigenvector is the maximally mixed state on the subspace $V^\perp$ and the corresponding eigenvalue is 1.
The proof of this theorem, analogous to the proof of Theorem 2.9, is presented in Appendix B.2 together with all intermediate steps.
One can also form the quantum analogues of Theorems 2.11 and 2.12 established for the classical convolution.
if and only if ρ is a pure state spanning one dimensional subspace V ρ and there exist a one dimensional subspace V σ such that where P I is a projection onto a subspace spanned by an identity of Φ D , P ρ is a projection on a subspace V ρ and P σ is a projection on a subspace V σ .The inverse of ρ, σ is a state from V σ .
Theorems 3.6 and 3.7 can be proven in analogy to the proofs of their classical counterparts. One just needs to rephrase all steps in terms of density matrices instead of probability vectors.

Coherification of m-stochastic tensors
Given two diagonal density matrices ρ and σ one can calculate their convolution in two ways: as a quantum convolution or as a classical convolution of their diagonal elements: The results of these two operations agree for any diagonal ρ and σ if and only if By this token, we may define a coherification [20] of an m-stochastic tensor.
such that the diagonal of its dynamical matrix $D$ agrees with the elements of $A$,
Note that we do not demand here the m-stochasticity of the channel $\Phi_D$, relaxing this property from now on. This is because the m-stochasticity of quantum channels imposes rather strong constraints. For instance, in the case $N = 2$, the only tristochastic coherification of the permutation tensor is trivial in the sense that the dynamical matrix of the quantum channel $\Phi_D$ remains diagonal:
Therefore such a "classical" channel cannot give any "quantum advantage" over the standard classical convolution. Thus from now on we focus on coherifications of a classical convolution, represented by a non-diagonal dynamical matrix $D$, without demanding its tristochasticity on the quantum level. An explicit formula for a convolution obtained by a coherification of the matrix (3.14) is provided in Appendix C.
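The trivial (diagonal) coherification can be sketched numerically. The code below assumes the binary operation $\Phi_D(\rho, \sigma) = \mathrm{Tr}_{23}[D(\mathbb{I} \otimes \rho^T \otimes \sigma^T)]$ with the output on the first subsystem, and a diagonal $D$ whose entries are $A_{ijk}$; these conventions and the function names are assumptions made for the illustration.

```python
import numpy as np

def convolution_tensor(n):
    """Permutation tensor with A[i, j, k] = 1 iff i == (j + k) mod n."""
    a = np.zeros((n, n, n))
    for j in range(n):
        for k in range(n):
            a[(j + k) % n, j, k] = 1.0
    return a

def diagonal_dynamical_matrix(a):
    """Diagonal D of order N^3 with D[(ijk), (ijk)] = A[i, j, k]."""
    return np.diag(a.reshape(-1))

def quantum_product(d, rho, sigma):
    """Tr_23[ D (1 (x) rho^T (x) sigma^T) ], output on the first subsystem."""
    n = rho.shape[0]
    m = d @ np.kron(np.eye(n), np.kron(rho.T, sigma.T))
    return np.einsum("axbx->ab", m.reshape(n, n * n, n, n * n))

n = 3
a = convolution_tensor(n)
d = diagonal_dynamical_matrix(a)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.6, 0.3])
classical = np.einsum("ijk,j,k->i", a, p, q)
quantum_diag_inputs = quantum_product(d, np.diag(p), np.diag(q))

# A coherent input: the pure state (|0> + |1> + |2>)/sqrt(3).
plus = np.full((n, n), 1.0 / n)
quantum_coherent_input = quantum_product(d, plus, np.diag(q))
```

For diagonal inputs the output reproduces the classical convolution on the diagonal, and with a diagonal $D$ the output stays diagonal even for a coherent input, which is the sense in which such a channel is "classical".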
Even though a coherification $D$ of an m-stochastic tensor $A$ is in general not an m-stochastic channel, one may extract some information about generalized eigenvectors $\rho$ of $\Phi_D$ using only the m-stochasticity of $A$, at least in the case of permutation tensors.
Lemma 3.9. For any coherification $\Phi_D$ of an m-stochastic permutation tensor $T$, for each input states
Proof. To prove this statement it is sufficient to show that off diagonal terms of which is equivalent to: where $D$ is the dynamical matrix of the channel $\Phi_D$ written in the tensor form. Using the positivity of $D$, calculating the determinant of a minor, one obtains an inequality
Moreover, we must also invoke the trace preserving property of $D$, which implies that for any
In order to have a nonzero value of certain which by (3.15) implies that all 4 elements of permutation tensor ln must be nonzero. This condition cannot be satisfied because a permutation tensor cannot have both entries
Theorem 3.10. Let $T$ be an m-stochastic permutation tensor of dimension $N$. Then for any coherification $\Phi_D$, the fixed points (generalized eigenvectors to eigenvalue 1) of $\Phi_D$ inside $\Omega_N$ have a form where $I$ is a set of indices with respect to which the permutation tensor $T$ is reducible and $k = \#I$.
Proof. By Lemma 3.9 the diagonal elements $\rho_{ii}$ of the fixed point of $\Phi_D$ must be equal to the coefficients $p_i$ of some generalized eigenvector of $T$. Moreover, Theorem 2.9 implies that $p_i = 0$ if $i \in I$ and $p_i = \frac{1}{N-k}$ if $i \notin I$, hence the values of the diagonal terms of $\rho$ follow. The off-diagonal terms $\rho_{ij}$ for $i \in I$ or $j \in I$ are equal to 0 due to the positivity of $\rho$, since at least one of the diagonal terms $\rho_{ii}$ or $\rho_{jj}$ in the minor: is equal to zero.

Optimal coherification of tristochastic permutation tensors
The aim of this section is to identify the convolution between quantum states determined by an optimal coherification of tristochastic tensors. The main quantity which we choose to maximize is the 2-norm coherification $C_2$ [20], which quantifies the average contribution of the non-diagonal entries of the dynamical matrix $D$, (4.1)
Here $\Phi_T$ is a given coherification of a tristochastic tensor $T$, $\lambda_{D_T,\mu}$ are eigenvalues of the dynamical matrix $D_T$ and $\lambda_{D_{T,\mathrm{diag}},\mu}$ are eigenvalues of the dynamical matrix of the diagonal coherification, see example (3.14). Expression (4.1) without extracting diagonal terms is also called the purity of a channel, $\gamma(\Phi_T)$. The first pair of indices $k, l$ run over the set $\{1, \cdots, N\}$, whereas the second one $n, m$ over the set
The difference between the entropies of the rescaled Choi dynamical matrix and of its diagonal leads to another, entropic measure of coherence [20] $C_e$: where $S$ is the von Neumann entropy and the prefactor $N^2$ comes from the normalization of the Jamiołkowski state, $\rho_\Phi = D/N^2$.

Coherification of the tristochastic permutation tensor of dimension 2
We start by discussing the coherification of the permutation tensor (3.13), rewriting the channel $\Phi_T$ in the Kraus representation, where $k$ is the number of Kraus operators. For further details consult Appendix A.2.
Comparing the diagonal elements we immediately get $(D_T)_{ac,ac} = \sum_i |(K_i)_{ac}|^2$, hence the Kraus operators must be of the form, Further simplification comes from the trace preserving property, Tr
$$\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & z_1 & z_2 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & z_3 & z_4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\bar z_1 & 0 & 0 & \bar z_3 & 0 & 1 & 0 & 0 \\
\bar z_2 & 0 & 0 & \bar z_4 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}$$
In this case, the expression for the purity of the corresponding channel $\Phi_T$ reduces to
Notice that $|f\rangle$ and $|g\rangle$ are two orthonormal vectors, therefore one can construct an orthonormal basis including those two vectors, $|f\rangle = |e_1\rangle$ and $|g\rangle = |e_2\rangle$. The number of basis vectors depends on the number $k$ of Kraus operators $K_i$. Hence the term with the inner products in (4.7) can be rewritten as
Here $a_i$, $d_i$ denote the components of vectors $|a\rangle$, $|d\rangle$ in the basis mentioned above. Thus we found an upper bound on the channel purity $\gamma(\Phi_T)$, which can be achieved using $k = 2$ Kraus operators. The maximal value of 2-norm coherence is,
In Appendix C we present the final expression for the optimal coherification $\Phi_T$ of a tensor $T$. Moreover, in Appendix B.3 we propose our candidate for the optimal coherification of any $N = 2$ dimensional tristochastic tensor with respect to the 2-norm coherification (4.1).

Coherification of permutation tensors
Knowing how to construct an optimal coherification of the permutation tensor of dimension 2, we can generalize this method for an arbitrary permutation tensor of dimension N .
Let us start with the condition $\sum_i |(K_i)_{j,kl}|^2 = T_{jkl}$, which implies that each Kraus operator has at most $N^2$ nonzero entries. Let us denote the value of the $n$-th nonzero entry in the $j$-th row of the $i$-th Kraus operator as $B^j_{in}$, with scalar product between them understood as
For example, for the permutation tensor (2.7) of order three, the Kraus operators are parametrized as
Since $T$ is a permutation tensor there is only a single nonzero element in each column of $K_i$. Hence checking the trace preserving condition, $\sum_i K_i^\dagger K_i = \mathbb{I}$, we obtain $\| |B^j_n\rangle \|^2 = 1$ on the diagonal, and scalar products $\langle B^j_{n_1} | B^j_{n_2} \rangle = 0$ outside the diagonal. Therefore, for each $j$ the set $\{|B^j_n\rangle\}_{n=1}^N$ is an orthonormal set with $N$ elements and the minimal number of Kraus operators is $N$.
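The structure of these Kraus operators can be made concrete with a small numerical sketch. It assumes zero-based indices, the cyclic permutation tensor $T_{jkl} = 1$ iff $j = (k + l) \bmod N$, Kraus matrices of size $N \times N^2$ with columns labelled by the pair $(k, l)$, and the discrete Fourier matrix as one possible choice of the unitary $B^j$; all names are hypothetical.

```python
import numpy as np

def cyclic_permutation_tensor(n):
    """T[j, k, l] = 1 iff j == (k + l) mod n (assumed example tensor)."""
    t = np.zeros((n, n, n))
    for k in range(n):
        for l in range(n):
            t[(k + l) % n, k, l] = 1.0
    return t

def dft_unitary(n):
    """Unitary discrete Fourier matrix, used here as an example basis."""
    idx = np.arange(n)
    return np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

def coherification_kraus(t, bases):
    """Place B^j[i, pos] at the pos-th nonzero column of row j in K_i."""
    n = t.shape[0]
    kraus = [np.zeros((n, n * n), dtype=complex) for _ in range(n)]
    for j in range(n):
        columns = [k * n + l for k in range(n) for l in range(n) if t[j, k, l] == 1]
        for pos, col in enumerate(columns):
            for i in range(n):
                kraus[i][j, col] = bases[j][i, pos]
    return kraus

n = 3
t = cyclic_permutation_tensor(n)
kraus = coherification_kraus(t, [dft_unitary(n)] * n)

completeness = sum(k.conj().T @ k for k in kraus)   # should be the identity
diagonal = sum(np.abs(k) ** 2 for k in kraus)       # should reproduce T
```

Orthonormality of the columns of each $B^j$ is exactly what makes the completeness relation hold, while the unit column norms reproduce the entries of $T$.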
Calculation of the 2-norm coherence $C_2(\Phi)$ boils down to a sum of squared norms of projections of vectors onto elements of a certain basis:
As in the case of qubits, the result is maximal if all vectors $|a^{(k,n)}\rangle$ belong to the same $N$-dimensional space, so none of their components in any basis would be lost. In such a case the resulting coherence of the channel $\Phi_T$ reads,
To calculate the maximal entropic coherence, first note that the eigenvalues of $\rho_\Phi$ are equal to where in the second step we used the fact that $\{|B^j_n\rangle\}_{n=1}^N$ are orthogonal bases. Therefore the optimal entropic coherence is equal to

Convolution of quantum states
Having understood the structure of coherifications for permutation tensors, we can move on to their implementation.The Theorem presented below serves as a helpful tool in achieving this goal.
Theorem 5.1. Let $B_k$ denote unitary matrices, whose rows correspond to basis vectors in $\mathbb{C}^N$. Then for each coherification $\Phi_T$ of a permutation tensor $T$ there exists a unitary matrix of the form $U = BP$, where $P$ is a permutation matrix of dimension $N^2$ and $B$ is a block diagonal matrix with diagonal given by a direct sum of $N$ blocks,
Proof. The coherification of the permutation tensor gives exactly $N$ Kraus operators of the form (4.9), hence we can define a unitary operator $U = \sum_i K_i \otimes |i\rangle$, so that: (5.2)
In the case of the permutation tensor $T_3$ of order three defined in eq. (2.7) the matrix $U$ of order nine has the form: where $B^k_{in}$ are simultaneously the coefficients from eq. (4.9) and the matrix elements of $B_k$. In general, for a permutation tensor $T$, the unitary $U$ is of the form: (5.4)
The last step is the observation that there always exists a permutation of columns such that $U P^\top = B$ achieves the desired block structure.
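The form $U = BP$ can be sketched numerically. The code below is an illustration under several assumptions that are not fixed by the text: zero-based indices, the permutation $P|j,k\rangle = |(j+k) \bmod N, k\rangle$, blocks $B_i$ acting on the second subsystem conditioned on the first, and the output state obtained by tracing out the second subsystem. The function names are hypothetical.

```python
import numpy as np

def convolution_unitary(blocks):
    """U = B P with P|j,k> = |(j+k) mod n, k> and B = direct sum of the blocks."""
    n = len(blocks)
    perm = np.zeros((n * n, n * n))
    for j in range(n):
        for k in range(n):
            perm[((j + k) % n) * n + k, j * n + k] = 1.0
    block_diag = np.zeros((n * n, n * n), dtype=complex)
    for i, b in enumerate(blocks):
        block_diag[i * n:(i + 1) * n, i * n:(i + 1) * n] = b
    return block_diag @ perm

def quantum_convolution(u, rho, sigma):
    """Apply U to rho (x) sigma and trace out the second subsystem."""
    n = rho.shape[0]
    out = u @ np.kron(rho, sigma) @ u.conj().T
    return np.einsum("ajbj->ab", out.reshape(n, n, n, n))

hadamard = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
u = convolution_unitary([hadamard, hadamard])

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.8, 0.3], [0.3, 0.2]])
omega = quantum_convolution(u, rho, sigma)

# Classical cyclic convolution of the two diagonals, for comparison.
p, q = np.diag(rho), np.diag(sigma)
classical = np.array([sum(p[(i - k) % 2] * q[k] for k in range(2)) for i in range(2)])
```

In this sketch the diagonal of the output coincides with the classical convolution of the input diagonals for any choice of unitary blocks, while the blocks only affect the off-diagonal part of the output.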
2 −e iα e iϕ cos θ 2 0 (5.5)
Repeating the steps from eq. (5.1) we get the unitary matrix $U_4$,
To design the corresponding circuit we decompose the operation $U_4$ into gates from the universal set [21] where the gate $\Lambda(\alpha, \theta, \phi)$ can be further decomposed by $Z(\alpha) = |0\rangle\langle 0| + e^{i\alpha} |1\rangle\langle 1|$ and (5.8)
The gate $U_4$ can be decomposed in an alternative, more transparent way, e iα e −iϕ/2 , (5.9) where $\mathrm{Q.CONV}(\theta) = U_4(0, \theta, 0)$ is a two-qubit gate, dependent only on the parameter $\theta$. Therefore the parameter $\alpha$ is an additional global phase of the resulting state and often can be set to 0. The role of the parameter $\phi$ is similar and it corresponds to applying a $-\phi/2$ phase gate on both states before the convolution and a $+\phi/2$ phase gate after it. Thus only the angle $\theta$ genuinely influences the way the convolution acts.
Figure 2: The action of the two-qubit convolution (5.6), $\omega = \rho \star_T \sigma$, with parameters $\alpha$, $\phi$ set to 0, for pure states $\rho$, $\sigma$ visualized on the Bloch ball by blue vectors. Three versions of the convolution operator parametrized by the angle $\theta = 0$, $\pi/2$ and $\pi$ are shown in red, and the decoherent input $\vec p$, $\vec q$ and output $\vec r$ states are denoted by black vertical segments. The red plane marks all possible outputs of $U_4(\alpha, \theta, \phi)$ for given input states.

Further study of qubit convolution
The choice of the optimal convolution (5.6) depends on the way, how these operators will be used.An example, proposed in Ref. [14] concerns coding against the correlated noise.
Notice that if we encode one-bit values 0 → 00 (or 11) and 1 → 10 (or 01), and allow for action of the noise that may flip simultaneously both bits, the permutation tensor (3.13) recovers the original value.A similar procedure can be performed on the quantum level.Firstly we enlarge the Hilbert space of the system ρ → ρ ⊗ |0⟩⟨0| .
Since matrix U 4 corresponds to decoding and error correction we use U † 4 as an encoding, so that with an absence of noise their joint action compensates.
Consider a noise of the form R(⃗ r) ⊗ R(⃗ r), where R(r) is a rotation of a qubit along a vector r, with a phase proportional to |r|: Operation of encoding, transformation by the noisy channel and decoding are given by the following evolution of a given state ρ, Considering the action of (5.10) we calculate the fidelity between the input ρ and output Φ(ρ), which for the pure state ρ = |ψ⟩⟨ψ| reads Such a value describes how much noise diminishes our capabilities to recognise the outcome state.For the basis states |0⟩ and |1⟩ on gets the following values w(sin(θ) + 1) r2 3 (cos(πr) − 1) 2 + sin 2 (πr) , with w = 1 − r2 3 .The results depend only on a single parameter describing the matrix U 4 , the angle θ, appearing only in the second expression.The optimal value θ opt = − π 2 corresponds to: Another, more abstract, way to determine the influence of parameters α, θ, ϕ in the qubit convolution is to examine entangling power [22] and gate typicality [23] of any two-qubit unitary gate U 4 .
Entangling power e p ∈ [0, 1] of a channel U is a quantity describing, how much the outcome U (|ψ⟩ ⊗ |ϕ⟩) is entangled on average for random input pure states |ψ⟩ and |ϕ⟩.After the partial trace in (5.1) it gives us insight how much the result of the convolution becomes mixed.The entangling power of U 4 achieves the maximal value of qubit channels equale ( U 4 ) = 2/3, independent of the parameters θ, ϕ, α.Another measure, the gate typicality g t ∈ [0, 1], specifies how much input states have been "interchanged" during the action of the unitary channel and for U 4 obtain values: Hence for θ = 0 more information from the first state is lost during the partial trace, and for θ = π more information from the second state is lost during the partial trace.
The case $\theta = \pm\pi/2$ corresponds to the "symmetric treatment" of both input states, which may be desirable. A detailed discussion of the entangling power and gate typicality is provided in Appendix B.4. Our numerical simulations also suggest that setting $\theta = \pm\pi/2$ guarantees the slowest rate of entropy increase in multiple convolution schemes. Note that for $\theta = \pm\pi/2$ the bases $\{|a\rangle, |d\rangle\}$ and $\{|f\rangle, |g\rangle\}$ in (4.6) are mutually unbiased [24]. Thus, we presume that the most preferable coherification of an arbitrary permutation tensor usually has all the bases $\{|a^{(k,l)}\rangle\}_{l=1}^{N}$ mutually unbiased. Such a scheme is feasible for any dimension N that is a prime or a power of a prime [24].
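Two orthonormal bases of dimension N are mutually unbiased when every overlap satisfies $|\langle a_i | f_j \rangle|^2 = 1/N$. The following sketch checks this condition numerically. Since the vectors from (4.6) are not reproduced here, the computational and Fourier bases serve as stand-ins, and the function names are illustrative assumptions.

```python
import numpy as np


def fourier_basis(N):
    """Matrix whose columns are the Fourier basis vectors of dimension N."""
    idx = np.arange(N)
    return np.exp(2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)


def are_mutually_unbiased(B1, B2, tol=1e-12):
    """Check that |<b1_i|b2_j>|^2 = 1/N for all pairs of columns of B1 and B2."""
    N = B1.shape[0]
    overlaps = np.abs(B1.conj().T @ B2) ** 2
    return bool(np.allclose(overlaps, 1.0 / N, atol=tol))


# The computational and Fourier bases are mutually unbiased in any dimension.
qubit_check = are_mutually_unbiased(np.eye(2), fourier_basis(2))
qutrit_check = are_mutually_unbiased(np.eye(3), fourier_basis(3))
```

The same function can be applied to any pair of candidate bases arising in a coherification, once their vectors are stored as matrix columns.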

Concluding remarks
In this work, we presented a family of products $r = p \star_A q$, parametrized by a tristochastic tensor A, defined on the set of classical probability vectors. Furthermore, we analysed the discrete dynamics induced by m-stochastic tensors. Investigations performed from the perspective of generalized Markov processes led to the characterization of the generalized eigenvectors. An alternative perspective of binary operations allowed us to study the associativity, commutativity, and the existence of a neutral element and inverse elements for these operations.
The above notions and results were translated into the quantum setup. We analyzed tristochastic and multi-stochastic quantum operations and studied their properties. In the next step, we enlarged the class of discussed channels by defining the coherification [20] of m-stochastic tensors, to take full advantage of the quantum properties. We provided an explicit way to construct a coherification matrix $D \geq 0$ of tristochastic permutation tensors with maximal 2-norm coherence, which yields a constructive recipe to convolve two arbitrary quantum density matrices of the same dimension, $\omega = \rho \times_D \sigma$, such that the coherence is preserved as much as possible. Finally, we analyzed this class of convolutions for qubits, discussing their properties and possible applications.
Our results raise several questions worthy of further study. First and foremost, the action of quantum m-stochastic tensors and the coherification of m-stochastic channels should be examined for entangled input states of the convolution operator, which is the more natural assumption in the case of convolutions in quantum convolutional neural networks [17]. Another useful topic is the collective behaviour of "interconnected" convolutions operating across multiple subsystems. This also leads us to the issue of parametrization and implementation of the convolution, which was discussed here only in the simplest case of the convolution of two single-qubit states.

A. On binary quantum channels
In this Appendix, we recall the basic properties of binary quantum channels that map two density matrices of order N to another density matrix of the same size. We temporarily denote these three Hilbert spaces as A, B and C and the corresponding sets of density matrices as $\Omega_A$, $\Omega_B$, $\Omega_C$. It suffices to give a definition of such a channel for separable states and extend it by linearity. Such maps can be represented in various ways. The following ones turn out to be the most convenient for our work.

A.1. Dynamical matrix representation via Choi-Jamiołkowski isomorphism
In general, any completely positive trace-preserving channel $\Phi$ between Hilbert spaces X and Y can be expressed using the Choi-Jamiołkowski isomorphism [18,19] as Any valid dynamical matrix, $D = D_{ABC}$, acting on the space A ⊗ B ⊗ C, induces a binary operation on quantum states, $\rho \star_D \sigma = \Phi_D(\rho \otimes \sigma)$. The necessary and sufficient conditions for trace preservation and complete positivity of such a channel in the Choi representation are The requirement of tristochasticity is just a repetition of the first condition for the partial traces over the subsystems A and B.

A.2. Kraus representation
Kraus operators for a binary operation are linear maps from the space A ⊗ B to C, which means that they can be represented as $N \times N^2$ rectangular matrices. The action of a channel $\Phi$ in the Kraus representation takes the form $\Phi(\rho \otimes \sigma) = \sum_i K_i (\rho \otimes \sigma) K_i^\dagger$.
The condition of complete positivity is satisfied automatically, and the condition for trace preservation requires that $\sum_i K_i^\dagger K_i = I$. The minimal number $r_\Phi$ of Kraus operators necessary to represent the channel $\Phi$ is called the rank of the channel.
To connect the Kraus representation with the dynamical matrix, let us apply the Choi-Jamiołkowski isomorphism. Let the indices μ, ν, c, d run from 1 to $N^2$ and correspond to the A ⊗ B space, while a, b run from 1 to N and correspond to the C space. Moreover, let $E^{(\mu,\nu)}$ be the matrix with 1 in the entry (μ, ν) and 0 in all the others, then (A.4)
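The passage from Kraus operators to the dynamical matrix can be sketched numerically. The example below makes two assumptions that are not taken from the paper: the dynamical matrix is built as $D = \sum_{\mu,\nu} \Phi(E^{(\mu,\nu)}) \otimes E^{(\mu,\nu)}$, so that the output space C comes first and is followed by A and B, and the channel is the classical (incoherent) XOR channel with Kraus operators $K_{ij} = |i \oplus j\rangle\langle ij|$, used purely as an illustration.

```python
import numpy as np

N = 2


def ket(i, dim):
    """Column vector |i> of the computational basis."""
    v = np.zeros((dim, 1))
    v[i] = 1.0
    return v


# Illustrative channel: K_ij = |i XOR j><i j|, each an N x N^2 matrix.
kraus = [ket(i ^ j, N) @ ket(N * i + j, N * N).T
         for i in range(N) for j in range(N)]


def apply_channel(kraus_ops, X):
    """Kraus form: Phi(X) = sum_i K_i X K_i^dagger."""
    return sum(K @ X @ K.conj().T for K in kraus_ops)


def dynamical_matrix(kraus_ops, dim_in):
    """D = sum_{mu,nu} Phi(E_{mu nu}) (x) E_{mu nu}, output space first (assumed ordering)."""
    dim_out = kraus_ops[0].shape[0]
    D = np.zeros((dim_out * dim_in, dim_out * dim_in), dtype=complex)
    for mu in range(dim_in):
        for nu in range(dim_in):
            E = np.zeros((dim_in, dim_in))
            E[mu, nu] = 1.0
            D += np.kron(apply_channel(kraus_ops, E), E)
    return D


D = dynamical_matrix(kraus, N * N)

# Trace preservation in the Kraus form: sum_i K_i^dagger K_i = I.
completeness = sum(K.conj().T @ K for K in kraus)

# Partial traces, with tensor indices ordered as (c, a, b, c', a', b').
T6 = D.reshape(N, N, N, N, N, N)
tr_C = np.einsum('cabcde->abde', T6).reshape(N * N, N * N)  # trace over the output
tr_A = np.einsum('cabdae->cbde', T6).reshape(N * N, N * N)  # trace over input A
tr_B = np.einsum('cabdeb->cade', T6).reshape(N * N, N * N)  # trace over input B

min_eigenvalue = np.linalg.eigvalsh(D).min()
```

For this channel all three partial traces equal the identity, which is the tristochasticity condition in the ordering assumed above, and the rank of D coincides with the number of linearly independent Kraus operators.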

A.3. Unitary evolution in an enlarged space
Any channel $\Phi : X \to Y$ can be associated with an isometric transformation $V \in L(X, Y \otimes Z)$, for a certain auxiliary space Z, by the so-called Stinespring representation, The minimum dimension of the auxiliary space Z is equal to the rank $r_\Phi$ of the channel $\Phi$. For a channel with equal Hilbert spaces, X = Y, it is straightforward to rewrite this expression using some unitary transformation $U \in L(X \otimes Z, Y \otimes Z)$, defined by the relation $U(|i\rangle \otimes |0\rangle) := V|i\rangle$. One then obtains In the case of binary channels (A.3), the description of their action can take various forms. One can treat both arguments in a symmetric way, with the auxiliary space playing the role of the output. After the unitary evolution and the partial trace over the input spaces A and B, one obtains a form similar to (A.6),
where $U \in U(A \otimes B \otimes C)$. This approach is valid for any channel $\Phi_D$ with any rank $r_\Phi$. Alternatively, when the binary channel has the rank $r_\Phi = N$, another, more compact unitary representation becomes natural. The isometry V from (A.5) is then a unitary operator, and the channel can be represented as: Note that the asymmetry between the two arguments, related to the partial trace over the subsystem B, is only apparent, as a dual form, corresponding to the partial trace over the subsystem A, can also be written.
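The compact representation for rank N can be sketched as follows: a unitary U of order $N^2$ acts on $\rho \otimes \sigma$, the subsystem B is traced out, and the corresponding Kraus operators are $K_b = (I_N \otimes \langle b|)U$. The random unitary and all function names below are illustrative assumptions; the unitary is not the gate $U_4$ of the paper.

```python
import numpy as np


def random_unitary(dim, rng):
    """A unitary obtained from a QR decomposition (not exactly Haar distributed)."""
    Z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    Q, _ = np.linalg.qr(Z)
    return Q


def channel_via_unitary(U, rho, sigma):
    """Compute Tr_B[U (rho (x) sigma) U^dagger] for two N x N density matrices."""
    N = rho.shape[0]
    out = U @ np.kron(rho, sigma) @ U.conj().T
    # Tensor indices are (a, b, a', b'); summing over b = b' traces out subsystem B.
    return np.einsum('ibjb->ij', out.reshape(N, N, N, N))


def kraus_from_unitary(U, N):
    """Kraus operators K_b = (I_N (x) <b|) U, each an N x N^2 matrix."""
    eye = np.eye(N)
    return [np.kron(eye, eye[b:b + 1, :]) @ U for b in range(N)]


rng = np.random.default_rng(0)
N = 2
U = random_unitary(N * N, rng)
rho = np.diag([1.0, 0.0]).astype(complex)  # pure state |0><0|
sigma = np.eye(N, dtype=complex) / N       # maximally mixed state

out_unitary = channel_via_unitary(U, rho, sigma)

kraus = kraus_from_unitary(U, N)
X = np.kron(rho, sigma)
out_kraus = sum(K @ X @ K.conj().T for K in kraus)
completeness = sum(K.conj().T @ K for K in kraus)
```

Both routes give the same output state, and exactly N Kraus operators appear, in agreement with the rank condition. The dual form is obtained by tracing over the subsystem A instead, for example with the index string 'aiaj->ij'.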

B.1. Proofs and calculations from Section 2
In this subsection, we present the proofs of the theorems stated in Section 2, starting with Theorem 2.6, which for convenience is restated here.
Theorem B.1. Let $\vec q^{(0)}$ be any point in the interior of the probability simplex $\Delta_N$ and let $\vec p^{(0)}$ be a point at the boundary of the probability simplex $\partial\Delta_N$ such that $\vec q^{(0)} = \alpha\vec e + (1-\alpha)\vec p^{(0)}$ for some $\alpha \in (0, 1]$. Next let us denote sequences {⃗ q
Proof. Let us derive the general formula for $\vec q^{(n)}$ using $\vec p^{(n)}$. Starting from $\vec q^{(1)}$ one gets, where the fourth equality follows from Lemma 2.5. Repeating the above calculations one finds that, ⃗ q Hence the norm of $\vec q^{(n)}$ is equal to: 2 is bounded by 1 and (1 − α) 2(m−1) n converges to 0, we arrive at lim Because each $\vec q^{(n)}$ lies inside the probability simplex and $\vec e$ is the only vector inside the probability simplex with norm equal to $1/\sqrt{N}$, we get:
Furthermore, we present below a proof of Lemma 2.8, restated for convenience.
Lemma B.2. Let $A_{i_1 \cdots i_m}$ be an m-stochastic reducible tensor with m ≥ 3 and let I be a set of indices defined as above with #I = k. Then the following holds: is reducible with respect to any of its indices, with the same set I of index values, The truncation of the tensor is also an m-stochastic tensor. Moreover, for any m-stochastic tensor
Proof. To prove the first statement, let us fix the values of the indices $i_4, \cdots, i_m \notin I$ and consider the following sum: This expression can be bounded from above by Hence we obtain the inequality $k^2 \geq k(N-k)$, implying that $k \geq N/2$.
To prove the second statement notice that Then for each index $i_r$ two equalities hold, which ends this part of the third statement. To prove the second part of the last statement, notice that for each value of the indices $i_2, \cdots, i_m \notin I$ the following relations hold, Hence for all $i_1 \in I$ and $i_2, \cdots, i_m \notin I$ the entries vanish, $A_{i_1 \cdots i_m} = 0$, which ends the proof.

B.2. Generalized eigenvectors of quantum multi-stochastic channels
In this Appendix we prove Theorem 3.4, using quantum counterparts of the techniques applied earlier in the classical case of m-stochastic tensors. To simplify the notation, let us denote the maximally mixed state of order N as $\rho_* = I/N$. Let us start with the following lemma:
Lemma B.3. Let $\Phi_D$ be a quantum m-stochastic channel. Then for any sequence of density matrices $\{\rho_1, \cdots, \rho_{m-1}\}$, one of which is the maximally mixed state $\rho_*$, the following equality holds:
Proof. We prove the lemma by induction. First notice that if m = 2, D is just a unital channel and hence Where the tensor If the last state in the tensor product in (B.15) is equal to $\rho_*$, we can define an (m−1)-stochastic channel $\Phi_{D,\rho_{m-1}}$ using the first state in an analogous way. By induction, one can assume that the statement (B.14) is true for (m−1)-stochastic channels. Thus we have, which ends the proof.
The next theorem is a direct quantum generalization of Theorem 2.6.
Theorem B.4. Let $\sigma^{(0)}$ be a density matrix in the interior of $\Omega_N$ and let $\rho^{(0)}$ be a density matrix at the boundary of $\Omega_N$ such that $\sigma^{(0)} = \alpha\rho_* + (1-\alpha)\rho^{(0)}$ for some $\alpha \in (0, 1]$. Let us define the sequences {σ
Proof. Let us derive a general formula for $\sigma^{(n)}$ using $\rho^{(n)}$. By explicit calculation of $\sigma^{(1)}$ one obtains: (B.16) The fourth equality above follows from Lemma B.3. Repeating the above calculations one finds that (B.17) Now we may calculate the $\|\cdot\|_2$ norms of $\sigma^{(n)}$ defined by the Hilbert-Schmidt inner product, $\langle\rho, \sigma\rangle_{HS} = \mathrm{Tr}\,\rho^\dagger\sigma$. The squared norms of $\sigma^{(n)}$ read: Finally, because each $\sigma^{(n)}$ is an element of $\Omega_N$, which is a closed set, and $\rho_*$ is the only density matrix inside $\Omega_N$ with norm equal to $1/\sqrt{N}$, the limit of interest is
For further work, we need a quantum counterpart of the reducible m-stochastic tensor, provided in Definition 3.3. Let us introduce a short-hand notation: $P_V$ denotes the projection onto a subspace $V \subset \mathcal{H}$, while $P_{V^\perp}$ denotes the projection onto the complementary subspace, orthogonal to V. Then for an m-stochastic channel, we write the projections of its dynamical matrix in consecutive subsystems, The partial trace of the multi-partite matrix D over a subspace V on the k-th subsystem will be written as Note that in Definition 3.3 we can equivalently use density matrices supported on the subspaces V and $V^\perp$ instead of pure states. Hence we use these formulations interchangeably.
Lemma B.5. Let $\Phi_D$ be an m-stochastic reducible channel with m ≥ 3 and let V be a proper subspace defined as above with dim(V) = k. Then the following properties hold: (ii) The channel $\Phi_D$ is reducible with respect to any of the subsystems, with the same subspace and consider the following expression, But the same sum can also be bounded from above by Any blank space • in equations (B.23) and (B.24) means that the dynamical matrix is truncated analogously to Lemma B.3, with projections inserted onto every subsystem except the second or the third one. Hence we arrive at the inequality $k^2 \geq k(N-k)$, so that $k \geq N/2$. Now we focus on statement (ii). In the following calculations, we omit the last steps from (B.23) and (B.24), i.e. the expansion of the traces. Notice that where in the second-to-last step we used the m-stochasticity of $\Phi_D$ combined with Lemma B.3. Then for each subsystem labelled by t we write in a similar manner, But we also have which is a sum of non-negative terms, because is a positive semidefinite density matrix, and so is $|f_{s_1}\rangle\langle f_{s_1}|$. Therefore, each of those terms is equal to 0, which, by the multi-linearity of the channel $\Phi_D$ and by the fact that the bases $|e_i\rangle$, $|f_i\rangle$ can be chosen arbitrarily in each subsystem, proves statement (ii).
In statement (iii), the positive semidefiniteness of the truncated matrix $D|_{V^\perp,\cdots,V^\perp}$ follows immediately from the positive semidefiniteness of D. Thus we are left to show the m-stochasticity of the channel $\Phi_{D|_{V^\perp,\cdots,V^\perp}}$, that is, that for any subsystem labeled by t and any density matrices Since all density matrices $\rho_l$ are supported in $V^\perp$, we get the following chain of equalities: where the second-to-last step follows from the multi-linearity and the second statement.
To prove the second part of the statement (iii) let us choose an arbitrary set of states which ends the proof.
Now we are ready to present a proof of Theorem 3.4, which we invoke here for completeness.
Theorem B.6.Let Φ D be an m-stochastic channel acting on the N -dimensional states and m ≥ 3. Then each subspace V ⊂ H such that corresponds to a single density matrix being a generalized eigenvector of Φ D and vice versa.Moreover, each such eigenvector is a maximally mixed state on the subspace V ⊥ and the corresponding eigenvalue is 1.
Proof.We demonstrate first that each eigenvector ρ corresponds to a subspace V described in the theorem and then show that each subspace V corresponds to an eigenvector ρ.Along the way, we also prove the second statement of the theorem.
Let $\rho$ be an eigenvector of D to the eigenvalue $\lambda$. If all eigenvalues of $\rho$ are greater than zero, then by Theorem B.4 one has $\rho = \rho_*$ and $\lambda = 1$, since $\rho$ belongs to the interior of $\Omega_N$, so the theorem is satisfied with V = 0. Otherwise, there exists a subspace V spanned by the eigenvectors $|e_i\rangle$ corresponding to the zero eigenvalues of $\rho$, hence P where each $\lambda_{i_r}$ and $|f_{i_r}\rangle$ are the $i_r$-th positive eigenvalue and the corresponding eigenvector of $\rho$, and $\lambda_{\min}$ is the smallest positive eigenvalue of $\rho$. Because D is positive semidefinite and its trace on the dynamical matrix D must be identically zero on this subspace, which proves the thesis in one direction. By Lemma B.5 the dynamical matrix D can be truncated to $D_{V^\perp,\cdots,V^\perp}$, which defines an m-stochastic channel. On the subspace $V^\perp$, $\rho$ is an eigenvector of $\Phi_{D_{V^\perp,\cdots,V^\perp}}$ with no eigenvalues equal to 0. Therefore, by Theorem B.4 applied to the channel $\Phi_{D_{V^\perp,\cdots,V^\perp}}$, the state $\rho|_{V^\perp}$ must be the maximally mixed state on the subspace $V^\perp$, and the generalized eigenvalue corresponding to $\rho$ is equal to 1.
To prove the implication in the opposite direction we use the third point of Lemma B.5. Because the channel $\Phi_{D_{V^\perp,\cdots,V^\perp}}$ is m-stochastic, by Lemma B.3 it also has a generalized eigenvector $\rho := \rho_*$ on the subspace $V^\perp$. Moreover, since $\Phi_D$ is reducible, we know that the second point of Lemma B.5 also holds, hence
Using these assumptions one can calculate the 2-norm coherence [20]: (B.37) Notice that there cannot exist a channel $\Omega_2 \otimes \Omega_2 \to \Omega_2$ described by only one Kraus operator, because it would imply that such a channel is a unitary between vector spaces of different dimensions. Therefore, the described coherifications use the minimal number of Kraus operators. The squared norms of each Kraus operator rescaled by 1/4 are eigenvalues of $\rho_\Phi$ and each of them is equal to:

C. Explicit forms of convolution of quantum states
The aim of this appendix is to provide explicit formulas for the convolution of two quantum states. First, let us show the dynamical matrix of the optimal coherification of the bit permutation tensor (4.6), using a convenient parametrization from (5.5): (C.1) As we argued in Section 5, this channel is easier to study as a composition of the unitary channel $U_4(\alpha, \theta, \phi)$ described in (5.6), followed by a partial trace over the second subsystem. Within this class of unitary maps, at least a couple of well-known bipartite gates are hidden.
For larger systems the convolution, in the sense of the optimal coherification (Definition 3.8) of a permutation tensor, is also easier to study and implement in the form of a unitary channel followed by a partial trace. This form was already presented in Theorem 5.1, together with an example of a qutrit channel.

Figure 1: Schematic representation of a reducible tensor A with marked set of indices I and a tristochastic subtensor A ′ .

$\mathrm{Tr}_1[D] = I$. Quantum operations satisfying the unitality condition $\Phi_B(I) = I$ are called bistochastic and form an important class of channels. The dynamical matrix of any bistochastic channel $\Phi_B$ satisfies two conditions for both partial traces, $\mathrm{Tr}_1[D] = I_N$, $\mathrm{Tr}_2[D] = I_N$. (3.2) Making use of the channel description (3.1), we propose a quantum convolution inspired by the convolution (2.5) of classical probability vectors.

Definition 3.5. Let $\Phi_D$ be a tristochastic channel established in Definition 3.2 and $I_D$ a density matrix such that for any density matrix $\rho$: $I_D \star_D \rho = \rho \star_D I_D = \rho$, then the density matrix $I_D$ is called an identity of $\Phi_D$.
Theorem 3.6. Let $\Phi_D$ be a tristochastic channel and $I_D$ an identity of $\Phi_D$. Then the following statements hold: (i) $I_D$ is a nontrivial eigenvector of $\Phi_D$, hence $\Phi_D$ is reducible, (ii) $I_D$ is a pure state, spanning a one-dimensional subspace $V_I$, (iii) Let $P_I$ be the projection onto the subspace defined above, then $\mathrm{Tr}_3[(I_N \otimes I_N \otimes P_I) D (I_N \otimes I_N \otimes P_I)] = \mathrm{Tr}_2[(I_N \otimes P_I \otimes I_N) D (I_N \otimes P_I \otimes I_N)] = D_I$, where $[D_I]_{i k\, j l} = \delta_{ik}\delta_{jl}$ is the dynamical matrix of the identity channel, (iv) If all of the above points are true for some density matrix $\rho$, then $\rho$ is an identity of $\Phi_D$, (v) Moreover, for each tristochastic channel $\Phi_D$, there exists at most a single identity state.
Theorem 3.7. Let $\Phi_D$ be a tristochastic channel and $I_D$ an identity of $\Phi_D$. Then for density matrix $\rho$ there exist an inverse density matrix $\sigma$,

Thus four scalar products are equal to zero and the remaining free terms are z 1 = ⟨f |a⟩, z 2 = ⟨g|a⟩, z 3 = ⟨f |d⟩, z 4 = ⟨g|d⟩ and their conjugates.The resulting dynamical matrix D T for T from (3.13) reads,

Remark 5.2. In general, there are many equivalent sets of Kraus operators defining the same channel, hence to find only the relevant information describing a channel one should rather consider its dynamical matrix D. The entries of the dynamical matrix contain only the scalar products $\langle B^k_n | B^l_m \rangle$. Thus one can rotate all vectors $|B^k_n\rangle$ without affecting the action of the channel. Therefore in Theorem 5.1 we may fix the first diagonal block $B_1 = I_N$.
5.1. Channel generation and quantum circuit for qubits
As an illustrative example let us examine the coherification of the simplest tristochastic tensor (3.13). Note that the dynamical matrix of the channel (C.1) depends only on the scalar products between the vectors $|a\rangle$, $|d\rangle$ and $|f\rangle$, $|g\rangle$. Therefore we may rotate the parameter vectors $|a\rangle$, $|d\rangle$ by any unitary matrix u of order two, and $|f\rangle$, $|g\rangle$ by $u^\dagger$, without affecting the action of the channel. Using this freedom we set $|a\rangle = |1\rangle$ and $|b\rangle = |2\rangle$. Hence the Kraus operators for any channel in this set are defined by only three phases: $\alpha \in [0, 2\pi]$, $\theta \in [0, \pi]$ and $\phi \in [0, 2\pi]$:

with $D \geq 0$ and $\mathrm{Tr}_A[D] = I$. In this work, we study the generalized case of two inputs, X → A ⊗ B; Y → C. Then such an expression takes the form,

Figure B1: Projection of the set of two-qubit unitary gates $U \in U(4)$ onto the plane of entangling power $e_p$ versus gate typicality $g_t$, limited by black and red borders. The gates $U_4$ defined in (5.6), corresponding to the quantum convolution, occupy the red bold border with the highest possible entangling power $e_p = 2/3$ for two-qubit gates.

iα cos θ 2 0 0 e iα e iϕ sin θ 2 iα e −iϕ sin θ 2 −e −iα e −iϕ cos θ
by Lemma 2.5 it also has a generalized eigenvector $\vec p := \vec e$ on the $(N-k)$-dimensional subspace described by the values of the indices not in I. Hence $\vec p$ has the form (2.16). Moreover, since $A_{i_1 \cdots i_m}$ is reducible, the second point of Lemma 2.8 holds, so $\vec p$ is also an eigenvector of $A_{i_1 \cdots i_m}$.