Reducing Computational Complexity for DBZF Precoding in xDSL Downlinks

However broad the acceptance of the Decomposition-based Zero-forcing (DBZF) precoder may be, reducing the computational complexity of its implementation is an absolute necessity for VDSL networking professionals. This paper examines the problem from the perspective of matrix inversion, which is inherent in the very nature of DBZF. The eight strategies considered here differ in mode of action: four of them include matrix inversion, and the other four avoid it. While the baseline strategy, No. 1, uses the Gaussian LU-decomposition, strategy No. 2 uses the Jordanian LU-decomposition, thereby enabling a mild reduction in the operation count. Strategy No. 3 achieves a more significant reduction by operating with the elimination form of the inverse matrix. The most economical are the strategies that bypass matrix inversion altogether and replace it with a far more straightforward linear system solution, as in Strategy No. 4. Cases No. 3 and No. 4 are each examined in two subcases, marked (1) and (2), which use the Gaussian (1) or Jordanian (2) version of the same LU-decomposition. An alternative strategy, No. 5, uses a least squares-based square-root-type sequential system solution; it is the most accurate computational procedure of all the strategies compared. Strategy No. 6 differs in that it implements the QR-decomposition of the channel transfer matrix H, and it is the most numerically robust against an ill-conditioned matrix H.


Introduction
Over the last two decades, communication technologies and transmission equipment have developed at an unprecedented pace to provide both residential and mobile access. Broadband wireline access networks offer ample and stable bandwidth to residential user premises. Their services include many modern network applications such as video streaming, file sharing, telecommuting, online gaming, and video conferencing. This became possible in the early 2000s with the invention of Very High-Speed Digital Subscriber Line (VDSL).
Although passive optical networks (PONs) are the most popular access networks worldwide, countries with abundant copper line resources make good use of xDSL (such as VDSL) by taking advantage of existing telephone lines. The combined technology called 'fiber-to-the-x' (FTTx), where 'x' may stand for 'N' (node), 'C' (curb), 'Cab' (cabinet), or 'B' (building), delivers both low deployment cost and better performance. In densely populated areas or cities, many customers are within 1.5 km of the central office (CO) or Local Exchange (LEx). In such cases, VDSL can be deployed directly from the CO. When fiber extends deeper into the network, public service carriers deploy VDSL from the optical network unit (ONU) in a configuration known as 'fiber-to-the-cabinet' (FTTCab) [1, p. 6]. In this case, the PON carries the optical signal to the cabinet ONU, and then the telephone line, or twisted pair (TP), carries the signal to the user premises using xDSL. In this way, hybrid fiber-copper systems deploying VDSL in the last mile to the customer premises (CPs) bring the optical network core closer to the customer, and so offer sufficient bandwidth while retaining an edge over pure fiber networks as a more economical solution [2, p. 15], [3, p. 55].
The Organization for Economic Co-operation and Development (OECD) publishes broadband-related information on its members every year (Figure 1). Although China is not an OECD member, it has the most broadband users in the world, so China's broadband information is also shown alongside the OECD information in some charts for reference. This information attests that, in our age of high technology betting on fiber, DSL remains an attractive choice for transporting the data of high-speed real-time processing systems over the Public Switched Telephone Network (PSTN) with Plain Old Telephone Service (POTS). However, a crucial problem in VDSL networks, limiting both the data rate and the reach of service, is the phenomenon known as near-end crosstalk (NEXT) and far-end crosstalk (FEXT). FEXT is known to have more severe consequences on shorter VDSL lines: "it can dominate noise profiles" [1, p. 11]. When the modems transmitting signals in the downstream direction are co-located at the CO, crosstalk precoding can be applied to each modem's signal before transmission. A comprehensive review of precoding techniques for digital communication systems was recently given in [4]. A near-optimal linear crosstalk precoder [5] helps to cope with this problem for downstream wireline VDSL. This precoder dates back to 2004, as can be seen from a draft version of the paper [5]. Thanks to its low complexity and the absence of any additional arithmetic at the receiver side, it gained broader acceptance and further consideration, as in [6, pp. 34-35]. This solution, termed the Decomposition-based Zero-forcing Precoder in the last cited work, produces the desired effect in crosstalk cancellation, but it also poses a problem of computational complexity. The point is that the precoder computations include matrix inversion and so may have a significant effect on VDSL power consumption, as stressed in [1, p. 20].
If VDSL modems are deployed from the ONU, which is typically located in a small curbside cabinet with no cooling or temperature control, VDSL power consumption must be very low. Besides, to fit in the ONU, VDSL line cards must also be small. For these reasons, networking professionals refer to reducing the complexity of inverting large matrices as a 'perennial, evergreen topic' [7, pp. 39, 55-70].
The paper investigates the computational complexity of different algorithms for precoder design, both including and avoiding matrix inversion. Section 2 recalls the channel model and DMT transmission. Section 3 gives two formal answers to the question of how to obtain the sought-for solution to the problem formulated there, with the inverse matrix $A^{-1}$ intended to serve as the precoder $P$. Section 4 presents the baseline solution to the system $Ax' = (Gx)$ using any one of three forms of LU-decomposition of the matrix A. Section 5 describes the Jordanian version of the LU-decomposition of A for computing $A^{-1}$ as the precoder $P$. Section 6 uses the baseline LU-decomposition to obtain the elimination form of $A^{-1}$. Section 7 proposes an off-the-beaten-path LU-decomposition-based solution for precoding without computing $A^{-1}$ directly. Section 8 introduces another non-mainstream alternative, a least squares-based square-root sequential system solution that avoids $A^{-1}$. Section 9 solves the system $Ax' = (Gx)$ by way of the QR-decomposition of the channel matrix H, which prevents explicit calculation of $A^{-1}$, formally meant to be the precoder $P$. Section 10 presents a tradeoff study from the perspective of matrix inversion in precoder design. Section 11 contains the results of numerical experiments conducted in MATLAB. The last section concludes the paper with some recommendations for network service management.
Throughout this paper, we use the following notation:
• $\mathbb{C}^n$ for the complex n-vector space and $\mathbb{C}^{m\times n}$ for the complex m × n matrix space;
• $\mathbb{R}^n$ for the real n-vector space and $\mathbb{R}^{m\times n}$ for the real m × n matrix space;
• lowercase Latin letters for n-vectors of the same dimension n;
• uppercase Latin letters for n × n matrices;
• lowercase Greek letters for scalars.

Channel Model and DMT Transmission
For the following reasons, VDSL systems are treated as Multiple Input Multiple Output (MIMO) systems.
(i) The VDSL system is based on Discrete Multi-tone (DMT) transmission. In this method, the allocated frequency band (or channel) is split into many frequency subbands (sometimes called subchannels). DMT uses the inverse fast Fourier transform (IFFT) for signal modulation (before transmission) and the FFT for demodulation (at the receiving side). Transmission runs on each tone k, i.e., on the k-th carrier frequency $f_k$ in the k-th subchannel, k = 1, …, K (Figure 2).
(ii) Modulated data are passed to N users in parallel through N twisted pairs. As a general rule, individual TPs are grouped into binder groups of 4 to 10 cables, and 50 to 100 TPs are bundled together into a cable. So N may range from two to a thousand [8, p. 5], which makes FEXT the most harmful phenomenon (Figure 3).
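The per-tone independence described in (i) can be checked on a toy example. The sketch below rests on a simplified, assumed setup that is not taken from the paper: a single twisted pair, complex baseband symbols, a short channel impulse response h, and a cyclic prefix of length len(h) − 1, with plain O(K²) DFTs standing in for the FFT. Once the prefix is dropped, demodulation returns Y[k] = Hk[k]·X[k] on every tone, which is the single-pair version of per-tone transmission. The helper names dft, idft and dmt_link are hypothetical.

```python
import cmath


def dft(x):
    """Plain O(K^2) DFT; an FFT computes the same values faster."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse DFT, used here as the DMT modulator."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def dmt_link(X, h):
    """Send one DMT symbol X (one complex value per tone) over a single pair
    with impulse response h; return the demodulated tones Y and the per-tone
    channel gains Hk. Noise is omitted."""
    K, L = len(X), len(h)
    cp = L - 1                                  # cyclic prefix length
    s = idft(X)                                 # modulation
    tx = s[K - cp:] + s                         # prepend the cyclic prefix
    # linear convolution of the transmitted samples with the channel
    rx = [sum(h[l] * tx[t - l] for l in range(L) if t >= l)
          for t in range(len(tx))]
    Y = dft(rx[cp:cp + K])                      # drop the prefix, demodulate
    Hk = dft(list(h) + [0] * (K - L))           # channel frequency response
    return Y, Hk
```

With several pairs and FEXT, applying the same argument to every pair-to-pair impulse response turns each scalar Hk[k] into an N × N matrix per tone.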
The k-th subchannel is modeled by the k-th complex-valued channel transfer matrix $H_k \in \mathbb{C}^{N\times N}$ (Figure 2). The diagonal elements $h_k^{n,n}$ of $H_k$ are the direct channel coefficients of the different TPs and describe the effect of the direct channel of user n on that user's transmit signal. The off-diagonal elements $h_k^{n,m}$, n ≠ m, correspond to the crosstalk interference contributions and are called the crosstalk coefficients.
Assuming that the modems are synchronized and DMT modulation is employed, one can model transmission independently on each tone by the relation (Figure 4, a)
$$y_k = H_k x_k + v_k, \tag{1}$$
where $v_k$ is the additive noise on tone k. It comprises thermal noise, alien crosstalk, radio frequency interference (RFI), etc. [5], and is frequently modeled as additive white Gaussian noise (AWGN).

Signal Transmission Modification with Gain G and Precoder P
To control the transmit power spectral density (PSD) of user n on tone k, the transmit vector $x_k$ is scaled by the gain matrix $G_k$. Then the resulting vector $(G_k x_k)$ is pre-distorted by the precoder matrix $P_k \in \mathbb{C}^{N\times N}$ to obtain the channel input signal
$$x'_k = P_k (G_k x_k).$$
The precoder $P_k$ has to be specific to every tone k, while the number of tones runs into the thousands (K = 2048 in a typical case, and it may reach 4096). So relation (1) is replaced (Figure 4, b) by
$$y'_k = H_k P_k (G_k x_k) + v_k.$$
Question: how should the precoder $P_k$ be designed to cancel crosstalk in $y'_k$, i.e., to ensure an element-wise relation between $y'_k$ (the received vector) and $x_k$ (the transmit vector)? In other words, we need $H_k P_k$ to be a diagonal matrix.
One formal ("beaten-path") solution: $P_k = H_k^{-1} F_k^{-1}$ with a diagonal matrix $F_k^{-1}$. This relation means the decomposition $H_k = F_k^{-1} A_k$ with $P_k = A_k^{-1}$, as shown in Figure 4, b, to obtain at the receiving side
$$y'_k = F_k^{-1} (G_k x_k) + v_k.$$
These steps substantiate the known answer to the question above: $A_k = F_k H_k$, i.e., $A_k$ is the channel matrix $H_k \in \mathbb{C}^{N\times N}$, whose row and column indices n and m for the entries $h_k^{n,m}$ run over 1, …, N, premultiplied by the matrix $F_k$. Decomposing $H_k$ into $F_k^{-1} A_k$ and precoding with $P_k$ leads to a high processing overhead at the CO/ONU, due to increased computational complexity, if $P_k$ is to be precomputed in explicit form by inverting $A_k$.
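The zero-forcing effect of the decomposition $H_k = F_k^{-1} A_k$, $P_k = A_k^{-1}$ can be checked numerically on a small example. The sketch below rests on assumptions that the text does not fix: $F_k^{-1}$ is taken as diag($H_k$) (our reading of the diagonalizing precoder of [5]), $G_k$ is a diagonal gain matrix given by the list g, noise is omitted, and $P_k$ is obtained by plain Gauss-Jordan inversion, i.e., the explicit-inverse route whose cost motivates this paper. The tone index k is dropped, and mat_inv, matvec and dbzf_precode are hypothetical helper names.

```python
def mat_inv(A):
    """Gauss-Jordan inversion with partial pivoting; complex entries allowed."""
    n = len(A)
    # augment A with the identity matrix: [A | I]
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        d = M[k][k]
        M[k] = [v / d for v in M[k]]            # normalize the pivot row
        for i in range(n):
            if i != k:
                f = M[i][k]
                M[i] = [a - f * b for a, b in zip(M[i], M[k])]
    return [row[n:] for row in M]               # right half now holds A^{-1}


def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dbzf_precode(H, g, x):
    """Return x' = P (G x) with P = A^{-1}, A = F H, F^{-1} = diag(H) (assumed),
    together with G x and the diagonal of H."""
    n = len(H)
    d = [H[i][i] for i in range(n)]                               # F^{-1} = diag(H)
    A = [[H[i][j] / d[i] for j in range(n)] for i in range(n)]    # A = F H
    P = mat_inv(A)                                                # explicit inverse
    gx = [gi * xi for gi, xi in zip(g, x)]                        # G x, G diagonal
    return matvec(P, gx), gx, d
```

Without noise, the channel output matvec(H, x') equals $F^{-1}(Gx)$ entry by entry, so user n receives $h^{n,n} g_n x_n$ only, whereas sending $(Gx)$ unprecoded leaves the off-diagonal crosstalk in place.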
Another formal (slightly modified) solution: find $Q_k$ and $R_k$ such that $H_k = Q_k R_k$ by one of the left-side orthogonal transformations $T_k = Q_k^{\mathsf H}$ (Householder, Givens, or Gram–Schmidt) that yield $R_k = T_k H_k$ [10, pp. 107-140]. Then find $P_k = R_k^{-1} Q_k^{\mathsf H} F_k^{-1}$. With such $P_k$, produce $x'_k = P_k (G_k x_k)$ on entering the channel $H_k$, as shown in Figure 4, b, to yield the receiving-side vector $y'_k = F_k^{-1}(G_k x_k) + v_k$. What is different in the second solution is that its steps lean heavily on the orthogonal $Q_k R_k$ technique; what makes it formally equal to the first one is that $P_k = H_k^{-1} F_k^{-1}$ and $A_k = F_k Q_k R_k$ are the same matrices, only calculated differently.
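With the tone index dropped, the second solution turns precoding into a single triangular solve. The short derivation below uses only $A = FH$ and $H = QR$ with a unitary Q, so that $Q^{\mathsf H} Q = I$:

```latex
\begin{aligned}
A x' &= (Gx), \qquad A = F H = F Q R \\
\Longrightarrow\quad Q R\, x' &= F^{-1}(Gx) \\
\Longrightarrow\quad R\, x' &= Q^{\mathsf H} F^{-1}(Gx) \triangleq b .
\end{aligned}
```

Hence $x' = R^{-1} b = R^{-1} Q^{\mathsf H} F^{-1}(Gx) = P(Gx)$, so P never has to be formed explicitly: since R is upper triangular, $Rx' = b$ is solved by back substitution.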
Below, for the sake of simplicity, we omit the index k and compare different execution strategies for obtaining the channel input signal $x'_k$ satisfying $A_k x'_k = (G_k x_k)$, written henceforth as $Ax' = (Gx)$. Note: we use the same symbol $\Sigma_i$ to designate both the i-th strategy and its complexity, understood as the total multiplication/division (m/d) count in it. The counts for Sections 4 to 9 are rigorously substantiated in [9] by summing finite series of positive numbers.

Baseline solution $\Sigma_1$: Gaussian LU-decomposition followed by forward and backward substitutions to designate $A^{-1}$ as precoder $P$
The LU-decomposition of an N × N matrix A is well known and may be performed in a variety of ways [10, pp. 27-81, 117-120, 137]. Consider: (a) the Gauss column-sweep algorithm; (b) Crout's reduction algorithm; and (c) the bordering algorithm. In each case, L is lower triangular, U is unit upper triangular, and the nontrivial elements of L and U overwrite the given matrix A.
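A minimal sketch of option (b), Crout's reduction, in plain Python with complex arithmetic, assuming that no pivoting is needed (an assumption here; it holds for strictly diagonally dominant matrices). It contrasts two ways of using the factors: strategy_inverse builds $P = A^{-1}$ column by column through forward and backward substitutions and then forms $x' = P(Gx)$, in the spirit of $\Sigma_1$; strategy_solve solves $Ax' = (Gx)$ once, in the spirit of $\Sigma_4$. Each function also counts multiplications/divisions. These counts belong to this naive sketch, which does not exploit the zeros in the identity columns, and are not the paper's formulas. All function names are hypothetical.

```python
def crout_lu(A):
    """Crout LU without pivoting: A = L U, L lower triangular, U unit upper
    triangular. L and the strictly upper part of U share one matrix, as when
    they overwrite A. Returns (LU, multiplication/division count)."""
    n = len(A)
    M = [list(row) for row in A]          # work on a copy of A
    ops = 0
    for k in range(n):
        for i in range(k, n):             # column k of L
            s = M[i][k]
            for p in range(k):
                s -= M[i][p] * M[p][k]
                ops += 1
            M[i][k] = s
        for j in range(k + 1, n):         # row k of U (unit diagonal)
            s = M[k][j]
            for p in range(k):
                s -= M[k][p] * M[p][j]
                ops += 1
            M[k][j] = s / M[k][k]
            ops += 1
    return M, ops


def lu_solve(M, b):
    """Forward substitution L y = b, then backward substitution U x = y."""
    n = len(M)
    ops = 0
    y = [0j] * n
    for i in range(n):
        s = b[i]
        for p in range(i):
            s -= M[i][p] * y[p]
            ops += 1
        y[i] = s / M[i][i]
        ops += 1
    x = [0j] * n
    for i in reversed(range(n)):
        s = y[i]
        for j in range(i + 1, n):
            s -= M[i][j] * x[j]
            ops += 1
        x[i] = s                          # unit diagonal of U: no division
    return x, ops


def strategy_inverse(A, gx):
    """Build P = A^{-1} column by column, then form x' = P (Gx)."""
    n = len(A)
    M, ops = crout_lu(A)
    cols = []
    for j in range(n):
        e = [1.0 if i == j else 0.0 for i in range(n)]
        col, c = lu_solve(M, e)           # j-th column of A^{-1}
        cols.append(col)
        ops += c
    x = [sum(cols[j][i] * gx[j] for j in range(n)) for i in range(n)]
    ops += n * n
    return x, ops


def strategy_solve(A, gx):
    """Factor A = L U and solve A x' = (Gx) directly, never forming A^{-1}."""
    M, ops = crout_lu(A)
    x, c = lu_solve(M, gx)
    return x, ops + c
```

Both routes return the same x' up to rounding. For N = 6 the sketch counts 322 multiplications/divisions on the inverse route versus 106 on the solve route; the factorization alone accounts for (N³ − N)/3 = 70 of each.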

Step 2(b) of this strategy requires $\Sigma_4^{\text{Step 2(b)}} = N^2$ m/ds. Consequently, strategy $\Sigma_4$ has the complexity given by relation (7).

Alternative $\Sigma_5$: Least squares-based square-root sequential system solution to avoid finding and designating $A^{-1}$ as precoder $P$
As a test specimen, let us take Potter's square-root least squares (LS) algorithm [10, pp. 250-251]. It avoids matrix inversion by processing the matrix row by row while solving $Ax' = (Gx)$.
Modeled on the LS algorithm [10, p. 250] and using complex-conjugate matrix transposition where needed, we obtain the following procedure.
I. Initialization. Initial values: the LS estimator $x_0$ and its covariance $P_0$. Since a priori data are lacking, $x_0 = 0$, and $P_0$ is built from a parameter ε taken as small as possible, theoretically ε → 0.
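II. Processing of the n-th equation $a_n^T x' = z_n$, where $a_n^T$ is the n-th row of A (with $a_n$ viewed as a column) and $z_n$ is the n-th entry of (Gx) in the system $Ax' = (Gx)$; this item yields the updated estimate $\hat x$ and the updated square-root factor $\hat S$. Cycle on n = 1, …, N.
III. Propagating the current solution estimate to the next matrix row before repeating Item II: $S := \hat S$, $x := \hat x$.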
On exit from Item III with n = N, the estimate $\hat x$ is the desired solution $x'$ of the system $Ax' = (Gx)$.
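A minimal sketch of a Potter-type square-root sequential solver for $Ax' = (Gx)$, written for complex data with conjugate transposes. The update formulas are the standard Potter measurement update, not a transcription of the formulas in [10]; the choices $S_0 = I/\varepsilon$ (so that $P_0 = S_0 S_0^{\mathsf H}$ is large) and measurement variance r = 0 (every equation treated as exact) are assumptions of this sketch. With r = 0, each row update enforces its own equation exactly and keeps the earlier ones satisfied, so after n = N the estimate solves the system in exact arithmetic for any ε. The function name potter_solve is hypothetical.

```python
import math


def potter_solve(A, gx, eps=1e-3, r=0.0):
    """Solve A x' = (Gx) row by row with a Potter-type square-root update.

    Assumptions of this sketch: x0 = 0, S0 = I / eps (P0 = S0 S0^H),
    and measurement variance r (r = 0 treats every equation as exact).
    """
    n = len(A)
    x = [0j] * n
    S = [[(1.0 / eps) if i == j else 0j for j in range(n)] for i in range(n)]
    for a, z in zip(A, gx):
        # Row equation sum_j a[j] x[j] = z, i.e. h^H x = z with h = conj(a).
        # f = S^H h, whose j-th entry is conj(sum_i S[i][j] * a[i]).
        f = [sum(S[i][j] * a[i] for i in range(n)).conjugate() for j in range(n)]
        sigma = sum(abs(fj) ** 2 for fj in f)
        alpha = 1.0 / (sigma + r)
        gamma = 1.0 / (1.0 + math.sqrt(alpha * r))
        # Gain K = alpha * S f
        K = [alpha * sum(S[i][j] * f[j] for j in range(n)) for i in range(n)]
        innovation = z - sum(a[j] * x[j] for j in range(n))
        # Item II: updated estimate x^ and square-root factor S^ = S - gamma K f^H
        x_hat = [x[i] + K[i] * innovation for i in range(n)]
        S_hat = [[S[i][j] - gamma * K[i] * f[j].conjugate() for j in range(n)]
                 for i in range(n)]
        # Item III: propagate to the next matrix row
        S, x = S_hat, x_hat
    return x
```

The design keeps only the factor S, never the covariance itself, and touches A one row at a time; with r > 0 the same loop computes a regularized LS estimate instead of the exact solution.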
Counting shows that strategy $\Sigma_5$ has the complexity given by relation (8).

Alternative $\Sigma_6$: Channel QR-decomposition based system solution to avoid finding and designating $A^{-1}$ as precoder $P$
In its most streamlined layout, this algorithm comprises three steps, as follows.
Step 2: $\bar Y$ is a scratch array, initialized prior to action. Let T be the Householder transformation. To present its pseudocode more compactly, we use the notation $0_{N-k}$ for a zero vector of dimension N − k.
Step 3: Solving the system $Rx' = b$ by back substitution requires $\Sigma_6^{\text{Step 3}} = N(N+1)/2$ m/ds (multiplications/divisions). As measured in m/ds, strategy $\Sigma_6$ has the complexity given by relation (9).
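A minimal sketch of the QR route in plain Python: Householder reflections are applied to a scratch array holding $[H \mid F^{-1}(Gx)]$, so that R and $b = Q^{\mathsf H} F^{-1}(Gx)$ come out together and Q is never formed; back substitution then solves $Rx' = b$. As in the other sketches, $F^{-1} = \operatorname{diag}(H)$ is an assumption, householder_qr_solve and sigma6_input are hypothetical names, and the step layout need not match the paper's.

```python
import math


def householder_qr_solve(H, rhs):
    """Solve H x = rhs via Householder QR of the augmented array [H | rhs].

    The reflections reduce H to upper triangular R and turn rhs into
    b = Q^H rhs, so Q is never stored; back substitution solves R x = b.
    """
    n = len(H)
    Y = [list(H[i]) + [rhs[i]] for i in range(n)]   # scratch array [H | rhs]
    for k in range(n - 1):
        col = [Y[i][k] for i in range(k, n)]
        norm = math.sqrt(sum(abs(c) ** 2 for c in col))
        if norm == 0.0:
            continue
        phase = col[0] / abs(col[0]) if col[0] != 0 else 1.0
        alpha = -phase * norm                       # sign choice avoids cancellation
        v = col[:]
        v[0] -= alpha
        vnorm2 = sum(abs(c) ** 2 for c in v)
        for j in range(k, n + 1):                   # apply I - 2 v v^H / (v^H v)
            s = sum(v[i].conjugate() * Y[k + i][j] for i in range(len(v)))
            t = 2.0 * s / vnorm2
            for i in range(len(v)):
                Y[k + i][j] -= t * v[i]
    x = [0j] * n
    for i in reversed(range(n)):                    # back substitution R x = b
        s = Y[i][n] - sum(Y[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / Y[i][i]
    return x


def sigma6_input(H, gx):
    """x' with A x' = (Gx), A = F H and F^{-1} = diag(H) (assumed),
    i.e. the solution of H x' = F^{-1} (Gx)."""
    rhs = [H[i][i] * gx[i] for i in range(len(H))]
    return householder_qr_solve(H, rhs)
```

Because the reflections are unitary, this route never squares the condition number of H, which is the usual reason for preferring it when H is ill conditioned.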

Complexity: Implementation tradeoff analysis in precoder design from the perspective of matrix inversion
For the six strategies considered, relations (4), (5), (6), (7), (8), and (9) characterize their complexity. To compare them, we introduce a complexity trend index associated with passing from $\Sigma_i$ to $\Sigma_j$: $\Delta_{ij} \triangleq \Sigma_i - \Sigma_j$. We also propose the limit relative indices $\delta_{ij} \triangleq \lim_{N\to\infty} \Sigma_i/\Sigma_j$, obtained for N → ∞ when moving from $\Sigma_i$ to $\Sigma_j$, and summarize the exact results in Table 1.
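To see what the two indices measure, assume (as is usual for dense factorization-based methods, and as an assumption here) that each count is a cubic polynomial in N; the coefficients $\alpha_i, \beta_i, \gamma_i, \eta_i$ below are placeholders, not values taken from relations (4)–(9):

```latex
\begin{aligned}
\Sigma_i(N) &= \alpha_i N^3 + \beta_i N^2 + \gamma_i N + \eta_i, \qquad \alpha_i > 0, \\
\Delta_{ij}(N) &= (\alpha_i-\alpha_j)N^3 + (\beta_i-\beta_j)N^2 + (\gamma_i-\gamma_j)N + (\eta_i-\eta_j), \\
\delta_{ij} &= \lim_{N\to\infty}
  \frac{\alpha_i + \beta_i/N + \gamma_i/N^2 + \eta_i/N^3}
       {\alpha_j + \beta_j/N + \gamma_j/N^2 + \eta_j/N^3}
  = \frac{\alpha_i}{\alpha_j}.
\end{aligned}
```

So $\delta_{ij}$ depends only on the leading coefficients, whereas $\Delta_{ij}$ retains the lower-order terms that still matter at moderate N. For instance, the statement that $\Sigma_5$ imposes three times the load of $\Sigma_1$ as N → ∞ amounts to $\delta_{51} = 3$, i.e., $\alpha_5 = 3\alpha_1$.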

Numerical Experiments
For an expanded analysis, we rename $\Sigma_3$ as $\Sigma_{3(1)}$ and $\Sigma_4$ as $\Sigma_{4(1)}$. In parallel with them, we test two more versions, labeled $\Sigma_{3(2)}$ and $\Sigma_{4(2)}$, which differ in using the Jordanian LU-decomposition instead of the Gaussian one. Thus, we test the following eight strategies for finding the desired solution $x'$:
• $\Sigma_1$: Gaussian LU-decomposition followed by forward and backward substitutions to designate $A^{-1}$ as precoder $P$;
• $\Sigma_2$: Jordanian LU-decomposition followed by forward substitution only to designate $A^{-1}$ as precoder $P$;
• $\Sigma_{3(1)}$: Gaussian LU-decomposition followed by finding the elimination form of $A^{-1}$ to designate it as precoder $P$;
• $\Sigma_{3(2)}$: Jordanian LU-decomposition followed by finding the elimination form of $A^{-1}$ to designate it as precoder $P$;
• $\Sigma_{4(1)}$: Gaussian LU-decomposition followed by system solution to avoid finding and designating $A^{-1}$ as precoder $P$;
• $\Sigma_{4(2)}$: Jordanian LU-decomposition followed by system solution to avoid finding and designating $A^{-1}$ as precoder $P$;
• $\Sigma_5$: least squares-based square-root sequential system solution to avoid finding and designating $A^{-1}$ as precoder $P$;
• $\Sigma_6$: channel QR-decomposition based system solution to avoid finding and designating $A^{-1}$ as precoder $P$.
In the first four strategies, once $A^{-1}$ has been found, the product $\tilde x' = P(Gx)$ with $P := A^{-1}$ is taken as the computed solution. In the last four strategies, $\tilde x'$ is the result of solving the system $Ax' = (Gx)$. Apart from the method delivering the solution $\tilde x'$, it is essential to know its accuracy, i.e., the error $e \triangleq \tilde x' - x'$. Because the exact solution $x'$ is unknown, all that remains is to check the residual $r \triangleq A\tilde x' - (Gx)$. Evidently, $A^{-1} r = e$, so $\|e\|_\infty \le \|A^{-1}\|_\infty \|r\|_\infty$. Since the matrices A for downstream VDSL channels are expected to possess the property of row-wise diagonal dominance [5, p. 860], and are therefore well conditioned, one may with good reason estimate accuracy by $\|r\|_\infty = \|A\tilde x' - (Gx)\|_\infty$ for all the strategies listed above.
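To make the residual-based estimate concrete, one can combine $e = A^{-1} r$ with a standard result that is not taken from this paper, Varah's bound: for a strictly row diagonally dominant A, $\|A^{-1}\|_\infty \le 1/\min_i(|a_{ii}| - \sum_{j\ne i}|a_{ij}|)$. The minimal sketch below turns a computed residual into a guaranteed upper bound on $\|e\|_\infty$; the function names are hypothetical.

```python
def residual_inf_norm(A, x, gx):
    """||r||_inf with r = A x~' - (Gx)."""
    return max(abs(sum(a * xj for a, xj in zip(row, x)) - g)
               for row, g in zip(A, gx))


def varah_bound(A):
    """Upper bound on ||A^{-1}||_inf for a strictly row diagonally dominant A."""
    margins = [abs(row[i]) - sum(abs(a) for j, a in enumerate(row) if j != i)
               for i, row in enumerate(A)]
    m = min(margins)
    if m <= 0:
        raise ValueError("A is not strictly row diagonally dominant")
    return 1.0 / m


def error_bound(A, x, gx):
    """Upper bound on ||e||_inf = ||x~' - x'||_inf via ||e|| <= ||A^{-1}|| ||r||."""
    return varah_bound(A) * residual_inf_norm(A, x, gx)
```

For example, with A = [[1.0, 0.25], [0.5, 1.0]] the diagonal-dominance margins are 0.75 and 0.5, so varah_bound(A) returns 2.0: any computed solution is then within twice its residual norm of the exact one.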
We implemented eight m-functions in MATLAB. Using these implementations, we conducted computational experiments with a set of test matrices $H_k \in \mathbb{C}^{N\times N}$ on one arbitrarily selected tone k with N = 20, 40, 80, …, 800. For each matrix $H_k$ and a given complex vector (Gx), we saved the computed solution $\tilde x'$, the execution time (in seconds), and the accuracy of computations $\|r\|_\infty$. Figure 5 and Table 2 show the results. They show that the newly proposed strategy $\Sigma_5$ has a nearly minimal execution time and the best accuracy of computations for all test matrices. The accuracy of $\Sigma_6$ is not listed in Table 2 because it is comparable to that of $\Sigma_5$. This suggests that $\Sigma_5$ is a numerically efficient method for practical applications with respect to both computational complexity and accuracy.
The following findings are of primary interest to network carrier management.
• The fifth strategy, $\Sigma_5$, may seem unusual or even 'exotic.' As N tends to infinity, it imposes three times the computational load of $\Sigma_1$. Nevertheless, our MATLAB experiments show that it is the most accurate and among the fastest of the computational procedures compared. While $\Sigma_5$ owes its good accuracy to operating with square roots of the matrix $A^T A$ that appears in the LS normal equations, its speed in MATLAB indicates that $\Sigma_5$ has coding tricks in store for programmers. In that context, the newly proposed strategy $\Sigma_5$ may be useful in hardware implementations.
• Although $\Sigma_6$ requires twice as many computations as $\Sigma_4$, it is attractive because of its high numerical robustness against an ill-conditioned channel transfer matrix H.