Implementing smooth functions of a Hermitian matrix on a quantum computer

We consider methods for implementing smooth functions f(A) of a sparse Hermitian matrix A on a quantum computer, and analyse a further combination of these techniques which has advantages of simplicity and resource consumption in some cases. Our construction uses the linear combination of unitaries method with Chebyshev polynomial approximations. The query complexity we obtain is O(log(C/ε)), where ε is the approximation precision and C > 0 is an upper bound on the magnitudes of the derivatives of the function f over the domain of interest. The success probability depends on the 1-norm of the Taylor series coefficients of f, the sparsity d of the matrix, and inversely on the smallest singular value of the target matrix f(A).


Introduction
There are many quantum algorithms that exhibit an advantage over known classical algorithms. Such a diversity of results can make it difficult to express the computational capability of a quantum computer in a clear and faithful manner to someone new to quantum computing. Here we address a large family of quantum algorithms that are often used as the main subroutine in many important applications. In particular, we consider quantum algorithms that apply some smooth function of a Hermitian matrix to an input state. Examples of algorithms of this type include Hamiltonian simulation, used in a variety of applications including quantum chemistry; the Quantum Linear Systems or matrix inversion algorithm [HHL09, CKS15], used in quantum machine learning applications; and sampling from Gibbs distributions, used in the recent quantum semi-definite programming (SDP) solvers [BKL + 17, AGG + 17].
We briefly outline the methods for implementing matrix functions f(A) used in these algorithms, and we further use one of them to obtain an algorithm with query complexity expressed simply in terms of properties of the matrix A and the Taylor expansion of f. Restricting to Hermitian matrices allows one to take advantage of their spectral decomposition to naturally extend results on approximating real functions to functions of matrices. Thus, for a real-valued smooth function f, we look for a quantum algorithm which, when equipped with a quantum oracle for a Hermitian matrix A and a map that prepares some state |x⟩, returns a state that is close in l2-norm to f(A)|x⟩, suitably normalised. If measurements are involved, we require the desired output state to be obtained with high probability. Among the earliest work of this kind, Klappenecker and Rötteler [KR03] studied the implementation of functions of unitary matrices, under some mild assumptions. This work embodies the same principles and ideas that were later developed into a method for Hermitian matrices in works of Berry, Childs, Kothari and collaborators. Other early work in this direction was motivated by studies on Hamiltonian simulation (for example, [SOK + 02] and [CW12]), and algorithms for quantumly solving ordinary differential equations (for example, [LO08] describe a method to implement non-linear transformations of the amplitudes of a given input state). Kothari [Kot14] gives a detailed description of probabilistic implementations of operators and technical lemmas on modified versions of amplitude amplification. Broadly, three different methods have emerged to realise the action of functions of Hermitian matrices on a quantum computer:
1. Using Hamiltonian simulation and Quantum Phase Estimation (QPE) as a technique to obtain a representation of the target state in the spectral basis of the matrix, followed by applying a suitably engineered unitary that computes the matrix function.
This is the method used in the matrix inversion algorithm of [HHL09] and in the quantum recommendation systems algorithm of [KP16].
2. By representing the target matrix as a Linear Combination of Unitaries (LCU). Given an algorithm to implement each of the unitaries in the summation, the target matrix can be embedded in a unitary operation on a larger state space (adjoining the necessary ancillary qubits). This necessitates a post-selection step at the end, and so produces the required state with some associated success probability. This method has been used widely for Hamiltonian simulation and matrix inversion algorithms [CW12, BCC + 15, CKS15].
3. More recently, Low and Chuang [LC16] have introduced a method called Qubitization and Quantum Signal Processing (QSP) (also referred to as the block-encoding method). They study how to implement functions of a diagonalisable complex matrix when provided with a unitary oracle and a signal state such that the action of the unitary on the subspace flagged by the signal state is proportional to the action of the target matrix. This method is thought to have optimal query complexity and optimal ancilla requirements for a large class of functions.
Separately, [AGG + 17] combine the LCU method of [CKS15] with an approximation by Fourier series to provide a constructive approach for the implementation of bounded smooth functions of a d-sparse Hermitian matrix (specified by an oracle) on quantum computers. They give an algorithm for constructing the approximating linear combination, and their quantum algorithm has query complexity linear in the sparsity d. This method uses Hamiltonian simulation as a black-box subroutine and involves converting the Taylor series of the function into a Fourier series, through a sequence of approximation steps. Both [LC16] and [GSL + 18] also prove theorems about the classes of functions their methods can implement and describe the approach to construct the implementation for important examples like Hamiltonian simulation.
In this article, we first describe the spectral method, the LCU method, and the Qubitization method in sections 3, 4, and 5 respectively. We then give a constructive approach for implementing smooth functions of a Hermitian matrix on a quantum computer in section 6, based on the method of Chebyshev polynomial approximations used for matrix inversion in [CKS15]. The complexity of this alternative method is not directly comparable to that of [AGG + 17] but the main attraction is its simplicity. The use of Chebyshev polynomials allows the query complexity of the quantum algorithm to be directly obtained from the properties of the Taylor series expansion of f.

Preliminaries
We follow the usual model in which quantum algorithms access classical data (such as matrix entries) using a unitary oracle, and use the query complexity as a measure of efficiency. Another important quantity in the circuit model of quantum algorithms is the gate complexity, the number of 2-qubit gates used. An algorithm is gate efficient if its gate complexity is larger than the query complexity by at most poly-logarithmic factors. For details on the properties of the quantum walk construction including its gate complexity, we refer to [BC12] and [BCK15].
A matrix is d-sparse if in any row there are at most d non-zero entries. For a d-sparse N×N matrix A, we assume we have an oracle O_A which performs two functions:

O_A |j, k⟩|z⟩ = |j, k⟩|z ⊕ A_jk⟩,   for all j, k ∈ {1, …, N},
O_A |j, l⟩ = |j, ν(j, l)⟩,   for all j ∈ {1, …, N} and l ∈ {1, …, d}.   (1)

The first line simply returns the matrix entry A_jk in fixed-precision arithmetic. The second line computes ν(j, l), the column index of the lth non-zero entry in row j, with the convention that it returns the column index of the first zero entry when there are fewer than l non-zero entries in row j. When this can be efficiently done, the matrix is said to be efficiently row-computable.
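As a point of reference, both functions of the oracle can be mimicked classically. The sketch below (in Python; the helper name `make_oracle` is ours) follows the conventions just described, including the first-zero-entry convention for rows with fewer than l non-zero entries.

```python
import numpy as np

def make_oracle(A):
    """Classical stand-in for the sparse-access oracle: one query returns an
    entry A[j, k]; the other returns the column of the l-th non-zero entry
    in row j (l is 1-based, as in the text)."""
    A = np.asarray(A)
    def entry(j, k):
        return A[j, k]
    def col_index(j, l):
        nz = np.flatnonzero(A[j])              # columns of non-zero entries in row j
        if l <= len(nz):
            return int(nz[l - 1])
        # Convention: with fewer than l non-zeros, return the first zero column.
        return int(np.flatnonzero(A[j] == 0)[0])
    return entry, col_index

# A 2-sparse Hermitian example.
A = np.array([[0.0, 2.0, 0.0, 1.0],
              [2.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 0.0],
              [1.0, 0.0, 0.0, 0.0]])
entry, col = make_oracle(A)
assert entry(0, 3) == 1.0
assert col(0, 2) == 3   # the second non-zero of row 0 sits in column 3
assert col(1, 2) == 1   # row 1 has only one non-zero, so the first zero column is returned
```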
Functions of a matrix are defined through its spectral decomposition. An N×N Hermitian matrix A has N real eigenvalues λ_j, with a spectral decomposition A = ∑_j λ_j |u_j⟩⟨u_j| as a sum of projectors onto its eigenspaces, spanned by the eigenvectors |u_j⟩. A function f of such a matrix is then defined as f(A) = ∑_j f(λ_j) |u_j⟩⟨u_j|, having the same eigenspaces but with the eigenvalues mapped to f(λ_j). For finite-dimensional matrices, any two functions f and g that are equal at the N eigenvalues of the matrix give rise to the same matrix function. This suggests that it might be possible to treat matrix functions via corresponding interpolating polynomials.
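As a concrete classical illustration of this definition, the following numpy sketch applies f through the eigendecomposition (the helper name `matrix_function` is ours), and checks the interpolation remark: two functions agreeing on the spectrum yield the same matrix function.

```python
import numpy as np

def matrix_function(A, f):
    """f(A) = sum_j f(lambda_j) |u_j><u_j| for Hermitian A."""
    evals, evecs = np.linalg.eigh(A)            # real eigenvalues, orthonormal eigenvectors
    return (evecs * f(evals)) @ evecs.conj().T  # scale column j by f(lambda_j)

# Example: exp(A) for a Pauli-X-like matrix with spectrum {-1, +1}.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
expA = matrix_function(A, np.exp)

# Any function agreeing with exp on the spectrum gives the same matrix
# function: here, the line through (-1, 1/e) and (+1, e).
p = lambda x: np.cosh(1.0) + np.sinh(1.0) * x
assert np.allclose(expA, matrix_function(A, p))
```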

The spectral method
The first of the methods mentioned in the introduction uses Hamiltonian simulation and phase estimation as algorithmic primitives. The general framework is as follows. Given some initial state |ψ⟩, the first step is to perform Quantum Phase Estimation (QPE) on the Hamiltonian evolution under the N×N Hermitian matrix A, logically decomposing the state in the spectral basis of A as a linear combination of product states, with an ancillary register containing (approximate) eigenvalues of A normalised to [0, 1]. To this end, the first step in QPE is to simulate Hamiltonian evolution under A, which is the operation of (approximately) preparing the state e^{iAt}|ψ⟩ for a chosen time t; for QPE, we will actually require the Hamiltonian evolution to be performed in superposition for a set of time parameters, which can be done using a control register and a conditional evolution operator of the form ∑_t |t⟩⟨t| ⊗ e^{iAt}. At this stage the novelty lies in constructing an operation that uses the output state from the previous step to transform the probability amplitudes by the chosen function of the eigenvalues, typically a unitary conditioned on the register containing the eigenvalues. Schematically (as in, e.g., HHL),

|ψ⟩ = ∑_j a_j |u_j⟩ ↦ ∑_j a_j |u_j⟩|λ̃_j⟩ ↦ C ∑_j f(λ̃_j) a_j |u_j⟩|λ̃_j⟩,

where the λ̃_j approximate the true eigenvalues λ_j, and C is a normalisation factor that is O(1/max_j f(λ̃_j)). All three steps will have some associated error, and the last step will typically only succeed probabilistically. Choosing a precision ε > 0, the final state ∑_{j=1}^{N} f(λ̃_j) a_j |u_j⟩ (having uncomputed the eigenvalue register coming from QPE) can be made ε-close to f(A)|ψ⟩, suitably normalised. Harrow, Hassidim and Lloyd [HHL09] used this method to devise an algorithm for solving a system of linear equations Ax⃗ = b⃗, in a formulation called the Quantum Linear Systems Problem (QLSP). The key step is matrix inversion, corresponding to the function f(λ_j) = 1/λ_j.
Given the oracle in (1) for a Hermitian matrix A, and a state preparation map for an input vector b⃗, the HHL algorithm outputs a quantum state |x̃⟩ that is ε-close in the l2-norm to the normalised solution vector x⃗/∥x⃗∥. The matrix inversion routine essentially implements conditional rotations, using the phase-estimated eigenvalues to create the necessary amplitude factors. The transformation achieved by a simple conditional rotation is

|λ⟩|0⟩ ↦ |λ⟩ ( √(1 − C²/λ²) |0⟩ + (C/λ) |1⟩ ),

where λ ∈ [−1, 1] and is represented to fixed precision in the first qubit register. Cao et al [CPP + 13] use the HHL algorithm to solve the Poisson equation under some regularity assumptions, and give details of efficient quantum subroutines for the string of computations which are useful in matrix inversion. The actual function that is implemented is not quite 1/x, but a carefully chosen filter function that matches the inverse on the domain of interest and ensures that the error can be kept under control. The choice of filter functions and their error analysis forms a major part of the work in designing the conditional unitary that implements the desired function.
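The amplitude factors produced by such a conditional rotation can be checked directly. Below is a small numerical sketch (a toy model of the rotation step only, not the HHL circuit), which rotates a flag qubit so that the |1⟩ amplitude equals C/λ.

```python
import numpy as np

def conditional_rotation(lam, Cn):
    """Flag-qubit amplitudes after rotating by theta = 2 arcsin(Cn/lam):
    |lam>|0>  ->  |lam>( sqrt(1 - Cn^2/lam^2)|0> + (Cn/lam)|1> )."""
    theta = 2.0 * np.arcsin(Cn / lam)
    return np.array([np.cos(theta / 2.0), np.sin(theta / 2.0)])

lam, Cn = 0.8, 0.1                     # Cn <= |lam| keeps the rotation well defined
amp0, amp1 = conditional_rotation(lam, Cn)
assert np.isclose(amp1, Cn / lam)      # the 1/lambda amplitude factor used for inversion
assert np.isclose(amp0**2 + amp1**2, 1.0)
```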

Complexity and hardness results for matrix inversion
The HHL algorithm was originally presented as a method for the efficient estimation of averages or other statistical quantities associated with the probability distribution corresponding to the normalised solution vector of a linear system of equations. The advantage of the algorithm lies in being able to prepare this probability distribution as a quantum state, and the state may then be used to sample from this distribution, or to compute quantities of the form x⃗†Mx⃗ for an operator M. When the matrix A is d-sparse and efficiently row-computable, the algorithm has query complexity poly(d, κ, 1/ε). Here, ε is the accuracy to which the output state approximates the normalised solution vector, and κ is the condition number of the matrix A. The condition number is given by the ratio of the largest to the smallest eigenvalues (in magnitude) for a Hermitian matrix, and measures how invertible the matrix is: κ → ∞ reflects an eigenvalue of the matrix going to zero, making it singular. The dependence of the algorithm on the condition number enters through the filter function, which is chosen such that it matches 1/x on the domain [−1, −1/κ] ∪ [1/κ, 1] and interpolates between these two pieces in [−1/κ, 1/κ]. Consequently the success probability of the conditional rotation step also involves κ.
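To illustrate why such a filter keeps the amplitudes under control, here is a minimal sketch of one possible filter (our own simple linear interpolant, not the one analysed in [HHL09]): it agrees with 1/x for |x| ≥ 1/κ and stays bounded by κ everywhere.

```python
import numpy as np

def inverse_filter(x, kappa):
    """Equals 1/x for |x| >= 1/kappa; inside the window it interpolates
    linearly (kappa^2 * x), matching +-kappa at the endpoints."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.empty_like(x)
    big = np.abs(x) >= 1.0 / kappa
    out[big] = 1.0 / x[big]
    out[~big] = kappa**2 * x[~big]
    return out

kappa = 10.0
xs = np.linspace(-1.0, 1.0, 1001)
assert np.all(np.abs(inverse_filter(xs, kappa)) <= kappa + 1e-9)  # bounded by kappa
assert np.isclose(inverse_filter(0.5, kappa)[0], 2.0)             # matches 1/x outside the window
```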
In the regime where the sparsity d = O(log N), the HHL algorithm can be exponentially faster than the best known classical algorithms achieving the same results. It is still unknown whether better classical algorithms exist in the slightly modified framework of QLSP, i.e., the oracular setting where the goal is to compute global or statistical properties of the solution vector.
Nevertheless, the complexity of this algorithm was shown by [HHL09] to be nearly optimal: (1) under the complexity-theoretic assumption BQP ≠ PSPACE, the dependence on the condition number cannot be made sublinear, i.e. improved to κ^{1−δ} for any δ > 0, and (2) under the assumption that BQP ≠ PP, the error dependence cannot be improved to poly(log(1/ε)). The proof is an efficient reduction from simulating a general quantum circuit to matrix inversion, establishing that matrix inversion is a BQP-complete problem, so that the existence of an efficient classical algorithm for this problem implies the ability to classically simulate quantum mechanics efficiently, which is widely believed to be impossible. The class BQP consists of problems that have a bounded-error polynomial-time quantum algorithm succeeding with a constant probability p_c > 1/2, while PP consists of problems with probabilistic polynomial-time classical algorithms whose success probability is allowed to be arbitrarily close to 1/2.
For the problem of preparing the state encoding the probability distribution corresponding to the solution vector, avoiding errors and repetitions from the sampling step, Childs, Kothari and Somma [CKS15] designed an algorithm for QLSP with exponentially improved dependence on the precision parameter ε. The key ingredient in their approach is the method of writing the target operator, A^{−1} in this case, as a linear combination of unitary operators. We discuss this LCU method in the next section.

The Linear Combination of Unitaries (LCU) method
One of the disadvantages of using QPE is that achieving ε-precision requires Ω(1/ε) uses of the matrix oracle. The LCU method offers a way to overcome this disadvantage by exploiting results from approximation theory.
The LCU method is a way to probabilistically implement an operator specified as a linear combination of unitary operators with known implementations. In essence, we construct a larger unitary matrix of which the matrix f(A) is a sub-matrix or block. Childs and Wiebe [CW12] show how to implement a sum of two unitaries. We describe this simple case below.
Suppose A = α_0 U_0 + α_1 U_1. Without loss of generality α_i > 0, since phase factors can be absorbed into the unitaries. Consider a state preparation unitary V_α which has the action

V_α |0⟩ = (√α_0 |0⟩ + √α_1 |1⟩)/√α,

where α = α_0 + α_1. When dealing with a linear combination of more than two unitaries, there is a lot of freedom in the choice of this V_α, as we will see later.
Assume that we can perform the unitaries U_0 and U_1 controlled on an ancillary qubit, i.e., that we can apply the conditional unitary U = |0⟩⟨0| ⊗ U_0 + |1⟩⟨1| ⊗ U_1. Then, for a state |ψ⟩ for which we want A|ψ⟩, first attach an ancillary qubit and perform the map V_α on it, followed by U, and finally uncompute the ancilla with V_α†. This results in the following transformation:

|0⟩|ψ⟩ ↦ (1/α) |0⟩ A|ψ⟩ + |Φ⊥⟩,

where the (unnormalised) state |Φ⊥⟩ has its ancilla component orthogonal to |0⟩. Measuring the ancilla and getting the outcome 0 will leave behind the state A|ψ⟩, up to normalisation. Getting a measurement outcome of 1 means the algorithm fails. The probability of failure, p_fail = 1 − ∥A|ψ⟩∥²/α², can be easily bounded using the fact that the distance between two unitaries is at most 2.
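This two-unitary case is small enough to check exactly with matrices. The numpy sketch below (with an arbitrary unitary completion for the second column of V_α) verifies that postselecting the ancilla on |0⟩ leaves (A/α)|ψ⟩.

```python
import numpy as np

# Target: A = a0*U0 + a1*U1 with a0, a1 > 0 and alpha = a0 + a1.
a0, a1 = 0.6, 0.9
U0, U1 = np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])   # identity and Pauli X
A, alpha = a0 * U0 + a1 * U1, a0 + a1

# V_alpha: |0> -> (sqrt(a0)|0> + sqrt(a1)|1>)/sqrt(alpha); the second
# column is any completion to a unitary.
c = np.array([np.sqrt(a0), np.sqrt(a1)]) / np.sqrt(alpha)
V = np.array([[c[0], -c[1]], [c[1], c[0]]])

# select(U) = |0><0| (x) U0 + |1><1| (x) U1, ancilla as the first factor.
selU = np.block([[U0, np.zeros((2, 2))], [np.zeros((2, 2)), U1]])

W = np.kron(V.conj().T, np.eye(2)) @ selU @ np.kron(V, np.eye(2))

psi = np.array([1.0, 0.0])
out = W @ np.kron(np.array([1.0, 0.0]), psi)   # ancilla starts in |0>

# The ancilla-|0> branch carries (A/alpha)|psi>; its squared norm is the
# success probability of the postselection.
assert np.allclose(out[:2], (A / alpha) @ psi)
```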

Success probability and complexity
In most cases of interest, the probability of success can be increased using amplitude amplification, by repeating the procedure O(α/∥A|ψ⟩∥) times, when we have an estimate of the norm in the denominator. This gives a quantum algorithm that prepares the desired quantum state with constant success probability and outputs a single bit indicating whether it was successful. Indeed, when the LCU method is used to implement non-unitary operations such as matrix inversion, a significant contribution to the complexity comes from the fact that the success probability is usually small.
This framework was investigated in detail by Berry, Childs, Kothari and coworkers. The resulting method of implementing linear combinations of unitaries makes it straightforward to translate results on approximating real functions into implementations for matrix functions: since Hermitian matrices have all real eigenvalues, we just need to find a good approximation to the real function f (x). The overall query complexity of the algorithm will depend on the 1-norm or weight α of the coefficients of the linear combination, the number of terms m, and the least eigenvalue of the matrix function f (A).
Thus, finding a new algorithm boils down to optimising the first two parameters, and getting good bounds on the eigenvalues of f(A). We also need to choose the basis functions used in the approximation in such a way that, on translating the statement to matrix functions, we get unitaries which we know how to implement; for example, a Fourier series in e^{ixt} gives rise to unitaries e^{iAt} that can be implemented via Hamiltonian simulation techniques.
[CKS15] used this method for the function f(x) = 1/x to obtain improved error dependence for the QLSP. The dependence of the complexity of their algorithm on the precision parameter is O(log(1/ε)), an exponential improvement over the precision dependence in [HHL09]. This is achieved by carefully choosing the approximating series and its truncation, such that ε precision is achieved with m = O(log(1/ε)) terms.
For completeness, a full description of the LCU method is included in appendix F. We now turn to the most recent method of implementing matrix functions, the method of qubitization and quantum signal processing, before returning to the LCU method in section 6.

Qubitization or the block-encoding method
Introduced by Low and Chuang [LC16, LC17], this framework subsumes and generalises the LCU method. The method can be split into two steps: qubitization and quantum signal processing. The idea behind quantum signal processing is to ask what kinds of 2×2 (block) unitaries can be obtained by iterating a given unitary operator, interleaving the iteration with single-qubit rotations through different angles. Thus, picking a sequence of phases φ_0, φ_1, …, φ_k, we ask what class of unitaries can be represented in the form

e^{iφ_0 Z} W(x) e^{iφ_1 Z} W(x) ⋯ e^{iφ_k Z} = [ P(x), iQ(x)√(1−x²) ; iQ*(x)√(1−x²), P*(x) ]

for some functions P(x) and Q(x), where W(x) is a signal unitary with ⟨0|W(x)|0⟩ = x (one common convention is W(x) = [ x, i√(1−x²) ; i√(1−x²), x ]), and work out what properties these functions can have. We can also calculate what choice of phases φ_j gives rise to a certain P and Q. This then provides a method for constructing functions of W using iteration alternating with a sequence of single-qubit rotations.
Qubitization, as the name suggests, is a technique for obtaining a suitable block encoding of an input matrix A, on which quantum signal processing can then be applied. The input is an (m+n)-qubit unitary matrix U that encodes an n-qubit normal operator A (i.e. AA† = A†A) in its top-left block. More precisely, given a signal state |G⟩ = Ĝ|0⟩ that flags a subspace of the signal space, the input unitary U has the block form

U = [ A, · ; ·, · ],   i.e.   (⟨G| ⊗ I) U (|G⟩ ⊗ I) = A

is the encoded operator (where we assume A is normalised so that ∥A∥ ≤ 1). Since normal operators have a spectral decomposition, we can write A = ∑_λ λ e^{iθ_λ} |λ⟩⟨λ|, with magnitudes λ ≥ 0 and phases θ_λ. The goal is to use U, Ĝ, their controlled versions and conjugates to obtain a unitary W that can be expressed as a direct sum over SU(2)-invariant subspaces, i.e., to 'qubitize' U, with the 2×2 block on each invariant subspace having entries given by real functions f and g of the eigenvalues, where φ⃗ represents the sequence of phases φ_j. The construction corresponds to quantum signal processing when the functions f and g have opposite parity, in which case the matrix function is implemented as a 2×2 block unitary. Furthermore, the algorithms using this technique generally achieve polylogarithmic dependence on precision, and are shown to have optimal query complexity in several cases. The ancillary qubit requirement is brought down significantly, since the primitive gates used are single-qubit rotations and controlled versions of the input unitaries.
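The signal-processing step can be illustrated numerically. In the sketch below (phase conventions vary across the literature; this uses the W(x) convention quoted above), the trivial phase sequence reproduces a Chebyshev polynomial in the top-left entry, which is the connection to the quantum walk exploited later in this article.

```python
import numpy as np

def qsp_unitary(x, phases):
    """e^{i phi_0 Z} W(x) e^{i phi_1 Z} W(x) ... e^{i phi_k Z} for the
    signal operator W(x) = [[x, i*s], [i*s, x]] with s = sqrt(1 - x^2)."""
    s = np.sqrt(1.0 - x**2)
    W = np.array([[x, 1j * s], [1j * s, x]])
    Zrot = lambda phi: np.diag([np.exp(1j * phi), np.exp(-1j * phi)])
    U = Zrot(phases[0])
    for phi in phases[1:]:
        U = U @ W @ Zrot(phi)
    return U

# With all phases zero, the product is W(x)^k, whose top-left entry is the
# Chebyshev polynomial T_k(x) = cos(k arccos x).
x, k = 0.3, 5
U = qsp_unitary(x, [0.0] * (k + 1))
assert np.isclose(U[0, 0].real, np.cos(k * np.arccos(x)))
```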
The iterate W is similar to the quantum walk operator used with the Chebyshev decompositions in the LCU method. Indeed, when A is Hermitian, W is the same as that walk operator, up to a reflection. Furthermore, the unitary on the larger system consisting of input and ancilla qubits that is constructed in the LCU approach is a particular case of a block-encoding, since it achieves the same result: given A = ∑_i α_i U_i, the operator the LCU method constructs has A/α as the top-left block (with α = ∑_i |α_i|).
The optimal Hamiltonian simulation algorithm based on the qubitization method [LC16] can be used to speed up any algorithm that uses Hamiltonian simulation as a subroutine. Chakraborty, Gilyén and Jeffery [CGJ18] have applied the qubitization method, also called the block-encoding method, to obtain an improved Hamiltonian simulation technique for non-sparse matrices, improved quantum algorithms for linear algebra applications such as regression, and improved algorithms for estimating quantities like effective resistance in networks. Furthermore, they note that results in the block-encoding framework apply also when the input is specified in the quantum data structure model (e.g. QRAM) rather than as a unitary oracle. This connection is established through a result of Kerenidis and Prakash [KP17], which shows that if a matrix A is stored in a quantum data structure such as QRAM, an approximate block encoding for it can be implemented with polylogarithmic overhead.
The spectral and LCU methods were originally developed for, and apply to, Hermitian matrices, taking advantage of the fact that they have a spectral decomposition with real eigenvalues. Qubitization and quantum signal processing can deal with normal operators as well, which have complex eigenvalues. Very recently, [GSL + 18] have extended these techniques further, to transformations of the singular values of general matrices. Their extensive analysis also shows how the method achieves optimal complexity for a wide variety of quantum algorithms.
In the last three sections we have briefly looked at the methods used for implementing smooth functions of (Hermitian) matrices on a quantum computer. In the rest of this article, we analyse a certain combination of these methods, using the LCU technique with Chebyshev polynomial approximations and a quantum walk construction.

Implementing smooth functions using the LCU method
Given a Hermitian matrix A and a smooth function f: ℝ → ℝ, we describe below a quantum algorithm (theorem 1) that implements an operator proportional to f(A) on any input state |ψ⟩, using O(log(C/ε)) queries to the oracle for A. The circuit also outputs a flag indicating success.
Here C>0 is an upper bound on the magnitudes of the derivatives of f in (−1, 1), and μ is the eigenvalue of f (A) with the least magnitude on the domain of interest (i.e. on the spectrum of A).
In the rest of this section, we prove this theorem by constructing the algorithm that achieves the target operation. For simplicity, we focus on the case where ∥A∥ ≤ 1. The case when ∥A∥ ≤ Λ for some Λ > 1 (and correspondingly the domain of f is [−Λ, Λ]) can be reduced to this case by scaling appropriately.

Chebyshev series
The circuit in theorem 1 is obtained using the LCU method. The first step is to derive an approximation for f (A) in terms of Chebyshev polynomials. The matrix functions corresponding to these Chebyshev polynomials are then performed on a quantum computer using a quantum random walk.
The Chebyshev polynomials of the first kind, denoted T_n where n is the degree, are orthogonal polynomials on [−1, 1] with weight function (1 − x²)^{−1/2}. We collect a few relevant facts about these polynomials in appendix H. In particular, we use two nice properties of Chebyshev polynomials in the quantum algorithm. The first is that monomials x^k on [−1, 1] can be exactly represented as a finite sum of Chebyshev polynomials,

x^k = 2^{1−k} ∑′_j (k choose (k−j)/2) T_j(x),   (6)

where the sum runs over 0 ≤ j ≤ k of the same parity as k, and the prime indicates that the j = 0 term, when present, is halved. The second is that the Chebyshev polynomials are bounded in magnitude by 1 on [−1, 1]. We use these properties to write down a truncated Chebyshev series for f(x), based on the Taylor series, which will lead to a simple expression for the success probability in the LCU method.
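The exact monomial decomposition (6) is easy to verify numerically; in this sketch we use numpy's built-in basis conversion rather than the binomial formula.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Chebyshev coefficients of x^k: only degrees k, k-2, k-4, ... appear.
k = 6
coeffs = C.poly2cheb([0.0] * k + [1.0])   # exact basis change, monomial -> T_n

xs = np.linspace(-1.0, 1.0, 101)
assert np.allclose(C.chebval(xs, coeffs), xs**k)

# For x^6 the expansion is (10 T_0 + 15 T_2 + 6 T_4 + T_6)/32; since the
# coefficients are non-negative and T_n(1) = 1, the weights sum to 1.
assert np.isclose(np.sum(np.abs(coeffs)), 1.0)
```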
For a smooth function f on the interval [−1, 1], consider the Taylor series about x_0 = 0 (also called the Maclaurin series),

f(x) = ∑_{i=0}^{∞} a_i x^i,

where a_i = f^(i)(0)/i! is the ith Taylor coefficient, f^(i) denoting the ith derivative of f. Suppose the radius of convergence of the series is some r > 0. Truncating this series at some finite integer L, we get the Taylor polynomial P_L(x) = ∑_{i=0}^{L} a_i x^i, with truncation error bounded by Taylor's theorem:

|f(x) − P_L(x)| ≤ |f^(L+1)(ξ_L)| / (L+1)!   for some ξ_L ∈ (−1, 1).

Let us denote by ∥a∥_1 = ∑_i |a_i| the 1-norm of the coefficients. If we have bounds on the derivatives of f, or if we can bound the tail of the series as for the exponential function, we can get a good bound on the truncation error. This will enable us to quantify the rate of convergence of the series, and to decide the order of truncation based on the desired precision.
A simple assumption that we can make about the derivatives of f is that they are bounded in magnitude by some known constant C, i.e.

|f^(i)(x)| ≤ C   for all i and all x ∈ (−1, 1).

This holds for Schwartz functions, for example. It is then a simple calculation to show that taking L > log_2(C/ε) ensures that the truncation error will be smaller than ε, as long as L > 2e, i.e. L ≥ 6. (If we instead assume the weaker condition that ∑_{i=0}^{∞} |a_i| = B < ∞, and that the series converges in a (1−δ)-ball around x_0, the corresponding truncation order is L = O(log_{1/(1−δ)}(B/ε)).) Now using (6) to represent the monomials x^i exactly as a finite sum of Chebyshev polynomials, we obtain a truncated Chebyshev series for f,

P_L(x) = ∑_{n=0}^{L} c_n T_n(x).   (12)

The probability of success in the LCU implementation depends on this sum of coefficients, and we note that rewriting the Taylor polynomial as a Chebyshev decomposition does not by itself increase the weight of the coefficients: ∑_n |c_n| ≤ ∥a∥_1.
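Both claims can be checked numerically for the exponential, whose derivatives on (−1, 1) are bounded by C = e. The sketch below truncates the Maclaurin series, converts it to the Chebyshev basis, and confirms that the basis change does not increase the coefficient weight.

```python
import math
import numpy as np
from numpy.polynomial import chebyshev as C

L = 12
taylor = np.array([1.0 / math.factorial(i) for i in range(L + 1)])
cheb = C.poly2cheb(taylor)                   # the same polynomial in the T_n basis

xs = np.linspace(-1.0, 1.0, 201)
err = np.max(np.abs(C.chebval(xs, cheb) - np.exp(xs)))
assert err <= np.e / math.factorial(L + 1)   # Taylor remainder bound with C = e

# The basis change does not increase the 1-norm of the coefficients:
assert np.sum(np.abs(cheb)) <= np.sum(np.abs(taylor)) + 1e-12
```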
However, taking n steps of the quantum walk described in appendix I results in the transformation

|ψ⟩ ↦ T_n(A/d)|ψ⟩   (up to normalisation and postselection).

That is, the quantum walk implements the operator T_n(A/d) rather than T_n(A). To account for this, we further rewrite the series (12) as a Chebyshev series in x/d,

P_L(x) = ∑_{n=0}^{L} c̃_n T_n(x/d).   (13)

This results in an increase in the weight of the coefficients: the coefficient of the monomial x^i is scaled by d^i, so the weight ∑_n |c̃_n| can be larger than ∑_n |c_n| by a factor as large as d^L. The truncation error does not change, since we are expanding around x_0 = 0 and simply rescaling the coefficients and the argument of the series.
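The effect of the rescaling on the weight can be seen directly for the exponential: rescaling the Taylor coefficients a_i by d^i leaves the polynomial unchanged when evaluated at x/d, but the weight grows from about e to about e^d. (A small numpy sketch; the truncation order is illustrative.)

```python
import math
import numpy as np

d, L = 3, 25
a = np.array([1.0 / math.factorial(i) for i in range(L + 1)])   # Maclaurin series of e^x
a_rescaled = a * d ** np.arange(L + 1)                          # coefficients of the series in x/d

xs = np.linspace(-1.0, 1.0, 101)
# Same polynomial: sum_i a_i x^i == sum_i (a_i d^i) (x/d)^i.
assert np.allclose(np.polyval(a[::-1], xs), np.polyval(a_rescaled[::-1], xs / d))

# The weight of the coefficients grows from ~e to ~e^d.
assert np.isclose(np.sum(a), np.e, rtol=1e-6)
assert np.isclose(np.sum(a_rescaled), np.exp(d), rtol=1e-6)
```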
Finally, when

Algorithm description and complexity
By lemma 1 (appendix F), we can implement f(A) approximately by using the linear combination of Chebyshev polynomials in equation (13), via the LCU method as described in lemma 2 (appendix F). We thus have a quantum algorithm which, for an input state |ψ⟩, produces a state |f̃⟩ that is close to f(A)|ψ⟩/∥f(A)|ψ⟩∥, where μ denotes the eigenvalue of f(A) with the least magnitude on the domain of interest. For monotone functions, μ can be estimated if a suitable upper or lower bound is known on ∥A∥. Typically, amplitude amplification is used to boost the probability of success to a constant. The simplest setting where this is possible is when a state preparation map for the input state |ψ⟩ is available. Using lemma 2, an upper bound on the worst-case query complexity of this implementation, when amplitude amplification is used to boost the probability of success, is O(Lγ/μ), where γ denotes the weight of the rescaled coefficients in (13); in fact, γ ≈ f(Λd). The linear factor of L comes from the fact that we need to implement Chebyshev polynomials of degree up to L using the quantum walk. The factor γ/μ comes from using amplitude amplification to increase the success probability of obtaining the desired state. When a constant lower bound on μ is known, this factor can be omitted from the complexity. The implementation is invariably expensive when f(A) has eigenvalues close to zero.
Thus, if amplitude amplification is used, plugging in the expression for L gives O((γ/μ) log(C/ε)) queries to the oracle for A. Typically, one does not know an estimate of the success probability of the method beforehand, and needs to first use some form of (coarse) amplitude estimation in order to make sure amplitude amplification does not overshoot the target. This can be avoided by using a version of amplitude amplification that uses a fixed-point search method, which converges monotonically to the target state [Gue18].
The gate complexity can be obtained by multiplying the query complexity by the gate complexity of performing one step of the quantum walk. From [BCK15], one step of the walk costs only a constant number of queries and O(log N + m^2.5) 2-qubit gates, where m is the number of bits of precision used for the entries of the matrix A. The factor of m^2.5 arises from the cost of black-box state preparation for implementing the quantum walk operator, and can be reduced to nearly O(m) as recently described in [SLS + 18]. For completeness, note that the control state preparation map V in the LCU method can be constructed using the method of [GR02], since the coefficients are known.

Related work
As noted previously, [AGG + 17] provide a constructive method for implementing smooth functions of Hermitian matrices, based on transforming the Taylor series for the function into a Fourier series. The algorithm is then obtained by using the LCU method and Hamiltonian simulation to implement the Fourier components. For comparison, we quote below the results as described in theorem 40 of their paper.
[AGG + 17], theorem 40. Given the Taylor series of f about some point x_0, f(x_0 + x) = ∑_k a_k x^k, convergent on a suitable interval, one can implement a unitary U_f on t + n qubits such that for any n-qubit state |ψ⟩, postselecting the t ancillary qubits on |0⟩^t leaves a state proportional to an ε-approximation of f(A)|ψ⟩. If A is d-sparse and accessible using an oracle as in (1), then the whole circuit can be implemented using a number of queries to the oracle that scales as d∥A∥ log(1/ε), up to polylogarithmic factors. The number of 3-qubit gates used for the circuit is larger by a factor polylogarithmic in N. We notice that the query complexity in this method depends linearly on d∥A∥, in addition to the factor log(1/ε). In the Chebyshev method, the dependence on d and ∥A∥ comes in through the success probability, leaving the query complexity dependent only on the properties of the Taylor series approximation.
The methods based on Fourier and Chebyshev series cannot be directly compared, because they may not be usable in all situations: the Fourier method is possible whenever Hamiltonian simulation can be performed for the matrix A, while the Chebyshev method is possible only when the quantum walk in appendix I is feasible to implement. Since Hamiltonian simulation can be performed using quantum walks, the Fourier method has a wider range of applicability, in general.
The probability of success in preparing the desired state by postselection on the t ancillary qubits is, as expected in the LCU method, given by the squared ratio of ∥f(A)|ψ⟩∥ to the weight of the coefficients in the linear combination.

Advantages of the Chebyshev method
In comparison to the method based on Fourier series approximations, the Chebyshev series based method described here has the following advantages.
1. Since the quantum walk exactly produces the effect of Chebyshev polynomials in A/d (apart from a choice of precision in representing the entries of the matrix), and polynomials have exact representations as a finite sum of Chebyshev polynomials, we can exactly implement polynomial functions of a matrix. In the Fourier series based approach, the series truncation adds another layer to the error in the approximation of polynomials. The simplicity of the analysis also makes it apparent that the Chebyshev method could be more suitable for applications involving polynomial functions, such as iterative methods that use Krylov subspaces.
2. Leaving out amplitude amplification, the query complexity depends only on the degree of the approximating polynomial. This isolation of the dependence of the complexity on the norm of the matrix and its sparsity into the success probability could be useful in studying lower bounds on the query complexity for different matrix functions.
3. Methods based on quantum walks extend to non-sparse matrices, since they do not depend on row computability [BC12]. The complexity will generally be worse, however.
4. The classical calculation of the coefficients is particularly simple for the Chebyshev series.
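To illustrate the last point, the classical preprocessing amounts to a single power-series-to-Chebyshev-basis conversion, which NumPy's polynomial module performs directly (a sketch using the degree-10 Taylor series of exp as the input series).

```python
import math
import numpy as np
from numpy.polynomial import chebyshev as Cheb

# Taylor coefficients of exp(x) up to degree 10 (lowest degree first).
taylor = np.array([1 / math.factorial(k) for k in range(11)])

# One library call converts the power series into Chebyshev coefficients.
cheb = Cheb.poly2cheb(taylor)

# Both representations describe the same degree-10 polynomial on [-1, 1].
xs = np.linspace(-1.0, 1.0, 101)
direct = np.polynomial.polynomial.polyval(xs, taylor)
via_cheb = Cheb.chebval(xs, cheb)
agreement = np.max(np.abs(direct - via_cheb))
print(agreement)  # agreement to machine precision
```

Since the conversion is exact (it is a change of basis on polynomials of fixed degree), the only approximation error is that of the Taylor truncation itself.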
There are, of course, both advantages and disadvantages to using any method. The Chebyshev polynomial implementation uses quantum walk methods, and the construction of the walk requires a doubling of the input space, i.e., $O(n)$ ancillary qubits. Furthermore, the rescaling of the Taylor series coefficients means that machine errors due to fixed-precision representation are magnified: to work at precision $\epsilon$, we need to use $B = \Omega(\log(1/\epsilon))$ bits for the matrix entries.

Special function classes
The main difficulty in the approach we have described is the scaling up of the weight of the coefficients in the Taylor series approximation due to the fact that the Chebyshev polynomials obtained from the quantum walk are in A/d rather than A. This affects the success probability, potentially necessitating many rounds of repetition or amplitude amplification. The step that requires this rescaling is the reconstruction of f (x) using an approximation to f (x/d) for a fixed d>1.
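A small classical illustration of this blow-up (a sketch, taking $f = \exp$): rescaling $f(x)$ to $f(dx)$ multiplies the k-th Taylor coefficient $a_k$ by $d^k$, so the coefficient 1-norm, which the success probability depends on inversely, grows from $e$ to $e^d$.

```python
import math

# Rescaling f(x) = exp(x) to exp(d*x) multiplies the k-th Taylor
# coefficient 1/k! by d^k, so the coefficient 1-norm grows as e^d.
def coeff_one_norm(d, terms=60):
    """1-norm of the rescaled Taylor coefficients sum_k d^k / k!."""
    return sum(d**k / math.factorial(k) for k in range(terms))

for d in (1, 2, 4):
    print(d, coeff_one_norm(d))   # approximately e^d
```

Since the LCU success probability scales as the inverse square of this 1-norm, even modest sparsity d can force many rounds of repetition or amplitude amplification.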

Polynomials
For the special case of monomials, which are homogeneous functions, we simply use the Chebyshev decomposition (6), which is exact. This eliminates the precision parameter $\epsilon$. For $f(x) = x^k$ on $[-1, 1]$, the query complexity is $O(k)$, or, if amplitude amplification is used, $O(k(d/|\lambda_{\min}|)^k)$, where $\lambda_{\min}$ is the least-magnitude eigenvalue of A (we assume A does not have 0 as an eigenvalue). For any polynomial of degree k, the complexity is the same to leading order, since the highest-degree Chebyshev polynomial comes from the highest-degree term of the polynomial. Computing matrix polynomials may find use in iterative methods in numerical linear algebra; these methods typically proceed by starting with an initial guess $v_0$ and iteratively improving it. Patel and Priyadarsini [PP17] propose a matrix inversion algorithm using this method, although they use a different formalism to quantumly implement the monomials.

The exponential function
The matrix exponential is an immensely important function, primarily in the form of the complex exponential $e^{iAt}$ that describes quantum evolution under the Hamiltonian operator A. Hamiltonian simulation is important in a variety of applications, ranging from quantum chemistry to use as a subroutine in linear algebra and machine learning applications. This vast topic has received a lot of attention in the last two decades, so we shall not endeavour to elaborate on it here. The simple exponential $e^A$ is also an important function. For example, being able to sample from the Gibbs' state $e^{-H}/\mathrm{Tr}(e^{-H})$ has been found useful in quantum algorithms for semidefinite programming [BKL+17, AG18]. Exponentiating density matrices can be used to construct a quantum algorithm for principal component analysis [LMR13]. Matrix exponentiation is also expected to be useful in variational quantum chemistry algorithms, for example in implementing coupled cluster techniques [RBM+17].
Using our approach, to implement the exponential $e^A$ of the Hermitian matrix A, we can first approximate $e^{A/d}$ and repeat this operation d times. If $\|A\| \leq 1$, this leads to a query complexity of $O(d\log(1/\epsilon))$. However, the success probability decays exponentially in d, and thus many rounds of amplitude amplification may be required. Following the construction in section 6, we note that $\gamma = e^d$ to order $\epsilon$, so the complexity when using amplitude amplification can only be bounded by $O(e^d)$. Many authors have considered the problem of matrix exponentiation previously. Some have considered the problem of solving a system of linear ordinary differential equations [Ber14, BCO+17], while others have focused on the problem of Gibbs' state preparation [CS16]. [AGG+17] also describe an algorithm for this problem, which they use for Gibbs' sampling in quantum SDP algorithms. [PP17] give an algorithm that also uses the Chebyshev expansion of the exponential function, but their method of implementation is based on the recursion relation for Chebyshev polynomials, and uses a digital encoding of quantum states.
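The d-fold repetition is exact at the level of the matrix function, since $(e^{A/d})^d = e^A$ for any Hermitian A; only the per-step approximation error and the success probability are at stake. A minimal classical check (using an eigendecomposition in place of the quantum subroutine):

```python
import numpy as np

# (exp(A/d))^d equals exp(A) exactly for Hermitian A: the d-fold
# repetition introduces no error beyond that of each e^{A/d} step.
rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2

def expm_h(H):
    """Matrix exponential of a real symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(w)) @ V.T

d = 8
step = expm_h(A / d)
repeated = np.linalg.matrix_power(step, d)
err = np.linalg.norm(repeated - expm_h(A), 2)
print(err)   # zero up to floating-point round-off
```

The trade-off discussed above is visible here: each step handles a matrix of norm $\|A\|/d \leq 1$, at the price of d repetitions and an exponentially decaying overall success probability.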

Conclusions
We have shown a simple calculation motivated by the approximation of real functions by Chebyshev series to illustrate the use of quantum walks with the LCU method to implement a wide variety of smooth functions. The method is particularly simple, as the approximating linear combination is obtained from a truncated Taylor series that is transformed into a Chebyshev series. While we do not present any improvements in complexity, the simplicity of the method makes it attractive from a pedagogical viewpoint.
Although the LCU method has been widely investigated over the last few years, there are still a few interesting questions related to it. For example, the only unitaries found useful so far are tensor products of Pauli operators, Fourier basis terms, and Chebyshev polynomials, because these can be implemented using known methods (Hamiltonian simulation and quantum walks). It would be interesting to study other families of unitary circuits that can be used in conjunction with this method. In particular, for a special case such as matrix exponentiation, is it possible to systematically determine an optimal basis of unitaries? This would also be related to lower bounds on the query complexity of implementing different matrix functions.
We briefly describe below the LCU method for approximately implementing a linear combination of unitary matrices. This description is drawn from [CKS15], and we refer readers to the proofs therein for more details.
First, we need to make sure that approximating a Hermitian operator C by another operator D in spectral norm ensures that the normalized output states $C|\psi\rangle/\|C|\psi\rangle\|$ and $D|\psi\rangle/\|D|\psi\rangle\|$ are close in the Hilbert space norm, for any state $|\psi\rangle$.
Lemma 1 ([CKS15], proposition 9). Let C be a Hermitian operator whose eigenvalues satisfy $|\lambda| \geq 1$, and let D be an operator satisfying $\|C - D\| \leq \epsilon$. Then, for any state $|\psi\rangle$, the normalized states $C|\psi\rangle/\|C|\psi\rangle\|$ and $D|\psi\rangle/\|D|\psi\rangle\|$ are within $O(\epsilon)$ of each other in $\ell_2$-norm.

We can exactly represent the monomials $x^k$ on $[-1, 1]$ in the basis of Chebyshev polynomials as a finite sum of terms up to degree k as follows [Tha64]:
$$x^k = 2^{1-k} {\sum_{j=0}^{\lfloor k/2 \rfloor}}' \binom{k}{j} T_{k-2j}(x),$$
where the prime indicates that the $j = k/2$ term (present only when k is even) is halved. The real part of the exponential integrates to zero on $[0, \pi]$ unless $l = k - 2j$; in this case, the integral is just $\pi$, the length of the interval, and we get the coefficients above. This makes sense, since for k even, $x^k$ is an even function and will contain only Chebyshev terms of even degree in its expansion (similarly when k is odd). The Chebyshev polynomials all evaluate to 1 at $x = 1$ (as can be seen from $T_n(\cos\theta) = \cos(n\theta)$), and this gives us a useful property: for all integers $k \geq 0$,
$$2^{1-k} {\sum_{j=0}^{\lfloor k/2 \rfloor}}' \binom{k}{j} = 1.$$
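The [Tha64] identity for monomials can be checked numerically (a sketch, with the primed sum implemented by halving the $j = k/2$ term):

```python
import math
import numpy as np
from numpy.polynomial.chebyshev import chebval

def monomial_cheb_coeffs(k):
    """Coefficients c_n with x^k = sum_n c_n T_n(x), from the identity
    x^k = 2^(1-k) * sum'_{j=0..floor(k/2)} C(k, j) T_{k-2j}(x),
    where the prime halves the j = k/2 term (present only for even k)."""
    c = np.zeros(k + 1)
    for j in range(k // 2 + 1):
        w = math.comb(k, j) * 2.0 ** (1 - k)
        if k % 2 == 0 and j == k // 2:
            w /= 2.0                       # the primed (halved) term
        c[k - 2 * j] += w
    return c

k = 7
xs = np.linspace(-1.0, 1.0, 51)
err = np.max(np.abs(chebval(xs, monomial_cheb_coeffs(k)) - xs**k))
print(err)  # exact up to round-off

# The coefficients sum to 1, since every T_n(1) = 1.
coeff_sum = float(sum(monomial_cheb_coeffs(k)))
print(coeff_sum)
```

Note that only every other Chebyshev degree appears, matching the parity observation above, and the coefficient sum equals 1 in agreement with $T_n(1) = 1$.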